A practical guide covering architecture, workloads, networking, config, and debugging.
- Architecture
- Pods
- Labels & Selectors
- Deployments
- Resource Limits
- Services
- Ingress
- Network Policies
- ConfigMaps
- Secrets
- Debugging Pods
- Health Checks
The control plane manages overall cluster state. Its core components are:
| Component | Role |
|---|---|
| kube-apiserver | Front door to the cluster — all communication goes through it via REST |
| etcd | Distributed key-value store holding all cluster state and config |
| kube-scheduler | Assigns unscheduled Pods to nodes based on resources, taints, affinity |
| kube-controller-manager | Runs controllers (Deployment, ReplicaSet, Node) to reconcile desired vs actual state |
| cloud-controller-manager | (Optional) Integrates with cloud provider APIs for LBs, persistent volumes, etc. |
The kubelet is the agent running on every worker node. It:
- Watches the API server for Pods scheduled to its node
- Instructs the container runtime (e.g., containerd) to pull images and start/stop containers
- Reports node and Pod status back to the control plane
- Runs liveness/readiness/startup probes and restarts containers when needed
Note: The kubelet only manages containers it created through Kubernetes — not any containers started manually on the node.
etcd is the single source of truth for the entire cluster. Every object you create — Pods, Services, ConfigMaps, Secrets — is stored here as key-value data.
- The API server is the only component that talks directly to etcd
- If etcd loses data, the cluster loses its state
- Always run with redundancy in production (typically 3 or 5 nodes)
kube-proxy runs on every node and programs network rules (iptables or IPVS) that implement Kubernetes Services.
When you create a Service, kube-proxy ensures that traffic to the Service's ClusterIP gets forwarded to the correct backend Pod IPs, and handles load balancing across Pod endpoints at the network level.
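As a sketch, a minimal ClusterIP Service that kube-proxy would program rules for (the name and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  selector:
    app: my-api          # traffic is load-balanced across Pods with this label
  ports:
    - port: 80           # the Service (ClusterIP) port
      targetPort: 8080   # the container port on the backend Pods
```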
Kubernetes needed a unit that could group tightly coupled containers that must share the same network namespace and storage. A Pod gives all its containers:
- The same IP address
- The same localhost
- Optionally shared volumes
You can't schedule a raw container in Kubernetes — the Pod is the scheduling unit. In practice most Pods have one container, but the abstraction enables patterns like sidecars.
A sidecar is a second container in the same Pod that extends or supports the main container without being part of its core logic. They share the same network and can share volumes.
Use case — Log shipping:
```
Main container    → writes logs to shared volume
Sidecar (Fluentd) → reads that file, ships logs to Elasticsearch
```
The app doesn't need to know anything about the logging infrastructure.
Other examples: Envoy proxy (service mesh), secrets-syncing sidecars, metrics exporters.
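The log-shipping pattern above could be sketched as a Pod with a shared `emptyDir` volume (image names and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  volumes:
    - name: logs
      emptyDir: {}               # shared scratch space, lives as long as the Pod
  containers:
    - name: app
      image: my-api:1.0          # placeholder image; writes logs to /var/log/app
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper
      image: fluentd:v1.16       # sidecar reads the same files and ships them
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```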
When a container crashes, the kubelet detects it and restarts the container per the Pod's `restartPolicy`:
| Policy | Behaviour |
|---|---|
| `Always` (default) | Always restart — used for long-running services |
| `OnFailure` | Restart only if exit code is non-zero — used for Jobs |
| `Never` | Don't restart |
If the container keeps crashing, Kubernetes applies exponential back-off (10s → 20s → 40s → up to 5 min). This is the CrashLoopBackOff state. The Pod itself stays alive; only the container restarts.
| Phase | Meaning |
|---|---|
| Pending | Pod accepted by the cluster but not yet scheduled, or containers not yet started |
| Running | Pod is bound to a node and at least one container is running |
| Succeeded | All containers exited with code 0 and won't be restarted (common for Jobs) |
| Failed | All containers terminated and at least one exited with a non-zero code |
| Unknown | Pod state can't be determined, usually due to a node communication failure |
```
kubectl apply → API server stores spec in etcd
        ↓
Scheduler picks a node, updates nodeName
        ↓
kubelet sees assignment, pulls image(s)
        ↓
Init containers run (sequentially, must all succeed)
        ↓
App containers start
        ↓
Startup probe runs (if set)
        ↓
Readiness + Liveness probes begin
        ↓
Pod marked Ready → receives traffic from Services
        ↓
On deletion: PreStop hook → SIGTERM → grace period → SIGKILL
```
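Several of these lifecycle hooks can be seen together in a minimal Pod manifest (names, commands, and ports are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo                 # hypothetical name
spec:
  terminationGracePeriodSeconds: 30    # time between SIGTERM and SIGKILL
  initContainers:
    - name: wait-for-db                # must succeed before app containers start
      image: busybox
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
  containers:
    - name: app
      image: my-api:1.0                # placeholder image
      lifecycle:
        preStop:                       # runs before SIGTERM on deletion
          exec:
            command: ["sh", "-c", "sleep 5"]
      startupProbe:
        httpGet:
          path: /healthz               # hypothetical health endpoint
          port: 8080
        failureThreshold: 30
        periodSeconds: 2
```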
A Service finds its backend Pods via label selectors. A Service has a selector field (e.g., `app: my-api`). Kubernetes constantly watches for Pods matching that label and populates the Service's Endpoints object with their IPs.
Traffic sent to the Service IP is forwarded to those Pod IPs. If no Pods match, the Service has no endpoints and traffic goes nowhere.
| | Labels | Annotations |
|---|---|---|
| Purpose | Identification & selection | Non-identifying metadata |
| Used by selectors? | ✅ Yes | ❌ No |
| Value size | Short, concise | Can be large (e.g. JSON blobs) |
| Examples | `app: api`, `env: prod` | Git commit SHA, contact info, tool config |
Think of labels as tags for filtering; annotations as sticky notes for humans and tools.
```shell
# Single label
kubectl get pods -l app=my-api

# Multiple labels (AND condition)
kubectl get pods -l app=my-api,env=production

# Set-based selector
kubectl get pods -l 'env in (staging, production)'

# Using --selector (same as -l)
kubectl get pods --selector=app=my-api
```

| | ReplicaSet | Deployment |
|---|---|---|
| Maintains N replicas | ✅ | ✅ (via RS) |
| Handles rolling updates | ❌ | ✅ |
| Stores rollout history | ❌ | ✅ |
| Supports rollback | ❌ | ✅ |
A ReplicaSet ensures N Pods are running but has no concept of graceful updates.
A Deployment manages ReplicaSets — on update, it creates a new RS and gradually shifts traffic. Always use Deployments, not bare ReplicaSets.
Rolling update flow:
```
Old RS: 4 pods        New RS: 0 pods
Old RS: 3 pods   →    New RS: 1 pod
Old RS: 2 pods   →    New RS: 2 pods
Old RS: 1 pod    →    New RS: 3 pods
Old RS: 0 pods   →    New RS: 4 pods ✅
```
Pace is controlled by maxSurge and maxUnavailable.
Rolling back:
```shell
# Roll back to previous version
kubectl rollout undo deployment/my-api

# Roll back to a specific revision
kubectl rollout undo deployment/my-api --to-revision=2

# View rollout history
kubectl rollout history deployment/my-api
```

Both fields are set under `spec.strategy.rollingUpdate`:
| Field | Default | Meaning |
|---|---|---|
| `maxUnavailable` | 25% | How many Pods can be unavailable during the update. Set to 0 for zero-downtime deployments. |
| `maxSurge` | 25% | How many extra Pods can exist above the desired count. Set to 0 when resources are tight. |
Example: 4 replicas, maxSurge: 1, maxUnavailable: 0 → at most 5 Pods exist at once, and availability is always maintained.
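That example would look roughly like this in a Deployment spec (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 5 Pods exist during the rollout
      maxUnavailable: 0    # never drop below 4 ready Pods
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: api
          image: my-api:2.0   # placeholder image
```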
| | Requests | Limits |
|---|---|---|
| Used by | Scheduler (for placement) | Kernel / cgroup enforcement |
| Guarantee | Container gets at least this | Container cannot exceed this |
| CPU behaviour | Guaranteed allocation | Throttled if exceeded |
| Memory behaviour | Guaranteed allocation | Container killed (OOMKilled) |
Always set both. If you only set limits, requests default to the same value. If you set neither, the Pod is BestEffort and evicted first under pressure.
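A container spec setting both might look like this (the values are illustrative, not a recommendation):

```yaml
resources:
  requests:
    cpu: "250m"        # scheduler reserves a quarter of a CPU core
    memory: "256Mi"    # guaranteed memory for placement decisions
  limits:
    cpu: "500m"        # throttled above this
    memory: "512Mi"    # OOMKilled above this
```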
When a container exceeds its memory limit, the Linux kernel's OOM killer kills the container process immediately.
Kubernetes marks this as OOMKilled and restarts the container per the restartPolicy. This is not a graceful shutdown.
OOMKilled (Out Of Memory Killed) means the container exceeded its memory limit and the kernel killed it.
Diagnose:

```shell
kubectl describe pod <pod>   # look for "OOMKilled" in Last State
```

Fix:
- Increase the memory limit if usage is legitimate and you have node capacity
- Fix memory leaks in your application
- Tune JVM heap settings (common in Java apps that ignore container limits, e.g. `-XX:MaxRAMPercentage=75`)
- Use a VPA (Vertical Pod Autoscaler) to right-size automatically
| Type | Accessible From | Use Case |
|---|---|---|
| ClusterIP (default) | Inside cluster only | Internal service-to-service communication |
| NodePort | `<any-node-ip>:<nodePort>` (30000–32767) | Dev/testing; rarely used in production directly |
| LoadBalancer | External internet (cloud LB provisioned) | Exposing services publicly in cloud environments |
| ExternalName | Inside cluster | Maps a Service to an external DNS name |
Service discovery works via DNS. Kubernetes runs CoreDNS in the cluster, and every Service gets a DNS name:

`<service-name>.<namespace>.svc.cluster.local`

Within the same namespace, the short name works:

`http://payments-service/charge`
Kubernetes also injects environment variables into Pods, but DNS is the standard approach.
A headless Service is a Service with `clusterIP: None`. Kubernetes doesn't assign a virtual IP; instead, DNS returns the individual Pod IPs directly. Useful when:
- Clients need to connect to specific instances (e.g., databases)
- Service meshes do their own load balancing
- Used with StatefulSets (each Pod gets a stable DNS entry)
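A minimal headless Service might look like this (the name and port assume a Postgres-style backend, purely for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless   # hypothetical name
spec:
  clusterIP: None           # headless: DNS returns Pod IPs directly
  selector:
    app: postgres
  ports:
    - port: 5432
```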
A LoadBalancer Service provisions one external cloud LB per service — expensive and unscalable at many services.
Ingress lets you use a single load balancer (one external IP) to route traffic to multiple services based on HTTP host or path rules. It also handles TLS termination natively.
```
Internet → Ingress (1 LB, 1 IP)
             ├── /api   → api-service
             ├── /auth  → auth-service
             └── /      → frontend-service
```
Ingress resources are just config objects — they do nothing by themselves.
An Ingress controller is the actual implementation that reads those rules and configures a real proxy.
Common controllers:
- nginx-ingress
- AWS ALB Ingress Controller
- Traefik
- HAProxy / Contour
You must deploy a controller for Ingress resources to have any effect.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
```
⚠️ More specific paths should come first: `/api` before `/`.
A NetworkPolicy defines firewall rules for Pods — controlling which Pods can talk to which other Pods (or external IPs) on which ports.
By default, Kubernetes has no network isolation. As a backend dev, NetworkPolicies let you:
- Ensure only authorized services can reach your API
- Prevent a compromised Pod from reaching your database
- Reduce the blast radius of a security incident
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
```

This blocks all other ingress to backend pods except from frontend-labeled pods on port 8080.
All traffic is allowed by default. Any Pod can reach any other Pod in the cluster on any port.
Once you apply even one NetworkPolicy that selects a Pod, it becomes deny-by-default for that Pod — only explicitly allowed traffic is permitted.
⚠️ The first NetworkPolicy you apply can be a breaking change if you're not accounting for all the traffic you need to allow.
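A common first step is an explicit default-deny policy for a namespace, after which you allow traffic back selectively. A sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
spec:
  podSelector: {}              # empty selector: applies to every Pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed → all inbound traffic denied
```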
Option 1 — Environment variables:
```yaml
envFrom:
  - configMapRef:
      name: my-config
```

Simple, but the app must restart to pick up changes.
Option 2 — Volume mount:
```yaml
volumes:
  - name: config-vol
    configMap:
      name: my-config

volumeMounts:
  - mountPath: /etc/config
    name: config-vol
```

Files appear in the container. Kubernetes updates them when the ConfigMap changes (~1 min sync), but your app must re-read the file to pick up the change.
Partially. If mounted as a volume, the files on disk are eventually updated (kubelet syncs every ~1 minute). But the running process does not automatically restart — you need your app to watch the file, or use a tool like Reloader to trigger a rolling restart.
If injected as env vars, the Pod must be restarted entirely to see new values.
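For reference, a minimal ConfigMap matching the snippets above might look like this (the keys and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  LOG_LEVEL: "info"        # consumed as an env var via envFrom
  app.properties: |        # consumed as a file via a volume mount
    feature.flags=beta
    cache.ttl=300
```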
| Use ConfigMap when... | Use plain env vars when... |
|---|---|
| Config is shared across multiple Pods | Value is simple and per-deployment |
| Config may change independently of the deployment | Value won't change between deployments |
| Config is complex (multi-line files, many keys) | You want minimal overhead |
| You want to manage config separately in Git/CI | |
❌ Never put secrets in either — use Secrets or an external vault.
Functionally they work similarly (env vars or volume mounts), but Secrets are intended for sensitive data (passwords, tokens, keys). Kubernetes:
- Stores them base64-encoded (not encrypted by default)
- Restricts access via RBAC
- Doesn't show values in `kubectl describe` output
- Can integrate with external secret stores
They signal intent — team members and tooling know to treat these values carefully.
Not really. By default, Secrets are stored in etcd as base64-encoded (not encrypted) data. Anyone with etcd access or sufficient RBAC permissions can read them.
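To see why base64 is not encryption, you can round-trip a value locally (the value is illustrative):

```shell
# base64 is an encoding, not encryption: anyone can reverse it.
encoded=$(printf 'supersecret' | base64)
echo "$encoded"                              # → c3VwZXJzZWNyZXQ=
printf '%s' "$encoded" | base64 --decode     # → supersecret
```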
To make them genuinely secure you need:
- ✅ Encryption at rest (`EncryptionConfiguration` in the API server)
- ✅ Tight RBAC (restrict who can `get`/`list` Secrets)
- ✅ Audit logging
- ✅ External secrets manager (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) via an operator like External Secrets Operator
Best practice:
- Store credentials in an external secrets manager (Vault, AWS Secrets Manager, etc.)
- Use External Secrets Operator or a Vault agent sidecar to sync the secret into a Kubernetes Secret
- Mount the Secret as a volume or env var in your Pod
- Restrict access via RBAC so only the relevant ServiceAccount can read the secret
- Enable etcd encryption at rest
❌ Never hardcode credentials in images or ConfigMaps, and never commit them to Git.
```shell
# 1. Check events and last state
kubectl describe pod <pod>

# 2. Check current logs
kubectl logs <pod>

# 3. Check logs from the PREVIOUS (crashed) container — most useful
kubectl logs <pod> --previous

# 4. Manually run the image to poke around
kubectl run debug --image=<same-image> --command -- sleep 3600
kubectl exec -it debug -- /bin/sh
```

Common exit codes:
| Code | Meaning |
|---|---|
| `0` | Clean exit |
| `1` | Application error |
| `137` | OOMKilled (128 + SIGKILL) |
| `143` | SIGTERM received |
Also check: misconfigured env vars, missing ConfigMaps/Secrets, failed startup probes, resource limits.
A Pending Pod hasn't been scheduled to a node yet.
| Cause | How to fix |
|---|---|
| Insufficient CPU/memory | Scale the cluster or reduce resource requests |
| Unschedulable | Check node selectors, affinity rules, taints/tolerations |
| PVC not bound | Check if the PersistentVolumeClaim is stuck in Pending |
| Resource quota exceeded | Check namespace quotas with kubectl describe quota |
| No matching nodes | Ensure node labels match nodeSelector |
```shell
# Events section tells you exactly why scheduling failed
kubectl describe pod <pod>
```

```shell
# Basic shell access
kubectl exec -it <pod-name> -- /bin/sh

# If bash is available
kubectl exec -it <pod-name> -- /bin/bash

# Specific container in a multi-container Pod
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh

# If the container has no shell (e.g., distroless image)
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
```

| Probe | Question it answers | On failure | Container restarted? |
|---|---|---|---|
| Liveness | Is the container still alive? | Kills & restarts the container | ✅ Yes |
| Readiness | Is the container ready for traffic? | Removed from Service endpoints | ❌ No |
| Startup | Has the app finished starting up? | Kills & restarts (while running) | ✅ Yes |
Execution order: Startup → (once passed) Liveness + Readiness run concurrently.
When a liveness probe fails, Kubernetes kills the container (SIGTERM → grace period → SIGKILL) and restarts it. If the probe keeps failing, the Pod enters CrashLoopBackOff with exponential back-off.
Important: A failing liveness probe causes restarts, not traffic removal. Traffic removal is readiness probe's job.
Use a readiness probe whenever your API isn't immediately ready to serve traffic after the process starts:
- App needs time to connect to a database or warm up a connection pool
- App loads a large model or cache into memory at startup
- App runs database migrations on startup before it can serve requests
- App has a `/health/ready` endpoint that checks downstream dependencies
Without a readiness probe, Kubernetes sends traffic to the Pod as soon as the container starts, causing errors during the warm-up window.
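A container spec combining all three probes might look like this (paths, ports, and timings are illustrative):

```yaml
containers:
  - name: api
    image: my-api:1.0            # placeholder image
    startupProbe:                # gates liveness/readiness until the app is up
      httpGet:
        path: /health/live
        port: 8080
      failureThreshold: 30       # allows up to 30 × 2s = 60s to start
      periodSeconds: 2
    livenessProbe:               # failing → container restarted
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
    readinessProbe:              # failing → removed from Service endpoints
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
```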