A practical guide covering architecture, workloads, networking, config, and debugging.
- Architecture
- Pods
- Labels & Selectors
- Deployments
- Resource Limits
- Services
- Ingress
- Network Policies
- ConfigMaps
- Secrets
- Debugging Pods
- Health Checks
The control plane manages overall cluster state. Its core components are:
| Component | Role |
|---|---|
| kube-apiserver | Front door to the cluster — all communication goes through it via REST |
| etcd | Distributed key-value store holding all cluster state and config |
| kube-scheduler | Assigns unscheduled Pods to nodes based on resources, taints, affinity |
| kube-controller-manager | Runs controllers (Deployment, ReplicaSet, Node) to reconcile desired vs actual state |
| cloud-controller-manager | (Optional) Integrates with cloud provider APIs for LBs, persistent volumes, etc. |
The kubelet is the agent running on every worker node. It:
- Watches the API server for Pods scheduled to its node
- Instructs the container runtime (e.g., containerd) to pull images and start/stop containers
- Reports node and Pod status back to the control plane
- Runs liveness/readiness/startup probes and restarts containers when needed
Note: The kubelet only manages containers it created through Kubernetes — not any containers started manually on the node.
etcd is the single source of truth for the entire cluster. Every object you create — Pods, Services, ConfigMaps, Secrets — is stored here as key-value data.
- The API server is the only component that talks directly to etcd
- If etcd loses data, the cluster loses its state
- Always run with redundancy in production (typically 3 or 5 nodes)
kube-proxy runs on every node and programs network rules (iptables or IPVS) that implement Kubernetes Services.
When you create a Service, kube-proxy ensures that traffic to the Service's ClusterIP gets forwarded to the correct backend Pod IPs, and handles load balancing across Pod endpoints at the network level.
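As a sketch, a minimal ClusterIP Service that kube-proxy would program rules for (the name and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  selector:
    app: my-api          # traffic is load-balanced across Pods with this label
  ports:
    - port: 80           # the Service (ClusterIP) port
      targetPort: 8080   # the container port on the backend Pods
```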
Kubernetes needed a unit that could group tightly coupled containers that must share the same network namespace and storage. A Pod gives all its containers:
- The same IP address
- The same localhost
- Optionally shared volumes
You can't schedule a raw container in Kubernetes — the Pod is the scheduling unit. In practice most Pods have one container, but the abstraction enables patterns like sidecars.
A sidecar is a second container in the same Pod that extends or supports the main container without being part of its core logic. They share the same network and can share volumes.
Use case — Log shipping:
```
Main container    → writes logs to shared volume
Sidecar (Fluentd) → reads that file, ships logs to Elasticsearch
```
The app doesn't need to know anything about the logging infrastructure.
Other examples: Envoy proxy (service mesh), secrets-syncing sidecars, metrics exporters.
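The log-shipping pattern above could be sketched as a Pod with a shared `emptyDir` volume (image names and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  volumes:
    - name: logs
      emptyDir: {}               # shared scratch space, lives as long as the Pod
  containers:
    - name: app
      image: my-api:1.0          # placeholder image; writes logs to /var/log/app
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper
      image: fluentd:v1.16       # sidecar reads the same files and ships them
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```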
When a container crashes, the kubelet detects it and restarts the container per the Pod's `restartPolicy`:
| Policy | Behaviour |
|---|---|
| `Always` (default) | Always restart — used for long-running services |
| `OnFailure` | Restart only if exit code is non-zero — used for Jobs |
| `Never` | Don't restart |
If the container keeps crashing, Kubernetes applies exponential back-off (10s → 20s → 40s → up to 5 min). This is the CrashLoopBackOff state. The Pod itself stays alive; only the container restarts.
| Phase | Meaning |
|---|---|
| Pending | Pod accepted by the cluster but not yet scheduled, or containers not yet started |
| Running | Pod is bound to a node and at least one container is running |
| Succeeded | All containers exited with code 0 and won't be restarted (common for Jobs) |
| Failed | All containers terminated and at least one exited with a non-zero code |
| Unknown | Pod state can't be determined, usually due to a node communication failure |
```
kubectl apply → API server stores spec in etcd
        ↓
Scheduler picks a node, updates nodeName
        ↓
kubelet sees assignment, pulls image(s)
        ↓
Init containers run (sequentially, must all succeed)
        ↓
App containers start
        ↓
Startup probe runs (if set)
        ↓
Readiness + Liveness probes begin
        ↓
Pod marked Ready → receives traffic from Services
        ↓
On deletion: PreStop hook → SIGTERM → grace period → SIGKILL
```
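Several of these lifecycle hooks can be seen together in a minimal Pod manifest (names, commands, and ports are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo                 # hypothetical name
spec:
  terminationGracePeriodSeconds: 30    # time between SIGTERM and SIGKILL
  initContainers:
    - name: wait-for-db                # must succeed before app containers start
      image: busybox
      command: ["sh", "-c", "until nc -z db-service 5432; do sleep 2; done"]
  containers:
    - name: app
      image: my-api:1.0                # placeholder image
      lifecycle:
        preStop:                       # runs before SIGTERM on deletion
          exec:
            command: ["sh", "-c", "sleep 5"]
      startupProbe:
        httpGet:
          path: /healthz               # hypothetical health endpoint
          port: 8080
        failureThreshold: 30
        periodSeconds: 2
```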
A Service finds its backend Pods via label selectors. A Service has a selector field (e.g., `app: my-api`). Kubernetes constantly watches for Pods matching that label and populates the Service's Endpoints object with their IPs.
Traffic sent to the Service IP is forwarded to those Pod IPs. If no Pods match, the Service has no endpoints and traffic goes nowhere.
| | Labels | Annotations |
|---|---|---|
| Purpose | Identification & selection | Non-identifying metadata |
| Used by selectors? | ✅ Yes | ❌ No |
| Value size | Short, concise | Can be large (e.g. JSON blobs) |
| Examples | `app: api`, `env: prod` | Git commit SHA, contact info, tool config |
Think of labels as tags for filtering; annotations as sticky notes for humans and tools.
```shell
# Single label
kubectl get pods -l app=my-api

# Multiple labels (AND condition)
kubectl get pods -l app=my-api,env=production

# Set-based selector
kubectl get pods -l 'env in (staging, production)'

# Using --selector (same as -l)
kubectl get pods --selector=app=my-api
```

| | ReplicaSet | Deployment |
|---|---|---|
| Maintains N replicas | ✅ | ✅ (via RS) |
| Handles rolling updates | ❌ | ✅ |
| Stores rollout history | ❌ | ✅ |
| Supports rollback | ❌ | ✅ |
A ReplicaSet ensures N Pods are running but has no concept of graceful updates.
A Deployment manages ReplicaSets — on update, it creates a new RS and gradually shifts traffic. Always use Deployments, not bare ReplicaSets.
Rolling update flow:
```
Old RS: 4 pods        New RS: 0 pods
Old RS: 3 pods   →    New RS: 1 pod
Old RS: 2 pods   →    New RS: 2 pods
Old RS: 1 pod    →    New RS: 3 pods
Old RS: 0 pods   →    New RS: 4 pods ✅
```
Pace is controlled by maxSurge and maxUnavailable.
Rolling back:
```shell
# Roll back to previous version
kubectl rollout undo deployment/my-api

# Roll back to a specific revision
kubectl rollout undo deployment/my-api --to-revision=2

# View rollout history
kubectl rollout history deployment/my-api
```

Both fields are set under `spec.strategy.rollingUpdate`:
| Field | Default | Meaning |
|---|---|---|
| `maxUnavailable` | 25% | How many Pods can be unavailable during the update. Set to 0 for zero-downtime deployments. |
| `maxSurge` | 25% | How many extra Pods can exist above the desired count. Set to 0 when resources are tight. |
Example: 4 replicas, maxSurge: 1, maxUnavailable: 0 → at most 5 Pods exist at once, and availability is always maintained.
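That example would look roughly like this in a Deployment spec (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 5 Pods exist during the rollout
      maxUnavailable: 0    # never drop below 4 ready Pods
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: api
          image: my-api:2.0   # placeholder image
```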
| | Requests | Limits |
|---|---|---|
| Used by | Scheduler (for placement) | Kernel / cgroup enforcement |
| Guarantee | Container gets at least this | Container cannot exceed this |
| CPU behaviour | Guaranteed allocation | Throttled if exceeded |
| Memory behaviour | Guaranteed allocation | Container killed (OOMKilled) |
Always set both. If you only set limits, requests default to the same value. If you set neither, the Pod is BestEffort and evicted first under pressure.
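A container spec setting both might look like this (the values are illustrative, not a recommendation):

```yaml
resources:
  requests:
    cpu: "250m"        # scheduler reserves a quarter of a CPU core
    memory: "256Mi"    # guaranteed memory for placement decisions
  limits:
    cpu: "500m"        # throttled above this
    memory: "512Mi"    # OOMKilled above this
```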
When a container exceeds its memory limit, the Linux kernel's OOM killer kills the container process immediately.
Kubernetes marks this as OOMKilled and restarts the container per the restartPolicy. This is not a graceful shutdown.
OOMKilled (Out Of Memory Killed) means the container exceeded its memory limit and the kernel killed it.
Diagnose:

```shell
kubectl describe pod <pod>   # look for "OOMKilled" in Last State
```

Fix:
- Increase the memory limit if usage is legitimate and you have node capacity
- Fix memory leaks in your application
- Tune JVM heap settings (common in Java apps that ignore container limits, e.g. `-XX:MaxRAMPercentage=75`)
- Use a VPA (Vertical Pod Autoscaler) to right-size automatically
| Type | Accessible From | Use Case |
|---|---|---|
| ClusterIP (default) | Inside cluster only | Internal service-to-service communication |
| NodePort | `<any-node-ip>:<nodePort>` (30000–32767) | Dev/testing; rarely used in production directly |
| LoadBalancer | External internet (cloud LB provisioned) | Exposing services publicly in cloud environments |
| ExternalName | Inside cluster | Maps a Service to an external DNS name |
Service discovery works via DNS. Kubernetes runs CoreDNS in the cluster, and every Service gets a DNS name:

`<service-name>.<namespace>.svc.cluster.local`

Within the same namespace, the short name works:

`http://payments-service/charge`
Kubernetes also injects environment variables into Pods, but DNS is the standard approach.
A headless Service is a Service with `clusterIP: None`. Kubernetes doesn't assign a virtual IP; instead, DNS returns the individual Pod IPs directly. Useful when:
- Clients need to connect to specific instances (e.g., databases)
- Service meshes do their own load balancing
- Used with StatefulSets (each Pod gets a stable DNS entry)
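A minimal headless Service might look like this (the name and port assume a Postgres-style backend, purely for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless   # hypothetical name
spec:
  clusterIP: None           # headless: DNS returns Pod IPs directly
  selector:
    app: postgres
  ports:
    - port: 5432
```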
A LoadBalancer Service provisions one external cloud LB per service — expensive and unscalable at many services.
Ingress lets you use a single load balancer (one external IP) to route traffic to multiple services based on HTTP host or path rules. It also handles TLS termination natively.
```
Internet → Ingress (1 LB, 1 IP)
             ├── /api   → api-service
             ├── /auth  → auth-service
             └── /      → frontend-service
```
Ingress resources are just config objects — they do nothing by themselves.
An Ingress controller is the actual implementation that reads those rules and configures a real proxy.
Common controllers:
- nginx-ingress
- AWS ALB Ingress Controller
- Traefik
- HAProxy / Contour
You must deploy a controller for Ingress resources to have any effect.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
```
⚠️ More specific paths should come first: `/api` before `/`.
A NetworkPolicy defines firewall rules for Pods — controlling which Pods can talk to which other Pods (or external IPs) on which ports.
By default, Kubernetes has no network isolation. As a backend dev, NetworkPolicies let you:
- Ensure only authorized services can reach your API
- Prevent a compromised Pod from reaching your database
- Reduce the blast radius of a security incident
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
```

This blocks all other ingress to backend pods except from frontend-labeled pods on port 8080.
All traffic is allowed by default. Any Pod can reach any other Pod in the cluster on any port.
Once you apply even one NetworkPolicy that selects a Pod, it becomes deny-by-default for that Pod — only explicitly allowed traffic is permitted.
⚠️ The first NetworkPolicy you apply can be a breaking change if you're not accounting for all the traffic you need to allow.
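A common first step is an explicit default-deny policy for a namespace, after which you allow traffic back selectively. A sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
spec:
  podSelector: {}              # empty selector: applies to every Pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed → all inbound traffic denied
```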
Option 1 — Environment variables:
```yaml
envFrom:
  - configMapRef:
      name: my-config
```

Simple, but the app must restart to pick up changes.
Option 2 — Volume mount:
```yaml
volumes:
  - name: config-vol
    configMap:
      name: my-config

volumeMounts:
  - mountPath: /etc/config
    name: config-vol
```

Files appear in the container. Kubernetes updates them when the ConfigMap changes (~1 min sync), but your app must re-read the file to pick up the change.
Partially. If mounted as a volume, the files on disk are eventually updated (kubelet syncs every ~1 minute). But the running process does not automatically restart — you need your app to watch the file, or use a tool like Reloader to trigger a rolling restart.
If injected as env vars, the Pod must be restarted entirely to see new values.
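For reference, a minimal ConfigMap matching the snippets above might look like this (the keys and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  LOG_LEVEL: "info"        # consumed as an env var via envFrom
  app.properties: |        # consumed as a file via a volume mount
    feature.flags=beta
    cache.ttl=300
```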
| Use ConfigMap when... | Use plain env vars when... |
|---|---|
| Config is shared across multiple Pods | Value is simple and per-deployment |
| Config may change independently of the deployment | Value won't change between deployments |
| Config is complex (multi-line files, many keys) | You want minimal overhead |
| You want to manage config separately in Git/CI | |
❌ Never put secrets in either — use Secrets or an external vault.
Functionally they work similarly (env vars or volume mounts), but Secrets are intended for sensitive data (passwords, tokens, keys). Kubernetes:
- Stores them base64-encoded (not encrypted by default)
- Restricts access via RBAC
- Doesn't show values in `kubectl describe` output
- Can integrate with external secret stores
They signal intent — team members and tooling know to treat these values carefully.
Not really. By default, Secrets are stored in etcd as base64-encoded (not encrypted) data. Anyone with etcd access or sufficient RBAC permissions can read them.
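To see why base64 is not encryption, you can round-trip a value locally (the value is illustrative):

```shell
# base64 is an encoding, not encryption: anyone can reverse it.
encoded=$(printf 'supersecret' | base64)
echo "$encoded"                              # → c3VwZXJzZWNyZXQ=
printf '%s' "$encoded" | base64 --decode     # → supersecret
```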
To make them genuinely secure you need:
- ✅ Encryption at rest (`EncryptionConfiguration` in the API server)
- ✅ Tight RBAC (restrict who can `get`/`list` Secrets)
- ✅ Audit logging
- ✅ External secrets manager (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) via an operator like External Secrets Operator
Best practice:
- Store credentials in an external secrets manager (Vault, AWS Secrets Manager, etc.)
- Use External Secrets Operator or a Vault agent sidecar to sync the secret into a Kubernetes Secret
- Mount the Secret as a volume or env var in your Pod
- Restrict access via RBAC so only the relevant ServiceAccount can read the secret
- Enable etcd encryption at rest
❌ Never hardcode credentials in images or ConfigMaps, and never commit them to Git.
```shell
# 1. Check events and last state
kubectl describe pod <pod>

# 2. Check current logs
kubectl logs <pod>

# 3. Check logs from the PREVIOUS (crashed) container — most useful
kubectl logs <pod> --previous

# 4. Manually run the image to poke around
kubectl run debug --image=<same-image> --command -- sleep 3600
kubectl exec -it debug -- /bin/sh
```

Common exit codes:
| Code | Meaning |
|---|---|
| `0` | Clean exit |
| `1` | Application error |
| `137` | OOMKilled (128 + SIGKILL) |
| `143` | SIGTERM received |
Also check: misconfigured env vars, missing ConfigMaps/Secrets, failed startup probes, resource limits.
A Pending Pod hasn't been scheduled to a node yet.
| Cause | How to fix |
|---|---|
| Insufficient CPU/memory | Scale the cluster or reduce resource requests |
| Unschedulable | Check node selectors, affinity rules, taints/tolerations |
| PVC not bound | Check if the PersistentVolumeClaim is stuck in Pending |
| Resource quota exceeded | Check namespace quotas with kubectl describe quota |
| No matching nodes | Ensure node labels match nodeSelector |
```shell
# Events section tells you exactly why scheduling failed
kubectl describe pod <pod>
```

```shell
# Basic shell access
kubectl exec -it <pod-name> -- /bin/sh

# If bash is available
kubectl exec -it <pod-name> -- /bin/bash

# Specific container in a multi-container Pod
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh

# If the container has no shell (e.g., distroless image)
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
```

| Probe | Question it answers | On failure | Container restarted? |
|---|---|---|---|
| Liveness | Is the container still alive? | Kills & restarts the container | ✅ Yes |
| Readiness | Is the container ready for traffic? | Removed from Service endpoints | ❌ No |
| Startup | Has the app finished starting up? | Kills & restarts (while running) | ✅ Yes |
Execution order: Startup → (once passed) Liveness + Readiness run concurrently.
When a liveness probe fails, Kubernetes kills the container (SIGTERM → grace period → SIGKILL) and restarts it. If the probe keeps failing, the Pod enters CrashLoopBackOff with exponential back-off.
Important: A failing liveness probe causes restarts, not traffic removal. Traffic removal is readiness probe's job.
Use a readiness probe whenever your API isn't immediately ready to serve traffic after the process starts:
- App needs time to connect to a database or warm up a connection pool
- App loads a large model or cache into memory at startup
- App runs database migrations on startup before it can serve requests
- App has a `/health/ready` endpoint that checks downstream dependencies
Without a readiness probe, Kubernetes sends traffic to the Pod as soon as the container starts, causing errors during the warm-up window.
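A container spec combining all three probes might look like this (paths, ports, and timings are illustrative):

```yaml
containers:
  - name: api
    image: my-api:1.0            # placeholder image
    startupProbe:                # gates liveness/readiness until the app is up
      httpGet:
        path: /health/live
        port: 8080
      failureThreshold: 30       # allows up to 30 × 2s = 60s to start
      periodSeconds: 2
    livenessProbe:               # failing → container restarted
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
    readinessProbe:              # failing → removed from Service endpoints
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
```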