Resource Requests & Limits
Requests are the guaranteed minimum resources for scheduling. Limits are the maximum a container can use. Both are set per container, not per pod.
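A minimal sketch of what this looks like in a pod spec — every container carries its own `resources` block (names and images here are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25  # hypothetical image
    resources:
      requests:        # guaranteed minimum, used by the scheduler
        cpu: 250m      # 0.25 CPU
        memory: 128Mi
      limits:          # hard ceiling, enforced via cgroups
        cpu: 500m
        memory: 256Mi
  - name: sidecar      # a second container gets its own resources block
    image: busybox:1.36
    resources:
      requests:
        cpu: 50m
        memory: 32Mi
      limits:
        cpu: 100m
        memory: 64Mi
```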
Requests = your reserved table at a restaurant (guaranteed). Limits = the buffet plate size. CPU limit = small plate (you eat slower but survive). Memory limit = the bouncer at the door (exceed it and you get THROWN OUT — OOMKilled). CPU throttles, memory kills.
CPU: requests are used for scheduling (the scheduler places the pod on a node with enough unreserved capacity). The CPU limit is enforced via cgroup CPU throttling — exceeding it slows the process, it doesn't kill it. Memory: requests are likewise used for scheduling; the memory limit is enforced via the cgroup OOM killer — exceeding it kills the container (OOMKilled). Formats: CPU in millicores (500m = 0.5 CPU), memory in Mi/Gi. QoS classes: Guaranteed (every container has requests equal to limits for both CPU and memory), Burstable (at least one container has a request or limit, but the pod doesn't qualify as Guaranteed), BestEffort (no requests or limits anywhere) — Guaranteed pods are evicted last under node pressure.
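A sketch of a Guaranteed-QoS pod — requests equal limits for both resources in every container (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-db    # hypothetical name
spec:
  containers:
  - name: db
    image: postgres:16 # hypothetical image
    resources:
      requests:
        cpu: "1"       # whole-CPU form; equivalent to 1000m
        memory: 2Gi
      limits:
        cpu: "1"       # must equal the request for Guaranteed QoS
        memory: 2Gi
```

You can confirm the assigned class with `kubectl get pod critical-db -o jsonpath='{.status.qosClass}'`.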
CPU throttling: a container with a 1000m limit that tries to use 2000m of CPU gets throttled to 1000m — it runs at half the speed it wants. This is a common performance issue with tight CPU limits. Memory overcommit is dangerous — if the node runs out of memory, the kubelet evicts pods based on QoS class and usage relative to requests (and in extreme cases the kernel OOM killer steps in). LimitRange sets default requests/limits for a namespace (important: without a LimitRange, pods whose containers set no requests or limits get BestEffort QoS and are evicted first). ResourceQuota caps total resource consumption per namespace. Vertical Pod Autoscaler (VPA) recommends or auto-adjusts requests based on actual usage. Horizontal Pod Autoscaler (HPA) scales the replica count based on CPU/memory or custom metrics.
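The LimitRange and ResourceQuota interplay can be sketched as a pair of namespace-level objects (the namespace, names, and numbers are placeholders):

```yaml
# LimitRange: namespace defaults so bare containers don't fall to BestEffort.
apiVersion: v1
kind: LimitRange
metadata:
  name: defaults       # hypothetical name
  namespace: team-a    # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:    # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:           # applied when a container omits limits
      cpu: 500m
      memory: 256Mi
---
# ResourceQuota: caps the namespace's total requests and limits.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
```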
Always set resource requests, and set limits deliberately. Requests tell the scheduler how much capacity to reserve — no requests means BestEffort QoS and first-evicted under pressure. Limits prevent runaway processes from starving their neighbors. CPU limits cause throttling (the process slows); memory limits cause OOMKill (the process dies). A common production issue: tight CPU limits cause request-latency spikes even when node CPUs are underutilized, because the cgroup throttles the container mid-request. Set CPU limits conservatively, or leave them unset for latency-sensitive services.
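One common pattern for a latency-sensitive service follows directly from the advice above: keep requests and a memory limit, omit the CPU limit so bursts use idle node CPU instead of being throttled (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api                  # hypothetical name
spec:
  containers:
  - name: api
    image: example/api:1.0   # hypothetical image
    resources:
      requests:
        cpu: 500m            # still reserved for scheduling (Burstable QoS)
        memory: 512Mi
      limits:
        memory: 512Mi        # memory limit kept to contain leaks
        # no cpu limit: bursts use idle node CPU instead of being throttled
```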
CPU limits cause throttling, not kills. A container repeatedly hitting its CPU limit shows a high CPU throttle % in metrics but never OOMKills. This is a silent performance issue — the pod stays Running but responds slowly. Monitor container_cpu_cfs_throttled_seconds_total (and the throttled-periods counters) from cAdvisor.
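To surface this silent failure, an alert on the throttled-period ratio works — a sketch assuming the Prometheus Operator's PrometheusRule CRD and cAdvisor metrics (rule names and thresholds are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttling       # hypothetical name
spec:
  groups:
  - name: cpu-throttling
    rules:
    - alert: HighCPUThrottling
      # fraction of CFS scheduling periods in which the container was throttled
      expr: |
        rate(container_cpu_cfs_throttled_periods_total[5m])
          / rate(container_cpu_cfs_periods_total[5m]) > 0.25
      for: 10m
      labels:
        severity: warning
```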