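Kubernetes uses resource requests to schedule Pods and limits to cap runtime consumption. Omitting either creates operational problems that are hard to diagnose under load: Pods without requests can be packed onto nodes that are already overcommitted, and Pods without limits can consume a node's spare capacity unchecked.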
Requests vs Limits
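- Request: The amount of CPU or memory the scheduler reserves for a container. A Pod is placed only on a node whose allocatable capacity, minus the requests of the Pods already scheduled there, covers it. The default scheduler compares requests, not live usage.
- Limit: Runtime cap. A container that exceeds its memory limit is OOMKilled; one that exceeds its CPU limit is throttled (not killed, but latency spikes).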
Finding the Right Values
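Use kubectl top pods for a quick point-in-time check, or the Prometheus metrics container_cpu_usage_seconds_total and container_memory_working_set_bytes collected over a representative load period. The CPU metric is a cumulative counter, so apply rate() to it before taking percentiles. A reasonable starting heuristic: set requests near P50 usage and limits near P99, then adjust after observing the workload in production.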
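One way to obtain those percentiles is a set of Prometheus recording rules. The sketch below is a minimal example under stated assumptions: the group name, the rule names, the namespace "my-app", and the 7-day window are all hypothetical, and it assumes the metrics come from the kubelet's cAdvisor endpoint, where the container!="" matcher filters out Pod-level aggregate series.

```yaml
# Hypothetical recording rules; names, namespace, and window are
# illustrative assumptions, not values from any real cluster.
groups:
  - name: resource-sizing
    rules:
      # rate() converts the cumulative CPU counter into cores used per
      # second; the [7d:5m] subquery samples that rate every 5 minutes.
      - record: container:cpu_usage_cores:p50_7d
        expr: |
          quantile_over_time(0.5,
            rate(container_cpu_usage_seconds_total{namespace="my-app", container!=""}[5m])[7d:5m])
      - record: container:cpu_usage_cores:p99_7d
        expr: |
          quantile_over_time(0.99,
            rate(container_cpu_usage_seconds_total{namespace="my-app", container!=""}[5m])[7d:5m])
      # The memory metric is a gauge, so it can be fed to
      # quantile_over_time directly.
      - record: container:memory_working_set_bytes:p50_7d
        expr: |
          quantile_over_time(0.5,
            container_memory_working_set_bytes{namespace="my-app", container!=""}[7d])
      - record: container:memory_working_set_bytes:p99_7d
        expr: |
          quantile_over_time(0.99,
            container_memory_working_set_bytes{namespace="my-app", container!=""}[7d])
```

CPU results are in cores (0.25 corresponds to 250m) and memory results are in bytes. Rounded up, they become the values in a container's resources stanza.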
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
QoS Classes
When every container in a Pod sets CPU and memory requests equal to its limits, the Pod gets the Guaranteed QoS class, which is the last to be evicted. Under node pressure, Kubernetes first evicts BestEffort Pods and Burstable Pods whose usage exceeds their requests. For latency-sensitive services, prefer Guaranteed QoS.
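As an illustration, a Pod that should qualify for Guaranteed QoS might look like the sketch below. The Pod name, container name, and image are placeholders, and the resource values are arbitrary; what matters is that requests and limits match for both CPU and memory in every container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-guaranteed                        # hypothetical name
spec:
  containers:
    - name: api                               # hypothetical container
      image: registry.example.com/api:1.0     # placeholder image
      resources:
        # Requests equal limits for both resources, so the Pod is
        # classified as Guaranteed.
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
```

The assigned class is reported in the Pod's status.qosClass field, which can be read with kubectl get pod api-guaranteed -o jsonpath='{.status.qosClass}'.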
Enforce Cluster-Wide
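Use a LimitRange to set per-container defaults in a namespace and a ResourceQuota to cap that namespace's total consumption. Both objects are namespace-scoped, so cluster-wide coverage means applying them to every namespace. Without both, one misconfigured Deployment can consume far more than its share and starve other workloads on the same nodes.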
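A minimal sketch of the two objects for a single namespace follows. The namespace "team-a", the object names, and all quantities are assumptions chosen for illustration and should be sized from real usage data.

```yaml
# Defaults applied to containers that omit requests or limits.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: team-a             # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:           # used when a container sets no request
        cpu: "250m"
        memory: "256Mi"
      default:                  # used when a container sets no limit
        cpu: "500m"
        memory: "512Mi"
---
# Ceiling on the sum of requests and limits across the namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota           # hypothetical name
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

Once a ResourceQuota constrains CPU or memory, Pods that specify no value for that resource are rejected at admission, which is why pairing it with LimitRange defaults is useful.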