Load Balancers: L4 vs L7
Load balancers distribute incoming traffic across multiple backend servers. L4 (transport layer) balances based on TCP/UDP without inspecting payload. L7 (application layer) understands HTTP and can route based on URL, headers, and cookies.
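The distinction shows up directly in an Nginx config sketch — the `stream` block forwards raw TCP without parsing it (L4), while the `http` block routes on request content (L7). Server addresses and ports here are illustrative, not a production setup:

```nginx
# L4: raw TCP forwarding -- Nginx never inspects the bytes
stream {
    upstream postgres_pool {
        server 10.0.1.10:5432;
        server 10.0.1.11:5432;
    }
    server {
        listen 5432;
        proxy_pass postgres_pool;
    }
}

# L7: HTTP-aware routing on the request path
http {
    upstream api_servers    { server 10.0.2.10:8080; server 10.0.2.11:8080; }
    upstream static_servers { server 10.0.3.10:8080; }

    server {
        listen 443 ssl;   # TLS terminates at the balancer
        location /api/    { proxy_pass http://api_servers; }
        location /static/ { proxy_pass http://static_servers; }
    }
}
```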
L4 load balancer = a highway traffic cop waving cars to different lanes (fast, doesn't care what's inside). L7 = a hotel concierge reading your reservation to send you to the right floor (smarter, but takes a moment to read).
L4 load balancers (AWS NLB, HAProxy in TCP mode) operate on IP addresses and ports. They are extremely fast (per-connection overhead measured in microseconds), support any TCP/UDP protocol, and typically pass TLS through to the backends rather than terminating it (though AWS NLB does offer optional TLS termination). L7 load balancers (AWS ALB, Nginx, Envoy) inspect HTTP headers and URLs, enabling path-based routing (/api to service A, /static to a CDN), host-based routing (api.example.com vs www.example.com), cookie-based sticky sessions, TLS termination, WebSocket upgrades, and HTTP/2 support. Common algorithms: round-robin, least connections, IP hash (sticky), and weighted round-robin.
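The four algorithms above can be sketched in a few lines of Python. This is a toy model — server names are made up, and a real balancer tracks live connection counts from its data path rather than a manual counter:

```python
import hashlib
from itertools import cycle


class Balancer:
    """Toy implementations of common backend-selection algorithms."""

    def __init__(self, servers, weights=None):
        self.servers = list(servers)
        self.weights = weights or {s: 1 for s in self.servers}
        self._rr = cycle(self.servers)
        # Weighted round-robin: repeat each server in proportion to its weight.
        self._wrr = cycle([s for s in self.servers
                           for _ in range(self.weights[s])])
        self.active = {s: 0 for s in self.servers}  # open-connection counts

    def round_robin(self):
        return next(self._rr)

    def weighted_round_robin(self):
        return next(self._wrr)

    def least_connections(self):
        # Prefer the backend currently serving the fewest connections.
        return min(self.servers, key=lambda s: self.active[s])

    def ip_hash(self, client_ip):
        # Same client IP always maps to the same server (sticky).
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]
```

Note that IP hash gives stickiness without cookies, but a change in the server count remaps most clients — which is exactly the problem consistent hashing addresses.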
L7 load balancers are the standard ingress layer for microservices — they handle service routing, auth middleware, rate limiting, and observability. Envoy Proxy is the sidecar in Istio's service mesh; Linkerd uses its own lightweight Rust proxy (linkerd2-proxy) rather than Envoy. Either way, each pod gets a sidecar that handles all inbound/outbound traffic with circuit breaking, retries, and distributed tracing built in. Health checks: an L4 check verifies that the TCP port accepts connections; an L7 check sends an HTTP request to a /health endpoint and validates the response code, so it catches application-level failures (e.g., an exhausted DB connection pool) that L4 misses. Consistent hashing: for stateful upstreams such as caching tiers, consistent hashing routes the same key to the same upstream server, maximizing cache hit rates while minimizing remapping when servers are added or removed. Connection draining: when a backend is removed from rotation, the load balancer stops sending it new connections but lets existing ones complete — critical for zero-downtime deployments.
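Consistent hashing can be sketched as a hash ring with virtual nodes. This is a minimal illustration — the server names and vnode count are arbitrary, and production implementations (e.g., Envoy's ring-hash or Maglev balancers) are more sophisticated:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, server)
        for s in servers:
            self.add(s)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        # Place many virtual nodes per server for even key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get(self, key):
        # First virtual node clockwise from the key's hash; wrap to the start.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

The key property: removing a server remaps only the keys that lived on it, so the rest of the cache tier keeps its hit rate.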
For a typical web application, I'd use L7 load balancing (ALB or Nginx) at the edge for HTTP routing, TLS termination, and health checking. Behind it, I'd use L4 for any non-HTTP protocols. In a microservices architecture, I'd use a service mesh (Istio + Envoy) for internal service-to-service load balancing — it gives me circuit breaking, retries, and distributed tracing for free. For the cache tier, I'd configure the load balancer with consistent hashing to maximize cache hit rates.
A common pitfall: forgetting that L7 load balancers add latency and are themselves a single point of failure. Always run multiple load balancer instances behind a VIP (Virtual IP) with ECMP routing, and monitor load balancer CPU — TLS handshake load can saturate it.