Caching · high

Cache Stampede (Thundering Herd)

A cache stampede occurs when a popular cache entry expires and many concurrent requests simultaneously miss the cache, all rushing to recompute or fetch the value from the database, overwhelming it.

Memory anchor

Thundering herd = a Black Friday crowd waiting behind a locked door. The moment it opens, EVERYONE stampedes in. Fix: let one person in to restock while everyone else gets yesterday's flyer (stale-while-revalidate).

Expected depth

Mitigation strategies: (1) Probabilistic early expiration — start recomputing before the TTL expires, with probability increasing as expiration approaches. (2) Mutex/distributed lock — first miss acquires a lock and recomputes; other misses wait or serve stale data. (3) Background refresh — a background job proactively refreshes entries before they expire. (4) Stale-while-revalidate — serve the stale entry immediately while triggering an async refresh.
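Strategy (4) above can be sketched with a minimal in-process cache. This is illustrative only — the names (`get`, `_cache`, `_refreshing`) are hypothetical, and a real deployment would use a shared store like Redis rather than a process-local dict:

```python
import threading
import time

# Stale-while-revalidate sketch: serve whatever is cached (even if
# expired) and refresh in a background thread. Process-local only.
_cache = {}        # key -> (value, expires_at)
_refreshing = set()
_lock = threading.Lock()

def get(key, ttl, recompute):
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is None:
        # Cold miss: nothing stale to serve, compute synchronously.
        # (Concurrent cold misses would still stampede; pair this
        # with a lock for that case.)
        value = recompute()
        _cache[key] = (value, now + ttl)
        return value
    value, expires_at = entry
    if now >= expires_at:
        # Expired: trigger at most one async refresh, serve stale now.
        with _lock:
            already = key in _refreshing
            if not already:
                _refreshing.add(key)
        if not already:
            def refresh():
                try:
                    _cache[key] = (recompute(), time.monotonic() + ttl)
                finally:
                    with _lock:
                        _refreshing.discard(key)
            threading.Thread(target=refresh, daemon=True).start()
    return value  # possibly stale, but the caller never blocks on recompute
```

The key property: after the cold miss, request latency never includes the recompute, and only one refresh per key runs at a time.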

Deep — senior internals

The XFetch algorithm (probabilistic early expiration) is mathematically elegant: a cache entry with remaining TTL δ is recomputed with probability exp(-δ/β), where β is a tuning parameter. As expiration approaches (δ → 0), the recomputation probability approaches 1, distributing the recompute load across the expiration window. Redis doesn't natively implement this, but it's easy to add at the application layer.

Distributed mutex using Redis SET NX (set if not exists) with a short TTL: the first thread to acquire the lock recomputes; others either wait or return stale. The lock TTL must exceed recompute time — set it conservatively.

A simpler approach: 'jitter' the TTL. Instead of all entries expiring at T, expire them at T + random(0, T*0.1). This spreads stampedes over time for batch-loaded caches.
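The early-expiration rule and TTL jitter described above fit in a few lines at the application layer. A sketch, with `beta` as an assumed tuning value and the helper names purely illustrative:

```python
import math
import random
import time

def should_recompute_early(expires_at, beta=5.0, now=None):
    """Probabilistic early expiration as described above: with remaining
    TTL delta, recompute with probability exp(-delta / beta). Larger beta
    means earlier, more aggressive recomputation."""
    now = time.monotonic() if now is None else now
    delta = expires_at - now          # remaining TTL in seconds
    if delta <= 0:
        return True                   # already expired: must recompute
    return random.random() < math.exp(-delta / beta)

def jittered_ttl(base_ttl):
    """TTL jitter: expire at T + random(0, T * 0.1) instead of exactly T,
    spreading expirations for batch-loaded caches."""
    return base_ttl + random.uniform(0, base_ttl * 0.1)
```

Each cache read calls `should_recompute_early` on a hit; a `True` result triggers a refresh even though the entry is still valid, so the recompute load is smeared across the expiration window instead of spiking at T.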

🎤 Interview-ready answer

I'd use stale-while-revalidate as my primary defense: always serve whatever is in the cache (even if expired) and trigger an async background refresh. This keeps latency low and protects the database. For entries where serving stale is unacceptable, I'd use a Redis NX lock with a short TTL: winner recomputes, losers serve the stale value (or wait briefly if no stale value exists). I'd also add TTL jitter at population time to spread expirations.

Common trap

Using a distributed lock without a fallback for the lock waiters. If the lock holder crashes mid-recompute, the waiters stall until the lock TTL expires and then all hammer the DB simultaneously — and without a lock TTL at all, they deadlock permanently. Always set a lock TTL and serve stale while waiting.

Related concepts