What is Rolling Hash / Rabin-Karp in DSA?

Arrays & Stringsmedium

Rolling Hash / Rabin-Karp

Rolling hash computes a hash of a fixed-length window and updates it in O(1) by removing the outgoing character and adding the incoming character, enabling O(n) string matching.

Memory anchor

Rolling hash = a conveyor belt sushi scanner. As each plate rolls by, you update the fingerprint by dropping the leftmost plate and adding the new one -- no need to re-scan the whole belt.

Expected depth

Rabin-Karp: hash(s[i..i+m-1]) = (s[i]*p^(m-1) + s[i+1]*p^(m-2) + ... + s[i+m-1]) mod q. Slide: subtract s[i]*p^(m-1) mod q, multiply by p, add s[i+m]. Used for exact string matching in O(n+m) average, and for repeated/duplicate substring problems. Multiple pattern search: compute hashes of all patterns into a set, then slide over text — O(n + sum(pattern lengths)).

Deep — senior internals

Hash collisions (same hash, different string) require verification — compare the actual strings when hashes match (the 'fingerprint' approach). Choose q as a large prime and p as a prime base (e.g., p=31, q=10^9+7) to minimize collisions. Double hashing (two different (p, q) pairs) reduces collision probability to near zero. Rolling hash enables 'longest duplicate substring' with binary search: binary search on length L, use rolling hash to check if any substring of length L repeats — O(n log n) total.

🎤Interview-ready answer

Rolling hash is my go-to for fixed-window substring search or when I need O(n) string matching without KMP's complexity. The key idea: polynomial hash lets you update the window in O(1) by subtracting the outgoing character's contribution and adding the incoming one. I always verify on hash match because collisions can occur — the overall expected complexity is still O(n+m). For interview problems about finding repeated substrings, I combine binary search on length with rolling hash.

⚠Common trap

A hash match does NOT guarantee a string match — you must verify by comparing the actual substrings. Forgetting verification turns a probabilistically correct algorithm into an incorrect one. Also: when removing the leading character, you must multiply by the modular inverse of p (or precompute p^(m-1) mod q) — getting the exponent wrong makes the hash incorrect.