Collision Resolution
Collisions occur when two keys hash to the same index. Two main strategies: chaining (each bucket is a linked list) and open addressing (probe for the next empty slot).
Collision resolution: Chaining = apartment mailboxes where multiple letters stack in one box (linked list per slot). Open addressing = a parking lot where you circle to the next empty spot if yours is taken.
Chaining: each bucket holds a list of (key, value) pairs. Worst case O(n) if all keys hash to same bucket. Average O(1) with load factor < 1 (can exceed 1). Open addressing: probe sequence until empty slot found. Linear probing (next slot), quadratic probing, double hashing. Load factor must stay < 1; typically trigger resize at 0.7. Clustering: linear probing causes primary clustering — long runs of occupied slots that slow lookup.
Linear probing has excellent cache behavior (sequential memory access) but suffers from primary clustering. Quadratic probing (probe by i²) reduces primary clustering but doesn't guarantee probing all slots unless m is prime and load factor < 0.5. Double hashing (second hash determines step size) avoids clustering and probes all slots — most theoretically sound open addressing. Robin Hood hashing: on insert, if the new element's probe distance is greater than the incumbent's, steal the slot and continue inserting the displaced element — reduces variance in lookup times. Cuckoo hashing: uses two hash functions and two tables — guaranteed O(1) worst-case lookup, amortized O(1) insert.
Chaining and open addressing are the two fundamental collision strategies. Chaining is simpler and degrades gracefully with high load, but has pointer overhead and poor cache performance. Open addressing is cache-friendly but requires the load factor to stay well below 1. In practice, Java's HashMap uses chaining with tree nodes at high load (convert bucket list to red-black tree at length 8 for O(log n) worst case). Python uses open addressing with compact storage. I use this knowledge to explain why hash table performance degrades with adversarial inputs and how randomized hashing mitigates it.
Deleting from an open-addressing hash table cannot simply zero out the slot — subsequent lookups that probed through this slot will terminate early and fail to find keys that should be found. Use a 'tombstone' marker: the slot is treated as occupied for probing but empty for insertion. Forgetting tombstones causes silent data corruption in deletion-heavy workloads.