Hash Tablescritical

Hash Table Internals

A hash table maps keys to values using a hash function to compute an array index. O(1) average insert, lookup, and delete.

Memory anchor

Hash table = a coat check. Hand over your coat (value), get a numbered ticket (hash). Retrieval is instant -- just show the ticket. Collisions = two coats get the same ticket number, so the attendant has to dig through a pile.

Expected depth

Hash function maps key to index in [0, m-1]. Ideal: uniform distribution, fast to compute, deterministic. Load factor α = n/m (n=elements, m=buckets). As α increases, collisions increase. Resize (rehash) when α exceeds threshold (typically 0.7-0.75). Python dicts are open-addressing hash maps; Java HashMap uses chaining. Amortized O(1) per operation across a sequence of operations including resizes.

Deep — senior internals

Python's dict uses open addressing with pseudo-random probing and a growth factor of ~1.3x (resizes to 2x capacity). Python 3.7+ dicts maintain insertion order by using a compact array of indices pointing to a dense key-value array — this is why Python dicts are memory-efficient and ordered. JavaScript objects use hash tables internally but V8 switches to 'fast mode' (hidden classes / shape) for object properties that follow a fixed pattern. Hash table amortized O(1) relies on the number of resizes being O(log n) total across n inserts, each resize copying O(current size) — geometric series gives O(n) total copy work.

🎤Interview-ready answer

A hash table is my default for O(1) lookup when keys are hashable. I understand that 'O(1)' is amortized average — worst case is O(n) with hash collisions or rehashing. I'd mention load factor as the key operational parameter: too low wastes space, too high degrades to O(n). In Python, dict and set are hash-based. For interview problems, I use a dict for frequency counting, inverse indexing, or memoization. For custom keys, I ensure the key is hashable and that equal objects have equal hashes.

Common trap

Mutable objects cannot be dict keys in Python — they're not hashable. Lists and dicts are unhashable. Convert to tuples for hashability. Also: two objects that compare equal MUST have the same hash — if you override __eq__ without overriding __hash__, the default hash still uses object identity, which can cause correctness bugs.

Related concepts