Trees & Graphshigh

Union-Find (Disjoint Set Union)

Union-Find tracks which elements belong to the same set (connected component) and supports union (merge two sets) and find (which set does an element belong to) operations.

Memory anchor

Union-Find = friend groups at a party. 'Find' asks 'who's your group leader?' 'Union' merges two groups under one leader. Path compression = everyone just memorizes the big boss directly, skipping the chain of command.

Expected depth

Basic implementation: parent array where parent[i] = i means i is the root. Find: follow parent pointers to root. Union: set one root's parent to the other. With path compression (find sets parent to root directly) and union by rank (always attach shorter tree under taller), both operations are nearly O(1) amortized — technically O(α(n)) where α is the inverse Ackermann function, effectively constant for all practical n.

Deep — senior internals

Path compression halves the tree height on each find. Union by rank (or size) prevents tall trees. Together they achieve amortized O(α(n)) per operation. Union-Find supports cycle detection in undirected graphs: if union(u, v) finds find(u) == find(v), adding edge u-v creates a cycle. Weighted Union-Find stores edge weights for relative distance problems (e.g., bipartite check with parity). Offline connectivity algorithms (e.g., Kruskal's MST) use Union-Find as their core data structure.

🎤Interview-ready answer

Union-Find is my default for dynamic connectivity problems: 'are these two nodes in the same group?', 'how many connected components after each union?', 'detect cycle in undirected graph'. I always implement with path compression and union by rank/size — the code is short and nearly O(1) per operation. For number of connected components, I maintain a count variable that decrements each time a successful union (of different components) occurs.

Common trap

Union-Find works for UNDIRECTED connectivity. For directed graphs, use DFS/BFS or Tarjan's SCC algorithm. Also: path compression mutates the parent array during find — this is fine for connectivity queries but breaks weighted Union-Find if you forget to accumulate weights during compression.

Related concepts