Many graph and connectivity problems seem to require complex traversal algorithms or maintaining expensive data structures. Questions about whether elements belong to the same group, whether adding an edge creates a cycle, or how many disconnected components exist can appear deceptively difficult to solve efficiently.
Union Find, also known as Disjoint Set Union (DSU), is a specialized data structure that solves exactly these kinds of problems with remarkable efficiency. It provides near-constant time operations for checking connectivity and merging groups, making it indispensable for a specific category of interview problems.
Understanding Union Find not only unlocks solutions to connectivity problems but also demonstrates your knowledge of specialized data structures and your ability to choose the right tool for the job. This is exactly the kind of algorithmic maturity that interviewers look for in strong candidates.
What is Union Find?
Union Find is a data structure that efficiently tracks a partition of elements into disjoint (non-overlapping) sets. It supports two primary operations that give the structure its name:
- Find: Determine which set a particular element belongs to. This is typically done by finding the representative (or root) of the set.
- Union: Merge two sets into a single set by connecting their representatives.
The beauty of Union Find lies in its simplicity and efficiency. At its core, it maintains a forest of trees where each tree represents a set, and the root of each tree serves as the representative of that set. Elements within the same tree belong to the same set.
Initially, each element is in its own set, essentially a tree with a single node. As unions are performed, trees merge together, creating larger sets. The Find operation traverses from an element up to its root, while the Union operation connects the root of one tree to the root of another.
What makes Union Find particularly powerful is that with two simple optimizations path compression and union by rank both operations can be performed in nearly constant time.
This transforms problems that would otherwise require O(n) time per query into ones that run in O(α(n)) time, where α is the inverse Ackermann function, which grows so slowly it is effectively constant for all practical purposes.
How Union Find works (Step-by-step)
Union Find is built on a simple yet powerful idea:
Basic structure
The Union Find data structure uses a parent array where each element initially points to itself, indicating that each element is in its own set. The index represents an element, and the value at that index represents its parent.
For example, if we have 5 elements (0, 1, 2, 3, 4), the initial parent array looks like:
Each element is its own parent, meaning each is the root of its own tree containing only itself.
The Find operation
The Find operation determines which set an element belongs to by finding the root of its tree. Starting from the element, we follow parent pointers until we reach a node that is its own parent this is the root.
For example, if parent = [0, 0, 1, 2, 3], finding the root of element 3:
- Start at 3,
parent[3] = 2 - Move to 2,
parent[2] = 1 - Move to 1,
parent[1] = 0 - Move to 0,
parent[0] = 0(root found)
The root of element 3 is 0, so element 3 belongs to the set represented by 0.
The Union operation
The Union operation merges two sets by connecting their roots. To union elements x and y:
- Find the root of x (call it
rootX) - Find the root of y (call it
rootY) - If
rootXandrootYare different, make one the parent of the other - If they are the same, the elements are already in the same set
For example, to union elements 3 and 4 in parent = [0, 0, 1, 2, 4]:
- Find root of 3: follows 3 → 2 → 1 → 0, so root is 0
- Find root of 4:
parent[4] = 4, so root is 4 - Set
parent[4] = 0(orparent[0] = 4)
Now elements 3 and 4 are in the same set, represented by root 0.
Python implementation (basic)
Here is a complete basic Python implementation:
Optimizations
The above implementation works correctly but can be slow. In the worst case, the tree can become a long chain, making Find operations O(n).
Two key optimizations reduce this to nearly constant time.
1. Path Compression
Path compression is an optimization applied during the Find operation. After finding the root, we make every node on the path point directly to the root. This flattens the tree structure, making future Find operations much faster.
For example, if we have a chain 0 ← 1 ← 2 ← 3 and call find(3):
- We traverse 3 → 2 → 1 → 0 and find the root is 0
- On the way back, we update:
parent[3] = 0,parent[2] = 0,parent[1] = 0 - Now all nodes point directly to the root, making future finds O(1)
2. Union by Rank
Union by rank (or union by size) is an optimization for the Union operation. The idea is to keep trees as flat as possible by always attaching the smaller tree under the root of the larger tree.
We maintain a rank array where rank[i] represents the approximate height of the tree rooted at i. When performing a union, we compare ranks and attach the tree with smaller rank to the tree with larger rank. If ranks are equal, we arbitrarily choose one and increment its rank.
This ensures that trees remain balanced and prevents the creation of long chains.
Combined analysis
When both path compression and union by rank are used together, the amortized time complexity of both Find and Union operations becomes O(α(n)), where α is the inverse Ackermann function. This function grows so incredibly slowly that for all practical values of n (even astronomically large ones), α(n) is at most 4 or 5.
Python implementation (optimized)
Here is the optimized implementation with both path compression and union by rank:
Time and space complexity analysis
Understanding the complexity of Union Find is crucial for appreciating its power and knowing when to apply it.
Without optimizations
In the basic implementation without path compression or union by rank:
- Find: O(n) in the worst case when the tree becomes a long chain
- Union: O(n) because it calls
Findtwice - Space: O(n) for the
parentarray
This basic version is rarely acceptable in interviews when better performance is achievable.
With optimizations
With path compression and union by rank:
- Find: O(α(n)) amortized, where α is the inverse Ackermann function
- Union: O(α(n)) amortized
- Space: O(n) for
parentandrankarrays
When should you think about Union Find?
Recognizing when to apply Union Find during an interview is crucial. Certain problem characteristics serve as strong signals that Union Find is the optimal approach.
This table summarizes the key indicators:
| Problem Signal | What It Usually Means | Why Union Find Helps |
| Connected components | Need to count or identify groups | Efficiently tracks and merges disjoint sets |
| Cycle detection | Check if adding edge creates cycle | If nodes already connected, edge creates cycle |
| Dynamic connectivity | Connections added over time | Near-constant time queries and updates |
| Equivalence relations | Elements that are equivalent should be grouped | Naturally models transitive relationships |
| Minimum spanning tree | Kruskal’s algorithm needed | Detects cycles while building MST |
| Grid connectivity | Finding islands or regions in 2D grid | Alternative to DFS/BFS with better performance for certain queries |
Union Find vs. other approaches
Choosing between Union Find and other connectivity approaches depends on the specific problem requirements.
Here is a comprehensive comparison:
| Approach | Time per Query | Space | Best Use Case |
| Union Find | O(α(n)) ≈ O(1) | O(n) | Many connectivity queries, dynamic connections, cycle detection |
| DFS/BFS | O(V + E) | O(V) | Single query, finding paths, exploring graph structure |
| Adjacency List | O(degree) | O(V + E) | General graph problems, need neighbor access |
| Matrix/Set | O(n) lookup | O(n²) or O(n) | Small graphs, group membership tracking |
Key takeaways from this comparison:
- Union Find excels when you need many connectivity queries or need to dynamically merge groups. It is specifically designed for these operations and performs them in nearly constant time.
- DFS/BFS is better when you need to explore the graph structure, find specific paths, or perform a single connectivity check. These algorithms provide more information about the graph beyond just connectivity.
- Adjacency lists are essential when you need to access neighbors quickly or work with general graph algorithms that require iterating through edges.
- Simple structures like sets or matrices may suffice for very small graphs or when the problem does not require the efficiency of Union Find.
Common mistakes when implementing Union Find
Even though Union Find is conceptually simple, several common pitfalls can lead to bugs or suboptimal implementations during interviews.
- Forgetting path compression: The most common mistake is implementing the basic version without path compression. While functionally correct, it performs poorly and defeats the purpose of using Union Find. Always implement path compression by making nodes point directly to the root during the
Findoperation. - Incorrect union logic: Another frequent error is connecting elements directly instead of connecting their roots. Always find roots first:
self.parent[self.find(x)] = self.find(y), notself.parent[x] = y. - Not handling self-loops and initialization errors: When performing a union, always check if
root_x == root_yand return early to avoid self-loops. Additionally, ensure theparentarray has exactlynelements indexed from0ton-1when initializing with n elements:self.parent = list(range(n)).
Beyond basic connectivity queries, Union Find enables solutions to several important algorithmic problems.
- Kruskal’s Minimum Spanning Tree: Union Find is essential for Kruskal’s MST algorithm: sort edges by weight, then for each edge, if its endpoints aren’t connected (via Union Find), add the edge to the MST and union the endpoints. Stop at
n-1edges. Union Find efficiently detects whether adding an edge creates a cycle. - Cycle detection and account merging: Union Find detects cycles in undirected graphs by checking if both endpoints of an edge are already in the same set. It’s also perfect for merging accounts or entities based on shared properties, like grouping accounts that share email addresses by treating each account as a node.
- Image processing: In image processing, Union Find identifies connected regions of similar color or intensity by treating each pixel as a node and unioning adjacent similar pixels. This is useful for flood fill, segmentation, and object identification.