Level Up Your Coding Skills & Crack Interviews — Save up to 50% or more on Educative.io Today! Claim Discount

Arrow
Table of contents

Union Find

Many graph and connectivity problems seem to require complex traversal algorithms or maintaining expensive data structures. Questions about whether elements belong to the same group, whether adding an edge creates a cycle, or how many disconnected components exist can appear deceptively difficult to solve efficiently.

Union Find, also known as Disjoint Set Union (DSU), is a specialized data structure that solves exactly these kinds of problems with remarkable efficiency. It provides near-constant time operations for checking connectivity and merging groups, making it indispensable for a specific category of interview problems.

Understanding Union Find not only unlocks solutions to connectivity problems but also demonstrates your knowledge of specialized data structures and your ability to choose the right tool for the job. This is exactly the kind of algorithmic maturity that interviewers look for in strong candidates.

What is Union Find?

Union Find is a data structure that efficiently tracks a partition of elements into disjoint (non-overlapping) sets. It supports two primary operations that give the structure its name:

  • Find: Determine which set a particular element belongs to. This is typically done by finding the representative (or root) of the set.
  • Union: Merge two sets into a single set by connecting their representatives.

The beauty of Union Find lies in its simplicity and efficiency. At its core, it maintains a forest of trees where each tree represents a set, and the root of each tree serves as the representative of that set. Elements within the same tree belong to the same set.

Initially, each element is in its own set, essentially a tree with a single node. As unions are performed, trees merge together, creating larger sets. The Find operation traverses from an element up to its root, while the Union operation connects the root of one tree to the root of another.

What makes Union Find particularly powerful is that with two simple optimizations path compression and union by rank both operations can be performed in nearly constant time.

This transforms problems that would otherwise require O(n) time per query into ones that run in O(α(n)) time, where α is the inverse Ackermann function, which grows so slowly it is effectively constant for all practical purposes.

How Union Find works (Step-by-step)

Union Find is built on a simple yet powerful idea:

Represent each set as a tree, where the root serves as the representative of the set. Understanding the basic operations is essential before exploring optimizations.

Basic structure

The Union Find data structure uses a parent array where each element initially points to itself, indicating that each element is in its own set. The index represents an element, and the value at that index represents its parent.

For example, if we have 5 elements (0, 1, 2, 3, 4), the initial parent array looks like:

Each element is its own parent, meaning each is the root of its own tree containing only itself.

The Find operation

The Find operation determines which set an element belongs to by finding the root of its tree. Starting from the element, we follow parent pointers until we reach a node that is its own parent this is the root.

For example, if parent = [0, 0, 1, 2, 3], finding the root of element 3:

  • Start at 3, parent[3] = 2
  • Move to 2, parent[2] = 1
  • Move to 1, parent[1] = 0
  • Move to 0, parent[0] = 0 (root found)

The root of element 3 is 0, so element 3 belongs to the set represented by 0.

The Union operation

The Union operation merges two sets by connecting their roots. To union elements x and y:

  • Find the root of x (call it rootX)
  • Find the root of y (call it rootY)
  • If rootX and rootY are different, make one the parent of the other
  • If they are the same, the elements are already in the same set

For example, to union elements 3 and 4 in parent = [0, 0, 1, 2, 4]:

  • Find root of 3: follows 3 → 2 → 1 → 0, so root is 0
  • Find root of 4: parent[4] = 4, so root is 4
  • Set parent[4] = 0 (or parent[0] = 4)

Now elements 3 and 4 are in the same set, represented by root 0.

Python implementation (basic)

Here is a complete basic Python implementation:

Optimizations

The above implementation works correctly but can be slow. In the worst case, the tree can become a long chain, making Find operations O(n).

Two key optimizations reduce this to nearly constant time.

1. Path Compression

Path compression is an optimization applied during the Find operation. After finding the root, we make every node on the path point directly to the root. This flattens the tree structure, making future Find operations much faster.

Implementation is simple: after finding the root recursively, set the parent of the current node to the root.

For example, if we have a chain 0 ← 1 ← 2 ← 3 and call find(3):

  • We traverse 3 → 2 → 1 → 0 and find the root is 0
  • On the way back, we update: parent[3] = 0, parent[2] = 0, parent[1] = 0
  • Now all nodes point directly to the root, making future finds O(1)
2. Union by Rank

Union by rank (or union by size) is an optimization for the Union operation. The idea is to keep trees as flat as possible by always attaching the smaller tree under the root of the larger tree.

We maintain a rank array where rank[i] represents the approximate height of the tree rooted at i. When performing a union, we compare ranks and attach the tree with smaller rank to the tree with larger rank. If ranks are equal, we arbitrarily choose one and increment its rank.

This ensures that trees remain balanced and prevents the creation of long chains.

Combined analysis

When both path compression and union by rank are used together, the amortized time complexity of both Find and Union operations becomes O(α(n)), where α is the inverse Ackermann function. This function grows so incredibly slowly that for all practical values of n (even astronomically large ones), α(n) is at most 4 or 5.

In practice, this means Union Find operations are effectively constant time, which is why the data structure is so powerful for connectivity problems.

Python implementation (optimized)

Here is the optimized implementation with both path compression and union by rank:

Time and space complexity analysis

Understanding the complexity of Union Find is crucial for appreciating its power and knowing when to apply it.

Without optimizations

In the basic implementation without path compression or union by rank:

  • Find: O(n) in the worst case when the tree becomes a long chain
  • Union: O(n) because it calls Find twice
  • Space: O(n) for the parent array

This basic version is rarely acceptable in interviews when better performance is achievable.

With optimizations

With path compression and union by rank:

  • Find: O(α(n)) amortized, where α is the inverse Ackermann function
  • Union: O(α(n)) amortized
  • Space: O(n) for parent and rank arrays

Why this matters?

The near-constant time complexity is what makes Union Find superior to naive approaches for connectivity problems:

  • Checking connectivity with DFS/BFS: O(V + E) per query
  • Union Find with optimizations: O(α(n)) ≈ O(1) per query
  • For m operations on n elements: O(m) with Union Find vs O(m × n) with DFS/BFS

When you have many connectivity queries, Union Find can be orders of magnitude faster than graph traversal approaches.

When should you think about Union Find?

Recognizing when to apply Union Find during an interview is crucial. Certain problem characteristics serve as strong signals that Union Find is the optimal approach.

This table summarizes the key indicators:

Problem SignalWhat It Usually MeansWhy Union Find Helps
Connected componentsNeed to count or identify groupsEfficiently tracks and merges disjoint sets
Cycle detectionCheck if adding edge creates cycleIf nodes already connected, edge creates cycle
Dynamic connectivityConnections added over timeNear-constant time queries and updates
Equivalence relationsElements that are equivalent should be groupedNaturally models transitive relationships
Minimum spanning treeKruskal’s algorithm neededDetects cycles while building MST
Grid connectivityFinding islands or regions in 2D gridAlternative to DFS/BFS with better performance for certain queries

The key insight is this: if the problem involves dynamically merging groups and repeatedly checking whether elements belong to the same group, Union Find is likely the optimal solution. When you see phrases like “connected components,” “friend circles,” “network connectivity,” or “redundant connection,” Union Find should immediately come to mind.

Union Find vs. other approaches

Choosing between Union Find and other connectivity approaches depends on the specific problem requirements.

Here is a comprehensive comparison:

ApproachTime per QuerySpaceBest Use Case
Union FindO(α(n)) ≈ O(1)O(n)Many connectivity queries, dynamic connections, cycle detection
DFS/BFSO(V + E)O(V)Single query, finding paths, exploring graph structure
Adjacency ListO(degree)O(V + E)General graph problems, need neighbor access
Matrix/SetO(n) lookupO(n²) or O(n)Small graphs, group membership tracking

Key takeaways from this comparison:

  • Union Find excels when you need many connectivity queries or need to dynamically merge groups. It is specifically designed for these operations and performs them in nearly constant time.
  • DFS/BFS is better when you need to explore the graph structure, find specific paths, or perform a single connectivity check. These algorithms provide more information about the graph beyond just connectivity.
  • Adjacency lists are essential when you need to access neighbors quickly or work with general graph algorithms that require iterating through edges.
  • Simple structures like sets or matrices may suffice for very small graphs or when the problem does not require the efficiency of Union Find.

In interviews, demonstrating that you understand these trade-offs and can select the appropriate data structure shows deep algorithmic knowledge.

Common mistakes when implementing Union Find

Even though Union Find is conceptually simple, several common pitfalls can lead to bugs or suboptimal implementations during interviews.

  1. Forgetting path compression: The most common mistake is implementing the basic version without path compression. While functionally correct, it performs poorly and defeats the purpose of using Union Find. Always implement path compression by making nodes point directly to the root during the Find operation.
  2. Incorrect union logic: Another frequent error is connecting elements directly instead of connecting their roots. Always find roots first: self.parent[self.find(x)] = self.find(y), not self.parent[x] = y.
  3. Not handling self-loops and initialization errors: When performing a union, always check if root_x == root_y and return early to avoid self-loops. Additionally, ensure the parent array has exactly n elements indexed from 0 to n-1 when initializing with n elements: self.parent = list(range(n)).

Beyond basic connectivity queries, Union Find enables solutions to several important algorithmic problems.

  1. Kruskal’s Minimum Spanning Tree: Union Find is essential for Kruskal’s MST algorithm: sort edges by weight, then for each edge, if its endpoints aren’t connected (via Union Find), add the edge to the MST and union the endpoints. Stop at n-1 edges. Union Find efficiently detects whether adding an edge creates a cycle.
  2. Cycle detection and account merging: Union Find detects cycles in undirected graphs by checking if both endpoints of an edge are already in the same set. It’s also perfect for merging accounts or entities based on shared properties, like grouping accounts that share email addresses by treating each account as a node.
  3. Image processing: In image processing, Union Find identifies connected regions of similar color or intensity by treating each pixel as a node and unioning adjacent similar pixels. This is useful for flood fill, segmentation, and object identification.

Share with others:

Unlock up to 68% off lifetime access to Coding Interview prep with Educative

Getting ready for coding interviews or sharpening your problem-solving skills? Unlock a lifetime discount with comprehensive resources designed to help you master technical interviews.

Data structures and algorithms

Pattern-based problem solving

Mock interview practice

Real-world coding challenges

Coding Interview Logo