Detecting duplicates in an array is a fundamental problem that appears frequently in technical interviews at companies like Google, Amazon, and Microsoft. This problem tests your understanding of hash-based data structures and your ability to optimize from a quadratic solution to linear time.
Problem statement
You are given an integer array nums. Your task is to determine whether any value appears more than once in the array. Return True if it does; otherwise, return False.
Constraints:
- 1 <=
nums.length<= 105 - -109 <=
nums[i]<= 109
Examples
| Input | Output | Explanation |
| --- | --- | --- |
| [3, 7, 1, 3] | True | The number 3 appears at indices 0 and 3. |
| [5, 2, 8, 1] | False | Every element in the array is unique. |
| [2, 2, 2, 5, 5, 9, 5, 3, 9, 3] | True | Multiple elements repeat: 2, 5, 9, and 3 all appear more than once. |
Why checking all pairs becomes expensive
When we first encounter this problem, a common brute-force approach is to use a nested loop: compare each element with every element that comes after it. Specifically, for each index i, we scan indices i + 1 through n - 1 to check whether the same value appears again. This method correctly detects duplicates, but it becomes very inefficient as the array grows.
The issue is that for every element, we potentially perform a linear scan over the remaining portion of the array. If the array contains no duplicates, we end up checking almost every possible pair of elements. The total number of comparisons in this worst case is O(n²). With n = 10^5, this could mean ~10^10 comparisons, which is far too slow.
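The nested-loop approach described above can be sketched as follows (the function name contains_duplicate_brute is illustrative, not part of the problem statement):

```python
def contains_duplicate_brute(nums):
    n = len(nums)
    # Compare each element with every element that comes after it
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] == nums[j]:
                return True  # Found a repeated value
    return False  # No pair of equal elements matched

print(contains_duplicate_brute([3, 7, 1, 3]))  # True
print(contains_duplicate_brute([5, 2, 8, 1]))  # False
```

The inner loop is what makes this quadratic: for each of the n outer iterations, we may scan up to n - 1 remaining elements.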
Optimized approach using a set
The "check all pairs" method wastes work because it re-checks the same information again and again. When we move from index i to i + 1, we basically restart the search from scratch, even though we already learned a lot about the earlier elements.
So instead of repeatedly scanning the remaining part of the array for each number, we can flip the perspective:
- While traversing the array from left to right, anything we’ve already passed is known.
- For the current value, the only question we need to answer is: Have we seen this before?
If we can answer that question quickly, we can eliminate the expensive inner loop entirely. That’s exactly what a set gives us: we keep a running record of values we’ve encountered so far, and each time we read a new number, we do a constant-time lookup. If it’s already present, we’ve found a duplicate immediately; if not, we record it and continue.
Why not use sorting instead?
Sorting the array and then checking adjacent elements would work and give us O(n log n) time complexity, but it's slower than necessary and also requires either modifying the input array or creating a sorted copy. Our set-based approach achieves better time complexity while leaving the original array unchanged.
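For comparison, the sorting-based alternative might look like this (a sketch; contains_duplicate_sorted is an illustrative name):

```python
def contains_duplicate_sorted(nums):
    # sorted() returns a new list, so the original array is untouched.
    # After sorting, any duplicates sit next to each other.
    ordered = sorted(nums)
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            return True  # Two adjacent equal values means a duplicate
    return False
```

The sort dominates at O(n log n), and the adjacent-pair scan adds only O(n) on top of it.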
Step-by-step algorithm
- Initialize an empty set called records.
- Iterate through the nums array, and for each number num:
  - Check if num already exists in records.
    - If it does, return True, as we've encountered a duplicate.
    - Otherwise, add num to records.
- After processing all numbers without finding duplicates, return False.
Code implementation
Let’s look at the code for the solution we just discussed.
Code for the contains duplicate problem
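A Python implementation of the set-based algorithm described above might look like this:

```python
def contains_duplicate(nums):
    records = set()          # Values encountered so far
    for num in nums:
        if num in records:   # O(1) average-time membership check
            return True      # Duplicate found; stop immediately
        records.add(num)     # Record the value and keep scanning
    return False             # Every value was unique

# Running it on the example inputs:
print(contains_duplicate([3, 7, 1, 3]))                  # True
print(contains_duplicate([5, 2, 8, 1]))                  # False
print(contains_duplicate([2, 2, 2, 5, 5, 9, 5, 3, 9, 3]))  # True
```

Note that the function can return True as soon as the first duplicate is seen, without scanning the rest of the array.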
Time complexity
The time complexity of this algorithm is O(n), where n is the number of elements in the input array. We iterate through the array exactly once, and for each element, we perform a set lookup and insertion, both of which are O(1) operations on average. Even in the worst case, where no duplicates exist, we still only make a single pass through all n elements.
Space complexity
The space complexity is O(n) in the worst case. This occurs when all elements in the array are distinct, requiring us to store every element in our records set.