
217. Contains Duplicate

Detecting duplicates in an array is a fundamental problem that appears frequently in technical interviews at companies like Google, Amazon, and Microsoft. This problem tests your understanding of hash-based data structures and your ability to optimize from a quadratic solution to linear time.

Problem statement

You are given an integer array nums. Your task is to determine whether any value appears more than once in the array. Return True if it does; otherwise, return False.

Constraints:

  • 1 <= nums.length <= 10^5
  • -10^9 <= nums[i] <= 10^9

Examples

Input                               Output   Explanation
[3, 7, 1, 3]                        True     The number 3 appears at indices 0 and 3.
[5, 2, 8, 1]                        False    Every element in the array is unique.
[2, 2, 2, 5, 5, 9, 5, 3, 9, 3]      True     Multiple elements repeat: 2, 5, 9, and 3 all appear more than once.

Why checking all pairs becomes expensive

When we first encounter this problem, a common brute-force approach is to use a nested loop: compare each element with every element that comes after it. Specifically, for each index i, we scan indices i + 1 through n - 1 to check whether the same value appears again. This method correctly detects duplicates, but it becomes very inefficient as the array grows.

The issue is that for every element, we potentially perform a linear scan over the remaining portion of the array. If the array contains no duplicates, we end up checking almost every possible pair of elements. The total number of comparisons in this worst case is O(n²). With n = 10^5, this could mean roughly 10^10 comparisons, which is far too slow.
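A minimal sketch of the brute-force approach just described (the function name is our choice):

```python
def contains_duplicate_brute_force(nums):
    """Compare each element with every element after it: O(n^2) time, O(1) space."""
    n = len(nums)
    for i in range(n):
        # Scan indices i + 1 through n - 1 for a repeat of nums[i].
        for j in range(i + 1, n):
            if nums[i] == nums[j]:
                return True
    return False

print(contains_duplicate_brute_force([3, 7, 1, 3]))  # True
print(contains_duplicate_brute_force([5, 2, 8, 1]))  # False
```

This is correct but, as noted above, degrades quadratically as the array grows.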

Optimized approach using a set

The check-all-pairs method wastes work because it re-checks the same information again and again. When we move from index i to i + 1, we essentially restart the search from scratch, even though we have already learned a lot about the earlier elements.

So instead of repeatedly scanning the remaining part of the array for each number, we can flip the perspective:

  • While traversing the array from left to right, anything we’ve already passed is known.
  • For the current value, the only question we need to answer is: Have we seen this before?

If we can answer that question quickly, we can eliminate the expensive inner loop entirely. That’s exactly what a set gives us: we keep a running record of values we’ve encountered so far, and each time we read a new number, we do a constant-time lookup. If it’s already present, we’ve found a duplicate immediately; if not, we record it and continue.

Why not sort the array instead?

Sorting the array and then checking adjacent elements would work, but it gives O(n log n) time complexity, which is slower than necessary, and it requires either modifying the input array or creating a sorted copy. Our set-based approach achieves better time complexity while leaving the original array unchanged.

Step-by-step algorithm

  1. Initialize an empty set called records.
  2. Iterate through the nums array, and for each number num,
    1. Check if num already exists in records
      1. If it does, return True as we’ve encountered a duplicate.
      2. Otherwise, add num to records.
  3. After processing all numbers without finding duplicates, return False.

Take a look at the illustration below to understand the solution more clearly.


Code implementation

Let’s look at the code for the solution we just discussed.

Code for the contains duplicate problem
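The original interactive code widget is not reproduced here, so below is a Python sketch that follows the steps above exactly (the function name contains_duplicate is our choice):

```python
def contains_duplicate(nums):
    records = set()  # running record of values we've encountered so far
    for num in nums:
        if num in records:   # O(1) average-time lookup
            return True      # duplicate found: return immediately
        records.add(num)
    # Processed all numbers without finding a duplicate.
    return False

print(contains_duplicate([3, 7, 1, 3]))                     # True
print(contains_duplicate([5, 2, 8, 1]))                     # False
print(contains_duplicate([2, 2, 2, 5, 5, 9, 5, 3, 9, 3]))   # True
```

Note the early return: the loop stops at the first repeated value rather than scanning the rest of the array.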

Time complexity

The time complexity of this algorithm is O(n), where n is the number of elements in the input array. We iterate through the array exactly once, and for each element, we perform a set lookup and insertion, both of which are O(1) operations on average. Even in the worst case, where no duplicates exist, we still only make a single pass through all n elements.

Space complexity

The space complexity is O(n) in the worst case. This occurs when all elements in the array are distinct, requiring us to store every element in our records set.

Frequently Asked Questions

Can we solve this problem using sorting instead of a set?

Yes, we can sort the array first and then check adjacent elements for duplicates. This would have O(n log n) time complexity and O(1) or O(n) space, depending on whether we sort in place or create a copy. However, this is slower than the set-based approach and may modify the input.
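A sketch of the sorting-based alternative, using a sorted copy so the input stays unchanged:

```python
def contains_duplicate_sorted(nums):
    """Sort a copy, then check adjacent elements: O(n log n) time, O(n) space."""
    ordered = sorted(nums)  # sorted() returns a new list; nums is untouched
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:  # duplicates end up adjacent after sorting
            return True
    return False

print(contains_duplicate_sorted([3, 7, 1, 3]))  # True
print(contains_duplicate_sorted([5, 2, 8, 1]))  # False
```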

What if we need to find which element is duplicated, not just whether a duplicate exists?

We can modify our solution to return the element instead of True when we find it in the set. The algorithm structure remains the same; only the return value changes.
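For example, the variant below (the name first_duplicate is our choice) returns the first repeated value, or None if the array has no duplicates:

```python
def first_duplicate(nums):
    records = set()
    for num in nums:
        if num in records:
            return num   # return the duplicated value instead of True
        records.add(num)
    return None          # no duplicate exists

print(first_duplicate([3, 7, 1, 3]))  # 3
print(first_duplicate([5, 2, 8, 1]))  # None
```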

How would this change if the input were a data stream instead of an array?

The set-based approach works just as well for streams. We’d process elements one at a time as they arrive, maintaining our set across the stream. The early-return optimization is even more valuable in streaming scenarios, since we can stop consuming the stream as soon as a duplicate appears.
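In Python, the same function can accept any iterable, so a generator stands in for a stream here (the generator below is a hypothetical stand-in for real streamed input):

```python
def stream_has_duplicate(stream):
    """Accepts any iterable, including a generator that yields one value at a time."""
    records = set()
    for num in stream:
        if num in records:
            return True  # stop consuming the stream as soon as a duplicate arrives
        records.add(num)
    return False

# A generator simulating a stream: yields 0, 1, 2, 3, 0, 1, ...
stream = (x % 4 for x in range(10))
print(stream_has_duplicate(stream))  # True
```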
