This problem often appears in interview settings because it goes beyond “find an index” and asks you to prove fair randomness. The trick is making sure each valid index is chosen with exactly the same probability, even though you discover matches one by one as you scan the array. That’s a realistic skill for building systems that sample logs, events, or streams fairly while keeping memory usage small.

Problem statement

You are given an integer array, nums, that may contain duplicates. For a given integer, target, return a random index i such that nums[i] == target. If the target occurs multiple times, each matching index must be chosen with equal probability.

Design a Solution class that can:

  • Solution(int[] nums): Initializes the object with the array nums.
  • int pick(int target): Picks a random index i from nums where nums[i] == target. If there are multiple valid i’s, each of them must be returned with equal probability.

It is guaranteed that target exists in nums when pick(target) is called.

Constraints:

  • 1 <= nums.length <= 2 × 10^4
  • -2^31 <= nums[i] <= 2^31 - 1
  • target is an integer from nums.
  • At most 10^4 calls will be made to pick.

Examples

| nums | target | Number of picks | Output | Explanation |
| --- | --- | --- | --- | --- |
| [4, 9, 4, 1, 4, 7] | 4 | 8 | e.g. [0, 2, 4, 4, 0, 2, 0, 4] | Target 4 occurs at indices {0, 2, 4}. Each call should return one of these indices uniformly at random. |
| [-10, 5, -10, 3, 0, -10] | -10 | 10 | e.g. [2, 0, 5, 5, 0, 2, 0, 5, 2, 0] | Works with negative numbers too. Valid indices are {0, 2, 5}. |
| [42] | 42 | 6 | [0, 0, 0, 0, 0, 0] | Array length is 1, so the only valid index is always returned. |
| [8, 8, 8, 8, 8] | 8 | 7 | e.g. [3, 1, 4, 0, 2, 2, 1] | All values are the target; any index 0..4 is valid with equal probability. |
| [0, 1, 2, 3, 4, 5] | 3 | 5 | [3, 3, 3, 3, 3] | Target appears once, so it always returns that single index. |
| [2**31 - 1, -2**31, 7, 7, 7] | 7 | 9 | e.g. [4, 2, 3, 4, 2, 2, 3, 4, 3] | Includes extreme 32-bit integer values; target indices are {2, 3, 4}. |
| [6, 1, 6, 2, 6, 3, 6, 4] | 6 | 12 | e.g. [6, 0, 4, 2, 0, 6, 4, 2, 6, 0, 2, 4] | Target appears many times, spaced out. Valid indices are {0, 2, 4, 6}. Output should only be from this set. |

Naive approach

A simple way to solve this problem is to handle each pick(target) request independently. Every time pick is called, we scan the entire array nums from left to right and collect all indices i where nums[i] == target into a temporary list. After we finish the scan, we choose one index from that list at random and return it.

This approach is correct because the list contains exactly the valid indices for the target, and selecting uniformly from it guarantees that each valid index has the same probability of being returned.

However, the downside is efficiency. If pick is called many times, we end up scanning the entire array repeatedly and rebuilding the same list again and again. That makes the time complexity O(n) per call, which becomes expensive when the number of calls is large.
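
The naive approach can be sketched as follows. This is only an illustration of the scan-then-choose idea described above; the class name NaiveSolution is a hypothetical name chosen here to keep it apart from the optimized Solution class.

```python
import random


class NaiveSolution:
    """Naive approach: rescan nums on every pick call."""

    def __init__(self, nums):
        self.nums = nums

    def pick(self, target):
        # Collect every index whose value equals target: O(n) work per call.
        matches = [i for i, value in enumerate(self.nums) if value == target]
        # Choose uniformly among the collected indices.
        return random.choice(matches)


picker = NaiveSolution([4, 9, 4, 1, 4, 7])
print(picker.pick(4))  # one of 0, 2, 4
```

Every call repeats the same scan and rebuilds the same list, which is exactly the waste the next section removes.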

Key observation

The key observation is that the input array nums does not change after the object is initialized. Because the array is fixed, the set of indices where a particular value appears is also fixed.

That means we can do the “searching work” just once. If we precompute and store all indices for every distinct value in nums, then later, when pick(target) is called, we do not need to scan the array at all. We can immediately access the list of indices for target and randomly choose one index from that list.

This works especially well because the problem only requires that we pick a random index uniformly among the occurrences of target. If we have a list of all valid indices, uniform randomness is as simple as choosing a random element from the list.

Optimized solution

In the optimized solution, we build a hash map (dictionary) during initialization:

  • The key is a number from the array.
  • The value is a list of all indices where that number appears.

For example, if nums = [1, 2, 3, 3, 3], the map looks like this:

  • 1 -> [0]
  • 2 -> [1]
  • 3 -> [2, 3, 4]

Now, when we call pick(3), we directly retrieve the list [2, 3, 4] and return one of those indices at random. Since the random choice is uniform over the list, every index in the list has an equal probability of being returned, which is exactly what the problem asks for.

This solution is efficient because we pay the O(n) cost of scanning the array only once, during initialization. After that, each pick operation runs in O(1) time on average.

Python implementation for the optimal solution

Now, let’s take a look at the code that implements this solution:

Code for the Random Pick Index problem
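
A minimal sketch consistent with the description in this article is shown here. The attribute name self.indices and the call to random.choice follow the names used in the complexity and pitfalls sections; using collections.defaultdict to build the map is an assumption made for brevity, and a plain dict would work just as well.

```python
import random
from collections import defaultdict


class Solution:
    def __init__(self, nums):
        # Map each value to the list of indices where it appears.
        self.indices = defaultdict(list)
        for i, value in enumerate(nums):
            self.indices[value].append(i)

    def pick(self, target):
        # target is guaranteed to be present, so the list is non-empty.
        return random.choice(self.indices[target])


solver = Solution([1, 2, 3, 3, 3])
print(dict(solver.indices))  # {1: [0], 2: [1], 3: [2, 3, 4]}
print(solver.pick(3))        # one of 2, 3, 4
```

Note that pick only reads from the stored list. It never modifies it, so repeated calls keep drawing from the same set of indices.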

Time complexity

  • Initialization (__init__): This takes O(n) because we iterate through the entire input array exactly once to populate the dictionary. Each dictionary insertion (appending to a list) takes O(1) on average.
  • Picking (pick): Retrieving the list of indices from the dictionary takes O(1) average time, and random.choice() selects an element from that list in O(1) time.

Space complexity

The space complexity of this solution is O(n): across all the lists, each index is stored exactly once, and the map holds at most n keys.

Edge cases

  1. Target appears once
     • Example: nums = [5, 1, 2], target = 1
     • Stored list: 1 -> [1]
     • random.choice([1]) always returns 1 (probability 1).
  2. Target appears many times (duplicates)
     • Example: nums = [3, 3, 3, 3], target = 3
     • Stored list: 3 -> [0, 1, 2, 3]
     • List size m = 4, so each index is chosen with probability 1/4.
  3. All values equal the target
     • Example: nums = [8, 8, 8, 8, 8], target = 8
     • Stored list contains every index [0..4], each returned with probability 1/5.
  4. Negative and very large values
     • Example: nums = [-2**31, 7, -2**31], target = -2**31
     • Hash keys can be any int; comparison is exact, so indices are stored/retrieved normally.
  5. Many repeated pick(target) calls
     • The map is built once and never changes.
     • Each call selects uniformly from the same index list, so the distribution stays uniform across calls.
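
A few of these cases can be checked with a short script. The helper build_index_map is a hypothetical stand-in for the map built during initialization, and the trial count is an arbitrary choice; the frequency check is empirical, so the printed fractions will only be close to 1/4, not exactly equal to it.

```python
import random
from collections import Counter, defaultdict


def build_index_map(nums):
    # Hypothetical helper: value -> list of indices where it appears.
    index_map = defaultdict(list)
    for i, value in enumerate(nums):
        index_map[value].append(i)
    return index_map


# Target appears once: the only stored index is always returned.
single = build_index_map([5, 1, 2])
assert single[1] == [1]
assert random.choice(single[1]) == 1

# Negative and very large values are ordinary dictionary keys.
extreme = build_index_map([-2**31, 7, -2**31])
assert extreme[-2**31] == [0, 2]

# Duplicates: each of the 4 indices should get roughly 1/4 of the picks.
repeated = build_index_map([3, 3, 3, 3])
trials = 40_000
counts = Counter(random.choice(repeated[3]) for _ in range(trials))
for index in range(4):
    print(index, counts[index] / trials)
```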

Common pitfalls

  1. Accidentally returning a random VALUE instead of a random INDEX
     • Your map should store indices, not values.
  2. Using a non-uniform selection
     • Bad: random.randint(0, len(lst)). randint includes both endpoints, so it can return len(lst) and cause an IndexError; a careless patch, such as clamping the result to the last position, biases the distribution.
     • Good: random.randrange(len(lst)) or random.choice(lst)
  3. Rebuilding the list on every pick
     • Works, but wastes time: turns each pick into O(n) instead of O(1).
  4. Mutating the stored lists by mistake
     • Don’t sort/shuffle/pop from self.indices[target] during pick, or you’ll change future probabilities.
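
The off-by-one pitfall is easy to see in a small experiment. The list lst and the sample size of 10,000 are illustrative choices; the snippet only records which positions each function can produce.

```python
import random

lst = [2, 3, 4]  # indices stored for some target

# random.randint(a, b) includes BOTH endpoints, so this can produce len(lst),
# which is one past the last valid position in lst.
seen_randint = {random.randint(0, len(lst)) for _ in range(10_000)}
print(sorted(seen_randint))

# random.randrange(n) excludes n, so every result is a valid position.
seen_randrange = {random.randrange(len(lst)) for _ in range(10_000)}
print(sorted(seen_randrange))

# random.choice picks an element directly and leaves the list untouched.
picked = random.choice(lst)
print(picked, lst)
```

If a position is needed rather than the element itself, random.randrange(len(lst)) is the safe choice; otherwise random.choice(lst) avoids index arithmetic altogether.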