The High Five problem is a classic introduction to combining hash maps with heaps for “top-K” style calculations. It is popular in interviews because it tests whether you can keep only the most relevant data while processing input in a single pass. While sorting every student’s full score list may seem straightforward, this problem shows why maintaining a fixed-size min-heap is the more efficient and scalable approach.
Problem statement
You’re given a list items where each entry is a pair [IDi, scorei], meaning student IDi received the score scorei. For every student, compute their top five average by:
- Taking their five highest scores
- Adding those five scores together
- Dividing the sum by 5 using integer (floor) division
Return the results as a list of pairs result, where each pair is [ID, topFiveAverage]. The final list must be sorted by ID in increasing order.
Constraints:
- 1 <=
items.length<= 1000 items[i].length== 2- 1 <=
IDi<= 1000 - 0 <=
scorei<= 100 - For each
IDi, there will be at least five scores.
Examples
| Input | Output | Explanation |
| [[10,50],[10,40],[10,60],[10,70],[10,80],[10,90],[10,100]] | [[10,80]] | Student 10 has 7 scores. Top five are 100, 90, 80, 70, 60 → sum 400 → 400 // 5 = 80. Lower scores 50, 40 are ignored. |
| [[5,100],[5,99],[5,98],[5,97],[5,96]] | [[5,98]] | We have exactly five scores, so we average all: sum 490 → 490 // 5 = 98. |
| [[1,91], [2,88], [1,76], [2,100], [1,84], [2,93], [1,90], [2,85], [1,87], [2,92]] | [[1,85], [2,91]] | We compute the top five averages per student, then return the results sorted by student ID.Student 1 has five scores: 91, 76, 84, 90, 87. Since there are exactly five, all of them count as the “top five.” Their sum is 428, and the integer average is (428 // 5 = 85).Student 2 has five scores: 88, 100, 93, 85, 92. Again, all five scores are included. Their sum is 458, and the integer average is (458 // 5 = 91). |
Intuition
The main idea behind this intuition is: we don’t care about all scores; we only need the top 5 scores per student. So for each student, every time we see a new score, we ask:
- Does this score belong in their top 5?
- If we already have 5 scores, we may need to discard the smallest among the top 5.
Naive approach: Sorting
One straightforward approach is to group all scores by student ID first. Then, process students in increasing ID order, and for each student, sort their scores in descending order. Finally, for each student, take their top five scores, compute their sum, and return the average.
As simple as this solution looks, it has a major drawback: Sorting every student’s full score list can be expensive. If a student has many scores, sorting will have a time complexity of O(n log n) for that student. This is a major overkill when we only need the top 5.
Optimal approach using Min-Heap
Instead of collecting every score and sorting the entire list for each student, it is more efficient to process the input in a single pass while keeping only the information you actually need. Since the final answer depends only on the five highest scores per student, any score that is not competitive with those top five can be discarded immediately. This reduces unnecessary work (especially repeated sorting) and keeps memory usage predictable, as each student’s stored data never exceeds a small, fixed size.
A practical way to enforce this is to maintain, for each student ID, a min-heap that holds at most five scores. When a new score arrives, you insert it into that student’s heap; if the heap now contains more than five elements, you remove the smallest one. This ensures the heap always contains the best five scores seen so far for that student, without ever sorting the full set. After processing all pairs, you compute each student’s average by summing the heap and dividing by 5 using integer division, then output the results sorted by student ID.
Step-by-step algorithm
- Create a hashmap
scores_by_idmapping eachstudentIdto a min-heap of that student’s top scores. - Iterate through
itemsand for each[studentId, score]:- Push
scoreinto that student’s heap. - If heap size becomes > 5, pop the smallest score (so only the top 5 remain).
- Push
- Create a list
resultthat will store students and the average of their top five sum. - Iterate through each
studentIdin sorted order and for each student:- Sum the five scores currently in that student’s heap
- Compute the average scores using
sum // 5, and append[student_id, average]to the result.
- Finally, return the
resultarray.
Let’s look at the following illustration to get a better understanding of the solution:
Code implementation
Let’s look at the code for the solution we just discussed.
Code for the high five problem
Time complexity
The time complexity of this solution is O(N + S log S), where N is the number of score entries in items and S is the number of unique student IDs.
- For each of the N entries, we do a
heappush(and sometimes aheappop) on a heap capped at size 5, so each heap operation is O(log 5) = O(1). This makes the first pass O(N). - We then sort the S student IDs, which costs O(S log S).
- Summing each student’s heap is O(5) = O(1) per student, so this step is O(S).
Space complexity
The space complexity is O(S). The dictionary stores one heap per student, and each heap holds at most 5 scores, so total stored scores are at most 5S, which is O(S).
Common pitfalls and interview tips
- Forgetting integer division: The average must be
sum // 5, not float division. - Storing all scores unnecessarily: Sorting entire lists works, but interviewers often prefer the fixed-size heap since we only need the top 5.
- Assuming input is already grouped/sorted: It isn’t, always build your own grouping structure and sort ID only at the end.