
49. Group Anagrams

The Group anagrams problem is a classic introduction to using hashing for efficient grouping. It is a favorite in interviews because it tests whether you can replace repeated pairwise comparisons with a single, consistent representation for each string. While checking anagrams directly may seem straightforward, this problem highlights why computing a canonical key (such as a character-frequency map or a sorted string) is crucial for efficient scaling.

Problem statement

Given an array of strings strs, your task is to group together the strings that are anagrams of each other. You can return the groups in any order.

An anagram is a word or phrase formed by rearranging the letters of another word or phrase, using every letter exactly once.

Constraints:

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Examples

Example 1:
  • Input: ["arc", "car", "elbow", "below", "state", "taste", "cat"]
  • Output: [["arc", "car"], ["elbow", "below"], ["state", "taste"], ["cat"]]
  • Explanation: "arc" and "car" both share the letters {a, c, r}, so they form one group. "elbow" and "below" share {b, e, l, o, w}, so they form another. "state" and "taste" share {a, e, s, t, t}, so they form another. "cat" stands alone.

Example 2:
  • Input: ["abc", "bca", "cab", "xyz", "zyx", "zxy", "foo"]
  • Output: [["abc", "bca", "cab"], ["xyz", "zyx", "zxy"], ["foo"]]
  • Explanation: "abc", "bca", and "cab" all share the letters {a, b, c}, so they form one group. "xyz", "zyx", and "zxy" share {x, y, z}, so they form another. "foo" stands alone.

Example 3:
  • Input: ["aa", "aa", "a", "bb", "b", "b"]
  • Output: [["aa", "aa"], ["a"], ["bb"], ["b", "b"]]
  • Explanation: The two "aa" strings share the letters {a, a}, so they form one group. The two "b" strings share {b}, so they form another group. "a" and "bb" don't share the exact same multiset of letters with any other string, so each stands alone.

Naive approach

A simple approach is to compare every pair of strings and decide if they are anagrams. That means repeatedly checking the same strings against many others. For example, given the strings: ["eat", "tea", "tan", "ate"], you end up comparing each string with every other string, i.e.,

  • Comparing "eat" with "tea"
  • Comparing "eat" with "tan"
  • Comparing "eat" with "ate"
  • Comparing "tea" with "tan"
  • Comparing "tea" with "ate"
  • Comparing "tan" with "ate"

That is 6 comparisons for just 4 strings. In general, if there are n strings, comparing every pair requires n(n−1)/2 pair checks, which is O(n²). With up to 10^4 strings, the number of comparisons becomes enormous, and that’s before accounting for the cost of each anagram check.

Each pair check is also non-trivial: to confirm two strings are anagrams, you must process their characters, either by building frequency counts or by sorting. That makes the total runtime:

  • If you check anagrams using character counts, the overall cost is O(n² × k) time and O(1) extra space, where n is the number of strings and k is the average string length.
  • If you check anagrams by sorting each pair of strings, the cost becomes O(n² × k log k) time. This method typically requires creating sorted versions of the strings, which can use O(k) additional space per comparison depending on the implementation.

Even if you optimize by sorting strings before comparing them, the approach is still wasteful if you don’t reuse results. You repeatedly rebuild the same sorted form (the anagram signature) for the same strings across multiple comparisons, thereby repeating the most expensive part of the work.
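To make the wasted work concrete, here is a rough sketch of the pairwise idea; the structure and names are illustrative, not code from this article:

from collections import Counter

def group_anagrams_naive(strs):
    # Illustrative sketch of the pairwise approach: each new word is compared
    # against a representative of every existing group until a match is found.
    groups = []
    for word in strs:
        placed = False
        for group in groups:
            # The character counts are rebuilt on every comparison -- this is
            # the repeated work the optimal solution avoids.
            if Counter(word) == Counter(group[0]):
                group.append(word)
                placed = True
                break
        if not placed:
            groups.append([word])
    return groups

print(group_anagrams_naive(["eat", "tea", "tan", "ate"]))
# [['eat', 'tea', 'ate'], ['tan']]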

Optimal solution using hash maps

To overcome repeated comparisons, we need to redefine what it means to “match” two strings. Anagrams become straightforward to group once we convert each word into a canonical signature, a representation that’s guaranteed to be identical for all rearrangements of the same letters. Instead of repeatedly asking whether two strings are anagrams, we compute a single “identity” for each string that captures only its character makeup.

Once every string has a stable identity, grouping becomes a simple collection task. We compute the key for a string and drop it into a hash map bucket for that key. A clean key is the string with its characters sorted: "eat", "tea", and "ate" all normalize to "aet", so they automatically end up together under the "aet" entry without any pairwise checking.
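As a quick, standalone illustration of this normalization:

for word in ["eat", "tea", "ate", "tan"]:
    key = "".join(sorted(word))  # sort the characters to get the canonical signature
    print(word, "->", key)
# eat -> aet
# tea -> aet
# ate -> aet
# tan -> ant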

Why does this work even with empty strings?

Empty strings have a well-defined key as well: sorting "" still yields "". That means empty strings naturally group together without requiring special-case logic.

Step-by-step algorithm

This algorithm uses the character-frequency variant of the canonical key, a tuple of 26 letter counts, and works as follows:

  1. Create a dictionary groups that maps each key to a list of words.
  2. For each word in the input list strs:
    1. Initialize a list word_frequency of size 26 with all 0s.
    2. For each character in word, turn the character into a number from 0 to 25 using ord(character) - ord('a'), where ord() in Python returns the ASCII value of a character (e.g., 97 for 'a', 98 for 'b'). This subtraction gives an alphabetical offset: 'a' maps to 0, 'b' to 1, and so on. Use that number as an index into word_frequency and increment it by 1 to count that letter’s occurrence.
    3. Convert word_frequency to a tuple key, so it can be used as a dictionary key.
    4. Append the string word to groups[key].
  3. Once all words are processed, return the values of groups as a list of grouped anagrams.

To better understand the solution, let’s walk through the algorithm using the illustration below:

[Illustration: 4-slide walkthrough of grouping the example strings by their canonical keys]

Executable Python implementation 

The code below implements the algorithm described above.

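A minimal Python sketch of this solution, using the 26-slot character-frequency tuple as the key exactly as in the steps above:

from collections import defaultdict

def group_anagrams(strs):
    # Group anagrams using a 26-slot character-frequency tuple as the hash map key.
    groups = defaultdict(list)
    for word in strs:
        # Count each letter at its alphabetical offset (0 for 'a', ..., 25 for 'z').
        word_frequency = [0] * 26
        for character in word:
            word_frequency[ord(character) - ord('a')] += 1
        # Lists aren't hashable, so convert the counts to a tuple before keying on them.
        key = tuple(word_frequency)
        groups[key].append(word)
    return list(groups.values())

print(group_anagrams(["arc", "car", "elbow", "below", "state", "taste", "cat"]))
# [['arc', 'car'], ['elbow', 'below'], ['state', 'taste'], ['cat']]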

Time complexity

The time complexity of this solution is O(n × k), where n is the number of strings and k is the average length of a string. Building the 26-slot frequency key takes a single pass over each string’s characters, and we do this once per string. If you build the key by sorting each string instead, each key costs O(k log k), for O(n × k log k) overall.

Space complexity

The space complexity of this solution is O(n × k), because in the worst case, we store all strings inside the hash map buckets and create keys whose total size can scale with the total input size.

Common pitfalls to avoid

  • Sorting the entire list instead of sorting each string. You need a per-string identifier; sorting strs doesn’t reveal anagram group membership.
  • Using the sorted characters but forgetting to join them into a string key. Lists aren’t hashable in Python, so you need a hashable key (like a string or tuple); see the short snippet after this list.
  • Assuming output order matters. The problem allows any order, so don’t waste time trying to match a specific ordering unless explicitly required.
  • Special-casing empty strings. This approach already handles them correctly, so no extra logic is needed.
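To illustrate the hashable-key pitfall, a small standalone snippet:

word = "eat"
sorted_chars = sorted(word)   # ['a', 'e', 't'] -- a list, which cannot be a dict key
key = "".join(sorted_chars)   # 'aet' -- a string, which can
# tuple(sorted_chars) would work just as well, since tuples are hashable.
print(key)  # aet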

Frequently Asked Questions

Why doesn’t comparing every pair of strings work well?

Because the number of pairwise comparisons grows quadratically with the input size, and each comparison requires inspecting the characters of both strings, the total runtime becomes prohibitive for large lists.

How do I recognize this pattern in an interview?

If you need to group items by an equivalence relationship (like “same letters”), look for a way to compute a key that represents the group.

Is there a faster approach than sorting each string?

Yes. Building a 26-length character-frequency key per string takes O(k) time and avoids sorting; that is what the step-by-step algorithm above does. The sorted-string key is often simpler to write and still fast enough for these constraints.

Do empty strings or very short strings cause issues?

No. Sorting preserves correctness even for "" or "a", so they group naturally without extra checks.

Would this change if there were uppercase letters or Unicode?

The grouping idea stays the same, but key construction must reflect the character set (sorting still works; frequency arrays may need resizing or a different mapping).
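For example, a character-set-agnostic key could be built from a frequency map. This is a sketch, not part of the solution above, and the helper name is hypothetical:

from collections import Counter

def anagram_key(word):
    # Hypothetical helper: the key is the multiset of characters, independent of order.
    # Works for uppercase letters, Unicode, digits, and so on.
    return tuple(sorted(Counter(word).items()))

print(anagram_key("Tokyo") == anagram_key("Kyoto"))  # False -- 'T' and 't' differ unless you normalize case
print(anagram_key("tokyo") == anagram_key("kyoto"))  # True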
