Level Up Your Coding Skills & Crack Interviews — Save up to 50% or more on Educative.io Today! Claim Discount

Arrow
Table of contents

288. Unique Word Abbreviation

This is a common hashing interview problem because it tests whether we can precompute a hashed lookup once and then answer many queries effectively. It commonly appears in screening rounds where interviewers want to see clean use of a hash map, careful handling of collisions, and attention to performance constraints such as large dictionaries and repeated lookups.

Problem statement

We are given a list word, and a dictionary of lowercase words. We are also given a rule for abbreviating a word: take its first letter, then the count of letters between the first and last letter, then the last letter. For example:

  • "calendar" can be abbreviated as "c6r" ("c alenda r" replacing six characters "alenda" with their length 6.)
  • "dog" can be abbreviated as "d1g" ("d o g" replacing one character "o" with its length 1.)
  • Words of length 1 or 2 abbreviate to themselves such as "ox" will be abbreviated as "ox".

We must implement a ValidWordAbbr class that contains the following functions:

  • Constructor: ValidWordAbbr(String[] dictionary) which initializes the object using dictionary.
  • boolean isUnique(string word): For each word, return True if the following conditions hold, otherwise return False:
    • No word in the dictionary has the same abbreviation as the given word.
    • The only dictionary word(s) with that abbreviation are exactly the same as the given word.

Constraints:

  • 1 <= dictionary.length <= 3 * 104
  • 1 <= dictionary[i].length <= 20
  • dictionary[i] consists of lowercase English letters.
  • 1 <= word.length <= 20
  • word consists of lowercase English letters.
  • At most 5000 calls will be made to isUnique.

Examples

DictionaryWordOutputExplanation
[“lamp”, “limp”, “like”][“lump”]FalseThe words in the dictionary have the following abbreviations:"lamp" -> l2p""limp" -> l2p""like" -> l2e"The abbreviation of the given word"lump" is l2p.Although two words from the dictionary "lamp" and "limp" have the same abbreviation as "lump", but they are not the same words. So the given word is not unique.
[“cake”, “card”][“cake”]TrueThe words in the dictionary have the following abbreviations:"cake" -> "c2e""card" -> "c2d"The abbreviation of the given word"cake" is "c2e".The dictionary contains a word with the same abbreviation, and that word is also "cake". As no other dictionary word shares the same abbreviation "c2e", so the given word is unique.
[“cake”, “card”, “cake”][“cake”]TrueThe words in the dictionary have the following abbreviations:"cake" -> "c2e""card" -> "c2d""cake" -> "c2e"The abbreviation of the given word"cake" is "c2e".The dictionary contains multiple words with the same abbreviation, and those words are also "cake". As no other dictionary word shares the same abbreviation "c2e", so the given word is unique.
[“stone”, “stare”, “score”][“stare”]FalseThe words in the dictionary have the following abbreviations:"stone" -> "s3e""stare" -> "s3e""score" -> "s3e"The abbreviation of the given word"stare" is "s3e".Because multiple different words from the dictionary have the same abbreviation as "stare", so the given word is not unique.

Naive approach

A naive approach would compute the abbreviation for the query word, then iterate through all dictionary words, compute their abbreviations, and check for conflicts. With up to 30000 dictionary words and 5000 queries, this yields 150 million operations, which is far too slow. The repeated work of computing abbreviations and scanning the entire dictionary for every query is wasteful.

Intuition

The key insight is that uniqueness is determined by the word’s abbreviation, not by the word itself. Therefore, for any query word, we only need to check which dictionary words share its abbreviation.

For a given abbreviation, there are three possible cases:

  1. No dictionary word uses this abbreviation. In this case, the query word is unique.
  2. Only the query word itself uses this abbreviation. In this case, the query word is unique.
    1. This holds True even if the same word appears multiple times in the dictionary, as long as no other different word shares this abbreviation.
  3. At least one different dictionary word shares this abbreviation (either a single distinct word or multiple distinct words): In this case, the query word is not unique.

This means the problem reduces to maintaining a preprocessed abbreviation-to-words lookup, so each query can be answered by checking only the abbreviation’s associated dictionary word(s), rather than scanning the full dictionary.

Optimized approach using HashMap

The key to solving this problem is preprocessing with a hash map of sets. Instead of checking abbreviations during each query, we precompute all abbreviations during initialization and store them so that each abbreviation maps to the actual words it represents.

A hash map alone is insufficient because we need to determine whether an abbreviation belongs to any word other than the query word itself. If we only count occurrences of abbreviations, we cannot distinguish between the same word appearing twice and two different words sharing an abbreviation.

For example:

  • If the dictionary contains ["cake", "cake"] and we query "cake", the count of abbreviation "c2e" is 2, and the result should be True.
  • If the dictionary contains ["cake", "cane"] and we query "cake", the count of abbreviation "c2e" is also 2, but the result should be False.

A count-based approach cannot differentiate between these scenarios. By using a set as the hash map’s value, we track the actual words rather than just their frequency, allowing us to handle dictionary duplicates correctly and perform O(1) lookups to check if conflicting words exist.

This leads naturally to a two-phase implementation:

  1. During initialization, we iterate through the dictionary and compute the abbreviation for each word. We store these in a hash map, where the key is the abbreviation and the value of the hash map is a set of all dictionary words that use that abbreviation.
  2. When checking if a query word is unique, we compute its abbreviation and look it up in the hash map. The word is unique if either the abbreviation is not in the hash map, or the set associated with that abbreviation contains only the query word and no other words.

Step-by-step algorithm

Constructor: The constructor initializes the data structure by preprocessing the dictionary and building the abbreviation map by performing the following operations:

  1. Initialize an empty hash map abbr_map to store abbreviations as keys and sets of words as values.
  2. Iterate through each word in the dictionary, and for each word:
    1. Compute its abbreviation using a helper function _get_abbreviation and store the result in a variable abbr.
    2. If the abbreviation abbr is not a key in the abbr_map:
      1. Create a new entry in the hash map.
    3. Insert the current word into the set associated with its abbreviation abbr. If the word already exists in the set, it will not be added again as sets automatically handle duplicates.

Helper method _get_abbreviation(self, word: str): This function computes the abbreviation for a given word following the specified rules:

  1. Store the word length in a variable n.
  2. If the word has 2 or fewer characters:
    1. Return the word itself as its abbreviation.
  3. Compute the abbreviation for longer words by concatenating the first character, the count of middle characters (computed as n - 2), and the last character.

Query method isUnique(word): This function checks whether a given word has a unique abbreviation relative to the dictionary.

  1. Compute the abbreviation of the query word by using the helper method _get_abbreviation.
  2. If the abbreviation does not exist in the abbr_map:
    1. No dictionary word shares this abbreviation, so return True.
  3. Retrieve the set of all dictionary words that have the same abbreviation and store it in words_with_abbr.
  4. If the set, words_with_abbr, contains exactly one word, and that word is the query word itself, return True, otherwise, return False.

To better understand the solution, let’s walk through the algorithm using the illustration below:

1 / 5

Code implementation

Let’s look at the following code to understand its implementation.

Code for the Unique Word Abbreviation problem

Time complexity

The time complexity of this solution is O(n × m), where n is the number of words in the dictionary and m is the average word length. We process each word once, computing its abbreviation (O(m) for string slicing and concatenation) and adding it to a set (O(1)). Building the hash map takes O(n) insertions overall.

isUnique method: O(m) where m is the length of the query word. Computing the abbreviation takes O(m) time, and hash map lookup and set membership check both take O(1). The length check on the set is O(1) as well.

Space complexity

The space complexity of this solution is O(n × m), where n is the number of unique words in the dictionary and m is the average word length. In the worst case, every dictionary word has a unique abbreviation, so we store all words in sets across the hash map. Each word is stored once, and each abbreviation key requires O(1) space since abbreviations are bounded by the maximum word length.

Common pitfalls and interview tips

  1. Forgetting to handle duplicates in the dictionary: If the same word appears multiple times, a list-based approach may incorrectly treat it as “multiple different words.” Using a set prevents false negatives.
  2. Incorrect abbreviation for short words: Length 1–2 words abbreviate to themselves, not to something like a0b.
  3. Misreading “unique”: A word can be considered unique even if its abbreviation exists, as long as the only dictionary word with that abbreviation is the same word.

Frequently Asked Questions

What if the dictionary is huge and memory is tight?

You can store an abbreviation: a single representative word plus a boolean “conflict” flag (meaning multiple distinct words exist). This reduces space usage compared to storing full sets.

Can we support dynamic updates (add/remove words)?

Yes, but you need counts per word per abbreviation (e.g., abbreviation → map(word → count)) to handle deletions correctly.

Share with others:

Leave a Reply

Your email address will not be published. Required fields are marked *