This is a common hashing interview problem because it tests whether we can precompute a hashed lookup once and then answer many queries effectively. It commonly appears in screening rounds where interviewers want to see clean use of a hash map, careful handling of collisions, and attention to performance constraints such as large dictionaries and repeated lookups.
Problem statement
We are given a list word, and a dictionary of lowercase words. We are also given a rule for abbreviating a word: take its first letter, then the count of letters between the first and last letter, then the last letter. For example:
"calendar"can be abbreviated as"c6r"("calendar"replacing six characters"alenda"with their length 6.)"dog"can be abbreviated as"d1g"("dog"replacing one character"o"with its length 1.)- Words of length 1 or 2 abbreviate to themselves such as
"ox"will be abbreviated as"ox".
We must implement a ValidWordAbbr class that contains the following functions:
- Constructor:
ValidWordAbbr(String[] dictionary)which initializes the object usingdictionary. boolean isUnique(string word):For eachword, returnTrueif the following conditions hold, otherwise returnFalse:- No word in the
dictionaryhas the same abbreviation as the givenword. - The only
dictionaryword(s) with that abbreviation are exactly the same as the givenword.
- No word in the
Constraints:
- 1 <=
dictionary.length<= 3 * 104 - 1 <=
dictionary[i].length<= 20 dictionary[i]consists of lowercase English letters.- 1 <=
word.length<= 20 wordconsists of lowercase English letters.- At most 5000 calls will be made to
isUnique.
Examples
| Dictionary | Word | Output | Explanation |
| [“lamp”, “limp”, “like”] | [“lump”] | False | The words in the dictionary have the following abbreviations:"lamp" -> l2p""limp" -> l2p""like" -> l2e"The abbreviation of the given word: "lump" is l2p.Although two words from the dictionary "lamp" and "limp" have the same abbreviation as "lump", but they are not the same words. So the given word is not unique. |
| [“cake”, “card”] | [“cake”] | True | The words in the dictionary have the following abbreviations:"cake" -> "c2e""card" -> "c2d"The abbreviation of the given word: "cake" is "c2e".The dictionary contains a word with the same abbreviation, and that word is also "cake". As no other dictionary word shares the same abbreviation "c2e", so the given word is unique. |
| [“cake”, “card”, “cake”] | [“cake”] | True | The words in the dictionary have the following abbreviations:"cake" -> "c2e""card" -> "c2d""cake" -> "c2e"The abbreviation of the given word: "cake" is "c2e".The dictionary contains multiple words with the same abbreviation, and those words are also "cake". As no other dictionary word shares the same abbreviation "c2e", so the given word is unique. |
| [“stone”, “stare”, “score”] | [“stare”] | False | The words in the dictionary have the following abbreviations:"stone" -> "s3e""stare" -> "s3e""score" -> "s3e"The abbreviation of the given word: "stare" is "s3e".Because multiple different words from the dictionary have the same abbreviation as "stare", so the given word is not unique. |
Naive approach
A naive approach would compute the abbreviation for the query word, then iterate through all dictionary words, compute their abbreviations, and check for conflicts. With up to 30000 dictionary words and 5000 queries, this yields 150 million operations, which is far too slow. The repeated work of computing abbreviations and scanning the entire dictionary for every query is wasteful.
Intuition
The key insight is that uniqueness is determined by the word’s abbreviation, not by the word itself. Therefore, for any query word, we only need to check which dictionary words share its abbreviation.
For a given abbreviation, there are three possible cases:
- No
dictionaryword uses this abbreviation. In this case, the querywordis unique. - Only the query
worditself uses this abbreviation. In this case, the querywordis unique.- This holds
Trueeven if the same word appears multiple times in thedictionary, as long as no other different word shares this abbreviation.
- This holds
- At least one different
dictionaryword shares this abbreviation (either a single distinct word or multiple distinct words): In this case, the query word is not unique.
This means the problem reduces to maintaining a preprocessed abbreviation-to-words lookup, so each query can be answered by checking only the abbreviation’s associated dictionary word(s), rather than scanning the full dictionary.
Optimized approach using HashMap
The key to solving this problem is preprocessing with a hash map of sets. Instead of checking abbreviations during each query, we precompute all abbreviations during initialization and store them so that each abbreviation maps to the actual words it represents.
A hash map alone is insufficient because we need to determine whether an abbreviation belongs to any word other than the query word itself. If we only count occurrences of abbreviations, we cannot distinguish between the same word appearing twice and two different words sharing an abbreviation.
For example:
- If the
dictionarycontains["cake", "cake"]and we query"cake", the count of abbreviation"c2e"is 2, and the result should beTrue. - If the
dictionarycontains["cake", "cane"]and we query"cake", the count of abbreviation"c2e"is also 2, but the result should beFalse.
A count-based approach cannot differentiate between these scenarios. By using a set as the hash map’s value, we track the actual words rather than just their frequency, allowing us to handle dictionary duplicates correctly and perform O(1) lookups to check if conflicting words exist.
This leads naturally to a two-phase implementation:
- During initialization, we iterate through the
dictionaryand compute the abbreviation for each word. We store these in a hash map, where the key is the abbreviation and the value of the hash map is a set of alldictionarywords that use that abbreviation. - When checking if a query
wordis unique, we compute its abbreviation and look it up in the hash map. The word is unique if either the abbreviation is not in the hash map, or the set associated with that abbreviation contains only the querywordand no other words.
Step-by-step algorithm
Constructor: The constructor initializes the data structure by preprocessing the dictionary and building the abbreviation map by performing the following operations:
- Initialize an empty hash map
abbr_mapto store abbreviations as keys and sets of words as values. - Iterate through each word in the
dictionary, and for each word:- Compute its abbreviation using a helper function
_get_abbreviationand store the result in a variableabbr. - If the abbreviation
abbris not a key in theabbr_map:- Create a new entry in the hash map.
- Insert the current
wordinto the set associated with its abbreviationabbr. If thewordalready exists in the set, it will not be added again as sets automatically handle duplicates.
- Compute its abbreviation using a helper function
Helper method _get_abbreviation(self, word: str): This function computes the abbreviation for a given word following the specified rules:
- Store the
wordlength in a variablen. - If the
wordhas2or fewer characters:- Return the
worditself as its abbreviation.
- Return the
- Compute the abbreviation for longer words by concatenating the first character, the count of middle characters (computed as
n - 2), and the last character.
Query method isUnique(word): This function checks whether a given word has a unique abbreviation relative to the dictionary.
- Compute the abbreviation of the query
wordby using the helper method_get_abbreviation. - If the abbreviation does not exist in the
abbr_map:- No
dictionaryword shares this abbreviation, so returnTrue.
- No
- Retrieve the set of all
dictionarywords that have the same abbreviation and store it inwords_with_abbr. - If the set,
words_with_abbr, contains exactly one word, and that word is the queryworditself, returnTrue, otherwise, returnFalse.
To better understand the solution, let’s walk through the algorithm using the illustration below:
Code implementation
Let’s look at the following code to understand its implementation.
Code for the Unique Word Abbreviation problem
Time complexity
The time complexity of this solution is O(n × m), where n is the number of words in the dictionary and m is the average word length. We process each word once, computing its abbreviation (O(m) for string slicing and concatenation) and adding it to a set (O(1)). Building the hash map takes O(n) insertions overall.
isUnique method: O(m) where m is the length of the query word. Computing the abbreviation takes O(m) time, and hash map lookup and set membership check both take O(1). The length check on the set is O(1) as well.
Space complexity
The space complexity of this solution is O(n × m), where n is the number of unique words in the dictionary and m is the average word length. In the worst case, every dictionary word has a unique abbreviation, so we store all words in sets across the hash map. Each word is stored once, and each abbreviation key requires O(1) space since abbreviations are bounded by the maximum word length.
Common pitfalls and interview tips
- Forgetting to handle duplicates in the dictionary: If the same word appears multiple times, a list-based approach may incorrectly treat it as “multiple different words.” Using a set prevents false negatives.
- Incorrect abbreviation for short words: Length 1–2 words abbreviate to themselves, not to something like
a0b. - Misreading “unique”: A word can be considered unique even if its abbreviation exists, as long as the only dictionary word with that abbreviation is the same word.