Roman numeral conversion is a classic string manipulation problem frequently asked in technical interviews at companies like Google, Amazon, and Microsoft. This problem tests your ability to recognize patterns, work with mappings, and handle edge cases involving the subtractive notation used in Roman numerals.
Problem statement
You are given a string s representing a valid Roman numeral. Your task is to convert this Roman numeral into its equivalent integer value.
Roman numerals use seven distinct symbols with fixed values:
| Symbol | Value |
| I | 1 |
| V | 5 |
| X | 10 |
| L | 50 |
| C | 100 |
| D | 500 |
| M | 1000 |
Typically, Roman numerals are written with symbols arranged from largest to smallest, left to right, and their values are added together. For instance, XII equals 12 (10 + 1 + 1), and XXVII equals 27 (10 + 10 + 5 + 1 + 1).
However, Roman numerals also employ a subtractive principle in specific cases to avoid repetition. When a smaller symbol appears immediately before a larger one, we subtract the smaller value instead of adding it. There are exactly six such subtractive combinations:
IbeforeV(5) orX(10) forms 4 or 9XbeforeL(50) orC(100) forms 40 or 90CbeforeD(500) orM(1000) forms 400 or 900
Your goal is to parse the given Roman numeral string and return the corresponding integer.
Constraints
- 1 <=
s.length<= 15 scontains only the characters('I', 'V', 'X', 'L', 'C', 'D', 'M').- It is guaranteed that
sis a valid roman numeral in the range [1, 3999].
Examples
| Input | Output | Explanation |
"VII" | 7 | V = 5, I = 1, I = 1 → 5 + 1 + 1 = 7 |
"XLII" | 42 | XL = 40, I = 1, I = 1 → 40 + 1 + 1 = 42 |
"MDCCLXVI" | 1766 | M = 1000, D = 500, CC = 200, L = 50, X = 10, V = 5, I = 1 → 1766 |
Why the character-by-character approach falls short
A naive way to approach this problem would be to scan the string character by character and apply the additive or subtractive rule at each position by comparing the current symbol’s value with the next symbol’s value. For every character, we’d need to check if it forms part of a subtractive pair by examining whether its value is less than the symbol that follows it. If so, we subtract it; otherwise, we add it.
While this works with O(n) time and O(1) space complexity, it has downsides: repeated conditional checks, look-ahead logic for every character, and manual edge case handling (like the last character having no successor). This makes the code harder to read and maintain, and more prone to off-by-one errors. The approach also ignores that only six subtractive cases exist, which could be precomputed for cleaner, more efficient lookups.
Optimized approach using HashMap
The key insight here is that Roman numeral conversion follows only two patterns: additive and subtractive. Instead of repeatedly comparing adjacent characters during traversal, we can precompute all possible symbol values, both single characters and the six subtractive pairs, and store them in a dictionary. This transforms the problem from a comparison-heavy scan into a simple lookup-and-sum operation.
By including both individual symbols (I, V, X, etc.) and subtractive combinations (IV, IX, XL, XC, CD, CM) in our mapping, we can process the string in a single left-to-right pass. At each step, we check whether the next two characters form a subtractive pair. If they do, we consume both characters at once and add the pair’s value to our result. If not, we process just the current character and move forward by one position.
This approach works universally because the problem guarantees valid input. We never encounter ambiguous or malformed patterns. Even if the string contains complex combinations like MCMXCIV (1994), the dictionary allows us to recognize CM and XC as single logical units, simplifying the parsing logic dramatically.
Step-by-step algorithm
- Create a dictionary,
value_map, that maps every standard Roman symbol (I,V,X,L,C,D,M) and every valid subtractive pair (IV,IX,XL,XC,CD,CM) to their integer equivalents. - Initialize a variable
int_valto 0. This will accumulate the final integer result. - Initialize a pointer
ito 0 to traverse the stringsfrom left to right. - While
iis less than the length ofs:- Extract the two-character substring starting at position
i(i.e.,s[i:i+2]). - Check if this substring exists in
value_map. - If it does, add its value to
int_valand incrementiby 2 (since we’ve processed two characters). - If it doesn’t, add the value of the single character
s[i]toint_valand incrementiby 1.
- Extract the two-character substring starting at position
- After the loop completes, return
int_valas the integer representation of the Roman numeral.
Code implementation
Time complexity
The time complexity of this algorithm is O(n), where n is the length of the input string s. We traverse the string exactly once from left to right, and at each position, we perform a constant-time dictionary lookup to either process one or two characters.
Space complexity
The space complexity is O(1). Although we use a dictionary to store the mappings, its size is fixed at 13 entries (seven individual symbols and six subtractive pairs) and does not grow with the input size.
Common pitfalls and interview tips
- Forgetting to handle the last character: When checking for subtractive pairs, always ensure
i + 1 < len(s)before accessings[i:i+2]. Without this guard, you might try to read beyond the string boundary and cause an index error. - Overlooking all six subtractive cases: Some candidates only remember
IVandIXbut forget aboutXL,XC,CD, andCM. Make sure your dictionary includes all six pairs to handle inputs likeCDXLIVcorrectly. - Assuming all inputs are minimal: Even though the problem guarantees valid Roman numerals, interviewers might ask what happens if the string contains invalid patterns like
IIIIorVV. Be prepared to discuss validation, even if it’s not required in this problem.