Identifying the extra character between two shuffled strings is a neat example of finding a tiny mismatch without getting distracted by order. Even though one string is just the other with one additional letter, repeated characters can make it surprisingly easy to miscount or “match” the wrong occurrence. This problem highlights how using a clean counting-based idea can avoid slow, error-prone scanning and pinpoint the extra character in a single efficient pass.
Problem statement
You’re given two lowercase strings s and t, where t contains all characters of s but shuffled in a random order, plus exactly one extra character inserted somewhere. Your task is to return that extra character.
Constraints:
- 0 ≤
s.length≤ 1000 t.length == s.length + 1sandtconsist of lowercase English letters
Examples
| Input | Output | Explanation |
s = “”, t = “y” | “y” | Since s is empty, the only character in t is the added one. |
s = “a”, t = “aa” | “a” | t has one extra "a" compared to s. |
s = “ab”, t = “bab” | “b” | Both have “a” and “b”, but t has one additional “b”. |
s = “xyz”, t = “xzyw” | “w” | t is the same letters as s plus extra "w" (order doesn’t matter). |
s = “hello”, t = “hleloo” | “o” | t has all letters of s, plus one extra "o". |
Why matching characters one-by-one is inefficient
A straightforward idea is: for each character in t, try to “match” it against a character in s (or remove matched characters from s). If implemented with repeated searches/removals, you can end up doing work proportional to scanning most of the string many times. That repeated work adds up quickly, especially with duplicates where “which occurrence” you remove matters.
Optimized approach using Bitwise Manipulation
Think of each character as a small number (its ASCII/Unicode code). XOR has two key properties that make this trick work:
- Cancellation:
x ^ x = 0 - Order doesn’t matter (commutative/associative): you can XOR in any order.
Now, since t contains every character from s plus one extra, if you XOR all characters from s and t together, every matching character appears twice overall (once from s, once from t) and cancels out. What remains is only the extra character, because it appears one more time.
Solution steps
- Initialize an integer accumulator
xor_valto0. - Iterate over each character in
s:- Convert the character to its ASCII/Unicode code using
ord(ch). XOR it into the accumulator:xor_val ^= ord(ch).
- Convert the character to its ASCII/Unicode code using
- Iterate over each character in
t:- Convert the character to its ASCII/Unicode code using
ord(ch). XOR it into the accumulator:xor_val ^= ord(ch).
- Convert the character to its ASCII/Unicode code using
- Convert the final
xor_valback to a character usingchr(xor_val)and return it.
Take a look at the illustration below to understand the solution more clearly.
Python Implementation
Let’s look at the code for the solution we just discussed.
Time complexity
The time complexity is linear in the total number of characters processed. We scan s once and t once, so the work grows proportionally with s.length + t.length, which is O(n) where nnn is the length of t.
Space complexity
The space complexity is O(1) because we only keep a single integer accumulator regardless of input size, and we do not allocate any additional data structures that grow with nnn.
Edge cases
- Scenario:
sis empty- Example:
s = "",t = "y" - Why it works: XOR over an empty string contributes nothing, so the result is just the XOR of
t, which leaves"y".
- Example:
- Scenario: Added character is a duplicate of an existing one
- Example:
s = "aab",t = "abaa" - Why it works: Counts still cancel correctly; the duplicate has one extra occurrence in
t, so it remains after XOR cancellation.
- Example:
- Scenario: Added character appears at any position (beginning/middle/end)
- Example:
s = "abcd",t = "eabcd" - Why it works: XOR is order-independent, so position doesn’t matter.
- Example:
- Scenario: All characters are the same
- Example:
s = "zzz",t = "zzzz" - Why it works: Three
zvalues cancel with three fromt, leaving one extraz.
- Example:
Common pitfalls
- Using string removal/search loops that repeatedly scan for matches, leading to unnecessarily slow solutions and tricky duplicate-handling bugs.
- Sorting both strings works but is heavier than needed and is easy to overlook as suboptimal in interviews.
- Forgetting to convert characters consistently (mixing characters and ASCII codes) when doing XOR logic.
- Assuming unique letters and using a set difference approach, which breaks as soon as duplicates exist.