Level Up Your Coding Skills & Crack Interviews — Save up to 50% or more on Educative.io Today! Claim Discount

Arrow
Table of contents

389. Find the Difference

Identifying the extra character between two shuffled strings is a neat example of finding a tiny mismatch without getting distracted by order. Even though one string is just the other with one additional letter, repeated characters can make it surprisingly easy to miscount or “match” the wrong occurrence. This problem highlights how using a clean counting-based idea can avoid slow, error-prone scanning and pinpoint the extra character in a single efficient pass.

Problem statement

You’re given two lowercase strings s and t, where t contains all characters of s but shuffled in a random order, plus exactly one extra character inserted somewhere. Your task is to return that extra character.

This is trickier than it sounds because a naive approach often ends up repeatedly scanning for matches or removing characters, which becomes slow and error-prone when there are duplicates (like many repeated letters) and when order is irrelevant.

Constraints:

  • 0 ≤ s.length ≤ 1000
  • t.length == s.length + 1
  • s and t consist of lowercase English letters

Examples

InputOutputExplanation
s = “”, t = “y”“y”Since s is empty, the only character in t is the added one.
s = “a”, t = “aa”“a”t has one extra "a" compared to s.
s = “ab”, t = “bab”“b”Both have “a” and “b”, but t has one additional “b”.
s = “xyz”, t = “xzyw”“w”t is the same letters as s plus extra "w" (order doesn’t matter).
s = “hello”, t = “hleloo”“o”t has all letters of s, plus one extra "o".

Why matching characters one-by-one is inefficient

A straightforward idea is: for each character in t, try to “match” it against a character in s (or remove matched characters from s). If implemented with repeated searches/removals, you can end up doing work proportional to scanning most of the string many times. That repeated work adds up quickly, especially with duplicates where “which occurrence” you remove matters.

Optimized approach using Bitwise Manipulation

Think of each character as a small number (its ASCII/Unicode code). XOR has two key properties that make this trick work:

  • Cancellation: x ^ x = 0
  • Order doesn’t matter (commutative/associative): you can XOR in any order.

Now, since t contains every character from s plus one extra, if you XOR all characters from s and t together, every matching character appears twice overall (once from s, once from t) and cancels out. What remains is only the extra character, because it appears one more time.

Why this works even with duplicates

Duplicates don’t break this at all. If a letter appears, say, 5 times in s and 5 times in t, it cancels out 10 times total across the combined XOR. The only character that doesn’t fully cancel is the one added to t, because it appears one more time than in s.

Solution steps

  1. Initialize an integer accumulator xor_val to 0.
  2. Iterate over each character in s:
    1. Convert the character to its ASCII/Unicode code using ord(ch). XOR it into the accumulator: xor_val ^= ord(ch).
  3. Iterate over each character in t:
    1. Convert the character to its ASCII/Unicode code using ord(ch). XOR it into the accumulator: xor_val ^= ord(ch).
  4. Convert the final xor_val back to a character using chr(xor_val) and return it.

Take a look at the illustration below to understand the solution more clearly.

1 / 10

Python Implementation

Let’s look at the code for the solution we just discussed.

Time complexity

The time complexity is linear in the total number of characters processed. We scan s once and t once, so the work grows proportionally with s.length + t.length, which is O(n) where nnn is the length of t.

Space complexity

The space complexity is O(1) because we only keep a single integer accumulator regardless of input size, and we do not allocate any additional data structures that grow with nnn.

Edge cases

  • Scenario: s is empty
    • Example: s = "", t = "y"
    • Why it works: XOR over an empty string contributes nothing, so the result is just the XOR of t, which leaves "y".
  • Scenario: Added character is a duplicate of an existing one
    • Example: s = "aab", t = "abaa"
    • Why it works: Counts still cancel correctly; the duplicate has one extra occurrence in t, so it remains after XOR cancellation.
  • Scenario: Added character appears at any position (beginning/middle/end)
    • Example: s = "abcd", t = "eabcd"
    • Why it works: XOR is order-independent, so position doesn’t matter.
  • Scenario: All characters are the same
    • Example: s = "zzz", t = "zzzz"
    • Why it works: Three z values cancel with three from t, leaving one extra z.

Common pitfalls

  • Using string removal/search loops that repeatedly scan for matches, leading to unnecessarily slow solutions and tricky duplicate-handling bugs.
  • Sorting both strings works but is heavier than needed and is easy to overlook as suboptimal in interviews.
  • Forgetting to convert characters consistently (mixing characters and ASCII codes) when doing XOR logic.
  • Assuming unique letters and using a set difference approach, which breaks as soon as duplicates exist.

Frequently Asked Questions

Why not just use a set to find the extra character?

A set ignores duplicates. If the added character is the same as an existing one (like adding another "a"), set-based logic can’t reliably detect the extra occurrence.

How do I recognize XOR Cancellation as the right pattern?

When two collections contain the same items except for one extra/missing item, and order is irrelevant, XOR is a strong candidate because it cancels pairs and doesn’t care about ordering.

Is sorting an acceptable approach in interviews?

It’s acceptable, but it’s not the best. Sorting typically costs more time than a single pass. XOR demonstrates stronger optimization instincts.

Can I solve this with counting instead?

Yes. A frequency array of size 26 works well here. XOR is simpler and uses constant space without explicit counting.

Is there any faster-than-linear solution?

No. You must at least read the input to know what character was added, so linear time is optimal.

Share with others:

Leave a Reply

Your email address will not be published. Required fields are marked *