This problem trips up many candidates because it sits at the intersection of string manipulation and frequency counting, wrapped inside a sliding window that moves in fixed-size chunks rather than one character at a time. The uniform word length is the key structural insight that most people overlook on their first attempt. Once you recognize that every word occupies the same number of characters, the problem transforms from a combinatorial nightmare into a disciplined window-based scan. Mastering this pattern builds strong intuition for problems where fixed-size blocks and frequency matching intersect.
Problem statement
You are given a string s and an array of strings words, where every string in words has the same length.
A concatenated substring is formed by joining every word in words exactly once, in any order, with no extra characters between them. For instance, if words = ["ab", "cd", "ef"], then "abcdef", "cdabef", "efcdab", and all other permutations of these three words joined together are valid concatenated substrings. However, something like "acdbef" is not valid because it cannot be decomposed into the original words in any arrangement.
Your goal is to find every starting index in s where such a concatenated substring begins. Return the list of starting indices in any order.
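To make the definition concrete, here is a small standalone checker (illustrative only, not part of the final algorithm; the function name is my own) that tests whether a candidate string is a valid concatenated substring:

```python
from collections import Counter

def is_concatenation(candidate: str, words: list[str]) -> bool:
    """Check whether `candidate` splits into the strings of `words`,
    each used exactly once, in some order."""
    word_len = len(words[0])
    if len(candidate) != word_len * len(words):
        return False
    # Slice the candidate into word-sized chunks and compare frequencies.
    chunks = [candidate[i:i + word_len] for i in range(0, len(candidate), word_len)]
    return Counter(chunks) == Counter(words)

print(is_concatenation("cdabef", ["ab", "cd", "ef"]))  # True
print(is_concatenation("acdbef", ["ab", "cd", "ef"]))  # False
```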
Constraints
- 1 ≤ s.length ≤ 10^3
- 1 ≤ words.length ≤ 1000
- 1 ≤ words[i].length ≤ 30
- s and words[i] consist of lowercase English letters
Examples
| Input | Output | Explanation |
|---|---|---|
| s = "aabbccaabbcc", words = ["aa","bb","cc"] | [0, 2, 4, 6] | The length-6 substrings at indices 0, 2, 4, and 6 are "aabbcc", "bbccaa", "ccaabb", and "aabbcc", each a valid concatenation of "aa", "bb", and "cc". |
| s = "abcdefgh", words = ["xyz","abc"] | [] | The word "xyz" never appears anywhere in the string, so no valid concatenation of length 6 exists. |
| s = "aaaaaa", words = ["aa","aa","aa"] | [0] | The entire string "aaaaaa" starting at index 0 splits into "aa", "aa", "aa", matching the required word frequencies exactly. |
| s = "catdogcatdogcat", words = ["cat","dog"] | [0, 3, 6, 9] | Starting at indices 0, 3, 6, and 9, the length-6 substrings "catdog", "dogcat", "catdog", and "dogcat" are each valid concatenations of "cat" and "dog". |
| s = "abab", words = ["ab"] | [0, 2] | With a single word "ab", the substrings at index 0 and index 2 each match. |
| s = "xyzxyzxyzxyz", words = ["xyz","xyz"] | [0, 3, 6] | Substrings of length 6 at indices 0, 3, and 6 are all "xyzxyz", a valid concatenation of the two required copies of "xyz". |
Why enumerating all possibilities doesn’t scale
The most direct approach would be to generate every permutation of the words array, join each permutation into a single string, and then search for each of those strings inside s. If there are k words, there are k! permutations. Even with just 10 words, that is over 3.6 million permutations, each requiring a substring search across s. With up to 1000 words, this approach is astronomically expensive and completely impractical.
A slightly better but still inefficient approach checks every possible starting position in s, extracts a substring of the required total length, and then repeatedly slices it into word-sized chunks while comparing against the frequency map. This avoids generating permutations but still performs redundant work. Each new starting position rebuilds the frequency map from scratch, even though consecutive windows share nearly all their content. The overlapping computation makes this quadratic in practice when the string is long relative to the word size.
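A sketch of this slower baseline (the helper name is my own), which rebuilds the chunk frequency map from scratch at every starting position:

```python
from collections import Counter

def find_substring_naive(s: str, words: list[str]) -> list[int]:
    """Quadratic baseline: re-count the word-sized chunks at every start index."""
    word_len = len(words[0])
    total_len = word_len * len(words)
    target = Counter(words)
    result = []
    for start in range(len(s) - total_len + 1):
        window = s[start:start + total_len]
        # Rebuild and compare a full frequency map for every position.
        chunks = Counter(window[i:i + word_len] for i in range(0, total_len, word_len))
        if chunks == target:
            result.append(start)
    return result

print(find_substring_naive("catdogcatdogcat", ["cat", "dog"]))  # [0, 3, 6, 9]
```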
The core issue in both approaches is the failure to reuse information from one window position to the next. Any efficient solution must find a way to incrementally update the state as the window shifts.
Optimized approach
The uniform word length is what makes this problem tractable. Because every word occupies exactly word_len characters, the string can be thought of as a sequence of word-sized blocks. A sliding window of exactly num_words blocks can glide across these blocks, and at each step, only one block leaves the window and one block enters. This means the frequency map can be updated incrementally rather than rebuilt.
There is a subtlety: the blocks depend on where you start. If word_len is 3, starting at index 0 gives blocks at positions 0, 3, 6, 9, …, while starting at index 1 gives blocks at 1, 4, 7, 10, …. These are entirely different decompositions. So the algorithm runs word_len separate passes, one for each starting offset from 0 to word_len - 1, and within each pass, it maintains a single sliding window.
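A quick illustrative snippet (the sample string is my own) showing how each offset induces a different block decomposition, and why a match at index 1 is only visible in the offset-1 pass:

```python
s = "xcatdogy"
word_len = 3

# Each starting offset carves s into a different sequence of word-sized blocks.
for offset in range(word_len):
    blocks = [s[i:i + word_len] for i in range(offset, len(s) - word_len + 1, word_len)]
    print(offset, blocks)
# 0 ['xca', 'tdo']
# 1 ['cat', 'dog']   <- the concatenation "catdog" only appears in this pass
# 2 ['atd', 'ogy']
```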
During each pass, the window tracks a seen counter that records how many times each word appears in the current window. When a new word enters from the right:
- If the word is not in the target word list, the window is invalid and must be completely reset.
- If the word is valid but appears more times than required, the left boundary advances until the excess is eliminated.
- If the window reaches the exact total length needed, the left index is recorded as a valid answer.
This strategy processes each character of s a constant number of times per offset, yielding efficient performance.
Why the optimized approach works well
The sliding window avoids redundant computation by carrying forward the frequency state from one position to the next. Instead of re-examining all words in the window every time it shifts, only the entering and exiting words are processed. The word_len separate passes guarantee that every possible alignment is covered, and within each pass, the two pointers only move forward, preventing any backtracking or repeated scanning.
The frequency map comparison is also handled incrementally. Rather than comparing two full maps at every step, the algorithm maintains a running count and only checks whether the window has reached the target length. This keeps each window adjustment at O(1) amortized cost (excluding the word slicing itself), making the overall approach both time-efficient and straightforward to implement.
Step-by-step algorithm
- Compute word_len as the length of the first word, num_words as the total count of words, and total_len as word_len * num_words.
- Build a frequency map word_count from the words array, counting how many times each word should appear.
- Initialize an empty result list.
- Iterate over each starting offset i from 0 to word_len - 1.
- For each offset, set left = i, right = i, and create an empty counter seen.
- While right + word_len does not exceed the length of s, extract the word s[right:right + word_len] and advance right by word_len.
- If the extracted word exists in word_count, increment its count in seen. If its count in seen exceeds the allowed count in word_count, advance left by word_len repeatedly, decrementing the exiting word's count each time, until the overcount is resolved.
- If the extracted word does not exist in word_count, clear the seen counter and move left to right to fully reset the window.
- After updating the window, check if right - left equals total_len. If so, append left to the result list.
- After all offsets are processed, return the result list.
Python implementation
Code for Substring With Concatenation Of All Words
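One way to implement the sliding-window algorithm, following the step-by-step outline above and using Python's collections.Counter:

```python
from collections import Counter

def findSubstring(s: str, words: list[str]) -> list[int]:
    word_len = len(words[0])
    num_words = len(words)
    total_len = word_len * num_words
    word_count = Counter(words)
    result = []

    # One pass per block alignment.
    for offset in range(word_len):
        left = offset
        right = offset
        seen = Counter()
        while right + word_len <= len(s):
            word = s[right:right + word_len]
            right += word_len
            if word in word_count:
                seen[word] += 1
                # Shrink from the left until this word is no longer overcounted.
                while seen[word] > word_count[word]:
                    seen[s[left:left + word_len]] -= 1
                    left += word_len
                # A window of exactly total_len characters holds every word
                # the required number of times.
                if right - left == total_len:
                    result.append(left)
            else:
                # Invalid word: discard the window entirely.
                seen.clear()
                left = right
    return result

print(findSubstring("catdogcatdogcat", ["cat", "dog"]))  # [0, 3, 6, 9]
```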
Time complexity
The algorithm runs word_len separate passes over the string. Within each pass, both the left and right pointers traverse the string from their starting offset to the end, and each pointer only moves forward. Extracting a word-sized substring costs O(word_len) due to string slicing. Across all offsets, every character in s is visited as part of a word extraction a constant number of times. This gives an overall time complexity of O(n * m), where n is the length of s and m is the length of each word. The inner while loop that shrinks the window does not add extra cost because each word can only be added and removed from the window once per pass.
Space complexity
The algorithm uses two Counter objects: word_count, which stores up to k unique words from the input, and seen, which stores at most k unique words at any point during a pass. Here k is the number of distinct words in the words array. The result list stores at most O(n / m) indices. No recursion is used, so there is no stack overhead. The overall space complexity is O(k), where k is the number of unique words.
Edge cases
- Single word in the array: The window only needs to match one word at a time, so the algorithm reduces to finding all occurrences of that single word in s.
- All words are identical: The frequency map has one entry with a count greater than one. The sliding window correctly tracks how many times the repeated word appears and ensures it matches the required count exactly.
- Total concatenation length exceeds the string length: When word_len * num_words > len(s), no valid substring can exist. The while loop condition right + word_len <= len(s) bounds the window, so it can never grow to total_len, no index is ever recorded, and the result is an empty list.
- No words from the array appear in the string: Every extracted word fails the word in word_count check, causing the window to reset continuously. The result is an empty list.
Common pitfalls
- Forgetting to iterate over all starting offsets: If the algorithm only starts from offset 0, it misses valid concatenations that begin at positions not aligned to the first block boundary. For example, with word_len = 3 and a valid match starting at index 1, only the offset-1 pass would catch it.
- Resetting the entire window when a word is merely overcounted: When a valid word appears one too many times, the correct action is to shrink the window from the left until the count is valid again, not to clear everything. Clearing the window discards valid partial matches unnecessarily.
- Comparing full frequency maps at every step: Building and comparing two complete maps each time the window shifts adds unnecessary overhead. The incremental approach of maintaining seen and checking window size is both simpler and faster.
- Not handling duplicate words in the array: If words contains the same string multiple times, the algorithm must track exact counts rather than just presence. Using a set instead of a counter would incorrectly accept windows where a duplicate word appears only once.
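A small demonstration of the last pitfall (the sample lists are my own): a set-based presence check cannot distinguish two multisets of words that differ only in multiplicity, while a Counter comparison can.

```python
from collections import Counter

words = ["ab", "ab", "cd"]          # "ab" must appear exactly twice
window_chunks = ["ab", "cd", "cd"]  # a window where "cd" is doubled instead

# A set-based presence check wrongly accepts the window:
print(set(window_chunks) == set(words))          # True
# A Counter comparison tracks exact multiplicities and rejects it:
print(Counter(window_chunks) == Counter(words))  # False
```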