Problem Description
Given a string s, find the length of its longest self-contained substring. A substring t (with t ≠ s) is defined as self-contained if for every character in t, none of its other occurrences exist in s outside t. If no such substring exists, return -1.
Key Insights
- A self-contained substring t must include all occurrences of each character that appears in t; that is, for each letter c in t, the first and last occurrence of c in s are both inside t.
- You can precompute the first and last occurrence indexes for each character.
- When exploring a candidate substring starting at index i, you can “grow” the candidate until it covers the entire interval for every character it has seen.
- After computing a candidate interval [i, j], ensure that for every position k within [i, j] the first occurrence of s[k] is not before i. This guarantees no letter in the candidate occurs before it.
- The candidate must not equal the entire string s.
Space and Time Complexity
Time Complexity: O(n^2) in the worst case (though in practice, with only 26 letters and early breaks, the average runtime is much lower). Space Complexity: O(1) extra space (aside from the output and the fixed table for the 26 letters).
Solution
We first precompute the first and last occurrence indices for every character in s. Then, for every index i that is the first occurrence of its character, we try to build a candidate substring starting at i. Initialize j as the last occurrence of s[i] and then iterate the candidate from i to j; for each index k in the candidate, update j = max(j, lastOccurrence(s[k])). While iterating, if you find any s[k] whose first occurrence is before i, then the candidate is invalid because that character appears outside the candidate. Finally, if the candidate [i, j] is valid and does not span the entire string s, update your answer with its length. This greedy “expansion” guarantees that for every character in the substring, all its occurrences are captured inside the candidate interval.
Code Solutions
Below are implementations in Python, JavaScript, C++, and Java with line-by-line comments.