LeetCode Solutions Blog

Longest Duplicate Substring

Number: 1122

Difficulty: Hard

Paid? No

Problem Description

Given a string s, find any duplicated (contiguous) substring that occurs at least twice and has the longest possible length. When there is no such duplicate, return an empty string.

Key Insights

Use binary search on the possible lengths of duplicated substrings.
For each candidate length, use a rolling hash (polynomial rolling hash) to efficiently hash every substring of that length.
Use a hash set to check whether a substring hash has been seen before.
Adjust the binary search range based on whether a duplicate was found, ultimately retrieving the longest duplicate substring.

Space and Time Complexity

Time Complexity: O(n log n)
Space Complexity: O(n)

Solution

The solution leverages binary search to determine the maximum length L for which a duplicate substring exists. For each candidate L, we use a rolling hash technique to compute hash values for all substrings of length L in O(n) time. A hash set tracks previously seen substrings by their hash values; if a duplicate exists, we update our answer and attempt a larger L. This approach is efficient for the problem's constraints and handles overlapping duplicates. Key details include careful management of the modulo arithmetic in the rolling hash to avoid collisions and overflow, and possibly using large mod values or double hashing in competitive environments.

Code Solutions

# Define the Solution class with the function longestDupSubstring
class Solution:
    def longestDupSubstring(self, s: str) -> str:
        n = len(s)
        # Convert characters to integers: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
        nums = [ord(c) - ord('a') for c in s]
        mod = 2**63 - 1  # a large modulus to minimize hash collisions
        base = 26  # base value for characters
        
        # search function returns the starting index of a duplicate substring of length L if it exists, else -1
        def search(L):
            h = 0
            # Compute the hash for the first substring of length L
            for i in range(L):
                h = (h * base + nums[i]) % mod
            seen = {h}
            # Precompute base^L % mod for use in rolling hash
            baseL = pow(base, L, mod)
            for i in range(1, n - L + 1):
                # Compute rolling hash in O(1) time
                h = (h * base - nums[i - 1] * baseL + nums[i + L - 1]) % mod
                if h in seen:
                    return i  # found duplicate, return starting index
                seen.add(h)
            return -1
        
        left, right = 1, n
        start = -1
        # Binary search for the maximum length L that has a duplicate substring
        while left <= right:
            mid = left + (right - left) // 2
            idx = search(mid)
            if idx != -1:
                left = mid + 1  # duplicate found, try longer substring
                start = idx
            else:
                right = mid - 1  # no duplicate found, decrease length
        return s[start:start+left-1] if start != -1 else ""