Given a string s, find any duplicated (contiguous) substring that occurs at least twice and has the longest possible length. When there is no such duplicate, return an empty string.
Key Insights
Use binary search on the possible lengths of duplicated substrings.
For each candidate length, use a rolling hash (polynomial rolling hash) to efficiently hash every substring of that length.
Use a hash set to check whether a substring hash has been seen before.
Adjust the binary search range based on whether a duplicate was found, ultimately retrieving the longest duplicate substring.
Space and Time Complexity
Time Complexity: O(n log n)
Space Complexity: O(n)
Solution
The solution leverages binary search to determine the maximum length L for which a duplicate substring exists. For each candidate L, we use a rolling hash technique to compute hash values for all substrings of length L in O(n) time. A hash set tracks previously seen substrings by their hash values; if a duplicate exists, we update our answer and attempt a larger L. This approach is efficient for the problem's constraints and handles overlapping duplicates. Key details include careful management of the modulo arithmetic in the rolling hash to avoid collisions and overflow, and possibly using large mod values or double hashing in competitive environments.
Code Solutions
# Define the Solution class with the function longestDupSubstringclassSolution:deflongestDupSubstring(self, s:str)->str: n =len(s)# Convert characters to integers: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25 nums =[ord(c)-ord('a')for c in s] mod =2**63-1# a large modulus to minimize hash collisions base =26# base value for characters# search function returns the starting index of a duplicate substring of length L if it exists, else -1defsearch(L): h =0# Compute the hash for the first substring of length Lfor i inrange(L): h =(h * base + nums[i])% mod
seen ={h}# Precompute base^L % mod for use in rolling hash baseL =pow(base, L, mod)for i inrange(1, n - L +1):# Compute rolling hash in O(1) time h =(h * base - nums[i -1]* baseL + nums[i + L -1])% mod
if h in seen:return i # found duplicate, return starting index seen.add(h)return-1 left, right =1, n
start =-1# Binary search for the maximum length L that has a duplicate substringwhile left <= right: mid = left +(right - left)//2 idx = search(mid)if idx !=-1: left = mid +1# duplicate found, try longer substring start = idx
else: right = mid -1# no duplicate found, decrease lengthreturn s[start:start+left-1]if start !=-1else""