We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Find the Shortest Superstring

Difficulty: Hard


Problem Description

Given an array of strings words, return the smallest string that contains each string in words as a substring. If there are multiple valid strings of the smallest length, return any of them. You may assume that no string in words is a substring of another string in words.


Key Insights

  • The problem requires constructing the shortest superstring that contains all given strings.
  • A brute force approach would involve trying all permutations of the strings, which is inefficient.
  • The optimal approach involves using a greedy algorithm combined with bit manipulation, where we keep track of used words and calculate overlaps.
  • Overlapping is crucial to minimize the length of the resultant string.

Space and Time Complexity

Time Complexity: O(n^2 * 2^n) - where n is the number of strings, due to the combination of substring overlaps and permutations. Space Complexity: O(n * 2^n) - for storing the dynamic programming states.


Solution

To solve the problem, we can use a bitmask dynamic programming approach. We define a DP table where dp[mask] represents the shortest superstring that can be formed using the words represented by the mask. The mask is a bit representation of which words have been used. We also maintain a function to calculate the overlap between two strings to minimize their combined length when concatenating them.

  1. Calculate the overlap between each pair of strings.
  2. Use a bitmask to represent the state of used strings.
  3. Update the DP table by trying to add each unused string to the current combination.
  4. The final result will be the shortest superstring found in the DP table.

Code Solutions

def findShortestSuperstring(words):
    n = len(words)
    
    # Helper function to calculate the overlap between two strings
    def overlap(a, b):
        max_overlap = 0
        for i in range(1, min(len(a), len(b)) + 1):
            if a[-i:] == b[:i]:
                max_overlap = i
        return max_overlap

    # Precompute all overlaps
    overlap_matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                overlap_matrix[i][j] = overlap(words[i], words[j])

    # DP to find the shortest superstring
    dp = [""] * (1 << n)
    for i in range(1 << n):
        dp[i] = None  # Initialize

    dp[0] = ""  # Base case

    for mask in range(1 << n):
        for i in range(n):
            if mask & (1 << i) == 0:  # If i is not used
                next_mask = mask | (1 << i)
                new_string = ""
                if dp[mask] is not None:
                    new_string = dp[mask] + words[i]
                    if dp[next_mask] is None or len(dp[next_mask]) > len(new_string):
                        dp[next_mask] = new_string
                else:
                    continue

                # Consider overlaps
                if dp[mask] is not None:
                    for j in range(n):
                        if mask & (1 << j) != 0:
                            overlap_length = overlap_matrix[j][i]
                            combined_string = dp[mask][:len(dp[mask]) - overlap_length] + words[i]
                            if dp[next_mask] is None or len(dp[next_mask]) > len(combined_string):
                                dp[next_mask] = combined_string

    # Result is the shortest superstring
    return min(dp[1 << n - 1:], key=len)
← Back to All Questions