We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Minimum Unique Word Abbreviation

Number: 411

Difficulty: Hard

Paid? Yes

Companies: Google


Problem Description

Given a target string and a dictionary of words, find an abbreviation of the target with the shortest possible length that is NOT a valid abbreviation of any word in the dictionary. An abbreviation is obtained by replacing one or more non‐adjacent substrings with their respective lengths. The “length” of an abbreviation is defined as the count of non‐abbreviated characters plus the number of abbreviated groups. For example, "s10n" is an abbreviation of "substitution" (the preserved letters are “s” and “n” and the number 10 represents the replaced substring “ubstitutio”) with an abbreviation length of 3. The twist is that the chosen abbreviation must not match (i.e. be a valid abbreviation of) any word in the dictionary.


Key Insights

  • Only dictionary words with the same length as the target can possibly conflict.
  • For each conflicting dictionary word, precompute a bit mask indicating the positions where the target and the word differ.
  • Represent an abbreviation pattern as a bit mask where a bit 1 means “keep the original character” and 0 means “abbreviate that character.”
  • An abbreviation pattern is valid against a dictionary word if at least one of the positions where the dictionary word differs from the target is preserved in the abbreviation.
  • The abbreviation’s “length” is computed by counting the kept characters plus the number of consecutive groups of abbreviated characters.
  • Use bit manipulation and DFS/backtracking (or even complete enumeration since m ≤ 21) to efficiently search for the valid abbreviation with minimum length.

Space and Time Complexity

Time Complexity: O(2^m * k * m) in the worst case, where m is the target length and k is the number of dictionary words with length equal to target. (Enumeration is feasible since m ≤ 21.) Space Complexity: O(k), for storing the diff masks for each word.


Solution

We first filter the dictionary words to only those that have the same length as the target. For each such word, we compute a bit mask where each bit indicates whether the character at that position is different from the target. Then, we enumerate all possible abbreviation patterns for the target; each pattern is represented as a bit mask of length m (with m being the target length) where a bit set (1) means we keep the original character and a 0 means that character is replaced by a number. An abbreviation pattern is “conflicting” with a dictionary word if for that word’s diff mask, none of the differing positions is preserved (i.e. the bitwise AND is 0). Therefore, the pattern is valid if for all diff masks, the bitwise AND is not 0. We also compute the abbreviation “length” from the pattern by counting the number of kept characters plus the number of consecutive groups of abbreviated positions. Finally, we return the abbreviation (converted from the bit mask) that has the smallest calculated length. We provide code solutions in several languages using clear variable names and inline comments.


Code Solutions

# Python Solution

def minAbbreviation(target, dictionary):
    n = len(target)
    # Precompute diff masks for every dictionary word with the same length as target
    diff_masks = []
    for word in dictionary:
        if len(word) != n:
            continue
        mask = 0
        for i in range(n):
            # Set bit i if the character in target differs from word's character at i
            if target[i] != word[i]:
                mask |= (1 << i)
        diff_masks.append(mask)
    
    # Helper function to compute the abbreviation length for a given bit mask.
    # The mask indicates positions where the original letter is kept.
    def abbr_len(mask):
        length = 0
        i = 0
        while i < n:
            if mask & (1 << i):
                # Kept letter contributes one character
                length += 1
                i += 1
            else:
                # Abbreviated group: count as one number no matter how many digits
                while i < n and not (mask & (1 << i)):
                    i += 1
                length += 1
        return length

    # Function to convert a mask to its corresponding abbreviation string.
    def to_abbr(mask):
        abbr = []
        i = 0
        while i < n:
            if mask & (1 << i):
                abbr.append(target[i])
                i += 1
            else:
                j = i
                # Find the end of consecutive abbreviated positions
                while j < n and not (mask & (1 << j)):
                    j += 1
                abbr.append(str(j - i))
                i = j
        return "".join(abbr)

    # If no dictionary word has the same length as target, the shortest abbreviation is simply the count.
    if not diff_masks:
        return str(n)
    
    best_mask = None
    best_len = float('inf')
    total_masks = 1 << n
    # Enumerate all possible masks (each representing an abbreviation pattern)
    for mask in range(total_masks):
        valid = True
        # Check this abbreviation against every diff mask from dictionary words.
        for dmask in diff_masks:
            if (mask & dmask) == 0:
                # This pattern does not preserve any differentiating character.
                valid = False
                break
        if valid:
            current_len = abbr_len(mask)
            if current_len < best_len:
                best_len = current_len
                best_mask = mask
                # Early exit if we found the absolute minimum
                if best_len == 1:
                    break
    return to_abbr(best_mask)

# Example Usage:
print(minAbbreviation("apple", ["blade"]))   # Expected output: "a4"
print(minAbbreviation("apple", ["blade", "plain", "amber"]))   # Expected output: one valid abbreviation like "1p3"
← Back to All Questions