We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Count the Number of Good Subsequences

Number: 2683

Difficulty: Medium

Paid? Yes

Companies: Palantir Technologies, Nvidia, Oracle, TuSimple


Problem Description

Given a string s, a subsequence is called good if it is non-empty and the frequency (count) of every character in the subsequence is the same. In other words, if the subsequence contains one or more distinct letters, each letter must appear exactly m times (with m ≥ 1). Return the total number of good subsequences in s modulo 10^9 + 7.

For example, if s = "aabb", then there are 11 good subsequences.


Key Insights

  • A subsequence consisting of only one type of character is always good. For a letter with frequency f, any non-empty combination of its occurrences (2^f – 1 ways) forms a good subsequence.
  • For subsequences having two or more distinct characters, every selected letter must appear exactly m times (for some m >= 1). For each letter c with frequency f[c] (and f[c] ≥ m), the number of ways to pick exactly m occurrences is C(f[c], m).
  • For a fixed m, consider all 26 letters. For each letter, if f[c] ≥ m, it contributes a factor (1 + C(f[c], m)) (the “1” representing the option not to choose that letter). Taking the product over all letters results in a total count (for common frequency m) that includes:
    • The empty selection (choosing “not” for every letter).
    • Selections where exactly one letter is chosen (which are already counted in the one-letter case).
  • Thus, for each m from 1 to max(frequency among letters), subtract the “empty” case and all one-letter selections from the product. Then add the one-letter subsequences separately.

Space and Time Complexity

Time Complexity: O(U · M) where U is the number of unique letters (at most 26) and M is the maximum frequency among letters (up to 10^4), so overall around 26 · 10^4 iterations. Space Complexity: O(M) for the precomputation of factorials and inverse factorials (up to the maximum frequency).


Solution

The solution works in two main parts:

  1. Count subsequences that contain exactly one distinct letter. For each letter c with frequency f, the number of non-empty subsequences is (2^f - 1).

  2. Count multi-letter good subsequences. For each possible common frequency m (from 1 to the maximum frequency among any letter):

    • For each letter c with frequency f[c] ≥ m, we have an option to either take it with exactly m occurrences (in C(f[c], m) ways) or not use it.
    • The total number of ways for common frequency m is the product over all letters of (if f[c] ≥ m then 1 + C(f[c], m) else 1).
    • This product counts the empty selection and all cases where only one letter is chosen. Hence, subtract the sum of individual contributions (for each letter with f[c] ≥ m, subtract C(f[c], m)) and also subtract 1 for the empty case.
    • The remainder is the count of good subsequences (with at least two distinct letters) for that m.

Finally, sum the counts from (1) and (2) and return the answer modulo 10^9+7.

A key detail is to precompute factorials and inverse factorials (using Fermat’s Little Theorem) up to the maximum frequency among letters, so we can compute combinations C(n, m) efficiently.


Code Solutions

# Python solution with detailed comments
MOD = 10**9 + 7

def modexp(a, b, mod):
    # Fast modular exponentiation: compute (a^b) % mod
    result = 1
    a = a % mod
    while b:
        if b & 1:
            result = (result * a) % mod
        a = (a * a) % mod
        b //= 2
    return result

def precompute_factorials(n, mod):
    # Precompute factorials and inverse factorials for 0 <= i <= n.
    fact = [1] * (n + 1)
    invfact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = (fact[i - 1] * i) % mod
    invfact[n] = modexp(fact[n], mod - 2, mod)  # Fermat's little theorem
    for i in range(n - 1, -1, -1):
        invfact[i] = (invfact[i + 1] * (i + 1)) % mod
    return fact, invfact

def nCr(n, r, fact, invfact, mod):
    # Compute combination C(n, r) modulo mod
    if r < 0 or r > n:
        return 0
    return (fact[n] * invfact[r] % mod) * invfact[n - r] % mod

def countGoodSubsequences(s):
    # Count frequency of each letter
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    # Maximum frequency among all letters
    max_freq = max(freq.values())
    
    # Precompute factorials and inverse factorials up to max_freq.
    fact, invfact = precompute_factorials(max_freq, MOD)
    
    total_good = 0
    
    # Part 1: Count one-letter good subsequences for each letter.
    for ch in freq:
        f = freq[ch]
        # Count all non-empty subsequences: 2^f - 1
        total_good = (total_good + modexp(2, f, MOD) - 1) % MOD
    
    # Part 2: Count good subsequences with two or more distinct letters.
    # For each possible common frequency m from 1 to max_freq:
    for m in range(1, max_freq + 1):
        prod = 1
        sum_one_letter = 0  # To subtract one-letter selection cases for this m
        for ch in "abcdefghijklmnopqrstuvwxyz":
            f = freq.get(ch, 0)
            if f >= m:
                choose_exact = nCr(f, m, fact, invfact, MOD)
                # Option: either do not include ch or include exactly m copies
                prod = (prod * (1 + choose_exact)) % MOD
                sum_one_letter = (sum_one_letter + choose_exact) % MOD
            else:
                prod = (prod * 1) % MOD  # No change if current letter cannot contribute
                
        # Remove the empty selection and all one-letter selections.
        ways_multi = (prod - sum_one_letter - 1) % MOD
        total_good = (total_good + ways_multi) % MOD
        
    return total_good

# Example usage:
if __name__ == "__main__":
    s = "aabb"  # Example input
    print(countGoodSubsequences(s))
← Back to All Questions