Problem Description
Given a string s, a subsequence is called good if it is non-empty and the frequency (count) of every character in the subsequence is the same. In other words, if the subsequence contains one or more distinct letters, each letter must appear exactly m times (with m ≥ 1). Return the total number of good subsequences in s modulo 10^9 + 7.
For example, if s = "aabb", then there are 11 good subsequences.
Key Insights
- A subsequence consisting of only one type of character is always good. For a letter with frequency f, any non-empty combination of its occurrences (2^f – 1 ways) forms a good subsequence.
- For subsequences having two or more distinct characters, every selected letter must appear exactly m times (for some m >= 1). For each letter c with frequency f[c] (and f[c] ≥ m), the number of ways to pick exactly m occurrences is C(f[c], m).
- For a fixed m, consider all 26 letters. For each letter, if f[c] ≥ m, it contributes a factor (1 + C(f[c], m)) (the “1” representing the option not to choose that letter). Taking the product over all letters results in a total count (for common frequency m) that includes:
- The empty selection (choosing “not” for every letter).
- Selections where exactly one letter is chosen (which are already counted in the one-letter case).
- Thus, for each m from 1 to max(frequency among letters), subtract the “empty” case and all one-letter selections from the product. Then add the one-letter subsequences separately.
Space and Time Complexity
Time Complexity: O(U · M) where U is the number of unique letters (at most 26) and M is the maximum frequency among letters (up to 10^4), so overall around 26 · 10^4 iterations. Space Complexity: O(M) for the precomputation of factorials and inverse factorials (up to the maximum frequency).
Solution
The solution works in two main parts:
-
Count subsequences that contain exactly one distinct letter. For each letter c with frequency f, the number of non-empty subsequences is (2^f - 1).
-
Count multi-letter good subsequences. For each possible common frequency m (from 1 to the maximum frequency among any letter):
- For each letter c with frequency f[c] ≥ m, we have an option to either take it with exactly m occurrences (in C(f[c], m) ways) or not use it.
- The total number of ways for common frequency m is the product over all letters of (if f[c] ≥ m then 1 + C(f[c], m) else 1).
- This product counts the empty selection and all cases where only one letter is chosen. Hence, subtract the sum of individual contributions (for each letter with f[c] ≥ m, subtract C(f[c], m)) and also subtract 1 for the empty case.
- The remainder is the count of good subsequences (with at least two distinct letters) for that m.
Finally, sum the counts from (1) and (2) and return the answer modulo 10^9+7.
A key detail is to precompute factorials and inverse factorials (using Fermat’s Little Theorem) up to the maximum frequency among letters, so we can compute combinations C(n, m) efficiently.