Problem Description
Design and implement a search autocomplete system for a search engine. The system takes a set of historical sentences with their corresponding occurrence counts. As the user types character by character, the system suggests the top three historical sentences that share the same prefix as the current input. When the user inputs the special character '#' it signifies completion of the sentence; it is then stored in the system, and an empty list is returned.
Key Insights
- Use a Trie to efficiently store and retrieve sentences based on their prefixes.
- Maintain a mapping at each Trie node to record sentences (or their frequencies) that pass through that node.
- Use sorting or a min-heap to select the top three sentences based on their frequency (hot degree) and lexicographical order if frequencies tie.
- On receiving '#', update the Trie with the current sentence and reset the current search state.
Space and Time Complexity
Time Complexity: O(L * log k) per character input query, where L is the length of the current prefix and k is 3 (constant factor), thus ensuring efficient autocompletion. Space Complexity: O(N * L), where N is the number of unique sentences and L is the average sentence length, for maintaining the Trie.
Solution
The solution revolves around building a Trie where each node contains:
- A mapping to its child nodes.
- A hash map that records all sentences passing through the node along with their occurrence counts.
For each character entered (except '#'), the system:
- Traverses the Trie to find the current prefix node.
- Retrieves stored sentences at that node and sorts them based on the following criteria: descending by frequency and ascending lexicographically if frequencies are equal.
- Returns the top three sentences from the sorted result.
When the user inputs '#', the current sentence (tracked as user input so far) is added or its frequency updated in the Trie, and the input state resets.
Code Solutions
Below are implementations in Python, JavaScript, C++, and Java with clear line-by-line comments.