HTML Entity Parser

Difficulty: Medium


Problem Description

Implement an HTML entity parser that takes a string input containing HTML code and replaces all specified HTML entities with their corresponding special characters.


Key Insights

  • The problem involves recognizing specific HTML entities and replacing them with their respective characters.
  • Only six entities need to be processed (&quot;, &apos;, &amp;, &gt;, &lt; and &frasl;), which keeps the lookup table small and the parsing logic simple.
  • The input string can contain other characters that are not entities, and these should remain unchanged.
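
These points can be illustrated with a small table-driven sketch. It is a simpler alternative to the single-pass scan this page describes, not a copy of it: it calls `str.replace` once per entity. The names `ENTITY_TABLE` and `decode_by_replace` are hypothetical, chosen only for this example. The one subtlety is that `&amp;` has to be handled last, otherwise the `&` it produces could combine with following text and be decoded a second time.

```python
# Hypothetical names; the six entities are the ones this problem specifies.
ENTITY_TABLE = [
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&gt;", ">"),
    ("&lt;", "<"),
    ("&frasl;", "/"),
    ("&amp;", "&"),  # kept last so its output is never decoded again
]


def decode_by_replace(text: str) -> str:
    """Decode the six entities with one str.replace call per entity."""
    for entity, char in ENTITY_TABLE:
        text = text.replace(entity, char)
    return text


print(decode_by_replace("x &gt; y &amp;&amp; y &lt; 10"))  # x > y && y < 10
print(decode_by_replace("&ambassador; stays as is"))       # unchanged
print(decode_by_replace("&amp;gt;"))                       # &gt;
```

Text that merely looks like an entity, such as `&ambassador;`, passes through untouched because it is not in the table.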

Space and Time Complexity

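Time Complexity: O(n), where n is the length of the input string, provided the lookahead after each '&' is capped at the longest entity length (7 characters, for &frasl;). An uncapped search for ';' can degrade to O(n^2) on inputs such as a long run of '&' with no ';'.

Space Complexity: O(n) for the output string. Beyond the output, the auxiliary space is O(1), because the entity map has a fixed size.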
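
The linear bound depends on each '&' triggering only a bounded amount of lookahead. The following is a rough, simplified model rather than the parser itself: it only counts how many characters are inspected while searching for ';' after each '&', with and without a cap. The function name `count_lookahead_steps` and the constant `MAX_ENTITY_LEN` are assumptions made for this illustration.

```python
from typing import Optional

MAX_ENTITY_LEN = 7  # len("&frasl;"), the longest entity in this problem


def count_lookahead_steps(text: str, cap: Optional[int]) -> int:
    """Count characters inspected while searching for ';' after each '&'.

    cap=None scans to the end of the string; otherwise at most `cap`
    characters (starting at the '&' itself) are inspected.
    """
    steps = 0
    for i, ch in enumerate(text):
        if ch != "&":
            continue
        end = len(text) if cap is None else min(len(text), i + cap)
        j = i
        while j < end and text[j] != ";":
            j += 1
            steps += 1
    return steps


worst_case = "&" * 1000  # many '&' and no ';' anywhere
print(count_lookahead_steps(worst_case, cap=None))            # 500500
print(count_lookahead_steps(worst_case, cap=MAX_ENTITY_LEN))  # 6979
```

On this input the uncapped count grows quadratically with the length, while the capped count stays below 7 steps per '&'.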


Solution

The solution makes a single left-to-right pass over the input string, using a hashmap (or dictionary) to map each entity to its corresponding character. At every '&', it looks ahead for the closing ';' and checks whether the enclosed substring is a key in the map. A recognized entity is replaced by its character and the scan resumes just after the ';'. Every other character, including an '&' that does not start a recognized entity, is copied to the output unchanged.
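
The same single-pass, dictionary-driven idea can also be sketched with Python's `re` module, assuming a regular expression is acceptable in place of a hand-written loop. The names `ENTITIES`, `ENTITY_PATTERN` and `decode_with_regex` are hypothetical. `re.sub` scans left to right and does not rescan the text it has just substituted, so an input like `&amp;gt;` decodes to `&gt;` rather than `>`.

```python
import re

# The six entities this problem asks for, mapped to their characters.
ENTITIES = {
    "&quot;": '"',
    "&apos;": "'",
    "&amp;": "&",
    "&gt;": ">",
    "&lt;": "<",
    "&frasl;": "/",
}

# One alternation that matches any key of ENTITIES literally.
ENTITY_PATTERN = re.compile("|".join(re.escape(e) for e in ENTITIES))


def decode_with_regex(text: str) -> str:
    """Replace each recognized entity in one left-to-right pass."""
    return ENTITY_PATTERN.sub(lambda m: ENTITIES[m.group(0)], text)


print(decode_with_regex("&amp; is an HTML entity but &ambassador; is not."))
# & is an HTML entity but &ambassador; is not.
print(decode_with_regex("and I quote: &quot;...&quot;"))
# and I quote: "..."
```

Building the pattern from the dictionary keys keeps the entity list in one place, so the table and the matcher cannot drift apart.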


Code Solutions

def entityParser(text: str) -> str:
    # Map of HTML entities to their corresponding characters
    html_entities = {
        "&quot;": '"',
        "&apos;": "'",
        "&amp;": "&",
        "&gt;": ">",
        "&lt;": "<",
        "&frasl;": "/"
    }
    
    # Initialize an empty output list
    output = []
    i = 0
    
    while i < len(text):
        # Check if the current character starts an entity
        if text[i] == '&':
            # Look for the closing ';' within the longest entity length
            # ("&frasl;" is 7 characters), so each '&' costs O(1) work
            j = i
            while j < len(text) and j - i < 6 and text[j] != ';':
                j += 1
            # If we found a complete entity
            if j < len(text) and text[i:j+1] in html_entities:
                output.append(html_entities[text[i:j+1]])
                i = j + 1  # Move past the entity
                continue
        
        # If no entity was found, just append the current character
        output.append(text[i])
        i += 1
    
    # Join the output list into a final string
    return ''.join(output)