We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Design Excel Sum Formula

Number: 631

Difficulty: Hard

Paid? Yes

Companies: Rippling, Google, OpenAI, Citadel, Opendoor, Microsoft


Problem Description

Design a simple Excel system that supports setting cell values, retrieving cell values, and defining a sum formula that can refer to individual cells and cell ranges. Once a sum formula is defined on a cell, the cell's value should automatically update if any of the cells it depends on are changed.


Key Insights

  • Each cell can hold either an integer or a formula that is a weighted sum of other cells.
  • A dependency graph must be maintained to know which cells depend on a given cell so that updates can propagate.
  • Removing or updating a formula should clear the previous dependency links.
  • Parsing of cell range strings (e.g., "A1:B2") needs to convert to the corresponding set of cell coordinates.
  • Given the small grid size (max 26 rows and 26 columns), direct recomputation via DFS or recursion is efficient.

Space and Time Complexity

Time Complexity: O(R * C) in the worst-case scenario when propagating updates through all dependent formulas (with R ≤ 26, C ≤ 26). Space Complexity: O(R * C) for storing cell values, formulas, and the dependency graph.


Solution

We implement the Excel class using the following approach:

  1. Maintain a 2D matrix to hold the value of each cell.
  2. Use a hash map (or equivalent) to store formulas for cells where the formula is represented by a mapping of dependent cells (and their counts if referenced multiple times).
  3. Construct a dependency graph that maps a cell to the set of cells that depend on it. When a cell is updated, perform a DFS over this graph to recalculate any formulas affected by the change.
  4. For the sum function, parse each input string: if it represents a single cell (e.g., "A1") add that dependency; if it is a range (e.g., "A1:B2"), iterate over the range and add all cells within.
  5. When updating a cell's value (either via set or formula evaluation), clear any existing formula (and its dependency links) for that cell, update its value, and then propagate the change to all dependent cells recursively.
  6. This design avoids issues with circular dependencies (as guaranteed by the problem) and supports efficient updates given the small grid size.

Code Solutions

class Excel:
    def __init__(self, height, width):
        # Convert width (letter) to number of columns (e.g. 'C' -> 3)
        self.rows = height
        self.cols = ord(width) - ord('A') + 1
        # 2D matrix to store cell values; using 1-index internally.
        self.values = [[0] * (self.cols + 1) for _ in range(self.rows + 1)]
        # Formulas stored: key is (row, col), value is a dictionary mapping dependency cell (r, c) to count.
        self.formulas = {}
        # Dependency graph: key is (row, col), value is a set of cells (r, c) that depend on it.
        self.dependents = {}

    def set(self, row, column, val):
        col = ord(column) - ord('A') + 1
        # Clear any previous formula for this cell.
        self._clear_formula(row, col)
        # Set the value.
        self.values[row][col] = val
        # Propagate updates to cells that depend on (row, col).
        self._propagate((row, col))

    def get(self, row, column):
        col = ord(column) - ord('A') + 1
        return self.values[row][col]

    def sum(self, row, column, numbers):
        col = ord(column) - ord('A') + 1
        # Clear any previous formula for this cell.
        self._clear_formula(row, col)
        # Build the new formula dependency dictionary.
        formula_dict = {}
        for s in numbers:
            if ':' in s:
                # Range format e.g., "A1:B2"
                start, end = s.split(':')
                start_row, start_col = self._parse_cell(start)
                end_row, end_col = self._parse_cell(end)
                for r in range(start_row, end_row + 1):
                    for c in range(start_col, end_col + 1):
                        formula_dict[(r, c)] = formula_dict.get((r, c), 0) + 1
            else:
                # Single cell format e.g., "A1"
                r, c = self._parse_cell(s)
                formula_dict[(r, c)] = formula_dict.get((r, c), 0) + 1

        # Save the formula for cell (row, col).
        self.formulas[(row, col)] = formula_dict
        # For each dependency, add (row, col) to its set of dependents.
        for key in formula_dict:
            if key not in self.dependents:
                self.dependents[key] = set()
            self.dependents[key].add((row, col))
        # Calculate the sum for cell (row, col).
        self.values[row][col] = self._calculate_formula(formula_dict)
        # Propagate the update.
        self._propagate((row, col))
        return self.values[row][col]

    def _parse_cell(self, s):
        # Parse a cell coordinate e.g., "A1" -> (1, 1)
        col = ord(s[0]) - ord('A') + 1
        row = int(s[1:])
        return row, col

    def _calculate_formula(self, formula_dict):
        # Compute the sum of the current values from dependencies multiplied by their counts.
        total = 0
        for (r, c), count in formula_dict.items():
            total += self.values[r][c] * count
        return total

    def _clear_formula(self, row, col):
        # If cell has a formula, remove the dependency links.
        if (row, col) in self.formulas:
            formula_dict = self.formulas[(row, col)]
            for key in formula_dict:
                if key in self.dependents and (row, col) in self.dependents[key]:
                    self.dependents[key].remove((row, col))
            del self.formulas[(row, col)]
    
    def _propagate(self, cell):
        # Propagate changes recursively to all cells that depend on the given cell.
        if cell not in self.dependents:
            return
        for dep in list(self.dependents[cell]):
            # Only recalc if the dependent cell is a formula.
            if dep in self.formulas:
                new_val = self._calculate_formula(self.formulas[dep])
                # Only propagate if there's a change.
                if self.values[dep[0]][dep[1]] != new_val:
                    self.values[dep[0]][dep[1]] = new_val
                    self._propagate(dep)

# Example usage:
# excel = Excel(3, "C")
# excel.set(1, "A", 2)
# print(excel.sum(3, "C", ["A1", "A1:B2"]))  # Should print 4
# excel.set(2, "B", 2)
# print(excel.get(3, "C"))  # Should print 6
← Back to All Questions