We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Design Log Storage System

Number: 635

Difficulty: Medium

Paid? Yes

Companies: Amazon, Snap


Problem Description

Design a log storage system that supports inserting log entries with unique IDs and timestamp strings, and retrieving the IDs of all logs whose timestamps fall within a specified range (inclusive). The retrieval is based on a given granularity, i.e. Year, Month, Day, Hour, Minute, or Second, which determines the level of precision for the range query.


Key Insights

  • Store logs in a simple data structure such as a list or array.
  • Convert the timestamps into strings; due to fixed-length and zero padding, lexicographical string comparisons are valid.
  • Use a mapping to determine the slice length for each granularity, e.g. "Year" corresponds to the first 4 characters, "Month" corresponds to the first 7 characters, etc.
  • During retrieval, compare the relevant substring of each stored timestamp against the start and end query timestamp slices.
  • The inclusive nature of the query simplifies checking conditions.

Space and Time Complexity

Time Complexity: O(N) per retrieval query, where N is the number of logs stored (since each log is checked). In the worst case, all log entries are examined. Space Complexity: O(N) where N is the number of logs stored.


Solution

We implement a LogSystem class that supports putting log entries and retrieving them using a specified granularity.

  • Data Structures:
    • Use a list/array to store log entries as tuples (or objects) containing the timestamp and log ID.
    • A dictionary/map is used to store the mapping from granularity string to the substring index (e.g., "Year" => 4, "Month" => 7, etc.).
  • For the put() method, simply append the new log entry into the list.
  • For the retrieve() method, compute the substring length based on the provided granularity. Then iterate over all stored logs and compare the sliced timestamp against the sliced start and end query values. If it falls within the bounds, include the log ID in the result.
  • Since the timestamps have fixed width with leading zeros, slicing and lexicographical comparisons are both valid and simple.

Code Solutions

class LogSystem:
    def __init__(self):
        # List to store logs as (timestamp, id) tuples
        self.logs = []
        # Mapping granularity to index for the substring (fixed format: Year:Month:Day:Hour:Minute:Second)
        self.granularity_to_index = {
            "Year": 4,
            "Month": 7,
            "Day": 10,
            "Hour": 13,
            "Minute": 16,
            "Second": 19
        }

    def put(self, id, timestamp):
        # Append the log as a tuple
        self.logs.append((timestamp, id))

    def retrieve(self, start, end, granularity):
        # Determine the slicing index based on granularity
        idx = self.granularity_to_index[granularity]
        # Slice the given start and end timestamps
        start_slice = start[:idx]
        end_slice = end[:idx]
        result = []
        # Iterate over all logs and check if the timestamp slice is within the range
        for ts, log_id in self.logs:
            ts_slice = ts[:idx]
            if start_slice <= ts_slice <= end_slice:
                result.append(log_id)
        return result

# Example usage:
# logSystem = LogSystem()
# logSystem.put(1, "2017:01:01:23:59:59")
# logSystem.put(2, "2017:01:01:22:59:59")
# logSystem.put(3, "2016:01:01:00:00:00")
# print(logSystem.retrieve("2016:01:01:01:01:01", "2017:01:01:23:00:00", "Year"))   # Output: [3, 2, 1]
# print(logSystem.retrieve("2016:01:01:01:01:01", "2017:01:01:23:00:00", "Hour"))  # Output: [2, 1]
← Back to All Questions