We use cookies (including Google cookies) to personalize ads and analyze traffic. By continuing to use our site, you accept our Privacy Policy.

Statistics from a Large Sample

Difficulty: Medium


Problem Description

You are given a large sample of integers in the range [0, 255]. Since the sample is so large, it is represented by an array count where count[k] is the number of times that k appears in the sample. Calculate the following statistics: minimum, maximum, mean, median, and mode. Return the statistics of the sample as an array of floating-point numbers [minimum, maximum, mean, median, mode]. Answers within 10^-5 of the actual answer will be accepted.


Key Insights

  • The range of possible integers is limited to [0, 255], allowing efficient counting.
  • The mode is guaranteed to be unique, simplifying the mode calculation.
  • Mean is straightforward, but median requires careful consideration of odd/even sample sizes.
  • We need to iterate through the count array to derive the necessary statistics.

Space and Time Complexity

Time Complexity: O(1) - The time complexity is constant as the size of the input (count) is fixed at 256. Space Complexity: O(1) - We use a small, constant amount of additional space.


Solution

To solve the problem, we will utilize an array of size 256 to represent the count of each integer from 0 to 255. The following steps outline the algorithm:

  1. Minimum and Maximum: Iterate through the count array to find the first and last indices with non-zero counts.
  2. Mean: Calculate the total sum of all elements by multiplying each index by its count and then divide by the total number of elements.
  3. Median: Traverse the count array to find the median by counting elements until reaching the midpoint.
  4. Mode: Identify the index with the maximum count to determine the mode.

This approach ensures that we efficiently calculate all required statistics using a single pass through the count array when needed.


Code Solutions

def getStatistics(count):
    total_elements = sum(count)
    total_sum = sum(i * count[i] for i in range(256))
    
    minimum = next(i for i in range(256) if count[i] > 0)
    maximum = next(i for i in range(255, -1, -1) if count[i] > 0)
    
    mean = total_sum / total_elements
    
    # Find median
    mid = total_elements // 2
    current_count = 0
    if total_elements % 2 == 1:  # Odd
        for i in range(256):
            current_count += count[i]
            if current_count > mid:
                median = i
                break
    else:  # Even
        for i in range(256):
            current_count += count[i]
            if current_count >= mid:
                first_middle = i
                break
        for i in range(first_middle, 256):
            current_count += count[i]
            if current_count >= mid + 1:
                second_middle = i
                break
        median = (first_middle + second_middle) / 2
    
    # Find mode
    mode = count.index(max(count))
    
    return [minimum, maximum, mean, median, mode]
← Back to All Questions