
Reshape Data: Pivot

Difficulty: Easy


Problem Description

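Given a DataFrame with columns `city`, `month`, and `temperature` (as in the sample input below), write a solution that pivots the data. Each row of the result should hold the temperatures for one month, and each city should become its own column.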


Key Insights

  • This is a long-to-wide reshape. The input has one row per (city, month) reading, and the output has one row per month with one column per city.
  • A plain pivot does not aggregate: it only rearranges values, so each (month, city) pair must appear at most once. If the data contained duplicate pairs, an aggregating variant such as `pivot_table` would be needed.
  • Each cell of the result holds the temperature recorded for that row's month in that column's city.
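Reshaping and aggregating behave differently when a (month, city) pair is duplicated. The minimal sketch below uses a small hypothetical DataFrame with the same column names as the sample input and made-up values. `DataFrame.pivot` raises on the duplicate, while `DataFrame.pivot_table` with an explicit `aggfunc` collapses it.

```python
import pandas as pd

# Hypothetical input: ElPaso has two January readings (a duplicate pair)
dup = pd.DataFrame({
    "city": ["ElPaso", "ElPaso", "Jacksonville"],
    "month": ["January", "January", "January"],
    "temperature": [20, 22, 13],
})

# DataFrame.pivot cannot place two values in one cell, so it raises ValueError
pivot_failed = False
try:
    dup.pivot(index="month", columns="city", values="temperature")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table aggregates duplicates instead; "mean" is an illustrative choice
agg = dup.pivot_table(index="month", columns="city",
                      values="temperature", aggfunc="mean")
print(agg)
```

The right aggregation (mean, max, or first) depends on what the duplicates mean in your data. The problem's sample input has no duplicates, so plain `pivot` is enough there.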

Space and Time Complexity

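Time Complexity: O(n), where n is the number of rows in the input DataFrame, plus the cost of sorting the unique month and city labels when the result is built.
Space Complexity: O(m * c) for the output table, where m is the number of unique months and c is the number of unique cities. Missing (month, city) combinations still occupy a cell (filled with NaN), so the output can be larger than the input.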
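The space bound refers to the full month-by-city grid. The sketch below uses a small hypothetical input with made-up values to show the m × c shape. It also shows the NaN that fills a missing combination.

```python
import pandas as pd

# Hypothetical sparse input: Jacksonville has no February reading
df = pd.DataFrame({
    "city": ["ElPaso", "ElPaso", "Jacksonville"],
    "month": ["January", "February", "January"],
    "temperature": [20, 6, 13],
})

wide = df.pivot(index="month", columns="city", values="temperature")

m = df["month"].nunique()  # unique months -> rows of the result
c = df["city"].nunique()   # unique cities -> columns of the result
print(len(df), wide.shape)  # 3 input rows, but a 2 x 2 = 4-cell output
print(wide)                 # the missing (February, Jacksonville) cell is NaN
```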


Solution

To solve the problem, we will use the pivot operation on the DataFrame. The original DataFrame will be transformed such that:

  • The month column becomes the index of the new DataFrame.
  • Each unique city becomes a column in the new DataFrame.
  • Each cell holds the temperature for that month and city combination; a combination with no reading is filled with NaN.
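The three steps above can be made concrete without pandas. The sketch below is plain Python and uses a hypothetical helper, `manual_pivot`; it illustrates the idea and is not how pandas implements `pivot`. One pass groups readings by month, and the unique city labels are then sorted to form the columns.

```python
from collections import defaultdict

def manual_pivot(rows):
    """Pivot (city, month, temperature) tuples into month rows and city columns.

    Hypothetical helper for illustration; returns (columns, table) where table
    maps each month to a list of temperatures aligned with columns.
    """
    by_month = defaultdict(dict)  # month -> {city: temperature}
    cities = set()
    for city, month, temp in rows:  # single pass over the input
        if city in by_month[month]:
            raise ValueError(f"duplicate entry for ({month}, {city})")
        by_month[month][city] = temp
        cities.add(city)
    columns = sorted(cities)  # each unique city becomes a column
    # one row per month (sorted, like pandas); None where a reading is missing
    table = {month: [by_month[month].get(city) for city in columns]
             for month in sorted(by_month)}
    return columns, table

columns, table = manual_pivot([
    ("Jacksonville", "January", 13), ("ElPaso", "January", 20),
    ("Jacksonville", "February", 23), ("ElPaso", "February", 6),
])
print(columns)  # ['ElPaso', 'Jacksonville']
print(table)    # {'February': [6, 23], 'January': [20, 13]}
```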

pandas provides this reshape directly as `DataFrame.pivot(index=..., columns=..., values=...)`, so no explicit loop over the rows is needed.
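The same reshape can also be expressed as `set_index` followed by `unstack`, which helps show what the built-in does. The sketch below runs both approaches on a two-month slice of the sample input and checks that they produce the same table.

```python
import pandas as pd

# Two months taken from the sample input
df = pd.DataFrame({
    "city": ["Jacksonville", "Jacksonville", "ElPaso", "ElPaso"],
    "month": ["January", "February", "January", "February"],
    "temperature": [13, 23, 20, 6],
})

via_pivot = df.pivot(index="month", columns="city", values="temperature")

# Move month and city into a two-level row index, then rotate the
# 'city' level out into columns
via_unstack = df.set_index(["month", "city"])["temperature"].unstack("city")

print(via_pivot.equals(via_unstack))  # True
```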


Code Solutions

```python
import pandas as pd

# Sample input DataFrame: one row per (city, month) temperature reading
data = {
    'city': ['Jacksonville', 'Jacksonville', 'Jacksonville', 'Jacksonville', 'Jacksonville',
             'ElPaso', 'ElPaso', 'ElPaso', 'ElPaso', 'ElPaso'],
    'month': ['January', 'February', 'March', 'April', 'May',
              'January', 'February', 'March', 'April', 'May'],
    'temperature': [13, 23, 38, 5, 34, 20, 6, 26, 2, 43]
}

df = pd.DataFrame(data)

# Pivot: months become the row index, cities become columns, and the
# temperatures fill the cells. pandas sorts both the month and city labels.
result = df.pivot(index='month', columns='city', values='temperature').reset_index()

# reset_index() turns 'month' back into a regular column; the column axis
# still carries the name 'city', so clear it for a cleaner printout
result.columns.name = None
print(result)
```
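`pivot` sorts the month labels as strings, so the rows come out as April, February, January, March, May. Calendar order may be preferable for display; that is a presentation assumption, not something the problem statement requires. One option is to reindex with an explicit month list. The `calendar` list below is written by hand for the months in the sample data.

```python
import pandas as pd

# Same sample input as the solution above
data = {
    'city': ['Jacksonville', 'Jacksonville', 'Jacksonville', 'Jacksonville', 'Jacksonville',
             'ElPaso', 'ElPaso', 'ElPaso', 'ElPaso', 'ElPaso'],
    'month': ['January', 'February', 'March', 'April', 'May',
              'January', 'February', 'March', 'April', 'May'],
    'temperature': [13, 23, 38, 5, 34, 20, 6, 26, 2, 43]
}
df = pd.DataFrame(data)

wide = df.pivot(index='month', columns='city', values='temperature')
alphabetical = list(wide.index)
print(alphabetical)  # ['April', 'February', 'January', 'March', 'May']

# Explicit calendar order for the months present in the sample data
calendar = ['January', 'February', 'March', 'April', 'May']
wide = wide.reindex(calendar)
print(wide)
```

Any month missing from `calendar` would be dropped by `reindex`, so the list must cover every month in the data.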