
How to Handle Time Zones and Timestamps Accurately with Pandas

Image by Author | Midjourney
 
Time-based data becomes harder to interpret when it spans several time zones, because the same clock reading can refer to different moments depending on where it was recorded. This guide shows how to manage time zones and timestamps with the Pandas library in Python.
 
Preparation
 
In this tutorial, we’ll use the Pandas package. If it is not already installed, we can install it with pip.

pip install pandas
 
Now, we’ll explore how to work with time-based data in Pandas with practical examples. 
Handling Time Zones and Timestamps with Pandas
 
Time data provides a time-specific reference for events. Its most precise form is the timestamp, which records a moment from the year down to fractions of a second (Pandas timestamps support up to nanosecond resolution).
Let’s start by creating a sample dataset.

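import pandas as pd

# Sample transactions; the timestamps start out as plain strings
data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'amount': [100, 200, 150]
}

df = pd.DataFrame(data)
# Parse the strings into datetime values (timezone-naive at this point)
df['timestamp'] = pd.to_datetime(df['timestamp'])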

 
The ‘timestamp’ column in the example above contains time data with second-level precision. Because the values start out as strings, we use the pd.to_datetime function to convert the column to a datetime format.
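To make the effect of the conversion concrete, here is a minimal sketch that checks the column type before and after parsing. The `raw` series and its invalid entry are hypothetical values added for illustration; `errors="coerce"` is one possible way to handle entries that cannot be parsed, not something the example above requires.

```python
import pandas as pd

# Hypothetical raw values, including one that is not a valid date.
raw = pd.Series(["2023-06-15 12:00:05", "2024-04-15 15:20:02", "not a date"])

# Before parsing, the series holds plain strings, not datetimes.
is_datetime_before = pd.api.types.is_datetime64_any_dtype(raw)

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")
is_datetime_after = pd.api.types.is_datetime64_any_dtype(parsed)

print(is_datetime_before, is_datetime_after)
print(parsed.isna().tolist())
print(parsed.dt.tz)  # None: the parsed values are still timezone-naive
```

The parsed values carry no time zone yet, which is why a separate localization step is needed.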
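Afterward, we can make the datetime data timezone-aware. For example, we can declare that the existing values are in Coordinated Universal Time (UTC) with tz_localize, which attaches the time zone without shifting the clock time.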

df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)

 

Output>>>
   transaction_id           timestamp  amount             timestamp_utc
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00
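Localizing to UTC is straightforward because UTC has no daylight saving time. When the naive values are local wall-clock times instead, some readings may not exist or may occur twice. The sketch below assumes hypothetical readings from New York (not part of the dataset above) and shows one possible policy using the `nonexistent` and `ambiguous` parameters of `tz_localize`.

```python
import pandas as pd

# Hypothetical wall-clock readings around the 2024 US daylight saving changes.
naive = pd.Series(pd.to_datetime([
    "2024-03-10 02:30:00",  # does not exist: clocks jump from 02:00 to 03:00
    "2024-11-03 01:30:00",  # ambiguous: occurs twice when clocks fall back
    "2024-06-15 12:00:00",  # ordinary summer time
]))

localized = naive.dt.tz_localize(
    "America/New_York",
    nonexistent="shift_forward",  # move missing times to the next valid instant
    ambiguous="NaT",              # mark ambiguous times as missing
)
print(localized)
```

Other policies are available (for example raising an error, which is the default), so the right choice depends on how the source system recorded its times.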

 
The ‘timestamp_utc’ values now carry time-zone information, shown as the +00:00 offset. We can convert them from one time zone to another with tz_convert. For example, we can take the UTC column and convert it to Japan time (Asia/Tokyo).

df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)

 

Output>>>
   transaction_id           timestamp  amount             timestamp_utc  \
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00

            timestamp_japan
0 2023-06-15 21:00:05+09:00
1 2024-04-16 00:20:02+09:00
2 2024-06-16 06:17:43+09:00
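It may help to see that tz_convert changes only how a moment is displayed, not the moment itself. The following sketch uses a single timestamp taken from the sample data; the New York conversion is an extra illustration that the tutorial does not otherwise use.

```python
import pandas as pd

utc_time = pd.Timestamp("2024-06-15 21:17:43", tz="UTC")
tokyo_time = utc_time.tz_convert("Asia/Tokyo")
new_york_time = utc_time.tz_convert("America/New_York")

# The wall-clock readings differ...
print(tokyo_time)     # 2024-06-16 06:17:43+09:00
print(new_york_time)  # 2024-06-15 17:17:43-04:00

# ...but all three objects represent the same instant.
print(utc_time == tokyo_time == new_york_time)  # True
```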

 
With the converted column, we can filter the data by a time window expressed in that time zone. For example, we can keep only the transactions that fall between two Japan-time boundaries.

# Timezone-aware boundaries, so they can be compared with the aware column
start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')

filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]

print(filtered_df)

 

Output>>>
   transaction_id           timestamp  amount             timestamp_utc  \
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00

            timestamp_japan
2 2024-06-16 06:17:43+09:00
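Two details of this kind of filter are worth checking. Aware timestamps are compared as instants, so boundaries given in another time zone select the same rows, while a naive boundary cannot be ordered against aware data. The sketch below rebuilds the Japan-time column on its own and uses `Series.between` as an alternative to the two comparisons; the UTC boundaries are the Tokyo boundaries from above shifted by nine hours.

```python
import pandas as pd

ts_japan = pd.Series(pd.to_datetime([
    "2023-06-15 12:00:05", "2024-04-15 15:20:02", "2024-06-15 21:17:43",
])).dt.tz_localize("UTC").dt.tz_convert("Asia/Tokyo")

# The same window, expressed in UTC instead of Japan time.
start_utc = pd.Timestamp("2024-06-14 21:00:00", tz="UTC")
end_utc = pd.Timestamp("2024-06-15 22:59:59", tz="UTC")
mask = ts_japan.between(start_utc, end_utc)  # inclusive on both ends
print(mask.tolist())  # [False, False, True]

# A naive boundary cannot be ordered against timezone-aware data.
try:
    ts_japan >= pd.Timestamp("2024-06-15 06:00:00")
    naive_comparison_failed = False
except TypeError:
    naive_comparison_failed = True
print(naive_comparison_failed)  # True
```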

 
Because the column is a proper datetime type, we can also resample the data over time. Let’s look at an example that counts the non-null values in each column for every hour.

# Count rows per hour; pandas 2.2+ prefers the lowercase alias 'h' over 'H'
resampled_df = df.set_index('timestamp_japan').resample('H').count()
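The sample dataset spans a full year, so its hourly result is mostly empty bins. As a smaller illustration, the sketch below uses four hypothetical transactions (not from the dataset above) that fall within a few hours of each other. It uses the "60min" alias, which is equivalent to one hour and avoids the 'H' versus 'h' difference between Pandas versions.

```python
import pandas as pd

# Hypothetical transactions; three of them fall inside the same Tokyo hour.
events = pd.DataFrame({
    "timestamp_utc": pd.to_datetime([
        "2024-06-15 21:05:00", "2024-06-15 21:40:00",
        "2024-06-15 21:59:59", "2024-06-15 23:10:00",
    ]).tz_localize("UTC"),
    "amount": [100, 200, 150, 50],
})
events["timestamp_japan"] = events["timestamp_utc"].dt.tz_convert("Asia/Tokyo")

# Hourly bins in Japan time: number of transactions and total amount per bin.
hourly = (
    events.set_index("timestamp_japan")["amount"]
    .resample("60min")
    .agg(["count", "sum"])
)
print(hourly)
```

The bins start at 06:00, 07:00, and 08:00 Japan time, and the empty 07:00 bin is kept with a count of zero rather than being dropped.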

 
Making timestamps timezone-aware lets us take full advantage of Pandas’ time-series features, such as time-zone conversion, time-based filtering, and resampling.
 
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
