5 Python Pandas Tricky Examples for Data Analysis
#1 Dealing with datetime data (parse_dates pandas example)
import pandas as pd
# Convert a column to datetime format
data['date_column'] = pd.to_datetime(data['date_column'])
# Extract components from datetime (e.g., year, month, day)
data['year'] = data['date_column'].dt.year
data['month'] = data['date_column'].dt.month
# Calculate the time difference between two datetime columns
data['time_diff'] = data['end_time'] - data['start_time']
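The heading mentions parse_dates, so here is a minimal sketch of that route, where the conversion happens while the file is read. The CSV text and the `events` frame are made up for illustration; only the `start_time` and `end_time` column names come from the snippet above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """start_time,end_time
2024-01-01 08:00:00,2024-01-01 09:30:00
2024-02-15 12:00:00,2024-02-16 12:00:00
"""

# parse_dates converts the listed columns to datetime64 during the read,
# so no separate pd.to_datetime call is needed afterwards.
events = pd.read_csv(io.StringIO(csv_text), parse_dates=["start_time", "end_time"])

events["month"] = events["start_time"].dt.month
events["time_diff"] = events["end_time"] - events["start_time"]

# A timedelta column can be converted to plain numbers for further arithmetic.
events["hours"] = events["time_diff"].dt.total_seconds() / 3600
```

Converting the difference to hours is optional; it tends to help when the value feeds into aggregations or plots that expect floats.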
#2 Working with text data
# Convert text to lowercase
data['text_column'] = data['text_column'].str.lower()
# Count the occurrences of a specific word in a text column
# (str.count takes a regex; \b keeps 'word' from matching inside 'words' or 'keyword')
data['word_count'] = data['text_column'].str.count(r'\bword\b')
# Extract the first run of digits using a regular expression
# (expand=False returns a Series for a single capture group)
data['extracted_info'] = data['text_column'].str.extract(r'(\d+)', expand=False)
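A small worked example can make the behaviour on missing and non-matching rows easier to see. The `reviews` frame and its contents are invented for illustration; the calls are the same `.str` methods used above.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
reviews = pd.DataFrame(
    {"text_column": ["Order 42 arrived; order was late", "No number here", None]}
)

lowered = reviews["text_column"].str.lower()

# Word boundaries (\b) keep "order" from matching inside "reorder" or "orders".
reviews["order_count"] = lowered.str.count(r"\border\b")

# With expand=False a single capture group yields a Series;
# rows with no match (or missing text) come back as NaN.
reviews["first_number"] = lowered.str.extract(r"(\d+)", expand=False)
```

Note that the extracted digits are still strings; pd.to_numeric can convert them if numeric values are needed.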
#3 Handling large datasets efficiently
# Read a large dataset in chunks
chunk_size = 100000
data_chunks = pd.read_csv('large_data.csv', chunksize=chunk_size)
# Process data in chunks
for chunk in data_chunks:
    # Perform calculations or manipulations on each chunk
    pass  # placeholder so the loop body is valid Python
# Append data from multiple files
file_list = ['file1.csv', 'file2.csv', 'file3.csv']
# ignore_index=True gives the combined frame a fresh 0..n-1 index
combined_data = pd.concat([pd.read_csv(file) for file in file_list], ignore_index=True)
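One common pattern for the chunk loop is to reduce each chunk to a small partial result and combine the partials at the end. The sketch below assumes a hypothetical two-column CSV (`category`, `amount`) held in memory so that it runs on its own; with a real file the path would replace the `io.StringIO` object.

```python
import io
import pandas as pd

# Hypothetical data: ten rows alternating between categories "a" and "b".
rows = [f"{'a' if i % 2 == 0 else 'b'},{i}" for i in range(10)]
csv_text = "category,amount\n" + "\n".join(rows)

partial_sums = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Each chunk is an ordinary DataFrame holding at most 4 rows.
    partial_sums.append(chunk.groupby("category")["amount"].sum())

# Combine the per-chunk sums into one total per category.
totals = pd.concat(partial_sums).groupby(level=0).sum()
```

Only the small per-chunk summaries are kept in memory, which is the point of reading in chunks. This works directly for sums and counts; a mean would need the sum and the count tracked separately.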
#4 Pivot tables and reshaping data
# Create a pivot table
pivot_table = data.pivot_table(values='column2', index='column1', columns='column3', aggfunc='mean')
# Unstack the pivot table into long format (one row per column3/column1 pair)
unstacked_data = pivot_table.unstack().reset_index(name='mean_column2')
# Melt a DataFrame from wide to long format
melted_data = pd.melt(data, id_vars=['id'], value_vars=['var1', 'var2'], var_name='variable', value_name='value')
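To see the wide and long shapes side by side, here is a minimal round trip on invented data. The `sales` frame and its column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sales records; "north"/"q1" appears twice so the mean matters.
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "north"],
        "quarter": ["q1", "q2", "q1", "q2", "q1"],
        "revenue": [10.0, 20.0, 30.0, 40.0, 30.0],
    }
)

# Wide format: one row per region, one column per quarter, mean revenue in cells.
wide = sales.pivot_table(
    values="revenue", index="region", columns="quarter", aggfunc="mean"
)

# Back to long format: one row per (region, quarter) pair.
long = wide.reset_index().melt(
    id_vars=["region"], var_name="quarter", value_name="mean_revenue"
)
```

The round trip does not restore the original rows, because the pivot has already averaged the two north/q1 records into one value.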
#5 Efficient memory usage
# Optimize memory usage of DataFrame columns
data['numeric_column'] = pd.to_numeric(data['numeric_column'], downcast='integer')
data['category_column'] = data['category_column'].astype('category')
# Load a subset of columns from a large dataset
selected_columns = ['column1', 'column2', 'column3']
data_subset = pd.read_csv('large_data.csv', usecols=selected_columns)
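Whether these conversions pay off depends on the data, so it can help to measure before and after. The sketch below uses an invented frame (`frame`) with the same column names as above and compares `memory_usage(deep=True)`.

```python
import pandas as pd

# Hypothetical data: small integers and a text column with few distinct values.
frame = pd.DataFrame(
    {
        "numeric_column": list(range(1000)),
        "category_column": ["red", "green", "blue", "green"] * 250,
    }
)
before = frame.memory_usage(deep=True).sum()

# downcast="integer" picks the smallest signed integer type that fits the values.
frame["numeric_column"] = pd.to_numeric(frame["numeric_column"], downcast="integer")
# category stores each distinct string once plus compact integer codes.
frame["category_column"] = frame["category_column"].astype("category")

after = frame.memory_usage(deep=True).sum()
print(f"bytes before: {before}, after: {after}")
```

The category conversion helps most when a column has few distinct values relative to its length; for a column of mostly unique strings it can use more memory rather than less.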
These examples demonstrate more advanced techniques for handling datetime data, text data, and large datasets, for reshaping data, and for optimizing memory usage. They highlight some of the powerful features that pandas provides for complex data analysis tasks.