This cheat sheet gives a quick overview of essential and advanced Pandas operations for data manipulation. Use these snippets to load, clean, and analyze your data efficiently.
1. Import Pandas
import pandas as pd
2. Create a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
3. Read and Write CSV Files
# Read CSV
df = pd.read_csv('file.csv')
# Write to CSV
df.to_csv('output.csv', index=False)
4. Inspect Data
print(df.head()) # First 5 rows
df.info() # Data types and summary (prints directly and returns None, so no print() is needed)
5. Select Columns
age_column = df['Age']
subset = df[['Name', 'Age']]
6. Filter Rows
filtered = df[df['Age'] > 25]
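Filters often need more than one condition. The sketch below assumes a small, hypothetical three-row frame (the 'Cara' row is not part of the original example) and shows one common way to combine conditions.

```python
import pandas as pd

# Hypothetical sample data, extending the Name/Age example
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cara'], 'Age': [25, 30, 35]})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
between = df[(df['Age'] > 25) & (df['Age'] < 35)]

# isin() keeps rows whose value appears in the given list
selected = df[df['Name'].isin(['Alice', 'Cara'])]

print(between)
print(selected)
```

Python's plain `and` / `or` do not work element-wise on Series, which is why the bitwise operators are used here.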
7. Add and Remove Columns
# Add a column (the list length must match the number of rows)
df['Salary'] = [50000, 60000]
# Drop a column
df = df.drop(columns=['Salary'])
8. Sorting
# Sort by age
df_sorted = df.sort_values(by='Age', ascending=False)
9. GroupBy and Aggregation
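# Group by name and calculate the mean age
# (select the numeric column first; calling mean() on text columns raises an error in pandas 2.0+)
grouped = df.groupby('Name')['Age'].mean()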
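When several statistics are needed per group, named aggregation is one option. This is a minimal sketch; the 'Team' column and its values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'Team' is not part of the original example
df = pd.DataFrame({
    'Team': ['A', 'A', 'B'],
    'Age': [25, 35, 30],
})

# Each keyword becomes an output column: name=(source_column, function)
summary = df.groupby('Team').agg(
    mean_age=('Age', 'mean'),
    members=('Age', 'count'),
)

print(summary)
```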
10. Handle Missing Data
# Fill missing values
df['Age'] = df['Age'].fillna(0)
# Drop rows with missing values
df = df.dropna()
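Filling with 0 or dropping every incomplete row can distort results. As a hedged alternative, the sketch below fills a numeric column with its mean and drops rows only when a specific column is missing. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing name
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None],
    'Age': [25.0, np.nan, 35.0],
})

# Replace missing ages with the mean of the known ages
filled = df.assign(Age=df['Age'].fillna(df['Age'].mean()))

# Drop a row only if 'Name' is missing; other gaps are tolerated
named = df.dropna(subset=['Name'])

print(filled)
print(named)
```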
11. Merge and Join
# Merge two DataFrames
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 80]})
merged = pd.merge(df1, df2, on='ID')
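By default, pd.merge performs an inner join, so rows without a match on both sides are dropped. The sketch below assumes an extra, hypothetical row (ID 3) to show how the `how` and `indicator` parameters behave with a left join.

```python
import pandas as pd

# ID 3 is a hypothetical extra row with no matching score
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Cara']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 80]})

# how='left' keeps every row of df1; unmatched scores become NaN
# indicator=True adds a '_merge' column showing where each row came from
left = pd.merge(df1, df2, on='ID', how='left', indicator=True)

print(left)
```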
12. Concatenate DataFrames
# Stack DataFrames vertically; columns missing from one frame are filled with NaN
concat = pd.concat([df1, df2], axis=0)
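Concatenation along axis=0 is most useful when the frames share the same columns. The sketch below uses two hypothetical frames (`a` and `b`) and `ignore_index=True` so the result gets a fresh 0..n-1 index instead of repeated labels.

```python
import pandas as pd

# Hypothetical frames with identical columns
a = pd.DataFrame({'ID': [1, 2], 'Score': [90, 80]})
b = pd.DataFrame({'ID': [3], 'Score': [70]})

# ignore_index=True discards the original row labels and renumbers the rows
stacked = pd.concat([a, b], ignore_index=True)

print(stacked)
```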
13. Apply Functions
# Apply a custom function
df['AgeSquared'] = df['Age'].apply(lambda x: x ** 2)
14. Pivot Table
# Create a pivot table (uses `merged` from section 11, which has the Score column)
pivot = merged.pivot_table(values='Score', index='Name', aggfunc='mean')
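A pivot table becomes more informative once a `columns` argument spreads a second category across the columns. The sketch below assumes a hypothetical 'Subject' column and made-up scores.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Name, Subject) pair
scores = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Bob', 'Bob'],
    'Subject': ['Math', 'Art', 'Math', 'Art'],
    'Score': [90, 70, 80, 60],
})

# Rows are names, columns are subjects, cells hold the mean score
pivot = scores.pivot_table(values='Score', index='Name',
                           columns='Subject', aggfunc='mean')

print(pivot)
```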
15. Iterating Over Rows
for index, row in df.iterrows():
    print(row['Name'], row['Age'])
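iterrows() is convenient but tends to be slow on large frames, because it builds a Series for every row. As a rough sketch of two common alternatives, itertuples() yields lightweight named tuples, and column arithmetic avoids the Python loop entirely. The 'AgeNextYear' column is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# itertuples() exposes each column as an attribute on the row tuple
labels = [f"{row.Name} ({row.Age})" for row in df.itertuples(index=False)]

# Vectorized arithmetic operates on the whole column at once
df['AgeNextYear'] = df['Age'] + 1

print(labels)
print(df)
```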
16. Reset Index
df = df.reset_index(drop=True)
17. Set Index
# Use Name as the index (it is no longer a regular column afterwards)
df = df.set_index('Name')
18. String Operations
# Convert names to lowercase
df['Name'] = df['Name'].str.lower()
19. Convert Data Types
# Convert Age to float
df['Age'] = df['Age'].astype(float)
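astype(float) raises an error if any value cannot be parsed. When a column may contain stray text, pd.to_numeric with errors='coerce' is one way to turn unparseable entries into NaN instead. The sample values below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw column read as text, with one invalid entry
raw = pd.Series(['25', '30', 'unknown'])

# errors='coerce' converts values that cannot be parsed into NaN
ages = pd.to_numeric(raw, errors='coerce')

print(ages)
```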
20. Save to Excel
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed
df.to_excel('output.xlsx', index=False)
21. Load Excel File
df = pd.read_excel('file.xlsx')
22. Handle Duplicate Rows
# Drop duplicates
df = df.drop_duplicates()
23. Rename Columns
df = df.rename(columns={'Name': 'FullName'})
24. Check for Null Values
null_check = df.isnull().sum()
25. Get Column Statistics
mean_age = df['Age'].mean()
sum_age = df['Age'].sum()