Python Pandas Module Cheatsheet
From MattWiki
Import Pandas Module
import pandas as pd
Creating Dataframes
From a Dictionary
df1 = pd.DataFrame({
'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
'age': [34, 28, 51]
})
From a List
df2 = pd.DataFrame([
['John Smith', '123 Main St.', 34],
['Jane Doe', '456 Maple Ave.', 28],
['Joe Schmo', '789 Broadway', 51]
],
columns=['name', 'address', 'age'])
From a CSV File
df3 = pd.read_csv('sample.csv')
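The three constructors above can be checked against each other. The following is a minimal sketch: `io.StringIO` stands in for the file sample.csv, and the CSV text is a hypothetical example (the real file contents are not given here) that assumes a header row.

```python
import io
import pandas as pd

# Same data as above, built from a dictionary of columns
df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})

# Same data, built from a list of rows plus explicit column names
df2 = pd.DataFrame([
    ['John Smith', '123 Main St.', 34],
    ['Jane Doe', '456 Maple Ave.', 28],
    ['Joe Schmo', '789 Broadway', 51]
],
    columns=['name', 'address', 'age'])

# Hypothetical CSV contents; read_csv accepts a path or a file-like object
csv_text = """name,address,age
John Smith,123 Main St.,34
Jane Doe,456 Maple Ave.,28
Joe Schmo,789 Broadway,51
"""
df3 = pd.read_csv(io.StringIO(csv_text))

# Both in-memory constructions describe the same table
print(df1.equals(df2))
print(df3.columns.tolist())
```

The dictionary form is keyed by column, the list form by row; which one reads better usually depends on how the data arrives.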
Viewing Dataframes
Show top rows
print(df.head()) # print first 5 rows
print(df.head(10)) # print first 10 rows
Get Information about DataFrame Data
df.info() # prints the summary itself and returns None, so no print() is needed
Prints the number of records and, for each column, its name, data type and number of non-null values.
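The pieces of that summary are also available as separate attributes and methods. The sketch below uses a small made-up DataFrame with one missing age to show where each number comes from; the data is illustrative only.

```python
import pandas as pd

# Illustrative data with one missing value in 'age'
df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, None, 51]
})

# info() writes its report to standard output and returns None
result = df.info()

n_rows, n_cols = df.shape   # number of records and number of columns
dtypes = df.dtypes          # data type of each column
filled = df.count()         # number of non-null values per column

print(n_rows, n_cols)
print(filled)
```

Because info() returns None, wrapping it in print() adds a stray "None" line after the summary.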
Select one Column
Given a DataFrame, there are two ways to return a column:
column_way1 = df['columnname']
# attribute access only works if the column name is a valid Python identifier
# (no spaces or special characters) and does not clash with a DataFrame attribute or method
column_way2 = df.columnname
When only one column is selected, the returned value is a Series.
print(type(column_way1))
# returns:
# <class 'pandas.core.series.Series'>
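The limits of attribute access are easiest to see with awkward column names. A minimal sketch, using made-up columns: one with a space in its name and one called count, which collides with the DataFrame.count method.

```python
import pandas as pd

# Made-up columns chosen to show where attribute access breaks down
df = pd.DataFrame({
    'age': [34, 28],
    'home address': ['123 Main St.', '456 Maple Ave.'],
    'count': [1, 2],
})

# For a plain identifier both forms return the same Series
by_bracket = df['age']
by_attribute = df.age
same = by_bracket.equals(by_attribute)

# A name with a space can only be reached with brackets
spaced = df['home address']

# 'count' is also a DataFrame method, so the attribute is the method, not the column
count_attr = df.count
count_col = df['count']

print(same)
print(callable(count_attr))
print(type(count_col))
```

Bracket access works for every column name, which makes it the safer default in scripts.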
Select Multiple Columns
Selecting the columns named column3 and column7 from a DataFrame with multiple columns:
new_df = df[['column3', 'column7']]
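Note: The double set of brackets [[]] is mandatory. The inner brackets form a Python list of column names, and the outer brackets perform the selection.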
The type of the returned value from two selected columns is a DataFrame.
print(type(new_df))
# returns:
# <class 'pandas.core.frame.DataFrame'>
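The bracket count, not the number of columns, decides the return type. A short sketch with made-up data: a one-element list still yields a DataFrame, and the order of names in the list sets the column order of the result.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})

as_series = df['age']           # single brackets: Series
as_frame = df[['age']]          # list with one name: one-column DataFrame
reordered = df[['age', 'name']] # columns come back in the order listed

print(type(as_series))
print(type(as_frame))
print(reordered.columns.tolist())
```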
Select one Row
Row positions are zero-based. This returns the third row of the DataFrame as a Series whose index holds the column names:
new_df = df.iloc[2]
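As with columns, the form of the selector affects the return type. A minimal sketch with made-up data: an integer gives a Series, while a one-element list of positions keeps the result as a DataFrame.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})

row = df.iloc[2]          # Series indexed by column name
row_frame = df.iloc[[2]]  # one-row DataFrame

print(row['name'])
print(row_frame.shape)
```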
Select Multiple Rows
Row positions are zero-based, and the end position of a slice is excluded:
# rows at positions 3, 4, 5 and 6 (the 4th through 7th rows)
new_df = df.iloc[3:7]
# rows at positions 0, 1, 2 and 3 (the first four rows)
new_df = df.iloc[:4]
# the last three rows
new_df = df.iloc[-3:]
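The slice boundaries can be verified on a small frame. In this sketch the DataFrame is a made-up ten-row table whose single column, value, simply repeats each row's position.

```python
import pandas as pd

# Ten rows; 'value' equals the row position, so slices are easy to read
df = pd.DataFrame({'value': range(10)})

middle = df.iloc[3:7]       # positions 3 to 6, end excluded
first_four = df.iloc[:4]    # positions 0 to 3
last_three = df.iloc[-3:]   # positions 7 to 9

print(middle['value'].tolist())
print(first_four['value'].tolist())
print(last_three['value'].tolist())
```

The slices follow the same rules as Python list slicing.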
Select Multiple Rows with logical statement
new_df = df[df.a_column == 'desired value']
new_df = df[df.a_column != 'desired value']
new_df = df[(df.a_column == 'desired value') |
(df.a_column == 'another desired value')]
new_df = df[df.a_column.isin(['desired value','another desired value'])]
new_df = df[df.b_column > 5]
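Each of these selections works by building a boolean Series and passing it to the DataFrame. The sketch below uses made-up values for a_column and b_column and also shows `&` (and) and `~` (not), which follow the same pattern as `|`; each comparison needs its own parentheses because these operators bind more tightly than `==` and `>`.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    'a_column': ['x', 'y', 'z', 'x'],
    'b_column': [3, 8, 6, 9]
})

# The comparison alone yields one True/False per row
mask = df.b_column > 5

# Rows where both conditions hold
both = df[(df.a_column == 'x') & (df.b_column > 5)]

# Rows whose a_column is NOT in the list
not_in = df[~df.a_column.isin(['x', 'y'])]

print(mask.tolist())
print(both)
print(not_in)
```

Python's `and`, `or` and `not` keywords do not work element-wise on a Series, which is why the symbolic operators are used here.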