Python Pandas Module Cheatsheet

Import Pandas Module

import pandas as pd

Creating Dataframes

From a Dictionary

df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})

From a List

df2 = pd.DataFrame([
    ['John Smith', '123 Main St.', 34],
    ['Jane Doe', '456 Maple Ave.', 28],
    ['Joe Schmo', '789 Broadway', 51]
    ],
    columns=['name', 'address', 'age'])

From a CSV File

df3 = pd.read_csv('sample.csv')
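To try this without a file on disk, the same call can be pointed at in-memory text. This is a minimal sketch: the CSV content below is a hypothetical stand-in for 'sample.csv', and it relies on read_csv accepting a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'sample.csv'
csv_text = """name,address,age
John Smith,123 Main St.,34
Jane Doe,456 Maple Ave.,28
Joe Schmo,789 Broadway,51
"""

# read_csv accepts a file path or any file-like object
df3 = pd.read_csv(io.StringIO(csv_text))

print(df3.shape)            # (number of rows, number of columns)
print(list(df3.columns))    # column names taken from the header line
```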

Viewing Dataframes

Show top lines

print(df.head())     # print the first 5 rows (the default)
print(df.head(10))   # print the first 10 rows

Get Information about DataFrame Data

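df.info()

Prints the number of records, plus the name, data type, and number of non-null values of each column. info() writes this summary itself and returns None, so wrapping it in print() only adds a stray "None" line.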
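If the summary is needed as a string (for logging, say), it can be redirected into a buffer. A small sketch, assuming a made-up two-column DataFrame with one missing value; the buf parameter is part of DataFrame.info.

```python
import io
import pandas as pd

# Illustrative data: one missing age to make the non-null counts differ
df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'age': [34, None, 51],
})

result = df.info()          # prints the summary; the return value is None

buffer = io.StringIO()
df.info(buf=buffer)         # write the summary into the buffer instead of stdout
summary = buffer.getvalue() # the summary as a plain string
```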

Select one Column

Given a DataFrame, there are two ways to return a column:

column_way1 = df['columnname']

# this only works if the column name is a valid Python identifier (no spaces or
# special characters) and does not clash with a DataFrame attribute or method
column_way2 = df.columnname

Selecting a single column returns a Series.

print(type(column_way1))

# prints:
# <class 'pandas.core.series.Series'>
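The limits of the attribute style can be seen in a short sketch. The column names 'home address' and 'count' are chosen here purely for illustration: one contains a space, the other collides with the DataFrame.count method.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [34, 28],
    'home address': ['123 Main St.', '456 Maple Ave.'],
    'count': [1, 2],
})

ages = df.age                        # works: 'age' is a valid identifier
addresses = df['home address']       # space in the name: brackets are required
counts = df['count']                 # brackets return the column
attr_is_method = callable(df.count)  # df.count is the method, not the column
```

Bracket access works for every column name, so it is the safer default in scripts.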

Select Multiple Columns

Selecting the columns named 'column3' and 'column7' from a DataFrame:

new_df = df[['column3', 'column7']]

Note: The double set of brackets [[]] is mandatory, because the inner brackets form the list of column names. Selecting with a list of columns always returns a DataFrame.

print(type(new_df))

# prints:
# <class 'pandas.core.frame.DataFrame'>
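The difference between single and double brackets shows up even with one column name. A minimal sketch using made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51],
})

as_series = df['age']          # single brackets: a Series
as_frame = df[['age']]         # list with one name: a one-column DataFrame
subset = df[['age', 'name']]   # column order follows the list, not the original

print(type(as_series).__name__, type(as_frame).__name__)
```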

Select one Row

Row positions are zero-indexed. This returns the third row of the DataFrame as a Series:

new_df = df.iloc[2]
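iloc selects by position, regardless of the index labels. The sketch below uses a made-up index (10, 20, 30) to make that visible and contrasts it with label-based loc:

```python
import pandas as pd

# Illustrative data with a non-default index
df = pd.DataFrame(
    {'name': ['John Smith', 'Jane Doe', 'Joe Schmo'], 'age': [34, 28, 51]},
    index=[10, 20, 30],
)

row_by_position = df.iloc[2]   # third row by position, returned as a Series
row_by_label = df.loc[30]      # the same row, selected by its index label
one_row_frame = df.iloc[[2]]   # a list of positions keeps the DataFrame type
```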

Select Multiple Rows

Rows are zero-indexed:

# rows at positions 3 to 6 (the end position 7 is excluded)
new_df = df.iloc[3:7]

# rows at positions 0 to 3 (the end position 4 is excluded)
new_df = df.iloc[:4]

# the last three rows
new_df = df.iloc[-3:]
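The slice boundaries can be checked on a small DataFrame. This sketch uses a made-up column 'value' holding 0 to 9, so each value equals its row position; the loc line is included only to contrast position slicing with label slicing, which includes both endpoints.

```python
import pandas as pd

# Each value equals its row position, which makes the slices easy to read
df = pd.DataFrame({'value': list(range(10))})

middle = df.iloc[3:7]       # positions 3, 4, 5, 6
first_four = df.iloc[:4]    # positions 0, 1, 2, 3
last_three = df.iloc[-3:]   # positions 7, 8, 9
by_label = df.loc[3:7]      # labels 3 through 7, end label included

print(middle['value'].tolist())
print(by_label['value'].tolist())
```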

Select Multiple Rows with logical statement

new_df = df[df.a_column == 'desired value']
new_df = df[df.a_column != 'desired value']

new_df = df[(df.a_column == 'desired value') |
            (df.a_column == 'another desired value')]

new_df = df[df.a_column.isin(['desired value','another desired value'])]


new_df = df[df.b_column > 5]
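Conditions on different columns can be combined in the same way. A short sketch with made-up data, using & for "and" and ~ for "not"; each comparison needs its own parentheses because & and | bind more tightly than == and >.

```python
import pandas as pd

df = pd.DataFrame({
    'a_column': ['x', 'y', 'x', 'z'],
    'b_column': [3, 8, 9, 6],
})

# rows where a_column is 'x' AND b_column is greater than 5
both = df[(df.a_column == 'x') & (df.b_column > 5)]

# rows where a_column is NOT 'x'
not_x = df[~(df.a_column == 'x')]

print(both)
print(not_x)
```

Python's own and, or, and not keywords do not work element-wise on a Series, which is why the operator forms are used here.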
