Python Pandas Module Cheatsheet
Revision author: Matt
Current version as of 6 March 2025, 09:00
Import Pandas Module
import pandas as pd
Creating DataFrames
From a Dictionary
df1 = pd.DataFrame({
'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
'age': [34, 28, 51]
})
Note: All columns need the same number of elements.
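A minimal sketch of what happens when the note above is violated. The shortened 'age' list is an assumption made for illustration; the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# 'name' has three entries but 'age' only two, so the columns do not line up.
try:
    pd.DataFrame({
        'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
        'age': [34, 28]
    })
    error_message = None
except ValueError as err:
    # pandas refuses to build the DataFrame and raises a ValueError
    error_message = str(err)

print(error_message)
```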
From a List
df2 = pd.DataFrame([
['John Smith', '123 Main St.', 34],
['Jane Doe', '456 Maple Ave.', 28],
['Joe Schmo', '789 Broadway', 51]
],
columns=['name', 'address', 'age'])
From a CSV File
df3 = pd.read_csv('sample.csv')
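To try this without a file on disk, the following sketch feeds CSV text to read_csv through io.StringIO. The CSV content is made up for illustration and only stands in for sample.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first line becomes the column names.
csv_text = (
    "name,address,age\n"
    "John Smith,123 Main St.,34\n"
    "Jane Doe,456 Maple Ave.,28\n"
)

# read_csv accepts a file path or any file-like object
df3 = pd.read_csv(io.StringIO(csv_text))

print(df3.shape)           # (2, 3)
print(list(df3.columns))   # ['name', 'address', 'age']
```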
Viewing DataFrames
Show top lines
print(df.head()) # print first 5 lines
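print(df.head(10)) # print first 10 lines
Get Information about DataFrame Data
df.info()
Prints the number of records and, for each column, its name, data type, and number of non-null values. info() prints its report itself and returns None, so wrapping it in print() would add a stray "None" line.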
Select one Column
Given a dataframe there are two ways to return a column:
column_way1 = df['columnname']
# this only works if the column name is a valid Python identifier (no spaces or special characters)
column_way2 = df.columnname
Selecting a single column returns a Series:
print(type(column_way1))
# returns:
# <class 'pandas.core.series.Series'>
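The sketch below compares the two access styles on a small made-up DataFrame. The column name 'home address' is an assumption chosen to show a case where only the bracket form can be used.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [34, 28],
    'home address': ['123 Main St.', '456 Maple Ave.']
})

ages_a = df['age']   # bracket access
ages_b = df.age      # attribute access, same result for a simple name

print(ages_a.equals(ages_b))   # True

# A name containing a space cannot be written as an attribute,
# so bracket access is the only option here.
addresses = df['home address']
print(type(addresses).__name__)   # Series
```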
Select Multiple Columns
Selecting columns 3 and 7 from a dataframe with multiple columns:
new_df = df[['column3', 'column7']]
Note: The double set of brackets [[]] is mandatory.
Selecting with a list of column names returns a DataFrame:
print(type(new_df))
# returns:
# <class 'pandas.core.frame.DataFrame'>
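As a small illustration on made-up data, the return type depends on the brackets rather than on the number of columns: a list with a single column name still yields a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],
    'age': [34, 28]
})

as_series = df['age']     # single brackets: a Series
as_frame = df[['age']]    # double brackets: a one-column DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.shape)             # (2, 1)
```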
Select one Row
Row positions are zero-based. This returns the third row of the DataFrame as a Series:
new_df = df.iloc[2]
Select Multiple Rows
Row positions are zero-based, and the end of a slice is excluded:
# rows at positions 3 to 6 (position 7 is excluded)
new_df = df.iloc[3:7]
# rows at positions 0 to 3 (position 4 is excluded)
new_df = df.iloc[:4]
# the last three rows
new_df = df.iloc[-3:]
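The slices can be checked on a small made-up DataFrame whose single column 'n' (a name assumed for this sketch) simply holds each row's position:

```python
import pandas as pd

# Ten rows; the value in 'n' equals the row position.
df = pd.DataFrame({'n': range(10)})

third_row = df.iloc[2]    # one row, returned as a Series
middle = df.iloc[3:7]     # positions 3, 4, 5, 6
first_four = df.iloc[:4]  # positions 0, 1, 2, 3
last_three = df.iloc[-3:] # positions 7, 8, 9

print(third_row['n'])            # 2
print(middle['n'].tolist())      # [3, 4, 5, 6]
print(first_four['n'].tolist())  # [0, 1, 2, 3]
print(last_three['n'].tolist())  # [7, 8, 9]
```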
Select Multiple Rows with Logical Statement
Comparing against single values:
new_df = df[df.a_column == 'desired value']
new_df = df[df.a_column != 'desired value']
new_df = df[(df.a_column == 'desired value') |
(df.a_column == 'another desired value')]
new_df = df[df.b_column > 5]
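Conditions on different columns can be combined as well. The sketch below uses made-up data; note that pandas uses & and | here rather than Python's and/or, and that each comparison needs its own parentheses because & and | bind more tightly than == and >.

```python
import pandas as pd

df = pd.DataFrame({
    'a_column': ['x', 'y', 'x', 'z'],
    'b_column': [3, 8, 9, 1]
})

# both conditions must hold
both = df[(df.a_column == 'x') & (df.b_column > 5)]

# at least one condition must hold
either = df[(df.a_column == 'x') | (df.b_column > 5)]

print(both.index.tolist())    # [2]
print(either.index.tolist())  # [0, 1, 2]
```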
Select Multiple Rows by List of Values
Comparing against a list of values:
new_df = df[df.a_column.isin(['desired value','another desired value'])]
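The opposite selection, keeping rows whose value is not in the list, can be sketched by negating the mask with the ~ operator. The data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a_column': ['x', 'y', 'x', 'z']})

# rows whose value is in the list
wanted = df[df.a_column.isin(['x', 'z'])]

# ~ inverts the boolean mask: rows whose value is NOT in the list
others = df[~df.a_column.isin(['x', 'z'])]

print(wanted.index.tolist())  # [0, 2, 3]
print(others.index.tolist())  # [1]
```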
Resetting Indices
Selecting parts of a DataFrame leaves non-consecutive indices. To correct for that:
# Creates a new DataFrame with consecutive indices and a new column holding the old indices
new_df = df.reset_index()
# Creates a new DataFrame with consecutive indices and no column for the old indices
new_df = df.reset_index(drop=True)
# Replaces the indices in place, with no extra column for the old indices
df.reset_index(inplace=True, drop=True)
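A short sketch on made-up data showing the gap a filter leaves in the index and the effect of both reset_index variants:

```python
import pandas as pd

df = pd.DataFrame({'b_column': [3, 8, 9, 1]})

# Filtering keeps the original row labels, so the index has gaps.
filtered = df[df.b_column > 5]
print(filtered.index.tolist())    # [1, 2]

# drop=True discards the old labels
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]

# without drop=True the old labels move into a column named 'index'
kept = filtered.reset_index()
print(kept.columns.tolist())      # ['index', 'b_column']
```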
Modifying DataFrames
Adding Columns
Add a new column from a list that has exactly one element per row of the existing DataFrame:
df['Quantity'] = [100, 150, 50, 35]
Add a new column that contains the same value in all rows:
df['In Stock'] = True
Adding a Calculated Column
Add a column calculated based on the contents of another column:
df['Margin'] = df.Price - df.Cost
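The subtraction is carried out row by row. A minimal sketch with made-up prices and costs:

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [10.0, 25.0, 8.0],
    'Cost': [6.0, 20.0, 8.0]
})

# Each row's Margin is that row's Price minus that row's Cost.
df['Margin'] = df.Price - df.Cost

print(df['Margin'].tolist())  # [4.0, 5.0, 0.0]
```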
Perform a Function on a Column
df['Name'] = df.Name.apply(str.upper)
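apply calls the given function once per value in the column. The sketch below, on made-up data, also passes a lambda; the 'Initial' column is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['john smith', 'jane doe']})

# built-in function applied to every value
df['Name'] = df.Name.apply(str.upper)

# a lambda works the same way; here it takes the first letter of each name
df['Initial'] = df.Name.apply(lambda name: name[0])

print(df['Name'].tolist())     # ['JOHN SMITH', 'JANE DOE']
print(df['Initial'].tolist())  # ['J', 'J']
```

For plain string operations, the vectorized accessor df.Name.str.upper() gives the same result as apply(str.upper).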