Python Pandas Module Cheatsheet: Difference between revisions

Latest revision as of 6 March 2025, 09:00

Import Pandas Module

import pandas as pd

Creating DataFrames

From a Dictionary

df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})

Note: All columns need the same number of elements.
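To illustrate the note above, here is a minimal sketch of what happens when the column lists differ in length. The dictionary `unequal` is made up for this example; pandas is expected to reject it with a ValueError.

```python
import pandas as pd

# Hypothetical data: 'age' has one element fewer than 'name'.
unequal = {
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'age': [34, 28],
}

try:
    pd.DataFrame(unequal)
    error_raised = False
except ValueError as exc:
    # pandas refuses to build a DataFrame from columns of unequal length
    error_raised = True
    print(f'Could not build DataFrame: {exc}')
```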

From a List

df2 = pd.DataFrame([
    ['John Smith', '123 Main St.', 34],
    ['Jane Doe', '456 Maple Ave.', 28],
    ['Joe Schmo', '789 Broadway', 51]
    ],
    columns=['name', 'address', 'age'])

From a CSV File

df3 = pd.read_csv('sample.csv')
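The contents of sample.csv are not shown here, so the following sketch uses an in-memory stand-in with made-up rows. `pd.read_csv` accepts a file-like object as well as a path, which keeps the example self-contained.

```python
import io

import pandas as pd

# In-memory stand-in for a file such as 'sample.csv' (contents are assumed).
csv_text = """name,address,age
John Smith,123 Main St.,34
Jane Doe,456 Maple Ave.,28
Joe Schmo,789 Broadway,51
"""

# The first line of the CSV is used as the column header by default.
df3 = pd.read_csv(io.StringIO(csv_text))

print(df3.shape)             # (rows, columns)
print(df3.columns.tolist())  # column names taken from the header line
```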

Viewing DataFrames

Show top lines

print(df.head())     # print the first 5 rows
print(df.head(10))   # print the first 10 rows

Get Information about DataFrame Data

df.info()

Prints the number of records and, for each column, its name, data type, and number of non-null values. df.info() prints this summary itself and returns None, so it does not need to be wrapped in print().

Select one Column

Given a DataFrame, there are two ways to return a column:

column_way1 = df['columnname']

# this only works if the column name contains no spaces or special characters
column_way2 = df.columnname

When only one column is selected, the returned value is a Series.

print(type(column_way1))

# returns:
# <class 'pandas.core.series.Series'>
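A small sketch comparing the two access styles. The column names are invented for illustration. It also shows one further caveat that is an addition to the text above: a column whose name matches an existing DataFrame method (here `count`) is shadowed by that method when accessed as an attribute, so bracket access is the safer default.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],
    'zip code': ['10001', '94105'],  # name contains a space
    'count': [3, 5],                 # name collides with DataFrame.count()
})

# Both styles return the same Series for a plain column name.
by_bracket = df['name']
by_attribute = df.name
same = by_bracket.equals(by_attribute)

# 'zip code' cannot be written as df.zip code, so brackets are required.
zip_codes = df['zip code']

# df.count is the built-in method, not the column; df['count'] is the column.
attribute_is_method = callable(df.count)
count_column = df['count']
```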

Select Multiple Columns

Selecting columns 3 and 7 from a dataframe with multiple columns:

new_df = df[['column3', 'column7']]

Note: The double set of brackets [[]] is mandatory: the outer pair indexes the DataFrame and the inner pair is the list of column names. The returned value is a DataFrame.

print(type(new_df))

# returns: 
# <class 'pandas.core.frame.DataFrame'>
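The bracket rule also applies to a single column: one pair of brackets gives a Series, two pairs give a one-column DataFrame. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'age': [34, 28, 51],
})

as_series = df['age']    # single brackets: Series
as_frame = df[['age']]   # double brackets: DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```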

Select one Row

DataFrames are zero-indexed. This returns the third row from the DataFrame:

new_df = df.iloc[2]
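As a sketch of what comes back from a single-row selection (data invented for illustration): `df.iloc[2]` returns the row as a Series indexed by column name, while passing a one-element list keeps the result a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'age': [34, 28, 51],
})

third_row = df.iloc[2]          # Series; its index holds the column names
print(type(third_row).__name__)
print(third_row['name'])

third_row_frame = df.iloc[[2]]  # list of positions: one-row DataFrame
print(third_row_frame.shape)
```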

Select Multiple Rows

Rows are zero-indexed:

# from the row at position 3 up to, but not including, position 7
new_df = df.iloc[3:7]

# from the first row (position 0) up to, but not including, position 4
new_df = df.iloc[:4]

# the last three rows
new_df = df.iloc[-3:]
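The slices can be checked on a small DataFrame whose values equal their row positions. This is a sketch with a made-up ten-row frame.

```python
import pandas as pd

# Ten rows; each value equals its zero-based position.
df = pd.DataFrame({'value': range(10)})

middle = df.iloc[3:7]      # positions 3, 4, 5, 6
first_four = df.iloc[:4]   # positions 0, 1, 2, 3
last_three = df.iloc[-3:]  # positions 7, 8, 9

print(middle['value'].tolist())
print(first_four['value'].tolist())
print(last_three['value'].tolist())
```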

Select Multiple Rows with Logical Statement

Comparing against single values:

new_df = df[df.a_column == 'desired value']
new_df = df[df.a_column != 'desired value']

new_df = df[(df.a_column == 'desired value') |
            (df.a_column == 'another desired value')]

new_df = df[df.b_column > 5]
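Conditions can also be combined with `&` (and) or inverted with `~` (not), in the same way as `|` above. A minimal sketch with invented data; each comparison needs its own parentheses because `&`, `|` and `~` bind more tightly than `==` and `>`.

```python
import pandas as pd

df = pd.DataFrame({
    'a_column': ['x', 'y', 'x', 'z'],
    'b_column': [3, 8, 9, 6],
})

# Rows where both conditions hold.
both = df[(df.a_column == 'x') & (df.b_column > 5)]

# Rows where the condition does not hold.
not_x = df[~(df.a_column == 'x')]

print(both)
print(not_x)
```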

Select Multiple Rows by List of Values

Comparing against a list of values:

new_df = df[df.a_column.isin(['desired value','another desired value'])]
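A short sketch of `isin` on made-up data, including the inverse selection with `~` for rows whose value is not in the list:

```python
import pandas as pd

df = pd.DataFrame({'a_column': ['red', 'green', 'blue', 'red']})
wanted = ['red', 'blue']

selected = df[df.a_column.isin(wanted)]   # rows with a value in the list
excluded = df[~df.a_column.isin(wanted)]  # rows with any other value

print(selected.index.tolist())
print(excluded.index.tolist())
```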

Resetting Indices

Selecting parts of a DataFrame leads to non-consecutive indices. To correct for that:

# Creates a new DataFrame with consecutive indices and a new column holding the old indices
new_df = df.reset_index()

# Creates a new DataFrame with consecutive indices without a column holding the old indices
new_df = df.reset_index(drop=True)

# Replaces the indices in place, without an extra column for the old indices
df.reset_index(inplace=True, drop=True)
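The effect is easiest to see right after a filter. The following sketch uses invented data and shows the index before and after resetting:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'age': [34, 28, 51],
})

older = df[df.age > 30]
print(older.index.tolist())       # [0, 2]: the gap left by the filter

with_old = older.reset_index()    # old labels are kept in a column named 'index'
print(with_old.columns.tolist())  # ['index', 'name', 'age']

clean = older.reset_index(drop=True)
print(clean.index.tolist())       # [0, 1]
```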

Modifying DataFrames

Adding Columns

Add a new column from a list that has exactly one element per row of the existing DataFrame:

df['Quantity'] = [100, 150, 50, 35]

Add a new column that contains the same value in all rows:

df['In Stock'] = True
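A self-contained sketch of both variants on a made-up three-row product table. It also shows that a list of the wrong length is rejected with a ValueError, whereas a single value is repeated for every row.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Hammer', 'Nails', 'Saw']})

df['Quantity'] = [100, 150, 50]  # one element per row
df['In Stock'] = True            # single value, repeated for every row

try:
    df['Too Short'] = [1, 2]     # only two elements for three rows
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print(f'Column not added: {exc}')

print(df)
```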

Adding a Calculated Column

Add a column calculated from the contents of other columns:

df['Margin'] = df.Price - df.Cost

Perform a Function on a Column

df['Name'] = df.Name.apply(str.upper)
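The last two operations can be tried together on a small made-up table. The helper `price_band` and its threshold are hypothetical and only illustrate that `apply` accepts any function taking one value. For plain string operations, the `.str` accessor is a common alternative to `apply`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['hammer', 'nails'],
    'Price': [12.5, 4.0],
    'Cost': [8.0, 2.5],
})

# Calculated column: element-wise difference of two columns.
df['Margin'] = df.Price - df.Cost

# Built-in function applied to every value of a column.
df['Name'] = df.Name.apply(str.upper)


def price_band(price):
    """Hypothetical helper: classify a price as 'high' or 'low'."""
    return 'high' if price >= 10 else 'low'


# User-defined function applied to every value of a column.
df['Band'] = df.Price.apply(price_band)

# Alternative for string methods: the vectorized .str accessor.
upper_vectorized = df.Name.str.upper()

print(df)
```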

