Data Science Knowledge Base

From MattWiki
Revision as of 14 December 2023, 15:27 by Matt

This article is a knowledge base covering the basics of starting a data science project.

Sources:

* openHPI Data Science Bootcamp: https://open.hpi.de/courses/datascience2023
* NumPy and pandas tutorials and reference
** https://www.w3schools.com/python/numpy/default.asp
** https://www.w3schools.com/python/pandas/default.asp

Exploratory Data Analysis

# Load numpy and pandas libraries
import numpy as np
import pandas as pd

# Read data from CSV file into a dataframe
df = pd.read_csv('911.csv')

# Show information about the columns: names, non-null counts and data types
# (info() prints its report directly and returns None, so no print() is needed)
df.info()

# Show the dataframe (pandas truncates large frames to the first and last rows and columns)
print(df)

# Show first 10 rows of dataframe
print(df.head(10))

# Describe numerical columns of the dataframe: count, mean, std, min, quartiles and max
print(df.describe())

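# Analyze columns of interest, e.g. zip, title and timeStamp:
print(df["zip"].mean())                    # arithmetic mean (of limited meaning, since ZIP codes are labels)
print(df["zip"].value_counts().head(10))   # the 10 most frequent ZIP codes
print(df["zip"].value_counts().tail(10))   # the 10 least frequent ZIP codes
print(df["zip"].nunique())                 # number of distinct ZIP codes
print(df["title"].nunique())               # number of distinct titles
print(df["timeStamp"].min())               # smallest timestamp (compared as text unless parsed as datetime)
print(df["timeStamp"].max())               # largest timestamp (compared as text unless parsed as datetime)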
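
The `min()` and `max()` calls on `timeStamp` only give a chronological range if the text values happen to sort in date order. A minimal sketch of counting missing values and parsing the column into real datetimes, assuming the column names `zip`, `title` and `timeStamp` used above; the sample rows are made up and merely stand in for `911.csv`:

```python
import pandas as pd

# Hypothetical sample rows standing in for 911.csv (values are invented)
sample = pd.DataFrame({
    "zip": [19525.0, 19446.0, None, 19401.0],
    "title": ["EMS: BACK PAINS/INJURY", "EMS: DIABETIC EMERGENCY",
              "Fire: GAS-ODOR/LEAK", "EMS: CARDIAC EMERGENCY"],
    "timeStamp": ["2015-12-10 17:40:00", "2015-12-10 17:40:00",
                  "2015-12-11 09:05:00", "2016-01-02 23:59:00"],
})

# Count missing values per column
missing = sample.isna().sum()

# Parse the text column into datetime values
sample["timeStamp"] = pd.to_datetime(sample["timeStamp"])

# Time range covered by the data, as a Timedelta
time_span = sample["timeStamp"].max() - sample["timeStamp"].min()

print(missing)
print(time_span)
```

Once the column is a datetime, differences such as `time_span` become available, which is not possible with plain strings.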


Finish the exploratory data analysis by writing a management summary of what you have learned about the dataset.
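
One possible way to prepare such a summary is to collect the key figures in one place first. The following is only a sketch: the helper `summarize`, its dictionary keys and the sample rows are hypothetical, and the column names `zip`, `title` and `timeStamp` are assumed to match the ones used above.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect a few headline figures about the dataframe (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_zip": int(df["zip"].isna().sum()),
        "unique_titles": int(df["title"].nunique()),
        # value_counts() ignores missing values by default
        "most_common_zip": df["zip"].value_counts().idxmax(),
        "first_timestamp": df["timeStamp"].min(),
        "last_timestamp": df["timeStamp"].max(),
    }

# Hypothetical sample rows standing in for 911.csv (values are invented)
sample = pd.DataFrame({
    "zip": [19401.0, 19401.0, None, 19446.0],
    "title": ["EMS: FALL VICTIM", "EMS: FALL VICTIM",
              "Fire: FIRE ALARM", "Traffic: VEHICLE ACCIDENT"],
    "timeStamp": ["2015-12-10 17:40:00", "2015-12-10 18:02:00",
                  "2015-12-11 09:05:00", "2016-01-02 23:59:00"],
})

facts = summarize(sample)
for key, value in facts.items():
    print(f"{key}: {value}")
```

The printed figures can then be turned into a few plain sentences for the summary.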