Automated EDA with Pandas Profiling

Learn how to use Pandas Profiling for automated EDA
Author

Kedar Dabhadkar

Published

June 6, 2021

Pandas Profiling generates a static report that describes the distribution of every column in a dataframe, whether categorical or continuous, and the correlations among those columns. A few months ago I also deployed this feature as a Flask app: https://data-analyzer-hpn4y2dvda-uc.a.run.app/
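To make the idea concrete, here is a rough pandas-only sketch of the kind of per-column summaries and pairwise correlations such a report collects. The toy dataframe and its column names are made up for illustration, and this is not how the library computes its report internally.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
toy_df = pd.DataFrame(
    {
        "age": [22, 38, 26, 35, 54],
        "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
        "sex": ["male", "female", "female", "female", "male"],
    }
)

# Continuous columns: count, mean, standard deviation, and quartiles.
numeric_summary = toy_df.describe()

# Categorical column: frequency of each distinct value.
category_counts = toy_df["sex"].value_counts()

# Pairwise Pearson correlations among the numeric columns.
correlations = toy_df.select_dtypes("number").corr()

print(numeric_summary)
print(category_counts)
print(correlations)
```

A profiling report bundles summaries like these for every column into one document, which saves writing them out by hand.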

import pandas as pd
from pandas_profiling import ProfileReport

Import sample data

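# Load the Titanic sample data from the Data-Analyzer repository
data_df = pd.read_csv("https://raw.githubusercontent.com/dkedar7/Data-Analyzer/master/Analyzer/titanic.csv")
# Keep a random sample of 100 rows and cast every column to string
data_df = data_df.sample(100).reset_index(drop=True).astype(str)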
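Two details of the sampling line are worth seeing in isolation: `sample` draws different rows on every run unless a seed is given, and `.astype(str)` turns every column into text. The sketch below uses a made-up dataframe and an arbitrary seed (`random_state=0`), neither of which comes from the original post.

```python
import pandas as pd

# Hypothetical toy data; the real post uses the Titanic CSV.
toy_df = pd.DataFrame(
    {
        "age": [22, 38, 26, 35],
        "fare": [7.25, 71.28, 7.92, 53.10],
        "name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    }
)

# A fixed random_state (an assumption, not in the original) makes the sample repeatable.
sample_a = toy_df.sample(3, random_state=0).reset_index(drop=True)
sample_b = toy_df.sample(3, random_state=0).reset_index(drop=True)

# Before the cast, pandas recognises two numeric columns.
numeric_before = sample_a.select_dtypes("number").columns.tolist()

# After the cast, every value is a string and no column is numeric.
as_text = sample_a.astype(str)
numeric_after = as_text.select_dtypes("number").columns.tolist()

print(numeric_before)
print(numeric_after)
```

How the report then classifies such text columns depends on the library's own type inference, so dropping the cast may be worth trying if the continuous columns do not show up as expected.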

Create the report

profile = ProfileReport(data_df, title="Pandas Profiling Report")
profile

Explore deeper

# explorative=True switches on the library's more detailed analysis mode
profile = ProfileReport(data_df, title="Pandas Profiling Report", explorative=True)
profile

Export report

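# Render the report as interactive widgets inside the notebook
profile.to_widgets()
# Save the report as a standalone HTML file
profile.to_file('report.html')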
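The export step can be wrapped in a small helper that writes both an HTML and a JSON copy of the report. This is a minimal sketch: `export_report`, its parameters, and the output file names are hypothetical, and it assumes `pandas_profiling` is installed and that `to_file` picks the output format from the file extension.

```python
from pathlib import Path

import pandas as pd


def export_report(df: pd.DataFrame, out_dir: str, title: str = "Pandas Profiling Report") -> list:
    """Write the profiling report for `df` as HTML and JSON into `out_dir`."""
    # Imported here so the helper can be defined even where the package is missing.
    from pandas_profiling import ProfileReport

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    profile = ProfileReport(df, title=title)
    paths = [target / "report.html", target / "report.json"]
    for path in paths:
        # to_file chooses the output format from the extension.
        profile.to_file(path)
    return paths
```

Calling `export_report(data_df, "reports")` would then leave both files in a `reports` folder, with the JSON copy being easier to consume from other programs than the HTML one.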

Read more

[1] https://github.com/pandas-profiling/pandas-profiling