This tutorial shows how to use pandas-profiling to profile the features of a pandas DataFrame, view the profile in a notebook, and save it to an HTML file.

Packages

This tutorial uses statsmodels, pandas, NumPy, and pandas-profiling.

Initial Import


import statsmodels.api as sm
import pandas as pd
import numpy as np

from pandas_profiling import ProfileReport

Reading the data

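The data is the flights table from the nycflights13 package in Rdatasets, loaded with the Python package statsmodels. get_rdataset downloads the data, so an internet connection is needed.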

df = sm.datasets.get_rdataset('flights', 'nycflights13').data

Feature Engineering

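Convert the times from floats or ints to hours and minutes

The time columns store clock times as HHMM numbers (for example, 517 means 5:17). Rows with missing values are dropped first, then each time column is split into separate hour and minute fields, which makes it easier to see when flights depart and arrive. Finally, the original fields are dropped because they are now redundant.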


# Drop rows with missing values so the int() conversions below cannot fail on NaN
df.dropna(inplace=True)
# Split each HHMM time into an hour part (value // 100) and a minute part (value % 100)
df['arr_hour'] = df.arr_time.apply(lambda x: int(np.floor(x/100)))
df['arr_minute'] = df.arr_time.apply(lambda x: int(x - np.floor(x/100)*100))
df['sched_arr_hour'] = df.sched_arr_time.apply(lambda x: int(np.floor(x/100)))
df['sched_arr_minute'] = df.sched_arr_time.apply(lambda x: int(x - np.floor(x/100)*100))
df['sched_dep_hour'] = df.sched_dep_time.apply(lambda x: int(np.floor(x/100)))
df['sched_dep_minute'] = df.sched_dep_time.apply(lambda x: int(x - np.floor(x/100)*100))
# Rename the existing hour and minute columns to match the naming of the new fields
df.rename(columns={'hour': 'dep_hour',
                   'minute': 'dep_minute'}, inplace=True)
# Drop the original time columns, along with time_hour and dep_delay
df.drop(columns=['time_hour', 'dep_time', 'sched_dep_time', 'arr_time', 'sched_arr_time', 'dep_delay'], inplace=True)
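
The hour and minute split above is plain integer arithmetic on HHMM values. As a minimal sketch in plain Python (the helper name split_hhmm is hypothetical and is not part of pandas or of the code in this tutorial), dividing by 100 with divmod gives both parts at once:

```python
def split_hhmm(value):
    """Split a clock time stored as an HHMM number into (hour, minute).

    Hypothetical helper for illustration; accepts ints or floats such as 517 or 517.0.
    """
    # divmod returns the quotient (hour) and the remainder (minute) in one step
    hour, minute = divmod(int(value), 100)
    return hour, minute


print(split_hhmm(517))     # (5, 17)
print(split_hhmm(2359.0))  # (23, 59)
print(split_hhmm(5))       # (0, 5)
```

On a pandas column the same arithmetic can be written with the // and % operators, for example df.arr_time // 100 and df.arr_time % 100, followed by .astype(int) if the column holds floats. This avoids calling a Python lambda once per row.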

Profile Features

profile = ProfileReport(df, title="NYC Flights Profiling Report")

Show profile with notebook widgets


profile.to_widgets()

You should see something like:

HBox(children=(FloatProgress(value=0.0, description='Summarize dataset', max=33.0, style=ProgressStyle(descrip…
HBox(children=(FloatProgress(value=0.0, description='Generate report structure', max=1.0, style=ProgressStyle(…
HBox(children=(FloatProgress(value=0.0, description='Render widgets', max=1.0, style=ProgressStyle(description…
VBox(children=(Tab(children=(Tab(children=(GridBox(children=(VBox(children=(GridspecLayout(children=(HTML(valu…

Show profile by displaying HTML in the notebook


profile.to_notebook_iframe()

Save the profile to an HTML file


profile.to_file("nyc_flights_profile.html")