This tutorial explains how to identify missing data with pyrasgo.

Packages

This tutorial uses pandas and pyrasgo.

Open a Jupyter Notebook and import the following:


import pandas as pd
import pyrasgo

Connect to Rasgo

If you haven't done so already, head over to https://docs.rasgoml.com/rasgo-docs/onboarding/initial-setup and follow the steps outlined there to create your free account. This account gives you free access to the Rasgo API, which will calculate dataframe profiles, generate feature importance scores, and produce feature explainability for your analysis. In addition, this account lets you keep access to your analysis and share it with your colleagues.


rasgo = pyrasgo.login(email='', password='')

Creating the data

For this example, we will create a dataframe with four columns of different types: text, numeric, boolean, and datetime.


df = pd.DataFrame({'A': ['text']*20,
                   'B': [1, 2.2]*10,
                   'C': [True, False]*10,
                   'D': pd.to_datetime('2020-01-01')
                  })
                  

Next, delete some of the entries to create missing data.


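# Column A (text): blank out rows 0, 1, and 10
df.iloc[0,0] = None
df.iloc[1,0] = None
df.iloc[10,0] = None
# Column B (numeric): blank out rows 5 and 7
df.iloc[5,1] = None
df.iloc[7,1] = None
# Column C (boolean): blank out rows 4, 5, 9, and 12
df.iloc[4,2] = None
df.iloc[5,2] = None
df.iloc[9,2] = None
df.iloc[12,2] = None
# Column D (datetime): blank out rows 2 and 12
df.iloc[2,3] = None
df.iloc[12,3] = None
df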
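
Before calling Rasgo, it can help to confirm with plain pandas that the deletions produced the missing values you expect. The sketch below is not part of pyrasgo. It uses a small stand-in dataframe named sample (a hypothetical name, with made-up values) so that it runs on its own. The same isna calls should apply to the df built above.

```python
import pandas as pd

# Stand-in dataframe with missing values built in directly
# (hypothetical data, smaller than the tutorial's df).
sample = pd.DataFrame({
    'A': ['text', None, 'text', None],
    'B': [1.0, 2.2, None, 1.0],
    'C': [True, False, True, None],
})

# Number of missing entries in each column
missing_counts = sample.isna().sum()

# Fraction of rows that are missing in each column
missing_fraction = sample.isna().mean()

print(missing_counts)
print(missing_fraction)
```

Running sample.isna().sum() on your own dataframe gives a per-column count that you can compare against the cells you set to None.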

Identify missing data

The function evaluate.missing_data will identify missing data in the dataframe.


missing = rasgo.evaluate.missing_data(df)
missing
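
The structure of the object returned by missing_data is not shown in this tutorial. If you also want to see which rows are incomplete, one option is a pandas-only check like the sketch below. It is an independent cross-check under assumed data, not a description of what Rasgo computes. The names sample, rows_with_missing, and incomplete_index are hypothetical.

```python
import pandas as pd

# Stand-in dataframe (hypothetical data) with two incomplete rows
sample = pd.DataFrame({
    'A': ['text', None, 'text', 'text'],
    'B': [1.0, 2.2, None, 1.0],
})

# Boolean mask: True for rows with at least one missing value
rows_with_missing = sample.isna().any(axis=1)

# Index labels of the incomplete rows
incomplete_index = sample.index[rows_with_missing].tolist()

# The incomplete rows themselves
incomplete_rows = sample[rows_with_missing]

print(incomplete_index)
print(incomplete_rows)
```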