Pandas Missing Data

This tutorial explains how to identify missing data with pandas.

Packages

This tutorial uses:

pandas

Open a Jupyter Notebook and enter the following:


import pandas as pd

Creating the data

For this example, we will create a DataFrame with four columns of different types: text, numbers, booleans and dates.


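df = pd.DataFrame({'A': ['text']*20,
                   'B': [1, 2.2]*10,
                   # object dtype so column C can later hold None (a plain bool column would have to be upcast)
                   'C': pd.Series([True, False]*10, dtype=object),
                   'D': pd.to_datetime('2020-01-01')
                  })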

Next, set some of the entries to None to create missing data.


# Column A (text)
df.iloc[0,0] = None
df.iloc[1,0] = None
df.iloc[10,0] = None
# Column B (numbers)
df.iloc[5,1] = None
df.iloc[7,1] = None
# Column C (booleans)
df.iloc[4,2] = None
df.iloc[5,2] = None
df.iloc[9,2] = None
df.iloc[12,2] = None
# Column D (dates)
df.iloc[2,3] = None
df.iloc[12,3] = None
df
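Assigning None does not store None in every column: what pandas keeps depends on the column's dtype. The minimal sketch below uses a small, hypothetical DataFrame (the column names text, number, flag and date are illustrative only) to inspect what each column holds after None is assigned. In pandas 2.x, object columns keep None, float columns store NaN and datetime columns store NaT; the result for the text column may differ in newer pandas versions that use a dedicated string dtype.

```python
import pandas as pd

# Hypothetical frame with one column of each kind used in this tutorial.
# 'flag' is created with dtype=object so it can hold None without being upcast.
small = pd.DataFrame({
    'text': ['a', 'b'],
    'number': [1.0, 2.2],
    'flag': pd.Series([True, False], dtype=object),
    'date': pd.to_datetime(['2020-01-01', '2020-01-02']),
})

# Set the first entry of every column to None.
for col in small.columns:
    small.loc[0, col] = None

# Show the dtype of each column and what was actually stored in row 0.
for col in small.columns:
    print(col, small[col].dtype, repr(small.loc[0, col]))
```

Whatever marker ends up stored, isna treats all of them as missing, which is why the next step works across every column.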

Identify missing data

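The isna method identifies missing data. It returns a DataFrame of the same shape as df, with True wherever a value is missing (None, NaN or NaT) and False everywhere else.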


missing = df.isna()
missing
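To see exactly what counts as missing, here is a minimal sketch on a hypothetical Series that mixes the three missing-value markers with ordinary values. It also shows notna, the element-wise inverse of isna. Note that 0 and the empty string are ordinary values, not missing ones.

```python
import numpy as np
import pandas as pd

# Hypothetical values: None, NaN and NaT are missing; 0 and '' are not.
values = pd.Series(['text', None, np.nan, pd.NaT, 0, ''], dtype=object)

missing = values.isna()   # True where the value counts as missing
present = values.notna()  # element-wise inverse of isna

print(missing.tolist())   # [False, True, True, True, False, False]
print(present.tolist())   # [True, False, False, False, True, True]
```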


Because True counts as 1 and False as 0, calling sum on this result gives the number of missing values in each column.


missing.sum()
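Two related summaries are often useful: the total number of missing cells, and the share of missing values in each column. Since the mean of a boolean column is the fraction of True values, missing.mean() gives the share directly. The sketch below rebuilds the tutorial's data with a hypothetical helper, build_example_df, so that it runs on its own; the helper is an assumption for illustration, not part of the tutorial.

```python
import pandas as pd


def build_example_df():
    """Hypothetical helper that rebuilds this tutorial's DataFrame and its gaps."""
    df = pd.DataFrame({'A': ['text'] * 20,
                       'B': [1, 2.2] * 10,
                       # object dtype so column C can hold None without an upcast
                       'C': pd.Series([True, False] * 10, dtype=object),
                       'D': pd.to_datetime('2020-01-01')})
    # (row, column) positions emptied in the tutorial
    gaps = [(0, 0), (1, 0), (10, 0), (5, 1), (7, 1),
            (4, 2), (5, 2), (9, 2), (12, 2), (2, 3), (12, 3)]
    for row, col in gaps:
        df.iloc[row, col] = None
    return df


df = build_example_df()
missing = df.isna()

total_missing = int(missing.sum().sum())  # every missing cell in the frame
share_missing = missing.mean()            # fraction of rows missing, per column

print(total_missing)  # 11
print(share_missing)
```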

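To select the rows that contain missing data, call the any method with axis=1. It returns True for every row that has at least one missing value, and the resulting boolean Series can be used to filter df.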


anymissing = missing.any(axis=1)
anymissing

df[anymissing]
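The same boolean Series also works in reverse. This sketch (again rebuilding the data with the hypothetical helper build_example_df) counts the missing values in each row with sum(axis=1), inverts anymissing with ~ to keep only the complete rows, and checks that the result matches dropna, whose default how='any' drops every row containing a missing value.

```python
import pandas as pd


def build_example_df():
    """Hypothetical helper that rebuilds this tutorial's DataFrame and its gaps."""
    df = pd.DataFrame({'A': ['text'] * 20,
                       'B': [1, 2.2] * 10,
                       # object dtype so column C can hold None without an upcast
                       'C': pd.Series([True, False] * 10, dtype=object),
                       'D': pd.to_datetime('2020-01-01')})
    gaps = [(0, 0), (1, 0), (10, 0), (5, 1), (7, 1),
            (4, 2), (5, 2), (9, 2), (12, 2), (2, 3), (12, 3)]
    for row, col in gaps:
        df.iloc[row, col] = None
    return df


df = build_example_df()
missing = df.isna()
anymissing = missing.any(axis=1)

per_row = missing.sum(axis=1)   # number of missing values in each row
complete = df[~anymissing]      # rows with no missing values at all
also_complete = df.dropna()     # default how='any' keeps the same rows

print(per_row[anymissing])
print(len(complete))            # 11
```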