This tutorial explains how to create a moving average in pandas using the gold and silver price data from rdatasets.

Packages

The documentation for each package used in this tutorial is linked below:

Open up a Jupyter notebook and import the following:


import statsmodels.api as sm
import pandas as pd

Create initial dataset

The data is from rdatasets imported using the Python package statsmodels.


df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()

df['date'] = pd.to_datetime(df.date)

Moving Average

To create a moving average, a rolling window first needs to be created using the pandas function rolling. Then any aggregation function, sum, mean, std, etc.


df.sort_values('date', inplace=True)
df['silver_moving_average_5'] = df['silver'].rolling(5).mean()

df.head(10).tail()

your dataframe should look like this:


	date	gold	silver	silver_moving_average_5
5	1978-01-06	100.00	227.19	226.604
6	1978-01-09	101.23	229.62	227.844
7	1978-01-10	100.95	228.97	227.670
8	1978-01-11	102.25	231.22	228.998
9	1978-01-12	100.88	227.89	228.978

Alternatively, a rolling window could be created and multiple aggregations applied to it.


rolling_5 = df.rolling(5)
df['silver_moving_average_5_two'] = rolling_5['silver'].mean()
df['silver_moving_sum_5'] = rolling_5['silver'].sum()

df.head(10).tail()

	date	gold	silver	silver_moving_average_5	silver_moving_average_5_two	silver_moving_sum_5
5	1978-01-06	100.00	227.19	226.604	226.604	1133.02
6	1978-01-09	101.23	229.62	227.844	227.844	1139.22
7	1978-01-10	100.95	228.97	227.670	227.670	1138.35
8	1978-01-11	102.25	231.22	228.998	228.998	1144.99
9	1978-01-12	100.88	227.89	228.978	228.978	1144.89

This creates a moving average based on the last five observations, but rolling can also take an offset to specify the time to use in the rolling window. For example, 5D can be used as an offset for 5 days.

First, a datetime index must be created from date.


df.set_index('date', inplace=True)

rolling_5d = df.rolling('5D')
df['silver_moving_average_5D'] = rolling_5d['silver'].mean()
df['silver_moving_sum_5D'] = rolling_5d['silver'].sum()

df.head(10).tail()

	gold	silver	silver_moving_average_5	silver_moving_average_5_two	silver_moving_sum_5	silver_moving_average_5D	silver_moving_sum_5D
date							
1978-01-06	100.00	227.19	226.604	226.604	1133.02	226.604000	1133.02
1978-01-09	101.23	229.62	227.844	227.844	1139.22	228.266667	684.80
1978-01-10	100.95	228.97	227.670	227.670	1138.35	228.593333	685.78
1978-01-11	102.25	231.22	228.998	228.998	1144.99	229.936667	689.81
1978-01-12	100.88	227.89	228.978	228.978	1144.89	229.425000	917.70