This tutorial explains how to create lag variables in pandas using gold and silver price data from rdatasets.
Packages
The documentation for each package used in this tutorial is linked below:
import statsmodels.api as sm
import pandas as pd
Create initial dataset
The data is from rdatasets imported using the Python package statsmodels.
df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()
df['date'] = pd.to_datetime(df.date)
Lag Variables
Create lag variables, using the shift function. shift(1) creates a lag of a single record, while shift(5) creates a lag of five records.
df.sort_values('date', inplace=True)
df['silver_lag_1'] = df['silver'].shift(1)
df['silver_lag_5'] = df['silver'].shift(5)
date gold silver silver_lag_1 silver_lag_5
0 1977-12-30 100.00 223.42 NaN NaN
1 1978-01-02 100.00 223.42 223.42 NaN
2 1978-01-03 100.00 229.84 223.42 NaN
3 1978-01-04 100.00 224.58 229.84 NaN
4 1978-01-05 100.00 227.99 224.58 NaN
5 1978-01-06 100.00 227.19 227.99 223.42
6 1978-01-09 101.23 229.62 227.19 223.42
7 1978-01-10 100.95 228.97 229.62 229.84
8 1978-01-11 102.25 231.22 228.97 224.58
9 1978-01-12 100.88 227.89 231.22 227.99
10 1978-01-13 102.62 232.55 227.89 227.19
11 1978-01-16 103.09 232.49 232.55 229.62
12 1978-01-17 101.78 230.18 232.49 228.97
13 1978-01-18 100.87 227.96 230.18 231.22
14 1978-01-19 101.70 228.88 227.96 227.89
15 1978-01-20 101.72 228.92 228.88 232.55
16 1978-01-23 103.62 234.00 228.92 232.49
17 1978-01-24 103.59 233.63 234.00 230.18
18 1978-01-25 103.55 234.50 233.63 227.96
19 1978-01-26 103.28 231.49 234.50 228.88
This creates a lag variable based on the prior observations, but shift can also take a time offset to specify the time to use in shift. For example, 1D and 5D can be used to lag by 1 and 5 days respectively.
First, a datetime index must be created from date.
df.set_index('date', inplace=True)
df['silver_lag_1d'] = df['silver'].shift(freq='1D')
df['silver_lag_5d'] = df['silver'].shift(freq='5D')
gold silver silver_lag_1 silver_lag_5 silver_lag_1d silver_lag_5d
date
1977-12-30 100.00 223.42 NaN NaN NaN NaN
1978-01-02 100.00 223.42 223.42 NaN NaN NaN
1978-01-03 100.00 229.84 223.42 NaN 223.42 NaN
1978-01-04 100.00 224.58 229.84 NaN 229.84 223.42
1978-01-05 100.00 227.99 224.58 NaN 224.58 NaN
1978-01-06 100.00 227.19 227.99 223.42 227.99 NaN
1978-01-09 101.23 229.62 227.19 223.42 NaN 224.58
1978-01-10 100.95 228.97 229.62 229.84 229.62 227.99
1978-01-11 102.25 231.22 228.97 224.58 228.97 227.19
1978-01-12 100.88 227.89 231.22 227.99 231.22 NaN
1978-01-13 102.62 232.55 227.89 227.19 227.89 NaN
1978-01-16 103.09 232.49 232.55 229.62 NaN 231.22
1978-01-17 101.78 230.18 232.49 228.97 232.49 227.89
1978-01-18 100.87 227.96 230.18 231.22 230.18 232.55
1978-01-19 101.70 228.88 227.96 227.89 227.96 NaN
1978-01-20 101.72 228.92 228.88 232.55 228.88 NaN
1978-01-23 103.62 234.00 228.92 232.49 NaN 227.96
1978-01-24 103.59 233.63 234.00 230.18 234.00 228.88
1978-01-25 103.55 234.50 233.63 227.96 233.63 228.92
1978-01-26 103.28 231.49 234.50 228.88 234.50 NaN