Seasonal ARIMA Models
Quick Summary
Seasonality
Seasonality is the component of a time series that represents the effects of seasonal variation - a pattern that repeats over a fixed and known period of time. For example, the airlines dataset we’ve been working through (shown below) shows seasonality every 12 observations (12 months), with different patterns in the summer and holiday times of the year than in the rest of the year.
The length of the seasonal periods is denoted as S. Therefore, seasonal factors repeat every S units of time.
As mentioned in the stationarity section, time series with seasonality are not stationary by default because the series doesn’t revert to a long-term mean. When your data has both trend and seasonality, the seasonality should be the first problem you correct when making the data stationary. We do this because making the data stationary by accounting for the seasonality might also account for the trend; the reverse is not true - removing the trend alone will not remove the seasonality. Our process for building ARIMA models is the following:
Similar to when we accounted for trend, seasonality can be handled with either a deterministic or a stochastic solution.
Deterministic - Seasonal dummy variables, Fourier transformations, predictor variables
Stochastic - Seasonal differences
Once data is made stationary, we can model with traditional and seasonal ARIMA approaches. When we forecast our data we account for the deterministic or stochastic solution above and build that into our forecasts.
Seasonal “Unit-Root” Testing
Similar to trend, we can perform statistical tests to try and evaluate whether we need to take a seasonal difference or perform more deterministic solutions. However, seasonal unit root tests have problems with large seasonal frequencies - values of S that are larger than 12. To counter these problems with seasonal unit roots we use measures of seasonal strength instead:
\[ F_S = \max{(0, 1 - \frac{Var(E_t)}{Var(S_t + E_t)})} \]
where \(E_t\) is the error component and \(S_t\) is the seasonal component. These components both come from a time series decomposition of the data. If \(F_S < 0.64\) then no seasonal differencing would be recommended, while if \(F_S \ge 0.64\) then take one seasonal difference.
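For example, plugging in the rounded variance estimates that come out of the SAS decomposition later in this section (\(Var(E_t) \approx 0.0012\) and \(Var(S_t + E_t) \approx 0.0085\)):
\[ F_S = \max\left(0, 1 - \frac{0.0012}{0.0085}\right) \approx 0.86 \ge 0.64 \]
so one seasonal difference would be recommended.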
The only downside of this approach is that if your data has any real amount of seasonality, it will simply recommend taking the seasonal difference without any consideration of deterministic solutions. Therefore, this really isn’t an approach for deciding between a deterministic and a stochastic solution for your specific problem. In practice, we typically model our data with both deterministic and stochastic solutions and compare the forecasts to see which is more accurate.
Let’s see how to evaluate this in each of our softwares!
R has this functionality built in with its unitroot_nsdiffs
option inside of the features
function. We just need to specify the variable we want to calculate this test on, which for us is the variable Passengers. The output reports the number of seasonal differences it suggests (either 1 or 0) based on the cut-off described above.
Code
train %>%
  features(Passengers, unitroot_nsdiffs)
# A tibble: 1 × 1
nsdiffs
<int>
1 1
At the time of this writing, Python doesn’t have a built-in function to calculate this. However, we can easily calculate it ourselves from the time series decomposition we have done in previous sections. Inside of our StatsForecast
function we use the MSTL
function with a frequency set to monthly data. This MSTL
function calculates the STL version of the decomposition. We then use the fit
function with the df =
option to specify our training data we want to decompose. We can then use the .fitted_[0,0].model_
functions to print out each of the above components for each observation in our training data.
Code
from statsforecast import StatsForecast
from statsforecast.models import MSTL
dcmp = StatsForecast(models = [MSTL(season_length = 12)], freq = 'M')
dcmp.fit(df = train_sf)
StatsForecast(models=[MSTL])
Code
result = dcmp.fitted_[0,0].model_
result
data trend seasonal remainder
0 34348.0 39019.258306 -4656.823649 -14.434657
1 33536.0 38883.031147 -5824.358016 477.326868
2 40578.0 38749.491951 1252.474783 576.033266
3 38267.0 38618.251802 -554.512182 203.260380
4 38249.0 38488.211305 388.631112 -627.842417
.. ... ... ... ...
202 60341.0 62515.681097 -2946.195752 771.514655
203 60791.0 62606.870904 -1594.438482 -221.432422
204 57233.0 62699.163218 -6078.039540 611.876322
205 54140.0 62792.107128 -7499.634204 -1152.472924
206 67342.0 62885.168552 4003.645894 453.185554
[207 rows x 4 columns]
Now that we have our components, we can calculate \(F_S\) by hand. We just use the var
function from numpy
to calculate the variance.
Code
import numpy as np
FS = 1 - (np.var(result['remainder'])/(np.var(result['remainder']) + np.var(result['seasonal'])))
FS
0.9140123862305708
At the time of this writing, SAS doesn’t have a built-in function to calculate this. However, we can easily calculate it ourselves from the time series decomposition we have done in previous sections. By default SAS will provide the classical decomposition. In the PROC TIMESERIES procedure, we use the data =
option to provide our training data and the plots = (series decomp)
option to ask SAS for a plot of the original data series and a plot of the decomposition. The outdecomp
option saves the dataset with teh decomposition of our time series. The ID statement is where we provide our date variable and specify our monthly seasonality with the interval = month
option. Next, in the VAR statement we provide our target variable, Passengers.
Code
proc timeseries data = work.train plots = (series decomp) outdecomp = decomp;
   id date interval = month;
   var Passengers;
run;
Now that we have our components, we can calculate \(F_S\) by hand. First, we need to combine our seasonal and error components represented in our data as SC and IC respectively. Once we combine these together in a DATA STEP we can use the PROC MEANS procedure to calculate the variance of each of them. We just use the var
option in the PROC MEANS statement and name the variables in the VAR statement. From there we can use another DATA STEP to calculate the value of \(F_S\) and print it out with the PROC PRINT procedure.
Code
data decomp;
   set decomp;
   S_E = SC + IC;
run;

proc means data = decomp var;
   var IC S_E;
run;

data calculation;
   FS = 1 - (0.0011888/0.0084828);
run;

proc print data = calculation;
run;
The tests above suggest taking a seasonal difference. However, let’s solve our problem with both a deterministic approach and stochastic approach to see which is better for our data.
Deterministic Solution
Unlike trend, there are many different approaches to accounting for seasonality deterministically. The approaches we will discuss are seasonal dummy variables, Fourier transforms, and predictor variables.
Seasonal Dummy Variables
Seasonal dummy variables are exactly what their name implies. We will use a set of dummy variables to account for the seasonal effects in our dataset. For a time series with S periods within a season, there will be S-1 dummy variables - one for each period (and one accounted for with the intercept).
Since our US airlines dataset is monthly, we will build a dummy variable for each month and pick one to not include. For example, we could build the following model:
\[ Y_t = \beta_0 + \beta_1JAN + \beta_2FEB + \cdots + \beta_{11}NOV + e_t \]
In the above model, we chose to leave out the month of December’s dummy variable since our model has an intercept. In fact, the intercept \(\beta_0\) is the average effect of December. Each of the coefficients on the respective variables measures the average difference in effect from that month and December (our reference level).
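As a quick illustration of the S-1 coding (separate from the modeling code below), here is a minimal pandas sketch; the DataFrame and column names are hypothetical, and pandas drops the first month as the reference level rather than December as in the equation above.
Code
# Minimal sketch: build S-1 = 11 monthly dummy variables with pandas.
# `df` is a hypothetical DataFrame with a 'Month' column coded 1-12.
# drop_first=True leaves one month out as the reference level captured by the intercept.
import pandas as pd

df = pd.DataFrame({'Month': list(range(1, 13)) * 2})            # two years of monthly data
dummies = pd.get_dummies(df['Month'], prefix='M', drop_first=True)
print(dummies.shape)                                             # (24, 11) -> 11 dummy columns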
Let’s see how to do this in each of our softwares!
Adding external variables to an ARIMA model in R is quite straightforward. By using the formula structure in the ARIMA
function in the model
function (from the fable
package), we can add these dummy variables to our model as well as build out an automatic ARIMA model. We specify that the Month variable in our dataset is a categorical variable with the factor
function. To prevent the model from automatically trying to take a seasonal difference we use the PDQ
function where we specify the D = 0
option.
Code
model_SD_ARIMA <- train %>%
  model(ARIMA(Passengers ~ factor(Month) + PDQ(D = 0)))
report(model_SD_ARIMA)
Series: Passengers
Model: LM w/ ARIMA(1,1,1)(0,0,2)[12] errors
Coefficients:
ar1 ma1 sma1 sma2 factor(Month)2 factor(Month)3
0.4436 -0.7882 0.2291 0.1213 -1128.2437 8309.8747
s.e. 0.1350 0.0974 0.0684 0.0697 601.3701 733.7967
factor(Month)4 factor(Month)5 factor(Month)6 factor(Month)7
5715.1870 6702.3514 9450.1273 11956.3215
s.e. 802.3318 834.1377 849.2332 854.2089
factor(Month)8 factor(Month)9 factor(Month)10 factor(Month)11
11633.4452 1025.323 5203.3424 2791.812
s.e. 850.3883 836.435 807.5957 748.587
factor(Month)12 intercept
3925.2899 114.6232
s.e. 615.1444 64.3706
sigma^2 estimated as 3535537: log likelihood=-1837.63
AIC=3709.26 AICc=3712.52 BIC=3765.84
R found the “best” model based on automatic selection using seasonal dummy variables as the ARIMA model with one autoregressive (AR) term, one regular difference, one moving average (MA) term, and two seasonal moving average (SMA) terms. Seasonal AR and MA terms are discussed below in the seasonal ARIMA section.
Adding external variables to an ARIMA model in Python is quite straightforward. First, we use the get_dummies
function from pandas
to make our Month variable into dummy variables. By using the merge
function on our training dataset we can add this new variable to our training data.
Code
d_X = {'unique_id': 1, 'ds': train.index, 'Month': usair['Month'].head(n = 207)}
X_sf = pd.DataFrame(data = d_X)

X_sf = pd.get_dummies(X_sf, columns = ['Month'])

train_sf_X = train_sf.merge(X_sf, how = 'left', on = ['unique_id', 'ds'])
Now, in the AutoARIMA
function, we can add these seasonal dummy variables to our model as well as build out an automatic ARIMA by just putting the training dataset with these new columns in the .fit
function. This means we have to be careful about which datasets we use since Python automatically assumes any additional columns (beyond unique_id, ds, and y) are predictor variables. We use the .fitted_[0][0].model_.get()
function to evaluate our model by asking for the order of the ARIMA model (arma
), the coefficients in the ARIMA model (coef
) and the AIC (aic
). To prevent the model from automatically trying to take a seasonal difference we use the D = 0
option. To make sure it knows this data has possible seasonal correlations we also use the season_length = 12
option.
Code
from statsforecast.models import AutoARIMA, ARIMA
model_SD_ARIMA = StatsForecast(models = [AutoARIMA(D = 0, season_length = 12)], freq = 'M')
model_SD_ARIMA.fit(df = train_sf_X)
StatsForecast(models=[AutoARIMA])
Code
model_SD_ARIMA.fitted_[0][0].model_.get("arma")
(1, 1, 0, 2, 12, 1, 0)
Code
model_SD_ARIMA.fitted_[0][0].model_.get("coef")
{'ar1': 0.3713252409089027, 'ma1': -0.7274856464839891, 'sma1': 0.2466963504128275, 'sma2': 0.140766068113354, 'ex_1': -5545.2314453125, 'ex_2': -6631.3447265625, 'ex_3': 2788.095458984375, 'ex_4': 234.21002197265625, 'ex_5': 1250.2069091796875, 'ex_6': 4022.0859375, 'ex_7': 6561.49462890625, 'ex_8': 6233.07861328125, 'ex_9': -4427.6884765625, 'ex_10': -241.98648071289062, 'ex_11': -2695.10693359375, 'ex_12': -1547.815673828125}
Code
model_SD_ARIMA.fitted_[0][0].model_.get("aic")
3712.022845450682
Python found the “best” model based on automatic selection using seasonal dummy variables as the ARIMA model with one autoregressive (AR) term, one regular difference, one moving average (MA) term, and two seasonal moving average (SMA) terms. Seasonal AR and MA terms are discussed below in the seasonal ARIMA section.
Adding external variables to an ARIMA model in SAS is quite straightforward. First, we use the DATA STEP to make our Month variable into dummy variables.
From there we use the PROC ARIMA procedure with this newly created dataset. With the IDENTIFY statement we give our target variable with the VAR option. We also include the CROSSCORR option with all of the dummy variables we just calculated. Unfortunately, we cannot use the automatic selection techniques here since they would only apply to the original target variable. We then use the ESTIMATE statement to define our model with only the seasonal dummy variables. The INPUT option is where we put our dummy variables into the actual model. We finish with the FORECAST statement with the LEAD option set to 0 to just give us fitted values from our training dataset. The OUT option at the very beginning of the procedure will save the model predictions and residuals, which we will use in the next PROC ARIMA.
Since the PROC ARIMA procedure can only apply automatic selection techniques to the variable in the IDENTIFY statement, which doesn’t yet account for the dummy variables, we take the residuals from our seasonal dummy variable model and put them in another PROC ARIMA procedure to get an automatic selection for our model. First, we run the residuals through a Dickey-Fuller test using the STATIONARITY option with the ADF option inside of it to see if we need to difference. After looking at the results below we see we need to take a difference, so we rerun the PROC ARIMA procedure with an additional IDENTIFY statement on the first difference of the residuals using the (1)
notation. Then we can use the MINIC option to do automatic selection.
Code
data train_sd;
   set train;
   if month=1 then Jan = 1; else Jan=0;
   if month=2 then Feb = 1; else Feb=0;
   if month=3 then Mar = 1; else Mar=0;
   if month=4 then Apr = 1; else Apr=0;
   if month=5 then May = 1; else May=0;
   if month=6 then Jun = 1; else Jun=0;
   if month=7 then Jul = 1; else Jul=0;
   if month=8 then Aug = 1; else Aug=0;
   if month=9 then Sep = 1; else Sep=0;
   if month=10 then Oct = 1; else Oct=0;
   if month=11 then Nov = 1; else Nov=0;
   if month=12 then Dec = 1; else Dec=0;
run;

proc arima data = train_sd out = train_sd_f;
   identify var = Passengers nlag = 36 crosscorr = (JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV);
   estimate input = (JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV);
   forecast lead = 0;
run;

proc arima data = train_sd_f;
   identify var = residual nlag = 36 stationarity = (adf = 5);
   identify var = residual(1) nlag = 36 minic P=(1:24) Q=(1:24);
run;
The results from the MINIC selection gave us only an ARIMA(1,1,1) model. However, there appear to be spikes in our correlation plots below at seasonal lags (explained in the section below on seasonal ARIMA).
Once we decide what model we would like to build, we can go back to the original PROC ARIMA procedure and add our autoregressive (AR) lags with the P option and our moving average (MA) terms with the Q option. We have a model with lags on the seasonal lags as well which is explained in the section below.
Code
proc arima data = train_sd out = train_sd_f;
   identify var = Passengers(1) nlag = 36 crosscorr = (JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV);
   estimate input = (JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV) p = (1)(12) q = (1)(12);
run;
There are some advantages and disadvantages to the seasonal dummy variable approach. The advantages are that the seasonal dummy variables have nice interpretability - they tell us the average impact each season has on the target variable - and that the approach is rather straightforward to implement. The main disadvantages are that if your seasonal period is especially long or complex, seasonal dummy variables become burdensome, and that the constant effect assumed for each season may not be appropriate.
Fourier Transformations
Harmonic regression using Fourier transforms is another approach to account for seasonality deterministically. Fourier showed that a series of sine and cosine terms of the right frequencies approximate periodic patterns in a data series. To do this, we add Fourier variables to a regression model to account for the seasonal pattern. The odd terms \(k=1,3,5\) etc. are accounted for with sine variables:
\[ X_{k, t} = \sin(k \times \frac{2\pi t}{S}) \]
The even terms \(k = 2, 4, 6\) etc. are accounted for with cosine variables:
\[ X_{k, t} = \cos(k \times \frac{2\pi t}{S}) \]
The goal is to find the right combination of sine and cosine terms that, when blended together, mimics the unique pattern in our data. The idea of adding these terms together to get a new unique wave is shown below:
In fact, if you add the same number of Fourier variables as you have seasonal dummy variables, you will get the same predictions. However, we typically do not need all of the Fourier variables, especially with large values of S. We need to decide how many of these values we need. One way to approach this is to build many models, while increasing the number of Fourier terms in the models and compare them all via a metric like BIC.
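To make the construction concrete, here is a minimal sketch (independent of the library-based code below) that builds the first K pairs of Fourier terms by hand for monthly data; the names S, K, and t are illustrative.
Code
# Minimal sketch: construct K pairs of Fourier terms for a season of length S = 12.
# Pair k contributes sin(2*pi*k*t/S) and cos(2*pi*k*t/S) columns.
import numpy as np
import pandas as pd

S, K = 12, 3                       # season length and number of Fourier pairs (illustrative)
t = np.arange(1, 208)              # time index, e.g. 207 monthly training observations

fourier_terms = pd.DataFrame({
    f'{name}{k}': func(2 * np.pi * k * t / S)
    for k in range(1, K + 1)
    for name, func in [('sin', np.sin), ('cos', np.cos)]
})
print(fourier_terms.head())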
Let’s see how to do this in each of our softwares!
Adding external variables to an ARIMA model in R is quite straightforward. By using the formula structure in the ARIMA
function in the model
function, we can add these fourier variables to our model as well as build out an automatic ARIMA model. The fourier
function in our formula will add the Fourier variables in pairs of sine and cosine functions with the K =
option defining how many pairs of the Fourier variables to add. To prevent the model from automatically trying to take a seasonal difference we use the PDQ
function where we specify the D = 0
option. We can use the glance
function to quickly compare all these models.
Code
model_F_ARIMA <- train %>%
  model(
    `K = 1` = ARIMA(Passengers ~ fourier(K=1) + PDQ(D = 0)),
    `K = 2` = ARIMA(Passengers ~ fourier(K=2) + PDQ(D = 0)),
    `K = 3` = ARIMA(Passengers ~ fourier(K=3) + PDQ(D = 0)),
    `K = 4` = ARIMA(Passengers ~ fourier(K=4) + PDQ(D = 0)),
    `K = 5` = ARIMA(Passengers ~ fourier(K=5) + PDQ(D = 0)),
    `K = 6` = ARIMA(Passengers ~ fourier(K=6) + PDQ(D = 0))
  )

glance(model_F_ARIMA)
# A tibble: 6 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 K = 1 5041124. -1887. 3785. 3786. 3805. <cpl [13]> <cpl [1]>
2 K = 2 5086926. -1887. 3789. 3790. 3816. <cpl [13]> <cpl [1]>
3 K = 3 4296144. -1866. 3753. 3754. 3786. <cpl [24]> <cpl [1]>
4 K = 4 4327980. -1866. 3756. 3757. 3796. <cpl [24]> <cpl [1]>
5 K = 5 3689834. -1843. 3718. 3720. 3771. <cpl [0]> <cpl [27]>
6 K = 6 3535545. -1838. 3709. 3713. 3766. <cpl [1]> <cpl [25]>
Based on the results above, the model with all of the Fourier terms has the lowest values of AIC, AICc, and BIC. This model is the same as the model that we had with the seasonal dummy variables. We can see a plot of these different models below.
Adding external variables to an ARIMA model in Python is quite straightforward. First, we use the fourier
function from the utilsforecast.feature_engineering
package to make our Fourier variables. We need to specify the length of the season as well as how many pairs of sine and cosine terms to be created with the k =
option.
Code
from utilsforecast.feature_engineering import fourier

train_sf_f, test_sf_f = fourier(train_sf, freq='ME', season_length = 12, k = 6, h = 12)
Now that we have created our training data we can use a for
loop to try out seven models with zero up to six pairs of Fourier variables. We can compare these models based on AIC.
Code
aic = np.array([])

for i in range(7):
    model_F_ARIMA = StatsForecast(models = [AutoARIMA(season_length = 12, D = 0)], freq = 'M')
    model_F_ARIMA.fit(df = train_sf_f.iloc[:, 0:3 + 2*i])
    aic = np.append(aic, model_F_ARIMA.fitted_[0][0].model_.get("aic"))
StatsForecast(models=[AutoARIMA])
StatsForecast(models=[AutoARIMA])
StatsForecast(models=[AutoARIMA])
StatsForecast(models=[AutoARIMA])
StatsForecast(models=[AutoARIMA])
StatsForecast(models=[AutoARIMA])
StatsForecast(models=[AutoARIMA])
Code
print(aic)
[3803.94123734 3814.96632136 3790.46280545 3780.93259849 3742.63203825
3720.40999863 3711.34444606]
Based on the results above, the model with all of the Fourier terms has the lowest AIC. This model is the same as the model that we had with the seasonal dummy variables. We can build this final model below to see what the automatic ARIMA model is with these Fourier terms.
Code
model_F_ARIMA = StatsForecast(models = [AutoARIMA(season_length = 12, D = 0)], freq = 'M')
model_F_ARIMA.fit(df = train_sf_f)
StatsForecast(models=[AutoARIMA])
Code
model_F_ARIMA.fitted_[0][0].model_.get("arma")
(1, 1, 0, 2, 12, 1, 0)
Code
model_F_ARIMA.fitted_[0][0].model_.get("coef")
{'ar1': 0.37350534534395297, 'ma1': -0.730986579162173, 'sma1': 0.24993541027892932, 'sma2': 0.13427170126572247, 'ex_1': -1249.3470045450085, 'ex_2': 303.7786961746169, 'ex_3': -2557.2939268712653, 'ex_4': 388.5462446133606, 'ex_5': 2282.5500531224534, 'ex_6': -6944314.793335438, 'ex_7': -4354.82219970625, 'ex_8': 657.9607440973632, 'ex_9': 1270.699975324329, 'ex_10': 202.6587632324081, 'ex_11': 285.76890370837646, 'ex_12': 413.6526362558375}
Code
model_F_ARIMA.fitted_[0][0].model_.get("aic")
3711.3444460568417
Adding external variables to an ARIMA model in SAS is quite straightforward. First, we use the DATA STEP to make our Fourier variables by calculating them by hand.
From there we use the PROC ARIMA procedure with this newly created dataset. With the IDENTIFY statement we give our target variable with the VAR option. We also include the CROSSCORR option with all of the Fourier variables we just calculated. We then use the ESTIMATE statement to define our model with the Fourier variables in six different combinations, adding a new pair of sine and cosine terms with each new model. The INPUT option is where we put our Fourier variables into the actual model. We can add our autoregressive (AR) lags with the P option and our moving average (MA) terms with the Q option. We also include lags on the seasonal lags, which is explained in the section below.
Code
data train_f;
   set train;
   pi = constant('PI');
   x1=cos(2*pi*1*_n_/12); x2=sin(2*pi*1*_n_/12);
   x3=cos(2*pi*2*_n_/12); x4=sin(2*pi*2*_n_/12);
   x5=cos(2*pi*3*_n_/12); x6=sin(2*pi*3*_n_/12);
   x7=cos(2*pi*4*_n_/12); x8=sin(2*pi*4*_n_/12);
   x9=cos(2*pi*5*_n_/12); x10=sin(2*pi*5*_n_/12);
   x11=cos(2*pi*6*_n_/12); x12=sin(2*pi*6*_n_/12);
run;

proc arima data = train_f;
   identify var = Passengers nlag = 36 crosscorr = (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12);
   estimate input = (x1 x2) p = (1)(12) q = (1)(12);
   estimate input = (x1 x2 x3 x4) p = (1)(12) q = (1)(12);
   estimate input = (x1 x2 x3 x4 x5 x6) p = (1)(12) q = (1)(12);
   estimate input = (x1 x2 x3 x4 x5 x6 x7 x8) p = (1)(12) q = (1)(12);
   estimate input = (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10) p = (1)(12) q = (1)(12);
   estimate input = (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12) p = (1)(12) q = (1)(12);
run; quit;
The results shown are just from the model with the best AIC value.
Based on the results above, the model with all of the Fourier terms has the lowest value of AIC. This model is the same as the model that we had with the seasonal dummy variables.
There are some advantages and disadvantages of the Fourier term approach to accounting for seasonality. The main advantage is that Fourier terms can handle long and complex seasonality. In fact, for multiple seasons in a dataset, we can add more Fourier variables at different frequencies to account for this. The disadvantages of the Fourier approach are that the Fourier terms themselves are not really interpretable and that we need to use trial and error to find the “right” amount of variables to use.
Other Predictor Variables
The last common approach to accounting for seasonality in data is to use other predictor variables that have a matching seasonality and include them in the model. Modeling these variables against the target variable might account for the seasonality in our dataset. An example of this might be outdoor temperature modeling energy usage, because both have the same 24-hour cycle. This cycle might not match as well during certain times of the year, which might need to be accounted for when modeling the seasonality.
There are some advantages and disadvantages of the external predictor variable approach to accounting for seasonality. The main advantage is that these variables can handle long and complex seasonality. In fact, for multiple seasons in a dataset, we can add more predictor variables to potentially account for this. There is also the potential for nice interpretability of these variables in the model. The disadvantages of the external predictor variable approach are that we need to use trial and error to find the “right” variables to use - if suitable variables even exist for the problem.
Stochastic Solution
When a stochastic solution is best to solve seasonality, we need to take seasonal differences to account for the seasonality. A difference on a season is when we look at the difference between the current point and the same point in the previous season: \(Y_t - Y_{t-S}\). For our dataset, it can be thought of as the year over year change in our data.
Let’s look at a plot of the original data vs. the seasonally differenced data.
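A minimal sketch of how such a comparison could be produced with pandas and matplotlib is below, assuming the train DataFrame with its Passengers column used elsewhere in this chapter; .diff(12) computes \(Y_t - Y_{t-12}\).
Code
# Minimal sketch: plot the original series against its seasonal (lag-12) difference.
# Assumes the pandas DataFrame `train` (indexed by date) with a 'Passengers' column.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
train['Passengers'].plot(ax=ax1, title='Original series')
train['Passengers'].diff(12).plot(ax=ax2, title='Seasonal difference (lag 12)')
plt.tight_layout()
plt.show()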
There are some limitations to differencing. Seasonal differencing is hard to evaluate for long and complex seasons because the statistical tests for stochastic differencing typically stop at a season length of 24. Therefore, long and/or complex seasons are typically best approached with deterministic solutions. In fact, it is hard to imagine a very long difference making sense logically. For example, if you had daily data and thought you had an annual season, it is hard to imagine noon on January 26 of last year having an actual impact on noon on January 26 of this year.
Prepare for Modeling
After we remove the seasonal effect through either deterministic or stochastic approaches, we need to check whether we have stationary data for modeling. Just because we remove the seasonality doesn’t mean the data is fully stationary. For this we use the same techniques as we saw in the section on stationarity. For this example we will just use the differenced data from our stochastic solution, but the following should be done if you want to check the deterministic solutions as well.
Let’s see how to do this in our softwares!
R has this functionality built in with its unitroot_ndiffs
option inside of the features
function. We just need to specify the variable we want to calculate this test on, which for us is the seasonal difference on the variable Passengers. To get this variable we use the mutate
function to create our seasonal difference of 12 lags. The output reports the number of differences it suggests based on the unit root test.
Code
train %>%
  mutate(Pass_diff = difference(Passengers, lag = 12)) %>%
  features(Pass_diff, unitroot_ndiffs)
# A tibble: 1 × 1
ndiffs
<int>
1 0
Python has this functionality built in with its adfuller
function inside of the statsmodels.tsa.stattools
package. We just need to specify the variable we want to calculate this test on, which for us is the seasonal difference on the variable Passengers. To get this variable we use the shift
function to create our seasonal difference of 12 lags.
Code
from statsmodels.tsa.stattools import adfuller

train["Passengers_diff"] = train["Passengers"] - train["Passengers"].shift(12)
<string>:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Code
result = adfuller(train.iloc[12:207]["Passengers_diff"])
result
(-3.119380688427271, 0.025151781735149783, 15, 179, {'1%': -3.4674201432469816, '5%': -2.877826051844538, '10%': -2.575452082332012}, 3265.477674996173)
SAS has this functionality built in with its STATIONARITY option inside of the IDENTIFY statement from the PROC ARIMA procedure. We just need to specify the variable we want to calculate this test on, which for us is the seasonal difference on the variable Passengers. To get this variable we use the (12)
specification to create our seasonal difference of 12 lags.
Code
proc arima data = train;
   identify var = Passengers(12) stationarity = (adf = 5);
run;
Based on the results above, our differenced data is stationary. Once our data is made stationary by either deterministic or stochastic approaches, the remaining data or differences are modeled with seasonal ARIMA models. The key to these new seasonal ARIMA models is that there still might be effects at seasonal lags in the dataset, even though the main seasonality is accounted for.
Seasonal ARIMA Modeling
When extending the ARIMA model framework to the seasonal ARIMA framework, we add another set of terms - P, D, Q, and S. Notice how these terms are capitalized.
\[ ARIMA(p, d, q)(P, D, Q)_S \]
The terms represent the number of seasonal AR terms (\(P\)), the number of seasonal MA terms (\(Q\)), and the number of seasonal differences (\(D\)). The length of the season is still defined as \(S\). Imagine we have an \(ARIMA(1,0,1)(2,1,0)_{12}\) model. This would be the structure:
\[ Y_{t} - Y_{t-12} = W_t \]
\[ W_t = \omega + \phi_1 W_{t-1} + \phi_2 W_{t-12} + \phi_3 W_{t-24} + \theta_1 e_{t-1} + e_t \]
where \(W_t\) is the seasonal difference from \(D = 1\). The regular AR term of \(p = 1\) is represented by the first lag \(W_{t-1}\). The seasonal AR term of \(P = 2\) is represented by the first two seasonal lags \(W_{t-12}\) and \(W_{t-24}\). The regular MA term of \(q=1\) is represented as the first error lag \(e_{t-1}\).
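For reference, the same additive-form model can be written compactly with the backshift operator \(B\) defined in the multiplicative vs. additive section at the end of this chapter:
\[ (1 - \phi_1 B - \phi_2 B^{12} - \phi_3 B^{24})(1 - B^{12})Y_t = \omega + (1 + \theta_1 B)e_t \]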
Seasonal ARIMA models have the same structure and approach as typical ARIMA models with AR and MA patterns in the ACF and PACF. The main difference is that the pattern is just on the seasonal lag instead of the individual lags. For example, if you had an \(ARIMA(0,0,0)(1,0,0)_{12}\) model, then the ACF would have an exponentially decreasing pattern every 12 lags while the PACF only had a single spike at lag 12.
The pattern is the same, but it is only seen on the season since we are dealing with the seasonal lag.
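If you want to see this pattern for yourself, a small simulation sketch is below; it uses statsmodels (not the statsforecast library used elsewhere in this chapter) to generate an \(ARIMA(0,0,0)(1,0,0)_{12}\) series with an arbitrary seasonal AR coefficient of 0.8 and plot its ACF and PACF.
Code
# Minimal sketch: simulate a purely seasonal AR(1) process with S = 12 and
# inspect its ACF/PACF. Expect decaying ACF spikes at lags 12, 24, 36, ...
# and a single PACF spike at lag 12. The 0.8 coefficient is illustrative.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

ar = np.r_[1, np.zeros(11), -0.8]   # AR polynomial (1 - 0.8 B^12)
ma = np.array([1.0])                # no MA terms
y = ArmaProcess(ar, ma).generate_sample(nsample=600)

plot_acf(y, lags=48)
plot_pacf(y, lags=48)
plt.show()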
The opposite would be true for an \(ARIMA(0,0,0)(0,0,1)_{12}\) model. The PACF would have an exponentially decreasing pattern every 12 lags while the ACF only had a single spike at lag 12.
The pattern is the same, but it is only seen on the season since we are dealing with the seasonal lag. For our US airlines data we can observe the lag structure on the differenced data below.
From the above ACF and PACF plots we see a large spike in the PACF at lag 1 and an exponential decrease in the ACF within the first season. This would lend itself to a \(p = 1\) (AR term at lag 1) with \(q = 0\) (no MA term within a season). On the seasonal lags we see spikes at most of the seasons (12, 24, 36, etc.) in both plots. This kind of complicated pattern would lend itself to both a \(P =1\) (seasonal AR term at lag 12) and \(Q = 1\) (seasonal MA term at lag 12). Overall, for our data we can try the \(ARIMA(1,0,0)(1,1,1)_{12}\) model based on these findings.
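As an optional cross-check outside the three softwares covered below, a hand-specified \(ARIMA(1,0,0)(1,1,1)_{12}\) could also be fit with statsmodels’ SARIMAX class; this is only a sketch of that idea (the drift term included in the fits below is omitted here for simplicity), not the code used for the results in this chapter.
Code
# Minimal sketch: fit ARIMA(1,0,0)(1,1,1)[12] with statsmodels as a sanity check.
# Uses the Passengers training series from earlier in this chapter; no drift term.
from statsmodels.tsa.statespace.sarimax import SARIMAX

sarima_hand = SARIMAX(train['Passengers'],
                      order=(1, 0, 0),
                      seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(sarima_hand.summary())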
Let’s build this seasonal ARIMA model in each of our softwares!
In R we can build both a specific ARIMA model and automatically select an ARIMA model using the same ARIMA
function inside of the model
function. To specify the specific ARIMA model we want we use the pdq
and PDQ
functions in the formula framework to build the regular and seasonal lags respectively. In the pdq
function we specify the \(p=1\) piece along with the \(d=0\) and \(q=0\). In the PDQ
function we are specifying the \(P = 1\), \(D = 1\), and \(Q = 1\) terms. To let R build its own ARIMA model through automatic selection we just use the ARIMA
function on the Passengers variable without specifying the pdq
and PDQ
functions.
Code
model_SARIMA <- train %>%
  model(
    hand = ARIMA(Passengers ~ pdq(1,0,0) + PDQ(1,1,1)),
    auto = ARIMA(Passengers)
  )
With these models built we can investigate them one at a time. Below we use the select
function to first select the model built by hand and then view a summary of this model using the report
function. We can isolate the same model and also use the gg_tsresiduals
function to look at the residuals from this model up to lag 36 in the code below. Lastly, we look at the Ljung-Box test for white noise using the filter
function to get the model we want and the features
function with the ljung_box
option. We have 3 degrees of freedom (dof = 3
) because we have 3 terms we are estimating in our seasonal ARIMA model - \(p = 1\), \(P = 1\), and \(Q = 1\).
Code
model_SARIMA %>%
  select(hand) %>%
  report()
Series: Passengers
Model: ARIMA(1,0,0)(1,1,1)[12] w/ drift
Coefficients:
ar1 sar1 sma1 constant
0.7444 0.1721 -0.7755 319.7502
s.e. 0.0487 0.1040 0.0753 38.5404
sigma^2 estimated as 3745598: log likelihood=-1754.8
AIC=3519.61 AICc=3519.93 BIC=3535.97
Code
model_SARIMA %>%
  select(hand) %>%
  gg_tsresiduals(lag = 36)
Code
augment(model_SARIMA) %>%
  filter(.model == "hand") %>%
  features(.innov, ljung_box, lag = 36, dof = 3)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 hand 37.4 0.275
From the output above we can see the estimates of the coefficients from our model summary and the residuals from the plot. We also have statistical white noise in our residuals based on the results from the Ljung-Box test.
Let’s use the same code as above to view the results from R’s automatically selected seasonal ARIMA model.
Code
model_SARIMA %>%
  select(auto) %>%
  report()
Series: Passengers
Model: ARIMA(1,0,1)(0,1,1)[12] w/ drift
Coefficients:
ar1 ma1 sma1 constant
0.8801 -0.2962 -0.6785 179.8722
s.e. 0.0454 0.0950 0.0600 34.0147
sigma^2 estimated as 3639496: log likelihood=-1751.67
AIC=3513.34 AICc=3513.66 BIC=3529.7
Code
model_SARIMA %>%
  select(auto) %>%
  gg_tsresiduals(lag = 36)
Code
augment(model_SARIMA) %>%
  filter(.model == "auto") %>%
  features(.innov, ljung_box, lag = 36, dof = 3)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 auto 29.4 0.648
From the output above we can see the estimates of the coefficients from our model summary as well as the model R selected automatically - \(ARIMA(1,0,1)(0,1,1)_{12}\). We also have statistical white noise in our residuals based on the results from the Ljung-Box test.
Let’s forecast both of these models using the forecast
function with the h = 12
option to forecast 12 months into the future. We also plot these two models’ forecasts against each other.
Code
model_SARIMA_for <- forecast(model_SARIMA, h = 12)

model_SARIMA_for %>%
  autoplot(train) +
  autolayer(fitted(model_SARIMA), col = "blue", linetype = "dashed") +
  geom_vline(xintercept = as_date("2007-03-15"), color = "orange", linetype = "longdash")
Plot variable not specified, automatically selected `.vars = .fitted`
Let’s take both of these models to the test dataset to see which forecasts better since both of these were rather comparable on the training dataset. For this we use the accuracy
function from the fabletools
library.
Code
fabletools::accuracy(model_SARIMA_for, test)
# A tibble: 2 × 10
.model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 auto Test 807. 1339. 1229. 1.20 1.89 NaN NaN 0.446
2 hand Test 752. 1280. 1162. 1.11 1.78 NaN NaN 0.494
Based on the output above it appears that the model built based on the ACF and PACF plots forecasts better than the automatically selected model. This isn’t always the case, but it is a good example that automatic selection techniques are good starting points and that you should not be afraid to adjust them to see if you can get better models. This model was not as good as the previously built Holt-Winters exponential smoothing model on the test dataset, but again, this is not always the case.
In Python we can build both a specific ARIMA model and automatically select an ARIMA model using the same StatsForecast
function from the statsforecast
package. To specify the specific ARIMA model we want we use the order
and seasonal_order
arguments in the ARIMA
function to build the regular and seasonal lags respectively. In the order
argument we specify the \(p=1\) piece along with \(d=0\) and \(q=0\). In the seasonal_order
argument we specify the \(P = 1\), \(D = 1\), and \(Q = 1\) terms along with the season length of 12 months (season_length = 12
). We also include a drift (intercept) term in our model. To let Python build its own ARIMA model through automatic selection we just use the AutoARIMA
function while only specifying the season length and drift. Then we have to tell the model that our dataset has a monthly frequency with the freq = 'ME'
option. Lastly, we just use the .fit
function on our training dataset.
Code
model_SARIMA = StatsForecast(models = [ARIMA(order = (1, 0, 0), season_length = 12, seasonal_order = (1,1,1), include_drift = True),
                                       AutoARIMA(season_length = 12, allowdrift = True)], freq = 'ME')
model_SARIMA.fit(df = train_sf)
StatsForecast(models=[ARIMA,AutoARIMA])
With these models built we can investigate them one at a time. Below we use the .fitted_[0][0]
function to first select the model built by hand since it was the first model built. We then use the .model_.get()
function along with different inputs to view a summary of this model. Specifically we ask for the ARIMA structure (arma
), coefficients (coef
), and AICc (aicc
) from the model. Lastly, we look at the Ljung-Box test for white noise using the residuals
option to get the residuals we want and the acorr_ljungbox
function for the test itself. We have 3 degrees of freedom (model_df = 3
) because we have 3 terms we are estimating in our seasonal ARIMA model - \(p = 1\), \(P = 1\), and \(Q = 1\).
Code
model_SARIMA.fitted_[0][0].model_.get("arma")
(1, 0, 1, 1, 12, 0, 1)
Code
model_SARIMA.fitted_[0][0].model_.get("coef")
{'ar1': 0.6391833304171034, 'sar1': 0.030766491031681567, 'sma1': -0.677194898063756, 'drift': 123.67179487179486}
Code
model_SARIMA.fitted_[0][0].model_.get("aicc")
3528.3589065531787
Code
import statsmodels.api as sm

sm.stats.acorr_ljungbox(model_SARIMA.fitted_[0][0].model_.get("residuals"), lags = [36], model_df = 3)
lb_stat lb_pvalue
36 72.651693 0.000083
From the output above we can see the estimates of the coefficients from our model summary. However, based on the results from the Ljung-Box test, our residuals do not appear to be statistical white noise.
Let’s use the same code as above to view the results from Python’s automatically selected seasonal ARIMA model represented with the .fitted_[0][1]
model.
Code
model_SARIMA.fitted_[0][1].model_.get("arma")
(1, 0, 0, 2, 12, 0, 1)
Code
model_SARIMA.fitted_[0][1].model_.get("coef")
{'ar1': 0.7259352846043337, 'sma1': -0.6129986684467381, 'sma2': -0.06770550890976591}
Code
model_SARIMA.fitted_[0][1].model_.get("aicc")
3575.825387284918
Code
sm.stats.acorr_ljungbox(model_SARIMA.fitted_[0][1].model_.get("residuals"), lags = [36], model_df = 3)
lb_stat lb_pvalue
36 52.813322 0.015734
From the output above we can see the estimates of the coefficients from our model summary as well as the model Python selected automatically - \(ARIMA(1,0,0)(0,1,2)_{12}\). We also do not have statistical white noise in our residuals based on the results from the Ljung-Box test.
Let’s forecast both of these models using the forecast
function with the h = 12
option to forecast 12 months into the future. We also plot these two models’ forecasts against each other.
Code
model_SARIMA_for = model_SARIMA.forecast(df = train_sf, h = 12, level = [95])
model_SARIMA.plot(train_sf, model_SARIMA_for)
Let’s take both of these models to the test dataset to see which forecasts better since both of these were rather comparable on the training dataset.
Code
error = np.array(test['Passengers']) - np.array(model_SARIMA_for['AutoARIMA'])
MAPE = np.mean(abs(error)/test['Passengers'])*100
print("Auto. MAE =", np.mean(abs(error)), "\nAuto. MAPE =", MAPE)
Auto. MAE = 4475.8896484375
Auto. MAPE = 6.97691719464673
Code
error = np.array(test['Passengers']) - np.array(model_SARIMA_for['ARIMA'])
MAPE = np.mean(abs(error)/test['Passengers'])*100
print("Hand MAE =", np.mean(abs(error)), "\nHand MAPE =", MAPE)
Hand MAE = 1249.953125
Hand MAPE = 1.9164337911544131
Based on the output above it appears that the model built based on the ACF and PACF plots forecasts better than the automatically selected model. This isn’t always the case, but it is a good example that automatic selection techniques are good starting points and that you should not be afraid to adjust them to see if you can get better models. This model was not as good as the previously built Holt-Winters exponential smoothing model on the test dataset, but again, this is not always the case.
In SAS we can build ARIMA models in the PROC ARIMA procedure. We can look at the IDENTIFY statement to examine the target variable. Here we are specifying the seasonal difference of the target variable using the (12)
at the end of the Passengers variable in the VAR option. The MINIC option will have SAS recommend a model based on lowest BIC. The P and Q options tell SAS the bounds on the models to test with the MINIC. When running the procedure we see SAS recommend an AR term of 1 and an MA term of 12.
Next, we use the ESTIMATE statements to build our models. The first estimate statement is the model we are developing by hand from looking at the ACF and PACF plots. In the P option we specify the \(p=1\) and \(P = 1\) lags by specifying the lags as (1)(12)
. In the Q option we specify the \(Q = 1\) by specifying the lags as (12)
. The AR terms are specified in multiplicative format and not additive format ((1, 12)
) as that is what is fit by default in R and Python. The difference between this is discussed in the section below. After the ESTIMATE statement we put a FORECAST statement with LEAD equaling 12 to forecast the next 12 months as well as the OUT option to save these forecasts.
The second estimate statement is the model we are developing automatically from the MINIC. In the P option we specify the \(p=1\) lag by specifying the lag as (1)
. In the Q option we specify the \(Q = 1\) by specifying the lags as (12)
. After the ESTIMATE statement we put a FORECAST statement with LEAD equaling 12 to forecast the next 12 months as well as the OUT option to save these forecasts.
Code
proc arima data = train;
   identify var = Passengers(12) nlag = 36 minic P = (0:12) Q = (0:12);
   estimate p = (1)(12) q = (12);
   forecast lead = 12 id = date interval = month out = work.SARIMA_hand_f;
   estimate p = 1 q = (12);
   forecast lead = 12 id = date interval = month out = work.SARIMA_auto_f;
run; quit;
From the output above we can see that the estimation of the coefficients from our model summaries. We also have statistical white noise with our residuals from both models based on the results from the Ljung-Box test plots above as well.
We need to isolate just the forecasted observations from the code above. For that we use the DATA STEP to just look at the predicted values from the two previous models beyond the March, 2007 training observations with the WHERE statement. Next, we use the MERGE statement in another DATA STEP to merge the original test dataset with the predicted values from the above model. Lastly, we calculate the absolute error and absolute percentage error for each observation with our last DATA STEP. To get the average of these calculations, we throw our test dataset into the PROC MEANS procedure where we specify these variables with the VAR statement.
Code
data SARIMA_hand_f;
   set SARIMA_hand_f;
   where date > '1mar2007'd;
   Predict_SARIMA_hand = Forecast;
   keep Predict_SARIMA_hand;
run;

data test;
   merge test SARIMA_hand_f;
run;

data SARIMA_auto_f;
   set SARIMA_auto_f;
   where date > '1mar2007'd;
   Predict_SARIMA_auto = Forecast;
   keep Predict_SARIMA_auto;
run;

data test;
   merge test SARIMA_auto_f;
run;

data test;
   set test;
   AE_SARIMA_hand = abs(Passengers - Predict_SARIMA_hand);
   APE_SARIMA_hand = abs(Passengers - Predict_SARIMA_hand)/Passengers*100;
   AE_SARIMA_auto = abs(Passengers - Predict_SARIMA_auto);
   APE_SARIMA_auto = abs(Passengers - Predict_SARIMA_auto)/Passengers*100;
run;

proc means data = test;
   var AE_SARIMA_hand APE_SARIMA_hand AE_SARIMA_auto APE_SARIMA_auto;
run;
Based on the output above it appears that the model built based on the ACF and PACF plots forecasts better than the automatically selected model. This isn’t always the case, but it is a good example that automatic selection techniques are good starting points and that you should not be afraid to adjust them to see if you can get better models. This model was not as good as the previously built Holt-Winters exponential smoothing model on the test dataset, but again, this is not always the case.
Multiplicative vs. Additive
To look at the mathematical difference between multiplicative and additive models we need to first need to define the backshift operator, \(B\). The backshift operator on a variable shifts the variable back in time. For example, \(B(Y_t) = Y_{t-1}\). This can be extended to any number of lags. For example, a two lagged backshift is \(B^2(Y_t) = B(Y_{t-1}) = Y_{t-2}\).
For additive seasonal terms the model is structured as the following:
\[ (1 - \phi_1 B - \phi_2B^{12})Y_t = e_t \]
This can be expanded in the following way:
\[ Y_t -\phi_1 B(Y_t) - \phi_2 B^{12}(Y_t) = e_t \]
\[ Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-12} = e_t \]
\[ Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-12} + e_t \]
However, with multiplicative seasonal terms the model is structured differently:
\[ (1 - \phi_1B)(1 - \phi_2 B^{12})Y_t = e_t \]
This can be expanded in the following way:
\[ (1 - \phi_1 B - \phi_2 B^{12} + \phi_1 \phi_2 B^{13})Y_t = e_t \]
\[ Y_t - \phi_1 B(Y_t) - \phi_2 B^{12}(Y_t) + \phi_1 \phi_2 B^{13}(Y_t) = e_t \]
\[ Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-12} + \phi_1 \phi_2 Y_{t-13} = e_t \]
\[ Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-12} - \phi_1 \phi_2 Y_{t-13} + e_t \]
This effect on the \(13^{th}\) lag is the multiplied impact from the first and twelfth lag. In fact, we can see this effect when we plot the ACF and PACF plots. In a dataset with an additive seasonal effect we only see the spike on the lag. However, with a multiplicative seasonal effect we typically see what we call “counter spikes” on one or both sides of the seasonal spike as the plot shows below.
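One quick way to verify this cross term is to multiply the two backshift polynomials directly, since polynomial multiplication is just a convolution of the coefficient sequences; the coefficient values 0.5 and 0.3 below are hypothetical.
Code
# Minimal sketch: expand (1 - 0.5 B)(1 - 0.3 B^12) by convolving coefficient
# sequences (ordered by increasing powers of B). Nonzero coefficients appear at
# B^0, B^1, B^12, and B^13; the B^13 coefficient is (-0.5)*(-0.3) = +0.15,
# which enters the model equation for Y_t as -0.15 * Y_{t-13}.
import numpy as np

regular  = np.array([1.0, -0.5])             # (1 - 0.5 B)
seasonal = np.r_[1.0, np.zeros(11), -0.3]    # (1 - 0.3 B^12)
product  = np.convolve(regular, seasonal)

for power, coef in enumerate(product):
    if coef != 0:
        print(f"B^{power}: {coef:+.2f}")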
This multiplicative seasonal effect is the default in R and Python, but must be explicitly specified in SAS.