Seasonal ARIMA Models

Seasonality

Seasonality is the component of a time series that represents the effects of seasonal variation. The seasonal component is a pattern that repeats over a fixed period of time. For example, the airlines dataset we’ve been working through (shown below) shows seasonality every 12 observations (12 months), where we see different patterns in the summer and holiday times of the year compared to other times of the year. Seasonality occurs over a fixed and known period of time.

The length of the seasonal period is denoted S. Therefore, seasonal factors repeat every S units of time.

As mentioned in the stationarity section, seasonal time series are not stationary by default because they do not revert to a long-term mean. When your data has both trend and seasonality, the seasonality should be the first problem you correct when making the series stationary. We do this because accounting for the seasonality might also account for the trend; the reverse is not true. With that in mind, our process for building ARIMA models starts with the seasonality.

Similar to when we accounted for trend, seasonality can be addressed with either a deterministic or a stochastic solution:

  • Deterministic - Seasonal dummy variables, Fourier transformations, predictor variables

  • Stochastic - Seasonal differences

Once the data is made stationary, we can model it with traditional and seasonal ARIMA approaches. When we forecast, we account for the deterministic or stochastic solution above and build it back into the forecasts.

Seasonal “Unit-Root” Testing

Similar to trend, we can perform statistical tests to evaluate whether we need to take a seasonal difference or use a more deterministic solution. However, seasonal unit-root tests have problems with large seasonal frequencies (values of S larger than 12). To counter these problems we use a measure of seasonal strength instead:

\[ F_S = \max{(0, 1 - \frac{Var(E_t)}{Var(S_t + E_t)})} \]

where \(E_t\) is the error component and \(S_t\) is the seasonal component. These components both come from a time series decomposition of the data. If \(F_S < 0.64\) then no seasonal differencing would be recommended, while if \(F_S \ge 0.64\) then take one seasonal difference.

The only downside of this approach is that if your data has any real amount of seasonality, it will simply recommend taking the seasonal difference, with no notion of deterministic solutions. Therefore, this isn’t really an approach for deciding between a deterministic and a stochastic solution for your specific problem. In practice, we typically model our data with both deterministic and stochastic solutions and compare the forecasts to see which is more accurate.

Let’s see how to evaluate this in each of our softwares!
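As a minimal sketch of what this calculation looks like in Python, we can take the error and seasonal components from an STL decomposition via statsmodels (here `y` is assumed to be a monthly pandas Series with a DatetimeIndex; the 0.64 cutoff is the rule of thumb above):

```python
from statsmodels.tsa.seasonal import STL

# Decompose the (assumed) monthly series y into trend, seasonal, and remainder
stl = STL(y, period=12).fit()

# F_S = max(0, 1 - Var(E_t) / Var(S_t + E_t))
F_s = max(0, 1 - stl.resid.var() / (stl.seasonal + stl.resid).var())

# Rule of thumb from above: take one seasonal difference when F_S >= 0.64
print(f"Seasonal strength F_S = {F_s:.3f}; seasonal difference? {F_s >= 0.64}")
```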

The tests above suggest taking a seasonal difference. However, let’s solve our problem with both a deterministic approach and stochastic approach to see which is better for our data.

Deterministic Solution

Unlike trend, there are many different approaches to accounting for seasonality deterministically. The approaches we will discuss are seasonal dummy variables, Fourier transforms, and predictor variables.

Seasonal Dummy Variables

Seasonal dummy variables are exactly what their name implies. We use a set of dummy variables to account for the seasonal effects in our dataset. For a time series with S periods within a season, there will be S - 1 dummy variables: one for each period, with the remaining period accounted for by the intercept.

Since our US airlines dataset is monthly, we will build a dummy variable for each month and pick one to not include. For example, we could build the following model:

\[ Y_t = \beta_0 + \beta_1JAN + \beta_2FEB + \cdots + \beta_{11}NOV + e_t \]

In the above model, we chose to leave out the month of December’s dummy variable since our model has an intercept. In fact, the intercept \(\beta_0\) is the average effect of December. Each of the remaining coefficients measures the average difference in effect between its month and December (our reference level).
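For example, the model-implied averages for December (the reference level) and January are:

\[ \hat{Y}_{DEC} = \hat{\beta}_0, \qquad \hat{Y}_{JAN} = \hat{\beta}_0 + \hat{\beta}_1 \]

so \(\hat{\beta}_1\) is the average difference between January and December.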

Let’s see how to do this in each of our softwares!
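One possible Python sketch, again assuming a monthly pandas Series `y` with a DatetimeIndex, and using December as the reference level to match the model above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Build a month factor from the (assumed) DatetimeIndex of y
df = pd.DataFrame({"y": y})
df["month"] = df.index.month_name()

# Treatment (reference-level) coding with December left out, so the
# intercept is the average December effect
fit = smf.ols("y ~ C(month, Treatment(reference='December'))", data=df).fit()
print(fit.summary())
```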

There are some advantages and disadvantages to the seasonal dummy variable approach. The advantages are that seasonal dummy variables are nicely interpretable; they tell us the average impact each seasonal period has on the target variable. The approach is also rather straightforward to implement. The main disadvantages are that if your seasonal period is especially long or complex, seasonal dummy variables become burdensome, and the constant effect of each season that is assumed may not be appropriate.

Fourier Transformations

Harmonic regression using Fourier terms is another approach to account for seasonality deterministically. Fourier showed that a series of sine and cosine terms of the right frequencies can approximate any periodic pattern in a data series. To do this, we add pairs of Fourier variables to a regression model to account for the seasonal pattern. For each harmonic \(j = 1, 2, 3\), etc., the odd-numbered variables (\(k = 2j - 1\), so \(k = 1, 3, 5\), etc.) are sine terms:

\[ X_{2j-1, t} = \sin\left(j \times \frac{2\pi t}{S}\right) \]

The even-numbered variables (\(k = 2j\), so \(k = 2, 4, 6\), etc.) are the matching cosine terms:

\[ X_{2j, t} = \cos\left(j \times \frac{2\pi t}{S}\right) \]

The goal is to find the right combination of sine and cosine terms that, when blended together, mimics the unique pattern in our data. The idea of adding these terms together to get a new unique wave is shown below:

In fact, if you add the same number of Fourier variables as you have seasonal dummy variables, you will get the same predictions. However, we typically do not need all of the Fourier variables, especially with large values of S, so we need to decide how many of these terms to keep. One way to approach this is to build many models, increasing the number of Fourier terms in each, and compare them all via a metric like BIC.

Let’s see how to do this in each of our softwares!
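A minimal sketch of this trial-and-error search in Python, building the sine and cosine columns directly from the formulas above (the monthly series `y` is assumed, with S = 12):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

S = 12                          # monthly season length
t = np.arange(1, len(y) + 1)    # time index for the assumed series y

def fourier_terms(K):
    """Sine/cosine pair for each harmonic j = 1, ..., K."""
    cols = {}
    for j in range(1, K + 1):
        cols[f"sin{j}"] = np.sin(2 * np.pi * j * t / S)
        cols[f"cos{j}"] = np.cos(2 * np.pi * j * t / S)
    return pd.DataFrame(cols, index=y.index)

# Increase the number of Fourier pairs and compare models via BIC
# (stop before K = S/2, where the sine column is identically zero)
for K in range(1, S // 2):
    X = sm.add_constant(fourier_terms(K))
    fit = sm.OLS(y, X).fit()
    print(f"K = {K} pairs: BIC = {fit.bic:.1f}")
```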

There are some advantages and disadvantages of the Fourier term approach to accounting for seasonality. The main advantage is that Fourier terms can handle long and complex seasonality; in fact, for multiple seasons in a dataset, we can add more Fourier variables at different frequencies to account for this. The disadvantages are that the Fourier terms themselves are not really interpretable and that we need to use trial and error to find the “right” number of variables to use.

Other Predictor Variables

The last common approach to accounting for seasonality in data is to find other predictor variables that share the same seasonal pattern as the target and include them in the model. Modeling these variables against the target variable might account for the seasonality in our dataset. An example of this would be outdoor temperature used to model energy usage, because both follow the same 24-hour cycle. The cycles might align better during certain times of the year than others, which may need to be accounted for when modeling the seasonality.

There are some advantages and disadvantages of the external predictor variable approach to accounting for seasonality. The main advantage is that these variables can handle long and complex seasonality; in fact, for multiple seasons in a dataset, we can add more predictor variables to potentially account for this. There is also the potential of nice interpretability for these variables in the model. The disadvantage is that we need to use trial and error to find the “right” variables, if suitable variables even exist for the problem.

Stochastic Solution

When a stochastic solution is best for solving seasonality, we take seasonal differences to account for the seasonality. A seasonal difference is the difference between the current point and the same point in the previous season: \(Y_t - Y_{t-S}\). For our monthly dataset, it can be thought of as the year-over-year change in the data.

Let’s look at a plot of the original data vs. the seasonally differenced data.
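In Python, the seasonal difference is one line with pandas; a sketch of the comparison plot (again assuming the monthly series `y`):

```python
import matplotlib.pyplot as plt

# Year-over-year (lag-12) difference: Y_t - Y_{t-12}
y_sdiff = y.diff(12).dropna()

fig, axes = plt.subplots(2, 1, sharex=True)
y.plot(ax=axes[0], title="Original series")
y_sdiff.plot(ax=axes[1], title="Seasonally differenced (lag 12)")
plt.tight_layout()
plt.show()
```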

There are some limitations to differencing. Seasonal differencing is hard to evaluate for long and complex seasons because the statistical tests for stochastic differencing typically only cover season lengths up to 24. Therefore, long and/or complex seasons are usually best approached with deterministic solutions. In fact, it is hard to imagine a very long seasonal difference making sense logically. For example, if you had daily data and suspected an annual season, it is hard to imagine noon on January 26 of last year having an actual impact on noon on January 26 of this year.

Prepare for Modeling

After we remove the seasonal effect through either deterministic or stochastic approaches, we need to check whether we have stationary data for modeling. Just because we removed the seasonality doesn’t mean the data is fully stationary. For this we use the same techniques as in the section on stationarity. For this example we will just use the differenced data from our stochastic solution, but the same checks should be done if you want to use the deterministic solutions as well.

Let’s see how to do this in our softwares!
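One way to run this check in Python is an augmented Dickey-Fuller test on the seasonally differenced series, a sketch of which follows (`y` is the assumed monthly series):

```python
from statsmodels.tsa.stattools import adfuller

# Seasonally differenced series from the stochastic solution
y_sdiff = y.diff(12).dropna()

# H0: a unit root remains (non-stationary); a small p-value suggests stationarity
stat, pvalue, *_ = adfuller(y_sdiff)
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```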

Based on the results above, our differenced data is stationary. Once our data is made stationary by either deterministic or stochastic approaches, the remaining data or differences are modeled with seasonal ARIMA models. The key to these new seasonal ARIMA models is that there still might be effects at seasonal lags in the dataset, even though the main seasonality is accounted for.

Seasonal ARIMA Modeling

When extending the ARIMA model framework to the seasonal ARIMA framework, we add another set of terms - P, D, Q, and S. Notice how these terms are capitalized.

\[ ARIMA(p, d, q)(P, D, Q)_S \]

The terms represent the number of seasonal AR terms (\(P\)), the number of seasonal MA terms (\(Q\)), and the number of seasonal differences (\(D\)). The length of the season is still defined as \(S\). Imagine we have an \(ARIMA(1,0,1)(2,1,0)_{12}\) model. This would be the structure:

\[ Y_{t} - Y_{t-12} = W_t \]

\[ W_t = \omega + \phi_1 W_{t-1} + \phi_2 W_{t-12} + \phi_3 W_{t-24} + \theta_1 e_{t-1} + e_t \]

where \(W_t\) is the seasonal difference from \(D = 1\). The regular AR term of \(p = 1\) is represented by the first lag \(W_{t-1}\). The seasonal AR term of \(P = 2\) is represented by the first two seasonal lags \(W_{t-12}\) and \(W_{t-24}\). The regular MA term of \(q=1\) is represented as the first error lag \(e_{t-1}\).

Seasonal ARIMA models have the same structure and approach as typical ARIMA models, with AR and MA patterns in the ACF and PACF. The main difference is that the pattern appears on the seasonal lags instead of the individual lags. For example, if you had an \(ARIMA(0,0,0)(1,0,0)_{12}\) model, the ACF would have an exponentially decreasing pattern every 12 lags while the PACF has only a single spike at lag 12.

The pattern is the same, but it is only seen on the season since we are dealing with the seasonal lag.
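A quick way to convince yourself of this seasonal AR signature is to simulate one; here is a sketch using statsmodels (the coefficient 0.8 is an arbitrary illustrative value):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate an ARIMA(0,0,0)(1,0,0)_12 process: (1 - 0.8 B^12) Y_t = e_t
ar = np.r_[1, np.zeros(11), -0.8]  # lag-polynomial coefficients of 1, B, ..., B^12
ma = np.r_[1]
np.random.seed(42)
sim = arma_generate_sample(ar, ma, nsample=600)

plot_acf(sim, lags=48)    # exponential decay at lags 12, 24, 36, ...
plot_pacf(sim, lags=48)   # single spike at lag 12
```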

The opposite is true for an \(ARIMA(0,0,0)(0,0,1)_{12}\) model: the PACF would have an exponentially decreasing pattern every 12 lags while the ACF has only a single spike at lag 12.

For our US airlines data we can observe the lag structure on the differenced data below.
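A sketch of how these plots can be generated in Python (with the assumed monthly series `y`):

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ACF and PACF of the seasonally differenced series
y_sdiff = y.diff(12).dropna()
plot_acf(y_sdiff, lags=36)
plot_pacf(y_sdiff, lags=36)
```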


From the above ACF and PACF plots we see a large spike in the PACF at lag 1 and an exponential decrease in the ACF within the first season. This lends itself to \(p = 1\) (an AR term at lag 1) with \(q = 0\) (no MA term within a season). On the seasonal lags we see spikes at most of the seasons (12, 24, 36, etc.) in both plots. This kind of complicated pattern lends itself to both \(P = 1\) (a seasonal AR term at lag 12) and \(Q = 1\) (a seasonal MA term at lag 12). Overall, for our data we can try the \(ARIMA(1,0,0)(1,1,1)_{12}\) model based on these findings.

Let’s build this seasonal ARIMA model in each of our softwares!
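In Python, one way to fit this model is statsmodels’ SARIMAX; a minimal sketch, assuming the monthly series `y`:

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# ARIMA(1,0,0)(1,1,1)_12 fit on the original series; SARIMAX applies
# the seasonal difference (D = 1) internally
model = SARIMAX(y, order=(1, 0, 0), seasonal_order=(1, 1, 1, 12))
results = model.fit()
print(results.summary())
```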

Multiplicative vs. Additive

To look at the mathematical difference between multiplicative and additive models, we first need to define the backshift operator, \(B\). The backshift operator on a variable shifts the variable back in time. For example, \(B(Y_t) = Y_{t-1}\). This can be extended to any number of lags; for example, a two-lag backshift is \(B^2(Y_t) = B(Y_{t-1}) = Y_{t-2}\).

For additive seasonal terms the model is structured as the following:

\[ (1 - \phi_1 B - \phi_2B^{12})Y_t = e_t \]

This can be factored out in the following way:

\[ Y_t -\phi_1 B(Y_t) - \phi_2 B^{12}(Y_t) = e_t \]

\[ Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-12} = e_t \]

\[ Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-12} + e_t \]

However, with multiplicative seasonal terms the model is structured differently:

\[ (1 - \phi_1B)(1 - \phi_2 B^{12})Y_t = e_t \]

This can be factored out in the following way:

\[ (1 - \phi_1 B - \phi_2 B^{12} + \phi_1 \phi_2 B^{13})Y_t = e_t \]

\[ Y_t - \phi_1 B(Y_t) - \phi_2 B^{12}(Y_t) + \phi_1 \phi_2 B^{13}(Y_t) = e_t \]

\[ Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-12} + \phi_1 \phi_2 Y_{t-13} = e_t \]

\[ Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-12} - \phi_1 \phi_2 Y_{t-13} + e_t \]

This effect on the \(13^{th}\) lag is the multiplied impact from the first and twelfth lag. In fact, we can see this effect when we plot the ACF and PACF plots. In a dataset with an additive seasonal effect we only see the spike on the lag. However, with a multiplicative seasonal effect we typically see what we call “counter spikes” on one or both sides of the seasonal spike as the plot shows below.

This multiplicative seasonal effect is the default in R and Python, but must be explicitly specified in SAS.
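You can verify the multiplicative expansion above numerically by multiplying the two lag polynomials, for example with numpy (\(\phi_1 = 0.5\) and \(\phi_2 = 0.8\) are arbitrary illustrative values, with coefficients ordered from \(B^0\) upward):

```python
import numpy as np

phi1, phi2 = 0.5, 0.8  # arbitrary AR coefficients for illustration

# Coefficients of (1 - phi1 B) and (1 - phi2 B^12), lowest power of B first
p1 = np.r_[1.0, -phi1]
p2 = np.r_[1.0, np.zeros(11), -phi2]

# Product: 1 - phi1 B - phi2 B^12 + phi1*phi2 B^13
product = np.polymul(p1, p2)
print(product[[0, 1, 12, 13]])  # -> [ 1.  -0.5 -0.8  0.4]
```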