ARIMA Models

Quick Summary

AR and MA Relationship

A convenient property of autoregressive (AR) models and moving average (MA) models is that each can be rewritten in terms of the other. In the presence of stationarity, AR models can be mathematically represented as infinite MA models. Similarly, when the MA model is invertible, MA models can be mathematically represented as infinite AR models.

Let’s look at one of these scenarios. Recall the recursive calculation for the AR model from the section on autoregressive models, with the most basic case of only one lagged value of \(Y_t\), called the AR(1) model:

\[ Y_t = \omega + \phi Y_{t-1} + e_t \]

where \(e_t\) is the error remaining in the model, assumed to be white noise as defined in the previous section on stationarity. With the AR(1) model, this relationship between \(t\) and \(t-1\) holds for every one-period step across the dataset. Therefore, we can recursively solve for \(Y_t\). We do this because we know that the equation for \(Y_{t-1}\) is:

\[ Y_{t-1} = \omega + \phi Y_{t-2} + e_{t-1} \]

By plugging this equation into the original equation of \(Y_t\) we have:

\[ Y_t = \omega + \phi (\omega + \phi Y_{t-2} + e_{t-1}) + e_t \]

\[ Y_t = \omega^* + \phi^2 Y_{t-2} + \phi e_{t-1} + e_t \]

where \(\omega^* = \omega + \phi \omega\) simply collects the constant terms.

We also know the equation for \(Y_{t-2}\) is the following:

\[ Y_{t-2} = \omega + \phi Y_{t-3} + e_{t-2} \]

By plugging this equation into the recursive solution of \(Y_t\) above we get:

\[ Y_t = \omega^* + \phi^2 (\omega + \phi Y_{t-3} + e_{t-2}) + \phi e_{t-1} + e_t \]

\[ Y_t = \omega^{**} + \phi^3 Y_{t-3} + \phi^2 e_{t-2} + \phi e_{t-1} + e_t \]

where \(\omega^{**} = \omega^* + \phi^2 \omega\).

We can continue this process until we get back to the first data point at \(t=1\) in our dataset.

\[ Y_t = \frac{\omega}{1 - \phi} + \phi^{t-1} Y_1 + \phi^{t-2} e_2 + \phi^{t-3} e_3 + \cdots + e_t \]

Strictly, the constant term here is the geometric sum \(\omega (1 + \phi + \cdots + \phi^{t-2})\), which converges to \(\omega / (1 - \phi)\) when \(|\phi| < 1\) - another place where stationarity matters.
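
As a sanity check, here is a minimal sketch (assuming illustrative values \(\omega = 2\), \(\phi = 0.8\), and an arbitrary starting point, none of which come from the text) that simulates an AR(1) path and confirms the unrolled expression matches the recursion:

```python
import numpy as np

np.random.seed(1)
omega, phi, t = 2.0, 0.8, 30  # illustrative values

# Simulate an AR(1) path forward from Y_1, keeping the errors.
e = np.random.standard_normal(t + 1)
y = np.empty(t + 1)
y[1] = 5.0  # arbitrary starting value Y_1
for s in range(2, t + 1):
    y[s] = omega + phi * y[s - 1] + e[s]

# Unrolled form: geometric sum of omega, phi^(t-1) * Y_1, and the
# weighted errors phi^(t-s) * e_s for s = 2, ..., t.
closed = (omega * sum(phi ** j for j in range(t - 1))
          + phi ** (t - 1) * y[1]
          + sum(phi ** (t - s) * e[s] for s in range(2, t + 1)))

print(np.isclose(y[t], closed))  # True
```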

Instead of focusing on the \(Y_1\) value in the expansion above, let’s focus on all of the terms that come after it. Notice that those terms form a large moving average model built from the \(t-1\) error terms \(e_2, \ldots, e_t\). If we were to imagine an infinitely long dataset, then this would be an MA(\(\infty\)) model.
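
We can check this representation numerically. Below is a minimal sketch using the arma2ma helper in statsmodels, which computes the MA weights implied by an ARMA specification; the value \(\phi = 0.8\) is just an illustrative choice:

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

phi = 0.8  # illustrative AR(1) coefficient; any |phi| < 1 works

# statsmodels uses lag-polynomial coefficients: [1, -phi] on the AR side
# and [1] on the MA side (no MA terms in an AR(1)).
psi = arma2ma(ar=np.array([1.0, -phi]), ma=np.array([1.0]), lags=10)

# The MA(infinity) weights are phi^j, exactly as the recursion showed.
print(psi)                                      # [1, 0.8, 0.64, 0.512, ...]
print(np.allclose(psi, phi ** np.arange(10)))   # True
```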

A similar calculation can be done in reverse: as long as the MA model is invertible (\(|\theta| < 1\)), an MA(1) model can be written as an infinitely long AR model. In fact, this is why the partial autocorrelation function (PACF) of an MA(1) model has exponentially decreasing correlations as the lags increase.
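
The reverse direction can be checked the same way with statsmodels’ arma2ar, again with an illustrative \(\theta = 0.8\):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ar

theta = 0.8  # illustrative MA(1) coefficient; invertibility needs |theta| < 1

# [1] on the AR side (no AR terms) and [1, theta] on the MA side.
pi = arma2ar(ar=np.array([1.0]), ma=np.array([1.0, theta]), lags=10)

# pi is the AR(infinity) lag polynomial [1, -theta, theta^2, ...]; the implied
# coefficients on lagged Y are -pi[1:], shrinking geometrically in magnitude -
# the same geometric decay the PACF of an MA(1) displays.
print(pi)
print(np.allclose(pi, (-theta) ** np.arange(10)))  # True
```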

ARIMA Models

There is nothing preventing both AR and MA terms from being in the model simultaneously. These “mixed” models are typically used to help reduce the number of parameters needed for good estimation in the model. The most basic blend of these models has only one lag of each - the ARMA(1,1) model:

\[ Y_t = \omega + \phi Y_{t-1} + e_t + \theta e_{t-1} \]

Where is the “I” in ARIMA though? The “I” in the ARIMA model stands for integrated. It refers to the process of making your data stationary through differencing. As mentioned in the stationarity section, stationarity means that the distribution of the data depends only on the differences in time, not the location in time. These pieces - AR, I, and MA - make up the full ARIMA(p,d,q) model. The p stands for the number of AR terms in the model. The q stands for the number of MA terms. The d stands for the number of first differences required to make the data stationary.

Let’s look at an example of the ARIMA(1,1,1) model:

\[ W_t = Y_t - Y_{t-1} \]

\[ W_t = \omega + \phi W_{t-1} + \theta e_{t-1} + e_t \]

The \(W_t\) term is the first difference of the original data, \(d = 1\). The \(W_{t-1}\) term is the autoregressive piece of the model, \(p = 1\). Lastly, the \(e_{t-1}\) term is the moving average piece of the model, \(q = 1\).
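
As a sketch of how this looks in practice (assuming statsmodels and made-up coefficients \(\phi = 0.5\), \(\theta = 0.3\)), fitting an ARIMA(1,1,1) to \(Y_t\) should give nearly the same estimates as fitting an ARMA(1,1) to the first differences \(W_t\):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(0)

# Simulate W_t ~ ARMA(1,1) with illustrative phi = 0.5, theta = 0.3, then
# integrate (cumulative sum) so the level series Y_t needs one difference.
w = ArmaProcess(ar=[1, -0.5], ma=[1, 0.3]).generate_sample(nsample=500, burnin=100)
y = np.cumsum(w)

# ARIMA(1,1,1) on the levels vs. ARMA(1,1) on the differences
# (trend='n' matches the no-constant default statsmodels uses when d > 0).
fit_levels = ARIMA(y, order=(1, 1, 1)).fit()
fit_diffs = ARIMA(np.diff(y), order=(1, 0, 1), trend="n").fit()

print(fit_levels.params)  # ar.L1, ma.L1, sigma2
print(fit_diffs.params)   # nearly identical estimates
```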

Just like with the individual AR and MA models, we can calculate the autocorrelation function (ACF) and partial autocorrelation function (PACF). However, these mixed models have a much more complicated pattern since they combine the two patterns discussed in the AR and MA model sections. This means that both the ACF and PACF tail off exponentially.

Let’s imagine our data follows the following ARMA(1,1) process:

\[ Y_t = 0 + 0.8 Y_{t-1} + 0.8 e_{t-1} + e_t \]

The following plots would be the ACF and PACF of data that follows the above structure.
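
Since the exact plots are not reproduced here, below is a minimal sketch (assuming statsmodels and matplotlib) that simulates data from this process and draws its ACF and PACF:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

np.random.seed(42)

# Y_t = 0.8*Y_{t-1} + 0.8*e_{t-1} + e_t in lag-polynomial form:
# AR side [1, -0.8], MA side [1, 0.8].
y = ArmaProcess(ar=[1, -0.8], ma=[1, 0.8]).generate_sample(nsample=1000, burnin=100)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
plot_acf(y, lags=20, ax=axes[0])    # tails off exponentially
plot_pacf(y, lags=20, ax=axes[1])   # also tails off rather than cutting off
plt.show()
```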

This kind of blend makes determining which kind of model you have more complicated.

Model Selection

When ARIMA models have both autoregressive and moving average terms, it can become difficult to select the optimal values of \(p\) and \(q\) for the model. There are two common approaches to selecting a reasonable ARIMA model:

  1. Patterns in ACF and PACF plots
  2. Automatic Selection Techniques (a minimal search sketch follows this list)
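
For the second approach, here is a minimal sketch of an automatic search - a hypothetical select_arima helper that grid-searches \(p\) and \(q\) by AIC with statsmodels (dedicated tools such as pmdarima’s auto_arima automate a smarter version of this):

```python
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_arima(y, d=1, max_p=3, max_q=3):
    """Hypothetical helper: return the (p, d, q) order with the lowest AIC."""
    best_aic, best_order = np.inf, None
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            aic = ARIMA(y, order=(p, d, q)).fit().aic
        except Exception:
            continue  # skip orders that fail to estimate
        if aic < best_aic:
            best_aic, best_order = aic, (p, d, q)
    return best_order, best_aic
```

Information criteria like AIC reward fit but penalize extra parameters, which keeps a search like this from always preferring the largest model.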

Plotting Patterns

Using the patterns in the autocorrelation and partial autocorrelation functions of your data to select a model is also called the Box-Jenkins approach. The table below is a summary of the patterns we have described with each of the components of the ARIMA model.

Model        ACF       PACF
AR(p)        T         D(p)
MA(q)        D(q)      T
ARMA(p,q)    T         T

In the table above, T means the function tails off exponentially, and D(i) means the function drops off to 0 after lag i.