---
title: "Moving Average Models"
format:
html:
code-fold: show
code-tools: true
editor: visual
---
```{r}
#| include: false
#| warning: false
#| message: false
library(fpp3)
file.dir = "https://raw.githubusercontent.com/sjsimmo2/TimeSeries/master/"
input.file1 = "usairlines.csv"
USAirlines = read.csv(paste(file.dir, input.file1,sep = ""))
USAirlines <- USAirlines %>%
  mutate(date = yearmonth(lubridate::make_date(Year, Month)))

USAirlines_ts <- as_tsibble(USAirlines, index = date)

train <- USAirlines_ts %>%
  select(Passengers, date, Month) %>%
  filter_index(~ "2007-03")

test <- USAirlines_ts %>%
  select(Passengers, date, Month) %>%
  filter_index("2007-04" ~ .)
```
```{python}
#| include: false
#| warning: false
#| message: false
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from statsforecast import StatsForecast
usair = pd.read_csv("https://raw.githubusercontent.com/sjsimmo2/TimeSeries/master/usairlines.csv")
df = pd.date_range(start = '1/1/1990', end = '3/1/2008', freq = 'MS')
usair.index = pd.to_datetime(df)
train = usair.head(207)
test = usair.tail(12)
d = {'unique_id': 1, 'ds': train.index, 'y': train['Passengers']}
train_sf = pd.DataFrame(data = d)
d = {'unique_id': 1, 'ds': test.index, 'y': test['Passengers']}
test_sf = pd.DataFrame(data = d)
```
# Quick Summary
{{< video https://www.youtube.com/embed/zNLG8tsA_Go?si=NNDHhAIa3btoHrgi >}}
# Moving Average Models
Once the data is made stationary (or if it is already stationary), we can move into a class of models called stationary models. One of the other common stationary models beyond AR models is the **moving average (MA) model**.
Often you can forecast a series based solely on the past *error* values of the data. This kind of model is better for describing events whose effects last for only a short period of time. We are going to start by focusing on the most basic case with only one lagged error value, $e_{t-1}$, called the MA(1) model:
$$
Y_t = \omega + \theta e_{t-1} + e_t
$$
where $e_t$ is the error remaining in the model and is assumed to be white noise as defined in the previous section on stationarity.
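To make the notation concrete, here is a small sketch (an added illustration, not part of the original example) that generates data directly from this equation; the values $\omega = 0$ and $\theta = 0.8$ are just illustrative choices:

```{r}
# build an MA(1) series directly from Y_t = omega + theta * e_{t-1} + e_t
set.seed(123)
n_obs <- 200
omega <- 0      # hypothetical intercept
theta <- 0.8    # hypothetical MA(1) coefficient
e <- rnorm(n_obs)                            # white noise errors

Y <- omega + theta * c(0, head(e, -1)) + e   # lag the errors by one period (e_0 taken as 0)
head(Y)
```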
With the MA(1) model, this relationship between *t* and *t-1* exists for **all** one time period differences across the dataset. However, we can write out a series of these models to see that the individual "shocks" only last for a short period of time. Let's look at three consecutive models:
$$
Y_{t-1} = \omega + \theta e_{t-2} + e_{t-1}
$$
$$
Y_{t} = \omega + \theta e_{t-1} + e_{t}
$$
$$
Y_{t+1} = \omega + \theta e_{t} + e_{t+1}
$$
We can see that the effect of a shock from a previous time point is no longer felt beyond the following time point. Notice how the impact of $e_{t-1}$ is gone by $Y_{t+1}$ in the MA(1) model. This goes back to our idea of this being a stationary model: the dependence on previous "shocks" disappears over time.
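One way to see this numerically (again an added illustration, not from the original text) is to push a single shock through the MA(1) equation with every other error set to zero; the shock appears in $Y_t$ and $Y_{t+1}$ and nowhere else:

```{r}
# a single shock of size 1 at time 5; all other errors are zero
theta   <- 0.8
e_shock <- rep(0, 10)
e_shock[5] <- 1

Y_shock <- theta * c(0, head(e_shock, -1)) + e_shock   # Y_t = theta * e_{t-1} + e_t with omega = 0
round(Y_shock, 2)
# only positions 5 (the shock itself) and 6 (theta times the shock) are non-zero
```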
# Correlation Functions
For data that follows an MA(1) structure, there are specific patterns we see in the autocorrelation and partial autocorrelation functions. For a review of these functions, please see the previous section on correlation functions. With an MA(1) structure to our data, the autocorrelation function (ACF) has a significant spike at the first lag, followed by nothing after. The partial autocorrelation function (PACF) decreases exponentially as the number of lags increases.
Let's imagine our data follows the following MA(1) process:
$$
Y_t = 0 + 0.8 e_{t-1} + e_t
$$
The following plots would be the ACF and PACF of data that follows the above structure.
(Figure: ACF and PACF plots for the MA(1) process above.)
In the ACF plot on the left above, there is a single large spike at the first lag, followed by nothing afterwards because lags beyond the first have no impact on the current period. For the PACF plot, the values of the correlation decrease toward zero as the number of lags increases.
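If you would like to recreate plots like these yourself, the sketch below (using newly simulated data, so it will not match the figure exactly) simulates from this process and plots the sample ACF and PACF; the theoretical lag-1 autocorrelation of an MA(1) is $\theta/(1+\theta^2)$, which is roughly 0.49 when $\theta = 0.8$:

```{r}
# simulate from Y_t = 0.8 e_{t-1} + e_t and inspect the sample ACF/PACF
set.seed(456)
ma_sim <- arima.sim(n = 1000, model = list(ma = 0.8))

forecast::Acf(ma_sim, lag.max = 12, main = "ACF of simulated MA(1)")    # single spike at lag 1
forecast::Pacf(ma_sim, lag.max = 12, main = "PACF of simulated MA(1)")  # tails off across lags

0.8 / (1 + 0.8^2)   # theoretical lag-1 autocorrelation
```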
# MA(q) Model
This moving average model can be extended to include $q$ lags instead of just one lag:
$$
Y_t = \omega + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q} + e_t
$$
With a moving average structure to our data, the partial autocorrelation function (PACF) tails off toward zero as the number of lags increases, though the exact pattern can vary. The autocorrelation function (ACF) has significant spikes up through lag $q$, followed by nothing after.
The stationarity assumption still holds for this model as well because the impacts of error lags do not last forever.
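As a quick illustration of the general MA($q$) pattern (this example is an addition with arbitrary coefficients), simulating an MA(2) should give an ACF with spikes at lags 1 and 2 only, while the PACF tails off:

```{r}
# simulate an MA(2): Y_t = 0.6 e_{t-1} + 0.3 e_{t-2} + e_t
set.seed(789)
ma2_sim <- arima.sim(n = 1000, model = list(ma = c(0.6, 0.3)))

forecast::Acf(ma2_sim, lag.max = 12, main = "ACF of simulated MA(2)")    # spikes at lags 1 and 2, then nothing
forecast::Pacf(ma2_sim, lag.max = 12, main = "PACF of simulated MA(2)")  # tails off gradually
```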
Let's see how to build this in each of our software options!
::: {.panel-tabset .nav-pills}
## R
We need to use a different dataset that is much better suited to a moving average model than the US airline passengers dataset we have used up until this point. For this we will load a dataset that specifically follows a moving average structure. Let's explore the dataset with the `gg_tsdisplay` function. Specifically, we will look at the *Y* variable across 12 lags. The `plot_type = 'partial'` option shows us the ACF and PACF for the variable.
```{r}
#| echo: false
# simulate 1,000 observations from an MA(1) process with theta = 0.7
ma1 <- data.frame(Y = as.numeric(arima.sim(n = 1000, list(ma = c(0.7)))))
ma1$index <- 1:1000
ma1 <- as_tsibble(ma1, index = index)
```
```{r}
ma1 %>%
  gg_tsdisplay(Y, lag_max = 12, plot_type = 'partial')
```
From the plots above we can see early spikes in the PACF that tail off as the number of lags increases. However, the ACF plot only has one spike before all the remaining spikes are essentially 0. To see these patterns better we can use the `Acf` and `Pacf` functions from the `forecast` package to individually plot the ACF and PACF respectively.
```{r}
forecast::Acf(ma1$Y, lag.max = 12, main = "ACF Plot")
forecast::Pacf(ma1$Y, lag.max = 12, main = "PACF Plot")
```
From what we saw earlier, a pattern in the ACF where there is one significant spike followed by no correlation implies we should model our data with an MA(1) model. We can use the `model` function (from the `fable` package) with the `ARIMA` function to specify a moving average model. In the formula input to the `ARIMA` function we have our target variable of *Y* along with an intercept (represented by the 1) and the `pdq` function to specify the structure of the model. The first term is the number of autoregressive lags, which we set to 0. The second term is the number of single differences, which we set to 0 since our data is already stationary. The last term is the number of moving average lags, which we set to 1. Lastly, we use the `report` function to get a summary of the output from the model.
```{r}
model_MA <- ma1 %>%
  model(
    ARIMA(Y ~ 1 + pdq(0, 0, 1))   # 0 AR lags, 0 differences, 1 MA lag, plus an intercept
  )

report(model_MA)
```
What we can see from the above output are the values of $\theta_1$ along with the intercept, $\omega$.
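If you would rather have the estimates in a data frame (an optional extra step beyond the original walk-through), the `tidy` function from the `fabletools` package returns the coefficient table, including standard errors and p-values:

```{r}
# coefficient estimates, standard errors, and p-values for the MA(1) model
tidy(model_MA)
```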
To see if we removed all of the correlation from our data we can examine the residuals using the `gg_tsresiduals` function. To get a sense of whether these residuals are white noise we can use the `features` function with the `ljung_box` option. We will calculate this test for 12 lags with `dof = 1` since our model has one estimated moving average coefficient.
```{r}
model_MA %>%
  gg_tsresiduals()

augment(model_MA) %>%
  features(.innov, ljung_box, lag = 12, dof = 1)
```
Based on the picture above we can see that the residuals seem to have no correlation to them. The p-value from the Ljung-Box test shows that we have white noise statistically.
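As an extra illustration (not part of the original walk-through), forecasting with this model shows the short memory of the MA(1) in action: only the one-step-ahead forecast uses the last observed error, so the point forecasts settle at the series mean after the first step.

```{r}
# forecast 10 steps ahead; the point forecasts flatten out after one step
model_MA %>%
  forecast(h = 10) %>%
  autoplot(ma1)
```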
## Python
We need to use a different dataset that is much better suited to a moving average model than the US airline passengers dataset we have used up until this point. For this we will load a dataset that specifically follows a moving average structure. Let's explore the dataset by looking at the ACF and PACF functions. To see these patterns better we can use the `plot_acf` and `plot_pacf` functions from the `statsmodels.api` package to individually plot the ACF and PACF respectively.
```{python}
#| echo: false
# pull the simulated MA(1) data over from R and reshape it for StatsForecast
ma1 = r.ma1
d = {'unique_id': 1, 'ds': range(1000), 'y': ma1['Y']}
ma1 = pd.DataFrame(data = d)
```
```{python}
import statsmodels.api as sm
sm.graphics.tsa.plot_acf(ma1['y'], lags = 12)
plt.show();
sm.graphics.tsa.plot_pacf(ma1['y'], lags = 12)
plt.show();
```
From the plots above we can see early spikes in the PACF that tail off as the number of lags increases. However, the ACF plot only has one spike before all the remaining spikes are essentially 0.
From what we saw earlier, a pattern in the ACF where there is one significant spike followed by no correlation implies we should model our data with an MA(1) model. We can use the `StatsForecast` function with the `ARIMA` function to specify a moving average model. In the `ARIMA` function we use the `order` option to specify the structure of the model. The first term is the number of autoregressive lags, which we set to 0. The second term is the number of single differences, which we set to 0 since our data is already stationary. The last term is the number of moving average lags, which we set to 1. By using the `.fit` function we can build our MA(1) model on our data, where the target variable *Y* is stored in the `y` column. Lastly, we use `.fitted_[0][0].model_.get('coef')` to get a summary of the coefficients from the model.
```{python}
from statsforecast import StatsForecast
from statsforecast.models import ARIMA
# MA(1) model: order = (p, d, q) = (0, 0, 1)
model_MA = StatsForecast(models = [ARIMA(order = (0, 0, 1))], freq = 1)
model_MA.fit(df = ma1)
model_MA.fitted_[0][0].model_.get("coef")
```
What we can see from the above output are the values of $\theta_1$ along with the intercept, $\omega$.
To see if we removed all of the correlation from our data we can examine the residuals. To get a sense of whether these residuals are white noise we can use the `acorr_ljungbox` function. We will calculate this test for 12 lags with `model_df = 1` since our model has one estimated moving average coefficient.
```{python}
sm.stats.acorr_ljungbox(model_MA.fitted_[0][0].model_.get("residuals"), lags = [12], model_df = 1)
```
Based on the output above we can see that the residuals seem to have no correlation to them. The p-value from the Ljung-Box test shows that we have white noise statistically.
## SAS
We need to use a different dataset that is much better suited to a moving average model than the US airline passengers dataset we have used up until this point. For this we will load a dataset that specifically follows a moving average structure. Let's explore the dataset by looking at the ACF and PACF functions. We can use PROC ARIMA in SAS to view the ACF and PACF functions. The `plot = all` option gives us the necessary plots. The IDENTIFY statement is where we specify the variable of interest using the `var =` option. We will look at the target variable *Y* for up to 12 lags.
```{r}
#| eval: false
proc arima data = ma1 plot = all;
identify var = Y nlag = 12;
run;
```
(Figure: SAS ACF and PACF plots for the simulated moving average series.)
From the plots above we can see early spikes in the PACF that tail off as the number of lags increases. However, the ACF plot only has one significant spike before all the remaining spikes are essentially 0.
From what we saw earlier, a pattern in the ACF where there is one significant spike followed by no correlation implies we should model our data with an MA(1) model. We can use the same PROC ARIMA procedure to specify a moving average model too. We still need the same IDENTIFY statement to identify the target variable of *Y*. The ESTIMATE statement is used to specify the number of lags of the moving average term using the `q =` option. We set this number of moving average lags to 1. The last option we use is the `method = ML` option to specify building our model using maximum likelihood estimation.
```{r}
#| eval: false
proc arima data = ma1 plot = all;
identify var = Y nlag = 12;
estimate q = 1 method = ML;
run;
```
(Figures: SAS output showing the MA(1) parameter estimates, the autocorrelation check of residuals, and the white noise test plot.)
What we can see from the above output are the values of $\theta_1$ along with the intercept, $\omega$. To see if we removed all of the correlation from our data we can examine the residual output from above as well. To get a sense of whether these residuals are white noise we can use the p-values in the autocorrelation check of residuals table. This shows us that our residuals are white noise. This can also be seen in the white noise plot above. That white noise plot is a little hard to grasp at first. The bars represent the test statistics rather than the p-values; larger test statistics would lead to smaller p-values. The alpha levels are marked on the y-axis to show you if the test statistic is too large.
:::