---
title: "Moving Average Models"
format:
html:
code-fold: show
code-tools: true
editor: visual
---
```{r}
#| include: false
#| warning: false
#| message: false
library(fpp3)
file.dir = "https://raw.githubusercontent.com/sjsimmo2/TimeSeries/master/"
input.file1 = "usairlines.csv"
USAirlines = read.csv(paste(file.dir, input.file1,sep = ""))
USAirlines <- USAirlines %>%
  mutate(date = yearmonth(lubridate::make_date(Year, Month)))

USAirlines_ts <- as_tsibble(USAirlines, index = date)

train <- USAirlines_ts %>%
  select(Passengers, date, Month) %>%
  filter_index(~ "2007-03")

test <- USAirlines_ts %>%
  select(Passengers, date, Month) %>%
  filter_index("2007-04" ~ .)
```
```{python}
#| include: false
#| warning: false
#| message: false
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from statsforecast import StatsForecast
usair = pd.read_csv("https://raw.githubusercontent.com/sjsimmo2/TimeSeries/master/usairlines.csv")
df = pd.date_range(start = '1/1/1990', end = '3/1/2008', freq = 'MS')
usair.index = pd.to_datetime(df)
train = usair.head(207)
test = usair.tail(12)
d = {'unique_id': 1, 'ds': train.index, 'y': train['Passengers']}
train_sf = pd.DataFrame(data = d)
d = {'unique_id': 1, 'ds': test.index, 'y': test['Passengers']}
test_sf = pd.DataFrame(data = d)
```
# Quick Summary
{{< video https://www.youtube.com/embed/zNLG8tsA_Go?si=NNDHhAIa3btoHrgi >}}
# Moving Average Models
Once the data is made stationary (or if it is already stationary), we can move into a class of models called stationary models. One of the other common stationary models beyond AR models is the **moving average (MA) model**.
Often you can forecast a series based solely on the past *error* values of the data. This kind of model is better for describing events whose effects last for only a short period of time. We are going to start by focusing on the most basic case with only one lagged error value, $e_{t-1}$, called the MA(1) model:
$$
Y_t = \omega + \theta e_{t-1} + e_t
$$
where $e_t$ is the error remaining in the model and is assumed to be white noise as defined in the previous section on stationarity.
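To make the notation concrete, here is a small sketch (an added illustration, not part of the original example) that generates data directly from this equation; the values $\omega = 0$ and $\theta = 0.8$ are just illustrative choices:

```{r}
# build an MA(1) series directly from Y_t = omega + theta * e_{t-1} + e_t
set.seed(123)
n_obs <- 200
omega <- 0      # hypothetical intercept
theta <- 0.8    # hypothetical MA(1) coefficient
e <- rnorm(n_obs)                            # white noise errors

Y <- omega + theta * c(0, head(e, -1)) + e   # lag the errors by one period (e_0 taken as 0)
head(Y)
```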
With the MA(1) model, this relationship between *t* and *t-1* exists for **all** one time period differences across the dataset. However, we can write out a series of these models to see that the individual "shocks" only last for a short period of time. Let's look at three consecutive models:
$$
Y_{t-1} = \omega + \theta e_{t-2} + e_{t-1}
$$
$$
Y_{t} = \omega + \theta e_{t-1} + e_{t}
$$
$$
Y_{t+1} = \omega + \theta e_{t} + e_{t+1}
$$
We can see that the effect of a shock from a previous time point is no longer felt beyond the following time point. Notice how the impact of $e_{t-1}$ is gone by $Y_{t+1}$ in the MA(1) model. This goes back to our idea of this being a stationary model: the dependence on previous "shocks" disappears over time.
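One way to see this numerically (again an added illustration, not from the original text) is to push a single shock through the MA(1) equation with every other error set to zero; the shock appears in $Y_t$ and $Y_{t+1}$ and nowhere else:

```{r}
# a single shock of size 1 at time 5; all other errors are zero
theta   <- 0.8
e_shock <- rep(0, 10)
e_shock[5] <- 1

Y_shock <- theta * c(0, head(e_shock, -1)) + e_shock   # Y_t = theta * e_{t-1} + e_t with omega = 0
round(Y_shock, 2)
# only positions 5 (the shock itself) and 6 (theta times the shock) are non-zero
```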
# Correlation Functions
For data that follows an MA(1) structure, there are specific patterns we see in the autocorrelation and partial autocorrelation functions. For a review of these functions, please see the previous section on correlation functions. With an MA(1) structure to our data, the autocorrelation function (ACF) has a significant spike at the first lag, followed by nothing after. The partial autocorrelation function (PACF) decreases exponentially as the number of lags increases.
Let's imagine our data follows the following MA(1) process:
$$
Y_t = 0 + 0.8 e_{t-1} + e_t
$$
The following plots would be the ACF and PACF of data that follows the above structure.
(Figure: ACF and PACF plots for the MA(1) process above.)
In the ACF plot on the left above, there is a single large spike at the first lag, followed by nothing afterwards because lags beyond the first have no impact on the current period. For the PACF plot, the values of the correlation decrease toward zero as the number of lags increases.
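If you would like to recreate plots like these yourself, the sketch below (using newly simulated data, so it will not match the figure exactly) simulates from this process and plots the sample ACF and PACF; the theoretical lag-1 autocorrelation of an MA(1) is $\theta/(1+\theta^2)$, which is roughly 0.49 when $\theta = 0.8$:

```{r}
# simulate from Y_t = 0.8 e_{t-1} + e_t and inspect the sample ACF/PACF
set.seed(456)
ma_sim <- arima.sim(n = 1000, model = list(ma = 0.8))

forecast::Acf(ma_sim, lag.max = 12, main = "ACF of simulated MA(1)")    # single spike at lag 1
forecast::Pacf(ma_sim, lag.max = 12, main = "PACF of simulated MA(1)")  # tails off across lags

0.8 / (1 + 0.8^2)   # theoretical lag-1 autocorrelation
```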
# MA(q) Model
This moving average model can be extended to include $q$ lags instead of just one lag:
$$
Y_t = \omega + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q} + e_t
$$
With a moving average structure to our data, the partial autocorrelation function (PACF) tails off toward zero as the number of lags increases, though the exact pattern can vary. The autocorrelation function (ACF) has significant spikes up through lag $q$, followed by nothing after.
The stationarity assumption still holds for this model as well because the impacts of error lags do not last forever.
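As a quick illustration of the general MA($q$) pattern (this example is an addition with arbitrary coefficients), simulating an MA(2) should give an ACF with spikes at lags 1 and 2 only, while the PACF tails off:

```{r}
# simulate an MA(2): Y_t = 0.6 e_{t-1} + 0.3 e_{t-2} + e_t
set.seed(789)
ma2_sim <- arima.sim(n = 1000, model = list(ma = c(0.6, 0.3)))

forecast::Acf(ma2_sim, lag.max = 12, main = "ACF of simulated MA(2)")    # spikes at lags 1 and 2, then nothing
forecast::Pacf(ma2_sim, lag.max = 12, main = "PACF of simulated MA(2)")  # tails off gradually
```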
Let's see how to build this in each of our software options!
::: {.panel-tabset .nav-pills}
## R
We need to use a different dataset that is much better suited to a moving average model than the US airline passengers dataset we have used up until this point. For this we will load a dataset that specifically follows a moving average structure. Let's explore the dataset with the `gg_tsdisplay` function. Specifically, we will look at the *Y* variable across 12 lags. The `plot_type = 'partial'` option shows us the ACF and PACF for the variable.
```{r}
#| echo: false
# simulate 1,000 observations from an MA(1) process with theta = 0.7
ma1 <- data.frame(Y = as.numeric(arima.sim(n = 1000, list(ma = c(0.7)))))
ma1$index <- 1:1000
ma1 <- as_tsibble(ma1, index = index)
```
```{r}
ma1 %>%
  gg_tsdisplay(Y, lag_max = 12, plot_type = 'partial')
```
From the plots above we can see early spikes in the PACF that tail off as the number of lags increases. However, the ACF plot only has one spike before all the remaining spikes are essentially 0. To see these patterns better we can use the `Acf` and `Pacf` functions from the `forecast` package to individually plot the ACF and PACF respectively.
```{r}
forecast::Acf(ma1$Y, lag.max = 12, main = "ACF Plot")
forecast::Pacf(ma1$Y, lag.max = 12, main = "PACF Plot")
```
From what we saw earlier, a pattern in the ACF where there is one significant spike followed by no correlation implies we should model our data with an MA(1) model. We can use the `model` function (from the `fable` package) with the `ARIMA` function to specify a moving average model. In the formula input to the `ARIMA` function we have our target variable of *Y* along with an intercept (represented by the 1) and the `pdq` function to specify the structure of the model. The first term is the number of autoregressive lags, which we set to 0. The second term is the number of single differences, which we set to 0 since our data is already stationary. The last term is the number of moving average lags, which we set to 1. Lastly, we use the `report` function to get a summary of the output from the model.
```{r}
model_MA <- ma1 %>%
  model(
    ARIMA(Y ~ 1 + pdq(0, 0, 1))   # 0 AR lags, 0 differences, 1 MA lag, plus an intercept
  )

report(model_MA)
```
What we can see from the above output are the values of $\theta_1$ along with the intercept, $\omega$.
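If you would rather have the estimates in a data frame (an optional extra step beyond the original walk-through), the `tidy` function from the `fabletools` package returns the coefficient table, including standard errors and p-values:

```{r}
# coefficient estimates, standard errors, and p-values for the MA(1) model
tidy(model_MA)
```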
To see if we removed all of the correlation from our data we can examine the residuals using the `gg_tsresiduals` function. To get a sense of whether these residuals are white noise we can use the `features` function with the `ljung_box` option. We will calculate this test for 12 lags with `dof = 1` since our model has one estimated moving average coefficient.
```{r}
model_MA %>%
  gg_tsresiduals()

augment(model_MA) %>%
  features(.innov, ljung_box, lag = 12, dof = 1)
```
Based on the picture above we can see that the residuals seem to have no correlation to them. The p-value from the Ljung-Box test shows that we have white noise statistically.
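As an extra illustration (not part of the original walk-through), forecasting with this model shows the short memory of the MA(1) in action: only the one-step-ahead forecast uses the last observed error, so the point forecasts settle at the series mean after the first step.

```{r}
# forecast 10 steps ahead; the point forecasts flatten out after one step
model_MA %>%
  forecast(h = 10) %>%
  autoplot(ma1)
```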
## Python
We need to use a different dataset that is much better suited to a moving average model than the US airline passengers dataset we have used up until this point. For this we will load a dataset that specifically follows a moving average structure. Let's explore the dataset by looking at the ACF and PACF functions. To see these patterns better we can use the `plot_acf` and `plot_pacf` functions from the `statsmodels.api` package to individually plot the ACF and PACF respectively.
```{python}
#| echo: false
# pull the simulated MA(1) data over from R and reshape it for StatsForecast
ma1 = r.ma1
d = {'unique_id': 1, 'ds': range(1000), 'y': ma1['Y']}
ma1 = pd.DataFrame(data = d)
```
```{python}
import statsmodels.api as sm
sm.graphics.tsa.plot_acf(ma1['y'], lags = 12)
plt.show();
sm.graphics.tsa.plot_pacf(ma1['y'], lags = 12)
plt.show();
```
From the plots above we can see early spikes in the PACF that tail off as the number of lags increases. However, the ACF plot only has one spike before all the remaining spikes are essentially 0.
From what we saw earlier, a pattern in the ACF where there is one significant spike followed by no correlation implies we should model our data with an MA(1) model. We can use the `StatsForecast` function with the `ARIMA` function to specify a moving average model. In the `ARIMA` function we use the `order` option to specify the structure of the model. The first term is the number of autoregressive lags, which we set to 0. The second term is the number of single differences, which we set to 0 since our data is already stationary. The last term is the number of moving average lags, which we set to 1. By using the `.fit` function we can build our MA(1) model on our data, where the target variable *Y* is stored in the `y` column. Lastly, we use `.fitted_[0][0].model_.get('coef')` to get a summary of the coefficients from the model.
```{python}
from statsforecast import StatsForecast
from statsforecast.models import ARIMA
# MA(1) model: order = (p, d, q) = (0, 0, 1)
model_MA = StatsForecast(models = [ARIMA(order = (0, 0, 1))], freq = 1)
model_MA.fit(df = ma1)
model_MA.fitted_[0][0].model_.get("coef")
```
What we can see from the above output are the values of $\theta_1$ along with the intercept, $\omega$.
To see if we removed all of the correlation from our data we can examine the residuals. To get a sense of whether these residuals are white noise we can use the `acorr_ljungbox` function. We will calculate this test for 12 lags with `model_df = 1` since our model has one estimated moving average coefficient.
```{python}
sm.stats.acorr_ljungbox(model_MA.fitted_[0][0].model_.get("residuals"), lags = [12], model_df = 1)
```
Based on the output above we can see that the residuals seem to have no correlation to them. The p-value from the Ljung-Box test shows that we have white noise statistically.
## SAS
We need to use a different dataset that is much better suited to a moving average model than the US airline passengers dataset we have used up until this point. For this we will load a dataset that specifically follows a moving average structure. Let's explore the dataset by looking at the ACF and PACF functions. We can use PROC ARIMA in SAS to view the ACF and PACF functions. The `plot = all` option gives us the necessary plots. The IDENTIFY statement is where we specify the variable of interest using the `var =` option. We will look at the target variable *Y* for up to 12 lags.
```{r}
#| eval: false
proc arima data = ma1 plot = all;
identify var = Y nlag = 12;
run;
```
(Figure: SAS ACF and PACF plots for the simulated moving average series.)
From the plots above we can see early spikes in the PACF that tail off as the number of lags increases. However, the ACF plot only has one significant spike before all the remaining spikes are essentially 0.
From what we saw earlier, a pattern in the ACF where there is one significant spike followed by no correlation implies we should model our data with an MA(1) model. We can use the same PROC ARIMA procedure to specify a moving average model too. We still need the same IDENTIFY statement to identify the target variable of *Y*. The ESTIMATE statement is used to specify the number of lags of the moving average term using the `q =` option. We set this number of moving average lags to 1. The last option we use is the `method = ML` option to specify building our model using maximum likelihood estimation.
```{r}
#| eval: false
proc arima data = ma1 plot = all;
identify var = Y nlag = 12;
estimate q = 1 method = ML;
run;
```
(Figures: SAS output showing the MA(1) parameter estimates, the autocorrelation check of residuals, and the white noise test plot.)
What we can see from the above output are the values of $\theta_1$ along with the intercept, $\omega$. To see if we removed all of the correlation from our data we can examine the residual output from above as well. To get a sense of whether these residuals are white noise we can use the p-values in the autocorrelation check of residuals table. This shows us that our residuals are white noise. This can also be seen in the white noise plot above. That white noise plot is a little hard to grasp at first. The bars represent the test statistics rather than the p-values; larger test statistics would lead to smaller p-values. The alpha levels are marked on the y-axis to show you if the test statistic is too large.
:::