Exponential Smoothing Models

Quick Summary

Time Dependency

Time series data relies on the assumption that observations at a certain time point depend on previous observations in time. The two most extreme versions of this assumption are the naive model and the average model. The naive model assumes that any prediction in the future (h observations ahead) is just the latest known value at the current time, t:

\[ \hat{Y}_{t+h} = Y_t \]

The average model is the other extreme: it assumes that any prediction in the future (h observations ahead) is the equally weighted average of all of the data in the past:

\[ \hat{Y}_{t+h} = \frac{1}{T} \sum^T_{t=1} Y_t \]
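
As a quick illustration of these two extremes, here is a minimal Python sketch; the series values are made up purely for illustration:

```python
import numpy as np

# Toy series -- the values are made up purely for illustration
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0])

h = 3  # forecast horizon

# Naive model: every future value equals the latest observed value
naive_forecast = np.repeat(y[-1], h)       # [121., 121., 121.]

# Average model: every future value equals the mean of all past values
average_forecast = np.repeat(y.mean(), h)  # [122.4, 122.4, 122.4]

print(naive_forecast)
print(average_forecast)
```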

A better approach might be to use all of the data (like the average model) but emphasize more recent data (like the naive model) more heavily. This is the approach of the class of models called exponential smoothing models (ESMs). Exponential smoothing models place the heaviest weights on the most recent observations, require estimating only a few parameters, and are simple and easy to implement.

There are many different types of exponential smoothing models. Here we will discuss three common types:

  1. Single (or Simple) ESM
  2. Linear / Holt (incorporating trend) ESM
  3. Holt-Winters (incorporating trend and seasonality) ESM

Single Exponential Smoothing

In single exponential smoothing we apply a weighting scheme that decreases exponentially the further back in time we go:

\[ \hat{Y}_{t+1} = \theta Y_t + \theta (1 - \theta) Y_{t-1} + \theta (1 - \theta)^2 Y_{t-2} + \theta(1 - \theta)^3 Y_{t-3} + \cdots \]

where \(0 \le \theta \le 1\). The previous equation for the single exponential smoothing model can be written more simply as the following:

\[ \hat{Y}_{t+1} = \theta Y_t + (1 - \theta)\hat{Y}_t \]

The single exponential smoothing model can also be written in component form:

\[ \hat{Y}_{t+1} = L_t \] \[ L_t = \theta Y_t + (1 - \theta)L_{t-1} \]

where the first equation is the forecast equation and the second equation is the level equation.
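
To make the component form concrete, here is a minimal Python sketch of the recursion. The helper name ses_fitted and the choice of initializing the level at the first observation are assumptions for illustration; software packages initialize and estimate this in their own ways:

```python
import numpy as np

def ses_fitted(y, theta):
    """One-step-ahead fitted values from the component form:
    L_t = theta * Y_t + (1 - theta) * L_{t-1},  and  Yhat_{t+1} = L_t."""
    level = y[0]                 # assumed initialization: L_0 = Y_1
    fitted = []
    for obs in y:
        fitted.append(level)     # forecast for time t is the previous level
        level = theta * obs + (1 - theta) * level
    return np.array(fitted), level

y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])
fitted, last_level = ses_fitted(y, theta=0.4)
print(last_level)  # every future forecast equals this final level
```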

The larger the value of \(\theta\), the more heavily the most recent observation is weighted, as shown by the chart below.

The optimal value of \(\theta\) is the value that minimizes the sum of squared one-step ahead forecast errors:

\[ SSE = \sum_{t=1}^T (Y_t - \hat{Y}_t)^2 \]

Visually, this looks like the following picture:

where each of the dashed lines in the plot above would be squared and summed together. The value of \(\theta\) that minimizes that quantity is the optimal value. The downside of optimizing only for the next time period is that you only get a forecast one value into the future. Any forecast beyond that initial point is exactly the same as the initially forecasted point.
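
A minimal sketch of that optimization in Python, using a crude grid search over \(\theta\); packaged routines use a numerical optimizer instead, and the initialization of the level here is an assumption for illustration:

```python
import numpy as np

def ses_sse(y, theta):
    """Sum of squared one-step-ahead forecast errors for a given theta."""
    level = y[0]                   # assumed initialization: L_0 = Y_1
    sse = 0.0
    for obs in y[1:]:
        sse += (obs - level) ** 2  # error of the one-step-ahead forecast
        level = theta * obs + (1 - theta) * level
    return sse

y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
thetas = np.linspace(0.01, 0.99, 99)
best_theta = min(thetas, key=lambda t: ses_sse(y, t))
print(best_theta)
```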

Some statistical software will report statistical tests for the \(\theta\) parameter, but not everyone puts much weight on these tests because the model was never derived or designed with statistical distributions and significance in mind.

Seasonal Exponential Smoothing

Exponential smoothing models can also account for seasonal factors. In seasonal exponential smoothing models, the weights decay with respect to the seasonal factor.

There are multiple ways to incorporate a seasonal effect in the exponential smoothing framework.

  • Winters Additive Seasonal ESM

  • Winters Multiplicative Seasonal ESM

  • Holt Winters Additive Seasonal ESM

  • Holt Winters Multiplicative Seasonal ESM

Winters Seasonal Exponential Smoothing

If your data is seasonal in structure, but has no trend, then the Winters seasonal exponential smoothing model is a good model to use. Instead of the trend component that we saw in the Holt exponential smoothing model, the Winters model includes a seasonal component. Similar to what we saw with time series decomposition, the seasonal term can have an additive or multiplicative structure:

Additive Seasonal Component:

\[ \hat{Y}_{t+k} = L_t + S_{t-p+k} \] \[ L_t = \theta (Y_t - S_{t-p}) + (1 - \theta)L_{t-1} \] \[ S_t = \delta(Y_t - L_t) + (1 - \delta)S_{t-p} \]

Multiplicative Seasonal Component:

\[ \hat{Y}_{t+k} = L_t \times S_{t-p+k} \] \[ L_t = \theta (Y_t / S_{t-p}) + (1 - \theta)L_{t-1} \] \[ S_t = \delta(Y_t / L_t) + (1 - \delta)S_{t-p} \]

where \(k\) is the number of observations to be forecasted. From the above equations we can see that the only difference is whether we add in the seasonal component or multiply it in. The optimal values of \(\theta\) and \(\delta\) are determined by minimizing the sum of squared errors of the one-step ahead forecasts. However, with the seasonal component, one step ahead is one season ahead, which means the model is optimized to predict one season into the future instead of just one observation.
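
As a rough sketch of fitting a Winters (no trend) seasonal ESM in Python, the ExponentialSmoothing class from statsmodels can be used with the trend term turned off; this is only one possible implementation, and the monthly series below is simulated purely for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simulated monthly series with a seasonal pattern but no trend (illustration only)
rng = np.random.default_rng(0)
months = pd.date_range("2015-01", periods=48, freq="MS")
seasonal = 10 * np.sin(2 * np.pi * np.arange(48) / 12)
y = pd.Series(100 + seasonal + rng.normal(0, 2, 48), index=months)

# Winters seasonal ESM: level and seasonal components, no trend component
fit = ExponentialSmoothing(y, trend=None, seasonal="add",
                           seasonal_periods=12).fit()

print(fit.params["smoothing_level"], fit.params["smoothing_seasonal"])
print(fit.forecast(12))  # forecast one full season ahead
```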

Holt Winters Seasonal Exponential Smoothing

If your data is seasonal in structure, and has a trend, then the Holt Winters (Triple) seasonal exponential smoothing model is a good model to use. In addition to the trend component that we saw in the Holt exponential smoothing model, the Holt Winters model includes a seasonal component as well. Similar to what we saw with time series decomposition, the seasonal term can have an additive or multiplicative structure:

Additive Seasonal Component:

\[ \hat{Y}_{t+k} = L_t + k T_t + S_{t-p+k} \] \[ L_t = \theta (Y_t - S_{t-p}) + (1 - \theta)(L_{t-1} + T_{t-1}) \] \[ T_t = \gamma(L_t - L_{t-1}) + (1 - \gamma) T_{t-1} \] \[ S_t = \delta(Y_t - L_t) + (1 - \delta)S_{t-p} \]

Multiplicative Seasonal Component:

\[ \hat{Y}_{t+k} = (L_t + k T_t) \times S_{t-p+k} \] \[ L_t = \theta (Y_t / S_{t-p}) + (1 - \theta)(L_{t-1} + T_{t-1}) \] \[ T_t = \gamma(L_t - L_{t-1}) + (1 - \gamma) T_{t-1} \] \[ S_t = \delta(Y_t / L_t) + (1 - \delta)S_{t-p} \]

where \(k\) is the number of observations to be forecasted. From the above equations we can see that the only difference is whether we add in the seasonal component or multiply it in. The optimal values of \(\theta\), \(\gamma\), and \(\delta\) are determined by minimizing the sum of squared errors of the one-step ahead forecasts. However, with the seasonal component, one step ahead is one season ahead, which means the model is optimized to predict one season into the future along a trend line, instead of just one observation.

Since our data has both a trend and a seasonal component, we will build a Holt Winters exponential smoothing model.

Let’s see how to do that in all of our software!
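
For instance, here is a minimal Python sketch using the ExponentialSmoothing class from statsmodels, one of several possible implementations; the monthly series below is simulated as a stand-in for the data mentioned above:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simulated monthly series with both a trend and a seasonal pattern (illustration only)
rng = np.random.default_rng(1)
months = pd.date_range("2015-01", periods=60, freq="MS")
trend = 0.8 * np.arange(60)
seasonal = 12 * np.sin(2 * np.pi * np.arange(60) / 12)
y = pd.Series(100 + trend + seasonal + rng.normal(0, 2, 60), index=months)

# Holt Winters (triple) ESM: additive trend, multiplicative seasonal term
hw = ExponentialSmoothing(y, trend="add", seasonal="mul",
                          seasonal_periods=12).fit()

print(hw.forecast(24))  # forecast two full seasons along the trend line
```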