Concepts Through Examples

Introductory Example

The best way to learn simulation is through an example.

Suppose you want to invest $1,000 (call this \(P_0\)) in the US stock market for one year. You invest in a mutual fund that tries to produce the same return as the S&P500 Index. Let’s keep this example basic and only work with annual return. The price after one year can be calculated as follows:

\[ P_1 = P_0 \times (1 + r_{0,1}) \]

The term \(r_{0,1}\) refers to the annual return from time point 0 to time point 1. Let’s assume annual returns follow a Normal distribution with a historical mean of 8.79% and a standard deviation of 14.75%.

Let’s see how we would simulate this problem in each of our software packages!
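As a flavor of what that looks like, here is a minimal sketch in R (the seed, the number of simulations, and the object names are illustrative assumptions, not the official code for this example):

```r
# Minimal sketch: simulate one year of returns under the Normal assumption
set.seed(112358)                        # arbitrary seed for reproducibility

n.sim <- 10000                          # assumed number of simulated scenarios
P0    <- 1000                           # initial investment

# Draw annual returns from the assumed Normal(mean = 8.79%, sd = 14.75%)
r <- rnorm(n.sim, mean = 0.0879, sd = 0.1475)

# Portfolio value after one year in each simulated scenario
P1 <- P0 * (1 + r)

hist(P1, breaks = 50, main = "Simulated One-Year Portfolio Value")
summary(P1)
```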

How did we select the normal distribution for our returns? Is that a reasonable choice of distribution? We will find out in the next section.

Distribution Selection

When designing your simulations, the biggest decision is the choice of distribution for the inputs that vary. There are three common methods for selecting these distributions:

  1. Common Probability Distribution

  2. Historical (Empirical) Distribution

  3. Hypothesized Future Distribution

Common Probability Distribution

Typically, we assume a common probability distribution for inputs that vary in a simulation. We did this in the example above where we assumed returns followed a normal distribution. When selecting distributions, we first need to decide whether our variables are discrete or continuous in nature. A discrete variable takes on outcomes that you can list out (even if you cannot list all of them). Continuous variables, on the other hand, take on any possible value in an interval (even if the interval is unbounded).
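As a quick illustration of the difference, here is a short R sketch (the specific variables and parameter values are made up for illustration):

```r
set.seed(42)

# Discrete: outcomes can be listed, e.g. a count of events per period
arrivals <- rpois(5, lambda = 3)

# Discrete: sampling directly from a listed set of outcomes with given probabilities
orders <- sample(c(0, 1, 2), size = 5, replace = TRUE, prob = c(0.5, 0.3, 0.2))

# Continuous: any value in an interval, e.g. an annual return
returns <- rnorm(5, mean = 0.0879, sd = 0.1475)

arrivals; orders; returns
```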

Historical (Empirical) Distributions

If you are unsure of the distribution of the data you are trying to simulate, you can estimate it using kernel density estimation. Kernel density estimation is a non-parametric method of estimating the distribution of data by smoothing out the data values.

The kernel density estimator is as follows:

\[ \hat{f}(x) = \frac{1}{n \times h} \sum_{i=1}^n \kappa \left( \frac{x - x_i}{h} \right) \]

In the above equation, \(\kappa\) is the kernel function. The kernel function builds a distribution centered around each of the points in your dataset. Imagine a small normal distribution built around each point; adding these normal distributions together forms one larger distribution. The default kernel function in SAS and R is the normal distribution, but there are other options like the quadratic, triangle, and Epanechnikov kernels. Also in the above equation, \(h\) is the bandwidth. The bandwidth controls the variability around each point. The larger the bandwidth, the more variation is built around each point, as seen in the chart below:

Figure: Comparison of 3 Different Estimations of a Standard Normal Distribution

The above plot shows kernel density estimates of data sampled from a standard normal distribution. The actual distribution is highlighted with the dashed line. Three estimates with different bandwidths are shown. The smaller the bandwidth, the smaller the variability around each point, but the more tightly our curve fits our particular data. The larger the bandwidth, the more variability around each point, which smooths out the curve much more. We want to find a balance. There is a lot of research, both completed and ongoing, on how best to select the bandwidth. We will leave it up to the computer to optimize this decision for us.
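A rough R sketch of a comparison like the one above (the sample size and the three bandwidth values are arbitrary choices for illustration):

```r
set.seed(7)
x <- rnorm(100)   # data sampled from a standard normal distribution

# Kernel density estimates with three different bandwidths
plot(density(x, bw = 0.1), col = "red", main = "KDE with Different Bandwidths")
lines(density(x, bw = 0.4), col = "blue")
lines(density(x, bw = 1.0), col = "darkgreen")

# Overlay the true standard normal density for reference
curve(dnorm(x), add = TRUE, lty = 2)

legend("topright",
       legend = c("bw = 0.1", "bw = 0.4", "bw = 1.0", "True N(0,1)"),
       col    = c("red", "blue", "darkgreen", "black"),
       lty    = c(1, 1, 1, 2))
```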

Once we have this kernel density estimate for our data, we can sample from this estimated distribution for our simulation. In our previous example we assumed a normal distribution for the returns of our mutual fund. Now, let’s build an empirical distribution through kernel density estimation and sample from that instead of the normal distribution.

Let’s see how to do this in each of our software packages!
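One way to sketch this in R: since the default kernel is normal, drawing from the estimated density is equivalent to resampling the historical data and adding Normal(0, bandwidth) noise to each draw. The historical return vector below is a made-up placeholder; in practice you would use your actual S&P500 return history.

```r
set.seed(112358)

# Placeholder for historical annual returns (substitute your real data here)
sp500.returns <- rnorm(60, mean = 0.0879, sd = 0.1475)

n.sim <- 10000
P0    <- 1000

# Kernel density estimate with R's default (Gaussian) kernel and bandwidth
kde <- density(sp500.returns)

# Sample from the KDE: resample the data, then jitter with Normal(0, bandwidth)
r.sim <- sample(sp500.returns, size = n.sim, replace = TRUE) +
         rnorm(n.sim, mean = 0, sd = kde$bw)

# Portfolio value after one year under the empirical distribution
P1 <- P0 * (1 + r.sim)
hist(P1, breaks = 50, main = "One-Year Value Using the Empirical Distribution")
```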

Hypothesized Future Distribution

The final choice for a distribution is a hypothesized future distribution. Maybe you know of an upcoming change in your variable, so the past information will not reflect the future distribution. For example, if the volatility in the market is forecasted to increase, then instead of a standard deviation of 14.75% you might forecast 18.25%. In these situations, you can select any distribution of your choice.
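In code, this just means swapping in the hypothesized parameters (or an entirely different distribution). A brief R sketch under the assumption above:

```r
set.seed(112358)

n.sim <- 10000
P0    <- 1000

# Same historical mean, but the hypothesized future volatility of 18.25%
r.future <- rnorm(n.sim, mean = 0.0879, sd = 0.1825)

P1.future <- P0 * (1 + r.future)
summary(P1.future)
```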

Compounding and Correlations

Complications arise in simulation when you are simulating multiple inputs that change at the same time. Even when the distributions of these inputs are the same, the final result can still be hard to calculate mathematically. This shows the value of simulation, since we don’t have to work out the mathematical equations for the combination of many input distributions.

When dealing with multiple input probability distributions, there are some general facts we can use (a quick simulation check is sketched after this list).

  1. When a constant is added to a random variable (the variable with the distribution), the distribution keeps the same shape, only shifted by the constant.

  2. The sum of many random variables with the same distribution rarely has that same shape of distribution.

    • The exception to this is INDEPENDENT normal distributions.

  3. The product of many random variables with the same distribution rarely has that same shape of distribution.

    • The exception to this is INDEPENDENT lognormal distributions.
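Here is a quick R sketch that checks these facts by simulation (the sample size and constants are arbitrary):

```r
set.seed(31415)
n <- 100000

# Fact 1: adding a constant only shifts the distribution
x       <- rnorm(n, mean = 0, sd = 1)
shifted <- x + 5                          # still normal, just centered at 5

# Exception for sums: the sum of independent normals is still normal
sum.norm <- rnorm(n) + rnorm(n) + rnorm(n)

# Exception for products: the product of independent lognormals is still lognormal
# (if so, the log of the product should look normal)
prod.lnorm <- rlnorm(n) * rlnorm(n) * rlnorm(n)

par(mfrow = c(1, 3))
hist(shifted,         breaks = 50, main = "Shifted Normal")
hist(sum.norm,        breaks = 50, main = "Sum of Normals")
hist(log(prod.lnorm), breaks = 50, main = "log(Product of Lognormals)")
```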

Extended Example

Let’s adapt our previous example of investing in mutual funds. Now you want to invest $1,000 in the US stock market for 30 years. You invest in a mutual fund that tries to produce the same return as the S&P500 Index. Again, using the simple scenario where compounding isn’t continuous, we have the following value of our portfolio over 30 years:

\[ P_{30} = P_0 \times (1 + r_{0,1}) \times (1 + r_{1,2}) \times \cdots \times (1 + r_{29,30}) \]

The returns are annual returns between consecutive years, which are now multiplied together. Again, we will assume annual returns follow a Normal distribution with a historical mean of 8.79% and a standard deviation of 14.75% for every year.

Let’s see how to do this in our software packages!
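A sketch in R (the seed and the number of simulations are arbitrary choices): draw 30 annual returns per scenario and compound them.

```r
set.seed(112358)

n.sim   <- 10000
n.years <- 30
P0      <- 1000

P30 <- rep(NA, n.sim)
for (i in 1:n.sim) {
  r <- rnorm(n.years, mean = 0.0879, sd = 0.1475)   # 30 annual returns
  P30[i] <- P0 * prod(1 + r)                        # compound across the years
}

hist(P30, breaks = 50, main = "Simulated 30-Year Portfolio Value")
quantile(P30, probs = c(0.05, 0.5, 0.95))
```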

Adding Correlation

Not all inputs may be independent of each other. Having correlations between your input variables adds even more complication to the simulation and final distribution. In these scenarios, we may need to simulate random variables that have correlation with each other.

One way to “add” correlation to data is to multiply the correlation into the data through matrix multiplication (linear algebra). Think about the approach we would take to change the variance of a single variable. If we had a variable X with a normal distribution (mean of 3 and variance of 2), then how could we get the variance to be 4 instead? We can do this in a few simple steps:

  1. Standardize \(X\) to have a mean of 0 and variance of 1: \(Z = \frac{X-3}{\sqrt{2}}\)

  2. Multiply the standardized variable by the standard deviation you desire: \(\sqrt{4} \times Z\)

  3. Recenter the variable back to its original mean: \(\sqrt{4} \times Z + 3\)

Now we have this new \(Y\), where \(Y = \sqrt{4} \times Z + 3\), which has the distribution that we want.
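A quick check of these steps in R (the sample size is arbitrary):

```r
set.seed(2718)
n <- 100000

x <- rnorm(n, mean = 3, sd = sqrt(2))   # X ~ Normal with mean 3, variance 2

z <- (x - 3) / sqrt(2)                  # Step 1: standardize to mean 0, variance 1
y <- sqrt(4) * z + 3                    # Steps 2-3: rescale to sd 2, recenter at 3

mean(y); var(y)                         # approximately 3 and 4
```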

This \(Y\) is normally distributed with a mean of 3 and a variance of 4, like we wanted. We want to do the same thing on a multivariate scale to add correlation into data. We have a desired correlation matrix (that is, a standardized version of the covariance matrix). We need to multiply this into the variables of interest to create our new variables. We do this in similar steps:

  1. Standardize each column of our data (X)

  2. Multiply X by the “square root” of the correlation matrix you want (this is called the Cholesky decomposition)

  3. Convert X back to its original pre-standardized mean and variance.

The Cholesky decomposition is essentially the matrix equivalent of the square root of a number. A square root of a number is a number that, when multiplied by itself, gives the original number. The Cholesky decomposition is the same idea in matrix form. It is the matrix \(L\) that, when multiplied by its transpose, \(L \times L^T\), gives you the original matrix.
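In R, the chol() function returns the upper-triangular factor \(U\) with \(U^T \times U\) equal to the input matrix, so the lower-triangular \(L\) described above is its transpose. A quick check using an assumed 2 x 2 correlation matrix:

```r
# Assumed correlation matrix with a correlation of -0.2
R <- matrix(c( 1.0, -0.2,
              -0.2,  1.0), nrow = 2, byrow = TRUE)

U <- chol(R)   # upper-triangular factor: t(U) %*% U = R
L <- t(U)      # lower-triangular "square root": L %*% t(L) = R

L %*% t(L)     # reproduces the original correlation matrix
```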

To add correlation to a set of data, we multiply this Cholesky decomposition of the desired correlation matrix by our data. If we had two variables, this multiplication essentially leaves the first column of data alone and “bends” the second column to look more like the first. This method works best for normally distributed variables, but is still reasonable if the variables are symmetric and unimodal.

Let’s extend our 30 year investment example from above. Instead of just investing in stocks, you are splitting your money equally across stocks and bonds for 30 years. The individual calculations will be similar to above, just with a different set of returns for bonds as compared to stocks.

\[ P_{30,S} = P_{0,S} \times (1 + r_{0,1,S}) \times (1 + r_{1,2,S}) \times \cdots \times (1 + r_{29,30,S}) \]

\[ P_{30,B} = P_{0,B} \times (1 + r_{0,1,B}) \times (1 + r_{1,2,B}) \times \cdots \times (1 + r_{29,30,B}) \]

\[ P_{30} = P_{30,S} + P_{30,B} \]

Historically, Treasury bonds are perceived as safer investments, so when the stock market does poorly, more people invest in bonds. This leads to a negative correlation between the two. Let’s assume that our stock returns follow a normal distribution with a mean of 8.79% and a standard deviation of 14.75%, our bond returns follow a normal distribution with a mean of 4% and a standard deviation of 7%, and there is a correlation of -0.2 between the two returns in any given year.

Let’s see how to solve this in our software packages!
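Here is one way to sketch it in R (the equal $500/$500 split, the seed, and the number of simulations are illustrative assumptions): draw independent standard normal returns, multiply by the Cholesky factor of the correlation matrix to induce the -0.2 correlation, rescale each asset to its own mean and standard deviation, and then compound.

```r
set.seed(112358)

n.sim   <- 10000
n.years <- 30
P0.S    <- 500    # assumed initial investment in stocks
P0.B    <- 500    # assumed initial investment in bonds

# Desired correlation matrix between stock and bond returns, and its Cholesky factor
R <- matrix(c( 1.0, -0.2,
              -0.2,  1.0), nrow = 2, byrow = TRUE)
L <- t(chol(R))

P30 <- rep(NA, n.sim)
for (i in 1:n.sim) {
  # Independent standard normal draws: one row per asset, one column per year
  Z <- matrix(rnorm(2 * n.years), nrow = 2)

  # Multiply by L to induce the correlation, then rescale each asset's returns
  Zc  <- L %*% Z
  r.S <- 0.0879 + 0.1475 * Zc[1, ]    # stock returns: Normal(8.79%, 14.75%)
  r.B <- 0.0400 + 0.0700 * Zc[2, ]    # bond returns:  Normal(4%, 7%)

  P30[i] <- P0.S * prod(1 + r.S) + P0.B * prod(1 + r.B)
}

hist(P30, breaks = 50, main = "Simulated 30-Year Value: Stocks + Bonds")
quantile(P30, probs = c(0.05, 0.5, 0.95))
```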

How Many and How Long

The number of possible outcomes from a simulation is essentially infinite. Simulations sample from these possible values. Therefore, the accuracy of the estimates from these simulations depends on the number of simulated values. This leads to the question of how many simulations we should run.

Confidence interval theory in statistics helps reveal the relationship between accuracy and the number of simulations. For example, if we wanted to know the mean value from our simulation, we know the margin of error around that mean:

\[ MoE(\bar{x}) = t \times \frac{s}{\sqrt{n}} \]

In the above equation, \(n\) is the number of simulations that we are running, while \(s\) is the standard deviation of the simulated values. Therefore, to double the accuracy, we need to approximately quadruple the number of simulations.
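A brief R sketch of this relationship, reusing the one-year example (the confidence level and simulation counts are arbitrary): with four times as many simulations, the margin of error around the estimated mean is roughly cut in half.

```r
set.seed(112358)

# Margin of error around a sample mean at a given confidence level
moe <- function(x, conf = 0.95) {
  n <- length(x)
  qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
}

P0 <- 1000

# One-year portfolio values with two different numbers of simulations
P1.small <- P0 * (1 + rnorm(10000, mean = 0.0879, sd = 0.1475))
P1.large <- P0 * (1 + rnorm(40000, mean = 0.0879, sd = 0.1475))

moe(P1.small)   # margin of error with 10,000 simulations
moe(P1.large)   # roughly half as large with 40,000 simulations
```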