Modeling and Simulation
Introduction to Probabilistic Simulations in Excel
You can use probability distributions to manage the uncertainty
about your assumptions when you create
simulations in Excel. Here's how.
by Charley Kyd, MBA Microsoft Excel MVP, 20052014
The Father of Spreadsheet Dashboard Reports

(This is the first article in a series. The second article
is, How to Create Monte Carlo Models and Forecasts Using Excel Data Tables.)
Back when I created models and forecasts for employers, I
KNEW that my results were going to be wrong, and I had no
practical way to express any degree of uncertainty in my
results.
For example, if I were forecasting profits for a period, my
spreadsheet would use one number for my forecast of sales in a
period, another number for my forecast of operating expenses, another
number for my costofgoodssold percentage, and so on.
I knew that
each predicted number in my forecast probably would land
somewhere between a bestcase and worstcase value...probably as
determined by a normal distribution as shown in this Excel chart.
(Click here to get
a copy of this chart with data, calculations, and documentation.)
But my models offered no way for me to include such
probabilities. And they offered no way to
assess or reduce my degree of uncertainty about my final forecast.
And also, during my years as an onsite Excel consultant, I
saw many models and forecasts created by spreadsheet users from around the world. And in all that time, I never saw a
model, or forecast that could have done any better.
Enter "Simulations"
Since then, I've come to realize that most Excel models and forecasts are actually a form of simulation.
Wikipedia tells us that a simulation is the imitation of the
operation of a realworld process or system over time.
The act of simulating something, Wikipedia explains, first
requires that a model be developed. This model represents the
key characteristics or behaviors of a system or process. The
model represents the system itself, and the simulation
represents the operation of the system over time.
In general, there are two types of models: deterministic
and probabilistic.
 Deterministic models use specific
numbers for assumed values, including ratios.
 Probabilistic (or "stochastic") models
rely on random numbers, typically drawn from a normal
distribution.
From this perspective, the type of simulation I prepared for
my employers, and the type I saw in other companies was a
deterministic simulation.
On the other hand, what we should have been using was a
probabilistic simulation. This would have allowed us to specify
our degree of uncertainty about each of our assumptions, and it also
would have helped us to evaluate our uncertainty about our final
results.
I think it's way past time for Excel users to reduce our
reliance on deterministic simulations. It's time to start using
probabilistic simulations in our work.
Two Types of Probabilistic Simulations
Loosely speaking, we can divide probabilistic simulations into
two types, tabular and expanded.
Tabular Simulations
With tabular simulations, you create your entire model
in several cells in one row of a spreadsheet. And some of these cells include random numbers.
To create
the simulation, you copy your row of formulas down their
columns to many thousands of rows. And then, to analyze your
simulation, you analyze those many different results generated
by those random numbers in each row of your table of simulations.
For example, suppose that your company places a variable number of
online ads each month, which generate a variable number of
visitors to your web site. And suppose that a variable number of
visitors is needed
to generate each sale of a variable amount.
So in several cells in a row, you could simulate one month of
activity, using random numbers to define each degree of
variability. You could copy those formulas down their columns to
create a table with thousands of possible results. And then,
you'd analyze the table to determine what your average sales
would be for a period, and how variable that estimate might be.
Expanded Simulations
The expanded approach is different. In this approach, you
create a model that can be as detailed as you would like it to
be. You could use as many rows in as many worksheets as you want
for your model.
Typically, this is known as a Monte Carlo analysis.
But unlike most models created in Excel, the Monte Carlo
analysis would use
random numbers to generate key assumptions.
For example, if your bestpossible sales one month would be
120, and your worstpossible sales would be 80, you'd use a
random number to choose between those limits. Or if your
highestpossible cost of sales would be 40% and your
lowestpossible cost would be 30%, you'd use another random
number to choose between those limits.
With this approach, your model gives you a revised forecast
each time you recalculate your workbook.
You easily can recalculate this model as many times as you
want and capture the results from each calculation...automatically. And you can
do it without programming. I'll show you how to do it in my next
article, How to Create Monte Carlo Models and Forecasts Using
Excel Data Tables.
But for now, let's dig into a critical ingredient of both types
of simulations: random numbers. First, I'll show you the obvious
method, which you seldom should use for your simulations. Then
I'll show you the method you should use most of the time.
Excel's Two Random Number Functions
Excel offers two functions that generate random numbers...
 RAND() returns a random number between
0 and 1.
 RANDBETWEEN(bottom, top) returns a
random integer between the bottom and top arguments.
Both functions return results that have the same chance of
appearing anywhere between the bottom and top boundaries.
To illustrate, I copied the RAND() function and pasted it to
10,000 cells in one column. I grouped its values into 10 groups
of equalsize bins, then I created a histogram that shows how many
times a value appeared in each bin.
The chart on the left shows the results after I pressed F9
the first time to recalculate my workbook. The first column in
the first chart shows that RAND generated a value between 0 and
.10 abut 1010 times in those 10,000 rows. And the last column in
that chart shows that RAND generated a value between .9 and 1.0
about 990 times. And the other chart shows the results after I
pressed F9 again.
As you can see, the results for each column are grouped
around 1000, which is each column's equal share of the results.
If I had used RANDBETWEEN rather than RAND, I would have seen
a similar pattern.
The
problem with using either RAND or RANDBETWEEN for your
simulations is that we want results that look about like this
Excel chart.
That is, once we define the boundaries of our assumptions, we
typically want the random numbers to be centerweighted.
So how can we do this? How can we return a random number from
a normal distribution?
Random Numbers from a Normal Distribution
To return a random number from a normal distribution, we rely
mostly on the
NORM.INV function, which uses this syntax:
=NORM.INV(probability, Mean, standard_dev)
The probability argument is easy to specify. Probabilities have
values from zero to one, just like the RAND function generates.
Therefore, if we can figure out how to calculate the Mean and
the standard deviation, we can use this formula to return a
random number from a normal distribution:
=NORM.INV(RAND(), Mean, standard_dev)
If you're a statistical superstar, you'll have no problem
calculating the Mean and standard deviation of your data for use
in this formula.
But if you're like most of us Excel users, you need an easier
approach. Here it is...
Take
another look at this Excel chart. The light blue area is within
one standard deviation of the mean.
As the first label below the chart illustrates, that
lightblue area represents about 68% of the total area.
Together, the light and mediumblue areas show what's within
two standard deviations of the mean. And the second label below
the chart shows that this represents about 95% of the total
area.
So think about what that means. Suppose your original
forecast shows that sales next month will be 100. But if your
job depended on it, what do you think your bestcase and worstcase
estimates might be for those sales? After you really think about it,
let's say you estimate that sales could be as high as 120 and as
low as 70.
Those two numbers define the outer boundaries of the medium
blue areas in the figure. That is, your estimate says that
there's about a 95% chance that actual sales will be between 70
and 120.
And therefore...
 Your mean is the average of 120 and 70, which is 95 in
this case.
 Your standard deviation is the difference between 120
and 70, divided by 4 standard deviations, which is 12.5 in
this case.
Therefore, here's your formula to return a random number
from a normal distribution with a mean of 95 and a standard
deviation of 12.5:
=NORM.INV(RAND(),95,12.5)
Now let's check whether this formula gives us the results we
expect.
Checking the Results with a Histogram
The following two figures aren't fancy, but they tell a story
that you need to understand if you create Excel models or forecasts.
This figure calculates how successfully the previous formula
returns numbers from a normal distribution. Here's how to set it
up...
Enter this formula in the cell shown:
A1: =NORM.INV(RAND(),95,12.5)
We want to copy this cell down the column to cell A10000.
Here's a fast way to do this:
 Copy cell A1.
 Press the F5 function key to launch the GoTo dialog.
 In the Reference edit box, enter A10000, then press
Enter to select cell A10000.
 Hold down your Ctrl and Shift keys, then press the
uparrow key, which will select all cells from A10000
through A1.
 Press Ctrl+v to paste
cell A1 to the selected cells.
The cells in column C show the min and max values in column
A. Enter the formulas in the cells shown:
C2: =MIN($A$1:$A$10000)
C3: =MAX($A$1:$A$10000)
Column D shows nine numbers to use as a visual guide. Enter them
any way you want.
The first formula in cell E2 merely references the min value:
E2: =C2
The next formula calculates the minimum value plus oneninth
of the distance between the min and max values:
E3: =E2+($C$3$C$2)/9
Copy the formula in cell E3 downward as shown. Notice that
cell E11 equals the max value in cell C3.
Set Up the Data for the Histogram
Column F contains the data that we'll display in our new
histogram. We'll use the FREQUENCY function to generate it.
To do so, first select the range F3:F11, then type this
formula into your formula bar:
=FREQUENCY(A1:A10000,E3:E11)
Now arrayenter this formula. That is, hold down your Ctrl
and Shift keys and then press Enter.
After you do so, you should have a pattern similar to the
data shown in the figure above. That is, the cells in the middle of
the formulas in column F should display much larger numbers than the cells at
each end.
Create the Histogram
The
final step of the test is to create the histogram, shown here.
To do so, make sure that the range F3:F11 is still selected,
then choose Insert, Charts, Column Chart, Clustered Column.
After you do so, Excel will create a chart somewhat like this
one. If you're using Excel 2007 or 2010, your chart might not
include a title.
You certainly could make this chart look better, but that's
not necessary for the test. The chart easily demonstrates that
we've done what we wanted by combining the NORM.INV and RAND
functions: We now have a way to return random numbers from a
normal distribution.
In
How to Create Monte Carlo Models and Forecasts Using Excel Data Tables, I show you how to build on this
foundation to create probabilistic simulations using the Monte
Carlo method.
And again, you can click here to get
a copy of this chart with data, calculations, and documentation.
Tags: #excel, #normal distribution, #stochastic,
#probabilistic, #deterministic, #simulation, #model, #random number, #forecast, #norm.inv, #rand,
#Monte Carlo
