It’s Not All About the Average
By Melanie Barker, Arena Simulation Consultant
Published: October 3, 2017
Worldwide, the average human is 5’6” (167cm) tall. This is a true fact – I know because I found it on the internet. However, is it a useful fact? Assume for a moment that you are a manufacturer of blue jeans. Would you take this average height value and then size all of your equipment so that it can only make jeans for individuals who are exactly this height? Now imagine your business was more narrowly focused in terms of your potential customer pool, e.g. men living in the United States. If you’re aiming for a smaller group, would it make any more sense to buy equipment that only makes one length of jeans? Or should you take the range of potential heights into account when designing your factory?
It’s obvious why building a factory to make jeans for only one height is a bad idea. We are well accustomed to the differences in height amongst humans, so we intuitively understand that we need to be prepared to handle variability. Even if we were to focus our business model on a smaller subset of the population, which presumably has less variability than the entire world population, we still recognize the need for different size options.
So when it comes to looking at our model output, do we still recognize the need for ranges, or do we simply report the average and call it a day? Most of us are very good at using variability in the input data that feeds our simulations, especially since that is often a reason that we choose to use simulation instead of a simple spreadsheet analysis. It’s the output side where I sometimes see a lack of thoroughness in dealing with variability. In Arena, output variability is reported via the half width, which is half the width of the 95% confidence interval for the metric.
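To make the half width concrete, here is a minimal sketch of the standard textbook calculation across independent replications. The throughput numbers are made up for illustration, the function name is hypothetical, and I am assuming a Student's t interval; this is not taken from Arena's own code, so treat it as an approximation of what the report shows rather than a definition.

```python
import math
import statistics

def half_width(samples, critical_value):
    """Half width of a confidence interval for the mean of independent replications."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two replications")
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return critical_value * std_err

# Hypothetical daily throughput from 10 replications (illustrative numbers only).
throughput = [9480, 9491, 9476, 9488, 9479, 9485, 9490, 9477, 9483, 9481]

# Student's t critical value for a two-sided 95% interval with 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262

mean = statistics.mean(throughput)
hw = half_width(throughput, T_CRIT_95_DF9)
print(f"mean = {mean:.1f}, 95% CI = [{mean - hw:.1f}, {mean + hw:.1f}]")
```

The critical value is passed in explicitly because it depends on the number of replications; with a different replication count you would look up the matching t value.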
In my experience, the main reason users are reluctant to report output variability is that they are presenting to an audience that is unfamiliar or uncomfortable with statistics and uncertainty. In this situation, one solution is to use graphics or other visual aids to help the audience understand the information being presented.
It may also be wise to keep the presentation simple and include the confidence intervals only when they are a significant portion of the mean. For example, if the average daily throughput from a factory is 9,483 with a half width of 5, it’s probably unnecessary to report that the true mean is in the range of 9,478 to 9,488. In this instance, the half width is such a small percentage of the mean that we might choose to ignore it. However, if the range were 8,283 to 10,683 (a half width of 1,200, or roughly 13% of the mean), this is a very important piece of information, as it shows that there could be a lack of control in the system that we’ve modeled. This assumes that you have run a reasonable number of replications. If not, start by increasing the number of replications to see how that affects your outputs.
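One way to apply this rule consistently is to compare the half width to the mean and only show the interval when the ratio passes a cutoff. The sketch below uses the two throughput examples above; the 1% threshold and the function names are my own assumptions for illustration, not an Arena setting or a recommended standard.

```python
def relative_half_width(mean, half_width):
    """Half width expressed as a fraction of the mean."""
    return half_width / abs(mean)

def format_result(mean, half_width, threshold=0.01):
    """Report the mean alone when the interval is negligible, else include the range.

    threshold is a hypothetical cutoff (1% of the mean); pick one that suits your audience.
    """
    if relative_half_width(mean, half_width) < threshold:
        return f"{mean:,.0f}"
    low, high = mean - half_width, mean + half_width
    return f"{mean:,.0f} (95% CI {low:,.0f} to {high:,.0f})"

print(format_result(9483, 5))     # narrow interval: mean only
print(format_result(9483, 1200))  # wide interval: mean plus range
```

As a rough guide, the half width shrinks in proportion to one over the square root of the number of replications, so cutting it in half takes about four times as many replications.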
One other approach to consider is looking at percentiles. Depending on the situation, you may wish to report a percentile instead of the average, e.g. the 65th percentile. Using the higher value may provide enough cushion for your operation to accommodate most of the variability without overscheduling your capacity dramatically. If you use this technique, be clear with your audience that you have done so, to avoid potential misinterpretation of the results being presented.
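If you have the per-replication values exported, a percentile is easy to compute outside the simulation tool. The following is a small sketch using Python's standard library; the staffing-hours data and the helper name are hypothetical, and the linear interpolation method is just one of several common percentile conventions.

```python
import statistics

def percentile(samples, pct):
    """Return the pct-th percentile (integer 1 to 99) using linear interpolation."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points between the 100 equal-sized groups.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[pct - 1]

# Hypothetical staffing hours required per day, one value per replication.
hours_needed = [41, 38, 45, 52, 39, 47, 43, 60, 40, 44,
                49, 42, 55, 46, 41, 43, 48, 71, 44, 45]

print(f"average:         {statistics.mean(hours_needed):.1f}")
print(f"65th percentile: {percentile(hours_needed, 65):.1f}")
```

Printing the average next to the percentile makes it obvious to the audience which number the plan is based on.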
Ultimately, the onus resides on you to present your results in a way that is both accurate and useful. Simulation is a very powerful tool that provides a great deal of insight into the system being modeled. It is important to capitalize on the benefits of that tool by using all of the information that it provides.
Why is This Important?
Our tendency is to focus on the averages. It is easy to think that if the average result is satisfactory, then our path forward is acceptable. What most people fail to realize is that it is the "tails" of the distribution that are probably more important. The tails of the distribution tell you what is happening when things go wrong. It's in the tails of the distribution where the supply chain gets blocked up and important customer commitments are missed. In the case of hospitals, it might mean that patients do not receive critical care and suffer severe consequences.
It is important that you examine the "tails" of the distribution and ask yourself if your organization can live with these outcomes at the probability they are statistically likely to occur. If the answer is "No", then further simulation and analysis are required.
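A simple way to put a number on the tail is to count how often the replications cross a limit your organization cares about. This is a minimal sketch with made-up patient wait times and a hypothetical 60-minute limit; the function name is mine, and the result is only an empirical estimate that becomes more reliable as the number of replications grows.

```python
def tail_probability(samples, limit):
    """Fraction of observations that exceed the given limit."""
    if not samples:
        raise ValueError("samples must not be empty")
    exceed = sum(1 for value in samples if value > limit)
    return exceed / len(samples)

# Hypothetical maximum patient wait (minutes) from 20 replications.
max_wait = [32, 41, 28, 55, 37, 62, 30, 44, 39, 95,
            35, 48, 33, 29, 71, 40, 36, 52, 31, 43]

WAIT_LIMIT = 60  # hypothetical service commitment, in minutes

risk = tail_probability(max_wait, WAIT_LIMIT)
print(f"Estimated chance of exceeding {WAIT_LIMIT} minutes: {risk:.0%}")
```

The question to ask is then whether that estimated chance is one the organization can live with.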
Don't get lulled into looking only at the averages. A careful examination of the tails of the distribution will give you significant insight into how to best run your operations and the potential risks.