The Power of a Normal Distribution
Build a Histogram
Before diving into the normal distribution curve, let’s start with its building block: the histogram. The word histogram comes from the Greek 'histos', meaning "anything that stands upright", and 'gram', meaning "drawing" or "record".
Building a histogram is relatively simple, so let’s weigh some geese and build one. We’ll just have the geese line up and walk across our scale, round each weight to the nearest pound, and record the data on a chart. Soon we will have a full histogram of goose weights.
You’ll notice that there are more goose data points in the center of the graph and fewer as you move away from it, so the histogram provides a nice visual representation of the goose weights. You can really start to see the distribution in this sample set: the data clusters around the center, with fewer and fewer points toward the tails. This is a common feature of most histograms.
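That tallying process is easy to sketch in code. Here is a minimal Python version using made-up goose weights (all of the numbers are hypothetical, just for illustration):

```python
from collections import Counter

# Hypothetical goose weights, already rounded to the nearest pound
weights = [9, 10, 11, 11, 12, 12, 12, 12, 13, 13, 14, 15]

counts = Counter(weights)
for pound in sorted(counts):
    # One asterisk per goose recorded at that weight
    print(f"{pound:>2} lb | {'*' * counts[pound]}")
```

Notice how the row of asterisks is longest at 12 pounds and shrinks toward both ends, just like the bars of the histogram.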
The important lesson here is that a histogram has a center (the average, or mean) and a spread (the variation, or standard deviation). That is where the histogram stops and where the normal curve comes into play.
Seeing the Curve
In this chart, you can see the normal curve and how nicely it forms around the histogram. It doesn’t matter what data set the histogram represents; the normal curve magically transforms all of these various data sets into very predictable statistical models. And that, my friends, is what the normal distribution curve does for us!
You see, the histogram is only the cocoon. A metamorphosis takes place between the histogram and the normal curve, and the normal distribution curve becomes the butterfly of your statistical model. That is because the normal distribution curve standardizes the patterns hidden in your data set.
The Empirical Rule (68%-95%-99.7%)
What is so powerful about the normal curve is something called the Empirical Rule. While the Empirical Rule sounds like the title of a Star Wars movie, it is actually a principle that applies to normally distributed data, and it goes like this: 68% of the population data falls within plus or minus one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.
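You can watch the Empirical Rule emerge from simulated data. The sketch below draws 100,000 values from a normal distribution (the mean and standard deviation are illustrative numbers, not real goose data) and counts how many land within one, two, and three standard deviations:

```python
import random
import statistics

random.seed(42)
# Simulate 100,000 goose weights from a normal distribution
# (mean 12 lb, standard deviation 1.5 lb -- illustrative numbers)
data = [random.gauss(12, 1.5) for _ in range(100_000)]

mean = statistics.fmean(data)
sd = statistics.stdev(data)

shares = {}
for k in (1, 2, 3):
    inside = sum(1 for x in data if abs(x - mean) <= k * sd)
    shares[k] = inside / len(data)
    print(f"within ±{k} standard deviation(s): {shares[k]:.1%}")
```

Run it and the three printed percentages come out very close to 68%, 95%, and 99.7%.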
The normal distribution curve takes your actual data and creates a predictive model of the total population of your studied subject. That is how the one and only Empirical Rule can be applied to almost any set of sample data that follows a normal distribution.
Note that, visually, histograms can look vastly different from one another. However, the normal curve and the Empirical Rule allow us to standardize our data into one overarching distribution. This goes back to the Central Limit Theorem (CLT): as we continue to collect data and average our samples, the histogram of those sample averages starts to look more like a pyramid or bell in shape, even when the raw data does not.
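The classic demonstration of the CLT uses dice, and it is easy to sketch. A single die roll is uniformly distributed, so its histogram is flat, not bell-shaped. But if we average many rolls and histogram those averages, a bell appears centered near 3.5:

```python
import random
import statistics

random.seed(1)

# A single die roll is uniformly distributed -- its histogram is flat.
# But averages of many rolls pile up around 3.5, just as the
# Central Limit Theorem predicts.
sample_means = [
    statistics.fmean(random.randint(1, 6) for _ in range(30))
    for _ in range(10_000)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
print(f"mean of the sample averages: {center:.2f}")  # close to 3.5
print(f"spread of the sample averages: {spread:.2f}")
```

The sample averages cluster tightly around the center, which is exactly the bell shape the CLT promises.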
A wonderful thing about the normal distribution curve and empirical rule is that they allow us, through the magic of statistics, to take a giant data shortcut. You see, by using these principles we can jump directly from our data set into the normal distribution model without incurring the expense of massive sampling plans. So we get useful data without much cost.
Who's Not Normal?
In this article, I used the word “almost” quite often, and there was a reason for that: much of the data we encounter in practice is approximately normally distributed, but not all of it. To be confident that your data is normally distributed, you should run a statistical normality test. A widely used test for validating normality is the Anderson-Darling Normality Test.
The key output of the Anderson-Darling test is a p-value. If the p-value is greater than 0.05, you cannot reject normality, so you can reasonably treat your data as normally distributed. Check the normality of your data early on, because you don’t want to draw inaccurate conclusions. I use Minitab software, but there is free software available too, such as R (from the R Project).
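The Anderson-Darling test itself relies on tables of critical values that Minitab and R supply for you. As a rough, standard-library-only sanity check (a sketch, not a substitute for the real test), you can look at skewness, which is near zero for normal data and far from zero for lopsided data:

```python
import random
import statistics

def skewness(xs):
    """Rough sample skewness: near 0 for symmetric, bell-shaped data."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(7)
normal_data = [random.gauss(0, 1) for _ in range(50_000)]
skewed_data = [random.expovariate(1) for _ in range(50_000)]

print(f"normal data skewness: {skewness(normal_data):+.2f}")  # near 0
print(f"skewed data skewness: {skewness(skewed_data):+.2f}")  # far from 0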
If you conduct the normality test and find that your data is not normally distributed, then you need to determine why. If you can’t find a reason for the non-normality, the underlying process may actually be normally distributed and the problem may lie in your data collection. If so, get more or different samples and retest.
If your data really does represent a non-normal distribution, it can often be transformed into a useful format. However, this transformation process calls for the statistical expertise of a Six Sigma Black Belt. If your back is against the wall, statistically speaking, buy your company’s Black Belt a cup of coffee and pick their brain to learn more about transformation tools.
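One of the simplest transformation tools is the log transform, which often tames right-skewed data (the Box-Cox transformation your Black Belt might reach for generalizes it). Here is an illustrative sketch on simulated skewed data, measuring skewness before and after:

```python
import math
import random
import statistics

def skewness(xs):
    """Rough sample skewness: near 0 for symmetric, bell-shaped data."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

random.seed(3)
# Lognormal data is heavily right-skewed...
raw = [random.lognormvariate(0, 0.5) for _ in range(50_000)]
# ...but its logarithm is exactly normal.
logged = [math.log(x) for x in raw]

print(f"before log transform: skewness {skewness(raw):+.2f}")
print(f"after  log transform: skewness {skewness(logged):+.2f}")  # near 0
```

After the transform, the skewness collapses toward zero, and the transformed data can be used with normal-based tools.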
You Got the Power!
When your data is normally distributed, it can be used in all kinds of statistical studies such as Process Capability, t-tests, F-tests, and ANOVA. To learn more, enroll in CML courses. These courses are effective, flexible, and affordable.