How to Create a Histogram in Google Sheets and Overlay a Normal Distribution Curve

Video google sheet distribution chart

Histogram and Normal Distribution Curve in Google Sheets

In this tutorial, I’m going to show you how to create a histogram in Google Sheets and overlay a normal distribution curve, just like the image above, using Google Sheets.

This visual technique is very useful for determining whether your data is normally distributed, skewed, or simply scattered.

What is a Histogram?

A histogram is a graphical representation of the distribution of a set of data.

In this example, let’s say I have 1000 exam scores ranging from 0 to 100, and I want to see the distribution of these scores. What is the average score? Did more students get high or low scores? How tightly clustered are the student scores around the average? Are the scores normally distributed or skewed?

What is a Normal Distribution Curve?

The normal distribution curve is a graphical representation of the normal distribution theorem, which states that “…the means of randomly drawn variables, independently from independent distributions, converge in distribution to the normal, that is, become normally distributed when the number of random variables is sufficiently large.”

It may sound complicated, but in reality, the data converges around the mean without any left or right skew. This means we know the probability of how many values occurred near the mean.

We expect 68% of the values to fall within one standard deviation of the mean, and 95% to fall within two standard deviations. Values outside of two standard deviations are considered outliers.

We expect our exam scores to be fairly close to the normal distribution, but let’s graphically check that (it’s hard to see just from the data).

Now, let’s see how to create a histogram in Google Sheets and overlay a normal distribution curve, just like in the first image above.

How to Create a Histogram in Google Sheets

Step 1: Raw Data

Copy the raw scores from here into your own blank Google sheet. This is a list of 1000 exam scores ranging from 0 to 100, and we will be studying the distribution of these scores.

Step 2: Name This Range

Create a named range from these raw scores, called “scores,” to make our lives easier. Select all the data in column A, i.e., cells A1:A1000, then click on the “Data > Named ranges…” menu and name this range “scores.”

Step 3: Summary Statistics

Set up a small summary table with the mean, median, mode, and standard deviation of our population. The formulas are as follows:

Mean:
Median:
Mode:
Standard deviation:

Step 4: Create Frequency Intervals

Set up frequency intervals, from 0 to 100 at intervals of 5. Put 0 in cell F2, then use this formula to quickly fill in the remaining intervals:

(adds 5 to the cell above it). Name this range “interval.”

Step 5: Calculate Normal Distribution

Set up the values for the normal distribution curve.

Google Sheets has a NORMDIST formula that calculates the value of the normal distribution function for a given value, mean, and standard deviation. We calculated the mean and standard deviation in Step 3, and we will be using the interval values from Step 4 in the formula.

In cell G2, insert the formula:

Drag the formula down to cell G22 to fill the entire column with NORMDIST formulas.

Step 6: Normal Distribution Curve

Let’s see what the normal distribution curve looks like with this data.

Select the interval column and the NORMDIST column, then go to “Insert > Chart” and select “Line chart,” and make it smooth.

You will get a result like this:

Normal Distribution Curve in Google Sheets

That’s a normal distribution curve, centered around our mean of 56.9. Congratulations!

We now need to calculate the distribution of the 1000 exam scores for our histogram chart.

Since we will be creating a new chart with the histogram and overlaid normal curve (easier than modifying this one), you can set this normal distribution chart aside for now or delete it.

Step 7: Frequency Formula

Leave column H blank for now (we will fill it soon).

In column I, let’s use the FREQUENCY formula to assign our 1000 scores to the frequency intervals. Type the following formula in cell I2 and press Ctrl + Shift + Enter (on PC) or Cmd + Shift + Enter (on Mac) to create the array formula. It will fill the entire column and assign all the scores to the correct intervals:

If you’re new to array formulas, check out this article: How Do Array Formulas Work in Google Sheets?

Step 8: Copy Values

Copy this column of frequency values into the adjacent column J (we will need it for our chart).

Pro Tip: You can simply copy I1:I2 into J1:J2, which will fill the entire column with the values.

Step 9: Resize Normal Distribution Curve

We need to resize our normal distribution curve to fit on the same scale as the histogram. Since we have 1000 values in 5 intervals, our scaling factor is 5000. This means that when I multiply the values in the normal distribution by 5000, they will be comparable to the histogram values on the same axis. Additionally, they will sum up to 1000, which is the number of values in our population.

So, in the empty column H, add the following formula and drag it down to cell H22:

Our final data table now looks like this:

Final Table

Step 10: Create the Chart

This is where we finally see how to create a histogram in Google Sheets!

Note: The shared screenshots below show the old chart editor. The new chart editor opens in a sidebar, but the steps and options are essentially the same.

Hold down the Ctrl key (PC) or Cmd key (Mac) and select the data interval column, the normal distribution, and the two histogram columns, but exclude the NORMDIST formula column, like this:

Then, go to “Insert > Chart,” and select “Combo chart”:

Select the option to use column F as labels:

In the Customize tab, remove the title and legend. Select the Smooth option:

Select the vertical axis. Remove the axis name. Set the range from 0 to 150 and set major gridlines to 4.

In the Series section of the Customize menu, choose the “Normal Distribution” series and switch from columns to lines, so your chart looks like this:

Then, choose the “Histogram” series and switch from lines to columns:

Next, choose the “Histogram 2” series and switch from lines to stepped area:

Then, change the color to red, line thickness to 1px, and opacity to 70%, to give our chart the appearance of a histogram (which is why we needed two copies of the frequency column):

Finally, set the font size of the axis labels to 10, and then click inside the chart area to move and resize it by dragging the edges outward so that it fills the entire chart canvas:

And voila! You have now learned how to create a histogram in Google Sheets, overlaid with a normal distribution curve:

Final Result

Wanna have your own copy of this histogram chart? Click here to access your copy of this template.

Conclusion:

In conclusion, we can see that our exam score data is very close to the normal distribution. Hooray!

If we look closely, we notice that it is very slightly skewed to the left, i.e., it has a longer tail on the left, more spread out to the left. You will notice there is some space between the red bars and the blue line on the left side, but the red bars overlap the blue curve on the right side. It’s subtle.

Crawlan.com

Related posts