Normal Distribution of Sample Means

From Statistics Libre Text:

For samples of size 30 or more, the sample mean is approximately normally distributed, with mean μ_X̄ = μ and standard deviation σ_X̄ = σ/√n where n is the sample size. The larger the sample size, the better the approximation.

The above is a pretty typical assertion in statistics texts. The derivation must be a doozy: it’s never shown, and there’s rarely a discussion of the Central Limit Theorem itself.

I wrote a program that creates a uniformly distributed population, then takes a number of samples of a specified size and calculates each sample’s mean and the standard deviation of the samples’ means.
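For readers who want to try this without the repo, here is a minimal sketch of that kind of simulation. It is not the program from the repo: the function names `make_population` and `sample_means`, the integer range 1 to `max_value`, the choice to sample with replacement, and the seed are all assumptions made for illustration.

```python
import random
import statistics

def make_population(size, max_value, rng):
    """Uniform population of integers from 1 to max_value (an assumed range)."""
    return [rng.randint(1, max_value) for _ in range(size)]

def sample_means(population, n_samples, sample_size, rng):
    """Mean of each of n_samples samples, drawn with replacement (an assumption)."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

rng = random.Random(12345)  # fixed seed so reruns give the same numbers
population = make_population(100_000, 1000, rng)
means = sample_means(population, n_samples=1000, sample_size=35, rng=rng)

print("population mean:        ", round(statistics.fmean(population), 1))
print("mean of sample means:   ", round(statistics.fmean(means), 1))
print("std dev of sample means:", round(statistics.stdev(means), 1))
```

Changing `sample_size` and rerunning is enough to watch the spread of the sample means shrink as the samples get bigger.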


Code repo


There are several assertions in the excerpt from Statistics Libre Text above.

1. Sample mean is approximately normally distributed

[Figure: distribution of samples’ mean values]

That’s a population of 100,000 values, each value between 0 and 1000. I had my program draw 1000 samples of 35 values each and calculate each sample’s mean. Above is a density histogram of the samples’ mean values, along with a normal distribution in blue and R’s “kernel density” curve.

That doesn’t look so great. There’s some mismatch between histogram and normal distribution. Other runs I did produced more “normal looking” histograms and a better visual match between density and normal distribution PDF. Numerically, the kurtosis of the distribution of sample means is always close to 3, that of the normal distribution. The histogram above represents a distribution with a kurtosis of 2.96, so even though it looks lumpy, it’s still close to a normal distribution.
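The kurtosis check can be sketched in a few lines. This is an illustration rather than the code behind the figure: the `kurtosis` helper, the seed, and the integer range 1 to 1000 are assumptions. The helper computes the plain (non-excess) kurtosis, the fourth central moment divided by the squared variance, which is 3 for a normal distribution and about 1.8 for a uniform one.

```python
import random
import statistics

def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    values = list(values)
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / (m2 * m2)

rng = random.Random(2024)  # arbitrary fixed seed
# Assumed population: 100,000 integers drawn uniformly from 1..1000.
population = [rng.randint(1, 1000) for _ in range(100_000)]
# 1000 samples of 35 values each, drawn with replacement.
means = [statistics.fmean(rng.choices(population, k=35)) for _ in range(1000)]

print("kurtosis of population:  ", round(kurtosis(population), 2))
print("kurtosis of sample means:", round(kurtosis(means), 2))
```

The contrast is the point: the flat population sits near 1.8, while the means of samples drawn from it land near 3.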

I think this assertion is justified.

2. Sample distribution mean is the same as the population mean

3. Sample mean distribution has σ_X̄ = σ/√n

4. The larger the sample size, the better the approximation.

These assertions are all related.
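Assertions 2 and 3 turn out not to need the Central Limit Theorem at all. Here is a short derivation, assuming the n draws X_1, ..., X_n in a sample are independent and each has mean μ and standard deviation σ (true when sampling with replacement, and nearly true when the population is much larger than the sample):

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
\mathrm{E}[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[X_i] = \frac{1}{n}\, n\mu = \mu \\
\mathrm{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
\sigma_{\bar{X}} &= \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The variance line is where independence is used: the variance of a sum equals the sum of the variances only when the draws are uncorrelated. For the population here, σ = 288.7 and n = 35 give σ/√n ≈ 48.8. What this derivation does not give is the shape of the distribution; the normal shape in assertion 1 is the part the Central Limit Theorem supplies.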

The 1000 sample means from the graph above, each sample having 35 values, are surprisingly close to each other and to the population mean.

I wrote a second program to try different sample sizes on the same population: it takes 1000 samples of a given size and measures how close the distribution of sample means is to the theoretical one.

  • 100000 values in population
  • 1000 max value in population
  • Population mean 500.5
  • Population std dev 288.7
sample size      min   median      max     mean   sample std dev   theoretical std dev
         15    260.7    496.3    730.9    498.2             72.7                  74.6
         30    325.5    499.7    676.2    500.3             52.7                  52.7
         50    365.8    499.3    623.9    501.6             40.5                  40.8
        100    400.3    499.4    594.3    500.4             29.6                  28.9
        500    460.2    501.0    537.4    500.9             13.0                  12.9
       1000    470.6    500.4    526.9    500.7              9.4                   9.1
      10000    491.8    500.3    509.8    500.4              2.9                   2.9
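A sweep like the one in the table can be sketched as follows. This is a hedged reconstruction, not the second program itself: the `summarize` helper, the seed, the integer range 1 to 1000, and sampling with replacement are assumptions, and the 10000 row is left out only to keep the run short.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary fixed seed
# Assumed population: 100,000 integers drawn uniformly from 1..1000.
population = [rng.randint(1, 1000) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population (not sample) std dev

def summarize(sample_size, n_samples=1000):
    """Return mean and std dev of the sample means, plus sigma/sqrt(n)."""
    means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]
    theory = sigma / math.sqrt(sample_size)
    return statistics.fmean(means), statistics.stdev(means), theory

results = {n: summarize(n) for n in (15, 30, 50, 100, 500, 1000)}

print(f"population mean {mu:.1f}, std dev {sigma:.1f}")
print(f"{'n':>6} {'mean':>8} {'std dev':>8} {'sigma/sqrt(n)':>14}")
for n, (mean, sd, theory) in results.items():
    print(f"{n:>6} {mean:>8.1f} {sd:>8.1f} {theory:>14.1f}")
```

The exact figures depend on the seed, so they will not reproduce the table digit for digit, but the observed and theoretical columns should track each other in the same way.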

Looks like all these assertions are justified. At every sample size, the mean of the sample means is within about 0.5% of the population mean, and the standard deviation of the sample means is within a few percent of the theoretical value σ/√n.

I hereby declare the Central Limit Theorem true and justified.