The Central Limit Theorem can be challenging to understand. You take a random sample from a distribution, calculate its sample mean, repeat this many times, and then use those sample means to build a new distribution.
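The steps just described can be sketched in a few lines of base R. This is a minimal sketch rather than the code behind the animations: it assumes an untruncated exponential with mean 20 (rate 1/20, the rate used in the R code at the end), and the names `n_means` and `sample_means` are illustrative choices of mine.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
rate    <- 1 / 20                # exponential with mean 20 and sd 20
n       <- 10                    # size of each sample
n_means <- 10000                 # number of sample means to collect (assumed)

# Draw a sample of size n and compute its mean, n_means times
sample_means <- replicate(n_means, mean(rexp(n, rate = rate)))

# The sample means form a new distribution; draw it when run interactively
if (interactive()) {
  hist(sample_means, breaks = 40, main = "Distribution of sample means")
}

# The mean of the sample means should be near the population mean (20) and
# their sd near sigma / sqrt(n); the CLT adds that the shape approaches a
# normal curve as n grows
c(mean = mean(sample_means), sd = sd(sample_means), theory_sd = 20 / sqrt(n))
```

Changing `n` to 30 and rerunning gives a narrower and more symmetric histogram.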
We provide two animations below to help visualize these steps. The first uses a sample size of 10, and the second a sample size of 30. The top graph shows the density from which we draw each sample. Although the density is exponential, I truncated it at 50 so that the sample points remain on the graph. The top graph also displays the sampled points and the sample mean. The sample mean then gets placed in the histogram below. The idea here is to demonstrate the process. Each animation shows one sample mean at a time for the first 20 sample means; after that, to speed things up, each cycle still shows only one sample mean but adds 10 sample means to the histogram.
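Truncating at 50 changes the distribution being sampled, so the sample means should gather around the mean of the truncated density rather than around 20. Here is a rough check, assuming the rate of 1/20 and the cutoff of 50 from the R code at the end. `rtrunc_exp` is a hypothetical helper that uses the inverse CDF instead of the rejection loop in the original code, and it skips the rounding to one decimal place.

```r
set.seed(2)                      # arbitrary seed, only for reproducibility
rate <- 1 / 20
xmax <- 50

# Hypothetical helper: inverse-CDF draw from an exponential truncated to [0, xmax]
rtrunc_exp <- function(n, rate, xmax) {
  u <- runif(n)
  -log(1 - u * (1 - exp(-rate * xmax))) / rate
}

# Closed-form mean of the truncated exponential:
# 1/rate - xmax * exp(-rate * xmax) / (1 - exp(-rate * xmax))
trunc_mean <- 1 / rate - xmax * exp(-rate * xmax) / (1 - exp(-rate * xmax))

# Compare the formula with a large simulated sample
x <- rtrunc_exp(100000, rate, xmax)
c(theory = trunc_mean, simulated = mean(x), max_draw = max(x))
```

With these values the truncated mean works out to about 15.5, which suggests the histogram of sample means should center to the left of 20.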
The R code is at the end. Please share and subscribe if you enjoyed this animation. Please share your ideas for future animations and critiques in the comments section.
The next animation uses a sample size of 30, and we can see that the distribution of sample means is closer to normal.
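One way to put a number on "closer to normal" is the skewness of the simulated sample means, which is 0 for a normal distribution. The sketch below is my own check under stated assumptions, not part of the original code: `rtrunc_exp`, `skewness`, and `sim_means` are hypothetical helpers, sampling uses the inverse CDF of the exponential truncated at 50, and 50,000 sample means per sample size is an arbitrary choice.

```r
set.seed(3)                      # arbitrary seed, only for reproducibility
rate <- 1 / 20
xmax <- 50

# Hypothetical helper: inverse-CDF draw from an exponential truncated to [0, xmax]
rtrunc_exp <- function(n, rate, xmax) {
  u <- runif(n)
  -log(1 - u * (1 - exp(-rate * xmax))) / rate
}

# Moment-based sample skewness (0 for a symmetric distribution)
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# reps sample means, each computed from a sample of size n (one sample per row)
sim_means <- function(n, reps = 50000) {
  draws <- matrix(rtrunc_exp(n * reps, rate, xmax), nrow = reps)
  rowMeans(draws)
}

skew10 <- skewness(sim_means(10))
skew30 <- skewness(sim_means(30))
c(n10 = skew10, n30 = skew30)
```

Both values should be positive, reflecting the right skew of the density, with the value for n = 30 the smaller of the two, since the skewness of a sample mean shrinks in proportion to 1/sqrt(n).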
Comments
Please point out if you think something was expressed wrongly or misinterpreted. I'd rather know the truth and understand the world than be correct. I welcome comments and disagreement. We should all be forced to express our opinions and change our minds, but we should also know how to respectfully disagree and move on. Send me article ideas, feedback, or other thoughts at briefedbydata@substack.com.
Bio
I am a tenured mathematics professor at Ithaca College (PhD Math: Stochastic Processes, MS Applied Statistics, MS Math, BS Math, BS Exercise Science), and I consider myself an accidental academic (opinions are my own). I'm a gardener, drummer, rower, runner, inline skater, 46er, and R user. I’ve written the textbooks R for College Mathematics and Statistics and Applied Calculus with R. I welcome any collaborations.
R Code
######################################################################
### Title Demonstrating the CLT
### By Thomas J Pfaff
### For Briefed by Data https://briefedbydata.substack.com/
######################################################################
## Packages
library(ggplot2)
library(patchwork) # for plot output of g1/g2 etc.
library(animation)
## Colors
MyPurple <- "#5B005B"
MyLightP <- "#dfdbdf"
MyLightP2 <- "#f8f4f8"
MyLightP3 <- "#fcfafc"
MyPurple5 <- "#9c669c"
## Defined variables
HistY <- 100
xmax <- 50
Erate <- 1 / 20
n <- 10 # Change Sample Size for Sample Mean
HistData <- -2 # placeholder outside the plotted range (0 to xmax) so the first histogram is empty
df2 <- data.frame(SampleMean = HistData)
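## Draw n values from the exponential, rejecting any above xmax
## (rejection sampling from the truncated density)
TruncFunction <- function(n){
  done <- 0
  data <- NULL
  while (done < n) {
    temp <- round(rexp(1, rate = Erate), 1)    # one draw, rounded to 1 decimal
    if (temp <= xmax) {data <- c(data, temp)}  # keep it only if it fits on the graph
    done <- length(data)
  }
  return(data)
}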
## Create Animation
saveGIF(
{ani.options(interval = 0.40, nmax = 50)
for (i in 1:20){
data <- TruncFunction(n)
Dmean <- mean(data)
df <- data.frame(xdata = data, ydata = rep(0.025,n))
## step 1
g1 <- ggplot() +
geom_function(fun = function(x){ dexp(x, rate = Erate)},
linewidth = 1.25, col = MyPurple) +
geom_point(data = df, aes(x = xdata,y = ydata), size = 4,
col = MyPurple) +
annotate("text", x = Dmean, y = 0.027, size=7,
label=bquote(paste( bar(x) * " = " * .(Dmean)))) +
annotate("text", x = 30, y = 0.04, size = 8,
label = paste("Sample Size = ", n, sep="")) +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, 0.05), expand = c(0, 0)) +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2)) +
labs(title = "CLT: Sample Mean Distribution from a Density",
y = NULL, x = NULL)
g2 <- ggplot(df2, aes(x = SampleMean)) +
geom_histogram(bins = 40, fill = MyPurple, color = MyPurple5) +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, HistY), expand = c(0, 0)) +
theme(axis.text=element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2),
plot.caption = element_text(hjust = 1, size = 14,
color = MyPurple )) +
labs(title = NULL, y = NULL, x = NULL,
caption = c("Briefed by Data || Thomas J Pfaff"))
plot(g1/g2)
ani.pause()
## step 2
g1 <- ggplot() +
geom_function(fun = function(x){dexp(x, rate = Erate)},
linewidth = 1.25, col = MyPurple ) +
geom_point(data = df, aes(x = xdata, y = ydata),
size = 4, col = MyPurple) +
annotate("text", x = Dmean, y = 0.027, size = 7,
label = bquote(paste( bar(x) * " = " * .(Dmean)))) +
annotate("text", x = 30, y = 0.04, size = 8,
label = paste("Sample Size = ", n, sep = "")) +
geom_segment(aes(x = Dmean, y = 0.025, xend = Dmean, yend = 0),
arrow = arrow(length = unit(0.5, "cm")), linewidth = 3,
lineend = "round", linejoin = "round") +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, 0.05), expand = c(0, 0)) +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2)) +
labs(title="CLT: Sample Mean Distribution from a Density",
y = NULL, x = NULL)
g2 <- ggplot(df2, aes(x = SampleMean)) +
geom_histogram(bins = 40, fill = MyPurple, color = MyPurple5) +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, HistY), expand = c(0, 0)) +
theme(axis.text=element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2),
plot.caption = element_text(hjust = 1, size = 14,
color = MyPurple )) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff"))
plot(g1/g2)
ani.pause()
## step 3
HistData <- c(HistData, Dmean)
df2 <- data.frame(SampleMean = HistData)
g1 <- ggplot() +
geom_function(fun = function(x){dexp(x, rate = Erate)},
linewidth = 1.25, col = MyPurple) +
annotate("text", x = Dmean, y = 0.027, size = 7,
label = bquote(paste( bar(x) * " = " * .(Dmean) ))) +
annotate("text", x = 30, y = 0.04, size = 8,
label=paste("Sample Size = ", n, sep = "")) +
geom_segment(aes(x = Dmean, y = 0.025, xend = Dmean, yend = 0),
arrow = arrow(length = unit(0.5, "cm")), linewidth = 3,
lineend = "round", linejoin = "round") +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, 0.05), expand = c(0, 0)) +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2)) +
labs(title="CLT: Sample Mean Distribution from a Density",
y = NULL, x = NULL)
g2 <- ggplot(df2, aes(x = SampleMean)) +
geom_histogram(bins = 40, fill = MyPurple, color = MyPurple5) +
geom_segment(aes(x = Dmean, y = HistY, xend = Dmean, yend = HistY/2),
arrow = arrow(length = unit(0.5, "cm")), linewidth = 3,
lineend = "round", linejoin = "round")+
annotate("text", x = 0.75 * xmax, y = 0.75 * HistY, size = 10,
label=paste("Simulation Size = ", i,sep = "")) +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, HistY), expand = c(0, 0)) +
theme(axis.text=element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2),
plot.caption = element_text(hjust = 1, size = 14,
color = MyPurple )) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff"))
plot(g1/g2)
ani.pause()
}
##########################
### Show 1 sample mean, add 10 to the histogram
##########################
for (j in 1:40){
data <- TruncFunction(n)
Dmean <- mean(data)
df <- data.frame(xdata = data, ydata = rep(0.025,n))
## step 1
g1 <- ggplot() +
geom_function(fun = function(x){dexp(x, rate = Erate)},
linewidth = 1.25, col = MyPurple) +
geom_point(data = df, aes(x = xdata, y = ydata), size = 4,
col = MyPurple) +
annotate("text", x = Dmean, y = 0.027, size = 7,
label = bquote(paste( bar(x) * " = " * .(Dmean) ) )) +
annotate("text", x = 30, y = 0.04, size = 8,
label = paste("Sample Size = ", n, sep = "")) +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, 0.05), expand = c(0, 0)) +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2)) +
labs(title="CLT: Sample Mean Distribution from a Density",
y = NULL, x = NULL)
g2 <- ggplot(df2, aes(x = SampleMean)) +
geom_histogram(bins = 40, fill = MyPurple, color = MyPurple5)+
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, HistY), expand = c(0, 0)) +
theme(axis.text=element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2),
plot.caption = element_text(hjust = 1, size = 14,
color = MyPurple )) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff"))
plot(g1/g2)
ani.pause()
## step 2
g1 <- ggplot() +
geom_function(fun = function(x){dexp(x, rate = Erate)},
linewidth = 1.25, col = MyPurple) +
geom_point(data = df, aes(x = xdata, y = ydata), size = 4,
col = MyPurple) +
annotate("text", x = Dmean, y = 0.027, size = 7,
label = bquote(paste( bar(x) * " = " * .(Dmean)))) +
annotate("text", x = 30, y = 0.04, size = 8,
label = paste("Sample Size = ", n, sep = "")) +
geom_segment(aes(x = Dmean, y = 0.025, xend = Dmean, yend = 0),
arrow = arrow(length = unit(0.5, "cm")), linewidth = 3,
lineend = "round", linejoin = "round")+
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, 0.05), expand = c(0, 0)) +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2)) +
labs(title = "CLT: Sample Mean Distribution from a Density",
y = NULL, x = NULL)
g2 <- ggplot(df2, aes(x = SampleMean)) +
geom_histogram(bins = 40, fill = MyPurple, color = MyPurple5) +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, HistY), expand = c(0, 0)) +
theme(axis.text=element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2),
plot.caption = element_text(hjust = 1, size = 14,
color = MyPurple )) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff"))
plot(g1/g2)
ani.pause()
## step 3
HistData <- c(HistData, Dmean, replicate(9, mean(TruncFunction(n))))
df2 <- data.frame(SampleMean = HistData)
g1 <- ggplot() +
geom_function(fun = function(x){dexp(x, rate = Erate)},
linewidth = 1.25, col = MyPurple) +
annotate("text", x = Dmean, y = 0.027, size = 7,
label = bquote(paste( bar(x) * " = " * .(Dmean)))) +
annotate("text", x = 30, y = 0.04, size = 8,
label = paste("Sample Size = ", n, sep = "")) +
geom_segment(aes(x = Dmean, y = 0.025, xend = Dmean, yend = 0),
arrow = arrow(length = unit(0.5, "cm")), linewidth = 3,
lineend = "round", linejoin = "round")+
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, 0.05), expand = c(0, 0)) +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2)) +
labs(title = "CLT: Sample Mean Distribution from a Density",
y = NULL, x = NULL)
g2 <- ggplot(df2,aes(x = SampleMean)) +
geom_histogram(bins = 40, fill = MyPurple, color = MyPurple5) +
geom_segment(aes(x = Dmean, y = HistY, xend = Dmean, yend = HistY/2),
arrow = arrow(length = unit(0.5, "cm")), linewidth = 3,
lineend = "round", linejoin = "round")+
annotate("text", x = 0.75 * xmax, y = 0.75 * HistY, size = 10,
label = paste("Simulation Size = ", i + 10 * j, sep = "")) + # i is still 20 from the first loop
annotate("text", x = 0.75 * xmax, y = 0.9 * HistY, size = 10,
label = "Now Adding 10 Sample Means\nbut Showing One.") +
scale_x_continuous(lim = c(0, xmax), expand = c(0, 0.1)) +
scale_y_continuous(lim = c(0, HistY), expand = c(0, 0)) +
theme(axis.text=element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
legend.background = element_rect(fill = MyLightP2),
plot.caption = element_text(hjust = 1, size = 14,
color = MyPurple )) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff"))
plot(g1/g2)
ani.pause()
}
## adds a pause at the end
for (k in 1:10){
plot(g1/g2)
ani.pause() }
}, movie.name = "CLT-3.gif", ani.width = 936, ani.height = 936)