This is another short post with one great animation, a follow-up to Connecting a histogram and boxplot (7/27/2024). The goal here is to show what a boxplot can miss about the data as we continue to connect the boxplot and the histogram as two ways to graphically represent data. As a quick review, the ends of the box are the first and third quartiles, so half the data lies within the box, and the line inside the box is the median. The ends of the tails, or whiskers, are the minimum and maximum values, so 25% of the data falls along each whisker. (By default, ggplot2 draws a whisker only out to the most extreme point within 1.5 times the box length and plots anything farther out as a separate point, but none of the data here are that extreme.) In this example, we show how the data can move within a quartile without changing the boxplot. The animation starts from uniformly spread data, then cycles through a single-peaked data set and two rearrangements of it that leave the box and whiskers in place. In particular, a boxplot can completely miss that the data are bimodal. Watch how the dots move within the boxplot without changing the quartiles; the dots are jittered so that they aren't all on top of each other. Each dot is partially transparent, so darker areas are where several dots overlap. The R code is at the end. Please share and subscribe if you enjoyed this animation. Ideas for future animations are welcome; put them in the comments.
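Because a boxplot is drawn from just five numbers, any two data sets that share those numbers get the same boxplot. Here is a minimal sketch in R with two small, made-up data sets of 13 values each (the numbers are hypothetical, chosen only for illustration). With 13 values, R's `fivenum()` hinges are the 4th and 10th sorted values, which are also the quartiles from R's default `quantile()`.

```r
# Two hand-built data sets (hypothetical values, for illustration only)
peaked  <- c(0, 2, 3, 5, 8, 9, 10, 11, 12, 15, 17, 18, 20)      # bunched near the median
bimodal <- c(0, 2, 3, 5, 5.5, 6, 10, 14, 14.5, 15, 17, 18, 20)  # bunched near the quartiles

# Same minimum, lower hinge, median, upper hinge, and maximum
fivenum(peaked)   # 0 5 10 15 20
fivenum(bimodal)  # 0 5 10 15 20

# ...but very different amounts of data near the median
sum(abs(peaked - 10) < 3)   # 5
sum(abs(bimodal - 10) < 3)  # 1

# The two boxplots come out identical
boxplot(list(peaked = peaked, bimodal = bimodal), horizontal = TRUE)
```

The box hides the fact that the middle of `bimodal` is nearly empty; only a histogram or the individual points reveal it.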
Please share and like
Sharing and liking posts attracts new readers and boosts algorithm performance. Everything you do is appreciated.
Comments
Please point out if you think something was expressed wrongly or misinterpreted. I'd rather know the truth and understand the world than be correct. I welcome comments and disagreement. We should all be forced to express our opinions and change our minds, but we should also know how to respectfully disagree and move on. Send me article ideas, feedback, or other thoughts at briefedbydata@substack.com.
Bio
I am a tenured mathematics professor at Ithaca College (PhD Math: Stochastic Processes, MS Applied Statistics, MS Math, BS Math, BS Exercise Science), and I consider myself an accidental academic (opinions are my own). I'm a gardener, drummer, rower, runner, inline skater, 46er, and R user. I’ve written the textbooks R for College Mathematics and Statistics and Applied Calculus with R. I welcome any collaboration.
R Code
## Packages
library(ggplot2)
library(gganimate)
library(magick)
## Colors
MyPurple <- "#5B005B"
MyLightP <- "#dfdbdf"
MyLightP2 <- "#f8f4f8"
MyLightP3 <- "#fcfafc"
MyPurple5 <- "#9c669c"
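## Create Data
# data1: 6000 values spread uniformly between 0 and 40
data1 <- runif(6000, 0, 40)
# temp1: 4000 values from a semicircle-shaped distribution on [0, 40] centered at 20
# (the horizontal coordinate of points scattered uniformly over a half-disk of radius 20)
U1 <- runif(4000,0,1)
U2 <- runif(4000,0,1)
temp1 <- 20*sqrt(U1)*cos(pi*U2) +20
# temp2: 2000 normal values with mean 20 and standard deviation 5
temp2 <- rnorm(2000,20, 5)
# data2: a single-peaked data set of 6000 values centered near 20
data2<-c(temp1, temp2)
# Five-number summary of data2 (element 4 of summary() is the mean, which is not used)
result <- summary(data2)
small <- result[[1]]
q1 <- result[[2]]
m <- result[[3]]
q3 <- result[[5]]
large <- result[[6]]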
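# data3: move each point lying between a quartile and the median halfway toward that quartile;
# points within 0.25 of q1, the median, or q3, and points outside the box, stay put,
# so the quartiles, median, minimum, and maximum (the boxplot) do not change
data3 <- numeric(6000)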
for (i in 1:6000){
if(data2[i] <= q1 + 0.25 ){data3[i] = data2[i]}
if(data2[i] > q1 + 0.25 & data2[i] < m - 0.25){data3[i] = data2[i] - (data2[i] - q1) / 2}
if(data2[i] >= m - 0.25 & data2[i] <= m + 0.25){data3[i] = data2[i]}
if(data2[i] < q3 -0.25 & data2[i] > m + 0.25){data3[i] = data2[i] + (q3 -data2[i]) / 2}
if(data2[i] >= q3-0.25){data3[i] = data2[i]}
}
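# data4: the same idea inside the box, and the whiskers change too: lower-whisker points
# are halved (pulled toward 0, which is close to the minimum here) and upper-whisker points
# move halfway toward the maximum; points near the minimum, q1, the median, and q3 stay put.
# This keeps the minimum in place only if it is positive, which holds unless one of the
# rnorm() draws happens to fall below 0
data4 <- numeric(6000)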
for (i in 1:6000){
if(data2[i] >= q1 - 0.25 & data2[i] <= q1){data4[i] = data2[i]}
if(data2[i] <= 2 * small){data4[i] = data2[i]}
if(data2[i] < q1 - 0.25 & data2[i] > 2 * small){data4[i] = data2[i]/2}
if(data2[i] > q1 & data2[i] < m - 0.25){data4[i] = data2[i] - (data2[i] - q1) / 2}
if(data2[i] >= m-0.25 & data2[i] <= m + 0.25){data4[i] = data2[i]}
if(data2[i] < q3 & data2[i] > m +0.25){data4[i] = data2[i] + (q3 - data2[i]) / 2}
if(data2[i] > q3 + 0.25 ){data4[i] = data2[i] + (large - data2[i]) / 2}
if(data2[i] >= q3 & data2[i] <= q3 + 0.25){data4[i] = data2[i]}
}
data<- data.frame("value" = c(data1, data2, data3, data4),
"time" = rep( 1 : 4, each = 6000))
## Create Animated Graphs
g1 <- ggplot(data, aes(x="",y = value)) +
geom_boxplot(fill = MyPurple5, color = "black",lwd = 1.5, fatten = 1) +
coord_flip() +
geom_jitter(width = 0.25, color = MyPurple, size = 2, alpha = 0.2) +
transition_states(time, transition_length = 3, state_length = 2 ) +
enter_fade() +
exit_fade() +
theme(plot.title=element_text(size=20),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill=MyLightP),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank()
) +
labs(title="Histogram and Boxplot", y = NULL, x = NULL)
g2 <- ggplot(data, aes(x = value)) +
geom_histogram(bins = 30, fill = MyPurple5, color = "black") +
transition_states(time, transition_length = 3, state_length = 2 ) +
enter_fade() +
exit_fade() +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
plot.caption = element_text(hjust = c(1), size = c(14),
color = c(MyPurple) )
) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff") )
BoxPlotAnimate <- animate(g1 , fps = 5, duration = 10,
width = 1456 / 2, height = (936 / 2) / 2,
renderer = magick_renderer() )
HistPlotAnimate <- animate(g2 , fps = 5, duration = 10,
width = 1456 / 2, height = 3 * (936 / 2) / 4,
renderer = magick_renderer() )
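## Combine the two animated graphs into one image, stacking each boxplot frame above the
## matching histogram frame (fps = 5 for 10 seconds gives 50 frames)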
HistBoxAnimate <- image_append(c(BoxPlotAnimate[1],HistPlotAnimate[1]), stack = TRUE)
for( i in 2:50) {
TempGif <- image_append(c(BoxPlotAnimate[i], HistPlotAnimate[i]), stack = TRUE)
HistBoxAnimate <- c(HistBoxAnimate, TempGif)
}
## Save graph
anim_save("HistBoxAnimate2.gif", HistBoxAnimate)
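The loops that build data3 and data4 test one value at a time. As a sketch, the data3 rule can also be written with vectorized logical indexing in base R. The helper name push_to_quartiles(), the seed, and the stand-in normal sample are hypothetical, chosen only for illustration; the check at the end compares the five-number summaries before and after the move.

```r
# Hypothetical helper (not from the original post): a vectorized version of the
# rule the loop above uses to build data3 from data2
push_to_quartiles <- function(x, buffer = 0.25) {
  q <- quantile(x, c(0.25, 0.5, 0.75))   # same quartiles that summary() reports
  q1 <- q[[1]]; m <- q[[2]]; q3 <- q[[3]]
  y <- x   # points near q1, the median, q3, and outside the box keep their values
  lower <- x > q1 + buffer & x < m - buffer   # inside the box, below the median
  upper <- x > m + buffer & x < q3 - buffer   # inside the box, above the median
  y[lower] <- (x[lower] + q1) / 2   # move halfway toward q1
  y[upper] <- (x[upper] + q3) / 2   # move halfway toward q3
  y
}

set.seed(1)                          # arbitrary seed, for reproducibility
x <- rnorm(6000, mean = 20, sd = 5)  # hypothetical stand-in for a single-peaked sample
y <- push_to_quartiles(x)

# Minimum, quartiles, median, and maximum before and after: the rows should match
rbind(before = quantile(x), after = quantile(y))

# The share of points within 1 unit of the median drops: the middle empties out
c(before = mean(abs(x - median(x)) < 1), after = mean(abs(y - median(x)) < 1))
hist(y, breaks = 30)
```

Because the points at and right next to each quartile are left alone, the quartiles come out the same here; with much sparser data, a moved point could land close enough to a quartile to nudge R's interpolated value slightly.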