ANOVA can be a confusing statistical test. The name is analysis of variance, yet the null hypothesis is that all the group means are equal. The animation here aims to illustrate the formulas used for this hypothesis test. The R code for this animation is available on the R code GitHub site for Briefed by Data.
The idea behind ANOVA is that we measure variability in the data in two ways and look at their ratio, the F statistic. For the numerator, we take the squared difference between each group mean and the overall mean, weight each one by the group size, and add them up. This is SSG. For the denominator, we take the weighted sum of the variance of each group, with each variance weighted by its group size minus one. This is SSE, which is the same as the sum of the squared differences between each data point and its corresponding group mean. In other words, the first method measures how far the group means sit from the overall mean, while the other method focuses on the spread within the individual groups. If the group means are close together, then F is small. The bigger the difference in the means relative to the spread within the groups, the larger F.
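The two sums of squares can be sketched in a few lines of Python. This is a minimal illustration with made-up data, not the R code behind the animation; the function name sums_of_squares and the three small groups are assumptions chosen for the example.

```python
from statistics import fmean

def sums_of_squares(groups):
    """Return (SSG, SSE) for a list of groups, each a list of numbers."""
    all_points = [x for g in groups for x in g]
    grand_mean = fmean(all_points)
    # SSG: squared distance of each group mean from the overall mean,
    # weighted by the size of the group.
    ssg = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # SSE: squared distance of each data point from its own group mean.
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return ssg, sse

# Hypothetical data: three groups of three points with the same spread.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
ssg, sse = sums_of_squares(groups)

# Total sum of squares around the overall mean, for comparison.
all_points = [x for g in groups for x in g]
sst = sum((x - fmean(all_points)) ** 2 for x in all_points)
print(ssg, sse, sst)
```

With this data, SSG and SSE add up to the total sum of squares around the overall mean, which is one way to check the arithmetic by hand.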
The F statistic is then (SSG / (k-1)) / (SSE / (n-k)), where k is the number of groups and n is the total number of data points. In the animation, the three groups have the same variance, and you can see how the value of F increases as the groups separate.
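To see the formula at work, here is a rough Python sketch that slides three equal-spread groups apart and recomputes F each time. The data, the helper names f_statistic and shifted, and the gap values are all assumptions made for illustration; they are not taken from the animation.

```python
from statistics import fmean

def f_statistic(groups):
    """F = (SSG / (k-1)) / (SSE / (n-k)) for a list of groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of data points
    grand_mean = fmean([x for g in groups for x in g])
    ssg = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ssg / (k - 1)) / (sse / (n - k))

# Hypothetical base group; every group is a shifted copy of it,
# so all three groups have the same variance.
base = [-1.0, 0.0, 1.0]

def shifted(offset):
    return [x + offset for x in base]

results = {}
for gap in [0.0, 1.0, 2.0, 4.0]:
    groups = [shifted(-gap), shifted(0.0), shifted(gap)]
    results[gap] = f_statistic(groups)
    print(f"gap = {gap}: F = {results[gap]}")
```

Because SSE stays fixed while SSG grows with the gap, F rises steadily as the groups move apart.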
Test yourself, or your students, with these questions:
What is the standard deviation of each group?
What is the overall mean?
How big is the data set in each of the three groups?
What is the mean of the middle group?
Why does the overall mean never change?
Comments
Please let me know if you believe I expressed something incorrectly or misinterpreted the data. I'd rather know the truth and understand the world than be correct. I welcome comments and disagreement. Please feel free to share article ideas, feedback, or any other thoughts at briefedbydata@substack.com.
Bio
I am a tenured mathematics professor at Ithaca College (PhD in Math: Stochastic Processes, MS in Applied Statistics, MS in Math, BS in Math, BS in Exercise Science), and I consider myself an accidental academic (opinions are my own). I'm a gardener, drummer, rower, runner, inline skater, 46er, and R user. I’ve written the textbooks “R for College Mathematics and Statistics” and “Applied Calculus with R.” I welcome any collaborations.