When is a regression line appropriate? P-values and Rsq are values we use to help answer this question, but they may fail us. The animation below has three different data sets with the same regression line. The p-values are essentially 0 and so the slope is significant. The Rsq values are either 0.62 or 0.92. Both very good. But, in two of the cases, the data clearly has curvature, and the regression line is inappropriate. It is easy enough to see in the scatter plot, but it is accentuated in the residual plot. R code at the end. Enjoy
Please share and subscribe if you enjoyed this animation. Ideas for future animations or improvements to this one are welcomed; put them in the comments.
Please share and like
Sharing and liking posts attracts new readers and boosts algorithm performance. Everything you do is appreciated.
Comments
Please point out if you think something was expressed wrongly or misinterpreted. I'd rather know the truth and understand the world than be correct. I welcome comments and disagreement. We should all be forced to express our opinions and change our minds, but we should also know how to respectfully disagree and move on. Send me article ideas, feedback, or other thoughts at briefedbydata@substack.com.
Bio
I am a tenured mathematics professor at Ithaca College (PhD Math: Stochastic Processes, MS Applied Statistics, MS Math, BS Math, BS Exercise Science), and I consider myself an accidental academic (opinions are my own). I'm a gardener, drummer, rower, runner, inline skater, 46er, and R user. I’ve written the textbooks R for College Mathematics and Statistics and Applied Calculus with R. I welcome any collaborations.
## Packages
library(ggplot2)
library(gganimate)
library(magick)
## Colors
MyPurple <- "#5B005B"
MyLightP <- "#dfdbdf"
MyLightP3 <- "#fcfafc"
## Create data
f <- function(x){4*x^2 - 56*x + 700}
set.seed(43)
noise <- rnorm(100, 0, 50)
x <- runif(100, 0, 20)
y2 <- f(x) + noise
result2 <- lm(y2 ~ x)
coef <- result2$coefficients
line <- function(x){coef[[2]]*x + coef[[1]]}
y1 <- line(x)+ noise
y3 <- line(x) - (y2 - line(x))
result1 <- lm(y1 ~ x)
result3 <- lm(y3 ~ x)
pv1 <- summary(result1)$coefficients[ , 4]
pv2 <- summary(result2)$coefficients[ , 4]
pv3 <- summary(result3)$coefficients[ , 4]
rsq1 <- round(summary(result1)$r.squared, 2)
rsq2 <- round(summary(result2)$r.squared, 2)
rsq3 <- round(summary(result3)$r.squared, 2)
data <- data.frame(x=c(x,x,x,x), y=c(y1,y2,y1,y3),
group=rep(1:4,each=100),
resid=c(result1$residuals,
result2$residuals,
result1$residuals,
result3$residuals),
pValue=rep(c(pv1,pv2,pv1,pv3),each=100),
rsq = rep(c(rsq1,rsq2,rsq1,rsq3),each=100))
## Create Animated Graphs
g1 <- ggplot(data, aes(x = x,y = y,
label=paste("P-value = ", format(pValue,digits=3),
": Rsq = ", rsq, sep=""))) +
geom_point(size = 3) +
geom_text(aes(0, 1150), size = 9, hjust = 0, color = MyPurple)+
transition_states(group, transition_length = 3, state_length = 2 ) +
enter_fade() +
exit_fade() +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.title = element_text(size = 20),
axis.text.x = element_blank(),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP)) +
labs(title="Regression and Residuals Animation", y = NULL, x = NULL) +
geom_smooth(method = 'lm', formula = y ~ x, linewidth = 1.5,
se = FALSE, color = MyPurple)
g2 <- ggplot(data, aes(x = x, y = resid)) +
geom_point(size = 3) +
geom_hline(yintercept = 0, color = MyPurple, linewidth = 1.5) +
transition_states(group, transition_length = 3, state_length = 1.5 ) +
enter_fade() +
exit_fade() +
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 16),
plot.background = element_rect(fill = MyLightP3),
panel.background = element_rect(fill = MyLightP),
plot.caption = element_text(hjust = c(1), size = c(14),
color = c(MyPurple))) +
labs(title = NULL, y = NULL, x = NULL,
caption=c("Briefed by Data || Thomas J Pfaff") )
RegressionAnimate <- animate(g1, fps = 5, duration = 10,
width = 1456, height = (936 / 2),
renderer = magick_renderer() )
ResidualAnimate <- animate(g2, fps = 5, duration = 10,
width = 1456, height = (936 / 2),
renderer = magick_renderer() )
## Combine the two animated graph into one image
RegResidAnimate <- image_append(c(RegressionAnimate[1],ResidualAnimate[1]), stack = TRUE)
for( i in 2:50) {
TempGif <- image_append(c(RegressionAnimate[i], ResidualAnimate[i]), stack = TRUE)
RegResidAnimate <- c(RegResidAnimate, TempGif)
}
## Save graph
anim_save("RegResidAnimate3.gif", RegResidAnimate)