Analysis of Variance (ANOVA)
The Hypothesis Testing Process:- State hypotheses.
- Collect data.
- Construct a test statistic.
- Compute a p-value.
- Draw conclusions (in statistical terms and in context).
In R, there is a dataset titled warpbreaks that contains the number of warp breaks in yarn during weaving. Two different types of wool were weaved at three different levels of tension. Here is the structure of the data:
The numerical variable breaks is the number of warp breaks.
The categorical (factor) variable wool contains the two types of wool, A and B.
The categorical variable tension contains the three levels of tension, L, M and H.
We are interested in seeing if there is a difference in the number of warp breaks between the three tension levels.
# no pec
str(warpbreaks)
The numerical variable breaks is the number of warp breaks.
The categorical (factor) variable wool contains the two types of wool, A and B.
The categorical variable tension contains the three levels of tension, L, M and H.
We are interested in seeing if there is a difference in the number of warp breaks between the three tension levels.
State hypotheses
Suppose that we want to test to see if the mean number of warp breaks is the same at all three tension levels.$H_0: \mu_L = \mu_M = \mu_H$
$H_A: \mu_i \neq \mu_j \text{ for some }i,j.$
Collect data
We will use the warpbreaks dataset in R.Construct a test statistic and compute a p-value.
To conduct an Analysis of Variance (ANOVA) in R, we will use the aov() function.
Notice that we are using '~' notation to specify that we comparing the numerical variable, breaks, between the tension groups, tension.
Just using the aov() function by itself is not very useful. To get the ANOVA table (with the test statistic and p-value), nest aov() inside of summary().
Using summary(aov()) is much more insightful for your ANOVA analyses.
# no pec
aov(warpbreaks$breaks ~ warpbreaks$tension)
Just using the aov() function by itself is not very useful. To get the ANOVA table (with the test statistic and p-value), nest aov() inside of summary().
# no pec
summary(aov(warpbreaks$breaks ~ warpbreaks$tension))
Using summary(aov()) is much more insightful for your ANOVA analyses.
Draw conclusions.
Using $\alpha = 0.05$, we can see that our p-value from the ANOVA table above is less than $\alpha$.
$$pval = 0.00175 < 0.05 = \alpha$$
Therefore, we can reject our null hypothesis and say that there is evidence that the mean number of warp breaks between the 3 levels of tension are not all equal.
Therefore, we can reject our null hypothesis and say that there is evidence that the mean number of warp breaks between the 3 levels of tension are not all equal.
Pairwise Comparisons - (only if we reject the null hypothesis for ANOVA)
To make pairwise comparisons of each pair of populations, we will use the TukeyHSD() function with aov().
Looking at the p adj column, we see that two out of the three p-values are less than $\alpha = 0.05$. This tells us that there is a difference in the average number of warp breaks between the 'M' and 'L' tensions and the 'H' and 'L' tensions. The average number of warp breaks between 'H' and 'M' could be equal.
By creating a side-by-side boxplot, you can also see these pairwise differences graphically.
Looking at this comparative boxplot, we can see that this provides confirmatory evidence regarding our conclusions above. There is a difference in the average number of warp breaks between the 'M' and 'L' tensions and the 'H' and 'L' tensions. The average number of warp breaks between 'H' and 'M' are more similar and their difference in the number of breaks could be due to chance.
# no pec
TukeyHSD(aov(warpbreaks$breaks ~ warpbreaks$tension))
Looking at the p adj column, we see that two out of the three p-values are less than $\alpha = 0.05$. This tells us that there is a difference in the average number of warp breaks between the 'M' and 'L' tensions and the 'H' and 'L' tensions. The average number of warp breaks between 'H' and 'M' could be equal.
By creating a side-by-side boxplot, you can also see these pairwise differences graphically.
# no pec
boxplot(warpbreaks$breaks ~ warpbreaks$tension,
main = "Boxplot of Yarn Warp Breaks",
ylab = "Number of Breaks",
xlab = "Tension Level",
col = c("#66c2a5", "#fc8d62", "#8da0cb"))
Looking at this comparative boxplot, we can see that this provides confirmatory evidence regarding our conclusions above. There is a difference in the average number of warp breaks between the 'M' and 'L' tensions and the 'H' and 'L' tensions. The average number of warp breaks between 'H' and 'M' are more similar and their difference in the number of breaks could be due to chance.
Video Tutorial:
Another ANOVA example - Chick Weights
In R, there is a dataset titled chickwts that contains the weight of chickens on various feed supplements. There were six different feed types. Here is the structure of the data:
The numerical variable weight is the weight of the chicks.
The categorical (factor) variable feed contains the six types of feed.
We are interested in seeing if there is a difference in the average weight of chicks between the six feed supplements.
# no pec
str(chickwts)
The numerical variable weight is the weight of the chicks.
The categorical (factor) variable feed contains the six types of feed.
We are interested in seeing if there is a difference in the average weight of chicks between the six feed supplements.
State hypotheses about the parameter
Suppose that we want to test to see if the mean weight of chicks is the same for all 6 feed supplements.$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
$H_A: \mu_i \neq \mu_j \text{ for some }i,j.$
Collect data
We will use the chickwts dataset in R.Construct a test statistic and compute a p-value.
To conduct an Analysis of Variance (ANOVA) in R, we will use the aov() function along with summary().
# no pec
summary(aov(chickwts$weight ~ chickwts$feed))
Draw conclusions.
Using $\alpha = 0.05$, we can see that our p-value from the ANOVA table above is very small compared to $\alpha$.
$$pval = 5.94 \times 10^{-10} < 0.05 = \alpha$$
Therefore, we can reject our null hypothesis and say that there is evidence that the mean weight of chicks are not all equal between the different feed supplements.
Therefore, we can reject our null hypothesis and say that there is evidence that the mean weight of chicks are not all equal between the different feed supplements.
Pairwise Comparisons - (only if we reject the null hypothesis for ANOVA)
To make pairwise comparisons of each pair of populations, we will use the TukeyHSD() function with aov().
Looking at the p adj column, we see that there are quite a few feed supplements that are significantly different from each other and some that are more similar.
By creating a side-by-side boxplot, you can also see these pairwise differences graphically.
Looking at this comparative boxplot, we can see that this provides confirmatory evidence regarding our conclusions above. For example, since the p-value for the difference between horsebean and casein is less than 0.05, we would expect a difference in the mean weight of chicks on these two feed types.
In contrast, there wouldn't be much of a difference between the mean chick weight for the linseed and soybean feed types because their p-value is greater than 0.05.
# no pec
TukeyHSD(aov(chickwts$weight ~ chickwts$feed))
Looking at the p adj column, we see that there are quite a few feed supplements that are significantly different from each other and some that are more similar.
By creating a side-by-side boxplot, you can also see these pairwise differences graphically.
# no pec
boxplot(chickwts$weight ~ chickwts$feed,
main = "Boxplot of Chick Weights",
ylab = "Weight (grams)",
xlab = "Feed Supplement",
col = c("#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3",
"#fdb462"))
Looking at this comparative boxplot, we can see that this provides confirmatory evidence regarding our conclusions above. For example, since the p-value for the difference between horsebean and casein is less than 0.05, we would expect a difference in the mean weight of chicks on these two feed types.
In contrast, there wouldn't be much of a difference between the mean chick weight for the linseed and soybean feed types because their p-value is greater than 0.05.