Analysis of Variance (ANOVA)
Researchers examined hemoglobin levels of Australian athletes[1]. Hemoglobin is a protein found in red blood cells that transports oxygen throughout the body, and athletes who have low hemoglobin levels are sometimes said to have "sports anemia". Are there differences in the mean hemoglobin levels of athletes who participate in rowing, basketball, netball (similar to basketball but without dribbling), running (distances longer than 400m), and swimming?

To compare means of more than two independent populations, we use a hypothesis testing procedure called the Analysis of Variance, ANOVA for short.
Why is a procedure for comparing means called the Analysis of Variance? The Analysis of Variance depends on two types of variation.
- between-group variability: how the group means vary around the overall mean.
- within-group variability: how the measurements in a single group vary around their mean.
- Under what conditions is most of the variation from between groups?
- Under what conditions is most of the variation from within groups?
Analysis of Variance is a hypothesis testing procedure. We'll discuss each of the steps of the hypothesis testing process as we proceed in our discussion of ANOVA.
- State hypotheses.
- Collect data.
- Construct a test statistic.
- Compute a p-value.
- Draw conclusions (in statistical terms and in context)
Step 1: State Hypotheses
The null hypothesis for an Analysis of Variance is that the means of the k populations are equal. If we denote the means of the k populations $\mu_1$, $\mu_2$, $\ldots$, $\mu_k$, then the null hypothesis stated symbolically is $$H_0: \mu_1=\mu_2=\ldots = \mu_k$$ The alternative hypothesis is that at least two of the means are not equal to each other, that is, $$H_A: \mu_i \neq \mu_j \text{ for some }i,j.$$
Let $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, and $\mu_5$ denote the mean hemoglobin levels of athletes competing in rowing, basketball, netball, running, and swimming respectively.
The null hypothesis of an Analysis of Variance to compare the mean hemoglobin levels of athletes in the 5 groups is $$H_0: \mu_1=\mu_2=\mu_3=\mu_4=\mu_5$$ and the alternative hypothesis is $$H_A: \mu_i \neq \mu_j \text{ for some }i,j.$$
Step 2: Collect Data
ANOVA is used to compare the means of three or more groups; thus the data consist of numeric measurements made across at least three categories.
The full dataset on the Australian athletes is available through the software package R. The data needed for this analysis, containing the variables "hg" (hemoglobin level) and sport, can be found here.
Step 3: Construct a Test Statistic
The test statistic for an ANOVA is the ratio of a quantity (called the MSTr) measuring the between-group variability to a quantity (called the MSE) measuring the within-group variability. Under the null hypothesis, the resulting statistic has an F distribution; thus
$$F = \frac{SSTr/df_1}{SSE/df_2} = \frac{MSTr}{MSE} \sim F_{df_1, df_2}$$
(We'll address the degrees of freedom later on.)
When there is more variability between the groups than within groups, this statistic is large and the p-value is small giving evidence of a difference between the group means.
The between-group variability is measured by the sum of squares for the treatments (SSTr): $$SSTr=\sum_{\texttt{(all groups)}} \texttt{group size}(\texttt{group mean}-\texttt{overall mean})^2$$ This measure is a summary of the variability of the group means around the overall mean.
The sum of squares for error (SSE) measures the variability of the individual subjects within a group around the group mean.
$$SSE= \sum_{\texttt{(all groups)}}\sum_{\texttt{(all obs)}}(\texttt{obs j from group i} - \texttt{group i mean})^2 $$
Total Sum of Squares, SST = SSTr + SSE.
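As a rough illustration of these definitions, the sketch below computes the SSTr and SSE for two small made-up data sets and checks them against the total sum of squares. The numbers and the helper name `sums_of_squares` are hypothetical; they are not the athletes data.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SSTr, SSE, SST) for a list of groups of numeric observations."""
    all_obs = [x for g in groups for x in g]
    overall = mean(all_obs)
    # Between groups: group size times squared distance of the group mean
    # from the overall mean.
    sstr = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
    # Within groups: squared distance of each observation from its own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: squared distance of each observation from the overall mean.
    sst = sum((x - overall) ** 2 for x in all_obs)
    return sstr, sse, sst

# Hypothetical data: group means far apart, little spread inside each group.
separated = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
# Hypothetical data: identical group means, wide spread inside each group.
overlapping = [[10, 21, 32], [11, 20, 32], [12, 22, 29]]

for name, data in [("separated", separated), ("overlapping", overlapping)]:
    sstr, sse, sst = sums_of_squares(data)
    print(name, "SSTr =", sstr, "SSE =", sse, "SST =", sst)
```

In the first data set nearly all of the variation is between groups (SSTr = 600, SSE = 6); in the second, all of it is within groups (SSTr = 0, SSE = 610). In both cases SSTr + SSE equals SST.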
The SSTr measures how the $k$ group means vary around the overall mean, but the SSE takes into account the variability of all the $n_T$ observations around their respective group means. $n_T$ is typically much bigger than $k$, so the SSE tends to be bigger than the SSTr simply because it is a total of more terms. In order to obtain comparable measures of variability, we divide the sums of squares by their corresponding degrees of freedom.
Mean Squares for Error: $MSE = \frac{SSE}{n_T - k}$, where $n_T$ is the total number of observations.
Mean Squares for Treatment: $MSTr = \frac{SSTr}{k - 1}$, where $k$ is the number of groups.
$F = MSTr/MSE$.
Under the null hypothesis of no difference among the population means, the $F$ statistic has approximately an $F$ distribution with $k-1$ numerator degrees of
freedom and $n_T-k$ denominator degrees of freedom.
The ANOVA Table
Results from an analysis of variance carried out in software are displayed in an ANOVA table like the one below. The row containing the totals is not essential to the output.
The ANOVA table for the analysis of the Australian athletes data is shown below.
Source | DF | Sum of Squares | Mean Squares | F | p-value |
Treatment | 4 | 63.1 | 15.775 | 13.59 | 2.7e-09 |
Error | 131 | 152 | 1.161 | | |
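The mean squares and the F statistic in the table can be checked by hand from the degrees of freedom and the sums of squares. The short sketch below does this arithmetic. The function name `f_from_sums_of_squares` is only illustrative, and because the table displays rounded sums of squares, the results agree with the table only up to rounding.

```python
def f_from_sums_of_squares(sstr, sse, k, n_total):
    """Return (MSTr, MSE, F) given the sums of squares, k groups and n_total observations."""
    df1 = k - 1          # numerator (treatment) degrees of freedom
    df2 = n_total - k    # denominator (error) degrees of freedom
    mstr = sstr / df1
    mse = sse / df2
    return mstr, mse, mstr / mse

# Values read from the ANOVA table: 5 sports, and 131 error degrees of
# freedom implies n_T = 131 + 5 = 136 athletes.
mstr, mse, f = f_from_sums_of_squares(sstr=63.1, sse=152, k=5, n_total=136)
print("MSTr =", round(mstr, 3), "MSE =", round(mse, 2), "F =", round(f, 1))
```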
Step 4: Compute a p-value
The p-value is computed from the right tail area of the appropriate F distribution.
$\text{p-value} = P(F \geq f)$, where $f$ is the observed value of the test statistic and $F\sim F_{k-1, n_T-k}$.
From the ANOVA table we see that the p-value for the Australian athletes analysis is 2.7e-09.
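Statistical software reports this tail area directly, but it can also be approximated by hand. The sketch below is one way to do so using only the Python standard library: it rewrites the right tail of the F distribution as a regularized incomplete beta function and integrates numerically with Simpson's rule. The function name `f_right_tail` and the number of integration steps are assumptions made for this illustration, and the sketch is restricted to the case where both degrees of freedom are at least 2.

```python
import math

def f_right_tail(f, df1, df2, steps=20000):
    """Approximate P(F >= f) for F ~ F(df1, df2); assumes f > 0, df1 >= 2, df2 >= 2."""
    # Identity: P(F >= f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    # where I_x is the regularized incomplete beta function.
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f)

    def g(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule on [0, x] with an even number of steps.
    h = x / steps
    total = g(0.0) + g(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3

    # Divide by the complete beta function B(a, b), computed on the log scale.
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)

# Observed F = 13.59 with 4 and 131 degrees of freedom, from the ANOVA table.
p_value = f_right_tail(13.59, 4, 131)
print(f"{p_value:.1e}")
```

With the table's F statistic and degrees of freedom, the result agrees with the reported p-value of 2.7e-09 to the precision shown.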
Step 5: Draw Conclusions
- When the p-value is smaller than the chosen $\alpha$ (usually 0.05), we reject the null hypothesis and conclude that there is evidence of a difference in population means.
- Otherwise, we fail to reject the null hypothesis.
- We cannot determine from this which means are not equal, only whether they are not all equal.
Since the p-value for the Australian athletes analysis, p-value = 2.7e-09, is very small, we reject the null hypothesis and conclude that there is a difference between the mean hemoglobin levels of at least two groups.
Comments Regarding ANOVA
ANOVA works if
- the samples from the k populations are random,
- the observations within each group are approximately normally distributed (or the sample sizes are large enough to ensure approximate normality of the sample means), and
- the underlying k populations all have the same variance.
If the F test is significant (i.e., we reject the null hypothesis that the means are equal), this means that there is at least one pair of the k means that are significantly different. There are additional procedures that help us to determine which pairs of means differ.
Pairwise Comparisons
If we reject the null hypothesis, we can use pairwise comparisons to determine which means are different, and by how much. The comparisons are made by constructing confidence intervals or conducting hypothesis tests for the difference between the means of each pair of populations. If a confidence interval does not contain 0, or if the p-value for a hypothesis test is less than the significance level then there is evidence that the means of the compared groups are not equal.
Pairwise comparisons can tell us which means are different, and by how much.
The results of pairwise comparisons as a follow-up to significant results from an ANOVA are shown below.
- The first column shows which groups are being compared (in the image the groups are simply labeled 1, 2, 3).
- The second column, 'diff', gives a point estimate for the difference in the means of two groups.
- The third column, 'lwr', is a lower endpoint for a 95% confidence interval for the differences in the means.
- The next column, 'upr' is the upper endpoint of the confidence interval.
- The final column, 'padj', gives a p-value for a hypothesis test for ascertaining whether the difference in the two means is 0.
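How such a table is read can be sketched in a few lines. The rows below are made-up numbers for three hypothetical groups (they are not the athletes results), and the names `rows` and `interpret` are illustrative. The sketch assumes that a label such as "2-1" means the group 2 mean minus the group 1 mean.

```python
# Hypothetical pairwise-comparison output: (pair, diff, lwr, upr, padj).
rows = [
    ("2-1", 1.8, 0.6, 3.0, 0.002),
    ("3-1", 0.4, -0.8, 1.6, 0.710),
    ("3-2", -1.4, -2.6, -0.2, 0.018),
]

def interpret(row, alpha=0.05):
    """Describe one comparison, where diff = (first group mean) - (second group mean)."""
    pair, diff, lwr, upr, padj = row
    first, second = pair.split("-")
    # Evidence of a difference: the interval excludes 0 and the p-value is below alpha.
    excludes_zero = lwr > 0 or upr < 0
    if excludes_zero and padj < alpha:
        # A positive diff means the first group's mean is the larger one.
        larger = first if diff > 0 else second
        return f"{pair}: evidence of a difference; group {larger} has the larger mean"
    return f"{pair}: no evidence of a difference"

for row in rows:
    print(interpret(row))
```

The sign of 'diff' depends on the order of subtraction, so it is worth checking which group is subtracted from which before deciding which mean is larger.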

Since we rejected the null hypothesis in the Australian athletes analysis, we use pairwise comparisons to see which group means are different.

- Netball and Basketball
- Rowing and Netball
- Swimming and Netball
- Running and Netball
The point estimate comparing the netball and basketball means came from subtracting the basketball mean from the netball mean. Since the result is negative, the basketball mean is larger than the netball mean.
For all the other comparisons involving netball, the netball mean was subtracted from the other group mean. These point estimates are all positive, indicating that the netball mean was smaller.
There is evidence that the mean hemoglobin level of athletes who participate in netball is lower than that of athletes who participate in the four other sports.
Footnotes: