Chi-Square Goodness-of-Fit Test
The Hypothesis Testing Process:
- State hypotheses.
- Collect data.
- Construct a test statistic.
- Compute a p-value.
- Draw conclusions (in statistical terms and in context).
State hypotheses
Suppose that we want to test whether the proportions of eye colors among statistics students match the U.S. demographic.
$H_0: p_{Brown} = 0.41,\, p_{Blue} = 0.32,\, p_{Hazel} = 0.15,\, p_{Green} = 0.12$
$H_A: p_i \neq p_i^* \text{ for some }i$, where $p_i^*$ is the proportion specified for eye color $i$ in $H_0$.
Collect data
We will use the HairEyeColor dataset that was introduced on the confidence intervals page.
Recall the dataset HairEyeColor:
HairEyeColor
To find the counts for each of the eye colors regardless of sex or hair color, I will sum the columns in both tables.
Number of Brown-Eyed People $= 32+53+10+3+36+66+16+4 = 220$
Number of Blue-Eyed People $= 11+50+10+30+9+34+7+64 = 215$
Number of Hazel-Eyed People $= 10+25+7+5+5+29+7+5 = 93$
Number of Green-Eyed People $= 3+15+7+8+2+14+7+8 = 64$
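The same totals can be obtained without adding cells by hand. The following is a minimal sketch, assuming the built-in HairEyeColor array (dimensions Hair, Eye, Sex); the name eye_counts is illustrative and not from the original page.

```r
# HairEyeColor is a 3-way array with dimensions Hair x Eye x Sex.
# Summing over margin 2 (Eye) collapses Hair and Sex.
eye_counts <- apply(HairEyeColor, 2, sum)

# Named vector of totals for Brown, Blue, Hazel, Green
print(eye_counts)

# Total number of students in the dataset
n_students <- sum(eye_counts)
print(n_students)
```

The resulting vector can be passed directly as x to chisq.test() in place of the hand-typed counts.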
Construct a test statistic and compute a p-value.
To conduct a Chi-Square Goodness-of-Fit Test in R, we will use the chisq.test() function.
The first argument given to chisq.test() is the data, x.
The second argument, p, specifies the probabilities in your null hypothesis, listed in the same order as the counts in x. If this argument is omitted, R assumes the proportions for each group are equal.
chisq.test(x = c(220, 215, 93, 64),
p = c(0.41, 0.32, 0.15, 0.12))
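To make the calculation behind chisq.test() more concrete, here is a rough sketch of the same goodness-of-fit statistic computed by hand. The variable names (observed, null_p, expected, statistic, p_value) are illustrative assumptions, not part of the original page.

```r
observed <- c(Brown = 220, Blue = 215, Hazel = 93, Green = 64)
null_p   <- c(0.41, 0.32, 0.15, 0.12)

# Expected counts under the null hypothesis: n * p_i
expected <- sum(observed) * null_p

# Chi-square statistic: sum of (observed - expected)^2 / expected
statistic <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

# Compare with the built-in test
builtin <- chisq.test(x = observed, p = null_p)
print(c(statistic = statistic, df = df, p_value = p_value))
```

The hand-computed statistic and p-value should agree with the values reported by chisq.test().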
Draw conclusions.
Using $\alpha = 0.05$, we can see that our p-value is slightly greater than $\alpha$.
$$pval = 0.09079 > 0.05 = \alpha$$
Therefore, we will fail to reject our null hypothesis and say that there is not significant evidence to conclude the proportions of eye colors in statistics students do not match the U.S. demographic.
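The comparison with $\alpha$ can also be done in code rather than by reading the printed output. This is a small sketch; the names alpha, gof, and reject_null are illustrative.

```r
alpha <- 0.05

gof <- chisq.test(x = c(220, 215, 93, 64),
                  p = c(0.41, 0.32, 0.15, 0.12))

# The object returned by chisq.test() stores the p-value in $p.value
reject_null <- gof$p.value < alpha
print(reject_null)
```

Here reject_null is FALSE, which corresponds to failing to reject the null hypothesis.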
Chi-Square Test of Independence
State hypotheses
Suppose that we want to test whether Eye Color and Hair Color are independent of each other.
$H_0: \text{Eye Color is independent of Hair Color.}$
$H_A: \text{Eye Color is not independent of Hair Color.}$
Collect data
We will use the HairEyeColor dataset that was introduced on the confidence intervals page.
Recall the dataset HairEyeColor:
HairEyeColor
To create a two-way table summarizing just Hair and Eye Color, regardless of Sex, I will add the individual cell values in both tables to create a new table called HairEye.
| Hair / Eye | Brown | Blue | Hazel | Green |
|---|---|---|---|---|
| Black | 32 + 36 = 68 | 11 + 9 = 20 | 10 + 5 = 15 | 3 + 2 = 5 |
| Brown | 119 | 84 | 54 | 29 |
| Red | 26 | 17 | 14 | 14 |
| Blond | 7 | 94 | 10 | 16 |
Then I will use cbind() to create a two-way table. (Could also use rbind())
HairEye <- cbind(c(68, 119, 26, 7), c(20, 84, 17, 94),
c(15, 54, 14, 10), c(5, 29, 14, 16))
HairEye
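As a check on the hand-entered values, the two-way table can also be built by summing the array over Sex. The sketch below assumes the built-in HairEyeColor array; the names hair_eye and manual are illustrative.

```r
# Sum over the third dimension (Sex), keeping Hair (1) and Eye (2)
hair_eye <- apply(HairEyeColor, c(1, 2), sum)
print(hair_eye)

# The same table typed in by hand, one column per eye color
manual <- cbind(c(68, 119, 26, 7), c(20, 84, 17, 94),
                c(15, 54, 14, 10), c(5, 29, 14, 16))

# TRUE if every cell of the two tables agrees
tables_match <- all(as.vector(hair_eye) == as.vector(manual))
print(tables_match)
```

Unlike the cbind() version, the table built with apply() keeps the Hair and Eye labels as row and column names.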
Construct a test statistic and compute a p-value.
To conduct a Chi-Square Test of Independence in R, we will use the chisq.test() function again.
The first and only argument given to chisq.test() here is the data, x, which in this scenario is a two-way table of data.
HairEye <- HairEyeColor[, , 1] + HairEyeColor[, , 2]
chisq.test(x = HairEye)
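For a two-way table, the test statistic compares each observed count with the count expected under independence (row total times column total, divided by the grand total). The following is a minimal sketch of that calculation; the variable names are illustrative and not from the original page.

```r
hair_eye <- HairEyeColor[, , 1] + HairEyeColor[, , 2]
n <- sum(hair_eye)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(hair_eye), colSums(hair_eye)) / n

# Chi-square statistic summed over all cells
statistic <- sum((hair_eye - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(hair_eye) - 1) * (ncol(hair_eye) - 1)

p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

# Compare with the built-in test
builtin <- chisq.test(x = hair_eye)
print(c(statistic = statistic, df = df, p_value = p_value))
```

The hand-computed statistic should match the X-squared value printed by chisq.test(), with 9 degrees of freedom for this 4-by-4 table.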
Draw conclusions.
Using $\alpha = 0.05$, we can see that our p-value is much smaller than $\alpha$.
$$pval < 2.2\times 10^{-16} < 0.05 = \alpha$$
Therefore, we will reject our null hypothesis and say that there is significant evidence to conclude that Hair Color and Eye Color are not independent of each other.