Stats Stuff

Chi-Square Goodness-of-Fit Test

The Hypothesis Testing Process:

State hypotheses.
Collect data.
Construct a test statistic.
Compute a p-value.
Draw conclusions (in statistical terms and in context).

State hypotheses

Suppose that we want to test whether the proportions of eye colors among statistics students match those of the U.S. population.

$H_0: p_{Brown} = 0.41,\, p_{Blue} = 0.32,\, p_{Hazel} = 0.15,\, p_{Green} = 0.12$
$H_A: p_i \neq p_i^* \text{ for some }i$, where $p_i^*$ is the proportion stated for eye color $i$ in $H_0$.

Collect data

We will use the HairEyeColor dataset that was introduced on the confidence intervals page.

Recall the dataset HairEyeColor:


        HairEyeColor

To find the count for each eye color regardless of sex or hair color, I will sum that eye color's column across both the Male and Female tables.

Number of Brown-Eyed People $= 32+53+10+3+36+66+16+4 = 220$

Number of Blue-Eyed People $= 11+50+10+30+9+34+7+64 = 215$

Number of Hazel-Eyed People $= 10+25+7+5+5+29+7+5 = 93$

Number of Green-Eyed People $= 3+15+7+8+2+14+7+8 = 64$
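
The same totals can be obtained in R instead of by hand. This is a minimal sketch, assuming the built-in HairEyeColor table with dimensions Hair, Eye, and Sex (in that order); the name eye_counts is introduced here only for illustration.

```r
# Sum over Hair and Sex, keeping only the Eye dimension (dimension 2).
eye_counts <- apply(HairEyeColor, 2, sum)
eye_counts

# Total number of students in the dataset.
sum(eye_counts)
```

The four values printed for eye_counts should match the hand-computed sums above.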

Construct a test statistic and compute a p-value.

To conduct a Chi-Square Goodness-of-Fit Test in R, we will use the chisq.test() function.


          chisq.test(x = c(220, 215, 93, 64),
                     p = c(0.41, 0.32, 0.15, 0.12))

The first argument given to chisq.test() is the data, x.

The second argument, p, gives the probabilities stated in the null hypothesis; they must sum to 1. If this argument is omitted, R assumes that all groups have equal proportions.
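
To see what chisq.test() is computing here, the statistic can be reproduced by hand. The goodness-of-fit statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ is the observed count and $E_i = n\,p_i^*$ is the count expected under $H_0$; it is compared against a chi-square distribution with (number of categories − 1) degrees of freedom. The sketch below uses illustrative variable names (observed, p_null, and so on) that are not part of the original example.

```r
observed <- c(Brown = 220, Blue = 215, Hazel = 93, Green = 64)
p_null   <- c(0.41, 0.32, 0.15, 0.12)

# Expected counts under the null hypothesis: n * p_i
expected <- sum(observed) * p_null

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

chi_sq
p_value
```

Both values should agree with the X-squared and p-value reported by chisq.test().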

Draw conclusions.

Using $\alpha = 0.05$, we can see that our p-value is slightly greater than $\alpha$. $$pval = 0.09079 > 0.05 = \alpha$$
Therefore, we fail to reject the null hypothesis: there is not significant evidence that the proportions of eye colors among statistics students differ from the U.S. demographic.
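
The comparison against $\alpha$ can also be done in code, since chisq.test() returns a list whose p.value element holds the p-value. This is a small sketch; the names gof_result, alpha, and reject_null are assumptions for illustration.

```r
alpha <- 0.05

gof_result <- chisq.test(x = c(220, 215, 93, 64),
                         p = c(0.41, 0.32, 0.15, 0.12))

# TRUE would mean the p-value is below alpha and the null hypothesis is rejected.
reject_null <- gof_result$p.value < alpha
reject_null
```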

Chi-Square Test of Independence

State hypotheses

Suppose that we want to test whether Eye Color and Hair Color are independent of each other.

$H_0: \text{Eye Color is independent of Hair Color.}$
$H_A: \text{Eye Color is not independent of Hair Color.}$

Collect data

We will use the HairEyeColor dataset that was introduced on the confidence intervals page.

Recall the dataset HairEyeColor:


        HairEyeColor

To create a two-way table summarizing just Hair and Eye Color, regardless of Sex, I will add the individual cell values in both tables to create a new table called HairEye.

Hair \ Eye	Brown	Blue	Hazel	Green
Black	32 + 36 = 68	11 + 9 = 20	10 + 5 = 15	3 + 2 = 5
Brown	119	84	54	29
Red	26	17	14	14
Blond	7	94	10	16

Then I will use cbind() to build the two-way table column by column, one column per eye color. (rbind() would work equally well, building the table row by row.)


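        # Each c(...) is one eye-color column (Brown, Blue, Hazel, Green);
        # rows are hair colors (Black, Brown, Red, Blond).
        HairEye <- cbind(c(68, 119, 26, 7), c(20, 84, 17, 94),
                         c(15, 54, 14, 10), c(5, 29, 14, 16))
        HairEye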
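
As a sketch of the rbind() alternative mentioned above, the same table can be built row by row, one row per hair color. The object names HairEye_rows and from_data, and the final comparison against the built-in dataset, are additions for illustration.

```r
# One row per hair color (Black, Brown, Red, Blond);
# columns are eye colors (Brown, Blue, Hazel, Green).
HairEye_rows <- rbind(c(68, 20, 15, 5),
                      c(119, 84, 54, 29),
                      c(26, 17, 14, 14),
                      c(7, 94, 10, 16))

# Summing the dataset over Sex gives the same counts, with labels attached.
from_data <- apply(HairEyeColor, c(1, 2), sum)

# TRUE if every hand-entered cell matches the dataset.
all(HairEye_rows == from_data)
```

Checking hand-entered counts against the dataset this way is a cheap guard against transcription errors.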

Construct a test statistic and compute a p-value.

To conduct a Chi-Square Test of Independence in R, we will use the chisq.test() function again.


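          # Shortcut: add the Male and Female slices of the dataset directly.
          # This gives the same counts as the cbind() version, with labels kept.
          HairEye <- HairEyeColor[, , 1] + HairEyeColor[, , 2]

          chisq.test(x = HairEye)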

The first and only argument given to chisq.test() here is the data, x, which in this scenario is a two-way table of data.
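
Under $H_0$, the expected count in each cell is (row total × column total) / grand total, and the statistic sums $(O - E)^2 / E$ over all cells, with (rows − 1)(columns − 1) degrees of freedom. Below is a minimal sketch of that calculation, assuming HairEye is the 4 × 4 table of counts; the other names (expected, chi_sq, deg_free, p_value) are illustrative.

```r
HairEye <- HairEyeColor[, , 1] + HairEyeColor[, , 2]

# Expected counts if Hair and Eye were independent
expected <- outer(rowSums(HairEye), colSums(HairEye)) / sum(HairEye)

# Chi-square statistic and degrees of freedom
chi_sq   <- sum((HairEye - expected)^2 / expected)
deg_free <- (nrow(HairEye) - 1) * (ncol(HairEye) - 1)

# Upper-tail p-value
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

chi_sq
deg_free
p_value
```

The expected counts can also be read from chisq.test(HairEye)$expected, which is a convenient way to check the hand calculation.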

Draw conclusions.

Using $\alpha = 0.05$, we can see that our p-value is much smaller than $\alpha$. R reports it only as being below $2.2\times 10^{-16}$: $$pval < 2.2\times 10^{-16} < 0.05 = \alpha$$
Therefore, we will reject our null hypothesis and say that there is significant evidence to conclude that Hair Color and Eye Color are not independent of each other.

