Stats Stuff

One Sample Hypothesis Test for Means

The Hypothesis Testing Process:

State hypotheses about the parameter.
Collect data.
Construct a test statistic.
Apply a decision rule.
Draw conclusions (in statistical terms and in context).

State hypotheses about the parameter

Suppose that we want to test whether the mean petal length for iris flowers is 4 cm.
$H_0: \mu = 4$
$H_A: \mu \neq 4$

Collect data

We will use the iris dataset that we used for confidence intervals.

Construct a test statistic.

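The test statistic for a one sample hypothesis test for means is: $$T = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}$$ where $\bar{X}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $\mu_0$ is the hypothesized mean value in the null hypothesis. Under $H_0$, $T$ is compared to a t distribution with $n - 1$ degrees of freedom.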

For the Petal.Length variable in the iris dataset:



        xbar <- mean(iris$Petal.Length)
        xbar
        n <- length(iris$Petal.Length)
        n
        s <- sd(iris$Petal.Length)
        s
        mu <- 4
        t_test_stat <- (xbar - mu) / (s / sqrt(n))
        t_test_stat

$\bar{X} = 3.758$, $n = 150$, $s = 1.765298$, $\mu_0 = 4$

Test statistic: $T = -1.67897$

Apply a decision rule.

We can either define a rejection region based on a specified significance level ($\alpha$) or we can compute a p-value.

We will use a significance level of $\alpha = 0.05$


        xbar <- mean(iris$Petal.Length)
        n <- length(iris$Petal.Length)
        s <- sd(iris$Petal.Length)
        mu <- 4
        t_test_stat <- (xbar - mu) / (s / sqrt(n))


        alpha <- 0.05
        # Rejection region
        # Our test statistic is in the left tail of the t curve, so we use
        # lower.tail = TRUE to get the lower critical value; by symmetry, the
        # upper critical value is -reject.
        reject <- qt(p = alpha / 2, df = n - 1, lower.tail = TRUE)
        reject

        # p-value
        # pt() with lower.tail = TRUE gives the area to the left of our
        # (negative) test statistic. Since our alternative hypothesis is
        # two-sided, we multiply that tail probability by 2.
        2 * pt(q = t_test_stat, df = n - 1, lower.tail = TRUE)

Rejection Region
For a significance level of $\alpha = 0.05$, we would reject our null hypothesis for any test statistic value that is less than -1.976013 or greater than 1.976013 (both tails, since our alternative hypothesis is two-sided).
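
As a small sketch of this decision rule, both cutoffs can be obtained from a single `qt()` call and the test statistic checked against them. The names `crit` and `in_rejection_region` are illustrative and not part of the walkthrough above.

```r
# Recompute the test statistic so this block runs on its own
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))

alpha <- 0.05
# Lower and upper critical values for a two-sided test
crit <- qt(p = c(alpha / 2, 1 - alpha / 2), df = n - 1)
crit

# TRUE means the test statistic falls in the rejection region
in_rejection_region <- t_test_stat < crit[1] | t_test_stat > crit[2]
in_rejection_region
```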

p-value
With a test statistic value of -1.67897, the two-sided p-value is 0.09525381.
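
The p-value code above passes `lower.tail = TRUE`, which picks the correct tail only because this test statistic is negative. A minimal sketch of a version that works for either sign takes the absolute value first and uses the upper tail (the name `p_value` is illustrative):

```r
# Recompute the test statistic so this block runs on its own
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))

# Area beyond |t| in one tail, doubled for the two-sided alternative
p_value <- 2 * pt(q = abs(t_test_stat), df = n - 1, lower.tail = FALSE)
p_value
```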

Draw conclusions

Rejection Region
Our decision rule (with $\alpha = 0.05$) says to reject the null hypothesis if the test statistic is less than -1.976013 or greater than 1.976013. $$-1.976013 < T = -1.67897 < 1.976013$$ Since our test statistic is not in the rejection region, we fail to reject the null hypothesis: we do not have significant evidence that the mean petal length of iris flowers differs from 4 cm.

p-value
We reject the null hypothesis if the p-value is less than the specified significance level ($\alpha = 0.05$). $$\text{p-value} = 0.09525381 > 0.05 = \alpha$$ Since our p-value is greater than $\alpha$, we fail to reject the null hypothesis: we do not have significant evidence that the mean petal length of iris flowers differs from 4 cm.

Using the t.test() function

Instead of calculating all of these values "by hand", we can use a single function, t.test(), to conduct the t-test.

We want to test whether the mean petal length for iris flowers is 4 cm.
$H_0: \mu = 4$
$H_A: \mu \neq 4$



        t.test(iris$Petal.Length, mu = 4, alternative = "two.sided")
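
If the individual values are needed rather than the printed summary, the object returned by `t.test()` can be stored and its components extracted. The sketch below assumes the result is saved under the illustrative name `result`; the values should match the ones computed by hand earlier.

```r
result <- t.test(iris$Petal.Length, mu = 4, alternative = "two.sided")

result$statistic   # t test statistic
result$parameter   # degrees of freedom (n - 1)
result$p.value     # two-sided p-value
result$conf.int    # 95% confidence interval for the mean
```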

If you want to change what kind of alternative hypothesis you have, you can change the alternative argument in the function:

$H_A: \mu < 4$

        t.test(iris$Petal.Length, mu = 4, alternative = "less")

$H_A: \mu > 4$

        t.test(iris$Petal.Length, mu = 4, alternative = "greater")
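
To see what the `alternative` argument changes, here is a hedged sketch that computes the one-sided p-values by hand with `pt()` and places them next to the values reported by `t.test()`. The names `p_less` and `p_greater` are illustrative.

```r
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))

# H_A: mu < 4 uses the area to the left of the test statistic
p_less <- pt(q = t_test_stat, df = n - 1, lower.tail = TRUE)
# H_A: mu > 4 uses the area to the right of the test statistic
p_greater <- pt(q = t_test_stat, df = n - 1, lower.tail = FALSE)

# Compare with the p-values reported by t.test()
c(by_hand = p_less, t_test = t.test(x, mu = 4, alternative = "less")$p.value)
c(by_hand = p_greater, t_test = t.test(x, mu = 4, alternative = "greater")$p.value)
```

The two one-sided p-values add up to 1, because together the left and right areas cover the whole t curve.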


One Sample Hypothesis Test for Proportions

The Hypothesis Testing Process:

State hypotheses about the parameter.
Collect data.
Construct a test statistic.
Apply a decision rule.
Draw conclusions (in statistical terms and in context).

State hypotheses about the parameter

Suppose that we want to test whether the proportion of red-haired people is 0.15.
$H_0: p = 0.15$
$H_A: p \neq 0.15$

Collect data

We will use the HairEyeColor dataset that we used for confidence intervals.

Construct a test statistic.

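The test statistic for a one sample hypothesis test for a proportion is: $$Z = \dfrac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$ where $\hat{p}$ is the sample proportion, $n$ is the sample size, and $p_0$ is the hypothesized proportion value in the null hypothesis. Under $H_0$, $Z$ is compared to a standard normal (z) distribution.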

In the HairEyeColor dataset, there are 71 red-haired people.
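
The count of 71 can be recovered from the table itself. As a sketch, `margin.table()` collapses the three-way HairEyeColor table (hair by eye by sex) to totals by hair color; the names `hair_totals` and `red_count` are illustrative.

```r
# Collapse the 3-way table to totals by hair color (the first dimension)
hair_totals <- margin.table(HairEyeColor, margin = 1)
hair_totals

# Number of red-haired people and the overall sample size
red_count <- as.numeric(hair_totals["Red"])
n <- sum(HairEyeColor)
c(red = red_count, n = n)
```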



        n <- sum(HairEyeColor)
        n
        phat <- 71 / n
        phat
        p0 <- 0.15
        z_test_stat <- (phat - p0) / sqrt((p0 * (1 - p0)) / n)
        z_test_stat

$n = 592$, $\hat{p} = \frac{71}{592} = 0.1199324$, $p_0 = 0.15$

Test statistic: $Z = -2.048821$

Apply a decision rule.

We can either define a rejection region based on a specified significance level ($\alpha$) or we can compute a p-value.

We will use a significance level of $\alpha = 0.05$


        n <- sum(HairEyeColor)
        phat <- 71 / n
        p0 <- 0.15
        z_test_stat <- (phat - p0) / sqrt((p0 * (1 - p0)) / n)


        alpha <- 0.05
        # Rejection region
        # Our test statistic is in the left tail of the z curve, so we use
        # lower.tail = TRUE to get the lower critical value; by symmetry, the
        # upper critical value is -reject.
        reject <- qnorm(p = alpha / 2, lower.tail = TRUE)
        reject

        # p-value
        # pnorm() with lower.tail = TRUE gives the area to the left of our
        # (negative) test statistic. Since our alternative hypothesis is
        # two-sided, we multiply that tail probability by 2.
        2 * pnorm(q = z_test_stat, lower.tail = TRUE)

Rejection Region
For a significance level of $\alpha = 0.05$, we would reject our null hypothesis for any test statistic value that is less than -1.959964 or greater than 1.959964 (both tails, since our alternative hypothesis is two-sided).
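
Because the z curve is symmetric around 0, the two-sided rule can also be written as a single comparison on the absolute value of the test statistic. A minimal sketch, with `z_crit` and `in_rejection_region` as illustrative names:

```r
# Recompute the test statistic so this block runs on its own
n <- sum(HairEyeColor)
phat <- 71 / n
p0 <- 0.15
z_test_stat <- (phat - p0) / sqrt(p0 * (1 - p0) / n)

alpha <- 0.05
# Upper critical value; the lower one is its negative
z_crit <- qnorm(p = 1 - alpha / 2)
z_crit

# TRUE means the test statistic falls in one of the two tails
in_rejection_region <- abs(z_test_stat) > z_crit
in_rejection_region
```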

p-value
With a test statistic value of -2.048821, the two-sided p-value is 0.0404796.

Draw conclusions

Rejection Region
Our decision rule (with $\alpha = 0.05$) says to reject the null hypothesis if the test statistic is less than -1.959964 or greater than 1.959964. $$Z = -2.048821 < -1.959964$$ Since our test statistic is in the rejection region, we reject the null hypothesis: we have significant evidence that the true proportion of red-haired people is not 0.15.

p-value
We reject the null hypothesis if the p-value is less than the specified significance level ($\alpha = 0.05$). $$\text{p-value} = 0.0404796 < 0.05 = \alpha$$ Since our p-value is less than $\alpha$, we reject the null hypothesis: we have significant evidence that the true proportion of red-haired people is not 0.15.
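
As with `t.test()` for means, base R has a built-in function for this test. The walkthrough above does not use it, so the following is only a sketch: `prop.test()` with `correct = FALSE` (no continuity correction, an assumption made here so that the results line up with the by-hand z-test) reports a chi-squared statistic equal to $Z^2$ and the same two-sided p-value. The name `result` is illustrative.

```r
result <- prop.test(x = 71, n = sum(HairEyeColor), p = 0.15,
                    alternative = "two.sided", correct = FALSE)

result$statistic        # chi-squared statistic, equal to z^2
sqrt(result$statistic)  # magnitude of the z test statistic
result$p.value          # two-sided p-value
```

Leaving `correct` at its default of `TRUE` applies a continuity correction, so the statistic and p-value would differ slightly from the by-hand values.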