Stats Stuff

Two Sample Hypothesis Tests - Paired Sample

The Hypothesis Testing Process:

1. State hypotheses about the parameter.
2. Collect data.
3. Construct a test statistic.
4. Compute a p-value.
5. Draw conclusions (in statistical terms and in context).

R has a built-in dataset titled sleep whose data are suited to a paired sample $t$-test: 10 subjects were each given two different sleep-inducing drugs, and the amount of extra sleep each subject got with each drug, compared to control, was recorded. Here is what the data look like:


        sleep

The variable extra is the amount of extra sleep (in hours) the subject got compared to control.
The variable group indicates which of the two sleep-inducing drugs (1 or 2) was given.
The variable ID contains each subject's unique ID number.

In addition, str() shows that extra is a numerical variable, while group and ID are categorical (factor) variables.


        str(sleep)
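A quick way to confirm the paired structure before testing is to cross-tabulate subject ID against drug group. This is a small sketch using only base R's table(); the variable name pairing is our own.

```r
# Cross-tabulate subject ID against drug group. In a paired design every
# subject appears exactly once in each group, so every cell should be 1.
pairing <- table(sleep$ID, sleep$group)
pairing

# TRUE if each of the 10 subjects has exactly one measurement per drug
all(pairing == 1)
```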

State hypotheses about the parameter

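Suppose that we want to test whether the mean amount of extra sleep a participant gets differs between the two drugs. Let $\mu_1$ and $\mu_2$ be the mean extra sleep under drug 1 and drug 2. Because every subject took both drugs, $\mu_1 - \mu_2$ is also the mean of the within-subject differences.
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \neq 0$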
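Because the same subjects appear under both drugs, these hypotheses are equivalent to a one-sample test on the within-subject differences $d_i = x_{1i} - x_{2i}$ with $H_0: \mu_d = 0$. The sketch below assumes, as holds for sleep, that rows are ordered by ID within each group; the names drug1, drug2, and d are our own.

```r
# Extra sleep under each drug; both vectors are ordered by subject ID 1-10
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Within-subject differences, drug 1 minus drug 2 (matches mu_1 - mu_2)
d <- drug1 - drug2
d
mean(d)

# One-sample t-test of H0: mu_d = 0 on the differences
t.test(d, mu = 0)
```

The mean difference here is -1.58 hours, and the one-sample test on d gives the same $t$ statistic and p-value as the paired test.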

Collect data

We will use the sleep dataset in R.

Construct a test statistic and compute a p-value.

As with one-sample hypothesis tests, we will use the t.test() function.


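          # Drug 1 vs. drug 2; both vectors list subjects in ID order 1-10
          with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE, mu = 0))

Because each subject received both drugs, we give t.test() one vector of extra sleep per drug; with(sleep, ...) lets us refer to the columns of sleep directly. The two vectors must list the subjects in the same order (here, ID 1 through 10) so that each subject's two measurements are paired correctly.
One of the additional arguments for t.test() specifies whether you are doing a paired sample or independent samples $t$-test. Since this is a paired sample, paired = TRUE. Recent versions of R reject paired = TRUE in the '~' formula interface (for example, t.test(sleep$extra ~ sleep$group, paired = TRUE)), which is why the paired test is written with two vectors here.
Additionally, since we are testing whether the difference between these two drugs is equal to 0, our mu argument is set equal to 0, matching our null hypothesis.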
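To see what t.test() computes for a paired test, here is a sketch of the statistic $t = \bar{d} / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom, where $d$ are the within-subject differences (drug 1 minus drug 2). The variable names are our own; the result should match the $t = -4.0621$, df $= 9$, and p-value $0.002833$ that t.test() reports.

```r
# Within-subject differences (drug 1 minus drug 2), ordered by subject ID
d <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]
n <- length(d)

# Paired t statistic: mean difference divided by its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from a t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

c(t = t_stat, df = n - 1, p = p_value)
```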

Draw conclusions.

Using $\alpha = 0.05$, we can see that our p-value is less than $\alpha$: $$\text{p-value} = 0.002833 < 0.05 = \alpha$$
Therefore, we reject the null hypothesis and conclude that there is evidence of a difference in the average amount of extra sleep subjects get between these two sleep-inducing drugs.
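The object returned by t.test() is a list of class "htest", so the comparison with $\alpha$ can also be done in code. A minimal sketch; result and alpha are names we chose for illustration.

```r
# Paired test on the sleep data (subjects ordered by ID within each group)
result <- t.test(sleep$extra[sleep$group == 1],
                 sleep$extra[sleep$group == 2],
                 paired = TRUE, mu = 0)

alpha <- 0.05
result$p.value          # the p-value shown in the printed output
result$p.value < alpha  # TRUE means we reject H0 at level alpha
```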

If you wanted to investigate further and see which drug provided more sleep on average, constructing a side-by-side boxplot is a good way to start.


          boxplot(sleep$extra ~ sleep$group,
                  main = "Boxplot of Extra Sleep",
                  ylab = "Hours of Extra Sleep",
                  names = c("Drug 1", "Drug 2"),
                  ylim = c(-2, 6),
                  col = c("steelblue1", "royalblue"))
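Alongside the boxplot, the group means give a direct numerical answer to which drug provided more extra sleep on average. A one-line sketch with base R's tapply(); group_means is our own name.

```r
# Mean extra sleep (hours) for each drug group
group_means <- tapply(sleep$extra, sleep$group, mean)
group_means
```

Drug 1 averages 0.75 hours of extra sleep and drug 2 averages 2.33 hours, so drug 2 has the higher mean.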

Two Sample Hypothesis Tests - Independent Samples

State hypotheses

Suppose that we want to test whether the mean fuel efficiency, in miles per gallon (mpg), differs between automatic and manual transmission cars. Let $\mu_A$ and $\mu_M$ be the mean mpg of automatic and manual cars, respectively.
$H_0: \mu_A - \mu_M = 0$
$H_A: \mu_A - \mu_M \neq 0$

Collect data

We will use the built-in mtcars dataset in R (first used on the Numerical Data Summary page).
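Unlike the sleep data, the two groups here consist of different cars, so they need not be the same size. A quick sketch checking the group sizes; transmission_counts is our own name.

```r
# Number of cars with each transmission type (am: 0 = automatic, 1 = manual)
transmission_counts <- table(mtcars$am)
transmission_counts
```

There are 19 automatic and 13 manual cars.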

Construct a test statistic and compute a p-value.

As with one-sample hypothesis tests, we will use the t.test() function.


          #Remember that group '0' is for automatic, '1' is for manual. 
          t.test(mtcars$mpg ~ as.factor(mtcars$am), paired = FALSE, mu = 0)

Notice that we are using '~' notation to specify how the $t$-test should be conducted: we want a $t$-test comparing the numerical variable, mpg, between the two transmission types, am.
One of the additional arguments for t.test() specifies whether you are doing a paired sample or independent samples $t$-test. Since these are NOT paired samples (each car has only one transmission type), paired = FALSE. By default, t.test() also uses var.equal = FALSE, so R runs Welch's two-sample $t$-test, which does not assume the two groups have equal variances.
Additionally, since we are testing whether the difference between these two types of transmissions is equal to 0, our mu argument is set equal to 0, matching our null hypothesis.
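For comparison with the paired case, here is a sketch of the Welch statistic that t.test() uses by default: $t = (\bar{x}_A - \bar{x}_M) / \sqrt{s_A^2/n_A + s_M^2/n_M}$, with the Welch-Satterthwaite approximation for the degrees of freedom. Variable names are our own; the values should match the printed $t = -3.7671$, df $= 18.332$, and p-value $0.001374$.

```r
# Split mpg by transmission type (am: 0 = automatic, 1 = manual)
automatic <- mtcars$mpg[mtcars$am == 0]
manual    <- mtcars$mpg[mtcars$am == 1]

# Squared standard error of each group mean (variances are not pooled)
se2_a <- var(automatic) / length(automatic)
se2_m <- var(manual) / length(manual)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(automatic) - mean(manual)) / sqrt(se2_a + se2_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_a + se2_m)^2 /
  (se2_a^2 / (length(automatic) - 1) + se2_m^2 / (length(manual) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```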

Draw conclusions.

Using $\alpha = 0.05$, we can see that our p-value is less than $\alpha$: $$\text{p-value} = 0.001374 < 0.05 = \alpha$$
Therefore, we reject the null hypothesis and conclude that there is evidence of a difference in the average miles per gallon (mpg) between automatic and manual transmission cars.
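The same conclusion can be read from the confidence interval that t.test() reports: at level $\alpha = 0.05$, a two-sided test rejects $H_0: \mu_A - \mu_M = 0$ exactly when the 95% confidence interval for $\mu_A - \mu_M$ excludes 0. A minimal sketch; result, ci, and excludes_zero are our own names.

```r
result <- t.test(mtcars$mpg ~ as.factor(mtcars$am), paired = FALSE, mu = 0)

# 95% confidence interval for mu_A - mu_M (automatic minus manual)
ci <- result$conf.int
ci

# The interval excludes 0 exactly when the p-value is below 0.05
excludes_zero <- ci[1] > 0 || ci[2] < 0
excludes_zero
result$p.value < 0.05
```

Both bounds of the interval are negative, which is consistent with automatic cars having lower mean mpg.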

If you wanted to investigate further and see which transmission type provided better gas mileage on average, constructing a side-by-side boxplot is a good way to start.


          boxplot(mtcars$mpg ~ as.factor(mtcars$am),
                  main = "Boxplot of Miles per Gallon in Cars",
                  ylab = "Miles per gallon (mpg)",
                  xlab = "Transmission Type",
                  names = c("Automatic", "Manual"),
                  ylim = c(10, 40),
                  col = c("#003056", "#8A8D8F"))
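As with the sleep data, the group means complement the boxplot with a direct numerical comparison. A one-line sketch with tapply(); mpg_means is our own name.

```r
# Mean mpg for each transmission type (0 = automatic, 1 = manual)
mpg_means <- tapply(mtcars$mpg, mtcars$am, mean)
mpg_means
```

Automatic cars average about 17.15 mpg and manual cars about 24.39 mpg, so manual cars had better gas mileage on average in this sample.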