Stats Stuff

Confidence Intervals for Means

The structure of a confidence interval consists of three pieces: a point estimate, a critical value, and the standard error of the statistic.
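Written as a formula, the three pieces combine in the same way for both intervals built in this section. The wording inside the formula is only a schematic summary, not extra notation to memorize:

$\text{point estimate} \pm \text{critical value} \times \text{standard error}$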

Point Estimates

The point estimate for a population mean is the sample mean, usually denoted $\bar{X}$. In R, we can compute it with the mean() function.

Suppose we want to construct a confidence interval for the mean petal length in the iris dataset.



        xbar <- mean(iris$Petal.Length)
        xbar

The mean petal length for the flowers in the iris dataset is 3.758.

$\bar{X} = 3.758$

Critical Values

The critical value for a confidence interval for a mean is taken from a t distribution with $n - 1$ degrees of freedom. It depends on our value of $\alpha$ and on the sample size, $n$.

We will use the length() function to find the number of observations in the sample, and the qt() function to calculate the critical value.



        n <- length(iris$Petal.Length)
        n
        alpha <- 0.05
        t_crit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)
        t_crit

The iris dataset has 150 observations, so with $\alpha = 0.05$ the critical value is 1.976013.

$t_{\alpha/2,\, n-1}=t_{.025,\, 149} = 1.976013$
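To see how the critical value depends on $\alpha$ and $n$, it may help to vary one while holding the other fixed. The sketch below is only an illustration: the extra values of $\alpha$ and the sample sizes 10 and 30 are made-up comparison points, and the names `alphas`, `t_by_alpha`, `sample_sizes`, and `t_by_n` are not used elsewhere in this tutorial.

```r
# Sample size from the iris example
n <- length(iris$Petal.Length)

# Holding n fixed, a smaller alpha (higher confidence) gives a larger critical value
alphas <- c(0.10, 0.05, 0.01)
t_by_alpha <- qt(alphas / 2, df = n - 1, lower.tail = FALSE)
t_by_alpha

# Holding alpha fixed at 0.05, a larger sample gives a smaller critical value
sample_sizes <- c(10, 30, 150)
t_by_n <- qt(0.05 / 2, df = sample_sizes - 1, lower.tail = FALSE)
t_by_n
```

The middle entry of the first vector and the last entry of the second both correspond to the iris setting ($\alpha = 0.05$, $n = 150$), so they should match the 1.976013 found above.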

Standard Error

To get the standard error for a mean, we will need the sample standard deviation, $s$, and the sample size, $n$.

We will use the sd() function to find the sample standard deviation.


        n <- length(iris$Petal.Length)


        s <- sd(iris$Petal.Length)
        s
        st_error <- s / sqrt(n)
        st_error

The Petal.Length variable has a standard deviation of 1.765298 and the standard error of the mean is 0.144136.

$s.e.(\bar{X}) = \frac{s}{\sqrt{n}} = 0.144136$

Creating the Confidence Interval

To create the confidence interval, we just need to combine all the pieces together:

$\bar{X} \pm t_{\alpha/2,\, n-1} \times s.e.(\bar{X})$


        n <- length(iris$Petal.Length)
        alpha <- 0.05
        xbar <- mean(iris$Petal.Length)
        t_crit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)
        s <- sd(iris$Petal.Length)
        st_error <- s / sqrt(n)


        # lower bound
        xbar - t_crit * st_error

        # upper bound
        xbar + t_crit * st_error

The 95% confidence interval for the mean petal length of iris flowers is (3.473185, 4.042815).

You can also find the confidence interval by using the t.test() function.



        t.test(iris$Petal.Length, conf.level = 0.95)

The confidence interval appears about halfway through the output, under the line "95 percent confidence interval:".

Don't worry about the rest of the output for now. It will be discussed in the hypothesis testing section.
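If only the interval is needed, it can be pulled out of the object that t.test() returns instead of being read off the printed output. A minimal sketch, where the names `result` and `ci` are just illustrative:

```r
# Store the t.test() result rather than printing it
result <- t.test(iris$Petal.Length, conf.level = 0.95)

# The conf.int element holds the lower and upper bounds
ci <- result$conf.int
ci

ci[1]  # lower bound
ci[2]  # upper bound
```

The two bounds should agree with the interval computed by hand above.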


Confidence Intervals for Proportions

Point Estimates

The point estimate for a population proportion is the sample proportion, usually denoted $\hat{p}$.

Suppose we want to construct a confidence interval for the proportion of red-haired people, using the HairEyeColor dataset.



        HairEyeColor

Notice that this dataset is a table of counts broken down by Sex, Hair Color, and Eye Color. To find the number of red-haired people, we will sum all of the cells that contain red-haired people (the third row in both tables).

Number of Red-Haired People $= 10+10+7+7+16+7+7+7 = 71$
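The same count can be computed in R rather than added by hand. A minimal sketch, relying on the built-in table's first dimension being Hair with a level named "Red"; the name `red_count` is just illustrative:

```r
# Keep only the "Red" hair slice (all eye colors, both sexes) and add up its cells
red_count <- sum(HairEyeColor["Red", , ])
red_count

# Totals for every hair color at once, summing over eye color and sex
margin.table(HairEyeColor, 1)
```

Computing the count this way avoids arithmetic slips when copying cells from the printed tables.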

To get the value of $\hat{p}$, we also need the sample size, $n$. Because HairEyeColor is a table of counts, we can use the sum() function to find the total number of people in the sample.



        n <- sum(HairEyeColor)
        n
        phat <- 71 / n  # 71 is the number of red-haired people
        phat

There are 592 people in the HairEyeColor dataset and the sample proportion of red-haired people is 0.1199324.

$\hat{p} = \frac{71}{592} = 0.1199324$

$n = 592$

Critical Values

The critical value for a confidence interval for a proportion is taken from a standard normal distribution. Unlike the t critical value, it depends only on our value of $\alpha$, not on the sample size.

We will use the qnorm() function to calculate the critical value.



        alpha <- 0.05
        z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
        z_crit

With $\alpha = 0.05$, we have a critical value of 1.959964.

$z_{\alpha/2}=z_{.025} = 1.959964$

Standard Error

To get the standard error for a proportion, we only need the values of $\hat{p}$ and $n$.


        n <- sum(HairEyeColor)
        phat <- 71 / n


        st_error <- sqrt(phat * (1 - phat) / n)
        st_error

The standard error of the proportion is 0.01335259.

$s.e.(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.01335259$

Creating the Confidence Interval

To create the confidence interval, we just need to combine all the pieces together:

$\hat{p} \pm z_{\alpha/2} \times s.e.(\hat{p})$


        n <- sum(HairEyeColor)
        alpha <- 0.05
        phat <- 71 / n  # 71 is the number of red-haired people
        z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
        st_error <- sqrt(phat * (1 - phat) / n)


        # lower bound
        phat - z_crit * st_error

        # upper bound
        phat + z_crit * st_error

The 95% confidence interval for the proportion of red-haired people is (0.09376184, 0.146103).
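The steps above can be wrapped in a small function so the interval can be recomputed for other counts or confidence levels. This is only a sketch: `prop_ci` is a hypothetical helper written for this example, not a base R function, and it simply repeats the same formula used above.

```r
# Hypothetical helper: confidence interval for a proportion using
# phat +/- z_crit * st_error, exactly as built step by step above
prop_ci <- function(x, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  phat <- x / n                                   # point estimate
  z_crit <- qnorm(alpha / 2, lower.tail = FALSE)  # critical value
  st_error <- sqrt(phat * (1 - phat) / n)         # standard error
  c(lower = phat - z_crit * st_error,
    upper = phat + z_crit * st_error)
}

# 71 red-haired people out of everyone in HairEyeColor
ci <- prop_ci(71, sum(HairEyeColor))
ci
```

Called with the default `conf_level`, this should reproduce the interval reported above.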