## The Normal Distribution

What does the protein content in cows' milk have in common with human IQ?

Both variables have approximately normal distributions. The normal distribution is a
good model for measurements of many kinds, including IQs, heights,
and lengths of pregnancies.

The distribution of the protein content in cows' milk has the classic bell shape
of the normal distribution. Most observations are near the mean (3.4 grams),
but a few are much larger or smaller.

The normal distribution is widely used in probability theory and underlies much of statistical inference.

The normal distribution is also called the "Gaussian distribution" or "bell curve". A normal distribution has two parameters: the mean $\mu$ and the variance $\sigma^2$. The mean can be any real number, and the variance can be any positive number.

NOTATION: "\(X\sim N(\mu, \sigma^2)\)" indicates that the random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$.

The probability density function is $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty \lt x \lt \infty$
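As a minimal sketch, the normal density can be evaluated directly from its formula, with normalizing constant $\frac{1}{\sigma\sqrt{2\pi}}$. The function name `normal_pdf` is an illustrative choice, not something defined in this text, and the cross-check assumes Python 3.8 or later, where `statistics.NormalDist` is available.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x; sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Height of the standard normal curve at its peak: 1 / sqrt(2 * pi)
peak = normal_pdf(0.0, mu=0.0, sigma=1.0)

# Cross-check the hand-written formula against the standard library
own_value = normal_pdf(3.0, mu=3.4, sigma=0.3)
library_value = NormalDist(mu=3.4, sigma=0.3).pdf(3.0)

print(round(peak, 4), round(own_value, 4), round(library_value, 4))
```

The two evaluations at $x = 3$ should agree up to floating-point rounding, which is a quick way to confirm that the factor of $\sigma$ in the denominator is in place.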

Adjust the sliders to see how changing the values of $\mu$ and $\sigma$ affects the graph.

$-5 \leq \mu \leq 5$

$0.5 \leq \sigma \leq 5$.

Note: \(\sigma=\sqrt{\sigma^2}\)

### The Standard Normal Distribution

The normal distribution that has mean 0 and variance 1 is called the 'standard normal' distribution. A random variable that has a standard normal distribution is usually denoted by $Z$. That is, $Z\sim N(0,1)$. Moreover, we use $\phi(z)$ and $\Phi(z)$ to denote, respectively, the probability density function and the cumulative distribution function of a standard normal random variable.

### The Empirical Rule

For a normal distribution, the area under the curve within a given number of standard deviations (SDs) of the mean is the same regardless of the value of the mean and the standard deviation. In particular, about 68% of the area is within 1 standard deviation of the mean, 95% is within 2 standard deviations of the mean, and 99.7% is within 3 standard deviations of the mean.

- $P(\mu - \sigma \lt X \lt \mu+\sigma)\approx 0.68$.
- $P(\mu - 2\sigma \lt X \lt \mu+2\sigma)\approx 0.95$.
- $P(\mu - 3\sigma \lt X \lt \mu+3\sigma)\approx 0.997$.
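The empirical rule can be checked numerically. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `prob_within` and the three $(\mu, \sigma)$ pairs are illustrative choices.

```python
from statistics import NormalDist


def prob_within(k: float, mu: float, sigma: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Illustrative (mu, sigma) pairs; any positive sigma gives the same areas
for mu, sigma in [(0.0, 1.0), (3.4, 0.3), (69.0, 4.0)]:
    areas = [round(prob_within(k, mu, sigma), 4) for k in (1, 2, 3)]
    print(mu, sigma, areas)
```

Each row should show areas of roughly 0.6827, 0.9545, and 0.9973, whatever the mean and standard deviation.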

Use the radio buttons at the bottom to show the regions within 1, 2, or 3 standard deviations ($\sigma$) of the mean ($\mu$).

Change the values of $\mu$ and $\sigma$ to verify that the areas within a given number of standard deviations of the mean are the same regardless of the values of the mean and standard deviation.

The green 'z' can be dragged to mark an area within 'z' standard deviations of the mean.

### Standard Units

Determining how many standard deviations a value is from the mean is called **standardizing**
the value or converting it to **standard units**.

**Standard units** indicate how many standard deviations a value is from the mean.

To **standardize** a value means to indicate how many standard deviations the value is from the mean.

Suppose the mean height of students in a particular statistics class is 69 inches with a standard deviation of 4 inches. How many standard deviations from the mean is a height of 73 inches? 63 inches?

73 inches is 4 inches, or 1 standard deviation, above the mean.

63 inches is 6 inches, or 1.5 standard deviations, below the mean.
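The arithmetic behind these two answers is (value minus mean) divided by the standard deviation. A minimal sketch, with `z_score` as an illustrative helper name:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from mu (signed)."""
    return (x - mu) / sigma


mean_height = 69.0  # inches, from the class example
sd_height = 4.0     # inches, from the class example

print(z_score(73.0, mean_height, sd_height))  # 1.0
print(z_score(63.0, mean_height, sd_height))  # -1.5
```

The sign carries the direction: a positive result is above the mean and a negative result is below it.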

When we standardize a normally distributed random variable, the resulting random variable has a standard normal distribution.

If $X$ is a random variable such that $X\sim N(\mu,\sigma^2)$, then $$Z=\frac{X-\mu}{\sigma}\sim N(0,1)$$

We can also standardize a particular value, that is, find how many standard deviations the value is from the mean.

If $X\sim N(\mu, \sigma^2)$, to find how many standard deviations a value $x$ is from the mean (i.e. to standardize $x$), subtract $\mu$ from $x$ and divide the result by the standard deviation: $z=\frac{x-\mu}{\sigma}$. The result is typically denoted by 'z' and is often referred to as a **z-score**.

A **z-score** indicates how many standard deviations a particular value is from the mean; that is, it expresses the value in standard units.

$z=\frac{x-\mu}{\sigma}$.

The units of the standard normal curve are standard units. That is, if $Z\sim N(0,1)$, the value of $Z$ that is 1 SD above the mean is 1, the value that is 2 SDs below the mean is -2, and so on.

The mean protein content in the milk of a group of cows in the weeks after calving is 3.4g with a standard deviation of 0.3g.

- How many sd's from the mean is a value of 4.1g?
- Express 3.2g in standard units.
- What value is 2.5 sd's above the mean?

### Finding Probabilities with the Normal Distribution

For a continuous random variable X with probability density function \(\small{f(x)}\), \(\small{P(a \leq X \leq b) = \int_a^bf(x)dx}\). However, the probability density function of a normal random variable has no closed-form antiderivative, so this integral cannot be evaluated by hand. To find probabilities for a normal distribution, it is therefore necessary to use either software or a table.
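To illustrate what software does with this integral, the sketch below approximates $\int_a^b f(x)dx$ with the trapezoid rule and compares the result with the cumulative distribution function written in terms of the error function, $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The function names, the number of trapezoids, and the interval $[-1, 1]$ are illustrative assumptions; real statistical software may use different numerical methods.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def integrate_trapezoid(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


mu, sigma = 0.0, 1.0
a, b = -1.0, 1.0
numeric = integrate_trapezoid(lambda x: normal_pdf(x, mu, sigma), a, b)
via_cdf = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

print(round(numeric, 4), round(via_cdf, 4))
```

Both approaches should give about 0.6827 for this interval, the area within one standard deviation of the mean.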

Use the sliders to adjust $\mu$ and $\sigma$.

Drag the orange triangle to change the value of x in the expression $P(X\leq x)$. The resulting probability is given.

A normal cdf table gives values of the cumulative distribution function for the standard normal distribution. To use the table to find $P(X\leq x)$ where $X\sim N(\mu, \sigma^2)$:

- Compute the z-score for $x$, $z=\frac{x-\mu}{\sigma}$.
- Locate the z-score in the table: the first two digits (ones and tenths) are along the left margin and the third digit (hundredths) is along the top.
- The value in the table at the row and column indicated by the previous step is $P(Z \leq z) = P(X \leq x)$.

z-scores are indicated along the margins of the table. The body of the table contains cumulative probabilities, $P(Z\leq z) = \Phi(z)$.

Move the arrows in the margins to locate a specific z-score. The first two digits of the z-score are marked on the left margin and the third digit on the top.
The value in the box where the indicated row and column intersect is the desired probability.

Find the probability that protein content, X, in the milk of a cow is less than 3g. $X \sim N(3.4, 0.3^2)$.

- Find 3g in standard units: $z=\frac{3-3.4}{0.3} \approx -1.33$
- Find -1.3 along the left margin of the table.
- Find 0.03 along the top margin of the table.

$P(X \leq 3) = P(Z \leq -1.33) = \Phi(-1.33) = 0.0918$
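The same lookup can be sketched in software. The example below assumes Python 3.8 or later for `statistics.NormalDist` and uses the parameters of the protein example; the variable names are illustrative. It computes the probability twice: once with the z-score rounded to two decimals, as a table requires, and once directly from the unrounded value.

```python
from statistics import NormalDist

protein = NormalDist(mu=3.4, sigma=0.3)  # X ~ N(3.4, 0.3^2), from the example
standard_normal = NormalDist()           # Z ~ N(0, 1)

# Table-style lookup: round z to two decimals, then evaluate Phi(z)
z = round((3.0 - 3.4) / 0.3, 2)
p_table = standard_normal.cdf(z)

# Direct evaluation, with no rounding of the z-score
p_direct = protein.cdf(3.0)

print(z, round(p_table, 4), round(p_direct, 4))
```

The table-style value should be about 0.0918 and the direct value about 0.0912; the small gap comes only from rounding $z$ to $-1.33$.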

Find the probability that a standard normal random variable takes a value less than -0.72.

Using the table, find -0.7 along the left margin and 0.02 along the top.

$\Phi(-0.72) = 0.2358$.

Normal probabilities can also be found using software or with a calculator with statistical functions.

Find the probability that a standard normal random variable takes a value greater than 0.63.

Suppose $X\sim N(25, 9)$.

- Without doing any calculations, is $P(X \leq 23)$ greater than or less than 0.5?
- Find $P(X \leq 23)$.

To find the probability that a normal random variable takes on a value in a given interval, use the cumulative distribution function $F$ of $X$: $\small{P(a \leq X \leq b) = F(b)-F(a)}$. A calculator or software can evaluate this probability directly.

$\small{P(a \leq X \leq b) = F(b)-F(a)}$
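A minimal sketch of this difference of cumulative probabilities, assuming Python 3.8 or later for `statistics.NormalDist`. The helper name `prob_between` and the interval $[-1, 2]$ on the standard normal are illustrative choices, not part of the exercises.

```python
from statistics import NormalDist


def prob_between(dist: NormalDist, a: float, b: float) -> float:
    """P(a <= X <= b) = F(b) - F(a), where F is the cdf of dist."""
    if a > b:
        raise ValueError("a must not exceed b")
    return dist.cdf(b) - dist.cdf(a)


# Illustrative interval: standard normal between -1 and 2
p = prob_between(NormalDist(0.0, 1.0), -1.0, 2.0)
print(round(p, 4))
```

Because $X$ is continuous, including or excluding the endpoints does not change the probability.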

What is the probability that the protein content in the milk is between 2.5 and 3 grams? That is, if $X \sim N(3.4, 0.3^2)$, what is $P(2.5 \leq X \leq 3)$?

$X\sim N(25, 9)$. Find $P(22 \leq X \leq 30)$.

### Finding Percentiles of the Normal Distribution

To find percentiles from the normal distribution, calculate how many sd's the given percentile is from the mean, then find the value of the variable that corresponds to that z-score.

How many SDs from the mean is the 70th percentile?

This question is equivalent to asking for the value of z-score associated with the 70th percentile.
Regardless of the values of the mean and variance of a normal distribution, the z-score corresponding to the 70th percentile is the same.

The 70th percentile is the value, T, such that 70% of the area is less than T.

To find T using the table, look for 0.7 in the body of the table and find the associated z-score. Since the exact value 0.7 is not in the table,
it is reasonable to use the closest available value, 0.6985. Reading from the margins, the z-score associated with 0.6985 is 0.52, that is
$P(Z \leq 0.52) = 0.6985$

A more precise value can be obtained using software or a calculator.
Either shows that $z \approx 0.5244$; that is, $P(Z \leq 0.5244) \approx 0.7$.

Use the z-score to find the 70th percentile of protein content for cows' milk, where $X\sim N(3.4,0.3^2)$.

The 70th percentile is 0.52 SDs above the mean.

0.52 SDs is $0.52(0.3) = 0.156$ grams.

The value that is 0.52 SDs above the mean is $3.4 + 0.156 = 3.556$ grams.

The 70th percentile for the protein content of cows' milk is 3.556 grams.
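The percentile calculation can also be sketched with the inverse cumulative distribution function. This assumes Python 3.8 or later, where `statistics.NormalDist.inv_cdf` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# z-score of the 70th percentile: the same for every normal distribution
z70 = NormalDist().inv_cdf(0.70)

# 70th percentile of protein content, X ~ N(3.4, 0.3^2)
protein = NormalDist(mu=3.4, sigma=0.3)
p70_direct = protein.inv_cdf(0.70)               # ask for the percentile directly
p70_from_z = protein.mean + z70 * protein.stdev  # mean + z * SD, as in the text

print(round(z70, 4), round(p70_direct, 3), round(p70_from_z, 3))
```

With the unrounded z-score of about 0.5244, the percentile comes out near 3.557 grams; the table-based answer of 3.556 grams differs slightly because it uses $z = 0.52$.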

### Sums of Normal Random Variables

A linear combination of independent, normally distributed random variables is also normally distributed. For example, if $X$ and $Y$ are independent and each is normally distributed, then ($\small{X+Y}$), ($\small{Y-X}$), and ($\small{2X+3Y}$) are all normally distributed random variables as well.
The linearity properties facilitate finding the expected value and variance of the combination.

Let $X_1, X_2, \ldots, X_n$ be independent, normally distributed random variables with expected values $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma^2_1, \sigma^2_2, \ldots, \sigma^2_n$ respectively, and let $a_1, a_2, \ldots, a_n$ be constants. Then

$\sum_{i=1}^na_iX_i \sim N\left(\sum_{i=1}^na_i\mu_i, \sum_{i=1}^na_i^2\sigma^2_i\right)$
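The formula can be sketched in code as follows. The helper `linear_combination` and the parameters $X \sim N(1, 4)$, $Y \sim N(-2, 9)$ are hypothetical, chosen only for illustration. The cross-check uses the arithmetic operators of `statistics.NormalDist` (Python 3.8 or later), which treat the two variables as independent.

```python
from statistics import NormalDist


def linear_combination(coeffs, dists) -> NormalDist:
    """Distribution of sum(a_i * X_i) for independent normal X_i."""
    if len(coeffs) != len(dists):
        raise ValueError("need one coefficient per distribution")
    mean = sum(a * d.mean for a, d in zip(coeffs, dists))
    # Coefficients are squared in the variance, so signs do not matter
    variance = sum(a ** 2 * d.variance for a, d in zip(coeffs, dists))
    return NormalDist(mean, variance ** 0.5)


# Hypothetical parameters: X ~ N(1, 4) and Y ~ N(-2, 9), independent
X = NormalDist(mu=1.0, sigma=2.0)
Y = NormalDist(mu=-2.0, sigma=3.0)

combo = linear_combination([2, 3], [X, Y])  # 2X + 3Y
via_operators = X * 2 + Y * 3               # same result via NormalDist arithmetic

print(combo.mean, round(combo.variance, 6))
print(via_operators.mean, round(via_operators.variance, 6))
```

Here the mean is $2(1) + 3(-2) = -4$ and the variance is $2^2(4) + 3^2(9) = 97$. Note that `NormalDist` takes the standard deviation, not the variance, as its second argument.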

Let $X_1, X_2, \ldots X_n$, $Y_1, Y_2, \ldots Y_n$, and $Z_1, Z_2, \ldots Z_n$ be independent random variables such that $X_i \sim N(4,4)$, $Y_i \sim N(2,9)$, and $Z_i \sim N(0,1)$. Find the distributions of the following random variables:

- $X_1+Y_1$
- $2Y_1-Z_2$
- $\sum_{i=1}^3 X_i-2Z_3$

Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $X_i \sim N(\mu,\sigma^2)$. What is the distribution of $\bar{X} = \sum_{i=1}^n\frac{1}{n}X_i$?