All knowledge degenerates into probability.

## The Normal Distribution

What does the protein content in cows' milk have in common with human IQ?

Both variables have approximately normal distributions. The normal distribution is a good model for measurements of many kinds, including IQs, heights, and lengths of pregnancies.

The distribution of the protein content in cows' milk has the classic bell shape of the normal distribution. Most observations are near the mean (3.4 grams), but a few are much larger or smaller.

The normal distribution is widely used in probability theory and underlies much of statistical inference.

The normal distribution is also called the "Gaussian distribution" or "bell curve". A normal distribution has two parameters: the mean $\mu$ and the variance $\sigma^2$. The mean can be any real number and the variance can be any positive number.

NOTATION: "$X\sim N(\mu, \sigma^2)$" indicates that the random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$.

$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty \lt x \lt \infty$

Adjust the sliders to see how changing the values of $\mu$ and $\sigma$ affects the graph.

$-5 \leq \mu \leq 5$
$0.5 \leq \sigma \leq 5$.

Note: $\sigma=\sqrt{\sigma^2}$
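
The density can also be evaluated numerically. The following is a minimal sketch, assuming Python 3.8 or later; the helper name `normal_pdf` is hypothetical, and the parameter values are the milk-protein figures used in this section. It computes the density with the $\frac{1}{\sigma\sqrt{2\pi}}$ normalizing constant and compares it with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x (hypothetical helper; sigma must be positive)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Milk-protein model from the text: mean 3.4 g, standard deviation 0.3 g.
mu, sigma = 3.4, 0.3

peak = normal_pdf(mu, mu, sigma)                 # height of the curve at the mean
one_sd_up = normal_pdf(mu + sigma, mu, sigma)    # one SD above the mean
one_sd_down = normal_pdf(mu - sigma, mu, sigma)  # one SD below the mean
library_peak = NormalDist(mu, sigma).pdf(mu)     # same value from the standard library

print(peak, one_sd_up, one_sd_down, library_peak)
```

The curve is highest at the mean and has equal height at equal distances on either side of it, which is the symmetry visible in the graph.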

### The Standard Normal Distribution

The normal distribution that has mean 0 and variance 1 is called the 'standard normal' distribution. A random variable that has a standard normal distribution is usually denoted by $Z$. That is, $Z\sim N(0,1)$. Moreover, we use $\phi(z)$ and $\Phi(z)$ to denote respectively the probability density function and cumulative distribution function of a standard normal random variable.

NOTATION: The probability density function of a standard normal distribution is denoted by $\phi(z)$ and the cumulative distribution function, $P(Z\leq z)$, by $\Phi(z)$.

### The Empirical Rule

For a normal distribution, the area under the curve within a given number of standard deviations (SDs) of the mean is the same regardless of the value of the mean and the standard deviation. In particular, about 68% of the area is within 1 standard deviation of the mean, 95% is within 2 standard deviations of the mean, and 99.7% is within 3 standard deviations of the mean.

• $P(\mu - \sigma \lt X \lt \mu+\sigma)\approx 0.68$.
• $P(\mu - 2\sigma \lt X \lt \mu+2\sigma)\approx 0.95$.
• $P(\mu - 3\sigma \lt X \lt \mu+3\sigma)\approx 0.997$.

Use the radio buttons at the bottom to show the regions within 1, 2, or 3 standard deviations ($\sigma$) of the mean ($\mu$).

Change the values of $\mu$ and $\sigma$ to verify that the areas within a given number of sd's of the mean are the same regardless of the values of the mean and standard deviation.

The green 'z' can be dragged to mark an area within 'z' standard deviations of the mean.
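
The same check can be done in software. This is a small sketch, assuming Python 3.8 or later and the standard library's `statistics.NormalDist`; the function name `area_within` and the two parameter pairs are chosen only for illustration.

```python
from statistics import NormalDist


def area_within(k: float, mu: float, sigma: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Two arbitrary (mean, SD) pairs: the areas should not depend on them.
for mu, sigma in [(0.0, 1.0), (3.4, 0.3)]:
    areas = [round(area_within(k, mu, sigma), 4) for k in (1, 2, 3)]
    print(f"mu={mu}, sigma={sigma}: areas within 1, 2, 3 SDs = {areas}")
```

Both lines of output show the same three areas, which round to the 68%, 95%, and 99.7% figures of the empirical rule.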

### Standard Units

Determining how many standard deviations a value is from the mean is called standardizing the value or converting it to standard units.

Standard units indicate how many standard deviations a value is from the mean.

To standardize a value means to indicate how many standard deviations the value is from the mean.

Suppose the mean height of students in a particular statistics class is 69 inches with a standard deviation of 4 inches. How many standard deviations from the mean is a height of 73 inches? 63 inches?

73 inches is 4 inches, or 1 standard deviation, above the mean.

63 inches is 6 inches, or 1.5 standard deviations, below the mean.

When we standardize a normally distributed random variable, the resulting random variable has a standard normal distribution.

If $X$ is a random variable such that $X\sim N(\mu,\sigma^2)$, $$Z=\frac{X-\mu}{\sigma}\sim N(0,1)$$

We can also standardize a particular value, that is, find how many standard deviations the value is from the mean.

If $X\sim N(\mu, \sigma^2)$, to find how many standard deviations a value $x$ is from the mean (i.e. to standardize $x$), subtract $\mu$ from $x$ and divide the result by the standard deviation: $z=\frac{x-\mu}{\sigma}$. The result is typically denoted by 'z' and is often referred to as a z-score.

A z-score expresses a value in standard units: it indicates how many standard deviations the value is above (positive z) or below (negative z) the mean.

$z=\frac{x-\mu}{\sigma}$.

The units of the standard normal curve are standard units. That is, if $Z\sim N(0,1)$, the value of $Z$ that is 1 sd above the mean is 1, the value that is 2 sd's below the mean is -2, etc.

The mean protein content in the milk of a group of cows in the weeks after calving is 3.4g with a standard deviation of 0.3g.
1. How many sd's from the mean is a value of 4.1g?
2. Express 3.2g in standard units.
3. What value is 2.5 sd's above the mean?

1. How many sd's from the mean is a value of 4.1g?
$z=\frac{4.1-3.4}{0.3} \approx 2.333$.
4.1g is 2.333 sd's above the mean.

2. Express 3.2g in standard units.
$z=\frac{3.2-3.4}{0.3} = -\frac{2}{3}$.
3.2g is $-\frac{2}{3}$ in standard units, that is, two thirds of a standard deviation below the mean.

3. What value is 2.5 sd's above the mean?
2.5 sd's is 2.5×0.3 = 0.75g.
0.75g above the mean is 3.4+0.75 = 4.15g.
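
The three answers can be reproduced with two one-line conversions. This is an illustrative sketch; the function names `to_standard_units` and `from_standard_units` are hypothetical helpers, not part of any library.

```python
def to_standard_units(x: float, mu: float, sigma: float) -> float:
    """z-score of x: how many SDs x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def from_standard_units(z: float, mu: float, sigma: float) -> float:
    """Value that lies z SDs from the mean (the inverse conversion)."""
    return mu + z * sigma


mu, sigma = 3.4, 0.3  # milk protein: mean and SD in grams

z_41 = to_standard_units(4.1, mu, sigma)    # question 1: positive, so above the mean
z_32 = to_standard_units(3.2, mu, sigma)    # question 2: negative, so below the mean
x_25 = from_standard_units(2.5, mu, sigma)  # question 3: value 2.5 SDs above the mean

print(round(z_41, 3), round(z_32, 3), round(x_25, 2))
```

The sign of the z-score carries the direction: a positive result means the value is above the mean and a negative result means it is below.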

### Finding Probabilities with the Normal Distribution

For a continuous random variable X with probability density function $\small{f(x)}$, $\small{P(a \leq X \leq b) = \int_a^bf(x)dx}$. However, the probability density function of a normal random variable has no closed-form antiderivative, so this integral cannot be evaluated exactly by hand. To find probabilities pertaining to a normal distribution, it is therefore necessary to use either software or a table.
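
To illustrate what software does with the integral, here is a rough sketch that approximates $\int_a^b f(x)dx$ numerically and compares the result with the difference of two cdf values. It assumes Python 3.8 or later; the `simpson` helper is a plain composite Simpson's rule written for this example, and the interval from 2.5 to 3 grams for the milk-protein model is used only as a test case.

```python
from statistics import NormalDist


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:          # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2   # interior weights alternate 4, 2, 4, ...
        total += weight * f(a + i * h)
    return total * h / 3


protein = NormalDist(3.4, 0.3)                      # X ~ N(3.4, 0.3^2)
by_integration = simpson(protein.pdf, 2.5, 3.0)     # area under the density
by_cdf = protein.cdf(3.0) - protein.cdf(2.5)        # same probability from the cdf

print(by_integration, by_cdf)
```

The two numbers agree to many decimal places, which is the sense in which a table or a cdf function stands in for the integral.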

Use the sliders to adjust $\mu$ and $\sigma$.

Drag the orange triangle to change the value of x in the expression $P(X\leq x)$. The resulting probability is given.

A normal cdf table gives values of the cumulative distribution function for the standard normal distribution. To use the table to find $P(X\leq x)$ where $X\sim N(\mu, \sigma^2)$:

1. Compute the z-score for $x$, $z=\frac{x-\mu}{\sigma}$.
2. Find the z-score on the table, the first two digits along the left margin and a third digit along the top.
3. The value in the table on the row and column indicated by the previous step is $P(X \leq x)$.

z-scores are indicated along the margins of the table. The body of the table contains cumulative probabilities, $P(Z\leq z) = \Phi(z)$.

Move the arrows in the margins to locate a specific z-score. The first two digits of the z-score are marked on the left margin and the third digit on the top. The value in the box where the indicated row and column intersect is the desired probability.

Find the probability that protein content, X, in the milk of a cow is less than 3g. $X \sim N(3.4, 0.3^2)$.
1. Find 3g in standard units: $z=\frac{3-3.4}{0.3} = -1.33$
2. Find -1.3 along the left margin of the table.
3. Find 0.03 along the top margin of the table.
$\Phi(-1.33) = 0.0918$
$P(X \leq 3) = P(Z \leq -1.33) = 0.0918$

Find the probability that a standard normal random variable takes a value less than -0.72.

Using the table, find -0.7 along the left margin and 0.02 along the top.
$\Phi(-0.72) = 0.2358$.
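
A table reading can be double-checked in software by imitating the lookup steps. The sketch below assumes Python 3.8 or later; `table_lookup` is a hypothetical helper that rounds the z-score to two decimals and the probability to four, as a printed table does.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def table_lookup(x: float, mu: float, sigma: float):
    """Imitate a normal cdf table: round z to 2 decimals, report Phi(z) to 4."""
    z = round((x - mu) / sigma, 2)          # step 1: z-score as it appears in the margins
    p = round(STANDARD_NORMAL.cdf(z), 4)    # steps 2-3: entry in the body of the table
    return z, p


# Protein content below 3 g, with X ~ N(3.4, 0.3^2).
z_milk, p_milk = table_lookup(3.0, 3.4, 0.3)

# A standard normal variable below -0.72 (mean 0 and SD 1, so x is already a z-score).
z_std, p_std = table_lookup(-0.72, 0.0, 1.0)

print(z_milk, p_milk)
print(z_std, p_std)
```

Each printed pair corresponds to a row-and-column position in the table and the entry found there.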

Normal probabilities can also be found using software or with a calculator with statistical functions.

$X\sim N(3.4,0.3^2)$

Evaluate $P(X \leq 2)$ on a TI-84

2ND VARS 2
lower: -1E99
upper: 2
μ: 3.4
σ: 0.3
ENTER
Notes:
1. The true lower endpoint is $-\infty$. Since the calculator cannot accept this, enter a very large negative number such as -1E99.
2. When using a calculator or software to find a normal probability, it is typically not necessary to standardize first.
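
For readers working in software rather than on a calculator, a comparable computation might look like the sketch below. It assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist`; it is an analogue of the keystrokes above, not a description of how the calculator works internally.

```python
from statistics import NormalDist

protein = NormalDist(mu=3.4, sigma=0.3)   # X ~ N(3.4, 0.3^2)

# The cdf already integrates from minus infinity, so no lower endpoint is needed.
p_direct = protein.cdf(2.0)

# Calculator-style version with an explicit, very large negative lower endpoint.
p_calculator_style = protein.cdf(2.0) - protein.cdf(-1e99)

# Standardizing first gives the same answer, so it is optional in software.
z = (2.0 - 3.4) / 0.3
p_via_z = NormalDist().cdf(z)             # NormalDist() defaults to N(0, 1)

print(p_direct, p_calculator_style, p_via_z)
```

All three values agree, and the probability is tiny because 2 g is more than four standard deviations below the mean.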

Find the probability that a standard normal random variable takes a value greater than 0.63.

$P(Z\leq 0.63) = \Phi(0.63) = 0.7357$.

Using the complement rule, $P(Z > 0.63) = 1 - \Phi(0.63) = 1 - 0.7357 = 0.2643$
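
The complement rule is easy to confirm numerically. A brief sketch, assuming Python 3.8 or later; the symmetry check in the last step is an extra observation not used in the text.

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal, N(0, 1)

p_at_most = Z.cdf(0.63)           # P(Z <= 0.63) = Phi(0.63)
p_greater = 1.0 - p_at_most       # complement rule: P(Z > 0.63)

# By symmetry of the curve about 0, the upper tail beyond 0.63
# has the same area as the lower tail below -0.63.
p_lower_tail = Z.cdf(-0.63)

print(round(p_at_most, 4), round(p_greater, 4), round(p_lower_tail, 4))
```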

Suppose $X\sim N(25, 9)$.
1. Without doing any calculations, is $P(X \leq 23)$ greater than or less than 0.5?
2. Find $P(X \leq 23)$.

1. Since the normal curve is symmetric about its mean, half of the area under the curve is above the mean and half is below. $P(X \leq 23)$ must be less than 0.5 since 23 < 25.

2. $P(X \leq 23) = P(Z \leq \frac{23-25}{3}) = \Phi(-0.67) = 0.2514$

To find the probability that a normal random variable $X$ takes a value in a given interval, use $\small{P(a \leq X \leq b) = F(b)-F(a)}$, where $F$ is the cumulative distribution function of $X$. A calculator or software can evaluate this probability directly.

$\small{P(a \leq X \leq b) = F(b)-F(a)}$

What is the probability that the protein content in the milk is between 2.5 and 3 grams? That is, if $X \sim N(3.4, 0.3^2)$, what is $P(2.5 \leq X \leq 3)$?

Using the table:
$$\small{\begin{array} {lcl}P(2.5 \leq X \leq 3) &=& F(3) - F(2.5) \\ &=& P(X \leq 3) - P(X \leq 2.5)\\ &=& P(Z \leq \frac{3-3.4}{0.3}) - P(Z \leq \frac{2.5-3.4}{0.3}) \\ &=& \Phi(-1.33) - \Phi(-3)\\ &=& 0.0918 - 0.0013 \\ &=& 0.0905 \end{array}}$$

Using a TI-84 calculator:

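normalcdf
lower: 2.5
upper: 3
μ: 3.4
σ: 0.3
0.0899
$P(2.5 \leq X \leq 3)\approx 0.0899$. The calculator does not round the z-scores to two decimal places, so its result can differ slightly from a table-based answer.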

$X\sim N(25, 9)$. Find $P(22 \leq X \leq 30)$.

$$\begin{array} {lcl}P(22 \leq X \leq 30) &=& F(30) - F(22) \\ &=& P(X \leq 30) - P(X \leq 22)\\ &=& P(Z \leq \frac{30-25}{3}) - P(Z \leq \frac{22-25}{3}) \\ &=& \Phi(1.67) - \Phi(-1)\\ &=& 0.9525 - 0.1587\\ &=& 0.7938 \end{array}$$
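
Interval probabilities of this kind reduce to a difference of two cdf values, which can be sketched as follows. The code assumes Python 3.8 or later; `prob_between` is a hypothetical helper name. Note that `NormalDist` takes the standard deviation, so $N(25, 9)$ is entered with $\sigma = 3$.

```python
from statistics import NormalDist


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) = F(b) - F(a) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Milk protein between 2.5 g and 3 g, X ~ N(3.4, 0.3^2).
p_milk = prob_between(2.5, 3.0, 3.4, 0.3)

# X ~ N(25, 9): variance 9 means standard deviation 3.
p_other = prob_between(22, 30, 25, 3)

print(round(p_milk, 4), round(p_other, 4))
```

Because the z-scores are not rounded to two decimals here, the results may differ from table-based answers in the third or fourth decimal place.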

### Finding Percentiles of the Normal Distribution

To find percentiles from the normal distribution, calculate how many sd's the given percentile is from the mean, then find the value of the variable that corresponds to that z-score.

How many SDs from the mean is the 70th percentile?

This question is equivalent to asking for the z-score associated with the 70th percentile. Regardless of the values of the mean and variance of a normal distribution, the z-score corresponding to the 70th percentile is the same.

The 70th percentile is the value, T, such that 70% of the area is less than T.

To find T using the table, look for 0.7 in the body of the table and find the associated z-score. Since the exact value 0.7 is not in the table, it is reasonable to use the closest available value, 0.6985. Reading from the margins, the z-score associated with 0.6985 is 0.52, that is $P(Z \leq 0.52) = 0.6985$

A more precise value can be obtained using software or a calculator (below). Using either of these shows that $z = 0.5244$, that is $P(Z \leq 0.5244) = 0.7$

Find the Z-score corresponding to a given percentile:

Find the 70th percentile on a TI-84

2ND VARS 3
invNorm
area: 0.7
μ: 0
σ: 1
ENTER
Notes:
1. Since the units of the standard normal curve are standard units, use the standard normal distribution to find the z-score that corresponds to a percentile.
2. Divide the percentile by 100 to find the corresponding area.
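
In software, the inverse of the cdf plays the role of invNorm. A minimal sketch, assuming Python 3.8 or later, where `NormalDist.inv_cdf` returns the value with a given area to its left:

```python
from statistics import NormalDist

percentile = 70
area = percentile / 100                 # convert the percentile to an area

Z = NormalDist()                        # standard normal, so the result is a z-score
z_70 = Z.inv_cdf(area)                  # value with 70% of the area to its left

# Round trip: the cdf at that z-score recovers the area.
area_check = Z.cdf(z_70)

print(round(z_70, 4), round(area_check, 4))
```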

Find the value of a given percentile:

$X \sim N(3.4, 0.3^2)$

Find the 70th percentile on a TI-84

2ND VARS 3
invNorm
area: 0.7
μ: 3.4
σ: 0.3
ENTER

Use the z-score to find the 70th percentile of protein content for cows' milk, where $X\sim N(3.4,0.3^2)$.

The 70th percentile is 0.52 SDs above the mean.

0.52 sd's is $0.52(0.3) = 0.156$ grams.

The value that is 0.52 sd's above the mean is $3.4 + 0.156 = 3.556$ grams.

The 70th percentile for the protein content of cows' milk is 3.556 grams.
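
Both routes to the percentile, through the z-score and directly from the distribution of $X$, can be compared in a short sketch. It assumes Python 3.8 or later and uses the unrounded z-score, so the result differs slightly from the value obtained with $z = 0.52$.

```python
from statistics import NormalDist

mu, sigma = 3.4, 0.3                          # milk protein, in grams

# Route 1: find the z-score, then convert from standard units to grams.
z_70 = NormalDist().inv_cdf(0.70)
via_z_score = mu + z_70 * sigma

# Route 2: ask the distribution of X for its 70th percentile directly.
direct = NormalDist(mu, sigma).inv_cdf(0.70)

# The same conversion with the table's rounded z-score of 0.52.
via_table_z = mu + 0.52 * sigma

print(round(via_z_score, 3), round(direct, 3), round(via_table_z, 3))
```

The first two values coincide, and the table-based value is within about a thousandth of a gram of them.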

### Sums of Normal Random Variables

A linear combination of independent, normally distributed random variables is also normally distributed. For example, if $X$ and $Y$ are independent and each is normally distributed, then ($\small{X+Y}$), ($\small{Y-X}$), and ($\small{2X+3Y}$) are all normally distributed random variables as well. The linearity properties of expected value and variance make it easy to find the mean and variance of the combination.

Let $X_1, X_2, \ldots, X_n$ be independent, normally distributed random variables with expected values $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma^2_1, \sigma^2_2, \ldots, \sigma^2_n$ respectively, and let $a_1, a_2, \ldots, a_n$ be constants. Then

$\sum_{i=1}^na_iX_i \sim N\left(\sum_{i=1}^na_i\mu_i, \sum_{i=1}^na_i^2\sigma^2_i\right)$

Let $X_1, X_2, \ldots X_n$, $Y_1, Y_2, \ldots Y_n$, and $Z_1, Z_2, \ldots Z_n$ be independent random variables such that $X_i \sim N(4,4)$, $Y_i \sim N(2,9)$, and $Z_i \sim N(0,1)$. Find the distributions of the following random variables:

1. $X_1+Y_1$
2. $2Y_1-Z_2$
3. $\sum_{i=1}^3 X_i-2Z_3$

1. $E(X_1+Y_1)=E(X_1)+E(Y_1)=4+2=6$
$Var(X_1+Y_1)=Var(X_1)+Var(Y_1)=4+9=13$
Since $X_1$ and $Y_1$ are both normally distributed, so is $X_1+Y_1$. $(X_1+Y_1) \sim N(6,13)$.

2. $E(2Y_1-Z_2)=E(2Y_1)+E(-Z_2)=2E(Y_1)-E(Z_2)=2(2)-0=4$
$Var(2Y_1-Z_2)=Var(2Y_1)+Var(-Z_2)=2^2Var(Y_1)+(-1)^2Var(Z_2)=4(9)+1(1)=37$
Since $Y_1$ and $Z_2$ are both normally distributed, so is $2Y_1-Z_2.$ $(2Y_1-Z_2) \sim N(4,37)$.

3. $E(\sum_{i=1}^3 X_i-2Z_3)=E(\sum_{i=1}^3 X_i)+E(-2Z_3) =\sum_{i=1}^3E(X_i)-2E(Z_3)=\sum_{i=1}^3(4)-2(0)=3(4)-0=12$
$Var(\sum_{i=1}^3 X_i-2Z_3)=Var(\sum_{i=1}^3 X_i)+Var(-2Z_3) =\sum_{i=1}^3Var(X_i)+(-2)^2Var(Z_3)=\sum_{i=1}^3(4)+4(1)=3(4)+4(1)=16$
Since $X_i$ and $Z_3$ are both normally distributed, so is $\sum_{i=1}^3 X_i-2Z_3$. $(\sum_{i=1}^3 X_i-2Z_3) \sim N(12,16)$.
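
The three results can be checked by applying the formula for $\sum a_iX_i$ term by term. The sketch below assumes Python 3.8 or later; `linear_combination` is a hypothetical helper, and the final cross-check relies on `statistics.NormalDist` arithmetic, which treats the operands as independent and is parameterized by the standard deviation rather than the variance.

```python
from statistics import NormalDist


def linear_combination(terms):
    """Mean and variance of sum(a_i * X_i) for independent normal X_i.

    terms is a list of (a_i, mu_i, variance_i) triples.
    """
    mean = sum(a * mu for a, mu, _ in terms)
    variance = sum(a * a * var for a, _, var in terms)
    return mean, variance


X = (4, 4)   # (mean, variance) of each X_i
Y = (2, 9)   # (mean, variance) of each Y_i
Z = (0, 1)   # (mean, variance) of each Z_i

ex1 = linear_combination([(1, *X), (1, *Y)])            # X_1 + Y_1
ex2 = linear_combination([(2, *Y), (-1, *Z)])           # 2Y_1 - Z_2
ex3 = linear_combination([(1, *X)] * 3 + [(-2, *Z)])    # X_1 + X_2 + X_3 - 2Z_3

# Cross-check of the second example; Y has SD 3 and Z has SD 1.
check = 2 * NormalDist(2, 3) - NormalDist(0, 1)

print(ex1, ex2, ex3)
print(check.mean, check.variance)
```

Note that the coefficient is squared in the variance, so subtracting a variable still adds its variance.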

Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $X_i \sim N(\mu,\sigma^2)$. What is the distribution of $\bar{X} = \sum_{i=1}^n\frac{1}{n}X_i$?

$$\begin{array}{lcl} E(\bar{X}) &=& E\left(\sum_{i=1}^n\frac{1}{n}X_i\right)\\ &=&\sum_{i=1}^n\frac{1}{n}E(X_i)\\ &=&\sum_{i=1}^n\frac{1}{n}\mu\\ &=& n\frac{1}{n}\mu\\ &=& \mu \end{array}$$
$$\begin{array}{lcl} Var(\bar{X}) &=& Var\left(\sum_{i=1}^n\frac{1}{n}X_i\right)\\ &=&\sum_{i=1}^n\left(\frac{1}{n}\right)^2Var(X_i)\\ &=&\sum_{i=1}^n\left(\frac{1}{n}\right)^2\sigma^2\\ &=& n\left(\frac{1}{n}\right)^2\sigma^2\\ &=& \frac{\sigma^2}{n} \end{array}$$
$\bar{X}\sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
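
A quick simulation can make this result concrete. The sketch below is illustrative only: it assumes Python 3.8 or later, borrows the milk-protein parameters for $\mu$ and $\sigma$, and picks a sample size of $n = 25$, a repetition count, and a seed arbitrarily.

```python
import random
from statistics import fmean, pvariance

mu, sigma = 3.4, 0.3      # parameters of each X_i (milk-protein values)
n = 25                    # sample size (assumed for illustration)
reps = 5000               # number of simulated samples (assumed)
rng = random.Random(0)    # fixed seed so the run is repeatable

# Draw many samples of size n and record the mean of each one.
sample_means = [
    fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

observed_mean = fmean(sample_means)
observed_var = pvariance(sample_means)
theoretical_var = sigma ** 2 / n       # sigma^2 / n from the derivation

print(observed_mean, mu)
print(observed_var, theoretical_var)
```

The simulated sample means centre on $\mu$ and their variance is close to $\frac{\sigma^2}{n}$, up to random variation that shrinks as the number of repetitions grows.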