Everything we care about lies somewhere in the middle, where pattern and randomness interlace.

Estimation

In 2009, the camera company Nikon released the results of a survey called "Picture Yourself". For the survey, they obtained a sample of 1000 US adults and found that the proportion of respondents who said they look better in person than they do in photographs was 0.79. This value, 0.79, is an estimate of the true proportion.

[Image: a person holding a black Nikon DSLR with a large lens up to their face, looking through the viewfinder.]

Many statistical questions deal with estimation:

In the "Picture Yourself" survey, Nikon's researchers were interested in the proportion of all US adults who think they look better in person than in photographs. In statistical inference, information obtained from a sample is used to draw conclusions about a population. Thus, the value of the sample proportion is an estimate of the true value of the parameter. The observed value of a statistic is called a point estimate.

A point estimate is the value of a statistic computed from a sample.

Often, there is an estimator that seems 'natural' to use to estimate the value of a given parameter. For instance, it seems reasonable to use the sample mean to estimate the population mean.



Let $x_1, x_2, \ldots x_N$ denote the values in a population with mean $\mu$ and variance $\sigma^2$ and let $X_1, X_2, \ldots X_n$ denote a sample. Then $$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$ is a 'natural' estimator of $$\mu=\frac{1}{N}\sum_{i=1}^Nx_i.$$ Notice that $\bar{X}$ is to the sample values what $\mu$ is to the population values. $\bar{X}$ is the sample analogue of $\mu$.

However, sometimes the 'natural' estimator is not the best choice. Consider the variance: $S^2$ is not the sample analogue of $\sigma^2$, and so is not the 'natural' estimator (its denominator is $n-1$ instead of $n$). So why do we use $S^2$?



Let $x_1, x_2, \ldots x_N$ denote the values in a population with mean $\mu$ and variance $\sigma^2$ and let $X_1, X_2, \ldots X_n$ denote a sample. Then $$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2$$ is the sample analogue of $$\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2.$$ However, we generally estimate $\sigma^2$ with $$S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2$$ instead. Why?
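Before answering, a quick simulation can show the difference empirically. The sketch below (with an assumed normal population, $\mu = 10$ and $\sigma^2 = 4$, and small samples of size 5) averages both estimators over many samples; the $n$-denominator version comes out systematically below $\sigma^2$.

```python
import random

random.seed(1)

mu, sigma2 = 10.0, 4.0   # population mean and variance (assumed for this demo)
n, reps = 5, 200_000     # small samples make the bias of the n-denominator visible

sum_nat, sum_s2 = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_nat += ss / n        # 'natural' estimator: divides by n
    sum_s2 += ss / (n - 1)   # S^2: divides by n - 1

print(sum_nat / reps)  # ≈ (n-1)/n * sigma2 = 3.2, biased low
print(sum_s2 / reps)   # ≈ 4.0
```

With these assumed values, the $n$-denominator average lands near $\frac{n-1}{n}\sigma^2 = 3.2$ while the $S^2$ average lands near $4$, previewing the unbiasedness result proved later on this page.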

To determine how we choose the best estimators for a parameter of interest, consider two properties of estimators: the bias and the variance.

The bias and variance of an estimator influence its usability.

Properties of Estimators

Roughly speaking, the bias indicates whether or not the statistic is 'aiming for' the parameter of interest, and the variance indicates how much the estimates computed from different samples vary.

Bias

Unbiased estimators are generally preferred to biased ones. Stated formally, an estimator is an unbiased estimator of a parameter when its expected value is equal to that parameter. If an estimator is not unbiased it is biased. The bias is equal to the difference between the expected value of the estimator and the parameter.

An estimator $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$.



Given $X_1$ and $X_2$ such that $E(X_1) = \mu$ and $E(X_2)=\mu$, show that $\hat{\mu}=\frac{X_1}{4}+\frac{3X_2}{4}$ is an unbiased estimator of $\mu$.
$$\begin{array}{lcl} E(\hat{\mu})&=& E\left(\frac{X_1}{4}+\frac{3X_2}{4}\right)\\ &=& \frac{E(X_1)}{4}+\frac{3E(X_2)}{4}\\ &=& \frac{\mu}{4} + \frac{3\mu}{4}\\ &=& \mu \end{array}$$
Since $E(\hat{\mu}) = \mu$, $\hat{\mu}$ is an unbiased estimator for $\mu$.
NOTATION: $\theta$ is often used to denote an unspecified parameter. A "hat" on any parameter indicates an estimate of that parameter, e.g. $\hat{\theta}.$



Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ (that is $E(X_i) = \mu$). Show that $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$ is an unbiased estimator for $\mu$.

$$\begin{array}{lcl} E(\bar{X})&=& E(\frac{1}{n}\sum_{i=1}^nX_i)\\ &=& \frac{1}{n}E(\sum_{i=1}^nX_i)\\ &=& \frac{1}{n}\sum_{i=1}^nE(X_i)\\ &=& \frac{1}{n}\sum_{i=1}^n\mu\\ &=& \frac{1}{n}n\mu\\ &=& \mu \end{array}$$ $E(\bar{X}) = \mu$ so $\bar{X}$ is an unbiased estimator for $\mu$.





Let $X\sim Binomial(n,p)$. Show that $\hat{p}=\frac{X}{n}$ is an unbiased estimator for $p$.

If $X\sim Binomial(n,p)$, $E(X)=np$. $$\begin{array}{lcl} E(\hat{p})&=& E(\frac{X}{n})\\ &=& \frac{1}{n}E(X)\\ &=& \frac{1}{n}np\\ &=& p \end{array}$$ $E(\hat{p}) = p$ thus $\hat{p}$ is an unbiased estimator for $p$.
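The algebra can also be checked by simulation. This sketch (with assumed values $n = 50$ and $p = 0.3$) averages $\hat{p} = X/n$ over many binomial draws:

```python
import random

random.seed(2)

n, p = 50, 0.3   # assumed values for the demonstration
reps = 100_000

# Each replicate draws X ~ Binomial(n, p) as a sum of n Bernoulli(p) trials
# and records p_hat = X / n.
total = 0.0
for _ in range(reps):
    x = sum(random.random() < p for _ in range(n))
    total += x / n

print(total / reps)  # ≈ 0.3: on average, p_hat hits p, as the algebra predicts
```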





Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ and variance $\sigma^2$, that is, $E(X_i)=\mu$ and $Var(X_i)=\sigma^2$. Show that $S^2$ is an unbiased estimator for $\sigma^2$.

Recall that $Var(X) = E(X^2)-[E(X)]^2$, that is $\sigma^2 = E(X^2)-\mu^2$, so $E(X^2) = \sigma^2+\mu^2$.
Similarly, since $Var(\bar{X})=\frac{\sigma^2}{n}$ and $E(\bar{X})=\mu$, $E(\bar{X}^2) = \frac{\sigma^2}{n}+\mu^2$.

$$\begin{array}{lcll} E(S^2) &=& E\left(\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2\right) & \text{Definition of }S^2\\ &=& \frac{1}{n-1} E\left(\sum_{i=1}^n(X_i^2-2X_i\bar{X}+\bar{X}^2)\right) & \text{Expand the squared term.}\\ &=& \frac{1}{n-1} \left[\sum_{i=1}^nE(X_i^2)-2E\left(\sum_{i=1}^nX_i\bar{X}\right)+\sum_{i=1}^nE(\bar{X}^2)\right] & \text{Separate the expected values.}\\ &=& \frac{1}{n-1} \left[\sum_{i=1}^nE(X_i^2)-2E(n\bar{X}\cdot\bar{X})+\sum_{i=1}^nE(\bar{X}^2)\right] & \text{Since }\sum_{i=1}^nX_i = n\bar{X}\\ &=& \frac{1}{n-1} \left[\sum_{i=1}^nE(X_i^2)-2nE(\bar{X}^2)+nE(\bar{X}^2) \right]& \text{Simplify.} \\ &=& \frac{1}{n-1} \left[\sum_{i=1}^nE(X_i^2)-nE(\bar{X}^2)\right] & \text{Simplify.} \\ &=& \frac{1}{n-1} \left[\sum_{i=1}^n(\sigma^2+\mu^2)-n\left(\frac{\sigma^2}{n}+\mu^2\right)\right] & \text{By the note above.}\\ &=& \frac{1}{n-1} \left[n\sigma^2+n\mu^2-\sigma^2-n\mu^2\right] & \text{Since } \sigma^2 \text{ and } \mu \text{ are constants.}\\ &=& \frac{1}{n-1} (n-1)\sigma^2 & \text{Simplify.} \\ &=& \sigma^2 & \\ \end{array}$$ Thus we see that $S^2$ is an unbiased estimator for $\sigma^2$.


Given $X_1, X_2, X_3$ such that $E(X_i) = \upsilon$, consider the estimator $\hat{\upsilon}=\frac{2}{3}\sum_{i=1}^3X_i$.
$$\begin{array}{lcl} E(\hat{\upsilon})&=& E\left(\frac{2}{3}\sum_{i=1}^3X_i\right)\\ &=& \frac{2}{3}E\left(\sum_{i=1}^3X_i\right)\\ &=& \frac{2}{3}\sum_{i=1}^3E(X_i)\\ &=& \frac{2}{3}\sum_{i=1}^3\upsilon\\ &=& \frac{2}{3}\cdot 3\upsilon\\ &=& 2\upsilon \end{array}$$ $E(\hat{\upsilon}) = 2\upsilon \neq \upsilon$; therefore, $\hat{\upsilon}$ is a biased estimator of $\upsilon$.

The bias is $E(\hat{\upsilon}) - \upsilon = 2\upsilon - \upsilon = \upsilon$.


Variance

For unbiased estimators, or for estimators with equal bias, the better estimator is the one with the smallest variance.


Given independent $X_1$ and $X_2$ such that $Var(X_i) = \sigma^2$, find the variance of $\hat{\mu}=\frac{X_1}{4}+\frac{3X_2}{4}$.
$$\begin{array}{lcl} Var(\hat{\mu})&=& Var\left(\frac{X_1}{4}+\frac{3X_2}{4}\right)\\ &=& Var(\frac{X_1}{4})+Var(\frac{3X_2}{4})\\ &=& \frac{\sigma^2}{16} + \frac{9\sigma^2}{16}\\ &=& \frac{5\sigma^2}{8} \end{array}$$
$Var(\hat{\mu}) = \frac{5\sigma^2}{8}$.
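A short simulation can confirm both results for this estimator: its unbiasedness and its variance of $\frac{5\sigma^2}{8}$. The values $\mu = 5$ and $\sigma = 2$ are assumed for the demonstration:

```python
import random

random.seed(3)

mu, sigma = 5.0, 2.0   # assumed population mean and standard deviation
reps = 200_000

# Each replicate draws an independent pair (X1, X2) and records
# the weighted estimator mu_hat = X1/4 + 3*X2/4.
est = []
for _ in range(reps):
    x1, x2 = random.gauss(mu, sigma), random.gauss(mu, sigma)
    est.append(x1 / 4 + 3 * x2 / 4)

mean_hat = sum(est) / reps
var_hat = sum((e - mean_hat) ** 2 for e in est) / (reps - 1)

print(mean_hat)  # ≈ 5.0 (unbiased, as shown earlier)
print(var_hat)   # ≈ 5 * sigma^2 / 8 = 2.5
```

For comparison, the equal-weight average $(X_1+X_2)/2$ has variance $\frac{\sigma^2}{2} = 2$ here, so the unequal weighting costs some precision even though it stays unbiased.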


Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ and variance $\sigma^2$ (that is, $X_1, X_2, \ldots X_n$ are independent, $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$). Find the variance of $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$.

$$\begin{array}{lcl} Var(\bar{X})&=& Var(\frac{1}{n}\sum_{i=1}^nX_i)\\ &=& \frac{1}{n^2}Var(\sum_{i=1}^nX_i)\\ &=& \frac{1}{n^2}\sum_{i=1}^nVar(X_i)\\ &=& \frac{1}{n^2}\sum_{i=1}^n\sigma^2\\ &=& \frac{1}{n^2}n\sigma^2\\ &=& \frac{\sigma^2}{n} \end{array}$$ $Var(\bar{X}) =\frac{\sigma^2}{n}$.
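The result $Var(\bar{X}) = \frac{\sigma^2}{n}$ can also be seen empirically: draw many samples, compute each sample's mean, and look at the variance of those means. The sketch assumes a normal population with $\mu = 0$, $\sigma^2 = 9$, and samples of size $n = 25$:

```python
import random

random.seed(4)

mu, sigma2 = 0.0, 9.0   # assumed population mean and variance
n, reps = 25, 100_000

# Collect the sample mean from each of many samples of size n.
means = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    means.append(sum(xs) / n)

grand = sum(means) / reps
var_xbar = sum((m - grand) ** 2 for m in means) / (reps - 1)
print(var_xbar)  # ≈ sigma2 / n = 0.36
```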




Let $X\sim Binomial(n,p)$. Find $Var(\hat{p})$ where $\hat{p}=\frac{X}{n}$.

Recall that if $X\sim Binomial(n,p)$, $E(X)=np$ and $Var(X) = np(1-p)$.
$$\begin{array}{lcl} Var(\hat{p})&=& Var\left(\frac{X}{n}\right)\\ &=& \frac{1}{n^2}Var(X)\\ &=& \frac{1}{n^2}np(1-p)\\ &=& \frac{p(1-p)}{n} \end{array}$$ $Var(\hat{p}) = \frac{p(1-p)}{n}$.




Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_{10}=\frac{1}{10}\sum_{i=1}^{10}X_i$ and $\bar{X}_{15}=\frac{1}{15}\sum_{i=1}^{15}X_i$. Which is the better estimator for $\mu$?

$\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i$ is an unbiased estimator of $\mu$ and $Var(\bar{X}_n) = \frac{\sigma^2}{n}$. Thus $\bar{X}_{10}$ and $\bar{X}_{15}$ are both unbiased estimators of $\mu$.

Furthermore, $Var(\bar{X}_{10}) = \frac{\sigma^2}{10}$ and $Var(\bar{X}_{15}) = \frac{\sigma^2}{15}$.

Since both are unbiased and $\bar{X}_{15}$ has the smaller variance, $\bar{X}_{15}$ is the better estimator.
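A simulation makes the comparison concrete (a standard normal population is assumed): the means of samples of size 15 cluster more tightly around $\mu$ than the means of samples of size 10.

```python
import random

random.seed(5)
reps = 100_000

def sample_mean(n):
    # Mean of n draws from a standard normal population (mu = 0, sigma = 1).
    return sum(random.gauss(0.0, 1.0) for _ in range(n)) / n

m10 = [sample_mean(10) for _ in range(reps)]
m15 = [sample_mean(15) for _ in range(reps)]

def var(values):
    # Sample variance of a list of values.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

print(var(m10))  # ≈ 1/10
print(var(m15))  # ≈ 1/15: the larger sample gives the tighter estimator
```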
Given independent $X_1, X_2, X_3$ with $Var(X_i) = \upsilon^2$, compare the variances of $\hat{\upsilon}_1=\frac{2}{3}\sum_{i=1}^3X_i$ and $\hat{\upsilon}_2=\frac{3}{4}\sum_{i=1}^3X_i$.
$$\begin{array}{lcl} Var(\hat{\upsilon}_1)&=& Var\left(\frac{2}{3}\sum_{i=1}^3X_i\right)\\ &=& \frac{4}{9}Var\left(\sum_{i=1}^3X_i\right)\\ &=& \frac{4}{9}\sum_{i=1}^3Var(X_i)\\ &=& \frac{4}{9}\sum_{i=1}^3\upsilon^2\\ &=& \frac{4}{9}\cdot 3\upsilon^2\\ &=& \frac{4}{3}\upsilon^2 \end{array}$$
$$\begin{array}{lcl} Var(\hat{\upsilon}_2)&=& Var\left(\frac{3}{4}\sum_{i=1}^3X_i\right)\\ &=& \frac{9}{16}Var\left(\sum_{i=1}^3X_i\right)\\ &=& \frac{9}{16}\sum_{i=1}^3Var(X_i)\\ &=& \frac{9}{16}\sum_{i=1}^3\upsilon^2\\ &=& \frac{9}{16}\cdot 3\upsilon^2\\ &=& \frac{27}{16}\upsilon^2 \end{array}$$
Since $\frac{4}{3} \lt \frac{27}{16}$, $Var(\hat{\upsilon}_1) \lt Var(\hat{\upsilon}_2)$: $\hat{\upsilon}_1$ has the smaller variance.


Standard Error

The standard deviation of a statistic is called the standard error. The units of the variance are the square of the units of the data, while the units of the standard error are the same as the units of the data, so it is often useful to report the standard error along with an estimate.

The standard error is the standard deviation of a statistic.



Given independent $X_1$ and $X_2$ such that $Var(X_i) = \sigma^2$, find the standard error (se) of $\hat{\mu}=(X_1+X_2)/2$.
$Var(\hat{\mu}) = \frac{\sigma^2}{2}$.

$se(\hat{\mu}) = \sqrt{\frac{\sigma^2}{2}} = \frac{\sigma}{\sqrt{2}}$.

In the example at the beginning of this page, the point estimate for the proportion of US adults who say they look better in person than in photos is 0.79. What is the standard error?

$Var(\hat{p}) = \frac{p(1-p)}{n}$, so $se(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$. However, this can't be calculated from the information given, since the value of $p$ is unknown (0.79 is an estimate, $\hat{p}$).

In practice, it is usual to compute the estimated standard error using information obtained from the sample.



$se(\bar{X}) = \sqrt{\frac{S^2}{n}}$




$se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$


The estimated standard error of the proportion of US adults who say they look better in person than in photos is $\sqrt{\frac{0.79(1-0.79)}{1000}} \approx 0.0129$.
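The arithmetic for this estimated standard error is a one-liner:

```python
import math

p_hat, n = 0.79, 1000   # point estimate and sample size from the Nikon survey above
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
print(round(se_hat, 4))  # 0.0129
```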

Since the true standard error is rarely available in practice, it is common to say 'standard error' when referring to the 'estimated standard error'.