Estimation
In 2009, the camera company Nikon released the results of a survey called "Picture Yourself". For the survey, they obtained a sample of 1000 US adults and found that the proportion of respondents who said they look better in person than they do in photographs was 0.79. This value, 0.79, is an estimate of the true proportion.
Many statistical questions deal with estimation:
- What is the mean age of recent college graduates?
- What proportion of Republicans support the current US president?
In the "Picture Yourself" survey by Nikon, the researchers were interested in the proportion of all US adults who think they look better in person than in photographs. In statistical inference, information obtained from a sample is used to draw conclusions about a population. Thus, the value of the sample proportion is an estimate of the true value of the parameter. The observed value of a statistic is called a point estimate.
A point estimate is the value of a statistic computed from a sample.
Often, there is an estimator that seems 'natural' to use to estimate the value of a given parameter. For instance, it seems reasonable to use the sample mean to estimate the population mean.
Let $x_1, x_2, \ldots x_N$ denote the values in a population with mean $\mu$ and variance $\sigma^2$ and let $X_1, X_2, \ldots X_n$ denote a sample. Then $$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$ is a 'natural' estimator of $$\mu=\frac{1}{N}\sum_{i=1}^Nx_i.$$ Notice that $\bar{X}$ is to the sample values what $\mu$ is to the population values. $\bar{X}$ is the sample analogue of $\mu$.
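As a quick sketch of this idea, the following simulation builds a hypothetical population (the size, mean, and standard deviation are assumptions chosen for illustration), draws a sample, and computes the sample analogue $\bar{X}$ as a point estimate of $\mu$:

```python
import random

random.seed(42)

# A hypothetical population of N = 100_000 values (assumed parameters,
# chosen only for illustration).
population = [random.gauss(50, 10) for _ in range(100_000)]
mu = sum(population) / len(population)    # the population mean

# Draw a sample of n = 200 and compute the sample analogue of mu.
sample = random.sample(population, 200)
x_bar = sum(sample) / len(sample)         # point estimate of mu
```

The point estimate `x_bar` will typically land close to `mu`, but not exactly on it; how close it tends to land is exactly what the bias and variance of the estimator describe.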
However, sometimes the 'natural' estimator is not the best choice. Consider the variance: $S^2$ is not the sample analogue of $\sigma^2$ (its denominator is $n-1$ instead of $n$), so it is not the 'natural' estimator. Why, then, do we use $S^2$?
Let $x_1, x_2, \ldots x_N$ denote the values in a population with mean $\mu$ and variance $\sigma^2$ and let $X_1, X_2, \ldots X_n$ denote a sample. Then $$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2$$ is the sample analogue of $$\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)^2.$$ However, we generally estimate $\sigma^2$ with $$S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2$$ instead. Why?
To determine how we choose the best estimators for a parameter of interest, consider two properties of estimators: the bias and the variance.
Properties of Estimators
Roughly speaking, bias indicates whether or not the statistic is 'aiming for' the parameter of interest, and the variance indicates how much the values of the estimate vary from sample to sample.
Bias
Unbiased estimators are generally preferred to biased ones. Stated formally, an estimator is an unbiased estimator of a parameter when its expected value is equal to that parameter. If an estimator is not unbiased, it is biased. The bias is the difference between the expected value of the estimator and the parameter.
An estimator $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$.
Given $X_1$ and $X_2$ such that $E(X_1) = \mu$ and $E(X_2)=\mu$, show that $\hat{\mu}=\frac{X_1}{4}+\frac{3X_2}{4}$ is an unbiased estimator of $\mu$.
$$\begin{array}{lcl} E(\hat{\mu})&=& E\left(\frac{X_1}{4}+\frac{3X_2}{4}\right)\\ &=& \frac{E(X_1)}{4}+\frac{3E(X_2)}{4}\\ &=& \frac{\mu}{4} + \frac{3\mu}{4}\\ &=& \mu \end{array}$$
Since $E(\hat{\mu}) = \mu$, $\hat{\mu}$ is an unbiased estimator for $\mu$.
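This can also be checked by simulation: averaging $\hat{\mu}$ over many replications approximates $E(\hat{\mu})$. The true mean and the normal distribution below are assumptions for illustration; any distribution with mean $\mu$ would do:

```python
import random

random.seed(7)
mu = 10.0                        # assumed true mean, for illustration

# Approximate E(mu-hat) by averaging over many replications.
reps = 50_000
total = 0.0
for _ in range(reps):
    x1 = random.gauss(mu, 2.0)   # any distribution with mean mu works
    x2 = random.gauss(mu, 2.0)
    total += x1 / 4 + 3 * x2 / 4

approx_expectation = total / reps   # should be close to mu = 10
```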
Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ (that is $E(X_i) = \mu$). Show that $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$ is an unbiased estimator for $\mu$.
$$\begin{array}{lcl} E(\bar{X})&=& E\left(\frac{1}{n}\sum_{i=1}^nX_i\right)\\ &=& \frac{1}{n}\sum_{i=1}^nE(X_i)\\ &=& \frac{1}{n}\cdot n\mu\\ &=& \mu \end{array}$$
Since $E(\bar{X}) = \mu$, $\bar{X}$ is an unbiased estimator for $\mu$.
Let $X\sim Binomial(n,p)$. Show that $\hat{p}=\frac{X}{n}$ is an unbiased estimator for $p$.
Since $E(X) = np$ for a $Binomial(n,p)$ random variable, $E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p$, so $\hat{p}$ is an unbiased estimator for $p$.
Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ and variance $\sigma^2$, that is $E(X)=\mu$ and $Var(X_i)=\sigma^2$. Show that $S^2$ is an unbiased estimator for $\sigma^2$.
Using $\sum_{i=1}^n(X_i-\bar{X})^2 = \sum_{i=1}^nX_i^2 - n\bar{X}^2$, together with $E(X_i^2) = \sigma^2 + \mu^2$ and $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$,
$$E\left(\sum_{i=1}^n(X_i-\bar{X})^2\right) = n(\sigma^2+\mu^2) - n\left(\frac{\sigma^2}{n}+\mu^2\right) = (n-1)\sigma^2.$$
Dividing by $n-1$ gives $E(S^2) = \sigma^2$, so $S^2$ is an unbiased estimator for $\sigma^2$.
In contrast, an estimator $\hat{\upsilon}$ with $E(\hat{\upsilon}) = 2\upsilon$ is biased: the bias is $E(\hat{\upsilon}) - \upsilon = 2\upsilon - \upsilon = \upsilon$.
Variance
Given independent $X_1$ and $X_2$ such that $Var(X_i) = \sigma^2$, find the variance of $\hat{\mu}=\frac{X_1}{4}+\frac{3X_2}{4}$.
$$\begin{array}{lcl} Var(\hat{\mu})&=& Var\left(\frac{X_1}{4}+\frac{3X_2}{4}\right)\\ &=& Var(\frac{X_1}{4})+Var(\frac{3X_2}{4})\\ &=& \frac{\sigma^2}{16} + \frac{9\sigma^2}{16}\\ &=& \frac{5\sigma^2}{8} \end{array}$$
$Var(\hat{\mu}) = \frac{5\sigma^2}{8}$.
Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$ and variance $\sigma^2$ (that is, $X_1, X_2, \ldots X_n$ are independent, $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$). Find the variance of $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$.
Since the $X_i$ are independent, $Var(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^nVar(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$.
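The result $Var(\bar{X}) = \frac{\sigma^2}{n}$ can be checked empirically by computing $\bar{X}$ for many samples and looking at the spread of those values. The population parameters and sample size below are assumptions for illustration:

```python
import random

random.seed(3)
sigma2, n, reps = 4.0, 8, 40_000    # assumed values for illustration

# Compute x-bar for many independent samples.
means = []
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    means.append(sum(xs) / n)

# Empirical variance of x-bar across the samples.
m = sum(means) / reps
var_xbar = sum((v - m) ** 2 for v in means) / reps  # approx sigma^2/n = 0.5
```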
Let $\small{X\sim Binomial(n,p)}$. Find $\small{Var(\hat{p})}$ where $\small{\hat{p}=\frac{X}{n}}$.
Since $Var(X) = np(1-p)$ for a $Binomial(n,p)$ random variable, $Var(\hat{p}) = \frac{Var(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$.
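The same check works for $\hat{p}$: simulate many binomial draws (the values of $n$ and $p$ below are assumptions for illustration) and compare the empirical variance of $\hat{p}$ to $\frac{p(1-p)}{n}$:

```python
import random

random.seed(5)
n, p, reps = 100, 0.3, 20_000    # assumed binomial parameters

phats = []
for _ in range(reps):
    # A Binomial(n, p) draw as a sum of n Bernoulli(p) trials.
    x = sum(1 for _ in range(n) if random.random() < p)
    phats.append(x / n)

m = sum(phats) / reps
var_phat = sum((v - m) ** 2 for v in phats) / reps  # approx p(1-p)/n = 0.0021
```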
Let $X_1, X_2, \ldots X_n$ be a sample from a population with mean $\mu$. Let $\bar{X}_{10}=\frac{1}{10}\sum_{i=1}^{10}X_i$ and $\bar{X}_{15}=\frac{1}{15}\sum_{i=1}^{15}X_i$. Which is the better estimator for $\mu$?
Both are unbiased, but (assuming a finite population variance $\sigma^2$) $Var(\bar{X}_{15}) = \frac{\sigma^2}{15} < \frac{\sigma^2}{10} = Var(\bar{X}_{10})$, so $\bar{X}_{15}$ is the better estimator.
Standard Error
The standard deviation of a statistic is called the standard error. The units of the variance are the squared units of the measurement, while the units of the standard error are the same as those of the measurement. Thus it is often useful to report the standard error along with an estimate.
The standard error is the standard deviation of a statistic.
Given independent $X_1$ and $X_2$ such that $Var(X_i) = \sigma^2$, find the standard error (se) of $\hat{\mu}=(X_1+X_2)/2$.
$Var(\hat{\mu}) = \frac{\sigma^2}{2}$.
$se(\hat{\mu}) = \sqrt{\frac{\sigma^2}{2}} = \frac{\sigma}{\sqrt{2}}$.
In the example at the beginning of this page, the point estimate for the proportion of US adults who say they look better in person than in photos is 0.79. What is the standard error?
$\small{Var(\hat{p}) = \frac{p(1-p)}{n}}$ so $\small{se(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}}$. However, this can't be calculated from the information given since the value of $p$ is unknown (0.79 is an estimate, $\hat{p}$).
In practice, it is common to compute the estimated standard error using information obtained from the sample.
$se(\bar{X}) = \sqrt{\frac{S^2}{n}}$
$se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
The estimated standard error of the proportion of US adults who say they look better in person than in photos is $\small{\sqrt{\frac{0.79(1-0.79)}{1000}} = 0.0129}$.
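The calculation is a one-liner, using the sample values from the survey:

```python
from math import sqrt

# Estimated standard error of the sample proportion from the survey:
# se-hat = sqrt(p-hat * (1 - p-hat) / n)
p_hat, n = 0.79, 1000
se_hat = sqrt(p_hat * (1 - p_hat) / n)
print(round(se_hat, 4))   # 0.0129
```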
Since the true standard error is rarely available in practice, it is common to say 'standard error' when referring to the 'estimated standard error'.