What is Statistics?
Which is the most effective face mask for preventing the spread of coronavirus? Is marijuana use during pregnancy related to autism in children? Can spending five minutes a day training the muscles we use to breathe bring some of the same benefits as 30 minutes of exercise? All of these questions reflect recent headlines from various news sites, and all of them depend upon statistics for their answers.
Statistics encompasses methods for data collection, analysis, and interpretation. Statistical methods influence our lives daily. They help to determine the medicines that are available to us, the ads that we get from the grocery store, and the websites our search engines refer us to. They are essential to weather forecasts, insurance rates, and the quality control of the products we buy. We use them to keep track of our health, the economy, and social issues such as race and gender equality, crime rates, and poverty. We see them in the form of graphics and charts, means and percentages, conclusions and predictions in academic research, media reports, and advertisements.
StatsStuff covers basic statistical methods in the following areas:
- Data Collection: Methods for collecting data in ways that reduce bias and support accurate, reliable conclusions.
- Descriptive Statistics: These methods are used to create numerical and visual descriptions of data that enable us to grasp key information without looking at the data values in detail.
- Probability Theory: Probability is the mathematics that underlies the science of statistics.
- Statistical Inference: The process used to draw conclusions about large populations based on well-chosen subsets, or samples.
The purpose of this site is to introduce basic ideas and concepts and to help site users to become more aware of how statistical methods influence their lives.
Statistics and Variation
The field of Statistics is largely concerned with describing variation in data, that is, differences between subjects in the group or groups being studied. Variation in data can arise from a number of sources:
- Natural variation: differences that exist inherently between individuals, for example size, weight, or color.
- Induced variation: differences that exist because they were intentionally introduced. For instance, a researcher induces variation in a drug study by giving some subjects a treatment and some a placebo.
- Measurement variation: differences due to inadequacies in the tools or methods we use to create measurements.
- Sampling variation: differences that exist due to the many possibilities for choosing a sample.
For a more in-depth description of sources of variation, see the GAISE report, page 11.
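Sampling variation is the easiest of these sources to see in a quick simulation. The sketch below is only an illustration: the population (the integers 1 through 100), the sample size, and the helper name `sample_means` are all assumptions made for the example, not something taken from the GAISE report. It draws several random samples from the same population and computes the mean of each one.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population: the integers 1 through 100 (population mean 50.5).
population = list(range(1, 101))

means = sample_means(population, sample_size=10, n_samples=5)
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m}")
```

Every sample comes from the same population, yet the sample means generally differ from one another and from the population mean of 50.5. That spread is sampling variation.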
We can describe the variation of a particular variable with its distribution. Distributions can describe a variable related to observed data (such as how much money each person in a sample spends on fast food) or to an experiment (such as the outcomes of a die roll). Essentially, a distribution indicates the possible values of a variable and their associated frequencies or probabilities. Distributions can be displayed as formulas, tables, or graphs. Distributions will be important throughout our discussion of statistics.

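For example, the distribution of the outcome of a fair six-sided die roll can be written as a formula:

$\small{P(\text{outcome} = x) = \frac{1}{6}}$ for $\small{x = 1, 2, 3, 4, 5, 6}$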
Each of these representations makes it clear which values the variable can assume and how likely they are.
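As a rough illustration, the die-roll distribution can also be built and printed as a table in a few lines of Python. This is a minimal sketch assuming a fair six-sided die. The names `die_distribution` and `empirical_distribution` are hypothetical, introduced only for this example. The second function simulates rolls so that observed relative frequencies can be compared with the theoretical probabilities.

```python
import random
from collections import Counter
from fractions import Fraction

# Theoretical distribution of a fair six-sided die: P(outcome = x) = 1/6.
die_distribution = {x: Fraction(1, 6) for x in range(1, 7)}


def empirical_distribution(n_rolls, seed=0):
    """Simulate n_rolls die rolls and return the relative frequency of each face."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {x: counts[x] / n_rolls for x in range(1, 7)}


observed = empirical_distribution(6000)

# Print the distribution as a table: value, probability, observed frequency.
print("x  P(outcome = x)  observed")
for x, p in die_distribution.items():
    print(f"{x}  {str(p):<14}  {observed[x]:.3f}")
```

The theoretical probabilities sum to exactly 1. The observed frequencies also sum to 1, but each is only approximately 1/6, which connects the idea of a distribution back to sampling variation.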
