Statistics is the grammar of science.
Karl Pearson

What is Statistics?

Which is the most effective face mask for preventing spread of coronavirus? Is marijuana use during pregnancy related to autism in children? Can spending five minutes a day training the muscles we use to breathe bring some of the same benefits as 30 minutes of exercise?

All of these questions reflect recent headlines from various news sites and all of them depend upon statistics for their answers.

Statistics encompasses methods for data collection, analysis, and interpretation. Statistical methods influence our lives daily. They help to determine the medicines that are available to us, the ads that we get from the grocery store, and the websites our search engines refer us to. They are essential to weather forecasts, insurance rates, and the quality control of the products we buy. We use them to keep track of our health, the economy, and social issues such as race and gender equality, crime rates, and poverty. We see them in the form of graphics and charts, means and percentages, conclusions and predictions in academic research, media reports, and advertisements.


Statistics is the science of learning from data.


Figure: A statistical graphic from the media, titled "Most say it’s now more common for people to express racist or racially insensitive views."

StatsStuff covers basic statistical methods in the following areas:

The purpose of this site is to introduce basic statistical ideas and concepts and to help users become more aware of how statistical methods influence their lives.



Statistics and Variation

The field of Statistics is largely concerned with describing variation in data, that is, the differences between subjects in the group or groups being studied. Variation in data can arise from a number of sources:

For a more in-depth description of sources of variation, see the GAISE report, page 11.

We can describe the variation of a particular variable with its distribution. Distributions can describe a variable related to observed data (such as how much money each person in a sample spends on fast food) or to an experiment (such as the outcomes of a die roll). Essentially, a distribution indicates the possible values of a variable and their associated frequencies or probabilities. Distributions can be displayed as formulas, tables, or graphs. Distributions will be important throughout our discussion of statistics.

Distribution: the set of values a variable can assume and their associated frequencies or probabilities.
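To make the definition concrete, here is a minimal Python sketch that builds a distribution from observed data by pairing each distinct value with its relative frequency. The sample values and the helper name `relative_frequencies` are invented for illustration and do not come from any real study.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to its share of the sample."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}


# Hypothetical sample: fast-food meals eaten last week by 10 people.
meals = [0, 1, 1, 2, 2, 2, 3, 3, 5, 1]

distribution = relative_frequencies(meals)

# Print the distribution as a small table: value, then relative frequency.
for value, share in distribution.items():
    print(f"{value} meals: {share:.1f}")
```

The resulting dictionary lists the values the variable took in the sample together with how often each occurred, which is exactly the information a distribution carries.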


Example: Consider the variable "the outcome of a standard die roll." Its distribution can be described with a table, a line graph, or a formula, as shown here.

Outcome        1     2     3     4     5     6
Probability    1/6   1/6   1/6   1/6   1/6   1/6

Figure: Probability distribution graph for rolling a fair six-sided die. The x-axis shows the outcomes 1 through 6 and the y-axis shows probability from 0 to 0.2. Six vertical lines of equal height mark a probability of one-sixth (approximately 0.167) at each outcome.

$\small{P(\text{outcome} = x) = \frac{1}{6}}$ for $\small{x = 1, 2, 3, 4, 5, 6}$

Each of these representations makes it clear which values the variable can assume and how likely they are.
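The die-roll distribution can also be written down and checked in code. The following Python sketch stores the theoretical probabilities as exact fractions and compares them with relative frequencies from simulated rolls. The function name `simulate_rolls`, the number of rolls, and the seed are arbitrary choices made for this illustration.

```python
import random
from collections import Counter
from fractions import Fraction

OUTCOMES = range(1, 7)

# Theoretical distribution: each face of a fair die has probability 1/6.
theoretical = {x: Fraction(1, 6) for x in OUTCOMES}


def simulate_rolls(n_rolls, seed=0):
    """Roll a fair die n_rolls times and return the relative frequencies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {x: counts[x] / n_rolls for x in OUTCOMES}


empirical = simulate_rolls(60_000)

# Compare the formula with what the simulated rolls produced.
for x in OUTCOMES:
    print(f"{x}: theoretical {float(theoretical[x]):.3f}, "
          f"simulated {empirical[x]:.3f}")
```

With a large number of rolls, the simulated frequencies should land close to one-sixth for every outcome, though they will rarely match it exactly. That gap is itself an instance of the variation that statistics sets out to describe.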