In God we trust, all others bring data.

Summary for Two Variables

Looking at individual characteristics of states, such as marriage rate or population, is interesting. However, examining two variables simultaneously gives additional information.

The scatterplot below summarizes life expectancy and murder rates for the 50 US states and shows the relationship between these variables.

Scatterplot titled “Murder Rates and Life Expectancy in US States, 2009.” Each point represents a U.S. state, showing an inverse relationship where higher murder rates correspond to lower life expectancy. The x-axis is labeled “Murder Rate,” and the y-axis is labeled “Life Expectancy.”

The x-coordinate of each point indicates the murder rate for the corresponding state, while the y-coordinate indicates the life expectancy for the same state. Looking at this plot, states with higher murder rates tend to have lower life expectancies, and states with lower murder rates tend to have higher life expectancies.

In bivariate data, for each subject in the dataset there are measurements on two variables. Life expectancy and murder rate, as in the example above, are both numeric variables. Bivariate data can also consist of two categorical variables or a categorical variable and a numeric variable.

In bivariate data there are measurements on two variables for each subject.


Two Numeric Variables

Data on two numeric variables is usually displayed with a scatterplot. In a scatterplot, each subject is represented by one point whose coordinates are that subject's measurements on the two variables.

The labeled point on the plot represents New Mexico. The 2009 murder rate in New Mexico was 10 per 100,000 and the life expectancy was 75.74 years.

Scatterplot titled “Murder Rates and Life Expectancy in US States, 2009.” Each point represents a U.S. state, showing a negative relationship between murder rate and life expectancy. One point is highlighted in orange and labeled “New Mexico (10, 75.74),” indicating New Mexico’s murder rate of 10 and life expectancy of 75.74 years.

A scatterplot shows one point for each observation: the x-coordinate of the point is the value of one variable and the y-coordinate is the value of the other variable for that observation.
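As a minimal sketch of how the points of a scatterplot come from bivariate data, the Python snippet below turns a few records into (x, y) pairs. The state labels and values are invented for illustration and are not the 2009 figures, and the helper name `to_points` is hypothetical.

```python
# Hypothetical records: (label, murder rate per 100,000, life expectancy in years).
# These values are made up for illustration; they are NOT the 2009 state data.
records = [
    ("State A", 2.0, 80.5),
    ("State B", 5.5, 78.2),
    ("State C", 9.0, 76.1),
]


def to_points(rows):
    """Return one (x, y) point per subject: x = murder rate, y = life expectancy."""
    return [(murder_rate, life_expectancy) for _, murder_rate, life_expectancy in rows]


points = to_points(records)

# Separate coordinate lists, the form most plotting libraries expect.
xs = [x for x, _ in points]
ys = [y for _, y in points]

for (label, _, _), (x, y) in zip(records, points):
    print(f"{label}: ({x}, {y})")
```

The lists `xs` and `ys` could then be handed to a plotting library, for example matplotlib's `plt.scatter(xs, ys)`, to draw the plot itself.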


One Categorical and One Numeric Variable

Boxplots facilitate looking at the distribution of a numeric variable across levels of a categorical variable.

Boxplot titled “Bachelor Degree Percentage vs Region.” It compares the percent of adults with bachelor’s degrees across four U.S. regions: Midwest, Northeast, South, and West. The Northeast shows the highest median and narrowest spread, while the South has the lowest median and widest range.
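A boxplot is typically drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum) computed separately for each level of the categorical variable. The sketch below shows that grouping step in Python using only the standard library. The percentages are invented for illustration, the helper name `five_number_summary` is hypothetical, and the "inclusive" quartile method is just one of several common conventions.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (region, percent of adults with a bachelor's degree) pairs.
# These values are made up for illustration; they are NOT the real state data.
observations = [
    ("Midwest", 25.0), ("Midwest", 27.0), ("Midwest", 29.0),
    ("Midwest", 31.0), ("Midwest", 33.0),
    ("South", 20.0), ("South", 22.0), ("South", 26.0),
    ("South", 30.0), ("South", 36.0),
]

# Group the numeric values by the level of the categorical variable.
by_region = defaultdict(list)
for region, pct in observations:
    by_region[region].append(pct)


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


summaries = {region: five_number_summary(vals) for region, vals in by_region.items()}

for region, summary in summaries.items():
    print(region, summary)
```

Comparing the summaries side by side is what the boxes in the plot do visually: the median gives each region's center and the distance between the quartiles gives its spread. Many plotting tools draw the whiskers to a cutoff such as 1.5 times the interquartile range rather than to the minimum and maximum, so a drawn boxplot may mark extreme values as separate points.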

Association is not Causation

An association between variables is an observable relationship. For instance, lower murder rates are associated with higher life expectancies in US states, and living in the Northeast is associated with a greater percentage of bachelor's degrees. However, an association between variables is not the same as a causal relationship. That is, lower murder rates do not necessarily cause higher life expectancies, nor does living in the Northeast necessarily cause more people to earn bachelor's degrees. One variable may cause a change in the other, but an association alone does not establish that it does.