Bivariate Data Introduction
As of May 2020, the fastest roller coaster in the world is the Formula Rossa coaster in Abu Dhabi, United Arab Emirates. The roller coaster, which opened in 2010, has a top speed of 149.1 mph (240 km/h). The Top Thrill Dragster coaster, located in Ohio, USA, reaches only 120 mph but has a maximum height of 420 feet compared to Formula Rossa's 170.6 feet.
Is there a relationship between the height of a roller coaster and its maximum speed? Would it be possible to estimate the maximum speed of a roller coaster based on its height? These are questions about bivariate data, data that involve two measurements on each subject.
Bivariate Data: Data composed of measurements on two variables for each object or subject.
Bivariate data are often displayed in a scatterplot in which the values of the two variables for each subject form an ordered pair. Displaying the data this way makes it possible to summarize the individual variables as well as the relationship between them.
In the plot, the point (170.6, 149.1) corresponds to the Formula Rossa coaster.
From the scatterplot, it is apparent that the heights of the roller coasters vary from close to 0 feet to over 400 feet but that most of the coasters are shorter than 300 feet.
The speeds vary from close to 0 mph up to nearly 150 mph. There is a strong relationship between the heights and speeds of the roller coasters; that is, in general, faster roller coasters tend to be taller.
Scatter Plot: A plot in which each point is identified by an ordered pair consisting of measurements on the two variables.
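To make the idea of ordered pairs concrete, the following is a minimal Python sketch that uses only the three coasters whose measurements are quoted in this section, not the full data set behind the scatterplot. The variable names (`coasters`, `heights`, `speeds`) are illustrative assumptions. It shows how the same pairs support a summary of each variable on its own as well as a joint look at both.

```python
# Ordered pairs (height in feet, top speed in mph) for the three coasters
# named in the text; the full data set is not reproduced here.
coasters = {
    "Formula Rossa": (170.6, 149.1),
    "Top Thrill Dragster": (420.0, 120.0),
    "Grona Lund kiddie coaster": (8.0, 4.5),
}

# Split the pairs into the two individual variables.
heights = [height for height, _ in coasters.values()]
speeds = [speed for _, speed in coasters.values()]

# Summarize each variable on its own by its smallest and largest value.
height_range = (min(heights), max(heights))
speed_range = (min(speeds), max(speeds))

# Look at the variables jointly: which coaster is fastest, which is tallest?
fastest = max(coasters, key=lambda name: coasters[name][1])
tallest = max(coasters, key=lambda name: coasters[name][0])

print("Height range (ft):", height_range)
print("Speed range (mph):", speed_range)
print("Fastest:", fastest, "| Tallest:", tallest)
```

With a plotting library available, the `heights` and `speeds` lists could be passed to a scatterplot function to draw one point per coaster.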
From the scatterplot, it is also evident that one roller coaster is much faster than the others. This is the Formula Rossa. Interestingly, it is somewhere near the middle of the pack in terms of height.
Also noticeable, and perhaps surprising, from the scatterplot is that there are a number of roller coasters that are both very short and very slow (comparatively speaking). The coaster that is both the shortest (8 feet) and the slowest (4.5 mph) is at the Grona Lund park in Sweden. Not surprisingly, this is a "kiddie coaster" meant for kids between 0 and 3 years old. While it is probably safe to assume that many of the other shortest and slowest rides are also kiddie coasters, there is no clear point below which we could assume the coasters are all meant for kids.
When analyzing bivariate data, we commonly refer to the independent variable as the 'explanatory variable' because it can be used to 'explain' changes in the dependent or 'response variable'. For the roller coaster data, height is plotted on the horizontal axis and is the explanatory or independent variable. The choice of which variable is the explanatory variable is not always clear-cut and can be left to the choice of the investigator.
Explanatory variable: The independent or 'x' variable is also called the explanatory variable.
Response variable: The dependent or 'y' variable is also called the response variable.
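As a small illustration of these two roles, the sketch below builds the ordered pairs for a scatterplot from whichever variable the investigator chooses as explanatory. The function name `ordered_pairs` and the field names `height_ft` and `speed_mph` are hypothetical, and the records are limited to the three coasters quoted in this section.

```python
# Each record holds both measurements for one coaster (values from the text).
records = [
    {"height_ft": 170.6, "speed_mph": 149.1},  # Formula Rossa
    {"height_ft": 420.0, "speed_mph": 120.0},  # Top Thrill Dragster
    {"height_ft": 8.0, "speed_mph": 4.5},      # Grona Lund kiddie coaster
]


def ordered_pairs(records, explanatory, response):
    """Return (x, y) pairs with the explanatory variable first."""
    return [(r[explanatory], r[response]) for r in records]


# Height as the explanatory (x) variable and speed as the response (y).
height_explains_speed = ordered_pairs(records, "height_ft", "speed_mph")

# The opposite choice swaps the coordinates of every point.
speed_explains_height = ordered_pairs(records, "speed_mph", "height_ft")

print(height_explains_speed[0])
print(speed_explains_height[0])
```

Swapping the roles changes which measurement goes on the horizontal axis. The data themselves stay the same.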
A scatterplot is a useful graphical summary of bivariate data. The next section introduces correlation, a numerical summary of the relationship between two variables.