In God we trust, all others bring data.


Box Plots

This diagram, a boxplot, shows the distribution of the number of marriages per 1000 people in the 50 US states in 2020. The vertical line on the left indicates the minimum, the blue box shows the middle 50% of values, and the vertical line on the right corresponds to the state with the maximum number of marriages per 1000 residents.

How would you describe this distribution?

Boxplot titled ‘Marriages per 1000 Residents in US States, 2020’ showing a narrow blue box near 10 with a long right whisker.

A boxplot is a common graphical summary for numeric data. It provides a simple summary of a variable's distribution. A boxplot (or box-and-whisker plot) consists of a box that extends from the lower to the upper quartile, a line through the box at the median, and lines or "whiskers" that extend to the minimum and maximum values in the data. A common variation on the boxplot has the whiskers extend to the minimum and maximum values that are not outliers, with the outliers marked as points beyond the whiskers. Boxplots can be oriented either horizontally or vertically.

Box plot titled “Murders per 100,000 People in US States, 2020.” The box shows the middle 50% of data, ranging from about 3 to 6 murders per 100,000. A horizontal line inside the box marks the median near 4. Whiskers extend to the minimum around 1.5 and the maximum near 12. Labeled arrows identify the minimum, 1st quartile, median, 3rd quartile, and maximum.

Boxplot 1: whiskers to minimum and maximum.

Box plot titled “Murders per 100,000 People in US States, 2020.” The box represents the middle 50% of the data, ranging from about 3 to 6 murders per 100,000 people. A horizontal line within the box marks the median near 4. The whiskers extend to the minimum around 1.5 and the maximum near 10, with a single outlier above 12. Labeled arrows identify the minimum, 1st quartile, median, 3rd quartile, maximum (except outliers), and outlier.

Boxplot 2: outliers shown.

The mean of a variable is not an essential part of a boxplot, though it is sometimes added and marked with a star or similar symbol. The thickness of the box, measured perpendicular to the data axis, does not affect the interpretation.



In this boxplot, the outliers (values more than 1.5×IQR above the upper quartile) are displayed as points beyond the upper whisker. It is clear from the box that most states have a small number of marriages per 1000 residents (between 5.8 and 7.5) and that one particular state is largely responsible for the heavy right skew. It probably won't be a surprise that this state is Nevada, where there were nearly 41 marriages per 1000 people in 2020. The other two outlier states are Arkansas (10.7) and Hawaii (17.9).
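The 1.5×IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not the method used to draw the figures: the function name whiskers_and_outliers, the multiplier parameter k, and the sample values are all assumptions made for the example. The symmetric lower cutoff (1.5×IQR below the lower quartile) is the usual convention, which this page does not state explicitly. Quartiles are computed with the standard library's "inclusive" method; other quartile conventions can give slightly different cutoffs.

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    Hypothetical helper: points more than k * IQR beyond the quartiles
    are treated as outliers; whiskers stop at the most extreme
    values that are not outliers.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return inside[0], inside[-1], outliers

# Made-up values for illustration only (not the state marriage data).
sample = [5, 6, 6, 7, 7, 8, 9, 18, 40]
print(whiskers_and_outliers(sample))  # (5, 9, [18, 40])
```

Here the quartiles are 6 and 9, so the IQR is 3 and the upper cutoff is 9 + 1.5 × 3 = 13.5. The values 18 and 40 fall beyond it and would be drawn as separate points, while the upper whisker would stop at 9.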

Horizontal box plot titled “Marriages per 1,000 Residents in US States, 2020.” The box is narrow and positioned near the lower end of the scale, around 5 to 8 marriages per 1,000 people. Several individual points appear to the right, representing outliers around 10, 20, and 40. The chart shows that most states have similar marriage rates, with a few states reporting much higher rates.




The boxplot shows the distribution of the land area of US states in thousands of acres. Notice the two outliers shown as points beyond the upper whisker. Since Alaska was excluded from the reporting, the farther outlier is Texas, and California is the outlier just beyond the end of the whisker.

Even without the outliers, the distribution is somewhat right skewed: the right tail is longer than the left, and the line denoting the median is left of center in the box. The width of the box itself is the IQR.
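One rough way to put numbers on these two observations is to compute the IQR and the position of the median within the box. The sketch below is only an illustration: the function name box_shape and the sample values are assumptions, and the "inclusive" quartile method is one of several conventions.

```python
import statistics

def box_shape(values):
    """Return (IQR, relative position of the median inside the box).

    A position of 0.5 means the median is centered in the box; a value
    below 0.5 means it sits nearer the lower quartile, as in a
    right-skewed distribution.
    """
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("the box has zero width, so the position is undefined")
    return iqr, (median - q1) / iqr

# Made-up right-skewed values for illustration only.
sample = [1, 2, 3, 4, 5, 8, 13, 21, 34]
iqr, position = box_shape(sample)
print(iqr, position)
```

For these values the quartiles are 3, 5, and 13, so the IQR is 10 and the median sits one fifth of the way across the box, well left of center.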

Box plot titled “Area of States, AK, HI Excluded.” The x-axis is labeled “Thousands of Acres.” The box, shaded red, represents the middle 50% of state areas, roughly from 30,000 to 60,000 thousand acres, with a median line near 45,000. Whiskers extend from near 10,000 to 90,000 thousand acres, and two outliers appear beyond 100,000 thousand acres.

Constructing A Boxplot

To construct a boxplot,

  1. Compute the minimum, lower quartile, median, upper quartile, and maximum of the data (the five number summary).
  2. Draw a box that extends from the lower to the upper quartile.
  3. Draw a line through the box at the median.
  4. Draw whiskers that extend from the box to the minimum and maximum.

The five-number summary consists of the minimum, lower quartile, median, upper quartile, and maximum of the data.
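The first construction step can be sketched with Python's standard library. This is a minimal example under stated assumptions: the function name five_number_summary and the sample values are invented for illustration, and the quartiles use the "inclusive" method of statistics.quantiles. Textbooks and software differ slightly in how they define quartiles, so other tools may report somewhat different values for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # With n=4, quantiles() returns the three cut points that split the
    # data into four parts: lower quartile, median, and upper quartile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Made-up values for illustration only.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 15]
print(five_number_summary(sample))  # (2, 4.0, 7.0, 9.0, 15)
```

The box would then run from 4 to 9 with the median line at 7, and the whiskers would reach to 2 and 15.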


Boxplots and Histograms

Boxplots and histograms are both graphical methods for displaying numeric data and some of the same information can be obtained from either. Boxplots do not show as much detail as histograms do, but they give a quick visualization of the spread of the data.
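To see what detail a boxplot can hide, the sketch below builds two small made-up data sets that share the same five-number summary but have different histogram shapes. The helper names five_number_summary and bin_counts, the bin settings, and the data are all assumptions chosen for the illustration.

```python
import statistics
from collections import Counter

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def bin_counts(values, start, width, bins):
    """Count values in equal-width bins [start, start + width), and so on.

    Values that fall outside the requested bins are ignored.
    """
    counts = Counter((x - start) // width for x in values)
    return [counts.get(i, 0) for i in range(bins)]

# Made-up values for illustration only.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two_peaks = [1, 3, 3, 3, 5, 7, 7, 7, 9]

print(five_number_summary(flat))       # (1, 3.0, 5.0, 7.0, 9)
print(five_number_summary(two_peaks))  # (1, 3.0, 5.0, 7.0, 9)
print(bin_counts(flat, 1, 2, 5))       # [2, 2, 2, 2, 1]
print(bin_counts(two_peaks, 1, 2, 5))  # [1, 3, 1, 3, 1]
```

Both data sets would produce the same boxplot, yet the bin counts show that one is roughly flat while the other has two peaks.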



Box plot titled “Area of States, AK, HI Excluded.” The x-axis is labeled “Thousands of Acres.” The box, shaded red, represents the middle 50% of state areas, roughly from 30,000 to 60,000 thousand acres, with a median line near 45,000. Whiskers extend from near 10,000 to 90,000 thousand acres, and two outliers appear beyond 100,000 thousand acres.

Histogram titled “Area of States.” The x-axis is labeled “Thousands of Acres.” Most bars are concentrated between 0 and 75,000 thousand acres, with the tallest bar around 30,000 to 40,000. The frequency decreases as area increases, and one small bar appears near 150,000 thousand acres, representing a large outlier state. The distribution is right-skewed, showing that most states have smaller land areas while a few are much larger.


Use the applet to compare histograms and boxplots. Drag the mouse across the upper canvas to create a histogram. A boxplot of the same distribution will be shown in the canvas below. Choose whether to display outliers and separate points. If the show mean box is selected, the mean will be shown on the boxplot as a green x.

Boxplots are useful for comparing distributions of multiple variables.



The boxplots do not show a lot of detail about the distributions, but the display makes some comparisons easy. The lower quartile is about the same for the two distributions, while the median for the forest distribution is much higher than the median for the crop distribution; in fact, the forest median is higher than the upper quartile for the crop distribution.
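The same kind of comparison can be made numerically by computing the quartiles of each group side by side. The sketch below uses hypothetical percentages, not the land use data behind the figure; the helper name quartiles and the variable names are assumptions for the example.

```python
import statistics

def quartiles(values):
    """Return (lower quartile, median, upper quartile)."""
    return tuple(statistics.quantiles(sorted(values), n=4, method="inclusive"))

# Hypothetical percentages for illustration only.
groups = {
    "crop": [2, 5, 8, 10, 12, 18, 25, 30, 45],
    "forest": [3, 6, 9, 20, 35, 48, 58, 62, 75],
}

summary = {name: quartiles(values) for name, values in groups.items()}
for name, (q1, median, q3) in summary.items():
    print(f"{name:>6}: Q1={q1:5.1f}  median={median:5.1f}  Q3={q3:5.1f}")

# Is the forest median above the crop upper quartile?
forest_median_above_crop_q3 = summary["forest"][1] > summary["crop"][2]
print(forest_median_above_crop_q3)  # True
```

With these made-up values the lower quartiles are close (8 and 9), while the forest median (35) exceeds the crop upper quartile (25), mirroring the pattern described above.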

Box plot titled “Land Use, Percentage.” Two boxes compare the percentage of land used for crops and forests. The crop box, in tan, shows most values between about 5% and 25%, with several outliers above 40%. The forest box, in green, ranges roughly from 10% to 60%, with a median near 35% and a wider spread. The plot shows that forest land use tends to cover a higher percentage than cropland.