Graphical Summary - Numerical Data
This page will only include very basic visualizations. More complex and customizable graphics can be made using the ggplot2 package along with others.
Histograms
The hist() function produces a histogram of the provided numerical data. Some of its additional arguments are listed below along with their use:
- main A quoted string ("") giving the overall title for the plot.
- xlab A quoted string ("") giving the label for the x-axis.
- ylab A quoted string ("") giving the label for the y-axis.
- xlim A two-element vector, built with c(), giving the lower and upper limits of the x-axis.
- ylim A two-element vector, built with c(), giving the lower and upper limits of the y-axis.
- breaks When given a single number, a suggested number of bins in the histogram. R treats it only as a suggestion and chooses "pretty" break points, so the actual number of bins may differ.
- labels If set to TRUE, the value of each bar (the count, or the density when freq = FALSE) is drawn on top of it.
- freq If set to TRUE, the bar heights represent frequencies (counts). If set to FALSE, they represent probability densities, so the bar areas sum to 1.
- col Sets the fill color of the bars in the histogram.
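To see that breaks is only a suggestion, the bins can be computed without drawing anything. This is a minimal sketch; it assumes hist()'s plot = FALSE argument (not among the arguments listed above), which returns the histogram object instead of plotting it.

```r
# Compute the histogram without drawing it (plot = FALSE is an addition
# for illustration; it is not one of the arguments listed above)
h <- hist(iris$Petal.Length, breaks = 8, plot = FALSE)

h$breaks              # the break points R actually chose
length(h$breaks) - 1  # the resulting number of bins, which need not be 8
h$counts              # the number of observations in each bin
```

If the printed number of bins differs from 8, that is expected: R rounds the break points to convenient values.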
Example: Using the iris dataset from R
If you want to produce histograms of the Petal.Length variable separated by Species, a separate histogram must be produced for each species from subsetted data (using logical operators and '[ ]' notation), as in the second example below.
# Basic Histogram of Petal.Length
hist(iris$Petal.Length)
# Customized Histogram with Frequencies
hist(iris$Petal.Length,
     main = "Histogram of Iris Petal Lengths (Frequency)",
     xlab = "Petal Length",
     xlim = c(0, 8),
     ylim = c(0, 60),
     breaks = 8,
     labels = TRUE,
     col = "skyblue")
# Customized Histogram with Probability Densities
hist(iris$Petal.Length,
     main = "Histogram of Iris Petal Lengths (Density)",
     xlab = "Petal Length",
     xlim = c(0, 8),
     ylim = c(0, 1),
     breaks = 8,
     labels = TRUE,
     freq = FALSE,
     col = "tomato")
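One way to check what freq = FALSE means is to confirm that the bar areas (height times width) add up to 1. The following is a small sketch, assuming hist()'s plot = FALSE argument to skip the drawing; the variable names are chosen here for illustration.

```r
# Compute the histogram object without plotting it
h <- hist(iris$Petal.Length, breaks = 8, plot = FALSE)

bin_widths <- diff(h$breaks)               # width of each bin
total_area <- sum(h$density * bin_widths)  # sum of height * width
total_area                                 # 1, up to floating-point rounding

# Densities are the counts rescaled by sample size and bin width
all.equal(h$density, h$counts / (sum(h$counts) * bin_widths))
```

This is why the y-axis of a density histogram usually needs a much smaller ylim than the frequency version.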
Example: Using the iris dataset from R
# Histograms for the Petal Length variable
hist(iris$Petal.Length,
     main = "Histogram of Petal Lengths",
     xlab = "Petal Length",
     xlim = c(1, 7),
     ylim = c(0, 40))
# Now separated between the different species
hist(iris$Petal.Length[iris$Species == "setosa"],
     main = "Histogram of Petal Lengths (Setosa)",
     xlab = "Petal Length",
     xlim = c(1, 7),
     ylim = c(0, 40),
     breaks = 2,
     col = "lightseagreen")
hist(iris$Petal.Length[iris$Species == "versicolor"],
     main = "Histogram of Petal Lengths (Versicolor)",
     xlab = "Petal Length",
     xlim = c(1, 7),
     ylim = c(0, 40),
     col = "mediumseagreen")
hist(iris$Petal.Length[iris$Species == "virginica"],
     main = "Histogram of Petal Lengths (Virginica)",
     xlab = "Petal Length",
     xlim = c(1, 7),
     ylim = c(0, 40),
     col = "darkseagreen")
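The three species-specific calls follow the same pattern, so they can also be drawn side by side with a loop. The sketch below is one possible approach rather than part of the original examples; par(mfrow = ...) and the variable names are choices made here for illustration.

```r
# Logical subsetting: keep only the values where the condition is TRUE
setosa_lengths <- iris$Petal.Length[iris$Species == "setosa"]
length(setosa_lengths)   # 50 flowers per species in iris

# Draw one histogram per species in a 1 x 3 grid
old_par <- par(mfrow = c(1, 3))
for (sp in levels(iris$Species)) {
  hist(iris$Petal.Length[iris$Species == sp],
       main = paste0("Petal Lengths (", sp, ")"),
       xlab = "Petal Length",
       xlim = c(1, 7),
       ylim = c(0, 40))
}
par(old_par)   # restore the previous plot layout
```

Keeping xlim and ylim identical across the panels makes the three distributions directly comparable.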
Boxplots
The boxplot() function produces a boxplot of the provided numerical data. Some of its additional arguments are listed below along with their use:
- main A quoted string ("") giving the overall title for the plot.
- xlab A quoted string ("") giving the label for the x-axis.
- ylab A quoted string ("") giving the label for the y-axis.
- names A character vector, built with c(), giving the labels for the individual boxes in the plot.
- ylim A two-element vector, built with c(), giving the lower and upper limits of the y-axis.
- range Controls how far the whiskers extend: each whisker reaches the most extreme data point that is no more than range times the IQR away from the box. The default is 1.5; if set to 0, the whiskers extend to the most extreme values.
- col Sets the fill color of the boxes in the boxplot.
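The effect of range can be inspected numerically with boxplot.stats(), the base R helper that computes the five values a boxplot draws. Note that its corresponding argument is called coef rather than range. A brief sketch, with variable names chosen here for illustration:

```r
x <- iris$Petal.Length

# Default rule: whiskers reach at most 1.5 * IQR beyond the box
s_default <- boxplot.stats(x, coef = 1.5)
s_default$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s_default$out     # points beyond the whiskers, drawn individually

# coef = 0 corresponds to range = 0: whiskers reach the extremes
s_extreme <- boxplot.stats(x, coef = 0)
s_extreme$stats[c(1, 5)]   # same as range(x)
```

Smaller values of range shorten the whiskers, so more points are drawn individually as outliers.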
Example: Using the iris dataset from R
# Basic Boxplot of Petal.Length
boxplot(iris$Petal.Length)
# Customized Boxplot
boxplot(iris$Petal.Length,
        main = "Boxplot of Iris Petal Lengths",
        ylab = "Petal Length",
        ylim = c(0, 8),
        range = 0.5,
        col = "skyblue")
If you want to produce a boxplot of the Petal.Length variable but separated by Species, you will use '~' notation.
Example: Using the iris dataset from R
# Basic Boxplot separated by Species
boxplot(iris$Petal.Length ~ iris$Species)
# Customized Boxplot of Petal Length separated by Species
boxplot(iris$Petal.Length ~ iris$Species,
        main = "Boxplot of Iris Petal Lengths",
        xlab = "Species",
        ylab = "Petal Length",
        names = c("Setosa", "Versicolor", "Virginica"),
        ylim = c(0, 8),
        col = c("lightseagreen", "mediumseagreen", "darkseagreen"))
When using '~' notation, the numerical variable comes first followed by the categorical variable that will be used to group the numerical variable.
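The same grouping can be written without repeating iris$ by passing the data frame through the data argument, which boxplot() accepts for formulas. The sketch below also uses plot = FALSE (an addition here, not covered above) to return the computed statistics and compare the group medians with tapply().

```r
# Equivalent formula call using the data argument
boxplot(Petal.Length ~ Species, data = iris)

# plot = FALSE returns the statistics instead of drawing
b <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)
b$names        # one box per level of Species
b$stats[3, ]   # row 3 holds the median of each group

# The same medians, computed directly by group
tapply(iris$Petal.Length, iris$Species, median)
```

Reading the formula as "numerical variable by categorical variable" matches the order described above.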