Graphical Summary - Categorical Data
This page will only include very basic visualizations. More complex and customizable graphics can be made using the ggplot2 package along with others.
Bar Charts
The barplot() function will produce a bar chart of the provided categorical data. However, it cannot accept the raw list of category values as its input. It needs a numerical summary giving how many observations fall within each category. To get these counts, we will use the summary() function in conjunction with barplot().
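As a minimal sketch of what barplot() actually receives, the code below prints the counts that summary() produces for a factor. The name species_counts is only an illustrative variable, and table() is shown as another base R function that returns the same counts.

```r
# iris$Species is already a factor, so summary() returns one count per level.
species_counts <- summary(iris$Species)
species_counts
#     setosa versicolor  virginica
#         50         50         50

# table() is an alternative way to count the categories.
table(iris$Species)

# Either set of counts can be passed to barplot().
barplot(species_counts)
```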
Example: Using the iris dataset from R
barplot(summary(iris$Species))
With each graphical summary, there are other customizations and options you can add. A few of the additional arguments for barplot() are listed below along with their use:
- main: A character string in "" that provides the overall title for the plot.
- xlab: A character string in "" that provides the label for the x-axis.
- ylab: A character string in "" that provides the label for the y-axis.
- ylim: A vector built with c() that sets the lower and upper limits of the y-axis.
- names.arg: A vector built with c() that sets the labels for the individual bars in the bar chart.
- col: A color, or a vector of colors built with c(), that sets the colors of the bars in the bar chart.
Example: Using the iris dataset from R
barplot(summary(iris$Species),
main = "Bar Chart of Iris Species",
xlab = "Species",
ylab = "Frequency",
names.arg = c("Setosa", "Versicolor", "Virginica"),
ylim = c(0, 60),
col = c("lightseagreen", "mediumseagreen", "darkseagreen"))
Example: Using the mtcars dataset from R (first used on Numerical Data Summary page)
# Basic Bar Chart
barplot(summary(as.factor(mtcars$am)))
# Customized Bar Chart
barplot(summary(as.factor(mtcars$am)),
main = "Bar Chart of Car Transmissions",
xlab = "Transmission Type",
ylab = "Frequency",
names.arg = c("Automatic", "Manual"),
ylim = c(0, 25),
col = "#003056")
Changing the Order of the Categories
There are a few additional steps that need to be taken to change the order of the categories in a bar chart.
Example: Using the mtcars dataset from R (first used on Numerical Data Summary page)
Remember that the am variable was recorded as "0" for automatic and "1" for manual transmission. To make it a categorical variable, we need to use the as.factor() function.
as.factor(mtcars$am)
Notice in the output that R specifies the levels of the factor, Levels: 0 1. The order of these categories is very important, because it is the order that R will use to plot the bar chart.
To change the order of the levels, we will use the factor() function and specify the levels in the order desired, in quotations and spelled exactly as they are listed in the previous output.
Like as.factor(), this function will also make the am variable a categorical variable.
transmiss_order <- factor(mtcars$am, levels = c("1", "0"))
transmiss_order
None of the data has been changed, but the order of the categories has been changed: Levels: 1 0.
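One way to confirm that only the order of the levels changed is to compare the two factors directly. This is a minimal sketch; am_default and am_reordered are illustrative names that do not appear elsewhere on this page.

```r
am_default   <- as.factor(mtcars$am)
am_reordered <- factor(mtcars$am, levels = c("1", "0"))

# The levels are listed in a different order...
levels(am_default)     # "0" "1"
levels(am_reordered)   # "1" "0"

# ...but every observation still belongs to the same category.
all(as.character(am_default) == as.character(am_reordered))   # TRUE

# The counts are the same, only reported in the new order.
summary(am_default)
summary(am_reordered)
```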
Here is the same customized barplot from above but with the new order of categories.
The only parts of the code that have been changed are the data being given to the function (transmiss_order rather than as.factor(mtcars$am)) and the order of the labels in the names.arg argument (Don't forget to change the labels!).
transmiss_order <- factor(mtcars$am, levels = c("1", "0"))
# Customized Bar Chart with new order of categories.
barplot(summary(transmiss_order),
main = "Bar Chart of Car Transmissions",
xlab = "Transmission Type",
ylab = "Frequency",
names.arg = c("Manual", "Automatic"),
ylim = c(0, 25),
col = "#003056")
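Because the labels in names.arg must be reordered by hand, it can be easy to mismatch them. One possible alternative, sketched here, is to attach the labels to the factor itself with the labels argument of factor(), so that they always follow the level order. The variable name transmission is an assumption used only for this illustration.

```r
# Levels are given in the desired plotting order, and labels are matched
# to them position by position (1 -> "Manual", 0 -> "Automatic").
transmission <- factor(mtcars$am,
                       levels = c(1, 0),
                       labels = c("Manual", "Automatic"))

summary(transmission)
#    Manual Automatic
#        13        19

# barplot() uses the names of the counts as bar labels,
# so names.arg is not needed here.
barplot(summary(transmission),
        main = "Bar Chart of Car Transmissions",
        xlab = "Transmission Type",
        ylab = "Frequency",
        ylim = c(0, 25),
        col = "#003056")
```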
Colors in R
There are two main ways to specify which color(s) you want to use in R. You can refer to a color by its name (for example, "lightseagreen") or by its hexadecimal code (#RRGGBB, for example "#003056").
A list of available colors by name is given here: R Colors
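The two ways of specifying a color can be explored from the R console. The sketch below uses the base R functions colors(), col2rgb(), and rgb(); the object names named_cols and navy_hex are only illustrative.

```r
# colors() returns every color name that R recognizes.
head(colors())

# Check that the names used on this page are valid color names.
named_cols <- c("lightseagreen", "mediumseagreen", "darkseagreen")
named_cols %in% colors()   # TRUE TRUE TRUE

# col2rgb() shows the red, green, and blue values (0 to 255) behind a color.
col2rgb("#003056")

# rgb() goes the other way and builds the hexadecimal code.
navy_hex <- rgb(0, 48, 86, maxColorValue = 255)
navy_hex   # "#003056"
```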
Pie Charts
The pie() function will produce a pie chart of the provided categorical data. Like barplot(), it cannot accept a list of the category names as the input. We will use the summary() function again in conjunction with pie().
A few of the additional arguments for pie() are listed below along with their use:
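- main: A character string in "" that provides the overall title for the plot.
- labels: A vector built with c() that sets the names for each section in the pie chart.
- radius: It changes the size of the chart's radius. The chart is drawn in a square box whose sides run from -1 to 1, so a value below 1 shrinks the chart, which helps when the labels are long. A negative value keeps the same size but rotates the chart by 180 degrees.
- clockwise: If set to TRUE, the categories will be displayed clockwise on the pie chart.
- col: A vector of colors built with c() that sets the colors of the sections of the pie chart.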
Example: Using the iris dataset from R
# Basic Pie Chart
pie(summary(iris$Species))
# Customized Pie Chart
pie(summary(iris$Species),
main = "Pie Chart of Iris Species",
labels = c("Setosa", "Versicolor", "Virginica"),
radius = 0.5,
clockwise = TRUE,
col = c("lightseagreen", "mediumseagreen", "darkseagreen"))
Example: Using the mtcars dataset from R (first used on Numerical Data Summary page)
# Basic Pie Chart
pie(summary(as.factor(mtcars$am)))
# Customized Pie Chart
pie(summary(as.factor(mtcars$am)),
main = "Pie Chart of Car Transmissions",
labels = c("Automatic", "Manual"),
radius = -0.75,
clockwise = FALSE,
col = c("#003056", "#8A8D8F"))
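Pie charts are often easier to read when each section also shows its percentage. The labels argument described above can carry that information. This is a minimal sketch, assuming the illustrative names am_counts, am_percent, and slice_labels, which are not part of the original examples.

```r
# Counts for each transmission type (0 = automatic, 1 = manual).
am_counts <- summary(as.factor(mtcars$am))

# Convert the counts to percentages, rounded to one decimal place.
am_percent <- round(100 * am_counts / sum(am_counts), 1)

# Combine the category names with their percentages.
slice_labels <- paste0(c("Automatic", "Manual"), " (", am_percent, "%)")
slice_labels
# "Automatic (59.4%)" "Manual (40.6%)"

pie(am_counts,
    main = "Pie Chart of Car Transmissions",
    labels = slice_labels,
    col = c("#003056", "#8A8D8F"))
```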