Dataframes

In addition to reading in your own data sets, you can use the many datasets that come built into R. We will be using the dataset called iris for these examples. You can run the R code below to see what the iris dataset looks like.

iris

After running the code, you can see that the iris dataset has 150 observations (rows) and five variables (columns).
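If printing all 150 rows is more than you need, the size of the dataset can also be checked directly. This is a small sketch using only base R functions (dim(), nrow(), ncol(), and head()), none of which are part of the original example:

```r
# Check the size of the built-in iris data frame
dim(iris)    # number of rows, then number of columns
nrow(iris)   # number of observations (rows)
ncol(iris)   # number of variables (columns)

# Show only the first six rows instead of printing the whole dataset
head(iris)
```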

names() function

When doing exploratory data analysis, we often want to look at just one particular column of the dataset. The names() function shows the names of all of the variables in a dataset (that is, the column titles).

names(iris)

This function is important because whenever we need to reference one of these columns, we must type its name exactly as it appears in the names() output (R is case-sensitive).
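To see what case sensitivity means in practice, one option is to test whether a spelling appears among the column names. This sketch uses the base R %in% operator, which is not part of the original example:

```r
# Column names must match exactly, including capitalization
"Petal.Length" %in% names(iris)   # TRUE: spelled exactly as in names(iris)
"petal.length" %in% names(iris)   # FALSE: the lowercase version is not a column
```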

str() function

You can also look at the structure of the dataset and all of its variables by using the str() function.

str(iris)

The first line of the output tells us that iris is a 'data.frame' with 150 observations and five variables. The lines after that describe each variable: the four length and width variables are numeric, and Species is a factor (categorical) variable with three levels (categories).
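The same information can be pulled out piece by piece if you only need part of it. This is a sketch using the base R functions class(), sapply(), and levels(); it also uses the '$' operator, which is explained in the next section:

```r
# The type of object that iris is
class(iris)            # "data.frame"

# The class of every column at once
sapply(iris, class)

# The three categories of the Species factor
levels(iris$Species)   # "setosa" "versicolor" "virginica"
```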

'$' notation

To access only one column of the dataset, we can use the '$' operator.

This will give us just the data from the Petal.Length column in the iris dataset:

iris$Petal.Length
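Once a column has been pulled out, it behaves like an ordinary vector and can be stored or passed to other functions. In this sketch, petal_length is a hypothetical variable name chosen for illustration, and the double-bracket form is shown only as an equivalent base R alternative to '$':

```r
# Store the column in a variable (petal_length is a made-up name)
petal_length <- iris$Petal.Length

length(petal_length)   # one value per observation
mean(petal_length)     # average petal length across all flowers

# Double brackets with the quoted column name return the same data
identical(petal_length, iris[["Petal.Length"]])
```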
