Dataframes
In addition to reading in your own datasets, R comes with many datasets built in, so they are available as soon as R is installed. We will use the dataset called iris for these examples. Run the R code below to see what the iris dataset looks like.
# Print the entire iris dataset
iris
After running the code, you can see that the iris dataset has 150 observations (rows) and five variables (columns).
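Printing all 150 rows is not the only way to check these numbers. As a minimal sketch using base R functions, the dimensions can be queried directly and only the first few rows displayed:

```r
# Check the size of iris without printing every row
dim(iris)    # number of rows, then number of columns
nrow(iris)   # number of observations (rows)
ncol(iris)   # number of variables (columns)

# Show only the first six rows
head(iris)
```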
names() function
When doing exploratory data analysis, we often want to look at only one column of the dataset. The names() function lists the names of all of the variables in the dataset (the column titles).
# List the variable (column) names of iris
names(iris)
This function matters because whenever we reference one of these columns, we must type its name exactly as it appears in the output of names(). R is case-sensitive, so Petal.Length and petal.length are different names.
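The effect of case sensitivity can be checked against the output of names(). The sketch below is only an illustration; the lowercase spelling is a deliberately wrong name, not a real column:

```r
# The exact spelling is one of the column names
"Petal.Length" %in% names(iris)   # TRUE

# The lowercase spelling is not (hypothetical misspelling for illustration)
"petal.length" %in% names(iris)   # FALSE

# Asking a data frame for a column that does not exist gives NULL, not an error
is.null(iris$petal.length)        # TRUE
```

Because a misspelled column name returns NULL rather than stopping with an error, it is worth checking the spelling with names() first.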
str() function
You can also look at the structure of the dataset and all of its variables by using the str() function.
# Show the structure of iris: its class, its size, and the type of each column
str(iris)
The first line of output tells us that the iris dataset is a 'data.frame' in R with 150 observations and five variables. The remaining lines describe each variable. The four length and width variables (Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width) are numeric, and the Species variable is a factor (categorical) variable with three levels (categories): setosa, versicolor, and virginica.
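The same facts that str() reports can be pulled out one at a time. This is a small sketch with base R functions, not a replacement for str():

```r
# The class of the whole object
class(iris)            # "data.frame"

# The class of each column: four numeric columns and one factor
sapply(iris, class)

# The number of levels in each column (0 for columns that are not factors)
sapply(iris, nlevels)
```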
'$' notation
To access a single column of the dataset, we can use the '$' operator: type the name of the dataset, then '$', then the name of the column.
# Extract the Petal.Length column from iris
iris$Petal.Length
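The result of the '$' operator is an ordinary vector, so it can be stored and passed to other functions. A brief sketch follows; the variable name petal_length is just an illustrative choice:

```r
# Store the column in its own variable (the name is arbitrary)
petal_length <- iris$Petal.Length

is.numeric(petal_length)   # TRUE: the column is a numeric vector
length(petal_length)       # 150: one value per observation

# Summarize the column
mean(petal_length)
summary(petal_length)
```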