Stat 2000, Section 001, Homework Assignment 2 (15 Points)
(1/20/2017 - Due Friday 1/27/2017 by 10:20am)
- 0) Reading: Section 1.2
- 1) Please work on the following textbook exercises in Moore/McCabe/Craig:
Try to use a computer and software of your choice whenever possible.
Include printouts, screendumps, or photos of your results.
Note that StatCrunch does not allow you to produce back-to-back stemplots. Those have to be created by hand or by some alternative software.
- 2) Histogram vs Time Plot:
The file
http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/Yellowstone_2001_2010.csv
was created from data accessible at the US National Park Service (NPS) web site at
http://www.nature.nps.gov/stats/park.cfm
(this URL no longer exists).
This file contains the total monthly number of visitors at Yellowstone National Park (NP) over a 10-year period
(i.e., 120 months) from January 2001 (MonthSeq = 1) through December 2010 (MonthSeq = 120).
Try to use a computer and software of your choice whenever possible.
Include printouts, screendumps, or photos of your results.
This data file is in csv format. Such files can separate their data fields
in different ways, e.g., using commas, white space, or tabs. Here, commas are used.
To load these data correctly into StatCrunch, you have to change the "Delimiter"
to "comma" on the "Load data from WWW address" page.
(a) Make a histogram of the number of "Visitors" and describe the pattern and
any striking deviations that you see. Create meaningful titles and labels!
(b) Make a time plot of the number of visitors and describe its pattern
in terms of long-term trend and seasonal variation (if any of these is
applicable).
(c) Which of these two plots is better suited to explain the
number of visitors at Yellowstone NP from 2001 through 2010?
Justify your answer.
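For those working in Python rather than StatCrunch, the following is a minimal sketch of how the comma-delimited data could be read and how MonthSeq could be translated back into a calendar year and month. Only the column names MonthSeq and Visitors come from the description above. The visitor counts in SAMPLE_CSV are made up for illustration, and all function names are hypothetical. Yearly totals give a rough view of the long-term trend, while averages per calendar month give a rough view of the seasonal variation. Both rely on the time order of the data.

```python
import csv
import io
from collections import defaultdict

# Made-up monthly counts for two years (NOT the real Yellowstone data).
# Only the column names MonthSeq and Visitors are taken from the assignment.
MADE_UP_COUNTS = [
    20, 25, 15, 30, 200, 600, 800, 750, 450, 150, 10, 15,
    22, 27, 17, 33, 210, 620, 830, 770, 470, 160, 12, 18,
]
SAMPLE_CSV = "MonthSeq,Visitors\n" + "\n".join(
    f"{seq},{visitors}" for seq, visitors in enumerate(MADE_UP_COUNTS, start=1)
)


def read_visitors(text, delimiter=","):
    """Parse delimited text into a list of (MonthSeq, Visitors) integer pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [(int(row["MonthSeq"]), int(row["Visitors"])) for row in reader]


def calendar_position(month_seq, first_year=2001):
    """Map MonthSeq (1 = January of first_year) to (year, month 1..12)."""
    return first_year + (month_seq - 1) // 12, (month_seq - 1) % 12 + 1


def yearly_totals(rows):
    """Sum the visitors per calendar year (a rough look at the trend)."""
    totals = defaultdict(int)
    for seq, visitors in rows:
        year, _ = calendar_position(seq)
        totals[year] += visitors
    return dict(totals)


def monthly_means(rows):
    """Average the visitors per calendar month (a rough look at seasonality)."""
    sums, counts = defaultdict(int), defaultdict(int)
    for seq, visitors in rows:
        _, month = calendar_position(seq)
        sums[month] += visitors
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


rows = read_visitors(SAMPLE_CSV)
print(yearly_totals(rows))
print(monthly_means(rows))
```

To use this with the real file, download it first and pass its text to read_visitors. If the file contains additional columns, they are simply ignored by this sketch.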
- 3) Facebook Friends:
The file
http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/Survey_S2016_Population.xlsx
contains in-class survey data for the students in a previous Stat 2000 class. For this question, we are only interested
in the number of Facebook friends (FBFriends).
It is your task to determine whether men from that class tended to have more Facebook friends
than women (or at least, whether they may have claimed that they have). You should do this as follows:
- Create at least two relevant plots for each gender (four plots overall) and discuss the
shape, center, and spread of the data, based on these plots.
Make sure that all plots have titles, that axes are labeled, and so on.
- Create relevant numerical summaries. Justify which measure for the
center (mean or median) might be better suited here.
- Discuss whether there are outliers. Do this manually, using the numerical
summaries and see whether your software automatically displays outliers or whether
you need to change some settings to display outliers (if any exist).
- Write up your results on a computer as a report that states your conclusion (a total of two to three pages,
including all figures).
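As an illustration of the numerical summaries asked for above, here is a minimal Python sketch that computes the count, mean, and median separately for each group. The records are made up and are not taken from the survey file. The labels "M" and "F" and the function name summarize_by_group are assumptions for this example only. One group contains a single very large value to show how such a value pulls the mean away from the median.

```python
from statistics import mean, median

# Made-up (gender, FBFriends) records, NOT taken from the survey file.
records = [
    ("M", 150), ("M", 300), ("M", 420), ("M", 500), ("M", 2500),
    ("F", 200), ("F", 350), ("F", 400), ("F", 450), ("F", 600),
]


def summarize_by_group(pairs):
    """Return the count, mean, and median of the values for each group label."""
    groups = {}
    for label, value in pairs:
        groups.setdefault(label, []).append(value)
    return {
        label: {"n": len(values), "mean": mean(values), "median": median(values)}
        for label, values in groups.items()
    }


summary = summarize_by_group(records)
for label, stats in summary.items():
    print(label, stats)
```

In this made-up sample, the two measures agree for one group and differ considerably for the other. Which measure is better suited for the real data is for you to justify.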
- 4) Boxplot of the Number of Siblings:
The boxplot below shows the distribution of the number
of siblings for the students in a previous Stat 2000 class. The count includes
the student, i.e., a value of "1" represents a student with no
additional siblings.
The graph shows the default
boxplot produced by StatCrunch, i.e., the option
"Use fences to identify outliers" has not been set.
This means that outliers (in case they exist) have not
been marked in the usual way. All relevant numbers
shown in the boxplot are integers, i.e., whole numbers
such as 2, 7, or 9 (but not 1.9, 7.1, or 8.95).
Answer the questions below, based on the boxplot above.
-
Indicate the values for the five-number summary for the siblings and clearly name each of these values
(e.g., if variance is one of these numbers, then indicate "variance = ...").
Make sure to list the names and values in the correct order, as the first value
always represents the ..., etc.
1.)
2.)
3.)
4.)
5.)
-
First indicate the formula, and then calculate the interquartile range (IQR).
-
What is (approximately) the percentage of
students that have fewer than 3 siblings (including themselves)?
Answer: about _________ %
-
What is (approximately) the percentage of
students that have between 3 and 5 siblings (including themselves)?
Answer: about _________ %
-
When we compare the mean with the median, (i) the mean will be higher than the median,
(ii) both will be about the same, or (iii) the median will be higher than the mean.
Just circle the correct answer.
-
As indicated above, no fences have been used to identify outliers.
So, from this graphical representation, it is not clear whether outliers
exist in this data set.
Do the necessary calculation to determine
whether there are any outliers on the high side and circle "Yes" / "No"
where "Yes" means that there are outliers on the high side in this data set.
If "Yes", what is the smallest number of siblings
that would be an outlier on the high side?
Calculation:
Answer: _________ (or more) siblings would be an outlier on the high side
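The calculations in this question can also be checked with software. The following minimal Python sketch computes a five-number summary, the IQR, and the upper fence Q3 + 1.5 x IQR. The sibling counts are made up and do not correspond to the boxplot. The function names are hypothetical. The quartiles are computed as the medians of the lower and upper halves of the sorted data (leaving out the overall median when the number of observations is odd). Other software may use a slightly different quartile rule and therefore report slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; for an odd number of observations the overall median is left out
    of both halves. Assumes at least two observations.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]
    return data[0], median(lower), median(data), median(upper), data[-1]


def upper_fence(q1, q3):
    """Values above Q3 + 1.5 * IQR are flagged as outliers on the high side."""
    return q3 + 1.5 * (q3 - q1)


# Made-up sibling counts (including the student), NOT the class data.
sample = [1, 2, 2, 3, 3, 4, 4, 5, 6, 12]

minimum, q1, med, q3, maximum = five_number_summary(sample)
iqr = q3 - q1
fence = upper_fence(q1, q3)
high_outliers = [x for x in sample if x > fence]

print("Five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", iqr)
print("Upper fence:", fence)
print("Outliers on the high side:", high_outliers)
```

For whole-number data such as sibling counts, the smallest value that counts as an outlier on the high side is the first integer strictly above the upper fence.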
Note:
The same requirements hold as for Homework Assignment 1. See
http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/hw01.html
for details. As warned previously, you will start losing points if your homework submission is not stapled
and/or you list an incomplete name or forget your A number!