Stat 2000, Section 001, Homework Assignment 5 (15 Points)
(2/16/2017 - Due Friday 2/24/2017 by 10:20am)
- 0) Reading: Sections 2.4, 2.5 & 2.6
- 1) [5 points] Please work on the following textbook exercises in Moore/McCabe/Craig:
- Exercises 2.102, 2.103, 2.113, 2.121, 2.125, 2.139, 2.140, 2.141, 2.142, 2.143
Data: http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/infantgrowth.xls
Data: http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/sleep.xls
Data: http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/sleep_StatCrunch.csv
Data: http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/uscollegestudents.xls
Data: http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/uscollegestudents_StatCrunch.csv
Note: File names that contain the word StatCrunch contain the raw data for use in StatCrunch.
Similar file names without the word StatCrunch show the numbers for hand calculations.
When you load csv files into StatCrunch, you may have to adjust the "Delimiter" field,
as commas, tabs, or whitespace may be used in a data file. If the default delimiter does not
read in the data in a meaningful way, try something else. (Hint: csv stands for comma-separated
values, so guess what is frequently used as a delimiter in such files.)
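If you prefer to check a file's delimiter outside of StatCrunch, the following is a minimal Python sketch using the standard library's csv.Sniffer. The sample strings and the helper name guess_delimiter are made up for illustration; they are not taken from the course data files.

```python
import csv

def guess_delimiter(sample_text):
    """Guess the delimiter of a small text sample (hypothetical helper).

    Only commas, tabs, and semicolons are considered as candidates.
    """
    dialect = csv.Sniffer().sniff(sample_text, delimiters=",\t;")
    return dialect.delimiter

# Made-up samples, NOT the contents of the course files.
comma_sample = "name,calories,fat\nA,110,1\nB,120,2\n"
tab_sample = "name\tcalories\tfat\nA\t110\t1\nB\t120\t2\n"

print(repr(guess_delimiter(comma_sample)))
print(repr(guess_delimiter(tab_sample)))
```

To apply this to a downloaded file, read its first few lines into a string and pass that string to guess_delimiter.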
- 2) [10 points] Cereals and Calories:
The file
http://www.math.usu.edu/~symanzik/teaching/2017_stat2000/Cereals.csv
contains data for 77 different breakfast cereals. It was originally posted on StatCrunch
by butler@utsc.utoronto.ca on June 11, 2014.
For this exercise, we are primarily interested
in the question of how calories in cereals are affected by five other variables:
fat, sugar, vitamins, potassium, and rating.
You should do this as follows:
- Calculate the correlations between calories and the five other variables of interest.
Which are strongest and which are weaker? What are the directions?
- Create the scatterplots for calories (the response variable) and the five other variables of interest.
Based on these five plots, can we fit a regression line in each case? Justify your answer.
- For those plots where a regression line is meaningful, fit the regression line
and construct at least three meaningful residual plots for each of the lines you have fitted.
- Use your lines to make the following predictions of calories, using
2.5, 4, and 15 for fat;
5, 12.5, and 50 for sugars;
100, 250, and 500 for potassium;
25, 60, and 120 for vitamins; and
5, 30, and 70 for rating.
Are all of these meaningful predictions? Be specific and use the proper statistical terms.
- Carefully assess your residual plots. Do these plots suggest that your fitted regression
lines are suitable to describe the relationships? Be specific and indicate which of the
residual plots suggest that your fitted regression line is meaningful and which residual
plots are not very informative.
- Provide an overall assessment of how the five variables affect calories in cereals.
Which variables seem to have a strong effect on calories and which variables
seem to have a minor (possibly negligible) effect? Think carefully about the
relationship between calories and rating!
- Write a conclusive report of your results on a computer (a total of four to five pages,
including all figures). Look at the solutions of HW 2, Exercise 3, to see how to structure such
a report. Submitting just the computer output will result in at most 50% of the possible points
for this exercise.
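For orientation only, the correlation, least-squares line, and residuals asked for above can be sketched in plain Python as follows. The numbers are invented toy values, not the Cereals.csv data, and the function names are hypothetical; your software of choice will normally report the same quantities directly.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Sample correlation r between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares line y-hat = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def residuals(x, y, slope, intercept):
    """Observed y minus predicted y for every observation."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Invented toy values (assumption), NOT taken from Cereals.csv.
toy_x = [1, 3, 6, 9, 12]
toy_y = [70, 90, 100, 110, 130]

r = pearson_r(toy_x, toy_y)
slope, intercept = fit_line(toy_x, toy_y)
res = residuals(toy_x, toy_y, slope, intercept)

print("r =", round(r, 3))
print("slope =", round(slope, 3), "intercept =", round(intercept, 3))
print("residuals =", [round(e, 2) for e in res])
```

Plotting the residuals against the explanatory variable (or against the fitted values) gives one of the residual plots the exercise asks for.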
Try to use statistical software of your choice (StatCrunch, Excel, etc.)
whenever possible. If you can't solve a problem part with the software
of your choice, then answer that part via manual calculations.
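When making predictions from a fitted line, as in Exercise 2, it can help to check whether each requested value lies inside the range of the observed data. The sketch below assumes that a prediction outside that range counts as extrapolation; the observed values, slope, and intercept are invented placeholders, and the function names are hypothetical.

```python
def predict(x_new, slope, intercept):
    """Predicted response from a fitted line y-hat = intercept + slope * x."""
    return intercept + slope * x_new

def classify_prediction(x_new, observed_x):
    """Label a prediction by whether x_new lies inside the observed x range."""
    lo, hi = min(observed_x), max(observed_x)
    return "interpolation" if lo <= x_new <= hi else "extrapolation"

# Invented placeholder values (assumption), NOT results from Cereals.csv.
observed_x = [0, 1, 2, 3, 5]
slope, intercept = 10.0, 100.0

for x_new in (2.5, 4, 15):
    label = classify_prediction(x_new, observed_x)
    print(x_new, predict(x_new, slope, intercept), label)
```

A value flagged as outside the observed range still produces a number, but that number should be interpreted with caution in the report.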