HW 10 - Ann

Homework from the statistical packages

1) R

I carried out the analysis both with the point-and-click modules of R and with the code-driven version of Rweb, and obtained the same results.

Exercise 3: Statistics for height and weight

a) Summary statistics:


Height          Weight      
Median :69.00   Median :155.0 
Std Dev: 3.98   Std Dev: 28.9

b) Correlation between height and weight:


0.5379353
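The value above is the Pearson correlation coefficient, which is the default method of R's cor() function. The following is a minimal, library-free sketch of that calculation. The inputs in the usage line are made-up illustrative numbers, not the homework data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative inputs only: a perfectly linear relationship gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Python 3.10 and later also provide statistics.correlation, which computes the same coefficient.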

c) Scatterplot:

Exercise 4: Regression of weight on age

Coefficients:
            Estimate  Std. Error t value Pr(>|t|)
(Intercept)   15.231     38.134   0.399 0.691668
Age            6.269      1.645   3.812 0.000455

Residual standard error: 25.09 on 41 degrees of freedom
Multiple R-Squared: 0.2617, Adjusted R-squared: 0.2437
F-statistic: 14.53 on 1 and 41 degrees of freedom, p-value: 0.0004551
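The numbers in this output are linked. Each t value is the estimate divided by its standard error. With a single predictor, the overall F statistic equals the square of the slope's t value. The short check below uses only figures copied from the output. Small discrepancies come from the rounding of the printed estimates.

```python
def t_value(estimate, std_error):
    """t statistic for a coefficient: estimate divided by its standard error."""
    return estimate / std_error

# Figures copied from the Age row of the R output above
t_age = t_value(6.269, 1.645)
# With one predictor, the model F statistic is the slope's t value squared
f_from_t = 3.812 ** 2

print(round(t_age, 3))     # close to the reported 3.812 (inputs are rounded)
print(round(f_from_t, 2))  # 14.53, the reported F-statistic
```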

2) Rice Virtual Lab in Statistics:

I completed Exercise 2 as the homework for this package. None of the output windows allow their contents to be copied, so I report the results only briefly:


a) Weight: mean 159.86, median 155.00, sd 28.85
   Height: mean 69.116, median 69.00, sd 3.983

b) The predictor variable is age and the dependent variable is weight; R = 0.5115.

c) Eye color does not have a significant effect on weight (F = 1.00994, total df = 42, p = 0.3733).
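An ANOVA F test has two degrees-of-freedom values. The df = 42 above is consistent with 43 observations in total. Assume eye color has three groups; this is not stated in the output. The test would then have 2 and 40 degrees of freedom. When the numerator df is exactly 2, the upper-tail probability has the closed form P(F > f) = (1 + 2f/df2)^(-df2/2). This form lets us check the reported p-value without a statistics library:

```python
def f_pvalue_df1_2(f, df2):
    """Upper-tail p-value of an F(2, df2) statistic.

    Uses the closed form P(F > f) = (1 + 2f/df2) ** (-df2/2), which holds
    only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)

# Assumption: three eye-color groups among 43 subjects, so df = (2, 40)
p = f_pvalue_df1_2(1.00994, 40)
print(round(p, 4))  # 0.3733
```

The result matches the reported p = 0.3733, which supports the three-group assumption.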

3) Master solution to Webstats homework

The homework uses the labor force data set, available on the site as a sample data set under ‘data’, ‘sample data sets’, ‘labor force’.

The file describing the data set, on the help/documentation page, explains what it measures: the labor force participation rate of women in 19 cities in two years, 1968 and 1972.

To test for a difference in the measured variable between 1968 and 1972, we can use a paired t-test, since the same 19 cities are measured in both years. This is done by clicking ‘stat’, ‘t-test’, ‘paired’, which generates the results below. We conclude that there was a significant difference (at alpha = 0.05) in the labor force participation rate of women in the 19 cities between 1968 and 1972 (p = 0.0244). The participation rate was higher in 1972 (mean 0.53) than in 1968 (mean 0.49).


Difference    Delta0   Estimate      Std. Err.     DF
1968 - 1972   0        -0.03368421   0.013705561   18

Difference    Tstat        Pval
1968 - 1972   -2.4577038   0.0244
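A paired t-test works on the within-city differences, here 1968 minus 1972. The test statistic is the mean difference divided by its standard error, sd(differences)/sqrt(n), with n - 1 degrees of freedom. The sketch below illustrates the technique and is not Webstats' own code. The sample inputs are made-up illustrative values. The last line reproduces the reported t from the printed estimate and standard error.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t-test of mean(before - after) against 0.

    Returns (mean difference, standard error, t statistic, degrees of freedom).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    # Standard error of the mean difference: sample sd / sqrt(n)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_d, se, mean_d / se, n - 1

# Illustrative inputs only (not the labor force data)
print(paired_t([1, 2, 3], [2, 2, 5]))

# Reported t = Estimate / Std. Err. from the output above
t_reported = -0.03368421 / 0.013705561
print(round(t_reported, 4))
```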

The summary statistics of interest can be generated by clicking ‘stat’, ‘summary statistics’:

Variable   n    Mean        Variance      Std. Dev.    Median
1968       19   0.4931579   0.004622807   0.06799123   0.5
1972       19   0.5268421   0.005011696   0.07079333   0.53

Variable   Range   Min    Max    Q1     Q3
1968       0.29    0.34   0.63   0.45   0.54
1972       0.29    0.35   0.64   0.49   0.57

I chose a means plot because it shows the difference in the labor force participation rate of women between the two years clearly, although many other graphics would also have been suitable.

4) Statlets:

The 95% confidence interval for the mean is (68.2648, 70.7575).
We reject the null hypothesis that the mean equals 68 at the 5% level (p = 0.0186).
This agrees with the confidence interval: 68 does not fall within it.
Any hypothesized value m between 68.2648 and 70.7575 would not be rejected at the 5% level.
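The reasoning above uses the duality between a two-sided test at the 5% level and the 95% confidence interval. H0: mean = m is rejected exactly when m lies outside the interval. This holds when the test and the interval are built from the same distribution. The sketch below assumes the interval is symmetric, which is true for the usual t-based interval, and recovers the point estimate and margin of error from the endpoints.

```python
def ci_center_and_margin(lower, upper):
    """Point estimate and margin of error of a symmetric confidence interval."""
    return (lower + upper) / 2, (upper - lower) / 2

def rejected_at_5pct(m, lower, upper):
    """Two-sided 5% test of H0: mean = m, decided via the 95% CI."""
    return not (lower <= m <= upper)

lo, hi = 68.2648, 70.7575
center, margin = ci_center_and_margin(lo, hi)
print(round(center, 3), round(margin, 3))
print(rejected_at_5pct(68, lo, hi))    # True: 68 is outside the interval
print(rejected_at_5pct(69.5, lo, hi))  # False: 69.5 is inside the interval
```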

5) XploRe

1) Histogram and Boxplot:

Sibling:

library("XploRe")
x = readm("user")
z = x.double[ , 3]
library ("plot")
plothist(z)
plotbox(z)

Weight:

library("XploRe")
x = readm("user")
z = x.double[ , 5]
library ("plot")
plothist(z)
plotbox(z)
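A box plot draws the five-number summary: minimum, first quartile, median, third quartile and maximum. The sketch below computes these quantities in Python for comparison. It is not XploRe's plotbox, and it uses the default 'exclusive' method of statistics.quantiles. XploRe may use a different quartile rule, so Q1 and Q3 can differ slightly. The input values are illustrative only.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the quantities a box plot displays.

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other packages may use a different quartile convention.
    """
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, med, q3, data[-1]

# Illustrative values only
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```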

2) Regression:

library("XploRe")
x = readm("user")
y = x.double[ , 4|5]
y1 = y[ , 1]
y2 = y[ , 2]
library("stats")
{b, bse, bstan, bpval} = linreg(y1, y2)
library("plot")
plot(y)
regy=grlinreg(y)
plot(y, regy)
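linreg fits a straight line by ordinary least squares. The code above assumes that linreg(x, y) treats its first argument as the explanatory variable and its second as the response. The sketch below is a library-independent version of simple least squares, not XploRe's implementation. It returns the intercept, the slope and R^2 = 1 - SSE/SST. The test inputs are illustrative values.

```python
def simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation (SST) and residual sum of squares (SSE) of the fitted line
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - sse / sst

# Illustrative values only
print(simple_ols([1, 2, 3], [1, 3, 2]))
```

In simple regression R^2 is the squared correlation, so it does not depend on which variable is treated as the response.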

Output of regression:

ANOVA              SS          df   MSS         F-test   P-value
Regression         10118.023    1   10118.023   16.696   0.0002
Residuals          24847.140   41     606.028
Total Variation    34965.163   42     832.504

Multiple R = 0.53794
R^2 = 0.28937
Adjusted R^2 = 0.27204
Standard Error = 24.61763
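Every summary line of this output can be derived from the sums of squares and degrees of freedom in the ANOVA table. R^2 is SS(regression)/SS(total). Adjusted R^2 is 1 - MS(residual)/MS(total). F is MS(regression)/MS(residual). The standard error is the square root of MS(residual). The check below uses only the printed figures. The squared correlation from Exercise 3 (0.5379353) gives the same R^2, which suggests that columns 4 and 5 hold height and weight.

```python
import math

def anova_summary(ss_reg, ss_res, df_reg, df_res):
    """Derive R^2, adjusted R^2, F and the residual standard error from an ANOVA table."""
    ss_tot = ss_reg + ss_res
    df_tot = df_reg + df_res
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "R^2": ss_reg / ss_tot,
        "Adjusted R^2": 1 - ms_res / (ss_tot / df_tot),
        "F": ms_reg / ms_res,
        "Standard Error": math.sqrt(ms_res),
    }

# Sums of squares and degrees of freedom copied from the XploRe output above
summary = anova_summary(10118.023, 24847.140, 1, 41)
for name, value in summary.items():
    print(f"{name} = {value:.5f}")

# Squared height-weight correlation from Exercise 3
print(round(0.5379353 ** 2, 5))
```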