HW 10 - Weiping

Homework: Using Statistical Software on the Web

Rweb

Use of the JavaScript Version of Rweb

I started at http://www.math.montana.edu/Rweb/, then clicked on "JavaScript Version of Rweb".
I scrolled down on this page until I saw the Open Code Window button. After I clicked this button, a new window appeared. I enlarged this window to see all of its features and controls. The last line in this window read "Last Modified: ..."

In the field "Enter a dataset URL", enter

http://www.math.usu.edu/~vukasino/teaching/spring2000/complab/student_data1.prn 

Problem 1:

Type the following lines in the big input area:

median(X[,7])
median(X[,8])
var(X[,7])
var(X[,8])
cor(X[,7], X[,8])
plot(X[,7], X[,8], main="Height vs Weight")

X is the data matrix created from the data stored at the given URL. Column 7 is Height and column 8 is Weight. median(X[,7]) and median(X[,8]) give the medians of Height and Weight, and var(X[,7]) and var(X[,8]) give their variances. cor(X[,7], X[,8]) computes the correlation between Height and Weight. plot(X[,7], X[,8], main="Height vs Weight") creates a scatterplot of Height (horizontal axis) vs. Weight (vertical axis).

Now click on Submit. It takes a few seconds to calculate the results. Three new windows are available.

One window (without a name) contains the input and output produced when executing R: it shows which commands were executed internally, along with any warnings and error messages.

The window entitled "Rweb Analysis Output" contains
```
      

      [1] 69
      [1] 155
      [1] 15.86711
      [1] 832.5039
      [1] 0.5379353
```
which are, in order, the medians of Height and Weight, the variances of Height and Weight, and the correlation between them.
The window entitled "Rweb images" should contain the scatterplot of Height vs. Weight.
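To make the three kinds of numbers in Problem 1 concrete, here is a minimal Python sketch (Python rather than R, purely for illustration) of the same summaries: the median, the sample variance with the n - 1 denominator that R's var() uses, and the Pearson correlation that cor() computes by default. The height and weight values are hypothetical toy numbers, not the student_data1.prn data.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations over the root of the two squared-deviation sums."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (inches, pounds), NOT the course data set.
height = [64, 66, 68, 70, 72]
weight = [120, 140, 150, 160, 190]

print("medians:", statistics.median(height), statistics.median(weight))
# statistics.variance divides by n - 1, matching R's var().
print("variances:", statistics.variance(height), statistics.variance(weight))
print("correlation:", round(pearson_r(height, weight), 4))
```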

Problem 2:

Type the following lines in a new input area:

result <- lm(X[,8]~X[,3])

summary(result)

In the matrix X, column 8 is Weight and column 3 is Age. lm(X[,8]~X[,3]) fits a simple linear regression of Weight on Age, and the fitted model is assigned to a new variable, result. summary(result) produces the printed output.

Then click on Submit. The output window has the following output:

  
Call:
lm(formula = X[, 8] ~ X[, 3])

Residuals:
    Min      1Q  Median      3Q     Max 
-49.500 -14.462  -1.769   9.904  84.308 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   15.231     38.134   0.399 0.691668
X[, 3]         6.269      1.645   3.812 0.000455

Residual standard error: 25.09 on 41 degrees of freedom
Multiple R-Squared: 0.2617,	Adjusted R-squared: 0.2437 
F-statistic: 14.53 on 1 and 41 degrees of freedom,	p-value: 0.0004551 

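Since the p-value for Age (0.000455) is much less than 0.05, we conclude that the slope is significantly different from zero, i.e., Age is a significant predictor of Weight in the fitted model Y = 15.231 + 6.269X + e, where Y is Weight and X is Age.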
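The intercept, slope, standard error, and t value in the summary table come from ordinary least squares. A minimal Python sketch of simple linear regression, assuming one predictor and the usual n - 2 residual degrees of freedom, shows where these numbers come from. The data are hypothetical toy values, not the course data, and the p-value step (which needs the t distribution) is left out.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, se_b1, t_b1)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length samples with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = my - b1 * mx                   # intercept
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))    # "Residual standard error" with n - 2 df
    se_b1 = sigma / math.sqrt(sxx)      # standard error of the slope
    return b0, b1, se_b1, b1 / se_b1    # last item: t value for H0: slope = 0


# Hypothetical toy data, NOT student_data1.prn.
x = [64, 66, 68, 70, 72]
y = [120, 140, 150, 160, 190]
b0, b1, se, t = simple_ols(x, y)
print(f"intercept={b0:.3f} slope={b1:.3f} se={se:.3f} t={t:.3f}")
```

With a single predictor, the F-statistic is the square of the slope's t value; in the Rweb output above, 3.812 squared is about 14.53.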

Rice virtual lab in statistics - data analysis

First open http://www.ruf.rice.edu/~lane/rvls.html, and then click on "Analysis Lab" to activate the program. When the page is fully loaded, click on the "Analyze" button.

Using user-entered datasets

Clicking on Enter/Edit User Data opens a clipboard window. Now open the following file: http://www.math.usu.edu/~vukasino/teaching/spring2000/complab/student_data1.prn.

However, we cannot use this data directly, because it contains non-numerical values, and the Data Analysis Lab requires all data to be numeric. So we first have to recode the non-numerical variables (Gender, Eyecolor, Major), replacing their values with integer codes ranging from 1 to the number of levels of the variable.

To recode the data, open the file in a text editor that has a "Replace" function (such as WordPad) and recode the non-numerical variables as follows:

Gender:

1 - male
2 - female

Eyecolor:

1 - blue
2 - brown
3 - green

Major:

1 - Biology
2 - CompSci
3 - Chemistry
4 - Health
5 - Other
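If the Replace approach is tedious, the same recoding can be done with a short script. This Python sketch assumes the file is whitespace-separated, that the header row names the columns exactly Gender, Eyecolor, and Major, and that the labels are spelled as in the lists above; those spellings and the sample rows are assumptions, not taken from the actual file. An unexpected label raises a KeyError instead of silently producing a gap in the codes.

```python
# Codes from the lists above; column names and label spellings are assumed.
CODES = {
    "Gender": {"male": 1, "female": 2},
    "Eyecolor": {"blue": 1, "brown": 2, "green": 3},
    "Major": {"Biology": 1, "CompSci": 2, "Chemistry": 3, "Health": 4, "Other": 5},
}


def recode(lines):
    """Replace text labels by integer codes; keep the header row unchanged."""
    header = lines[0].split()
    out = [" ".join(header)]  # the Data Analysis Lab needs the variable-name row
    for line in lines[1:]:
        fields = line.split()
        if not fields:        # skip blank lines
            continue
        for i, name in enumerate(header):
            if name in CODES:
                fields[i] = str(CODES[name][fields[i]])  # KeyError on an unknown label
        out.append(" ".join(fields))
    return out


# Hypothetical rows for illustration only.
sample = [
    "Nr Gender Age Eyecolor Major",
    "1 male 20 blue Biology",
    "2 female 22 green Other",
]
print("\n".join(recode(sample)))
```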

After recoding the data, while still in the text editor, choose "Select All" from the "Edit" menu and then "Copy". Then switch to the Data Analysis Clipboard window and paste the data (by clicking on "Paste" in the "Data" menu at the top of the window). If this does not work, paste the data by pressing the "Control" and "v" keys at the same time. Do not delete the first row, which contains the names of the variables. Click on Accept Data; the clipboard window should disappear.

When the data are accepted, the names of the variables appear in the window on the right. In the selection menu on the left, any of the variables can be chosen as the dependent or predictor variable. However, only "Gender", "Eyecolor", and "Major" are listed as possible grouping variables: in fact, any variable whose values are consecutive integers from 1 up to its number of levels, with no gaps, is listed as a grouping variable.

Part 1:

Now choose "Height" as the dependent variable. Click on the Descriptive window to obtain summary statistics for Height. Click on the Box Plot, Histogram, and Stem-and-Leaf windows to get the box plot, histogram, and stem-and-leaf display, respectively. In the histogram window, note that the applet is interactive: you can change the bin width and the lower limit of the first bin. The histogram applet also automatically plots the cross-validation function, which estimates the quality of the currently displayed histogram; smaller values of the cross-validation function generally indicate a smaller approximation error.
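The applet's exact cross-validation formula is not given here. One standard choice for histograms is the least-squares cross-validation score J(h) = 2/((n - 1)h) - (n + 1)/((n - 1)h) * sum_j p_j^2, where h is the bin width and p_j is the fraction of observations falling in bin j; smaller J means a smaller estimated integrated squared error. The Python sketch below computes that score, assuming it is comparable to what the applet plots; the data are toy values.

```python
import math
from collections import Counter


def histogram_cv(data, width, start):
    """Least-squares cross-validation score for a histogram with the given
    bin width and left edge of the first bin (smaller is better).
    Bins are left-closed: a point on an edge goes to the bin on its right."""
    n = len(data)
    if n < 2 or width <= 0:
        raise ValueError("need at least 2 points and a positive bin width")
    counts = Counter(math.floor((x - start) / width) for x in data)  # bin index per point
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return 2 / ((n - 1) * width) - (n + 1) / ((n - 1) * width) * sum_p2


# Toy data; compare a few candidate bin widths with the first bin starting at 0.
data = [1, 2, 3, 4]
for w in (1, 2, 4):
    print(w, round(histogram_cv(data, w, 0), 4))
```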

Then go back to the selection menu, select "Weight" as the dependent variable, and repeat the same procedures as for "Height" above.

Part 2:

Go back to the selection menu. Select "Weight" as the dependent variable and "Age" as the predictor variable. Click on the Correlation/regression window to perform a regression analysis of "Weight" on "Age". A window with a scatterplot appears, and we can choose whether to show the regression line and the numerical results of the regression analysis. The results are the same as in Problem 2 of the Rweb section.

Part 3:

Again, go back to the selection menu. Select "Weight" as the dependent variable and "Eyecolor" as the grouping variable. Note that all the windows change and now offer separate analyses for each level of the grouping variable. The ANOVA window is labeled as an ANOVA with IV "Eyecolor" and DV "Weight". After clicking on this window, an ANOVA table appears; however, the table is blank. I tried several times, but the output table was still empty. When I asked Natascha about this, she said it would work in a more advanced browser.
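For reference, the empty table should have reported a one-way analysis of variance of Weight across the Eyecolor groups. The Python sketch below shows how the between-group and within-group sums of squares and the F ratio of such a table are formed; the group values are hypothetical toy numbers, and converting F to a p-value (which needs the F distribution) is omitted.

```python
def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, F) for a list of groups."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f


# Hypothetical toy values for two groups, for illustration only.
print(one_way_anova([[1, 2, 3], [4, 5, 6]]))
```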

Webstat

First open Webstat at http://www.stat.sc.edu/webstat/, and click the orange button "Click here to fire it up". The Webstat 2.0 window appears. Click on the Data button in the main menu, select Sample data sets, and then choose Labor Force. The data immediately fill the first 3 columns of the table.

Part 1:

The first column (variable) contains the names of the 19 cities. The second and third columns contain the labor force measurements for each city in 1972 and 1968, respectively.

Part 2:

Click on Stat in the main menu, select T statistics, and choose Paired. The Paired T window appears. Select 1972 as var1 and 1968 as var2; optionally, click Save the differences. Click on the Next button, and then select Hypothesis Test. The default null hypothesis is mean difference = 0, with a two-sided (not equal) alternative. The hypothesized mean difference can be changed interactively. Keeping the null at mean difference = 0 and clicking the Calculate button produces the following output in the Results window.

Two tailed Paired T-test results:

    Difference     Delta0   Estimate     Std. Err.     DF
    1972 - 1968    0        0.03368421   0.013705561   18

    Difference     Tstat       Pval
    1972 - 1968    2.4577038   0.0244

    Differences stored in column var4.

Since the p-value (0.0244) is less than 0.05, we reject the null hypothesis and conclude that the mean labor force values in 1972 and 1968 differ significantly at the 5% level.
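The paired t statistic is the mean of the city-by-city differences divided by its standard error, t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. Dividing the reported estimate by the reported standard error reproduces Webstat's value: 0.03368421 / 0.013705561 is about 2.4577, on 18 degrees of freedom. A minimal Python sketch, using toy numbers rather than the Labor Force data and leaving out the p-value step:

```python
import math
import statistics


def paired_t(a, b, delta0=0.0):
    """Paired t statistic for H0: mean(a - b) = delta0.
    Returns (mean difference, standard error, t, degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)   # sample SD of the differences / sqrt(n)
    return mean_d, se, (mean_d - delta0) / se, n - 1


# Toy paired measurements (hypothetical).
print(paired_t([5, 7, 9], [4, 5, 6]))
# Check against the Webstat output: estimate / std. err.
print(0.03368421 / 0.013705561)
```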

Part 3:

Click on Stat in the main menu again, and select Summary Stats. In the Summary Statistics window, select 1972 and 1968 as variables, and click on the Next button. Check all the statistics and click on the Calculate button. The following results appear in the Results window.

Summary Statistics:

    Variable   n    Mean        Variance      Std. Dev.    Median
    1972       19   0.5268421   0.005011696   0.07079333   0.53
    1968       19   0.4931579   0.004622807   0.06799123   0.5

    Variable   Range   Min    Max    Q1     Q3
    1972       0.29    0.35   0.64   0.49   0.57
    1968       0.29    0.34   0.63   0.45   0.54
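Most of these summaries are standard, but quartiles are not: software packages use different interpolation rules for Q1 and Q3, so the same data can give slightly different values, and this sketch makes no claim about which rule Webstat uses. As a side check, the difference of the two means, 0.5268421 - 0.4931579 = 0.0336842, equals the paired estimate from Part 2, as it must. The Python sketch below computes the same kind of table on toy data and shows two of the quartile conventions offered by the standard library's statistics.quantiles.

```python
import statistics


def summary(values, method="exclusive"):
    """Summary statistics similar to Webstat's table; the quartile rule is selectable."""
    q1, med, q3 = statistics.quantiles(values, n=4, method=method)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),   # n - 1 denominator
        "std_dev": statistics.stdev(values),
        "median": med,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "Q1": q1,
        "Q3": q3,
    }


toy = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(summary(toy, "exclusive"))   # Q1 = 2.5, Q3 = 7.5
print(summary(toy, "inclusive"))   # Q1 = 3.0, Q3 = 7.0
```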

Part 4:

Click on Graphics in the main menu, and select Pie Chart. In the Pie Chart window, select variable 1972, and click on the Next button. In the Labels window, type the chart title and click on the Next button. In the Colors window, select the background and foreground colors, and then click on the Create Graph button. The pie chart appears in the Graphics window. The pie chart for variable 1968 is obtained with similar steps.

Click on Graphics in the main menu, and select Histogram. In the Histogram window, select 1972 and 1968 as the variables, and click the Next button. We can change the starting point and class width of the histogram, and then click the Next button. In the Labels window, we can type the labels for the title, x-axis, and y-axis, and then click the Next button. In the Colors window, the background and foreground colors can be reset. Click the Create Graph button, and the histograms of 1972 and 1968 are shown in the same Graphics window. Using the same steps, but selecting Boxplot instead of Histogram, we get the boxplots of 1972 and 1968.

I think the Pie Chart, Histogram and Boxplot display the data best.

Statlet

Start at http://www.statlets.com/statletsindex.htm, and select the Internet Access button. Click on Menu Version under Version 1.1B. The STATLETS-Data statlet window opens.

1. Load the data.

Let's load data into this window. Open the data file linked from the Statlet lecture page, then use the Edit menu to select all and copy the data. Then go to the STATLETS-Data statlet window, click Clipboard, click inside the clipboard, and paste the data. Once the data are in the clipboard, click Data and select Clip-in; the data then fill the table in the Data statlet window.

As the problem instructs, find the value 103 in the Height column of this data table and replace it with 67.

2. Find the 95% confidence interval of height.

Click on Analyze in the main menu, select One Sample, and choose One Variable Analysis. The STATLETS-One Variable Analysis 2 window opens. Select Height as the sample data and click on the t-test button. The following output appears.

Estimation of Population Mean for height

    Sample size = 45
    Mean = 69.5111
    95.0% confidence interval for mean: 69.5111 +/- 1.24635 [68.2648,70.7575]

3. Do a t-test of H₀: m = 68 inches.

In the output window from part 2, click on the Options button. In the Hypothesis Test Options window, change the null hypothesis from 0.0 to 68.0. The alternative hypothesis is still "not equal" and alpha is still 5.0%. Then click on OK to obtain the following output:

Estimation of Population Mean for height

    Sample size = 45
    Mean = 69.5111
    95.0% confidence interval for mean: 69.5111 +/- 1.24635 [68.2648,70.7575]

    t-test
    ------
    Null hypothesis: mean = 68.0
    Alt. hypothesis: not equal
    Computed t-statistic = 2.44349
    P-value = 0.0186264
    Reject the null hypothesis for alpha = 0.05

    Statistical Interpreter
    -----------------------
    This table displays the result of a t-test performed to test the null hypothesis that the mean of the population from which the sample data come equals 68.0 versus the alternative hypothesis that the mean is not equal to 68.0. Since the P-value for this test is less than 0.05, we can reject the null hypothesis at the 95.0% confidence level. Also shown is a 95.0% confidence interval for the population mean. In repeated sampling, 95.0% of all such intervals will contain the true mean.

Since 68 lies outside the 95% confidence interval [68.2648, 70.7575], we reject the null hypothesis at the 5% level, in agreement with the P-value 0.0186 < 0.05.

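The t statistic and the confidence interval use the same standard error, so the two Statlet results can be cross-checked. Back-solving the standard error from the reported t value, and using the tabulated critical value t_{0.975,44} ≈ 2.0154 for n - 1 = 44 degrees of freedom, reproduces the reported half-width 1.24635 up to rounding:

```latex
\begin{aligned}
t &= \frac{\bar{x} - m_0}{s/\sqrt{n}} = \frac{69.5111 - 68}{s/\sqrt{45}} = 2.44349
  \quad\Longrightarrow\quad \frac{s}{\sqrt{45}} \approx \frac{1.5111}{2.44349} \approx 0.6184,\\
\text{half-width} &= t_{0.975,\,44}\,\frac{s}{\sqrt{45}} \approx 2.0154 \times 0.6184 \approx 1.2463 .
\end{aligned}
```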
4. For what value of m would we not reject at 5%?

Since the 95% confidence interval is [68.2648, 70.7575], we would not reject H₀: m = m₀ at the 5% level for any value m₀ within this interval.
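The answer to item 4 relies on the duality between a two-sided test and a confidence interval: a two-sided t-test at level alpha fails to reject H₀: m = m₀ exactly when m₀ lies inside the 100(1 - alpha)% t-interval. A minimal Python check of that equivalence, using the mean from the Statlet output together with a standard error (about 0.6184) and critical value (about 2.0154) that are back-solved approximations rather than Statlet output:

```python
MEAN = 69.5111   # sample mean from the Statlet output
SE = 0.6184      # approximate standard error, back-solved from t = 2.44349
T_CRIT = 2.0154  # approximate 0.975 quantile of the t distribution with 44 df


def rejects(m0, mean=MEAN, se=SE, t_crit=T_CRIT):
    """Two-sided t-test decision at the 5% level for H0: m = m0."""
    return abs(mean - m0) / se > t_crit


def in_interval(m0, mean=MEAN, se=SE, t_crit=T_CRIT):
    """Is m0 inside the 95% interval mean +/- t_crit * se?"""
    return mean - t_crit * se <= m0 <= mean + t_crit * se


for m0 in (68.0, 69.0, 70.7, 70.8):
    print(m0, "reject" if rejects(m0) else "do not reject", in_interval(m0))
```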

XploRe

First, download the XploRe package from http://www.XploRe-stat.de/.

Click the XploRe icon to open the XploRe window. Click on the Program button and select New; a new editor window opens. Open the data at http://www.math.usu.edu/~vukasino/teaching/spring2000/complab/student_data1.prn and copy it into the new editor window in XploRe. Delete the first line, which contains the names of the variables; the remaining data are in matrix form. Then save the data as user.dat.

Open another new editor window by clicking on Program and selecting New again. Since this data set includes both numerical and text data, we use readm to read the mixed data set.

Type the following lines into the new editor area:

library("XploRe")
x = readm("user")
x

Click on Execute to get the output. The output window shows x.type, x.double, and x.text: the original data set has been divided into two parts, with the numerical data stored as type double and the character data as type text. x.double contains five variables, in order: nr, age, sibling, height, weight.
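The split that readm performs can be pictured with a small Python analogue. This is not XploRe's implementation: it simply assumes that a column counts as numeric when every entry parses as a number, and it returns the numeric and text columns separately, much like x.double and x.text. The rows are hypothetical.

```python
def split_mixed(rows):
    """Split whitespace-separated rows into numeric columns and text columns.
    Returns (types, double_rows, text_rows); types[i] is "double" or "text"."""
    table = [r.split() for r in rows if r.strip()]
    ncol = len(table[0])

    def is_number(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    # A column is numeric only if every row's entry in it parses as a number.
    types = ["double" if all(is_number(r[i]) for r in table) else "text"
             for i in range(ncol)]
    double_rows = [[float(r[i]) for i in range(ncol) if types[i] == "double"] for r in table]
    text_rows = [[r[i] for i in range(ncol) if types[i] == "text"] for r in table]
    return types, double_rows, text_rows


types, dbl, txt = split_mixed(["1 male 20 blue", "2 female 22 green"])
print(types)  # ['double', 'text', 'double', 'text']
print(dbl)    # [[1.0, 20.0], [2.0, 22.0]]
print(txt)    # [['male', 'blue'], ['female', 'green']]
```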

Problem 1. Click on Program and select New to open a new editor window. Type the following lines in this new editor window.

library("xplore")
x = readm("user")
z = x.double[ , 3]
library ("plot")
plothist(z)
plotbox(z)

The first two lines load the library and read the mixed data. z = x.double[ , 3] selects the third column (sibling) of the numerical data matrix x.double and assigns it to the vector z. We then load the library "plot" and use the functions plothist and plotbox to draw the histogram and boxplot of sibling.

Click the Execute button. The histogram and boxplot of sibling are shown in the display windows.

A similar program plots the histogram and boxplot of weight, which is the fifth column of the matrix x.double. Open a new editor window and type the following lines in it.

library("xplore")
x = readm("user")
u = x.double[ , 5]
library ("plot")
plothist(u)
plotbox(u)

Click the Execute button. The histogram and boxplot of weight are shown in the display windows.
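A boxplot is drawn from the quartiles, and outlying points are commonly flagged with Tukey's 1.5 x IQR fences. Whether XploRe's plotbox uses exactly this rule and this quartile definition is an assumption here; the Python sketch below, on toy data, shows the usual computation.

```python
import statistics


def tukey_box(values, k=1.5):
    """Quartiles, whisker ends and outliers using Tukey's k * IQR fences."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whiskers": (min(inside), max(inside)),  # most extreme points within the fences
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


print(tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))  # toy data with one large value
```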

Problem 2. Open a new editor window and type the following lines in it.

library("XploRe")
x = readm("user")
y = x.double[ , 4|5]
y1 = y[ , 1]
y2 = y[ , 2]
library("stats")
{b, bse, bstan, bpval} = linreg(y1, y2)
library("plot")
regy=grlinreg(y)
plot(y, regy)

The first two lines read the data. y = x.double[ , 4|5] selects the fourth column (height) and fifth column (weight) of the matrix x.double and assigns the two columns to the matrix y. y1 = y[ , 1] and y2 = y[ , 2] assign the first column (height) and the second column (weight) of y to the vectors y1 and y2, respectively. Next we load the library "stats" and perform the linear regression: linreg(y1, y2) regresses y2 (weight) on y1 (height), so y1 is the predictor variable and y2 is the dependent variable. {b, bse, bstan, bpval} = linreg(y1, y2) stores the results: b holds the estimated coefficients, bse their standard errors, bstan the standardized coefficients, and bpval the p-values. To draw the scatterplot and regression line, we first load the library "plot". regy = grlinreg(y) assigns the regression line of y2 on y1 to regy, and plot(y, regy) plots the scatterplot of y together with the regression line regy.

Click Execute to obtain the following output:

Contents of out

[ 1,] ""
[ 2,] "A  N  O  V  A                   SS      df     MSS  F-test P-value"
[ 3,] "_________________________________________________________________________"
[ 4,] "Regression                 10118.023     1 10118.023 16.696 0.0002"
[ 5,] "Residuals                  24847.140    41   606.028"
[ 6,] "Total Variation            34965.163    42   832.504"
[ 7,] ""
[ 8,] "Multiple R      = 0.53794"
[ 9,] "R^2             = 0.28937"
[10,] "Adjusted R^2    = 0.27204"
[11,] "Standard Error  = 24.61763"
[12,] ""
[13,] ""
[14,] "PARAMETERS         Beta         SE         StandB   t-test   P-value"
[15,] "________________________________________________________________________"
[16,] "b[ 0,]=       -109.4509      66.0171       0.0000    -1.658  0.1049"
[17,] "b[ 1,]=          3.8965       0.9536       0.5379    4.086   0.0002"



The p-value 0.0002 is much less than 0.05, so we conclude that the slope on height is statistically significant.
The fitted regression equation is Y = -109.4509 + 3.8965 X + e, where Y is weight and X is height.
The scatterplot and regression line are shown in the plot display window.
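The entries of the XploRe ANOVA table are tied together by standard identities, which allows a quick consistency check. The Python lines below recompute R^2, adjusted R^2, F, and the standard error from the printed sums of squares, taking n = 43 observations because the residual degrees of freedom are 41 with two estimated coefficients.

```python
import math

# Sums of squares and residual df copied from the XploRe output above.
ss_reg, ss_res, df_res = 10118.023, 24847.140, 41
ss_tot = ss_reg + ss_res          # "Total Variation"
n = df_res + 2                    # two coefficients: intercept and slope

r2 = ss_reg / ss_tot
adj_r2 = 1 - (ss_res / df_res) / (ss_tot / (n - 1))
f_stat = (ss_reg / 1) / (ss_res / df_res)   # 1 df for the single predictor
std_err = math.sqrt(ss_res / df_res)

print(f"R^2={r2:.5f} adjR^2={adj_r2:.5f} F={f_stat:.3f} SE={std_err:.5f} R={math.sqrt(r2):.5f}")
```

These values match the printed R^2 = 0.28937, adjusted R^2 = 0.27204, F = 16.696, standard error 24.61763, and multiple R = 0.53794. The multiple R also equals the Height/Weight correlation 0.5379353 from Rweb Problem 1, and SS_tot/(n - 1) = 832.504 equals var(X[,8]) there, as expected for a regression on a single predictor.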


Weiping Deng
5/30/00