Stat 5810, Applied Spatial Statistics

Project 2 (11/27/00)

30% of your course grade - Due Mon 12/11/00 5pm

In this project, you have to show that you can handle a full spatial data set. You should include a visualization, an exploration, and a modeling component, using ArcView/XGobi and S-Plus/SpatialStats.

You have to write a full report on your analysis and your results. Describe the techniques you are using, summarize meaningful results, and include useful graphics (please no more than 6 graphics per printed page). Overall, the main part of your report must not exceed 20 pages. However, you should include printouts of S-Plus sessions, S-Plus code you have developed, intermediate graphics that led you to a particular assumption, etc. in an appendix.

You should work in the same 2 groups as for Project 1.

Group A: Analyze the "South American climate" data presented in Bailey/Gatrell. Some particular questions are: Do you end up with the same clusters as presented in the book when using the grand tour in XGobi (do NOT include the PCA results in your exploration)? Are there any unusual sites, i.e., spatial outliers, for some of the variables? Then concentrate on the average annual temperature (and average annual precipitation) and use at least 3 different approaches to predict temperature (and precipitation) at any site in South America that is located below 200 metres above sea level (co-kriging would be an option but is not necessarily required).

Test your 3 approaches in the following way: Predict the value for each known location, calculate the residuals, and plot residual maps. Also, calculate the mean squared error and the maximum absolute error. Does one approach work better for temperature and another approach work better for precipitation?
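
As a rough illustration of the error measures named above, the following sketch computes residuals, the mean squared error, and the maximum absolute error for one prediction approach. It is written in Python rather than S-Plus and is not required code. The function name error_summary, the sign convention (observed minus predicted), and the example numbers are assumptions made for this illustration only; they are not taken from the data set.

```python
from typing import List, Sequence, Tuple


def error_summary(
    observed: Sequence[float], predicted: Sequence[float]
) -> Tuple[List[float], float, float]:
    """Return (residuals, mean squared error, maximum absolute error)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one location is needed")

    # Residual at each known location: observed minus predicted (assumed convention).
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Mean squared error: average of the squared residuals.
    mse = sum(r * r for r in residuals) / len(residuals)

    # Maximum absolute error: largest residual in absolute value.
    max_abs = max(abs(r) for r in residuals)
    return residuals, mse, max_abs


# Made-up values for three sites, for illustration only.
observed = [20.0, 25.0, 18.0]
predicted = [21.0, 24.0, 18.0]
residuals, mse, max_abs = error_summary(observed, predicted)
print(residuals, mse, max_abs)
```

Running the same function once per approach and per variable gives directly comparable numbers, and the returned residuals can be joined back to the site coordinates for the residual maps.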

Group B: Use the precision agriculture "PrecAg1" data set from Gotway/Hartford that is already available in ArcView/XGobi. Note that this data set contains many missing observations and could be treated as spatially continuous data or as area data. Some particular questions are: Are there any unusual sites, i.e., spatial outliers, for yield or nitrate? Then use at least 3 different approaches to predict yield (and nitrate) at any location in the field (co-kriging would be an option but is not necessarily required).

Test your 3 approaches in the following way: Predict the value for each known location, calculate the residuals, and plot residual maps. Also, calculate the mean squared error and the maximum absolute error. Does one approach work better for yield and another approach work better for nitrate? Due to the lattice-like structure of the data, does median polish or the removal of any obvious outlier have any effect on the errors?
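
For orientation, median polish on a lattice can be sketched as follows: row medians and column medians are swept out of the table in turn until the sweeps no longer change anything, leaving an overall value, row effects, column effects, and residuals. The sketch is in Python rather than S-Plus and is only an illustration. The function name median_polish, its parameters max_iter and tol, and the small grid are assumptions made for this example, and the code assumes a complete rectangular grid, so the missing observations in the real data would need separate handling.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_polish(
    table: Sequence[Sequence[float]], max_iter: int = 10, tol: float = 1e-9
) -> Tuple[float, List[float], List[float], List[List[float]]]:
    """Decompose a complete grid into overall + row + column + residual."""
    resid = [list(map(float, row)) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Row sweep: move each row's median from the residuals into the row effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
        # Re-centre the column effects and absorb their median into the overall value.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Column sweep: same idea, column by column.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-centre the row effects.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

        # Stop once a full pass removed (almost) nothing.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    return overall, row_eff, col_eff, resid


# Made-up 3 x 3 grid that is exactly additive, for illustration only.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
overall, row_eff, col_eff, resid = median_polish(grid)
print(overall, row_eff, col_eff, resid)
```

The residuals returned here are what would be passed on to a prediction approach in place of the raw values, with the overall value and the row and column effects added back after prediction.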

P.S.: If you find any additional literature on either of the 2 data sets, please provide me with a copy. Thanks.