Group 1: Analysis of Multiple Types of Events
For information on how to extract the data, click here.
INTRODUCTION
Here we present an example of analyzing a dataset that has two or more different types of 'events'. In this situation, we are interested in assessing the relationship between the patterns of the types of events, and in particular, whether they are independent.
For the example we have used the dataset presented in Bailey and Gatrell (see section 4.2, Analysis of Multiple Types of Events) regarding the locations of property theft offenses perpetrated by white and black people in Oklahoma City in the late 1970s. Thus, the different types of events here are the offenses committed by white people and those committed by black people.
The analyses that we present will help us determine whether white offenses and black offenses exhibit different spatial patterns and whether these patterns are related (i.e. dependent on each other) in any way. It is possible to imagine that the patterns may in fact be non-independent. The two types of events may be negatively correlated (i.e. they exhibit repulsion) if, for example, white and black residences are negatively correlated and the offenders commit crimes close to home. Conversely, the events may be positively correlated (i.e. they show attraction) if certain areas of the town are 'attractive' to thieves and thus both black and white thieves commit their crimes in similar areas.
VISUAL DISPLAY OF DATA
We first plotted the data to visually examine the distributions for patterns:
POSITIONS OF BLACK AND WHITE OFFENSES
Black offenses = red circles
White offenses = yellow circles
BASIC ANALYSES: CHI SQUARE
It is possible to test for independence between the spatial distributions of two different types of events by determining whether each event type occurred or did not occur in each of a number of quadrats placed randomly or regularly over the region. This 'presence-absence' information can then be presented in a table as such:
                          Black offenses
                      Absence      Presence
                    _________________________
                    |           |           |
           Absence  |    c11    |    c12    |
 White              |___________|___________|
 offenses           |           |           |
           Presence |    c21    |    c22    |
                    |___________|___________|
We could then test this table against a chi-square distribution using the standard chi-square statistic. This test is inferior to other approaches in that it does not use the data effectively: by recording only the presence or absence of offenses in each quadrat, we lose information on the intensity, or number of counts, per quadrat. In addition, the size of the quadrat chosen can influence the analysis.
To see the code you would use to analyze the data with a chi-square test,
go to the Chi-square code page.
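Independently of that code page, the computation itself is small. The sketch below, in Python rather than Splus, computes the Pearson chi-square statistic for a 2x2 presence/absence table; the counts are invented for illustration and are not the Oklahoma City values.

```python
# Hypothetical quadrat counts (NOT the Oklahoma City data): rows are
# white offenses (absent, present), columns are black offenses
# (absent, present).
table = [[13, 7],
         [6, 24]]

def chi_square_2x2(t):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row = [sum(t[0]), sum(t[1])]                  # row totals
    col = [t[0][0] + t[1][0], t[0][1] + t[1][1]]  # column totals
    n = row[0] + row[1]                           # grand total
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (t[i][j] - expected) ** 2 / expected
    return stat

print(round(chi_square_2x2(table), 3))  # 10.314 for these made-up counts
```

The statistic is compared with the chi-square distribution on 1 degree of freedom (critical value 3.84 at the 5% level), exactly as the text describes.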
MORE POWERFUL ANALYSES
Due to the problems with the chi-square analysis, we went on to use some more powerful tools.
When assessing multivariate data, the use of a set of nearest neighbor distribution functions, Gij(h), can be illuminating. Gij(h) is the probability that the distance from a randomly chosen type i event (e.g. a white offense) to the nearest event of type j (e.g. a black offense) is less than or equal to h. If the distributions of our two event types are independent, then the distribution of nearest neighbor distances to type j events should be the same whether the origin of measurement is a type i event or a randomly chosen point. Thus, we can compare Gij(h) to Fj(h), where Fj(h) is the probability that the distance from a randomly chosen point to the nearest event of type j is less than or equal to h. This is done by plotting both Gij(h) and Fj(h) against h on the same plot. Similar curves for the two estimated distributions will indicate independence between our event types, whereas differences between the distributions will indicate non-independence.
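To make the definition concrete, here is a minimal Python sketch of the empirical Gij(h): collect the distance from each type i event to its nearest type j event, then take the fraction of those distances that are at most h. The coordinates are toy values for illustration, not the offense data.

```python
import math

def nn_cross_distances(pts_i, pts_j):
    """Distance from each type-i event to its nearest type-j event."""
    return [min(math.dist(p, q) for q in pts_j) for p in pts_i]

def g_ij(pts_i, pts_j, h):
    """Empirical Gij(h): fraction of nearest-neighbor distances <= h."""
    d = nn_cross_distances(pts_i, pts_j)
    return sum(x <= h for x in d) / len(d)

# Toy coordinates, not the Oklahoma City data.
white = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
black = [(0.0, 0.5), (3.0, 0.0)]
print(g_ij(white, black, 1.0))  # 2/3: the three distances are 0.5, ~1.118, 1.0
```

The empirical Fj(h) is the same computation with `pts_i` replaced by a set of randomly chosen points in the study region, which is what makes the two curves directly comparable on one plot.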
F-HAT AND G-HAT ANALYSIS
Start Splus:

library(splancs, first = T)
bkpts1 <- matrix(scan("/home/ssa9/stats/project1/blackdata"), 147, 5, byrow = T)
bkpts1[1,]
bkptsx <- bkpts1[,2]
bkptsy <- bkpts1[,3]
blkpts <- as.points(bkptsx, bkptsy)
whpts1 <- matrix(scan("/home/ssa9/stats/project1/whitedata"), 104, 5, byrow = T)
whpts1[1,]
whptsx <- whpts1[,2]
whptsy <- whpts1[,3]
whtpts <- as.points(whptsx, whptsy)

Description for the program 'findngh':
The program takes each row of obj1 and adds it as the first row to a matrix that otherwise contains obj2. It then computes the distances between the points in the matrix using the 'dist' function. Since we are interested only in the distances from the first row (containing the x,y of the obj1 point) to the other points, we take the first nrow(obj2) distances and find the shortest one using the 'min' function. The result is added to a vector.

findngh <- function(obj1, obj2)
{
    count <- 0
    for (i in 1:nrow(obj1))
    {
        xyb <- obj1[i,]
        xy <- c(xyb[1], obj2[,1], xyb[2], obj2[,2])
        xymat <- matrix(xy, nrow(obj2) + 1, 2)
        if (count == 0)
        {
            smalldist <- min(dist(xymat)[1:nrow(obj2)])
            count <- 1
        }
        else
            smalldist <- c(smalldist, min(dist(xymat)[1:nrow(obj2)]))
    }
    return(smalldist)
}
g12hat <- function(obj1, obj2)
{
    # sorted nearest-neighbor distances from obj1 events to obj2 events
    all.dists <- sort(findngh(obj1, obj2))
    n <- length(all.dists)
    ghat <- (1:n)/n
    plot(all.dists, ghat, xlab = "Distance", ylab = "Ghat")
    return(cbind(dist = all.dists, Ghat = ghat))
}
par(mfrow = c(2,2))

gbw <- g12hat(blkpts, whtpts)
title("Ghat: from black to white")
fw <- Fhat(whtpts)
pointmap(gbw, col = 3, add = T)
## Points out of bounds X= 75.3923 Y= 0.986395
## Points out of bounds X= 77.8974 Y= 0.993197
## Points out of bounds X= 101.1385 Y= 1
## Warning messages:
##   pointmap: plot type not square in: pointmap(gbw, col = 3, add = T)
title("Fhat vs. Gbwhat")

gwb <- g12hat(whtpts, blkpts)
title("Ghat: from white to black")
fb <- Fhat(blkpts)
pointmap(gwb, col = 3, add = T)
## Points out of bounds X= 76.5506 Y= 0.990385
## Points out of bounds X= 94.8472 Y= 1
## Warning messages:
##   pointmap: plot type not square in: pointmap(gwb, col = 3, add = T)
title("Fhat vs. Gwbhat")
The output of this analysis should be the four plots titled above. [figure not available]
THE CODE:
BIVARIATE LHAT AND SIMULATION ENVELOPES
For a dataset based on spatial locations, analyses based on the K function can be more powerful than analyses based on nearest neighbor distances. This is because the Lhat plots (derived from K functions) i) show how the black and white crime patterns, considered separately, depart from spatial randomness, and ii) show the tendency for black and white crimes to occur together (attraction: positive peaks in the plot) or further apart (repulsion: negative troughs in the plot).
Let's look briefly at the cross-K function.
Since Kii(h) = the univariate K function for white crimes (i), and
Kjj(h) = the univariate K function for black crimes (j),
then Kij(h) = the cross-K function.
When used for analysis, we have
lambda_j * Kij(h) = E(number of black crimes within distance h of an arbitrary white crime),
where lambda_j = the intensity of black crime.
Under independence, Kij(h) will equal pi * h^2. But if white and black crimes tend to be further apart (i.e. negative correlation, or repulsion), then Kij(h) will be less than pi * h^2, and if white and black crimes tend to be close together (i.e. positive correlation, or attraction), then Kij(h) will be greater than pi * h^2.
Simulation envelopes are used to demonstrate how the Khat function (the estimate of Kij(h)) departs from its theoretical value. As the relationship between the two patterns is not affected by the spatial randomness of either pattern on its own, we do not compare the distributions to a CSR pattern. Instead, we simulate the entire point pattern by randomly shifting all of the black crimes relative to all of the white crimes.
CONCLUSIONS: what do you think? Then, click here for answers
The first command to enter is:

library(splancs, first = T)

We use first = T so that if the Splus Spatial Module is already loaded, Splancs will be accessed first. This is necessary because both the Splus Spatial Module and Splancs have functions called "bbox", and if we don't specify that Splancs must be loaded first, Splus will use the Spatial Module version of bbox when we really want the Splancs version.

Next, we load the data into three different matrices: one with the black data, one with the white data, and one with both data sets together. These are simple Splus functions.

white1 <- matrix(scan('/home/ssa5/project1/whitepts'), 104, 5, byrow = T)
black1 <- matrix(scan('/home/ssa5/project1/blackpts.dat'), 147, 5, byrow = T)
alldata <- matrix(scan('/home/ssa5/project1/alldata'), 251, 5, byrow = T)

Then we need to get the data into data structures that Splancs can use, primarily points data sets and a polygon data set. polypts is the bounding box for the two data sets together. hvector is the vector of nearest neighbor distances between points in the two data sets, and sort puts them into ascending order.

whitepts <- as.points(white1[,2], white1[,3])
blackpts <- as.points(black1[,2], black1[,3])
polypts <- bbox(as.points(alldata[,2], alldata[,3]))
hvector <- sort(nndistF(whitepts, blackpts))

Next we calculate the cross-K with an Splancs function, and then the bivariate Lhat.

kvector <- k12hat(whitepts, blackpts, polypts, hvector)
lvector <- sqrt(kvector/pi) - hvector

We then calculate the simulation envelopes with a toroidal shift using another Splancs function. This was done with 20 simulations. upper and lower are the upper and lower bounds of the simulation envelope for the bivariate Lhat.

K12env <- Kenv.tor(whitepts, blackpts, polypts, 20, hvector)
upper <- sqrt(K12env$upper/pi) - hvector
lower <- sqrt(K12env$lower/pi) - hvector

Finally, we plot the estimated bivariate Lhat and the simulation envelope.

plot(hvector, lvector, type = "l", ylim = c(-40,40), xlab = "h", ylab = "Bivariate Lhat")
lines(hvector, upper, type = "l", lty = 2)
lines(hvector, lower, type = "l", lty = 2)
title(main = "Estimated Bivariate Lhat and Simulation Envelopes")

Splancs functions used:
as.points
nndistF
k12hat
Kenv.tor
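The toroidal shift behind Kenv.tor is easy to picture: translate one whole pattern by a random vector and wrap anything that falls outside the bounding box back in from the opposite side, preserving the pattern's internal structure while randomizing its position relative to the other pattern. A minimal Python sketch, with placeholder box limits and points:

```python
import random

def toroidal_shift(pts, xmin, xmax, ymin, ymax):
    """Shift every point by one random vector, wrapping around the
    bounding box (a torus), so the pattern's internal structure is kept
    but its position relative to the other pattern is randomized."""
    w, ht = xmax - xmin, ymax - ymin
    dx, dy = random.uniform(0, w), random.uniform(0, ht)
    return [(xmin + (x - xmin + dx) % w, ymin + (y - ymin + dy) % ht)
            for x, y in pts]

random.seed(1)  # reproducible illustration
pts = [(0.1, 0.9), (0.5, 0.5)]  # placeholder coordinates
shifted = toroidal_shift(pts, 0.0, 1.0, 0.0, 1.0)
# Each simulation recomputes Khat against the shifted pattern; the
# envelope is the pointwise min/max over, e.g., 20 such shifts.
```

Repeating this shift-and-recompute step and taking the extremes of the simulated Khat values is what produces the upper and lower envelope curves plotted above.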
Thus, we get the plot of the estimated bivariate Lhat with its simulation envelopes. [figure not available]
What about edge effects? Edge effects should be considered in analyses such as the ones presented here. Due to time considerations, we have not discussed them in class in detail, but click here for information regarding edge effects.