Hierarchical Clickable Micromaps in the GPL Environment for the Communication of Geographically Referenced Statistical Data

Contract (# 8W-1712-NTEX) between
the U.S. Environmental Protection Agency
and George Mason University

Final Report

Period of Performance: 7/15/98 - 5/31/99

Principal Investigator: Jürgen Symanzik
George Mason University
Center for Computational Statistics 4A7
Fairfax, Virginia 22030
Phone: (703) 993 - 3786
FAX : (703) 993 - 1700

e-mail: symanzik@galaxy.gmu.edu
WWW: http://www.galaxy.gmu.edu/~symanzik/

Collaborators funded through this contract:


Results at a Glance


Background and Purpose

EPA is currently conducting a project to assess the national distribution of air toxic concentrations across the United States as part of the Cumulative Exposure Project (CEP). Modeling techniques are used to estimate annual average concentrations of 148 Hazardous Air Pollutants (HAPs), also called air toxics, for the year 1990 from stationary and mobile sources for each census tract in the contiguous US. One purpose of the project is to display the data in a way that conveys multiple descriptions of the distribution of the concentrations across the United States in a limited number of graphics. The display needs to be understandable to the general public and to incorporate several aspects of the data, including types of toxics, sources of toxics, and uncertainties in the estimates. In addition, methods need to be developed to make the data easily accessible to the public via a World Wide Web-based application. Displaying this information requires integrating statistical methods with visual display methods, including the use of mapping programs.

In addition, it requires the ability to work with an extremely large dataset containing 60,803 cells, each with 148 observations for each of 9 underlying sources of data (60,803 x 148 x 9 = 80,989,596 values, i.e., more than 80 million). The purpose of this assignment is to develop methods for displaying the geographic distribution of the air toxics data that incorporate important information regarding sources, individual pollutants, and uncertainties, and to identify the best method for displaying the information on the Web.

Tasks:
(1) Develop techniques for displaying geographic variation of air toxics data.
(2) Develop a web application that displays the HAP data.


Results

To accomplish the tasks listed above, three different presentation techniques have been considered. A graphical approach, based on micromaps (Carr & Pierson, 1996; Carr, Olsen, Courbois, Pierson & Carr, 1998), has been considered as a high-level visualization and user interface. A micromap display can be described as follows. Instead of displaying all available information on a single map, several small maps (e.g., 10 maps if we look at data for the 50 US states) are drawn. The associated data are ordered according to a particular criterion. Then, the five highest values are highlighted on a statistical plot beside the first map, using a different color for each observation. The corresponding regions (in this case the states) are highlighted in the same colors on the first map. The same is done for the next five highest remaining observations on the second map. We continue until all observations and regions have been plotted and highlighted. On the Web, micromaps could also be used as a navigational tool. If the user clicks on a state in the US map, a micromap display at the state level would become visible. If the user then selects a county, a graphical display of the census tracts within this county would be shown.
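
The grouping step of the micromap description can be illustrated with a minimal sketch. It is not the code used in this project; the function name micromap_panels, the color palette, and the sample values are assumptions made purely for illustration. The sketch sorts regions by value, splits them into panels of five, and assigns each region the color shared by its point in the statistical plot and its polygon on the corresponding small map.

```python
from typing import Dict, List, Tuple

# Hypothetical palette: one color per rank position within a panel.
PANEL_COLORS = ["red", "orange", "green", "blue", "purple"]


def micromap_panels(
    values: Dict[str, float], panel_size: int = 5
) -> List[List[Tuple[str, float, str]]]:
    """Sort regions by value (highest first) and split them into panels.

    Each panel entry is (region, value, color). The same color would be
    used for the region's point in the statistical plot and for its
    polygon on the small map that belongs to the panel.
    """
    if not 1 <= panel_size <= len(PANEL_COLORS):
        raise ValueError("panel_size must be between 1 and the palette size")
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    panels = []
    for start in range(0, len(ranked), panel_size):
        chunk = ranked[start:start + panel_size]
        panels.append(
            [(region, value, PANEL_COLORS[i])
             for i, (region, value) in enumerate(chunk)]
        )
    return panels


# Invented values for illustration only; these are not HAP data.
example = {"AA": 3.2, "BB": 1.1, "CC": 4.5, "DD": 2.0,
           "EE": 0.7, "FF": 5.9, "GG": 2.8}
panels = micromap_panels(example)
for number, panel in enumerate(panels, start=1):
    print("map", number, panel)
```

With 50 regions and a panel size of five, this grouping yields the 10 small maps mentioned above.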

Interactive tables have been considered as the second tool for accessing the HAP data. The user would be able to perform the same hierarchical selection as in the micromap displays, but it would also be possible to reorder the table entries according to different sorting criteria.
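
The combination of hierarchical selection and re-sorting can be sketched as follows. This is only an illustration under stated assumptions: the nested layout (state, county, census tract), the column names, the helper names drill_down and sorted_rows, and all values are hypothetical and do not reflect the actual CEP data files or the C programs described later in this report.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical hierarchy: state -> county -> census tract -> {column: value}.
# All names and numbers are invented placeholders, not CEP data.
DATA: Dict[str, Any] = {
    "State A": {
        "County 1": {
            "Tract 0001": {"stationary": 0.4, "mobile": 1.2},
            "Tract 0002": {"stationary": 0.9, "mobile": 0.3},
        },
        "County 2": {
            "Tract 0003": {"stationary": 0.1, "mobile": 0.8},
        },
    },
}


def drill_down(data: Dict[str, Any], path: Sequence[str]) -> Dict[str, Any]:
    """Follow the user's selections (e.g. state, then county) into the data."""
    node = data
    for key in path:
        node = node[key]
    return node


def sorted_rows(
    records: Dict[str, Dict[str, float]], sort_by: str, descending: bool = True
) -> List[Tuple[str, Dict[str, float]]]:
    """Return the table rows of one level, ordered by the chosen column."""
    return sorted(records.items(),
                  key=lambda item: item[1][sort_by],
                  reverse=descending)


county = drill_down(DATA, ["State A", "County 1"])
by_mobile = sorted_rows(county, "mobile")
by_stationary = sorted_rows(county, "stationary")
print([name for name, _ in by_mobile])
print([name for name, _ in by_stationary])
```

The same rows appear in both tables; only the ordering changes with the sorting criterion.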

Finally, the raw data would be available and could be downloaded and further analyzed.

The tables (a restricted version that does not contain the sorting criteria) and the raw data displays were made available through the preview Web site (http://www.galaxy.gmu.edu/~symanzik/gpl/) in November 1998 and were installed for release on the official EPA CEP Web site (http://www.epa.gov/CumulativeExposure) in December 1998. The interactive version of the tables (http://www.galaxy.gmu.edu/~symanzik/gpl/CEPstart/DATAstartfull.html) was intended for release in early 1999 and was made available to EPA in January 1999.

The figures below show (from upper left to lower right): a tabular display at the US level, a tabular display at the state level (Rhode Island), a tabular display at the county level (Bristol County, RI), a raw data display at the county level (Bristol County, RI), an interactive tabular display at the state level (Pennsylvania), and an interactive tabular display at the county level (Prince George County, VA).

In addition to the work on the data-related Web pages, the PI significantly contributed to the overall appearance of the textual part of the CEP Web site (http://www.epa.gov/CumulativeExposure) and performed updates to this site during the entire period of performance of this contract.

The Underlying C Code

The C source code developed for this application can be accessed through the directory http://www.math.usu.edu/~symanzik/epa/final2/c_source/ . Two C source programs are required to create the existing data-related Web pages: maps_state.c creates the data tables and raw data displays for the Cumulative Exposure Web page, and data_menu2.c creates the top-level menu for the Cumulative Exposure Web page. A Makefile controls whether the official EPA version, the official preview version, or the unofficial preview version is created, using the compiler flags EPA, PREVIEW, and FULL, respectively. The required C library for the CGI programming has been taken from http://www.boutell.com/cgic/ .

Procedure for Polygon Generalization for Hierarchical Clickable Maps

The generalization procedure that produced the generalized polygon boundaries for the hierarchical clickable maps consists of three subroutines: (i) a routine to decompose the polygon boundary into arcs or lines and to attach topological (neighboring polygon) information to the arcs; (ii) a line generalization routine based upon the Douglas-Peucker line simplification algorithm; and (iii) a routine to rebuild polygons by assembling the simplified arcs based upon the topological or neighborhood information.
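
Routines (ii) and (iii) can be illustrated with a minimal Python sketch. It is not the Avenue code used in this project. The arc decomposition of routine (i) is supplied by hand, and the names douglas_peucker, rebuild_polygon, and generalize, the tolerance value, and the two-square example are assumptions made for illustration. Distances are measured to the chord segment between the end points of each arc, which is one common variant of the Douglas-Peucker algorithm. Because a shared arc is simplified only once, neighboring polygons keep a common boundary after rebuilding.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify an open line, always keeping its two end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = -1.0, 0
    for i in range(1, len(points) - 1):
        dist = _segment_distance(points[i], first, last)
        if dist > max_dist:
            max_dist, index = dist, i
    if max_dist <= tolerance:
        return [first, last]  # every interior point is close to the chord
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the split point is kept exactly once


def rebuild_polygon(arc_refs: List[Tuple[str, bool]],
                    arcs: Dict[str, List[Point]]) -> List[Point]:
    """Assemble a closed ring; each reference is (arc id, traverse reversed?)."""
    ring: List[Point] = []
    for arc_id, is_reversed in arc_refs:
        pts = arcs[arc_id][::-1] if is_reversed else arcs[arc_id]
        ring.extend(pts[1:] if ring and ring[-1] == pts[0] else pts)
    return ring


def generalize(polygons: Dict[str, List[Tuple[str, bool]]],
               arcs: Dict[str, List[Point]],
               tolerance: float) -> Dict[str, List[Point]]:
    """Simplify every arc once, then rebuild all polygons from the result."""
    simple = {k: douglas_peucker(v, tolerance) for k, v in arcs.items()}
    return {name: rebuild_polygon(refs, simple)
            for name, refs in polygons.items()}


# Invented example: two unit squares, A and B, sharing a slightly bent edge.
arcs = {
    "shared": [(1.0, 0.0), (1.05, 0.5), (1.0, 1.0)],
    "a_outer": [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)],
    "b_outer": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
polygons = {
    "A": [("shared", False), ("a_outer", False)],
    "B": [("b_outer", False), ("shared", True)],
}
result = generalize(polygons, arcs, tolerance=0.1)
print(result["A"])
print(result["B"])
```

In this example the bend in the shared edge is smaller than the tolerance, so it is removed from both squares at once, while the corners of the outer arcs are retained.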

These routines are applied in sequential order to generalize polygon boundaries depicting various census enumeration units or political boundary units. The routines were developed using Avenue, an object-oriented scripting language for ArcView, a desktop Geographic Information System (GIS) package. They are still undergoing modifications to increase their efficiency and stability. Interested parties can contact David Wong (dwong2@gmu.edu) to inquire about the status of these scripts.

The files created in this process can be accessed through the directory http://www.math.usu.edu/~symanzik/epa/final2/genmaps/ . Files called stxx.gen where xx is the state FIPS number represent the generalized coordinates for all counties within this state. Files called genstxx.gen where xx is the state FIPS number represent the generalized coordinates for the outer boundary of this state.

Micromaps in S-Plus

Sample linked micromap plots at the state level have been created using S-Plus. The S-Plus code can be downloaded from http://www.math.usu.edu/~symanzik/epa/final2/splus/ . Files with the extension .s contain code to be executed in S-Plus; files with the extension .dmp are data files. In case of any problems or additional questions related to the S-Plus code, please contact Dan Carr (dcarr@galaxy.gmu.edu).

Currently, files for Pennsylvania [and Michigan] have been created for use in S-Plus. When the code is executed properly, it should produce plots similar to the ones shown below:

Work under the new GPL

Initially, it was planned to use the Graphics Production Library (GPL) as a basis for the interactive graphical displays. The GPL (Carr, Valliant & Rope, 1996) is a tool that allows easy creation and modification of statistical graphics on the WWW. It supports guidelines and recommendations of modern statistical graphics. The GPL is maintained within the Bureau of Labor Statistics (BLS) and is accessible on the Web at http://www.monumental.com/dan_rope/gpl/ . A mirrored copy of the GPL as of 8/27/96, provided by the BLS, is available at http://www.galaxy.gmu.edu/~symanzik/gpl/data/ .

However, during the development of the Cumulative Exposure Web page, work began on a new commercial product that builds upon the old GPL. This new product is currently available for testing purposes and will be commercially available in late 1999 or early 2000. The new GPL provides many new features that would make the inclusion of interactive micromaps relatively easy. A preliminary version of the new GPL has been successfully tested, and we recommend considering the new GPL as the basis for a future graphical display of the HAP data.


Conclusion

Unfortunately, EPA decided not to provide electronic access to the 1990 HAP data through its Cumulative Exposure Web site. Therefore, the official Web site only contains the textual pages related to the Cumulative Exposure Project. Pages with interactive tabular displays and raw data displays exist. They are accessible at the preview site and are ready for immediate release whenever EPA decides to publish the 1990 HAP data.

Procedures for polygon generalization have been developed in Avenue and have been successfully used to create polygon boundaries that can be used for hierarchical clickable micromaps. First experiments with the new GPL have shown very promising results for incorporating micromaps into the GPL.

Once the 1996 HAP data have been processed and analyzed, they could be distributed through the official EPA Cumulative Exposure Web site soon thereafter. An update of the data files and of the underlying C programs should be achievable within a few days. Provided that the commercial new GPL is on the market by the time the processed 1996 HAP data are released, the integration of the hierarchical clickable micromaps into the EPA Web site should be possible within a few weeks.


References

Final Report for 'Methods Development for the Geographic Display of Air Toxics Data': http://www.math.usu.edu/~symanzik/epa/final1/cumul.html

Carr, D. B., Olsen, A. R., Courbois, J. P., Pierson, S. M. & Carr, D. A. (1998) Linked Micromap Plots: Named and Described, Statistical Computing and Statistical Graphics Newsletter, Volume 9, Number 1, pp. 24-32.

Carr, D. B. & Pierson, S. M. (1996) Emphasizing Statistical Summaries and Showing Spatial Context with Micromaps, Statistical Computing and Statistical Graphics Newsletter, Volume 7, Number 3, pp. 16-23.

Carr, D. B., Valliant, R. & Rope, D. (1996) Plot Interpretation and Information Webs: A Time-Series Example from the Bureau of Labor Statistics, Statistical Computing and Statistical Graphics Newsletter, Volume 7, Number 2, pp. 19-26.


Final Report last updated 12/06/99