Period of Performance: 7/15/98 - 5/31/99
Principal Investigator:
Jürgen Symanzik
George Mason University
Center for Computational Statistics 4A7
Fairfax, Virginia 22030
Phone: (703) 993 - 3786
FAX : (703) 993 - 1700
e-mail: symanzik@galaxy.gmu.edu
WWW:
http://www.galaxy.gmu.edu/~symanzik/
Collaborators funded through this contract:
Symanzik, J., Carr, D. B., Axelrad, D. A., Wang, J., Wong, D., Woodruff, T. J. (1999): Interactive Tables and Maps - A Glance at EPA's Cumulative Exposure Project Web Page, 1999 Proceedings of the Section on Statistical Graphics, American Statistical Association, Alexandria, Virginia, Forthcoming. [572K ps.Z (9.6MB uncompressed), 441K pdf]
Symanzik, J., Axelrad, D. A., Carr, D. B., Wang, J., Wong, D., Woodruff, T. J. (1999): HAPs, Micromaps and GPL - Visualization of Geographically Referenced Statistical Summaries on the World Wide Web, © 1999 American Congress on Surveying and Mapping, Annual Proceedings (ACSM-WFPS-PLSO-LSAW 1999 Conference CD). [1.4MB ps, 279K pdf]
Symanzik, J., Wong, D., Wang, J., Carr, D. B., Axelrad, D. A., Woodruff, T. J. (1999): Web-based Access and Visualization of Hazardous Air Pollutants, Proceedings GIS in Public Health 1998, Forthcoming.
Symanzik, J.: Conference (Contributed Talk), 1999 Joint Statistical Meetings (ASA), Baltimore, Maryland (August 10, 1999): Interactive Tables and Maps - A Glance at EPA's Cumulative Exposure Web Page.
Symanzik, J.: Conference (Contributed Talk), 1999 ACSM-WFPS Conference, Portland, Oregon (March 17, 1999): HAPs, Micromaps and GPL - Visualization of Geographically Referenced Statistical Summaries on the World Wide Web.
Wong, D.: Conference (Contributed Talk), 1999 ACSM-WFPS Conference, Portland, Oregon (March 17, 1999): Polygon Generalization in GIS Environments.
Symanzik, J.: Seminar, CSI/Statistics Colloquium Series, George Mason University, Fairfax, Virginia (March 12, 1999): HAPs, Micromaps and GPL - Visualization of Geographically Referenced Statistical Summaries on the World Wide Web.
Symanzik, J.: Seminar, Department of Mathematics and Statistics, Utah State University, Logan, Utah (January 21, 1999): Dynamic and Interactive Statistical Graphics for Spatially Referenced Data.
Symanzik, J.: Seminar, US Geological Survey, Reston, Virginia (September 10, 1998): WWW-based Access and Visualization of Hazardous Air Pollutants.
Symanzik, J.: Conference (Contributed Talk), 1998 GIS in Public Health 3rd National Conference, San Diego, California (August 20, 1998): WWW-based Access and Visualization of Hazardous Air Pollutants.
EPA is currently conducting a project to assess the national distribution of air toxic concentrations across the United States as part of the Cumulative Exposure Project (CEP). Modeling techniques are used to estimate annual average concentrations of 148 Hazardous Air Pollutants (HAPs), also called air toxics, for the year 1990 from stationary and mobile sources for each census tract in the contiguous US. One of the purposes of the project is to display the data in a way that conveys multiple descriptions of the distribution of the concentrations across the United States in a limited number of graphics. The display needs to be understandable to the general public and must incorporate a number of different aspects of the data, including types of toxics, sources of toxics, and uncertainties in the estimates. In addition, methods need to be developed to make the data easily accessible to the public via a World Wide Web-based application. Displaying this information requires integrating statistical methods with visual display methods, including the use of mapping programs.
It also requires the ability to work with an extremely large dataset containing 60,803 cells, each with 148 observations for each of 9 underlying sources of data (more than 80 million values). The purpose of this assignment is to develop methods for displaying the geographic distribution of the air toxics data that incorporate important information regarding sources, individual pollutants, and uncertainties, and to identify the best method for displaying the information on the Web.
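The size of the dataset follows directly from the three counts given above. A short check in Python, using only the figures stated in this report (the variable names are illustrative):

```python
# Counts taken from the paragraph above.
CELLS = 60_803      # cells (census tracts)
POLLUTANTS = 148    # observations (HAPs) per cell
SOURCES = 9         # underlying sources of data

total_values = CELLS * POLLUTANTS * SOURCES
print(f"{total_values:,}")  # 80,989,596
```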
Tasks:
(1) Develop techniques for displaying geographic variation of air toxics data.
(2) Develop a web application that displays the HAP data.
To accomplish the tasks listed above, three different presentation techniques have been considered. A graphical approach, based on micromaps (Carr & Pierson, 1996; Carr, Olsen, Courbois, Pierson & Carr, 1998), has been considered as a high-level visualization and user interface. A micromap can be described as follows. Instead of displaying all available information on a single map, several small maps (e.g., 10 maps if we look at data for the 50 US states) are drawn. The associated data are ordered according to a particular criterion. Then, the five highest values are highlighted on a statistical plot beside the map. A different color is used for each observation in the statistical plot. The corresponding regions (in this case the states) are highlighted in the same colors on the first map. The same is done for the next five highest remaining observations on the second map. We continue until all observations and regions have been plotted and highlighted. On the Web, micromaps could also be used as a navigational tool. If the user clicks on a state in the US map, a micromap display at the state level would become visible. If the user then selects a county, a graphical display of the census tracts within this county would be shown.
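The ordering and grouping step of a micromap display can be sketched in a few lines of Python. This is only an illustration of the description above, not the S-Plus code used in this project; the function name, the color palette, and the test values are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical palette: one color per position within a panel (small map).
PANEL_COLORS = ["red", "orange", "green", "blue", "purple"]


def micromap_panels(
    values: Dict[str, float], panel_size: int = 5
) -> List[List[Tuple[str, float, str]]]:
    """Order regions from highest to lowest value and split them into panels.

    Each panel corresponds to one small map. Every region in a panel is
    paired with the color used both for its polygon on that map and for
    its point in the statistical plot beside the map.
    """
    if not 1 <= panel_size <= len(PANEL_COLORS):
        raise ValueError("panel_size must be between 1 and the palette size")
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    panels = []
    for start in range(0, len(ranked), panel_size):
        chunk = ranked[start:start + panel_size]
        panels.append(
            [(region, value, PANEL_COLORS[i])
             for i, (region, value) in enumerate(chunk)]
        )
    return panels


# Made-up values for 50 placeholder regions; not CEP data.
demo = {f"R{i:02d}": float(i) for i in range(50)}
print(len(micromap_panels(demo)))  # 10
```

With 50 regions and five regions per panel, the sketch yields the 10 small maps mentioned in the text.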
Interactive tables have been considered as the second tool for accessing the HAP data. The user would be able to do the same hierarchical selection process as in the micromap displays but it would also be possible to rearrange the ordering of the table entries according to different sorting criteria.
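The re-sorting behavior of such a table can be illustrated as follows. This is a minimal Python sketch, not the C/CGI code behind the actual pages; the column names and all numbers are hypothetical and are not CEP data.

```python
from operator import itemgetter
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

# Hypothetical rows for illustration only.
ROWS: List[Row] = [
    {"region": "County A", "pollutant_1": 0.8, "pollutant_2": 3.1},
    {"region": "County B", "pollutant_1": 2.4, "pollutant_2": 0.5},
    {"region": "County C", "pollutant_1": 1.6, "pollutant_2": 1.9},
]


def sort_table(rows: List[Row], column: str, descending: bool = False) -> List[Row]:
    """Return a new list of rows ordered by the chosen sorting criterion."""
    return sorted(rows, key=itemgetter(column), reverse=descending)


# The same table under two different sorting criteria.
by_name = sort_table(ROWS, "region")
by_pollutant_2 = sort_table(ROWS, "pollutant_2", descending=True)
print([row["region"] for row in by_pollutant_2])
```

The original list is left untouched, so the user can switch between criteria without reloading the data.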
Finally, the raw data would be available and could be downloaded and further analyzed.
The tables (a restricted version that does not contain the sorting criteria) and the raw data displays were made available through the preview Web site (http://www.galaxy.gmu.edu/~symanzik/gpl/) in November 1998 and were installed for release on the official EPA CEP Web site (http://www.epa.gov/CumulativeExposure) in December 1998. The interactive version of the tables (http://www.galaxy.gmu.edu/~symanzik/gpl/CEPstart/DATAstartfull.html) was intended for release in early 1999 and was made available to EPA in January 1999.
The figures below show (from upper left to lower right): a tabular display at the US level, a tabular display at the state level (Rhode Island), a tabular display at the county level (Bristol County, RI), a raw data display at the county level (Bristol County, RI), an interactive tabular display at the state level (Pennsylvania), and an interactive tabular display at the county level (Prince George County, VA).
In addition to the work on the data-related Web pages, the PI significantly contributed to the overall appearance of the textual part of the CEP Web site (http://www.epa.gov/CumulativeExposure) and performed updates to this site during the entire period of performance of this contract.
The C source code developed for this application can be accessed through the directory http://www.math.usu.edu/~symanzik/epa/final2/c_source/ . Two C source programs are required to create the currently existing data-related Web pages: maps_state.c creates the data tables and raw data displays for the Cumulative Exposure Web page, and data_menu2.c creates the top-level menu for the Cumulative Exposure Web page. A Makefile controls whether the official EPA version, the official preview version, or the unofficial preview version is created, by use of the compiler flags EPA, PREVIEW, and FULL, respectively. The required C library for the CGI programming has been taken from http://www.boutell.com/cgic/ .
The generalization procedure that produced the generalized polygon boundaries for the hierarchical clickable maps consists of three subroutines: (i) a routine to decompose the polygon boundary into arcs or lines and to add topological or neighboring polygon information into the arcs; (ii) a line generalization routine based upon the Douglas-Peucker line simplification algorithm; and (iii) a routine to rebuild polygons by assembling the simplified arcs and based upon topological or neighborhood information.
These routines are applied in sequential order to generalize polygon boundaries depicting various census enumeration units or political boundary units. The routines were developed using Avenue, an object-oriented scripting language for ArcView, a desktop Geographic Information System (GIS) package. They are still undergoing modifications to increase their efficiency and stability. Interested parties can contact David Wong (dwong2@gmu.edu) to inquire about the status of these scripts.
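For readers unfamiliar with the line generalization step (ii), the following is a minimal Python sketch of the Douglas-Peucker algorithm applied to a single arc. It is not the Avenue code used in this project; the function names are illustrative, and this variant measures the distance to the chord as a segment rather than as an infinite line, which is a common choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the projection of p onto a-b, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify an arc, always keeping its first and last point."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        return [first, last]  # all interior points are close enough
    # Otherwise keep the farthest point and simplify both halves.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


arc = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
print(douglas_peucker(arc, 0.1))
# [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
```

Because the end points of every arc are preserved, arcs shared by two neighboring polygons stay connected after simplification, which is why the boundaries are decomposed into arcs in step (i) before this step is applied. The recursive form shown here may need to be made iterative for very long arcs.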
The files created in this process can be accessed through the directory http://www.math.usu.edu/~symanzik/epa/final2/genmaps/ . Files called stxx.gen, where xx is the state FIPS number, contain the generalized coordinates for all counties within this state. Files called genstxx.gen, where xx is the state FIPS number, contain the generalized coordinates for the outer boundary of this state.
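As an illustration of the naming scheme, the following Python sketch builds both file names for a given state. It assumes that xx is the two-digit, zero-padded state FIPS code; this padding is an assumption and should be checked against the directory listing. The function name is hypothetical.

```python
from typing import Tuple

# Directory given in this report.
BASE_URL = "http://www.math.usu.edu/~symanzik/epa/final2/genmaps/"


def generalized_map_files(state_fips: int) -> Tuple[str, str]:
    """Return (county boundaries file, state outline file) for a state.

    Assumption: xx in stxx.gen and genstxx.gen is the two-digit,
    zero-padded state FIPS code.
    """
    if not 1 <= state_fips <= 99:
        raise ValueError("state FIPS number must be between 1 and 99")
    code = f"{state_fips:02d}"
    return f"st{code}.gen", f"genst{code}.gen"


# Pennsylvania has state FIPS number 42.
counties_file, outline_file = generalized_map_files(42)
print(BASE_URL + counties_file)
print(BASE_URL + outline_file)
```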
Sample linked micromap plots at the state level have been created using S-Plus. The S-Plus code can be downloaded from http://www.math.usu.edu/~symanzik/epa/final2/splus/ . Files with the extension .s contain executable S-Plus code; files with the extension .dmp are data files. In case of any problems or additional questions related to the S-Plus code, please contact Dan Carr (dcarr@galaxy.gmu.edu).
Currently, files for Pennsylvania [and Michigan] have been created for use in S-Plus. When executed properly, they should produce plots similar to the ones shown below:
Final Report for 'Methods Development for the Geographic Display of Air Toxics Data': http://www.math.usu.edu/~symanzik/epa/final1/cumul.html
Carr, D. B., Olsen, A. R., Courbois, J. P., Pierson, S. M. & Carr, D. A. (1998) Linked Micromap Plots: Named and Described, Statistical Computing and Statistical Graphics Newsletter, Volume 9, Number 1, pp. 24-32.
Carr, D. B. & Pierson, S. M. (1996) Emphasizing Statistical Summaries and Showing Spatial Context with Micromaps, Statistical Computing and Statistical Graphics Newsletter, Volume 7, Number 3, pp. 16-23.
Carr, D. B., Valliant, R. & Rope, D. (1996) Plot Interpretation and Information Webs: A Time-Series Example from the Bureau of Labor Statistics, Statistical Computing and Statistical Graphics Newsletter, Volume 7, Number 2, pp. 19-26.
Final Report last updated 12/06/99