The research highlighted here focuses on human-centered design and implementation of component-based tools that help agency analysts and others identify errors, anomalies, clusters, and possible multivariate relationships in geospatially referenced data. One specific focus in our digital government research has been to develop and assess components that support highly interactive visual data analysis. The accompanying figure presents a prototypical application of the tools being developed, uncovering a trend toward increasing lung cancer mortality rates for white females in a few regions of the U.S. (a trend counter to decreases for the U.S. as a whole).
A table browser component provides access to multivariate data for health service areas (HSAs), 806 data aggregation units covering the U.S., each comprising one or more counties. The table browser adapts an application initially built as part of a separate digital government project (Citizen Access to Government Statistical Data). This component is dynamically linked to both a map and an interactive parallel coordinate plot (PCP). The latter depicts multivariate data visually for all HSAs, while the map depicts one variable at a time spatially. Each axis of the PCP represents a data variable and the axes can be manually or computationally sorted. The first five axes shown depict age-adjusted lung cancer mortality for white females (LUNWF) for each of five three-year averages. The sixth visible axis depicts per-capita income for 1993 (PCI93).
Line segments connect the values on each variable (axis) for each HSA, creating a multivariate signature for each place. With 806 signatures displayed at once, the pattern can be difficult to interpret. Focusing (that is, narrowing the view to subsets of the data range) has been used to highlight HSAs having data values in the top 7% (purple) and bottom 7% (green) for the selected PCP axis (white female lung cancer mortality rates for 1991–93). The preponderance of crossed line segments between the LUNWF92 and PCI93 axes indicates an inverse relationship (regions with a low per capita income have high lung cancer mortality rates). The dynamically linked map highlights the location of the extreme HSAs (using the same colors as used in the PCP). Also highlighted on both the PCP and map are one prototypical high-rate HSA that is picked (selected non-transiently and shown in light blue) and one low-rate HSA that is indicated (selected transiently and shown in green).
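The two ideas in this paragraph, focusing on the extremes of one axis and reading crossed segments as a sign of an inverse relationship, can be sketched in a few lines. This is a minimal Python illustration, not code from GeoVISTA Studio (which is Java-based). The function names `focus_extremes` and `crossing_fraction`, the place labels, and all data values are assumptions made up for the example, and both axes are assumed to be drawn with the same orientation.

```python
from itertools import combinations


def focus_extremes(values, fraction=0.07):
    """Return (low_ids, high_ids): the places whose values fall in the
    bottom and top `fraction` of the axis. Hypothetical helper."""
    ranked = sorted(values, key=values.get)  # place ids, ascending by value
    k = max(1, round(len(ranked) * fraction))  # size of each extreme group
    return set(ranked[:k]), set(ranked[-k:])


def crossing_fraction(axis_a, axis_b):
    """Fraction of place pairs whose line segments cross between two
    adjacent axes. Two segments cross when the places are ordered one
    way on the first axis and the opposite way on the second."""
    ids = sorted(set(axis_a) & set(axis_b))
    pairs = list(combinations(ids, 2))
    crossed = sum(
        1
        for i, j in pairs
        if (axis_a[i] - axis_a[j]) * (axis_b[i] - axis_b[j]) < 0
    )
    return crossed / len(pairs) if pairs else 0.0


# Made-up values for five hypothetical places (not data from the study).
mortality = {"A": 52.0, "B": 31.0, "C": 44.0, "D": 28.0, "E": 39.0}
income = {"A": 14.0, "B": 23.0, "C": 17.0, "D": 26.0, "E": 20.0}

# With only five places, a 20% band selects one place at each extreme.
low, high = focus_extremes(mortality, fraction=0.2)
inverse_signal = crossing_fraction(mortality, income)
```

A crossing fraction near 1 means most pairs of places swap order between the two axes, which is the visual pattern described above as a preponderance of crossed segments; a value near 0 means the orderings largely agree.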
The visual analysis presents a clear picture. While lung cancer mortality for white women began decreasing nationally after a rise in the 1980s, mortality rates for the highest mortality areas (mostly low-income areas) continued to rise. This diverging trend corresponds to related evidence that both major cancers and cancers overall are exhibiting increasing disparity for places of low versus high socioeconomic status [5].
The example here includes only a small sample of available tools that can be integrated into data analysis applications using GeoVISTA Studio (another example developed specifically for our digital government work is a manipulable matrix for exploring bivariate relationships; see [2]). GeoVISTA Studio is a cross-platform, Web-deployable, Java-based development environment that facilitates integration of data visualization and analysis components to produce stand-alone applications and Web applets.¹ Studio enables data analysts who are not software developers to construct analysis applications from components developed independently (as long as the components meet the JavaBeans API standards). Details about Studio are reported in [3, 4].
One objective in our work is to develop effective data analysis components, integrate them into applications, and assess usefulness and usability of those applications. Beyond this, a specific focus has been to enable comprehensive and flexible coordination among software components. There is considerable evidence from usability assessments of coordinated views for information visualization and query that coordinated multiview environments are effective tools for access and analysis of complex information [1]. Coordination among our exploratory spatial data analysis beans is achieved through a separate coordination bean that supports several independent, simultaneous, dynamic connections among coordinator-aware components. These dynamic connections extend traditional concepts of linked brushing. The current implementation supports coordination for three categories of selection and two of visual appearance. For selection, events shared among components (all illustrated in the figure) are picking (direct selection of objects by pointing or bounding), indication (transient picking, as in a mouse-over), and focusing (indirectly manipulating the data range displayed). For visual appearance, shared events include data-to-display mapping (for example, shared colors to depict data categories on the map and parallel coordinate plot) and setting the display context (for example, shared background color, font for text labels, and so on).
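The coordination pattern described above can be illustrated with a small publish-and-relay sketch. This is an assumption-laden Python illustration of the general idea, not the Studio coordination bean or its JavaBeans interfaces: the class names `Coordinator` and `View`, the method names, and the HSA identifiers are all hypothetical. Only the three selection events named in the text are modeled.

```python
from collections import defaultdict


class Coordinator:
    """Hypothetical hub that relays selection events among registered views."""

    EVENTS = ("pick", "indicate", "focus")

    def __init__(self):
        # One independent listener list per event type.
        self._listeners = defaultdict(list)

    def connect(self, event, component):
        """Register `component` to receive one kind of event."""
        if event not in self.EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._listeners[event].append(component)

    def broadcast(self, event, source, payload):
        """Forward an event to every connected component except its sender."""
        for component in self._listeners[event]:
            if component is not source:
                component.receive(event, payload)


class View:
    """Hypothetical coordinator-aware component (map, PCP, or table)."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        self.state = {}  # latest payload seen for each event type

    def receive(self, event, payload):
        self.state[event] = payload

    def user_action(self, event, payload):
        """Apply a local interaction, then share it through the coordinator."""
        self.state[event] = payload
        self.coordinator.broadcast(event, self, payload)


hub = Coordinator()
map_view, pcp, table = View("map", hub), View("pcp", hub), View("table", hub)

# Picking is shared by all three views; indication only by the map and PCP.
for view in (map_view, pcp, table):
    hub.connect("pick", view)
hub.connect("indicate", map_view)
hub.connect("indicate", pcp)

pcp.user_action("pick", {"HSA-1"})     # non-transient selection
pcp.user_action("indicate", "HSA-2")   # transient, mouse-over style
```

Keeping a separate listener list per event type is what lets the connections stay independent: in this sketch the table follows picks from the PCP but ignores its mouse-over indications.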
Working closely with agency partners, we are applying formal usability methods to the continued development and refinement of data exploration components and their coordination. A formal study of the impact of these tools on strategies for data analysis is planned for the coming year.
1. Chimera, R., and Shneiderman, B. An exploratory evaluation of three interfaces for browsing large hierarchical tables of contents. ACM Trans. Info. Systems 12, 4 (1994), 383–406.
2. Dai, X., and Hardisty, F. Conditioned and manipulable matrix for visual exploration. In Proceedings of the National Conference for Digital Government Research (Los Angeles, CA, May 20–22, 2002), 489–492.
3. Gahegan, M., Takatsuka, M., Wheeler, M., and Hardisty, F. Introducing GeoVISTA Studio: An integrated suite of visualization and computational methods for exploration and knowledge construction in geography. Computers, Environment and Urban Systems 26, 4 (2001), 267–292.
4. MacEachren, A.M., Hardisty, F., Gahegan, M., Wheeler, M., Dai, X., Guo, D., and Takatsuka, M. Supporting visual integration and analysis of geospatially-referenced statistics through Web-deployable, cross-platform tools. In Proceedings of the National Conference for Digital Government Research (Los Angeles, CA, May 21–23, 2001), 17–24.
5. Singh, G.K., Miller, B.A., and Hankey, B.F. Area socioeconomic status and changing patterns in U.S. cancer mortality, 1950–1998: Part II - Lung and colorectal cancers. J. of the National Cancer Institute.
¹ Mark Gahegan directs the GeoVISTA Studio project, with Masa Takatsuka as the primary software architect. For more details and to download the software, see: www.geovista.psu.edu.
This research is part of a larger project (Collaborative Research: Quality Graphics for Federal Statistical Summaries) directed by Dan Carr at George Mason University. Alan MacEachren is PI for the Penn State University component and David Scott is PI for the Rice University component. The project involves collaboration with eight partner agencies (the National Cancer Institute, Census Bureau Population Division, Bureau of Labor Statistics, National Center for Health Statistics, Energy Information Administration, National Agricultural Statistics Service, Environmental Protection Agency, and Bureau of Transportation Statistics). The specific research presented here was carried out in the GeoVISTA Center at Penn State and supported in part by the U.S. National Science Foundation, grant #EIA-9983451. Two additional NSF-funded Digital Government projects contributed toward the software components presented here: #EIA-9983445 and #EIA-9876640.
Figure. A multicomponent exploratory spatial data analysis application constructed with GeoVISTA Studio. In the application, a table (lower left) and a map (upper left) are dynamically linked to an interactive parallel coordinate plot (lower right). The latter depicts multivariate health service area (HSA) data; for each HSA one set of linked line segments depicts a trace (a signature) through multivariate space.
©2003 ACM 0002-0782/03/0100 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.