Envisioning Uncertainty in Geospatial Information

Kathryn B. Laskey
George Mason University
MS 4A6, 4400 University Drive
Fairfax, VA 22030-4400
klaskey@gmu.edu

Edward J. Wright
Information Extraction and Transport, Inc.
1911 N. Ft Myer Dr., Suite 600
Arlington, VA 22209
ewright@iet.com

Paulo C. G. da Costa
George Mason University
MS 4B5, 4400 University Drive
Fairfax, VA 22030-4400
pcosta@gmu.edu

Abstract

Geospatial reasoning has been an essential aspect of military planning since the invention of cartography. Although maps have always been a focal point for developing situational awareness, the dawning era of Network Centric Operations brings the promise of unprecedented battlefield advantage due to improved geospatial situational awareness. Geographic information systems (GIS) and GIS-based decision support systems are ubiquitous within current military forces, as well as civil and humanitarian organizations. Understanding the quality of geospatial data is fundamental to using it intelligently. A systematic approach to data quality requires: estimating and describing the quality of data as it is collected; recording the data quality as metadata; propagating uncertainty through models for data processing; exploiting uncertainty appropriately in decision support tools; and communicating to the user the uncertainty in the final product. Bayesian reasoning provides a principled and coherent approach to representing and drawing inferences about data quality. This paper describes our research on data quality for military applications of geospatial reasoning, and describes model views appropriate for model builders, analysts, and end users.

1 INTRODUCTION

The focal point of the battlefield command post is the map. Through interactions with the map, the commander and staff collaborate to build a common operating picture. This common operating picture displays the area of operations, the militarily significant features of the terrain, the locations of adversary and friendly forces, and the evolving plan.

A generation ago, planning centered on a paper map, its overlays of acetate covered with marks of grease pencils wielded by the staff members congregated around it. Today the paper map has been replaced in brigade and larger headquarters with a digitized map projected onto a large-screen display. The grease pencil has become an input device for drawing objects or selecting pre-computed overlays from a menu of options. The map and overlays are stored in the computer as data structures, and are processed by algorithms that can generate in seconds products it would take soldiers many hours of tedious effort to duplicate, and can be sent instantly to relevant consumers anywhere on the Global Information Grid (GIG), the information processing infrastructure of the United States Department of Defense (DoD). The GIG is the physical infrastructure to enable Network-Centric Operations, the DoD's new doctrine for warfare in the 21st Century.

Advanced automated geospatial tools (AAGTs) transform commercial geographic information systems (GIS) into useful military services for Network Centric Operations. Because of their basis in commercial GIS, they also have widespread applicability to fire, police, disaster relief, and other problems characterized by a command hierarchy. The advanced situation awareness provided by AAGTs can do much more than simply speed up calculations. They are changing the way military operations are conducted. The development of tools is shaped by military necessity, but as the new century dawns, the decision making process itself is being shaped by the automated tools that provide warfighters with more robust situational awareness.

Widespread enthusiasm for AAGTs has created a demand for geospatial data that exceeds the capacity of agencies that produce data. As a result, geospatial data from a wide variety of sources is being used, often with little regard for quality. A concern is the influence of errors or uncertainty in geospatial data on the quality of military decisions made based on displays of geospatial data.

Quality of geospatial data is an issue that has received considerable interest in the academic GIS community (Goodchild, 1992). Studies have shown that, while all geospatial data contain errors, errors in geospatial data are not well documented, not well understood, and are commonly underestimated by users. A particular problem is the tendency of users to implicitly trust high resolution graphic computer displays of geographic data. The quality of the display masks the underlying uncertainty in the data (Lunetta & Congalton, 1991).

Scientifically based methodologies are required to assess data quality, to represent quality as metadata associated with GIS systems, to propagate it correctly through models for data fusion, data processing and decision support, and to provide end users with an assessment of the implications of uncertainty in the data on decision making. Statisticians have developed a wide variety of methods for analyzing and reasoning with spatial data (e.g., Cressie, 1993), and these methods are widely used in generating and analyzing geospatial data. A number of authors have applied Bayesian networks to reason about uncertainty in geographic information systems (e.g., Walker, et al., 2005). A Bayesian analysis plugin, based on the open source GeNIe/SMILE [1] Bayesian network system, has recently been released for the open source MapWindow [2] GIS system. Applications of Bayesian networks to geospatial reasoning include avalanche risk assessment (Grêt-Regamey and Straub, 2006), locust hazard modeling (Jianwen and Qin, 2005), watershed management (Ames, 2002), and military decision support (Wright, 1998; 2002).

Figure 1: Traditional CCM Product (M1 Tank, DMA Mobility Model, ITD Data, Korea)
In his dissertation on the application of Bayesian networks to tactical military decision aids, Wright (2002) considered all phases of the life cycle of geospatial data, including data generation, data management, analysis, display, and decision support. In this paper, we focus on improving decisions by representing, propagating through models, and reporting to users the uncertainties in geospatial data. We describe how model views can be applied to conveying the uncertainty in geospatial information to decision makers.

2 CASE STUDY: CROSS COUNTRY MOBILITY

As a case study to illustrate the challenges and opportunities of uncertainty management in geospatial information systems, we focus on Cross Country Mobility (CCM) analysis. CCM analysis is performed to evaluate the feasibility and desirability of enemy and friendly courses of action. The CCM Tactical Decision Aid (TDA) predicts the speed at which a specific military vehicle or unit can move across country (off roads) based on the terrain. The terrain factors that influence CCM speed are slope, soil type, soil wetness, vegetation and vegetation attributes, ground or surface roughness, and presence of obstacles.

There are several CCM analysis models commonly in use by military organizations in the U.S. and around the world. The CCM product of Figure 1 was produced using the DMA CCM algorithm (DMS, 1993). CCM products can be generated for specific vehicle types, for classes of vehicles, or for military unit types. The products can be used as inputs to algorithms for producing mobility corridors, or combined with other information to generate avenues of approach for friendly or enemy forces.

Traditional CCM algorithms use point estimates of their input data and produce point estimates of predicted speeds. Traditional CCM displays show predicted speeds without any attempt to estimate or communicate the quality of the prediction based on the quality of the underlying data and the quality of the algorithm used to make the prediction.

There are many sources of uncertainty in CCM estimates. Input data on the factors that influence speed may contain errors. In many cases, the input parameters required by models may be unavailable, and must be estimated using a combination of auxiliary models and human judgment. Models for predicting speed from input parameters are imperfect. As shown below, uncertainty can have decision implications, and decision making can be improved by properly considering uncertainty in decision support algorithms.

3 MILITARY GIS DATA

A wide range of military digital mapping products (digital terrain data) are available from the DoD National Geospatial-Intelligence Agency (NGA). Two commonly used products for military GIS analysis are Digital Terrain Elevation Data (DTED) and Interim Terrain Data (ITD).

DTED is an array of elevation values that represent the surface elevation of a portion of the world. Elevation values are provided on a grid with a defined spacing in the North-South and East-West directions. NGA produces DTED level 1 data in cells covering an area of 1 degree by 1 degree, with a grid spacing of 3 arc seconds (approximately 100 meters at the equator). DTED level 2 is produced over smaller areas with a grid spacing of 1 arc second (approximately 30 meters at the equator). Higher resolution DTED at levels 3, 4, and 5 is available in limited areas. DTED is widely used for visualization and Line of Sight (LOS) applications.

ITD is the most widely available feature data in use by military GIS systems today. It was originally developed as an interim product, while users awaited a more detailed and robust digital terrain data product. ITD is available in two forms - ITD, and VITD (Vector Product Format (VPF) ITD) - which differ in format, although much of the information content is similar. ITD is digital vector data, in which terrain features are represented as points, lines and polygons. Each terrain feature has a number of feature attributes defined for it. Figure 2 shows a graphic that illustrates the information content of ITD. Information is provided in six thematic layers. Vegetation polygons are defined for several types of wooded areas, orchards, and agricultural applications. Vegetation attributes include vegetation stem spacing and stem diameter. The transportation layer contains features that represent roads, bridges, railroads, airfields, etc. Attributes define road widths, construction materials, bridge length, width, capacity, etc. The surface materials layer provides polygons of soil type and an attribute for surface roughness. The surface drainage layer contains information on rivers and streams, with attributes that define width, depth, bank height and slope. The surface configuration layer contains polygons for surface slope in defined categories. The obstacle layer contains information on other terrain features (like ledges, fences, pipelines, cuts and fills) that may be obstacles to military mobility.

Figure 2: Information Content of Interim Terrain Data (ITD)

ITD is used for a range of military GIS applications (Terrain Analysis), including mobility products such as CCM. Although ITD data is very valuable, it is expensive to produce, requiring a great deal of human-intensive feature extraction. NGA has recognized the inability to provide widespread coverage of ITD (or ITD-like data) in support of worldwide military operations. The NGA concept for future terrain data support envisions large area coverage of a subset of quickly produced data (Foundation Feature Data - FFD) to meet the military's immediate planning needs, and rapid production of more complete data (Mission Specific Data Sets - MSDS) to meet specific requirements in a crisis. This concept requires the ability to combine information from multiple sources to produce the needed products. The available inputs may be of varying quality and resolution. It is essential both to employ a sound methodology for propagating the uncertainties in the different inputs, and to communicate properly to end users the uncertainty in the results.

[1] http://genie.sis.pitt.edu/
[2] http://www.mapwindow.org/

4 PROPAGATING UNCERTAINTY

Figure 3 shows an example, taken from (Wright, 2002), of a Bayesian network (BN) for integrating data from different sources into an integrated vegetation cover map, an important input into a CCM tactical decision aid. Information was fused from ten sources, including digital elevation data; geology data; forest and vegetation maps; and various images from the years 1977, 1987, and 1988. This Bayesian network applies to a single pixel, and is replicated for each pixel in the data set. A more sophisticated model must be applied when errors at different pixels are not independent. For example, blurring of an image can introduce correlations between neighboring pixels, and registration errors can introduce a bias that affects all pixels in a given region. A graphical model for fusing elevation data described by Wright (2002) used undirected arcs to model spatial autocorrelation, and included random variables to represent bias in elevation measurements. If spatial correlation and bias were considered serious sources of error, the model of Figure 3 could be extended in a similar manner.

To perform the kind of analysis described here, the fusion system must have the necessary information to characterize the quality of the input data sources. Metadata that represents data quality information enables producer and consumer to communicate the information about data quality needed for fusing that data with data from other sources. The BN of Figure 3 also makes use of geology, topography, soils, and image data (or results from algorithms run on images).
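The per-pixel fusion just described can be sketched as a minimal discrete Bayes update. The class labels, uniform prior, and 80%-reliable confusion matrices below are illustrative assumptions, not values from the Figure 3 network:

```python
# Minimal per-pixel Bayesian fusion of a vegetation class from several
# sources. Each source's reliability is an assumed confusion matrix
# P(report | true class); sources are treated as conditionally
# independent given the true class, as in the single-pixel model above.

CLASSES = ["forest", "grass", "bare"]

def noisy_source(acc=0.8):
    """Illustrative source: reports the true class with probability acc."""
    off = (1.0 - acc) / (len(CLASSES) - 1)
    return {c: {r: (acc if r == c else off) for r in CLASSES} for c in CLASSES}

def fuse(prior, confusions, reports):
    """Posterior P(true class | reports) via repeated Bayes updates."""
    post = dict(prior)
    for conf, rep in zip(confusions, reports):
        for c in CLASSES:
            post[c] *= conf[c][rep]          # multiply in P(report | class)
        z = sum(post.values())
        post = {c: p / z for c, p in post.items()}  # renormalize
    return post

prior = {c: 1.0 / len(CLASSES) for c in CLASSES}
posterior = fuse(prior, [noisy_source()] * 3, ["forest"] * 3)
print(posterior["forest"])   # three agreeing 80%-reliable sources: > 0.99
```

The conditional-independence assumption is exactly what fails in the shared-satellite-image situation discussed below: if the three reports are really copies of one 80%-reliable observation, multiplying their likelihoods overstates the evidence, and the honest posterior remains that of a single source (0.8 here).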
In order for this scheme to work, all data sources must publish relevant data quality information as metadata. Furthermore, all sources must describe the appropriate structure (relationships between themes, and common image sources for products). That is, the metadata must include not just simple data quality attributes for results, but also the necessary structural information to enable a probabilistic reasoner to construct the appropriate Bayesian network for drawing inferences about vegetation cover. We have argued elsewhere (e.g., Costa, et al., 2007) that this information should be represented as a probabilistic ontology (PO).

Figure 3: Bayesian Network for Information Integration

An ontology specifies a controlled vocabulary for representing entities and relationships characterizing a domain. Ontologies facilitate interoperability by standardizing terminology, allow automated tools to use the stored data in a context-aware fashion, enable intelligent software agents to perform better knowledge management, and provide other benefits of formalized semantics. However, as described in (Costa, 2005), standard ontology formalisms do not provide a standardized means to convey both the structural and numerical information required to represent and reason with uncertainty in a principled way. POs, on the other hand, are designed for comprehensively describing knowledge about a domain, and the uncertainty associated with that knowledge, in a principled, structured and sharable way. Therefore, POs provide a coherent representation of statistical regularities and uncertain evidence, an ideal way of representing and propagating uncertainty in geospatial systems. Like a traditional ontology, a PO represents the types of entities that can exist in a domain, the attributes of each type of entity, and the relationships that can occur between entities. In addition, a PO can represent probability distributions. This requires more than the simple ability to represent uncertainty about the attributes of entities of a given type. POs represent conditional dependencies on other attributes of the same or related entities, as well as uncertainty about the types of entities and the relationships themselves. PR-OWL (Costa, 2005) is an upper ontology, written in the OWL ontology language, that enables an OWL ontology to represent such relational uncertainty.

As an example, consider the problem of aggregating geospatial information from several databases. Suppose we consult three different databases, all three of which label a particular area as forested. Each report is tagged with a particular credibility. Because the three reports agree, standard statistical aggregation technologies would label the region as forested and assign a higher credibility than the three individual credibilities. Suppose, however, that all three databases obtained their raw data for this area from the same satellite image, and all three applied similar algorithms for assigning a ground cover type label. In this situation, the credibility of the aggregate report is no greater than any of the individual input credibility values. In this case, we need to represent not just a single credibility number, but dependency information about how the credibility depends on the sensor and the data processing algorithm. If there is uncertainty about the source of the data in one of the databases, then the appropriate combination rule would be a probability weighted average, with weights equal to the posterior probability, given the observed data, of the different data sources. If the systems providing input give no data quality information, or supply insufficient information for a probabilistic reasoner to determine unambiguously the structure and/or probabilities for the Bayesian network, then the fusion system has an additional inference challenge - to determine the appropriate BN for fusing the diverse inputs.

A standard ontology annotated with probabilities could not represent these complex kinds of dependence relationships. A probabilistic ontology could, provided that it is based on a sufficiently expressive probabilistic logic. POs provide a flexible means to express complex statistical relationships, a crucial requirement for dealing with uncertainty in geospatial systems.

Reasoners capable of handling general-purpose relational probabilistic models are not yet generally available. To compute the results shown in Figure 5, a custom application was written to apply the Bayesian network of Figure 3 to each pixel in a geographic database, using an application programmer interface to a Bayesian network tool. Today, this example could be computed using the Bayesian plugin to MapWindow. More sophisticated models, including spatial correlation and bias, would still require custom software, although new theory and tools are emerging rapidly.

Figure 4: Conceptual Model for Management of Uncertainty in GIS Data

5 MANAGING UNCERTAINTY IN GIS DATA

There are errors, or uncertainties, in all geospatial data.
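As a toy illustration of how such errors propagate into derived products, the sketch below pushes assumed vertical noise on two adjacent elevation posts through a simple slope calculation by Monte Carlo sampling. The 30 m posting, the 2 m noise level, and the elevations are illustrative assumptions, not values from any DTED specification:

```python
import math
import random

# Monte Carlo propagation of assumed elevation noise into a derived
# slope product: sample noisy elevations, recompute slope each time.
random.seed(0)

POST_M = 30.0          # horizontal spacing between adjacent posts, meters
SIGMA_Z = 2.0          # assumed 1-sigma vertical error per post, meters
Z_A, Z_B = 100.0, 112.0

def slope_deg(za, zb):
    return math.degrees(math.atan((zb - za) / POST_M))

samples = [slope_deg(random.gauss(Z_A, SIGMA_Z), random.gauss(Z_B, SIGMA_Z))
           for _ in range(20000)]
mean = sum(samples) / len(samples)
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

# The point estimate is atan(12/30), about 21.8 degrees, but a couple of
# meters of vertical noise spreads the derived slope over several degrees.
print(round(mean, 1), round(sd, 1))
```

The spread depends on the relative error between the two posts, which is the same reason LOS products (discussed below) depend on relative rather than absolute elevation accuracy.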
Different kinds of uncertainties in geospatial data include uncertainties due to positional error, feature classification error, resolution, attribute error, data completeness, currency, and logical consistency (Kraak & Ormeling, 1996). Unfortunately, many of these types of uncertainty are difficult to quantify, and are often ignored in the production of GIS products - even for military applications.

Positional errors - absolute and relative errors in X, Y, and Z - are reasonably well understood, and for most military geospatial data are fairly well defined. For many applications, like targeting and navigation, estimates of positional accuracy are sufficient to evaluate the suitability of the GIS data for use. Other GIS products that depend on position are more complicated. For example, the LOS product depends on the Z location (elevation) of the observer, a potential target, and multiple terrain points. LOS does not depend on absolute elevations, but on the relative elevations of the various points. Unfortunately, acceptable relative elevation errors are not specified for DTED level 1 and 2 products, and are not used to estimate the uncertainty in LOS predictions.

Uncertainties due to feature classification errors and feature attribute errors are also commonly neglected in military GIS analysis. The product specification for ITD (and for related feature products) does not provide any standards for feature classification accuracy or feature attribute accuracy. The accuracy of the different thematic layers is in general unknown, although some studies have been done (Ryder and Voyadgis, 1996) and results from civilian studies may be used as a guide. In general, estimation of terrain features like vegetation and soil type from imagery source materials - without extensive "ground truth" - is very difficult. Results which achieve 80% accuracy are considered good. Terrain products, produced from terrain feature data, will be in error as a result of propagation of the uncertainties in the terrain data through the algorithm, or model, used to create the product. Today, military GIS systems typically do not attempt to estimate the uncertainty in GIS products, and have no way to incorporate uncertainty in algorithms or display it to users.

Other uncertainties, due to data resolution, completeness, or consistency, are also present in military GIS systems. Although users (terrain analysts) may be aware of these uncertainties, there is no systematic way to account for them or to communicate them to decision makers.

Figure 4, taken from Wright (2002), presents a model of the lifecycle of geospatial data, showing the uncertainty management operations that are required at each stage. The first block, data generation, is the creation of geospatial data from source materials, often remote sensors. During this step, tools and techniques that measure the quality of data as it is produced are needed. Quality must be measured in appropriate quality metrics, and recorded as part of the metadata for the data.

The next two steps, building and managing the database, and the database itself, are important parts of the process that are often overlooked. Today we rarely generate all new data for a particular GIS project. In almost all cases existing data will be available, and there will be new data produced by other organizations. All this data must be integrated into a cohesive database. The data integration required to merge these different data layers is a critical and complex operation. In addition to merging the data, we need to merge the corresponding metadata as well, to derive metadata for the new integrated data.

The database, where available data is stored, is also explicitly shown as not "full." Usually we will not have all the information we would like to have before we start to generate GIS products. Over time, as additional data is ingested, the database will contain more data - but usually our appetite for new data is also growing, so the data store will never be full. As the availability of data changes over time, the metadata must be updated to reflect the quality of currently available data.
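A minimal sketch of what merging the corresponding metadata might look like. The record fields and the deliberately conservative worst-input merge rule are our illustrative assumptions, not a format from any NGA product specification:

```python
from dataclasses import dataclass

@dataclass
class LayerQuality:
    """Illustrative quality metadata carried alongside one data layer."""
    positional_error_m: float   # e.g., 90% circular error, meters
    classification_acc: float   # estimated thematic accuracy, 0..1
    currency_year: int          # year of the source collection

def merge_quality(a, b):
    """Derive metadata for an integrated layer: conservatively assume the
    result is no better than its worst input on each dimension."""
    return LayerQuality(
        positional_error_m=max(a.positional_error_m, b.positional_error_m),
        classification_acc=min(a.classification_acc, b.classification_acc),
        currency_year=min(a.currency_year, b.currency_year),
    )

itd = LayerQuality(positional_error_m=25.0, classification_acc=0.80, currency_year=1988)
ffd = LayerQuality(positional_error_m=50.0, classification_acc=0.70, currency_year=1995)
merged = merge_quality(itd, ffd)
print(merged)
```

A real system would, as argued above, re-derive such records each time new data is ingested, rather than merging once; a worst-input rule is only one of several defensible choices.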
The next block, analysis, is the application of GIS operations, according to some model, to produce a GIS product. Techniques for propagating the uncertainty in geospatial data through the GIS model into the GIS product are required. In the following block, the GIS product is displayed or presented to the user. In this step, it is important to present the user with a visualization of the uncertainty in the product. One of the challenges is to find good ways to present such information. The final block in the geospatial life cycle is the user. This block also is an important step in managing Figure 5 – Fused Vegetation Map for 1988 uncertainty: ensuring that users are trained to ask for and use information about the quality of GIS products as part of their decision process. associated with the fused estimate. Figure 6 shows a visualization of CCM with associated uncertainty. The 6 VISUALIZING UNCERTAINTY underlying computations for this display were performed by implementing a standard CCM algorithm as the Visualization of uncertainty in GIS products is essential to Bayesian network shown in Figure 7 (Wright, 2002). communicate uncertainties to decision makers. This helps to prevent decision makers from being blinded by the CCM uncertainty is shown two ways, in the legend and quality of the display, and to make them aware of the via interactive histograms that the user can control. The underlying uncertainty of the product. bi-variate legend uses color to represent the predicted CCM speed range. The quality of the color represents the A few examples of uncertainty visualization ideas, taken quality of the prediction. There is enough information in from Wright (2002), are presented here. Figure 5 shows a the legend that it is difficult to interpret the product fused vegetation map that displays the results of applying colors. This difficulty is exacerbated by the difficulty of the Bayesian network model of Figure 3. 
The display matching colors from computer monitor to printed shows color-coded highest-probability classifications, and hardcopy. To offset the difficulty in interpretation, user provides the ability to drill down to view the uncertainty Figure 6: CCM Product with Visualization of Uncertainty in the CCM Prediction controlled popup histograms were provided on the digital used in a Monte Carlo technique to map variation in display. Several examples are shown in figure 7. The terrain inputs into variation in predicted CCM speed. The popup histograms are useful to illustrate how the legend graphic output shows four small graphs that map each works: individual terrain parameter's effect on the CCM speed, assuming all other terrain parameters remain fixed (at the • For each pixel in the product display, a mean of their distribution). These small graphics each probability distribution for predicted CCM speed contain the curve of terrain value vs. CCM speed, and two was generated (via Bayesian Network), based on histograms. The one on the bottom is the random uncertainties in the original feature data layers. variation of the terrain parameter, the one on the left is the • The pixel color (legend column) was selected resulting variation in the predicted CCM speed. Note that that corresponds with the highest probability if the terrain parameter vs CCM speed curve is flat (or speed bin. nearly flat) then there is very little variation in predicted CCM speed, even for large variations in terrain values. If • The prediction quality color (legend row) was the terrain parameter vs predicted CCM speed curve is selected based on the range of speed bins with steep, then there can be large variation in predicted CCM probability equal or greater than 10%. speed even if there is little uncertainty in the terrain • For example, the top row, right histogram is for a values. 
The large histogram at the bottom shows the total bright green pixel, indicating that the predicted distribution of predicted CCM speeds based on the speed is reasonably fast, and there is little combined variation of all the terrain inputs. The total uncertainty. The bottom row, left histogram is distribution of predicted CCM speeds shows more also for a green pixel, indicating that the highest variation in predicted speed than for any of the individual probability is for a fast CCM speed. However, terrain parameters, because of the random combination of there is also a 10% probability that the correct values and interaction between parameters. CCM speed range is the lowest speed bin, so the In the visualization shown, - for this specific set of terrain quality color of this pixel reflects that the actual inputs, and terrain uncertainties - the effects of errors in CCM speed extends across the entire range of slope, stem spacing, and soil strength (Rating Cone Index speeds. - RCI) have only a small impact on the total uncertainty in This CCM display provides more information to decision predicted CCM uncertainty. The influence of stem di- makers about the quality of the prediction and (in the ameter uncertainty, on the other hand, has a fairly large interactive versions) the popup histograms provide a impact on the uncertainty in predicted CCM speed. means to query for more detailed predictions at specific This kind of visualization could be used as an interactive points. guide during data collection: For a given area, and given One type of query cannot be answered by the popup his- the current best estimate of terrain values and terrain ac- tograms of Figure 6. If the decision maker is interested in curacies, it is possible to determine which terrain factor reducing the uncertainty in the CCM predictions - perhaps will provide the most improvement as a result of addi- by allocating reconnaissance resources to collect addi- tional collection effort. 
tional terrain data, he would like to know the influence of individual terrain factors on the total uncertainty in the CCM prediction. The query is: “what terrain factor contributes the most to the uncertainty in the predicted CCM speed?” Figure 8 shows an additional visualization that makes it possible to answer this query. The figure represents the uncer- tainty in the values of the terrain factors for one specific point on the terrain, as well as a graphical depiction of the impact of each of the individual factors. The visu- alization requires input of the probability distribution that de- scribes the current estimate of the terrain parameters at a point. Figure 7: Bayesian Network Implementation of CCM Algorithm These probability distributions are The above ideas regarding possible visualizations of finish point ahead of schedule by an amount determined uncertain, incomplete data uncover another vital issue for by the padding factor. Agents with access to uncertainty a successful geospatial system – its ability to meet the information estimated a distribution of arrival times. If distinct knowledge requirements of its distinct users. Note this distribution was “too wide,” they were able to that we are not addressing cosmetic GUI customizations, perform “reconnaissance” to reduce the uncertainty, and but a much complex issue. The multitude and diversity of then replan their routes. Their estimated travel time at the users relying upon a wide spectrum of possible features of 90th percentile of the travel time distribution, and also a geospatial system suggests the need of a much richer applied a “padding factor.” As shown in Figure 9, taken approach for predicting the answers that have to be from Wright (2002), results of this experiment showed a provided, the granularity of information sought by each type of user, or even the algorithms that need to be run to meet such requests. 
Merely listing types of users and crafting customized reports does not scale to geospatial systems intended to meet GIG-era requirements. A more flexible solution is required. One possible approach to face the above challenge might be to employ an ontology conveying knowledge of patterns of system usage, which would trace characteristics related to each type of user to the particular aspects regarding the situation in which a given service is being requested. Depending on how rich this ontology is, the system would be able to predict parameters such as the user’s decision level, precision, timelessness and expected granularity of information, most important factors for CCM predictions, etc, and then optimize its resources to provide the most adequate level of service to that specific situation (e.g. by selecting the most appropriate model views, fine-tuning plausible algorithms for CCM predictions, etc). Finally, in order to meet the demands of a network- centric environment, a service-oriented architecture similar to the one in (Costa et al., 2007) is implied as a precondition for a ontology-driven, seamless interoperable employment of multiple, distributed information sources, repositories, and users of a geospatial system. Figure 8: Visualizing The Influence of Terrain Factors on Total Uncertainty in CCM Predictions 7 DECISION IMPLICATIONS A simulation experiment reported by Wright (2002) dramatic improvement in the probability of arriving at the demonstrates the importance of properly accounting for destination on time for agents that had access to the uncertainty in CCM calculations. Two versions of a CCM uncertainty information. product were generated from the operational terrain database (original terrain data). 
Both used the same CCM algorithm, but the first used the algorithm directly and did not estimate the uncertainty of the CCM product, whereas the second used the Bayesian network implementation shown in Figure 7. The data quality information used to generate the uncertain CCM product was the same as that used to generate the simulated terrain databases, with the exception that the Bayesian network CCM process does not account for spatial accuracy.

Simulated agents without access to uncertainty information used a standard A* search algorithm to find the fastest route from start to finish, and applied a “padding factor” (a parameter varied in the experiment) to determine a start time predicted to bring them to the finish point ahead of schedule by an amount determined by the padding factor. Agents with access to uncertainty information estimated a distribution of arrival times. If this distribution was “too wide,” they were able to perform “reconnaissance” to reduce the uncertainty, and then replan their routes. They estimated travel time at the 90th percentile of the travel time distribution, and also applied a “padding factor.” As shown in Figure 9, taken from Wright (2002), the results of this experiment showed a dramatic improvement in the probability of arriving at the destination on time for agents that had access to the uncertainty information.

Figure 9: Results of the Value of Understanding Uncertainty experiment (percent failures versus travel time multiplier, with and without uncertainty information).

8 DISCUSSION

The examples demonstrate the importance of representing, properly managing, and communicating to decision makers information about uncertainty in the GIS products used for military planning. Several prerequisites are required. The quality of the geospatial data must be known, or techniques must be available to estimate data quality. If a “ground truth” data set exists, in which values are available for all random variables of the network, then straightforward parameter learning algorithms can be used to estimate the required parameters. Typically, though, some of the random variables will be unobserved hidden variables. In this case, more sophisticated algorithms are needed for learning in the presence of hidden variables (e.g., Friedman, 1998; Laskey et al., 2003). In addition to representing data quality, techniques must be available to propagate the uncertainty of the data through GIS algorithms to estimate the uncertainty in the product. For example, the Bayesian network of Figure 7 was used to propagate uncertainty through the CCM model.
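A rough, self-contained illustration of both ideas, propagating input uncertainty through a CCM-style calculation and then planning from the resulting distribution rather than from a point estimate, is the Monte Carlo sketch below. The factor distributions, speed rule, and route length are invented; it is not the Bayesian network of Figure 7 or the actual Wright (2002) simulation, but it reproduces the qualitative contrast behind Figure 9.

```python
import random

random.seed(1)

# Invented stand-ins: uncertain terrain factors and a toy CCM speed rule.
def sample_speed_kph():
    slope_factor = random.uniform(0.2, 0.8)   # uncertain slope degradation
    soil_factor = random.uniform(0.5, 1.0)    # uncertain soil degradation
    return 40.0 * slope_factor * soil_factor  # base speed 40 km/h

ROUTE_KM = 10.0
N = 10_000

# Propagate the input uncertainty through the (toy) CCM calculation to get
# a distribution over travel times instead of a single number.
times_h = sorted(ROUTE_KM / sample_speed_kph() for _ in range(N))

# Planner 1 ignores uncertainty: it plugs central factor values into the rule.
point_estimate_h = ROUTE_KM / (40.0 * 0.5 * 0.75)
# Planner 2 uses the distribution: it budgets the 90th-percentile travel time.
p90_h = times_h[int(0.9 * N)]

# Probability of missing a deadline set by each planning style:
late_point = sum(t > point_estimate_h for t in times_h) / N
late_p90 = sum(t > p90_h for t in times_h) / N
print(f"late with point estimate: {late_point:.0%}; "
      f"late with 90th percentile: {late_p90:.0%}")
```

Because travel time is a convex function of speed, the point-estimate planner is late roughly half the time here, while the percentile planner is late about 10% of the time by construction; the same asymmetry drives the failure-rate gap in the experiment.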
Different model views are appropriate for users playing different roles in the uncertainty management process shown in Figure 4. Model developers and implementers need access to the Bayesian network models of Figures 3 and 7, as well as to the statistical models used to estimate the probability distributions that go into the models. End users need to see views of the model results that are tied to their familiar ways of interacting with the data. The displays of Figures 5 and 6 are constructed to be similar to traditional map displays, but to provide additional information about uncertainty as part of the display, and to allow users to drill down to a more detailed explanation of particular uncertainties. Figure 8 shows one kind of drill-down that decision makers might find useful.

The analyses and displays shown here were generated as stand-alone applications, and have not been incorporated into military geospatial analysis tools, into geospatial ontologies, or into decision support products.
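The hidden-variable learning problem raised in the discussion can be illustrated with a standard expectation-maximization (EM) loop. The example below is invented (a hidden two-valued terrain class observed only through ten noisy binary sensor reports per cell) and uses textbook EM for a binomial mixture, not the specific algorithms of Friedman (1998) or Laskey et al. (2003); it shows why simple parameter counting fails when the class labels are never observed.

```python
import random

random.seed(2)

# Invented example: each terrain cell has a hidden class (no "ground
# truth" labels); we only see how many of 10 noisy binary sensor
# reports fire. True model: P(class0)=0.6, P(report|class0)=0.9,
# P(report|class1)=0.2.
FLIPS = 10

def simulate(n=2000):
    data = []
    for _ in range(n):
        p = 0.9 if random.random() < 0.6 else 0.2
        data.append(sum(random.random() < p for _ in range(FLIPS)))
    return data

data = simulate()

# EM for the two-component binomial mixture; the class is the hidden
# variable, so maximum-likelihood counting cannot be applied directly.
pi, p0, p1 = 0.5, 0.7, 0.3  # initial guesses
for _ in range(100):
    # E-step: posterior probability that each record came from class 0
    # (binomial coefficients cancel in the ratio).
    resp = []
    for x in data:
        l0 = pi * p0**x * (1 - p0) ** (FLIPS - x)
        l1 = (1 - pi) * p1**x * (1 - p1) ** (FLIPS - x)
        resp.append(l0 / (l0 + l1))
    # M-step: re-estimate parameters from expected sufficient statistics.
    w0 = sum(resp)
    pi = w0 / len(data)
    p0 = sum(r * x for r, x in zip(resp, data)) / (FLIPS * w0)
    p1 = sum((1 - r) * x for r, x in zip(resp, data)) / (FLIPS * (len(data) - w0))

print(f"estimated: pi={pi:.2f}, p(report|class0)={p0:.2f}, p(report|class1)={p1:.2f}")
```

On this well-separated example EM recovers parameters close to the generating values; real geospatial models have many more hidden variables, which is why the more sophisticated algorithms cited above are needed.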
It is possible to carry out the kinds of analysis described in this paper with technology available today; however, both representing and propagating uncertainty impose costs on the production and use of geospatial data. So the final prerequisite is an organizational decision that providing information about the uncertainty of GIS products is important, that is, that it provides benefits that exceed the costs.

References

Ames, D.P. Bayesian Decision Networks for Watershed Management. PhD Dissertation, Department of Civil and Environmental Engineering, Utah State University, 2002.

Costa, P.C.G. Bayesian Semantics for the Semantic Web. Doctoral Dissertation, School of Information Technology & Engineering, George Mason University, Fairfax, VA, USA, 2005.

Costa, P.C.G., Laskey, K.B., Laskey, K.J., and Wright, E.J. Probabilistic Ontologies: The Next Step for Net-Centric Operations. Proceedings of the Twelfth International Command and Control Research and Technology Symposium, 2007, to appear.

Cressie, N.A.C. Statistics for Spatial Data. New York: John Wiley & Sons, 1993.

DMS. “Procedural Guide for Preparation of DMA Cross-Country Movement (CCM) Overlays.” Defense Mapping School Student Text DMS ST 330, Ft. Belvoir, VA, 1993.

Friedman, N. Learning Belief Networks in the Presence of Missing Values and Hidden Variables. Fourteenth International Conference on Machine Learning (ICML-97), San Mateo, CA: Morgan Kaufmann Publishers, 1998.

Goodchild, M.F. “Closing Report, NCGIA Research Initiative 1: Accuracy of Spatial Databases.” National Center for Geographic Information and Analysis, University of California, 1992.

Grêt-Regamey, A. and Straub, D. Spatially Explicit Avalanche Risk Assessment Linking Bayesian Networks to a GIS. Natural Hazards and Earth System Science 6(6), pp. 911-926, 2006.

Jianwen, M. and Qin, D. Migratory Locust Hazard Monitoring and Prediction Using the Bayesian Network Inference. Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), pp. 3623-3626, 2005.

Kraak, M. and Ormeling, F. Cartography: Visualization of Spatial Data. Addison Wesley Longman Limited, Essex, England, 1996.

Laskey, K.B. and Myers, J. Population Markov Chain Monte Carlo. Machine Learning 50(1-2), 2003.

Lunetta, R.S., Congalton, R.G., Fenstermaker, L.K., Jensen, J.R., McGwire, K.C., and Tinney, L.R. Remote Sensing and Geographic Information System Data Integration: Error Sources and Research Issues. PE&RS 57(6), pp. 677-687, 1991.

Ryder, W.H. and Voyadgis, D.E. Measuring the Performance of Algorithms for Generating Ground Slope. USATEC Paper, presented at DCAC TEM, February 1996.

Walker, A.R., Pham, B., and Moody, M. Spatial Bayesian Learning Algorithms for Geographic Information Retrieval. Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems, Bremen, Germany, 2005.

Wright, E. Application of Bayesian Networks for Representing Uncertainty in Geospatial Data. ASPRS Spring Convention, Tampa, FL, 1998.

Wright, E.J. Understanding and Managing Uncertainty in Geospatial Data for Tactical Decision Aids. Doctoral Dissertation, School of Computational Sciences, George Mason University, Fairfax, VA, USA, 2002.