Hydrologic Predictions using Probabilistic Relational Models


     Max Metzger                     Alison O’Connor                   David F. Boutt                   Joe Gorman
 Charles River Analytics.          Charles River Analytics        University of Massachusetts      Charles River Analytics.
   625 Mt. Auburn St.               625 Mt. Auburn St.             611 North Pleasant Street         625 Mt. Auburn St.
 Cambridge, MA 02138               Cambridge, MA 02138               Amherst, MA 01003             Cambridge, MA 02138


                         Abstract                                 significant role played by terrain, soil, and subsurface
                                                                  factors in the effects of rainfall run-off on terrain. A
                                                                  decision aid capable of automatically evaluating the
     The US Army faces a significant burden in                    suitability of emplacement sites would reduce the time
     planning sustainment operations. Currently,                  needed for evaluation by logistics planners and improve
     logistics planners must manually evaluate                    the quality of sites selected.
     potential emplacement sites to determine their               To reduce the time and the difficulty of logistics site
     terrain suitability. Sites subject to rainfall-runoff        selection we designed and demonstrated a series of
     responses such as flooding are ill-suited for                Terrain Impact Decision Extensions (TIDE) for logistics
     emplacements, but evaluating the likelihood of               planning tools and processes. TIDE performs data-fusion
     such responses requires significant time and                 over a variety of terrain and weather data sets, and uses
     expertise. To reduce the time and to ease the                probabilistic relational models (PRMs) to reason with
     difficulty of logistics site selection we                    uncertainty to evaluate the suitability of potential logistics
     demonstrated a series of Terrain Impact Decision             sites against a series of expert rules for a variety of
     Extensions (TIDE) for use in logistics planning              emplacement systems. By using PRMs to rank the
     tools and processes. TIDE performs data-fusion               severity of potential rainfall-runoff responses, TIDE was
     over a variety of terrain and weather data sets              able to site determine suitability much faster than by
     using probabilistic relational models (PRMS),                rigorous physical simulation. Additionally, PRMs can
     providing a high-performance alternative to                  reason with incomplete data (e.g., a lack of detailed soil
     physics-based hydrologic models.                             information), making them useful even when evaluating
                                                                  data-poor regions.

1.   INTRODUCTION                                                 1.1 PROBLEM DESCRIPTION
Maintaining a constant supply of water and fuel is critical       The rainfall-runoff response of landscapes is a
to sustaining the US Army’s forces in the field.                  fundamental problem in the field of hydrology (Singh,
(Department of the Army, 2008). To provide access to              1988). The accumulation of water at a particular time-
these critical resources, logistics planners must deliver         space location on the Earth’s surface (i.e., terrain
those resources with minimal failure to establish and             ponding) is the result of the confluence of many
maintain emplacements (e.g., tanks, fuel lines) capable of        climatologic, hydrologic, and physical factors and
storing these crucial commodities. Water and fuel                 parameters. During a liquid precipitation event (e.g.,
supplies must not be susceptible to disruption, damage,           rain), water is transported in three main ways: water can
and contamination by water due to rainfall-runoff                 run-off/on in the form of overland flow; infiltrate into the
responses such as flooding, overland flow, and ponding            soil and become ground water; or be transferred back into
(i.e., the temporary accumulation of surface water).              the atmosphere via evapotranspiration. Overland flow can
                                                                  in turn lead to the accumulation of surface water (e.g.,
Currently, the risks posed by rainfall-runoff responses to        flooding), which poses a risk to US Army emplacements.
potential emplacement sites are manually evaluated, and
require considerable expertise and time. Site evaluation is       The terrain assessment model must account for multiple
further complicated for areas lacking detailed data that          aspects of the area of interest. First, overall climatic
describe terrain, soil properties, and subsurface conditions      conditions (i.e., arid, semi-arid, humid) have an important
(e.g., the presence of aquifers). This occurs due to the          influence over the relative distribution of water in the


                                                             31
three pathways. The model must account for uncertainty         with limited or incomplete data before executing any
in weather predictions and climatologic predictions.           impact assessment rules. The PRM model developed
Second, the model must account for local factors within        under the TIDE effort is capable of reasoning with
the area of interest that influence the rainfall-runoff        incomplete data and inferring data that may be absent.
response. There are many such factors, including rainfall      Additionally, while our initial model is very simple,
intensity and duration, slope of the land, land use and land   further work may expand the model to be very complex.
cover characteristics (i.e., vegetation and impervious         The object-oriented PRM approach is well-suited for such
surfaces), soil and air temperature, soil hydraulic            complexity.
properties, and soil moisture conditions. The data sources
                                                               The PRM output is used to generate maps showing the
for these factors may be incomplete or inaccurate,
                                                               likelihood for flow accumulation at a given location for a
introducing additional uncertainty.
                                                               certain amount of time. We based our models on the
The prediction of where, when, and how long water will         Hortonian Infiltration and Runoff/On (HIRO2) model,
accumulate on the land surface is reliant on constraining      which was originally developed for the USDA (Meng,
parameters that describe the above processes and               Green, Salas et al., 2008). This model predicts rainfall-
conditions.     Fortunately, hydrologists have been            runoff responses, including runoff channels (in which
developing tools to both quantify these factors and            surface water flows) and the time until ponding occurs.
develop quantitative models for predicting rainfall-runoff     The HIRO2 model performs well, but operates at larger
response to precipitation events.                              scales than are useful to emplacement selection, generally
                                                               being most accurate at scales of hundreds of meters. The
These models are often based on solving complex
                                                               HIRO2 model served as the basis for our model, but was
equations that govern the physics of surface and
                                                               modified to operate at higher levels of fidelity without
subsurface water (Abbott, Bathurst, Cunge et al., 1986;
                                                               significantly compromising performance.
Panday & Huyakorn, 2004) or assign statistical values to
terrain based on observation (Yoram, 2003). These              Bayesian modeling techniques have been used in the field
models are not practical for US Army planning because          of hydrology for decades (Vicens, Rodriguez-Iturbe,
they require complete data sets, are extremely time-           Schaake et al., 1975), but the majority of this work has
consuming to compute, and do not scale to the levels of        different goals than TIDE. Bayesian modelling
detail and scope required by US Army logistics planners.       approaches generally take existing models that use direct
                                                               measurements as inputs (e.g., rainfall) and predict specific
2.   APPROACH                                                  hydrologic response values (e.g., runoff rate, groundwater
                                                               level). Bayesian techniques are then used to calibrate the
Given the potential incompleteness of input parameters         models parameters to improve their accuracy (Beven &
(including terrain, soil and subsurface data), our approach    Binley, 1992; Thiemann, Trosset, Gupta et al., 2001;
uses a probability-based method to track the inferences        Vrugt, Ter Braak, Clark et al., 2008).
made about data through the model. For TIDE to be
useful, the system must infer terrain characteristics, soil    TIDE differs from past Bayesian hydrologic models in
properties, and subsurface conditions from limited data.       two fundamental ways. First, our model attempts to
While terrain elevation data is available for most of the      predict the impact of rainfall-runoff responses, not their
world at varying levels of detail, soil data is less           precise values. Generally speaking, US Army logistics
prevalent. Land use, land cover (e.g., vegetation), as well    planners are not concerned about predicting the exact
as the soil’s hydrologic properties and moisture               amount of surface water that may accumulate, but are
conditions are all factors in predicting rainfall-runoff       instead primarily concerned about the impact on the
response. When this information is not directly available,     mission. For example, the difference between 1.2 meters
it needs to be estimated or inferred. For example, soil        of standing water or 2.4 meters is irrelevant if either
properties for a given region within the United States may     makes the mission impossible to complete.
be well-known and stored in a Geographic Information           Second, the TIDE model must perform with reasonable
System (GIS) database, but this data may be unknown for        accuracy in regions of the world that have little, if any,
many rural regions around the world. An exhaustive             hydrologic data observations (e.g., hourly flow rates for a
geological survey of potential sites within that region is     stream) that can be used to train or calibrate a model.
not possible given time and personnel constraints. Even        Instrumenting and measuring rainfall-runoff responses in
when terrain, soil, and subsurface data are present, it may    these areas may be too costly, logistically infeasible, or
not be at resolutions high enough to be relevant to the        dangerous. As a consequence the TIDE model must rely
emplacements (e.g., a map with soil data at a resolution of    on generally available data (e.g., elevation, land cover,
500m is of limited use when selecting a site for a fuel line   weather).
less than a meter across). In cases where data describing
terrain characteristics, soil properties, and subsurface       2.1 PROBABALISTIC RELATIONAL MODELS
conditions are absent, purely rule-based approaches are
insufficient, as rules alone are poorly-suited to handling     To represent our terrain and hydrologic models in a
incomplete data. The system must be capable of reasoning       probabilistic form that allows us to determine the


                                                          32
suitability of an area of interest, we designed a              situation. The relationships that hold between these
probabilistic relational model (PRM) (Koller & Pfeffer,        instances are captured by the PRM. For example, our
1998; Pfeffer, Koller, Milch et al., 1999; Friedman,           PRM contains a class SiteModel that has one attribute,
Getoor, Koller et al., 1999). PRMs describe the world in       Suitability. The value of Suitability depends on the
terms of classes of objects, instances of those classes and    instance     of    the    Runoff      class’  attribute,
relationships between them. Serving as a powerful              RankRunoffPotential. To reason over this model, one
extension of Bayesian Networks (BNs), PRMs use object-         must create instances of both the SiteModel and Runoff
oriented semantics that capture attribute, structural, and     classes.
class uncertainty to overcome computational and storage
                                                               The flexibility and reusability of PRMs grant us the
complexity challenges faced by BNs.
                                                               ability to reason over millions of locations. For each
The design of PRMs has proven to be useful in                  location, the relevant set of known facts about specific
representing a wide range of complex domains that              attributes – the land cover type, soil type, rank of slope
involve uncertainty and require flexibility and reusability.   and rank of flow –must be provided to the instances of
In regard to complexity, PRMs capture the logical and          classes. As we transition to discuss our PRM in greater
relational structure of a domain. For example, PRMs            detail, it will become more evident that these four key
specify how one attribute influences the value of another      features of PRMs – complexity, uncertainty, flexibility,
attribute. In our PRM, the value of the attribute,             and reusability – are crucial to obtaining successful
RankDrainageCapacity, is dependent on the values of            results. Bayesian Networks could also apply to this
attributes, LandCoverType and SoilType, from two other         problem, as the relational structure is fixed for every
classes. Therefore, the model uses the values of               instance. Nevertheless, the object-oriented representation
RankDrainageCapacity’s         dependent         attributes,   of PRMs were quite helpful in designing the model.
LandCoverType and SoilType, to infer the value of
RankDrainageCapacity.                                          2.2 PRM EDITOR
To handle uncertainty, PRMs use probability distributions      We developed a PRM Editor that provides an intuitive
encoded in the model to determine values of unknown            graphical user interface (GUI) that allows users to create
variables. The value of LandCoverType and SoilType for         complex PRMs by defining classes of objects, adding
a location are retrieved from data sources outside the         attributes to those class definitions, creating instances of
model and then posted to the model. Therefore, there is        the classes, and specifying the relationships between
little uncertainty in regard to these two attributes.          them.
Conversely, the value of RankDrainageCapacity is
inferred inside the model using probability distributions.     Upon launching the PRM Editor, the user can navigate
To overcome the uncertainty involved with this attribute,      between three views: the global view, class view, and
encoded in the model is a map of possible combinations         instance view. While these three views are initially blank,
of land cover and soil types to appropriate probability        the panels become populated with information and
distributions. Relying on the team’s hydrologic expertise,     graphical representations of the model. The global view
we created initial distributions for each possible pair of     allows a user to view the PRM as a whole in a folder
land cover and soil types, as well as each land cover and      format. Its top-level folder, named after the PRM, can be
soil type provided the other attribute was unknown.            expanded to display three other folders, enums, classes,
Similarly, using domain knowledge, we supplied                 and instances. The enums and instances folders can
distributions for each individual slope ranking, flow          further be expanded to show all enumerations and
ranking, and drainage ranking assuming that the other two      instances of classes in the model. Within the classes
attributes were unknown. Given the increased number of         folder are additional folders for each class that can be
combinations for RankRunoffPotential, to obtain                expanded to view the attributes in that class.
distributions in the case that two or three attributes were    Unlike the global view, the class view displays a
known, we multiplied the probabilities of the known            graphical representation of the PRM. Each class is
attributes    for     each    of     the    five    possible   represented by a box labeled with the class name. If
RankRunoffPotential values. The distributions for              applicable, arrows are automatically drawn between
Suitability were much simpler to encode, as only five          boxes indicating super and subclasses (parent-child
probability distributions that required no further             relationships). The instance view also displays boxes that,
calculations were necessary. While these initial               rather than represent classes, represent the instances of
distributions pass face validation, future work is needed to   classes in the model.
adjust the distributions to meet higher accuracy needs.
                                                               To begin utilizing these three views, the user has the
To support flexibility and reusability, PRMs allow the         option to either load an existing model into the GUI or
reuse of the same class probability models for all             create a new PRM. After loading or initializing the model,
instances of a class. New probabilistic models do not have     the user can begin building the model by adding classes.
to be constructed for each new situation. Instances of         When creating a class, the user must specify the name and
classes can be configured in any way desired for a given       parent class of the new class. In the case of our model, we


                                                          33
created six classes, none of which had parents, so this         the resolution to assignment, the user must enter the exact
field remained blank.                                           value of the attribute or its reference. For example, if the
                                                                attribute were an integer, the user could indicate that
Adding an attribute requires more detail than adding a
                                                                value was 10. Alternatively, the reference could be set to
class. A user must specify the attribute name, type, and
                                                                an attribute of another class that was also an integer. The
resolution. The user has the option to assign single or
                                                                appropriate resolution for Suitability is dependency.
multi as the attribute’s type, as well as choose from a list
                                                                Therefore, the user must specify the influencer, the
of possible types. Possible types contained in this list
                                                                attribute that Suitability depends on, as well as the
include integer, real number, Boolean, type, nothing, as
                                                                conditions and their respective distributions. The
well as all of the classes and enumerations created by the
                                                                conditions are the possible values of the influencer. Each
user. If the attribute is of type enumeration, the user must
                                                                possible value of the influencer is paired with a CPD
have previously defined the appropriate enumeration. For
                                                                indicating the likelihood of each possible value of the
example, in our model, the SiteModel class’ attribute,
                                                                attribute. Suitability depends on the instance of the
Suitability, is of type enumeration. The possible values
                                                                Runoff         class’,       RunoffInstance,        attribute
for Suitability are VeryPoor, Poor, Medium, Well and
                                                                RankRunoffPotential. RankRunoffPotential has five
VeryWell. Therefore, we created an enumeration called
                                                                possible values – VeryLow, Low, Medium, High, and
RankPoor, to represent an attribute with these five
                                                                VeryHigh. Therefore, Suitability will have five conditions
possible values. Before defining an attribute’s resolution,
                                                                and five distributions that indicate the probability of each
the user must create instances of other classes. By
                                                                of Suitability’s five possible values occurring given the
creating instances of classes, attributes in other classes
                                                                value of RankRunoffPotential.
can depend on the attributes of these instances. Figure 1
shows the global and class relationship views after the six              Having defined four enumerations, six classes,
classes and their six respective instances have been            seven attributes, and six instances in our model using the
created.                                                        PRM Editor, the model was saved to as a .prm file that
                                                                could be used by the TIDE system.

                                                                2.3 HYRDOLOGIC MODEL
                                                                A PRM consists of a set of class probability models. The
                                                                final version of our PRM (Figure 2) contains six classes –
                                                                SiteModel, Runoff, Topography, DrainageCapacity,
                                                                LandCover, and Soil. Each class has a set of attributes.
                                                                Attributes are either simple or complex. Simple attributes
                                                                are random variables that represent direct properties of an
                                                                object, such as the type of land cover or type of soil,
                                                                whereas complex attributes represent relationships to
                                                                other objects. The attributes in our model are all simple.
                                                                Logical relationships can be described between classes.
                                                                The lines in Figure 2 represent these relationships.
                                                                Assuming we have an instance of every class, an instance
                                                                of LandCover is related to an instance of the
                                                                DrainageCapacity class by the LandCoverType attribute.
                                                                Each simple attribute is associated with a set of parents
                                                                and a CPD. The parents are determined by the attributes
                                                                that the attribute depends on. Attributes can depend on
                                                                either other simple attributes of the same object or of
                                                                related objects. An example of an attribute of an object
                                                                depending on an attribute of a related object is the
                                                                dependence      of    the   RankDrainageCapacity      on
                                                                LandCoverType.
  Figure 1: Global View (left), Class Relationship View         Attributes of related objects are specified via attribute slot
                         (right)                                chains, such as the slot chain LandCoverInstance
                                                                LandCoverType. This slot chain begins with the object
To complete the implementation of the previously                representing the land cover of a location, and accesses the
discussed attribute, Suitability, its resolution must be        simple attribute indicating the type of land cover at this
defined. The resolution of an attribute can be assigned as      location.     The      model       specifies     that      the
nothing, assignment, or dependency. Upon creating the           RankDrainageCapacity attribute of the DrainageCapacity
attribute, the default choice is nothing. If the user updates   class has this slot chain as a parent. To reiterate, this


                                                           34
indicates that the RankDrainageCapacity depends                           Table 1: SiteModel Class Implementation
probabilistically on the LandCoverType. The other slot
chain      parent     of    RankDrainageCapacity        is      class SiteModel = {
SoilInstance.SoilType. It is important to emphasize that             Suitability:        single
                                                                [RunoffInstance.RankRunoffPotential]
                                                                                                      RankPoor        depends       on

these parent relationships are general, meaning that the             case [VeryLow] =>
land cover or soil type may vary from scenario to                          (0.1 -> VeryPoor, 0.15 -> Poor, 0.5 -> Med, 0.15 -> Well,
                                                                0.1 -> VeryWell)
scenario, but the probabilistic relationships hold for all           case [Low] =>
scenarios (e.g., when a new area is investigated).                         (0.7 -> VeryPoor, 0.1 -> Poor, 0.1 -> Med, 0.05 -> Well,
                                                                0.05 -> VeryWell)
                                                                     case [Med] =>
                                                                           (0.05 -> VeryPoor, 0.1 -> Poor, 0.7 -> Med, 0.1 -> Well,
                                                                0.05 -> VeryWell)
                                                                     case [High] =>
                                                                           (0.05 -> VeryPoor, 0.05 -> Poor, 0.1 -> Med, 0.7 -> Well,
                                                                0.1 -> VeryWell)
                                                                     case [VeryHigh] =>
                                                                           (0.05 -> VeryPoor, 0.05 -> Poor, 0.1 -> Med, 0.1 -> Well,
                                                                0.7 -> VeryWell)
                                                                     case [_] =>
                                                                           (0.2 -> VeryPoor, 0.2 -> Poor, 0.2 -> Med, 0.2 -> Well, 0.2
                                                                -> VeryWell)
                                                                             }


                                                                With a clear understanding of how relationships and
                                                                CPDs are specified in the model, we can discuss how
                                                                inference ultimately determines if a location is suitable.
                                                                The basic order of how the model performs inference is:
                                                                once the values of an attribute’s parents are known, the
                                                                value of that attribute can be inferred. Therefore, the
                                                                process begins by posting evidence to the leaf classes.
                                                                First, the LandCoverType, SoilType, RankSlope, and
                                                                RankFlow evidence is posted to the model. Next, the
                Figure 2: Hydrologic PRM                        model can infer the value of RankDrainageCapacity from
                                                                the land cover and soil data. For example, if the
In our tree-structured PRM, the attributes in the classes       LandCoverType is Shrub and the SoilType is Vertisols,
directly below another class are parents to the attributes in   the probability distribution encoded in the model for
the class above them. Therefore, the attributes in the three    RankDrainageCapacity given this evidence is: case
leaf classes, LandCover, Soil and Topography, do not            [Shrub,Vertisols] => (0.7 -> VeryPoor, 0.3 -> Poor, 0.0 ->
have parents. The values of these attributes are derived        Med, 0.0 -> Well, 0.0 -> VeryWell). Again, this
outside the model and posted as evidence to the model.          distribution can be interpreted as: If LandCoverType is
Conversely, the values of the attributes in the remaining       Shrub and SoilType is Vertisols, there is 70% likelihood
classes, SiteModel, Runoff, and DrainageCapacity, are           the RankDrainageCapacity is VeryPoor and 30%
inferred from the data available within the model.              likelihood the rank of drainage capacity is Poor. Once the
Recall that the other information associated with a simple      CPD of RankDrainageCapacity is determined, the
attribute is a CPD that specifies a distribution over values    distribution can be used in conjunction with the
of an attribute given the values of its parents. In the case    Topography evidence to infer the value of the
of Suitability, its parent is Runoff.RankRunoffPotential.       RankRunoffPotential. This process propagates up the
Table 1 shows the code for the implementation of the            model, as RankRunoffPotential influences the value of
SiteModel class, complete with its attribute, Suitability,      Suitability.
specification of its parent, RankRunoffPotential, and CPD       To determine a site’s suitability, the model uses all
for every possible value of RankRunoffPotential. The            available data. While accuracy increases with amount of
bolded line specifies, in plain terms, that if the              available data, our model is capable of reasoning with
RankRunoffPotential is VeryLow then there is 10%                incomplete or no data. In the case that data is unavailable
likelihood Suitability is VeryPoor, 15% likelihood              or unknown for the four inputs – LandCoverType,
Suitability is Poor, 50% likelihood Suitability is Medium,      SoilType, RankSlope, and RankFlow – the probability is
15% likelihood Suitability is Well and 10% likelihood           evenly distributed over all possible values.
Suitability is VeryWell.
                                                                Our PRM Editor utilizes the open source Figaro
Similar to how parent relationships are defined, the            probabilistic   programming         language      (PPL)
assigned CPD is general; it holds no matter what the            (www.cra.com/figaro) to perform inference. PPLs provide
specific related objects are.                                   a powerful and flexible way to represent probabilistic
                                                                models using the power of programming languages. In


                                                           35
addition, PPLs offer general-purpose reasoning                  Table 2: Mapping terrain slope angles to model inputs
algorithms for inference and machine learning. Our
implementation      utilizes the Metropolis-Hastings          Angle Range                     Rank of Slope
reasoning algorithm, capped with a runtime of 5,000
milliseconds per inference.                                   Ϊ                       Very Low
                                                              10 < Ϊ                      Low
2.4 INTEGRATION
                                                              20 < Ϊ                      Medium
Our PRM used data from the following data sources.
                                                              30 < Ϊ                      High
2.4.1   SRTMVF2
                                                              Ϊ > 60                          Very High
 The Shuttle Radar Topography Mission (SRTM) was a
joint project between NASA and the National Geospatial-
Intelligence Agency (NGA) to create high-resolution land      Flow
surface data for much of the world (roughly 80% of the        The elevation dataset is used to predict flow channels –
Earth’s land surface is covered). The SRTM Void-Filled 2      that is, paths that surface water is likely to take in the
(SRTMVF2) data set is at 1-arc-second (approximately          event of rainfall. A greater amount of flow indicates a risk
30-meter) resolution data, with many gaps in data void-       of surface water accumulation. To calculate flow, we
filled using interpolation techniques (Dowding, Kuuskivi,     relied on the TopoToolbox (Schwanghart & Kuhn, 2010).
Li et al., 2004). The SRTMVF2 dataset serves as our           The toolbox includes techniques for predicting flow
primary elevation data source, as our hydrologic model is     estimation. The flow values predicted can vary wildly. In
heavily dependent on accurate, high-resolution elevation      the case of our AOI, estimated flow varied between 0 and
data. However, we have identified that there are gaps         over 3,300. To normalize the dataset, we first transformed
within the SRTMVF2 elevation data. In areas where no          the flows to a logarithmic scale (changing the range from
SRTMVF2 data can be found, we can fall back to lower          1 to ~9.7) and then normalized the results to [0, 1].
resolution DTED data, including the SRTMVF1 and
SRTMVF0 data sets. Elevation is used to determine                          Table 3: Mapping Flow to Model Inputs
inputs to our model, slope and water flow. Slope and flow
implicitly capture the spatial relationships of each DTED
                                                              Flow Range                      Rank of Flow
point with its neighbors, allowing the PRM to reason
about each point’s data independently.                        flow is exactly 0               Very Low

Slope                                                         0 < flow  0.1                  Low
Slope is determined using the elevation dataset. For the      0.1 < flow                 Medium
initial effort, we used a simple algorithm that iterates
across each elevation point. For each point, the relative     0.2 < flow                 High
change, dE, in elevation is calculated for each adjacent      flow > 0.5                      Very High
point (excluding diagonally adjacent points.) The dE
value with the greatest magnitude is selected, and the
distance between points (1 arc-second in the case of the      As shown in Table 3, these values are then translated into
SRTMVF2 dataset) is used to calculate the angle of the        five discrete inputs for the terrain assessment model
WHUUDLQ¶VVXUIDFHԦ7KLVYDOXHFDQUDQJHIURPGHJUHHV    (same as the slope). As with the slope values, the process
(i.e., perfectly flat) to 90 degrees (which would be a        of mapping flows to discrete ranking values is
perfectly vertical surface.) While there are more elaborate   independent from the flow calculations. This means that
methods for determining slope that provide more accurate      calculating the flows (a process that took roughly two
results, this technique can process millions of points in a   hours for the Demonstration Scenario’s AOI) need only
matter of minutes, and yields sufficient accuracy for the     be run once per AOI, even if we adjust model values or
needs of the terrain assessment model.                        how flow values are mapped to model inputs.
Once the slope angles have been calculated using the
algorithm described above, they are translated from a         2.4.2        GeoCover
continuum of [0, 90) to five discrete values, which are       Earth Satellite Corporation (EarthSat) developed the
used as inputs for the terrain model. Table 2 shows how       GeoGover data set, a global landcover database. The
angle ranges are mapped to model inputs.                      GeoCover dataset consists of 13 land cover classes and is
                                                              available for much of the world (Cunningham, Melican,
                                                              Wemmelmann et al., 2002). Classes of land cover include
                                                              grasslands, agriculture areas (i.e., farmland), wetlands,
                                                              and water/ice. This data will serve as additional inputs to


                                                         36
our terrain models so we can more accurately assess                                 Table 4: Mission Rules
rainfall-runoff response. The GeoCover dataset will also
enable TIDE to identify bodies of water.                        Condition                           Effect

2.4.3    Harmonized World Soil Database                         If the point is a body of water     Mission suitability is Very Poor
The Harmonized World Soil Database (HWSD) was                   If slope is ranked as “Low” or      Mission suitability is High
produced by the European Union’s European Commission            “Very Low” and hydrologic
Joint Research Centre (more specifically, the Land              suitability is “Medium”
Management Unit of the Institute for Environment and
                                                                If slope is ranked as “Low” or      Mission suitability is equal to
Sustainability.) The HWSD is a 30 arc-second
                                                                “Very Low” and hydrologic           hydrologic suitability
(approximately 90-meter) resolution that contains detailed
                                                                suitability is not “Medium”
information about the top soil and subsoil properties. It
was created by merging data from four different soil            If slope is ranked as “Medium”      Mission suitability is Medium
databases (Nachtergaele, Van Velthuizen, Verelst et al.,        and hydrologic suitability is
2008).                                                          “High” or “Very High”
This data allows the model to more accurately predict           If slope is ranked as “Medium”      Mission suitability is equal to
how terrain will respond to surface water (for example,         and hydrologic suitability is not   hydrologic suitability
how quickly water will be absorbed into the soil.) This         “High” or “Very High”
dataset’s low resolution means that some terrain
boundaries (such as coasts) and geographical features           If slope is “High” or “Very         Mission suitability is Very Poor
(such as bodies of water) are of low accuracy compared to       High”
the other data sets.

2.5 MISSION DECISION RULES                                      Currently, rules are distinct from the PRM model, so that
                                                                custom rules can be written for different operational needs
The system must provide a set of logistic system-specific       while using the same PRM model. For example, while the
terrain assessment rules for a variety of systems and           PRM output is constant, the mission requirements for a
purposes (e.g., Tactical Water Distribution System,             long fuel pipeline may be very different than the mission
Assault Hose Line System). Terrain suitability may vary         requirements for a convoy. The fuel line would have a
from system to system—for example, a suspended hose             very low tolerance for changes in elevation (as the pumps
may be unaffected by some types of standing water while         cannot handle the increased workload) and would be
a ground-level hose could be at risk for contamination.         susceptible to contamination from standing water. The
Rule sets for individual systems will need to account for       convoy, while still limited by severe terrain or flooding,
these differences, allowing planners to choose the              would be much more resilient to water and slopes.
appropriate system given the characteristics of a
prospective emplacement site. Additionally, logistics            A standard rules engine and associated rules language,
planners must be able to easily modify and expand these         such as that provided by JBoss, would allow mission
rules as new systems are introduced, and as mission             experts to author rules for the TIDE system without
requirements change. (For example, different rules would        requiring them to understand the PRM or hydrology.
be used for route planning than well placement.)
                                                                2.6 VISUALIZATION
In our initial effort, we have implemented some basic
rules that filter terrain suitability for a hypothetical fuel   Outputs of the PRM and the rules engine, as well as the
line. The fuel line has two requirements: (1) it must be        data sources themselves, were rendered within NASA
installed on flat land (so the pumps can function               WorldWind. WorldWind can accept a variety of GIS data
properly); and (2) the fuel lines cannot be placed in           formats and is easily customizable. Using the open-source
standing water (to prevent contamination), which includes       the Geospatial Data Abstraction Library (GDAL), we
bodies of water (such as lakes) and areas that are prone to     wrote custom modules to render the HWSD and
flooding.                                                       GeoCover data sets, while the SRTMVF2 data was loaded
                                                                in using built-in WorldWind methods. Model output and
To determine suitability for the fuel lines, we take slope,
                                                                rules output were rendered as textures which were then
land coverage, and hydrologic suitability as inputs. We
                                                                projected onto the WorldWind globe at the appropriate
then apply a set of rules as described in Table 4. The rules
                                                                coordinates, but the data could easily be written to a
transform the hydrologic suitability into mission
                                                                variety of GIS formats.
suitability. These rules favor flat land over sloped land.
                                                                The hydrologic suitability is presented as belief values in
                                                                five categories: {Very Poor, Poor, Medium, High, Very
                                                                High}. Related military impact assessment (e.g., weather
                                                                impact assessment) is done at three intervals (e.g., low


                                                           37
risk, medium risk, and high risk.) We expanded our model       the central region of the image – these are riverbeds and
to use five intervals instead of three to present additional   their surrounding valleys, which were detected despite
granularity in the model’s output. Further work is needed      those bodies of water not being explicitly present within
to determine the best number of intervals and their            our GeoCover or HWSD data sets.
thresholds.
                                                               Figure 4 shows the output of our rules engine (and a
For the initial effort, the category with the highest belief   region slightly larger than the figure above). While these
value is selected as the ‘correct’ suitability value. These    rules are very simple, they demonstrate how rules can
categories are then color-mapped for visualization: {Red,      transform the high-density output of the models (Figure 3,
Orange, Yellow, Green, Blue}. The same categories and          above). The model output scores each point in the
colors are used for the rules output.                          elevation grid (approximately a 30 by 30 meter square
                                                               when using the SRTMVF2 data set), producing a very
3.   DISCUSSION                                                dense output. Rules can be used to simplify the models’
                                                               output into easier-to-interpret regions. With these simple
For our area of interest and the 1 arc-second SRTMFV2          rules, we were able to execute rules across the entire
set, there are 25,934,402 points to process. Executing the     region in five minutes. Figure 4 shows the same riverbeds
entire PRM for each point would be unnecessarily               as in Figure 3, but the view is expanded to show a large
complex – instead, we store each unique combination of         lake to the west, which has been appropriately flagged as
{soil type, land cover, rank slope, rank flow} and store       having very poor mission suitability. Unlike the riverbeds
the associated beliefs. This means we can simply look up       (which were predicted by the PRM), this body of water is
the correct PRM output for each unique combination of          present within both the GeoCover and HWSD data sets
inputs, which need only be run through the PRM once. As        (at differing levels of precision).
a result, we are able to process all 25.9 million points in
only two hours. (Further updates to the Figaro library
should increase runtime performance as well.) In a full-
scale TIDE system, the PRM values for all combinations
could be calculated once and only once, and then stored in
a database for quick reference. This database would only
need to be updated when the PRM is updated.


                                                                                Figure 4: Rules Output

                                                               4.   FUTURE WORK

                                                               4.1 TERRAIN AND HYDROLOGIC MODELS
                                                               Future improvements to the model begin by incorporating
                                                               more data. The more information captured by the model,
                                                               the more accurate the inferences will be. The next data
              Figure 3: PRM Model Output                       source to integrate is precipitation data. Depending on the
                                                               duration of the mission, weather or climate data would be
Figure 3 shows the output of the model. (Figures 3 and 4       used. For example, missions spanning from zero to four
are best viewed in color.) The output of the model is very     months would heavily rely on weather information,
grainy as each point in the elevation set can have a           missions spanning from four to eight months would
distinct rank. Of note are the red regions running across      integrate both weather and climate data, and missions


                                                          38
lasting longer than eight months would incorporate              sources into the model, the number of attributes and
climate data. We will also work to quantitatively evaluate      dependencies will increase, resulting in more accurate
the performance and applicability of our models.                inferences. Existing data can also be used to infer missing
                                                                data. For example, using higher-resolution data (such as
Our approach to testing and verifying the accuracy of our
                                                                elevation data or land cover data) we can easily determine
models is two-fold. First, we will compare the outputs of
                                                                that the HSWD fails to cover the coastlines. We can then
our models to those of existing, alternative hydrologic
                                                                predict the missing values using spatial relationships.
models. These models are often based on solving complex
                                                                Ambiguous areas could be assigned multiple values with
equations that govern the physics of surface and
                                                                different confidence values. Figure 5 shows how the two
subsurface water (Abbott et al., 1986; Panday &
                                                                HSWD regions could be used to infer the values for the
Huyakorn, 2004) or assign statistical values to terrain
                                                                missing regions.
based on observation (Yoram, 2003). These models are
not practical for US Army planning because they require         Point A, to the north, would be assigned a high
complete data sets, are extremely time-consuming to             probability of having luvisols as the dominate soil type.
compute, and do not scale to the levels of detail and scope     Point B would be assigned near equal probabilities of
required by US Army logistics planners. However, their          being either luvisols or vertisols. Point C, to the south,
outputs have been validated when tested on carefully            would be assigned a high probability of vertisols as the
monitored and measured regions of terrain, typically            dominate soil type. The inference used for point B could
within the US. By running the TIDE models on the same           be assigned to any region near the boundaries of low-
regions and comparing its output to that of the established     resolution data sets – for example, point D could also be
models, we can confirm that the TIDE models are                 assigned a probability of being either vertisols or luvisols;
functioning correctly.                                          even though the data set classifies it as vertisols, the
                                                                resolution is low enough that the point could be a
Second, we will gather existing data sources of rainfall-
                                                                misclassification. The assigned probabilities, along with
runoff responses. Several regions within the United States
                                                                the soil types themselves, would serve as inputs to the
have had their rainfall-runoff responses measured at
                                                                PRM models.. For example, the soil type input to our
various degrees of fidelity. For example, the Leaf River
                                                                PRMs for Point D could be “{Vertisols-50%, Luvisols-
basin in Mississippi has over forty years of time series
                                                                50%} instead of simply {Vertisols}.
data that includes precipitation and runoff (Yapo, Gupta,
Sorooshian et al., 1996). Additional data sources could be
built from flood records and high water level records.
These data sets will serve to validate the PRM models
used by TIDE. They may also serve as training data to
calibrate the model to more accurately predict the severity
of rainfall run-off responses (e.g., flooding).

4.2 DATA FUSION MODEL
Our basic solution for handling cases of limited or
missing data assumes that each value is equally likely if
no evidence is posted to the model. Under this
assumption, the accuracy of our inferences declines with
limited or no data. The inferences are only as strong as the            Figure 5: Reasoning about incomplete data
data known and evidence provided.
          Future improvements for how to reason with            5.   CONCLUSIONS
incomplete or no data involve adjusting the prior
distributions. Although the prior distributions in our          Flooding, and other terrain rainfall-runoff responses, pose
current model assume that all values of an attribute are        significant risk and cost to US Army operations.
equally likely if no data is available, one would argue this    Assessing the magnitude of flood risk and the impact it
is not representative of the real world. We plan to explore     will have on a mission requires both time and expertise
the possibilities of more representative prior distributions.   that may not always be available. An automated system
For example, the prior distribution for land cover type         for predicting the likelihood and impact of flooding and
could reflect that fact that over 70% of the earth’s surface    surface water accumulation would be of great benefit to
is covered in water, making it the most likely of the seven     logistics planners and the US Army at large.
values.                                                         During our initial effort, we demonstrated the feasibility
This being said, the most dramatic mitigation of                of Terrain Impact Decision Extensions to predict rainfall-
consequences due to incomplete data or unknown values           runoff response. We have identified key data sources
will result from future improvements to the model itself        required for predicting flooding and have developed an
rather than the dependencies. As we integrate more data         initial set of models that are capable of identifying regions


                                                           39
that are at high risk of flooding. These models are capable             Infiltration and Runoff/On. Environmental
of processing millions of data points per hour, allowing                Modelling & Software, 23, 794-812.
them to process thousands of square kilometers. We feel
these models and their performance indicate our approach       Nachtergaele, F., Van Velthuizen, H., Verelst, L., Batjes,
is sound, and future work will refine and validate the                 N., Dijkshoorn, K., Van Engelen, V., Fischer, G.,
models’ performance.
                                                                       Jones, A., Montanarella, L., and Petri, M. (2008).
                                                                       Harmonized world soil database. Food and
Acknowledgements
                                                                       Agriculture Organization of the United Nations.
This work was performed under US Army Research Lab
contract number W911QX-13-C-0111. The authors would            Panday, S.& Huyakorn, P. S. (2004). A fully coupled
like to thank Mr. Peter Grazaitis for his significant                  physically-based spatially-distributed model for
technical support and eager engagement on this project.                evaluating surface/subsurface flow. Advances in
This work was funded in its entirety by ARL. We would                  water Resources, 27, 361-382.
also like to thank Ms. Yvonne Fuller and Ms. Jill Oliver
for their assistance in preparing this paper.                  Pfeffer, A., Koller, D., Milch, B., and Takusagawa, K. T.
                                                                        (1999). SPOOK: A system for probabilistic
References:
                                                                        object-oriented knowledge expression. In 14th
                                                                        Annual Conference on Uncertainty in AI (UAI).
Abbott, M. B., Bathurst, J. C., Cunge, J. A., O'Connell, P.
        E., and Rasmussen, J. (1986). An introduction to
                                                               Schwanghart, W.& Kuhn, N. J. (2010). TopoToolbox: A
        the European Hydrological System—Systeme
                                                                      set of Matlab functions for topographic analysis.
        Hydrologique Europeen,“SHE”, 1: History and
                                                                      Environmental Modelling & Software, 25, 770-
        philosophy of a physically-based, distributed
                                                                      781.
        modelling system. Journal of hydrology, 87, 45-
        59.
                                                               Singh, V. P. (1988). Hydrologic systems. Volume I:
                                                                       Rainfall-runoff modeling. Prentice Hall,
Beven, K.& Binley, A. (1992). The future of distributed
                                                                       Englewood Cliffs New Jersey. 1988. 480.
        models: model calibration and uncertainty
        prediction. Hydrological processes, 6, 279-298.
                                                               Thiemann, M., Trosset, M., Gupta, H., and Sorooshian, S.
                                                                      (2001). Bayesian recursive parameter estimation
Cunningham, D., Melican, J., Wemmelmann, E., and
                                                                      for hydrologic models. Water Resources
       Jones, T. (2002). GeoCover LC-A moderate
                                                                      Research, 37, 2521-2535.
       resolution global land cover database. In ESRI
       International User Conference.
                                                               Vicens, G. J., Rodriguez-Iturbe, I., and Schaake, J. C.
                                                                       (1975). A Bayesian framework for the use of
Dowding, S., Kuuskivi, T., and Li, X. (2004). Void fill of
                                                                       regional information in hydrology. Water
       SRTM elevation data–principles, processes and
                                                                       Resources Research, 11, 405-414.
       performance. In Images to Decisions: Remote
       Sensing Foundations for GIS Applications,
                                                               Vrugt, J. A., Ter Braak, C. J., Clark, M. P., Hyman, J. M.,
       ASPRS, Fall Conf., Sep, 12-16.
                                                                         and Robinson, B. A. (2008). Treatment of input
                                                                         uncertainty in hydrologic modeling: Doing
Friedman, N., Getoor, L., Koller, D., and Pfeffer, A.
                                                                         hydrology backward with Markov chain Monte
        (1999). Learning probabilistic relational models.
                                                                         Carlo simulation. Water Resources Research, 44.
        In Sixteenth International Joint Conference on
        Artificial Intelligence (IJCAI-99).
                                                               Yapo, P. O., Gupta, H. V., and Sorooshian, S. (1996).
                                                                       Automatic calibration of conceptual rainfall-
Koller, D.& Pfeffer, A. (1998). Probabilistic frame-based
                                                                       runoff models: sensitivity to calibration data.
         systems. In Fifteenth National Conference on
                                                                       Journal of Hydrology, 181, 23-48.
         Artificial Intelligence (AAAI-98), 580-587.
                                                               Yoram, R. (2003). Applied stochastic hydrogeology.:
Meng, H., Green, T. R., Salas, J. D., and Ahuja, L. R.
                                                                      Oxford University Press.
       (2008). Development and testing of a terrain-
       based hydrologic model for spatial Hortonian


                                                          40