Horizontal Integration of Warfighter Intelligence Data
A Shared Semantic Resource for the Intelligence Community

Barry Smith: University at Buffalo, NY, USA
Tatiana Malyuta, William S. Mandrick, Chia Fu: Data Tactics Corp., VA, USA
Kesny Parent, Milan Patel: Intelligence and Information Warfare Directorate (I2WD), CERDEC, MD, USA

Abstract - We describe a strategy that is being used for the horizontal integration of warfighter intelligence data within the framework of the US Army’s Distributed Common Ground System Standard Cloud (DSC) initiative. The strategy rests on the development of a set of ontologies that are being incrementally applied to bring about what we call the ‘semantic enhancement’ of data models used within each intelligence discipline. We show how the strategy can help to overcome the familiar tendency to stovepipe intelligence data, and describe how it can be applied in an agile fashion to new data resources in ways that address immediate needs of intelligence analysts.

Index Terms—semantic enhancement, ontology, joint doctrine, intelligence analytics, intelligence data retrieval.

I. INTRODUCTION

The horizontal integration of warfighter intelligence data is described in Chairman of the Joint Chiefs of Staff Instruction J2 CJCSI 3340.02A [1] in the following way:

    Horizontally integrating warfighter intelligence data improves the consumers’ production, analysis and dissemination capabilities. Horizontal Integration (HI) requires access (including discovery, search, retrieval, and display) to intelligence data among the warfighters and other producers and consumers via standardized services and architectures. These consumers include, but are not limited to, the combatant commands, Services, Defense agencies, and the Intelligence Community.

Horizontal integration is achieved when multiple heterogeneous data resources become aligned or harmonized in such a way that search and analysis procedures can be applied to their combined content as if they formed a single resource. We describe here a methodology that is designed to achieve such alignment in a flexible and incremental way. The methodology is applied to the source data at arm’s length, in such a way that the data itself remains unaffected by the integration process.

Ironically, attempts to achieve horizontal integration have often served to consolidate the very problems of data stovepiping which they were designed to solve. Integration solution A is proposed and works well for the data and purposes for which it was originally tailored, but it does not work at all when applied to new data, or to existing data that has to be used in new ways. Such failures arise for a variety of reasons, many of which have to do with the fact that integration systems are too closely tied to specific features of the (software/workflow) environments for which they have been developed. We propose a strategy for horizontal integration which seeks to avoid such problems by being completely independent of the processes by which the data store to which it is applied is populated and utilized. This strategy, which draws on standard features of what is now called ‘semantic technology’ [2], has been used successfully for over ten years to advance integration of the data made available to bioinformaticians, molecular biologists and clinical scientists in the wake of the successful realization of the Human Genome Project [3, 4]. The quantity and variety of such data – now spanning all species and species interactions, at all life stages, at multiple granularity levels, and pertaining to thousands of different diseases – is at least comparable to the quantity and variety of the data which need to be addressed by intelligence analysts. As we describe in more detail in [5], however, today’s dynamic environment of military operations (from Deterrence to Crisis Response to Major Combat Operations) is one in which new data sources are constantly becoming salient to intelligence analysis, in ways which will require a new sort of agile support for retrieval, integration and enrichment of data. We will thus address in particular how our strategy can be rapidly reconfigured to allow its application to emerging data sources.

The strategy is one of a family of similar initiatives designed both to rectify the legacy effects of data stovepiping in the past and to counteract the problems caused by new stovepipes arising in the future. It is currently being applied within the DCGS-A Standard Cloud (DSC) initiative, which is part of the Distributed Common Ground System-Army [6], the principal Intelligence, Surveillance and Reconnaissance (ISR) enterprise for the analysis, processing and exploitation of all US Army intelligence data, and which is designed to be interoperable with other DCGS programs. The DSC is a military program of record in the realm of Big Data that accumulates rapidly changing data from multiple diverse sources. In [5, 7] we described how the proposed strategy is already helping to improve search results within the DSC in ways that bring benefits to intelligence analysts. In this communication, we present the underlying methodology, describing also how it draws on resources developed in an incremental way that takes account of lessons learned in successive phases of application of the methodology to new kinds of data. Here we provide only general outlines. Further details and supplementary material are presented at [8].
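To make ‘applied at arm’s length’ concrete, here is a minimal Python sketch under stated assumptions: two hypothetical source tables with different column headers are left untouched, and all integration knowledge lives in a separate mapping from (source, column header) pairs to shared labels, anticipating the preferred labels introduced in Section II. Every source name, header, label and value below is invented for illustration and is not a DSC artifact.

```python
# Two hypothetical source data stores with different column headers.
# The integration layer below reads them but never modifies them.
SOURCES = {
    "source_a": [{"VEH_TYPE": "TRK-01", "LOC": "grid 1"}],
    "source_b": [{"vehicleClass": "TRK-01", "position": "grid 2"}],
}

# The integration layer, kept outside the sources: each (source, header)
# pair is mapped to a shared label (hypothetical labels).
HEADER_TO_LABEL = {
    ("source_a", "VEH_TYPE"): "vehicle type",
    ("source_a", "LOC"): "location",
    ("source_b", "vehicleClass"): "vehicle type",
    ("source_b", "position"): "location",
}


def integrated_view(sources, mapping):
    """Yield each record re-keyed by shared labels and tagged with its source."""
    for source_name, rows in sources.items():
        for row in rows:
            view = {"source": source_name}
            for header, value in row.items():
                label = mapping.get((source_name, header))
                if label is not None:  # headers without a mapping are skipped
                    view[label] = value
            yield view


def search(sources, mapping, label, value):
    """Query the combined content as if it formed a single resource."""
    return [r for r in integrated_view(sources, mapping) if r.get(label) == value]


hits = search(SOURCES, HEADER_TO_LABEL, "vehicle type", "TRK-01")
print([h["source"] for h in hits])  # ['source_a', 'source_b']
```

Under this sketch, bringing in a further source means adding mapping entries rather than restructuring either existing store.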
II. OVERCOMING SEMANTIC STOVEPIPES

Every data store is based on some data model which specifies how the data in the store is to be organized. Since communities always develop data stores to serve some particular purpose, each data model, too, is oriented around some specific purpose. Data models have been created in uncoordinated ways to address these different purposes, and they typically cannot easily be modified to serve additional purposes. Where there is a need to combine data from multiple existing systems, therefore, the tendency has been to invest what may be significant manual effort in building yet another data store, thereby contributing further to a seemingly never-ending process of data stovepipe proliferation.

To break out of this impasse, we believe, a successful strategy for horizontal integration must operate at a different level from the source data. It must be insulated from entanglements with specific data models and associated software applications, and it must be marked by a degree of persistence and of relative technological simplicity as compared with the changing source data to which it is applied.

The strategy we propose, which employs methods that are by now standard among proponents of semantic technology [2], begins by focusing on the terms (labels, acronyms, codes) used as column headers in source data artifacts. The underlying idea is that multiple distinct terms {t1, …, tn} are very often used in separate data sources with one and the same meaning. If these terms are associated with a single ‘preferred label’ drawn from some standard set of such labels, then all the separate data items associated with {t1, …, tn} will become linked together through the corresponding preferred label.

Such sets of preferred labels provide the starting point for the creation of what are called ‘ontologies’, which are created (1) by selecting a preliminary list of labels in collaboration with subject-matter experts (SMEs); (2) by organizing these labels into graph-theoretic hierarchies structured in terms of the is_a (or subtype) relation, adding new terms where needed to ensure is_a completeness; and (3) by associating logical definitions, lists of synonyms and other metadata with the nodes in the resultant graphs. One assumption widespread among semantic technologists is that ontology-based integration is best pursued by building large ontology repositories (for example as at [9]), in which, while the use of languages such as RDF or OWL is standardized, the ontologies themselves are unconstrained. Our experience of efforts to achieve horizontal integration in the bioinformatics domain, however, gives us strong reason to believe that, in order to counteract the creation of new (‘semantic’) stovepipes, we must ensure that the separate ontologies are constructed in a collaborative process which ensures a high degree of integration among the ontologies themselves. To this end, our strategy imposes on ontology developers a common set of principles and rules, and an associated common architecture and governance regime, in order to ensure that the suite of purpose-built ontologies evolves in a consistent and non-redundant fashion.

III. DEFINING FEATURES OF THE SE APPROACH

Associating terms used in source data with preferred labels in ontologies leads to what we call ‘Semantic Enhancement’ (SE) of the source data. We call the ontologies themselves ‘SE ontologies’, and the semantically enhanced source data, taken together, form what we call the ‘Shared Semantic Resource’ (SSR). To create this resource in a way that supports successful integration, our methodology must ensure realization of the following goals, which are common to many large-scale horizontal integration efforts:

• It must support an incremental process of ontology creation in which ontologies are constructed and maintained by multiple distributed groups, some of them associated with distinct agencies, working to a large degree independently.
• The content of each ontology must exist in both human-readable (natural language) and computable (logical) versions in order to allow the ontologies to be useful to multiple communities, not only of software developers and data managers, but also of intelligence analysts.
• Labels must be selected with the help of SMEs in the relevant domains. This is not because these labels are designed to be used by SMEs at the point where source data are collected; rather, it is to ensure that the ontologies reflect the features of the domain in a way that coheres as closely as possible with the understanding of those with relevant expertise. Where necessary – for instance in cases where domains overlap – multiple synonyms are incorporated into the structure of the relevant ontologies to reflect the usage of different communities of interest.
• Ontology development must be an arm’s-length process, with minimal disturbance to existing data and data models, and to existing data collection and management workflows and application software.
• Ontologies must be developed in an incremental process which approximates by degrees to a situation in which there is one single reference ontology for each domain of interest to the intelligence community.
• The ontologies must be capable of evolving in an agile fashion in response to new sorts of data and new analytical and warfighter needs.
• The ontologies must be linked together through logical definitions [10], and they must be maintained in such a way that they form a single, non-redundant and consistently evolving integrated network. The fact that all the ontologies in this network are being used simultaneously to create annotations of source data artifacts will in turn have the effect of virtually transforming the latter into an evolving single SSR, to
which computer-based retrieval and analysis tools can be applied.

The ontology development strategy we advocate thus differs radically from other approaches (such as are propounded in [11]) which allow contextualized inconsistency. For while source data in the intelligence domain will of course sometimes involve inconsistency – the data is derived, after all, from multiple, and variably reliable, sources – to allow inconsistency among the ontologies used in annotations would, from our point of view, defeat the purposes of horizontal integration.

To achieve the goals set forth above, we require:
• A set of ontology development rules and principles, a shared governance and change management process, and a common architecture incorporating a common, domain-neutral, upper-level ontology.
• An ontology registry through which all ontology initiatives and emerging warfighter and analyst needs will be communicated to all collaborating ontology developers.
• A simple, repeatable process for ontology development, which will promote coordination of the work of distributed development teams, allow the incorporation of SMEs into the ontology development process, and provide a software-supported feedback channel through which users can easily communicate their needs and report errors and gaps to those involved in ontology development.
• A process of intelligence data capture through ‘annotation’ [12] or ‘tagging’ of source data artifacts [7], whereby the preferred labels in the ontologies are associated incrementally with the terms embedded in source data models and terminology resources, in such a way that the data in distinct data sources, where they pertain to a single topic, are represented in the SSR in a way that associates them with a single ontology term. Currently the annotation process is primarily manually driven, but in the future it will incorporate the use of Natural Language Processing (NLP) tools. Importantly, the process of annotation incrementally tests the ontologies against the data to which they must be applied, thereby helping to identify errors and gaps in the ontologies and thus serving as a vital ontology quality assurance mechanism [12].

IV. ONTOLOGICAL REALISM

The key idea underlying the SE methodology is that the successful application of ontologies to horizontal data integration requires a process for creating ontologies that is independent of specific data models and software implementations. This is achieved through the adoption of what is called ‘ontological realism’ [13], which rests on the idea that ontologies should be constructed as representations, not of data or of data models, but rather of the types of entities in reality to which the data relate.

The first step in the development of an ontology for a domain that has been identified as a target for intelligence analysis is thus not to examine what types of data we have about that domain. Rather, it is to establish in a data-neutral fashion the salient types of entities within the domain, and to select appropriate preferred labels for these types, drawing for guidance on the language used by SMEs with corresponding domain expertise. In addition, we rely on authoritative publications such as the capstone Joint Publication (JP) 1 of Joint Doctrine and the associated Dictionary (JP 1-02) [14, 15] (see Figure 1), applying adjustments where necessary to ensure logical consistency. The resultant preferred labels are organized into simple hierarchies of subtype and supertype, and each label is associated with a simple logical definition, along the lines illustrated (in a toy example) in Table 1.

vehicle =def. an object used for transporting people or goods
    personnel carrier =def. a vehicle that is used for transporting persons
    tractor =def. a vehicle that is used for towing
    crane =def. a vehicle that is used for lifting and moving heavy objects
vehicle platform =def. means of providing mobility to a vehicle
    wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
    tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks

Table 1. Fragments of asserted ontologies

V. REALIZATION OF THE STRATEGY

Attempts to create a framework for horizontal integration of large and rapidly changing bodies of data face a tension: (1) to secure integration, the framework needs to be free from entanglements with specific data models; yet (2) to allow effective representation of data, the framework needs to remain as close as possible to those same data models.

The same tension arises also for the SE approach, where it is expressed in the fact that:
(1) the SSR needs to be created on the basis of persistent, logically well-structured ontologies designed to be reused in relation to multiple different bodies of data; yet
(2) to ensure agile response to emerging warfighter needs, its ontologies must be created in ways that keep them as close as possible to the new data that is becoming available locally in each successive stage.

Figure 1 - Joint Doctrine Hierarchy


To resolve this tension, the SE strategy incorporates a distinction between two sorts of ontologies, called ‘reference’ and ‘application’ ontologies, respectively. By ‘reference ontology’, we mean an ontology that captures generic content and is designed for aggressive reuse in multiple different types of context. Our assumption is that most reference ontologies will be created manually on the basis of explicit assertion of the taxonomical and other relations between their terms. By ‘application ontology’, we mean an ontology that is tied to specific local applications. Each application ontology is created by using ontology merging software [16] to combine new, local content with generic content taken over from relevant reference ontologies [17, 18], thereby providing rapid support for information retrieval in relation to particular bodies of intelligence data, but in a way that streamlines the task of ensuring horizontal integration of this new data with the existing content of the SSR.

A. Principle of Single Inheritance

Our ontologies are ‘inheritance’ hierarchies in the sense that everything that holds (is true) of the entities falling under a given parent term holds also of all the entities falling under its is_a child terms at lower levels. Thus in Figure 2, for example, everything that holds of ‘vehicle’ holds also of ‘tractor’. Each reference ontology is required to be created around an inheritance hierarchy of this sort that is constructed in accordance with what we call the principle of asserted single inheritance. This requires that for each reference ontology the is_a hierarchy is asserted, through explicit axioms (subclass axioms in the OWL language), rather than inferred by the reasoner. In addition, it requires that this asserted is_a hierarchy is a monohierarchy (a hierarchy in which each term has at most one parent). This requirement is imposed for reasons of efficiency and consistency: it allows the total ontology structure to be managed more effectively and more uniformly across distributed development teams – for example by aiding positioning and surveyability of terms. It also brings computational performance benefits [23] and provides an easy route (described in Section V.E below) to the creation of the sorts of logical definitions we will need to support horizontal integration. The principle of asserted single inheritance comes at a price, however, in that it may require reformulation of content that is needed to support the creation of the SSR, for example content deriving from multiple-inheritance ontologies already developed by the intelligence community. Again, our experience in the biomedical domain is that such reformulation, while requiring manual effort, is in almost all cases trivial, and that, where it is not trivial, the effort invested often brings benefits in terms of greater clarity as to the meanings and interrelationships of the new terms that need to be imported into the SE framework.

B. A Simple Case Study

Imagine, now, that there is a need for rapid creation of an application ontology incorporating preferred labels to describe artillery units available to some specific military unit called ‘Delta Battery’. Such an ontology is built, first, by selecting from existing reference ontologies the terms needed to address the data in hand, for example of the sort used in Table 1. Second, we define supplementary terms needed for our specific local case, as in Table 2.
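The principle of asserted single inheritance from Section A lends itself to a mechanical check on the reference-ontology side of such a merger. The following minimal sketch assumes a hypothetical representation in which the asserted is_a links of Table 1 are held in a plain Python dictionary rather than as OWL subclass axioms, and the property sets are illustrative paraphrases of the Table 1 definitions. It flags any term with more than one asserted parent and shows how what holds of a parent term is inherited by its children.

```python
from collections import deque

# Asserted is_a links for the Table 1 fragments: term -> list of asserted parents.
# A plain dict stands in (hypothetically) for OWL subclass axioms.
ASSERTED_IS_A = {
    "vehicle": [],
    "personnel carrier": ["vehicle"],
    "tractor": ["vehicle"],
    "crane": ["vehicle"],
    "vehicle platform": [],
    "wheeled platform": ["vehicle platform"],
    "tracked platform": ["vehicle platform"],
}

# Illustrative properties, each stated once at the most general term it holds of.
PROPERTIES = {
    "vehicle": {"used for transporting people or goods"},
    "tractor": {"used for towing"},
}


def single_inheritance_violations(is_a):
    """Terms that break the monohierarchy rule by having more than one asserted parent."""
    return sorted(term for term, parents in is_a.items() if len(parents) > 1)


def ancestors(term, is_a):
    """All terms above `term`, collected breadth-first (nearest first)."""
    seen, queue = [], deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.append(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def inherited_properties(term, is_a, props):
    """Everything that holds of a term's ancestors holds of the term itself."""
    result = set(props.get(term, set()))
    for ancestor in ancestors(term, is_a):
        result |= props.get(ancestor, set())
    return result


print(single_inheritance_violations(ASSERTED_IS_A))  # []
print(sorted(inherited_properties("tractor", ASSERTED_IS_A, PROPERTIES)))
# ['used for towing', 'used for transporting people or goods']

# Asserting a second parent in a reference ontology would be flagged:
merged = {**ASSERTED_IS_A, "artillery tractor": ["tractor", "artillery vehicle"]}
print(single_inheritance_violations(merged))  # ['artillery tractor']
```

The check applies only to asserted reference content; as the text notes, an inferred application ontology may legitimately give a term several parents.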
Some of these terms may later be incorporated into corresponding asserted ontologies within the SE suite. For our present purposes, however, they can be understood as being simply combined with the associated asserted ontology terms using ontology merging software, for example as developed by the Brinkley [17, 19] and He [20, 21] groups. Because of the way the definitions are formulated, it is then possible to apply an automatic reasoner [22] to the result of the merger to infer new relations, and thereby to create a new ontology hierarchy, as in Figure 2. Note that, in contrast to the reference ontologies from which it is derived, such an application ontology need not satisfy the principle of single inheritance. Note, too, that the definitions are exploited by the reasoner not only to generate the new inferred ontology, but also to test its consistency, both internally and with the reference ontologies from which it is derived.
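The inference step can be illustrated with a toy structural-subsumption classifier. This is a simplified stand-in, not the reasoner [22] or the merging software cited above: each Table 2 term is encoded, by assumption, as a conjunction of named genus terms plus differentiae treated as opaque strings, and a term B is inferred to subsume a term A when B’s expanded conditions are a strict subset of A’s. The disjointness of wheeled and tracked platforms is a hypothetical axiom added only to illustrate a consistency test, and the output is not claimed to reproduce Figure 2 exactly.

```python
# Primitive (asserted) reference terms from Table 1: term -> single asserted parent.
REFERENCE = {"vehicle": None, "tractor": "vehicle", "crane": "vehicle",
             "personnel carrier": "vehicle"}

# Supplementary terms from Table 2: (named genus terms, differentiae as opaque strings).
DEFINED = {
    "artillery vehicle": ({"vehicle"}, {"transports artillery weapon"}),
    "wheeled tractor": ({"tractor"}, {"has wheeled platform"}),
    "tracked tractor": ({"tractor"}, {"has tracked platform"}),
    "artillery tractor": ({"artillery vehicle", "tractor"}, set()),
    "wheeled artillery tractor": ({"artillery tractor"}, {"has wheeled platform"}),
    "Delta Battery artillery vehicle": ({"artillery vehicle"}, {"at disposal of Unit Delta"}),
    "Delta Battery artillery tractor": ({"artillery tractor"}, {"at disposal of Unit Delta"}),
    "Delta Battery wheeled artillery tractor": (
        {"wheeled artillery tractor"}, {"at disposal of Unit Delta"}),
}

# Hypothetical disjointness axiom, used only to illustrate the consistency check.
DISJOINT = [{"has wheeled platform", "has tracked platform"}]


def features(term, reference, defined):
    """Atomic conditions entailed by membership in `term` (recursively expanded)."""
    if term in reference:
        parent = reference[term]
        inherited = features(parent, reference, defined) if parent else frozenset()
        # A primitive term contributes its own name plus everything it inherits.
        return frozenset({term}) | inherited
    genera, differentiae = defined[term]
    result = set(differentiae)
    for genus in genera:
        result |= features(genus, reference, defined)
    return frozenset(result)


def classify(reference, defined):
    """Infer the direct is_a parents of every term in the merged ontology."""
    terms = list(reference) + list(defined)
    feats = {t: features(t, reference, defined) for t in terms}
    # b subsumes a when b's conditions are a strict subset of a's conditions.
    subsumers = {a: {b for b in terms if feats[b] < feats[a]} for a in terms}
    # Direct parents: subsumers of `a` that do not also subsume another subsumer of `a`.
    return {a: sorted(s for s in subs if not any(s in subsumers[t] for t in subs))
            for a, subs in subsumers.items()}


def unsatisfiable(reference, defined, disjoint):
    """Terms whose conditions include a pair of features declared mutually disjoint."""
    return sorted(t for t in list(reference) + list(defined)
                  if any(pair <= features(t, reference, defined) for pair in disjoint))


inferred = classify(REFERENCE, DEFINED)
print(inferred["Delta Battery wheeled artillery tractor"])
# ['Delta Battery artillery tractor', 'wheeled artillery tractor']
print(unsatisfiable(REFERENCE, DEFINED, DISJOINT))  # []
```

Two features of the text show up even in this toy: the inferred hierarchy gives some terms several parents, although every asserted reference term has exactly one, and the same definitions that drive classification also expose an incoherent supplementary term (for example one defined as both a wheeled and a tracked tractor). The sketch does not detect equivalent terms or cyclic definitions.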
artillery weapon =def. device for projection of munitions beyond the effective range of personal weapons
artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
wheeled tractor =def. a tractor that has a wheeled platform
tracked tractor =def. a tractor that has a tracked platform
artillery tractor =def. an artillery vehicle that is a tractor
wheeled artillery tractor =def. an artillery tractor that has a wheeled platform
Delta Battery artillery vehicle =def. an artillery vehicle that is at the disposal of Unit Delta
Delta Battery artillery tractor =def. an artillery tractor that is at the disposal of Unit Delta
Delta Battery wheeled artillery tractor =def. a wheeled artillery tractor that is at the disposal of Unit Delta

Table 2. Examples of supplementary terms and definitions

Figure 2. Inferred ontology of Delta Battery artillery vehicles. Child-parent links are inferred by the reasoner from the content of merged reference ontologies and from definitions of the supplementary terms. Note that some terms have multiple parents.

The strategy is designed to guarantee
(1) that salient reference ontology content is preserved in the new, inferred ontology in such a way that
(2) the latter can be used to semantically enhance newly added data very rapidly, and thereby
(3) bring about the horizontal integration of these data with all remaining contents of the SSR.

While ontology software has the capacity to support rapid ontology merger and consistency checking, we note that the inferred application ontology that is generated may on first pass fail to meet the local application needs. Thus, multiple iterations and investment of manual effort are needed.

Requiring that all inferred ontologies rest on reference ontology content serves not only to ensure consistency, but also to bring about what we can think of as the normalization [23] of the evolving ontology suite. (This is in loose analogy with the normalization of a vector space, where a basis of orthogonal unit vectors is chosen in terms of which every vector in the whole space can be represented in a standard way.) A suite of normalized ontologies is easier to maintain, because globally significant changes – those changes which potentially have implications across the entire suite of ontologies – can be made in just one place in the relevant reference ontology, thereby allowing consequent changes in the associated inferred ontologies to be propagated automatically. This makes ontology-based integration easier to manage and scale, because when single-inheritance modules serve to constrain the allowable sorts of combinations, it becomes easier to avoid problems of combinatorial explosion.

C. Modularity of Ontologies Designed for Reuse

The reference ontologies within the SE suite are to be conceived as forming a set of plug-and-play ontology modules such as the Organization Ontology, Geospatial Feature Ontology, Human Physical Characteristics Ontology, Event Ontology, Improvised Explosive Device Component Ontology, and so on. These modules need to be created at different levels of generality, with the architecture of the higher-level reference ontologies being preserved as we move down to lower levels.

Each module has its own coverage domain, and the coverage domains for the more specific modules (for example artillery vehicle, military engineering vehicle) are contained as parts within the coverage domains of the more general modules (for example vehicle, equipment). It is our intention that the full SE suite of ontologies will mimic the sort of hierarchical organization that we find in the Joint Doctrine Hierarchy [15], and our strategy for identifying and demarcating modules will wherever possible follow the demarcations of Joint Doctrine. The goal is to specify a set of levels of greater and lesser generality: for example Intelligence, Operations, Logistics, at one level; Army Intelligence, Navy Intelligence, Air Force Intelligence, at the next lower level; and so on. Ideally, the set of modules on
each level is non-redundant in the sense that its members (1) deal with non-overlapping domains of entities and thus (2) do not contain any terms in common. In this way the more general content at higher levels is inherited by the lower levels and thus does not need to be recreated anew. As the history of doctrine writing shows, drawing such demarcations and ensuring consistency of term use in each sibling domain on any given level is by no means easy. Here, however, we will have the advantage that the ontology resource we are creating is not designed to serve as a terminology and doctrine set for use by multiple distinct groups of warfighters. Rather, it is designed for use behind the scenes for the specific purpose of data discovery and integration. Thus it is assumed that disciplinary specialists will continue to use their local terminologies (and taxonomies) at the point where source data are being collected, even while, thanks to the intermediation of ontology annotation, they are contributing to the common SSR. At the same time, community-specific terms will wherever possible be added to the SE ontology hierarchies as synonyms. This will contribute not only to the effectiveness of ontology review by SMEs but also to the applicability of NLP technology in support of automatic data annotation.
    Our goal is to build the SE ontology hierarchy in such a way as to ensure non-redundancy by imposing the rule that, for each salient domain, one single reference ontology module is developed for use throughout the hierarchy. Creating non-redundant modules in this way is, we believe, indispensable if we are to counteract the tendency for separate groups of ontology developers to create new ontologies for each new purpose.

D. Benefits of Normalized Ontology Modules
    The grounding in modular, hierarchically organized, non-redundant, asserted ontology modules brings a number of significant benefits, of a sort already being realized in the biomedical ontology research referred to above [3]. First, it creates an effective division of labor among those involved in developing, maintaining and using ontologies. In particular, it allows us to exploit the existing disciplinary division of knowledge and expertise among specialists in the domains and subdomains served by the intelligence community. To ensure that the ontologies are populated in a consistent fashion, we are training selected SMEs from relevant disciplines in ontology development and use; at the same time we are ensuring efficient feedback between those who are using ontologies in annotating data and those who are maintaining the ontologies over time, in order to ensure effective updating, including the correction of gaps and errors.
    Second, it ensures that the suite of asserted ontologies is easily surveyable: developers and users of ontologies can easily discover where the preferred label equivalents of given terms are to be found in the ontology hierarchy; they can also easily determine where new terms, or new branches, should be inserted into the SE suite. Thus, whereas familiar problems arise when attempts are made to merge independently developed ontologies and terminology content, the incremental approach adopted here implies that mergers will be applied almost exclusively (1) to the content of reference ontologies developed according to a common methodology and reviewed at every stage for mutual consistency and (2) to application ontology content developed by downward population from the evolving ontology suite.

E. Creating Definitions
    The principle of single inheritance allows application of a simple rule for formulating definitions of ontology terms, whereby all definitions are required to have the form:

        an S = Def. a G which Ds

where ‘S’ (for: species) is the term to be defined, ‘G’ (for: genus) is the immediate parent term of ‘S’ in the relevant SE asserted ontology, and ‘D’ (for: differentia) is the species-criterion, which specifies what it is about certain G’s that makes them S’s. (Note that this rule can be applied consistently only in a context where every term to be defined has exactly one asserted parent.)
    As more specific terms are defined through the addition of more detailed differentiae, their definitions encapsulate the taxonomic information relating the corresponding type within the SE ontology to the sequence of higher-level terms by which it is connected to the corresponding ontology root. The task of formulating definitions thereby serves as a quality control check on the correctness of the constituent hierarchies, just as awareness of the hierarchy assists in the formulation of coherent definitions.
    A further requirement is that the definitions themselves use (wherever possible) preferred labels which are taken over from other ontologies within the SE suite. Where appropriate terms are missing, the SE registry serves as a feedback channel through which the corresponding need can be transmitted to those tasked with ontology maintenance. The purpose of this requirement is to bring it about that the SE ontologies themselves will become incrementally linked together via logical relations in the way needed to ensure the horizontal integration of the data in the SSR that have been annotated with their terms. The more logical definitions are added to the SE suite, the more its separate modules begin to act like a single, integrated network. All of this brings further benefits, including:
• Lessons learned in developing and using one module can easily be propagated throughout the entire system.
• The value of training in ontology development in any given domain module is increased, since the results of such training can easily be re-applied in relation to other modules.
• The incrementally expanding stock of available reference ontology terms will help to make it progressively easier to create, in an agile fashion, new application ontologies for emerging domains.
• The expanding set of logical definitions cross-linking the ontologies in the SE suite will mean that the use of ontology reasoners [22] for quality assurance of both asserted and inferred ontologies will become progressively more effective. These same reasoners can then be used to check the consistency of the resultant annotations; when inconsistencies are detected, these can be flagged as being of potential significance to the intelligence analyst.

VI. FROM DATA TO DECISIONS: AN EXAMPLE
    Suppose, for example, that analysts are faced with a large body of new data pertaining to activities of organizations involved in the financing of terrorism through drug trafficking. The data are presented to them in multiple different formats, with multiple different types of labels (acronyms, free text descriptions, alphanumeric identifiers) for the types of organizations and activities involved.
    To create a semantically enhanced and integrated version of these data for purposes of indexing and retrieval, analysts and ontology developers can use as their starting point the Organization Ontology, which has already been populated with many of the general terms they will need across the entire domain of organizations: military and non-military, formal and informal, family-, tribe- or religion-based, and so on. It also contains the terms they need to define different kinds of member roles, organizational units and sub-units, chains of authority, and so on.
    Adherence to the SE principles ensures that the Organization Ontology has been developed in such a way as to be interoperable, for example, with the Financial Event and Drug Trafficking Ontologies. Portions of each of these modules can thus be selected for merger in the creation of a new, inferred ontology that can rapidly be applied to the annotation of the new drug-financed terrorism data. These data are thereby transformed from a mere collection of separate data sources into a single searchable store horizontally integrated within the SSR.

VII. UPPER-, MID- AND LOWEST-LEVEL ONTOLOGIES
    The SE suite of ontologies is designed to serve horizontal integration, but it depends also on what we can now recognize as a vertical integration of asserted ontologies through the imposition of a hierarchy of ontology levels. In general, the SE methodology requires that all asserted ontologies be created via downward population from a common top-level ontology, which embodies the shared architecture for the entire suite of asserted ontologies, an architecture that is automatically inherited by all ontologies at lower levels.
    Here, the level of an ontology is determined by the level of generality of the types in reality which its nodes represent. The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general: it must provide a high-level, domain-neutral representation of distinctions between objects and events, objects and attributes, roles, locations, and so forth. For this purpose we select Basic Formal Ontology 2.0 (BFO), which has been thoroughly tested in multiple application areas [8, 24]. Its role is to provide a framework that can serve as a starting point for downward population in order to ensure consistent ontology development at lower levels. Since almost all SE ontology development is at the lower levels within the hierarchy, BFO itself will in most cases be invisible to the user.
    The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains until we reach the Lowest Level Ontologies (LLOs). These LLOs are maximally specific representations of the entities in a particular one-dimensional domain, as illustrated in Table 3.
    Some MLOs are created by adding together LLO component modules; for example, the Person MLO may be created by conjoining person-relevant ontology components from Table 3 such as Person Name, Person Date, Hair Color, Gender, and so on. More complex MLOs will involve the use of reasoners to generate ontologies incorporating inferred labels such as ‘Male Adult’, ‘Female Infant’, and so on, along the lines sketched in Section V.B above.

    Person Name (with types such as: FirstName, LastName, …)
    Hair Color (with types such as: Grey, Blonde, …)
    Military Role (with types such as: Soldier, Officer, …)
    Blood Type (with types: O, A, …)
    Eye Color (with types: Blue, Grey, …)
    Gender (with types: Male, Female, …)
    Age Group (with types: Infant, Teenager, Adult, …)
    Person Date (with types: BirthDate, DeathDate, …)
    Education History (with types: HighSchoolGraduation, …)
    Education Date (with types: DateOfGraduation, …)
    Criminal History (with types: FirstArrest, FirstProsecution, …)
    Citizenship (based on ISO 3166 Country Codes)

        Table 3. Examples of Lowest Level Ontologies (LLOs)

    Figure 3 illustrates the rough architecture of the resultant suite of SE ontologies on different levels, drawing on the top-level architecture of Basic Formal Ontology.

VIII. CONCLUSION
    In any contemporary operational environment, decision makers at all levels, from combatant commanders to tactical-level team leaders, need timely information pertaining to issues ranging from insurgent activity to outbreaks of malaria and from key-leader engagements to local elections. This requires the exploitation by analysts of a changing set of highly disparate databases and other sources of information, whose horizontal integration will greatly facilitate this data-to-decision cycle.
    The SE strategy is designed to create the resources needed to support such integration incrementally, with thorough testing at each successive stage, and one of our current pilot projects is designed to identify the problems which arise when the SE methodology is applied to support
collaboration across distinct intelligence agencies, including exploring how independently developed legacy ontologies can be incorporated into the framework.

Figure 3. Organization of asserted ontologies

    Our work on using SE ontologies for purposes of annotation has thus far been executed both manually and with NLP support. The results of this work have been found useful for the indexing and retrieval of large bodies of data in the DSC Cloud store. In our next phase we will test the capacity of the approach to support rapid creation of application ontologies to address emerging analyst needs. In a subsequent and more ambitious phase, we plan to explore the degree to which the idea of semantic enhancement can be truly transformative, in the sense that it will influence the way in which source data are collected and stored. We believe that such an influence would bring a series of positive consequences, flowing from the fact that the asserted ontologies would be focused automatically upon (i.e., would represent) the same entities in the battlespace that operators, analysts, and war-planners are concerned with, and would treat these entities in the same intuitively organized way. Thus, while at this stage all SE ontologies are free of entanglements with specific source data models, our vision for the future is that the success of the approach will provide ever stronger incentives to use SE ontologies already in the field. These incentives will exist because using such ontologies at the point of data collection will guarantee efficient horizontal integration with the contents of the SSR, thereby giving rise to a network effect whereby not only will the immediate utility of the collected data be increased, but so also will the value of all existing data stored within the SSR.

REFERENCES
[1] Chairman of the Joint Chiefs of Staff Instruction. J2 CJCSI 3340.02A.
[2] P. Hitzler, M. Krötzsch and S. Rudolph, Foundations of Semantic Web Technologies, Chapman & Hall, 2009.
[3] Barry Smith et al., “The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration”, Nature Biotechnology, 25 (11), November 2007, 1251–1255.
[4] Fahim T. Imam et al., “Development and use of Ontologies Inside the Neuroscience Information Framework: A Practical Approach”, Frontiers in Genetics, 3 (2012), 111.
[5] Barry Smith et al., “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering (forthcoming).
[6] Distributed Common Ground System - Army (DCGS-A): What is it? Pentagon Army Posture Statement, 27 December 2011.
[7] David Salmen et al., “Integration of Intelligence Data through Semantic Enhancement”, Proceedings of the Conference on Semantic Technology in Intelligence, Defense and Security (STIDS), George Mason University, Fairfax, VA, November 16–17, 2011, CEUR, Vol. 808, 6–13.
[8] Supplementary material on Semantic Enhancement: http://ncorwiki.buffalo.edu/index.php/Semantic_Enhancement
[9] http://ontolog.cim3.net/cgi-bin/wiki.pl?OpenOntologyRepository
[10] Chris J. Mungall et al., “Cross-product extensions of the Gene Ontology”, Journal of Biomedical Informatics, 44 (2011), 80–86.
[11] Douglas B. Lenat, “CYC: a large-scale investment in knowledge infrastructure”, Communications of the ACM, 38 (11), 1995, 33–38.
[12] David P. Hill et al., “Gene Ontology Annotations: What they mean and where they come from”, BMC Bioinformatics, 9 (Suppl 5), 2008, S2.
[13] Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.
[14] Joint Publication 1, Doctrine for the Armed Forces of the United States, Chairman of the Joint Chiefs of Staff, Washington, DC, 20 March 2009.
[15] Joint Electronic Library: The Joint Publications.
[16] Z. Xiang et al., “OntoFox: Web-Based Support for Ontology Reuse”, BMC Research Notes, 3 (2010), 175.
[17] Marianne Shaw et al., “Generating Application Ontologies from Reference Ontologies”, Proceedings, American Medical Informatics Association Fall Symposium, 2008, 672–676.
[18] James Malone and Helen Parkinson, “Reference and Application Ontologies.”
[19] James F. Brinkley et al., “Project: Ontology Views.”
[20] http://www.hegroup.org/ontoden/
[21] J. Hur et al., “Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network”, BMC Immunology, 12 (2011), 49.
[22] OWL 2 Reasoners, http://www.w3.org/2007/OWL/wiki/Implementations
[23] A. L. Rector, “Modularisation of Domain Ontologies Implemented in Description Logics and Related Formalisms including OWL”, Proceedings of the 2nd International Conference on Knowledge Capture, ACM, 2003, 121–128.
[24] Pierre Grenon and Barry Smith, “SNAP and SPAN: Towards Dynamic Spatial Ontology”, Spatial Cognition and Computation, 4 (1), March 2004, 69–103.