IAO-Intel
An Ontology of Information Artifacts in the Intelligence Domain

Barry Smith, University at Buffalo, NY, USA
Tatiana Malyuta, CUNY, NY, USA; Data Tactics, McLean, VA
Ron Rudnicki, CUBRC, Buffalo, NY, USA
William Mandrick, Data Tactics, McLean, VA, USA
David Salmen, Data Tactics, McLean, VA, USA
Peter Morosoff, E-Maps, Inc., Washington, DC, USA
Danielle K. Duff, I2WD, Aberdeen, MD, USA
James Schoening, I2WD, Aberdeen, MD, USA
Kesny Parent, I2WD, Aberdeen, MD, USA

STIDS 2013 Proceedings Page 33

Abstract: We describe on-going work on IAO-Intel, an information artifact ontology developed as part of a suite of ontologies designed to support the needs of the US Army intelligence community within the framework of the Distributed Common Ground System (DCGS-A). IAO-Intel provides a controlled, structured vocabulary for the consistent formulation of metadata about documents, images, emails and other carriers of information. It will provide a resource for uniform explication of the terms used in multiple existing military dictionaries, thesauri and metadata registries, thereby enhancing the degree to which the content formulated with their aid will be available to computational reasoning.

Keywords: ontology; information artifacts; military doctrine; intelligence analysis; interoperability; data services environment

I. BACKGROUND

Standardization of terminology has been important from the very beginning of organized warfare. Imagine the Chinese trying to pass reports down the Great Wall using fire beacons without standardization of the signals used. In the Revolutionary War, General Washington directed Friedrich Wilhelm von Steuben to write the drill manual for the Continental Army [1] so that all units would use and respond uniformly to the same commands.

In our own era, DoD has directed development and use of the DoD Dictionary of Military and Associated Terms (Joint Publication 1-02) as the paramount terminological standard for military operations [2]. JP 1-02 helps to enable joint warfare by (a) advancing consistency in communications and (b) facilitating consistent interpretation of commands. Military dictionaries and related terminology artifacts continue to be developed, addressing these and a series of additional aims, including: (c) compiling lessons learned (outcomes assessment); (d) providing controlled vocabularies for official reporting; and (e) enhancing discoverability and analysis of data.

Such artifacts have until recently been conceived by analogy with traditional free-text dictionaries published in forms designed to maximize utility to human beings. Most existing doctrinal and related lexica and thesauri not only provide little aid to computation, they also suffer from the fact that multiple such resources have been (and continue to be) developed independently, in divergent and often non-principled ways. The result is that identical data may be classified and described entirely differently by different agencies, and the consequences of the resultant failures of integration (for example in the case of registries of persons of interest) are all too familiar. Increasingly, however, it is recognized that there is the need for a unified approach to description and classification of information resources (see for example [3], [4]), and the DoD has recognized at an official level that, to advance discoverability and analysis in the age of Big (military) Data, new approaches are needed that can enable computational retrieval, integration and processing of data. Thus Directive 8320.02 [5], the latest version of which is dated August 5, 2013, requires all authoritative DoD data sources to be registered in the DoD Data Services Environment (DSE) [6]. It further requires that all salient metadata be discoverable, searchable, retrievable, and understandable:

Data, information, and IT services will be considered understandable when authorized users are able to consume them and when users can readily determine how those assets may be used for specific needs. Data standards and specifications that require associated semantic and structural metadata, including vocabularies, taxonomies, and ontologies, will be published in the DSE, or in a registry that is federated with the DSE.

We shall return to the DSE below. First, we present our own strategy for realizing these important goals.

II. THE INFORMATION ARTIFACT ONTOLOGY

The Information Artifact Ontology (IAO) was originally conceived in 2008 as part of an effort to master the Big Data accumulating in the wake of the Human Genome Project in the context of biological research [7]. Its goal was to aid the consistent description of biological data emanating from multiple heterogeneous sources. The goal of IAO-Intel is analogous: it is to provide common resources for the consistent description of information artifacts of relevance to the intelligence community in a way that will allow discovery, integration and analysis of intelligence data from both official and non-official sources.

When biomedical informaticians work with databases, publications and records generated by experimental research or medical care they focus primarily on what these artifacts describe (for example on the genes or proteins which form the subject matters of a given journal publication, or on the symptoms or diseases reported in a given clinical note). Similarly, when intelligence analysts work with source data artifacts, they, too, focus primarily on what the data in these artifacts describe, for example on the military units whose movements are recorded in a given shipping report, or on the vulnerabilities of a given forward operations base as described in some force protection assessment.

But while the primary focus concerns in both cases the topic or subject of the artifacts in question, both also require a secondary focus, targeted to the artifacts themselves, through which information about these topics is conveyed. Such artifacts have attributes – including format, purpose, evidence, provenance, operational relevance, security markings – data concerning which (often called ‘metadata’) is vital to the effective exploitation of the reports, images, or signals documents with which the analyst has to deal.

The dichotomy between focus on entities in the world and focus on the information artifacts in which these entities are represented is fundamental to the work reported here. IAO relates precisely to the objects of this secondary focus. An information artifact (IA), as we conceive it, is an entity that has been created through some deliberate act or acts by one or more human beings, and which endures through time, potentially in multiple (for example digital or printed) copies. IAO thus deals with information in the forms it takes when it has been deliberately fixed in some medium in such a way as to become accessible to multiple subjects. Examples are: a diagram on a sheet of paper, a video file, a map on a computer monitor, an article in a newspaper, a message on a network, the output of some querying process in a computer memory.

III. GOAL OF IAO-INTEL

The goal of IAO-Intel is to support the effective handling of data concerning those attributes of IAs that are relevant to the purposes of intelligence analysis. To describe such attributes coherently we need to distinguish:

– the particular information artifact of interest, tied to some particular physical information bearer: the photographic image on this piece of paper retrieved from this enemy combatant; the email created by this particular author on this specific laptop; the target list compiled for this particular artillery unit on this particular date;

– the copyable information content that is carried by the artifact in question. The photographic image may be printed out in multiple paper copies; the email or target list may be transmitted to multiple further recipients. The information content that is copied or transmitted thereby remains in each case one and the same.

IAO-Intel provides ontology terms relating both to official documents and to non-official (source) artifacts. It provides also a set of relations to be used when we wish to represent the fact that, say, IA #12345 is-about some given person, or uses-symbols-from some specified symbology, or links-to some second IA #56789, and so forth.

IAO-Intel is designed from the start to provide the needed supplement in a way that will create semantic interoperability of data retrieved from different types of sources through an incremental process of semantic enhancement as described in [8], [9] and [10]. It is designed to allow automatic retrieval of all documents in a given collection of heterogeneous sources which involve a particular creator, or a particular type of intelligence report, or a particular type of weblink, or have been declassified under the authority of a particular agency, or are operative within a given time window.

Importantly, IAO-Intel is not designed to replace existing doctrinal or other standards created to guide human beings or computer applications in the creation and description of documents in accordance with defined formats or document architectures. Rather, its purpose is to allow the results of using such standards to generate the needed metadata in a uniform, non-redundant and algorithmically processable fashion. Moreover, the broad scope of IAO-Intel means that the metadata generated in relation to official documents will be of a piece with the metadata incrementally accumulating in relation to all information artifacts of relevance to the IC – the metadata will consist, in every case, of annotations to IAs formulated in ontology terms drawn not only from IAO-Intel but from the entire suite of DCGS-A ontology modules.

Thus while using existing standards for human or computer-aided creation or description of IAs does indeed allow us to retrieve data pertaining to IAs prepared in accordance with these standards, for IAs of other sorts the existing approach will fail. Only an ontology-based approach along the lines here proposed can, we believe, demonstrate the sort of flexibility and consistent expandability which are needed in today’s dynamic and data-rich environments.

IV. EXPLICATION AND ANNOTATION

Currently a draft version of IAO-Intel is being applied within the framework of the US Army’s Distributed Common Ground System (DCGS-A) Standard Cloud (DSC) initiative as part of a strategy for the horizontal integration of warfighter intelligence data [9]. Two sorts of application are currently being used to enable the ontology to support computer-aided retrieval and analytics. The first is the explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related IAs. The second is the annotation of the instance-level information captured by such IAs.

Explication is performed by providing definitions of such general terms using the resources of IAO-Intel and of the domain ontologies (such as Agent or Event ontologies) being developed within the DCGS-A framework. Annotation is performed by associating ontology terms with data about particular persons, events, or places in given information artifacts.

TABLE 1. SAMPLE TYPES AND SUBTYPES OF INFORMATION ARTIFACTS

IAO | IAO-Intel (examples)
Report | Intelligence Report (FM 6-99.2, 126)
Summary | Electronic Warfare Mission Summary (FM 6-99.2, 87)
Diagram | Network Analysis Diagram (from JP 2-01.3, II-51)
Overlay | Combined Information Overlay (JP 2-01.3, II-33)
Assessment | Assessment of Impact of Damage (FM 6-99.2, 53)
Estimate | Adversary Course of Action Estimate
List | List of High-Value Targets (JP 2-01.3, II-61)
Order | Airspace Control Order (FM 6-99.2, 17)
Matrix | Target Value Matrix (JP 2-01.3, II-63)
Template | Ground and Air Adversary Template (JP 2-01.3, II-57)

The goal of explication is to ensure that the data captured in annotations is semantically enhanced in a way that enables computational integration and reasoning along the lines described in [11], [12]. The goal of annotation is to aid retrieval of information about specific persons, groups, events, documents, images, and so forth, where this information is conveyed through source documents using disjointed and disparate systems for designation.

V. STRATEGY FOR BUILDING IAO-INTEL

Our strategy for building IAO-Intel is to extend the draft IAO to include terms and definitions tailored for the intelligence domain and specifically for the needs of our DCGS-A ontology initiative. The strategy has the following parts.

First, IAO-Intel is created by downward population from the draft IAO reference ontology. That is, the highest level terms of IAO-Intel are defined as specializations of terms from IAO along the lines illustrated in Table 1. The coverage domain of IAO-Intel will be determined incrementally on the basis of requests from analysts and other SME communities and through incorporation of terms from doctrinal publications and relevant high-level data models and document classifications.

Second, we use these sources to identify the dimensions of attributes along which IAs will be annotated. The selected dimensions are constructed in such a way as to be orthogonal in the sense in which, for example, color is orthogonal to shape – thus ontology branches built to represent different dimensions of attributes will contain no terms in common. This will enable these branches to be structured following the principle of single inheritance (thus as true hierarchies) [13].

Third, we create low-level ontology modules (LLOs) corresponding to each of these orthogonal dimensions. LLOs are small single-dimension attribute lists or shallow hierarchies designed to advance ease of maintenance and surveyability of the ontology and to provide a growing set of simple component terms which can be used:

1. to construct more complex terms, both terms for inclusion in IAO-Intel, and terms to be used to generate inferred classifications in application ontologies created for specific local purposes, along the lines described in [10];

2. to define the terms of the IAO-Intel ontology and of its sister ontologies within the DCGS-A framework;

3. to explicate the meanings of terms standardly used by different agencies, or by different groups of SMEs, or by different existing and future systems to describe such artifacts in a logically consistent way that is designed to allow integration of data and enhanced analytics;

4. to annotate instance data pertaining to particular information artifacts used by the intelligence community – for instance analysts’ reports; harvested emails; signals data; and so forth.

The goal is that IAO-Intel should support integration of data annotated using different standard terminology resources. To bring this about, the constituent terms of such resources will be explicated using terms from IAO-Intel so that the artificial composite terms used in certain official terminologies and exchange model resources (along the lines of ‘VehicleInspectionJurisdictionAuthorityText’) will be broken down logically into constituent elements. This will provide a means to avoid the combinatoric explosion that is threatened by traditional approaches. Some composite expressions – for example ‘Essential Element of Friendly Information (EEFI)’ – will indeed be included in pre-composed form in the IAO-Intel ontology, but only where they are either defined in doctrine or already established as part of relevant SME vocabularies.

The modeling task for which compounds such as ‘VehicleInspectionJurisdictionAuthorityText’ were designed is addressed in our framework by allowing single data entries to be annotated by multiple ontology terms (sometimes linked by appropriate relations). A record in one of the tables containing data about an IED can be annotated, for example, both with ‘IED Event’ (based on its aboutness) and with ‘EEFI’ (based on its importance). A particular plan for the Intelligence Preparation of the Battlefield can be annotated as being at the same time a Plan (based on its purpose), a Government Document (based on its source), and a Report on Air Defenses (based on its aboutness). It can be annotated also through relations, for example through located-at linking the source of the plan to some city or building and linking the planned air defenses to some region of interest.

Currently, military terminology resources generally fail to follow established best practice principles for the formulation of definitions. For example, they often confuse terms referring to components of information artifacts with terms referring to the entities in reality which those information artifacts are about. The “WTI Improvised Explosive Device” Glossary, for example, defines Method of Emplacement as:

The description of where the [improvised explosive] device was delivered, used or employed.

Similarly the DCGS-A Logical Data Model defines CoverConcealment as:

information about geographical features that provide protection from attack or observation.

Use of IAO-Intel in tandem with corresponding domain ontologies allows us to explicate CoverConcealment (properly so-called) as:

a geographic feature which has-role CoverRole,

and to explicate CoverConcealmentInformation as:

IA which is-about CoverConcealment,

where CoverRole is defined as:

the Role acquired by a given geographic feature when it is used to provide protection from attack or observation.

VI. MAINTAINING AND EVALUATING IAO-INTEL

To maintain the IAO-Intel term collection over time we will create feedback links to enable users of the ontology to request new terms and to report errors. We are also working on an objective validation process which will enable us to determine how requested terms should be treated, distinguishing options such as: 1. incorporation into IAO-Intel or into some associated reference ontology, 2. incorporation into an application ontology maintained for some local purpose, 3. being marked as a synonym of some existing ontology term.

We are identifying, and where necessary constructing de novo, the domain ontologies that will need to be used in the definition of complex terms, and defining the relations that will link IAO-Intel terms with terms in these domain ontologies. These ontologies, too, will be extended over time on the basis of input from users.

We are also testing a series of objective criteria to be used in evaluation of IAO-Intel and other DCGS-A ontologies, starting with simple numerical measures of (a) term requests received and dealt with, and (b) uses of terms in definitions, explications and annotations. IAO-Intel will allow us to keep track of the number of information artifacts that make reference to individuals falling under a given class, and these metrics too can be used to assess the relative importance of this class within the ontology framework taken as a whole. While not definitive, such measures will help guide our judgments concerning the content and structure both of IAO-Intel and of its associated domain ontologies.

VII. ORGANIZATION OF IAO-INTEL

Given the importance of the dichotomy between primary (topic) and secondary (artifact) focus, a central role in IAO-Intel is played by what we call Information Content Entities (ICEs). ICEs are about something in reality (they have this something as a subject; they represent, or mention or describe this something; they inform us about this something). Aboutness may be identifiable from different perspectives. Thus one analyst may interpret a given ICE as being about the geography of a given encampment; another may view it as providing information about the morale of those encamped there.

All major classes of information artifacts involve ICEs – simply because all major classes of information artifacts are about something. A plan of action, for example, is about a certain group of persons and goals and the types and ordering of actions that will be used to realize these goals. Even a document that has been written in code will be assumed by an analyst to be about something (for what, otherwise, would be the reason for its creation?). Typically, an information artifact such as a copy of a newspaper will be associated with multiple ICEs at successive levels of granularity, including separate articles within the newspaper, separate sentences within these articles, and so on.

In addition to ICEs, we distinguish also:

– Information Bearing Entity (IBE). An IBE is a material entity that has been created to serve as a bearer of information. IBEs are either (1) self-sufficient material wholes, or (2) proper material parts of such wholes. Examples under (1) are: a hard drive, a paper printout (e.g., a report); and under (2): a specific sector on a hard drive, a single page of a paper printout.

– Information Quality Entity (IQE). An IQE is the pattern on an IBE in virtue of which it is a bearer of some information.

– Information Structure Entity (ISE). An ISE is a structural part of an ICE; speaking metaphorically, it is an ICE with the content removed: for example an empty cell in a spreadsheet; a blank Microsoft Word file. ISEs thus capture part of what is involved when we talk about the ‘format’ of an IA.

The term ‘information artifact’ can now be used to refer either 1. to some combination of ICEs and ISEs (roughly: the IA as body of copyable information content); or 2. to some concretization of ICEs and ISEs in some IBE in which some IQE inheres (the information artifact is: this content here and now, on this specific computer screen or this printed page). Different information artifact types will differ in different ways along these dimensions, as illustrated in Table 2.

[Figure 1. Continuants in the IAO framework. Information Bearing Entity (IBE) falls under BFO: Independent Continuant; Information Content Entity (ICE) and Information Structure Entity (ISE) fall under BFO: Generically Dependent Continuant; Information Quality Entity (Pattern) (IQE) falls under BFO: Specifically Dependent Continuant.]

VIII. IAO AND THE BASIC FORMAL ONTOLOGY

Figure 1 shows how IAO and IAO-Intel are being built to conform to Basic Formal Ontology (BFO), the upper-level architecture used in the DCGS-A ontologies [14]. IBEs are, in BFO terms, independent continuants (they are entities made of physical matter). An IBE is a physical entity that is created or modified to serve as bearer of certain patterned arrangements – for example of ink or other chemicals, of electromagnetic excitations. An IQE is a quality of an IBE which exists in virtue of such patterned arrangements and which is interpretable as an ICE or ISE. Such an IQE is created when some physical artifact is deliberately created or modified to support it (patterned to serve as its bearer). IQEs are BFO:specifically dependent continuants (SDCs) – entities which require some specific physical bearer but which are not themselves physical. Each IBE and IQE is restricted at any given time to some specific location in space. (If you display the same digital image twice on your desktop, then there are two IQEs on your desktop, which are – at some level of granularity – indistinguishable copies of each other.)

ICEs and ISEs, in contrast, are what BFO calls generically dependent continuants or GDCs. This means that they are entities – such as a pdf file or an email – which can be copied from one physical bearer to another and thus may exist simultaneously in multiple different IQEs, which are called ‘concretizations’ of the corresponding GDC. Each GDC is concretized by at least one specific IQE inhering for example in the tiny piles of ink on the piece of paper in your pocket or in differentially excited pixels on your screen.
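As a rough illustration only, the distinction between bearers, patterns and copyable content can be sketched in Python. The class names follow the abbreviations IBE, IQE and GDC used in the text; the field names, the `concretize` helper and the `bearers` method are hypothetical and are not part of IAO, IAO-Intel or BFO.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class IBE:
    """Information Bearing Entity: a material bearer (independent continuant)."""
    label: str


@dataclass(eq=False)
class GDC:
    """Generically dependent continuant: an ICE or ISE, i.e. copyable content."""
    label: str
    kind: str  # "ICE" or "ISE" (hypothetical field)
    # repr=False avoids a circular repr between a GDC and its IQEs.
    concretizations: List["IQE"] = field(default_factory=list, repr=False)

    def bearers(self) -> List[IBE]:
        """Return the bearer of each pattern that concretizes this content."""
        return [iqe.bearer for iqe in self.concretizations]


@dataclass(eq=False)
class IQE:
    """Information Quality Entity: a pattern that depends on one specific IBE."""
    bearer: IBE
    concretizes: GDC


def concretize(content: GDC, bearer: IBE) -> IQE:
    """Create a new pattern on a bearer and record it as a concretization."""
    iqe = IQE(bearer=bearer, concretizes=content)
    content.concretizations.append(iqe)
    return iqe


# The same digital image displayed twice on one desktop screen.
image = GDC("digital image", kind="ICE")
screen = IBE("desktop screen")
first = concretize(image, screen)
second = concretize(image, screen)

print(first is second)             # False: two distinct IQEs
print(len(image.concretizations))  # 2: one content entity, two concretizations
```

In this sketch the content entity is a single object however many patterns concretize it, whereas every pattern is tied to exactly one bearer.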
When the GDC is copied, a new IQE is created on a new physical information bearer, as when a new pattern of characters is created on the screen of the recipient of an email. This second pattern is a copy of the pattern created on the screen of the sender. The GDC itself exists simultaneously both at its original site and at the site to which it has been transmitted. GDCs can thus be multiply located.

BFO relations between ICEs, ISEs, IQEs and IBEs can be set forth as follows:

ICE generically-depends-on IBE
ISE generically-depends-on IBE
IQE specifically-depends-on IBE
ICE concretized-by IQE
ISE concretized-by IQE

IAO contains in addition relations which allow us to formulate metadata concerning attributes of IAs such as author, creation date, classification status, and so forth, and to annotate also components of IAs such as the To- and FromAddress components of email headers. The ToAddress of email message m, for example, is defined as:

a collection of one or more email addresses of the intended recipients of m, each with at most one optionally associated name.

The set of relations can be extended to include also relations involving documents, document parts and document collections, such as retrieved-from, curated-by, and so forth.

When we consider examples such as those provided in Table 2, it becomes clear that, when IAO-Intel is applied to the explication of terms involved in describing instance-data relating to real-world IAs, multiple artifacts may need to be distinguished. Consider, for example, a pdf file stored on some specific laptop. When we address what is meant by the (copyable) content of this file, we recognize that this content may be copied in multiple ways, for example: to a pdf file using the same version of the Acrobat software and on the same operating system, to a pdf file using a different version of the Acrobat software, using characters from the same or a different character set, by being printed out on a piece of paper, and so on. The annotation of instance data with information of this sort may be important for example in investigating the provenance of given information artifacts which lie at the end of long chains of copying and processing involving multiple authors and computer systems. One potential application of IAO-Intel is to the systematic annotation of data pertaining to such chains.

Matters are complicated further when we go deeper into the question of how IAs are stored inside the computer. Given a generically dependent continuant which is the pdf file stored in the hard drive on some given laptop, there is a specifically dependent IQE which is (roughly) the pattern of 1s and 0s in the magnetic coating of the hard drive. When the entirety of this pdf file is displayed on your screen, then there is a further specifically dependent IQE which is the corresponding pattern of pixels on your screen. Both of these IQEs are concretizations of a corresponding GDC.

Note that we do not assume that all portions of IAO-Intel will be of equal utility in applications for the IC. We do, however, believe that achieving clarity of explication in the treatment of source data artifacts will require clear definitions of the upper-level terms in the IAO, and a clear understanding of the relations between them.

TABLE 2: DIMENSIONS OF INFORMATION ARTIFACTS (IAS)

Information Artifact | IBE | ISE | ICE
MS Word file (.doc, .docx) | Hard drive (magnetized sector) | MS Word format | Varies
XML file | Hard drive (magnetized sector) | XML V 2.0 format | Varies
MS Excel 2010 file (.xls, .xlsx) | Hard drive (magnetized sector) | MS Excel 2010 format | Varies
KML file | Hard drive (magnetized sector) | KML | Map overlay
JPEG file (.jpg) | Hard drive (magnetized sector) | JPEG format | Image
Email file (with embedded attachments) | Hard drive (magnetized sector) | Internet Message Format (e.g., RFC 5322 compliant) | Message
USMTF Message file | A specific government network | USMTF Format | Message data
Passport | Paper document; (may include photographs, RFID tags) | Name, ID formats, security marking formats … | Personal data, Passport number, Visas …
Title Deed | Official paper document | Varies | Varies
Report | Varies | Varies | Varies
Overlay Sheet (e.g. Map Overlay Sheet – see Figure 2) | Acetate sheet | MIL-STD-2525 Symbols; FM 101-5-1 Operational Terms and Graphics | Map overlay

IX. ATTRIBUTES OF INFORMATION ARTIFACTS

Information artifacts have attributes along a number of distinct dimensions, treated in LLO modules of the IAO. Terms in these modules will be applied to explicate information relating to IAs of different types, and to annotate data pertaining to IA instances with the help of relations mentioned above. Some dimensions of IA attributes are common to all areas, both military and non-military, including: Purpose, Lifecycle Stage (draft, finished version, revision); Language, Format, Provenance, Source (person, organization), and so forth.

Along the dimension of Purpose we distinguish:

– Descriptive purpose: scientific paper, newspaper article, after-action report

– Prescriptive purpose: legal code, license, statement of rules of engagement

– Directive purpose (of specifying a plan or method for achieving something): instruction, manual, protocol

– Designative purpose: a registry of members of an organization, a phone book, a database linking proper names of persons with their social security numbers

It should be stressed that one and the same IA may of course serve multiple purposes.

As is shown in Table 3, IAO-Intel will include additional LLOs relating to attributes of importance to the intelligence domain such as: Classification, Encryption Status, Encryption Strength, and so forth. IAO-Intel will also include terms representing specific IA Purposes such as: informing the commander, providing targeting support, intelligence preparation of the battlefield.

TABLE 3. DIMENSIONS OF INFORMATION ARTIFACT ATTRIBUTES

Role in the Intelligence Process (JP 3-0, III-11)
Priority Intelligence Requirement (PIR)
Commander’s Critical Information Requirement (CCIR)
Essential Element of Information (EEI)
Essential Element of Friendly Information (EEFI)

Confidence Level (JP 2.0, Appendix A)
Highly Likely | Unlikely
Likely | Highly Unlikely
Even Chance |

Discipline (JP 2.0, I-5) | Intelligence
Legal | Signal
Ideology | Human
Religion | Rumor intelligence
Propaganda | Web intelligence

Intelligence Excellence (JP 2.0, II-6)
Anticipatory | Complete
Timely | Relevant
Accurate | Objective
Usable | Available

Table 3 illustrates fragments of some of the dimensional hierarchies specific to IAO-Intel, with their doctrinal sources.

X. EXAMPLES OF USE OF IAO-INTEL IN ANNOTATION

As should by now be clear, IAO-Intel relates not merely to textual documents but to information artifacts of all types including maps, videos, photographic images, websites, databases, and so forth, both unstructured source documents and official documents of many different varieties. Consider the Modified Combined Obstacle Overlay (MCOO), taken from JP 2-01.3 [15] and illustrated in Figure 2. (We refer to this as example IA#1 in what follows.) An MCOO is defined as:

A joint intelligence preparation of the operational environment product used to portray the militarily significant aspects of the operational environment, such as obstacles restricting military movement, key geography, and military objectives.

Figure 2: Modified Combined Obstacle Overlay (example IA#1)

We assume that IA#1 has been prepared as part of some given plan, IA#2. Both IAs #1 and #2 will then be referred to in multiple further IAs including multiple databases compiled during planning, execution and outcomes assessment. Relevant terms used in the data models associated with these databases will have been explicated using terms from IAO-Intel. The latter terms can then be used along the lines described in [9] to create annotations to both #1 and #2 on the basis of the fact that they are referred to in the databases in question. The results will include, for example:

a) annotations to the attributes of IA#1:
– ICE: MCOO
– IBE: Acetate Sheet
– uses-symbology MIL-STD-2525C
– authored-by person #4644
– part-of plan IA#2

b) annotations relating to the aboutness of IA#1:
– Avenue of Approach
– Strategic Defense Belt
– Amphibious Operations
– Objective

and so forth. Used in conjunction with the skill ontology and the person database, the annotations above will enable a planner to retrieve (for example) all MCOOs relating to amphibious operations authored by persons with certain skills.

Consider, as a second example, a collection of documents prepared according to FM 6-99.2 [16], for example of types:

Intelligence Report [INTREP]
Intelligence Summary [INTSUM]
Logistics Situation Report [LOGSITREP]
Operations Summary [OPSUM]
Patrol Report [PATROLREP]
Reconnaissance Exploitation Report [RECCEXREP]
SAEDA Report [SAEDAREP]

Suppose further that we need to cross-reference these with comparable sets of documents prepared by other commands, and that we need to do this in such a way as to extract and process the information computationally. FM 6-99.2 provides definitions of the mentioned report types, but does not take the step of formulating these definitions computationally. IAO-Intel addresses this problem by providing a common, algorithmically useful, set of ontology terms that is designed to allow consistent explication of these and related types as they appear in different doctrinal resources. The results can then be used for computer-aided aggregation of the data represented using corresponding IA types, cross-checking of mismatches, and so forth.

XI. THE DOD DATA SERVICES ENVIRONMENT

We can now return to Directive 8320.02 and address the relevance of the work reported above to its successful implementation. As we saw, the Directive requires that ‘all salient metadata be discoverable, searchable, and retrievable’ through use of the DoD Data Services Environment (DSE) [6]. DSE’s numerous data sources include 35 ‘supporting taxonomies’ derived from pre-existing terminology resources. Problems arise, however, because the latter have been constructed on the basis of multiple distinct methodologies (for example as concerns the formulation of definitions). When, on August 25, 2013, the DSE was queried for information on “location”, the DSE reported 660 possibly relevant sources of information. When the DSE was queried for “unit types,” 882 possibly relevant sources of information were reported. When types of “ground vehicles” were queried for, 175 possibly relevant sources of information were reported.

… describing (A) chemicals (properties, costs, manufacture, transport, supply, and so forth), and (B) explosives manufacture (raw materials, persons and skills involved, processes and equipment and safety measures used). We will have satisfied Directive 8320.02 in maximizing discoverability if we annotate each body of data in accordance with corresponding term repositories, which we can assume to have been independently developed. Suppose now, however, that we are called upon to integrate the data in (A) with the data in (B). Here these annotations will likely provide no assistance, which will in turn lead to calls for the creation of a third term repository to be used in efforts to annotate the combined (AB) data. The results of these efforts will then once again likely provide no assistance when (AB) data itself needs to be integrated with, say, data about explosives financing.

Where, in contrast, the systems for annotating (A) and (B) reflect a common ontological approach, new annotation resources for the merged data can easily be developed by reusing the initially developed ontologies in the formulation of both composite terms and corresponding definitions [10].

A further problem is that the need to create new terminology resources for the annotation of such merged content may lead to the need for corrections of the initial terminology resources. Such corrections may have expensive consequences: either they will break interoperability with the results of earlier annotation efforts, or – if resources are invested to correct already existing annotations to make them conform to the new usage – they will have unforeseen consequences for third parties who have been relying on the older resources to be maintained consistently through time.

Such problems are minimized where terminology resources are developed in tandem from the very start as parts of a single suite of ontology modules developed using common principles, exactly as is proposed by our DCGS-A strategy. We believe that only a strategy of this sort can satisfy the requirement that data, information, and IT services are ‘made visible, accessible, understandable, trusted, and interoperable throughout their lifecycles for all authorized users.’ [5]

XII. SEMANTIC TECHNOLOGY IS NOT ENOUGH

The strategy underlying DSE has much in common with a
Such redundancies present obstacles to discovery, strategy adopted widely in the semantic technology search and retrieval. They arise because different compilers of community under the heading of Linked Open Data, a strategy authoritative data describe entities of the same types in often involving the use of the Dublin Core Metadata Element heterogeneous ways. This thwarts the sort of coherent Set as controlled vocabulary. We believe that the Dublin Core integration that is required for the mounting of what, in [6], we can serve as reliable controlled vocabulary for describing IA referred  to  as  the  “massing  of  intelligence  fires”. data only where the information artifacts in question are themselves artifacts formulated using RDF or some other One problem is that while the terms in thesauri and W3C recommended syntax, and unfortunately this is not the glossaries can be used in annotations, the value derived case for many of the artifacts at issue here. We believe further therefrom is limited above all because they do not allow the that the Linked Data approaches cannot solve the problems of benefits of inferencing and of rapid introduction and definition silo-formulation in the IC for the results outlined already in of new terms which are provided by a framework of well- section XI above. The semantic technology community draws constructed ontologies along the lines described in [10]. There a distinction between two levels of interoperability: Level 1, we show how reference ontologies can be quickly expanded resting on shared term definitions (for example drawn from with new content to meet emerging data representation needs the Dublin Core), and Level 2, of what is called Formal and in such a way that data annotated with the newly added Semantic Interoperability. As is recognized at [17], Level 1 is terms is automatically integrated with existing data. 
‘so open-ended that it quickly leads to a proliferation of Imagine, for example that we have two large bodies of data custom-built solutions incompatible with each other, such as STIDS 2013 Proceedings Page 39 metadata expressed in document formats that require years in the domain of biomedical informatics, and is customized software to read and data models that cannot gradually being adopted also in other domains, including for easily be mapped to generic, interoperable representations example the domain of modeling and simulation, where the such as those expressed in RDF.’ Level 2 is designed to solve identifying authoritative data sources is needed to ensure these problems by requiring that all IAs are described via realistic scenarios [18]. One principal feature of the strategy is metadata formulated using RDF. Unfortunately RDF (or even that it provides a standard means for defining new ontologies OWL) is no panacea. Multiple conflicting ontologies can be in light of emerging needs, in a way that guarantees formulated in RDF terms, yet still remain conflicting. consistency with the ontologies already created and with the data annotated in their terms. We believe that this feature The solution, again, must rely on shared development of a makes the strategy particularly useful in addressing the emerg- single suite of modularized ontologies, in which not only the ing challenges to the intelligence analyst in accordance with same formal language is used, but also consistent definitions DoD directives concerning discovery, retrieval and search. populating downward from a common upper level such as BFO – and we note in this connection a parallel with the way ACKNOWLEDGMENTS in which joint doctrine is elaborated, in a process that is Work on IAO-Intel was supported by I2WD. 
Thanks are due designed to ensure (at least ideally) that the same term is also to Mathias Brochhausen, Werner Ceusters, Mélanie defined and used consistently across the 80 plus Joint Courtot, Janna Hastings, James Malone, Bjoern Peters, Publications (JPs) that address the various aspects of joint Jonathan Rees, and Alan Ruttenberg for their work on IAO. REFERENCES [1] Friedrich Wilhelm von Steuben, Regulations for the order and discipline IAO:Report of the troops of the United States, 1792, http://x.co/1dJEk. [2] Department of Defense Dictionary of Military and Associated Terms, Intelligence 2013, http://www.dtic.mil/doctrine/new_pubs/jp1_02.pdf. report [3] Leo Obrst, Patrick Cassidy, “The   need   for ontologies: Bridging the barriers of terminology and data structure”,   Geological   Society   of   America Special Paper 482, 2011. Geospatial Human Measurement and [4] Leo Obrst, Terry Janssen, Werner Ceusters (eds.), Ontologies and Semantic Technologies for the Intelligence Community. Amsterdam: intelligence intelligence signals intelligence IOS Press, 2010. report report report [5] Sharing Data, Information, and Information Technology (IT) Services in the Department of Defense, DoD Instruction 8320.02, August 5, 2013, Human geospatial Signals Measurement http://www.dtic.mil/whs/directives/corres/pdf/832002p.pdf. intelligence intelligence intelligence [6] DSE Data Services Environment, https://metadata.ces.mil/dse. report report report [7] https://code.google.com/p/information-artifact-ontology. [8] David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry The above IAO-Intel terms are defined by using terms from the Smith,   “Integration   of   intelligence   data   through   Semantic   Enhance- ontologies below with the help of relations such as is-about, ment”,   Proceedings of the Conference on Semantic Technology in created-by, derives-from and so forth [7]. Intelligence, Defense and Security (STIDS), 2011, CEUR 808, pp. 6–13. 
[9] Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent,  Shouvik  Bardhan,  Jamie  Johnson,  “Ontology  for  the  Intelligence Geospatial feature IA source Analyst”,   CrossTalk:   The   Journal   of   Defense   Software   Engineering,   November/December 2012, pp. 18–25. Person Intel discipline [10] Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent,   Milan   Patel,   “Horizontal   integration   of   warfighter   intelligence   Signal IA classification data.   A   shared   semantic   resource   for   the   Intelligence   Community”,   measurement Proceedings of STIDS Conference, 2012 (CEUR 996), pp. 112–119. [11] Ron Rudnicki, Werner Ceusters, Shahid Manzoor and Barry Smith, “What   particulars   are   referred   to   in   EHR   data?”,   American   Medical   Figure 3. Top: Terms from IAO (unfilled) and IAO-Intel (grey) ontologies. Informatics Association 2007 Annual Symposium, 2007, pp. 630–634. Taxonomical hierarchies: asserted – solid lines, inferred – dashed lines. Bottom left: Domain ontologies. Bottom Right: IAO-Intel LLOs. [12] Ron   Rudnicki,   “DCGS-A   Ontology   Program   Explication   Procedures”,   MS, 2013. warfare in accordance with JP 1-02 [2]. [13] Barry   Smith   and   Werner   Ceusters,   “Ontological   Realism   as   a   methodology for coordinated evolution of scientific ontologies”,   Applied Ontology, 5 (2010), pp. 139–188. XIII. CONCLUSION [14] Basic Formal Ontology 2.0, http://ontology.buffalo.edu/BFO/Reference. To summarize: IAO-Intel forms part of a collection of [15] Joint Publication 2-01.3 Joint Intelligence Preparation of the Operational ontologies that is being applied primarily to the explication of Environment, 16 June 2009. data models and other terminology resources of importance to [16] U.S. Army Report and Message Formats (FM 6-99.2), April 2007, DCGS-A. The terms in these ontologies are linked together http://armypubs.army.mil/doctrine/DR_pubs/dr_a/pdf/fm6_99x2.pdf. 
logically in virtue of the fact that each ontology uses terms [17] Dublin Core User Guide, Last modified September 6, 2011, which are defined in terms of other ontologies belonging to http://wiki.dublincore.org/index.php/User_Guide. this same suite (as illustrated in Figure 3). This strategy for [18] Saikouy   Diallo,   Jose   Padilla,   “Military   Interoperability   Challenges”,   Handbook on Real-World Applications in Modeling and Simulation, ontology development has been tested in use over several Wiley, 2012, pp. 298–332. STIDS 2013 Proceedings Page 40
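Illustrative sketch. The retrieval scenario described in Section X (all MCOOs relating to amphibious operations authored by persons with certain skills) can be made concrete with a small Python fragment. This is only a sketch under stated assumptions: the class names, the find_artifacts function, and the skill attributed to person #4644 are hypothetical and introduced purely for illustration; they are not part of IAO-Intel, of the skill ontology, or of any DCGS-A system. Only the annotation values for IA#1 are taken from the example in Section X.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One (relation, value) statement about an information artifact (IA)."""
    relation: str
    value: str


@dataclass
class InformationArtifact:
    ia_id: str
    attributes: list[Annotation] = field(default_factory=list)
    about: set[str] = field(default_factory=set)  # aboutness annotations

    def values(self, relation: str) -> set[str]:
        """All values asserted for the given relation."""
        return {a.value for a in self.attributes if a.relation == relation}


# IA#1, annotated with the values listed in Section X.
ia1 = InformationArtifact(
    ia_id="IA#1",
    attributes=[
        Annotation("ICE", "MCOO"),
        Annotation("IBE", "Acetate Sheet"),
        Annotation("uses-symbology", "MIL-STD-2525C"),
        Annotation("authored-by", "person #4644"),
        Annotation("part-of", "plan IA#2"),
    ],
    about={
        "Avenue of Approach",
        "Strategic Defense Belt",
        "Amphibious Operations",
        "Objective",
    },
)

# Hypothetical stand-in for the person database and skill ontology;
# the skill shown here is invented for the sake of the example.
person_skills = {"person #4644": {"terrain analysis"}}


def find_artifacts(artifacts, ice_type, topic, required_skill):
    """Return IDs of IAs of a given type, about a topic, whose author has a skill."""
    hits = []
    for ia in artifacts:
        if ice_type not in ia.values("ICE") or topic not in ia.about:
            continue
        authors = ia.values("authored-by")
        if any(required_skill in person_skills.get(p, set()) for p in authors):
            hits.append(ia.ia_id)
    return hits


print(find_artifacts([ia1], "MCOO", "Amphibious Operations", "terrain analysis"))
```

The query succeeds only because attribute annotations, aboutness annotations, and the person data all use the same identifiers ("MCOO", "Amphibious Operations", "person #4644"). A real deployment would draw these identifiers from the shared ontology suite rather than from free-text strings.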