<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Contributions to a Semantically Based Intelligence Analysis Enterprise Workflow System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert C. Schrag</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jon Pastor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Long</string-name>
          <email>clong@setcorp.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Peterson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Cornwell</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lance A. Forbes</string-name>
          <email>lforbes@sms-fed.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen Cannon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>by the U.S. Government. All authors were with Global InfoTek, Inc.</institution>
          ,
          <addr-line>1920 Association Dr, Suite 600, Reston, VA USA 20191, 703-652-1600</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We have contributed key elements of a semantically based intelligence analysis enterprise workflow architecture: a uniformly accessible semantic store conforming to an enterprise-wide ontology; a branching context representation to organize workflow components' analytical hypotheses; a logic programming-based, forward-chaining query language for components to access data from the store; and a software toolkit embracing all the foregoing to streamline the process of introducing additional legacy software components as semantically interoperable workflow building blocks. We explain these contributions, focusing particularly on the toolkit. For certain widely used input/output formats, e.g., comma-separated value (CSV) files, a knowledgeable user can quickly “wrap” a newly installed component for workflow operation by providing a compact and entirely declarative specification that uses the query language to map specific relation arguments in the ontology to specific structural elements in the component's native input and output formats. Our contributions are built to work with AllegroGraph, from Franz, Inc.</p>
      </abstract>
      <kwd-group>
        <kwd>Intelligence analysis</kwd>
        <kwd>enterprise workflow</kwd>
        <kwd>hypothesis representation</kwd>
        <kwd>branching contexts</kwd>
        <kwd>semantic interoperability</kwd>
        <kwd>declarative data transformation</kwd>
        <kwd>software component wrapping</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>We have contributed key elements of a semantically based
intelligence analysis enterprise workflow architecture for
Tangram, a multi-year, multi-contractor threat surveillance and
alerting research and development program sponsored by the
United States’ Intelligence Advanced Research Projects
Agency (IARPA). Tangram’s objective has been to automate
routine analysis workflows, so that these can be executed as
standing processes, on a large scale.</p>
      <p>To support the rapidly changing needs of an intelligence
enterprise, a workflow authoring tool must be extremely
flexible. The enterprise must be able to rearrange components
(e.g., pattern matchers, classifiers, group detectors) in the same
kind of way that a child rearranges Lego bricks, and it must be
able to introduce new software rapidly.
However, Lego bricks have a distinct advantage over legacy
software components from different sources: they were all
created to respect a common interface. One brute-force
approach to integrating legacy components is to manually
develop code that transforms data from one form (e.g., Java
objects) to another (e.g., flat files); for n components, each of
which may need to exchange data with any other, that requires
O(n²) transforms. Tangram’s approach, in which every component
maps only to and from a common ontology, reduces the required
number of transforms to O(n), and our toolkit enables knowledgeable
users to “wrap” legacy components with such transforms,
making the components workflow-ready quickly.</p>
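      <p>The counting argument above can be made concrete with a small sketch. It assumes (as an illustration, not a statement of the Tangram design) that pairwise integration needs one directed transform per ordered pair of components, and that a wrapped component needs one mapping into and one mapping out of the common ontology. Both function names are hypothetical.</p>

```python
def pairwise_transform_count(n: int) -> int:
    """Directed format-to-format transforms when every component must
    exchange data directly with every other component: n * (n - 1)."""
    return n * (n - 1)


def hub_transform_count(n: int) -> int:
    """Transforms when each component maps only to and from one shared
    ontology: one input mapping plus one output mapping per component."""
    return 2 * n


# Compare the two growth rates for a few enterprise sizes.
for n in (2, 5, 10, 50):
    print(n, pairwise_transform_count(n), hub_transform_count(n))
```

      <p>For very small n the shared-ontology route is not cheaper; the advantage appears as the number of components grows.</p>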
      <p>To motivate our contributions, we present the (notional,
simplified) two-component workflow in Fig. 1: a suspicion
scorer hypothesizes potential terrorists, then a group detector
clusters the hypothesized terrorists into hypothesized potential
terrorist groups.</p>
      <p>[Fig. 1: notional two-component workflow in which a suspicion scoring component feeds a group detection component.]</p>
      <p>
        The workflow in Fig. 1 raises some enterprise-level
architecture issues that our contributions address.
      </p>
      <p>1) What are components’ input and output data, how is data
stored, and how do components access it? We have
introduced a uniformly accessible semantic store
conforming to an enterprise-wide ontology and a logic
programming-based, forward-chaining query language for
components to access data from the store. Component
specifications (see Issue 3 below) indicate what data is
accessed in particular.</p>
      <p>2) How are the hypotheses that analytical components
produce distinguished from background data, and how are
they communicated among components? As hypotheses,
analytical components’ outputs must not simply be mixed
indiscriminately with more uniformly credible evidence
data or with each other. Among other considerations, the
broad body of evidence changes over time (leading to
different hypotheses), and different components, or
different (e.g., control) configurations thereof, can lead to
different hypotheses even for the same inputs. We
organize the content of the semantic store into distinct
RDF graphs that we call “datasets,” and (correlating
datasets with contexts) represent the outputs of
successively applied analytical components as branching
contexts (that incrementally add information). Our
component specifications and our query language thus
include parameters for the datasets that are passed among
or otherwise accessed by components. Besides these
datasets for hypotheses, the store includes one or more
background, or “evidence,” datasets and for convenience
some intermediate (i.e., not necessarily hypothetical)
datasets that result from purely logical queries. This
treatment of evidence and hypotheses, together with the
above-mentioned query language, provides a practical
implemented solution to meet broad Tangram
requirements outlined in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>3) How can legacy components with arbitrary input/output
formats easily be made to interact with the data? The
contributions above are integrated in a software toolkit to
streamline the process of introducing additional legacy
software components as semantically interoperable
workflow building blocks. For certain widely used
input/output formats, e.g., comma-separated value (CSV)
files, a knowledgeable user can quickly wrap a newly
installed component for workflow operation by providing
a compact and entirely declarative specification that uses
the query language to map specific relation arguments in
the ontology to specific structural elements in the
component’s native input and output formats. The toolkit
also provides some less fully automated interface options
to address more general input/output situations.</p>
      <p>II. ARCHITECTURAL SCHEME OF A WORKFLOW COMPONENT</p>
      <p>3) Invoke the legacy component in its “native” (unwrapped)
form.
4) Convert the legacy component’s native-format outputs to
the common ontology, as metadata-bearing hypotheses.
5) Assert the output hypotheses to the central store.</p>
      <p>We implement the central semantic store using
AllegroGraph from Franz, Inc. AllegroGraph is a “quad” store
that includes, in addition to the “subject,” “predicate,” and
“object” fields standard to RDF and common to triple stores, a
“graph” field. We use this field to distinguish among the
various datasets that are available as inputs or have been
produced as outputs of workflow components.</p>
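      <p>To illustrate how a fourth “graph” field separates datasets, here is a minimal in-memory sketch. It is not the AllegroGraph API: the QuadStore class, its methods, and the sample identifiers are all hypothetical and serve only to show quads being filtered by graph.</p>

```python
from collections import namedtuple

# A quad is an RDF triple plus the graph (dataset) it belongs to.
Quad = namedtuple("Quad", ["subject", "predicate", "object", "graph"])


class QuadStore:
    """Toy quad store; a stand-in for illustration only."""

    def __init__(self):
        self._quads = []

    def add(self, subject, predicate, obj, graph):
        self._quads.append(Quad(subject, predicate, obj, graph))

    def match(self, subject=None, predicate=None, obj=None, graph=None):
        """Return quads agreeing with every field that is not None."""
        pattern = (subject, predicate, obj, graph)
        return [
            quad
            for quad in self._quads
            if all(want is None or want == have
                   for want, have in zip(pattern, quad))
        ]


store = QuadStore()
# Illustrative data: one evidence statement and one hypothesis.
store.add("teo:Ev-709", "teo:sender", "teo:In-15840", "evidenceGraph")
store.add("teo:In-15840", "rdf:type", "teo:Terrorist", "hypothesisGraph")

evidence_only = store.match(graph="evidenceGraph")
print(evidence_only)
```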
      <p>We provide a knowledge base (KB) query language
supporting a wrapped component’s query and assertion
processes and allowing users to define, for specific analytical
purposes, KB query components (which include no legacy process)
that combine elements from one or more existing datasets into
one or more output datasets. We implement legacy component
wrappers and KB query components using the Prolog and
Common Lisp interfaces to AllegroGraph.</p>
      <p>Fig. 3 illustrates the meta-data classes (noted in bold) and
attributes (with multi-valued attributes starred*) that support
the representation of a dataset’s context lineage. We take each
workflow component’s execution, noted in a ProcessExecution
(PE) object, as the source of the statements in any output
(hypothesis) dataset; lineage is manifested in the connections
among datasets, process executions, and workflow executions
(noted in WorkflowExecution objects).</p>
      <p>WorkflowExecution: hasProcessExecution*</p>
      <p>ProcessExecution: hasProcess (e.g., GDA); hasPEDatasetInput*; hasPEDatasetOutput*; hasPEControlInput*</p>
      <p>ProcessExecutionDatasetInput: hasParameterName (consistent with Process); hasInputDataset</p>
      <p>ProcessExecutionDatasetOutput: hasParameterName (consistent with Process); hasOutputDataset</p>
      <p>ProcessExecutionControlInput: hasParameterName; hasValue</p>
      <p>As noted in Section I, the interpretation of datasets as a
context is incremental along its lineage: in general any
statement that holds in a dataset that is upstream
(workflow-wise) from a given dataset D created during a workflow also
(implicitly) holds in D. The representation is thus
space-efficient. We have not yet found it necessary to implement
such transitivity of dataset contexts directly in the KB query
language; our current workflow components use just
background (evidence) datasets and datasets that their
immediate workflow predecessors create.</p>
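      <p>The incremental context interpretation can be sketched as a walk over dataset lineage. This is only an illustration of the idea (as noted, such transitivity is not implemented in the KB query language); the function names, the dictionaries, and the sample statements are hypothetical.</p>

```python
def upstream_datasets(dataset, inputs_of):
    """Return the dataset itself plus every dataset upstream of it."""
    seen = []
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # inputs_of maps a dataset to the datasets its producer consumed.
        stack.extend(inputs_of.get(current, []))
    return seen


def statements_holding_in(dataset, inputs_of, stored):
    """Union of the statements stored in a dataset and in its lineage."""
    result = set()
    for name in upstream_datasets(dataset, inputs_of):
        result |= stored.get(name, set())
    return result


# Lineage: evidence -> suspects (suspicion scorer) -> groups (group detector).
inputs_of = {"groups": ["suspects"], "suspects": ["evidence"]}
stored = {
    "evidence": {("Ev-709", "sender", "In-15840")},
    "suspects": {("In-15840", "type", "Terrorist")},
    "groups": {("G0", "orgMember", "In-15840")},
}

in_groups = statements_holding_in("groups", inputs_of, stored)
print(sorted(in_groups))
```

      <p>Each dataset stores only what its producing component added, which is the space saving noted above.</p>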
      <p>Fig. 4 presents a use case workflow including both a
wrapped legacy component and a KB query component.</p>
      <p>In Fig. 4, datasets (graphs) are depicted by square-cornered
boxes; workflow components are depicted by round-cornered
boxes. Each component reads data from one or more input
graphs and writes to one or more output graphs. Here, a
dataset join KB query component is used to select from
broader evidence (right) just information relevant to
watchlisted terrorist suspects (left) for processing by a
downstream legacy group detection component.</p>
      <p>In our toolkit, the defining forms for workflow components
are Lisp macro calls. Beyond providing one or more files
containing such definitions, ToolKit users need never interact
directly with Lisp or with AllegroGraph, as we provide
alternative interfaces.</p>
      <p>IV. KB QUERY COMPONENTS AND QUERY LANGUAGE</p>
      <p>The definition for the KB query component used in Fig. 4
appears below.</p>
      <p>(defKB-query-component
group-detection-watchlist-evidence-dataset-join-component
((and (q- ?Event !rdf:type !teo:TwoWayCommunicationEvent ?evidenceGraph)
(q- ?Event !teo:sender ?sender ?evidenceGraph)
(q- ?Event !teo:receiver ?receiver ?evidenceGraph)
(q- ?sender !rdf:type !teo:Person ?evidenceGraph)
(q- ?receiver !rdf:type !teo:Person ?evidenceGraph)
(q- ?sender !rdf:type !teo:Person ?watchlistGraph)
(q- ?receiver !rdf:type !teo:Person ?watchlistGraph)
(a- ?Event !rdf:type !teo:TwoWayCommunicationEvent ?linkGraph)
(a- ?Event !teo:deliberateActor ?sender ?linkGraph)
(a- ?Event !teo:deliberateActor ?receiver ?linkGraph)
(a-- ?sender !rdf:type !teo:Person ?linkGraph)
(a-- ?receiver !rdf:type !teo:Person ?linkGraph))))</p>
      <p>The above component selects events from one dataset
(denoted by the logic variable ?evidenceGraph) whose
participants also appear in another dataset (denoted by
?watchlistGraph) and asserts the links among them in an
output dataset (represented by the logic variable ?linkGraph)
for consumption by a group detection component. Note the
following.</p>
      <p>• This component performs a single KB query that
implicitly conjoins (logically) the twelve top-level (q-, a-, and
a--) forms.</p>
      <p>• A q- conjunct succeeds iff a triple (in subject, predicate,
object, graph, index—“spogi”—format) exists in the workflow
KB. q- is included in the standard Franz Allegro Prolog
interface to AllegroGraph.</p>
      <p>• a- indicates that a triple is to be written to the specified
output dataset. An a- conjunct always succeeds. a- and its
duplicate-avoiding twin a-- (below) are our contributions that
confer the KB query language’s forward-chaining character.</p>
      <p>• a-- indicates that a triple is to be written to the workflow
KB iff it is not already present there. An a-- conjunct
always succeeds.</p>
      <p>• !rdf:type is an example of a shorthand that expands to
http://www.w3.org/1999/02/22-rdf-syntax-ns#type, the
atom type in the namespace for RDF. (!teo: refers to an
application-specific ontology.)</p>
      <p>• ?Event, ?sender, and other symbols beginning with ? are
logic programming (AKA Prolog) variables. In the logic
programming style we support, every logic variable
becomes bound when a q- conjunct containing it is matched in the
KB.</p>
      <p>• Prolog will backtrack to execute each conjunct in the KB
query for every combination of variable bindings for
which the preceding conjuncts succeed.</p>
      <p>• The KB query language provides a variety of additional
constructs (e.g., and, or, not) in which the usual
expressions that appear as top-level conjuncts may be
embedded, e.g.,
(and (not (q- ?P !rdf:type !teo:Terrorist ?evidenceGraph))
(or (q- ?P1 !rdf:type !teo:Terrorist ?evidenceGraph)
(q- ?P2 !rdf:type !teo:Terrorist ?evidenceGraph))).</p>
      <p>• While the repetition of entity type statements, e.g.,
(a-- ?sender !rdf:type !teo:Person ?linkGraph),
from the input graph is not strictly necessary given our
context interpretation, the Tangram contractors agreed
that it would be convenient to include such declarations
uniformly in all datasets.</p>
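      <p>The behavior described in the notes above (q- matching with backtracking, a- always asserting, a-- asserting only when absent) can be approximated in a few lines. This is a simplified sketch and not the Allegro Prolog implementation: conjuncts are plain tuples, the store is a Python list of quads, dataset parameters are passed as initial bindings, and all names and sample data are hypothetical.</p>

```python
def is_variable(term):
    return term.startswith("?")


def unify(pattern, quad, bindings):
    """Extend bindings so that pattern equals quad; False on mismatch."""
    for want, have in zip(pattern, quad):
        if is_variable(want):
            if bindings.get(want, have) != have:
                return False
            bindings[want] = have
        elif want != have:
            return False
    return True


def run_query(conjuncts, store, bindings):
    """Yield one bindings dict per way of satisfying all conjuncts in order."""
    if not conjuncts:
        yield bindings
        return
    kind, *pattern = conjuncts[0]
    grounded = [bindings.get(term, term) for term in pattern]
    rest = conjuncts[1:]
    if kind == "q-":
        # Try every stored quad; each match continues with the remaining
        # conjuncts, which plays the role of Prolog backtracking.
        for quad in list(store):
            extended = dict(bindings)
            if unify(grounded, quad, extended):
                yield from run_query(rest, store, extended)
    else:
        quad = tuple(grounded)
        # a- always writes; a-- writes only if the quad is not yet present.
        if kind == "a-" or quad not in store:
            store.append(quad)
        yield from run_query(rest, store, bindings)


store = [
    ("Ev-709", "sender", "In-15840", "evidence"),
    ("Ev-709", "receiver", "In-36232", "evidence"),
    ("Ev-1194", "sender", "In-10381", "evidence"),
    ("Ev-1194", "receiver", "In-15840", "evidence"),
    ("In-15840", "rdf:type", "Person", "watchlist"),
    ("In-36232", "rdf:type", "Person", "watchlist"),
]
query = [
    ("q-", "?Event", "sender", "?sender", "?evidenceGraph"),
    ("q-", "?Event", "receiver", "?receiver", "?evidenceGraph"),
    ("q-", "?sender", "rdf:type", "Person", "?watchlistGraph"),
    ("q-", "?receiver", "rdf:type", "Person", "?watchlistGraph"),
    ("a-", "?Event", "deliberateActor", "?sender", "?linkGraph"),
    ("a-", "?Event", "deliberateActor", "?receiver", "?linkGraph"),
    ("a--", "?sender", "rdf:type", "Person", "?linkGraph"),
    ("a--", "?receiver", "rdf:type", "Person", "?linkGraph"),
]
parameters = {"?evidenceGraph": "evidence",
              "?watchlistGraph": "watchlist",
              "?linkGraph": "links"}

solutions = list(run_query(query, store, parameters))
links = [quad for quad in store if quad[3] == "links"]
print(links)
```

      <p>Only the event whose sender and receiver are both watchlisted contributes to the link dataset; the other event fails at the third conjunct and nothing is asserted for it.</p>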
      <p>Below are the definitions for some utility KB query
components that we provide with the toolkit distribution.</p>
      <p>(defKB-query-component 2-input-dataset-union-component
(DataUnionProcess)
((query (q- ?S ?P ?O ?sourceGraph1)
(a- ?S ?P ?O ?destGraph))
(query (q- ?S ?P ?O ?sourceGraph2)
(a- ?S ?P ?O ?destGraph))))
(defKB-query-component 3-input-dataset-intersection-component
(DataIntersectionProcess)
((query (q- ?S ?P ?O ?sourceGraph1)
(q- ?S ?P ?O ?sourceGraph2)
(q- ?S ?P ?O ?sourceGraph3)
(a- ?S ?P ?O ?destGraph))))
(defKB-query-component dataset-de-duplication-component ()
((query (q- ?S ?P ?O ?sourceGraph)
(a-- ?S ?P ?O ?destGraph))))</p>
      <p>The (first) dataset union component writes everything it
finds in either of its source graphs into its destination graph;
the (second) intersection component writes anything it finds in
all of its sources into the destination. A workflow author may
choose to follow either of these up with the (third) dataset
de-duplication component to remove duplicates; note that the
author could achieve the same effect by using a-- rather than
a- conjuncts in the union component’s definition.</p>
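      <p>The effect of the three utility components can be mimicked on plain Python lists of triples. This sketch simplifies the actual Prolog semantics (for instance, it ignores how duplicate matches multiply in the intersection query), and the function names and sample triples are hypothetical.</p>

```python
def dataset_union(*sources):
    """Copy every triple from every source, keeping duplicates (like a-)."""
    destination = []
    for source in sources:
        destination.extend(source)
    return destination


def dataset_intersection(first, *others):
    """Keep each triple of the first source that occurs in all the others."""
    return [triple for triple in first
            if all(triple in other for other in others)]


def dataset_deduplicate(source):
    """Copy each distinct triple once, in first-seen order (like a--)."""
    destination = []
    for triple in source:
        if triple not in destination:
            destination.append(triple)
    return destination


t1 = ("In-15840", "rdf:type", "Person")
t2 = ("In-36232", "rdf:type", "Person")
t3 = ("In-10381", "rdf:type", "Person")
graph_a = [t1, t2]
graph_b = [t2, t3]

merged = dataset_union(graph_a, graph_b)
print(merged)
print(dataset_deduplicate(merged))
print(dataset_intersection(graph_a, graph_b))
```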
      <p>Existing Tangram workflow and process infrastructure
required that we specify the fixed (e.g., two-input) arities for
the components above. This might not be the case in every
workflow setting of interest (see Section VIII). Likewise, it
might not be necessary to name (or permanently
componentize) every query before it can be used.</p>
    </sec>
    <sec id="sec-2">
      <title>V. WRAPPED LEGACY COMPONENTS</title>
      <p>Toolkit users define wrappers for legacy/native components
using the Lisp macro defWrapped-component, which affords
a choice among three distinct interfaces.
Non-Lisp-programming ToolKit users will want to use one of the first
two interfaces described below; Lisp-programming users are
most likely to use the first or third.</p>
      <p>1) Fully automatic: defWrapped-component automatically writes a
comma-separated value (CSV) or other delimited text file (to be
consumed by the native component) for each input dataset
and automatically reads a delimited text file (produced by
the native component) for each output dataset. For native
components with delimited text file-oriented input/output,
the ToolKit user need provide no additional wrapping
code.</p>
      <p>2) Semi-automatic: defWrapped-component automatically
writes an ntriples file for each input dataset and
automatically reads an ntriples file for each output dataset.
The ToolKit user provides additional (presumably
non-Lisp), shell-callable wrapping code as necessary to
mediate between these ntriples files and the native
component.</p>
      <p>3) Manual: The ToolKit user provides, via an additional
argument to defWrapped-component, custom Lisp code to
implement the required native component interface. Here
we assume that the Lisp programmer will interact directly
with AllegroGraph to create suitable inputs for the native
component.</p>
      <p>In the sequel, we focus primarily on the fully automatic
interface.</p>
      <p>
        Consider the GDA group detection algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (from
CMU’s Auton Lab), which uses CSV input and output files as
shown in Fig. 5. The group detector uses event-based linkages
among individuals to infer groups of associating individuals.
Each input line indicates evidence that a certain event involves
a certain individual. Each output line indicates that a certain
individual is hypothesized to belong to a certain group.
      </p>
      <p>Native GDA Input:
Ev-1194,In-10381
Ev-709,In-15840
Ev-709,In-36232
Ev-38749,In-4938
Ev-38749,In-48834
Ev-34121,In-3007
Ev-34121,In-35214
Ev-65474,In-21371
Ev-65474,In-19354
Ev-23484,In-39017
Ev-23484,In-16809
…
Native GDA Output:
group,entity
G0,In-10096
G0,In-15840
G0,In-19354
G0,In-19540
G0,In-19625
G0,In-21371
G0,In-28719
G0,In-37201
G0,In-37733
G0,In-38634
G0,In-47910
G1,In-1002
…</p>
      <p>Below is a toolkit-based component definition that invokes
the automatic CSV file interface to wrap GDA. The
(completely declarative) definition specifies that
GDA-component-TerroristGroup is an instance of the class
GroupDetectionProcess (see [9]). The (keyword) argument
:native-input-CSV-file-specs specifies the relation of the input
CSV file (to be named "GDA-input-links.csv") to the input
dataset (bound to the Prolog variable ?linkGraph).<sup>1</sup> Note that
the separating character may be specified using the
:text-delimiter argument, and the presence of a header line via the
:headerline argument. The argument
:native-output-CSV-file-specs specifies the relation of the output CSV file (to be
named "GDA-output-groups.csv") to the output dataset (bound
to ?outputGraph). The remaining top-level arguments specify
how to invoke the native component. Further explanation
follows the definition.</p>
      <p>(defWrapped-component GDA-component-TerroristGroup
(GroupDetectionProcess)
:native-input-CSV-file-specs
(("GDA-input-links.csv"
:query
(query
(q- ?E !teo:deliberateActor ?P ?linkGraph))
:query-type select
:headerline nil
:text-delimiter ","
:query-template (?E ?P)))
:native-output-CSV-file-specs
(("GDA-output-groups.csv"
:query
(query
(a- ?G !teo:orgMember ?P ?outputGraph)
(a-- ?G !rdf:type !teo:TerroristGroup ?outputGraph)
(a-- ?P !rdf:type !teo:Terrorist ?outputGraph))
:headerline t
:CSV-template (?G ?P)
:namespace-template
("http://anchor/teo#" "http://anchor/teo#")))
:native-component-directory "GDA_DISTRIBUTION"
:native-component-command-name "gda_applic"
:native-component-command-arguments
("GDA-output-groups.csv" "GDA-input-links.csv"))</p>
      <p>1 The full interface supports any number of native input and of native
output delimited text files and corresponding datasets/graphs.</p>
      <p>First, we execute the input query against the input dataset
(graph). At top right, Fig. 6 illustrates how the query’s single
(general) conjunct is first specifically instantiated, binding the
conjunct’s variables to values for which a triple exists in the
input graph. The :query-template argument specifies how the
query’s bound variable values should be ordered in the CSV
file. At bottom, Fig. 6 illustrates the intermediate step of
instantiating the query template, based on the instantiated
query conjunct. At left, Fig. 6 shows how we generate one
CSV file line per query instantiation.<sup>2</sup> (Note that the RDF
namespace, !teo:, is removed, as it is not useful to the native
component.)</p>
      <p>2 This is per the value select specified for the :query-type argument,
which indicates that duplicate links (useful to GDA) are to be retained in the
input dataset. By instead using the (default) value select-distinct, the
user may alternatively specify one line per unique query instantiation (thus
removing duplicates).</p>
      <p>Fig. 7 illustrates how the native component is (next) invoked
by the workflow execution system. Execution takes place in a
temporary directory specific to the given workflow and
component instance.</p>
      <p>Directory: $GU_CORE/GDA_DISTRIBUTION
Command-name: gda_applic
Command-arguments: GDA-output-groups.csv GDA-input-links.csv</p>
      <p>Fig. 8 illustrates how the :native-output-CSV-file-specs
argument is (next) processed.</p>
      <p>Native GDA Output File:
group,entity
G0,In-10096
G0,In-15840
G0,In-19354
G0,In-19540
G0,In-19625
G0,In-21371
G0,In-28719
G0,In-37201
G0,In-37733
G0,In-38634
G0,In-47910
G1,In-1002
…</p>
      <p>General CSV / Query Template: (?G ?P)</p>
      <p>Instantiated CSV Template: (G0 In-10381)</p>
      <p>Instantiated Query Template: (!teo:G0 !teo:In-10381)</p>
      <p>Gen. (a-- ?G !rdf:type !teo:TerroristGroup ?outputGraph)
Inst. (a-- !teo:G0 !rdf:type !teo:TerroristGroup ?outputGraph)</p>
      <p>Gen. (a-- ?P !rdf:type !teo:Terrorist ?outputGraph)
Inst. (a-- !teo:In-10096 !rdf:type !teo:Terrorist ?outputGraph)</p>
      <p>The process is here roughly the reverse of that in Fig. 6. At
bottom, Fig. 8 illustrates how we first interpret each line of the
output CSV file (at right) using the template specified (via the
:CSV-template argument), instantiating the template and
binding query variables. Again, the template indicates the
order of each bound Prolog variable in each line of the CSV
file. Note the final template instantiation step that inserts
the RDF namespace (here !teo:, per the :namespace-template
argument).</p>
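      <p>The two data transformations of the fully automatic interface can be sketched end to end as follows. This is an illustration under assumptions, not the toolkit's Lisp implementation: the function names are hypothetical, query solutions are modeled as dictionaries from variable names to URIs, and the sample rows are a small excerpt in the style of Fig. 5.</p>

```python
import csv
import io

TEO = "http://anchor/teo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def local_name(uri):
    """Strip the namespace, keeping the part after the last '#'."""
    return uri.rsplit("#", 1)[-1]


def write_native_input(solutions, query_template, delimiter=","):
    """Render one delimited line per query solution, in template order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for bindings in solutions:
        writer.writerow([local_name(bindings[var]) for var in query_template])
    return buffer.getvalue()


def read_native_output(text, csv_template, namespace_template, headerline=True):
    """Bind template variables from each line, re-attaching namespaces."""
    rows = list(csv.reader(io.StringIO(text)))
    if headerline:
        rows = rows[1:]  # skip the header line, e.g. group,entity
    return [
        {var: namespace + value
         for var, namespace, value in zip(csv_template, namespace_template, row)}
        for row in rows
    ]


def assert_hypotheses(solutions):
    """Apply the output query: a- always adds, a-- adds only if absent."""
    graph = []
    for b in solutions:
        graph.append((b["?G"], TEO + "orgMember", b["?P"]))  # a-
        for triple in ((b["?G"], RDF_TYPE, TEO + "TerroristGroup"),
                       (b["?P"], RDF_TYPE, TEO + "Terrorist")):
            if triple not in graph:  # a--
                graph.append(triple)
    return graph


link_solutions = [
    {"?E": TEO + "Ev-709", "?P": TEO + "In-15840"},
    {"?E": TEO + "Ev-709", "?P": TEO + "In-36232"},
]
native_input = write_native_input(link_solutions, ["?E", "?P"])

native_output = "group,entity\nG0,In-15840\nG0,In-36232\n"
output_solutions = read_native_output(native_output, ["?G", "?P"], [TEO, TEO])
hypotheses = assert_hypotheses(output_solutions)
print(native_input)
print(hypotheses)
```

      <p>The group type statement is written once even though two output lines mention the same group, which mirrors the role of a-- in the wrapper definition.</p>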
      <p>VI. CONCEIVED FULL AUTOMATION FOR COMPONENTS WITH XML INPUT/OUTPUT FILES</p>
      <p>
        While delimited text input/output formats are quite
prevalent, they are by no means the only structured formats of
interest. We have also designed (but not yet implemented) a
similar, declaratively specified wrapping capability for
components with XML file input/output. The general idea is
to embed a similar query specification into the XML file where
data is to be read or written. Another alternative on the input
side (only) would be integration of XPath and XQuery with
logic programming. (See [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for a recent survey.)
      </p>
    </sec>
    <sec id="sec-3">
      <title>VII. THE WRAPPING PROCESS</title>
      <p>The toolkit’s comprehensive documentation (available from
the first author) details the following steps included in the
end-to-end process of wrapping and then deploying components.
1) Install the wrapping toolkit.
2) Install the native component so that it will be accessible to
the wrapper.
3) Define any KB query component(s) needed to select
appropriate data from any broader dataset(s).
4) Define the wrapper for the native component.
5) Test both KB query and wrapped native components to
ensure effective operation. We have developed and
applied a testing framework that includes component
concurrency (i.e., re-entrance) testing.
6) Deploy the developed and tested components.</p>
      <p>These steps may of course be undertaken by different
classes of users. E.g., in a component wrapping team (of
which an enterprise may have several), one member (the
“installer”) may be primarily responsible for software
installations; another (the “developer”) may be expert with the
enterprise’s ontology, workflows, and datasets, the KB query
language, and the component defining forms; still another (the
“tester”) may primarily have testing and another (perhaps the
“installer” again) deployment responsibilities. “Scripters”
might write custom Lisp wrapping code or shell scripts or
other command line-callable programs to perform data
transformations not (yet) supported by toolkit (semi-)
automation.</p>
      <p>For each component to be wrapped, the wrapping team also
should include, or at least have access to, a component
“champion” who knows what enterprise function(s) the
component must accomplish and understands how the
component works well enough to address any wrapping issues
(e.g., whether duplicate assertions are or are not appropriate,
what native component control parameters are appropriate).
The champion should bring one or more exemplary use cases
(preferably expressed in terms of the enterprise’s datasets and
ontology) and should help the wrapping team realize the use
case(s) in component (and workflow) definitions.<sup>3</sup></p>
      <p>Finally, the component wrapping team always should be
able to present new requirements to the toolkit development
team (who may serve multiple enterprises).</p>
      <p>
        We developed the toolkit during roughly six months of
concentrated effort, to serve both the broader Tangram
community and ourselves. Starting with the use case presented
in Section III, we developed first the KB query language and
KB query components, then progressively more automatic
interfaces with which we wrapped GDA (initially). We also
have used (or assisted others to use) the toolkit to wrap the
ORA group detection algorithm, suspicion scorers based on
the Proximity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and NetKit [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] classifiers, and the pattern
matchers LAW [9] and CADRE [8].
      </p>
      <p>We have met the Tangram program’s toolkit usability goals:
as knowledgeable users, we can usually (for components with
inputs/outputs amenable to the toolkit’s fully automatic
interface) complete Steps 3 and 4 of the above wrapping
process within a single staff hour.</p>
      <p>VIII. RELAXING THE CONTEXT MONOTONICITY ASSUMPTION</p>
      <p>Implicit in the semantics of current Tangram workflow
processing is the following monotonicity assumption: A
component’s output graph(s) only add(s), logically, to the
information in its input graph(s), never delete(s) or retract(s).
This is not entirely practical.</p>
      <p>The need to manage potentially conflicting source
information and analytic hypotheses is ubiquitous in an
intelligence analysis enterprise. An analyst, surrounded with
data and applicable tools or methods, may choose to pursue
one line of reasoning at one time and another later, and
different analysts may take different approaches and may build
on each other’s analyses or workflow products. Each such
approach—a combination of data, tools, methods, and earlier
hypotheses—represents a context for analytical reasoning. It
is important within the enterprise for each analyst to
understand the actual context of each piece of information that
s/he might examine and exploit in further analysis—in which
s/he may either extend an existing context or branch to create a
new subcontext.</p>
      <p>Different contexts may arise in workflow-supported
analytical reasoning for different reasons, including:</p>
      <p>• Differences in supporting data, from:</p>
      <p>o Conflicting original data sources.</p>
      <p>o Time-varying data conditions for a given source, such
as:</p>
      <p>- Disbelief in something we earlier had belief in
(perhaps because it had been supplied in error).</p>
      <p>- Belief in something we did not have belief in
(perhaps because we had no data about it).</p>
      <p>• Differences in supporting analytical hypotheses, from:</p>
      <p>o Analyst’s conjecture, or “what-if” analysis (that may
effect belief or disbelief in data as discussed above).</p>
      <p>o Differences in workflow components giving rise to
different answers, when:</p>
      <p>- A given workflow function has alternative
realizations in different components.</p>
      <p>- A given component has alternative
configurations of control parameters.</p>
      <p>3 Consider that a champion may also bring a new data source that may
require extensions or other modifications to the enterprise ontology.
Addressing such issues has been the responsibility of a different Tangram
contractor.</p>
      <p>We have commenced efforts to address these issues both
formally and with appropriate workflow system infrastructure.</p>
      <p>IX. CONTRIBUTIONS’ RELEVANCE BEYOND TANGRAM</p>
      <p>The use case workflow in Section III includes a generic
“Group Detection Component.” While we’ve noted (in
Section V) that GDA-component-TerroristGroup is an instance
of the class GroupDetectionProcess, we haven’t said anything
yet about how such a specific component instance is selected
from among the available alternatives for such a general
process class. Beyond enabling semantic interoperability of
enterprise workflow components, IARPA’s broader objectives
in Tangram have included providing technology for
characterizing, for a given generic workflow process, the likely
performance of a given specific component with data inputs
having certain characteristics, so that the workflow
management system can select the component likely to
perform best in any given circumstance. Our toolkit supports
this objective by automating the formal description and
registration of newly defined components in Tangram’s
process catalog [9].</p>
      <p>It’s worth noting that all of the toolkit’s other
heretofore-described capabilities remain applicable in the (perhaps more
pragmatic) setting where users specify particular components
for all workflows themselves.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Almendros-Jiménez</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becerra-Terón</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enciso-Baños</surname>
            ,
            <given-names>F. J.</given-names>
          </string-name>
          :
          <article-title>Querying XML documents in logic programming</article-title>
          ,
          <source>Theory Pract. Log. Program. 8</source>
          ,
          <issue>3</issue>
          (May.
          <year>2008</year>
          ),
          <fpage>323</fpage>
          -
          <lpage>361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Carley</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dereno</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>ORA-Organizational Risk Analyzer</article-title>
          .
          <source>Tech. rep. CMU-ISRI-06-113</source>
          , Carnegie Mellon University, August
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Kubica</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Tractable group detection on large link data sets</article-title>
          ,
          <source>Third IEEE International Conference on Data Mining (ICDM-2003)</source>
          , pp.
          <fpage>573</fpage>
          -
          <lpage>576</lpage>
          ,
          19-22 Nov.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Macskassy</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Provost</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>NetKit-SRL: A Toolkit for Network Learning and Inference</article-title>
          , in
          <source>Proceedings of the NAACSOS Conference</source>
          , June
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowrance</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomere</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolverton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>PHERL: an Emerging Representation Language for Patterns, Hypotheses, and Evidence</article-title>
          , in
          <source>Proceedings of the AAAI Workshop on Link Analysis</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Neville</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Dependency networks for relational data</article-title>
          . In
          <source>Proceedings of the 4th IEEE International Conference on Data Mining</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Pioch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fournelle</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Washburn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bostwick</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kao</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Graham</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>CADRE: continuous analysis and discovery from relational evidence</article-title>
          ,
          <source>International Conference on Integration of Knowledge Intensive MultiAgent Systems</source>
          , pp.
          <fpage>555</fpage>
          -
          <lpage>561</lpage>
          , 30 Sept.-4 Oct.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Wolverton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berry</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowrance</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruspini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomere</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LAW: A Workbench for Approximate Pattern Matching in Relational Data</article-title>
          . In
          <source>Proceedings of the Fifteenth Innovative Applications of Artificial Intelligence Conference (IAAI-03)</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Wolverton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomere</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A Process Catalog for Workflow Generation</article-title>
          , in
          <source>The Semantic Web-7th International Semantic Web Conference</source>
          , Springer, vol.
          <volume>5318</volume>
          , pp.
          <fpage>833</fpage>
          -
          <lpage>846</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>