=Paper=
{{Paper
|id=Vol-555/paper-4
|storemode=property
|title=Contributions to a Semantically Based Intelligence Analysis Enterprise Workflow System
|pdfUrl=https://ceur-ws.org/Vol-555/paper4.pdf
|volume=Vol-555
}}
==Contributions to a Semantically Based Intelligence Analysis Enterprise Workflow System==
Robert C. Schrag, Jon Pastor, Chris Long, Eric Peterson, Mark Cornwell, Lance A. Forbes, and
Stephen Cannon
Abstract—We have contributed key elements of a semantically based intelligence analysis enterprise workflow architecture: a uniformly accessible semantic store conforming to an enterprise-wide ontology; a branching context representation to organize workflow components' analytical hypotheses; a logic programming-based, forward-chaining query language for components to access data from the store; and a software toolkit embracing all the foregoing to streamline the process of introducing additional legacy software components as semantically interoperable workflow building blocks.

We explain these contributions, focusing particularly on the toolkit. For certain widely used input/output formats—e.g., comma-separated value (CSV) files—a knowledgeable user can quickly "wrap" a newly installed component for workflow operation by providing a compact and entirely declarative specification that uses the query language to map specific relation arguments in the ontology to specific structural elements in the component's native input and output formats.

Our contributions are built to work with AllegroGraph, from Franz, Inc.

Index Terms—Intelligence analysis, enterprise workflow, hypothesis representation, branching contexts, semantic interoperability, declarative data transformation, software component wrapping

Manuscript submitted August 19, 2009. This work was supported in part by the U.S. Government. All authors were with Global InfoTek, Inc., 1920 Association Dr, Suite 600, Reston, VA USA 20191, 703-652-1600 (e-mail: firstinitialLastname@globalinfotek.com). C. Long is now with SET Corp., Arlington, VA, 703-738-6214 (e-mail: clong@setcorp.com). L. A. Forbes is now with Solutions Made Simple, Inc., Reston, VA (e-mail: lforbes@sms-fed.com).

I. INTRODUCTION

We have contributed key elements of a semantically based intelligence analysis enterprise workflow architecture for Tangram, a multi-year, multi-contractor threat surveillance and alerting research and development program sponsored by the United States' Intelligence Advanced Research Projects Agency (IARPA). Tangram's objective has been to automate routine analysis workflows, so that these can be executed as standing processes, on a large scale.

To support the rapidly changing needs of an intelligence enterprise, a workflow authoring tool must be extremely flexible. The enterprise must be able to rearrange components (e.g., pattern matchers, classifiers, group detectors) in the same kind of way that a child rearranges Lego bricks. It must be able to introduce new software into the enterprise rapidly. However, Lego bricks have a distinct advantage over legacy software components from different sources: they were all created to respect a common interface. One brute-force approach to integrating legacy components is to manually develop code that transforms data from one form (e.g., Java objects) to another (e.g., flat files); that requires O(n²) transforms. Tangram's approach reduces the required number of transforms to O(n), and our toolkit enables knowledgeable users to "wrap" legacy components with such transforms, making the components workflow-ready quickly.

To motivate our contributions, we present the (notional, simplified) two-component workflow in Fig. 1: a suspicion scorer hypothesizes potential terrorists, then a group detector clusters the hypothesized terrorists into hypothesized potential terrorist groups.

Fig. 1 A notional intelligence analysis workflow: a Suspicion Scoring Component feeding a Group Detection Component

The workflow in Fig. 1 raises some enterprise-level architecture issues that our contributions address.

1) What are components' input and output data, how is data stored, and how do components access it? We have introduced a uniformly accessible semantic store conforming to an enterprise-wide ontology and a logic programming-based, forward-chaining query language for components to access data from the store. Component specifications (see Issue 3 below) indicate what data is accessed in particular.

2) How are the hypotheses that analytical components produce distinguished from background data, and how are they communicated among components? As hypotheses, analytical components' outputs must not simply be mixed indiscriminately with more uniformly credible evidence data or with each other. Among other considerations, the broad body of evidence changes over time (leading to different hypotheses), and different components—or
different (e.g., control) configurations thereof—can lead to different hypotheses even for the same inputs. We organize the content of the semantic store into distinct RDF graphs that we call "datasets," and (correlating datasets with contexts) represent the outputs of successively applied analytical components as branching contexts (that incrementally add information). Our component specifications and our query language thus include parameters for the datasets that are passed among or otherwise accessed by components. Besides these datasets for hypotheses, the store includes one or more background, or "evidence," datasets and, for convenience, some intermediate (i.e., not necessarily hypothetical) datasets that result from purely logical queries. This treatment of evidence and hypotheses, together with the above-mentioned query language, provides a practical, implemented solution to meet broad Tangram requirements outlined in [6].

3) How can legacy components with arbitrary input/output formats easily be made to interact with the data? The contributions above are integrated in a software toolkit to streamline the process of introducing additional legacy software components as semantically interoperable workflow building blocks. For certain widely used input/output formats—e.g., comma-separated value (CSV) files—a knowledgeable user can quickly wrap a newly installed component for workflow operation by providing a compact and entirely declarative specification that uses the query language to map specific relation arguments in the ontology to specific structural elements in the component's native input and output formats. The toolkit also provides some less fully automated interface options to address more general input/output situations.

II. ARCHITECTURAL SCHEME OF A WORKFLOW COMPONENT

Fig. 2 presents our general scheme for wrapping legacy components.

Fig. 2 Component wrapping scheme: a wrapped component queries the common semantic store (in the common ontology), transforms the results into the native component's format, runs the native component, and transforms its native-format output back into the common ontology for assertion to the store

Fig. 2 schematizes a single wrapped component that executes processes to:
1) Retrieve input data, expressed in the enterprise's common ontology, from the central semantic store.
2) Format the input data for the legacy component.
3) Invoke the legacy component in its "native" (unwrapped) form.
4) Convert the legacy component's native-format outputs to the common ontology, as metadata-bearing hypotheses.
5) Assert the output hypotheses to the central store.

We implement the central semantic store using AllegroGraph from Franz, Inc. AllegroGraph is a "quad" store that includes, in addition to the "subject," "predicate," and "object" fields standard to RDF and common to triple stores, a "graph" field. We use this field to distinguish among the various datasets that are available as inputs or have been produced as outputs of workflow components.
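For illustration (the identifiers and dataset names below are invented; the notation is schematic): the same individual may be typed as a Person in an evidence dataset while a component's output dataset hypothesizes more, with only the fourth (graph) field distinguishing the statements.

  (In-10096, rdf:type, Person, evidence-dataset-1)
  (In-10096, rdf:type, Terrorist, hypothesis-dataset-7)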
We provide a knowledge base (KB) query language supporting a wrapped component's query and assertion processes and allowing users to define, for specific analytical purposes, KB query components (including no legacy process) that combine elements from one or more existing datasets into one or more output datasets. We implement legacy component wrappers and KB query components using the Prolog and Common Lisp interfaces to AllegroGraph.

Fig. 3 lists the meta-data classes and their attributes (multi-valued attributes starred*) that support the representation of a dataset's context lineage. We take each workflow component's execution, noted in a ProcessExecution (PE) object, as the source of the statements in any output (hypothesis) dataset; lineage is manifested in the connections among datasets, process executions, and workflow executions (noted in WorkflowExecution objects).

Fig. 3 Meta-data classes and attributes for hypothesis datasets:
  WorkflowExecution: hasProcessExecution*
  ProcessExecution: hasProcess (e.g., GDA); hasPEDatasetInput*; hasPEDatasetOutput*; hasPEControlInput*
  ProcessExecutionDatasetInput: hasParameterName (consistent with Process); hasInputDataset
  ProcessExecutionDatasetOutput: hasParameterName (consistent with Process); hasOutputDataset
  ProcessExecutionControlInput: hasParameterName; hasValue

As noted in Section I, the interpretation of a dataset as a context is incremental along its lineage: in general, any statement that holds in a dataset that is upstream (workflow-wise) from a given dataset D created during a workflow also (implicitly) holds in D. The representation is thus space-efficient. We have not yet found it necessary to implement such transitivity of dataset contexts directly in the KB query language; our current workflow components use just background (evidence) datasets and datasets that their immediate workflow predecessors create.
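Should such transitivity later be needed, it admits a compact encoding. The following is a minimal sketch of ours, not toolkit code; it uses the q- construct introduced in Section IV together with Allegro Prolog's rule-definition operators, and it assumes a hypothetical hasUpstreamDataset relation recorded in an assumed ?lineageGraph, derivable in principle from the Fig. 3 lineage metadata.

(<-- (q-in-context ?s ?p ?o ?dataset)   ; base case: the triple is
     (q- ?s ?p ?o ?dataset))            ; asserted in ?dataset itself
(<- (q-in-context ?s ?p ?o ?dataset)    ; recursive case: the triple
    ;; hasUpstreamDataset and ?lineageGraph are illustrative assumptions
    (q- ?dataset !teo:hasUpstreamDataset ?upstream ?lineageGraph)
    (q-in-context ?s ?p ?o ?upstream))  ; holds somewhere upstream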
III. USE CASE WORKFLOW

Fig. 4 presents a use case workflow including both a wrapped legacy component and a KB query component.

Fig. 4 Use case workflow (see Section III): the Group Detection Watchlist-Evidence Dataset Join Component reads ?watchlistGraph and ?evidenceGraph and writes ?linkGraph; the Group Detection Component reads ?linkGraph and writes ?outputGraph

In Fig. 4, datasets (graphs) are depicted by square-cornered boxes; workflow components are depicted by round-cornered boxes. Each component reads data from one or more input graphs and writes to one or more output graphs. Here, a dataset join KB query component is used to select from broader evidence (right) just the information relevant to watchlisted terrorist suspects (left) for processing by a downstream legacy group detection component.

In our toolkit, the defining forms for workflow components are Lisp macro calls. Beyond providing one or more files containing such definitions, toolkit users need never interact directly with Lisp or with AllegroGraph, as we provide alternative interfaces.

IV. KB QUERY COMPONENTS AND QUERY LANGUAGE

The definition for the KB query component used in Fig. 4 appears below.

(defKB-query-component
 group-detection-watchlist-evidence-dataset-join-component
 ((and (q- ?Event !rdf:type !teo:TwoWayCommunicationEvent ?evidenceGraph)
       (q- ?Event !teo:sender ?sender ?evidenceGraph)
       (q- ?Event !teo:receiver ?receiver ?evidenceGraph)
       (q- ?sender !rdf:type !teo:Person ?evidenceGraph)
       (q- ?receiver !rdf:type !teo:Person ?evidenceGraph)
       (q- ?sender !rdf:type !teo:Person ?watchlistGraph)
       (q- ?receiver !rdf:type !teo:Person ?watchlistGraph)
       (a- ?Event !rdf:type !teo:TwoWayCommunicationEvent ?linkGraph)
       (a- ?Event !teo:deliberateActor ?sender ?linkGraph)
       (a- ?Event !teo:deliberateActor ?receiver ?linkGraph)
       (a-- ?sender !rdf:type !teo:Person ?linkGraph)
       (a-- ?receiver !rdf:type !teo:Person ?linkGraph))))

The above component selects events from one dataset (denoted by the logic variable ?evidenceGraph) whose participants also appear in another dataset (denoted by ?watchlistGraph) and asserts the links among them in an output dataset (represented by the logic variable ?linkGraph) for consumption by a group detection component. Note the following.

• This component performs a single KB query that implicitly conjoins (logically) the twelve top-level (q-, a-, and a--) forms.
• A q- conjunct succeeds iff a triple (in subject, predicate, object, graph, index—"spogi"—format) exists in the workflow KB. q- is included in the standard Franz Allegro Prolog interface to AllegroGraph.
• a- indicates that a triple is to be written to the specified output dataset. An a- conjunct always succeeds. a- and its duplicate-avoiding twin a-- (below) are our contributions that confer the KB query language's forward-chaining character.
• a-- indicates that a triple is to be written to the workflow KB iff it is not already present there. An a-- conjunct always succeeds.
• !rdf:type is an example of a shorthand that expands to http://www.w3.org/1999/02/22-rdf-syntax-ns#type—the atom type in the namespace for RDF. (!teo: refers to an application-specific ontology.)
• ?Event, ?sender, and other symbols beginning with ? are logic programming (AKA Prolog) variables. In the logic programming style we support, every logic variable becomes bound when the q- conjunct is matched in the KB.
• Prolog will backtrack to execute each conjunct in the KB query for every combination of variable bindings for which the preceding conjuncts succeed.
• The KB query language provides a variety of additional constructs (e.g., and, or, not) in which the usual expressions that appear as top-level conjuncts may be embedded—e.g.,
  (and (not (q- ?P !rdf:type !teo:Terrorist ?evidenceGraph))
       (or (q- ?P1 !rdf:type !teo:Terrorist ?evidenceGraph)
           (q- ?P2 !rdf:type !teo:Terrorist ?evidenceGraph)))
• While the repetition of entity type statements—e.g.,
  (a-- ?sender !rdf:type !teo:Person ?linkGraph)
—from the input graph is not strictly necessary given our context interpretation, the Tangram contractors agreed that it would be convenient to include such declarations uniformly in all datasets.
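For concreteness, here is a worked illustration of the forward chaining (the data is invented, in the style of Fig. 5's identifiers). Given these statements in the KB:

  (!teo:Ev-709 !rdf:type !teo:TwoWayCommunicationEvent ?evidenceGraph)
  (!teo:Ev-709 !teo:sender !teo:In-15840 ?evidenceGraph)
  (!teo:Ev-709 !teo:receiver !teo:In-36232 ?evidenceGraph)
  (!teo:In-15840 !rdf:type !teo:Person ?evidenceGraph)
  (!teo:In-36232 !rdf:type !teo:Person ?evidenceGraph)
  (!teo:In-15840 !rdf:type !teo:Person ?watchlistGraph)
  (!teo:In-36232 !rdf:type !teo:Person ?watchlistGraph)

the seven q- conjuncts succeed with ?Event, ?sender, and ?receiver bound to !teo:Ev-709, !teo:In-15840, and !teo:In-36232, respectively, and the five assertion conjuncts write to ?linkGraph:

  (!teo:Ev-709 !rdf:type !teo:TwoWayCommunicationEvent ?linkGraph)
  (!teo:Ev-709 !teo:deliberateActor !teo:In-15840 ?linkGraph)
  (!teo:Ev-709 !teo:deliberateActor !teo:In-36232 ?linkGraph)
  (!teo:In-15840 !rdf:type !teo:Person ?linkGraph)
  (!teo:In-36232 !rdf:type !teo:Person ?linkGraph)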
Below are the definitions for some utility KB query components that we provide with the toolkit distribution.

(defKB-query-component 2-input-dataset-union-component
 (DataUnionProcess)
 ((query (q- ?S ?P ?O ?sourceGraph1)
         (a- ?S ?P ?O ?destGraph))
  (query (q- ?S ?P ?O ?sourceGraph2)
         (a- ?S ?P ?O ?destGraph))))

(defKB-query-component 3-input-dataset-intersection-component
 (DataIntersectionProcess)
 ((query (q- ?S ?P ?O ?sourceGraph1)
         (q- ?S ?P ?O ?sourceGraph2)
         (q- ?S ?P ?O ?sourceGraph3)
         (a- ?S ?P ?O ?destGraph))))

(defKB-query-component dataset-de-duplication-component ()
 ((query (q- ?S ?P ?O ?sourceGraph)
         (a-- ?S ?P ?O ?destGraph))))

The (first) dataset union component writes everything it finds in either of its source graphs into its destination graph; the (second) intersection component writes anything it finds in
all of its sources into the destination. A workflow author may choose to follow either of these up with the (third) dataset de-duplication component to remove duplicates; note that the author could achieve the same effect by using a-- rather than a- conjuncts in the union components' definitions, as sketched below.
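The following variant is a sketch of ours (not a distributed toolkit component), following the same defKB-query-component conventions, that makes the duplicate avoidance explicit:

(defKB-query-component 2-input-dataset-union-distinct-component
 (DataUnionProcess)
 ;; a-- writes each triple only if it is not already present, so
 ;; ?destGraph needs no separate de-duplication pass.
 ((query (q- ?S ?P ?O ?sourceGraph1)
         (a-- ?S ?P ?O ?destGraph))
  (query (q- ?S ?P ?O ?sourceGraph2)
         (a-- ?S ?P ?O ?destGraph))))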
Existing Tangram workflow and process infrastructure required that we specify the fixed (e.g., two-input) arities for the components above. This might not be the case in every workflow setting of interest (see Section VIII). Likewise, it might not be necessary to name (or permanently componentize) every query before it can be used.
V. WRAPPED LEGACY COMPONENTS

Toolkit users define wrappers for legacy/native components using the Lisp macro defWrapped-component, which affords a choice among three distinct interfaces. Non-Lisp-programming toolkit users will want to use one of the first two interfaces described below; Lisp-programming users are most likely to use the first or third.
1) Fully automatic: defWrapped-component writes a comma-separated value (CSV) or other delimited text file (to be consumed by the native component) for each input dataset and automatically reads a delimited text file (produced by the native component) for each output dataset. For native components with delimited text file-oriented input/output, the toolkit user need provide no additional wrapping code.
2) Semi-automatic: defWrapped-component automatically writes an ntriples file for each input dataset and automatically reads an ntriples file for each output dataset. The toolkit user provides additional (presumably non-Lisp), shell-callable wrapping code as necessary to mediate between these ntriples files and the native component.
3) Manual: The toolkit user provides, via an additional argument to defWrapped-component, custom Lisp code to implement the required native component interface. Here we assume that the Lisp programmer will interact directly with AllegroGraph to create suitable inputs for the native component.
In the sequel, we focus primarily on the fully automatic interface.

Consider the GDA group detection algorithm [3] from CMU's Auton Lab, which uses CSV input and output files as shown in Fig. 5. The group detector uses event-based linkages among individuals to infer groups of associating individuals. Each input line indicates evidence that a certain event involves a certain individual. Each output line indicates that a certain individual is hypothesized to belong to a certain group.

Native GDA input file (no header line; each line is event,entity):
  Ev-1194,In-10381
  Ev-709,In-15840
  Ev-709,In-36232
  Ev-38749,In-4938
  Ev-38749,In-48834
  Ev-34121,In-3007
  Ev-34121,In-35214
  Ev-65474,In-21371
  Ev-65474,In-19354
  Ev-23484,In-39017
  Ev-23484,In-16809
  …

Native GDA output file (with header line):
  group,entity
  G0,In-10096
  G0,In-15840
  G0,In-19354
  G0,In-19540
  G0,In-19625
  G0,In-21371
  G0,In-28719
  G0,In-37201
  G0,In-37733
  G0,In-38634
  G0,In-47910
  G1,In-1002
  …

Fig. 5 CSV input/output files for the GDA group detection component

Below is a toolkit-based component definition that invokes the automatic CSV file interface to wrap GDA. The (completely declarative) definition specifies that GDA-component-TerroristGroup is an instance of the class GroupDetectionProcess (see [9]). The (keyword) argument :native-input-CSV-file-specs specifies the relation of the input CSV file (to be named "GDA-input-links.csv") to the input dataset (bound to the Prolog variable ?linkGraph).¹ Note that the separating character may be specified using the :text-delimiter argument, and the presence of a header line using the :headerline argument. The argument :native-output-CSV-file-specs specifies the relation of the output CSV file (to be named "GDA-output-groups.csv") to the output dataset (bound to ?outputGraph). The remaining top-level arguments specify how to invoke the native component. Further explanation follows the definition.

(defWrapped-component GDA-component-TerroristGroup
 (GroupDetectionProcess)
 :native-input-CSV-file-specs
 (("GDA-input-links.csv"
   :query
   (query
    (q- ?E !teo:deliberateActor ?P ?linkGraph))
   :query-type select
   :headerline nil
   :text-delimiter ","
   :query-template (?E ?P)))
 :native-output-CSV-file-specs
 (("GDA-output-groups.csv"
   :query
   (query
    (a- ?G !teo:orgMember ?P ?outputGraph)
    (a-- ?G !rdf:type !teo:TerroristGroup ?outputGraph)
    (a-- ?P !rdf:type !teo:Terrorist ?outputGraph))
   :headerline t
   :CSV-template (?G ?P)
   :namespace-template
   ("http://anchor/teo#" "http://anchor/teo#")))
 :native-component-directory "GDA_DISTRIBUTION"
 :native-component-command-name "gda_applic"
 :native-component-command-arguments
 ("GDA-output-groups.csv" "GDA-input-links.csv"))

¹ The full interface supports any number of native input and native output delimited text files and corresponding datasets/graphs.
Fig. 6 illustrates how the :native-input-CSV-file-specs argument is processed.

Fig. 6 Automatic CSV file input mechanism:
  General query conjunct: (q- ?E !teo:deliberateActor ?P ?linkGraph)
  Instantiated query conjunct: (q- !teo:Ev-1194 !teo:deliberateActor !teo:In-10381 ?linkGraph)
  General query template: (?E ?P)
  Instantiated query template: (!teo:Ev-1194 !teo:In-10381)
  Resulting native GDA input file line: Ev-1194,In-10381

First, we execute the input query against the input dataset (graph). At top right, Fig. 6 illustrates how the query's single (general) conjunct is first specifically instantiated, binding the conjunct's variables to values for which a triple exists in the input graph. The :query-template argument specifies how the query's bound variable values should be ordered in the CSV file. At bottom, Fig. 6 illustrates the intermediate step of instantiating the query template, based on the instantiated query conjunct. At left, Fig. 6 shows how we generate one CSV file line per query instantiation.² (Note that the RDF namespace, !teo:, is removed, as it is not useful to the native component.)

² This is per the value select specified for the :query-type argument, which indicates that duplicate links (useful to GDA) are to be retained in the input dataset. By instead using the (default) value select-distinct, the user may alternatively specify one line per unique query instantiation (thus removing duplicates).
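In spirit, this input-side transform reduces to a few lines of Lisp. The sketch below is ours, not the toolkit's implementation; it assumes the query's solutions have already been collected (per the select query type) as tuples ordered by :query-template.

(defun emit-input-csv (stream solutions)
  ;; SOLUTIONS: a list of tuples of bound values, e.g.
  ;; (("http://anchor/teo#Ev-1194" "http://anchor/teo#In-10381") ...).
  ;; One CSV line is written per tuple (a "," delimiter is fixed here;
  ;; the toolkit's :text-delimiter argument generalizes this).
  (dolist (tuple solutions)
    (format stream "~{~A~^,~}~%" (mapcar #'strip-namespace tuple))))

(defun strip-namespace (uri)
  ;; Per Fig. 6, the RDF namespace is dropped: the native component
  ;; has no use for it, and it is restored on output (Fig. 8).
  (let ((sharp (position #\# uri :from-end t)))
    (if sharp (subseq uri (1+ sharp)) uri)))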
Fig. 7 illustrates how the native component is (next) invoked by the workflow execution system. Execution takes place in a temporary directory specific to the given workflow and component instance.

Fig. 7 Automatic CSV file native component calling mechanism:
  Directory: $GU_CORE/GDA_DISTRIBUTION
  Command-name: gda_applic
  Command-arguments: GDA-output-groups.csv GDA-input-links.csv
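In Allegro Common Lisp, this invocation step can be as simple as a shell call. The following is a minimal sketch of ours (not the toolkit's code), assuming Allegro CL's excl:run-shell-command; the toolkit's actual handling of directories and arguments is more elaborate.

(defun invoke-native-component (directory command-name command-arguments)
  ;; E.g., (invoke-native-component "$GU_CORE/GDA_DISTRIBUTION" "gda_applic"
  ;;         '("GDA-output-groups.csv" "GDA-input-links.csv"))
  (excl:run-shell-command
   (format nil "cd ~A && ./~A~{ ~A~}" directory command-name command-arguments)
   :wait t))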
Fig. 8 illustrates how the :native-output-CSV-file-specs argument is (next) processed.

Fig. 8 Automatic CSV file output mechanism:
  Native GDA output file line: G0,In-10096
  General CSV/query template: (?G ?P)
  Instantiated CSV template: (G0 In-10096)
  Instantiated query template: (!teo:G0 !teo:In-10096)
  Instantiated output assertions:
    (a- !teo:G0 !teo:orgMember !teo:In-10096 ?outputGraph)
    (a-- !teo:G0 !rdf:type !teo:TerroristGroup ?outputGraph)
    (a-- !teo:In-10096 !rdf:type !teo:Terrorist ?outputGraph)

The process is here roughly the reverse of that in Fig. 6. At bottom, Fig. 8 illustrates how we first interpret each line of the output CSV file (at right) using the template specified (via the :CSV-template argument), instantiating the template and binding query variables. Again, the template indicates the order of each bound Prolog variable in each line of the CSV file. Note the final template instantiation step that inserts appropriate RDF namespaces (per the :namespace-template argument). At right, Fig. 8 illustrates how these bindings are used to instantiate each specified output assertion (query conjunct). Each assertion is executed to add a triple to the semantic store (with appropriate treatment of duplicates).
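A minimal sketch of this reverse direction (again ours, not toolkit code): split each output line on the delimiter, re-attach the namespaces given by :namespace-template, and pass the resulting values, ordered per :CSV-template, to the output assertions.

(defun parse-output-line (line namespaces &key (delimiter #\,))
  ;; E.g., "G0,In-10096" with namespaces
  ;; ("http://anchor/teo#" "http://anchor/teo#") yields
  ;; ("http://anchor/teo#G0" "http://anchor/teo#In-10096").
  (mapcar (lambda (namespace field) (concatenate 'string namespace field))
          namespaces
          (split-on-char delimiter line)))

(defun split-on-char (char string)
  ;; Simple field splitter; quoting and escapes are ignored here.
  (loop for start = 0 then (1+ end)
        for end = (position char string :start start)
        collect (subseq string start end)
        while end))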
VI. CONCEIVED FULL AUTOMATION FOR COMPONENTS WITH XML INPUT/OUTPUT FILES

While delimited text input/output formats are quite prevalent, they are by no means the only structured formats of interest. We have also designed (but not yet implemented) a similar, declaratively specified wrapping capability for components with XML file input/output. The general idea is to embed a similar query specification into the XML file where data is to be read or written. Another alternative, on the input side only, would be integration of XPath and XQuery with logic programming. (See [1] for a recent survey.)
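Purely notionally (this capability is designed but unimplemented, and the keyword and template names here are our inventions for illustration), an XML analogue of the CSV file specs might pair the same kind of query with path templates rather than column positions:

(:native-input-XML-file-specs
 (("GDA-input-links.xml"
   :query (query (q- ?E !teo:deliberateActor ?P ?linkGraph))
   ;; one XML element per query solution, in place of one CSV line
   :path-template ("links/link/event" "links/link/entity"))))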
VII. THE WRAPPING PROCESS

The toolkit's comprehensive documentation (available from the first author) details the following steps in the end-to-end process of wrapping and then deploying components.
1) Install the wrapping toolkit.
2) Install the native component so that it will be accessible to the wrapper.
3) Define any KB query component(s) needed to select appropriate data from any broader dataset(s).
4) Define the wrapper for the native component.
5) Test both KB query and wrapped native components to ensure effective operation. We have developed and applied a testing framework that includes component concurrency (i.e., re-entrance) testing.
6) Deploy the developed and tested components.

These steps may of course be undertaken by different classes of users. E.g., in a component wrapping team (of which an enterprise may have several), one member (the "installer") may be primarily responsible for software installations; another (the "developer") may be expert with the enterprise's ontology, workflows, and datasets, the KB query language, and the component defining forms; still another (the "tester") may primarily have testing responsibilities, and another (perhaps the "installer" again) deployment responsibilities. "Scripters" might write custom Lisp wrapping code or shell scripts or other command line-callable programs to perform data transformations not (yet) supported by toolkit (semi-)automation.

For each component to be wrapped, the wrapping team also should include, or at least have access to, a component "champion" who knows what enterprise function(s) the component must accomplish and understands how the component works well enough to address any wrapping issues (e.g., whether duplicate assertions are or are not appropriate, what native component control parameters are appropriate). The champion should bring one or more exemplary use cases (preferably expressed in terms of the enterprise's datasets and ontology) and should help the wrapping team realize the use case(s) in component (and workflow) definitions.³

³ Consider that a champion may also bring a new data source that may require extensions or other modifications to the enterprise ontology. Addressing such issues has been the responsibility of a different Tangram contractor.

Finally, the component wrapping team always should be able to present new requirements to the toolkit development team (who may serve multiple enterprises).

We developed the toolkit during roughly six months of concentrated effort, to serve both the broader Tangram community and ourselves. Starting with the use case presented in Section III, we developed first the KB query language and KB query components, then progressively more automatic interfaces with which we wrapped GDA (initially). We also have used (or assisted others to use) the toolkit to wrap the ORA group detection algorithm, suspicion scorers based on the Proximity [7] and NetKit [5] classifiers, and the pattern matchers LAW [9] and CADRE [8].

We have met the Tangram program's toolkit usability goals: as knowledgeable users, we can usually (for components with inputs/outputs amenable to the toolkit's fully automatic interface) complete Steps 3 and 4 of the above wrapping process within a single staff hour.

VIII. RELAXING THE CONTEXT MONOTONICITY ASSUMPTION

Implicit in the semantics of current Tangram workflow processing is the following monotonicity assumption: a component's output graph(s) only add(s), logically, to the information in its input graph(s), never delete(s) or retract(s). This is not entirely practical.

The need to manage potentially conflicting source information and analytic hypotheses is ubiquitous in an intelligence analysis enterprise. An analyst, surrounded with data and applicable tools or methods, may choose to pursue one line of reasoning at one time and another later, and different analysts may take different approaches and may build on each other's analyses or workflow products. Each such approach—a combination of data, tools, methods, and earlier hypotheses—represents a context for analytical reasoning. It is important within the enterprise for each analyst to understand the actual context of each piece of information that s/he might examine and exploit in further analysis—in which s/he may either extend an existing context or branch to create a new subcontext.

Different contexts may arise in workflow-supported analytical reasoning for different reasons, including:
• Differences in supporting data, from:
  o Conflicting original data sources.
  o Time-varying data conditions for a given source, such as:
    - Disbelief in something we earlier had belief in (perhaps because it had been supplied in error).
    - Belief in something we did not have belief in (perhaps because we had no data about it).
• Differences in supporting analytical hypotheses, from:
  o Analyst's conjecture, or "what-if" analysis (that may effect belief or disbelief in data as discussed above).
  o Differences in workflow components giving rise to different answers, when:
    - A given workflow function has alternative realizations in different components.
    - A given component has alternative configurations of control parameters.

We have commenced efforts to address these issues both formally and with appropriate workflow system infrastructure.

IX. CONTRIBUTIONS' RELEVANCE BEYOND TANGRAM

The use case workflow in Section III includes a generic "Group Detection Component." While we've noted (in Section V) that GDA-component-TerroristGroup is an instance of the class GroupDetectionProcess, we haven't said anything yet about how such a specific component instance is selected from among the available alternatives for such a general process class. Beyond enabling semantic interoperability of enterprise workflow components, IARPA's broader objectives in Tangram have included providing technology for characterizing, for a given generic workflow process, the likely performance of a given specific component with data inputs having certain characteristics, so that the workflow management system can select the component likely to perform best in any given circumstance. Our toolkit supports this objective by automating the formal description and registration of newly defined components in Tangram's process catalog [9].

It's worth noting that all of the toolkit's other heretofore-described capabilities remain applicable in the (perhaps more pragmatic) setting where users specify particular components for all workflows themselves.
REFERENCES

[1] Almendros-Jiménez, J. M., Becerra-Terón, A., Enciso-Baños, F. J.: Querying XML documents in logic programming. Theory Pract. Log. Program. 8, 3 (May 2008), 323–361.
[2] Carley, K. M., Dereno, M.: ORA—Organizational Risk Analyzer. Tech. rep. CMU-ISRI-06-113, Carnegie Mellon University, August 2006.
[3] Kubica, J., Moore, A., Schneider, J.: Tractable group detection on large link data sets. In Third IEEE International Conference on Data Mining (ICDM-2003), pp. 573–576, 19–22 Nov. 2003.
[4] Macskassy, S. A., Provost, F.: NetKit-SRL: A Toolkit for Network Learning and Inference. In Proceedings of the NAACSOS Conference, June 2005.
[5] Murray, K., Harrison, I., Lowrance, J., Rodriguez, A., Thomere, J., Wolverton, M.: PHERL: An Emerging Representation Language for Patterns, Hypotheses, and Evidence. In Proceedings of the AAAI Workshop on Link Analysis, 2005.
[6] Neville, J., Jensen, D.: Dependency networks for relational data. In Proceedings of the 4th IEEE International Conference on Data Mining, 2004.
[7] Pioch, N., Hunter, D., Fournelle, C., Washburn, B., Moore, K., Jones, E., Bostwick, D., Kao, A., Graham, S., Allen, T., Dunn, M.: CADRE: Continuous analysis and discovery from relational evidence. In International Conference on Integration of Knowledge Intensive Multi-Agent Systems, pp. 555–561, 30 Sept.–4 Oct. 2003.
[8] Wolverton, M., Berry, P., Harrison, I., Lowrance, J., Morley, D., Rodriguez, A., Ruspini, E., Thomere, J.: LAW: A Workbench for Approximate Pattern Matching in Relational Data. In Proceedings of the Fifteenth Innovative Applications of Artificial Intelligence Conference (IAAI-03), 2003.
[9] Wolverton, M., Martin, D., Harrison, I., Thomere, J.: A Process Catalog for Workflow Generation. In The Semantic Web—7th International Semantic Web Conference, Springer, vol. 5318/2008, pp. 833–846, 2008.