<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic Data Relevance Estimation by Exploring 2 Models (D REEM)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>H. Van Dyke Parunak</string-name>
          <email>van.parunak@soartech.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Soar Technology, Inc.</institution>
          <addr-line>3600 Green Court, Suite 600 Ann Arbor, MI 48105</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>3</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>-Analysts in many areas of national security face a massive (high volume), dynamically changing (high velocity) flood of possibly relevant information. Identifying reasonable suspects confronts a tension between data that is too atomic to be diagnostic and knowledge that is too complex to guide search. D2REEM (Dynamic Data Relevance Estimation by Exploring Models) is a knowledge-based metaheuristic that uses stochastic search of a graph-based semantic model to guide successive queries of high-volume, high-velocity data. We motivate D2REEM by considering the nature of knowledge-based search in highvolume, high-velocity data and reviewing current tools. We then outline the D2REEM metaheuristic and describe the state of progress in applying it to a range of model types, including geospatial movement, behavioral models, discourse models, narrative generators, and social networks. Finally, we outline work that needs to be done to advance the D2REEM agenda.</p>
      </abstract>
      <kwd-group>
        <kwd>retrieval</kwd>
        <kwd>querying</kwd>
        <kwd>semantic models</kwd>
        <kwd>big data</kwd>
        <kwd>stochastic search</kwd>
        <kwd>any-time methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Analysts in many areas of national security face a massive,
dynamically changing flood of possibly relevant information.
“Big data” is typically described in terms of Volume (the
amount of data), Velocity (how fast it changes), and Variety
(the diversity of data formats); our concern here focuses on
high-volume, high-velocity data. Activities of crucial interest
can be expected to leave many “footprints” in available data,
but identifying reasonable suspects confronts a tension between
data that is too atomic to be diagnostic and knowledge that is
too complex to guide search.</p>
      <p>The data problem is that no single data item is diagnostic of
an attack. Any one data item that might be part of an attack
could also be part of a benign scenario. For example, a
purchase of fermentation equipment might be a precursor to
anthrax cultivation...or to opening a microbrewery. A new
dissertation on gene splicing in microbes might point to a
potential perpetrator...or just a promising new assistant
professor. In data retrieval terms, static single-item queries give
very low precision in identifying the overall event.</p>
      <p>
        The knowledge problem is that while we can capture
overall patterns of behavior that are diagnostic, matching them
against massive data is combinatorially prohibitive.
Representations that are available include discourse models
that capture the different forms a conversation in social media
might take [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], hierarchical task networks (HTN) that
capture goal-oriented behaviors [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], social networks that
show possible connections and flows among people and
organizations [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], and narrative models that capture causal
dependencies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Such a structure covers many possible
behaviors, depending on which combinations of constraints are
satisfied. If we could match such a structure against data, we
would expect very high precision and recall. However, realistic
structures can grow very large (for instance, an HTN might
contain hundreds or thousands of atomic behaviors and
constraints), and naïvely matching such a structure against
massive data all at once is combinatorially prohibitive.
      </p>
      <p>This paper describes D2REEM (Dynamic Data Relevance
Estimation by Exploring Models), a knowledge-based
metaheuristic that uses stochastic search of a semantic model to
guide successive queries of high-volume, high-velocity data.
Section II explores the challenge that D2REEM addresses and
the current state of the art. Section III outlines the D2REEM
metaheuristic. The heart of D2REEM is a knowledge-based
model of the domain, and Section IV reviews several classes of
models to which D2REEM may be applied and documents our
success so far. D2REEM is a work in progress. Section V
identifies a series of next steps for advancing this approach to
semantic-driven search of big data. Section VI concludes.</p>
      <p>II.</p>
    </sec>
    <sec id="sec-2">
      <title>THE CHALLENGE AND PRIOR WORK</title>
      <p>Figure 1 summarizes the challenge that D2REEM
addresses. Static single-record queries are simple, but can be
efficiently applied to high-volume high-velocity data.
Conventional matching methods are too inefficient to apply
knowledge-rich patterns to such data. D2REEM is a novel way
to match complex patterns to big data.</p>
      <p>For years, the staple of information retrieval has been the
record-oriented query, in which the analyst describes single
data items that might be of interest. Static queries can be
matched very efficiently, but their relevance depends on the
state of knowledge about the world, which changes with each
new piece of information.</p>
      <p>The last 25 years have produced an explosion in graph
databases, that is, databases that capture semantic relationships
among data items in a graph structure. Graph databases can be</p>
      <sec id="sec-2-1">
        <title>Sta8c+Single:</title>
      </sec>
      <sec id="sec-2-2">
        <title>Record+Query+</title>
        <p>+
a
t
a
D
+
f
o
+
y
t
i
c
o
l
e
V
+
&amp;
+
e
m
u
l
o
V</p>
      </sec>
      <sec id="sec-2-3">
        <title>D2REEM+</title>
      </sec>
      <sec id="sec-2-4">
        <title>Complex+</title>
      </sec>
      <sec id="sec-2-5">
        <title>Pa/ern+Query+</title>
        <p>Graphs are a natural way to capture a knowledge model,
but classical graphical query languages have several
disadvantages for knowledge-based subgraph matching.
•
•
•
•</p>
        <p>They are generic to any graph-structured data, and do
not take advantage of specific semantics in various
kinds of graphical models. We wish to exploit the
knowledge in a model.</p>
        <p>They require the entire query to match a subset of the
data. We would like to search the data against a
graphical structure (such as a hierarchical task network
[HTN]) that expresses a range of possibilities, and
identify coherent subsets of the pattern that the data
support.</p>
        <p>
          In general, graph matching is intractable [
          <xref ref-type="bibr" rid="ref12">13</xref>
          ], with
either exponential or NP-complete complexity in the
size of the query. Thus queries must be kept small [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
We wish to exploit large knowledge models.
        </p>
        <p>Graph databases require the data to be represented as a
graph. We address high-volume high-velocity data
streams (such as social media) where such
preprocessing is not feasible.</p>
        <p>III.</p>
        <p>THE D2REEM METAHEURISTIC</p>
        <p>D2REEM is a metaheuristic, a high-level procedure that
guides a lower-level process (in this case, record-level
querying). Like many metaheuristics (e.g., genetic algorithms,
ant-colony optimization, swarm optmization, artificial immune
systems), its methods are strongly inspired by biological
models.</p>
        <p>D2REEM shifts the focus of computation in doing
knowledge-based exploration of big data. It moves
computation away from matching the model against the data,
and toward executing a process over the model that embodies
the distinctive semantics of the model. Table I summarizes the
differences between D2REEM and subgraph matching in a
graph database.</p>
        <p>Because D2REEM works with data as a stream of records,
rather than a pre-processed graph, it must issue many
recordlevel queries in order to match a knowledge model. It does this
by executing a continuous cycle (Figure 2). Repeatedly,
D2REEM
•
•
•
•
explores the current state of the model,
updates the priority for learning more about each node
in the model,
adaptively generates a query for the highest-priority
node, and
updates the model with what is learned from that query.
The queries can be posed to any data source, and do not</p>
        <sec id="sec-2-5-1">
          <title>Explore(model(to( iden.fy(highest( priority(needed( informa.on(</title>
        </sec>
        <sec id="sec-2-5-2">
          <title>Formulate(&amp;( post(query(for( high=priority( informa.on(</title>
        </sec>
        <sec id="sec-2-5-3">
          <title>Update( model(state(</title>
        </sec>
        <sec id="sec-2-5-4">
          <title>Retrieve(</title>
        </sec>
        <sec id="sec-2-5-5">
          <title>Evidence(</title>
          <p>
            require predefining relationships among separate data items.
The relationships among retrieved nodes are computed by
exploring the model, not by a complex matching process, a
strategy similar to graph simulation [
            <xref ref-type="bibr" rid="ref13">14</xref>
            ] (though unlike that
work, we do not require that the data already form a graph).
          </p>
          <p>Figure 1 shows the result. Static single-record queries can
be applied to big data, but cannot capture complex patterns
among records. Graphical databases can express patterns, but
computational complexity forces the patterns to be smaller than
a realistic behavioral model, and the data must be small enough
and stationary enough to preprocess into a graph. By taking
advantage of model semantics, D2REEM can match very large
knowledge-rich patterns (with thousands of nodes) against
high-volume, high-velocity data streams that are not in
graphical form.</p>
          <p>Figure 3 shows the D2REEM architecture. The heart of
D2REEM is a Graphical Knowledge Model (GKM) with two
characteristics:
•</p>
          <p>Edges in the graph represent causal or other sequential
dependencies between nodes, so that a trajectory is a
possible evolution of the world, and
The likelihood of visiting a given node can be
modulated by evidence attached to the node.</p>
          <p>The Polyagent Sampling Engine (PSE) continuously
samples alternative trajectories through the GKM to generate a
distribution over possible trajectories reflecting current
knowledge of the domain. The Evidence Prioritizer and
Marshaller (EPM) examines these distributions to identify
nodes about which more information would be useful, issues
queries to retrieve that information, and updates the GKM with
the results. The PSE’s ongoing exploration takes account of
this new information, modifying the distributions over
trajectories, and thus leading to new rounds of queries,
implementing the processing cycle shown in Figure 2.
B. Polyagent Sampling Engine</p>
          <p>By construction, each trajectory through a GKM
corresponds to a possible instance of the dynamics implicit in
the graph. Evidence currently on each node of the graph
modulates the probability assigned to trajectories involving that
node. We wish to construct a distribution over all possible
trajectories. An approach we have found particularly tractable
over many types of GKM is polyagent sampling.</p>
          <p>A polyagent is a set of agents that collectively explore
possible trajectories for a single entity or behavioral instance of
interest. It consists of a single persistent coordinating agent (the
“avatar”), which continuously generates a stream of simple
agents (“ghosts”), each exploring a single trajectory.
Figure 4 shows a polyagent sampling possible paths
through a geospatial domain.</p>
          <p>Ghosts have three biologically-inspired
characteristics: they are manifold, apoptotic, and
tropistic. “Manifold” means that many of them explore
the domain in parallel, like multiple ants in an ant, or
multiple chromosomes in genetic evolution, or multiple
antibodies in an immune system, or agents in swarm
Ghosts&amp;</p>
          <p>Analyst(</p>
          <p>
            While the immediate inspiration of the PSE is biological,
its mathematical underpinnings are based on Monte-Carlo tree
search (MCTS) [
            <xref ref-type="bibr" rid="ref14 ref15">15, 16</xref>
            ], which explores multiple descendants
of a single node to estimate the probability with which that
node should be visited. In MCTS, the graph being explored is a
game tree, in which the same game rules are applied in
expanding each node. The PSE generalizes this concept to
other graph structures, taking advantage of their distinctive
semantics in the decision rules used by the ghosts and the
digital pheromone fields they manipulate.
          </p>
          <p>C. Evidence Prioritizer and Marshaller</p>
          <p>The EPM has three functions:
•
•</p>
          <p>Based on the distribution of trajectories through the
GKM determined by the PSE, identify the nodes for
which additional information would be most valuable.
Formulate and execute queries that will provide more
information on the selected nodes
Avatar&amp;
•</p>
          <p>Update evidence on the selected nodes based on the
results of the queries.</p>
          <p>We consider these in turn.</p>
          <p>Identify nodes to guide queries.—Intuitively, D2REEM
estimates the relevance of candidate queries based on the nodes
for which additional information would be most valuable. The
precise sense of “valuable” depends on the kind of GKM that is
guiding the search, and the decisions that it is guiding. Here are
some alternatives that are useful in different settings. In the
next section, we give further examples of each of these.</p>
          <p>In sparse environments, the most valuable query is one
most likely to yield a hit. In our PROPS system, polyagent
sampling over a geospatial lattice generates candidate
trajectories for adversaries, and the most probable trajectory
guides the decision of where to deploy scarce surveillance
assets to increase the probability of detecting an adversary.</p>
          <p>
            One might try to maximize some global measure over the
GKM. One use for an HTN in D2REEM is to model a potential
adversary’s behavior (e.g., mounting a biological attack). In an
HTN (using the rTÆMS dialect [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]), each leaf task that is
executed contributes to the quality that accumulates at the root,
and the higher that quality, the better the objective is achieved.
By examining a set of possible trajectories identified by the
PSE, the EPM can identify which trajectory would yield the
highest root quality. If the HTN models adversarial behavior,
this trajectory is most consistent with the adversarial intent we
are seeking to detect. In this case, we want to select the nodes
for which gaining more information might increase the
probability of that trajectory.
          </p>
          <p>In some cases, the nodes about which we want to learn
more are those for which more information would sharpen the
distribution over alternative trajectories. We estimate the
effect of this choice by changing the evidence levels for
various nodes in copies of the GKM and run the PSE on them,
then compare the resulting distributions.</p>
          <p>Formulate and Execute Queries.—The EPM submits
queries to external data sources for those nodes that have been
identified as of highest priority. Currently, we construct queries
for each node manually in the course of formulating the GKM,
and the EPM retrieves the specified query and submits it.</p>
          <p>Update Node Evidence.—The EPM updates the evidence
supporting the node on the basis of the response to the query.
This change modulates the ongoing execution of the PSE,
potentially changing the highest priority nodes in the next
invocation of the EPM and directing the search process to the
most relevant potential data.</p>
          <p>IV.</p>
          <p>EXAMPLES OF D2REEM MODELS</p>
          <p>The heart of D2REEM is a semantic model of some facet of
the real world. We have identified numerous such models, and
demonstrated various facets of D2REEM on them. This section
outlines these examples. For each, it discusses
• how the model supports the two requirements identified
in Section III.A (trajectories represent possible
evolutions of the world; evidence on nodes modulates
probability of trajectory)
• how it supports the PSE and EPM (in particular, what
makes a node “relevant”), and
•</p>
          <p>what aspects of D2REEM have been implemented in it.
A. Movement on Geospatial Maps</p>
          <p>
            The most mature class of GKM for polyagent sampling is
the geospatial lattice, whose nodes correspond to tiles of the
environment and whose edges represent adjacency among tiles
[
            <xref ref-type="bibr" rid="ref16">17</xref>
            ].
          </p>
          <p>A trajectory represents the movement of a target, and the
probability that a trajectory visits a node depends on
externally-provided information such as terrain, presence of
friendly or adversarial forces, and combat activity. The
cumulative distribution of presence pheromone thus reflects the
probability of encountering the target as a function of location.</p>
          <p>
            In the DARPA RAID project, we applied the PSE on such a
model to urban combat. Figure 5 shows the close correlation of
predictions of red force movement in a human-commanded
wargame, compared with the actual movement of troops.
Quantitatively, the PSE produced more accurate forecasts than
both experienced human observers and game-theoretic or
Bayesian reasoners [
            <xref ref-type="bibr" rid="ref17">18</xref>
            ].
          </p>
          <p>In the ARL PROPS project, we used the PSE on a
geospatial lattice to direct collection management. The
relevance criterion in this case is to give priority to queries
(intelligence requests) on areas most likely to generate a hit.</p>
          <p>PROPS is the most mature implementation of the D2REEM
metaheuristic to date, including ongoing PSE exploration of the
knowledge model, dynamic query formulation, and updating of
the knowledge model.</p>
          <p>B. Hierarchical Task Networks</p>
          <p>
            Goal-oriented behavior by intelligent agents is often
represented with a hierarchical task network (HTN) [
            <xref ref-type="bibr" rid="ref18 ref4">4, 19</xref>
            ].
Figure 6 is a fragment of an HTN model for a mix of benign
and nefarious cyber-activities. The nodes are actions, and are
joined by two kinds of edges: subtask edges (solid) that
connect a higher-level task (the goal) to lower-level tasks that
15 min predictions
t = 140 sec
          </p>
          <p>Forces w/ 15 min tails
t = 140 + 900 sec</p>
          <p>Load
Trojan on
target
Load Trojan
via Game</p>
          <p>Load Trojan via</p>
          <p>Hacked Site
Post bogus
game</p>
          <p>Pgm on target
opens port</p>
          <p>Scan site
for banner,</p>
          <p>version
Invite &amp; Load
Pgm on target
beacons out
Retrieve Data</p>
          <p>Prepare Slides</p>
          <p>List
Available</p>
          <p>Files</p>
          <p>Key:
Upload
hacked
front end</p>
          <p>Retrieve File
Decomposable</p>
          <p>
            Task
Atomic
Action
“and”
subtask
sequence
carry it out, and sequence edges (dashed) that reflect
precedence constraints. These precedence constraints
are inherited by the leaves of the HTN. The graphical
language in the figure is a simplification; our full
formalism, derived from the TÆMS language [
            <xref ref-type="bibr" rid="ref18">19</xref>
            ], is
much more sophisticated. In TÆMS, successful
execution of a task generates “quality,” a scalar value,
that propagates upward via combination rules. The
degree to which a sequence of leaf actions satisfies a
top-level goal is measured by the amount of quality
that accumulates at that top node.
          </p>
          <p>Polyagent sampling explores alternative
trajectories through the leaf tasks. Each trajectory
reflects a sequence of actions that an agent might
execute in the world. The probability that an agent’s
next step will move to a given task depends on the
task’s feasibility (the satisfaction of its prerequisites),
its desirability (based on the degree to which
higherlevel tasks have been achieved), and evidence for the
task from the external world (provided by the EPM).</p>
          <p>
            The HTN is an example of a GKM for which the
value of generating a query for a node depends on a global
characteristic of the model, namely, the change in the quality
of the root node that a response to the query might generate.
We have demonstrated the PSE on HTNs [
            <xref ref-type="bibr" rid="ref4">4, 20</xref>
            ], but not yet
implemented an EPM for it.
          </p>
          <p>C. Social Networks</p>
          <p>
            We represent a social network [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] as a bipartite graph, in
which one set of nodes represents people or organizations, and
another indicates class of transaction. Several different kinds of
transaction are possible, including communication, transfer of
wealth, transfer of power (e.g., by confrontation), or transfer of
status (e.g., by endorsement). Figure 7 is an example social
network in our PSTK system (Power Structure ToolKit), in
which the Agents are people and the bar graphs between them
represent the levels of the different transaction types (in this
example, Political, Military, Economic, Social, from the
PMESII ontology).
          </p>
          <p>A trajectory in a social network indicates a sequential
transfer of social capital. For example, one may seek evidence
for a money laundering operation that moves a financial
payment makes its way through a series of organizations.
Evidence of a specific transaction increases the likelihood of a
transition from one agent to another.</p>
          <p>Our current PSTK system explores possible flows using
specialized processes residing on each agent, not the PSE. We
have not implemented a EPM for social networks. If one is
seeking to identify sequences of transactions, the relevance of a
node to generate a query is measured by the degree to which
additional information on that node would sharpen the
distribution over alternative trajectories. For example, a node
that is shared by several emerging trajectories would not rank
as high as one that is unique to a single candidate trajectory.</p>
          <p>
            A Narrative Space Model (NSM) captures a set of many
possible narratives that could explain the evolution of a
situation [
            <xref ref-type="bibr" rid="ref19 ref7">7, 21</xref>
            ], and is an external representation for the
mental activity of an analyst who is seeking to explain how a
given state of affairs might come about. Each node in an NSM
consists of a statement about the world to which belief may be
assigned. The NSM has an edge from one node to another just
if the first statement followed by the second forms a coherent
segment of narrative.
          </p>
          <p>Each trajectory through an NSM represents a coherent
narrative about how the world might evolve from the origin to
the destination. Figure 8 shows an abbreviated NSM that
captures ways that al-Assad might stay in power or lose power
in Syria. The ‘????’ notation on edges between nodes are
placeholders for edge weights that the PSE fits based on
evidence on the nodes. In an NSM, external evidence for
(against) an individual node increases (decreases) the
probability of trajectories including that node.</p>
          <p>We have implemented the PSE on NSMs, and modulated
its behavior by attaching external evidence to nodes in the
NSM. In our work so far, this
evidence has been attached by a
human analyst, not by the EPM. Since
our interest is in identifying the most
likely narrative given the evidence
available to date, the EPM for a NSM
would weight nodes based on how
much evidence for a given node
would sharpen the distribution over
emerging narratives.</p>
          <p>E. Discourse Models</p>
          <p>
            Dooley Graphs [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] reflect a
speech-act view of discourse [
            <xref ref-type="bibr" rid="ref20">22</xref>
            ], in
which each utterance seeks to
accomplish something (e.g., Solicit an
action or a statement, Inform,
Commit, or Refuse). In a coherent
conversation (a sequence of speech
acts), later utterances may be related in different ways to earlier
ones: they may Respond, Reply, Resolve, or Complete them.
Detecting such coherent conversations from a high-volume,
high-velocity stream of data (for example, a Twitter feed)
would make a great contribution to surveillance activities.
          </p>
          <p>There are a number of ways one could graph a sequence of
utterances, depending on what one chooses as the nodes.</p>
          <p>The nodes could represent specific utterances, and
edges would reflect the sequences among them. This
representation loses critical information about who
issues each utterance.</p>
          <p>One might analyze the conversation to characterize
different states that it could assume, and then represent
each state as a node, with edges representing possible
state transitions. A state model, like an utterance
model, deemphasizes the participants, and in addition
makes it difficult to distinguish specific conversations.
We could represent participants as nodes, with edges
representing utterances from the source to the target.
Like the state model, the participant model does not
1.#rfq#
4.#bid#
5.#accept#</p>
          <p>B1#
2.#query#
3.#answer#</p>
          <p>A2#
1.#propose#</p>
          <p>7.#no?fy#
A1#</p>
          <p>C1#
3.#prefer#
1.#propose#
6.#prefer#
7.#no?fy#
2.#check#
4.#confirm#
B1#</p>
          <p>E1#
UFerance#
Character#
•
•
•
D1#
D1#
1.#rfq#
1.#rfq#
A1#</p>
          <p>C1#
1.#propose#
7.#prefer#
5.#no?fy#</p>
          <p>A Dooley graph (e.g., Figure 9) is a bipartite graph. One
class of nodes (circles in Figure 9) represents characters,
which are participants at distinguished stages of the discourse,
based on the notions of resolution and completion. Thus
participant A may appear as nodes A1 and A2. The other class
of nodes (squares in Figure 9) represents utterances, which are
characterized by type of speech act. Utterances that resolve or
complete one another tend to form tightly-connected
components of the graph, while those that take off in new
directions spawn new components. A trajectory through the
Dooley graph represents a realization of a conversation.
Retrieving a tweet from (say) A to B adds evidence to
Acharacters and B-characters; recognizing the tweet as a specific
speech act adds evidence to utterance nodes requiring that
speech act.</p>
          <p>We have not yet implemented either the PSE or the EPM
on Dooley Graphs. In using a Dooley Graph for surveillance of
social media, one would seek to identify well-formed
conversations and classify them (e.g., meeting organization,
viral propagation of opinion, purchase activities). For this
purpose, the EPM should prefer nodes based on their potential
for sharpening the distribution over alternative trajectories.</p>
          <p>Four main avenues for extension of D2REEM provide a
range of challenging and important research opportunities:
multiple model types, model management, query generation,
and model linking.</p>
          <p>Multiple Model Types: Our most complete example of
D2REEM is the PROPS system, which treats the geospatial
domain. The NSM is the next most mature, demonstrating the
effectiveness and computational efficiency of modulating the
state of a non-geospatial knowledge model by external
evidence. In addition to refining these applications, we seek to
extend the complete D2REEM cycle to other model types. As
we configure the PSE and EPM to different model types, we
gain valuable insights into how the underlying mechanisms of
the metaheuristic can be generalized.</p>
          <p>Model Management: As noted in Section II.A, an
important difference between D2REEM and graph DBs is how
knowledge is expressed. Graph DBs construct a small
graphical query using the same graph syntax that governs the
data, and seek to match the entire query graph against the data
graph. D2REEM uses a large GKM that captures a range of
hypotheses, and then explores this model in the light of the
data to identify high-priority record-level queries. The use of a
complex knowledge model is a strength, in that it externalizes
analysts’ internal mental models, facilitating collaborative
review and enhancement. But it is also a weakness, since
constructing such models is itself a labor-intensive process.</p>
          <p>
            For many long-standing problems, model construction is
integral to the analytic effort [
            <xref ref-type="bibr" rid="ref19">21</xref>
            ], and D2REEM offers an
additional incentive to construct such models. But it will be
even more useful if model construction can be partially
automated. For example, in the case of the NSM, techniques
exist to merge specific narratives in a domain of interest into a
NSM [
            <xref ref-type="bibr" rid="ref21 ref22">23, 24</xref>
            ]. Such technology could exploit past analytic
products (which often include a narrative of the event under
investigation) to enhance a NSM of the domain. Another
example is the Disciple technology [
            <xref ref-type="bibr" rid="ref23">25</xref>
            ], which has been used
successfully to learn inferential relations of the sort one might
encounter in a belief network.
          </p>
          <p>A strength of the PSE approach to model exploration is the
locality of ghost movement and pheromone-based interaction.
This locality means that GKMs can be extended incrementally,
and encourages the notion of a persistent library of
dynamically updated models as a central resource in analysis.
Development of mechanisms for managing such a library will
considerably advance the analytic enterprise.</p>
          <p>Query Generation: One task of the EPM is formulating
queries that can provide additonal information on GKM nodes
that it identifies as highly relevant. In our current
implementations, these queries are manually constructed along
with the GKM. Given the description of a node in a model and
schemata for external data sources, one would like to generate
queries automatically, a task that will rely heavily on research
in ontological reasoning and semantic web technologies.</p>
          <p>
            Model Linking: The same analytic tasking can be viewed
through the lens of multiple model types, and we would like to
facilitate the flow of information between these model types by
defining mappings between nodes in different model types.
Like the previous topic, this one depends on advances in
ontological reasoning, as well as model theory and other
formal tools [
            <xref ref-type="bibr" rid="ref24">26</xref>
            ], and will require attention to aligning multiple
levels of meaning [
            <xref ref-type="bibr" rid="ref24 ref25">26, 27</xref>
            ], not all of which may be represented
in each model.
          </p>
          <p>VI.</p>
          <p>CONCLUSION</p>
          <p>Matching knowledge-rich patterns against high-volume,
high-velocity data is combinatorially prohibitive. The
D2REEM metaheuristic is a new approach to such retrieval
problems that shifts the computational burden from graph
matching (a NP-complete problem) to iteratively exploring a
knowledge model and issuing focused queries for the data that
is most relevant in the light of current knowledge (a process
that is linear in the size of the knowledge model). D2REEM
can be applied to any graphical knowledge model in which
edges represent causal or or other sequential dependencies and
in which adding data to individual nodes can change the
probability of a trajectory.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N. N. Binti</given-names>
            <surname>Abdullah</surname>
          </string-name>
          ,
          <article-title>"Activity States: a theoretical framework for the analysis of actual human collaboration on the Web,"</article-title>
          <source>Ph.D., Information</source>
          , Structure, Systèmes, Université
          <string-name>
            <surname>Montpellier</surname>
            <given-names>II</given-names>
          </string-name>
          , Montpellier, France,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          ,
          <article-title>"Visualizing Agent Conversations: Using Enhanced Dooley Graphs for Agent Design and Analysis,"</article-title>
          <source>in Second International Conference on Multi-Agent Systems (ICMAS'96)</source>
          ,
          <year>1996</year>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Erol</surname>
          </string-name>
          ,
          <article-title>"Hierarchical Task-Network Planning Systems: Formalization, Analysis, and</article-title>
          <string-name>
            <surname>Implementation</surname>
          </string-name>
          ," Computer Science, University of Maryland, College Park, MD,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Belding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bisson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brueckner</surname>
          </string-name>
          , E. Downs,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hilscher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <article-title>"Stigmergic Modeling of Hierarchical Task Networks,"</article-title>
          <source>in the Tenth International Workshop on Multi-Agent-Based Simulation (MABS</source>
          <year>2009</year>
          , at AAMAS 2009), Budapest, Hungary,
          <year>2009</year>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , B.
          <string-name>
            <surname>Bechtel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Knudsen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Waltz</surname>
            , and
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>White</surname>
          </string-name>
          ,
          <article-title>"PSTK: A Toolkit for Modeling Dynamic Power Structures,"</article-title>
          <source>in the 16th Behavior Representation in Modeling and Simulation Conference (BRIMS)</source>
          , Providence, RI,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Easley</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , Networks, Crowds, and
          <article-title>Markets: Reasoning about a Highly Connected World</article-title>
          . Cambridge, UK: Cambridge University Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brueckner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Downs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Sappelsa</surname>
          </string-name>
          ,
          <article-title>"Swarming Estimation of Realistic Mental Models,"</article-title>
          <source>in Thirteenth Workshop on Multi-Agent Based Simulation (MABS</source>
          <year>2012</year>
          , at AAMAS 2012), Valencia, Spain,
          <year>2012</year>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>"Query Languages for Graph Databases,"</article-title>
          <source>SIGMOD Record</source>
          , vol.
          <volume>41</volume>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>March 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Neo</given-names>
            <surname>Technology</surname>
          </string-name>
          ,
          <source>The Neo4j Manual, 1.8</source>
          .1 ed.:
          <source>Neo4j</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berglund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chamberlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Robie</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Siméon.</surname>
          </string-name>
          (
          <year>2011</year>
          ,
          <article-title>22 August)</article-title>
          .
          <article-title>XML Path Language (XPath) 2.0 (Second Edition)</article-title>
          . Available: http://www.w3.org/TR/xpath20/ [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <article-title>"SPARQL 1.1 Query Language, W3C Working Draft," W3C2011.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Klyne</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          ,
          <article-title>"Resource Description Framework (RDF): Concepts and Abstract Syntax, W3C Recommendation," W3C2004.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Reutter</surname>
          </string-name>
          ,
          <article-title>"Querying Graph Patterns," presented at the PODS'11</article-title>
          , Athens, Greece,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Henzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Henzinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Kopke</surname>
          </string-name>
          ,
          <article-title>"Computing Simulations on Finite and Infinite Graphs," presented at the the 36th</article-title>
          <source>Annual IEEE Symposium on Foun dations of Computer Science (FOCS 95)</source>
          , Milwaukee,
          <string-name>
            <surname>WI</surname>
          </string-name>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kearns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mansour</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>"A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes," presented at the the</article-title>
          <source>Sixteenth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kocsis</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Szepesvári</surname>
          </string-name>
          ,
          <article-title>"Bandit based Monte-Carlo Planning," presented at the the EMCL 2006</article-title>
          , Berlin, Germany,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Brueckner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Matthews</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sauter</surname>
          </string-name>
          ,
          <article-title>"Swarming methods for geospatial reasoning,"</article-title>
          <source>International Journal of Geographical Information Science</source>
          , vol.
          <volume>20</volume>
          , pp.
          <fpage>945</fpage>
          -
          <lpage>964</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          ,
          <article-title>"Real-Time Agent Characterization and Prediction," presented at the</article-title>
          <source>International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'07)</source>
          , Industrial Track, Honolulu, Hawaii,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Horling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lesser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A. Garvey.</surname>
          </string-name>
          (
          <year>2004</year>
          ,
          <article-title>1 August)</article-title>
          .
          <source>The Taems White Paper [Web site]</source>
          . Available: http://dis.cs.umass.edu/research/taems/white/ [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Brueckner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Belding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bisson</surname>
          </string-name>
          , E. Downs, and
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          ,
          <article-title>"Swarming Polyagents Executing Hierarchical Task Networks,"</article-title>
          <source>in Third IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO</source>
          <year>2009</year>
          ), San Francisco, CA,
          <year>2009</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sappelsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V. D.</given-names>
            <surname>Parunak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Brueckner</surname>
          </string-name>
          ,
          <article-title>"The Generic Narrative Space Model as an Intelligence Analysis Tool,"</article-title>
          <source>American Intelligence Journal</source>
          , vol.
          <volume>31</volume>
          , p.
          <source>(in press)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Cohen</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Perrault</surname>
          </string-name>
          ,
          <article-title>"Elements of a Plan-Based Theory of Speech Acts,"</article-title>
          <source>Cognitive Science</source>
          , vol.
          <volume>3</volume>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee-Urban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Appling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>"</article-title>
          <source>Crowdsourcing Narrative Intelligence," Advances in Cognitive Systems</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl and R. M. Young</surname>
          </string-name>
          ,
          <article-title>"From Linear Story Generation to Branching Story Graphs,"</article-title>
          <source>IEEE Computer Graphics and Applications</source>
          , vol.
          <volume>26</volume>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Gheorghe</given-names>
            <surname>Tecuci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boicu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Boicu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stanescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Barbulescu</surname>
          </string-name>
          ,
          <article-title>"The Disciple-RKF Learning and Reasoning Agent,"</article-title>
          <source>Computational Intelligence</source>
          , vol.
          <volume>21</volume>
          , pp.
          <fpage>462</fpage>
          -
          <lpage>479</lpage>
          ,
          <year>November 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tolk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Diallo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Padilla</surname>
          </string-name>
          ,
          <article-title>"Semiotics, entropy, and interoperability of simulation systems: mathematical foundations of M&amp;S standardization," presented at the</article-title>
          <source>Proceedings of the Winter Simulation Conference</source>
          , Berlin, Germany,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ackoff</surname>
          </string-name>
          ,
          <article-title>"From Data to Wisdom,"</article-title>
          <source>Journal of Applied Systems Analysis</source>
          , vol.
          <volume>16</volume>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>