=Paper= {{Paper |id=None |storemode=property |title=CHAMPION: Intelligent Hierarchical Reasoning Agents for Enhanced Decision Support |pdfUrl=https://ceur-ws.org/Vol-808/STIDS2011_CR_T5_HohimerEtAl.pdf |volume=Vol-808 |dblpUrl=https://dblp.org/rec/conf/stids/HohimerGNS11 }} ==CHAMPION: Intelligent Hierarchical Reasoning Agents for Enhanced Decision Support== https://ceur-ws.org/Vol-808/STIDS2011_CR_T5_HohimerEtAl.pdf
      CHAMPION: Intelligent Hierarchical Reasoning
         Agents for Enhanced Decision Support
                        Ryan E. Hohimer, Frank L. Greitzer, Christine F. Noonan, Jana D. Strasburg
                                                Pacific Northwest National Laboratory
                                                            Richland, WA



Abstract: We describe the design and development of an advanced reasoning framework employing semantic technologies, organized within a hierarchy of computational reasoning agents that interpret domain specific information. The CHAMPION reasoning framework is designed based on an inspirational metaphor of the pattern recognition functions performed by the human neocortex. The framework represents a new computational modeling approach that derives invariant knowledge representations through memory-prediction belief propagation processes that are driven by formal ontological language specification and semantic technologies. The CHAMPION framework shows promise for enhancing complex decision making in diverse problem domains including cyber security, nonproliferation and energy consumption analysis.

Keywords: Semantic Graphs, Description Logic Reasoning, Belief Propagation, Memory-Prediction Framework, Case-Based Reasoning, Ontological Engineering

I. INTRODUCTION

A major challenge for information analysis is to develop joint cognitive systems, described by Woods [1, 2] as systems in which humans interact with another, artificial, cognitive system. Cognitive systems are goal-directed, using knowledge about “self” and the environment to monitor, plan, and modify actions in pursuit of goals. They are both data-driven and concept-driven. Woods observed that “developments in computational technologies (i.e., heuristic programming techniques) have greatly increased the potential for automating decisions” and for “… the support of human cognitive activities….” [1] A single, integrated system was envisioned at that time that could be composed of both human and artificial cognitive systems working collaboratively to perform complex decision making tasks. In the quarter-century that has passed since this vision was described, many different types of intelligent systems and processing frameworks have been proposed and developed, though it is not clear that the vision of joint cognitive systems has been realized. The current research and development effort represents a serious attempt to bring us closer to this vision utilizing semantic modeling.

II. BACKGROUND

Understanding how the human brain works is one of science’s grand challenges [3]. A great deal of effort has been devoted to the development of data-driven approaches to information analysis, inspired by neuroscience, in particular the neuron. A neuron is a cell in the brain whose principal function is collection, processing and distribution of signals. These signals are propagated through networks of neurons controlling brain activity and formulating the basis for human learning and intelligence including perception, cognition and action. Artificial intelligence (AI) as a field of inquiry has been around for decades and currently encompasses a large number of subfields intersecting biology, engineering and complex systems [4-6].

Properties of biological memory systems motivate the subfield of artificial neural networks (ANN), one type of computational model representing a bottom up or data-driven approach [7]. Feed-forward or recurrent ANNs learn by example and are able to model nonlinear systems. They require data for training the network, which is not always available. From the decision support perspective they have the disadvantage of being “opaque” to the user [8]; that is, the distribution and weights of the neural network connections are not sufficiently specified to offer insight into their operation; and this clearly doesn’t facilitate collaboration of joint cognitive systems.

Machine learning is a mature field focused on programming computers to optimize performance based on past experience. The goal with this type of research is to develop general purpose systems that can adapt to new circumstances and domain knowledge [9]. A disadvantage of machine learning approaches, when coupled with human decision makers in a joint cognitive systems context, is similar to that described above for ANNs and connectionist solutions to the extent that the workings of the machine learning component are not readily understood or communicated to the human decision maker.

In contrast to these data-driven approaches, research in knowledge-based/expert systems has focused more on concept-driven or top-down reasoning. Top-down reasoning tries to mimic the brain’s functions such as memory. This area of AI is concerned with thinking; how knowledge is represented symbolically and manipulated and how it contributes to intelligence.

Bayesian Network (BN) modeling approaches have become a rapidly growing area of research aimed at modeling human cognitive and decision making behavior, reflecting a perspective that use of probabilistic models and associated
computational power of the Bayesian mathematical framework greatly facilitates the representation of human performance within a rational decision making framework. BN models can be viewed graphically to represent probabilistic relationships in a given domain; hence they are more readily comprehended by users. Nevertheless, there are unanswered questions regarding the appropriateness of using the Bayesian probability construct, which reflects the assumption that human decision processes may be explained in terms of rational/normative models [10].

Logic-based/rule-based systems comprise a structured collection of rules. A long-standing top-down approach is the use of logic, as represented in rule-based expert systems. A major difficulty in implementing such knowledge-based systems is collecting the expert knowledge that must be represented in the collection of rules that comprise the knowledgebase. The use of semantic web technology provides an expressive knowledge representation using ontologies, along with the application of Description Logics, which provides a formal knowledge representation language that facilitates generation of conclusions or predictions.

Unlike most problem solving techniques in artificial intelligence, case-based reasoning (CBR) is memory based. Solving a problem using the classic CBR cycle involves four major components: retrieve, reuse, revise and retain (see Figure 1) [11, 12]. CBR systems are concept-driven and rely on the recognition of previously-learned (hard-coded) or experienced representations to determine the system’s response to new information. A challenge for the CBR approach is the development of efficient and effective methods to search the repository of cases (stored in case memory).

Figure 1. The CBR Cycle, adapted from [13].

A relatively recent top-down approach showing great promise is the memory prediction framework (MPF) [14, 15]. The MPF defines how the neocortex uses a feedback loop to store memory patterns which can lead to prediction of future events. These higher level concepts of cognitive processing have been applied in our work in development of the CHAMPION system.

We advance many of the aforementioned artificial intelligence concepts through extensive use of semantic technologies. With our modeling architecture, we separate domain knowledge from the reasoning framework. This is done to maintain flexibility with domain knowledge, allowing it to be updated as needed, and to ensure domain agnosticism, allowing the system to be implemented in many fields of inquiry.

III. SYSTEM DESIGN

The neocortex was the inspirational metaphor for the design of our reasoning framework, called CHAMPION (for Columnar Hierarchical Auto-associative Memory Processing In Ontological Networks). This metaphor serves as a representation for a functional (not structural) design adopting the following requirements:

• Stores sequences in an invariant form
• Stores sequences of patterns
• Stores sequences in a hierarchy
• Retrieves sequences auto-associatively

The CHAMPION architecture incorporates a significant variation on knowledge-intensive case-based reasoning (KI-CBR), depicted in Figure 2. Modifications to the traditional CBR cycle were made in order to meet the functional requirements of this metaphor.

• Instead of iterating through the case library to find a useful solution, our system uses semantic expressions to represent the criteria for a case belonging in the case library. We consider this an invariant form of a concept belonging to the set of cases.
• The functional requirement to store sequences of patterns is met by representing the problem and solution spaces in the form of semantic graphs. The nodes and edges constitute the patterns.
• The architecture uses the query/construct capabilities of SPARQL and the “Publish and Subscribe” programming pattern to implement an auto-associative mechanism.
• The domain ontology of the system addresses the functional requirement to store the concepts in a hierarchy.
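The publish-and-subscribe mechanism named in the system design can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python callbacks stand in for SPARQL query/construct, a set of triples stands in for the semantic graph, and the names `BaseGraph`, `unauthorized_access_agent`, and the `hasAuthorization` property are hypothetical. The class names `Access` and `UnauthorizedAccess` are borrowed from the paper's Insider Threat example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # predicate used for typing assertions


class BaseGraph:
    """Toy semantic graph: a set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self.subscribers = defaultdict(list)  # class name -> agent callbacks

    def subscribe(self, class_name, agent):
        """Register an agent to be notified of new individuals of a class."""
        self.subscribers[class_name].append(agent)

    def value(self, subject, predicate):
        """Return one object asserted for (subject, predicate), or None."""
        for s, p, o in self.triples:
            if s == subject and p == predicate:
                return o
        return None

    def publish(self, subject, predicate, obj):
        """Assert a triple; typing assertions wake the subscribed agents."""
        triple = (subject, predicate, obj)
        if triple in self.triples:
            return  # already known, nothing to propagate
        self.triples.add(triple)
        if predicate == RDF_TYPE:
            for agent in list(self.subscribers.get(obj, [])):
                agent(self, subject)


def unauthorized_access_agent(graph, individual):
    """Hypothetical agent: subscribes to Access, publishes UnauthorizedAccess."""
    # Membership criterion standing in for a semantic class expression.
    if graph.value(individual, "hasAuthorization") is False:
        graph.publish(individual, RDF_TYPE, "UnauthorizedAccess")


graph = BaseGraph()
graph.subscribe("Access", unauthorized_access_agent)
graph.publish("event1", "hasAuthorization", False)
graph.publish("event1", RDF_TYPE, "Access")
print(("event1", RDF_TYPE, "UnauthorizedAccess") in graph.triples)  # prints True
```

Because the agent only adds a typing assertion to the shared graph, any other agent subscribed to `UnauthorizedAccess` would be woken in turn, which is how a hierarchy can form without a central controller.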
Figure 2. The CHAMPION modified CBR cycle

The CHAMPION reasoning framework consists of a hierarchy of reasoning agents called Auto-associative Memory Columns (AMCs). The hierarchy is formed as each agent subscribes to subgraphs of interest from a base graph and publishes subgraphs back to the base graph (i.e., making the base graph an inference graph).

Agents interpret data in a similar fashion as subject matter experts. The lowest level agents in the hierarchy interpret the rawest form of data, and pass their interpretation of that data up the hierarchy. Primitive data goes in the bottom and higher level interpretations come out the top.

A basic premise adhered to is the separation of the domain knowledge from the reasoning framework. If domain knowledge is hardcoded within the reasoning framework, then the framework’s source code must be changed and recompiled frequently as domain knowledge is updated. Equally important is the fact that this separation of domain knowledge from the reasoning framework maintains the domain agnostic quality of the system, which enables its application to diverse problems without modification to the reasoning framework. We use the Web Ontology Language (OWL) as our knowledge representation language, to implement the ontologies and knowledgebases of the system.

The main components of the CHAMPION system, shown in Figure 3, are:

• Ontologies, used for representing the specialized domain knowledge.
• Reifiers, used for ingesting the primitive data as individuals of the types specified in the domain ontologies.
• Memory, used to store the facts asserted from the primitive data and the facts inferred by the reasoning system.
• Auto-associative Memory Columns (AMCs), reasoning components used to interpret the data assertions and infer new assertions.

Figure 3. The components of the CHAMPION system

A. CHAMPION Ontologies
There are four key ontologies in the CHAMPION system, each having a unique purpose: Domain Ontology, Core Ontology, Bridge Ontology, and a collection of Rules Ontologies.

1) Domain Ontology
The content in the domain ontology is the knowledge of the subject matter expert in the domain of discourse to be reasoned about. It is expected that the specialized terminology of interest be captured in this T-Box ontology. If the domain of interest is Insider Threat, concepts used by experts in this field are defined here. Concepts specifically about aspects of trusted persons, their access, privileges, roles, responsibilities, and authorities would be defined. Additionally, concepts of the enterprise within which they function would be defined, such as concepts related to the infrastructure and business systems.

2) Core Ontology
The content of the core ontology is the knowledge of the reasoning framework and its elements. The definitions describing the necessary components of the AMCs are encoded in this ontology. The primary concept defined in this ontology is the AMC. The AMC is the primary reasoning agent of the framework and the class definition of the AMC is found in the core ontology.

3) Bridge Ontology
The bridge ontology associates concepts in the domain ontology with concepts from the core ontology. In other words, this is the place where domain concepts are assigned an AMC to reason about them.

Continuing with the Insider Threat domain, let’s assume the concepts of access and unauthorized access are defined in the domain ontology as Access and UnauthorizedAccess respectively. In this example, Access is the superclass of UnauthorizedAccess. In the bridge ontology we encode that an AMC is assigned to reason about UnauthorizedAccess (the AMC class is subclassed to be an UnauthorizedAccess). The UnauthorizedAccess AMC is further defined to subscribe to
Access individuals, and publish UnauthorizedAccess individuals. Later in this paper, we will see that this is a subsumptive AMC.

4) Rules Ontologies
The role of an AMC in the reasoning framework is to publish the appropriate assertions that are entailed in the local AMC’s graph. Two governing ontologies are applied to the local AMC: 1) the domain ontology, and 2) an AMC-specific ontology which contains knowledge that is relevant to the local AMC only. The consequence of having an ontology at the AMC granularity is that a rules ontology must exist for each AMC.

B. Knowledgebases
In addition to the ontologies, the following knowledgebases are required: Working Memory, AMC Knowledgebases (Binning Queue, Case Library), and a Contextual Knowledgebase.

1) Working Memory
The Working Memory knowledgebase is the semantic graph containing the state of the base-graph and the inference-graph assertions. This is the location of all the individuals from reifiers and from AMCs.

2) AMC
Each AMC must have a local knowledgebase over which it can reason. The local knowledgebase directly imports the bridge ontology, which in turn indirectly imports the core and domain ontologies. Additionally, each AMC has a dedicated ontology that contains semantic expressions specific to this AMC. These expressions include SWRL rules that the local AMC’s description logic reasoner evaluates.

3) Contextual Knowledge
Additional knowledge beyond the streaming problem data under analysis or search is stored in contextual knowledgebases. This type of knowledge needs to be accessed by the AMC in order to do informed searches or analysis. For example, to correctly reason about an activity associated with a username, the AMC must be able to access information about that username, such as the roles and access controls that are associated with that user.

C. Auto-associative Memory Columns
Analyzing real-world data presents the challenge of computationally analyzing very large graphs. The difficulty is not so much a data reduction problem as it is a data interpretation problem. A traditional approach to analyzing large graphs is to build the graph and then conduct reasoning over the entire graph. In contrast, the CHAMPION hierarchy of reasoners comprises a “stack” of individual AMCs which reason over the data as it is introduced into the system in much smaller graphs than the entire dataset. The larger graph structure is built as data are analyzed; this produces a dynamic belief propagation network that takes in primitive data and pushes the interpretation of that data up the hierarchy. We can think of this as interpreting the current structure in the data and simplifying with abstracting semantics. Just as we can stack the AMCs, we can stack collections (regions) of AMCs that address reasoning or pattern recognition for different domains. Similarly, even higher level collections of AMCs enable reasoning across such regions, providing a natural mechanism for high level information fusion and analysis.

Using a hierarchical framework of reasoners allows us to constrain the requirements of each reasoner to a narrowly-defined purpose. There is almost a one-to-one relationship between AMCs and the classes defined in the domain ontology. With a well-formed domain ontology, we can overcome computational intractability by performing reasoning on subsets of the semantic graph. Rather than implementing a monolithic reasoner that is required to reason over all the concepts represented in the semantic graph, each reasoner in the hierarchy is only required to reason about a small set of relevant concepts.

The belief propagation network performs a transformation of the low level literal inputs into higher level abstractions. Ingesting and properly formatting the input data for a given domain is performed by a reifier, which instantiates the input from a data source and packages the information into an OWL representation called an individual. In turn these individuals are instantiated in Java objects called abstractions. The abstractions are added to the Working Memory of the CHAMPION system.

D. Reifiers
Reifiers are responsible for asserting individuals (primitives) into the Working Memory via abstractions. Although AMCs are domain agnostic, this is not possible with the reifiers. The reifier takes in raw literal data and forms an individual that is defined by the domain ontology. When raw data needs to be reified, specific code is required to convert the raw data into a data-type defined in the domain ontology.

E. Provenance Information
Provenance has been defined as the description of the origins of data and the process by which it came to exist [16, 17]. Clearly this is an important requirement for the system that will facilitate the decision maker’s understanding of the reasoning process. The system has two locations where provenance information can be stored. The first is in the asserted individuals added to the graph. Reified individuals (i.e., individuals from a reifier) and inferred individuals (i.e., individuals from an AMC) can have data properties asserted specifying their time and source of instantiation. The second location for storing provenance information is the episodic memory of the AMCs. Each AMC has an instantiation history of all the individuals that it has classified as being a member of its governing class. This constitutes its case library, comprising each inference graph the AMC has asserted into the base graph.

To date we have not focused on collection of provenance information. However, in future research we wish to use provenance information for two significant purposes: 1) intelligent rollback to a point of logical consistency, and 2)
adaptive machine learning of higher level class resolutions          Motorcycle as well. The reasoning agent would subscribe to
based on case library analysis.                                      individuals of type Vehicle, examine the state of that
                                                                     individual, and determine if the state of the individual meets
                  IV.     AN AGENT’S PURPOSE                         the criteria for being a motorcycle. For instance, the Vehicle
                                                                     may have two wheels and handlebars, thus qualifying it as a
A. Initial Base Graph Assertions are “Primitives”
                                                                     Motorcycle. The reasoning agent would then publish the added
    The first assertions into the base graph are defined as          assertion that the Vehicle was also a Motorcycle.
―primitives.‖ These are not primitives in the same sense as how
programming languages define them, but in the sense that they        B. Composite Reasoning Agents
are defined by a subject matter expert. These primitives are             Composite reasoning agents are less straightforward.
nodes that are believed to be assertions with very low               Unlike subsumption which is supported by explicit subclassing
uncertainty. For example, the data reified into the base graph       and superclassing predicates of standards based ontology
could be computer workstation events such as security events,        languages, the composite reasoner examines user defined
application events, and system events. No assumptions are            predicates to determine if the classification is valid.
made about the events; they occurred and the information is          Subsumption only requires that a new typing assertion on an
reified into the base graph. However, as reasoning agents infer      existing individual be made, not the creation of a new
new assertions based upon these primitive assertions,                individual. A composite reasoner on the other hand may need
uncertainty can be introduced into the graph.                        to create a new named individual, not just new assertions on
                                                                     existing individuals.
B. Inference Graph Assertions are “Abstractions”
    The AMCs are in fact ―classifiers‖. Each AMC in the              C. Aggregation Composite Reasoning Agents
hierarchy is configured by an ontology that defines classes that         These agents must recognize when the requisite parts to an
are the types of things in the domain of interest. In other words,   individual are present, and if so, create the new individual. An
the ontology contains the class definitions of the domain            example of this kind of reasoning follows:
concepts. Class definitions are the abstract data types of the
domain. Concepts are recognized by CHAMPION reasoners                    Continuing with the Vehicle example, a composite
that have been configured to detect them. This means that for        reasoning agent would subscribe to subgraphs that represented
each AMC in the hierarchy there is a class definition in the         parts of a Motorcycle. These would be individuals of type
governing ontology.                                                  Wheel and Handlebar. When the reasoning agent recognizes
                                                                     that all the requisite parts of a specific Motorcycle exist it
    The purpose of each AMC is to recognize the existence of         creates a new individual and makes the appropriate object
an individual of the type that belongs to its assigned class. If     property assertions.
the individual does exist, the agent publishes the appropriate
assertions.                                                              An important aspect of this aggregation process is the
                                                                     concept of making sure that the pieces are all parts of the same
         V.    THE TAXONOMY OF CHAMPION AMCS                         whole. In the CHAMPION system we refer to this notion as a
                                                                     ―binning property.‖ This property can be thought of as a
    There are several types of AMCs in the CHAMPION
                                                                     Vehicle Identification Number (VIN) on an automobile. The
system. Each AMC has the job of classifying the individuals
                                                                     VIN is a number that is used to keep track of the parts that
that exist in the system. To deal with different kinds of
                                                                     belong to a specific automobile. It is not true that any four
concepts, it is necessary to define different kinds of reasoners
within the AMCs. We have defined the following types of reasoning agents:

    • Subsumptive
    • Composite
        o Aggregate
        o Existential

   We will discuss each of these in the following sections.

A. Subsumptive Reasoning Agents
    Subsumption is rather straightforward. The knowledge representation language (OWL) used to
implement our governing domain ontology specifically defines the predicates for subclassing and
superclassing. A subsumptive agent examines the state of subscribed subgraphs and determines
whether the subgraph is subsumed by a higher-level class defined in the ontology. Consider the
following example:

    A subsumptive reasoning agent would be used to recognize that an asserted Vehicle was in
addition to being a Vehicle a

wheels, any engine, any fender, or any two bumpers sensed as inputs are the parts that make up an
automobile. There has to be a mechanism to assure us that these parts all belong to the same car.
This is the purpose of the binning property of a CHAMPION Composite Reasoning Agent: to make sure
that the parts are recognized as being parts of a specific whole.

D. Existential Composite Reasoning Agents
    Existential reasoning agents are very similar to aggregation reasoning agents in that they
have the capability to create a new individual if it is appropriate to do so. However, the
aggregate reasoning agent is looking for the sum of a whole, looking to entail the existence of a
thing if its necessary parts exist. An existential reasoning agent is looking to entail the
existence of a thing based on evidence that it should exist. As an example of existential
reasoning, if we know that a traffic ticket exists which identifies a particular license plate,
we can infer that a vehicle exists. In contrast, an example of aggregation reasoning would be if
we watched for vehicle parts and, when we found the parts necessary to make a vehicle, inferred
that a vehicle exists.
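    As a rough illustration of the distinctions drawn in this section, the following Python sketch
contrasts subsumptive, aggregate, and existential reasoning. It is not the CHAMPION implementation,
which relies on OWL and a DL reasoner. The toy class hierarchy, the part counts, the `car_id`
binning key, and the numeric `certainty` discount are all assumptions introduced here for
illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy ontology: child class -> parent class.
SUBCLASS_OF = {"Automobile": "Vehicle", "Vehicle": "Thing"}

# Parts assumed necessary to entail an automobile (counts are illustrative).
REQUIRED_PARTS = {"Wheel": 4, "Engine": 1, "Bumper": 2}


@dataclass(frozen=True)
class Individual:
    name: str
    cls: str
    certainty: float = 1.0  # illustrative score, not taken from the paper


def is_subsumed_by(cls, ancestor):
    """Subsumptive check: walk the subclass chain looking for `ancestor`."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def aggregate(parts):
    """Aggregate agent: bin parts by the car they belong to, then assert a
    new Automobile for every bin that holds all of the necessary parts."""
    bins = {}
    for part_cls, car_id in parts:
        bins.setdefault(car_id, Counter())[part_cls] += 1
    return [
        Individual(car_id, "Automobile")
        for car_id, have in sorted(bins.items())
        if all(have[p] >= n for p, n in REQUIRED_PARTS.items())
    ]


def existential(ticket_plate, discount=0.9):
    """Existential agent: a ticket naming a plate is evidence that a vehicle
    exists; the inferred individual carries less certainty than the ticket."""
    return Individual(f"vehicle-{ticket_plate}", "Vehicle", certainty=discount)
```

    In this sketch the binning step is what prevents an engine sensed for one car from completing
the part set of another; only a bin with every required part yields a new individual.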
    The assertion that a traffic ticket exists carries little uncertainty. The inference that a
vehicle exists based on the assertion of the traffic ticket carries a higher level of uncertainty
than the existence of the traffic ticket itself. There could not have been a violation without the
vehicle, but it may have been destroyed as a result of the violation. If we assert that it exists
based on the fact that a traffic ticket refers to it, we are propagating a level of uncertainty.

       VI.   AMC CLOCKWORKS – MAKING AMCS TICK
    CHAMPION AMCs comprise several components. The main component is a modified CBR mechanism.
We have customized a traditional approach to CBR in order to meet the design criteria established
early in our implementation.

A. Traditional Case Based Reasoning Cycle
    A traditional CBR cycle iterates through instances of cases in a case library. As a new case
is considered in traditional CBR, it is compared to each of the cases in the case library. If a
match is found, it is considered to be a solution to the new case. If an exact match is not found
in the case library, the closest match is modified to see if it can be made to match. If it can,
it is considered a solution and the modified case is added to the case library.

B. CHAMPION’s Modified Case Based Reasoning Cycle
    We chose to alter the traditional CBR cycle because the iterations through the case library to
find an exact match do not fit our functional requirement to use an invariant form to characterize
solutions.

    The CHAMPION CBR cycle does not iterate through instances of cases in a case library. As a new
problem case is considered, it is compared to semantic expressions to see if it qualifies (i.e.,
it belongs to the appropriate class) to be in the case library. A Description Logic (DL) reasoner
is used to examine the state of the new case; if that state entails that the classification is
true, the new case is added to the case library and published to the working memory (see
Figure 2).

    In traditional CBR the case library is used as a repository for cases that will be iteratively
compared to new input cases. This is not the purpose of the case library in our modified version
of CBR. The CHAMPION system maintains the case library for the purpose of statistical analysis.
The results of the statistical analysis can be used to improve the semantic expressions that
define whether or not the abstractions belong in the case library.

C. Processes of the AMCs
    The semantic expressions which define the class of objects recognized by the reasoning agents
are implemented in the form of Semantic Web Rule Language (SWRL) rules and equivalent class
expressions in OWL. The Reasoning Agents use a DL Reasoner to examine the state of the subscribed
abstractions and modify the data and object properties of the abstractions.

    A basic flow of the processes of an AMC:
        1.   Accept subscribed abstractions into local memory.
        2.   Acquire the requisite/relevant knowledge from contextual knowledgebases and assert
             it into local memory.
        3.   Apply SWRL rules to abstractions to check and modify their state (i.e., their data
             and object properties).
        4.   Check to see if the abstraction can be classified as the targeted type of the
             Reasoning Agent based on equivalent class expressions in the domain ontology.
        5.   If the DL reasoner has typed the abstraction as the targeted type, publish the
             abstraction to memory and add it to the case library of this agent.

    The purpose of the AMCs is to process abstractions (subscribed input) and decide if it is
appropriate to publish additional assertions. The additional assertions are not limited to
existing individuals, meaning that the AMCs can assert new named individuals if deemed
appropriate.

                         VII. AMC REGIONS
    The reasoning framework arranges the AMCs in a hierarchy. The lowest levels of the hierarchy
contain AMCs that subscribe to the abstractions published to the working memory by the reifiers.
The AMCs of the system have a publish and subscribe relationship with working memory (see
Figures 4 and 5).

    When a low-level AMC publishes an abstraction, a higher-level AMC may be a subscriber of that
type of abstraction. This is how abstractions propagate up the hierarchy. As mentioned earlier,
at the lowest levels in the hierarchy one expects that the abstractions contain very little
uncertainty. The higher an AMC is placed in the hierarchy, the more uncertainty its output
abstractions are likely to carry.

             Figure 4. AMCs Publish and Subscribe to and from Memory
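    The five-step AMC process flow and the publish/subscribe relationship with working memory can
be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the
CHAMPION code: plain dictionaries stand in for RDF subgraphs, a Python callable `rule` stands in
for the SWRL rules, a callable `classifier` stands in for the DL reasoner's equivalent-class
check, and the `PrintEvent`, `LargePrintJob`, and `SuspiciousPrintJob` types and their thresholds
are hypothetical.

```python
from collections import defaultdict


class WorkingMemory:
    """Minimal publish/subscribe store keyed by abstraction type."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.published = []

    def subscribe(self, abstraction_type, amc):
        self._subscribers[abstraction_type].append(amc)

    def publish(self, abstraction):
        self.published.append(abstraction)
        for amc in list(self._subscribers[abstraction["type"]]):
            amc.accept(abstraction)


class AMC:
    """One reasoning agent following the five-step flow (illustrative only)."""

    def __init__(self, memory, subscribes_to, target_type, rule, classifier,
                 context=None):
        self.memory = memory
        self.target_type = target_type
        self.rule = rule                # stand-in for SWRL rules
        self.classifier = classifier    # stand-in for the DL reasoner
        self.context = context or {}    # stand-in for contextual knowledge
        self.case_library = []          # kept for later statistical analysis
        memory.subscribe(subscribes_to, self)

    def accept(self, abstraction):
        local = dict(abstraction)            # 1. accept into local memory
        local.update(self.context)           # 2. acquire contextual knowledge
        local = self.rule(local)             # 3. apply rules to modify state
        if self.classifier(local):           # 4. check the targeted type
            local["type"] = self.target_type
            self.case_library.append(local)  # 5. retain the case and
            self.memory.publish(local)       #    publish it to memory


# Two-level hierarchy: the low-level AMC feeds the high-level one.
memory = WorkingMemory()
low = AMC(memory, "PrintEvent", "LargePrintJob",
          rule=lambda a: {**a, "large": a["pages"] > a["page_limit"]},
          classifier=lambda a: a["large"],
          context={"page_limit": 100})
high = AMC(memory, "LargePrintJob", "SuspiciousPrintJob",
           rule=lambda a: {**a, "after_hours": a["hour"] >= 20},
           classifier=lambda a: a["after_hours"])

memory.publish({"type": "PrintEvent", "pages": 500, "hour": 23})
memory.publish({"type": "PrintEvent", "pages": 3, "hour": 23})
```

    Note that no case in `case_library` is ever compared against an incoming abstraction: the
classification test alone decides membership, which is the difference from the traditional CBR
cycle described in Section VI.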
            Figure 5. Abstractions passing up the AMC hierarchy

                      VIII. APPLICATIONS
    The CHAMPION reasoning framework is being applied to a variety of advanced decision making
problem domains, including cyber security/counterintelligence, counterterrorism/weapons
nonproliferation, and smart grid power consumption analysis. A cybersecurity/counterintelligence
application focusing on countering the insider threat is illustrative.

    The insider threat refers to harmful acts that trusted individuals might carry out: acts that
cause harm to the organization or that benefit the individual. The insider threat is manifested
when human behaviors depart from established policies, regardless of whether the departure results
from malice or from disregard for security policies. The annual e-Crime Watch Survey conducted by
Carnegie Mellon’s CERT program reveals that for both the government and commercial sectors,
current or former employees and contractors pose the second greatest cybersecurity threat,
exceeded only by hackers; the financial impact and operating losses due to insider intrusions are
increasing [18,19].

    Modeling employee computer behaviors of concern using knowledge engineering methods serves as
a framework to explore the insider threat. A key to the identification of an insider threat is to
understand the signatures of suspicious activity and to disrupt it in its early stages. The main
objective of our research is the development, validation and improvement of knowledge discovery
automation tools for cyber security personnel that will significantly reduce the amount of manual
analysis while simultaneously improving the quality of perceived threat indicators [20].

    To create useful models, information is acquired from multiple sources including specialized
reports, open literature, and subject matter experts. This information is captured via interviews
with subject-matter experts (SMEs) and the development of concept maps based on domain expertise
and literature analysis.

    We conducted interviews of SMEs to capture information and priorities, to reveal how analysts
intuitively conduct risk profiling, and to understand how they gather information about the
purposes, goals and perceived risk mitigation outcomes of such activities. The information
acquired is formally represented ontologically; some of the information is stored in contextual
memory, and other information resides in ontologies that drive the AMCs and define the structure
of the hierarchy of reasoners for this application. Figure 6 illustrates the CHAMPION system
architecture within this application context.

    Another interesting application for this technology is understanding nuclear proliferation.
The nuclear fuel cycle is a large, complex process with many stages, dependencies, processes and
signatures. In the coming year the team will use the CHAMPION framework to provide a mechanism for
exploring the nuclear fuel cycle (NFC) and the logical relationships between the activities,
processes, and materials involved. Working with SMEs, the team will encode the necessary knowledge
into OWL to implement a proof-of-concept demonstration that will focus on a portion of the NFC.
As development continues, broader coverage of the NFC will be encoded.




    [Figure: a hierarchy of AMCs fed by data sources: host event logs, print logs, web server
    logs, search engine logs, email data, location data, personnel records, etc.]

            Figure 6. CHAMPION Framework in an insider threat monitoring application
                    IX.    CONCLUSIONS
    We have described a new approach to computational reasoning models that combines key aspects
of belief propagation networks, semantic web, Description Logics, and Case Based Reasoning to
yield a system best characterized as a memory-prediction framework. This framework is functionally
modeled after an interpretation of how the neocortex performs pattern recognition. It is
implemented as a hierarchy of reasoning agents that retain certain critical functional
requirements, yielding a domain-independent model that may be applied to a variety of decision
making problems.

    Earlier in this paper, we compared several extant approaches to problems in AI and noted the
drawbacks of using rational decision making models to characterize human performance, as
represented in typical BN models that rely on probability theory constructs. Similar issues apply
to models that use other probabilistic constructs, such as subjective expected utility theory.
Famous research programs conducted by Kahneman and Tversky [e.g., 21] demonstrate that human
decision making is not rational but is instead characterized by the use of heuristics (or
influenced by cognitive biases) that do not yield optimal decisions. The use of heuristics (and
what Kahneman [22] has described as "system 1 cognitive processes", which exploit intuition and
experience rather than procedural knowledge) is sometimes cited as a critical survival mechanism
that accounts for expert decision making by firefighters and other highly experienced individuals
who do not have time to systematically calculate and compare outcomes of alternative responses
[23]. A conceptual model that reflects this view is the "Recognition-Primed Decision Making Model"
(RPDM) offered by Gary Klein and collaborators [24]. In this regard, the basic structure of the
CHAMPION reasoning framework, rooted in the notion of the memory-prediction system, is highly
compatible with this view of expert decision making. Indeed, the CHAMPION framework represents one
method of implementing an operational version of an RPDM model. It is our hope that such a model,
fortified by recent computational methods adopted from semantic Web technologies, will provide a
major advancement in realizing the vision for joint cognitive systems for decision support.

                      ACKNOWLEDGMENT
    The authors wish to thank the CHAMPION team: Shawn Hampton, Samantha Curtis, Alex Gibson,
Michael Henry, Dan Johnson, Gary Kiebel, Peter Neorr, Patrick Paulson, Yekaterina Pomiak, and John
Schweighardt; and the Information and Infrastructure Integrity Initiative at Pacific Northwest
National Laboratory, which supported this work. The Pacific Northwest National Laboratory is
operated by Battelle for the U.S. Department of Energy under Contract DEAC06-76RL01830. PNNL
Information Release Number PNNL-SA-82834.

                        REFERENCES
[1]  D. D. Woods, "Cognitive technologies: the design of joint human-machine cognitive systems,"
     AI Magazine, vol. 6, pp. 86-92, 1985.
[2]  D. D. Woods, Joint Cognitive Systems: Patterns in Cognitive Systems Engineering. Boca Raton,
     FL: Taylor & Francis, 2006.
[3]  Institute of Medicine Forum on Neuroscience and Nervous System Disorders. From Molecules to
     Minds: Challenges for the 21st Century. Washington, DC: National Academy of Sciences, 2008.
[4]  B.G. Buchanan, "A (very) brief history of artificial intelligence," AI Magazine, vol. 26,
     pp. 53-60, 2005.
[5]  N.J. Nilsson, Artificial Intelligence: A New Synthesis. San Francisco, CA: Morgan Kaufmann
     Publishers, Inc., 1998.
[6]  R. Chrisley, ed. Artificial Intelligence: Critical Concepts, vols. 1-4. London: Routledge,
     Taylor & Francis Group, 2000.
[7]  D.J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge
     University Press, 2003.
[8]  P. Smolensky, "On the treatment of connectionism," Behavioral and Brain Sciences, vol. 11,
     pp. 1-23, 1988.
[9]  T.G. Dietterich, "Machine Learning," Annual Reviews in Computer Science, vol. 4, pp. 255-306,
     1990.
[10] M. Jones and B.C. Love, "Bayesian fundamentalism or enlightenment? On the explanatory status
     and theoretical contributions of Bayesian models of cognition," Behavioral and Brain Sciences
     (in press).
[11] R. Lopez De Mantaras, et al., "Retrieval, reuse, revision and retention in case-based
     reasoning," The Knowledge Engineering Review, vol. 20, pp. 215-240, 2005.
[12] I. Watson and F. Marir, "Case-based reasoning: a review," Knowledge Engineering Review,
     vol. 9, pp. 355-381, 1994.
[13] A. Aamodt and E. Plaza, "Case-based reasoning: foundational issues, methodological
     variations, and system approaches," AI Communications, vol. 7, pp. 39-59, 1994.
[14] A. Nouri and H. Nikmehr, "Hierarchical bayesian reservoir memory," Proceedings of the 14th
     International CSI Computer Conference (CSICC’09), pp. 582-587, 2009.
[15] J. Hawkins and S. Blakeslee. On Intelligence. New York: Henry Holt and Company, 2004.
[16] P. Buneman, S. Khanna, and W.C. Tan, "Why and where: A characterization of data provenance,"
     International Conference on Database Theory (ICDT), pp. 316-330, 2001.
[17] Y.L. Simmhan, B. Plale, and D. Gannon, "A survey of data provenance in e-Science," ACM SIGMOD
     Record, vol. 34, no. 3, Sept. 2005.
[18] CSO Magazine, U.S. Secret Service, Software Engineering Institute, CERT Program at Carnegie
     Mellon University and Deloitte. 2010 CyberSecurity watch survey - survey results.
[19] M. Keeney, et al. Insider Threat Study: Computer System Sabotage in Critical Infrastructure
     Sectors. U.S. Secret Service and Carnegie Mellon University, Software Engineering Institute,
     CERT Coordination Center. 2005.
[20] F. L. Greitzer and R. E. Hohimer, "Modeling human behavior to anticipate insider attacks,"
     Journal of Strategic Security, vol. 4, pp. 25-48, 2011. doi:10.5038/1944-0472.4.2.2.
[21] D. Kahneman and A. Tversky, "On the psychology of prediction," Psychological Review, vol. 80,
     pp. 237-251, 1973.
[22] D. Kahneman, "A perspective on judgement and choice: mapping bounded rationality," American
     Psychologist, vol. 58, pp. 697-720.
[23] G. Klein. Streetlights and Shadows: Searching for the Keys to Adaptive Decision Making.
     Cambridge, MIT Press, 2009.
[24] G.A. Klein, "A recognition primed decision (RPD) model of rapid decision making," in G.A.
     Klein, J. Orasanu, R. Calderwood and C.E. Zsambok, eds. Decision Making in Action: Models and
     Methods. Norwood, NJ: Ablex, pp. 138-147, 1993.