Supporting the Analytic Knowledge Manager: Formal Methods for Ontology Display and Management

Alan Chappell, Anthony Bladek, Cliff Joslyn, Eric Marshall, Liam McGrath, Patrick Paulson, Sean Stolberg, and Amanda White
Pacific Northwest National Laboratory

Abstract: The Intelligence Community (IC) and other analytic-focused communities are developing and implementing large knowledge bases and semantic-based systems. These systems require new activities for managing their ontological underpinning, ranging from supporting domain description and evolution to integrating multiple sources of semantic information. Beyond the role of the analyst or the traditional data base administrator, the role of the knowledge manager as the point of focus for such activities is growing in prominence. We are developing methods and tools to provide an analytical ability for the display and management of ontological systems, rooted in the formal properties of semantic relations in semantic graphs and of the semantic hierarchies in which they are valued. We describe methods for display, integration, and management of ontological resources to support the emerging Analytic Knowledge Manager with the AKEA tool.

Index terms: Knowledge management, knowledge manager, ontology visualization, ontology alignment.

I. INTRODUCTION

In this paper we address the needs of the “Analytic Knowledge Manager” (AKM), a hypothetical actor whose responsibility is to manage not the underlying data of an analytical organization, but rather the collection of its semantic information, ontologies, and schemata. The semantic domain of an enterprise is linked both to the content of its data and to the applications in which those data are used. Thus the AKM must respond to the needs of a particular analytical or scientific function in much the same way that the IT manager responds to the needs of the business function of an organization.

Since the AKM role is a relatively recent development, most organizations splinter the associated functions among multiple actors, each performing AKM functions as adjuncts to data processing pathways established before the organization incorporated semantic processing. These actors include the producers of the data; the end users of the data (e.g., intelligence analysts); those who store and provide access to the data (e.g., IT and DBMs); and intermediaries (e.g., web site managers, web programmers, information retrieval specialists, and anyone who must interpret, transform, or manipulate the data).

Large organizations typically provide partial support for AKM roles through formal groups. These groups include an IT department, a librarian, and a technical support group, each of which must understand and support multiple user communities within that organization. This ultimately leaves the end user, the intelligence analyst, with many of the tasks of the knowledge manager. These shared tasks include how to construct requests for data, how to access the resulting data, and how to integrate them into an analysis. And, since the required semantics of data can be lost or modified by the many de facto AKMs along the data delivery chain (including all those listed previously), it may be impossible for the analyst to retrieve information related to an intelligence problem, or the metadata necessary to determine the quality of the data.

We find that many workgroups within the IC already rely, formally or informally, on a selected member of the workgroup to assist others with AKM functions. This person is typically technology “savvy” and skilled in the use of a wide set of data access and transformation tools. Unfortunately, this ad hoc role is often under-recognized and under-resourced, which can add to the individual’s workload even as it enhances the effectiveness of the workgroup.

We argue that recognizing the AKM role, in terms of both its responsibilities and the support it requires, will allow an intelligence enterprise to find the data required for a particular analysis task more effectively, allow the analyst to understand the quality and provenance of data, and help prevent the analyst from being overwhelmed by data not pertaining to the current problem. Ultimately, a formal assessment of AKM roles may also help with understanding access control and separation-of-duty considerations.

In this paper we use the RASCI (Responsible, Accountable, Supportive, Consulted, Informed) framework [1] to define the AKM role, its responsibilities, and the support it requires. We then describe typical AKM tasks in the context of ontology management, including analysis and linkage. We describe our approach to supporting such AKM tasks on ontologies through formal analysis of the mathematical properties of link types, and in particular through the manipulation of semantic hierarchies. We conclude by illustrating our implementation of these methods within the AKEA tool.

II. BACKGROUND

Knowledge Management (KM) is a discipline that strives to organize and preserve knowledge, making it accessible to the enterprise [8]. In the domain of intelligence analysis, the primary knowledge is fluid and tied to specific analytical problems.

A. The Potential Roles of AKMs

We envision the following to be the primary responsibilities of the AKM:
   1. To enable analysts to access information that fulfills the requirements of a particular analytic problem
   2. To provide queries to data sources using the semantics and syntax expected by the data source
   3. To interpret the provided information within the context of the analytic problem, and
   4. To provide the supporting data required to determine the quality and provenance of the delivered information.

Table I applies RASCI charting to describe the relationship of the AKM role to the other roles within the intelligence organization. For each activity in a process, a RASCI chart identifies who is responsible for carrying out the activity, who is accountable for the result, who provides support for the activity, who is consulted in carrying out the activity, and who is informed about the status of the activity. In Table I, we list only the activities for which the AKM is responsible (R). In carrying out or enabling these activities, the AKM may have to consult (C) with other roles. For example, in order to determine the type of data that corresponds to a request from an analyst, the AKM will need to consult with the analyst and subject matter experts to build a representation of the terminology used in the problem domain and the relationships between terms. Once an activity is completed, other roles may have to be informed (I); for example, the analyst must be informed when requested data has been delivered. Finally, the chart specifies to whom the AKM is accountable (A) for the specified activity. As an example, the analyst acknowledges that delivered data matches their requirements and that it is placed in the proper context in their analysis tools.

TABLE I
RASCI MATRIX FOR AKM

Activity                                                                | AKM | Analyst | Librarian/Manager of Data Source
Provide analyst with appropriate data                                   | R   | C/I/A   | C
Provide queries in the semantics of the data source                     | R   | C       | C/I/A
Interpret delivered data in the context of specified analytical problem | R   | C/I/A   | -
Provide provenance and metadata for interpretation of data quality      | R   | I/A     | C

B. The Current State: de Facto AKMs

Given the complexity of the AKM’s task, the heavy dependence on knowledge that is tightly bound to particular problems and problem domains, and the need to access data in many formats and from many sources, each with its own semantics, it is understandable that the AKM role has either been ignored or distributed to other parts of the organization. This leads to a lack of responsibility and accountability for the activities that should belong to the AKM.

Table II applies the RASCI chart method to this current state of affairs. The analyst is both responsible and held accountable for almost all activities, which leaves no check on the suitability of data for an analytic task. Responsibility is often split between the analyst, who must use data supplied by tools, and the developers of the tools that deliver the data. Having multiple roles responsible for the same activity can lead to conflict and to the activity not being completed, since the ‘buck’ doesn’t stop at a specific doorstep.

TABLE II
RASCI MATRIX FOR THE REAL WORLD

Activity                                               | Analyst | Librarian/Manager of Data Source | Programmer/Developer
Understand problem semantics                           | R/A     | -                                | -
Explore data sources                                   | R/A     | -                                | R/C
Create appropriate queries for data sources            | R/A     | C                                | R/C
Interpret delivered data within problem domain         | R/A     | -                                | R
Provide provenance and metadata to ensure data quality | R/A     | C                                | R

C. Assessing the Needs of the AKM

In order to carry out the primary responsibilities described above, the knowledge manager must further:
   1. Understand the data requirements of the user community in terms of the semantics of the particular problem domains of interest to that community
   2. Explore and understand potential information sources and determine the relevance of the provided data to the
      community of interest
   3. Adapt requests for data to the format required by the selected data source, without losing the semantic meaning implied by the request or the retrieved data
   4. Support delivery of information to analysts and analytic systems using representations and semantics appropriate to its intended use
   5. Provide appropriate metadata (such as the original source, publication date, and processing workflow) for any data provided in response to a request
   6. Ensure that security protocols are invoked properly so that data are available only to those with the necessary credentials for obtaining the data

Given these requirements, one of the primary needs of the AKM is an ontology that describes the semantics of the target analytical domain. The ontology describes the semantic meaning of potential queries and the relationships between terms. It describes basic attributes of terminology, such as composition or subsumption, and may also describe more advanced notions, such as formal definitions of terms by means of primitive assertions. The knowledge manager will most likely need to develop or adapt much of this ontology so that it serves the needs of the user community, and will use a variety of tools to present the ontology to end users in order to validate its content and ensure its consistency.

In order to determine whether a potential data source will be useful to their knowledge consumers, AKMs must be able to access both the semantics of a data source, preferably through an ontology, and the relationship of the data delivered by the data source to the source’s domain ontology. This can be a large bottleneck, since many data sources provide neither. The resulting lack of formalized knowledge forces the knowledge manager to define both using whatever sources are available, including database schemas, XML schemas, and, mostly, common sense. Identifying the correct semantics of data retrieved from the source can be particularly onerous, potentially requiring specialized tools to scrape source documents, information extraction software employing natural language processing, and, in the worst case, hand annotation.

In order to provide data that meets an analyst’s needs, the knowledge manager needs to understand the relationship between the terms used by client analysts and the terminology used by specific data sources. Visualizing and understanding these relationships is at the core of generating appropriate queries and presenting data within the analyst’s problem context.

Finally, the AKM must have access to metadata describing data provenance. For each data element, metadata describing its source (e.g., the date of publication and the original source) and documenting its history of analytic or preparatory steps should be made available to end users. Such metadata enable users both to understand the quality of the delivered data and to repeat results.

We now describe tools that are relevant to supporting the AKM functions within the intelligence organization.

1) Tools for the Analyst as the AKM: Often the AKM role is delegated entirely to the analyst, who must determine the best keywords to use to bridge from requirements to the documents of a data source, and must understand the terminology used across multiple disciplines. The advantage of this approach is that the analyst has direct knowledge of the source and provenance of the data that is obtained. However, few tools are provided to support the analyst in the AKM role beyond the analyst’s firm grounding in select disciplines and ability to adapt quickly to changing conditions and new data sources.

2) AKM Tools for the Data-Base Manager: For structured data sources, there is at least some schema or description of the type of data that can be expected, and possibly even some business rules that can be used to infer relationships between data. Here standard structured data tools such as schema editors and query engines can be used, with the assistance of a knowledgeable data-base manager (DBM), to deliver appropriately annotated data to the analyst.

Intelligence organizations also work with their own knowledge and data repositories. These repositories can have some known data semantics and relationships, although those semantics are often only loosely related to individual problem semantics. The data manager can use standard database management tools to organize and provide these repositories. While an experienced DBM may understand the semantics of stored data in ways that could be of use to the analyst and application programmers, they may not be able to address questions about semantics beyond what is needed to provide reliable performance and data security.

3) Tools for the AKM Role of Application Programmers and Web Developers: AKM tasks are also supported by application programmers and web developers who provide analytical tools. Often it is left to a programmer to determine where a required piece of data resides within a source repository and where to map that data into the analyst’s resident databases. It is also up to the designers and implementers of these tools to ensure that all requisite provenance and metadata are carried along with the data; failing to make this requirement known may result in the analyst obtaining interesting, but unusable, information.

There are also few tools to support application programmers in the AKM role. They are left with the same tools as the DBM, along with less structured tools such as XML schemas and tags, to determine the semantics of data they obtain from web sites and other sources. The programmer must not only coordinate with the DBM to determine where best to store mined data within a structured data store, but also use test cases and user acceptance tests to verify that the data delivered is displayed with the correct
semantics in deployed tools. These approaches can be effective when such defined requirements are available, but can also be cumbersome and limiting in the dynamic environment of intelligence analysis.

III. FORMAL SUPPORT FOR THE AKM ROLE

The AKM responsibilities revolve around the generation, maintenance, description, and alignment of ontologies, both for the problem domain of the client analysts and for available data sources. Tools to support these tasks are only now emerging from the research community [9], and often require a large investment of time to master. Given that the AKM role is now mostly filled by application developers, DBMs, and end users such as intelligence analysts, who already need to master a large number of processing tools, disciplines, and subject matter areas, it is not surprising that these tools are often poorly understood and underutilized.

Current tools for ontology generation and maintenance are generally ontology editors. But AKMs require additional tools to help them accomplish such tasks as:
   • Representing domain and source ontologies to end users to enable validation and understanding
   • Mapping or aligning the semantics of data sources to the analyst’s problem domain
   • Aiding in the generation of ontologies for new or evolving problem domains

These tools and techniques can also be applied to the metadata associated with data sources, making data quality and provenance available to the intelligence analyst.

Our approach rests on being sensitive to the mathematical properties of the link types present in an ontology, and in particular to their symmetric and transitive properties. Table III shows the primary classes of link types in terms of these mathematical properties, together with their canonical mathematical structures and a simple example.

TABLE III
LINK TYPE PROPERTIES

Canonical structure | Transitive | Symmetric | Example
Directed graph      | No         | No        | A knows B
Simple graph        | No         | Yes       | A friend of B
Partial order       | Yes        | No        | A employer of B
Equivalence classes | Yes        | Yes       | A sibling of B

In practice, ontologies are dominated by their “hierarchical cores”, specifically their class hierarchies connected by “is-a” subsumptive and “has-part” compositional links. Mathematically, these are partial orders, each corresponding to the transitive, non-symmetric link types exemplified in Table III by the link type “employs”. Additionally, many of the most common links in RDF graphs are transitive, including “causes”, “implies”, and “precedes”. Any transitive, non-symmetric link yields the mathematical structure of a partial order, and makes the machinery of order theory [2] available to exploit these hierarchical constraints. In our past work, we have described techniques based in order theory to support a variety of AKM tasks, including:
   • Clustering and Classification: Characterizing a portion of a hierarchy (e.g., groups of ontology nodes) to identify common characteristics [10].
   • Alignment: Casting ontology matching [3] as mappings between hierarchical structures [4].
   • Induction from Source Data: Using concept lattices to induce ontologies from textual relations [5].
   • Visualization: Exploiting the vertical level structure of semantic hierarchies to achieve a satisfactory layout [6].

In general, such a hierarchical analysis, when available, promises complexity reduction, improved user interaction with the knowledge base, and improved layout and visual analytics. Fig. 1 shows a fragment of a semantic graph using the link types present in Table III. Once the hierarchical link type “employs” is identified, the fragment can be laid out according to the hierarchical layout shown in Fig. 2, the
remaining, non-hierarchical link types arranged around the central hierarchical structure. The result is a great clarification of the underlying link structure.

Fig. 1. A simple semantic graph. (Nodes: Emily, Mary, Bill, Joe, Steve, Ted, Carol; link labels: Knows, Employer, Friend, Sibling.)

Fig. 2. Semantic graph laid out by the hierarchical link type “employs.”
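
The layering step behind a layout like Fig. 2 can be illustrated with a minimal sketch. It assumes each link type is declared with its (transitive, symmetric) properties, maps those pairs to the canonical structures of Table III, treats the partial-order link types as hierarchical, and assigns every node joined by such links a vertical level using longest-path layering over a topological sort (Kahn’s algorithm). The declarations, the sample triples, and the names hierarchical_link_types and layer_by are hypothetical, and longest-path layering is one common choice for layered drawings, not necessarily the method of [6] or of AKEA.

```python
from collections import defaultdict, deque

# Canonical structure for each (transitive, symmetric) pair, following Table III.
STRUCTURE = {
    (False, False): "directed graph",
    (False, True): "simple graph",
    (True, False): "partial order",
    (True, True): "equivalence classes",
}

# Hypothetical (transitive, symmetric) declarations for each link type.
LINK_PROPERTIES = {
    "knows": (False, False),
    "friend": (False, True),
    "employs": (True, False),
    "sibling": (True, True),
}


def hierarchical_link_types(props):
    """Return the link types whose canonical structure is a partial order."""
    return [t for t, p in props.items() if STRUCTURE[p] == "partial order"]


def layer_by(triples, link_type):
    """Give each node touched by link_type a level: its longest path from a root.

    triples are (source, link_type, target). Links of other types are ignored
    here; a layout would place them around the resulting levels.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, kind, dst in triples:
        if kind == link_type:
            children[src].append(dst)
            indegree[dst] += 1
            nodes.update((src, dst))
    roots = [n for n in nodes if indegree[n] == 0]
    level = {n: 0 for n in roots}
    queue = deque(roots)
    processed = 0
    while queue:  # Kahn's topological sort, pushing levels downward
        node = queue.popleft()
        processed += 1
        for child in children[node]:
            level[child] = max(level.get(child, 0), level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if processed != len(nodes):
        raise ValueError(f"{link_type!r} links contain a cycle")
    return level


# Hypothetical fragment (not the data of Fig. 1).
TRIPLES = [
    ("a", "employs", "b"), ("a", "employs", "c"),
    ("b", "employs", "d"), ("c", "employs", "d"),
    ("b", "friend", "c"), ("d", "knows", "e"),
]

for link in hierarchical_link_types(LINK_PROPERTIES):
    levels = layer_by(TRIPLES, link)
    print(link, levels, "height:", max(levels.values()))
# prints: employs {'a': 0, 'b': 1, 'c': 1, 'd': 2} height: 2
```

The largest level is the height of the hierarchy (its longest chain). Links of other types, such as friend and knows, are ignored by the layering and would be drawn around the resulting levels.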

Additionally, mathematical properties of the semantic hierarchy, and of particular nodes within it, can be revealed to the user. Especially in large semantic hierarchies, where graph drawing and visualization are difficult, it can be critical to report such quantities as:
   • The number of nodes
   • “Edge density”: number of links per node
   • “Leaf density”: percentage of nodes that are terminals
   • Height: maximum chain length from the top to the bottom
   • Amount of multiple inheritance: percentage of nodes with more than one parent
These quantities are computed over the whole semantic hierarchy. Additionally, it is useful to be able to provide quantitative assessments of individual nodes in the hierarchy, for example:
   • Depth: number of levels down from the top
   • Height: number of levels up from the bottom
   • Number of children
   • Number of total descendants
   • Number of parents
   • Number of total ancestors

Such quantifications are very useful when performing alignment tasks. Fig. 3 shows a small example of an alignment between two semantic hierarchies. Our prior work [4] has proposed methods for measuring the quality of such alignments based on such measures. And when alignment is performed interactively within a GUI-based tool suite such as PROMPT within the Protégé tool [7], augmenting it with such statistics will provide the AKM with the context needed to understand the quality of the proposed mappings. For example, in Fig. 3, it is valuable to map nodes high in the structure on the left to those high in the structure on the right, which requires the kind of quantification we have proposed here.

Fig. 3. A simple semantic hierarchy alignment example.

IV. IMPLEMENTATION WITHIN THE AKEA TOOL

The methods proposed above are being implemented within the Analyst-Driven Knowledge Enhancement and Analysis (AKEA) tool at the Pacific Northwest National Laboratory. AKEA was created for clients within the IC as an environment for testing analyst interaction with semantically labeled data and for enabling automation-supported knowledge-level analysis over the contents of structured and unstructured sources. While AKEA is ontology agnostic, it depends on data representations that are ontologically
backed in order to provide the variety of visualization and analytic capabilities it offers.

For this effort we exploited and extended AKEA’s capabilities to additionally support activities of the AKM. While many aspects of the AKM roles were already addressed, these capabilities needed to be focused more directly on the ontology itself rather than on instance data represented using the ontology.

Fig. 4. The AKEA tool showing a portion of an ontology used within the intelligence community. A portion of the “event” class hierarchy is linked to a portion of the “entity” hierarchy through the selected “from-organization” property.

Fig. 5. Relation types for hierarchical layout and filtering.

The first step in this support was direct visualization of the ontology. Because of the complex nature of the classes and relationships typically described within an ontology, typical link-node layouts fail to communicate meaningfully. However, by integrating the visualization approaches described above, layouts appropriate to understanding the

V. CONCLUSIONS

The advent of knowledge-based systems and supporting knowledge bases is expanding the role of the Analytic Knowledge Manager and making it more critical. While IC personnel already perform these activities, current organizational systems and structures lend themselves to fractured and less than effective execution. By clearly articulating these activities, the roles and responsibilities involved, and the resulting support needs, the IC can begin to move toward better recognition of the importance and value of the AKM. Such recognition will help bring about the systemic changes necessary to realize the full value of investments in ontologically based systems, make that value more widely available, and make these technologies more readily applicable to the dynamic problems encountered by the intelligence analyst.

ACKNOWLEDGMENT

This work was funded by Battelle Memorial Institute under the Threat Anticipation Initiative.

REFERENCES

1. Bonacorsi, S: (2008) “RACI Diagram/RASCI Matrix - A Complete Definition”, The Project Management Hut
2. Davey, BA and Priestley, HA: (1990) Introduction to Lattices and Order, Cambridge UP, Cambridge UK, 2nd Edition
3. Euzenat, Jerome and Shvaiko, P: (2007) Ontology Matching, Springer-Verlag, Heidelberg
4. Joslyn, Cliff; Donaldson, Alex; and Paulson, Patrick: (2008) “Evaluating the Structural Quality of Semantic Hierarchy Alignments”, Int. Semantic Web Conf. (ISWC 08), http://dblp.uni-trier.de/db/conf/semweb/iswc2008p.html#JoslynDP08
5. Joslyn, Cliff; Paulson, Patrick; and Verspoor, KM: (2008) “Exploiting Term Relations for Semantic Hierarchy Construction”, Proc. Int. Conf.
conceptual and relational structures of the ontology begin to               Semantic Computing (ICSC 08), pp. 42-49, IEEE Computer Society,
address this problem. Fig. 4 provides a snapshot of an                      Los Alamitos CA
ontology presented in the AKEA ontology viewer using the                6. Joslyn, Cliff; Mniszewski, SM; Smith, SA; and Weber, PM: (2006)
                                                                            “SpindleViz: A Three Dimensional, Order Theoretical Visualization
subsumption hierarchy to drive layout. Fig. 5, left side, shows             Environment for the Gene Ontology”, Joint BioLINK and 9th Bio-
the controls for selecting among transitive relationships to                Ontologies Meeting (JBB 06), http://www.bio-
view other concept structure. At the right of Fig. 5 is the                 ontologies.org.uk/2006/download/Joslyn2EtAlSpindleviz.pdf}
                                                                        7. Noy, Natasha and Musan, Mark A: (2003) “The PROMPT Suite:
relationship filters used to de-clutter the display. Since the              Interactive Tools for Ontology Merging and Mapping”, Int. J. Human-
sheer number of relationships in most ontologies would                      Computer Studies, v. 59, pp.983-1024
obscure the concept structure, this allows the analyst to focus         8. D.E. O'Leary, “Enterprise knowledge management,” Computer, vol.
                                                                            31, no. 3, 1998, pp. 54-61.
on only the specific relationships of interest at any given time        9. T Tudorache, N.F. Noy, S. Tu, M. A. Musen: (2008) “Collaborative
to fully understand interactions between the concept                        Ontology Development in Protégé”, 7th International Semantic Web
structures and relationships.                                               Conference (ISWC 2008), Karlsruhe, Germany, Springer.
   Future work with AKEA will address additional activities             10. Verspoor, KM; Cohn, JD; Mniszewski, SM; and Joslyn, CA: (2006) “A
                                                                            Categorization Approach to Automated Ontological Function
of the AKM. Work is already underway to incorporate the                     Annotation”, Protein Science, v. 15, pp. 1544-1549
structural characterization statistics of the ontology and of
classes and relationships. However, the most important
change will be the ability to address multiple ontologies. This
will enable the visualization, analysis and creation of
alignment mappings between ontologies for communication,
documentation, and automated translation needs.