Supporting the Analytic Knowledge Manager: Formal Methods for Ontology Display and Management

Alan Chappell, Anthony Bladek, Cliff Joslyn, Eric Marshall, Liam McGrath, Patrick Paulson, Sean Stolberg, and Amanda White
Pacific Northwest National Laboratory

Abstract: The Intelligence Community (IC) and other analytic-focused communities are developing and implementing large knowledge bases and semantic-based systems. These systems require new activities for managing their ontological underpinning, including a range of tasks from supporting domain description and evolution to integrating multiple sources of semantic information. Beyond the role of the analyst or the traditional database administrator, the role of the knowledge manager as the point of focus for such activities is growing in prominence. We are developing methods and tools to provide an analytical ability for the display and management of ontological systems, rooted in the formal properties of semantic relations in semantic graphs, and the semantic hierarchies in which they are valued. We describe methods for display, integration, and management of ontological resources to support the emerging Analytic Knowledge Manager with the AKEA tool.

Index terms: Knowledge management, knowledge manager, ontology visualization, ontology alignment.

I. INTRODUCTION

In this paper we address the needs of the “Analytic Knowledge Manager” (AKM), a hypothetical actor whose responsibilities are to manage not the underlying data of an analytical organization, but rather the collection of its semantic information, ontologies, and schemata. The semantic domain of an enterprise is linked both to the content of its data and the applications in which those data are used. Thus the AKM must respond to the needs of a particular analytical/scientific function in much the same way that the IT manager responds to the needs of the business function of an organization.

Since the role of the AKM is a relatively recent evolution, most organizations splinter the associated functions among multiple actors, each performing AKM functions as adjuncts to data processing pathways established before the organization incorporated semantic processing. These actors include the producers of the data; the end users of the data (e.g. intelligence analysts); those who store and provide access to the data (e.g. IT and DBMs); and intermediaries (e.g. web site managers, web programmers, information retrieval specialists, and anyone who must interpret, transform, or manipulate the data).

Large organizations typically provide partial support for AKM roles through formal groups. These groups include an IT department, a librarian, and a technical support group, each of which must understand and support multiple user communities within that organization. This ultimately leaves the end user, the intelligence analyst, with many of the tasks of the knowledge manager. These shared tasks include: how to construct requests for data, how to access the resultant data, and how to integrate them into an analysis. And, since the required semantics of data can be lost or modified by the many de facto AKMs along the data delivery chain (including all those listed previously), it may be impossible for the analyst to retrieve information related to an intelligence problem or the metadata necessary to determine the quality of data.

We find that many workgroups within the IC already rely, formally or informally, on a selected member of the workgroup to assist others with AKM functions. This person typically is technology “savvy” and skilled in the use of a wide set of data access and transformation tools. Unfortunately, this ad hoc role often is under-recognized and under-resourced, which can exacerbate the workload of the individual even while enhancing the effectiveness of the workgroup.

We argue that a recognition of the AKM role in terms of its responsibilities and the support it requires will allow an intelligence enterprise to more effectively find the data required for a particular analysis task, allow the analyst to understand the quality and provenance of data, and help prevent the analyst from being overwhelmed by data not pertaining to the current problem. Ultimately, a formal assessment of AKM roles may assist with understanding access control and separation of duty considerations.

In this paper we use the RASCI (Responsible, Accountable, Supportive, Consulted, Informed) framework [1] to define the AKM role, its responsibilities, and the support required for the AKM role. We then describe typical AKM tasks in the context of ontology management, including analysis and linkage. We describe our approach to supporting such AKM tasks on ontologies through the formal analysis of the mathematical properties of link types, and in particular the manipulation of semantic hierarchies. We conclude by illustrating our implementation of these methods within the AKEA tool.

II. BACKGROUND

Knowledge Management (KM) is a discipline that strives to organize and preserve knowledge, making it accessible to the enterprise [8]. In the domain of intelligence analysis, the primary knowledge is fluid and tied to specific analytical problems.

A. The Potential Roles of AKMs

We envision the following to be the primary responsibilities of the AKM:
1. To enable access to information by analysts that fulfills the requirements of a particular analytic problem
2. To provide queries to data sources using the semantics and syntax expected by the data source
3. To interpret the provided information within the context of the analytic problem, and
4. To provide the supporting data required to determine the quality and provenance of the delivered information.

Table I applies RASCI charting to describe the relationship of the AKM role to the other roles within the intelligence organization. For each activity in a process, a RASCI chart identifies who is responsible for carrying out the activity, who is accountable for the result, who provides support for the activity, who is consulted in carrying out the activity, and who is informed about the status of the activity. In Table I, we list only the activities for which the AKM is responsible (R). In carrying out or enabling the activities, the AKM may have to consult (C) with other roles. For example, in order to determine the type of data that corresponds to a request from an analyst, the AKM will need to consult with the analyst and subject matter experts in order to build a representation of the terminology used in the problem domain and the relationships between terms. Once an activity is completed, other roles may have to be informed (I); for example, the analyst must be informed when requested data has been delivered. Finally, the chart specifies to whom the AKM is accountable (A) for the specified activity. As an example, the analyst acknowledges that delivered data matches their requirements and that it is placed in the proper context in their analysis tools.

TABLE I
RASCI MATRIX FOR AKM
Activity | AKM | Analyst | Librarian/Manager of Data Source
Provide analyst with appropriate data | R | C/I/A | C
Provide queries in the semantics of the data source | R | C | C/I/A
Interpret delivered data in the context of specified analytical problem | R | C/I/A | -
Provide provenance and metadata for interpretation of data quality | R | I/A | C

B. The Current State: de Facto AKMs

Given the complexity of the AKM’s task, the heavy dependence on knowledge that is tightly bound to particular problems and problem domains, and the need to access data with many formats and from many sources (each with its own set of semantics), it is understandable that the role of AKM either has been ignored or distributed to other parts of the organization. This leads to a lack of responsibility and accountability for the activities that should belong to the AKM.

Table II applies the RASCI chart method to this current state of affairs. The analyst is both responsible and held accountable for almost all activities, allowing no check on the suitability of data for an analytic task. Responsibility is often split between the analyst, who must use data supplied by tools, and the developers of tools that deliver data. Having multiple roles responsible for the same activity can lead to conflict and to the activity not being completed, since the ‘buck’ doesn’t stop at a specific doorstep.

TABLE II
RASCI MATRIX FOR THE REAL WORLD
Activity | Analyst | Librarian/Manager of Data Source | Programmer/Developer
Understand problem semantics | R/A | - | -
Explore data sources | R/A | - | R/C
Create appropriate queries for data sources | R/A | C | R/C
Interpret delivered data within problem domain | R/A | - | R
Provide provenance and metadata to ensure data quality | R/A | C | R

C. Assessing the Needs of the AKM

In order to carry out the described primary responsibilities, the knowledge manager must further:
1. Understand the data requirements of the user community in terms of the semantics of the particular problem domains of interest to that community
2. Explore and understand potential information sources and determine the relevance of the provided data to the community of interest
3. Adapt requests for data to the format required by the selected data source, without losing the semantic meaning implied by the request or the retrieved data
4. Support delivery of information to analysts and analytic systems using representations and semantics appropriate to its intended use
5. Provide appropriate metadata, such as the original source, publication date, and processing workflow, of any data provided in response to a request
6. Ensure that security protocols are invoked properly so that data are only available to those with the necessary credentials for obtaining the data

Given these requirements, one of the primary needs of the AKM is an ontology that describes the semantics of the target analytical domain. The ontology describes the semantic meaning of potential queries and the relationship between terms. The ontology describes basic attributes of terminology, such as composition or subsumption, and may also describe more advanced notions, such as formal definitions of terms in terms of primitive assertions. The knowledge manager will most likely need to develop or adapt much of this ontology so that it serves the needs of the user community, and will use a variety of tools to present the ontology to end users in order to validate its content and to ensure its consistency.

In order to determine if a potential data source will be useful to their knowledge consumers, AKMs must be able to access both the semantics of a data source, preferably through an ontology, and the relationship of the data delivered by the data source to the source’s domain ontology. This can be a large bottleneck, since many data sources provide neither. The resulting lack of formalized knowledge forces the knowledge manager to define both of these using whatever sources are available, including database schemas, XML schemas, and (mostly) common sense. Identifying the correct semantics of data retrieved from the source can be particularly onerous, potentially requiring specialized tools to scrape source documents, information extraction software employing natural language processing, and, in the worst case, hand annotation.

In order to provide data that meets an analyst’s needs, the knowledge manager needs the ability to understand the relationship between terms used by client analysts and the terminology used by specific data sources. Visualizing and understanding these relationships is at the core of generating appropriate queries and presenting data within the analyst’s problem context.

Finally, the AKM must have access to metadata describing data provenance. For each data element, metadata describing its source (e.g. the date of publication, the original source, etc.) and documenting its history of analytic or preparatory steps should be made available to end users. Such metadata enable users to understand the quality of the delivered data as well as enabling repetition of results.

We now describe tools which are relevant to support the AKM functions within the intelligence organization.

1) Tools for the Analyst as the AKM: Often the AKM role is delegated entirely to the analyst, who must determine the best key words to use to bridge from requirements to the documents of a data source, and understand the terminology used across multiple disciplines. The advantage to this approach is that the analyst has direct knowledge of the source and provenance of the data that is obtained. However, few tools are provided to support the analyst within the AKM role beyond the firm grounding of the analyst in select disciplines and the ability of the analyst to quickly adapt to changing conditions and new data sources.

2) AKM Tools for the Data-Base Manager: For structured data sources, there is at least some schema or description of the type of data that can be expected, and maybe even some business rules that can be used to infer relationships between data. Here standard structured data tools such as schema editors and query engines can be used, with the assistance of a knowledgeable data-base manager (DBM), to deliver appropriately annotated data to the analyst.

Intelligence organizations also work with their own knowledge and data repositories. These repositories can have some known data semantics and relationships, although those semantics often are only loosely related to individual problem semantics. The data manager can use standard database management tools to organize and provide these repositories. While an experienced DBM may have an understanding of the semantics of stored data that could be of use to the analyst and application programmers, they may not be able to address questions about semantics outside of what is needed to provide reliable performance and data security.

3) Tools for AKM Role of Application Programmers and Web Developers: AKM tasks are also supported by application programmers and web developers that provide analytical tools. Often it is left to a programmer to determine where a required piece of data resides within a source repository and where to map that data into the analyst’s resident databases. It also is up to the designers and implementers of these tools to ensure that all requisite provenance and metadata are carried along with the data; failing to make this requirement known may result in the analyst obtaining interesting, but unusable, information.

There are also few tools to support the AKM role of application programmers, who are left with the same tools as the DBM, along with less structured tools such as XML schemas and tags, to determine the semantics of data they obtain from web sites and other sources. The programmer must not only coordinate with the DBM to determine where best to store mined data within a structured data store, but must also use test cases and user acceptance tests to verify that the data delivered is displayed with the correct semantics in deployed tools. These approaches can be effective when such defined requirements are available, but can also be cumbersome and limiting in the dynamic environment of intelligence analysis.

III. FORMAL SUPPORT FOR THE AKM ROLE

The AKM responsibilities revolve around the generation, maintenance, description, and alignment of ontologies for both the problem domain of the client analysts and of available data sources. Tools to support this task are only now emerging from the research community [9], and often require a large investment of time to master. Given that the AKM role is mostly filled now by application developers, DBMs, and end users such as intelligence analysts, who already need to master a large number of processing tools, disciplines, and subject matter areas, it’s not surprising that these tools are often not understood and are underutilized.

Current tools for ontology generation and maintenance are generally ontology editors. But AKMs require additional tools to help them accomplish such tasks as:
• Representing domain and source ontologies to end users to enable validation and understanding
• Mapping or aligning the semantics of data sources to the analyst’s problem domain
• Aiding in the generation of ontologies for new or evolving problem domains
These tools and techniques can also be applied to the metadata associated with data sources to allow data quality and provenance to be available to the intelligence analyst.

Our approach rests on being sensitive to the mathematical properties of the link types present in an ontology, and in particular to their symmetric and transitive properties. Table III shows the primary classes of link types in terms of these mathematical properties, together with their canonical mathematical structures and a simple example.

TABLE III
LINK TYPE PROPERTIES
Structure | Transitive | Symmetric | Example
Directed graph | No | No | A knows B
Simple graph | No | Yes | A friend of B
Partial order | Yes | No | A employer of B
Equivalence classes | Yes | Yes | A sibling of B

In practice, ontologies are dominated by their “hierarchical cores”, specifically their class hierarchies connected by “is-a” subsumptive and “has-part” compositional links. Mathematically, these are partial orders, each corresponding to the transitive, non-symmetric link types exemplified in Table III by the link type “employs”. Additionally, many of the most common links in RDF graphs are transitive, including “causes”, “implies”, and “precedes”. Any transitive link yields a mathematical structure of a partial order, and makes the machinery of order theory [2] available to exploit these hierarchical constraints. In our past work, we have described techniques based in order theory to support a variety of AKM tasks, including:
• Clustering and Classification: Characterizing a portion of a hierarchy (e.g. groups of ontology nodes) to identify common characteristics [10].
• Alignment: Casting ontology matching [3] as mappings between hierarchical structures [4].
• Induction from Source Data: Using concept lattices to induce ontologies from textual relations [5].
• Visualization: Including exploiting the vertical level structure of semantic hierarchies to achieve a satisfactory layout [6].

Fig. 1. A simple semantic graph. (Nodes: Emily, Mary, Bill, Joe, Steve, Ted, Carol; link types: Knows, Employer, Sibling, Friend.)

Fig. 2. Semantic graph laid out by the hierarchical link type “employs.”

In general, such a hierarchical analysis, when available, promises complexity reduction, improved user interaction with the knowledge base, and improved layout and visual analytics. Fig. 1 shows a fragment of a semantic graph using the link types present in Table III. Once the hierarchical link type “employs” is identified, the fragment can be laid out according to the hierarchical layout shown in Fig. 2, with the remaining, non-hierarchical link types moving around the central hierarchical structure.
The result is a great clarification of the underlying link structure.

Additionally, mathematical properties of the semantic hierarchy, and of particular nodes within it, can be revealed to the user. Especially in large semantic hierarchies where graph drawing and visualization is difficult, it can be critical to report such quantities as:
• The number of nodes
• “Edge density”: number of links per node
• “Leaf density”: percentage of nodes which are terminals
• Height: maximum chain length from the top to the bottom
• Amount of multiple inheritance: percent of nodes with more than one parent
These quantities are over the whole semantic hierarchy. Additionally, it is useful to be able to provide quantitative assessments of individual nodes in the hierarchy, for example:
• Depth: Number of levels down from the top
• Height: Number of levels up from the bottom
• Number of children
• Number of total descendants
• Number of parents
• Number of total ancestors

Such quantifications are very useful when performing alignment tasks. Fig. 3 shows a small example of an alignment between two semantic hierarchies. Our prior work [4] has proposed methods for measuring the quality of such alignments based on such measures. And when alignment is performed interactively within a GUI-based tool suite such as PROMPT within the Protégé tool [7], augmentation with such statistics will provide the AKM with the context needed to understand the quality of the proposed mappings. For example, in Fig. 3, it is valuable to map nodes high in the structure on the left to those high in the structure on the right, requiring the kind of quantification we have proposed here.

Fig. 3. A simple semantic hierarchy alignment example.

IV. IMPLEMENTATION WITHIN THE AKEA TOOL

The methods proposed above are being implemented with the Analyst-Driven Knowledge Enhancement and Analysis (AKEA) tool at the Pacific Northwest National Laboratory. AKEA was created for clients within the IC as an environment for testing analyst interaction with semantically labeled data and for enabling automation-supported knowledge-level analysis over contents of structured and unstructured sources. While being ontology agnostic, AKEA depends on data representations which are ontologically backed in order to provide the variety of visualization and analytic capabilities offered.

For this effort we exploited and extended AKEA’s capabilities to additionally support activities of the AKM. While many aspects of the AKM roles were already addressed, these capabilities needed to be more directly focused on the ontology itself rather than on instance data represented using the ontology.

The first step in this support was direct visualization of the ontology. Because of the complex nature of the classes and relationships typically described within an ontology, typical link-node layouts fail to communicate meaningfully. However, by integrating the visualization approaches described above, layouts appropriate to understanding the conceptual and relational structures of the ontology begin to address this problem. Fig. 4 provides a snapshot of an ontology presented in the AKEA ontology viewer using the subsumption hierarchy to drive layout. Fig. 5, left side, shows the controls for selecting among transitive relationships to view other concept structures. At the right of Fig. 5 are the relationship filters used to de-clutter the display. Since the sheer number of relationships in most ontologies would obscure the concept structure, this allows the analyst to focus on only the specific relationships of interest at any given time to fully understand interactions between the concept structures and relationships.

Fig. 4. The AKEA tool showing a portion of an ontology used within the intelligence community. A portion of the “event” class hierarchy is linked to a portion of the “entity” hierarchy through the selected “from-organization” property.

Fig. 5. Relation types for hierarchical layout and filtering.

Future work with AKEA will address additional activities of the AKM. Work is already underway to incorporate the structural characterization statistics of the ontology and of classes and relationships. However, the most important change will be the ability to address multiple ontologies. This will enable the visualization, analysis, and creation of alignment mappings between ontologies for communication, documentation, and automated translation needs.

V. CONCLUSIONS

The advent of knowledge-based systems and supporting knowledge bases is augmenting and making more critical the role of the Analytic Knowledge Manager. While IC personnel already perform these activities, current organizational systems and structures lend themselves to a fractured and less than effective execution. By clearly articulating these activities, the roles and responsibilities involved, and the resultant support needs, the IC can begin to move toward better recognition of the importance and value of the AKM. Such recognition will help bring about the systemic changes necessary to realize the full value of ontologically-based system investments, make that value more widely available, and make these technologies more readily applicable to the dynamic problems encountered by the intelligence analyst.

ACKNOWLEDGMENT

This work was funded by Battelle Memorial Institute under the Threat Anticipation Initiative.

REFERENCES

1. Bonacorsi, S: (2008) “RACI Diagram/RASCI Matrix - A Complete Definition”, The Project Management Hut
2. Davey, BA and Priestley, HA: (1990) Introduction to Lattices and Order, Cambridge UP, Cambridge UK, 2nd Edition
3. Euzenat, Jerome and Shvaiko, P: (2007) Ontology Matching, Springer-Verlag, Heidelberg
4. Joslyn, Cliff; Donaldson, Alex; and Paulson, Patrick: (2008) “Evaluating the Structural Quality of Semantic Hierarchy Alignments”, Int. Semantic Web Conf. (ISWC 08), http://dblp.uni-trier.de/db/conf/semweb/iswc2008p.html#JoslynDP08
5. Joslyn, Cliff; Paulson, Patrick; and Verspoor, KM: (2008) “Exploiting Term Relations for Semantic Hierarchy Construction”, Proc. Int. Conf. Semantic Computing (ICSC 08), pp. 42-49, IEEE Computer Society, Los Alamitos CA
6. Joslyn, Cliff; Mniszewski, SM; Smith, SA; and Weber, PM: (2006) “SpindleViz: A Three Dimensional, Order Theoretical Visualization Environment for the Gene Ontology”, Joint BioLINK and 9th Bio-Ontologies Meeting (JBB 06), http://www.bio-ontologies.org.uk/2006/download/Joslyn2EtAlSpindleviz.pdf
7. Noy, Natasha and Musen, Mark A: (2003) “The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping”, Int. J. Human-Computer Studies, v. 59, pp. 983-1024
8. O'Leary, DE: (1998) “Enterprise Knowledge Management”, Computer, v. 31, no. 3, pp. 54-61
9. Tudorache, T; Noy, NF; Tu, S; and Musen, MA: (2008) “Collaborative Ontology Development in Protégé”, 7th International Semantic Web Conference (ISWC 2008), Karlsruhe, Germany, Springer
10. Verspoor, KM; Cohn, JD; Mniszewski, SM; and Joslyn, CA: (2006) “A Categorization Approach to Automated Ontological Function Annotation”, Protein Science, v. 15, pp. 1544-1549