Leveraging Emergent Ontologies in the Intelligence Community

Jim Starz, Jason Losco, Brian Kettler, Rachel Hingst, and Chris Rouff
Lockheed Martin Advanced Technology Laboratories
jstarz@atl.lmco.com

Abstract - The vision of a Semantic Web of intelligence knowledge has yet to be fully realized, in part because of the tough challenges of ontology engineering and maintenance. Recent developments on the World Wide Web and IC intranets demonstrate that individual users are willing to supply structured information conforming to de facto standards. This can be most prominently seen in "peer produced" folksonomies and knowledge bases such as Wikipedia and its cousin, Intellipedia. Though these structures lack the machine reasoning potential of highly engineered ontologies, for many purposes they are "good enough". This paper describes Contrail, a prototype information management application that leverages an "emergent" ontology from Wikipedia to model an intelligence analyst's context and exploits that model to aid information retrieval, refinding, and sharing.

I. INTRODUCTION

A barrier to the widespread adoption of Semantic Web and other ontology-based applications in the intelligence community (and indeed the wider web) is that quality ontologies are difficult to build, maintain, and exploit. Ontology engineering requires significant subject domain expertise and knowledge engineering skills. For all-source and other kinds of analysts, such ontologies span a broad range of subject domains, which are constantly evolving. Wikipedia and Intellipedia are approaches to capturing this broad range of knowledge from the community without requiring pre-built ontologies.

These knowledge bases are not without structure. A prominent example is the World Wide Web's Wikipedia, which contains over fifteen million pages. The structures of pages of the same type are very similar, illustrating that people are willing to provide structure in the form of lightweight ontology-like information. This similarity is discussed in the work on Wikitology [4] and DBpedia [1].

While such "ontologies" might not support formal automated reasoning systems well, they can support other useful applications. Our research investigated leveraging emergent ontologies for the purposes of representing user models of analysts. The work used an ontology derived from Wikipedia. This paper describes our prototype application, its use of Wikipedia, and some preliminary results.

II. THE CONTRAIL TOOLS

The Contrail tools help analysts find, organize, re-find, and share unstructured and semi-structured information obtained from the Web (or Intelink), email, documents, and other sources [2]. While our focus is on intelligence analysts, these tasks are those of many knowledge workers. Contrail has been evaluated in several experiments with real intelligence analysts on open source intelligence tasks.

Fig. 1. High-Level Concept of Operations for Contrail Tools

Fig. 1 shows the high-level concept of operations for the Contrail tools. As an analyst does her research online, she finds relevant items through web browsing, web searches, reading email, etc. Through instrumentation and logging services, Contrail is notified of these "information keeping actions", such as the bookmarking of a web page. Contrail then performs a semantic analysis of each kept information item's content using text analytics and other methods. Using the results of this analysis, Contrail updates its model of the analyst's context and stores a copy of the kept item in her Semantic Shoebox. A user's Semantic Shoebox can be thought of as a semantically grounded container for accumulated pieces of information. Contrail supports the sharing and retrieval of kept items from other analysts' shoeboxes. The contextual knowledge appended to these items by Contrail helps one analyst quickly understand the potential relevance and pedigree of an item retrieved from another analyst's shoebox.

The Contrail Refinder tool, shown in Fig. 2, presents a more comprehensive view of a Semantic Shoebox and displays a variety of information (textually and graphically) associated with a kept item, including its metadata, content, and context tags. A user may do a one-button search to display those items most relevant to the current context. Contrail also presents context-relevant recommendations for stored items and potential collaborators in a desktop sidebar.

Fig. 2. Contrail Refinder (Item Browser, Item General Details, and Item Source Details screens)

At the core of Contrail is its Context Aggregator, which maintains and updates the user's context at each keeping action. Concepts and their instances (specific people, organizations, locations, etc.) are extracted from the text of the kept item using a commercial entity extractor. A spreading activation algorithm is used to find related concepts in a knowledge base (KB). These related concepts might not be explicitly mentioned in the text itself. Extracted and related concepts are thus associated with an activation level, and the most active concepts represent the user's current context. Contrail's KB is grounded in hand-built OWL ontologies extending SUMO [3].

This approach worked well, as judged in experiments with analysts who periodically reviewed Contrail's model of their contexts. Contrail's use of an ontologically-grounded knowledge base of concepts, however, presented significant ontology engineering and maintenance challenges, as well as being limited by the underlying entity extractor used. These challenges, all potential barriers to Contrail's deployment, included the potential breadth required for ontologies and the handling of new concepts and entities in these dynamic domains.

III. USING WIKIPEDIA

To alleviate these issues, we have replaced the static ontology-based context representation with one based on Wikipedia. We used IR-based techniques to relate documents with pages in Wikipedia and associated a score with each relationship. One significant benefit of this approach is the elimination of the need for knowledge engineering to update the "ontology." Wikipedia serves as a publicly maintained emergent ontology, allowing for user context to shift as the world changes.

Specifically, keeping actions performed by users signal their interest in particular documents or snippets of text. Based on this text, we query a Lucene index of Wikipedia to obtain pages that may be of interest to the user. A weighted merge of the query results is performed with the user's existing contextual information to form the updated user model.

It should be noted that given the scale of Wikipedia, such queries are very resource-intensive. Despite this challenge, the results from leveraging the emergent ontology from Wikipedia appear promising.

IV. EVALUATION

Initial informal experimentation using this new approach for user modeling has shown significant improvements over using a traditional static ontology in representing user context. The new approach improves finding documents and collaborators. There was also anecdotal evidence that the biggest advantage occurred when new concepts and instances were present in the emergent ontology that could be immediately leveraged. An example of the differences is shown below.

TABLE 1
Example of context terms from static ontology and Wikitology-derived terms

Static Ontology    Wikitology
Indonesia          United Malyas Nat. Org.
Malaysia           Ketuanan Melayu
Singapore          Mahatir bin Muhamed
June               Islam in Malaysia
2002               Anwar Ibraham

The Wikitology approach consistently provided more specific terms that may not easily be found in an ontology or by text analytics packages. Using the old approach, we found general terms would dominate the user context. The breadth of Wikipedia does add the potential for significant noise, such as pages about specific dates. Though Wikipedia is relatively comprehensive, for specific domains pages may not exist. For emerging concepts, it is critical to mirror Wikipedia and update the index regularly. The results of this evaluation will be documented in a future research paper.

V. FUTURE WORK

Our research agenda includes further investigations to determine new applications where emergent ontologies can be applied. This investigation will include tools leveraging these ontologies for enhanced semantic authoring. We also plan to investigate the extraction of rules from patterns in emergent ontologies. A major focus area will be handling the significant scale and rapid updates of Wikipedia. Both aspects provide significant challenges and opportunities. Finally, we plan to make additional extensions to the Contrail suite of tools to extend the representation of user models.

VI. CONCLUSION

Given the large, distributed nature of the World Wide Web, leveraging massive convergence in terminology and structure can be highly useful. While these structures may not replace formal ontologies, they can be appropriate for certain applications and they can help bridge a gap to more formal structures. We have demonstrated that the use of the ontological structure of Wikipedia for representing context has advantages over human-engineered ontologies for at least one application and likely many others.

ACKNOWLEDGEMENTS

Many of the concepts applied in this paper were motivated by conversations with Tim Finin of the University of Maryland at Baltimore County.

REFERENCES

[1] S. Auer, C. Bizer, J. Lehmann, G. Kobilarov, R. Cyganiak, and Z. Ives, "DBpedia: A Nucleus for a Web of Open Data," in Aberer et al. (Eds.), The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007. Lecture Notes in Computer Science 4825, Springer, 2007, ISBN 978-3-540-76297-3.
[2] B. Kettler, "Putting Knowledge in Context to Facilitate Collaboration," in Proceedings of the 2008 International Symposium on Collaborative Technologies and Systems (Irvine, CA, May 19-23, 2008). IEEE, pp. 313-320.
[3] I. Niles and A. Pease, "Towards a Standard Upper Ontology," in Proceedings of the International Conference on Formal Ontology in Information Systems - Volume 2001 (Ogunquit, Maine, USA, October 17-19, 2001), FOIS '01. ACM, New York, NY, pp. 2-9.
[4] Z. Syed et al., "Wikipedia as an Ontology for Describing Documents," in Proceedings of the Second International Conference on Weblogs and Social Media, March 2008.
[5] M. Williams and J. Hollan, "The Process of Retrieval from Very Long-Term Memory," Cognitive Science 5, pp. 87-119, 1981.
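
The weighted merge of query results with existing context, described in Section III, can be illustrated with a short Python sketch. The paper does not specify the merge formula, so everything here is an assumption made for illustration: the function name merge_context, the linear interpolation controlled by new_weight, the normalization of retrieval scores by the top hit, and the top_k cutoff are all hypothetical. The hits are assumed to arrive as (page title, score) pairs from a search over an index of Wikipedia; no particular Lucene API is implied.

```python
from typing import Dict, Iterable, Tuple


def merge_context(
    context: Dict[str, float],
    hits: Iterable[Tuple[str, float]],
    new_weight: float = 0.3,
    top_k: int = 5,
) -> Dict[str, float]:
    """Blend one query's retrieval scores into an existing context model.

    All parameters are illustrative assumptions, not values from Contrail.
    """
    # Keep the best raw score per page title in case a title repeats.
    best: Dict[str, float] = {}
    for title, score in hits:
        best[title] = max(score, best.get(title, 0.0))

    # Scale scores into [0, 1] so they are comparable with stored weights.
    top = max(best.values(), default=0.0)
    normalized = {t: s / top for t, s in best.items()} if top > 0 else {}

    # Linear interpolation: pages missing from one side contribute 0 there,
    # so concepts that stop appearing in new queries gradually fade.
    merged = {
        title: (1.0 - new_weight) * context.get(title, 0.0)
        + new_weight * normalized.get(title, 0.0)
        for title in set(context) | set(normalized)
    }

    # Keep only the strongest concepts; ties are broken alphabetically.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


if __name__ == "__main__":
    # Hypothetical stored context and hypothetical retrieval scores.
    context = {"Malaysia": 0.8, "Singapore": 0.4}
    hits = [("Islam in Malaysia", 12.0), ("Malaysia", 6.0), ("Ketuanan Melayu", 3.0)]
    for title, weight in merge_context(context, hits, new_weight=0.5).items():
        print(f"{title}: {weight:.3f}")
```

Under these assumptions, a larger new_weight lets the context follow recent keeping actions more quickly, while a smaller value favors long-standing interests. The cutoff is one simple way to limit the noise from Wikipedia's breadth noted in Section IV.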