<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Emergent Ontologies in the Intelligence Community</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jim Starz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jason Losco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Kettler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rachel Hingst</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fig. 1. High-Level Concept of Operations for Contrail Tools</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The vision of a Semantic Web of intelligence knowledge has yet to be fully realized, in part because of the tough challenges of ontology engineering and maintenance. Recent developments on the World Wide Web and IC intranets demonstrate that individual users are willing to supply structured information conforming to de facto standards. This can be seen most prominently in “peer produced” folksonomies and knowledge bases such as Wikipedia and its cousin, Intellipedia. Though these structures lack the machine reasoning potential of highly engineered ontologies, for many purposes they are “good enough”. This paper describes Contrail, a prototype information management application that leverages an “emergent” ontology from Wikipedia to model an intelligence analyst's context and exploits that model to aid information retrieval, refinding, and sharing.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Wikipedia and Intellipedia are approaches to capturing
this broad range of knowledge from the community without
requiring pre-built ontologies. These knowledge bases are
not without structure. A prominent example is the World
Wide Web’s Wikipedia, which contains over fifteen million
pages. The structure of pages of the same type is very
similar, illustrating that people are willing to provide
structure in the form of lightweight ontology-like
information. This similarity is discussed in the work on
Wikitology [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>While such “ontologies” might not support formal
automated reasoning systems well, they can support other
useful applications. Our research investigated leveraging
emergent ontologies for the purposes of representing user
models of analysts. The work used an ontology derived
from Wikipedia. This paper describes our prototype
application, its use of Wikipedia, and some preliminary
results.</p>
    </sec>
    <sec id="sec-2">
      <title>II. THE CONTRAIL TOOLS</title>
      <p>
        The Contrail tools help analysts find, organize, re-find,
and share unstructured and semi-structured information
obtained from the Web (or Intelink), email, documents, and
other sources [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. While our focus is on intelligence
analysts, these tasks are those of many knowledge workers.
Contrail has been evaluated in several experiments with real
intelligence analysts on open source intelligence tasks.
      </p>
      <p>Fig. 1 shows the high-level concept of operations for
the Contrail tools. As an analyst does her research online, she
finds relevant items through web browsing, web searches,
reading email, etc. Through instrumentation and logging
services, Contrail is notified of these “information keeping
actions”, such as the bookmarking of a web page. Contrail
then performs a semantic analysis of each kept information
item’s content using text analytics and other methods. Using
the results of this analysis, Contrail updates its model of the
analyst’s context and stores a copy of the kept item in her
Semantic Shoebox. A user’s Semantic Shoebox can be
thought of as a semantically grounded container for
accumulated pieces of information. Contrail supports the
sharing and retrieval of kept items from other analysts’
shoeboxes. The contextual knowledge appended to these
items by Contrail helps one analyst quickly understand the
potential relevance and pedigree of an item retrieved from
another analyst’s shoebox.</p>
      <p>The Contrail Refinder tool, shown in Fig. 2, presents a
more comprehensive view of a Semantic Shoebox and
displays a variety of information (textually and graphically)
associated with a kept item including its metadata, content,
and context tags. A user may do a one-button search to
display those items most relevant to her current context.
Contrail also presents context-relevant recommendations for
stored items and potential collaborators in a desktop sidebar.</p>
      <p>
        At the core of Contrail is its Context Aggregator, which
maintains and updates the user’s context at each keeping
action. Concepts and their instances (specific people,
organizations, locations, etc.) are extracted from the text of
the kept item using a commercial entity extractor. A
spreading activation algorithm is used to find related
concepts in a knowledge base (KB). These related concepts
might not be explicitly mentioned in the text itself.
Extracted and related concepts are thus associated with an
activation level and the most active concepts represent the
user’s current context. Contrail’s KB was grounded in
hand-built OWL ontologies extending SUMO [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
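      <p>The spreading activation step described above can be illustrated with a minimal sketch. It is not Contrail’s implementation: the toy knowledge base, the link weights, and the <monospace>decay</monospace>, <monospace>threshold</monospace>, and <monospace>max_hops</monospace> parameters are all assumptions introduced here for illustration, and the seed concepts stand in for the output of an entity extractor.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical toy knowledge base: concept -> list of (related concept, link weight).
# These concepts and weights are illustrative assumptions, not Contrail's ontology.
TOY_KB = {
    "Kabul": [("Afghanistan", 0.9)],
    "Afghanistan": [("Central Asia", 0.5), ("Kabul", 0.9)],
    "Central Asia": [],
}


def spread_activation(seeds, kb, decay=0.5, threshold=0.1, max_hops=2):
    """Propagate activation from extracted concepts to related KB concepts.

    seeds: dict mapping extracted concept -> initial activation.
    Returns a dict mapping every reached concept -> accumulated activation.
    """
    activation = defaultdict(float)
    for concept, level in seeds.items():
        activation[concept] += level

    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for concept, level in frontier.items():
            for neighbor, weight in kb.get(concept, []):
                # Activation weakens with each hop; weak signals are dropped.
                passed = level * weight * decay
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        for concept, level in next_frontier.items():
            activation[concept] += level
        frontier = next_frontier
    return dict(activation)


def current_context(activation, top_n=3):
    """Return the top_n most active concepts, highest activation first."""
    ranked = sorted(activation.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:top_n]]


if __name__ == "__main__":
    levels = spread_activation({"Kabul": 1.0}, TOY_KB)
    print(current_context(levels))
```
]]></preformat>
      <p>In this toy graph, seeding only “Kabul” also activates “Central Asia”, a concept never mentioned in the text, which mirrors how related concepts can enter the user’s context.</p>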
      <p>This approach worked well, as judged in experiments
with analysts who periodically reviewed Contrail’s model of
their contexts. Contrail’s use of an ontologically-grounded
knowledge base of concepts, however, presented significant
ontology engineering and maintenance challenges, as well
as being limited by the underlying entity extractor used. These
challenges – all potential barriers to Contrail’s deployment –
included the potential breadth required for ontologies and
the handling of new concepts and entities in these dynamic
domains.</p>
    </sec>
    <sec id="sec-3">
      <title>III. USING WIKIPEDIA</title>
      <p>To alleviate these issues, we have replaced the static
ontology-based context representation with one based on
Wikipedia. We used IR-based techniques to relate
documents with pages in Wikipedia and associated a score
with each relationship. One significant benefit of this
approach is the elimination of the need for knowledge
engineering to update the “ontology.” Wikipedia serves as a
publicly maintained emergent ontology, allowing for user
context to shift as the world changes.</p>
      <p>Specifically, keeping actions performed by users
indicate their interest in particular documents or snippets
of text. Based on this text, we query a Lucene index of
Wikipedia to obtain pages that may be of interest to the
user. A weighted merge of the query results is performed
with their existing contextual information to form their
updated user model.</p>
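      <p>The weighted merge can be sketched as follows. The text does not specify the weighting scheme, so this example assumes a simple linear blend: retrieval scores are normalized so that the best hit scores 1.0 and are then combined with the existing context scores. The function name <monospace>merge_context</monospace>, the <monospace>new_weight</monospace> and <monospace>max_size</monospace> parameters, and the sample page titles are hypothetical; the list of (title, score) pairs stands in for the results of a query against the Wikipedia index.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def merge_context(existing, query_hits, new_weight=0.3, max_size=50):
    """Blend retrieval scores for Wikipedia pages into an existing user model.

    existing:   dict mapping page title -> current context score.
    query_hits: list of (page title, retrieval score) pairs from the index.
    new_weight: assumed share given to the new evidence, between 0 and 1.
    max_size:   assumed cap on the number of pages kept in the model.
    """
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")

    # Normalize hit scores so the best hit is 1.0, which makes raw
    # retrieval scores comparable to the stored context scores.
    top = max((score for _, score in query_hits), default=0.0)
    normalized = {}
    for title, score in query_hits:
        value = score / top if top > 0 else 0.0
        normalized[title] = max(normalized.get(title, 0.0), value)

    # Linear blend: pages absent from the new hits decay, new pages enter.
    merged = {}
    for title in set(existing) | set(normalized):
        old = existing.get(title, 0.0)
        new = normalized.get(title, 0.0)
        merged[title] = (1.0 - new_weight) * old + new_weight * new

    # Keep only the highest-scoring pages so the model stays bounded.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:max_size])


if __name__ == "__main__":
    context = {"Kabul": 0.8, "Opium": 0.4}
    hits = [("Kabul", 4.0), ("Taliban", 2.0)]
    print(merge_context(context, hits, new_weight=0.5))
```
]]></preformat>
      <p>Under this assumed scheme, pages that stop appearing in query results fade gradually rather than vanishing, while newly retrieved pages enter the model at once, which is one way user context could shift as the analyst’s focus changes.</p>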
      <p>Given the scale of Wikipedia,
such queries are very resource intensive. Despite this
challenge, the results from leveraging the emergent
ontology from Wikipedia appear promising.</p>
    </sec>
    <sec id="sec-4">
      <title>IV. EVALUATION</title>
      <p>Initial informal experimentation using this new
approach for user modeling has shown significant
improvements over using a traditional static ontology in
representing user context. The new approach improves
finding documents and collaborators. There was also
anecdotal evidence that the biggest advantage occurred
when new concepts and instances were present in the
emergent ontology that could be immediately leveraged. An
example of the differences is shown below.</p>
      <p>The Wikitology approach consistently provided more
specific terms that may not easily be found in an ontology or
by text analytics packages. Using the old approach, we
found general terms would dominate the user context. The
breadth of Wikipedia does add the potential for significant
noise, such as pages about specific dates. Though
Wikipedia is relatively comprehensive, pages may not exist
for some specific domains. For emerging concepts, it is critical to
mirror Wikipedia and update the index regularly. The
results of this evaluation will be documented in a future
research paper.</p>
    </sec>
    <sec id="sec-5">
      <title>V. FUTURE WORK</title>
      <p>Our research agenda includes further investigations to
determine new applications where emergent ontologies can
be applied. This investigation will include tools leveraging
these ontologies for enhanced semantic authoring. We also
plan to investigate the extraction of rules from patterns in
emergent ontologies. A major focus area will be handling
the significant scale and rapid updates of Wikipedia. Both
of these aspects present significant challenges and
opportunities. Finally, we plan to make additional
extensions to the Contrail suite of tools to extend the
representation of user models.</p>
    </sec>
    <sec id="sec-6">
      <title>VI. CONCLUSION</title>
      <p>Given the large, distributed nature of the World Wide Web,
leveraging massive convergence in terminology and
structure can be highly useful. While these structures may
not replace formal ontologies, they can be appropriate for
certain applications and they can help bridge a gap to more
formal structures. We have demonstrated that the use of the
ontological structure of Wikipedia for representing context
has advantages over human-engineered ontologies for at
least one application and likely many others.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGEMENTS</title>
      <p>Many of the concepts applied in this paper were
motivated by conversations with Tim Finin of the
University of Maryland at Baltimore County.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>DBpedia: A Nucleus for a Web of Open Data</article-title>
          . In Aberer et al. (Eds.):
          <source>The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007</source>
          , Busan, Korea, November 11-15,
          <year>2007</year>
          . Lecture Notes in Computer Science 4825, Springer, 2007, ISBN 978-3-540-76297-3.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kettler</surname>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Putting Knowledge in Context to Facilitate Collaboration</article-title>
          . In
          <source>Proceedings of the 2008 International Symposium on Collaborative Technologies and Systems</source>
          (May 19-23, 2008, Irvine, CA). IEEE,
          <fpage>313</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Niles</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pease</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Towards a standard upper ontology</article-title>
          . In
          <source>Proceedings of the International Conference on Formal Ontology in Information Systems - Volume 2001</source>
          (Ogunquit, Maine, USA, October 17-19, 2001). FOIS '01. ACM, New York, NY,
          <fpage>2</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Syed</surname>
          </string-name>
          et al.,
          <article-title>Wikipedia as an Ontology for Describing Documents</article-title>
          . In
          <source>Proceedings of the Second International Conference on Weblogs and Social Media</source>
          , March
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Williams</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hollan</surname>
          </string-name>
          (
          <year>1981</year>
          ).
          <article-title>The Process of Retrieval from Very Long-Term Memory</article-title>
          .
          <source>Cognitive Science</source>
          <volume>5</volume>
          :
          <fpage>87</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>