<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Streaming OWL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mike Dean</string-name>
          <aff>BBN Technologies, Ann Arbor, MI, USA</aff>
          <email>mdean@bbn.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Stream processing can offer significant performance and scalability advantages for many Semantic Web applications. An important OWL profile for stream processing includes single OWL statements that allow inference and/or generation of new rules with single statement bodies. This position paper discusses our experiences and ideas in this area.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A major next step for the Semantic Web is likely to be support for streaming
content, rather than a focus on static web pages and knowledge bases. In
our work, we’ve seen performance improvements of 10x or more, for suitable
applications, when streaming statements rather than materializing and then
navigating an in-memory model.
This is analogous in the XML world to using SAX rather than DOM. Jeremy Carroll
similarly found a threefold time and space improvement over abstract syntax
tree approaches in applying stream processing to recognizing OWL dialects [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Semantic Web streaming involves processing one RDF statement at a time,
while maintaining a minimal amount of state. A useful profile of OWL can be
supported by streaming, as discussed in Section 2, particularly when statements
are used to generate rules with single statement bodies. In keeping with the
2-character OWL 2 profile [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] naming convention, we might call such a streaming
profile OWL SL (which also avoids confusion with OWL-S). Section 2 details
OWL SL, while Section 3 describes previous work that led up to these ideas,
Section 4 discusses a prototype implementation using DERI Pipes, and Section
5 offers a generalization. Section 6 discusses additional work we plan to pursue,
and Section 7 concludes.
      </p>
      <p>OWL SL</p>
      <p>Early in the DARPA Agent Markup Language (DAML) program I developed
dumpont<sup>1</sup>, a program that provides a view of OWL class and property hierarchies
while depicting restrictions using a representation that’s basically a combination
of Java method signatures and Kleene regular expressions. Compared to
ontology browsers that focus on a single class at a time, dumpont provides an effective
means of “seeing the forest for the trees”. We periodically found processes
consuming excessive CPU time on the www.daml.org system hosting the dumpont
web service. This was usually caused by people trying to run dumpont on a large
ontology such as OpenCyc. Converting the program from internalizing a model
to a streaming implementation using Jena’s ARP parser alleviated the problem.</p>
      <p>
        Around 2003, I added inference support to our DAML DB triple store<sup>2</sup> (which
is now available in open source as Parliament<sup>3</sup>) by adding a simple rule
engine limited to single-statement bodies (which avoided any need for unification
or query optimization). Triggers were set on non-variable subjects, predicates
(other than rdf:type, unless it was the only non-variable) and objects that
appeared in rules. Rules were generated on the fly and maintained only in memory.
The idea was to generate a large number of very specific rules rather than
employ a small number of more general and complex rules [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The application that
motivated this work had a knowledge base that included a “reference load” data
set of about 1 million statements plus regularly incoming triples from natural
language extraction of web pages. The reference load happened to include a
largely unused OWL version of the United Nations Standard Products and
Services Code (UNSPSC), which included about 65,000 rdfs:subClassOf statements,
each of which generated 2 in-memory rules with associated triggers. DAML DB
still started up in a few seconds on a commodity server, so we never bothered
to remove UNSPSC. It turns out that the types of rules and techniques we used
here are exactly what’s needed for stream processing.
      </p>
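      <p>A minimal Python sketch of this kind of engine follows. It is illustrative only and is not the DAML DB or Parliament code: the names Triple, Rule, and RuleEngine are hypothetical, and the particular pair of rules generated per rdfs:subClassOf statement is an assumption, since the text above says only that two rules were generated. Triggers are modeled as a dictionary keyed on the non-variable predicate and object of each rule body, so matching is a lookup rather than unification.</p>
      <preformat>
```python
from collections import defaultdict
from typing import NamedTuple

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class Rule(NamedTuple):
    """Single-statement body (?x body_p body_o) implies (?x head_p head_o)."""
    body_p: str
    body_o: str
    head_p: str
    head_o: str


class RuleEngine:
    def __init__(self):
        # Triggers: rules indexed by the fixed (predicate, object) of their body.
        self.triggers = defaultdict(list)
        self.seen = set()

    def add_rule(self, rule):
        self.triggers[(rule.body_p, rule.body_o)].append(rule)

    def generate_rules(self, t):
        """Turn one schema statement into very specific rules (assumed pair)."""
        if t.p == RDFS_SUBCLASS_OF:
            # (?x rdf:type C) implies (?x rdf:type D)
            self.add_rule(Rule(RDF_TYPE, t.s, RDF_TYPE, t.o))
            # (?y rdfs:subClassOf C) implies (?y rdfs:subClassOf D)
            self.add_rule(Rule(RDFS_SUBCLASS_OF, t.s, RDFS_SUBCLASS_OF, t.o))

    def insert(self, t):
        """Insert one statement; return it plus everything newly inferred."""
        added = []
        queue = [t]
        while queue:
            cur = queue.pop()
            if cur in self.seen:
                continue
            self.seen.add(cur)
            added.append(cur)
            self.generate_rules(cur)
            # Fire every rule whose body matches this statement.
            for rule in self.triggers.get((cur.p, cur.o), []):
                queue.append(Triple(cur.s, rule.head_p, rule.head_o))
        return added
```
      </preformat>
      <p>For example, after inserting ex:Dog rdfs:subClassOf ex:Mammal and ex:Mammal rdfs:subClassOf ex:Animal, inserting ex:rex rdf:type ex:Dog also yields ex:rex rdf:type ex:Mammal and ex:rex rdf:type ex:Animal. In this sketch a rule fires only on statements that arrive after the rule was generated.</p>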
      <p>
        Recently, in performing an analysis of the 2008 and 2009 Billion Triples
Challenge corpora, I found a 5-10x increase in performance using stream processing
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p><sup>1</sup> http://www.daml.org/2001/03/dumpont/, http://www.daml.org/2003/09/dumpont/,
and http://semwebcentral.org/projects/dumpont/</p>
      <p><sup>2</sup> http://www.daml.org/2001/09/damldb/</p>
      <p><sup>3</sup> http://parliament.semwebcentral.org</p>
      <p>
        Other people are also getting interested in streaming of Semantic Web and
other content. DERI Pipes [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provides a research framework and graphical
interface for stream processing of Semantic Web and other data. IBM System S
provides a highly scalable but non-semantic streaming infrastructure.
StreamBase and other Complex Event Processing engines provide stream processing
for tuples. Brad Allen proposed using Atom for distributing RDF content [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] at
SemTech 2007, while Nova Spivack has recently blogged<sup>4</sup> and tweeted about the
Stream replacing the Web.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Prototype Implementation</title>
      <p>We’re developing a prototype DERI Pipes<sup>5</sup> operator that embodies these ideas
and will report on it at the workshop. The basic approach is to check each
incoming statement against each of the OWL SL constructs and execute code that either
adds to the internal state (e.g. for rdfs:subClassOf) or infers additional
statements (e.g. for rdf:type).</p>
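      <p>The following Python generator sketches the basic approach. It is not the DERI Pipes operator itself: the function name owl_sl_operator and the tuple representation of statements are assumptions made for illustration, and only rdfs:subClassOf and rdf:type are handled. Each incoming statement is passed through, then either recorded as state or used to infer additional statements.</p>
      <preformat>
```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def owl_sl_operator(statements: Iterable[Triple]) -> Iterator[Triple]:
    """Pass each statement through, adding inferred rdf:type statements."""
    # The only state kept between statements: class to direct superclasses.
    superclasses = defaultdict(set)
    for s, p, o in statements:
        yield (s, p, o)
        if p == RDFS_SUBCLASS_OF:
            # Schema statement: add to the internal state.
            superclasses[s].add(o)
        elif p == RDF_TYPE:
            # Instance statement: infer membership in every known superclass.
            emitted = {o}
            pending = list(superclasses.get(o, ()))
            while pending:
                cls = pending.pop()
                if cls in emitted:
                    continue
                emitted.add(cls)
                yield (s, RDF_TYPE, cls)
                pending.extend(superclasses.get(cls, ()))
```
      </preformat>
      <p>Because the function is a generator, it never holds more than the class hierarchy in memory. It infers types only from rdfs:subClassOf statements that have already been seen, so schema statements are assumed to arrive before the instance data that depends on them.</p>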
    </sec>
    <sec id="sec-3">
      <title>More General Streaming</title>
      <p>In many streaming applications, statements are likely to come in batches (e.g.
from updated web pages) rather than just one at a time. In this case, it’s likely
that certain constructs (e.g. OWL Restrictions) will be grouped together. Making
this assumption allows us to also add owl:allValuesFrom and owl:hasValue to an
extended version of OWL SL, which might be called OWL SL*.</p>
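      <p>As a hedged illustration of why batching helps, the Python sketch below assembles owl:hasValue restrictions whose defining statements all arrive in the same batch, then applies them to later rdf:type statements. The function names restrictions_in_batch and apply_has_value are hypothetical, statements are plain tuples, and owl:allValuesFrom is omitted for brevity. A restriction is spread over several statements that share a blank node, so it can only be recognized when those statements are seen together.</p>
      <preformat>
```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def restrictions_in_batch(batch: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Return {class: [(property, value), ...]} for hasValue restrictions in one batch."""
    on_property = {}
    has_value = {}
    subclass_of = []
    for s, p, o in batch:
        if p == "owl:onProperty":
            on_property[s] = o
        elif p == "owl:hasValue":
            has_value[s] = o
        elif p == "rdfs:subClassOf":
            subclass_of.append((s, o))
    found = {}
    for cls, node in subclass_of:
        # Keep only restrictions that are complete within this batch.
        if node in on_property and node in has_value:
            found.setdefault(cls, []).append((on_property[node], has_value[node]))
    return found


def apply_has_value(found, statement: Triple) -> List[Triple]:
    """Infer (x, property, value) for each restriction on the asserted class of x."""
    s, p, o = statement
    if p != "rdf:type":
        return []
    return [(s, prop, value) for prop, value in found.get(o, [])]
```
      </preformat>
      <p>Given a batch stating that ex:RedWine is a subclass of a restriction on ex:hasColor with value ex:Red, the statement ex:merlot rdf:type ex:RedWine then yields ex:merlot ex:hasColor ex:Red.</p>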
    </sec>
    <sec id="sec-4">
      <title>Knowledge Streams</title>
      <p>
        We’ve been developing a concept we call Knowledge Streams, which is depicted
in Figure 1 (from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). This shows stream networks for 2 overlapping
Communities of Interest (each likely using their own ontologies), with nodes (operators)
providing filtering, translation, augmentation (enrichment), aggregation,
alerting, inference, and other services. OWL SL could well be used for the inference
operator.
      </p>
      <p>Knowledge Streams can also be viewed as a step toward Semantic Complex
Event Processing based on triples rather than tuples.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We’ve identified a profile of OWL, which we call OWL SL, that’s suitable for
stream processing of RDF and OWL content. We hope we’ve also gotten other
people excited about the prospects for stream processing of active Semantic Web
content.</p>
      <p><sup>4</sup> http://www.twine.com/item/128lryv9z-46/is-the-stream-the-next-new-metaphor</p>
      <p><sup>5</sup> http://pipes.deri.org</p>
      <p>Fig. 1. Knowledge Streams</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Carroll</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Streaming OWL DL</article-title>
          .
          <source>Proc. First European Semantic Web Symposium (ESWS 2004)</source>
          , Heraklion, Crete, May
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>OWL 2 Web Ontology Language Profiles</article-title>
          .
          <source>W3C Candidate Recommendation</source>
          , 11 June
          <year>2009</year>
          . http://www.w3.org/TR/2009/CR-owl2-profiles-20090611/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Semantic Web Rules: Covering the Use Cases</article-title>
          .
          <source>Proc. 3rd Intl. Workshop on Rules and Rule Markup Languages for the Semantic Web (RuleML 2004)</source>
          , Springer LNCS 3323, Hiroshima, Japan, October
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>How is the Semantic Web Being Used?: An Analysis of the Billion Triples Challenge Corpus</article-title>
          .
          <source>5th Semantic Technology Conference</source>
          , San Jose, California, May
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Le-Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morbidoni</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Rapid Prototyping of Semantic Mash-Ups through Semantic Web Pipes</article-title>
          .
          <source>Proc. 18th World Wide Web Conference (WWW2009)</source>
          , Madrid, Spain, April
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Semantic Web Without RDF/XML: Building RDF Applications in Atom</article-title>
          .
          <source>3rd Semantic Technology Conference</source>
          , San Jose, California, May
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hebeler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Semantic Web @ BBN</article-title>
          .
          <source>5th Semantic Technology Conference</source>
          , San Jose, California, May
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>