<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Event Object Boundaries in RDF Streams - A Position Paper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robin Keskisarkka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva Blomqvist</string-name>
          <email>eva.blomqvist@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science, Linköping University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The amount of information available as online streams is increasing steadily. A number of RDF stream processing systems have been developed in an attempt to leverage existing Semantic Web technologies, and to support typical stream operations, but very little attention has been paid to the way in which event objects (i.e. data records representing events) are streamed. In this position paper, we present the issue of respecting event object boundaries in RDF streams, and discuss some pros and cons of the various solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF stream processing</kwd>
        <kwd>RDF streams</kwd>
        <kwd>event objects</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The amount of information available as online streams is increasing steadily, and
a number of domains already rely heavily on streams of data, e.g., market feed
processing and electronic trading [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The cost of deploying sensor networks has
been dropping dramatically in recent years, and they are expected to become
common in a variety of tasks, including natural disaster response, surveillance,
monitoring of potential terrorist and criminal activity, military planning, and
more [
        <xref ref-type="bibr" rid="ref11 ref5">5, 11</xref>
        ].
      </p>
      <p>
        The Semantic Web community has attempted to lift streaming content to
a semantic level [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but in the context of streaming data, traditional Semantic
Web technologies are unsatisfactory. Streaming data is characterized by being
received continuously in a possibly infinite stream [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and in many contexts the
streaming data becomes outdated quickly, and must therefore be consumed on
the fly. Delays in query answering matter both for feeds generating thousands of
messages per second and for feeds generating only a few messages every hour.
In process control and automation of industrial facilities, streams with large
data volumes need to be processed with minimum latency, and in electronic
trading systems even sub-second delays can affect profits [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        While benchmarks have been developed to evaluate system performance [
        <xref ref-type="bibr" rid="ref10 ref13">13,
10</xref>
        ], and recommendations for publishing Linked Stream Data have been
produced [
        <xref ref-type="bibr" rid="ref11 ref4">4, 11</xref>
        ], the existing work on RDF stream processing has discussed only to a
limited degree how complex data can be represented in a stream so that graph
boundaries can be respected.
      </p>
      <p>In this position paper, we present the issue of respecting event object
boundaries in RDF streams, outline some possible solutions with respect to the
existing state of the art, and list some of the pros and cons of the different solutions.
Finally, we discuss the open challenges for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>Background - RDF Stream Processing</title>
      <p>
        The authors in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] differentiate between three basic architectures that
can be applied to solve high-volume low-latency streaming problems: Database
Management Systems, rule engines, and stream processing engines. This paper
makes no assumptions about underlying architectures; rather, all systems
supporting continuous querying of RDF streams in combination with static RDF
data are referred to as RDF stream processing systems.
      </p>
      <p>Contrary to queries executed against static RDF data, the constant flow of
information in a streaming data context means that no final answer can ever be
returned. Queries must instead be executed continuously as new data becomes
available. Typically, windows over streams are used to allow outdated triples to
be ignored, which also makes it possible to support common aggregate functions
in queries.</p>
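      <p>The idea of a window with an aggregate can be illustrated with a minimal sketch. It assumes that each stream element is a pair of a triple and a numeric timestamp; the function names window and count_by_predicate are illustrative and are not taken from any particular RDF stream processing system.</p>
      <preformat>
```python
from collections import Counter

# Assumption: a stream element is ((subject, predicate, object), timestamp).
def window(stream, now, width):
    """Return the triples whose timestamp lies in the interval (now - width, now]."""
    return [t for (t, ts) in stream if ts > now - width and now >= ts]

def count_by_predicate(triples):
    """A simple aggregate: the number of triples per predicate inside the window."""
    return Counter(p for (_s, p, _o) in triples)

stream = [
    (("s1", "hasValue", "20"), 1),
    (("s1", "hasValue", "21"), 2),
    (("s2", "hasValue", "19"), 5),
    (("s1", "hasValue", "22"), 6),
]

# The query is re-evaluated each time the window slides forward.
for now in (2, 4, 6):
    print(now, dict(count_by_predicate(window(stream, now, width=3))))
```
      </preformat>
      <p>Triples older than the window width simply drop out of later evaluations, which is what makes the aggregate well defined on an unbounded stream.</p>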
      <p>
        There are, as of yet, no standards for how RDF data should be represented as
streams, but a number of state-of-the-art RDF stream processing systems have
assumed that the data is delivered as individual RDF triples [
        <xref ref-type="bibr" rid="ref1 ref3 ref6">1, 6, 3</xref>
        ]. Generally,
it is also assumed that the triples in a stream are implicitly ordered by arrival
time, or explicitly ordered by timestamps. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] an RDF stream is defined as
an ordered sequence of pairs, where each pair is made of an RDF triple and a
timestamp τ:
      </p>
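      <p><disp-formula><tex-math>\begin{array}{c}
(\langle subj_i, pred_i, obj_i \rangle, \tau_i) \\
(\langle subj_{i+1}, pred_{i+1}, obj_{i+1} \rangle, \tau_{i+1}) \\
\ldots
\end{array}</tex-math></disp-formula></p>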
      <p>
        Since the sequence of pairs is ordered, the timestamps are monotonically non-decreasing.
Triples with the same timestamp, although sequenced in a specific order in the
stream, should be regarded as having "occurred" at the same time. This view is
similar to that which is often used in Data Stream Management Systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. If
timestamps are not supplied in the stream, triples are typically associated with
the time of arrival.
      </p>
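      <p>As a rough sketch of this stream model, the following code pairs each triple with a timestamp, falls back to the arrival time when none is supplied, and checks that timestamps never decrease. The helper names timestamped and is_ordered, and the use of None for a missing timestamp, are assumptions made for illustration.</p>
      <preformat>
```python
import time

def timestamped(items, clock=time.time):
    """Yield (triple, timestamp) pairs.

    Assumption: a missing timestamp is given as None and is replaced by the
    arrival time reported by `clock`.
    """
    for triple, ts in items:
        yield (triple, clock() if ts is None else ts)

def is_ordered(pairs):
    """True if timestamps never decrease; equal timestamps are allowed."""
    stamps = [ts for (_triple, ts) in pairs]
    return all(later >= earlier for earlier, later in zip(stamps, stamps[1:]))

incoming = [
    (("s1", "hasValue", "20"), 1),
    (("s1", "hasUnit", "C"), 1),     # same timestamp: regarded as simultaneous
    (("s2", "hasValue", "19"), None),  # no timestamp supplied by the producer
]
pairs = list(timestamped(incoming, clock=lambda: 5))
print(is_ordered(pairs))
```
      </preformat>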
    </sec>
    <sec id="sec-3">
      <title>Problem</title>
      <p>
        Event objects (i.e. data records denoting events) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] do not consist only of
individual RDF triples, but rather of RDF graphs. When event objects are decomposed
into lists of RDF triples to be streamed, it becomes essential that the
boundaries of the event objects can be respected within queries, since a query
that observes only a partial RDF graph may return incorrect results.
      </p>
      <p>In sensor streams, the issue of event object boundaries can often be handled
without any great difficulties. For example, if synchronization issues are ignored
and a sensor reports a fixed number of values, or the intervals between streamed
events are known, the stream can be viewed through a window that slides over the
stream. But when the input is not predictable, a window over a stream cannot,
by itself, guarantee that all the triples of an RDF graph are within the window
boundaries at any given point in time; thus, we risk querying a partially
streamed graph.</p>
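      <p>The risk can be made concrete with a small sketch. It assumes two hypothetical event objects of different sizes and a count-based window of three triples; the dictionary named expected is oracle knowledge used only to show which graphs are complete, and would not be available to a real consumer.</p>
      <preformat>
```python
# Hypothetical event objects; the triples of each share an event node.
event_a = [("ev1", "type", "Alarm"), ("ev1", "sensor", "s1"), ("ev1", "level", "high")]
event_b = [("ev2", "type", "Alarm"), ("ev2", "sensor", "s2")]
stream = event_a + event_b  # decomposed into a flat sequence of triples

def count_window(stream, end, size):
    """Return the last `size` triples up to position `end` (exclusive)."""
    return stream[max(0, end - size):end]

def complete_events(triples, expected):
    """Events whose full triple set, as listed in `expected`, is inside the window."""
    seen = set(triples)
    return [ev for ev, graph in expected.items() if set(graph).issubset(seen)]

expected = {"ev1": event_a, "ev2": event_b}

# After four triples have arrived, the window holds the tail of ev1 and the
# head of ev2, so a query would see two partial graphs and no complete one.
print(complete_events(count_window(stream, end=4, size=3), expected))
```
      </preformat>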
      <p>The solutions presented in the following section are aimed both towards RDF
stream processing systems (i.e. event consumers), and towards the RDF streams
(i.e. event producers). For brevity, queries will be discussed only in general terms,
rather than as full SPARQL queries.</p>
    </sec>
    <sec id="sec-4">
      <title>Solutions</title>
      <p>Depending on the complexity of the RDF graphs that are transferred in an RDF
stream, different solutions are possible. As mentioned in the previous section, if
the events in a stream have predictable event object boundaries, these can be
respected simply by having a window that slides by a fixed amount between
query evaluations. However, when event objects are less predictable, this is not
a sufficient solution. To support dynamic and heterogeneous streams of event
objects, a solution must be able to handle streams for which neither the intervals
between events nor the number of triples in an event is known.</p>
      <p>Header. A common approach to making transferred data interpretable is to
provide a message "header" containing metadata about the message. In the
case of RDF streams, each event object could be preceded by a header providing
information about the RDF graph being streamed. A header would have to be
generated by the event producer for every new event object. The event consumer
(i.e., the RDF stream processing system) would subsequently have to interpret
each header, ensuring that registered queries are not executed against a partially
transferred RDF graph. The header element could be represented in various
ways, e.g., as a triple, a set of triples, or a dedicated type. A simple example
of a header can be seen in Figure 1.</p>
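      <p>[Figure 1: example of headers in an RDF stream, drawn along a time axis from Past to Future; the values 4, 2, and 7 appear in the figure.]</p>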
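      <p>One way a consumer might interpret headers is sketched below. The header format is purely hypothetical: it is assumed to be a pair ("HEADER", n), where n is the number of triples in the event object that follows. The function name split_by_header is likewise illustrative.</p>
      <preformat>
```python
def split_by_header(stream):
    """Group a stream into complete graphs using headers of the form ("HEADER", n).

    Assumption: n is the number of triples in the following event object.
    A graph that has only been partially transferred is withheld.
    """
    graphs, i = [], 0
    while len(stream) > i:
        item = stream[i]
        if item[0] != "HEADER":
            raise ValueError("expected a header at position %d" % i)
        n = item[1]
        body = stream[i + 1:i + 1 + n]
        if len(body) != n:
            break  # the graph is still arriving; do not expose it to queries
        graphs.append(body)
        i += 1 + n
    return graphs

received = [
    ("HEADER", 2), ("ev1", "type", "Alarm"), ("ev1", "level", "high"),
    ("HEADER", 3), ("ev2", "type", "Alarm"),  # two triples of ev2 still missing
]
print(split_by_header(received))
```
      </preformat>
      <p>Only the first event object is released to queries here; the second is held back until its remaining triples arrive.</p>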
      <p>Commonality. A common attribute, or feature of "commonality", can be used
as an indicator for the boundary of an RDF graph. A very basic feature of
commonality could be that triples belonging to an RDF graph are streamed in
sequence. A trailing boundary triple could then be used as a marker to denote
that the entire RDF graph has been streamed (see Figure 2).</p>
      <p>A more robust solution would be to attach the trailing triple to a node that
is unique for the particular graph, i.e., some node that can be used to refer to
the specific event object. This could provide support for graph boundaries in
situations where graphs can overlap each other in a stream.</p>
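      <p>A minimal sketch of the more robust variant follows. It assumes that every streamed triple can be attributed to the node identifying its event object (here passed alongside the triple), and that the trailing triple uses a hypothetical predicate named streamEnd. Neither the predicate nor the function name complete_graphs comes from an existing vocabulary or system.</p>
      <preformat>
```python
END = "streamEnd"  # hypothetical predicate marking the end of an event object

def complete_graphs(stream):
    """Yield (event_id, triples) once the trailing triple for event_id arrives.

    Assumption: each element is (event_id, triple), so triples of different
    event objects may interleave in the stream.
    """
    pending = {}
    for event_id, triple in stream:
        if triple[1] == END:
            yield event_id, pending.pop(event_id, [])
        else:
            pending.setdefault(event_id, []).append(triple)

interleaved = [
    ("ev1", ("ev1", "type", "Alarm")),
    ("ev2", ("ev2", "type", "Alarm")),
    ("ev1", ("ev1", "level", "high")),
    ("ev1", ("ev1", END, "true")),      # ev1 is now complete
    ("ev2", ("ev2", "level", "low")),   # ev2 has no trailing triple yet
]
for event_id, graph in complete_graphs(interleaved):
    print(event_id, graph)
```
      </preformat>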
      <p>RDF Graph Stream. RDF data streams are usually considered to be composed
of individual RDF triples, but we can also consider streaming RDF graphs
directly. Streaming sets of RDF triples together, rather than the individual triples
separately, would greatly simplify the task of the event producer, since neither
the order of decomposition of event object graphs nor the addition of triples
needs to be considered. Figure 3 illustrates a stream of RDF triple sets, where
each set represents a complete event object graph.</p>
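      <p>[Figure 3: a stream of RDF triple sets, each set representing a complete event object graph, drawn along a time axis from Past to Future.]</p>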
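      <p>A graph stream could be consumed roughly as follows. The sketch assumes that each stream element is a pair of a set of triples and a timestamp; the toy query high_alarms and the function evaluate_on_graphs are illustrative names, not part of any existing engine.</p>
      <preformat>
```python
def evaluate_on_graphs(graph_stream, query):
    """Run `query` once per arriving graph, so it never sees a partial graph."""
    results = []
    for graph, ts in graph_stream:
        results.extend((ts, r) for r in query(graph))
    return results

def high_alarms(graph):
    """Toy query: nodes typed Alarm that also have level high in the same graph."""
    alarms = {s for (s, p, o) in graph if p == "type" and o == "Alarm"}
    return sorted(s for (s, p, o) in graph if p == "level" and o == "high" and s in alarms)

graph_stream = [
    (frozenset({("ev1", "type", "Alarm"), ("ev1", "level", "high")}), 1),
    (frozenset({("ev2", "type", "Alarm"), ("ev2", "level", "low")}), 2),
]
print(evaluate_on_graphs(graph_stream, high_alarms))
```
      </preformat>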
      <p>Customized Decomposition. Controlling the order in which the individual
triples in an RDF graph are streamed can sometimes provide a more efficient way
of streaming RDF graphs. If an RDF graph can be transferred in an order that
implicitly respects event object boundaries, results can be returned
even before the complete RDF graph has been received. This can be quite
valuable in contexts where latencies must be kept as low as possible. There is,
however, no general way of employing this solution for arbitrary RDF graphs.</p>
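      <p>As a rough illustration, the sketch below assumes that the producer knows which predicates the most common query needs and streams those triples first. The function names decompose and first_match_position are hypothetical, and the approach only helps for queries known in advance.</p>
      <preformat>
```python
def decompose(graph, priority):
    """Order triples so that predicates listed in `priority` are streamed first."""
    rank = {p: i for i, p in enumerate(priority)}
    return sorted(graph, key=lambda t: (rank.get(t[1], len(rank)), t))

def first_match_position(triples, needed):
    """1-based position at which all `needed` predicates have been seen, else None."""
    seen = set()
    for pos, (_s, p, _o) in enumerate(triples, start=1):
        seen.add(p)
        if needed.issubset(seen):
            return pos
    return None

graph = [
    ("ev1", "comment", "x"),
    ("ev1", "sensor", "s1"),
    ("ev1", "level", "high"),
    ("ev1", "type", "Alarm"),
]
needed = {"type", "level"}
ordered = decompose(graph, ["type", "level"])

# Compare how many triples must arrive before the query can be answered.
print(first_match_position(graph, needed), first_match_position(ordered, needed))
```
      </preformat>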
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>Passing a header as the first element of a decomposed RDF graph can provide
metadata about event objects. For example, before the arrival of an event
object the header may convey the size of the arriving RDF graph, allowing the
RDF stream processing system to adapt windows over the stream accordingly.
A header could, on the other hand, require a different way of consuming data
streams if the header information is to be handled automatically by the RDF
stream processing system. One of the benefits of the header solution is that, if
the header were handled by the system internally, queries would not need to use any
special patterns to respect event object boundaries.</p>
      <p>A trailing triple, generated for the purpose of separating two RDF
graphs, provides a simple way of separating event objects in a stream. The triple
could be generated in any number of ways, but the most robust solutions would
require it to be connected, in some way, to a unique event object identifier in the
preceding graph. Even if we assume that event object triples may overlap each
other in the stream, the trailing triple could still be used to determine that the
entire graph has been streamed.</p>
      <p>The streaming of RDF graphs, rather than individual triples, is a feasible
solution for adding support for event object boundaries. It would, however,
require a different way of consuming data streams than in
the case of triples. RDF stream processing engines that rely on evaluation of all
queries for every new incoming triple would either be forced to suppress results
based on partial graphs, or would need to support adding multiple triples at a
time. The benefit of streaming graphs directly is that, since the event object
boundaries never become an issue, engines could provide built-in support for
event object boundaries.</p>
      <p>The final solution suggested in this paper involves customizing the
decomposition of a graph, so that the boundaries are implicitly respected by the order
of the streamed triples. In domains where real-time execution is a priority, this
would enable queries to be executed as soon as a match can be found. However,
since this solution is only guaranteed to work in some contexts, and requires a
customized solution, it cannot be viewed as a general solution to the issue of
event object boundaries.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>While it is rarely discussed in research literature, the way in which RDF graphs
are decomposed into streams of RDF triples has a direct impact on the ability
to distinguish graph boundaries. In streams emitting predictable RDF graphs, it
is usually sufficient to let the boundaries of event objects be defined by windows
that slide across the stream. However, this solution is not sufficient unless
the time interval between events, or the number of triples in each event object, is
known and remains static.</p>
      <p>In this paper, we have presented four possible solutions, ranging from adding
a single trailing triple to the end of each graph to solutions that require
modifications to the way in which RDF data streams are represented and the way
in which they are typically consumed by RDF stream processing engines.</p>
      <p>
        Future challenges involve practical evaluation, and standardization, of the
various solutions. For example, it would be necessary to develop vocabularies to express
various types of headers and trailing triples. It is also necessary to study the
efficiency of the various solutions empirically, to ensure that low latency, necessary
in a number of domains [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], can be maintained.
      </p>
      <p>The various solutions can also be combined in various ways. As an example,
domains requiring near real-time execution could benefit from event object
decomposition that optimizes the stream for the most common type of query, even
if this solution alone does not provide support for event object boundaries.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Anicic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fodor</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stojanovic</surname>
          </string-name>
          , N.:
          <article-title>EP-SPARQL: a unified language for event processing and stream reasoning</article-title>
          .
          <source>In: Proceedings of the 20th international conference on World wide web</source>
          . pp.
          <volume>635</volume>
          –
          <fpage>644</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arasu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The CQL continuous query language: semantic foundations and query execution</article-title>
          .
          <source>The VLDB Journal (2)</source>
          ,
          <volume>121</volume>
          –142 (Jul.)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>D.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braga</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>An execution environment for C-SPARQL queries</article-title>
          .
          <source>In: Proceedings of the 13th International Conference on Extending Database Technology</source>
          . pp.
          <volume>441</volume>
          –
          <fpage>452</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>D.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
          </string-name>
          , E.:
          <article-title>A proposal for publishing data streams as linked data - a position paper</article-title>
          .
          <source>In: Proceedings of the Linked Data on the Web LDOW2010 Workshop</source>
          , co-located with WWW2010
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goodwin</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russomanno</surname>
            ,
            <given-names>D.J.:</given-names>
          </string-name>
          <article-title>Ontology integration within a service-oriented architecture for expert system applications using sensor networks</article-title>
          .
          <source>Expert Systems (5)</source>
          ,
          <volume>409</volume>
          –432 (Nov.)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Le-Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dao-Tran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A native and adaptive approach for unified processing of linked streams and linked data</article-title>
          .
          <source>In: ISWC'11 Proceedings of the 10th international conference on the semantic web</source>
          . pp.
          <volume>370</volume>
          –
          <fpage>388</fpage>
          . No.
          <volume>24761</volume>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Le-Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hausenblas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Unifying stream data and linked open data</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Luckham</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulte</surname>
          </string-name>
          , R.:
          <source>Event Processing Glossary Version 2.0</source>
          (
          <issue>Jul</issue>
          .)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luk</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <source>FPGA Accelerated Low-Latency Market Data Feed Processing. 17th IEEE Symposium on High Performance</source>
          Interconnects pp.
          <volume>83</volume>
          –
          <issue>89</issue>
          (Aug.)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Scharrenbach</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Seven Commandments for Benchmarking Semantic Flow Processing Systems</article-title>
          . The Semantic Web: . . . pp.
          <volume>305</volume>
          –
          <fpage>319</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Linked stream data: A position paper</article-title>
          .
          <source>In: The 2nd International Workshop on Semantic Sensor Networks</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Stonebraker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cetintemel</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zdonik</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>The 8 requirements of real-time stream processing</article-title>
          .
          <source>ACM SIGMOD Record (4)</source>
          ,
          <volume>42</volume>
          –47 (Dec.)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duc</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.:</given-names>
          </string-name>
          <article-title>SRBench: A Streaming RDF/SPARQL Benchmark</article-title>
          .
          <source>In: Proceedings of International Semantic Web Conference (ISWC</source>
          <year>2012</year>
          ). pp.
          <volume>641</volume>
          –
          <fpage>657</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>