<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Unified Interface for Optimizing Continuous Query in Heterogeneous RDF Stream Processing Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Seungjun Yoon</string-name>
          <email>sjyoon@icl.yonsei.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sejin Chun</string-name>
          <email>sjchun@icl.yonsei.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiongnan Jin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kyong-Ho Lee</string-name>
          <email>khlee89@yonsei.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Yonsei University</institution>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The W3C RDF Stream Processing (RSP) community has proposed a common model and a language for querying RDF streams. However, current RSP systems differ significantly from each other in terms of performance. In this paper, we propose a unified interface for optimizing continuous queries in heterogeneous RSP systems. To improve RSP performance, the unified interface decomposes a global query into partial queries, reassembles them, and assigns them to appropriate RSP systems. Experimental results show that the proposed approach consumes less memory than C-SPARQL and achieves lower latency than CQELS.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF stream processing</kwd>
        <kwd>Unified query interface</kwd>
        <kwd>RSP system</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The importance of RDF Stream Processing (RSP) has grown with the expansion of
Linked Open Data (LOD) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. RSP systems make it possible to integrate data sources and to process continuous
queries over them. To provide a coherent semantic model for the various RSP systems [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ], the RDF Stream Processing Query Language (RSP-QL) was proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. When processing a continuous query with heterogeneous RSP systems, we must
account for various features of RDF streams, e.g., the number of streams and the
execution time of a stream. Moreover, existing RSP systems differ in memory consumption
and latency, so they have to be evaluated in order to select the RSP system that processes
a query with the smallest memory consumption and the lowest latency.
      </p>
      <p>
        We propose a unified interface that processes a continuous query in three steps.
First, we analyze a global query [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] using the query form defined by the SPARQL syntax (e.g., SELECT). After this
analysis, the global query is divided into partial queries according to the streams it
contains. Second, each partial query is continuously evaluated based on stream features
to select an optimal RSP system, with which the partial query is then registered. Third,
each RSP system returns an RDF graph as the answer to its partial query. Because each
partial RDF graph contains only part of the information, none of them alone can answer
the global query; we therefore integrate the partial RDF graphs into a global RDF graph.
Furthermore, we demonstrate that the proposed method has advantages over C-SPARQL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and CQELS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in terms of the growth rate of memory consumption and latency.
      </p>
      <p>Fig. 1 shows the overall process by which the proposed unified interface optimizes a
global query given by a user for heterogeneous RSP systems. The process consists of five
stages: query partitioning, optimization, query localization, local system management,
and result integration. Our main contributions lie in three core stages: query
partitioning, optimization, and local system management. The details of these three
stages are described below.
      </p>
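      <p>To make the data flow between the five stages concrete, the following Python sketch chains them in the order given in Fig. 1. It is an illustration only, not the authors' implementation: the names run_pipeline, STAGE_ORDER, and OPTIONAL_STAGES are hypothetical, and each stage is assumed to be a function that consumes the previous stage's output.</p>
      <preformat preformat-type="code">
```python
from typing import Any, Callable

# Hypothetical stage type: each stage consumes the previous stage's output.
Stage = Callable[[Any], Any]

# Stage order follows the overview in Fig. 1; the dictionary keys are illustrative.
STAGE_ORDER = [
    "query partitioning",
    "optimization",
    "query localization",
    "local system management",
    "result integration",
]
# Translation may be unnecessary for some RSP systems, so localization is optional.
OPTIONAL_STAGES = {"query localization"}


def run_pipeline(global_query: Any, stages: dict) -> Any:
    """Pass a global query through the five stages in order.

    A missing optional stage is skipped (treated as the identity);
    a missing mandatory stage raises KeyError.
    """
    data = global_query
    for name in STAGE_ORDER:
        stage = stages.get(name)
        if stage is None:
            if name in OPTIONAL_STAGES:
                continue
            raise KeyError(f"missing mandatory stage: {name}")
        data = stage(data)
    return data
```
      </preformat>
      <p>A caller would register its own function under each key, for example a partitioning function that returns (window, stream) pairs. Skipping the localization key when it is absent mirrors the remark that some RSP systems do not require translation.</p>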
      <p>Query partitioning. To assign each part of a global query to a suitable RSP system
that performs better than the other RSP systems, we divide the global query into partial
queries. As shown in Algorithm 1, our method divides a given global query and returns a
set of partial queries to be delivered to appropriate RSP systems. Specifically, a
query-plan tree is constructed from the semantics of the given query. Each node of the
tree contains an operator (e.g., window or join) that performs its respective role, and
each node identifies the stream that serves as the input parameter of its operator.
Since a partial query must include a window operator and its stream, we repeatedly form
(window, stream) pairs. Finally, the resulting set of partial queries is returned to be
registered with the RSP systems.</p>
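      <p>Algorithm 1 itself is not reproduced here. As a rough illustration only, the Python sketch below assumes that the query-plan tree has already been built from the query and that each node carries a single operator. PlanNode, input_stream, and partition are hypothetical names, and the depth-first, left-to-right traversal order is our choice rather than the authors'.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Node of a hypothetical query-plan tree.

    op:   operator name, e.g. "join", "window", or "stream" (a leaf).
    arg:  window specification for "window" nodes, stream IRI for "stream" leaves.
    """
    op: str
    arg: str = ""
    children: list = field(default_factory=list)


def input_stream(node: PlanNode) -> str:
    """Find the stream that feeds an operator (depth-first, first match)."""
    if node.op == "stream":
        return node.arg
    for child in node.children:
        stream = input_stream(child)
        if stream:
            return stream
    return ""


def partition(root: PlanNode) -> list:
    """Return one (window, stream) partial query per window operator."""
    partials = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.op == "window":
            # A window and the stream below it form one single-stream partial query.
            partials.append((node.arg, input_stream(node)))
            continue
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return partials
```
      </preformat>
      <p>In this sketch, operators above the windows (such as a join) are not part of any partial query; we assume they are accounted for later, when the partial results are integrated.</p>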
      <p>Optimization. In this stage, partial queries are optimized based on stream features
so that they can be allocated to appropriate RSP systems. Specifically, to check whether
a partial query is suitable to be processed on a specific RSP system, we first obtain the
set of RSP systems that support a given operator. We also use the performance history
(e.g., memory consumption) to select the RSP system that can process the partial query
most efficiently for its execution time. The number of streams is not considered in this
stage because each partial query generated in the query partitioning stage has only a
single stream. Furthermore, to register a partial query with an optimal RSP system, we
classify RSP systems into four classes according to two factors: the initial memory
consumption and the increasing rate of memory consumption. If both factors are high, the
RSP system is not selected because its performance is always the worst. In contrast, an
RSP system with low values for both factors is an ideal choice. Generally, if one of the
factors is high, the other is low. Moreover, it is reasonable to register a partial query
with a short execution time with a system that has a high increasing rate but a low
initial memory consumption. Using this classification, we determine the type of system
with the optimal performance. We define a set of timestamps
<inline-formula><tex-math>T = \{t_1, t_2, \ldots, t_i, \ldots, t_n\}</tex-math></inline-formula>,
where <inline-formula><tex-math>t_i</tex-math></inline-formula> indicates the time at
which two RSP systems consume the same amount of memory. Based on
<inline-formula><tex-math>T</tex-math></inline-formula>, we assign a partial query to the
RSP system selected for optimal performance.</p>
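      <p>The text does not state how the timestamps in T are computed. Here is a minimal sketch, assuming that the memory consumption of a system s grows linearly as <inline-formula><tex-math>m_s(t) = a_s + b_s t</tex-math></inline-formula>, with initial consumption <inline-formula><tex-math>a_s</tex-math></inline-formula> and increasing rate <inline-formula><tex-math>b_s</tex-math></inline-formula>. Under that assumption, two systems i and j with <inline-formula><tex-math>b_i \neq b_j</tex-math></inline-formula> consume the same amount of memory at</p>
      <disp-formula><tex-math>t_{ij} = \frac{a_j - a_i}{b_i - b_j}.</tex-math></disp-formula>
      <p>If system i has the lower initial consumption but the higher rate, it uses less memory for execution times shorter than this crossover point and more memory after it. This matches the advice to send short-running partial queries to low-initial, high-rate systems. The Python sketch below uses the hypothetical names SystemProfile, classify, crossover_times, and select_system. Both the linear model and the classification cut-offs are our assumptions, not the authors'. The sketch filters systems by supported operators and then compares their memory at a partial query's expected execution time, which is equivalent to checking on which side of each crossover time that execution time falls.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SystemProfile:
    """Performance history of one RSP system under an ASSUMED linear memory model.

    initial_mb:  initial memory consumption (a_s).
    rate_mb_s:   increasing rate of memory consumption per second (b_s).
    operators:   operators the system supports.
    """
    name: str
    initial_mb: float
    rate_mb_s: float
    operators: frozenset

    def memory_at(self, t: float) -> float:
        return self.initial_mb + self.rate_mb_s * t


def classify(s: SystemProfile, initial_cut: float, rate_cut: float) -> str:
    """Place a system in one of the four (initial, rate) classes.

    The cut-off values are assumed inputs; the paper does not give them.
    """
    initial = "high" if s.initial_mb > initial_cut else "low"
    rate = "high" if s.rate_mb_s > rate_cut else "low"
    return f"{initial}-initial/{rate}-rate"


def crossover_times(systems: list) -> list:
    """Timestamps T at which two systems consume the same amount of memory."""
    times = []
    for x, y in combinations(systems, 2):
        if x.rate_mb_s == y.rate_mb_s:
            continue  # equal rates: the memory lines never cross
        t = (y.initial_mb - x.initial_mb) / (x.rate_mb_s - y.rate_mb_s)
        if t > 0:
            times.append(t)
    return sorted(times)


def select_system(required_ops: set, duration_s: float, systems: list) -> SystemProfile:
    """Among systems supporting all required operators, pick the one with the
    lowest memory consumption at the partial query's expected execution time."""
    candidates = [s for s in systems if required_ops.issubset(s.operators)]
    if not candidates:
        raise ValueError("no RSP system supports the required operators")
    return min(candidates, key=lambda s: s.memory_at(duration_s))
```
      </preformat>
      <p>The profile values would come from the performance history mentioned above; the sketch measures nothing itself. A fuller version would also drop systems classified as high-initial/high-rate before the comparison, as the text suggests.</p>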
      <p>Local system manager. This stage includes query localization and result
integration. The previous stage determines how the partial queries are distributed. To
handle a partial query on its assigned RSP system, the proposed method translates the
query into the query language of that system. Since some RSP systems may not require
translation, query localization is optional. Each partial query is then distributed to
its RSP system. Finally, to answer the global query, we integrate the graphs generated
by the partial queries into a single graph, which provides the answer to the global
query.</p>
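      <p>As a hedged illustration of the local system manager, the sketch below models translation as a lookup in a registry of per-system translator functions and models integration as a plain union of triple sets. The registry, the placeholder query text produced by unified_form, and all function names are hypothetical. Real C-SPARQL or CQELS syntax is deliberately not reproduced.</p>
      <preformat preformat-type="code">
```python
from typing import Callable

PartialQuery = tuple  # (window, stream), as produced by query partitioning
Translator = Callable[[PartialQuery], str]


def unified_form(q: PartialQuery) -> str:
    """Placeholder rendering used when a system needs no translation (not real RSP-QL)."""
    window, stream = q
    return f"WINDOW {window} ON {stream}"


def localize(q: PartialQuery, system: str, translators: dict) -> str:
    """Translate a partial query for its target system.

    Systems without a registered translator receive the unified form unchanged,
    which models query localization as an optional step.
    """
    translate = translators.get(system, unified_form)
    return translate(q)


def integrate(partial_graphs: list) -> set:
    """Merge partial RDF graphs, given as sets of (s, p, o) tuples, by set union."""
    merged = set()
    for graph in partial_graphs:
        merged.update(graph)
    return merged
```
      </preformat>
      <p>A plain union suffices only when the partial answers can simply be merged. If partial queries share variables, integration would also have to join the partial results, which this sketch does not attempt.</p>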
    </sec>
    <sec id="sec-2">
      <title>Experimental Result</title>
      <p>
        To verify our approach, we compared the performance of the proposed unified interface
with that of C-SPARQL and CQELS using queries over multiple streams. Performance was
evaluated in terms of memory consumption and latency, where latency denotes the delay
from query execution until an answer is returned.
      </p>
      <p>
        Experimental Setting. Our experiments were conducted on an Intel Core i7-4790K
4.0 GHz CPU with 8 GB RAM and a 250 GB SSD, running Windows 10 and Eclipse for Java.
We used the real-time city datasets provided by CityBench [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and varied the number of streams in the CityBench global queries Query5 and Query10.
      </p>
      <p>
        Result Analysis. Fig. 2 and Fig. 3 compare the proposed interface with CQELS and
C-SPARQL as the number of streams varies. Compared with CQELS, our method reduced latency
by approximately 85% but consumed approximately 40% more memory. Compared with
C-SPARQL, it consumed approximately 80% less memory but incurred approximately 95% higher
latency. Thus, the proposed method outperformed C-SPARQL in memory consumption and
CQELS in latency. We attribute this to two factors: partial queries were distributed
according to the evaluated performance of each system, and each RSP system processed
fewer streams than the global query contains, which reduced its stream-processing load.
      </p>
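      <p>The percentages above compare each measurement against a baseline system. The exact formula is not stated, so assume that "X% less" and "X% more" are relative to the baseline's value. Under that reading, they correspond to the signed relative change computed by the small helper below. For example, 85% less latency than CQELS would mean about 0.15 times the CQELS latency, and 95% more latency than C-SPARQL would mean about 1.95 times the C-SPARQL latency. The function names are hypothetical, and inputs such as describe(15.0, 100.0) are illustrative numbers, not measurements from Fig. 2 or Fig. 3.</p>
      <preformat preformat-type="code">
```python
def relative_change(ours: float, baseline: float) -> float:
    """Signed change of a measurement relative to a baseline system.

    -0.85 corresponds to "85% less than the baseline", +0.40 to "40% more".
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (ours - baseline) / baseline


def describe(ours: float, baseline: float) -> str:
    """Render the change in the "X% less" / "X% more" wording used above."""
    change = relative_change(ours, baseline)
    if change == 0:
        return "no change"
    direction = "more" if change > 0 else "less"
    return f"{abs(change):.0%} {direction}"
```
      </preformat>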
      <p>Fig. 4 and Fig. 5 show the performance over the execution time of the streams for
queries consisting of five streams. The unified interface consumed more memory during the
first minute because it must first write and register the partial queries. After this
initial phase, the proposed method improved on each baseline in a different metric: it
was up to 80% better than C-SPARQL in terms of memory consumption and up to 60% better
than CQELS in terms of latency. This is because each partial query was assigned to an
optimal RSP system.</p>
      <p>In conclusion, conventional RSP systems trade off memory consumption against
latency. The proposed method showed stable performance in both memory consumption and
latency without being biased toward either one.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgement</title>
      <p>This work was supported by the ICT R&amp;D program of MSIP/IITP, Republic of Korea.
[B0101-16-1276, Access Network Control Techniques for Various IoT Services]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Le-Phuoc</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>A native and adaptive approach for unified processing of linked streams and linked data</article-title>
          . In:
          <source>ISWC</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>370</fpage>
          -
          <lpage>388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>D. F.</given-names>
          </string-name>
          , et al.:
          <article-title>C-SPARQL: SPARQL for continuous querying</article-title>
          . In:
          <source>WWW</source>
          , ACM,
          <year>2009</year>
          , pp.
          <fpage>1061</fpage>
          -
          <lpage>1062</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Towards a Unified Language for RDF Stream Query Processing</article-title>
          . In:
          <source>ESWC</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>353</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Daum</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Deployment of Global Queries in Distributed and Heterogeneous Stream Processing Systems</article-title>
          . In:
          <source>DEBS Workshop</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mileo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>CityBench: a configurable benchmark to evaluate RSP engines using smart city datasets</article-title>
          . In:
          <source>ISWC</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>374</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Seo, S., Ro, W., &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.-H.</given-names>
          </string-name>
          :
          <article-title>Proactive Plan-Based Continuous Query Processing over Diverse SPARQL Endpoints</article-title>
          . In:
          <source>Proc. of the IEEE/WIC/ACM Web Intelligence Conference (WI)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>