<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SRSPG: A Plugin-based Spark Framework for Large-scale RDF Streams Processing on GPU</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tenglong Ren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guozheng Rao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <email>xiaowangzhang@tju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Intelligence and Computing, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country>China</country>
          <institution>Tianjin Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose a plugin-based Spark framework (SRSPG) for large-scale RDF streams processing on GPU. Within this framework, we convert RDF streams to an RDF graph in a unified and simple way. We can then apply various SPARQL query engines to process continuous queries and use the GPU to accelerate them. The Computation Module provides a Spark-based join algorithm that uses the GPU for parallel joining to obtain the final results. In addition, we provide Compute Resource Management to balance scheduling and task execution between GPU and memory resources. Finally, we evaluate our work, built on gStore and RDF-3X, on the LUBM benchmark. The experimental results show that SRSPG is effective for real-time processing of large-scale RDF streams.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In our work, we propose a plugin-based Spark framework for large-scale
RDF streams processing on GPU. To make full use of the computing power of the
GPU, we add a Query Split module and a Computation Module. Query Split
decomposes SPARQL queries into subqueries, and the Computation Module computes
the intermediate results on the GPU. We evaluate our framework on the LUBM
benchmark. The experimental results show that our framework is effective and
efficient.</p>
    </sec>
    <sec id="sec-2">
      <title>Overview of SRSPG</title>
      <p>The framework of SRSPG consists of six main parts: Syntax Translator,
Data Transformer, Query Trigger, Query Split, SPARQL API, and Computation
Module, as shown in Figure 1.</p>
      <p>[Figure 1: The framework of SRSPG. Inputs: RDF Stream (S) and Continuous Query (Q). Components: Data Transformer, RDF Graph, Window Selector, Syntax Translator, Query Trigger, Query Split, SPARQL Engines, SPARQL API, subqueries SubQ1 ... SubQm, and the Computation Module with Spark-based Parallel Join and Compute Resource Management over GPU 1 ... GPU k. Output: Results.]</p>
      <sec id="sec-2-23">
        <p>Data Transformer: This module captures snapshots of the RDF streams
according to the window selector obtained from Query Trigger and converts
these snapshots to RDF graphs. The RDF streams are turned into continuous
window data by Esper or another data stream management system (DSMS). Finally,
the RDF graphs are sent to the SPARQL API.</p>
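        <p>The windowing step can be illustrated with a minimal sketch. It is not the Esper-based implementation used by SRSPG: it assumes a time-based sliding window written in plain Python, and the names TimedTriple, snapshot, and sliding_snapshots are hypothetical.</p>
        <preformat>
```python
from typing import NamedTuple


class TimedTriple(NamedTuple):
    """One RDF stream element: a triple plus its arrival time in seconds."""
    subject: str
    predicate: str
    obj: str
    timestamp: float


def snapshot(stream, window_end, window_range):
    """Return the RDF graph (a set of triples) for one time window.

    A stream element belongs to the window when its timestamp lies in
    the half-open interval (window_end - window_range, window_end].
    """
    window_start = window_end - window_range
    return {
        (t.subject, t.predicate, t.obj)
        for t in stream
        if t.timestamp > window_start and window_end >= t.timestamp
    }


def sliding_snapshots(stream, window_range, step, until):
    """Yield (window_end, graph) pairs, one per evaluation of the query."""
    window_end = step
    while until >= window_end:
        yield window_end, snapshot(stream, window_end, window_range)
        window_end += step


# Hypothetical stream data, for illustration only.
stream = [
    TimedTriple("alice", "likes", "bob", 1.0),
    TimedTriple("bob", "lives", "house1", 4.0),
    TimedTriple("house1", "located", "tianjin", 9.0),
]
graphs = dict(sliding_snapshots(stream, window_range=5.0, step=5.0, until=10.0))
```
</preformat>
        <p>Each snapshot is an ordinary set of triples, so under this assumption any SPARQL engine that can load a static RDF graph could evaluate the continuous query once per window.</p>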
        <p>SPARQL API: SRSPG provides a SPARQL API for users, which makes it
possible for SPARQL engines, whether centralized or distributed, to process
RDF streams.</p>
        <p>Query Split: The Query Split module decomposes a query into subqueries
based on the weights of its predicates. We assign weights to predicates when
loading RDF data: the more frequently a predicate occurs in the RDF graph, the
greater its weight and the greater its impact on query results.</p>
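        <p>A minimal sketch of weight-based splitting follows. It rests on two assumptions that the text does not state: triple patterns are ordered by ascending predicate weight (as the example query in this section suggests, with weights 10, 30, 100), and the ordered patterns are cut into subqueries of a fixed size. The function names and the size parameter are hypothetical.</p>
        <preformat>
```python
from collections import Counter


def predicate_weights(graph):
    """Weight each predicate by how often it occurs in the RDF graph."""
    return Counter(predicate for _, predicate, _ in graph)


def order_patterns(patterns, weights):
    """Sort triple patterns by ascending predicate weight.

    Ascending order is an assumption taken from the example, where the
    pattern with weight 10 comes first and the one with weight 100 last.
    """
    return sorted(patterns, key=lambda tp: weights.get(tp[1], 0))


def split(patterns, weights, size=1):
    """Cut the ordered patterns into subqueries of at most `size` patterns."""
    ordered = order_patterns(patterns, weights)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Triple patterns and weights taken from the example query.
patterns = [("?X", "likes", "?Y"), ("?Y", "lives", "?W"), ("?W", "located", "?S")]
weights = {"likes": 100, "lives": 10, "located": 30}
subqueries = split(patterns, weights)
```
</preformat>
        <p>With size=1 every triple pattern becomes its own subquery, ordered lives, located, likes. In practice the weights would come from predicate_weights applied to the loaded RDF graph rather than from a hand-written dictionary.</p>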
      </sec>
      <sec id="sec-2-24">
        <p>[Figure: Query Split example. Query: SELECT ?X WHERE { TP1: ?X likes ?Y . TP2: ?Y lives ?W . TP3: ?W located ?S . } with predicate weights TP1 = 100, TP2 = 10, TP3 = 30. The triple patterns are ordered TP2 (10), TP3 (30), TP1 (100) to reach the solution.]</p>
      </sec>
      <sec id="sec-2-25">
        <p>Computation Module: The Computation Module provides a Spark-based
parallel join algorithm that uses the GPU for parallel joining. Tasks can be
split into several streams. Spark supports multi-threaded computing, whereas
GPUs usually execute tasks serially, which leads to contention for GPU
resources among threads. We therefore present Compute Resource Management to
balance scheduling and task execution between GPU and memory resources.</p>
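        <p>The join of subquery results can be sketched as a hash join over variable bindings. This is a CPU-only stand-in written under our own assumptions, not the Spark and GPU implementation of SRSPG; the function name hash_join and the sample bindings are hypothetical.</p>
        <preformat>
```python
def hash_join(left, right):
    """Join two lists of variable bindings on the variables they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]).intersection(right[0]))
    # Build phase: index the left bindings by their shared-variable values.
    index = {}
    for row in left:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    # Probe phase: every right binding is looked up independently of the
    # others, which is the part that parallel hardware could spread out.
    joined = []
    for row in right:
        for match in index.get(tuple(row[v] for v in shared), []):
            joined.append({**match, **row})
    return joined


# Hypothetical results of the three subqueries of the example query.
likes = [{"?X": "alice", "?Y": "bob"}, {"?X": "carol", "?Y": "dave"}]
lives = [{"?Y": "bob", "?W": "house1"}]
located = [{"?W": "house1", "?S": "tianjin"}]

# Join the smallest intermediate results first, then the largest.
result = hash_join(hash_join(lives, located), likes)
```
</preformat>
        <p>Because each probe is independent, the probe loop is the natural candidate for parallel execution; how SRSPG maps it onto GPU streams is not specified here.</p>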
        <p>Experiments and Evaluations</p>
        <p>Our experiments were run on a server equipped with four 6-core CPUs,
64GB of memory, and an NVIDIA GTX590 GPU, which has 24GB of device memory and
is clocked at 1.35GHz. The operating system is Ubuntu 14.04. To support GPUs
on YARN, we use version 3.1.2. We refer to RDF-3X and gStore employed within
SRSPG as RDF-GPU and gStore-GPU, respectively. Our experiments use the LUBM
dataset.</p>
        <p>The experiments use the standard queries Q1 and Q2 provided by LUBM.
Figure 3 shows that, when S = 240s and SET = 235s, the query time over stream
data decreases when SRSPG uses the GPU, and gStore is more than twice as fast
as RDF-3X. When the data size is small, GPU acceleration is not obvious, mainly
because of data communication and transfer time. The larger the data scale,
the better the acceleration effect.</p>
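        <p>[Plot residue, presumably Figure 3: only the vertical axis label "Time" and the tick labels "105" and "104" (presumably 10^5 and 10^4) are recoverable.]</p>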
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>In this paper, we proposed a plugin-based Spark framework for real-time,
large-scale RDF streams processing on GPU in an efficient and simple way. In
the future, we will take advantage of more novel computing hardware, such as
FPGAs, to further speed up large-scale RDF streams processing.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Key Research and Development Program
of China (2017YFC0908401) and the National Natural Science Foundation of
China (61672377, 61972455). Xiaowang Zhang is supported by the Peiyang Young
Scholars in Tianjin University (2019XRX-0032).</p>
    </sec>
  </body>
  <back>
  </back>
</article>