<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Internet Technology (TOIT), Vo.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Proactive Replication of Dynamic Linked Data for Scalable RDF Stream Processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sejin Chun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jooik Jung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiongnan Jin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seungjun Yoon</string-name>
          <email>sjyoong@icl.yonsei.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kyong-Ho Lee</string-name>
          <email>khlee89@yonsei.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Yonsei University</institution>
          ,
          <addr-line>Seoul</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>290</volume>
      <issue>2003</issue>
      <fpage>161</fpage>
      <lpage>164</lpage>
      <abstract>
        <p>In this paper, we propose a scalable method for proactively replicating a subset of remote datasets for RDF Stream Processing (RSP). Our solution achieves fast query processing by keeping the replicated data up to date before query evaluation. To construct the replication process effectively, we present an update estimation model that captures how the number of updates changes over time. Using this model, we reconstruct the replication process in response to outdated data. Finally, we conduct exhaustive tests with a real-world dataset to verify our solution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>Fig. 1. The proposed system. [Figure not reproduced. Labels recoverable from the extraction: RDF Stream; Window; SERVICE (sub-)queries; ET (Continuous); Update estimation model; Recomposition; Synchronization; {tc, tu}; {NR}; RDF Nodes; Replicated RDF Graphs; Materialized View; Result Integrator; Answers.]</p>
    </sec>
    <sec id="sec-2">
      <title>Solution</title>
      <sec id="sec-2-1">
        <p>Our solution proactively replicates Linked Data for RDF Stream Processing (RSP). It refreshes the data replicated from a SPARQL endpoint before query evaluation. In other words, we keep the replicated data up to date before joining stream data with remote data. Query processing is therefore fast, because no endpoint has to be invoked at each query evaluation, while accuracy remains high.</p>
        <p>Figure 1 illustrates the proposed system. Given an RSP query that joins RDF streams with SERVICE patterns, a query manager accepts the query as input and splits it into two parts: a STREAM query and SERVICE (sub-)queries. The STREAM query is delegated to an RSP engine such as C-SPARQL [1], and the SERVICE (sub-)queries are passed to a proactive replication component (PR). The RSP engine registers the STREAM query and evaluates it continuously. Meanwhile, PR constructs a replication process NR from the SERVICE (sub-)queries. Each instance of NR invokes a remote service and materializes the result into a materialized view (MV). Finally, a result integrator combines the results obtained from the RSP engine and PR and produces answers continuously.</p>
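        <p>The join performed by the result integrator can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that solution mappings are dictionaries from variable names to RDF terms (simplified to strings) and applies SPARQL-style join semantics: two mappings are merged when they agree on every shared variable. The names <monospace>compatible</monospace> and <monospace>join</monospace> are illustrative only.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, List

# A solution mapping: variable name -> RDF term (terms simplified to strings).
Binding = Dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    shared = set(a).intersection(b)
    return all(a[v] == b[v] for v in shared)


def join(stream: List[Binding], mv: List[Binding]) -> List[Binding]:
    """SPARQL-style join of stream bindings with bindings read from MV."""
    results: List[Binding] = []
    for s in stream:
        for m in mv:
            if compatible(s, m):
                merged = dict(s)
                merged.update(m)
                results.append(merged)
    return results


# Hypothetical example: one stream binding joined with two MV bindings.
stream = [{"?sensor": "ex:s1", "?lot": "ex:lotA"}]
mv = [{"?lot": "ex:lotA", "?free": "12"}, {"?lot": "ex:lotB", "?free": "3"}]
print(join(stream, mv))
# [{'?sensor': 'ex:s1', '?lot': 'ex:lotA', '?free': '12'}]
```
        </preformat>
        <p>A nested-loop join keeps the sketch short. An integrator handling larger views would more likely index MV by the join variables.</p>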
        <p>Specifically, PR consists of three phases: construction, recomposition, and synchronization. In the construction phase, PR builds NR using an update estimation model. Each instance of NR is assigned to a node to obtain a subset of remote RDF data through a SPARQL endpoint. To model how the number of updates varies over time, our update estimation model is based on the inhomogeneous recurrent piecewise constant process [4]. The underlying assumption of such a process is that the update rate <inline-formula><tex-math>\lambda</tex-math></inline-formula> repeats every Q time units, i.e., <inline-formula><tex-math>\lambda(T) = \lambda(T + Q)</tex-math></inline-formula> for all time periods T. Thus, we construct an initial version of an update process NU by assigning a rate <inline-formula><tex-math>\lambda</tex-math></inline-formula> to each time interval.</p>
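        <p>One way to represent such a recurrent piecewise constant rate is to split the period Q into equal-width intervals, each carrying a constant rate, and to fold any time into one period before lookup. The sketch below assumes Q = 24 hours with hourly intervals and uses made-up rates. The class name <monospace>PiecewiseRecurrentRate</monospace> is hypothetical. Integrating the rate over a time span gives the expected number of updates in that span.</p>
        <preformat preformat-type="code">
```python
from typing import List


class PiecewiseRecurrentRate:
    """Update rate lambda(t) that is constant within each interval of the
    period Q and repeats every Q time units: lambda(t) == lambda(t + Q)."""

    def __init__(self, period: float, rates: List[float]) -> None:
        self.period = period                  # Q, e.g. 24 hours
        self.rates = rates                    # one constant rate per interval
        self.width = period / len(rates)      # length of each interval

    def rate(self, t: float) -> float:
        """Fold t into [0, Q) and return the rate of the interval it falls in."""
        index = int((t % self.period) // self.width)
        return self.rates[min(index, len(self.rates) - 1)]

    def expected_updates(self, start: float, end: float) -> float:
        """Integrate lambda over [start, end): expected number of updates."""
        total, t = 0.0, start
        while end > t:
            boundary = (t // self.width + 1) * self.width
            if t >= boundary:                 # guard against float rounding
                boundary += self.width
            nxt = min(boundary, end)
            total += self.rate(t) * (nxt - t)
            t = nxt
        return total


# Hypothetical hourly rates over 24 hours, higher between 05:00 and 06:00.
hourly = [0.5] * 24
hourly[5] = 6.0
lam = PiecewiseRecurrentRate(24.0, hourly)
print(lam.rate(5.5), lam.rate(5.5 + 24.0))   # 6.0 6.0
print(lam.expected_updates(4.0, 7.0))        # 7.0
```
        </preformat>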
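        <p>With the initial version of NU, we create and deploy the instances of NR based on a set of evaluation times ET at which stream data are selected. Let a time-based sliding window W consist of <inline-formula><tex-math>(\omega, \beta)</tex-math></inline-formula>, where <inline-formula><tex-math>\omega</tex-math></inline-formula> is the width of the window and <inline-formula><tex-math>\beta</tex-math></inline-formula> is the slide, i.e., the gap between the opening time instants of consecutive windows. Given a query q that contains one or more windows, we compute <inline-formula><tex-math>ET = \{\tau_1, \ldots, \tau_n\}</tex-math></inline-formula> for q, where each <inline-formula><tex-math>\tau_i</tex-math></inline-formula> is the evaluation time of the i-th window <inline-formula><tex-math>W_i</tex-math></inline-formula>. Therefore, the number of NR instances and their positions are determined by NU and ET.</p>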
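        <p>The sketch below shows, under stated assumptions, how ET and an update model could jointly fix the number and positions of NR instances. It makes three assumptions:</p>
        <list list-type="bullet">
          <list-item><p>the i-th window opens at <inline-formula><tex-math>t_0 + (i-1)\beta</tex-math></inline-formula> and is evaluated when it closes;</p></list-item>
          <list-item><p>each replication takes a fixed latency;</p></list-item>
          <list-item><p>an instance is placed before an evaluation time only if the expected number of updates since the previous instance reaches a threshold.</p></list-item>
        </list>
        <p>The functions and the threshold rule are hypothetical and are not the paper's placement algorithm.</p>
        <preformat preformat-type="code">
```python
from typing import Callable, List


def evaluation_times(t0: float, omega: float, beta: float, n: int) -> List[float]:
    """ET = {tau_1, ..., tau_n}, assuming the i-th window opens at
    t0 + (i - 1) * beta and is evaluated when it closes (width omega)."""
    return [t0 + i * beta + omega for i in range(n)]


def place_instances(et: List[float],
                    expected_updates: Callable[[float, float], float],
                    latency: float, threshold: float) -> List[float]:
    """Start times of NR instances: each is scheduled `latency` before an
    evaluation time, but only if the expected number of updates since the
    previous instance reaches `threshold` (the first one is always placed)."""
    starts: List[float] = []
    last = None
    for tau in et:
        start = tau - latency
        if last is None or expected_updates(last, start) >= threshold:
            starts.append(start)
            last = start
    return starts


# Hypothetical setting: constant rate 0.2, window width 10, slide 5.
et = evaluation_times(0, 10, 5, 4)                      # [10, 15, 20, 25]
print(place_instances(et, lambda a, b: 0.2 * (b - a), latency=1, threshold=1.5))
# [9, 19]
```
        </preformat>
        <p>In this toy setting only every other evaluation time triggers a refresh. Raising the threshold tends to lower G(t) at the price of a higher M(t), which is the trade-off the recomposition phase has to balance.</p>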
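        <p>Given a time interval Q, the solution mappings <inline-formula><tex-math>\Omega</tex-math></inline-formula> of a SERVICE pattern, and the update estimation model <inline-formula><tex-math>\lambda(T)</tex-math></inline-formula> for all time periods T, we define the replication process NR of <inline-formula><tex-math>\Omega</tex-math></inline-formula> as follows:</p>
        <p>
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math>\mathrm{NR}(\Omega) = \bigl(T, \lambda(T), r(\Omega)\bigr)</tex-math>
          </disp-formula>
where the vector represents an effective replication instance <inline-formula><tex-math>r(\Omega)</tex-math></inline-formula> with the value <inline-formula><tex-math>\lambda(T)</tex-math></inline-formula> for each time interval T. Each <inline-formula><tex-math>r(\Omega)</tex-math></inline-formula> is composed of half-open intervals of the form <inline-formula><tex-math>[s, f)</tex-math></inline-formula>. The start time s is the time at which the SERVICE pattern corresponding to <inline-formula><tex-math>\Omega</tex-math></inline-formula> is executed, and the finish time f is the time at which the solution mappings retrieved from the endpoint are replicated.
        </p>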
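        <p>Equation (1) can be mirrored by a small data structure. In this hypothetical sketch, a <monospace>ReplicationProcess</monospace> stores, for each time interval T, the estimated rate and the half-open intervals [s, f) of its replication instances. Its <monospace>vector()</monospace> method returns the entries <inline-formula><tex-math>(T, \lambda(T), r(\Omega))</tex-math></inline-formula> ordered by T. Representing <inline-formula><tex-math>\Omega</tex-math></inline-formula> by the text of its SERVICE pattern is a simplifying assumption.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReplicationInstance:
    s: float  # start: the SERVICE pattern corresponding to Omega is executed
    f: float  # finish: the retrieved solution mappings are replicated into MV

    def covers(self, t: float) -> bool:
        """True if t lies in the half-open interval [s, f)."""
        return t >= self.s and self.f > t


@dataclass
class ReplicationProcess:
    pattern: str                                   # stands in for Omega
    rates: Dict[int, float] = field(default_factory=dict)
    instances: Dict[int, List[ReplicationInstance]] = field(default_factory=dict)

    def add(self, T: int, rate: float, s: float, f: float) -> None:
        """Record lambda(T) and one replication instance [s, f) in interval T."""
        self.rates[T] = rate
        self.instances.setdefault(T, []).append(ReplicationInstance(s, f))

    def vector(self) -> List[Tuple[int, float, List[Tuple[float, float]]]]:
        """Entries (T, lambda(T), r(Omega)) ordered by T, mirroring Eq. (1)."""
        return [(T, self.rates[T], [(i.s, i.f) for i in self.instances[T]])
                for T in sorted(self.rates)]


# Hypothetical pattern and times (hours): two instances in T = 5, one in T = 6.
nr = ReplicationProcess("?lot ex:freeSpaces ?free")
nr.add(5, 6.0, 5.10, 5.12)
nr.add(5, 6.0, 5.60, 5.63)
nr.add(6, 0.5, 6.50, 6.52)
print(nr.vector())
```
        </preformat>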
        <p>In the synchronization phase, PR receives the set of solution mappings retrieved by an instance of NR and replicates them into MV. To renew the update estimation model, PR collects information about the replicated data, i.e., whether the data changed (tc) or not (tu). This information is used to compute a new <inline-formula><tex-math>\lambda</tex-math></inline-formula> over a time period.</p>
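        <p>A minimal sketch of this synchronization step follows. It is hypothetical: MV is a dictionary keyed by SERVICE pattern, and each solution mapping is stored as a frozen set of (variable, term) pairs so that sets of mappings can be compared directly. Each call replicates the new mappings. It also logs the time since the previous access and whether the data changed (tc) or not (tu), which is the observation an estimator of the update rate needs.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, FrozenSet, List, Set, Tuple

# One solution mapping as a hashable set of (variable, term) pairs.
Solution = FrozenSet[Tuple[str, str]]


class MaterializedView:
    def __init__(self) -> None:
        self.data: Dict[str, Set[Solution]] = {}        # pattern -> mappings
        self.last_access: Dict[str, float] = {}        # pattern -> last time
        self.log: List[Tuple[float, bool]] = []        # (interval, changed?)

    def synchronize(self, pattern: str, mappings: Set[Solution], now: float) -> bool:
        """Replicate the mappings into MV; return True for t_c, False for t_u."""
        changed = self.data.get(pattern) != mappings
        if pattern in self.last_access:
            # Observation for the estimator: time since last access, changed?
            self.log.append((now - self.last_access[pattern], changed))
        self.last_access[pattern] = now
        self.data[pattern] = set(mappings)
        return changed


# Hypothetical parking data: the number of free spaces drops from 12 to 11.
view = MaterializedView()
a = {frozenset({("?lot", "ex:lotA"), ("?free", "12")})}
b = {frozenset({("?lot", "ex:lotA"), ("?free", "11")})}
view.synchronize("lots", a, 0.0)   # initial load
view.synchronize("lots", a, 1.0)   # unchanged: t_u
view.synchronize("lots", b, 3.0)   # changed: t_c
print(view.log)                    # [(1.0, False), (2.0, True)]
```
        </preformat>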
        <p>In the recomposition phase, PR reconstructs NR using the new <inline-formula><tex-math>\lambda</tex-math></inline-formula> and two cost metrics at time t, M(t) and G(t). M(t) is defined as the number of updates missing from MV at time t. A larger M(t) degrades the freshness of MV and decreases the accuracy of the answer. G(t) is defined as the number of replication instances in [0, t] whose SERVICE-pattern result is identical to the data replicated in the previous release, so such instances are redundant. Thus, reducing G(t) improves the performance of maintaining MV in terms of stability.</p>
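        <p>Both cost metrics can be computed from two logs, as in the following sketch. The logs are the times at which the source actually changed, which a simulation or replay might provide, and the executed replication instances. The sketch assumes that an instance executed at time r captures every source update made up to r. The function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from typing import List, Tuple

# Hypothetical logs from a replay:
#   source_updates: times at which the remote dataset actually changed
#   refreshes:      (time, changed) for each executed replication instance
# Assumption: a refresh at time r captures every source update made up to r.


def missed_updates(t: float, source_updates: List[float],
                   refreshes: List[Tuple[float, bool]]) -> int:
    """M(t): source updates after the last refresh done by time t,
    i.e. updates that MV does not reflect yet."""
    done = [r for r, _ in refreshes if t >= r]
    last = max(done) if done else float("-inf")
    return sum(1 for u in source_updates if u > last and t >= u)


def redundant_refreshes(t: float, refreshes: List[Tuple[float, bool]]) -> int:
    """G(t): replication instances in [0, t] whose result equalled the data
    replicated in the previous release (no change found)."""
    return sum(1 for r, changed in refreshes if r >= 0 and t >= r and not changed)


updates = [1.0, 2.5, 4.0, 4.5]
refreshes = [(0.0, True), (2.0, True), (3.0, True), (3.5, False)]
print(missed_updates(5.0, updates, refreshes))   # 2 (updates at 4.0 and 4.5)
print(redundant_refreshes(5.0, refreshes))       # 1 (the refresh at 3.5)
```
        </preformat>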
        <p>To derive the new <inline-formula><tex-math>\lambda</tex-math></inline-formula> from irregular invocations of the endpoints, we use a maximum-likelihood estimator (MLE) [5]. The MLE computes the <inline-formula><tex-math>\lambda</tex-math></inline-formula> with the highest probability of producing the set of changes observed during these accesses. Since each access to an endpoint determines whether the requested dataset has been updated (tc) or not (tu), we can estimate the new <inline-formula><tex-math>\lambda</tex-math></inline-formula> without a complete history of updates.</p>
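        <p>A sketch of such an estimator follows. It assumes that updates within a time interval follow a Poisson process with rate <inline-formula><tex-math>\lambda</tex-math></inline-formula>. Under that assumption, an access made I time units after the previous one observes a change with probability <inline-formula><tex-math>1 - e^{-\lambda I}</tex-math></inline-formula> and no change with probability <inline-formula><tex-math>e^{-\lambda I}</tex-math></inline-formula>. Setting the derivative of the log-likelihood to zero gives the condition <inline-formula><tex-math>\sum_{\text{changed}} I/(e^{\lambda I} - 1) = \sum_{\text{unchanged}} I</tex-math></inline-formula>, which the code solves by bisection. This is one estimator of this kind and may differ in detail from the one in [5]; the function name is hypothetical. For a piecewise model, it would be applied separately to the accesses falling in each interval T.</p>
        <preformat preformat-type="code">
```python
import math
from typing import List, Tuple


def estimate_rate(observations: List[Tuple[float, bool]], iters: int = 200) -> float:
    """MLE of a Poisson change rate from (interval since last access, changed?)."""
    changed = [i for i, c in observations if c and i > 0]
    unchanged_total = sum(i for i, c in observations if not c)
    if not changed:
        return 0.0          # no change observed: the likelihood peaks at 0
    if unchanged_total == 0:
        return math.inf     # every access saw a change: no finite maximum

    def score(lam: float) -> float:
        # Derivative of the log-likelihood; it decreases in lam, so its
        # single root can be found by bisection.
        return sum(i / math.expm1(lam * i) for i in changed) - unchanged_total

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:    # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical regular accesses every 1 time unit, half of which saw a change;
# for regular accesses the estimate reduces to -ln(1 - 5/10) = ln 2.
obs = [(1.0, True)] * 5 + [(1.0, False)] * 5
print(estimate_rate(obs))   # close to 0.6931 (ln 2)
```
        </preformat>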
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>Experimental Setup. We implemented our solution on top of C-SPARQL. For comparison with the state of the art, we implemented the process of maintaining MV from [3]. In addition, we selected CQELS [2], which generally performs better than C-SPARQL, as a baseline method. We used query Q6 and its related datasets from CityBench<sup>2</sup>. We extended the query by adding remote services that provide real-world parking information<sup>3,4</sup>. To keep the average response time of each service consistent (e.g., 1 s), we used subqueries such as <monospace>&lt;entityURI&gt; ?p ?o</monospace>. The average result size (about 850 kb) and the number of results (about 5,000 records) are approximately the same at every query evaluation.</p>
      <p>Experimental Result. Figure 2 shows the average execution time of processing Q6 as the number of SERVICE patterns varies. On average, our method took about five seconds less than the method of [3]. Specifically, execution time was reduced by 0.5 s for 2 services, 1 s for 4 services, 6 s for 8 services, and 11 s for 16 services. This improvement arises because our solution pulls the replicated data from MV at every query evaluation instead of invoking the endpoints.</p>
      <p>[Figure 2 not reproduced. Legend: The proposed method; Baseline method. Recoverable caption fragments: “… to the number of SERVICE patterns” and “… and the number of missing updates”.]</p>
      <p><sup>2</sup> https://github.com/CityBench/Benchmark; <sup>3</sup> https://www.parkwhiz.com; <sup>4</sup> http://lod.seoul.go.kr/</p>
      <p>Using parking information collected over one week, we checked how many updates were missing from MV. We then measured the accuracy of the replicated data using the Jaccard similarity, defined as the size of the intersection of the replicated set and the answer set divided by the size of their union. In most hours (00:00 to 05:00 and 06:00 to 24:00), accuracy is high and few updates are missed. Between 05:00 and 06:00, more updates are missed, yet accuracy remains high. In addition, we used the Pearson correlation coefficient to estimate the strength of the relationship between accuracy and the number of missing updates. The obtained coefficient was -0.234, which indicates a weak correlation. From this experiment, we learned that, when the replicated data are kept up to date before query evaluation, the number of missing updates may not strongly influence the accuracy of the answer.</p>
      <p>Acknowledgement. This work was supported by the ICT R&amp;D program of MSIP/IITP, Republic of Korea. [B0101-16-1276, Access Network Control Techniques for Various IoT Services]</p>
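      <p>The two measures used in the evaluation above, Jaccard similarity and the Pearson correlation coefficient, take only a few lines to compute. The sketch below illustrates them and is not the evaluation code. It assumes that the replicated data and the answer are sets of hashable items, for example triples as tuples, and it computes the Pearson coefficient from its textbook definition. The toy inputs are made up and are not the experimental data.</p>
      <preformat preformat-type="code">
```python
import math
from typing import Hashable, Sequence, Set


def jaccard(replicated: Set[Hashable], answer: Set[Hashable]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = replicated.union(answer)
    if not union:
        return 1.0  # convention: two empty sets are identical
    return len(replicated.intersection(answer)) / len(union)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up toy values, not the experimental data.
print(jaccard({"t1", "t2", "t3"}, {"t2", "t3", "t4"}))   # 0.5
print(pearson([1, 2, 3, 4], [4, 3, 2, 1]))               # close to -1.0
```
      </preformat>
      <p>On Python 3.10 or later, <monospace>statistics.correlation</monospace> computes the same Pearson coefficient.</p>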
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <string-name><surname>Barbieri</surname>, <given-names>D. F.</given-names></string-name>, <string-name><surname>Braga</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Ceri</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Della Valle</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Grossniklaus</surname>, <given-names>M.</given-names></string-name>: <article-title>C-SPARQL: SPARQL for continuous querying</article-title>. In: <source>WWW</source>, pp. <fpage>1061</fpage>–<lpage>1062</lpage>. ACM (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. <string-name><surname>Le-Phuoc</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Dao-Tran</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Parreira</surname>, <given-names>J. X.</given-names></string-name>, <string-name><surname>Hauswirth</surname>, <given-names>M.</given-names></string-name>: <article-title>A native and adaptive approach for unified processing of linked streams and linked data</article-title>. In: <source>ISWC 2011</source></mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. <article-title>Approximate continuous query answering over streams and dynamic linked data</article-title></mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>