<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On measuring performances of C-SPARQL and CQELS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiangnan Ren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Houda Khrouf</string-name>
          <email>houda.khrouf@atos.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zakia Kazi-Aoul</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yousra Chabchoub</string-name>
          <email>yousra.chabchoub@isep.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Cure</string-name>
          <email>olivier.cure@u-pem.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ATOS - 80 Quai Voltaire</institution>
          ,
          <addr-line>95870 Bezons</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISEP - LISITE</institution>
          ,
          <addr-line>75006 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>UPEM LIGM - UMR CNRS 8049</institution>
          ,
          <addr-line>77454 Marne-la-Vallee</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>To cope with the massive growth of semantic data streams, several RDF Stream Processing (RSP) engines have been implemented. The efficiency of their throughput, latency and memory consumption can be evaluated using available benchmarks such as LSBench and CityBench. Nevertheless, these benchmarks lack an in-depth performance evaluation, as some measurement metrics have not been considered. The main goal of this paper is to analyze the performance of two popular RSP engines, namely C-SPARQL and CQELS, when varying a set of performance metrics. More precisely, we evaluate the impact of stream rate, number of streams and window size on execution time as well as on memory consumption.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        With the emergence of the velocity aspect of Big Data, new platforms are needed
to efficiently handle data streams. In the context of the Semantic Web, a
dedicated W3C community<sup>1</sup> extended standard SPARQL queries with the ability
to continuously query unbounded RDF streams. This is a key component of
RDF (Resource Description Framework) Stream Processing, henceforth denoted
as RSP. RSP engines integrate streaming features
such as windowing operations and periodic execution. Examples of popular
RSP engines are C-SPARQL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and CQELS [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Each engine has its specific
architecture, query language, execution mechanism and operational semantics.
In order to determine which RSP engine to adopt for a particular application,
it is essential to conduct an in-depth and complete performance analysis.
      </p>
      <p>
        Since 2012, benchmarks and comparative research surveys of RSP engines
have been conducted. Examples of RSP benchmarks are SRBench [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
CSRBench [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], LSBench [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and CityBench [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While SRBench and CSRBench have
studied query functionalities and output correctness, LSBench and CityBench go
a step further by tackling performance criteria. However, current benchmarks do
not distinguish between the different execution mechanisms, namely time-driven and
data-driven, described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and employed by existing RSP engines. Moreover, some
performance criteria have not been considered in the evaluation plan of these
benchmarks. We therefore conduct experiments to obtain a deeper and more
comprehensive view of current RSP systems. We do not propose a new benchmark;
instead, we evaluate a set of performance criteria. More precisely, we evaluate
the impact of stream rate, window size, number of streams, number of triples
and static data size on query execution time and memory consumption. This
evaluation has been conducted on two popular RSP engines: C-SPARQL and
CQELS.
      </p>
      <p><sup>1</sup> https://www.w3.org/community/rsp/</p>
      <p>This paper is organized as follows. Section 2 provides an overview of existing
RSP engines and benchmarks. In Section 3, we describe our evaluation plan and
novel performance criteria. Section 4 presents the results of our experiments, and
we discuss them in Section 5. We conclude and outline future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>In this section, we first present the two popular RSP engines, C-SPARQL and
CQELS, and then we describe existing RSP benchmarks.</p>
      <sec id="sec-2-1">
        <title>2.1 RSP engines</title>
        <p>
          C-SPARQL [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and CQELS [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] are two mature and widely used RSP
engines. Each engine proposes its own continuous query language extensions to
query time-annotated triples and employs a specific RSP mechanism. We
distinguish two kinds of RSP mechanisms: time-driven and data-driven.
The time-driven mechanism periodically executes SPARQL queries within a
logical window (time-based) or physical window (triple-based), whereas the
data-driven mechanism executes SPARQL queries immediately after the arrival of
new data in the stream. In the following, we present the main features supported by
each of these engines.
        </p>
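        <p>The difference between the two mechanisms can be sketched with a small simulation. This is only an illustrative model, not code from either engine: the function names, the (arrival time, payload) representation of stream items and the sample stream are assumptions made for the example.</p>
        <preformat>
```python
from typing import List, Tuple

# Hypothetical stream item: (arrival time in seconds, payload).
Item = Tuple[float, str]


def window_at(stream: List[Item], now: float, window_range: float) -> List[str]:
    """Payloads whose arrival time lies in the interval (now - range, now]."""
    return [p for (t, p) in stream if t > now - window_range and now >= t]


def time_driven(stream: List[Item], window_range: float,
                step: float, horizon: float) -> List[List[str]]:
    """Evaluate the window once every `step` seconds until `horizon`."""
    ticks = int(horizon / step)
    return [window_at(stream, k * step, window_range)
            for k in range(1, ticks + 1)]


def data_driven(stream: List[Item], window_range: float) -> List[List[str]]:
    """Evaluate the window immediately at every arrival."""
    return [window_at(stream, t, window_range) for (t, _) in stream]


stream = [(0.2, "a"), (0.7, "b"), (1.4, "c"), (2.9, "d")]
periodic = time_driven(stream, window_range=2.0, step=1.0, horizon=3.0)
eager = data_driven(stream, window_range=2.0)
print(len(periodic), len(eager))
```
        </preformat>
        <p>In this toy run the time-driven policy evaluates the window three times (once per step), while the data-driven policy evaluates it four times (once per arriving item), so its execution frequency follows the input rather than a preset period.</p>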
        <p>
          C-SPARQL supports time-driven query execution and extends the standard
SPARQL query language with keywords such as RANGE and STEP. The RANGE
keyword defines the time-based window (e.g., RANGE 5m means a window of
5 minutes), and the STEP keyword indicates the frequency at which the query
should be executed. Standard SPARQL 1.1 operators, such as aggregation, ordering
and comparison, can be used over the data within the window. C-SPARQL
streams out the whole output at each query execution, which corresponds to the Rstream
operator among the different streaming operators [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] (e.g., Rstream, Istream,
Dstream).
        </p>
        <p>
          CQELS is developed in a native and adaptive way proposing a pre-processor
and an optimizer to improve performance [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. It supports data-driven query
execution following the content-change policy, in which queries are triggered
immediately at the arrival of new statements in the window. Even though the SLIDE
keyword is supported in the CQELS syntax (like the STEP keyword in C-SPARQL),
it does not have any effect on the engine behavior. The execution frequency
depends on the arrival of new data in the stream. CQELS compares the current
output with the previous one, and streams out only the new results, which corresponds
to the Istream [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] operator in terms of streaming operators.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 RSP benchmarks</title>
        <p>SRBench, one of the first available RSP benchmarks, proposes a baseline to
evaluate the support of various functionalities (SPARQL features, window operators,
etc.). CSRBench is an extension of SRBench that evaluates the correctness of results.</p>
        <p>LSBench covers functionality, correctness and performance evaluation. It uses
a customized data generator and provides insights into some performance aspects
of RSP engines. However, there is no consideration of important performance
metrics such as stream rate, window size and number of streams. Besides, the
memory consumption has not been considered in their experiments.</p>
        <p>CityBench is a recent RSP benchmark based on smart city data and real
application scenarios. It provides a consistent and relevant plan to evaluate
performance. However, few factors have been considered in the experiments. Only
the number of concurrent queries and the number of streams have been
considered to evaluate the execution time and memory consumption, whereas other
important factors such as window size and stream rate are missing. Moreover,
the memory consumption of C-SPARQL reported in CityBench seems
questionable, since we obtain different results.</p>
        <p>Note that we do not propose a new benchmark for RSP engines. The main
goal of this work is to gain a deeper understanding of the performance of the above-mentioned
RSP engines.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Evaluation Plan</title>
      <sec id="sec-3-1">
        <title>3.1 Dataset and Queries design</title>
        <p>In our experiments, we chose to use our own data generator for two main
reasons: first, to be able to control the size of the generated data streams and,
second, to control the data content in order to check the correctness of results.
In particular, we use both streaming and static data related to the domain of
water resource management. The logical data model is presented in Figure 1.
The dynamic data describes sensor observations and their metadata, e.g., the
message, the observation and the assigned tags. A message basically contains
an observation, and we set a fixed number of tags (hasTag predicate) for each
observation. For every fifty flow observations, we include one chlorine observation.
The static data provides detailed information about each sensor, namely the
label, the manufacturer ID, and the ID of the sector to which it belongs in the
network.</p>
        <p>We define a set of queries Q = {Q1, Q2, Q3, Q4, Q5, Q6}, where Q1, ..., Q5
operate over streaming data, and Q6 integrates static data. These queries involve
different SPARQL operators (e.g., FILTER, UNION, etc.) and are sorted in
ascending order of execution complexity (e.g., complex queries involve
more query operators). Only the time-based window is addressed in all these
queries. As for the last query Q6, we compare the behavior of RSP engines when
varying the size of static data. Details and pseudo code of the predefined queries
are available on GitHub<sup>2</sup>. They can be summarized as follows:</p>
        <list list-type="bullet">
          <list-item><p>Q1: Which observation involves a chlorine value?</p></list-item>
          <list-item><p>Q2: How many tags are assigned to each chlorine observation?</p></list-item>
          <list-item><p>Q3: Which observation ID has an identification ending with "00" or "50"?</p></list-item>
          <list-item><p>Q4: Which chlorine observation possesses three tags?</p></list-item>
          <list-item><p>Q5: Which observation has an identification that ends with "00" or "10", and
how many tags are assigned to this observation?</p></list-item>
          <list-item><p>Q6: What are the sector, the manufacturer and the assigned label of each
chlorine observation?</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Definition of test criteria</title>
        <p>Let us denote the set of input parameters by X = {stream rate, number of triples,
window size, number of streams, static data size}, and the set of output metrics
by Y = {execution time, memory consumption}. We next detail each of these
parameters.</p>
        <p>- X: (1) stream rate. The time-driven mechanism consists in periodically executing
the query with a frequency step specified in the query. This frequency,
called STEP, can be time-based (e.g., every 10 seconds) or tuple-based (e.g.,
every 10 triples). The query is periodically performed over the most recent items.
The keyword RANGE defines the size of this window of recent items. Just like the
frequency step, the window size can be time-based or tuple-based. In the case of a time-based
window, the execution time and memory consumption are closely dependent
on the stream rate. Increasing the stream rate makes engines such as C-SPARQL
process more data for each execution. The frequency step indicates the interval
between two successive executions of the same query. Therefore, the input stream
rate should not exceed the engine's processing capacity; otherwise the system has to
store an ever-growing amount of data.</p>
        <p><sup>2</sup> https://github.com/renxiangnan/Reference_Stream_Reasoning_2016/wiki</p>
        <p>- X: (2) number of triples. The stream rate is not an appropriate factor to
consider for the data-driven mechanism because the query execution and the
data injection are performed in parallel. In other words, it is not feasible to
precisely control the input stream rate. In this context, we need to feed the
system once with a fixed number of triples, which is why we define an additional
parameter called number of triples, N. A bigger N generates a smaller error
rate, but N should remain under a given threshold to respect the processing
limitations of the RSP engines.</p>
        <p>- X: (3) window size. We use window size as a performance metric for RSP
engines. Note that the window size (RANGE) is closely related to the volume of
the queried triples for each execution of the query. According to our
preliminary experiments, the window size has only a marginal impact on the performance of
CQELS. Thus, we do not consider this metric when evaluating CQELS.</p>
        <p>- X: (4) number of streams, (5) static data size The capacity to handle
complex queries with multi-stream sources or static background information is
an important criterion to evaluate RSP engines. LSBench and CityBench have
already proposed these metrics.</p>
        <p>- Y: execution time, memory consumption. As the machine conditions are
uncontrollable varying factors, we evaluate the execution time, for a given query, as
the average value over n iterations. Since C-SPARQL and CQELS have two different
execution mechanisms (time-driven and data-driven), we adapt the definition
of execution time to each context. Consequently, for C-SPARQL the execution time
is the average execution time over several query executions, while for CQELS
it is the global query execution time for processing N triples.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments</title>
      <p>All experiments are performed on a laptop equipped with an Intel Core i5
quad-core processor (2.70 GHz) and 8 GB of RAM, with the maximum heap size set to 2 GB,
running Windows 7 and Java (JDK/JRE 1.8). The formal evaluation is done
after a 1-to-2-minute warm-up period with a relatively low stream rate.</p>
      <sec id="sec-4-1">
        <title>4.1 Time-driven: C-SPARQL</title>
        <p>We conducted our experiments on C-SPARQL by testing the previously defined
queries. We measure the average value over twenty iterations for query execution
time and memory consumption.</p>
        <p>Execution Time. We evaluate query execution time by varying stream rate,
number of streams, window size (time-based) and static data size.</p>
        <p>In Figure 2a, one can see that the five curves exhibit an approximately linear
trend, which holds for each query only as long as the stream rate stays under a given
threshold. For all five queries, C-SPARQL operates normally when its execution
time is smaller than one second, which is also the preset STEP value of the queries. Let
us denote by Rate<sub>max</sub> (triples/s) the maximum stream rate that can be accepted
by C-SPARQL for a given query. Rate<sub>max</sub> represents the maximum number of
triples that can be processed per unit time. Table 1 shows the Rate<sub>max</sub> for each
query.</p>
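        <p>As an illustration of how such a threshold could be read off a series of measurements, the following sketch returns the largest measured stream rate whose execution time still fits within the STEP interval. The function name and the sample numbers are hypothetical and are not measurements from this paper.</p>
        <preformat>
```python
from typing import Dict, Optional


def rate_max(exec_time_by_rate: Dict[int, float],
             step_seconds: float) -> Optional[int]:
    """Largest measured rate (triples/s) whose execution time is below STEP.

    Returns None when no measured rate is sustainable.
    """
    sustainable = [rate for rate, seconds in exec_time_by_rate.items()
                   if step_seconds > seconds]
    return max(sustainable) if sustainable else None


# Illustrative values only: stream rate (triples/s) -> execution time (s).
measurements = {1000: 0.12, 5000: 0.55, 10000: 0.97, 20000: 1.80}
print(rate_max(measurements, step_seconds=1.0))
```
        </preformat>
        <p>The estimate is only as fine as the grid of tested rates; a denser sweep around the crossing point would tighten it.</p>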
        <p>As shown in Figure 2a, if the stream rate exceeds the corresponding Rate<sub>max</sub>,
the results provided by C-SPARQL are erroneous. The reason is that
C-SPARQL does not have enough time to process both current and incoming data.
Indeed, newly incoming data accumulates in memory, and the system
forces C-SPARQL to start the next execution, which causes errors. Thus,
Rate<sub>max</sub> represents the maximum stream rate under which C-SPARQL
delivers correct results.</p>
        <p>
          In some cases, queries require data from multiple streams. In Figure 2b, we
focus on C-SPARQL's behavior when varying the number of streams, where the
stream rate is set to 1000 triples/s (i.e. the dotted line in Figure 2b). This
figure reports the execution time of Q1 for different numbers of streams. The
dotted line represents the execution time of Q1 on a single equivalent (i.e. same
workload) stream with a rate
<disp-formula><tex-math>\mathit{StreamRate}_{single} = \mathit{NumberOfStreams} \times \mathit{StreamRate}_{multi},</tex-math></disp-formula>
where StreamRate<sub>single</sub> and StreamRate<sub>multi</sub> denote the
stream rate for the single-stream and multi-stream cases, respectively. The curve of query
execution time increases as a convex function of the number of streams. C-SPARQL
incurs a substantial delay as the number of streams increases. Indeed, it has to
repeat the query execution for each stream [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], then execute the join
operation among the intermediate results from different stream sources. This
requires significant computing resources, so we can deduce that C-SPARQL is
more efficient at processing a single stream than multiple streams. In addition,
according to our experiments, we find that the query execution time increases linearly
with the size of the time-based window and of the static data. C-SPARQL
has a constant overhead for delay when increasing these two metrics.
        </p>
        <p>Memory Consumption. We used VisualVM to monitor the memory
consumption of C-SPARQL. An example visualization of Java Virtual Machine
Garbage Collector (GC) activity is available on our GitHub<sup>3</sup>.</p>
        <p>Since the Java Virtual Machine executes the Garbage Collector lazily (in
order to leave the maximum available CPU resource to the application), using
the maximum memory allocated during execution is not an appropriate way to
measure the memory consumption. In practice, the processing of a simple query,
while allocating far less memory on each execution, can reach the same maximum
allocated heap as the processing of a complex query. Thus, instead, we define
a new evaluation metric called Memory Consumption Rate (MCR), which measures
the amount (in megabytes) of memory allocated and freed by the GC per unit time:
<disp-formula><tex-math>\mathit{MCR}\ (\mathrm{MB/s}) = \frac{\mathit{Max} - \mathit{Min}}{\mathit{Period}},</tex-math></disp-formula>
where Max and Min
refer to the average maximum and minimum memory consumption, respectively, and
Period is the average duration between two consecutive observed memory maxima.
Max, Min and Period are computed over 10 observed periods. MCR
signifies the memory changes in the heap per second. A higher MCR indicates more
frequent garbage collector activity. It intuitively shows how many bytes have
been released and reallocated by the GC per unit time. Figure 3a shows the impact
of stream rate on MCR. For each query, the period decreases and MCR
increases with the growth of the stream rate. Query Q3 has the highest MCR. This
can be explained by the aggregate operator, which produces more intermediate
results during query execution. Note that MCR is not a general criterion for
measuring memory consumption. In some use cases, we could not observe
periodic GC activity. The main goal of using MCR is to give a comprehensive
description of memory management in C-SPARQL.</p>
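        <p>A minimal sketch of how MCR could be computed from observed GC periods is given below. It assumes that each period has already been reduced to a (peak, trough, duration) triple, for instance by reading the heap curve in a monitoring tool; the function name and the sample values are hypothetical.</p>
        <preformat>
```python
from statistics import mean
from typing import List, Tuple

# Hypothetical record of one observed GC period:
# (peak heap in MB, trough heap in MB, duration in seconds).
Cycle = Tuple[float, float, float]


def memory_consumption_rate(cycles: List[Cycle]) -> float:
    """MCR in MB/s: (average Max - average Min) / average Period."""
    avg_max = mean(c[0] for c in cycles)
    avg_min = mean(c[1] for c in cycles)
    avg_period = mean(c[2] for c in cycles)
    return (avg_max - avg_min) / avg_period


# Illustrative values only, not measurements from the paper.
cycles = [(900.0, 300.0, 4.0), (920.0, 280.0, 4.0)]
print(memory_consumption_rate(cycles))
```
        </preformat>
        <p>With these two sample periods, the averages are 910 MB, 290 MB and 4 s, which gives an MCR of 155 MB/s.</p>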
        <p>Figure 3b displays the increase of the memory consumption rate with the growth
of static data size. For query Q6, the memory peak varies marginally while increasing
static data size, but the minimum consumed memory is directly impacted. One
possible explanation is that C-SPARQL produces additional objects to process
static data, and keeps these objects in memory in the long term.</p>
        <p><sup>3</sup> https://github.com/renxiangnan/Reference_Stream_Reasoning_2016/wiki</p>
        <p>This section focuses on the performance evaluation of CQELS. The varied
parameters are number of triples, number of streams, and static data size. Q4
and Q5 are not included in this evaluation since CQELS does not support the
timestamp function (i.e. the function that performs basic temporal filtering on the
streamed triples).</p>
        <p>Execution Time. Since CQELS uses a so-called probing sequence to support
its execution plan, getting the running time for each query execution is not
experimentally feasible. Thus, we evaluate the global execution time of N triples
for CQELS. More precisely, we keep the same strategy as LSBench, i.e. we inject a
finite stream sequence containing N triples into the system. N should be
big enough to get accurate results (N on the order of 10<sup>5</sup>).</p>
        <p>Fig. 4: The impact of number of triples (a) and static data size (b) on query execution
time in CQELS.</p>
        <p>Figure 4a shows the impact of the number of triples on execution time. N should
also be kept within a certain range to prevent the engine from crashing (cf. the
"Memory Consumption" part on CQELS). Queries Q1, Q2 and Q3 contain chain
patterns (the join occurs on the subject-object position) that select chlorine observations:
{T1: ?observation ex:observeChlorine ?chlorineObs . T2: ?chlorineObs ex:hasTag
?tag . }. Pattern T1 returns all results matching the predicate
"observeChlorine", then T2 filters among all observations selected by T1 those which have been
assigned tags. In Figure 4a, note that there is no significant difference between
Q2 and Q3. Based on Q2, the query Q3 adds a "FILTER" operator to restrict
the preselected observations to those which have an ID ending with "00" or "50". This
additional filter in Q3 only slightly influences the engine performance, which suggests
that CQELS is very efficient at processing the "FILTER" operator. The dotted
line Q1′ represents Q1 without the pattern T2. Its corresponding execution
time is reduced to about one sixth of that of Q1. Indeed, the pattern T2 plays
a key role in terms of execution time. Without T2, CQELS returns the results
immediately once T1 is verified, but pattern T2 makes the engine wait until T2 is
verified.</p>
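        <p>To make the chain pattern concrete, the sketch below joins T1 and T2 on the shared variable ?chlorineObs over an in-memory list of triples. It only illustrates the subject-object join; it is not how CQELS evaluates the pattern internally, and the function name and sample triples are assumptions made for the example.</p>
        <preformat>
```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def chain_match(triples: List[Triple]) -> List[Tuple[str, str, str]]:
    """Return (observation, chlorineObs, tag) bindings for T1 joined with T2.

    T1: ?observation ex:observeChlorine ?chlorineObs
    T2: ?chlorineObs ex:hasTag ?tag
    The join key is the object of T1, which must equal the subject of T2.
    """
    t1 = [(s, o) for (s, p, o) in triples if p == "ex:observeChlorine"]
    tags_by_subject: Dict[str, List[str]] = {}
    for (s, p, o) in triples:
        if p == "ex:hasTag":
            tags_by_subject.setdefault(s, []).append(o)
    return [(obs, chl, tag)
            for (obs, chl) in t1
            for tag in tags_by_subject.get(chl, [])]


# Hypothetical sample data.
triples = [
    ("obs1", "ex:observeChlorine", "chl1"),
    ("chl1", "ex:hasTag", "tagA"),
    ("chl1", "ex:hasTag", "tagB"),
    ("obs2", "ex:observeChlorine", "chl2"),  # no tag yet, so no result
]
print(chain_match(triples))
```
        </preformat>
        <p>In the sample, obs2 matches T1 but yields no binding because no T2 triple has arrived for chl2, which mirrors the waiting behavior described above.</p>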
        <p>CQELS supports queries over multiple streams. It allows triple
patterns to be assigned only to the stream source they relate to. This
property gives the engine some advantages when processing complex queries: each triple
pattern just needs to be verified against its attached stream source. In contrast, C-SPARQL
has to repeat the verification on all present streams for the whole query,
and this behavior leads to a waste of computing resources. However, due to the data-driven
mechanism, serious mismatches occur in the output of a multi-stream query,
especially when the query requires synchronization among the triples. Asynchronous
streams are illustrated on our GitHub<sup>4</sup>.</p>
        <p>Suppose that we have two streams, S1 and S2, sequentially sent (due to the
data-driven approach adopted by CQELS) into the engine. If the window size
defined on S1 is not large enough, ?observation in pattern T2 will not be matched
with ?observation in T1. This problem can be solved by defining a larger window
size on T1 with a small number of streams. In our experiments, we carry out the
multi-stream test by constructing two streams for Q1, Q2 and Q3. For Q1, with
two streams, CQELS spent approximately 26 s to process 2 × 10<sup>5</sup> triples, which
is just 30% more than in the single-stream case. To conclude, CQELS gains some
advantages in terms of execution time when processing multi-stream queries.
However, the output may also be influenced by the asynchronous behavior in the
multi-stream context. Note that C-SPARQL does not suffer from stream
synchronization issues since it follows a batch-oriented approach.</p>
        <p>In Figure 4b, the curve gives the total execution time (in seconds) for 1,260,000 triples.
The execution time for N triples changes only slightly while increasing the size of
static data from 10 MB to 50 MB. The result shows that CQELS is efficient at
processing static data of a large size.</p>
        <p><sup>4</sup> https://github.com/renxiangnan/Reference_Stream_Reasoning_2016/wiki</p>
        <p>Memory Consumption. As we directly send N triples into the system at once,
CQELS's memory consumption does not behave like that of C-SPARQL (which follows a
periodic pattern). Generally, the memory consumption of CQELS keeps growing
as the number N of triples increases. As mentioned in the previous section, N
should not exceed a given threshold. If N is very large, the memory consumption
will reach its limit. In this situation, the latency of query execution increases
substantially. Furthermore, since serious mismatches occur for multi-stream queries,
the number of streams is not considered as an input parameter for memory consumption.
We evaluate the peak of memory consumption (MC) during query execution.
Memory consumption increases over time, and MC reaches its peak just before the end
of query execution.</p>
        <p>
          Figure 5a shows that the memory consumption of Q1, Q2 and Q3 is very
close when varying the number of triples, i.e., the complexity of the queries is
not reflected by their memory consumption. CQELS manages memory efficiently
for complex queries. In Figure 5b, the memory consumption of Q6 is
proportional to the size of static data. According to the evaluation, we found
that a lower maximum allocated heap size (e.g., 512 MB) causes a substantial
delay in CQELS. The consumed memory keeps growing up to the heap size limit,
i.e. the GC cannot clear the unused objects in a timely manner. This behavior is
possibly caused by the dictionary built for URI encoding [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>Fig. 5: Impact of the number of triples and the static data size on memory
consumption in CQELS.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results Discussion</title>
      <p>As we generate different streaming modes for time-driven (C-SPARQL) and
data-driven (CQELS) engines, the memory consumption is not comparable
between them. This section therefore mainly discusses query execution time
based on the observed results, through a simple comparison between C-SPARQL
and CQELS.</p>
      <p>
        It is not straightforward to compare the performance of different RSP engines, since
each of them has a specific execution strategy. According to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and to our
experiments, we list the following conditions to support a fair cross-engine
performance comparison: (i) The engine results should be correct, or at least comparable.
We recall that C-SPARQL behaves abnormally when the incoming
stream rate exceeds the threshold. Even if the engine still produces results, it is
meaningless to measure the execution time. (ii) The execution times of different
RSP engines should correspond to the same workload. As C-SPARQL uses a batch
mechanism, it is easy to control the workload of the window operator.
However, the data-driven eager mechanism makes workload control practically
infeasible. Therefore, we choose the average execution time per triple,
<inline-formula><tex-math>t = T/N</tex-math></inline-formula>, to
support our comparison, where T is the total execution time for N triples. Note that
t changes only marginally when varying the metrics defined in Section 3.2. (iii)
Warming up the engine is also recommended. We inject a "warming up" stream
(with a relatively low stream rate) into the system before the formal evaluation.
      </p>
      <p>Table 2. RSP engine: C-SPARQL, CQELS.</p>
      <p>Table 2 shows that C-SPARQL outperforms CQELS on Q1, Q2
and Q3. This can be explained by the fact that the chain pattern in Q1,
Q2 and Q3 forces CQELS to repeat the verification of the matching condition over
the whole window. This behavior significantly hinders the engine performance.
For Q6, CQELS is almost 27 times faster than C-SPARQL, which shows its high
efficiency in processing queries with static data.</p>
      <p>Finally, we summarize our experiments along three aspects: 1) Functionality
support. Since C-SPARQL uses Sesame/Jena as its querying core, it
supports most of the SPARQL 1.1 grammar. In contrast, as CQELS is implemented
natively, it supports fewer operations than C-SPARQL; for example, it lacks the timestamp
function and property paths. 2) Output correctness. As mentioned in Section
4.2, CQELS suffers from serious output mismatches in the multi-stream
context. This is due to the eager execution mechanism and asynchronous streams.
C-SPARQL behaves normally with multi-stream queries since it is characterized
by a time-driven mechanism. As a matter of fact, real use cases often require
concurrent joins over different stream sources. In this context, C-SPARQL
has the advantage of correct and complete output results. 3)
Performance. C-SPARQL shows stability with complex queries. However, in practical
applications, the input stream rate should be kept at a low level to guarantee
C-SPARQL's output correctness. Besides, C-SPARQL has a scalability problem
when dealing with static data. CQELS takes advantage of its dictionary
encoding technique and dynamic routing policy, and is thus efficient for simple
queries and scalable with static data.</p>
      <p>This paper focuses on the performance evaluation of two state-of-the-art
RSP engines: C-SPARQL and CQELS. We proposed some new performance metrics and
designed a specific evaluation plan. In particular, we took into account the
specific implementation of each RSP engine. We performed many experiments to
evaluate the impact of stream rate, number of triples, window size, number
of streams and static data size on execution time and memory
consumption. Several queries of different complexities have been considered. The main
result of this study is that each RSP engine has its own advantages
and is suited to a particular context and use case, e.g., C-SPARQL excels on
complex and multi-stream queries while CQELS stands out on queries requiring
static data. In future work, we plan to evaluate the performance of RSP engines
in a distributed environment.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been supported by the WAVES project which is partially
supported by the French FUI (Fonds Unique Interministeriel) call #17.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Arasu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Babu</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Widom.</surname>
          </string-name>
          <article-title>CQL: A language for continuous queries over streams and relations</article-title>
          .
          <source>In Database Programming Languages, 9th International Workshop, DBPL 2003</source>
          , pages
          <fpage>1</fpage>
          –
          <fpage>19</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Della Valle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossniklaus</surname>
          </string-name>
          .
          <article-title>Continuous queries and real-time analysis of social semantic data with C-SPARQL</article-title>
          .
          <source>In SDoW2009</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Valle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossniklaus</surname>
          </string-name>
          .
          <article-title>Querying RDF streams with C-SPARQL</article-title>
          .
          <source>SIGMOD Rec.</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>I.</given-names>
            <surname>Botan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Derakhshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dindar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Tatbul</surname>
          </string-name>
          .
          <article-title>SECRET: A model for analysis of the execution semantics of stream processing systems</article-title>
          .
          <source>PVLDB</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Dell'Aglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Calbimonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balduini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Valle</surname>
          </string-name>
          .
          <article-title>On correctness in RDF stream processor benchmarking</article-title>
          .
          <source>In The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Le-Phuoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dao-Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Parreira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          .
          <article-title>A native and adaptive approach for unified processing of linked streams and linked data</article-title>
          .
          <source>In Proceedings of the 10th International Conference on The Semantic Web - Volume Part I</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mileo</surname>
          </string-name>
          .
          <article-title>CityBench: A configurable benchmark to evaluate RSP engines using smart city datasets</article-title>
          .
          <source>In The Semantic Web - ISWC 2015, ISWC'15</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Phuoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dao-Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Fink</surname>
          </string-name>
          .
          <article-title>Linked stream data processing engines: Facts and figures</article-title>
          .
          <source>In The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference, ISWC'11</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Duc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Calbimonte</surname>
          </string-name>
          .
          <article-title>SRBench: A streaming RDF/SPARQL benchmark</article-title>
          .
          <source>In Proceedings of the 11th International Conference on The Semantic Web - Volume Part I</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>