<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Parallel Data Loading during Querying Deep Web and Linked Open Data with SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pauline Folz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriela Montoya</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hala Skaf-Molli</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Molli</string-name>
          <email>pascal.molli@univ-nantes.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria-Esther Vidal</string-name>
          <email>mvidal@ldc.usb.ve</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nantes Métropole - Direction Recherche, Innovation et Enseignement Supérieur</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Nantes University</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Unit UMR6241 of the Centre National de la Recherche Scientifique (CNRS)</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universidad Simón Bolívar</institution>
          ,
          <country country="VE">Venezuela</country>
        </aff>
      </contrib-group>
      <fpage>63</fpage>
      <lpage>77</lpage>
      <abstract>
        <p>Web integration systems are able to provide transparent and uniform access to heterogeneous Web data sources by integrating views of Linked Data, Web Service results, or data extracted from the Deep Web. However, given the potentially large number of views, query engines of Web integration systems have to implement execution techniques able to scale up to real-world scenarios and efficiently execute queries. We tackle the problem of SPARQL query processing against RDF views, and propose a non-blocking query execution strategy that incrementally accesses and merges the views relevant to a SPARQL query in a parallel fashion. The proposed strategy is implemented on top of Jena 2.7.4, and empirically compared with SemLAV, a sequential SPARQL query engine on RDF views. Results suggest that our approach outperforms SemLAV in terms of the number of answers produced per unit of time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Linked Open Data initiatives have motivated the integration of a large number
of RDF datasets into the Linking Open Data (LOD) cloud [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Different
Web-based interfaces are available to access these publicly accessible Linked Data
sets, e.g., SPARQL endpoints and Linked Data fragments [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. However, the
Deep Web, which is around 500 times the size of the Surface Web [
        <xref ref-type="bibr" rid="ref10 ref11">11, 10</xref>
        ],
has not been integrated as part of the LOD cloud. Performing SPARQL queries
without considering the Deep Web can potentially deliver incomplete results. For
example, the execution of the SPARQL query: Which members of the Semantic
Web community are interested in Dalai Lama, Barack Obama, or Rihanna? (cf.
Figure 2) without the integration of the Deep Web will provide no answers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Nevertheless, if data from social networks such as Twitter, Facebook, or LinkedIn
were considered, the query execution could return some answers.
      </p>
      <p>
        Two main approaches exist for data integration: data warehousing and
virtual mediators [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Semantic data-warehouses such as Virtuoso with the Sponger
feature [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] allow for the implementation of wrappers able to create RDF data
from unsemantified data sources, e.g., Web services or CSV files; but this approach
may suffer from the freshness problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], i.e., data may become stale when data
sources are updated.
      </p>
      <p>
        On the other hand, a mediator relies on a global schema to provide a uniform
interface for accessing the data sources. Global-As-View (GAV) and
Local-As-View (LAV) are the main paradigms for mapping data sources to the global
schema. In GAV mediators, entities of the global schema are described using
views over the data sources, but including or updating data sources may require
the modification of a large number of views [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In contrast, in LAV mediators,
the sources are described as views over the global schema, and new
data sources can easily be added [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Despite its expressiveness and flexibility,
LAV query rewriting is in general intractable, i.e., NP-complete for conjunctive
queries [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. State-of-the-art LAV query rewriters efficiently solve some families of
the query rewriting problem [
        <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
        ]; nevertheless, they may not perform equally well
on SPARQL queries [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Recently, SemLAV [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the first scalable LAV-based
approach for SPARQL query processing, was proposed. Instead of enumerating
the query rewritings of a SPARQL query, SemLAV selects the most relevant LAV
views, accesses the selected views according to their relevance, and materializes
the downloaded data into an integrated RDF graph. Then, the SPARQL query
is executed against the integrated RDF graph.
      </p>
      <p>
        SemLAV provides a new paradigm to execute SPARQL queries against LAV
views, but because relevant views are loaded sequentially, SemLAV may get
blocked loading large views. In the worst case, if the first loaded view is huge and
does not provide relevant data for the query answer, SemLAV will be blocked
without producing any answer. A sequential view loading strategy may thus
reduce the number of answers produced per unit of time, i.e., throughput, and
increase the time for the first answer. Loading several views in parallel may overcome these
limitations. However, a parallel view loading strategy will introduce the problem
of concurrent writing on the integrated RDF graph. In this paper, we propose
a non-blocking query execution strategy to integrate the data from the relevant
views into the integrated RDF graph in a parallel fashion. We implement the
proposed non-blocking strategy on top of Jena 2.7.4; we name this new
SPARQL query engine parallel SemLAV. Further, an empirical evaluation is
conducted to study the new parallel strategy with respect to SemLAV. The
Berlin Benchmark [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and queries and views designed by Castillo-Espinola [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are
used to evaluate both query engines. Results suggest that the parallel SemLAV
outperforms SemLAV with respect to answers produced per time unit.
      </p>
      <p>
        The paper is organized as follows. Section 2 describes background and
motivation. Section 3 presents strategies for integrating relevant views into the
integrated RDF graph in a parallel fashion. Section 4 reports our experimental
results. Finally, conclusions and future work are outlined in Section 5.
      </p>
      <p>
        SemLAV follows a mediator and wrapper architecture [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] where data from the
sources are virtually integrated by SemLAV in a global schema composed of
several RDF vocabularies, as shown in Figure 1. Sources are described by LAV
views and can be heterogeneous, e.g., from the Deep Web, RDF data sets, or
relational tables. SPARQL queries are expressed in terms of the global schema
and posed against the SemLAV mediator. A wrapper is specific to a data source,
and retrieves data on demand; the retrieved data are transformed to match the
global schema. Wrappers can be generated by tools like Karma [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or OPAL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
The global schema is the interface between users and the data sources.
Given a query and a set of views, SemLAV computes a ranked set of relevant
views for answering the query; no statistics are used to rank the views. Relevant
views are ranked based on the number of triple patterns of the original query
that each view covers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Views are materialized by calling the wrappers, and
each time a new view is fully materialized, the original query is executed.
      </p>
      <p>
        The benefits of SemLAV are illustrated in the following example [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Suppose the
SemLAV global schema comprises different RDF vocabularies, e.g., foaf<sup>5</sup> and
rdfs<sup>6</sup>.
      </p>
    </sec>
    <sec id="sec-2">
      <title>5 http://xmlns.com/foaf/0.1/</title>
      <p>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;
SELECT DISTINCT
WHERE {
  ?P foaf:member ?C .
  ?C rdfs:label "Semantic Web" .
  ?P foaf:knows ?WKP .
  ?WKP foaf:name ?N .
  FILTER (?N="Dalai Lama" || ?N="Barack Obama" || ?N="Rihanna")
}</p>
      <p>Figure 2 presents a SPARQL query expressed using the global schema.
Views are expressed as conjunctive queries, where RDF predicates are
represented by binary predicates, e.g., label(C,L) corresponds to ?C rdfs:label ?L and
?P foaf:name ?N is expressed as name(P,N). Listing 1 defines five LAV views.
Triple patterns in the query are also seen as binary predicates, and basic graph
patterns (BGPs) are represented as conjunctive queries; the running SPARQL query is composed
of four subgoals on the predicates: member(P,C), label(C,"Semantic Web"),
knows(P,WKP), and name(WKP,N). The filter expression is modeled as a
disjunction of atomic expressions on the equality comparison operator.</p>
      <p>Listing 1: Views v1-v5 for Query Q
v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)</p>
      <p>Given a subgoal sg of a conjunctive query, e.g., label(C,"Semantic Web"), a
view v is relevant for sg if sg is part of the body of v, e.g., v1(P,A,I,C,L) and
v5(P,N,R,C,L) are relevant for label(C,"Semantic Web"). Table 1a presents the
set of relevant views for each subgoal of the query in Figure 2.</p>
      <p>SemLAV sorts relevant views according to the number of subgoals of the
query that each view covers, e.g., view v5 is sorted first since it covers all the
subgoals. Table 1b presents the sorted relevant views for the query in Figure 2.</p>
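      <p>As an illustration only, the relevance test and the ranking described above can be sketched in Python. The dictionary VIEWS is our own encoding of Listing 1, the function names are hypothetical, and relevance is simplified to a comparison of predicate names (variable mappings and constants are ignored); this is not the SemLAV implementation.</p>
      <preformat>
```python
# Hypothetical encoding of Listing 1: each view maps to the predicates of its body.
VIEWS = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}
# Predicates of the four subgoals of the running query.
QUERY_SUBGOALS = ["member", "label", "knows", "name"]


def relevant_views(subgoal, views=VIEWS):
    """Return the views whose body mentions the predicate of `subgoal`."""
    return [name for name, body in views.items() if subgoal in body]


def covered_subgoals(view, subgoals=QUERY_SUBGOALS, views=VIEWS):
    """Count the query subgoals whose predicate appears in the body of `view`."""
    return sum(1 for sg in subgoals if sg in views[view])


def rank_views(subgoals=QUERY_SUBGOALS, views=VIEWS):
    """Sort the relevant views by number of covered subgoals, highest first."""
    relevant = [v for v in views if covered_subgoals(v, subgoals, views) > 0]
    # sorted() is stable, so views with equal coverage keep the order of Listing 1.
    return sorted(relevant,
                  key=lambda v: covered_subgoals(v, subgoals, views),
                  reverse=True)
```
      </preformat>
      <p>Under these assumptions, relevant_views("label") yields v1 and v5, and rank_views() places v5 first (four subgoals covered) and v4 second (three subgoals covered); how ties among the remaining views are broken is a choice of this sketch.</p>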
      <p>SemLAV identifies and ranks the relevant views of a query, and executes the
query over the data collected from the relevant views. Different strategies can
be followed to contact the views and load the data. For example, following a
blocking strategy, views are contacted one by one in order, and a view is not
contacted until all the data from the previously contacted view have been
downloaded completely. This is the strategy followed by SemLAV, which is illustrated
in Figure 3a. This strategy blocks if the first view
is huge: while view v5 is loading, the query cannot be executed. This
blocking issue can have a negative impact on the performance of the query
engine if the performance is measured in terms of the number of answers produced
per unit of time, i.e., throughput.</p>
    </sec>
    <sec id="sec-3">
      <title>6 http://www.w3.org/2000/01/rdf-schema#</title>
      <table-wrap>
        <label>Table 1a</label>
        <caption><p>Relevant views</p></caption>
        <table>
          <thead>
            <tr><th>member(P, C)</th><th>label(C, L)</th><th>knows(P, WKP)</th><th>name(WKP, N)</th></tr>
          </thead>
          <tbody>
            <tr><td>v1(P,A,I,C,L)</td><td>v1(P,A,I,C,L)</td><td>v3(P,N,R,M)</td><td>v2(A,T,P,N,C)</td></tr>
            <tr><td>v2(A,T,P,N,C)</td><td>v5(P,N,R,C,L)</td><td>v4(P,N,G,R,C)</td><td>v3(P,N,R,M)</td></tr>
            <tr><td>v4(P,N,G,R,C)</td><td></td><td>v5(P,N,R,C,L)</td><td>v4(P,N,G,R,C)</td></tr>
            <tr><td>v5(P,N,R,C,L)</td><td></td><td></td><td>v5(P,N,R,C,L)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 1b</label>
        <caption><p>Sorted relevant views</p></caption>
        <table>
          <thead>
            <tr><th>member(P, C)</th><th>label(C, L)</th><th>knows(P, WKP)</th><th>name(WKP, N)</th></tr>
          </thead>
          <tbody>
            <tr><td>v5(P,N,R,C,L)</td><td>v5(P,N,R,C,L)</td><td>v5(P,N,R,C,L)</td><td>v5(P,N,R,C,L)</td></tr>
            <tr><td>v4(P,N,G,R,C)</td><td>v1(P,A,I,C,L)</td><td>v4(P,N,G,R,C)</td><td>v4(P,N,G,R,C)</td></tr>
            <tr><td>v1(P,A,I,C,L)</td><td></td><td>v3(P,N,R,M)</td><td>v2(A,T,P,N,C)</td></tr>
            <tr><td>v2(A,T,P,N,C)</td><td></td><td></td><td>v3(P,N,R,M)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To illustrate this problem, consider Figure 3a, where v5 is loaded first. Even
though v5 covers all the query subgoals, loading v5 first reduces the throughput,
because v5 is the biggest view and does not contribute to the result. On the other
hand, loading both v1 and v4, which together cover all the subgoals, takes less
time and may produce query answers. If relevant views were loaded in parallel
following a non-blocking strategy, this situation would not affect the query engine
performance. This solution is illustrated in Figure 3b, where five threads
each load one of the five top-ranked views at a time; views are
allocated to different threads. The time to load v5 is greater than the time required
to load v4 and v1 in parallel. Additionally, v4 and v1 cover all the subgoals of
our running query; thus, answers are produced before v5 is completely loaded.</p>
      <p>We propose a non-blocking strategy for executing SPARQL queries against
views. Like SemLAV, this approach does not rely on statistics to rank and select
the relevant views. The proposed strategy prevents the query engine from getting
blocked until all the data are retrieved from the relevant views.</p>
      <sec id="sec-3-1">
        <title>3 Our Approach</title>
        <p>A non-blocking strategy to access the views in a parallel fashion is defined.
Although this strategy improves the performance of a query engine, loading the
retrieved data into the integrated RDF graph in parallel may generate
concurrency problems, i.e., many processes may simultaneously add data to the
integrated RDF graph. So, we define a new concurrency model for RDF, and we
propose a non-blocking query execution strategy able to adapt query execution
to different criteria, e.g., a query is executed after a certain number of triples
are loaded into the integrated RDF graph. We implement the concurrency model
and the non-blocking query execution strategy on top of Jena 2.7.4<sup>7</sup>.</p>
        <p>Figure 3: Loading of the views for Query Q. (a) Sequential loading. (b) Parallel loading.</p>
        <sec id="sec-3-1-1">
          <title>3.1 A Concurrency Model for the Integrated RDF Graph</title>
          <p>Our approach needs a model that can handle concurrent insertions.
However, RDF stores like Jena do not handle concurrent insertions; they are only
able to favor one type of operation, e.g., reads or insertions. This is
implemented using locks, but read and insert locks are mutually exclusive,
i.e., they cannot be held simultaneously. Existing RDF stores assume that
there are more readers than writers and follow the multiple-readers/single-writer
(MRSW) strategy<sup>8</sup>. According to MRSW, many readers may read
simultaneously, while a writer must have exclusive access. MRSW assumes writers have
priority, in order to keep data up-to-date. Nevertheless, in our proposed approach,
data insertions are going to be more frequent than data reads. The reader is the
query engine that accesses the integrated RDF graph during query execution,
while the writers are the wrappers of the relevant views, which load the data into the
integrated RDF graph. The query engine cannot execute the query more often
than views are loaded into the integrated RDF graph, because executing the query
is expensive, and doing so too often may lead to performance degradation.</p>
          <p>
            In other words, our proposed approach prioritizes read operations over
insertions, i.e., a single-reader/multiple-writers strategy (SRMW) [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] is followed to
manage concurrency on the integrated RDF graph. Thus the reader, e.g., a query
execution engine, has a higher priority than a writer, e.g., a wrapper
loading a view. Additionally, two insert locks cannot be held at the same
time due to the specification of the integrated RDF model. However, the query
engine divides each view into blocks of n triples to allow for the loading of
portions of several views at the same time. A lock is requested before starting to
load a block, and it is released after the n triples have been loaded completely. In
our example, the first block of v5 is loaded, then the first block of v4, and to load
the second block of v5, it may be necessary to wait until the first blocks of all the
currently loading views have been loaded. However, this order may fluctuate
depending on how the system allocates time among the threads.
            <fn id="fn7"><label>7</label><p>http://jena.apache.org/</p></fn>
            <fn id="fn8"><label>8</label><p>https://jena.apache.org/documentation/notes/concurrency-howto.html</p></fn>
          </p>
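          <p>The block-wise locking described above can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the Jena-based implementation: the class IntegratedGraph, its methods, and the helper load_all are hypothetical names, the graph is a plain set of tuples, a single reader is assumed, and reader priority is approximated with an event that makes writers pause before their next block.</p>
          <preformat>
```python
import threading


class IntegratedGraph:
    """Toy stand-in for the integrated RDF graph (a set of triples)."""

    def __init__(self, block_size):
        self.block_size = block_size  # n: number of triples per block
        self.triples = set()
        self._lock = threading.Lock()
        # Set while no reader is waiting; writers pause while it is cleared.
        self._reader_idle = threading.Event()
        self._reader_idle.set()

    def load_view(self, view_triples):
        """Writer: insert a view block by block, one lock acquisition per block."""
        for start in range(0, len(view_triples), self.block_size):
            block = view_triples[start:start + self.block_size]
            self._reader_idle.wait()  # give way to a waiting reader
            with self._lock:          # only one insert lock at a time
                self.triples.update(block)

    def read(self, query):
        """Reader (assumed unique): evaluate `query` while no block is being written."""
        self._reader_idle.clear()
        try:
            with self._lock:
                return query(self.triples)
        finally:
            self._reader_idle.set()


def load_all(graph, views):
    """Load every view in its own thread and wait for all of them."""
    threads = [threading.Thread(target=graph.load_view, args=(view,))
               for view in views]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```
          </preformat>
          <p>Because the lock is released after every block, blocks of different views interleave in an order that depends on thread scheduling, which mirrors the fluctuation mentioned above.</p>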
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2 A Non-Blocking Strategy for SPARQL Query Execution</title>
          <p>We implement a non-blocking strategy that executes a query according
to one of the following criteria; the criterion can be either configured or
provided by the user during query execution.</p>
          <list list-type="bullet">
            <list-item><p>View dependent: the reader is woken up after a new view is loaded; thus, if v
is a newly loaded view, then the query engine will re-execute the query against
the integrated RDF graph. If enough data is loaded into the integrated RDF
graph from v, then the query engine will be able to generate new results
when it is executed. This criterion is also implemented by SemLAV.</p></list-item>
            <list-item><p>Time dependent: the reader is woken up after a period of time t, i.e., if
t is n milliseconds, the query engine will re-execute the query against the
RDF graph every n milliseconds. If enough data is loaded into the integrated
RDF graph during the period t, the query engine will be able to generate
new results. However, the concurrency model prioritizes the reader over the writers;
thus, if the writers are stalled and cannot load enough data into the
integrated RDF graph, the query will be executed inefficiently.</p></list-item>
            <list-item><p>Data dependent: the reader is woken up after a certain number n of triples
have been inserted into the integrated RDF graph by the writers; thus, the query
engine will re-execute the query against the RDF graph whenever n new
triples are integrated. If the n new triples contribute to the results, then the
query engine will be able to generate new answers when it is executed.</p></list-item>
            <list-item><p>Two-phase execution: the reader is woken up either after a period of time t
or after a certain number n of triples have been inserted into the integrated RDF graph
by the writers. In the first phase, the reader performs ASK queries to check
whether new results can be produced; if the answer is true, the second phase is
launched. The second phase directly executes the query; then the
reader is woken up either after a period of time t or after a certain number
n of new triples have been inserted into the integrated RDF graph.</p></list-item>
          </list>
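          <p>To make the four criteria concrete, the following Python sketch expresses them as a wake-up test for the reader. It is only an illustration under our own assumptions: WakeUpPolicy, two_phase_step, and their parameters are hypothetical names, and ask and select stand for callables that run the ASK form of the query and the original query, respectively.</p>
          <preformat>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeUpPolicy:
    """Decides when the reader should re-execute the query.

    A threshold left as None disables the corresponding criterion.
    """
    every_view: bool = False          # view dependent
    period_ms: Optional[int] = None   # time dependent (t)
    triples: Optional[int] = None     # data dependent (n)

    def should_wake(self, views_done, elapsed_ms, new_triples):
        """All arguments are measured since the previous query execution."""
        if self.every_view and views_done >= 1:
            return True
        if self.period_ms is not None and elapsed_ms >= self.period_ms:
            return True
        if self.triples is not None and new_triples >= self.triples:
            return True
        return False


def two_phase_step(policy, phase, ask, select,
                   views_done, elapsed_ms, new_triples):
    """One wake-up check of the two-phase criterion.

    Returns the next phase and the answers (None if the query was not run).
    """
    if not policy.should_wake(views_done, elapsed_ms, new_triples):
        return phase, None
    if phase == 1:
        if not ask():
            return 1, None  # no result can be produced yet: stay in phase one
        phase = 2           # ASK returned true: switch to the original query
    return phase, select()
```
          </preformat>
          <p>For example, WakeUpPolicy(period_ms=500, triples=500) wakes the reader every 500 milliseconds or every 500 new triples, whichever comes first, which corresponds to the wake-up rule of the two-phase criterion.</p>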
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4 Experimental Evaluation</title>
        <p>
          The Berlin SPARQL Benchmark (BSBM) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and the queries and views proposed by
Castillo-Espinola [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] are used to compare the performance of parallel SemLAV
with respect to SemLAV. Our goal is to reproduce the experiments reported by
Montoya et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]; therefore, we used the Berlin Benchmark dataset composed of
10,000,736 triples using a scale factor of 28,211 products, 16 out of 18 queries, and
nine of the ten views proposed by Castillo-Espinola [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In the SemLAV
experiments, some queries and views were not considered because they included
constants and some of the evaluated rewriters only process queries with variables.
Five additional views were defined to cover all the predicates in the evaluated
queries, i.e., 14 views were evaluated. Furthermore, 476 views were produced by
horizontally partitioning each of the 14 views into 34 parts, such that each part
produces 1/34 of the answers given by the original view.
        </p>
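        <p>As a small worked example, the horizontal partitioning can be sketched in Python. How the answers were actually assigned to the 34 parts is not detailed here, so the contiguous, near-equal split below is an assumption, and the function name partition is hypothetical.</p>
        <preformat>
```python
VIEWS_EVALUATED = 14   # nine original views plus five additional ones
PARTS_PER_VIEW = 34


def partition(answers, parts=PARTS_PER_VIEW):
    """Split `answers` into `parts` contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(answers), parts)
    slices, start = [], 0
    for i in range(parts):
        size = base + (1 if extra > i else 0)  # the first `extra` slices get one more
        slices.append(answers[start:start + size])
        start += size
    return slices


# Each of the 14 views yields 34 partitioned views: 14 * 34 = 476.
TOTAL_VIEWS = VIEWS_EVALUATED * PARTS_PER_VIEW
```
        </preformat>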
        <p>Queries and views are described in Tables 2a and 2b. The size of the complete
answer is computed by loading all the views into a Jena RDF triple store
and executing the queries against this centralized RDF dataset. The Jena
2.7.4 library with a main-memory setup is used to store and query the integrated
RDF graphs. We executed parallel SemLAV with a timeout of 10 minutes.</p>
        <p>Experiments are run on the same platform as the SemLAV experiments,
i.e., on a Linux server with 128 GB of memory and 124 processors, where 20 GB
of RAM are allocated for the experiments. A wrapper is implemented for each
view to load its data from RDF files, i.e., 476 wrappers are available.</p>
        <sec id="sec-3-2-1">
          <title>4.1 Implementation</title>
          <p>We use critical sections and locks to implement the single-reader/multiple-writers
(SRMW) concurrency model in Jena 2.7.4. The number of threads impacts the
SPARQL engine performance; thus, we consider this number as one of the
independent parameters of our study.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>4.2 Impact of the Non-Blocking Query Execution Criteria</title>
          <p>The goal of the experiment is to study the impact of the non-blocking query
execution criteria on the query engine performance. We hypothesize that parallel
SemLAV will outperform SemLAV in terms of throughput and time for the first
answer. We measure the following metrics: i) total time (TT) in milliseconds;
ii) time for the first answer (TFA) in milliseconds; iii) throughput
(answers per millisecond); and iv) number of times the original query is executed (#EQ).</p>
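          <p>For clarity, the four metrics can be derived from a log of query executions, as in the Python sketch below. The log format, a chronological list of pairs (finish time in milliseconds, total answers produced so far), and the function name metrics are our own assumptions; throughput is read here as the total number of answers divided by the total time, and the total time is assumed to be positive.</p>
          <preformat>
```python
def metrics(executions):
    """Compute TT, TFA, throughput, and #EQ from a chronological execution log.

    `executions` holds one (finish_time_ms, total_answers_so_far) pair per
    execution of the original query.
    """
    total_time = executions[-1][0]
    answers = executions[-1][1]
    # First execution that returned at least one answer (None if there is none).
    first_answer = next((t for t, n in executions if n > 0), None)
    return {
        "TT": total_time,
        "TFA": first_answer,
        "throughput": answers / total_time,
        "#EQ": len(executions),
    }
```
          </preformat>
          <p>Under these assumptions, a log with executions finishing at 200, 500, and 1000 milliseconds that have produced 0, 10, and 40 answers gives TT = 1000, TFA = 500, #EQ = 3, and a throughput of 40/1000 answers per millisecond.</p>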
          <p>We evaluate parallel SemLAV for the non-blocking query execution criteria
defined in Section 3 with different numbers of threads, i.e., the number of
writers, and different configurations of the non-blocking query execution strategy. We use
setups with 5, 10, and 20 threads. Results suggest that 20
threads is the best number of writers. All the results are available at the project
web site https://sites.google.com/site/semanticlav.</p>
          <p>The View Dependent Criterion: The thread that executes the query is woken
up when a new view is loaded. Table 3 shows the results of SemLAV and parallel
SemLAV using the view dependent strategy, i.e., the query is re-executed after a new view is
loaded. Parallel SemLAV outperforms SemLAV in terms of throughput and total
execution time. Surprisingly, however, the time for the first answer increases for all
queries except queries 2, 13, and 18; for these queries the time for the first answer
is at most half of the SemLAV time. For most queries the time for the first answer
increases because the original query is executed fewer times (#EQ) in
parallel SemLAV than in SemLAV. Furthermore, parallel SemLAV breaks
the view ranking established by SemLAV: SemLAV starts by loading the
view ranked first and then executes the query, whereas parallel SemLAV
loads views in parallel and re-executes the query when a new view is loaded,
which is not necessarily the view ranked first by SemLAV. In setups with 5
and 10 threads, the time for the first answer is better than with 20 threads, but the
throughput is lower, as shown in Tables 4 and 5.</p>
          <p>The Time Dependent Criterion: The thread that executes the query is woken
up every 500 milliseconds. Table 6 shows the results of SemLAV and parallel
SemLAV using the time dependent strategy with 20 threads. The results also
show that parallel SemLAV outperforms SemLAV in terms of throughput and
total execution time; however, the time for the first answer increases, as with the
view dependent criterion.</p>
          <p>The Data Dependent Criterion: The query thread is woken up each time the
integrated RDF graph grows by 500 new triples. Table 7 shows the results
of SemLAV and parallel SemLAV using the data dependent strategy with 20 threads.
As in the previous experiments, parallel SemLAV outperforms SemLAV in terms of
throughput and total execution time for all queries, but the time for the first
answer increases for the majority of the queries.</p>
          <p>The Two-phase Criterion: The first phase of this strategy performs an ASK query,
and when it returns true, the second phase is conducted. The second phase first
executes the original query; then the query engine is woken up either every
t milliseconds or whenever n triples are inserted into the integrated RDF graph.
Table 8 reports on the results for the two-phase strategy when the query is
executed whenever 500 triples are inserted into the integrated RDF graph. Parallel
SemLAV outperforms SemLAV in terms of throughput for all the queries, but the
throughput values of parallel SemLAV are lower than in the previous experiments.</p>
          <p>Table 9 summarizes the throughput results with 20 threads in the different
empirical evaluations. In all experiments, parallel SemLAV outperforms SemLAV
in terms of throughput and total execution time. However, none of the defined
execution criteria dominates the others. For instance, parallel SemLAV
with query execution every 500 milliseconds is the best execution strategy for
query 2, whereas parallel SemLAV with query execution whenever 500 triples
have been inserted into the integrated RDF graph is the most suitable strategy
for query 5. We repeat the experiments with different numbers of threads. In the setup
with 20 threads, parallel SemLAV outperforms SemLAV in terms of throughput
and total execution time, but it increases the time for the first answer. Preliminary results
suggest that there is a tradeoff between throughput and time for the first answer. To
confirm these results, in the future, we plan to evaluate parallel SemLAV with
different time and data setups.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>5 Conclusions and Future Work</title>
        <p>We tackle the problem of executing SPARQL queries against LAV views in a
parallel fashion. The query execution model relies on an RDF graph that
temporarily materializes the data retrieved from the relevant views of a SPARQL query.
The query engine respects a concurrency model that prioritizes the execution of
queries against the integrated RDF graph over loading data from the views.
Additionally, a non-blocking query execution strategy allows for the execution of a
SPARQL query on an RDF graph depending on different criteria. Like
SemLAV, our proposed parallel query execution model, named parallel SemLAV,
was implemented on top of Jena. We empirically compared parallel SemLAV and
SemLAV in terms of the impact of the non-blocking strategy on the query engine
throughput. The observed results suggest that, independently of the criterion
followed by the non-blocking query execution strategy, parallel SemLAV outperforms
SemLAV in terms of throughput. One limitation of our current implementation
is inherent in the techniques implemented by Jena to handle concurrent
insertions in an RDF graph. To overcome this limitation, we plan to consider a graph
database engine as the RDF store backend, in order to provide more robust
concurrency management of the RDF graph for incremental query processing.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Acknowledgement</title>
        <p>We thank Maxime Pauvert and Nicolas Brondin, both students of the
Computer Science Department at the University of Nantes for implementing the
non-blocking strategy.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Virtuoso sponger. White paper,
          <source>OpenLink Software.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Manolescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rigaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Rousset</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          .
          <source>Web Data Management</source>
          . Cambridge University Press, New York, NY, USA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Arvelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>Compilation of query-rewriting problems into tractable fragments of propositional logic</article-title>
          .
          In
          <source>AAAI</source>
          , pages
          <fpage>225</fpage>
          -
          <lpage>230</lpage>
          . AAAI Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked data - the story so far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          .
          <article-title>The Berlin SPARQL benchmark</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>–<lpage>24</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Castillo-Espinola</surname>
          </string-name>
          .
          <article-title>Indexing RDF data using materialized SPARQL queries</article-title>
          .
          <source>PhD thesis</source>
          , Humboldt-Universitat zu Berlin,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z. G.</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>Principles of Data Integration</article-title>
          . Morgan Kaufmann,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P.</given-names>
            <surname>Folz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Montoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skaf-Molli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Molli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>SemLAV: Querying deep web and linked open data with SPARQL</article-title>
          . In
          <source>The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events</source>
          , Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, pages
          <fpage>332</fpage>–<lpage>337</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Furche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Grasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Orsi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schallhart</surname>
          </string-name>
          .
          <article-title>OPAL: automated form understanding for the deep web</article-title>
          .
          In
          <source>Proceedings of the 21st World Wide Web Conference 2012, WWW 2012</source>
          , Lyon, France, April 16-20, 2012, pages
          <fpage>829</fpage>–<lpage>838</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
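          10.
          <string-name>
            <given-names>T.</given-names>
            <surname>Furche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,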
          <string-name>
            <given-names>G.</given-names>
            <surname>Grasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Orsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schallhart</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>DIADEM: thousands of websites to a single database</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>7</volume>
          (
          <issue>14</issue>
          ):
          <fpage>1845</fpage>–<lpage>1856</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Accessing the Deep Web</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>50</volume>
          (
          <issue>5</issue>
          ):
          <fpage>94</fpage>–<lpage>101</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ambite</surname>
          </string-name>
          .
          <article-title>Scalable query rewriting: a graph-based approach</article-title>
          . In
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Sellis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          , editors,
          <source>SIGMOD Conference</source>
          , pages
          <fpage>97</fpage>–<lpage>108</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>G.</given-names>
            <surname>Montoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skaf-Molli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Molli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>SemLAV: Local-As-View Mediation for SPARQL</article-title>
          .
          <source>Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII</source>
          , Lecture Notes in Computer Science, Vol.
          <volume>8420</volume>
          , pages
          <fpage>33</fpage>–<lpage>58</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>G. L.</given-names>
            <surname>Peterson</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Burns</surname>
          </string-name>
          .
          <article-title>Concurrent reading while writing II: the multiwriter case</article-title>
          .
          In
          <source>28th Annual Symposium on Foundations of Computer Science</source>
          , Los Angeles, California, USA, 27-29 October 1987, pages
          <fpage>383</fpage>–<lpage>392</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>M.</given-names>
            <surname>Taheriyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Szekely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ambite</surname>
          </string-name>
          .
          <article-title>Rapidly integrating services into the linked data cloud</article-title>
          . In
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heflin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sirin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tudorache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Parreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          , editors,
          <source>International Semantic Web Conference (1)</source>
          , volume
          <volume>7649</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>559</fpage>–<lpage>574</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>Information integration using logical views</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>239</volume>
          (
          <issue>2</issue>
          ):
          <fpage>189</fpage>–<lpage>210</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Haesendonck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Vocht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
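          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. V.</given-names>
            <surname>de Walle</surname>
          </string-name>
          .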
          <article-title>Querying datasets on the web with high availability</article-title>
          .
          In
          <source>The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference</source>
          , Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, pages
          <fpage>180</fpage>–<lpage>196</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>G.</given-names>
            <surname>Wiederhold</surname>
          </string-name>
          .
          <article-title>Mediators in the architecture of future information systems</article-title>
          .
          <source>IEEE Computer</source>
          ,
          <volume>25</volume>
          (
          <issue>3</issue>
          ):
          <fpage>38</fpage>–<lpage>49</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>