
Parallel Data Loading during Querying Deep Web and Linked Open Data with SPARQL

Pauline Folz

Gabriela Montoya

Hala Skaf-Molli

Pascal Molli

pascal.molli@univ-nantes.fr

Maria-Esther Vidal

mvidal@ldc.usb.ve (affiliation 3)

0 Nantes Métropole - Direction Recherche, Innovation et Enseignement Supérieur, France
1 Nantes University, France
2 Unit UMR6241 of the Centre National de la Recherche Scientifique (CNRS), France
3 Universidad Simón Bolívar, Venezuela


Web integration systems are able to provide transparent and uniform access to heterogeneous Web data sources by integrating views of Linked Data, Web Service results, or data extracted from the Deep Web. However, given the potentially large number of views, query engines of Web integration systems have to implement execution techniques able to scale up to real-world scenarios and efficiently execute queries. We tackle the problem of SPARQL query processing against RDF views, and propose a non-blocking query execution strategy that incrementally accesses and merges the views relevant to a SPARQL query in a parallel fashion. The proposed strategy is implemented on top of Jena 2.7.4, and empirically compared with SemLAV, a sequential SPARQL query engine on RDF views. Results suggest that our approach outperforms SemLAV in terms of the number of answers produced per unit of time.

Linked Open Data initiatives have motivated the integration of a large number of RDF datasets into the Linking Open Data (LOD) cloud [4]. Different Web-based interfaces are available to access these publicly accessible Linked Data sets, e.g., SPARQL endpoints and Linked Data fragments [17]. However, the Deep Web, which has around 500 times the size of the Surface Web [11, 10], has not been integrated as part of the LOD cloud. Performing SPARQL queries without considering the Deep Web can potentially deliver incomplete results. For example, the execution of the SPARQL query: Which members of the Semantic Web community are interested in Dalai Lama, Barack Obama, or Rihanna? (cf. Figure 2) without the integration of the Deep Web will provide no answers [8]. Nevertheless, if data from social networks such as Twitter, Facebook, or LinkedIn were considered, the query execution could return some answers.

Two main approaches exist for data integration: data warehousing and virtual mediators [7]. Semantic data warehouses such as Virtuoso with the Sponger feature [1] allow for the implementation of wrappers able to create RDF data from unsemantified data sources, e.g., Web services, CSV files; but this approach may suffer from the freshness problem [2], i.e., data may become stale when data sources are updated.

On the other hand, a mediator relies on a global schema to provide a uniform interface for accessing the data sources. Global-As-View (GAV) and Local-As-View (LAV) are the main paradigms for mapping data sources and the global schema. In GAV mediators, entities of the global schema are described using views over the data sources, but including or updating data sources may require the modification of a large number of views [16]. In LAV mediators, by contrast, the sources are described as views over the global schema, and adding new data sources can be easily done [16]. Despite its expressiveness and flexibility, LAV query rewriting is in general intractable, i.e., NP-complete for conjunctive queries [3]. State-of-the-art LAV query rewriters efficiently solve some families of the query rewriting problem [3, 12]; nevertheless, they may not perform equally well on SPARQL queries [13]. Recently, SemLAV [13], the first scalable LAV-based approach for SPARQL query processing, was proposed. Instead of enumerating the query rewritings of a SPARQL query, SemLAV selects the most relevant LAV views, accesses the selected views according to their relevance, and materializes the downloaded data into an integrated RDF graph. Then, the SPARQL query is executed against the integrated RDF graph.

SemLAV provides a new paradigm to execute SPARQL queries against LAV views, but because relevant views are loaded sequentially, SemLAV may get blocked loading large views. In the worst case, if the first loaded view is huge and it does not provide relevant data for the query answer, SemLAV will be blocked without producing any answer. Following a sequential view loading strategy may reduce the number of answers produced per unit of time, i.e., throughput, and increase the time for first answer. Loading several views in parallel may overcome these limitations. However, a parallel view loading strategy introduces the problem of concurrent writing on the integrated RDF graph. In this paper, we propose a non-blocking query execution strategy to integrate the data from the relevant views into the integrated RDF graph in a parallel fashion. We implement the proposed non-blocking strategy on top of Jena 2.7.4; we name this new SPARQL query engine parallel SemLAV. Further, an empirical evaluation is conducted to study the new parallel strategy with respect to SemLAV. The Berlin Benchmark [5] and queries and views designed by Castillo-Espinola [6] are used to evaluate both query engines. Results suggest that parallel SemLAV outperforms SemLAV with respect to answers produced per time unit.

The paper is organized as follows. Section 2 describes background and motivation. Section 3 presents strategies for integrating relevant views into the integrated RDF graph in a parallel fashion. Section 4 reports our experimental results. Finally, conclusions and future work are outlined in Section 5.

SemLAV follows a mediator and wrapper architecture [18] where data from the sources are virtually integrated by SemLAV in a global schema composed of several RDF vocabularies, as shown in Figure 1. Sources are described by LAV views and can be heterogeneous, e.g., from the Deep Web, RDF data sets, or relational tables. SPARQL queries are expressed in terms of the global schema and posed against the SemLAV mediator. A wrapper is specific to a data source, and retrieves data on demand; the retrieved data are transformed to match the global schema. Wrappers can be generated by tools like Karma [15] or OPAL [9]. The global schema is the interface between users and the data sources. Given a query and a set of views, SemLAV computes a ranked set of relevant views for answering the query; no statistics are used to rank the views. Relevant views are ranked based on the number of triple patterns of the original query that each view covers [13]. Views are materialized by calling the wrappers, and each time a new view is fully materialized, the original query is executed.

The benefits of SemLAV are illustrated in the following example [8]. Suppose the SemLAV global schema comprises different RDF vocabularies, e.g., foaf 5 and rdfs 6. Figure 2 presents a SPARQL query expressed using the global schema. Views are expressed as conjunctive queries, where RDF predicates are represented by binary predicates, e.g., label(C,L) corresponds to ?C rdfs:label ?L and ?P foaf:name ?N is expressed as name(P,N). Listing 1 defines five LAV views. Triple patterns in the query are also seen as binary predicates and BGPs are represented as conjunctive queries; the running SPARQL query is composed of four subgoals on the predicates: member(P,C), label(C,"Semantic Web"), knows(P,WKP), and name(WKP,N). The filter expression is modeled as a disjunction of atomic expressions on the equality comparison operator.

5 http://xmlns.com/foaf/0.1/

Figure 2: SPARQL query expressed using the global schema

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT * WHERE {
      ?P foaf:member ?C .
      ?C rdfs:label "Semantic Web" .
      ?P foaf:knows ?WKP .
      ?WKP foaf:name ?N .
      FILTER (?N="Dalai Lama" || ?N="Barack Obama" || ?N="Rihanna")
    }

Listing 1: Views v1-v5 for Query Q

    v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
    v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
    v3(P,N,R,M)   :- name(P,N), name(R,M), knows(P,R)
    v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
    v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)

Given a subgoal sg of a conjunctive query, e.g., label(C,"Semantic Web"), a view v is relevant for sg if sg is part of the body of v, e.g., v1(P,A,I,C,L) and v5(P,N,R,C,L) are relevant for label(C,"Semantic Web"). Table 1a presents the set of relevant views for each subgoal of the query in Figure 2.

SemLAV sorts relevant views according to the number of subgoals of the query that the view defines, e.g., view v5 is sorted first since it defines all the subgoals. Table 1b represents the sorted relevant views for the query in Figure 2.
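As an illustration only, the selection and ranking of relevant views described above can be sketched in Python. The sketch reduces each view of Listing 1 to the predicate names in its body and matches subgoals by predicate name alone, which is a simplification made here; the function names `relevant_views` and `rank_views` are hypothetical and are not part of SemLAV.

```python
# Views from Listing 1, reduced to the predicate names of their bodies.
# Matching on predicate names only is a simplifying assumption of this sketch.
VIEWS = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}

# Subgoals of the running query in Figure 2.
QUERY_SUBGOALS = ["member", "label", "knows", "name"]


def relevant_views(subgoal, views):
    """Return the views whose body contains the subgoal's predicate."""
    return [name for name, body in views.items() if subgoal in body]


def rank_views(subgoals, views):
    """Sort views by the number of query subgoals they cover, descending.

    Python's sort is stable, so views with equal coverage keep their
    original order (v1 before v2 before v3).
    """
    def coverage(name):
        return len(set(subgoals) & set(views[name]))

    candidates = [name for name in views if coverage(name) > 0]
    return sorted(candidates, key=coverage, reverse=True)


if __name__ == "__main__":
    for sg in QUERY_SUBGOALS:
        print(sg, relevant_views(sg, VIEWS))
    print(rank_views(QUERY_SUBGOALS, VIEWS))
```

Under these assumptions, v5 is ranked first because it covers all four subgoals, followed by v4 with three, which agrees with the ordering in Table 1b.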

SemLAV identifies and ranks the relevant views of a query, and executes the query over the data collected from the relevant views. Different strategies can be followed to contact the views and load the data. For example, following a blocking strategy, views are contacted one by one in order, and a view is not contacted until all the data from the previously contacted view have been downloaded completely. This is the strategy followed by SemLAV, which is illustrated in Figure 3a; this strategy can be blocking if the first view is huge. While the view v5 is loading, we are not able to perform the query. This blocking issue can have a negative impact on the performance of the query engine if the performance is measured in terms of the number of answers produced per unit of time, i.e., throughput.

6 http://www.w3.org/2000/01/rdf-schema#

Table 1 (a) Relevant views

| member(P, C)  | label(C, L)   | knows(P, WKP) | name(WKP, N)  |
|---------------|---------------|---------------|---------------|
| v1(P,A,I,C,L) | v1(P,A,I,C,L) | v3(P,N,R,M)   | v2(A,T,P,N,C) |
| v2(A,T,P,N,C) | v5(P,N,R,C,L) | v4(P,N,G,R,C) | v3(P,N,R,M)   |
| v4(P,N,G,R,C) |               | v5(P,N,R,C,L) | v4(P,N,G,R,C) |
| v5(P,N,R,C,L) |               |               | v5(P,N,R,C,L) |

Table 1 (b) Sorted relevant views

| member(P, C)  | label(C, L)   | knows(P, WKP) | name(WKP, N)  |
|---------------|---------------|---------------|---------------|
| v5(P,N,R,C,L) | v5(P,N,R,C,L) | v5(P,N,R,C,L) | v5(P,N,R,C,L) |
| v4(P,N,G,R,C) | v1(P,A,I,C,L) | v4(P,N,G,R,C) | v4(P,N,G,R,C) |
| v1(P,A,I,C,L) |               | v3(P,N,R,M)   | v2(A,T,P,N,C) |
| v2(A,T,P,N,C) |               |               | v3(P,N,R,M)   |

To illustrate this problem, consider Figure 3a, where v5 is loaded first. Even if v5 covers all the query subgoals, loading v5 first reduces the throughput, because v5 is the biggest view and does not contribute to the result. On the other hand, loading both v1 and v4, which together cover all the subgoals, takes less time and may produce query answers. If relevant views were loaded in parallel following a non-blocking strategy, this situation would not affect the query engine performance. This solution is illustrated in Figure 3b, where there are five threads and each of them loads one of the five top-ranked views at a time; views are allocated to different threads. The time to load v5 is greater than the time required to load v4 and v1 in parallel. Additionally, v4 and v1 cover all the subgoals of our running query; thus, answers are produced before loading v5 completely.
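The difference between the two loading strategies can be made concrete with a toy calculation. The Python sketch below computes the earliest time at which the loaded views cover all query subgoals under sequential and under fully parallel loading. The load times are hypothetical values chosen for illustration (they are not measurements), the helper names are assumptions, and covering all subgoals is only a necessary condition for producing answers.

```python
# Predicate names of the bodies of three views from Listing 1.
VIEWS = {
    "v5": ["name", "knows", "member", "label"],
    "v4": ["name", "gender", "knows", "member"],
    "v1": ["made", "affiliation", "member", "label"],
}
QUERY_SUBGOALS = ["member", "label", "knows", "name"]

# Hypothetical load times in seconds: v5 is assumed to be the biggest view.
LOAD_TIMES = {"v5": 100, "v4": 20, "v1": 15}
RANKED_ORDER = ["v5", "v4", "v1"]


def finish_times(load_times, order, parallel):
    """Time at which each view is fully loaded.

    Sequential loading accumulates load times in ranked order; parallel
    loading (one thread per view) finishes each view after its own time.
    """
    if parallel:
        return {view: load_times[view] for view in order}
    elapsed, finished = 0, {}
    for view in order:
        elapsed += load_times[view]
        finished[view] = elapsed
    return finished


def time_to_cover(finished, views, subgoals):
    """Earliest time at which the loaded views cover every subgoal."""
    covered = set()
    for view in sorted(finished, key=finished.get):
        covered |= set(views[view])
        if set(subgoals) <= covered:
            return finished[view]
    return None


if __name__ == "__main__":
    seq = finish_times(LOAD_TIMES, RANKED_ORDER, parallel=False)
    par = finish_times(LOAD_TIMES, RANKED_ORDER, parallel=True)
    print("sequential:", time_to_cover(seq, VIEWS, QUERY_SUBGOALS))
    print("parallel:", time_to_cover(par, VIEWS, QUERY_SUBGOALS))
```

With these assumed times, sequential loading cannot cover the query before v5 is complete, whereas parallel loading covers it as soon as v1 and v4 are both loaded.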

We propose a non-blocking strategy for executing SPARQL queries against views. Like SemLAV, this approach does not rely on statistics to rank and select the relevant views. The proposed strategy prevents the query engine from getting blocked until all the data are retrieved from the relevant views.

3 Our Approach

A non-blocking strategy to access the views in a parallel fashion is defined. Although this strategy improves the performance of a query engine, loading the retrieved data into the integrated RDF graph in parallel may generate concurrency problems, i.e., many processes may simultaneously add data to the integrated RDF graph. So, we define a new concurrent model for RDF, and we propose a non-blocking query execution strategy able to adapt query execution to different criteria, e.g., a query is executed after a certain number of triples are loaded into the integrated RDF graph. We implement the concurrency model and the non-blocking query execution strategy on top of Jena 2.7.4 7.

[Figure 3: (a) Sequential loading; (b) Parallel loading]

3.1 A Concurrency Model for the Integrated RDF Graph

Our approach needs a model that can handle concurrent insertions. However, RDF stores like Jena do not handle concurrent insertions; they are only able to favor one type of operation, e.g., reads or insertions. This strategy is implemented with locks, but read and insert locks are mutually exclusive, i.e., they cannot be simultaneously activated. Existing RDF stores assume that there are more readers than writers and follow the multiple-readers/single-writer strategy (MRSW) 8. According to MRSW, many readers may read simultaneously, while a writer must have exclusive access. MRSW assumes writers have the priority to keep data up-to-date. Nevertheless, in our proposed approach, data insertions are going to be more frequent than data reads. A reader is the query engine that accesses the integrated RDF graph during query execution, while writers are the wrappers of the relevant views which load the data into the integrated RDF graph. The query engine cannot execute the query more often than views are loaded into the integrated RDF graph, because executing the query is expensive, and doing so too often may lead to performance degradation.

In other words, our proposed approach prioritizes read operations over insertions, i.e., a single-reader/multiple-writers strategy (SRMW) [14] is followed to manage concurrency on the integrated RDF graph. So the reader, e.g., a query execution engine, has a higher priority than a writer, e.g., a wrapper loading a view. Additionally, two insert locks cannot be activated at the same time due to the specification of the integrated RDF model. However, the query engine divides each view into blocks of n triples to allow for the loading of portions of several views at the same time. A lock is requested before starting a block loading, and it is released after n triples have been loaded completely. In our example, the first block of v5 is loaded, then the first block of v4, and to load the second block of v5, it may be necessary to wait until all the first blocks of the currently loading views are already loaded. However, this order may fluctuate depending on the system time allocation among the threads.

7 http://jena.apache.org/
8 https://jena.apache.org/documentation/notes/concurrency-howto.html

3.2 A Non-Blocking Strategy for SPARQL Query Execution

We implement a non-blocking strategy that is able to execute a query according to the following criteria; the selection of the criteria can be either configured or provided by the user during query execution.

- View dependent: the reader is woken up after a new view is loaded; thus, if v is a newly loaded view, then the query engine will re-execute the query against the integrated RDF graph. If enough data is loaded into the integrated RDF graph from v, then the query engine will be able to generate new results when it is executed. This criterion is also implemented by SemLAV.
- Time dependent: the reader is woken up after a period of time t, i.e., if t is n milliseconds, the query engine will re-execute the query against the RDF graph every n milliseconds. If enough data is loaded into the integrated RDF graph during the period t, the query engine will be able to generate new results. But the concurrency model prioritizes the reader over writers; thus, if the writers are stopped and not able to load enough data into the integrated RDF graph, the query will be inefficiently executed.
- Data dependent: the reader is woken up after a certain number n of triples are inserted into the integrated RDF graph by the writers; thus, the query engine will re-execute the query against the RDF graph whenever n new triples are integrated. If the n new triples contribute to the results, then the query engine will be able to generate new answers when it is executed.
- Two-phase execution: the reader is woken up either after a period of time t or after a certain number n of triples are inserted into the integrated RDF graph by the writers. In the first phase, the reader performs ASK queries to check if new results can be produced; if the answer is true, the second phase is launched. The second phase directly executes the query; then the reader is woken up either after a period of time t or after a certain number n of new triples have been inserted into the integrated RDF graph.
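The four criteria above can be summarized as a small decision object that tells the reader when to re-execute the query. The following Python sketch is an illustration under stated assumptions: the class `WakeUpPolicy`, the helper `two_phase_step`, and the callbacks `ask` and `select` are hypothetical names introduced here, and the sketch does not reproduce the Jena-based implementation.

```python
import time


class WakeUpPolicy:
    """Decides when the reader should re-execute the query (hypothetical helper)."""

    def __init__(self, mode, period_ms=500, triples=500, clock=time.monotonic):
        if mode not in ("view", "time", "data"):
            raise ValueError("unknown mode: %s" % mode)
        self.mode = mode
        self.period_s = period_ms / 1000.0
        self.triples = triples
        self._clock = clock
        self._last_run = clock()
        self._new_triples = 0
        self._new_views = 0

    def on_triples_inserted(self, count):
        """Called by a writer after it inserts `count` triples."""
        self._new_triples += count

    def on_view_loaded(self):
        """Called by a writer after it finishes loading a whole view."""
        self._new_views += 1

    def should_wake(self):
        if self.mode == "view":
            return self._new_views > 0
        if self.mode == "time":
            return self._clock() - self._last_run >= self.period_s
        return self._new_triples >= self.triples

    def mark_executed(self):
        """Reset the counters and the timer after a query execution."""
        self._last_run = self._clock()
        self._new_triples = 0
        self._new_views = 0


def two_phase_step(policy, ask, select, state):
    """One reader step of the two-phase criterion.

    state["phase"] starts at 1. In phase 1 only the ASK check runs; once it
    returns True the full query is executed directly and phase 2 begins.
    """
    if not policy.should_wake():
        return None
    policy.mark_executed()
    if state["phase"] == 1:
        if not ask():
            return None
        state["phase"] = 2
    return select()
```

In a threaded setting the counters would need to be updated under the same lock that protects the integrated RDF graph; that synchronization is omitted here to keep the decision logic visible.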

4 Experimental Evaluation

The Berlin SPARQL Benchmark (BSBM) [5], and queries and views proposed by Castillo-Espinola [6], are used to compare the performance of parallel SemLAV with respect to SemLAV. Our goal is to reproduce the experiments reported by Montoya et al. [13]; therefore, we used the Berlin Benchmark dataset composed of 10,000,736 triples using a scale factor of 28,211 products, 16 out of 18 queries, and nine out of the ten defined views proposed by Castillo-Espinola [6]. In the SemLAV experiments, some queries and views were not considered because they included constants and some of the evaluated rewriters only process queries with variables. Five additional views were defined to cover all the predicates in the evaluated queries, i.e., 14 views were evaluated. Furthermore, 476 views were produced by horizontally partitioning each original view into 34 parts, such that each part produces 1/34 of the answers given by the original view.

Queries and views are described in Tables 2a and 2b. The size of the complete answer is computed by including all the views into the Jena RDF triple store and by executing the queries against this centralized RDF dataset. The Jena 2.7.4 library with main memory setup is used to store and query the integrated RDF graphs. We executed parallel SemLAV with a timeout of 10 minutes.

Experiments are run on the same platform as the SemLAV experiments, i.e., on a Linux server with 128 GB of memory and 124 processors, where 20 GB of RAM are allocated for the experiments. Wrappers are implemented for each view to load data from RDF files, i.e., 476 wrappers are available.

4.1 Implementation

We use critical sections and locks to implement the single-reader/multiple-writers (SRMW) concurrency model in Jena 2.7.4. The number of threads impacts the SPARQL engine performance; thus, we consider this number as one of the independent parameters of our study.
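To make the idea of a reader-priority lock with block-wise insertion concrete, a minimal Python sketch follows. It is not the Jena implementation and does not use any Jena API: the class `IntegratedGraph`, its methods, and the use of a Python set as the graph are assumptions made for illustration. Writers take the lock once per block of n triples, and they step aside whenever the reader is waiting.

```python
import threading


class IntegratedGraph:
    """Toy integrated graph with reader priority (illustrative only)."""

    def __init__(self, block_size=500):
        self.triples = set()
        self.block_size = block_size
        self._cond = threading.Condition()
        self._readers_waiting = 0
        self._busy = False  # True while one reader or one writer holds the graph

    def insert_view(self, triples):
        """Writer: load a view block by block, one lock acquisition per block."""
        for start in range(0, len(triples), self.block_size):
            block = triples[start:start + self.block_size]
            with self._cond:
                # Writers wait while the graph is in use or the reader is queued.
                while self._busy or self._readers_waiting > 0:
                    self._cond.wait()
                self._busy = True
            try:
                self.triples.update(block)
            finally:
                with self._cond:
                    self._busy = False
                    self._cond.notify_all()

    def read(self, query):
        """Reader: run `query` on a snapshot as soon as the current block ends."""
        with self._cond:
            self._readers_waiting += 1
            while self._busy:
                self._cond.wait()
            self._readers_waiting -= 1
            self._busy = True
        try:
            return query(frozenset(self.triples))
        finally:
            with self._cond:
                self._busy = False
                self._cond.notify_all()


if __name__ == "__main__":
    graph = IntegratedGraph(block_size=2)
    views = [[("s%d" % i, "p", "v%d" % k) for i in range(5)] for k in range(4)]
    writers = [threading.Thread(target=graph.insert_view, args=(v,)) for v in views]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    print(graph.read(len))
```

Because the lock is released after every block, blocks from several views interleave, and the reader never waits for more than one block to finish. The block size therefore bounds the reader's waiting time at the cost of more lock acquisitions.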

4.2 Impact of the Non-Blocking Query Execution Criteria

The goal of the experiment is to study the impact of the non-blocking query execution criteria on the query engine performance. We hypothesize that parallel SemLAV will outperform SemLAV in terms of throughput and time for the first answer. We measure the following metrics: i) total time (TT) in milliseconds; ii) time for first answer (TFA) in milliseconds; iii) throughput (answers/millisecond); and iv) number of times the original query is executed (#EQ).
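The four metrics can be derived from a simple execution log. The Python sketch below assumes a log of (timestamp in milliseconds, cumulative number of answers) pairs, one per query execution, and assumes that throughput is the final number of answers divided by the total time; both the log format and the function name `summarize_run` are hypothetical.

```python
def summarize_run(start_ms, end_ms, executions):
    """Compute TT, TFA, throughput and #EQ from an execution log.

    executions: list of (timestamp_ms, cumulative_answers), one entry per
    execution of the original query, in chronological order.
    """
    total_time = end_ms - start_ms
    answers = max((count for _, count in executions), default=0)
    # Timestamp of the first execution that returned at least one answer.
    first = next((ts for ts, count in executions if count > 0), None)
    return {
        "TT": total_time,
        "TFA": None if first is None else first - start_ms,
        "throughput": answers / total_time if total_time > 0 else 0.0,
        "#EQ": len(executions),
    }


if __name__ == "__main__":
    log = [(200, 0), (400, 10), (900, 50)]
    print(summarize_run(0, 1000, log))
```

In this example the first execution returns no answers, so the time for first answer is taken from the second execution.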

We evaluate parallel SemLAV for the non-blocking query execution criteria defined in Section 3 with different numbers of threads, i.e., the number of writers and the configuration of the non-blocking query execution strategy. We use setups with 5, 10, and 20 threads. Results suggest that 20 threads is the best number for writers. All the results are available at the project web site https://sites.google.com/site/semanticlav.

The View Dependent Criterion: The thread which executes the query is woken up when a new view is loaded. Table 3 shows the results of SemLAV and parallel SemLAV using the view strategy, i.e., re-execute the query after a new view is loaded. Parallel SemLAV outperforms SemLAV in terms of throughput and total execution time. But surprisingly, the time for first answer is increased for all queries except queries 2, 13, and 18; for these queries the time for the first answer is at most half of the SemLAV time. In most queries the time for first answer is increased because the number of times the original query is executed (#EQ) in parallel SemLAV is less than in SemLAV; furthermore, parallel SemLAV breaks the view ranking established by SemLAV, i.e., SemLAV starts by loading the view ranked in first place and executes the query. However, parallel SemLAV loads views in parallel, and the query is re-executed when a new view is loaded, which is not necessarily the first ranked view by SemLAV. In setups with 5 and 10 threads, the time for first answer is better than for 20 threads, but the throughput is lower, as shown in Tables 4 and 5.

The Time Dependent Criterion: The thread which executes the query is woken up every 500 milliseconds. Table 6 shows the results of SemLAV and parallel SemLAV using the time dependent strategy for 20 threads. The results also show that parallel SemLAV outperforms SemLAV in terms of throughput and total execution time; however, the time for first answer is increased, as when the view dependent criterion is used.

The Data Dependent Criterion: The query thread is woken up each time the integrated RDF graph grows by 500 new triples. Table 7 shows the results of SemLAV and parallel SemLAV using the data dependent strategy for 20 threads. As in previous experiments, parallel SemLAV outperforms SemLAV in terms of throughput and total execution time for all queries; but the time for the first answer is increased for the majority of the queries.

The Two-phase Criterion: The first phase of this strategy performs an ASK query, and when it returns true, the second phase is conducted. First, the second phase executes the original query; then the query engine is woken up either every n milliseconds or when n triples are inserted into the integrated RDF graph. Table 8 reports on the results for the two-phase strategy when the query is executed whenever 500 triples are inserted into the integrated RDF graph. Parallel SemLAV outperforms SemLAV in terms of throughput for all the queries, but throughput values of parallel SemLAV are lower than in previous experiments.

Table 9 summarizes the throughput results with 20 threads in the different empirical evaluations. In all experiments, parallel SemLAV outperforms SemLAV in terms of throughput and total execution time. However, none of the defined execution criteria dominates the others. For instance, parallel SemLAV with query execution every 500 milliseconds is the best execution strategy for query 2, whereas parallel SemLAV with query execution whenever 500 triples have been inserted into the integrated RDF graph is the most suitable strategy for query 5. We repeat the experiments with different numbers of threads. In the setup with 20 threads, parallel SemLAV outperforms SemLAV in terms of throughput and total execution time, but it increases the time for first answer. Preliminary results suggest that there is a tradeoff between throughput and time for first answer. To confirm these results, in the future, we plan to evaluate parallel SemLAV with different time and data setups.

5 Conclusions and Future Work

We tackle the problem of executing SPARQL queries against LAV views in a parallel fashion. The query execution model relies on an RDF graph that temporarily materializes the data retrieved from the relevant views of a SPARQL query. The query engine respects a concurrency model that prioritizes the execution of queries against the integrated RDF graph over loading data from the views. Additionally, a non-blocking query execution strategy allows for the execution of a SPARQL query on an RDF graph depending on different criteria. Like SemLAV, our proposed parallel query execution model, named parallel SemLAV, was implemented on top of Jena. We empirically compared parallel SemLAV and SemLAV in terms of the impact of the non-blocking strategy on the query engine throughput. The observed results suggest that, independently of the criterion followed by the non-blocking query execution strategy, parallel SemLAV outperforms SemLAV in terms of throughput. One limitation of our current implementation is inherited from the techniques implemented by Jena to handle concurrent insertions in an RDF graph. To overcome this limitation, we plan to consider a graph database engine as the RDF store backend, in order to provide more robust concurrency management of the RDF graph for incremental query processing.

Acknowledgement

We thank Maxime Pauvert and Nicolas Brondin, both students of the Computer Science Department at the University of Nantes for implementing the non-blocking strategy.

References

1. Virtuoso sponger. White paper, OpenLink Software.
2. S. Abiteboul, I. Manolescu, P. Rigaux, M.-C. Rousset, and P. Senellart. Web Data Management. Cambridge University Press, New York, NY, USA, 2011.
3. Y. Arvelo, B. Bonet, and M.-E. Vidal. Compilation of query-rewriting problems into tractable fragments of propositional logic. In AAAI, pages 225-230. AAAI Press, 2006.
4. C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1-22, 2009.
5. C. Bizer and A. Schultz. The Berlin SPARQL benchmark. Int. J. Semantic Web Inf. Syst., 5(2):1-24, 2009.
6. R. Castillo-Espinola. Indexing RDF data using materialized SPARQL queries. PhD thesis, Humboldt-Universität zu Berlin, 2012.
7. A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012.
8. P. Folz, G. Montoya, H. Skaf-Molli, P. Molli, and M.-E. Vidal. SemLAV: Querying deep web and linked open data with SPARQL. In The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, pages 332-337, 2014.
9. T. Furche, G. Gottlob, G. Grasso, X. Guo, G. Orsi, and C. Schallhart. OPAL: automated form understanding for the deep web. In Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16-20, 2012, pages 829-838, 2012.
10. T. Furche, G. Gottlob, G. Grasso, X. Guo, G. Orsi, C. Schallhart, and C. Wang. DIADEM: thousands of websites to a single database. PVLDB, 7(14):1845-1856, 2014.
11. B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Accessing the Deep Web. Commun. ACM, 50(5):94-101, 2007.
12. G. Konstantinidis and J. L. Ambite. Scalable query rewriting: a graph-based approach. In T. K. Sellis, R. J. Miller, A. Kementsietsidis, and Y. Velegrakis, editors, SIGMOD Conference, pages 97-108. ACM, 2011.
13. G. Montoya, L. D. Ibáñez, H. Skaf-Molli, P. Molli, and M.-E. Vidal. SemLAV: Local-As-View Mediation for SPARQL. Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII, Lecture Notes in Computer Science, Vol. 8420, pages 33-58, 2014.
14. G. L. Peterson and J. E. Burns. Concurrent reading while writing II: the multiwriter case. In 28th Annual Symposium on Foundations of Computer Science, Los Angeles, California, USA, 27-29 October 1987, pages 383-392, 1987.
15. M. Taheriyan, C. A. Knoblock, P. A. Szekely, and J. L. Ambite. Rapidly integrating services into the linked data cloud. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. X. Parreira, J. Hendler, G. Schreiber, A. Bernstein, and E. Blomqvist, editors, International Semantic Web Conference (1), volume 7649 of Lecture Notes in Computer Science, pages 559-574. Springer, 2012.
16. J. D. Ullman. Information integration using logical views. Theor. Comput. Sci., 239(2):189-210, 2000.
17. R. Verborgh, O. Hartig, B. D. Meester, G. Haesendonck, L. D. Vocht, M. V. Sande, R. Cyganiak, P. Colpaert, E. Mannens, and R. V. de Walle. Querying datasets on the web with high availability. In The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, pages 180-196, 2014.
18. G. Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, 25(3):38-49, 1992.