Querying the Web of Interlinked Datasets using VOID Descriptions

Ziya Akar, Tayfun Gökmen Halaç, Erdem Eser Ekinci, Oguz Dikenelli
Department of Computer Engineering, Ege University, 35100 Bornova, Izmir, Turkey
ziya.seagent@gmail.com, tayfunhalac@gmail.com, erdemeserekinci@gmail.com, oguz.dikenelli@ege.edu.tr

ABSTRACT
Query processing is an important way of accessing data on the Semantic Web. Today, the Semantic Web is characterized as a web of interlinked datasets, and thus querying the web can be seen as dataset integration on the web. This dataset integration must also be transparent to the data consumer, as if she were querying the whole web. To decide which datasets should be selected and integrated for a query, one requires metadata about the web of data. In this paper, to enable this transparency, we introduce a federated query engine called WoDQA (Web of Data Query Analyzer), which discovers the datasets relevant to a query in an automated manner using VOID documents as metadata. WoDQA focuses on powerful dataset elimination by analyzing the query structure with respect to the metadata of datasets. Dataset and linkset descriptions in VOID documents are analyzed for a SPARQL query, and a federated query is constructed. By means of the linkset concept of VOID, links between datasets are incorporated into the selection of federated data sources. The current version of WoDQA is available as a SPARQL endpoint.

1. INTRODUCTION
While the web is evolving into a structured data space, many applications are publishing and linking their data, and the cloud of this linked and open data will grow gigantic as time goes on. In this interlinked and structured data space, query execution becomes one of the most important research problems, and different query execution approaches and tools have been proposed in the literature [11, 7].

VOID [1] is published as a W3C Semantic Web Interest Group note¹. VOID is an RDF vocabulary used to describe the metadata of RDF datasets, in a sense, the metadata of the web of data. The linked open data cloud is represented as a graph of datasets, in which datasets are represented as nodes and sets of links between datasets are represented as edges. Since VOID is based on this graph nature of the web of data, it provides a strong way of describing metadata that allows discovering the datasets over which queries are distributed.

In this paper, we present a federated query engine called WoDQA (Web of Data Query Analyzer), which is developed to execute a query on distributed datasets without missing answers, using the VOID metadata of the datasets in the linked open data cloud. WoDQA focuses on effective dataset selection for a query and analyzes the query structure to eliminate irrelevant datasets. Relevant datasets are selected by analyzing VOID documents and considering which dataset includes a resource related to the query and which links between datasets allow finding a result to the query. VOID metadata provides dataset descriptions, representing the content of a dataset, and linkset descriptions, representing the relationships between datasets, which are used by WoDQA for effective dataset selection.

There are two main approaches which enable automated query processing on the web of data and prevent data consumers from searching for relevant datasets. The first approach, called follow-your-nose (link traversal) [11], is based on following links between data to discover potentially relevant data; the second one is query federation [7].
Query federation is based on dividing a query into sub-queries and distributing the sub-queries to relevant datasets which are selected using metadata about the datasets.

The follow-your-nose approach conceptualizes the web as a graph of documents which contains dereferenceable URIs. This approach is based on executing queries on relevant documents which are retrieved by following links between resources in different documents. But this method raises completeness and performance issues. Although some heuristic query planning methods can be used to answer different kinds of queries [10], this approach cannot guarantee finding all results, because the relevant documents vary according to the starting point and the path. Also, although follow-your-nose requires nothing other than linked data principles to process a query, another disadvantage is that encountering large documents causes retrieval problems. The other approach, query federation, has arisen from the database literature and is composed of two main steps before performing a query. Firstly, the query is divided into sub-queries, and datasets relevant to the sub-queries are selected using some metadata which reflects dataset content. Then, the query evaluation plan is changed using statistics about the datasets in the query optimization step. For the purpose of executing sub-queries on distributed data sources, query federation requires accessing datasets via SPARQL endpoints. Contrary to the follow-your-nose approach, in this approach all results can be found under the assumption that the metadata of all datasets is complete and accurate, and queries can be optimized before execution by estimating execution cost using dataset metadata. To find all results in an effective way, query federation determines relevant datasets before execution using well-defined dataset metadata such as VOID documents.

The query execution on the web of data basically depends on searching for resources that satisfy our needs, but we need to discover which parts of the linked open data cloud may have such resources. To make this discovery effective, which resources and vocabularies reside in a dataset and which datasets are interlinked to others via links of interest should be taken into account. If dataset publishers provide such information by describing the metadata of their datasets, relevant datasets can be selected effectively in an automated manner. To enable this automation, the Vocabulary of Interlinked Datasets (VOID) was introduced.

Copyright is held by the author/owner(s).
LDOW2012, April 16, 2012, Lyon, France.
1 http://www.w3.org/TR/void/

On the other hand, distributed querying depends on processing query parts directly on the original data and managing the results retrieved from the distributed data. Query federation [9, 7] and follow-your-nose [11] are the mainstream distributed querying approaches. DARQ [14], FedX [15] and SPLENDID [8] are example implementations of the query federation approach. DARQ distributes a query using dataset metadata called Service Descriptions³, which are constructed manually by the query developer, and benefits from triple and entity counts and selectivity estimates to optimize the query plan. Since DARQ uses predicates to select relevant datasets, the success of query execution depends on associating datasets with predicates, and triple patterns which have unbound predicates cannot be handled⁴. On the other hand, FedX is an extended version of the Federation SAIL provided by AliBaba⁵. The datasets which will be queried are given to FedX, and it checks each triple pattern's existence on each dataset by ASK queries to decide which triple pattern will be queried on which datasets [16]. These two query federation implementations also stand up to the self-descriptive
nature of linked data, since the metadata of datasets should be described by data publishers just as they describe and link their data. The last query federation implementation is SPLENDID, which indexes datasets using VOID descriptions, eliminates datasets by ASK queries for triple patterns, and benefits from the statistical data in VOID to optimize federated queries.

Although the aforementioned query federation implementations aim to query linked datasets, they do not consider links between data for dataset selection. For this reason, there are some shortcomings of these implementations from the querying-the-web-of-data perspective. The first one is that deciding datasets via only predicate indexes causes an inability to select datasets effectively for triple patterns which have unbound predicates or have very general predicates, such as owl:sameAs and foaf:page, that are extensively used in datasets⁶. The second shortcoming is that many datasets may be selected for the triple patterns.

In the light of these ideas, WoDQA executes queries by analyzing VOID documents, which constitute a projection of the web of data, and incorporates the follow-your-nose approach into query federation by considering the links between datasets in the metadata. WoDQA does not change the evaluation order of a query, because the main focus of this initial version of WoDQA is only eliminating as many irrelevant datasets as possible in dataset selection, without query optimization. Current RDF federation implementations select relevant datasets by considering only predicate and type indexes. Since vocabularies in the Semantic Web should be common, there can be a lot of datasets which use a specific property or class. Therefore, using such indexes causes the selection of redundant datasets. The main contribution of WoDQA is incorporating both the links between data, through the linkset concept, and the relationships between the triple patterns of a query into dataset selection, to eliminate irrelevant datasets effectively.
We serve WoDQA as a SPARQL endpoint and a simple web form² to execute raw queries by analyzing the datasets in the VOID stores.

The remaining sections are organized as follows. In Section 2, related work is discussed. Section 3 introduces the general architecture of WoDQA and details the dataset selection approach. In Section 4, the usage of WoDQA is shown with a working example. Finally, Section 5 concludes the paper.

2. RELATED WORK
The Semantic Web querying approaches can be classified as centralized and distributed. Centralized querying is based on collecting linked data into a single central data store and querying the data from this store. This approach includes data warehousing, which collects pre-selected data sources, and search engines, which crawl the Web by following RDF links and index the discovered data [12]. But the main disadvantage of this approach is that the queried data is not live, i.e. it is a duplicate of the original sources.

Executing ASK queries for many candidate datasets increases the cost notably. One needs to take the structure of the query into account to eliminate the right irrelevant datasets in the web of data context.

The second distributed querying approach is follow-your-nose [11], whose basic idea is traversing RDF links between data to discover relevant datasets. There is no need for any prior metadata about datasets as in query federation, but it needs initial URIs in some triple patterns to start exploring datasets. The main disadvantages of this approach are infinite link discovery, trying to retrieve large RDF graphs, and failing to discover relevant data for queries with only bound predicates (?s foaf:friend ?o) or type statements (?s rdf:type foaf:Person). These restrictions cause less comprehensive result sets. One of the well-known follow-your-nose implementations is SQUIN [11], which traverses RDF links on the fly, i.e. during query execution. Hartig et al. improve this work using some heuristic methods that modify the query
evaluation order to reduce execution cost and to provide more comprehensive results [10], but the results strictly depend on the starting point and the evaluation order. On the other hand, Bouquet et al. formalize the web of data and suggest three different querying methods exploiting their web of data formalization [4]. These methods are based on merging relevant graphs to execute queries on them. One of these methods uses the follow-your-nose approach, which specifies and merges relevant graphs by looking up URIs before query execution.

WoDQA aims to query the web of interlinked datasets using VOID dataset and linkset descriptions to decide the relevant datasets for a query. At first, it assumes that all datasets are relevant to a query; then irrelevant datasets are eliminated by analyzing the query structure in the light of the metadata of datasets. Its novelty is considering the query structure and the links between datasets to select relevant datasets before query execution, and thus it incorporates the follow-your-nose approach into query federation.

Moreover, search engines cannot crawl the whole web and cannot answer complete structured queries.

2 The simple web form and up-to-date endpoint address can be found on the http://seagent.ege.edu.tr/etmen/wodqa.html page.
3 The Service Description introduced in that paper contains information about the triples in the dataset, limitations on access patterns, and statistical information about the dataset.
4 http://darq.sourceforge.net/#Limitations and known issues
5 http://www.openrdf.org/doc/alibaba/2.0-beta6/alibaba-sail-federation/
6 SPLENDID also uses type indexes, but it is still not enough since vocabularies can be used frequently.

Figure 3.1: WoDQA internal architecture

The QueryReorganizer rewrites the analyzed query into SERVICE expressions¹¹; details of the QueryReorganizer are given in Subsection 3.2. The last module is the query executor, which directly uses Jena ARQ to execute SPARQL queries including the SERVICE expressions.
The SERVICE expressions are inserted by the QueryReorganizer, and the federated query constructed by the QueryReorganizer is passed to ARQ to be executed. The results of the query execution are returned to the querier. In the following subsections, the first two modules, which implement the WoDQA analysis and reorganization phases, are explained.

To the best of our knowledge, WoDQA is the first query engine which uses datasets and linksets together, the two elements of VOID that are critical for describing dataset metadata.

3. WODQA INTERNAL ARCHITECTURE
In this section, the query processing architecture of WoDQA is explained in detail. Since it is impractical to perform a query on all published datasets on the web, WoDQA aims to transform a query into a federated query which is evaluated only on the relevant datasets. In this direction, to process a query on the linked data cloud, WoDQA contains three main modules, as seen in Figure 3.1: DatasetAnalyzer, QueryReorganizer and Jena ARQ⁷.

Dataset publishers construct the VOID documents of their datasets, and Semantic Web programmers can access these documents through services called VOID stores, such as voiD Browser⁸, CKAN⁹ and voiDStore¹⁰. A VOID store generates a projection of Linked Open Data, and thus this structure obliges dataset publishers to create well-defined VOID documents.

3.1 Dataset Analyzer
This section introduces the details of the DatasetAnalyzer module, which is the core and innovative part of the current version of WoDQA. Unlike other query federation approaches, WoDQA considers triple pattern relations and links between datasets while selecting datasets. Thanks to the dataset analysis of WoDQA, relevant datasets are specified while plenty of irrelevant ones are excluded. The output of the dataset analysis is a subset of all published datasets on the web of data, and thus the query is performed only on this subset of related datasets. For the purpose of explaining how this subset is constructed, we give a formalization in this section.
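The reorganization step described in this section can be sketched in plain Python. This is a hypothetical, minimal illustration of wrapping each triple pattern in a SPARQL 1.1 SERVICE block for its selected endpoints; the function name, the endpoint URLs and the UNION strategy for multi-dataset patterns are assumptions, not WoDQA's actual API.

```python
# Hypothetical sketch of a SERVICE-based query rewriting step.
# selected_endpoints maps each triple pattern to its relevant endpoints.

def reorganize(triple_patterns, selected_endpoints):
    """Build a federated query body from {pattern: [endpoint, ...]}."""
    blocks = []
    for tp in triple_patterns:
        endpoints = selected_endpoints[tp]
        if len(endpoints) == 1:
            blocks.append("  SERVICE <%s> { %s }" % (endpoints[0], tp))
        else:
            # A pattern relevant to several datasets becomes a UNION of
            # SERVICE blocks so no answer is missed.
            union = " UNION ".join(
                "{ SERVICE <%s> { %s } }" % (ep, tp) for ep in endpoints)
            blocks.append("  " + union)
    return "SELECT * WHERE {\n" + "\n".join(blocks) + "\n}"

query = reorganize(
    ["?film owl:sameAs ?x .", "?x dbpprop:name ?n ."],
    {"?film owl:sameAs ?x .": ["http://data.linkedmdb.org/sparql"],
     "?x dbpprop:name ?n .": ["http://dbpedia.org/sparql"]})
```

The resulting string is an ordinary SPARQL 1.1 query that any federation-capable engine such as ARQ could evaluate; the real QueryReorganizer works on Jena's query algebra rather than on strings.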
A well-defined VOID document which reflects the actual content of a dataset enables including the dataset in relevant queries. The DatasetAnalyzer is the module responsible for discovering relevant datasets and eliminating irrelevant ones using the VOID documents of the datasets in the VOID stores. We assume that dataset publishers update the description documents in the VOID stores when their datasets change, keeping the VOID stores up to date for dataset selection. In the current version of WoDQA, the DatasetAnalyzer discovers the VOID documents from the CKAN net and analyzes the dataset and linkset descriptions for each triple pattern in the query. This analysis eliminates the irrelevant datasets which definitely do not contain any result contributing to the result of the query, assuming that accurate and complete VOID documents of the datasets are available. Dataset analysis is achieved by a rule-based approach. We explain the rules which discover relevant datasets in Subsection 3.1. The second module is the QueryReorganizer, which rewrites queries depending on the results of the DatasetAnalyzer. This rewriting process constructs federated SPARQL queries including SERVICE expressions.

We firstly give a definition of the web of data to formalize our dataset selection approach. In summary, the web of data is an RDF graph which is constructed by typed links between data from different sources. Basically, an RDF graph (G) is formally represented as a set of triples of the form ⟨s, p, o⟩: G = {⟨s, p, o⟩ | ⟨s, p, o⟩ ∈ (I ∪ B) × I × (I ∪ B ∪ L)}, where I is the set of IRIs [6], B is the set of blank nodes, L is the set of literals, and all are RDF terms, T = I ∪ B ∪ L. In this direction, the web of data is the global graph (G_wod) which consists of the triples constructed from the IRIs, blank nodes and literals on the web. G_wod is a mathematical model of the RDF construct for the web of data.

From another perspective, the web of data means a web of interlinked datasets [3]. A dataset (δ) is a meaningful set of RDF triples [1] which decreases the granularity of the web. Rather than publishing information only as single resources and connecting these resources, datasets are the way of publishing information as sub-graphs of G_wod. These sub-graphs, i.e. datasets, are connected via RDF triples which connect resources in different datasets [13]. Publishers create their resources and deploy them into the datasets on the web, and consumers use these resources while creating their own datasets. With regard to this, to formalize a dataset, we use subj(G), which represents the set of resources that are the subjects of the triples in a graph.

Definition 1. A dataset is a sub-graph of the web of data, δ_x ⊂ G_wod, and the resources which are included by δ_x are specified as follows: ∀r, δ_x (r ∈ subj(δ_x) → Owner(δ_x, r)).

VOID describes a dataset with well-defined properties¹², and we formalize a VOID dataset description as a tuple ⟨L^space, I^voc⟩. The first dataset property, L^space, corresponds to the void:uriSpace set, which contains the string literals that all entity IRIs in a dataset start with. The other one is I^voc, corresponding to void:vocabulary, which denotes the set of vocabularies used by the dataset¹³.

The triples whose object is a resource in another dataset make the web of data a graph of interlinked datasets. We call such triples link triples, and define the set of link triples as LT = {⟨s, p, o⟩ | owner(s) ≠ owner(o)} where s, o ∈ I.

7 http://jena.sourceforge.net/ARQ/
8 http://kwijibo.talis.com/voiD/
9 http://ckan.net/
10 http://void.rkbexplorer.com/
11 http://www.w3.org/TR/sparql11-federated-query/

Figure 3.2: Example VOID Models

Performing every triple pattern on every published dataset is impractical, so our purpose is eliminating the irrelevant datasets for each triple pattern. By eliminating irrelevant datasets, we construct a federated query.
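The formalization above (uriSpaces, ownership and link triples) can be sketched in plain Python. The dataset names and namespaces below are illustrative assumptions, not the actual VOID store contents.

```python
# A VOID dataset description as the tuple <L_space, I_voc>:
# uriSpaces and vocabularies per dataset (illustrative values).
datasets = {
    "DBpedia":   {"uriSpaces": ["http://dbpedia.org/resource/"],
                  "vocabularies": ["http://dbpedia.org/property/"]},
    "LinkedMDB": {"uriSpaces": ["http://data.linkedmdb.org/resource/"],
                  "vocabularies": ["http://data.linkedmdb.org/movie/"]},
}

def owner(resource):
    """Owner(δx, r): δx owns r if r starts with one of δx's uriSpaces."""
    return {name for name, d in datasets.items()
            if any(resource.startswith(s) for s in d["uriSpaces"])}

def is_link_triple(s, p, o):
    """A triple is a link triple when subject and object have different owners."""
    return owner(s) != owner(o)
```

For example, `owner("http://dbpedia.org/resource/Nikola_Tesla")` yields only DBpedia, and an owl:sameAs triple from a LinkedMDB film to a DBpedia resource is recognized as a link triple.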
This federated query distributes the sub-queries to only the relevant datasets. In order to eliminate irrelevant datasets, we introduce a set of rules which discover the relevant datasets in the dataset analysis, called relevant dataset discovery rules. A relevant dataset for a triple pattern is formalized as an assertion ρ(δ_x, tp_i), which denotes that the dataset δ_x may contain a result for the triple pattern tp_i. We now need to discuss how the relevant dataset assertions (ρ) inferred by the discovery rules are used for eliminating irrelevant datasets. To explain the elimination method, assume that Q_tpi is the set of datasets selected to be queried for tp_i, which we call the selected set; this set initially contains all datasets on the web, Q_tpi^init ≡ Δ (all datasets in a VOID store, in our case). Each rule analyzes the datasets in Q_tpi of each triple pattern. After applying a rule, the elements of Q_tpi about which the rule does not infer a relevant dataset assertion are removed from Q_tpi.

The definition of link triples leads us to define the link predicate (p^link) concept, which corresponds to void:linkPredicate, used to define a linkset, an important contribution of the VOID effort. The set of link predicates P^link includes all the predicates which are used in a link triple: P^link = {p^link | ∃s, o: ⟨s, p^link, o⟩ ∈ LT}. A linkset represents the link triples which connect resources in different datasets using the same link predicate. We formalize the linkset, λ, as a tuple ⟨δλ^from, δλ^to, pλ^link⟩ ∈ Δ × Δ × P^link, where Δ is the set of all datasets on the web. In this definition, δλ^from is the referrer dataset, which is the owner of the subjects of the link triples in the linkset; δλ^to is the referenced dataset, which is the owner of the objects of the link triples in the linkset; and pλ^link is the link predicate of all the link triples in the linkset. The DatasetAnalyzer uses both the dataset and linkset descriptions of VOID metadata to select the relevant datasets.
In other words, it selects the sub-graphs which may contain the results of the query. By this means, the query is transformed into a federated query and executed on the relevant datasets on the web. Accordingly, we exclude the datasets which do not contain any result for the query while querying the global graph, G_wod. Besides analyzing the relationships between the VOID descriptions (datasets, linksets) and a triple pattern, the relationships between the triple patterns in a query are considered. For this reason, we need to give a formal definition of SPARQL queries.

We consider a subset of SPARQL queries which corresponds to a basic graph pattern for formalization [5]. A basic graph pattern consists of triple patterns, BGP = {tp_i | ⟨s_tpi, p_tpi, o_tpi⟩ ∈ (I ∪ V) × (I ∪ V) × (I ∪ L ∪ V)}. A triple pattern is slightly different from an RDF triple, since it contains at least one and at most three variables, which are elements of the infinite set V¹⁴. While performing a query, its variables are replaced by RDF terms.

If the rule does not imply any relevant dataset assertion at all, Q_tpi remains the same. The irrelevant dataset elimination method is formalized as Q_tpi^new below, denoting the update of the selected set subsequent to executing a rule:

Q_tpi^new = {δ_x | (δ_x ∈ Q_tpi) ∧ ρ(δ_x, tp_i)}   if ∃δ_a (ρ(δ_a, tp_i))
Q_tpi^new = Q_tpi                                   if ∄δ_a (ρ(δ_a, tp_i))

In the following subsection we give the relevant dataset discovery rules in detail, and we use some queries to exemplify the application of the rules. Figure 3.2 shows the VOID models of a set of datasets which are used in these examples. This model contains simplified VOID descriptions of five datasets and the linksets connecting these datasets by link predicates. Link predicates are shown by arrows between the dataset descriptions, which are represented with squares. Also, we create a sample dataset called Facebook, which keeps data about Facebook users in our local store.
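The update of the selected set formalized above can be sketched as a small Python function. The dataset names are illustrative assumptions.

```python
# Sketch of the elimination step: a rule asserts ρ for some datasets;
# the selected set Q_tp shrinks only when at least one assertion is made.

def apply_rule(selected, relevant):
    """Q_new = Q ∩ relevant if relevant is non-empty, else Q unchanged."""
    return selected & relevant if relevant else selected

q = {"DBpedia", "LinkedMDB", "YAGO", "Facebook"}
q = apply_rule(q, {"DBpedia", "YAGO"})   # a rule fired: intersect
q = apply_rule(q, set())                 # a rule asserted nothing: keep
```

After the two steps, `q` is `{"DBpedia", "YAGO"}`: the first rule eliminated LinkedMDB and Facebook, and the second rule, having made no assertion, left the selected set untouched.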
Thus, according to the semantics of SPARQL, a query result is a set of solution mappings, {μ | μ: V → T}, where a solution mapping, μ, is a partial function from variables to RDF terms.

To perform a query on the web of data, in the worst case each triple pattern has to be queried on all published datasets.

This data records which users like the movie resources located in the LinkedMDB dataset. Although there can be a lot of linksets, only the few needed to explain our rules are depicted in Figure 3.2.

3.1.1 Relevant Dataset Discovery Rules
This subsection presents a set of rules, each of which represents an analysis method of the DatasetAnalyzer module to discover relevant datasets effectively. Each relevant dataset discovery rule aims to analyze datasets from different perspectives, and combinations of them, to infer relevant dataset (ρ) assertions for triple patterns. These perspectives can be classified into three groups. The first one is analyzing the IRIs in the triple patterns; we call this perspective IRI-based Analysis, where the namespaces of IRIs and the vocabularies in the VOID documents are considered to determine relevant datasets. The second one is considering linked resources; we call this perspective Linking Analysis.

12 The DatasetAnalyzer of the current version of WoDQA considers only these properties for the sake of simplicity. We plan to integrate other properties, such as statistics, in the future to make optimized queries.
13 Note that the schemas of the Semantic Web languages, such as RDFS and OWL, are not specified in I^voc.
14 We exclude blank nodes in queries.

Another perspective to discover relevant datasets is Linking Analysis. Since a triple links two resources either in the same dataset or in different datasets, this perspective is separated into two kinds of analyses, each of which considers a different kind of triple.
The first one is Internal Linking Analysis, which considers triples linking resources in the same dataset. A relevant dataset found by this analysis is called an internal relevant dataset and is represented with ρ^int(δ_x, tp_i). On the other hand, External Linking Analysis considers link triples, which connect resources in different datasets. In this case, the linkset descriptions of the VOID documents are taken into account to discover relevant datasets, and a dataset found by this analysis is called an external relevant dataset, represented as ρ^ext(δ_x, tp_i). The Internal and External Linking analyses are both executed for a triple pattern in the whole Linking Analysis process, and then the produced internal and external relevant datasets for a triple pattern are unified as the relevant datasets for the triple pattern, as shown in Rule 3.

Rule 3. The union of the external and internal datasets for a triple pattern constitutes the relevant datasets for the triple pattern.

∀δ_x, tp_i ((ρ^int(δ_x, tp_i) ∨ ρ^ext(δ_x, tp_i)) → ρ(δ_x, tp_i))

The irrelevant dataset elimination method considers the relevant datasets specified by Rule 3 to eliminate irrelevant datasets from the selected set of a triple pattern (Q_tpi). Internal and external datasets are intermediate results used to infer relevant datasets in a Linking Analysis.

The first two rules under the Linking Analysis perspective are the linking-to-IRI discovery rules. Consider the example triple pattern ?film owl:sameAs dbpedia:A_Fistful_of_Dollars, which can be matched with a triple that links a resource to the dbpedia:A_Fistful_of_Dollars resource. Triple patterns whose object is an IRI are analyzed by these rules. Since the owners of the linked IRI must be known in these rules, we give a definition that depicts the owners of any resource on the basis of IRI Analysis. We formalize the inclusion of a resource (r ∈ I) by a dataset (δ_x) in Definition 3, by using the urispaces (L^space) property of the VOID description and the startsWith(r, L_δx^space) function, which represents that r starts with one of the urispaces in L_δx^space.

Linking Analysis considers whether a triple links two resources in the same dataset (internal) or in different datasets (external) to eliminate irrelevant datasets. The last perspective is Shared Variable Analysis: since triple patterns share some variables, each triple pattern affects the relevant datasets of the other triple patterns that include the same variables. The relevant dataset discovery rules of all perspectives are introduced in this section.

The first two rules are under the IRI-based analysis perspective, and each of them considers the vocabularies I^voc of VOID metadata. The first discovery rule checks whether the IRIs in triple patterns are RDFS (or OWL) classes or properties in the vocabulary set (I^voc) of the VOID documents. To give the rule, we define the has(I_δx^voc, r) function, which represents that a resource r ∈ I (a property or a class IRI) is included by one of the vocabularies in I_δx^voc. Using the has expression, the relevance of a dataset δ_x to an IRI r is represented with VocMatch, as shown in Definition 2.

Definition 2. ∀δ_x (has(I_δx^voc, r) → VocMatch(δ_x, r)) where r ∈ I

For a triple pattern such as ?s dbpprop:name¹⁵ "Nikola Tesla", the dbpprop:name RDFS property in the predicate position obliges that matching triples can only be in datasets which use the dbpprop vocabulary. Therefore, we can eliminate the datasets which do not use the dbpprop vocabulary. This situation is handled by Rule 1, which is similar to predicate indexes.

Rule 1. If there is a dataset in which one of its vocabularies includes the predicate of a triple pattern, then it is relevant for the triple pattern.

∀tp_i, δ_x (VocMatch(δ_x, p_tpi) → ρ(δ_x, tp_i))

According to Figure 3.2, DBpedia uses the dbpprop vocabulary, and the rule decides that it is relevant for such a triple pattern. In the web of data, lots of datasets which use the dbpprop vocabulary can be found, and they can be eliminated using the outputs of the other discovery rules.
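The IRI-based rules can be sketched in plain Python. This is an illustrative approximation: VocMatch is reduced to a namespace-prefix check, and the vocabulary assignments are assumptions (Rule 2, the rdf:type variant, is introduced next in the text).

```python
# Sketch of VocMatch and the IRI-based discovery rules.
RDF_TYPE = "rdf:type"

vocabularies = {          # I_voc per dataset (illustrative)
    "DBpedia":   ["dbpprop:"],
    "LinkedMDB": ["linkedMDB:"],
    "Facebook":  ["foaf:"],
}

def voc_match(dataset, resource):
    """VocMatch(δx, r): r belongs to one of δx's vocabularies."""
    return any(resource.startswith(v) for v in vocabularies[dataset])

def rule1(tp):
    """Rule 1: datasets whose vocabularies include the bound predicate."""
    s, p, o = tp
    return {d for d in vocabularies
            if not p.startswith("?") and voc_match(d, p)}

def rule2(tp):
    """Rule 2: for rdf:type patterns, datasets whose vocabularies include
    the class in the object position."""
    s, p, o = tp
    return {d for d in vocabularies if p == RDF_TYPE and voc_match(d, o)}
```

On the running examples, `rule1(("?s", "dbpprop:name", '"Nikola Tesla"'))` keeps only DBpedia, and `rule2(("?producer", "rdf:type", "linkedMDB:producer"))` keeps only LinkedMDB.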
To introduce Rule 2, consider another triple pattern, ?producer rdf:type linkedMDB:producer, which contains a type definition for the variable ?producer. In such cases, the object of the triple pattern is a class definition, and it makes sense to eliminate the datasets which do not use the vocabulary of this class. Rule 2, resembling type indexes, is used to specify relevant datasets for such triple patterns. The example triple pattern is queried from the datasets that use the linkedMDB vocabulary, i.e. the LinkedMDB dataset in our example model. Other datasets do not include a resource which is an instance of the linkedMDB:producer class, and therefore they are eliminated by the output of this rule.

Rule 2. If there is a dataset in which one of its vocabularies includes the object of a triple pattern, when the predicate of the triple pattern is rdf:type, then the dataset is relevant for the triple pattern.

∀tp_i, δ_x (VocMatch(δ_x, o_tpi) ∧ (p_tpi = rdf:type) → ρ(δ_x, tp_i))

15 All prefixes used in the paper are defined in Table 1.

Definition 3. ∀δ_x (startsWith(r, L_δx^space) → Owner(δ_x, r))

Rule 4 is the linking-to-IRI internal discovery rule, from the internal linking point of view. According to the example query, ?film should be in the same dataset as dbpedia:A_Fistful_of_Dollars, i.e. the owner of the dbpedia:A_Fistful_of_Dollars resource. Therefore, matching triples can be found in the DBpedia dataset.

Rule 4. If there is a triple pattern whose object is an IRI, then the owner datasets of the IRI are internal relevant for the triple pattern.

∀tp_i, δ_x ((δ_x ∈ Q_tpi) ∧ Owner(δ_x, o_tpi) → ρ^int(δ_x, tp_i)) where o_tpi ∈ I, s_tpi ∈ V

On the other hand, from the external linking point of view, matching triples can be found in datasets which are linked to the owner datasets of the object IRI. For our example triple pattern, ?film should be in datasets which contain link triples whose object resource is defined in DBpedia.
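Definition 3 and Rule 4 can be sketched as follows; the namespaces are illustrative assumptions.

```python
# Sketch of Definition 3 (ownership via void:uriSpace) and Rule 4
# (internal relevance of the owner datasets of an IRI object).
uri_spaces = {
    "DBpedia":   ["http://dbpedia.org/resource/"],
    "LinkedMDB": ["http://data.linkedmdb.org/resource/"],
}

def owners(r):
    """Owner(δx, r): datasets whose uriSpace is a prefix of r."""
    return {d for d, spaces in uri_spaces.items()
            if any(r.startswith(s) for s in spaces)}

def rule4(tp, selected):
    """Internal relevant datasets for a pattern with an IRI object."""
    s, p, o = tp
    return selected & owners(o)

tp = ("?film", "owl:sameAs",
      "http://dbpedia.org/resource/A_Fistful_of_Dollars")
```

With both DBpedia and LinkedMDB in the selected set, `rule4(tp, ...)` keeps only DBpedia, the owner of the object IRI, as internal relevant.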
For this analysis, the linkset descriptions of the VOID documents are used. To discover relevant datasets for a triple pattern by using the linking-to-IRI external discovery rule, the triple pattern should have a bound predicate. We define the Compatible expression in Definition 4 to represent which linkset description is appropriate for determining the relevant datasets of a triple pattern.

Definition 4. If the selected set of a triple pattern contains the referrer dataset of a linkset description, and the link predicate of the linkset description is the same as the triple pattern's predicate, then the linkset description is compatible with the triple pattern.

∀λ_m, tp_i ((δλ_m^from ∈ Q_tpi) ∧ (pλ_m^link = p_tpi) → Compatible(λ_m, tp_i))

Consider the ?film owl:sameAs dbpedia:A_Fistful_of_Dollars triple pattern again, and remember the linkset descriptions of our example model in Figure 3.2.

Rule 7 is the IRI-links-to external discovery rule, and it finds the triples that connect resources in different datasets. According to this rule, an owner dataset of the subject IRI of the triple pattern is external relevant only when there is a linkset description that includes the owner dataset as its referrer dataset and is compatible with the triple pattern.

Rule 7. If there is a linkset description that is compatible with the triple pattern and whose referrer dataset is an owner dataset of the triple pattern's subject, then the referrer dataset of the linkset description is external relevant for the triple pattern.

∀tp_i, λ_m, δ_x (Compatible(λ_m, tp_i) ∧ Owner(δ_x, s_tpi) ∧ (δ_x = δλ_m^from) → ρ^ext(δ_x, tp_i)) where o_tpi ∈ V, s_tpi ∈ I

According to the example, DBpedia is the owner dataset of dbpedia:Ennio_Morricone, and there is also a linkset description whose referrer dataset is DBpedia and whose link predicate is owl:sameAs. Other discovery rules in Linking Analysis are combined with Shared Variable Analysis.
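Definition 4 and Rule 7 can be sketched over linkset tuples ⟨δ_from, δ_to, p_link⟩. The linkset list below is an illustrative assumption loosely mirroring Figure 3.2, including a linkset with DBpedia as referrer.

```python
# Sketch of Definition 4 (Compatible) and Rule 7 (IRI-links-to external).
linksets = [("LinkedMDB", "DBpedia", "owl:sameAs"),
            ("YAGO", "DBpedia", "owl:sameAs"),
            ("DBpedia", "YAGO", "owl:sameAs")]  # illustrative targets

def compatible(linkset, predicate, selected):
    """Definition 4: referrer in the selected set, matching link predicate."""
    frm, to, p_link = linkset
    return frm in selected and p_link == predicate

def rule7(predicate, subject_owners, selected):
    """External relevant: an owner of the subject IRI that is also the
    referrer of a compatible linkset."""
    return {frm for (frm, to, p) in linksets
            if compatible((frm, to, p), predicate, selected)
            and frm in subject_owners}

sel = {"DBpedia", "LinkedMDB", "YAGO", "Facebook"}
```

For dbpedia:Ennio_Morricone owl:sameAs ?person, DBpedia owns the subject and appears as the referrer of a compatible owl:sameAs linkset, so `rule7("owl:sameAs", {"DBpedia"}, sel)` keeps DBpedia as external relevant.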
The first discovery rule link predicates are owl:sameAs. Rule 5 which is under the which conforms to this combined analysis is Chaining Triple Linking Analysis perspective gives these two datasets as ex- Patterns Analysis. This rule considers two triple patterns to- ternal relevant datasets for this triple pattern. If a dataset gether to discover relevant datasets. This is a characteristic is not linked to DBpedia by owl:sameAs predicate then one of Shared Variables Analysis, since it depends on analyzing can conclude that this dataset is irrelevant with the triple more than one triple pattern that have same variable. Triple pattern. patterns below are example of chaining triple patterns: ?s owl:sameAs ?film. Rule 5. If there is a linkset description that is compati- ?film linkedMDB:producer name “Sergio Leone” ble with the triple pattern and whose referenced dataset is an Notice that the second triple pattern is used for querying owner dataset of the triple pattern’s object, then the referrer films whose producer name is “Sergio Leone”. Thus, the sec- dataset of the linkset description is external relevant for the ond triple pattern affects the relevant datasets of the first triple pattern. triple pattern. From internal linking point of view, the first triple pattern can be found in datasets which satisfy the sec- ∀tpi , λm , δx (Compatible(λm , tpi ) ∧ Owner(δx , otpi ) ∧ (δx = ond triple pattern because ?s and ?film should be in the same δλtom ) → ρext (δλf m rom , tpi )) where otpi ∈ I, stpi ∈ V dataset. In this direction, Internal Chaining Triple Pattern Analysis formalized in Rule 8 is used to discover internal Recall that internal and external relevant datasets are uni- relevant datasets for triple patterns. fied by Rule 3 after applying rules in Rule 4 and Rule 5. Thus, the final relevant datasets which are selected by the Rule 8. If there is a triple pattern whose object is same linking-to-IRI rules are DBpedia, LinkedMDB and YAGO. 
with the subject of another triple pattern, then datasets in- Another couple of rules under the Linking Analysis per- cluded by selected sets of both triple patterns are internal spective are IRI-links-to rules. These rules are applied to relevant. triple patterns whose subject is an IRI and object is a vari- able to determine relevant datasets from internal and ex- ∀δx , tpi , tpj ((otpi = stpj ) ∧ (δx ∈ Qtpi ) ∧ (δx ∈ Qtpj ) → ternal linking point of view. Rule 6 is IRI-links-to internal ρint (δx , tpi ) ∧ρint (δx , tpj )) where otpi , stpi ∈ V discovery rule and it finds the triples that link resources in the same dataset. One can conclude that if the subject of To execute Chaining Triple Pattern Analysis, execution a triple pattern is an IRI, then triples matching with this order of rules becomes important. To exemplify this situ- triple pattern are in the owner dataset of this IRI. ation according to Chaining Triple Patterns query, assume that IRI-based analysis is applied before, and LinkedMDB Rule 6. If a dataset is an owner of the subject of a triple is the relevant dataset for the second triple pattern since pattern, then this dataset is internal relevant dataset for the LinkedMDB VOID includes linkedMDB as the value of vo- triple pattern. cabulary. Then, this rule can specify that the internal rel- evant dataset for the first triple pattern is LinkedMDB. It ∀tpi , δx (δx ∈ Qtpi ) ∧ Owner(δx , stpi ) → ρint (δx , tpi ) where  is clearly seen from this example, Shared Variable Analysis otpi ∈ V, stpi ∈ I should be performed after the execution of IRI-based Anal- ysis rules to eliminate more datasets. Hence, in the Sub- Consider the triple pattern dbpedia:Ennio Morricone owl: section 3.1.2, we give an overview of analysis process which sameAs ?person. Subject of this triple pattern is an IRI specifies an execution order for these rules. 
whose namespace is dbpedia, and therefore an internal rel- After the IRI-based analysis, if we apply Rule 8 for the ex- evant dataset is DBpedia whose VOID metadata contains ample triple patterns, relevant datasets of the second triple dbpedia as value of urispace property. pattern is shown as Qtp2 = {δLinkedM DB }, and of the first one is shown as Qtp1 ≡ ∆. For this case, this rule asserts triple patterns, then referrer datasets of the linkset descrip- that ρint (δLinkedM DB , tp1 ) and ρint (δLinkedM DB , tp2 ). tions are external relevant for the triple patterns. On the other hand, External Chaining Triple Patterns analysis uses linkset descriptions while considering two triple ∀λm , λn , tpi , tpj ((otpi = otpj )∧Compatible(λm , tpi )∧Com− patterns. While internal one can be used for all triple pat- patible(λn , tpj )∧(δλtom = δλton ) → ρext (δλf m rom , tpi )∧ρext (δλf nrom , terns without considering the predicate, external rule is ap- tpj )) where otpi ∈ V plied for a triple pattern which has a link predicate. Rule 9 introduces this rule. For the example of Object Sharing Triple Pattern Anal- ysis, assume that no rule is applied before this analysis Rule 9. If there is a triple pattern (tpi ) whose object is and selected sets are Qtp1 ≡ ∆ and Qtp2 ≡ ∆. In Fig- same with the subject of another one (tpj ), and there is a ure 5, δDBpedia , δY AGO , and δLinkedM DB have link triples linkset description which is compatible with (tpi ) and its ref- with predicate owl:sameAs. But, only δF acebook has a linkset erenced dataset is included by the selected set of tpj , then to δLinkedM DB with predicate facebook:likes, and therefore referrer dataset is external relevant for tpi and referenced Qnew tp1 = {δF acebook }. In this case, tp2 can only be queried dataset is external relevant for tpj . on the datasets which is linked to δLinkedM DB with predi- cate owl:sameAs, and thus Qnewtp2 = {δDBpedia }. 
Other triples ∀λm , tpi , tpj ((otpi = stpj ) ∧ Compatible(λm , tpi ) ∧ (δλtom ∈ whose subject corresponding to ?film in other datasets ac- Qtpj ) → ρext (δλf mrom , tpi ) ∧ ρext (δλtom , tpj )) where otpi ∈ V cording to tp2 cannot include objects that satisfy object of tp1 . As done in other Linking Analysis methods, internal With respect to the example of chaining triple patterns, and external relevant datasets which are inferred by Rule 10 assume that selected set for the first triple pattern is Qtp1 ≡ and Rule 11 are unified according to Rule 3. ∆, and for the second one is Qtp2 = {δLinkedM DB }. Rule The last analysis from the Shared Variable Analysis per- 9 determines that δDBpedia is external relevant for tp1 since spective considers the triple patterns which have the same resources ?film can be found in δLinkedM DB and there is a subject called Subject Sharing Triple Patterns Analysis. This linkset between these two datasets with owl:sameAs predi- rule does not use Linking Analysis perspective, and thus it cate. It is clear that no other dataset can contain an appro- does not contain Internal and External Linking Analysis. priate triple if it is not linked to LinkedMDB by owl:sameAs. Consider the following example triple patterns for Subject At the end of Chaining Triple Pattern Analysis, internal and Sharing Triple Pattern Analysis: external relevant datasets specified by Rule 8 and 9 are uni- ?city dbpprop:name “Izmir” . fied according to Rule 3. ?city dc:terms ?subject. Another analysis which uses both Linking Analysis and According to our dataset definition, triples which have the Shared Variable Analysis is Object Sharing Triple Patterns same subject are included by the same dataset. Based on Analysis. For triple patterns which have the same object this, Rule 12 infers datasets which are relevant for triple pat- variable, only the datasets can include triples which satisfy terns by taking the intersection of selected sets into account. 
the object variable of both triple patterns. To simplify the Assume that selected sets for triple patterns are Qtp1 = explanation, we use the following example for Object Shar- {δDBpedia } and Qtp2 ≡ ∆. According to the rule, the fi- ing Triple Pattern Analysis below: nal datasets according to this rule are Qnew tp1 ≡ Qnew tp2 = ?person facebook:likes ?movie. {δDBpedia }. ?film owl:sameAs ?movie. From internal linking point of view, ?person and ?film Rule 12. If there is a triple pattern whose subject is same should be in the same dataset with ?movie. Rule 10 speci- with the subject of another triple pattern, then the datasets fies the internal relevant datasets for triple patterns which included by selected sets of both triple patterns are relevant. have the same object. Assume that Qtp1 = {δF acebook } due to value of vocabulary property of Facebook VOID and ∀δx , tpi , tpj ((stpi = stpj )∧(δx ∈ (Qtpi ∩Qtpj )) → ρ(δx , tpi )∧ Qtp2 ≡ ∆. This rule determines that only δF acebook can ρ(δx , tpj )) where stpi ∈ V contain internal triples that satisfy triple patterns together. Up to this point, we have given the relevant dataset dis- Rule 10. If there is a triple pattern whose object is same covery rules which are used to determine relevant datasets with the object of another one, then the datasets included by from different perspectives. These rules are executed to- selected sets of both triple patterns are internal relevant. gether for a query to make a complete analysis. Next section introduces the analysis process which specifies the execution ∀δx , tpi , tpj ((otpi = otpj ) ∧ (δx ∈ Qtpi ) ∧ (δx ∈ Qtpj ) → order for rules. ρint (δx , tpi ) ∧ρint (δx , tpj )) where otpi ∈ V 3.1.2 Analysis Process To execute Object Sharing Analysis from external point The rules introduced above should be executed together of view, we benefit from the linkset descriptions. To find ap- to provide effective dataset selection. 
In this section, exe- propriate link triples, ?person and ?film should be in different cution of rules are explained in the process of DatasetAn- datasets. Rule 11 determines external relevant dataset for alyzer which is shown Figure 3.3. In the figure, QBGP = triple patterns which have the same object. This rule consid- {Qtpi |tpi ∈ BGP } is the set of selected sets of all triple pat- ers two linkset descriptions together for two triple patterns. terns in a query. We use Qinit BGP to represent the initial state where selected set of each triple pattern contains the whole Rule 11. If two triple patterns have the same object, and web of data, ∀Qtpi ∈ QinitBGP (Qtpi ≡ ∆). This set is the in- there are two linkset descriptions which have the same ref- put of the single step analysis, and selected sets in this set are erenced dataset each of which is compatible with one of the constrained by execution of the rules includes the IRI-based Algorithm 1 This algorithm divides query into sub-triples FUNCTION DivideT riples() INPUT bgp = {tp1 , . . . , tpn } including n triple patterns; LET i := 1, α := 1; Figure 3.3: WoDQA Analysis Process LET SubT riplesα := {tpi }; WHILE i < n DO IF Qtpi+1 = Qtpi THEN analyses. Single step analysis includes vocabulary match, LET SubT riplesα := SubT riplesα ∪ {tpi+1 }; linking-to-IRI and IRI-links-to rules which are executed only ELSE once. The reason is that the rules based on IRI-based analy- LET SubT riplesα+1 := {tpi+1 }; sis produce the same result for every execution because they LET α := α + 1; do not depend on current Qtpi . Although the rules in sin- LET i := i + 1; gle step analysis do not have a specific order, using output of each rule in dataset elimination method reduces current datasets set of the triple patterns. Then, Qconstrained BGP is sents the federated form of the initial query which contains given to the repetitive analysis phase. ordered service graph patterns. 
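Algorithm 1 groups consecutive triple patterns whose selected sets are equal into the same sub-query. A minimal executable sketch of it in Python (the pattern and selected-set representations are illustrative; WoDQA works on Jena query structures):

```python
# Algorithm 1 (DivideTriples) in executable form: consecutive triple
# patterns with the same selected set Q_tp fall into one sub-query.
def divide_triples(bgp, Q):
    """bgp: ordered list of triple patterns; Q: maps each pattern to its
    selected set of datasets. Returns the SubTriples groups in order."""
    groups = [[bgp[0]]]
    for prev, cur in zip(bgp, bgp[1:]):
        if Q[cur] == Q[prev]:
            groups[-1].append(cur)   # same selected set: extend the sub-query
        else:
            groups.append([cur])     # selected set changed: start a new one
    return groups

# Selected sets from the usage scenario of Section 4 (illustrative labels).
Q = {"tp1": {"Facebook"}, "tp2": {"LinkedMDB"},
     "tp3": {"DBpedia"}, "tp4": {"DBpedia"}, "tp5": {"DBpedia"}}
groups = divide_triples(["tp1", "tp2", "tp3", "tp4", "tp5"], Q)
# groups == [['tp1'], ['tp2'], ['tp3', 'tp4', 'tp5']]
```

Note that only consecutive patterns are merged: if tp3 appeared between tp1 and tp2, the DBpedia patterns would be split into two sub-queries, since WoDQA does not change the evaluation order of the query.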
On the other hand, the rules based on Shared Variable Analysis take more than one triple pattern into consideration. Since different combinations of triple patterns affect each other's selected sets, these rules depend on the current selected sets of the triple patterns to discover the datasets relevant to them. For this reason, they are executed repetitively until no dataset is eliminated from any Q_tpi. The repetitive analysis phase produces Q^new_BGP by executing the Shared Variable Analysis rules. After the phase has completed once, if an elimination has been made, Q^new_BGP is given back to the repetitive analysis as Q^constrained_BGP and the phase is repeated. When the rules no longer change any selected set, i.e. Q^new_BGP ≡ Q^constrained_BGP, the analysis is finished and the result is produced as Q^final_BGP.

3.2 Query Reorganizer

The QueryReorganizer module is responsible for rewriting a query using the final selected set of each triple pattern (Q^final_BGP) decided by the DatasetAnalyzer. While rewriting a federated query, the Query Reorganizer conforms to the SPARQL 1.1 federation extension (http://www.w3.org/TR/sparql11-federated-query/).

While the initial query is a set of triple patterns, the QueryReorganizer divides the query into sub-queries and turns it into a set of service graph patterns, each of which is represented by a tuple sgpα = ⟨Srvα, SubTpα⟩. A service graph pattern consists of a Srv set, which includes the datasets to which the sub-query is sent (in the implementation, the SPARQL endpoints of the datasets are used in SERVICE expressions), and a SubTp set, which is a subset of the triple patterns of the initial query, i.e. a sub-query. Performing a service graph pattern means unifying the results of the sub-query over each dataset in Srvα.

Triple patterns which have the same selected set (Q_tpi) are added to the same service graph pattern to decrease networking cost. However, only consecutive triple patterns can be in the same sub-query, because WoDQA does not change the evaluation order of the query. In this direction, the elements of the SubTp sets of the service graph patterns are found by Algorithm 1. The datasets of a sub-query are formalized as Srvα = {δx | δx ∈ Q_tpi, tpi ∈ SubTpα}, and the service endpoint URLs are procured from the VOID documents of the datasets. The output of the Query Reorganizer is shown as ReorganizedBGP = ⟨sgp1, ..., sgpm⟩, where 1 ≤ m ≤ n and n is the number of triple patterns of the initial query. ReorganizedBGP represents the federated form of the initial query, which contains ordered service graph patterns. WoDQA executes the reorganized query using the Jena ARQ query engine.

Although WoDQA does not include the query optimization phase of the query federation approach, besides grouping triple patterns it applies the moving up FILTER expressions optimization technique [2], so that intermediate results are filtered as early as possible. Furthermore, WoDQA supports queries that include the UNION and OPTIONAL keywords, but queries that include the GRAPH keyword or blank nodes are not supported.

4. USAGE SCENARIO

There are two ways for users to benefit from WoDQA. The first one is the SPARQL endpoint of WoDQA (the up-to-date endpoint address can be found on the http://seagent.ege.edu.tr/etmen/wodqa.html page), which can be used to redirect raw queries. One can construct a SPARQL query with a SERVICE block containing the raw query and use the WoDQA SPARQL endpoint as the remote service. When this query is executed, the WoDQA endpoint is invoked; it transforms the query into a federated form by means of WoDQA and executes this federated query on the relevant datasets transparently to the user. The other way is to use the web form of WoDQA (http://seagent.ege.edu.tr/etmen/wodqa.html).

In this section, a sample query execution on the web form of WoDQA is explained. The reorganized form of the query, the results of select and construct queries, and the execution time can be observed in this form.

The example query seen in the WoDQA web form in Figure 4.1 searches for an answer to "Which Facebook users like movies which are produced by a German producer?". This query is represented as BGP = ⟨tp1, ..., tp5⟩ where
tp1 = ⟨?faceUser, facebook:likes, ?movie⟩,
tp2 = ⟨?movie, linkedMDB:producer, ?producer⟩,
tp3 = ⟨?dbProducer, owl:sameAs, ?producer⟩,
tp4 = ⟨?anyMovie, dbpo:producer, ?dbProducer⟩,
tp5 = ⟨?dbProducer, dbpo:birthPlace, dbpedia:Germany⟩.

We explain how the relevant datasets are found according to the WoDQA analysis process introduced in Figure 3.3. Initially, the single step analysis phase is performed for this query. Irrelevant datasets are eliminated from the selected sets of tp1 and tp2 via the output of the predicate vocabulary match, so at the end of the single step analysis they are Q_tp1 = {δFacebook} and Q_tp2 = {δLinkedMDB}. No relevant dataset is found for tp3 in the single step analysis, because owl:sameAs is a generic property and owl is not defined as a vocabulary property in the VOIDs; thus Q_tp3 still includes all datasets (∆). The predicate vocabulary match discovers the relevant datasets for tp4 and tp5, and their selected sets are Q_tp4 = Q_tp5 = {δDBpedia}.

After the single step analysis is applied to all triple patterns, the repetitive analysis phase is performed, and the selected set of tp3 is constrained in this phase. Subject Sharing Triple Patterns Analysis discovers the relevant datasets for tp3, since tp3 and tp5 have the same subject variable; thus its selected set becomes Q_tp3 = {δDBpedia}.

The reorganized query shown in Figure 4.1 is rendered from these analysis results by the QueryReorganizer and formalized as ReorganizedBGP = ⟨sgp1, sgp2, sgp3⟩ where
sgp1 = ⟨{δFacebook}, {tp1}⟩,
sgp2 = ⟨{δLinkedMDB}, {tp2}⟩,
sgp3 = ⟨{δDBpedia}, {tp3, tp4, tp5}⟩.

Figure 4.1: Example query execution with WoDQA

After the reorganizing process, the triple patterns in the service graph patterns are executed on the related endpoints by means of Jena ARQ, and the query results are incrementally collected. In conclusion, the results related to the query are listed at the bottom of the web form page, as seen in Figure 4.1.

5. CONCLUSION

In this paper, we have introduced a query federation engine called WoDQA that discovers the datasets related to a query in a VOID store and distributes the query over these datasets. The novelty of our approach is its exhaustive dataset selection mechanism, which includes analysis of triple pattern relations and of links between datasets, besides analyzing datasets for each triple pattern. WoDQA focuses on discovering relevant datasets and eliminating irrelevant ones using the rule-based approach introduced in this paper. To select datasets effectively, our approach requires VOID descriptions which include a SPARQL endpoint to query the dataset, reflect the actual content of the dataset completely and accurately, and include linksets between datasets. WoDQA allows users to construct raw queries without needing to know how the query will be divided into sub-queries and where the sub-queries will be executed. Query results are complete under the assumption of available, accurate and complete VOID descriptions of the datasets.

The initial version of WoDQA introduced in this paper has some disadvantages arising from the query federation approach on which WoDQA builds. As mentioned previously, follow-your-nose has problems such as missing results and large document retrieval; similar problems may occur for query federation. Firstly, to find complete results for queries, the metadata of all datasets must be well-defined and accurate. But to provide such accurate dataset metadata, an automated mechanism which continuously updates the metadata is required. However, even if there were a tool which implements this requirement, providing accurate dataset metadata via such a tool would remain the responsibility of the dataset publishers. Other problems of query federation are high latency and low selectivity of datasets, which are analogous to the retrieval of large documents in follow-your-nose. Query optimization can be a solution to these problems: grouping triple patterns to filter more triples on an endpoint can reduce latency (required processing time), and changing the query evaluation order according to dataset selectivity statistics can prevent retrieving large result sets. To make WoDQA function in the wild, the optimization step of query federation needs to be implemented. We plan to incorporate triple pattern selectivity into query reorganization using the VOID properties about statistics.

On the other hand, we could not evaluate our approach in this paper, since the VOID documents in current VOID stores are not well-defined. Because SPARQL endpoint definitions, linkset descriptions or vocabularies are missing in most VOID documents, we did not have the chance to execute comprehensive scenarios. Developing a tool which extracts well-defined VOID descriptions of datasets, and by this means evaluating our approach, is required future work to confirm the applicability of WoDQA to linked open data. Also, evaluating the analysis cost of WoDQA for a large VOID store will become possible when well-defined VOIDs are constructed.

Table 1: Prefix definitions
rdf:, dbpedia:, dbpprop:, linkedMDB:, owl:, dc:, foaf:, facebook:

6. REFERENCES
[1] K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. Describing Linked Datasets - On the Design and Usage of voiD, the 'Vocabulary of Interlinked Datasets'. In WWW 2009 Workshop: Linked Data on the Web (LDOW2009), Madrid, Spain, 2009.
[2] A. Bernstein, C. Kiefer, and M. Stocker. OptARQ: A SPARQL Optimization Approach based on Triple Pattern Selectivity Estimation. Technical Report ifi-2007.03, Department of Informatics, University of Zurich, 2007.
[3] C. Bizer, T. Heath, D. Ayers, and Y. Raimond. Interlinking open data on the web. www4.wiwiss.fu-berlin.de/bizer/pub/LinkingOpenData.pdf, 2007. Retrieved 12.5.2009.
[4] P. Bouquet, C. Ghidini, and L. Serafini. Querying the web of data: A formal approach. In Proceedings of the 4th Asian Conference on The Semantic Web (ASWC '09), pages 291-305, Berlin, Heidelberg, 2009. Springer-Verlag.
[5] C. Buil, M. Arenas, and Ó. Corcho. Semantics and Optimization of the SPARQL 1.1 Federation Extension. In Proc. of the 8th Extended Semantic Web Conference (ESWC 2011), Heraklion, Crete, Greece, volume 6644 of Lecture Notes in Computer Science, pages 1-15. Springer, 2011.
[6] M. Duerst and M. Suignard. Internationalized Resource Identifiers (IRIs). RFC 3987 (Proposed Standard), January 2005.
[7] O. Görlitz and S. Staab. Federated data management and query optimization for linked open data. In A. Vakali and L. Jain, editors, New Directions in Web Data Management 1, volume 331 of Studies in Computational Intelligence, pages 109-137. Springer Berlin / Heidelberg, 2011.
[8] O. Görlitz and S. Staab. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. In Proceedings of the 2nd International Workshop on Consuming Linked Data, Bonn, Germany, 2011.
[9] P. Haase, T. Mathäß, and M. Ziller. An evaluation of approaches to federated query processing over linked data. In Proceedings of the 6th International Conference on Semantic Systems (I-SEMANTICS '10), pages 5:1-5:9, New York, NY, USA, 2010. ACM.
[10] O. Hartig. Zero-knowledge query planning for an iterator implementation of link traversal based query execution. In G. Antoniou, M. Grobelnik, E. Paslaru Bontas Simperl, B. Parsia, D. Plexousakis, P. De Leenheer, and J. Pan, editors, ESWC (1), volume 6643 of Lecture Notes in Computer Science, pages 154-169. Springer, 2011.
[11] O. Hartig, C. Bizer, and J. C. Freytag. Executing SPARQL queries over the web of linked data. In International Semantic Web Conference, pages 293-309, 2009.
[12] O. Hartig and A. Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum, Semantic Web Special Issue, July 2010.
[13] T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, San Rafael, CA, 1st edition, 2011.
[14] B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In S. Bechhofer, M. Hauswirth, J. Hoffmann, and M. Koubarakis, editors, The Semantic Web: Research and Applications, volume 5021 of Lecture Notes in Computer Science, chapter 39, pages 524-538. Springer Berlin / Heidelberg, 2008.
[15] A. Schwarte, P. Haase, K. Hose, R. Schenkel, and M. Schmidt. FedX: A federation layer for distributed query processing on linked open data. In G. Antoniou, M. Grobelnik, E. Simperl, B. Parsia, D. Plexousakis, P. De Leenheer, and J. Pan, editors, The Semantic Web: Research and Applications, volume 6644 of Lecture Notes in Computer Science, pages 481-486. Springer Berlin / Heidelberg, 2011.
[16] A. Schwarte, P. Haase, K. Hose, R. Schenkel, and M. Schmidt. FedX: Optimization techniques for federated query processing on linked data. In L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. Noy, and E. Blomqvist, editors, The Semantic Web - ISWC 2011, volume 7031 of Lecture Notes in Computer Science, pages 601-616. Springer Berlin / Heidelberg, 2011.
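As a closing illustration of the Query Reorganizer described in Section 3.2, the sketch below renders service graph patterns as SPARQL 1.1 federated query text. The endpoint URLs and triple patterns are invented for the example, PREFIX declarations are omitted, and WoDQA itself builds Jena ARQ query structures rather than raw query strings:

```python
# Render service graph patterns (endpoint, triple patterns) as a SPARQL 1.1
# federated query string. All endpoints and patterns below are illustrative.
def render_federated(select_vars, sgps):
    lines = ["SELECT %s WHERE {" % " ".join(select_vars)]
    for endpoint, triple_patterns in sgps:
        lines.append("  SERVICE <%s> {" % endpoint)   # one block per sub-query
        for s, p, o in triple_patterns:
            lines.append("    %s %s %s ." % (s, p, o))
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)

query = render_federated(
    ["?faceUser"],
    [("http://example.org/facebook/sparql",
      [("?faceUser", "facebook:likes", "?movie")]),
     ("http://example.org/linkedmdb/sparql",
      [("?movie", "linkedMDB:producer", "?producer")])])
```

Each SERVICE block corresponds to one sgpα; a dataset appearing in Srvα is addressed through its SPARQL endpoint, procured from its VOID document.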