=Paper= {{Paper |id=None |storemode=property |title=Querying the Web of Interlinked Datasets using VOID Descriptions |pdfUrl=https://ceur-ws.org/Vol-937/ldow2012-paper-06.pdf |volume=Vol-937 |dblpUrl=https://dblp.org/rec/conf/www/AkarHED12 }} ==Querying the Web of Interlinked Datasets using VOID Descriptions== https://ceur-ws.org/Vol-937/ldow2012-paper-06.pdf
Querying the Web of Interlinked Datasets using VOID Descriptions

Ziya Akar, Tayfun Gökmen Halaç, Erdem Eser Ekinci, Oguz Dikenelli
Department of Computer Engineering, Ege University, 35100 Bornova, Izmir, Turkey
ziya.seagent@gmail.com, tayfunhalac@gmail.com, erdemeserekinci@gmail.com, oguz.dikenelli@ege.edu.tr

ABSTRACT

Query processing is an important way of accessing data on the Semantic Web. Today, the Semantic Web is characterized as a web of interlinked datasets, and thus querying the web can be seen as dataset integration on the web. This dataset integration must also be transparent to the data consumer, as if she were querying the whole web. To decide which datasets should be selected and integrated for a query, one requires metadata about the web of data. In this paper, to enable this transparency, we introduce a federated query engine called WoDQA (Web of Data Query Analyzer), which discovers the datasets relevant to a query in an automated manner using VOID documents as metadata. WoDQA focuses on powerful dataset elimination by analyzing the query structure with respect to the metadata of datasets. Dataset and linkset descriptions in VOID documents are analyzed for a SPARQL query, and a federated query is constructed. By means of the linkset concept of VOID, links between datasets are incorporated into the selection of federated data sources. The current version of WoDQA is available as a SPARQL endpoint.

Copyright is held by the author/owner(s).
LDOW2012, April 16, 2012, Lyon, France.

1. INTRODUCTION

While the web is evolving into a structured data space, many applications are publishing and linking their data, and the cloud of this linked and open data will become gigantic as time goes on. In this interlinked and structured data space, query execution becomes one of the most important research problems, and different query execution approaches and tools have been proposed in the literature [11, 7]. Query execution on the web of data basically depends on searching for resources that satisfy our needs, but we need to discover which parts of the linked open data cloud may contain such resources. To make this discovery effective, which resources and vocabularies reside in a dataset and which datasets are interlinked to others via links of interest should be taken into account. If dataset publishers provide such information by describing the metadata of their datasets, relevant datasets can be selected effectively in an automated manner. To enable this automation, the Vocabulary of Interlinked Datasets (VOID) [1] was published as a W3C Semantic Web Interest Group note1. VOID is an RDF vocabulary used to describe the metadata of RDF datasets, in a sense, the metadata of the web of data. The linked open data cloud is represented as a graph of datasets in which datasets are represented as nodes and sets of links between datasets are represented as edges. Since VOID is based on the graph nature of the web of data, it provides a strong way of describing metadata that allows discovering the datasets over which queries are distributed.

1 http://www.w3.org/TR/void/

   In this paper, we present a federated query engine called WoDQA (Web of Data Query Analyzer), which is developed to execute a query on distributed datasets without missing answers, using the VOID metadata of the datasets in the linked open data cloud. WoDQA focuses on effective dataset selection for a query and analyzes the query structure to eliminate irrelevant datasets. Relevant datasets are selected by analyzing VOID documents and considering which datasets include a resource related to the query and which links between datasets allow finding a result for the query. VOID metadata provides dataset descriptions representing the content of a dataset and linkset descriptions representing relationships between datasets, both of which are used by WoDQA for effective dataset selection.
   There are two main approaches which enable automated query processing on the web of data and save data consumers from searching for relevant datasets themselves. The first approach, called follow-your-nose (link traversal) [11], is based on following links between data to discover potentially relevant data; the second one, query federation [7], is based on dividing a query into sub-queries and distributing the sub-queries to relevant datasets which are selected using metadata about the datasets.
   The follow-your-nose approach conceptualizes the web as a graph of documents which contain dereferenceable URIs. It is based on executing queries on relevant documents which are retrieved by following links between resources in different documents. But this method raises completeness and performance issues. Although some heuristic query planning methods can be used to answer different kinds of queries [10], this approach cannot guarantee finding all results, because the relevant documents vary according
to the starting point and the path. Also, although follow-your-nose requires nothing other than the linked data principles to process a query, another disadvantage is that encountering large documents causes retrieval problems. The other approach, query federation, has arisen from the database literature and is composed of two main steps before performing a query. Firstly, the query is divided into sub-queries, and the datasets relevant to the sub-queries are selected using metadata which reflects dataset content. Then, the query evaluation plan is adjusted using statistics about the datasets in the query optimization step. For the purpose of executing sub-queries on distributed data sources, query federation requires accessing datasets via SPARQL endpoints. Contrary to the follow-your-nose approach, in this approach all results can be found under the assumption that the metadata of all datasets is complete and accurate, and queries can be optimized before execution by estimating execution cost using dataset metadata. To find all results in an effective way, query federation determines the relevant datasets before execution using well-defined dataset metadata such as VOID documents.
   In the light of these ideas, WoDQA executes queries by analyzing VOID documents, which constitute a projection of the web of data, and incorporates the follow-your-nose approach into query federation by considering the links between datasets in the metadata. WoDQA does not change the evaluation order of a query, because the main focus of this initial version of WoDQA is eliminating as many irrelevant datasets as possible in dataset selection, without query optimization. Current RDF federation implementations select relevant datasets by considering only predicate and type indexes. Since vocabularies on the Semantic Web should be common, there can be many datasets which use a specific property or class. Therefore, using such indexes causes the selection of redundant datasets. The main contribution of WoDQA is incorporating both the links between data, through the linkset concept, and the relationships between the triple patterns of a query into dataset selection to eliminate irrelevant datasets effectively. We serve WoDQA as a SPARQL endpoint and a simple web form2 to execute raw queries by analyzing the datasets in the VOID stores.
   The remaining sections are organized as follows. In Section 2, related work is discussed. Section 3 introduces the general architecture of WoDQA and details its dataset selection approach. In Section 4, the usage of WoDQA is shown with a working example. Finally, Section 5 concludes the paper.

2 The simple web form and up-to-date endpoint address can be found on the http://seagent.ege.edu.tr/etmen/wodqa.html page.

2. RELATED WORK

Semantic Web querying approaches can be classified as centralized and distributed. Centralized querying is based on collecting linked data into a single central data store and querying the data from this store. This approach includes data warehousing, which collects pre-selected data sources, and search engines, which crawl the web by following RDF links and index the discovered data [12]. But the main disadvantage of this approach is that the queried data is not live, i.e. it is a duplicate of the original sources. Moreover, search engines cannot crawl the whole web and cannot answer complete structured queries.
   On the other hand, distributed querying depends on processing query parts directly on the original data and managing the results retrieved from the distributed data. Query federation [9, 7] and follow-your-nose [11] are the mainstream distributed querying approaches. DARQ [14], FedX [15] and SPLENDID [8] are example implementations of the query federation approach. DARQ distributes a query using dataset metadata called Service Descriptions3, which are constructed manually by the query developer, and benefits from triple and entity counts and selectivity estimates to optimize the query plan. Since DARQ uses predicates to select relevant datasets, the success of query execution depends on associating datasets with predicates, and triple patterns which have unbound predicates cannot be handled4. On the other hand, FedX is an extended version of the Federation SAIL provided by AliBaba5. The datasets which will be queried are given to FedX, and it checks each triple pattern's existence on each dataset with ASK queries to decide which triple pattern will be queried on which datasets [16]. These two query federation implementations also run counter to the self-descriptive nature of linked data, since the metadata of datasets should be described by the data publishers themselves, just as they describe and link their data. The last query federation implementation is SPLENDID, which indexes datasets using VOID descriptions, eliminates datasets with ASK queries for triple patterns, and benefits from the statistical data in VOID to optimize federated queries.
   Although the aforementioned query federation implementations aim to query linked datasets, they do not consider the links between data for dataset selection. For this reason, these implementations have some shortcomings from the perspective of querying the web of data. The first is that selecting datasets via only predicate indexes makes it impossible to select datasets effectively for triple patterns which have unbound predicates, or which have very general predicates, such as owl:sameAs and foaf:page, that are used extensively across datasets6. The second shortcoming is that many datasets may be selected for the triple patterns, and executing ASK queries in such a case increases the cost notably. One needs to take the structure of the query into account to eliminate the right irrelevant datasets in the web of data context.
   The second distributed querying approach is follow-your-nose [11], whose basic idea is traversing the RDF links between data to discover relevant datasets. There is no need for any prior metadata about datasets, as in query federation, but it needs initial URIs in some triple patterns to start exploring datasets. The main disadvantages of this approach are infinite link discovery, trying to retrieve large RDF graphs, and failing to discover relevant data for queries with only bound predicates (?s foaf:friend ?o) or type statements (?s rdf:type foaf:Person). These restrictions cause less comprehensive result sets. One of the well-known follow-your-nose implementations is SQUIN [11], which traverses RDF links on the fly, i.e. during query execution. Hartig et al. improve this work with some heuristic methods that modify the query evaluation order to reduce execution cost and to provide more comprehensive results [10], but the results strictly depend on the starting point and the evaluation order. On the other hand, Bouquet et al. formalize the web of data and suggest three different querying methods exploiting their web of data formalization [4]. These methods are based on merging relevant graphs to execute queries on them. One of these methods uses the follow-your-nose approach, which specifies and merges relevant graphs by looking up URIs before query execution.

3 The Service Description introduced in that paper contains information about the triples in the dataset, limitations on access patterns, and statistical information about the dataset.
4 http://darq.sourceforge.net/#Limitations and known issues
5 http://www.openrdf.org/doc/alibaba/2.0-beta6/alibaba-sail-federation/
6 SPLENDID also uses type indexes, but this is still not enough, since vocabularies can be used frequently.
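The predicate-index shortcoming discussed above can be made concrete with a small sketch. The index contents, dataset names and the `select_by_predicate` helper below are hypothetical illustrations, not part of any of the cited engines:

```python
# Hypothetical predicate index: predicate IRI -> datasets whose metadata
# lists that predicate. All entries are made up for illustration.
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
MDB_DIRECTOR = "http://data.linkedmdb.org/resource/movie/director"

predicate_index = {
    OWL_SAMEAS: {"DBpedia", "LinkedMDB", "Geonames", "DrugBank"},
    MDB_DIRECTOR: {"LinkedMDB"},
}

def select_by_predicate(predicate):
    """Select every dataset the index associates with the given predicate."""
    return predicate_index.get(predicate, set())

# A dataset-specific predicate narrows the selection to a single source,
# while a generic predicate such as owl:sameAs selects nearly everything.
print(sorted(select_by_predicate(MDB_DIRECTOR)))  # ['LinkedMDB']
print(len(select_by_predicate(OWL_SAMEAS)))       # 4
```

For generic predicates the index discriminates poorly; this is the gap that motivates additionally analyzing linksets and the relations between triple patterns.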
WoDQA aims to query the web of interlinked datasets using VOID dataset and linkset descriptions to decide the relevant datasets for a query. At first, it assumes that all datasets are relevant to a query; then the irrelevant datasets are eliminated by analyzing the query structure in the light of the metadata of the datasets. Its novelty is considering the query structure and the links between datasets to select relevant datasets before query execution; thus it incorporates the follow-your-nose approach into query federation. To the best of our knowledge, WoDQA is the first query engine which uses datasets and linksets together, the critical elements of VOID for describing dataset metadata.

3. WODQA INTERNAL ARCHITECTURE

In this section, the query processing architecture of WoDQA is explained in detail. Since it is impractical to perform a query on all published datasets on the web, WoDQA aims to transform a query into a federated query which is evaluated only on the relevant datasets. In this direction, to process a query on the linked data cloud, WoDQA contains three main modules, as seen in Figure 3.1: DatasetAnalyzer, QueryReorganizer and Jena ARQ7.

Figure 3.1: WoDQA internal architecture

   Dataset publishers construct the VOID documents of their datasets, and Semantic Web programmers can access these documents through services called VOID stores, such as voiD Browser8, CKAN9 and voiDStore10. A VOID store generates a projection of Linked Open Data, and this structure thus obliges dataset publishers to create well-defined VOID documents which reflect the actual content of their datasets, so that a dataset can be included in the relevant queries. DatasetAnalyzer is the module which is responsible for discovering relevant datasets and eliminating irrelevant ones using the VOID documents of the datasets in the VOID stores. We assume that dataset publishers update the description documents in the VOID stores when their datasets change, keeping the VOID stores up-to-date for dataset selection. In the current version of WoDQA, DatasetAnalyzer discovers the VOID documents from the CKAN net and analyzes the dataset and linkset descriptions for each triple pattern in the query. This analysis eliminates the irrelevant datasets which definitely do not contain any result contributing to the result of the query, under the assumption that accurate and complete VOID documents of the datasets are available. Dataset analysis is achieved by a rule-based approach. We explain the rules which discover relevant datasets in Subsection 3.1.
   The second module is QueryReorganizer, which rewrites queries depending on the results of DatasetAnalyzer. This rewriting process constructs federated SPARQL queries including SERVICE expressions11. Details of QueryReorganizer are given in Subsection 3.2.
   The last module is the query executor, which directly uses Jena ARQ to execute SPARQL queries including the SERVICE expressions inserted by the QueryReorganizer. The federated query constructed by QueryReorganizer is passed to ARQ to be executed, and the results of the query execution are returned to the querier. In the following subsections, the first two modules, which implement WoDQA's analysis and reorganization phases, are explained.

7 http://jena.sourceforge.net/ARQ/
8 http://kwijibo.talis.com/voiD/
9 http://ckan.net/
10 http://void.rkbexplorer.com/
11 http://www.w3.org/TR/sparql11-federated-query/

3.1 Dataset Analyzer

This section introduces the details of the DatasetAnalyzer module, which is the core and innovative part of the current version of WoDQA. Unlike other query federation approaches, WoDQA considers the relations between triple patterns and the links between datasets while selecting datasets. Thanks to the dataset analysis of WoDQA, the relevant datasets are specified while plenty of irrelevant ones are excluded. The output of dataset analysis is a subset of all published datasets on the web of data, and thus the query is performed only on this subset of related datasets. For the purpose of explaining how this subset is constructed, we give a formalization in this section.
   We first give a definition of the web of data to formalize our dataset selection approach. In summary, the web of data is an RDF graph which is constructed by typed links between data from different sources. Basically, an RDF graph (G) is formally represented as a set of triples of the form ⟨s, p, o⟩: G = {⟨s, p, o⟩ | ⟨s, p, o⟩ ∈ (I ∪ B) × I × (I ∪ B ∪ L)}, where I is the set of IRIs [6], B is the set of blank nodes, L is the set of literals, and all are RDF terms, T = I ∪ B ∪ L. In this direction, the web of data is the global graph (Gwod) which consists of the triples constructed from the IRIs, blank nodes and literals on the web. Gwod is a mathematical RDF model of the web of data.
   From another perspective, the web of data means a web of interlinked datasets [3]. A dataset (δ) is a meaningful set of RDF triples [1] which decreases the granularity of the web. Rather than publishing information only as single resources and connecting these resources, datasets are the way of publishing information as sub-graphs of Gwod. These sub-graphs, i.e. datasets, are connected via RDF triples which connect resources in different datasets [13]. Publishers create their resources and deploy them into datasets on the web, and consumers use these resources while creating their own datasets. With regard to this, to formalize a dataset, we use subj(G), which represents the set of resources that are the subjects of the triples in a graph.
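The set-of-triples view of G and the subj(G) operation can be sketched in a few lines; the mini-graph below is a made-up example, not actual data from the cited datasets:

```python
# An RDF graph as a plain set of (s, p, o) triples; subj(G) is the set of
# resources occurring in the subject position. All triples are hypothetical.
def subj(graph):
    """Return the subjects of the triples in the given graph."""
    return {s for (s, _p, _o) in graph}

g = {
    ("dbpedia:Nikola_Tesla", "dbpprop:name", '"Nikola Tesla"'),
    ("dbpedia:Nikola_Tesla", "rdf:type", "foaf:Person"),
    ("linkedmdb:film/38", "owl:sameAs", "dbpedia:A_Fistful_of_Dollars"),
}

print(sorted(subj(g)))  # ['dbpedia:Nikola_Tesla', 'linkedmdb:film/38']
```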

Definition 1. A dataset is a sub-graph of the web of data, δx ⊂ Gwod, and the resources included by δx are specified as follows: ∀r (r ∈ subj(δx) → Owner(δx, r)).

   VOID describes a dataset with well-defined properties12, and we formalize a VOID dataset description as a tuple ⟨L^space, I^voc⟩ ∈ L × I. The first dataset property, L^space, corresponds to the void:uriSpace set, which contains the string literals that all entity IRIs in a dataset start with. The other one, I^voc, corresponds to void:vocabulary and denotes the set of vocabularies used by the dataset13.
   The triples whose object is a resource in another dataset make the web of data a graph of interlinked datasets. We call such triples link triples, and we define the set of link triples as LT = {⟨s, p, o⟩ | owner(s) ≠ owner(o)}, where s, o ∈ I. This definition leads us to the link predicate (p^link) concept, which corresponds to void:linkPredicate, used to define a linkset, an important contribution of the VOID effort. The set of link predicates P^link includes all the predicates which are used in a link triple: P^link = {p^link | ∃⟨s, p^link, o⟩ ∈ LT}. A linkset represents the link triples which connect resources in different datasets using the same link predicate. We formalize a linkset, λ, as a tuple ⟨δλ^from, δλ^to, pλ^link⟩ ∈ ∆ × ∆ × P^link, where ∆ is the set of all datasets on the web. In this definition, δλ^from is the referrer dataset, which is the owner of the subjects of the link triples in the linkset; δλ^to is the referenced dataset, which is the owner of the objects of the link triples in the linkset; and pλ^link is the link predicate of all link triples in the linkset.
   DatasetAnalyzer uses both the dataset and linkset descriptions of VOID metadata to select the relevant datasets, in other words the sub-graphs, which may contain the results of the query. By this means, the query is transformed into a federated query and executed on the relevant datasets on the web. Accordingly, we exclude the datasets which do not contain any result for the query while querying the global graph, Gwod. Besides analyzing the relationships between VOID descriptions (datasets, linksets) and triple patterns, the relationships between the triple patterns in a query are considered. For this reason, we need to give a formal definition of SPARQL queries.
   We consider a subset of SPARQL queries which corresponds to a basic graph pattern for formalization [5]. A basic graph pattern consists of triple patterns, BGP = {tpi | ⟨s_tpi, p_tpi, o_tpi⟩ ∈ (I ∪ V) × (I ∪ V) × (I ∪ L ∪ V)}. A triple pattern is slightly different from an RDF triple, since it contains at least one and at most three variables, which are elements of the infinite set V14. While performing a query, its variables are replaced by RDF terms. Thus, according to the semantics of SPARQL, a query result is a set of solution mappings, {µ | µ : V → T}, where a solution mapping, µ, is a partial function from variables to RDF terms.
   To perform a query on the web of data, in the worst case each triple pattern has to be queried on all published datasets. Since this is impractical, our purpose is eliminating the irrelevant datasets for each triple pattern. By the elimination of irrelevant datasets, we construct a federated query which distributes sub-queries to only the relevant ones. In order to eliminate irrelevant datasets, we introduce a set of rules which discover the relevant datasets in dataset analysis, called relevant dataset discovery rules. A relevant dataset for a triple pattern is formalized as an assertion ρ(δx, tpi), which denotes that the dataset δx may contain a result for the triple pattern tpi. We now discuss how the relevant dataset assertions (ρ) inferred by the discovery rules are used for eliminating irrelevant datasets. To explain the elimination method, assume that Qtpi is the set of datasets selected to be queried for tpi, which we call the selected set; this set initially contains all datasets on the web, Qtpi^init ≡ ∆ (all datasets in a VOID store in our case). Each rule analyzes the datasets in Qtpi of each triple pattern. After applying a rule, the elements of Qtpi about which the rule does not infer a relevant dataset assertion are removed from Qtpi. But if the rule does not imply any relevant dataset assertion at all, Qtpi remains the same. The irrelevant dataset elimination method is formalized as Qtpi^new below to denote such an update of the selected set subsequent to executing a rule.

   Qtpi^new = {δx | δx ∈ Qtpi ∧ ρ(δx, tpi)}   if ∃δa (ρ(δa, tpi))
   Qtpi^new = Qtpi                            if ∄δa (ρ(δa, tpi))

Figure 3.2: Example VOID Models

   In the following subsection we give the relevant dataset discovery rules in detail, and we use some queries to exemplify the application of the rules. Figure 3.2 shows the VOID models of a set of datasets which are used in these examples. This model contains simplified VOID descriptions of five datasets and the linksets connecting these datasets by link predicates. Link predicates are shown by arrows between the dataset descriptions, which are represented with squares. Also, we created a sample dataset called Facebook, which keeps data about Facebook users in our local store. This data describes which users like the movie resources located in the LinkedMDB dataset. Although there can be many linksets, only the few linksets needed to explain our rules are depicted in Figure 3.2.

12 DatasetAnalyzer of the current version of WoDQA considers only these properties for the sake of simplicity. We plan to integrate other properties, such as statistics, in the future to make optimized queries.
13 Note that schemas of the Semantic Web languages such as RDFS and OWL are not specified in I^voc.
14 We exclude blank nodes in queries.

3.1.1 Relevant Dataset Discovery Rules

This subsection presents a set of rules, each of which represents an analysis method of the DatasetAnalyzer module to discover relevant datasets effectively. Each relevant dataset discovery rule aims to analyze datasets from different perspectives, and combinations of them, to infer relevant dataset (ρ) assertions for triple patterns. These perspectives can be classified into three groups. The first one is analyzing
IRIs in the triple patterns. We call this perspective IRI-based Analysis; here the namespaces of IRIs and the vocabularies in the VOID documents are considered to determine relevant datasets. The second one is considering linked resources. We call this perspective Linking Analysis; it considers whether a triple links two resources in the same dataset (internal) or in different datasets (external) to eliminate irrelevant datasets. The last perspective is Shared Variable Analysis. Since triple patterns share some variables, each triple pattern affects the relevant datasets of the other triple patterns that include the same variables. The relevant dataset discovery rules of all perspectives are introduced in this section.
   The first two rules fall under the IRI-based Analysis perspective, and each considers the vocabularies I^voc of VOID metadata. The first discovery rule checks whether the IRIs in triple patterns are RDFS (or OWL) classes or properties in the vocabulary set (I^voc) of VOID documents. To give the rule, we define a has(Iδx^voc, r) function, which represents that a resource r ∈ I (a property or a class IRI) is included by one of the vocabularies in Iδx^voc. Using the has expression, the relevance of a dataset δx to an IRI r is represented with VocMatch, as shown in Definition 2.

   Definition 2. ∀δx (has(Iδx^voc, r) → VocMatch(δx, r)) where r ∈ I.

   For a triple pattern such as ?s dbpprop:name15 "Nikola Tesla", the dbpprop:name RDFS property in the predicate position requires that matching triples can only be in datasets which use the dbpprop vocabulary. Therefore, we can eliminate the datasets which do not use the dbpprop vocabulary. This situation is handled by Rule 1, which is similar to predicate indexes.
   Another perspective to discover relevant datasets is Linking Analysis. Since a triple links two resources either in the same dataset or in different datasets, this perspective is separated into two kinds of analyses, each of which considers a different kind of triple. The first one is Internal Linking Analysis, which considers triples linking resources in the same dataset. A relevant dataset found by this analysis is called an internal relevant dataset and is represented with ρ^int(δx, tpi). On the other hand, External Linking Analysis considers link triples which connect resources in different datasets. In this case, the linkset descriptions of VOID documents are taken into account to discover relevant datasets, and a dataset found by this analysis is called an external relevant dataset, represented as ρ^ext(δx, tpi). Internal and External Linking Analyses are both executed for a triple pattern in the whole Linking Analysis process, and then the produced internal and external relevant datasets for a triple pattern are unified as the relevant datasets for the triple pattern, as shown in Rule 3.

   Rule 3. The union of the external and internal datasets for a triple pattern constitutes the relevant datasets for the triple pattern.
   ∀δx, tpi (ρ^int(δx, tpi) ∨ ρ^ext(δx, tpi) → ρ(δx, tpi))

   The irrelevant dataset elimination method considers the relevant datasets which are specified by Rule 3 to eliminate irrelevant datasets from the selected set of a triple pattern (Qtpi). Internal and external datasets are intermediate results to infer relevant datasets in a Linking Analysis.
   The first two rules under the Linking Analysis perspective are linking-to-IRI discovery rules. Consider the example triple pattern, ?film owl:sameAs dbpedia:A_Fistful_of_Dollars,
   Rule 1. If there is a dataset in which one of its vocab-             which can be matched with a triple that links a resource to
ularies includes the predicate of a triple pattern, then it is          dbpedia:A Fistful of Dollars resource. Triple patterns whose
relevant for the triple pattern.                                        object is an IRI are analyzed by these rules. Since own-
                                                                        ers of the linked IRI must be known in these rules, we
∀tpi , δx (V ocM atch (δx , ptpi ) → ρ (δx , tpi ))                     give a definition that depicts the owners of any resource
                                                                        on the basis of IRI Analysis. We formalize inclusion of a
   According to Figure 3.2, DBpedia uses dbpprop vocabu-                resource (r ∈ I) by a dataset (δx ) in Definition 3 by us-
lary, and the rule decides that it is relevant for such a triple        ing the urispaces (Lspace ) property of the VOID description
pattern. In web of data, lots of datasets which uses dbpprop            and startsW ith r, Lspace     function which represents that r
                                                                                               δx
vocabulary can be found, and they can be eliminated using               starts with one of the urispaces in Lspace  .
                                                                                                               δx
outputs of other discovery rules.
   To introduce Rule 2, consider another triple pattern, ?pro-             Definition 3. ∀δx (startsW ith(r, Lspace ) → Owner(δx , r))
                                                                                                              δx
ducer rdf:type linkedMDB:producer, which contains a type
definition for the variable ?producer. In such cases, the ob-              Rule 4 is linking-to-IRI internal discovery rule from the in-
ject of the triple pattern is a class definition, it makes sense        ternal linking point of view. According to the example query
to eliminate the datasets which do not use the vocabulary               ?film should be in the same dataset with dbpedia:A Fistful of
of this class. Rule 2 resembling type indexes is used to spec-          Dollars, i.e. owner of the dbpedia:A Fistful of Dollars resource.
ify relevant datasets for such triple patterns. The example             Therefore appropriate triple patterns can be found in DB-
triple pattern is queried from the datasets that use linked-            pedia dataset.
MDB vocabulary, i.e. LinkedMDB dataset for our example
model. Other datasets do not include a resource which is                   Rule 4. If there is a triple pattern whose object is an IRI,
an instance of linkedMDB:producer class, and therefore they             then owner datasets of the IRI are internal relevant for the
are eliminated by output of this rule.                                  triple pattern.
   Rule 2. If there is a dataset in which one of its vocabu-
                                                                        ∀tpi , δx (δx ∈ Qtpi ) ∧ Owner (δx , otpi ) → ρint (δx , tpi )
                                                                                                                                            
laries includes the object of a triple pattern when the predi-
                                                                        where otpi ∈ I, stpi ∈ V
cate of the triple pattern is rdf:type, then the dataset is rel-
evant for the triple pattern.                                              On the other hand, from the external linking point of
                                                                        view, appropriate triples can be found in datasets which are
∀tpi , δx (V ocM atch(δx , otpi ) ∧ (ptpi = rdf:type) → ρ(δx , tpi ))   linked to the owner datasets of object IRI. For our exam-
15
     All prefixes used in the paper are defined in Table 1.             ple triple pattern, ?film should be in datasets which contain
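As an illustrative sketch (not the authors' implementation), Definition 3 and Rule 4 can be expressed over a simplified in-memory view of VOID metadata; the dataset dictionaries and field names below are our own assumptions:

```python
# Illustrative sketch of Definition 3 (Owner) and Rule 4 (linking-to-IRI
# internal rule); the dataset/VOID field names are a simplification.

def owner(dataset, iri):
    # Definition 3: a dataset owns an IRI if the IRI starts with one of
    # the urispaces declared in the dataset's VOID description.
    return any(iri.startswith(space) for space in dataset["urispaces"])

def internal_relevant_for_object_iri(selected_set, obj_iri):
    # Rule 4: the owner datasets of a bound object IRI, restricted to the
    # triple pattern's currently selected set Q_tp.
    return {d["name"] for d in selected_set if owner(d, obj_iri)}

datasets = [
    {"name": "DBpedia", "urispaces": ["http://dbpedia.org/resource/"]},
    {"name": "LinkedMDB", "urispaces": ["http://data.linkedmdb.org/resource/"]},
]

relevant = internal_relevant_for_object_iri(
    datasets, "http://dbpedia.org/resource/A_Fistful_of_Dollars")
```

For the ?film owl:sameAs dbpedia:A_Fistful_of_Dollars pattern, only the dataset owning the object IRI remains internal relevant.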
On the other hand, from the external linking point of view, appropriate triples can be found in the datasets which are linked to the owner datasets of the object IRI. For our example triple pattern, ?film should be in datasets which contain link triples whose object resource is defined in DBpedia. For this analysis, the linkset descriptions of the VOID documents are used. To discover relevant datasets for a triple pattern by using the linking-to-IRI external discovery rule, the triple pattern should have a bound predicate. We define the Compatible expression in Definition 4 to represent which linkset description is appropriate for determining the relevant datasets of a triple pattern.

Definition 4. If the selected set of a triple pattern contains the referrer dataset of a linkset description and the link predicate of the linkset description is the same as the triple pattern's predicate, then the linkset description is compatible with the triple pattern.

∀λ_m, tp_i ((δ^from_λm ∈ Q_tpi) ∧ (p^link_λm = p_tpi) → Compatible(λ_m, tp_i))

Considering the ?film owl:sameAs dbpedia:A_Fistful_of_Dollars triple pattern, and remembering the linkset descriptions of our example model in Figure 3.2, there are linksets from the LinkedMDB and YAGO datasets to the DBpedia dataset whose link predicates are owl:sameAs. Rule 5, which is under the Linking Analysis perspective, gives these two datasets as external relevant datasets for this triple pattern. If a dataset is not linked to DBpedia by the owl:sameAs predicate, then one can conclude that this dataset is irrelevant for the triple pattern.

Rule 5. If there is a linkset description that is compatible with the triple pattern and whose referenced dataset is an owner dataset of the triple pattern's object, then the referrer dataset of the linkset description is external relevant for the triple pattern.

∀tp_i, λ_m, δ_x (Compatible(λ_m, tp_i) ∧ Owner(δ_x, o_tpi) ∧ (δ_x = δ^to_λm) → ρ_ext(δ^from_λm, tp_i)) where o_tpi ∈ I, s_tpi ∈ V

Recall that internal and external relevant datasets are unified by Rule 3 after applying Rule 4 and Rule 5. Thus, the final relevant datasets selected by the linking-to-IRI rules are DBpedia, LinkedMDB and YAGO.

Another couple of rules under the Linking Analysis perspective are the IRI-links-to rules. These rules are applied to triple patterns whose subject is an IRI and whose object is a variable, to determine relevant datasets from the internal and external linking points of view. Rule 6 is the IRI-links-to internal discovery rule, and it finds the triples that link resources in the same dataset. One can conclude that if the subject of a triple pattern is an IRI, then the triples matching this triple pattern are in the owner dataset of this IRI.

Rule 6. If a dataset is an owner of the subject of a triple pattern, then this dataset is an internal relevant dataset for the triple pattern.

∀tp_i, δ_x ((δ_x ∈ Q_tpi) ∧ Owner(δ_x, s_tpi) → ρ_int(δ_x, tp_i)) where o_tpi ∈ V, s_tpi ∈ I

Consider the triple pattern dbpedia:Ennio_Morricone owl:sameAs ?person. The subject of this triple pattern is an IRI whose namespace is dbpedia, and therefore an internal relevant dataset is DBpedia, whose VOID metadata contains dbpedia as the value of the urispace property.

Rule 7 is the IRI-links-to external discovery rule, and it finds the triples that connect resources in different datasets. According to this rule, an owner dataset of the subject IRI of the triple pattern is external relevant only when there is a linkset description that includes the owner dataset as the referrer dataset and is compatible with the triple pattern.

Rule 7. If there is a linkset description that is compatible with the triple pattern and whose referrer dataset is an owner dataset of the triple pattern's subject, then the referrer dataset of the linkset description is external relevant for the triple pattern.

∀tp_i, λ_m, δ_x (Compatible(λ_m, tp_i) ∧ Owner(δ_x, s_tpi) ∧ (δ_x = δ^from_λm) → ρ_ext(δ_x, tp_i)) where o_tpi ∈ V, s_tpi ∈ I

According to the example, DBpedia is the owner dataset of dbpedia:Ennio_Morricone, and there is also a linkset description whose referrer dataset is DBpedia and whose link predicate is owl:sameAs.

The other discovery rules in Linking Analysis are combined with Shared Variable Analysis. The first discovery rule which conforms to this combined analysis is Chaining Triple Patterns Analysis. This rule considers two triple patterns together to discover relevant datasets. This is a characteristic of Shared Variable Analysis, since it depends on analyzing more than one triple pattern that share the same variable. The triple patterns below are an example of chaining triple patterns:

?s owl:sameAs ?film.
?film linkedMDB:producer_name "Sergio Leone"

Notice that the second triple pattern is used for querying films whose producer name is "Sergio Leone". Thus, the second triple pattern affects the relevant datasets of the first triple pattern. From the internal linking point of view, the first triple pattern can be found in the datasets which satisfy the second triple pattern, because ?s and ?film should be in the same dataset. In this direction, Internal Chaining Triple Pattern Analysis, formalized in Rule 8, is used to discover internal relevant datasets for the triple patterns.

Rule 8. If there is a triple pattern whose object is the same as the subject of another triple pattern, then the datasets included by the selected sets of both triple patterns are internal relevant.

∀δ_x, tp_i, tp_j ((o_tpi = s_tpj) ∧ (δ_x ∈ Q_tpi) ∧ (δ_x ∈ Q_tpj) → ρ_int(δ_x, tp_i) ∧ ρ_int(δ_x, tp_j)) where o_tpi, s_tpi ∈ V

To execute Chaining Triple Pattern Analysis, the execution order of the rules becomes important. To exemplify this situation with the chaining triple patterns query, assume that the IRI-based analysis is applied beforehand, and LinkedMDB is the relevant dataset for the second triple pattern, since the LinkedMDB VOID includes linkedMDB as the value of its vocabulary property. Then, this rule can specify that the internal relevant dataset for the first triple pattern is LinkedMDB. As clearly seen from this example, Shared Variable Analysis should be performed after the execution of the IRI-based Analysis rules, in order to eliminate more datasets. Hence, in Subsection 3.1.2, we give an overview of the analysis process, which specifies an execution order for these rules.
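Rule 8 effectively intersects the selected sets of two chained triple patterns. A minimal Python sketch, assuming selected sets are plain sets of dataset names (our own simplification, not the authors' code):

```python
# Sketch of Rule 8 (Internal Chaining Triple Pattern Analysis): when the
# object variable of tp_i equals the subject variable of tp_j, only the
# datasets appearing in both selected sets can be internal relevant.

def internal_chaining(q_tpi, q_tpj):
    # Datasets in both selected sets are internal relevant for both
    # triple patterns.
    return q_tpi & q_tpj

ALL = {"DBpedia", "LinkedMDB", "YAGO", "Facebook"}  # stand-in for Δ

# ?s owl:sameAs ?film . ?film linkedMDB:producer_name "Sergio Leone"
q_tp1 = set(ALL)       # no IRI-based evidence yet: Q_tp1 ≡ Δ
q_tp2 = {"LinkedMDB"}  # vocabulary match on linkedMDB

relevant = internal_chaining(q_tp1, q_tp2)
```

With the example selected sets, only LinkedMDB survives for both patterns, mirroring the discussion above.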
After the IRI-based analysis, if we apply Rule 8 to the example triple patterns, the relevant datasets of the second triple pattern are shown as Q_tp2 = {δ_LinkedMDB}, and those of the first one as Q_tp1 ≡ ∆. For this case, this rule asserts ρ_int(δ_LinkedMDB, tp_1) and ρ_int(δ_LinkedMDB, tp_2).

On the other hand, External Chaining Triple Patterns Analysis uses linkset descriptions while considering two triple patterns. While the internal rule can be used for all triple patterns without considering the predicate, the external rule is applied to a triple pattern which has a link predicate. Rule 9 introduces this rule.

Rule 9. If there is a triple pattern (tp_i) whose object is the same as the subject of another one (tp_j), and there is a linkset description which is compatible with tp_i and whose referenced dataset is included by the selected set of tp_j, then the referrer dataset is external relevant for tp_i and the referenced dataset is external relevant for tp_j.

∀λ_m, tp_i, tp_j ((o_tpi = s_tpj) ∧ Compatible(λ_m, tp_i) ∧ (δ^to_λm ∈ Q_tpj) → ρ_ext(δ^from_λm, tp_i) ∧ ρ_ext(δ^to_λm, tp_j)) where o_tpi ∈ V

With respect to the example of chaining triple patterns, assume that the selected set for the first triple pattern is Q_tp1 ≡ ∆, and for the second one Q_tp2 = {δ_LinkedMDB}. Rule 9 determines that δ_DBpedia is external relevant for tp_1, since resources for ?film can be found in δ_LinkedMDB and there is a linkset between these two datasets with the owl:sameAs predicate. It is clear that no other dataset can contain an appropriate triple if it is not linked to LinkedMDB by owl:sameAs. At the end of Chaining Triple Pattern Analysis, the internal and external relevant datasets specified by Rules 8 and 9 are unified according to Rule 3.

Another analysis which uses both Linking Analysis and Shared Variable Analysis is Object Sharing Triple Patterns Analysis. For triple patterns which have the same object variable, only certain datasets can include triples which satisfy the object variable of both triple patterns. To simplify the explanation, we use the following example for Object Sharing Triple Pattern Analysis:

?person facebook:likes ?movie.
?film owl:sameAs ?movie.

From the internal linking point of view, ?person and ?film should be in the same dataset as ?movie. Rule 10 specifies the internal relevant datasets for triple patterns which have the same object. Assume that Q_tp1 = {δ_Facebook}, due to the value of the vocabulary property of the Facebook VOID, and Q_tp2 ≡ ∆. This rule determines that only δ_Facebook can contain internal triples that satisfy the triple patterns together.

Rule 10. If there is a triple pattern whose object is the same as the object of another one, then the datasets included by the selected sets of both triple patterns are internal relevant.

∀δ_x, tp_i, tp_j ((o_tpi = o_tpj) ∧ (δ_x ∈ Q_tpi) ∧ (δ_x ∈ Q_tpj) → ρ_int(δ_x, tp_i) ∧ ρ_int(δ_x, tp_j)) where o_tpi ∈ V

To execute Object Sharing Analysis from the external point of view, we benefit from the linkset descriptions. To find appropriate link triples, ?person and ?film should be in different datasets. Rule 11 determines the external relevant datasets for triple patterns which have the same object. This rule considers two linkset descriptions together for the two triple patterns.

Rule 11. If two triple patterns have the same object, and there are two linkset descriptions which have the same referenced dataset, each of which is compatible with one of the triple patterns, then the referrer datasets of the linkset descriptions are external relevant for the triple patterns.

∀λ_m, λ_n, tp_i, tp_j ((o_tpi = o_tpj) ∧ Compatible(λ_m, tp_i) ∧ Compatible(λ_n, tp_j) ∧ (δ^to_λm = δ^to_λn) → ρ_ext(δ^from_λm, tp_i) ∧ ρ_ext(δ^from_λn, tp_j)) where o_tpi ∈ V

For the example of Object Sharing Triple Pattern Analysis, assume that no rule is applied before this analysis and the selected sets are Q_tp1 ≡ ∆ and Q_tp2 ≡ ∆. In Figure 5, δ_DBpedia, δ_YAGO, and δ_LinkedMDB have link triples with the predicate owl:sameAs. But only δ_Facebook has a linkset to δ_LinkedMDB with the predicate facebook:likes, and therefore Q^new_tp1 = {δ_Facebook}. In this case, tp_2 can only be queried on the datasets which are linked to δ_LinkedMDB with the predicate owl:sameAs, and thus Q^new_tp2 = {δ_DBpedia}. Triples in other datasets whose subjects correspond to ?film according to tp_2 cannot include objects that satisfy the object of tp_1. As done in the other Linking Analysis methods, the internal and external relevant datasets inferred by Rule 10 and Rule 11 are unified according to Rule 3.

The last analysis from the Shared Variable Analysis perspective considers the triple patterns which have the same subject, and is called Subject Sharing Triple Patterns Analysis. This rule does not use the Linking Analysis perspective, and thus it does not contain Internal and External Linking Analysis. Consider the following example triple patterns for Subject Sharing Triple Pattern Analysis:

?city dbpprop:name "Izmir" .
?city dc:terms ?subject.

According to our dataset definition, triples which have the same subject are included by the same dataset. Based on this, Rule 12 infers the datasets which are relevant for the triple patterns by taking the intersection of their selected sets into account. Assume that the selected sets for the triple patterns are Q_tp1 = {δ_DBpedia} and Q_tp2 ≡ ∆. According to this rule, the final datasets are Q^new_tp1 ≡ Q^new_tp2 = {δ_DBpedia}.

Rule 12. If there is a triple pattern whose subject is the same as the subject of another triple pattern, then the datasets included by the selected sets of both triple patterns are relevant.

∀δ_x, tp_i, tp_j ((s_tpi = s_tpj) ∧ (δ_x ∈ (Q_tpi ∩ Q_tpj)) → ρ(δ_x, tp_i) ∧ ρ(δ_x, tp_j)) where s_tpi ∈ V

Up to this point, we have given the relevant dataset discovery rules which are used to determine relevant datasets from different perspectives. These rules are executed together for a query to make a complete analysis. The next subsection introduces the analysis process, which specifies the execution order of the rules.

3.1.2 Analysis Process

The rules introduced above should be executed together to provide effective dataset selection. In this section, the execution of the rules is explained in the process of the DatasetAnalyzer, which is shown in Figure 3.3. In the figure, Q_BGP = {Q_tpi | tp_i ∈ BGP} is the set of the selected sets of all triple patterns in a query. We use Q^init_BGP to represent the initial state, where the selected set of each triple pattern contains the whole web of data: ∀Q_tpi ∈ Q^init_BGP (Q_tpi ≡ ∆).
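The analysis process described in this subsection can be read as a fixpoint computation: single-step rules run once, and shared-variable rules repeat until no selected set shrinks. The following Python sketch reflects our reading of that process; the function names and rule signatures are our own illustration, not WoDQA's API:

```python
# Illustrative driver for the DatasetAnalyzer process: single-step rules
# run once; shared-variable rules repeat until a fixpoint is reached.

def analyze(bgp, all_datasets, single_step_rules, shared_variable_rules):
    # Q^init: every triple pattern starts with the whole web of data (Δ).
    selected = {tp: set(all_datasets) for tp in bgp}

    # Single-step phase: these rules do not depend on current selected sets,
    # so each is applied exactly once, shrinking Q_tp by intersection.
    for rule in single_step_rules:
        for tp in bgp:
            selected[tp] &= rule(tp, all_datasets)

    # Repetitive phase: shared-variable rules are iterated until no
    # selected set changes (Q^new ≡ Q^constrained).
    changed = True
    while changed:
        changed = False
        for rule in shared_variable_rules:
            new = rule(dict(selected))
            for tp, qs in new.items():
                if qs < selected[tp]:  # strict shrink => keep iterating
                    selected[tp] = qs
                    changed = True
    return selected

# Tiny demonstration with stand-in rules:
ALL = {"DBpedia", "LinkedMDB"}
vocab_match = lambda tp, ds: {"LinkedMDB"} if tp == "tp2" else set(ds)

def chaining(sel):
    # Shared-variable style rule: both patterns must agree on a dataset.
    q = sel["tp1"] & sel["tp2"]
    return {"tp1": q, "tp2": q}

result = analyze(["tp1", "tp2"], ALL, [vocab_match], [chaining])
```

The demonstration mirrors the chaining example: the vocabulary match pins tp2 to LinkedMDB, and the repetitive phase then propagates that constraint to tp1.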
This set is the input of the single step analysis, and the selected sets in it are constrained by executing the rules of the IRI-based analyses.

Figure 3.3: WoDQA Analysis Process

The single step analysis includes the vocabulary match, linking-to-IRI and IRI-links-to rules, which are executed only once. The reason is that the rules based on IRI-based analysis produce the same result in every execution, because they do not depend on the current Q_tpi. Although the rules in the single step analysis do not have a specific order, using the output of each rule in the dataset elimination method reduces the current dataset sets of the triple patterns. Then, Q^constrained_BGP is given to the repetitive analysis phase.

On the other hand, the rules based on Shared Variable Analysis take more than one triple pattern into consideration. Since different combinations of triple patterns affect each other's selected sets, these rules depend on the current selected sets of the triple patterns to discover the datasets relevant to them. For this reason, they are executed repetitively until no dataset is eliminated from any Q_tpi. The repetitive analysis phase produces Q^new_BGP by executing the Shared Variable Analysis rules. After the phase is completed once, if an elimination has been done, Q^new_BGP is given to the repetitive analysis as Q^constrained_BGP, and the phase is repeated. On the other hand, when the rules do not change any selected set, i.e. Q^new_BGP ≡ Q^constrained_BGP, the analysis is finished and the result is produced as Q^final_BGP.

3.2 Query Reorganizer

The QueryReorganizer module is responsible for rewriting a query using the final selected set of each triple pattern (Q^final_BGP) decided by the DatasetAnalyzer. While rewriting a federated query, the Query Reorganizer conforms to the SPARQL 1.1 federation extension^16.

While the initial query is a set of triple patterns, the QueryReorganizer divides the query into sub-queries and makes it a set of service graph patterns, each of which is represented with a tuple sgp_α = ⟨Srv_α, SubTp_α⟩. A service graph pattern consists of a Srv set, which includes the datasets^17 to which the sub-query is sent, and a SubTp set, which is a subset of the triple patterns of the initial query, i.e. a sub-query. Performing a service graph pattern means unifying the results of the sub-query from each dataset in Srv_α.

Triple patterns which have the same selected set (Q_tpi) are added to the same service graph pattern to decrease the networking cost. But only consecutive triple patterns can be in the same sub-query, because WoDQA does not change the evaluation order of the query. In this direction, the elements of the SubTp sets of the service graph patterns are found by Algorithm 1.

Algorithm 1 This algorithm divides a query into sub-triples
FUNCTION DivideTriples()
INPUT bgp = {tp_1, ..., tp_n} including n triple patterns;
LET i := 1, α := 1;
LET SubTriples_α := {tp_i};
WHILE i < n DO
  IF Q_tpi+1 = Q_tpi THEN
    LET SubTriples_α := SubTriples_α ∪ {tp_i+1};
  ELSE
    LET SubTriples_α+1 := {tp_i+1};
    LET α := α + 1;
  LET i := i + 1;

The datasets of a sub-query are formalized as Srv_α = {δ_x | δ_x ∈ Q_tpi, tp_i ∈ SubTp_α}, and the service endpoint URLs are procured from the VOID documents of the datasets. The output of the Query Reorganizer is shown as ReorganizedBGP = ⟨sgp_1, ..., sgp_m⟩, where 1 ≤ m ≤ n and n is the number of triple patterns of the initial query. ReorganizedBGP represents the federated form of the initial query, which contains ordered service graph patterns. WoDQA executes the reorganized query using the Jena ARQ query engine.

Besides grouping triple patterns, WoDQA does not include a full query optimization phase of the query federation approach; only the moving up FILTER expressions [2] optimization technique is used. Thus, intermediate results are filtered as early as possible. Furthermore, WoDQA supports queries that include the UNION and OPTIONAL keywords, but queries that include the GRAPH keyword and blank nodes are not supported.

4. USAGE SCENARIO

There are two ways for users to benefit from WoDQA. The first one is the SPARQL endpoint of WoDQA^18, which can be used to redirect raw queries. One can construct a SPARQL query with a SERVICE block including the raw query, and use the WoDQA SPARQL endpoint as the remote service. When this query is executed, the WoDQA endpoint is invoked; this endpoint transforms the query into a federated form by means of WoDQA and executes the federated query on the relevant datasets transparently to the user.

The other way is using the web form of WoDQA^19. In this section, a sample query execution on the web form of WoDQA is explained. The reorganized form of the query, the results of select and construct queries, and the execution time can be observed in this form.

The example query seen in the WoDQA web form in Figure 4.1 searches for an answer to "Which Facebook users like movies which are produced by a German producer?". This query is represented as BGP = ⟨tp_1, ..., tp_5⟩ where
tp_1 = ⟨?faceUser, facebook:likes, ?movie⟩,
tp_2 = ⟨?movie, linkedMDB:producer, ?producer⟩,
tp_3 = ⟨?dbProducer, owl:sameAs, ?producer⟩,
tp_4 = ⟨?anyMovie, dbpo:producer, ?dbProducer⟩,
tp_5 = ⟨?dbProducer, dbpo:birthPlace, dbpedia:Germany⟩.

We explain how the relevant datasets are found according to the WoDQA analysis process introduced in Figure 3.3. Initially, the single step analysis phase is performed for this query. The selected sets of tp_1 and tp_2 are narrowed via the output of the predicate vocabulary match, and as a consequence of the single step analysis they are Q_tp1 = {δ_Facebook} and Q_tp2 = {δ_LinkedMDB}.

^16: http://www.w3.org/TR/sparql11-federated-query/
^17: In the implementation, the SPARQL endpoints of the datasets are used in SERVICE expressions.
^18: The up-to-date WoDQA SPARQL endpoint address can be found on the http://seagent.ege.edu.tr/etmen/wodqa.html page.
^19: http://seagent.ege.edu.tr/etmen/wodqa.html
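Algorithm 1 groups consecutive triple patterns with identical selected sets into one sub-query, preserving the evaluation order. A Python sketch of that grouping, applied to the example query with illustrative final selected sets (the per-pattern sets below are our own stand-ins):

```python
# Sketch of Algorithm 1 (DivideTriples): consecutive triple patterns with
# identical selected sets are placed in the same sub-query; a change in
# the selected set starts a new sub-query. Assumes at least one pattern.

def divide_triples(bgp, selected):
    groups = [[bgp[0]]]
    for prev, curr in zip(bgp, bgp[1:]):
        if selected[curr] == selected[prev]:
            groups[-1].append(curr)   # same selected set: same sub-query
        else:
            groups.append([curr])     # selected set changed: new sub-query
    return groups

# Example BGP with illustrative selected sets (abbreviated names).
bgp = ["tp1", "tp2", "tp3", "tp4", "tp5"]
selected = {
    "tp1": {"Facebook"},
    "tp2": {"LinkedMDB"},
    "tp3": {"DBpedia"},
    "tp4": {"DBpedia"},
    "tp5": {"DBpedia"},
}

groups = divide_triples(bgp, selected)
# tp3..tp5 are consecutive and share the same selected set, so they form
# one sub-query to be sent in a single SERVICE block.
```

Each resulting group would become the SubTp set of one service graph pattern, with its Srv set taken from the shared selected set.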
Figure 4.1: Example query execution with WoDQA

property and owl is not defined as a vocabulary property in VOIDs. Thus, Qtp3 still includes all datasets (∆). Predicate vocabulary matching discovers the relevant datasets for tp4 and tp5, and their selected sets are Qtp4 = Qtp5 = {δDBpedia}.
   After the single-step analysis is applied to all triple patterns, the repetitive analysis phase is performed, and the selected set of tp3 is narrowed in this phase. Subject-Sharing Triple Patterns Analysis discovers the relevant datasets for tp3, since tp3 and tp5 have the same subject variable, and thus its selected set becomes Qtp3 = {δDBpedia}.
   The reorganized query shown in Figure 4.1 is rendered from these analysis results by the QueryReorganizer and formalized as ReorganizedBGP = ⟨sgp1, sgp2, sgp3⟩, where
   sgp1 = ⟨{δFacebook}, {tp1}⟩,
   sgp2 = ⟨{δLinkedMDB}, {tp2}⟩,
   sgp3 = ⟨{δDBpedia}, {tp3, tp4, tp5}⟩.
   After the reorganization process, the triple patterns in the service graph patterns are executed on the related endpoints by means of Jena ARQ, and the query results are collected incrementally. Finally, the results of the query are listed at the bottom of the web form page, as seen in Figure 4.1.

                    Table 1: Prefix definitions
   rdf:
   dbpedia:
   dbpprop:
   linkedMDB:
   owl:
   dc:
   foaf:
   facebook:

5.   CONCLUSION
   In this paper, we have introduced a query federation engine called WoDQA that discovers the datasets relevant to a query in a VOID store and distributes the query over these datasets. The novelty of our approach is its exhaustive dataset selection mechanism, which analyzes the relations between triple patterns and the links between datasets, in addition to analyzing the datasets for each triple pattern. WoDQA focuses on discovering relevant datasets and eliminating irrelevant ones using the rule-based approach introduced in this paper. Our approach requires VOID descriptions which include a SPARQL endpoint to query the dataset, reflect the actual content of the dataset completely and accurately, and include linksets between datasets so that datasets can be selected effectively. WoDQA allows users to construct raw queries without needing to know how the query will be divided into sub-queries and where the sub-queries will be executed. Query results are complete under the assumption of available, accurate and complete VOID descriptions of the datasets.
   The initial version of WoDQA introduced in this paper has some disadvantages arising from the query federation approach upon which it builds. As mentioned previously, follow-your-nose has problems such as missing results and the retrieval of large documents, and similar problems may occur for query federation. First, finding complete results for queries requires that the metadata of all datasets be well-defined and accurate. Providing such accurate dataset metadata requires an automated mechanism which continuously updates the metadata. However, even if a tool implemented this requirement, providing accurate dataset metadata via such a tool would remain the responsibility of the dataset publishers.
   Other problems of query federation are the high latency and low selectivity of datasets, which are similar to the retrieval of large documents in follow-your-nose. Query optimization can be a solution to these problems. Grouping triple patterns to filter more triples on an endpoint can reduce latency (required processing time), and changing the query evaluation order according to dataset selectivity statistics can prevent retrieving large result sets. To make WoDQA function in the wild, the optimization step of query federation needs to be implemented. We plan to incorporate triple pattern selectivity into query reorganization using the VOID properties about statistics.
   On the other hand, we could not evaluate our approach in this paper, because the VOID documents in current VOID stores are not well-defined. Since SPARQL endpoint definitions, linkset descriptions or vocabularies are missing from most VOID documents, we did not have the chance to execute comprehensive scenarios. Developing a tool which extracts well-defined VOID descriptions of datasets, and thereby evaluating our approach, is required future work to confirm the applicability of WoDQA to linked open data. Also, evaluating the analysis cost of WoDQA for a large VOID store will become possible when well-defined VOIDs are constructed.

6.   REFERENCES
 [1] K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. Describing Linked Datasets - On the Design and Usage of voiD, the 'Vocabulary of Interlinked Datasets'. In WWW 2009 Workshop: Linked Data on the Web (LDOW2009), Madrid, Spain, 2009.
 [2] Abraham Bernstein, Christoph Kiefer, and Markus Stocker. OptARQ: A SPARQL Optimization Approach based on Triple Pattern Selectivity Estimation. Technical Report ifi-2007.03, Department of Informatics, University of Zurich, 2007.
 [3] Chris Bizer, Tom Heath, Danny Ayers, and Yves Raimond. Interlinking open data on the web. www4.wiwiss.fu-berlin.de/bizer/pub/LinkingOpenData.pdf, 2007. Accessed 12.5.2009.
 [4] Paolo Bouquet, Chiara Ghidini, and Luciano Serafini. Querying the web of data: A formal approach. In Proceedings of the 4th Asian Conference on The Semantic Web, ASWC '09, pages 291–305, Berlin, Heidelberg, 2009. Springer-Verlag.
 [5] Carlos Buil, Marcelo Arenas, and Óscar Corcho. Semantics and optimization of the SPARQL 1.1 Federation Extension. In Proc. of 8th Extended Semantic Web Conference (ESWC 2011), Heraklion, Crete, Greece, volume 6644 of Lecture Notes in Computer Science, pages 1–15. Springer, 2011.
 [6] M. Duerst and M. Suignard. Internationalized Resource Identifiers (IRIs). RFC 3987 (Proposed Standard), January 2005.
 [7] Olaf Görlitz and Steffen Staab. Federated data management and query optimization for linked open data. In Athena Vakali and Lakhmi Jain, editors, New Directions in Web Data Management 1, volume 331 of Studies in Computational Intelligence, pages 109–137. Springer Berlin / Heidelberg, 2011.
 [8] Olaf Görlitz and Steffen Staab. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. In Proceedings of the 2nd International Workshop on Consuming Linked Data, Bonn, Germany, 2011.
 [9] Peter Haase, Tobias Mathäß, and Michael Ziller. An evaluation of approaches to federated query processing over linked data. In Proceedings of the 6th International Conference on Semantic Systems, I-SEMANTICS '10, pages 5:1–5:9, New York, NY, USA, 2010. ACM.
[10] Olaf Hartig. Zero-knowledge query planning for an iterator implementation of link traversal based query execution. In Grigoris Antoniou, Marko Grobelnik, Elena Paslaru Bontas Simperl, Bijan Parsia, Dimitris Plexousakis, Pieter De Leenheer, and Jeff Pan, editors, ESWC (1), volume 6643 of Lecture Notes in Computer Science, pages 154–169. Springer, 2011.
[11] Olaf Hartig, Christian Bizer, and Johann Christoph Freytag. Executing SPARQL queries over the web of linked data. In International Semantic Web Conference, pages 293–309, 2009.
[12] Olaf Hartig and Andreas Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum, Semantic Web Special Issue, July 2010.
[13] Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, San Rafael, CA, 1 edition, 2011.
[14] Bastian Quilitz and Ulf Leser. Querying Distributed RDF Data Sources with SPARQL. In Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, editors, The Semantic Web: Research and Applications, volume 5021 of Lecture Notes in Computer Science, chapter 39, pages 524–538. Springer Berlin / Heidelberg, 2008.
[15] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. FedX: A federation layer for distributed query processing on linked open data. In Grigoris Antoniou, Marko Grobelnik, Elena Simperl, Bijan Parsia, Dimitris Plexousakis, Pieter De Leenheer, and Jeff Pan, editors, The Semantic Web: Research and Applications, volume 6644 of Lecture Notes in Computer Science, pages 481–486. Springer Berlin / Heidelberg, 2011.
[16] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. FedX: Optimization techniques for federated query processing on linked data. In Lora Aroyo, Chris Welty, Harith Alani, Jamie Taylor, Abraham Bernstein, Lalana Kagal, Natasha Noy, and Eva Blomqvist, editors, The Semantic Web - ISWC 2011, volume 7031 of Lecture Notes in Computer Science, pages 601–616. Springer Berlin / Heidelberg, 2011.