<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>One Query to Bind Them All</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel M. Herzig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thanh Tran</string-name>
          <email>duc.trang@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute AIFB Karlsruhe Institute of Technology 76128 Karlsruhe</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Recently, SPARQL became the standard language for querying RDF data on the Web. Like other formal query languages, it applies a Boolean-match semantics, i.e. results adhere strictly to the query. Thus, queries formulated for one dataset cannot easily be reused for querying other datasets. If another target dataset is to be queried, the queries need to be rewritten using the vocabulary of the target dataset while preserving the captured information need. This is a tedious manual task, which requires knowledge of the target vocabulary and often relies on computationally expensive techniques such as mapping, data consolidation, or reasoning methods. Given the scale as well as the dynamics of Web datasets, even automatic rewriting is often infeasible. In this paper, we elaborate on a novel approach that allows existing SPARQL queries adhering to one dataset to be reused to search for entities in other datasets, which are neither linked nor otherwise integrated beforehand. We use the results returned by the given seed query to construct an entity relevance model (ERM), which captures the content and the structure of relevant results. Candidate entities from the target dataset are obtained using existing keyword search techniques and subsequently ranked according to their similarity to the ERM. During this ranking task, we compute mappings between the structure of the ERM and that of the candidates on-the-fly. The effectiveness of this approach is shown in experiments using large-scale datasets and compared to a keyword search baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Publishing structured data on the Web according to the Linked Data principles has been advocated by the Linked Data movement and the Semantic Web community, and recently it also receives increasing support from companies like Yahoo!, Google, and Facebook, as well as from governmental institutions. The amount of Linked Data on the Web is increasing rapidly and is now in the order of several billion RDF triples. In order to query these structured data effectively, SPARQL, an expressive formal query language, has been developed, which became the standard as recommended by the W3C in 2008 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, like other formal query languages, e.g. SQL for relational databases, this query language applies a Boolean-match semantics, which means that results to a query are crisp and strictly adhere to the specified constraints in the query. As a consequence, these queries are bound to specific datasets, respectively the corresponding vocabularies. Whereas the database community relies mainly on query rewriting [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the answer of the Linked Data community to this problem is (1) interlinking datasets manually and with the help of mapping and consolidation techniques [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], (2) promoting the reuse of vocabularies, and (3) exploiting the semantics of RDF for reasoning [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. All these data integration approaches require high upfront investments, given the scale as well as the heterogeneity of Web datasets, before the actual querying across datasets is possible. Moreover, most of these integration steps need to be performed again if the underlying data changes, which is frequently the case in the dynamic data Web scenario. Another way to bridge this schema and data mismatch is to apply information retrieval techniques, in particular keyword search, which crosses the gap between two datasets by simply ignoring the structure and applying the 'bag of words' notion not just to webpages, but also to data retrieval [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>In this paper, we elaborate on a novel approach that allows searching for entities in a target dataset immediately, without any prior integration, using a star-shaped SPARQL query adhering to the vocabulary of another dataset. This approach combines the flexibility of unstructured information retrieval solutions with the nature of database-style querying by incorporating the structure of the underlying data. The idea is to start with a structured seed query specified for one particular data source. Based on the content as well as the structure of results obtained from this data source, we construct an Entity Relevance Model (ERM) that can be seen as an abstract representation of relevant results. Instead of relying on mappings for rewriting the structured seed query, we simply treat the seed query as a keyword query and submit it against other data sources to obtain additional results. Then, we create mappings between the structure of candidate results and the structure of the ERM at runtime, which are used for the subsequent matching and ranking. Candidates matching more closely the content as well as the structure captured by the ERM are ranked higher. This way, we can use unstructured keywords for querying but, at the same time, incorporate structure information into the matching and ranking that are part of the search process.</p>
      <p>We investigate the effectiveness of our approach in experiments using DBpedia as the source dataset and an RDF representation of IMDb as the target dataset. Our approach is not limited to these datasets but is general enough to be applied to any datasets. However, the scenario involving DBpedia also illustrates an important use case: as the nucleus of the Web of data, DBpedia is probably the best-known dataset, and many users already have SPARQL queries or know how to create such queries for DBpedia. In addition, many applications are built on top of DBpedia, which will also benefit if the underlying retrieval incorporates entities from other datasets on-the-fly.</p>
      <p>
        Contributions. The contributions of this work can be summarized as follows: (1) We propose a novel approach for searching Web data sources that does not rely on upfront data integration, but uses an Entity Relevance Model (ERM) for searching as well as for computing mappings on-the-fly, in line with the pay-as-you-go data integration [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] paradigm. (2) We provide one specific implementation of this approach and show how an ERM can be constructed and applied to rank search results, exploiting mappings created at runtime. (3) In experiments, we investigate the feasibility and effectiveness of our approach and compare it to a common keyword search baseline.
      </p>
      <p>Outline. Section 2 defines the research problem, gives an overview, and briefly sketches our new approach. This approach of relevance-based on-the-fly mappings is presented in detail in Section 3. Evaluation results are presented in Section 4. Section 5 discusses the related work before we conclude in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Overview</title>
      <p>In this section we define the setting of the addressed problem, present existing solutions, and briefly sketch the framework we propose to deal with this problem.</p>
      <p>Data on the Web. The problem we address is situated in a Web data scenario. Therefore, the kind of Web data that is of most interest is RDF data. For reasons of simplicity and generality, we employ a generic graph-based data model that omits specific RDF features such as blank nodes. In this model, entity nodes are RDF resources, literal nodes correspond to RDF literals, attributes are RDF properties, and edges stand for RDF triples.</p>
      <p>Data Graph. The data graph is a directed and labeled graph G = (N, E). The set of nodes N is a disjoint union of entities N_E and literals N_L, i.e. N = N_E ⊎ N_L. The edges E can be conceived of as a disjoint union E = E_E ⊎ E_L of edges representing connections between entities, i.e. a(e1, e2) ∈ E_E iff e1, e2 ∈ N_E, and connections between entities and literals, i.e. a(e1, v) ∈ E_L iff e1 ∈ N_E and v ∈ N_L. Given this graph structure, we define the set of edges including the targeted nodes, A(e) = {a(e, ·) ∈ E}, as the description of the entity e ∈ N_E, and each member a(e, ·) ∈ A(e) is called an attribute of e. Thus, we in fact neglect the distinction between E_E and E_L and simply refer to all edges as attributes. The set of distinct attribute labels of an entity e, i.e. A'(e) = {a | a ∈ A(e)}, is called the model of e.</p>
      <p>
        Research Problem. Given this model of Web data, structured queries can be specified to search over such datasets. The most commonly used language for querying RDF data on the Web is SPARQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One essential feature of SPARQL is the Basic Graph Pattern (BGP), which conceptually resembles the notion of conjunctive queries that constitute an important fragment of other widely used structured query languages such as SQL. Basically, a BGP is a set of conjunctive triple patterns, each of the form predicate(subject, object). They represent patterns because each of predicate, subject, and object might either be a variable or be specified explicitly (as a constant). Answering these queries amounts to the task of graph pattern matching, where subgraphs in the data graph matching the query pattern are returned as results. Predicates are matched against attributes in the data, whereas bindings to subjects and objects in the query are entities and literal values, respectively.
      </p>
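For star-shaped queries, this graph pattern matching reduces to checking each candidate entity against every triple pattern of the star. A minimal sketch, with variables reduced to the single center node (data and pattern values are illustrative):

```python
def match_entity_query(graph, patterns):
    """Return entities e such that every (attribute, value) pattern of
    the star query is matched by some edge a(e, value) of e.
    A pattern value of None acts as a free variable (matches anything)."""
    results = []
    for e, edges in graph.items():
        if all(any(a == pa and (pv is None or v == pv) for (a, v) in edges)
               for (pa, pv) in patterns):
            results.append(e)
    return results

graph = {
    "m1": [("director", "fassbinder"), ("released", "1982"), ("type", "film")],
    "m2": [("director", "spielberg"), ("released", "1982"), ("type", "film")],
}
q = [("director", "fassbinder"), ("type", "film")]
```

This Boolean-match behavior is exactly what ties the query to one vocabulary: renaming `director` to `i:directors` in the data makes the same pattern return nothing.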
      <p>
        One particular form of BGP with high importance are so-called entity queries. Essentially, they are star-shaped queries, with the node in the center of the star representing the entity (or entities) to be retrieved. Figure 3 provides several examples. According to a recent Web query log study performed by Yahoo! researchers, this type of query is the most common on the Web [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and an analysis of SPARQL logs showed that it is also the most common type used in the Semantic Web context. Further, most of the current Semantic Web search engines, such as Sig.ma and Falcons [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], focus on answering these queries. For the sake of clarity, we also focus on this type of query in this paper to illustrate the main ideas underlying our approach.
      </p>
      <p>Problem. The problem is finding relevant entities in a set of target datasets D_t, given a source dataset D_s and an entity query q_s adhering to the vocabulary of D_s. Clearly, if there are no syntactical differences at the level of schema and data as discussed above, then q_s can directly be used to retrieve information from D_t. However, given that Web data are heterogeneous in that sense, the following two strategies can be applied.</p>
      <p>Baseline KW. The first and most widespread solution to this end is to use keyword search over so-called 'bag-of-words' representations of entities. That is, the description of an entity is simply a set of terms. A query is also a set of terms, which is then matched against the flat term-based representation of the data. This unstructured approach bridges differences at the level of schemas and data simply by ignoring the fine-grained semantics and structure available in the entity descriptions, as illustrated in Fig. 1. Clearly, this approach is simple but also flexible in the sense that the same keyword query specified for D_s can also be used for D_t, because results from D_t can be obtained whenever there are matches at the level of terms.</p>
      <p>(2)  director rainer werner fassbinder released 1982 type film</p>
      <p>Our approach. In this paper, we present a framework to address this problem of searching for entities in Web data using on-the-fly mappings computed in a pay-as-you-go fashion based on entity relevance models (ERM). This framework is instantiated in the following three steps. (1) First, we compute an ERM from the results returned from the source dataset D_s using q_s. (2) Second, we present a light-weight on-the-fly integration technique, which maps the structure of result candidates from the target datasets D_t to the structure of the ERM. (3) Finally, the result candidates are ranked according to their similarity to the ERM using the mappings computed at runtime.</p>
    </sec>
    <sec id="sec-3">
      <title>Entity Search Over Web Data</title>
      <p>In this section, we present how the entity relevance model is constructed and discuss how this model can be exploited for ranking and relevance-based on-the-fly data integration.</p>
      <p>
        Entity Relevance Model. We aim to build a model that captures the structure and content of entities relevant to the information need expressed in the seed entity query q_s. The proposed model is called the Entity Relevance Model (ERM). It is based on the notion of the structured relevance model that has been proposed before [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        The ERM is a composite model that consists of several atomic language models [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and is constructed as follows. Let q_s be the query submitted against the source dataset D_s, and let R_s = {e_1, ..., e_i, ..., e_m} be the set of m entities obtained for this query. Across all m entities, we observe n distinct attribute labels, a_s^1, ..., a_s^j, ..., a_s^n. For each observed attribute label a_s^j, we add a field a_s^j to the ERM and create an attribute-specific language model ERM_{a_s^j} to instantiate this field. This atomic model ERM_{a_s^j}(w) specifies the probability of w occurring in the field a_s^j, for every word w ∈ V, where V is the vocabulary of all words. ERM_{a_s^j}(w) is computed from the entity descriptions a_s^j ∈ A(e), for all e ∈ R_s. In particular, every ERM_{a_s^j}(w) captures the values v (i.e. the content) of the attributes with label a_s^j and is defined as follows:
      </p>
      <p>ERM_{a_s^j}(w) = ( Σ_{e ∈ R_s} Σ_{v ∈ a_s^j} n(w, v) ) / ( Σ_{e ∈ R_s} Σ_{v ∈ a_s^j} |v| )    (1)</p>
      <p>where n(w, v) denotes the number of occurrences of word w in the content value v of attribute a_s^j, and |v| denotes the length of v. Note that the outer sums run over the entities e ∈ R_s returned by q_s, and the inner sums over all values v of attributes with label a_s^j. Thus, entity descriptions that do not have the attribute a_s^j do not contribute to ERM_{a_s^j}(w). In order to capture the importance of the atomic attribute-specific models, we define the probability of occurrence P(a_s^j) for every ERM_{a_s^j}(w) as P(a_s^j) = n(a_s^j, e_{1...m}) / m, where the numerator denotes the number of entities having attributes with label a_s^j and m = |R_s| is the total number of entity results. In summary, an ERM has n fields; each corresponds to an attribute with a specific label and has a corresponding language model, which in turn has a probability of occurrence. An example of an ERM constructed from two entities is illustrated in Fig. 2.</p>
      <p>Fig. 2 (2), the ERM, with fields a_s, occurrence probabilities P(a_s), and word probabilities w : ERM_{a_s}(w):
label | 1 | world:0.2, on:0.2, wires:0.2, ...
starring | 0.5 | klaus:0.25, lowitsch:0.25, barbara: ...
director | 1 | rainer:0.33, werner:0.33, fassbinder:0.33
released | 1 | 1973:0.5, 1982:0.5
language | 0.5 | english:1</p>
      <p>type | 1 | film:1</p>
      <p>Fig. 2 (3), a candidate entity e_t, with attributes a_t and word probabilities w : e_t^{a_t}(w):
i:title | e:0.33, t:0.33, 1994:0.33
i:actors | coyote:0.5, peter:0.5
i:directors | spielberg:0.33, steven:0.33, i:0.33
i:producer | spielberg:0.33, steven:0.33, i:0.33
type | movie:1</p>
      <p>
We tackle the problem of searching over heterogeneous data in a way similar to entity consolidation. That is, given the results e_s ∈ R_s obtained from the source dataset for the seed query, we aim to find out which entities in the target datasets are similar to R_s. We use the ERM as the model of those relevant results. In particular, we estimate which entities e_t of D_t are relevant for the query q_s by measuring their similarity to the ERM and rank them by decreasing similarity. First, we capture the values of each attribute a_t^k of an entity e_t using a language model e_t^{a_t^k}(w). Similar to the ERM, this model is defined based on the number of word occurrences as follows:</p>
      <p>e_t^{a_t^k}(w) = ( Σ_{v ∈ a_t^k} n(w, v) ) / ( Σ_{v ∈ a_t^k} |v| )    (2)</p>
      <p>As before, the sum runs over all values of attributes with label a_t^k, n(w, v) denotes the number of occurrences of w in v, and |v| denotes the length of v. Fig. 2 (3) illustrates an example.</p>
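Equations 1 and 2 are the same maximum-likelihood estimate, applied once across all source results and once to a single candidate entity. A sketch of the ERM side, including the field probabilities P(a_s^j) (the input format and all names are illustrative):

```python
from collections import Counter, defaultdict

def build_erm(results):
    """results: list of entity descriptions, each a dict mapping an
    attribute label to a list of value strings.  Returns (erm, p):
    erm[a][w] = ERM_a(w) as in Eq. 1, and p[a] = P(a), the fraction
    of result entities that carry attribute a."""
    word_counts = defaultdict(Counter)  # label -> word occurrence counts
    lengths = Counter()                 # label -> total words over all values
    present = Counter()                 # label -> number of entities with it
    for entity in results:
        for a, values in entity.items():
            present[a] += 1
            for v in values:
                words = v.lower().split()
                word_counts[a].update(words)
                lengths[a] += len(words)
    erm = {a: {w: c / lengths[a] for w, c in wc.items()}
           for a, wc in word_counts.items()}
    p = {a: present[a] / len(results) for a in present}
    return erm, p

r1 = {"director": ["Rainer Werner Fassbinder"], "released": ["1982"]}
r2 = {"director": ["Rainer Werner Fassbinder"], "released": ["1973"],
      "language": ["English"]}
erm, p = build_erm([r1, r2])
```

On this two-entity input the estimates reproduce the shape of Fig. 2: each director word gets probability 1/3, and `language`, present in only one of the two results, gets P = 0.5.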
      <p>The similarity of a candidate entity e_t from the target dataset D_t to the ERM is calculated as the sum of the weighted similarities between the language model of each field a_s^j and the language model of a_t^k, measured by their cross entropy H:</p>
      <p>Sim(ERM, e_t) = Σ_{a_s^j} α_{a_s^j} · P(a_s^j) · H(ERM_{a_s^j} || e_t^{a_t^k})    (3)</p>
      <p>In particular, we apply this similarity calculation only when we know which field a_s^j of the ERM should be matched against which attribute a_t^k of e_t. We address this problem in the next section and show how the ERM can be exploited to create on-the-fly schema mappings, i.e. mappings between an attribute a_t^k and a field a_s^j of the ERM. Equation 3 applies to corresponding pairs of attribute and field. If there is no mapping between a_s^j and a_t^k, then we use a "maximum distance". This distance is computed as the cross entropy between ERM_{a_s^j} and a language model that contains all words in the vocabulary except the ones of ERM_{a_s^j}.</p>
      <p>The similarity for each field a_s^j is weighted by the occurrence probability of this field, P(a_s^j), and the parameter α_{a_s^j}. This parameter gives us the flexibility to boost the importance of attributes that occur in the query q_s as follows:
α_{a_s^j} = 1 if a_s^j ∉ q_s, and α_{a_s^j} = b with b &gt; 1 if a_s^j ∈ q_s.    (4)</p>
      <p>For constructing the language models of the ERM and of the candidate
entities, a maximum likelihood estimation has been used, which is proportional
to the count of the words in an attribute value. However, such an estimation
assigns zero probabilities to those words not occurring in the attribute value.</p>
      <p>In order to address this issue, e_t^{a_t^k}(w) is smoothed using a collection-wide model c_s(w), which captures the probability of w occurring in the entire dataset D_s. This smoothing is controlled by the Jelinek-Mercer parameter λ. As a result, the entropy H is calculated over the vocabulary V_{a_s^j} of field a_s^j as:</p>
      <p>H(ERM_{a_s^j} || e_t^{a_t^k}) = − Σ_{w ∈ V_{a_s^j}} ERM_{a_s^j}(w) · log( λ · e_t^{a_t^k}(w) + (1 − λ) · c_s(w) )    (5)</p>
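Given the field models, Equations 3 to 5 can be sketched as follows. This assumes a mapping between ERM fields and candidate attributes is already available; cross entropy is low for close matches, so under this reading lower scores mean closer candidates, and the λ, b, and fallback values are illustrative choices, not the paper's:

```python
import math

def cross_entropy(erm_field, cand_field, collection, lam=0.8):
    # Eq. 5: H(ERM_a || e_t^a) with Jelinek-Mercer smoothing of the
    # candidate model by a collection-wide model c_s(w); the 1e-6
    # fallback for unseen collection words is an assumption.
    h = 0.0
    for w, p_w in erm_field.items():
        q_w = lam * cand_field.get(w, 0.0) + (1 - lam) * collection.get(w, 1e-6)
        h -= p_w * math.log(q_w)
    return h

def similarity(erm, p, candidate, mapping, collection, query_attrs, b=10.0):
    # Eq. 3 restricted to mapped field pairs; alpha (Eq. 4) boosts
    # fields that occur in the seed query.  The maximum-distance
    # penalty for unmapped fields is omitted in this sketch.
    score = 0.0
    for a_s, a_t in mapping.items():
        alpha = b if a_s in query_attrs else 1.0
        score += alpha * p[a_s] * cross_entropy(erm[a_s], candidate[a_t], collection)
    return score
```

A candidate whose mapped attribute shares the ERM field's words receives a much lower cross entropy than one with disjoint content, which is what drives the ranking.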
      <p>
        On-the-Fly Integration Using ERM. We want to determine which attribute of an entity needs to be compared to a given field of the ERM constructed for D_s. The ERM is not only used for search but is also exploited for this alignment task. The details of computing mappings between entity attributes e_t^{a_t^1...r} and ERM fields a_s^1...n are presented in Algorithm 1. The rationale of the algorithm is that a field a_s^j is aligned to an attribute a_t^k when the cross entropy H(ERM_{a_s^j} || e_t^{a_t^k}) between their language models is low, i.e. a mapping is established if H is lower than a threshold t (normalized based on the highest cross entropy; line 12). That is, entropy is used as a similarity measure, and the features used to compare attributes are their entire contents represented as language models. The algorithm performs n·r comparisons in the worst case for an ERM with n fields and an entity with r = |A'(e_t)| attribute labels. Note that this on-the-fly algorithm operates only on entities that are requested as part of the search process, i.e. n and r are rather small compared to full-fledged upfront integration that takes the entire schema into account (see Table 1 and Table 3). Notably, this computation comes for free: searching requires the same computation (Equation 3), so the entropy values computed here are kept and subsequently reused. Moreover, for faster performance, ERM fields with an occurrence probability P(a_s^j) &lt; c can be pruned because of their negligible influence. In addition, existing mappings can be used to reduce the number of comparisons even further.
      </p>
      <p>Algorithm 1: On-the-Fly Alignment</p>
      <p>
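The alignment step can be sketched as follows. This is a hedged reconstruction of the described procedure (pairwise cross entropies, normalized by the largest one, a mapping established when the normalized value falls below the threshold t), not the authors' exact Algorithm 1:

```python
import math

def cross_entropy(p_model, q_model, eps=1e-6):
    """H(p || q) over p's vocabulary, with a small floor eps (an
    assumption here) to avoid log(0) for unseen words."""
    return -sum(p_w * math.log(q_model.get(w, eps))
                for w, p_w in p_model.items())

def align(erm, candidate, t=0.75):
    """Map each ERM field to the candidate attribute with the lowest
    cross entropy, if that entropy, normalized by the highest one
    observed, falls below the threshold t."""
    pairs = {(a_s, a_t): cross_entropy(erm[a_s], candidate[a_t])
             for a_s in erm for a_t in candidate}
    h_max = max(pairs.values())
    mapping = {}
    for a_s in erm:
        a_t = min(candidate, key=lambda a: pairs[(a_s, a)])
        if h_max > 0 and pairs[(a_s, a_t)] / h_max < t:
            mapping[a_s] = a_t
    return mapping

erm = {"director": {"spielberg": 0.5, "steven": 0.5}}
cand = {"i:directors": {"spielberg": 0.5, "steven": 0.5},
        "i:title": {"jaws": 1.0}}
```

On this toy input, `director` aligns with `i:directors` because their models share all their words, while the entropy against `i:title` stays near the maximum and is rejected.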
Evaluation. In this section, we report on the experiments conducted using an implementation of the proposed approach. We performed the experiments with the following parameter setting: b = 10, c = 0.8, t = 0.75, and we follow the Cranfield [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] methodology for the experiments on search performance.</p>
      <p>Our experiments were conducted with two RDF Web datasets, DBpedia 3.5.1 and IMDb. DBpedia is a structured representation of Wikipedia that contains more than 9 million entities of various types, among them about 50k entities typed as films. The IMDb dataset is derived from www.imdb.com and transformed into RDF; it contains information on movies and films. (We thank Julien Gaugaz and the L3S Research Center for providing us their version of the IMDb dataset.)</p>
      <p>Our goal is to find relevant entities in the target dataset D_t for a given query q_s. We determine the relevant entities in D_t by manually rewriting the query q_s to obtain a structured query q_t adhering to the vocabulary of D_t. Fig. 3 shows such a set of queries: one of the queries serves as the source query q_s, and the results of the other two queries capture the ground truth for the IR experiments.</p>
      <p>Note that this ground truth is conservative in the sense that it considers only matches that can be obtained through query rewriting based on same-as mappings. However, some possibly relevant entities may either be represented differently (using attributes that cannot be aligned via same-as mappings) or may not contain information for some of the query predicates. Hence, this ground truth defines a lower bound on the actual performance.</p>
      <p>We created two query sets, each containing 23 BGP entity queries of different complexities, ranging from 2 to 4 triple patterns and yielding a varying number of relevant entities (see Table 2). The structured queries represent real-world information needs, e.g. retrieve "all movies directed by Steven Spielberg", "all movies available in English and also in Hebrew", or "all movies directed by Rainer Werner Fassbinder, which were released in 1982". The last query is illustrated in Fig. 3.</p>
      <p>We compare the performance of our ERM approach against Semplore [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], a real-world keyword search system based on Lucene. Semplore creates a virtual document for every entity description and uses the concatenations of attribute and attribute value as document terms. In the same way, we transform the structured query into a keyword query of disjunctive terms by using the concatenations of predicates and constants of the structured query as keywords. The resulting keyword query retrieves all virtual documents representing entity descriptions that contain some of the corresponding terms. Fig. 1 illustrates an example.</p>
      <p>We evaluate search performance using the standard IR measures precision, recall, mean average precision (MAP), and mean reciprocal rank (MRR). We retrieve the top five thousand entities, rank them, and compute the metrics based on the top one thousand entities returned by each method. We evaluate ERM against the baseline described above. ERM performs alignment on-the-fly and operates without any upfront integration or prior knowledge about the schemas. Comparing the performance of ERM and KW in Fig. 4(a), we observe that ERM outperforms KW: ERM yields a MAP of 0.55, whereas KW achieves a MAP of about 0.2.</p>
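The reported ranking metrics follow their standard definitions, which can be sketched as:

```python
def average_precision(ranked, relevant):
    """AP: mean over the relevant set of precision@k at each rank k
    where a relevant item appears in the ranking."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """RR: 1 / rank of the first relevant item, 0 if none retrieved."""
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / k
    return 0.0
```

MAP and MRR are then the means of these per-query values over all 23 queries of a query set.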
      <p>The robustness of the retrieval performance of ERM can be observed in Fig. 4(b), which shows the interpolated precision across the recall levels. We can observe that ERM considerably outperforms the KW baseline across all recall levels.</p>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
We have discussed related work throughout the paper. Basically, there are two existing lines of approaches: one is based on keyword search and the other on structured query rewriting. Our approach represents a novel combination of the two: it combines the flexibility of keyword search with the power of query rewriting. Just like keyword search, it does not rely on precomputed mappings. However, it is able to exploit the fine-grained structure of queries and results, which is the primary advantage of structured query rewriting. In addition, it can leverage existing mappings created by tools like [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
      </p>
      <p>
More precisely, there exist similar keyword search retrieval models that also take the structure of the underlying data into account. In fact, the model underlying our approach originates from the concept of language models [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which have been proposed for modeling resources and queries as multinomial distributions over the vocabulary terms. More precisely, the foundation of our work is established by Lavrenko et al., who propose relevance-based language models to directly capture the relevance behind documents and queries. In this context, structure information has been exploited for constructing structured relevance models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (SRM). This is the work most closely related to the ERM. The difference is that while the goal of SRM is to predict values of empty fields in a single-dataset scenario, the ERM targets searching in a completely different setting involving multiple heterogeneous datasets.
      </p>
      <p>
Besides the query rewriting [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] strategy, there are database-oriented approaches that bridge the schema and vocabulary gap by query relaxation [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. However, these approaches also need a precomputation step to either identify duplicates [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or derive alternative relations [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>
The proposed mapping technique is in principle similar to existing techniques, e.g. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], to the extent that it relies on the same features, i.e., the values of attributes. However, the use of language models for representing these features, as well as the similarity calculation based on entropy, is common for retrieval tasks but has not been applied to the schema mapping problem before. We consider this a promising approach for embedding the pay-as-you-go data integration paradigm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] into the search process.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have proposed a novel approach for searching Web datasets using one single structured seed query that adheres to the vocabulary of just one of the datasets. It is based on an entity relevance model computed from the seed query, which captures the structure and content of relevant results. This model is used both for matching and ranking relevant results and for performing data integration on-the-fly. The approach combines the flexibility of keyword search, in the sense that no upfront integration is required, with the power of structured query rewriting techniques, which comes from the use of the fine-grained structure of queries and results. The experiments showed that our approach outperforms a common keyword search baseline.</p>
      <p>We demonstrated our approach with one source and one target dataset. However, there is no inherent limitation on the number of target datasets. So, theoretically, one query can be used 'to bind them all'.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work was supported by the German Federal Ministry of Education and
Research (BMBF) under the iGreen project (grant 01IA08005K).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>W3C: Recommendation 15 January</source>
          <year>2008</year>
          ,
          <article-title>SPARQL Query Language for RDF</article-title>
          . http://www.w3.org/TR/rdf-sparql-query/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Calì</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Query rewriting and answering under constraints in data integration systems</article-title>
          . In: IJCAI. (
          <year>2003</year>
          )
          <volume>16</volume>
          –
          <fpage>21</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
          </string-name>
          , I.Y., Han, H.:
          <article-title>A survey on ontology mapping</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>35</volume>
          (
          <year>September 2006</year>
          )
          <volume>34</volume>
          –
          <fpage>41</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Robust and scalable linked data reasoning incorporating provenance and trust annotations</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>9</volume>
          (
          <issue>2</issue>
          ) (
          <year>2011</year>
          )
          <fpage>165</fpage>
          -
          <lpage>201</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Searching linked objects with Falcons: Approach, implementation and evaluation</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>5</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
          <fpage>49</fpage>
          -
          <lpage>70</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Penin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Semplore: A scalable IR approach to search the Web of data</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>7</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
          <fpage>177</fpage>
          -
          <lpage>188</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeffery</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Web-scale data integration: You can afford to pay as you go</article-title>
          .
          <source>In: CIDR</source>
          . (
          <year>2007</year>
          )
          <fpage>342</fpage>
          -
          <lpage>350</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pound</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Ad-hoc object retrieval in the web of data</article-title>
          .
          <source>In: Proc. of the 19th Int. Conf. on WWW</source>
          . (
          <year>2010</year>
          )
          <fpage>771</fpage>
          -
          <lpage>780</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lavrenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yi</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Information retrieval on empty fields</article-title>
          .
          <source>In: HLT-NAACL</source>
          . (
          <year>2007</year>
          )
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ponte</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          :
          <article-title>A language modeling approach to information retrieval</article-title>
          .
          <source>In: SIGIR</source>
          . (
          <year>1998</year>
          )
          <fpage>275</fpage>
          -
          <lpage>281</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Cleverdon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The CRANFIELD Tests on Index Language Devices</article-title>
          .
          <source>Aslib</source>
          (
          <year>1967</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Falcon-AO: A practical ontology matching system</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>6</volume>
          (
          <issue>3</issue>
          ) (
          <year>2008</year>
          )
          <fpage>237</fpage>
          -
          <lpage>239</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>David</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>AROMA results for OAEI 2009</article-title>
          .
          <source>In: Ontology Matching Workshop, ISWC</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaugaz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balke</surname>
            ,
            <given-names>W.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Query relaxation using malleable schemas</article-title>
          .
          <source>In: SIGMOD Conference</source>
          . (
          <year>2007</year>
          )
          <fpage>545</fpage>
          -
          <lpage>556</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Elbassuoni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanath</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Query relaxation for entity-relationship search</article-title>
          .
          <source>In: ESWC</source>
          . (
          <year>2011</year>
          )
          <fpage>62</fpage>
          -
          <lpage>76</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>One size does not fit all: Customizing ontology alignment using user feedback</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . (
          <year>2010</year>
          )
          <fpage>177</fpage>
          -
          <lpage>192</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>