<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>PhD Workshop, August</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Comparing entities in RDF graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alina Petrova supervised by  Ian Horrocks</string-name>
          <email>alina.petrova@cs.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science University of Oxford</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>28</volume>
      <issue>2017</issue>
      <abstract>
        <p>The Semantic Web has fuelled the appearance of numerous open-source knowledge bases. Knowledge bases enable new types of information search, going beyond classical query answering and into the realm of exploratory search, and providing answers to new types of user questions. One such question is how two entities are comparable, i.e., what are similarities and di erences between the information known about the two entities. Entity comparison is an important task and a widely used functionality available in many information systems. Yet it is usually domain-speci c and depends on a xed set of aspects to compare. In this paper we propose a formal framework for domain-independent entity comparison that provides similarity and di erence explanations for input entities. We model explanations as conjunctive queries, we discuss how multiple explanations for an entity pair can be ranked and we provide a polynomial-time algorithm for generating most speci c similarity explanations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Information seeking is a complex task which can be
accomplished following di erent types of search behaviour.
Classical information retrieval focuses on the query-response
search paradigm, in which a user asks for entities similar to
the input keywords or tting the formal input constraint.
Yet there exists a broad area of exploratory search that is
characterized by open-ended, browsing behaviour [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and
that is much less well studied. Exploratory search
encompasses activities like information discovery, aggregation and
interpretation, as well as comparison [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Comparing entities, or rather, information available about
the entities, is an important task and in fact a widely-used
functionality implemented in many tools and resources. On
the one hand, systems that highlight similarities between
entities can focus on how much entities are alike, giving a
similarity score to a pair (or a group) of entities [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. On
the other hand, systems can focus on how or why, in which
aspects two entities are similar and di erent, by
comparing entities and nding similar features. Such comparison
is done in many domains and for various types of entities:
hotels,1 cars,2 universities,3 shopping items,4 to name a few.
However, as a rule, such systems perform a side-by-side
comparison of items in a domain-speci c manner, i.e., following
a xed, hard-coded template of aspects to compare (e.g., in
case of hotels, it could be price, location, included
breakfast, rating etc.). In a few more advanced systems,
similarities are computed with respect to the type of information
available about the input entities rather than following a
rigid pattern. One such example is the Facebook Friendship
pages.5 Given two Facebook users, a friendship page
contains all their shared information, be it public posts, photos,
likes or mutual friends, as well as their relationship, if any
(e.g., married, friends etc.). However, as in the
aforementioned examples, comparison is done over a limited set of
attributes.
      </p>
      <p>Relying on a xed set of aspects is a reasonable solution
for tabular data with rigid and stable structure. On the
other hand, a more exible approach to entity comparison is
needed for Linked Data, namely for loosely structured RDF
graphs. However, all current systems with such functionality
compare items following a prede ned, domain-speci c list of
values to compare. Thus, an interesting research problem
would be to create a framework for entity comparison that
is domain- and attribute-independent.</p>
      <p>The Semantic Web has fuelled the appearance of
numerous open-source knowledge bases (KBs). Such KBs enable
both automatic information processing tasks and manual
search, and they facilitate new types of information search,
going beyond classical query answering and providing
answers to new types of user questions. For example, using
KBs one can answer questions like how are the two entities
similar or what di ers them, i.e., perform entity comparison.</p>
      <p>In this paper we propose to study such questions posed
over one of the most common types of KBs | RDF graphs.
In particular, we provide a formal framework for posing such
questions and we model answers to these questions as
similarity and di erence explanations. We then discuss how
1www.flightnetwork.com/pages/
hotel-comparison-tool/
2http://www.cars.com/go/compare/modelCompare.jsp
3http://colleges.startclass.com/
4http://www.argos.co.uk/static/Home.html
5Original announcement (cashed by the Wayback Machine):
https://web.archive.org/web/20101030105622/http:
//blog.facebook.com/blog.php?post=443390892130
multiple explanations to a question can be ranked and we
provide a polynomial-time algorithm for generating most
speci c similarity explanations. Finally, we outline
directions of future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. PRELIMINARIES</title>
      <p>In what follows we use the standard notions of conjunctive
queries (CQs), query subsumption and homomorphism. We
disallow trivial CQs of the form &gt;(X). We model RDF
graphs as nite sets of triples, where a triple is of the form
p(s; o), p and s being URIs and o being a URI or a literal.
Furthermore, we use the notion of a direct product of two
graphs, adapted to RDF graphs:</p>
      <p>De nition 1. Let I and J be RDF graphs, t1 = R(s1; o1)
and t2 = R(s2; o2) be two triples. The direct product of t1
and t2, denoted as t1 t2, is the triple R(hs1; s2i; ho1; o2i).
The direct product I J of I and J is the instance:
ft1</p>
      <p>t2 j t1 2 I and t2 2 Jg:
3.</p>
    </sec>
    <sec id="sec-3">
      <title>COMPARISON FRAMEWORK</title>
      <p>There are multiple ways of how we can de ne similarity
and di erence explanations and how we can model entity
comparison. In our framework the formalism of choice is
conjunctive queries (CQs). We model formal explanations
as conjunctive queries and we consider the problem of
nding such explanations as an instance of the query reverse
engineering problem.
3.1</p>
    </sec>
    <sec id="sec-4">
      <title>Similarity explanations</title>
      <p>We would like similarity explanations to highlight
common patterns for input entities. Thus, we model them as
queries that return both of these entities, i.e, they match
patterns tting both entities. Let hI; a; bi be a tuple
consisting of an RDF graph I and two URIs from the domain
of I a; b 2 dom(I) representing input entities. Furthermore,
given a query Q, let Q(I) be the answer set returned by Q
over I.</p>
      <p>De nition 2. Given hI; a; bi, a similarity explanation for
a and b is a unary connected conjunctive query Qsim such
that fa; bg Qsim(I).</p>
      <p>
        Example 1. Given two entities Marilyn Monroe and
Elizabeth Taylor and the Yago RDF graph [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a possible
similarity explanation is:
      </p>
      <p>Qsim(X) =hasWonPrize(X; Golden Globe);
diedIn(X; Los Angeles);
hasGender(X; female);
actedIn(X; Y 1);
isLocatedIn(Y 1; United States);
isMarriedTo(X; Y 2);
hasGender(Y 2; male);
hasWonPrize(Y 2; Tony Award); etc:
Qsim can be interpreted the following way: both Monroe
and Taylor received a Golden Globes award, died in Los
Angeles, acted in movies that were shot in the US and were
married to men who received a Tony Award.</p>
      <p>Using this de nition, we can formulate the following
decision problem: given hI; a; bi, SimExp is a problem of
whether there exists a CQ Q such that fa; bg Q(I). The
corresponding functional problem is to compute a query Q
such that fa; bg Q(I), given hI; a; bi. Note that both the
de nition of Qsim and SimExp can be easily generalized
from a pair of entities to a set of input entities.</p>
      <p>We speci cally chose the condition to be fa; bg Q(I)
for two reasons. Firstly, the form of Q does not depend on
the rest of the data: it does not matter whether there exist
other entities that match the graph pattern described by the
query; moreover, queries tting the subsumption condition
will not be a ected if new data is added. This is very
important in the context of RDF graphs, since web data is
intrinsically incomplete.</p>
      <p>
        Secondly, it is known that the de nability problem is
coNExpTime-complete for conjunctive queries [
        <xref ref-type="bibr" rid="ref15 ref3">3,15</xref>
        ]. On
the other hand, SimExp can easily be shown to be in PTime:
for conjunctive queries, it is su cient to take the full join of
all tables in the database instance.
      </p>
      <p>Let Sim(a; b) be the set of all similarity explanations for
a given hI; a; bi. Obviously Sim(a; b) can be quite big,
containing numerous explanations, however, we are interested
in the most informative ones. Our assumption is that the
more speci c a similarity explanation is, the better.</p>
      <p>De nition 3. Given hI; a; bi, a most speci c similarity
explanation is a similarity explanation Qsmimsp s.t. for all
similarity explanations Q0sim wrt hI; a; bi: Qsmimsp Q0sim.</p>
      <p>The decision problem SimExpmsp is the problem of
deciding whether Qsim is a most speci c similarity explanation
for the given hI; a; bi.The related functional problem is to
compute a most speci c Qsim.</p>
      <p>The subsumption relation divides the set of all similarity
explanations Sim(a; b) into -equivalent classes. If Sim(a; b)
is not empty, then Sim(a; b)msp is not empty, and there
exists a nite most speci c similarity explanation Qsmimsp, whose
size is bounded by the size of I. This explanation can in fact
be constructed in PTime (see Section 4).
3.2</p>
    </sec>
    <sec id="sec-5">
      <title>Difference explanations</title>
      <p>Analogous to similarity explanations, we model di erence
explanations as CQs, but this time we require only one of
the input entities to be in the answer set.</p>
      <p>De nition 4. Given hI; a; bi, a di erence explanation for
a wrt b is a unary connected conjunctive query Qdaif such
that a 2 Qdaif (I), but b 2= Qdaif (I).</p>
      <p>The notion of a di erence explanation can be generalized
to sets of entities: given an RDF graph I, a set of entities
P os and a set of entities N eg, a di erence explanation for
I and P os wrt N eg is a unary connected CQ QdPiofs s.t.
8p 2 P os: p 2 QdPiofs and N eg \ QdPiofs = ;.</p>
      <p>Given hI; a; bi, Dif Exp is the problem of deciding whether
there exists a di erence explanation Qdaif . The generalized
di erence explanation problem Dif Exp can be solved using
the most speci c similarity explanation problem SimExpmsp:
given I, P os and N eg, rst construct a most speci c
similarity explanation Q for entities in P os (done in PTime),
and then check whether none of the elements of N eg are in
the answer set of Q (conjunctive query evaluation is
NPcomplete). Hence, the complexity of generalized Dif Exp
is NP-complete.</p>
      <p>Furthermore, we would like to introduce another de
nition of a di erence explanation that is dependent on the
similarities between a and b. We would like the di erence
explanation for a to be as relevant as possible, hence we
model it to be dependent not only on the information about
b, but also on the common patterns for a and b. One
possible way to do so is the following: let const(Q) be the set
of constants appearing in a query Q and let const(R(x)) be
the set of constants appearing in an atom R(x).</p>
      <p>De nition 5. Given hI; a; bi, a di erence explanation for
a wrt b and Qsim is a di erent explanation Qdai;sfim such that
8R(x) 2 Qdai;sfim: const(R(x)) \ const(Qsim) 6= ;.</p>
      <p>Example 2. Let the input entities be a = John Travolta
and b = Quentin Tarantino. Let Qsim for a and b be an
explanation that both persons starred in Pulp Fiction: Qsim(X) =
starredIn(X; Pulp Fiction). Relevant di erence explanations
could be that Travolta also starred in Grease and other
movies, while Tarantino has directed several movies,
including Pulp Fiction: Qdaif (X) = starredIn(X; Grease) and
b
Qdif (X) = directed(X; Pulp Fiction). On the other hand, an
explanation that Travolta (unlike Tarantino) is married to
Kelly Preston is rather irrelevant, since we have not
compared the two persons with respect to their marital status.</p>
    </sec>
    <sec id="sec-6">
      <title>TECHNICAL RESULTS</title>
    </sec>
    <sec id="sec-7">
      <title>Algorithm for computing a most specific similarity explanation</title>
      <p>
        We compute a most speci c similarity explanation by
constructing the direct product of the RDF graph, similar to the
construction of the direct product of a database instances
with itself [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Any RDF graph I a 2 dom(I) can be
associated with a canonical unary conjunctive query qI (xa) such
that for each fact R(c; d) in I there is an atom R(xc; xd) in
qI , where xc and xd are variables and xa is a free variable.
Note that a is an answer to qI (xa) over I. The following
algorithm produces a most speci c similarity explanation.
In it, we rst produce an instance with the domain from
dom(I)2, i.e., tuples hc; di for c; d 2 dom(I), and then
construct a canonical conjunctive query of this instance.
      </p>
      <p>Claim 1. If J 6= ;, then J is a maximal connected
component of I I such that a b = ha; bi 2 dom(J ).
Proof sketch: Firstly, if J 6= ;, then ha; bi 2 dom(J ), by Step
1. Secondly, the while-loop on Step 5 is in fact the greedy
procedure that generates the maximal connected component
in I I. Indeed, the condition R(c; e); R(d; f ) 2 I ensures
that the fact R(hc; di; he; f i) is in I I, and the condition
that there must exist a fact in J that contains hc; di or he; f i
ensures connectedness.</p>
      <p>Claim 2. Let hI; a; bi be an input of Algorithm 1. Let
qJ (xha;bi) be the output, and J the instance obtained after
the while loop on Step 5. Then all of the following hold.
(i) fa; bg
qJ (I),
Algorithm 1: Algorithm for computing a most speci c
similarity explanation
Input: an RDF graph I, entities a; b from dom(I).
Output: a most speci c similarity explanation for a
and b.
1 Let J = fR(ha; bi; hc; di) j R(a; c); R(b; d) 2</p>
      <p>Ig [ fR(hc; di; ha; bi) j R(c; a); R(d; b) 2 Ig;
2 if J = ; then
3 return empty query;
4 Let J = ;;
5 while J 6= J do
6 J := J ;
7 J := J [ fR(hc; di; he; f i) 62 J j R(c; e); R(d; f ) 2</p>
      <p>I; and 9 a fact in J that contains hc; di or he; f ig;
10
8 Construct qJ (xha;bi);
9 foreach xhc;ci in qJ , c 62 fa; bg do
// Replace xhc;ci with constant c
qJ (xha;bi) := qJ (xha;bi)[xhc;ci ! c];
11 return qJ (xha;bi).</p>
      <p>(ii) For a connected unary conjunctive query q0(x), if there
exist homomorphisms h1; h2 : q0 ! I such that h1(x) =
a and h2(x) = b, then there exists a homomorphism
h : q0 ! J such that h(x) = ha; bi.</p>
      <p>Corollary 1. Algorithm 1 produces a most speci c
similarity explanation.
4.2</p>
    </sec>
    <sec id="sec-8">
      <title>Properties of the resulting query</title>
      <p>
        The algorithm 1 runs in time polynomial to the size of
the input RDF graph, and the size of resulting most speci c
similarity explanation is also polynomial to I. It should be
noted that the output query tends to be non-minimal. For
example, since Marilyn Monroe and Elizabeth Taylor acted in
several movies that were shot in the US, Q(X) will contain
atoms like:
actedIn(X; Y 1); isLocatedIn(Y 1; United States);
actedIn(X; Y 2); isLocatedIn(Y 2; United States);
actedIn(X; Y 3); isLocatedIn(Y 3; United States); etc.
To avoid such redundancy, we can take the core of the query
(i.e., apply the query minimization algorithm). Taking the
core is an NP-complete problem [
        <xref ref-type="bibr" rid="ref11 ref9">9, 11</xref>
        ], hence, obtaining
a most speci c similarity explanation without redundant
atoms is an NP-complete task.
5.
      </p>
    </sec>
    <sec id="sec-9">
      <title>RELATED WORK</title>
      <p>
        So far only few works have studied explanations over RDF
graphs [
        <xref ref-type="bibr" rid="ref10 ref12 ref5">5, 10, 12</xref>
        ], and there is no single formal de nition
of an explanation over RDF data. A lot of attention has
been paid to discovering connections (\associations")
between nodes [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which boils down to nding and grouping
together paths in the graph that connect one input node to
another one. Such connectedness explanations are
orthogonal (rather than alternative) to the similarity explanations
modelled as queries, which we propose to study. The two
types of explanations are intended to capture di erent
relations between nodes: the former explore possible paths that
link the two nodes together, while the latter seek to nd
commonalities in the neighbourhoods of the input nodes.
      </p>
      <p>
        The problem of reverse engineering a query given some
examples originated in late 1970s and was rst introduced
for the domain of relational databases [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Later it was
extensively researched with respect to di erent query formats:
regular languages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], XML queries [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], relational database
queries [
        <xref ref-type="bibr" rid="ref16 ref17 ref19">16, 17, 19</xref>
        ], graph database queries [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and SPARQL
queries [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The problem of QRE for RDF data was rst
studied by Arenas et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and was implemented by Diaz et
al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors consider three di erent variations
of QRE problem: the basic variation that requires the input
mappings to be part of the answer set ( JQKG); the one
that allows positive examples together with negative
examples (such that JQKG and \ JQKG = ;); and the
variation that requires the examples from to be exactly
the answer set of Q ( = JQKG). The complexity of these
three variations is then provided for fragments of SPARQL
with AND, FILTER and OPT.
      </p>
    </sec>
    <sec id="sec-10">
      <title>FUTURE WORK</title>
      <p>As part of my PhD, I would like to continue studying the
problem of entity comparison using RDF graphs in several
research directions. So far we have investigated similarity
and di erence explanations, and we rank the former
according to the preference condition based on subsumption. In
particular, we assume that the highest ranked explanations
are most speci c similarity explanations. On the one hand,
we would like to apply a similar rationale to di erence
explanations and to study most general di erence explanations as
most preferred ones. On the other hand, these may not be
the optimal choices for a given user, hence we need to
investigate other possible ranking conditions as well as means of
user-speci c ranking of explanations.</p>
      <p>RDF graphs are inherently incomplete, hence it would be
useful to consider a scenario where an explanation is
produced over an RDF graph and a domain ontology that
contains knowledge not explicitly present in the graph.
Consider a graph G consisting of two facts: T eacher(Bob) and
teaches(Alice; CS), | and a simple EL ontology O
consisting of one axiom: 9teaches:Class v T eacher. Let the two
input entities be Alice and Bob. Then a similarity
explanation wrt G; O could be Q(X) = T eacher(X), while we are
unable to generate such a CQ using only graph data.</p>
      <p>In our framework explanations are modelled as CQs, and
while CQs are formulas with relatively high readability for
a user, it is of interest to be able to verbalize explanations,
transforming them into natural language sentences. For
example, a formal explanation Q(X) = livesIn(X; London),
friendsWith(X; Y ); worksAt(Y; Oracle) could be transformed
into an English sentence \Both input entities live in London
and are friends with someone who works at Oracle".</p>
      <p>While CQs correspond to a large part of queries issued
over relational databases, i.e., they have relatively high
expressivity, they cannot express things like negation or
disjunction, which is a limitation. Hence, an interesting
problem would be to consider more expressive languages, in
particular, union of CQs and CQs with inequalities and numeric
comparison.</p>
      <p>Lastly, we are planning to implement a comprehensive
comparison system that would compute most speci c
similarity explanations and most general di erence explanations,
to test it on real-world RDF graphs and to perform usability
tests.</p>
    </sec>
    <sec id="sec-11">
      <title>REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Angluin</surname>
          </string-name>
          .
          <article-title>Queries and concept learning</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ):
          <volume>319</volume>
          {
          <fpage>342</fpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Diaz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. V.</given-names>
            <surname>Kostylev</surname>
          </string-name>
          .
          <article-title>Reverse engineering SPARQL queries</article-title>
          .
          <source>In Proc. of the 25th Int. Conf. on World Wide Web</source>
          , pages
          <volume>239</volume>
          {
          <fpage>249</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Romero</surname>
          </string-name>
          .
          <article-title>The complexity of reverse engineering problems for conjunctive queries</article-title>
          .
          <source>In Proc. of 20th Int. Conf. on Database Theory</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ciucanu</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lemay</surname>
          </string-name>
          .
          <article-title>Learning path queries on graph databases</article-title>
          .
          <source>In 18th Int. Conf. on Extending Database Technology (EDBT)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] G. Cheng, Y. Zhang, and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          .
          <article-title>Explass: exploring associations between entities via top-k ontological patterns and facets</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <volume>422</volume>
          {
          <fpage>437</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.-S.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Cha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Tappert</surname>
          </string-name>
          .
          <article-title>A survey of binary similarity and distance measures</article-title>
          .
          <source>J. Systemics, Cybernetics and Informatics</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <volume>43</volume>
          {
          <fpage>48</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cohen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y. Y.</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>Learning tree patterns from example graphs</article-title>
          . In LIPIcs-Leibniz
          <source>International Proceedings in Informatics</source>
          , volume
          <volume>31</volume>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          .
          <article-title>SPARQLByE: Querying RDF data by example</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>9</volume>
          (
          <issue>13</issue>
          ),
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Nash</surname>
          </string-name>
          .
          <article-title>E cient core computation in data exchange</article-title>
          .
          <source>Journal of the ACM</source>
          ,
          <volume>55</volume>
          (
          <issue>2</issue>
          ):
          <fpage>9</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Heim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lohmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Stegemann</surname>
          </string-name>
          . RelFinder:
          <article-title>Revealing relationships in RDF knowledge bases</article-title>
          .
          <source>In Int. Conf. on Semantic and Digital Media Technologies</source>
          , pages
          <volume>182</volume>
          {
          <fpage>187</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hell</surname>
          </string-name>
          and
          <string-name>
            <surname>J. Nesetril.</surname>
          </string-name>
          <article-title>The core of a graph</article-title>
          .
          <source>Discrete Mathematics</source>
          ,
          <volume>109</volume>
          (
          <issue>1</issue>
          ):
          <volume>117</volume>
          {
          <fpage>126</fpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Schuppel, and</article-title>
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Discovering unknown connections - the DBpedia relationship nder</article-title>
          .
          <source>CSSW</source>
          ,
          <volume>113</volume>
          :
          <fpage>99</fpage>
          {
          <fpage>110</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marchionini</surname>
          </string-name>
          .
          <article-title>Exploratory search: from nding to understanding</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>49</volume>
          (
          <issue>4</issue>
          ):
          <volume>41</volume>
          {
          <fpage>46</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , G. Kasneci, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum. Yago</surname>
          </string-name>
          :
          <article-title>A large ontology from wikipedia and wordnet</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <volume>203</volume>
          {
          <fpage>217</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. ten</given-names>
            <surname>Cate</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Dalmau</surname>
          </string-name>
          .
          <article-title>The Product Homomorphism Problem and Applications</article-title>
          .
          <source>In 18th International Conference on Database Theory (ICDT</source>
          <year>2015</year>
          ), pages
          <fpage>161</fpage>
          {
          <fpage>176</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Q. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          .
          <article-title>Query by output</article-title>
          .
          <source>In Proc. of the 2009 ACM SIGMOD Int. Conf. on Management of data</source>
          , pages
          <volume>535</volume>
          {
          <fpage>548</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Q. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          .
          <article-title>Query reverse engineering</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <volume>721</volume>
          {
          <fpage>746</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R. W.</given-names>
            <surname>White</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Exploratory search: beyond the query-response paradigm</article-title>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Elmeleegy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Procopiuc</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          .
          <article-title>Reverse engineering complex join queries</article-title>
          .
          <source>In Proc. of the 2013 ACM SIGMOD Int. Conf. on Management of Data</source>
          , pages
          <volume>809</volume>
          {
          <fpage>820</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>M. M. Zloof</surname>
          </string-name>
          .
          <article-title>Query-by-example: A data base language</article-title>
          .
          <source>IBM systems Journal</source>
          ,
          <volume>16</volume>
          (
          <issue>4</issue>
          ):
          <volume>324</volume>
          {
          <fpage>343</fpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>