<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Document-based RDF Keyword Search System: Query-by-Query Analysis</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>RDF datasets are today used more and more for a great variety of applications mainly due to their exibility. However, accessing these data via the SPARQL query language can be cumbersome and frustrating for end-users accustomed to Web-based search engines. In this context, KS is becoming a key methodology to overcome access and search issues. In this paper, we further dig on our previous work on the state-of-the-art system for keyword search on RDF by giving more insights on the quality of answers produced and its behavior with di erent classes of queries.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>Keyword Search</kwd>
        <kwd>Virtual Documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the last decade, the Web of Data emerged as one of the principal means to
access, share and re-use structured data on the Web [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. RDF has become the
de facto standard for the publication of information on the Web [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] because it
eases the enrichment and discovery of data as well as their interoperability.
      </p>
      <p>
        In the past years, we have seen a signi cant increase in the number of
knowledge bases published as large RDF datasets, such as DBpedia , Wikidata , and
OpenPHACTS . The use of RDF is growing also for the representation and
management of enterprise data with heterogeneous and very large graphs, often
containing millions of edges [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        RDF datasets are typically queried through the SPARQL structured
language [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which is often di cult to use for non-expert users due to its syntax
and the required knowledge of the underlying structure of the database and the
utilized IRIs. This language is not intuitive for end-users and it does not allow
the natural expression of their information needs based on keywords [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In recent years keyword search received a lot of attention in the research
community, both in the contest of Relational Databases (RDB) and Graph DB. Good
reviews about this topic are [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] (for a focus on RDB) and [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] (for a focus
on graph data).
      </p>
      <p>
        Keyword Search systems may be categorized into three main classes: (i)
schema-based (e.g. DISCOVER [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), which uses the schema of the database to
produce a structured query; (ii) graph-based (e.g. BANKS [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), which explores the
graph (the graph induced by the foreign keys, if we are in a relational database);
(iii) document-based ( rst described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]), stemming from the graph-based,
this class is based on the creation of textual documents, also called virtual
documents, from the words contained in the graphs. It also uses state-of-the-art IR
methods for the indexing and ranking of the answers. Thanks to this strategy,
the systems belonging to this class showed to scale to real-world scenarios.
      </p>
      <p>
        In this paper we compare our system TSA+VDP [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to the following
stateof-the-art keyword search systems:
1. Structured Language Model (SLM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is among the few native RDF search
systems. It starts the searching process from the triples that match at least
one of the query keywords and then produces answers by seeking connected
components in the graph. SLM is based on virtual documents and employs
IR models (statistical language models) for retrieval. SLM returns a ranking
of RDF subgraphs to the user and not virtual text documents as in [
        <xref ref-type="bibr" rid="ref15 ref8">8, 15</xref>
        ].
2. Markov Random Fields-Keyword Search (MRF-KS)[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is based on a
combination of graph and virtual-document-based approaches. This system has
proven to be the most e ective one on the shared benchmark in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. MRF-KS
was originally thought to work on RDBs, even though in principle it can be
used for any kind of graph data. We redesigned it to work on RDF and use
it as a competing baseline.
3. SUMM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a native RDF search system. The core idea behind this
system is to produce a rst partition of the whole graph through a BFS-based
greedy algorithm. The partitions are edge-disjointed but are connected one
to the other through nodes called portal nodes. Once given the keyword
query, a backtracking algorithm similar to BANKS [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is deployed to nd
connected subgraphs containing all the keywords. On these subgraphs, a
di erent backtracking algorithm similar to BLINKS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is used to produce
tree-shaped answer graphs whose leaves are nodes containing one keyword
each. The systems use indexes and particular stopping conditions to
speedup the execution.
2.1
      </p>
      <p>
        TSA+BM25 and TSA+VDP
To overcome the limitations of the previous state-of-the-art systems, in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] we
proposed two new keyword search systems, based on the virtual document
approach, one the re nement of the other. These systems are composed of an
o -line and an on-line phase:
O -line phase The o -line phase is common to both systems and is called
      </p>
      <sec id="sec-2-1">
        <title>Topological Semantical Aggregator (TSA), a virtual document-based ap</title>
        <p>proach that produces documents by aggregating RDF triples containing close
concepts. TSA mimics the \clustering hypothesis" de ned in IR which dictates
that triples characterized by similar concepts will cluster together.</p>
        <p>The algorithm takes an RDF graph as input and creates subgraphs
without considering the user query. The idea is to build a subgraph around a single
\topic" which characterizes the graph semantics. To do so, the algorithm starts
randomly from a node that presents an out-degree higher of an
empiricallyde ned threshold. From that node, the algorithm starts a Breadth-First
Searchlike exploration that visits only nodes that present in and out-degree between
certain thresholds, limiting the exploration inside a radius . All these
parameters are also empirically-de ned. In this way, a subgraph is extracted from the
bigger database. Once the exploration stops, the algorithm marks the utilized
nodes as visited and proceeds randomly to the next unvisited node with high
out-degree. Nodes may be visited more than once if they present a high in-degree,
and in this case, they are shared by many graphs. The rationale is that these
nodes may contain information belonging to more topics.</p>
        <p>The result of TSA is a collection of subgraphs covering the whole database.
They are then converted to their corresponding virtual documents through the
extraction and concatenation of the words contained in the IRIs and Literals in
their nodes and edges. We then index these documents with Terrier.
On-line phase Given a keyword query, our rst system, TSA+BM25, applies
the ranking function BM25 to the collection of virtual documents, and obtains
a ranking of graphs. This system then looks for overlapping graphs in the
ranking produced. Graphs that present a percentage of shared triples beyond an
empirically-found threshold are merged. BM25 is used a second time to produce
a new ranking on these merged graphs, which is returned to the user. This
system is rather quick, since it uses state-of-the-art implementations of BM25, but
does not leverage the information contained in the topology of the graphs.</p>
        <p>Our second system is called TSA+VDP (VDP stands for Virtual
Document Pruning). The system then considers the result of the set-based union
of these ranking produced by TSA+BM25, which is called query graph E. This
graph is not necessarily connected.</p>
        <p>TSA+VDP then starts a BFS from every subject node within E. This BFS
is also limited to a radius 0, and produces subgraphs. Of these subgraphs, only
the ones containing all the keywords are kept, producing a new collection, called
the best candidates collection. The rationale is that these selected graphs will
contain, with high probability, triples that are relevant to the user. Since the
BFS exploration could also have introduced many non-relevant triples, the next
step is a pruning algorithm over these candidate graphs. The pruning proceeds
inward, removing all the triples that do not contain at least one keyword, starting
from the nodes with the highest distance from the source node.</p>
        <p>These pruned graphs are nally ranked using a Markov Random Field scoring
function inspired by the one used by MRF-KS. The nal ranking is the answer
returned to the user.
3
3.1</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <sec id="sec-3-1">
        <title>Preliminaries</title>
        <p>
          While SPARQL enables the user to obtain an exact answer to his/her
information need, it is not always the most viable approach [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. SPARQL is a structured
language with a complex syntax, which also requires complete knowledge of the
graph being queried. A knowledge that the standard user (one who is not familiar
with the concepts of RDF graphs) may not possess.
        </p>
        <p>Keyword Search is a di erent paradigm that can help this kind of user to
retrieve her information without SPARQL. In this approach, queries are an array
&lt; w1; ; wn &gt; of keywords. The output is a list of answers, ordered depending
on their relevance to the information need expressed by the query. This ordering
is de ned by a ranking function. It is important to note that while SPARQL
returns an exact answer (even an empty one is that is the case), Keyword Search
is a best e ort approach.</p>
        <p>The nature of the answers provided by the keyword search system depends
on the task. In this paper, we consider a user who has an information need that
can be formulated with a SPARQL CONSTRUCT query, which he/she is not able
to formulate. If that query existed, the answer graph can be considered as the
set of all and only the triples of the database that the user wants to see. We
consider this set of triples as a ground truth (GT). The triples of the GT are
called relevant triples.</p>
        <p>Our task is to produce a ranking that contains as many relevant triples as
possible in answer graphs ranked as highly as possible. These graphs should also
not contain too many non-relevant (noisy) triples. A good system, therefore,
produces a list of (preferably not-overlapping) subgraphs extracted from the
database such that the subgraphs at the top of the list contain many relevant
triples.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Evaluation</title>
        <p>
          If a Keyword Search System returns a list of answers, we rst of all need to
understand when one of these graphs is relevant. We follow the same approach
of [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], where a signal-to-noise ratio is de ned for each answer graph.
De nition 1. Given a topic tk 2 T , a ranking Rk and the ground-truth graph
GTtk , we de ne the Signal-to-Noise Ratio (SNR) of Gi 2 Rk as
SN R(Gi) = j(Gi \ GTtk ) n Sj
jGij
(1)
where S is the union set of all the relevant triples in Gj 2 Rk; 8j 2 [1; i[ such
that Gj is relevant.
        </p>
        <p>Given this de nition, we say that a graph is relevant when its SN R is bigger
or equal to a threshold value 2 [0; 1]. Note how the SNR keeps into
consideration the relevant triples already seen in the ranking, avoiding to count them
twice. We can say that a quality answer is a graph that has a high SNR.</p>
        <p>We call the relevance parameter, which describes the quality required for
a graph to be considered relevant. In this evaluation framework, the relevance
is therefore not an absolute, xed concept, but can be varied with . This
parameter models the preferences of a user. A user who does not want noise in its
answers is modeled with a close to 1. A user that allows for imperfect answers
is modeled with a parameter closer to 0. In what follows we study how di erent
values of change the evaluation of the output of the considered systems.</p>
        <p>As evaluation measures we use recall and tb-DCG.</p>
        <p>Recall is the total number of relevant triples found in the ranking produced
by a system { typically considering the rst 1000 results { divided by jGT j. The
recall enables to understand the extractive power of a system, the raw quantity
of relevant triples extrapolated from the database. While useful, it does not
convey any information about the disposition of the triples in the ranking and
little information on the quantity of noise that surrounds them in the subgraphs
where they are contained.</p>
        <p>
          tb-DCG [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a more sophisticated metric, a derivation of the classic NDCG,
which takes into consideration aspects such as the relevance of the graphs (the
number of triples of the GT contained in them), their position (relevant graphs
put early weight more) and their uniqueness (repeated relevant triples are not
counted).
3.3
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Experimental Setup</title>
        <p>We consider two databases: LinkedMDB and IMDB. LinkedMDB is a native
RDF database of circa 7M triples. IMDB is a relational database that we
converted into an RDF version of about 116M triples.</p>
        <p>For both databases, we created 100 topics, divided into 5 classes of 20 queries
each. 50 were used for training the parameters of TSA+VDP, the other 50 for
testing. Every topic is represented as a pair made of a CONSTRUCT SPARQL
query and a keyword counterpart, both representing the same information need.
The SPARQL query is used to extract the subgraph which works as GT for the
keyword query.</p>
        <p>B
D
M
d
e
k
n
i
L
B
D
M
I</p>
        <p>C1
a a a</p>
        <p>C2</p>
        <p>C3</p>
        <p>C4
a a
b b</p>
        <p>C5</p>
        <p>c c
b b b
c c c
d d</p>
        <p>e e
d d d</p>
        <p>e e e</p>
        <p>Every class is syntactically di erent from the others, i.e. they di er in the
shape of the pattern P , as can be seen in Figure 1. The gure presents the
10 patterns associated with each class ordered by increasing complexity. This
complexity derives from the number of triples in the pattern and the shape of
the connections created by these triples. Note that it is possible that two classes
may di er only in the direction of one edge.</p>
        <p>Each one of these classes is further divided into two sub-classes, di ering by
their semantic. For example, class C1 of LinkedMDB is divided into C1.1, with
queries asking for all the lms of a certain director, and class C1.2, made of
queries asking for all the lms starring a certain actor.</p>
        <p>The databases considered here, LinkedMDB 1M and IMDB 1M (of 1M triples
each), are built by randomly extracting connected subsets of triples from the
original datasets. The triples of the ground truths produced by the SPARQL are
added to these datasets. This was necessary since only SUMM, TSA+BM25 and
TSA+VDP are able to scale to the whole dimensions and we needed to be able
to compare all systems.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Performances</title>
      <sec id="sec-4-1">
        <title>Average Analysis</title>
        <p>Analysis of the quality of the answers through the factor Note that
in Table 1 we xed to 0:1. This parameter was empirically chosen to show the
di erences in the e ectiveness of the di erent systems. A higher could provoke
all the values to be attened to 0.</p>
        <p>One may ask now what happens if we change . A higher describes a user
who tolerates less noise in her answer graphs. It asks for answers that are more
\tailored" around the GT. Figure 2 presents the values of recall and tb-DCG of
the ve systems on the two datasets as varies. On the x-axis we have di erent
values for , from 0 to 1, with step 0:1. Every point in the plots is obtained
computing the averages on the 50 queries issued on each dataset.</p>
        <p>Let us start on LinkedMDB 1M. TSA+BM25 appears to be consistently the
top-performing system in terms of recall at the varying of . Moreover, when
= 0, that is when we do not count the noise in the answers, the values of average
recall is around 0:9, illustrating how, on average, many relevant triples are
retrieved by the system in the top 1000 positions. Immediately below TSA+BM25,</p>
        <p>LinkedMDB 1M
IMDB 1M</p>
        <p>SLM
SUMM
TSA+BM25
TSA+VDP
MRF-KS
SLM
SUMM
TSA+BM25
TSA+VDP
MRF-KS
1
0.8
n0l.6
iiseolca
c
e
rR
P0.4
0.2
1
0.8
0.6
G
C
D
b
t0.4
we nd TSA+VDP. The heuristics used by TSA+VDP eliminate some relevant
triples from the ranking, thus reducing the overall recall.</p>
        <p>For tb-DCG, it is evident how TSA+VDP sacri ces recall to obtain a better
tb-DCG. In particular, TSA+VDP is consistently the top-performing system
until we reach = 0:5. After this point, all the systems collapse on low values,
due to the presence of few high-quality answers.</p>
        <p>For IMDB 1M and the recall value, the top-performing system is now TSA+VDP,
with TSA+BM25 immediately behind. This means that in the case of IMDB,
BM25 is less e ective in extracting relevant triples from the whole graph. The
pruning heuristics of TSA+VDP improve the quality of some answers, making
them relevant and therefore counted in the computation of the average recall.
We also note that the trend of the SUMM method is more stable than the other
systems. In fact, the system manages to get high-quality answers that satisfy
many values. Only after a value of = 0:5 there is a performance decline.</p>
        <p>Regarding tb-DCG, TSA+VDP remains the top system, with a sizeable
difference from the second-placed TSA+BM25 and MRF-KS, which appear to be
quite similar in performances while increases. SUMM maintains its stability
but is still below the other systems.</p>
        <p>This analysis clearly shows how bigger values of make the task harder.
In comparison, however, the TSA-based systems are able to obtain consistently
better results with respect to the state of the art systems.
Recall Figure 3 shows the behavior query-by-query in terms of recall and time
for the two datasets and for the two top-performing systems TSA+VDP and
MRF-KS with = 0:1. We grouped the queries by their classes and subclasses
and highlighted them with di erent colors.</p>
        <p>Starting from LinkedMDB 1M, the queries of certain classes such as C1.1
obtain good results in both systems. Others, like class C5, appear hard for both
systems.</p>
        <p>What is evident is that a simpler P does not necessarily imply high
performances. In fact, a class such as C1.2 presents low results in both systems, while
C4, which has a more complex P , has higher recall with TSA+VDP. It seems
that the semantic of the queries plays a critical role in determining their di
culty. Subclasses such as C1.1 and C1.2, for example, show di erent behaviors,
even if they share the same pattern structure.</p>
        <p>For IMDB 1M, we can make the same observations. It is even more evident
in this instance how a simpler syntax does not correspond to higher recall. For
TSA+VDP class C5 almost always presents a recall of 1, while many queries of
class C1 present values close to 0. This may be explained considering the fact
that a simpler pattern corresponds to a ground truth with fewer relevant triples,
harder to extract from the database in an e ective way.
tb-DCG Figure 4 presents similar graphics but for the tb-DCG metric.</p>
        <p>The high variability in the behavior of the queries is immediately evident
from the di erent performances obtained by the systems. Let us start with
LinkedMDB 1M. Certain classes, such as C4, appear to be relatively easy for
TSA+VDP. On the other hand, they are harder for MRF-KS. If we consider
that C1.2, instead, is hard for both systems, we see how it is not necessarily
true that a simpler syntax (i.e. a simpler pattern P ) corresponds to an easier
query to answer.</p>
        <p>On the other hand, a class such as C2 presents subclass C2.1 which appears
hard for both systems, while C2.2 is easier. This suggests again how the same
syntactic class can become harder or easier depending on its semantic nature,
independently from the system.</p>
        <p>For IMDB 1M, we see how similar observations can be made. Class C5 appears
simpler for TSA+VDP and harder for MRF-KS. More in general, it appears that
MRF-KS struggles more in IMDB to answer the queries: its average tb-DCG is
much lower than the one of TSA+VDP. This shows how it may be possible that
the nature of a database (the number and structure of connections among nodes
and the quantity and disposition of keywords among them) may in uence the
e ectiveness of a system.
Time The execution times are presented in both Figure 3 and 4. Note how we
used a time limit of 1000s.</p>
        <p>Once again, queries in the same class may present di erent behaviors in terms
of execution time. For example, MRF-KS presents relatively low values of time
for subclass C5.1 in IMDB, while subclass C5.2 presents much higher times on
average. This new evidence of the fact that the syntactical factor is not the only
one that plays a role in the determination of the execution time. The choice of
keywords (more or less frequent inside the database) appears to also play a role.
Queries with more frequent keywords will, in fact, demand much more execution
time.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Works</title>
      <p>In this paper, we presented an in-depth analysis of the behavior of
state-of-theart systems with a particular focus on the quality of required answers and the
behavior presented with di erent kinds of queries.</p>
      <p>We used a sub-set of LinkedMDB and IMDB as databases. We de ned ve
di erent classes of queries for each dataset, which di er from one another on
their syntax and semantic, for a total of 100 test queries. We used recall and
tb-DCG as evaluation metrics. The rst one describes the simple raw power
of extraction of relevant triples of a system. The latter is a more sophisticated
measure that takes into consideration the relevance of the answers, their position
in the ranking and the quantity of noise in them. We considered ve di erent
state-of-the-art systems: SLM, MRF-KS, SUMM, TSA+BM25, and TSA+VDP.</p>
      <p>We performed two di erent classes of experiments. In the rst one, we
varied the minimal quantity of SNR required from an answer to be considered as
relevant. We showed that the higher the required threshold, the lower the
performances of the systems. TSA+BM25 and TSA+VDP consistently appeared to
obtain the higher results, although we registered a progressive detriment of the
performances.</p>
      <p>The second class of experiments showed that certain classes of queries seem
to be simpler than others for both systems. It resulted that not necessarily a
simpler pattern P implies better results. The complexity of the query is therefore
not present only in its syntax, but also in its semantic and in the keywords
of choice. To corroborate this intuition, it appeared that queries belonging to
the same syntactic class with di erent semantics presented completely di erent
performances.</p>
      <p>For our future works, we will keep researching, also on di erent databases,
how di erent kinds of queries correspond to di erent behaviors, and how poor
results may be mitigated with techniques as query rewriting.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Balmin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hristidis</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koudas</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papakonstantinou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A system for keyword proximity search on XML databases</article-title>
          .
          <source>In: Proceedings of 29th International Conference on Very Large Data Bases, VLDB</source>
          . pp.
          <volume>1069</volume>
          {
          <fpage>1072</fpage>
          . Morgan Kaufmann (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bast</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buchhold</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haussmann</surname>
          </string-name>
          , H.:
          <article-title>Semantic search on text and knowledge bases</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>10</volume>
          (
          <issue>2-3</issue>
          ),
          <volume>119</volume>
          {
          <fpage>271</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bhalotia</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hulgeri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakhe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakrabarti</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sudarshan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Keyword searching and browsing in databases using BANKS</article-title>
          .
          <source>In: Proceedings of the 18th International Conference on Data Engineering</source>
          . pp.
          <volume>431</volume>
          {
          <fpage>440</fpage>
          . IEEE Computer Society (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Co</surname>
            <given-names>man</given-names>
          </string-name>
          , J.,
          <string-name>
            <surname>Weaver</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>A framework for evaluating database keyword search strategies</article-title>
          .
          <source>In: Proc. of the 19th ACM International Conference on Information and knowledge management</source>
          . pp.
          <volume>729</volume>
          {
          <fpage>738</fpage>
          . ACM Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silvello</surname>
          </string-name>
          , G.:
          <article-title>A scalable virtual document-based keyword search system for rdf datasets</article-title>
          .
          <source>In: Proceedings of the 42nd International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silvello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Search text to retrieve graphs: A scalable RDF keywordbased search system</article-title>
          .
          <source>IEEE Access 8</source>
          ,
          <issue>14089</issue>
          {
          <fpage>14111</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Elbassuoni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanco</surname>
          </string-name>
          , R.:
          <article-title>Keyword Search over RDF Graphs</article-title>
          .
          <source>In: Proc. of the 20th ACM Conference on Information and Knowledge Management</source>
          ,
          <string-name>
            <surname>CIKM</surname>
          </string-name>
          <year>2011</year>
          ,. pp.
          <volume>237</volume>
          {
          <fpage>242</fpage>
          . ACM Press, New York, USA (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P.S.:</given-names>
          </string-name>
          <article-title>BLINKS: ranked keyword searches on graphs</article-title>
          .
          <source>In: Proceedings of the ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <volume>305</volume>
          {
          <fpage>316</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kementsietsidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Scalable keyword search on large RDF data</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng. (11)</source>
          ,
          <volume>2774</volume>
          {
          <fpage>2788</fpage>
          . https://doi.org/10.1109/TKDE.
          <year>2014</year>
          .2302294
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lopez-Veyna</surname>
            ,
            <given-names>J.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sosa-Sosa</surname>
            ,
            <given-names>V.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Arevalo</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>A virtual document approach for keyword search in databases</article-title>
          .
          <source>In: DATA</source>
          . pp.
          <volume>39</volume>
          {
          <fpage>48</fpage>
          .
          <string-name>
            <surname>SciTePress</surname>
          </string-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Mass,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Sagiv</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          :
          <article-title>Virtual Documents and Answer Priors in Keyword Search over Data Graphs</article-title>
          .
          <source>In: Proc. of the Workshops of the EDBT/ICDT 2016 Joint Conference. CEUR Workshop Proceedings</source>
          , vol.
          <volume>1558</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pound</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
          </string-name>
          , H.:
          <article-title>Ad-hoc object retrieval in the web of data</article-title>
          .
          <source>In: Proc. of the 19th International Conference on World Wide Web</source>
          ,
          <string-name>
            <surname>WWW</surname>
          </string-name>
          <year>2010</year>
          . pp.
          <volume>771</volume>
          {
          <fpage>780</fpage>
          . ACM Press, New York, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Prud</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Sparql query language for rdf (</article-title>
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sahu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mhedhbi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salihoglu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Ozsu, M.T.:
          <article-title>The ubiquity of large graphs and surprising challenges of graph processing</article-title>
          .
          <source>PVLDB</source>
          <volume>11</volume>
          (
          <issue>4</issue>
          ),
          <volume>420</volume>
          {
          <fpage>431</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
          </string-name>
          , J.:
          <article-title>Indexing Relational Database Content O ine for E cient Keyword-Based Search</article-title>
          .
          <source>In: Proc. of the 9th International Database Engineering and Applications Symposium (IDEAS</source>
          <year>2005</year>
          ). pp.
          <volume>297</volume>
          {
          <fpage>306</fpage>
          . IEEE Computer Society (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aggarwal</surname>
          </string-name>
          , C.C.
          <article-title>: A survey of algorithms for keyword search on graph data</article-title>
          .
          <source>In: Managing and Mining Graph Data</source>
          , pp.
          <volume>249</volume>
          {
          <fpage>273</fpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Proactive Natural Language Search Engine: Tapping into Structured Data on the Web</article-title>
          .
          <source>In: Proc. of the Joint 2013 EDBT/ICDT Conferences</source>
          . pp.
          <volume>143</volume>
          {
          <fpage>148</fpage>
          . ACM Press, New York, USA (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Keyword Search in Relational Databases: A Survey. IEEE Data Eng</article-title>
          .
          <source>Bull</source>
          .
          <volume>33</volume>
          (
          <issue>1</issue>
          ),
          <volume>67</volume>
          {
          <fpage>78</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>