<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards open-source shared implementations of keyword-based access systems to relational data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alex Badan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Benvegnù</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Biasetton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Bonato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Brighente</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Cenzato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Piergiorgio Ceron</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Cogato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Minetto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Pellegrina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Purpura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Simionato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolò Soleti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Tessarotto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Tonon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Vendramin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Keyword-based access systems to relational data address a challenging and important issue, i.e. letting users to exploit natural language to access databases whose schema and instance are possibly unknown. Unfortunately, there are almost no shared implementations of such systems and this hampers the reproducibility of experimental results. We explore the di culties in reproducing such systems and share implementations of several state-of-the-art algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Keyword-based access to relational data is a key
technology for lowering the barriers of access to the huge amounts
of data managed by databases. It is an extremely di cult
and open challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] since it has to face the \con ict of
impedance" between vague and imprecise user information
needs and rigorously structured data, allowing users to
express their queries in natural language against a potentially
unknown database.
      </p>
      <p>
        Improving a technology is greatly eased by the existence
of a reference architecture paired with a proper evaluation
framework in order to have a coherent vision of the
components we have to leverage on and to precisely measure and
track performances over time. Unfortunately, the situation
is very fragmented in the case of keyword-based access to
relational data, since both a fully- edged reference
architecture and an evaluation methodology are still missing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        In this paper, we start to investigate the problem of the
reproducibility of the experimental results for keyword-based
2017, Copyright is with the authors. Published in the Workshop
Proceedings of the EDBT/ICDT 2017 Joint Conference (March
21, 2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073).
Distribution of this paper is permitted under the terms of the
Creative Commons license CC-by-nc-nd 4.0
access systems. Reproducibility is becoming a primary
concern in many areas of science and it is a key both for fully
understanding state-of-the-art solutions and for being able
to compare against and improve over them [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. However,
there is a lack of commonly shared open source platforms
implementing state-of-the-art algorithms for keyword-based
access to relational data as, for example, Terrier1 is in the
information retrieval eld; on the contrary, you mostly need
to re-implement each system from scratch [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Therefore, as part of a student project during the course
on databases of the master degree in computer science at the
University of Padua, we considered several state-of-the-art
algorithms and we implemented them from scratch, in order
to understand the di culties and pitfalls in implementing
them and to go towards a shared implementation of them.</p>
      <p>The paper is organized as follows: Section 2 presents the
related work; Section 3 introduces the implementation of the
di erent algorithm; Section 4 reports the lessons learned;
nally Section 5 wraps up the discussion and outlooks some
future work.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORKS</title>
      <p>
        In the following, we brie y summarize the algorithms
implemented in this paper. For a broader perspective on
keyword search over relational database, please refer to [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
2.1
      </p>
    </sec>
    <sec id="sec-3">
      <title>Graph-based Approaches</title>
      <p>This type of algorithms starts with the construction of
a graph where nodes are tuples and edges are the forward
(foreign) key constraints and their goal is to nd a set of
trees that connects a group of nodes containing all the
keywords in an input query. Weights between nodes and graph
type (direct or indirect) are dependent on each speci c
approach. Moreover, this type of algorithms di er for the use
of Dijkstra or Steiner methods to visit the graph: the former
allows you to nd the shortest spanning tree using only the
selected nodes, i.e. those that match with the query
keywords; the latter allows you to use other extra nodes which
are not included in the previous selection but exist in the
original graph.</p>
      <p>
        Browsing ANd Keyword Searching I (BANKS-I) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] The
algorithm adds a backward weighted edge for each directed
edge in the graph and assigns a relevance score to each node
in it. Then, it proceeds by nding the shortest path from
each keyword node to all the other nodes in the graph. Each
path is computed by running Dijkstra's single source
shortest path algorithm starting from each one of the keyword
nodes and traversing all the edges in reverse direction. This
process is called backward expanding search. When multiple
paths intersect at a common node r in the graph, the
resulting tree with root r and the nodes containing the keywords
as leaves is examined. If the tree contains all the keywords in
the input query, then its weight is computed and the tree is
added to a heap data structure of xed size, which contains
all the result trees in decreasing order of relevance. When
the heap is full or all the trees have been generated, the most
relevant of them are returned until a prede ned number of
results has been returned.
      </p>
      <p>
        Browsing ANd Keyword Searching II (BANKS-II) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
Bidirectional Search improves on BANKS-I by allowing
forward search from potential roots towards other keyword
nodes. The algorithm uses two concurrent iterators, called
outgoing and ingoing to explore the graph in both
directions. Both the iterators use the Dijkstra's single source
shortest path approach. BANKS-II also allows for
preferential expansion of paths that have less branching
using a spreading activation mechanism to prioritize the most
promising paths and to avoid wasteful exploration of the
graph.
      </p>
      <p>
        Dynamic Programming Best-First (DPBF) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: it tries to
build a minimum cost Steiner tree: for each neighbour v of
the root in the graph, it creates a new tree with v as the new
root; the algorithm tries to merge trees with the same root
and a di erent set of keywords. If a tree contains all the
searched keywords, it is a solution for the algorithm. Given
a root and a speci c set of keywords, DPBF keeps only the
tree with lower weight, i.e. the best one for the speci c root
node and set of keywords, and it guarantees to be a total,
minimal and controlled in size solution.
      </p>
      <p>
        Steiner-Tree Approximation in Relationship graphs (STAR)
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: it initially looks for tuples containing one or more of the
query's keywords (let's call this set V 0 V ) and then tries
to connect these tuples into a least-weight tree, i.e.
computing the best Steiner tree between the given nodes which
total weight is Pe2E w(e). In order to build a rst
interconnecting tree, STAR relies on a strategy similar to BANKS-I
but, instead of running single source shortest path iterators
from each node of V 0, STAR runs simple breadth- rst-search
from each terminal, called in a round-robin manner. As soon
as the iterators meet, a result is constructed. In the second
phase, STAR aims at improving the current tree iteratively
by replacing certain paths in the tree by new paths with a
lower weight from the underlying graph.
2.2
      </p>
    </sec>
    <sec id="sec-4">
      <title>Schema-based Approaches</title>
      <p>Both DISCOVER-I and II use generalized inverted-index,
called Master Index and IR Engine respectively, to retrieve
tuples containing the input keywords, given by the user,
from the database. Both output a list of Candidate
Networks, albeit de ned slightly di erently.</p>
      <p>
        DISCOVER-I [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] Given a relational database and a set of
keyword K, Discover exploits the database schema to
generate a graph GT S: from Basic Tuple Sets Rikj , set of tuples
of the relation Ri containing the keyword kj 2 K, 8i; j, it
creates a node for each Tuple Set
      </p>
      <p>RiK0 =
\
k2K0</p>
      <p>k
Ri
[ Rik; 8K0
k2=K0</p>
      <p>K
(2.1)
sets of tuples of the relation Ri containing all and only
the keywords of K0, and one for each Free Tuple Sets, the
database relations. Edges are representative of each PK to
FK relationship. The algorithm then computes a complete
non-redundant set of Candidate Networks up to a maximum
number of joints T, pruning all the joining networks of
tuples that are not minimal (i.e no tuples without keywords
as leaves and do not contain the same tuple twice) and total
(i.e. contain every keyword 2 K, AND semantics).To
evaluate the Candidate Networks e ciently, Discover eventually
computes and stores a list of intermediate results of
common join subexpressions. Finding these intermediate results
is an NP-complete problem, so it uses an iterative greedy
algorithm based on frequencies: the most frequent joint
expression is stored in a view at every iteration and then reused
when possible. The results are nally displayed to the user
ordered by size.</p>
      <p>
        DISCOVER-II [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] The algorithm uses an IR engine
module when a query Q is issued, in this way it extracts from
each relation R the tuple set RQ with Score &gt; 0, where
Score is a value for each tuple using a IR-style relevance
scoring function. Candidate Networks are hence generated
based on the tuple sets returned by the IR engine and the
database schema, which are join expressions used to create
joining trees of tuples which are potantial answer to the
issued query. The nal step is to identify the top-k results,
task which is performed by the execution engine module
where it uses: Naive Algorithm which takes the top-k
results from each CN and it combine in a sort-merge manner to
identify the nal top-k results of the given query; Sparse
Algorithm computes a bound on the maximum possible score
of the tuple tree derived from a CN.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. IMPLEMENTATION</title>
      <p>Students split up in groups of 3-4 people and each group
was responsible for the implementation of one algorithm. All
the algorithms have been implemented in Java while
PostgreSQL has been used as relational engine.</p>
      <p>The implementation of the di erent algorithms is available
as open source under BSD license2.</p>
      <p>We brie y summarize the main features of each
implementation:</p>
      <p>BANKS-I The required graph is created prior to the
execution of the algorithm executing multiple queries
on the selected database without any prior knowledge
of its schema. In our implementation, for each node in
the graph, only the ID of the tuple is saved and an
external le is used to store the information. The graph is
represented using adjacency lists. In order to speed up
the creation of the graph, we used the fastutil library3,
2https://bitbucket.org/ks-bd-2015-2016/ks-unipd
3http://fastutil.di.unimi.it/
in addition to the Java Standard Library. Finally, to
optimize the performance while executing multiple
instances of Dijkstra's algorithm (one for each keyword
node), we chose to compute the next node to be
returned by the algorithm only when it was required. In
fact, most of the times, only a handful of nodes were
required to build a valid result tree. With this
approach we avoided executing Dijkstra's algorithm on
the whole graph, optimizing the execution time and
the memory usage. For the execution of each query
the memory limit was set to 5 Gb while the maximum
running time was set to 1 hour.</p>
      <p>BANKS-II We divided our implementation in two phases:
pre-processing and the BANKS-II execution. We
implemented the graph in a OO manner. The vertexes
are stored using a two-layered HashMap, to improve
performances and to reduce as much as possible the
probability of collisions. The rst layer refers to the
relation name, while the second layer uses as index the
primary key of the tuple. The paper suggested a set
of di erent weight assignment strategies (i.e.
Pagerank); we decided to use equally weighted edges for
simplicity. Once the user provides the keywords, the
pre-processing ends with the creation of the matching
sets. These sets are also organized in a two-layered
HashMap data structure: vertexes are inserted in the
respective matching sets using SQL queries to nd
tuples with attributes having matching values.
During the BANKS-II execution, outgoing and ingoing
iterators start to explore the graph, using two
priority queues to extract the highest activated node. We
created these queues using the standard Java
PriorityQueue class, de ning a custom compare method.
When a new solution (a rooted tree connecting all the
matching sets) is found, we store it in a PriorityQueue;
these results are ranked with a score given by the
overall sum of the edge weights. Our implementation
creates the graph from the DB once, and then accepts
keywords from the user to perform multiple searches.
DISCOVER-I We rst implemented the Master Index
to perform the full text search through the function
to tsvector provided by PostgreSQL and then we
created the initial graph GT S using an adjacency matrix.
We observed that a lot of queries were repeated while
computing Tuple Sets because the same Basic Tuple
Sets were created many times by the DBMS "on the
y". Thus we decided to store a new table for each
Basic Tuple Set to be reused while performing unions
and interceptions as in (2.1).To further improve
performances of full text search we preprocessed the main
search elds to remove english stopwords and to
convert all tokens to lowercase characters. We then stored
the lexemes in a dedicated Generalized Inverted Index.
The Candidate Network algorithm then prune, expand
and accept the Candidate Networks using static
methods. The accepted ones (treated as graphs) were
rewritten, by the Plan Generator, replacing repeated joins
inside them (i.e. the same couples of nodes and the
connecting edge) with a reference (i.e. a new node)
to a new temporary table. Finally queries were
evaluated from the resulted graphs with a BFS algorithm
in order to guarantee the correct order of joins.</p>
      <p>DISCOVER-II The graph was built using the
information schema and the algorithm was implemented in
java using native collections. The IR Engine module
identi es all database tuples that have a non-zero score
for a given query, then each occurrence of a keyword
is recorded as a tuple-attribute pair. Moreover, the
IR Engine relies an inverted index that associates each
keyword that appears in the database with a list of
occurrences of the keyword, which we implemented using
the GIN index of PostreSQL. To obtain better
performances, we store as a view the tuple sets identi ed for
a query Q. The Candidate Network Generator involves
a combination of free and not-free tuple set, the former
represent the tuple set with no occurrences of the query
keyboards, that help via foreign-key join the
connection of non-free tuple set, the latter contains at least
one keyword. The Execution Engine module receives
as input the set of CNs along with the non-free tuple
sets, it contacts the RDBMS's query execution engine
repeatedly to identify the top-k query results. We
implemented two algorithm: the Naive algorithm,
considering a Boolean-OR semantics, issues a SQL query
for each CN to answer a top-k query, then combines
the results in a sort-merge manner to identify the nal
top-k results of the query; the Sparse algorithm
discards at any point in time any (unprocessed) CN that
is guaranteed not to produce a top-k match for the
query, it computes a bound M P Si on the maximum
possible score of a tuple tree derived from a CN Ci. If
M P Si does not exceed the actual score of k already
produced tuple trees, then CN Ci can be safely
removed from further consideration. M P Si is the score
of a hypothetical joining tree of tuples T that contains
the top tuples from every non-free tuple set in Ci. The
CNs for a query are evaluated in ascending size order,
so that smallest CNs, i.e. the least expensive to
process and the most likely to produce high-score tuple
trees, are evaluated rst.</p>
      <p>DPBF The project has been divided into two phases:
creation and serialization of the database-graph and
dpbf implementation. To get all vertexes containing
query's keywords quickly, we used hashmap data
structure to store the graph and sets of vertexes associated
to their words. Completed the graph, our algorithm
gets the vertexes whence start to build the resulting
trees. (i) For each of these we move up one level,
removing similar trees with higher costs and then (ii)
all trees with same root node and di erent sets of
keywords are merged. Points (i) and (ii) are repeated until
at least one result is found. We had to implement some
Java data structure to do that correctly and e ciently.
STAR has been implemented in a schema-agnostic way
so the rst step of the algorithm loads all DB metadata
needed to build the tree. Given an i-keywords query we
look for the starting tuples in a multithreaded manner.
All those tuples are stored in an ad-hoc graph
datastructure based on Java TreeSet class. Considering
these tuples as expanding trees we try to connect them
following both ingoing and outgoing foreign keys,
dynamically loading referenced tuples from DB, until we
get a tree containing all the query's keywords. Given
that our test DBs were not weighted we decided to set
w(e) = 1 8e 2 E. At this point we worked on this
tree to optimize its weight reshaping its structure: we
iteratively remove the highest weight path replacing it
with a lower weight one.</p>
    </sec>
    <sec id="sec-6">
      <title>LESSONS LEARNED</title>
      <p>For each examined system, we brie y summarize the
difculties and lessons learned in reproducing it.</p>
      <p>BANKS-I During the implementation of the algorithm,
we encountered some di culties due to missing
information in the descriptions of crucial parts of the
algorithm. The chosen technique to run e ciently multiple
instances of Dijkstra's single source shortest path
algorithm was not described in the original paper, even
though the choice of one implementation over another
in uenced considerably the memory usage and the
execution time. We also encountered some di culties
due to the lack of precision in the explanation of how
to evaluate the result trees. In fact, the way to
handle some particular cases such as nodes containing all
the keywords in the input query or trees composed by
only two nodes, were not described. It became
noticeable that the major weakness of the algorithm was the
memory occupation that became more evident when
testing it on greater datasets.</p>
      <p>
        BANKS-II We found the paper su ciently clear in
de ning the Bidirectional Search algorithm and the
search approach; however, the implementation of some
procedures has not been trivial. As an example, the
paper does not state a proper algorithm to update
the reached-ancestors' distances and activations when
adding a new node to new the better path (ATTACH
and ACTIVATE function). Therefore, we opted for a
recursive solution for these procedures. To increase the
performances of the searches, we opted for two
heuristic solutions: we set a lower maximum distance of the
answer trees than the one suggested by the paper; we
set a minimum threshold value for the activation of a
node, to avoid wasteful spreading towards the graph.
DISCOVER-I The paper provided was heavily focused
on a theoretical perspective therefore including a lot
of formal de nitions of problems and structures, but
lacking of practical directives. Certain parts of the
algorithm, like the Master Index, were almost treated
like a blackbox. Speci cally Oracle8i interMedia Text
were used but we had to implement our own Master
Index using full text search provided by PostgreSQL.
We made our own assumptions to implement it, for
example by searching and indexing text elds only and
removing English stop words. This probably lead to
di erent execution times, making time performances
comparison with the ones in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] less signi cant. This
is reasonable because of the larger impact that the
indexes have on the queries computation compared to
the advantages brought by the introduction of the Plan
Generator and its execution. Moreover the Plan
Generator was challenging to implement, because
Candidate Networks may contain joins of tuples from the
same relation through an entry of another one. But
this operation is refused from the DBMS because the
resulting table would have repeated column names,
forcing us to change them to be unique. Eventually
there were no suggestion about how to evaluate the
nal queries: there is the need of an order in
considering the tables, with join operations on the correct
columns. For this reason we decided to evaluate
every Candidate Network with a BFS algorithm, that
guaranties correctness.
      </p>
      <p>DISCOVER-II We observed that in a dense dataset
as IMDB the execution time using OR semantic grows
exponentially then queries don't give results in a
useful time with a number of keywords greater than 5. If
we used an AND semantic, since it keeps more CNs,
a post-processing on the stream of results needed and
so worsens the performance. Furthermore, not clear if
the authors keep the duplicate CNs queries or if they
remove before (not speci ed if Set or Bag semantic)
and we found biased queries with the inclusion of
relation names in them, e.g. char name. We chose to
not implement the AND semantic because we consider
such method highly ine cient. For better performance
we decided to use hashMap in the score function that
was lled when the lists were created.</p>
      <p>DPBF During the algorithm implementation we have
faced three di erent problems: making the graph
creation and the algorithm implementation e cient and
using Java and its data-structures. We tried to make
the graph creation as fast as possible, assuming that
it is created only once. So we serialized the graph into
a le. The main cause of problems was the lack of
details in the paper. In particular only some general
cases were described, ignoring speci c ones. The last
class of problems deals with Java. DPBF main data
structure is a priority queue, based on trees' weights.
Unfortunately Java PriorityQueue hasn't a useful
interface compared to our needs; furthermore it gave us
a lot of problems in sorting weights.</p>
      <p>
        STAR In our work we had to deal with some di
culties. The lack of implementation details in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] gave
us a wide range of possibilities especially in the tree
building phase. The most interesting and challenging
issue concerned the choice of the keyword-containing
tuples: if some of the DB's columns contain repeated
strings searching a keyword results with high
probability in many matching tuples, which causes longer
execution times and lower accuracy for some queries.
To avoid these issues we tried to add some heuristics
sorting the expanding trees by number of query's
keywords contained, i.e. trees with more keywords are
expanded rst. Furthermore, to increase DB
throughput, we used multiple threads and connections to the
DB during the queries execution. Java standard
library does not have a graph data structure we rst
tried JGraphT library, but, after some benchmarks,
we decided to write our own graph class based on Java
TreeSet.
      </p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>The paper brie y summarized an ongoing project for
releasing an open source Java implementation of the most
common algorithms for keyword-based search over relational
data. The project has been initiated as a Master students
project where teams of 3-4 students implemented each
algorithm and noted the di culties and issues in trying to
reproduce them. All the developed implementations are available
as open source at: https://bitbucket.org/ks-bd-2015-2016/
ks-unipd.</p>
      <p>
        We are currently working on testing all these di erent
implementations on the same hardware in order to compare
their e ciency. We are also processing the datasets prepared
by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in order to make them closer to the standard practices
adopted in Information Retrieval (IR) evaluation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
then we will systematically investigate the e ectiveness of
the implemented algorithms.
      </p>
      <p>Finally, we will work on harmonizing the di erent
implementations, factoring out commonly used data structures
and shared functionalities, in order to go towards a uni ed
open source framework for keyword-based search over
relational data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balazinska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hellerstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. E.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Ooi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Re</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Walter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <source>The Beckman Report on Database Research. ACM SIGMOD Record</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ):
          <volume>61</volume>
          {
          <fpage>70</fpage>
          ,
          <year>September 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          .
          <article-title>Keyword-based Search over Databases: A Roadmap for a Reference Architecture Paired with an Evaluation Framework</article-title>
          .
          <source>LNCS Transactions on Computational Collective Intelligence (TCCI)</source>
          ,
          <volume>9630</volume>
          :1{
          <fpage>20</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bhalotia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hulgeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nakhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudarshan</surname>
          </string-name>
          .
          <article-title>Keyword Searching and Browsing in Databases using BANKS</article-title>
          . In R. Agrawal,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. H. H.</given-names>
            <surname>Ngu</surname>
          </string-name>
          , editors,
          <source>Proc. 18th International Conference on Data Engineering (ICDE</source>
          <year>2002</year>
          ), pages
          <fpage>431</fpage>
          {
          <fpage>440</fpage>
          . IEEE Computer Society, Los Alamitos, CA, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Co man and</article-title>
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Weaver</surname>
          </string-name>
          .
          <article-title>An Empirical Performance Evaluation of Relational Keyword Search Techniques</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering (TKDE)</source>
          ,
          <volume>1</volume>
          (
          <issue>26</issue>
          ):
          <volume>30</volume>
          {
          <fpage>42</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Xu</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Finding Top-k Min-Cost Connected Trees in Databases</article-title>
          . In R. Chirkova,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dogac</surname>
          </string-name>
          , M. T. O zsu, and T. Sellis, editors,
          <source>Proc. 23rd International Conference on Data Engineering (ICDE</source>
          <year>2007</year>
          ), pages
          <fpage>836</fpage>
          {
          <fpage>845</fpage>
          . IEEE Computer Society, Los Alamitos, CA, USA,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          .
          <article-title>Reproducibility Challenges in Information Retrieval Evaluation</article-title>
          .
          <source>ACM Journal of Data and Information Quality (JDIQ)</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):8:
          <issue>1</issue>
          {
          <issue>8</issue>
          :4,
          <string-name>
            <surname>January</surname>
          </string-name>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          , K. Jarvelin, N. Kando,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lippold</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>Increasing Reproducibility in IR: Findings from the Dagstuhl Seminar on \Reproducibility of Data-Oriented Experiments in e-Science"</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>50</volume>
          (
          <issue>1</issue>
          ):
          <volume>68</volume>
          {
          <fpage>82</fpage>
          ,
          <year>June 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hristidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Papakonstantinou</surname>
          </string-name>
          .
          <article-title>E cient IR-Style Keyword Search over Relational Databases</article-title>
          . In J. C. Freytag,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Lockemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Selinger</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A</surname>
          </string-name>
          . Heuer, editors,
          <source>Proc. 29th International Conference on Very Large Data Bases (VLDB</source>
          <year>2003</year>
          ), pages
          <fpage>850</fpage>
          {
          <fpage>861</fpage>
          . Morgan Kaufmann Publisher, Inc., San Francisco, CA, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hristidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Papakonstantinou</surname>
          </string-name>
          . DISCOVER:
          <article-title>Keyword Search in Relational Databases</article-title>
          . In P. A.
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>Y. E.</given-names>
          </string-name>
          <string-name>
            <surname>Ioannidis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and D. Papadias, editors,
          <source>Proc. 28th International Conference on Very Large Data Bases (VLDB</source>
          <year>2002</year>
          ), pages
          <fpage>670</fpage>
          {
          <fpage>681</fpage>
          . Morgan Kaufmann Publisher, Inc., San Francisco, CA, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kacholia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pandit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudarshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Desai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Karambelkar</surname>
          </string-name>
          .
          <article-title>Bidirectional Expansion For Keyword Search on Graph Databases</article-title>
          . In K. Bohm,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kersten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Larson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Chin</surname>
          </string-name>
          Ooi, editors,
          <source>Proc. 31st International Conference on Very Large Data Bases (VLDB</source>
          <year>2005</year>
          ), pages
          <fpage>505</fpage>
          {
          <fpage>516</fpage>
          . ACM Press, New York, USA,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ramanath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sozio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , and
          <string-name>
            <surname>G. Weikum. STAR</surname>
          </string-name>
          :
          <article-title>Steiner-Tree Approximation in Relationship Graphs</article-title>
          . In
          <string-name>
            <surname>Z</surname>
          </string-name>
          . Li,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. E.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and R. Ng, editors,
          <source>Proc. 25th International Conference on Data Engineering (ICDE</source>
          <year>2009</year>
          ), pages
          <fpage>868</fpage>
          {
          <fpage>879</fpage>
          . IEEE Computer Society, Los Alamitos, CA, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <source>Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval (FnTIR)</source>
          ,
          <volume>4</volume>
          (
          <issue>4</issue>
          ):
          <volume>247</volume>
          {
          <fpage>375</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          .
          <article-title>A Survey of Algorithms for Keyword Search on Graph Data</article-title>
          . In C. C. Aggarwal and H. Wang, editors,
          <source>Managing and Mining Graph Data</source>
          , pages
          <volume>249</volume>
          {
          <fpage>273</fpage>
          . Springer-Verlag, New York, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chang</surname>
          </string-name>
          . Keyword Search in Databases. Morgan &amp; Claypool Publishers, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>