=Paper=
{{Paper
|id=Vol-1810/KARS_paper_01
|storemode=property
|title=Towards Open-Source Shared Implementations of Keyword-Based Access Systems to Relational Data
|pdfUrl=https://ceur-ws.org/Vol-1810/KARS_paper_01.pdf
|volume=Vol-1810
|authors=Alex Badan,Luca Benvegnù,Matteo Biasetton,Giovanni Bonato,Alessandro Brighente,Alberto Cenzato,Piergiorgio Ceron,Giovanni Cogato,Stefano Marchesin,Alberto Minetto,Leonardo Pellegrina,Alberto Purpura,Riccardo Simionato,Nicolò Soleti,Matteo Tessarotto,Andrea Tonon,Federico Vendramin,Nicola Ferro
|dblpUrl=https://dblp.org/rec/conf/edbt/BadanBBBBCCCMMP17
}}
==Towards Open-Source Shared Implementations of Keyword-Based Access Systems to Relational Data==
Alex Badan, Luca Benvegnù, Matteo Biasetton, Giovanni Bonato, Alessandro Brighente, Alberto Cenzato, Piergiorgio Ceron, Giovanni Cogato, Stefano Marchesin, Alberto Minetto, Leonardo Pellegrina, Alberto Purpura, Riccardo Simionato, Nicolò Soleti, Matteo Tessarotto, Andrea Tonon, Federico Vendramin, Nicola Ferro
University of Padua, Italy
{alex.badan, luca.benvegnu.2, matteo.biasetton, giovanni.bonato, alessandro.brighente, alberto.cenzato, piergiorgio.ceron, giovanni.cogato.1, stefano.marchesin, alberto.minetto, leonardo.pellegrina, alberto.purpura, riccardo.simionato.1, nicolo.soleti, matteo.tessarotto.1, andrea.tonon.3, federico.vendramin}@studenti.unipd.it
ferro@dei.unipd.it
ABSTRACT
Keyword-based access systems to relational data address a challenging and important issue, i.e. letting users exploit natural language to access databases whose schema and instance are possibly unknown. Unfortunately, there are almost no shared implementations of such systems and this hampers the reproducibility of experimental results. We explore the difficulties in reproducing such systems and share implementations of several state-of-the-art algorithms.

2017, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21, 2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1. INTRODUCTION
Keyword-based access to relational data is a key technology for lowering the barriers of access to the huge amounts of data managed by databases. It is an extremely difficult and open challenge [1] since it has to face the "conflict of impedance" between vague and imprecise user information needs and rigorously structured data, allowing users to express their queries in natural language against a potentially unknown database.

Improving a technology is greatly eased by the existence of a reference architecture paired with a proper evaluation framework, in order to have a coherent vision of the components we have to leverage and to precisely measure and track performance over time. Unfortunately, the situation is very fragmented in the case of keyword-based access to relational data, since both a fully-fledged reference architecture and an evaluation methodology are still missing [2].

In this paper, we start to investigate the problem of the reproducibility of the experimental results for keyword-based access systems. Reproducibility is becoming a primary concern in many areas of science and it is key both for fully understanding state-of-the-art solutions and for being able to compare against and improve over them [6, 7]. However, there is a lack of commonly shared open-source platforms implementing state-of-the-art algorithms for keyword-based access to relational data, as Terrier (http://www.terrier.org/) is, for example, in the information retrieval field; on the contrary, you mostly need to re-implement each system from scratch [4].

Therefore, as part of a student project during the course on databases of the master degree in computer science at the University of Padua, we considered several state-of-the-art algorithms and implemented them from scratch, in order to understand the difficulties and pitfalls in implementing them and to move towards a shared implementation.

The paper is organized as follows: Section 2 presents the related work; Section 3 introduces the implementation of the different algorithms; Section 4 reports the lessons learned; finally, Section 5 wraps up the discussion and outlines some future work.

2. RELATED WORKS
In the following, we briefly summarize the algorithms implemented in this paper. For a broader perspective on keyword search over relational databases, please refer to [13, 14].

2.1 Graph-based Approaches
This type of algorithm starts with the construction of a graph where nodes are tuples and edges are the foreign key constraints; the goal is to find a set of trees that connect a group of nodes containing all the keywords in an input query. The weights between nodes and the graph type (directed or undirected) depend on each specific approach. Moreover, these algorithms differ in whether they use Dijkstra-based or Steiner-based methods to visit the graph: the former find the shortest spanning tree using only the selected nodes, i.e. those that match the query keywords; the latter can also use extra nodes which are not included in the previous selection but exist in the original graph.
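The graph model shared by these approaches can be made concrete with a short sketch: tuples become node ids, foreign-key links become weighted edges, and Dijkstra's single source shortest path algorithm is run from a keyword-matching node. This is only an illustration with names of our own choosing (TupleGraph, shortestPaths), not code from any of the systems discussed; edge weights are set to 1, as in an unweighted test database.

```java
import java.util.*;

// Minimal sketch of the tuple graph used by graph-based approaches:
// nodes are tuple ids, edges are foreign-key links with a weight.
public class TupleGraph {
    // adjacency lists: node id -> list of {neighbour, weight} pairs
    private final Map<Integer, List<int[]>> adj = new HashMap<>();

    public void addEdge(int u, int v, int w) { // undirected for simplicity
        adj.computeIfAbsent(u, k -> new ArrayList<>()).add(new int[]{v, w});
        adj.computeIfAbsent(v, k -> new ArrayList<>()).add(new int[]{u, w});
    }

    // Dijkstra single source shortest paths from a keyword-matching node.
    public Map<Integer, Integer> shortestPaths(int source) {
        Map<Integer, Integer> dist = new HashMap<>();
        PriorityQueue<int[]> pq =
                new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[1]));
        pq.add(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            if (dist.containsKey(cur[0])) continue; // node already settled
            dist.put(cur[0], cur[1]);
            for (int[] e : adj.getOrDefault(cur[0], List.of())) {
                if (!dist.containsKey(e[0])) pq.add(new int[]{e[0], cur[1] + e[1]});
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        TupleGraph g = new TupleGraph();
        g.addEdge(1, 2, 1); // e.g. tuple 1 references tuple 2 via a foreign key
        g.addEdge(2, 3, 1);
        g.addEdge(1, 4, 1);
        System.out.println(g.shortestPaths(1).get(3)); // distance from node 1 to node 3
    }
}
```

A Steiner-based visit differs in that it may also route through nodes that match no keyword at all, which is what makes the problem computationally harder.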
Browsing ANd Keyword Searching I (BANKS-I) [3] The algorithm adds a backward weighted edge for each directed edge in the graph and assigns a relevance score to each node in it. Then, it proceeds by finding the shortest path from each keyword node to all the other nodes in the graph. Each path is computed by running Dijkstra's single source shortest path algorithm starting from each one of the keyword nodes and traversing all the edges in reverse direction. This process is called backward expanding search. When multiple paths intersect at a common node r in the graph, the resulting tree with root r and the nodes containing the keywords as leaves is examined. If the tree contains all the keywords in the input query, then its weight is computed and the tree is added to a heap data structure of fixed size, which contains all the result trees in decreasing order of relevance. When the heap is full or all the trees have been generated, the most relevant of them are returned until a predefined number of results has been reached.

Browsing ANd Keyword Searching II (BANKS-II) [10] Bidirectional Search improves on BANKS-I by allowing forward search from potential roots towards other keyword nodes. The algorithm uses two concurrent iterators, called outgoing and ingoing, to explore the graph in both directions. Both iterators use Dijkstra's single source shortest path approach. BANKS-II also allows for preferential expansion of paths that have less branching, using a spreading activation mechanism to prioritize the most promising paths and to avoid wasteful exploration of the graph.

Dynamic Programming Best-First (DPBF) [5] It tries to build a minimum-cost Steiner tree: for each neighbour v of the root in the graph, it creates a new tree with v as the new root; the algorithm then tries to merge trees with the same root and a different set of keywords. If a tree contains all the searched keywords, it is a solution for the algorithm. Given a root and a specific set of keywords, DPBF keeps only the tree with the lower weight, i.e. the best one for that specific root node and set of keywords, and it guarantees the solution to be total, minimal and controlled in size.

Steiner-Tree Approximation in Relationship graphs (STAR) [11] It initially looks for tuples containing one or more of the query's keywords (let's call this set V′ ⊂ V) and then tries to connect these tuples into a least-weight tree, i.e. computing the best Steiner tree between the given nodes, whose total weight is Σ_{e∈E} w(e). In order to build a first interconnecting tree, STAR relies on a strategy similar to BANKS-I but, instead of running single source shortest path iterators from each node of V′, STAR runs a simple breadth-first search from each terminal, called in a round-robin manner. As soon as the iterators meet, a result is constructed. In the second phase, STAR aims at improving the current tree iteratively by replacing certain paths in the tree with new paths of lower weight from the underlying graph.

2.2 Schema-based Approaches
Both DISCOVER-I and II use a generalized inverted index, called Master Index and IR Engine respectively, to retrieve the tuples containing the input keywords, given by the user, from the database. Both output a list of Candidate Networks, albeit defined slightly differently.

DISCOVER-I [9] Given a relational database and a set of keywords K, DISCOVER exploits the database schema to generate a graph G_TS: from the Basic Tuple Sets R_i^{k_j}, the sets of tuples of the relation R_i containing the keyword k_j ∈ K, ∀i, j, it creates a node for each Tuple Set

R_i^{K′} = ⋂_{k ∈ K′} R_i^k − ⋃_{k ∉ K′} R_i^k,  ∀K′ ⊆ K    (2.1)

i.e. the set of tuples of the relation R_i containing all and only the keywords of K′, and one node for each Free Tuple Set, i.e. the database relations. Edges represent each PK-to-FK relationship. The algorithm then computes a complete and non-redundant set of Candidate Networks up to a maximum number of joins T, pruning all the joining networks of tuples that are not minimal (i.e. with no keyword-free tuples as leaves and not containing the same tuple twice) and total (i.e. containing every keyword ∈ K, AND semantics). To evaluate the Candidate Networks efficiently, DISCOVER eventually computes and stores a list of intermediate results of common join subexpressions. Finding these intermediate results is an NP-complete problem, so it uses an iterative greedy algorithm based on frequencies: the most frequent join expression is stored in a view at every iteration and then reused when possible. The results are finally displayed to the user ordered by size.

DISCOVER-II [8] The algorithm uses an IR engine module when a query Q is issued; in this way it extracts from each relation R the tuple set R^Q with Score > 0, where Score is a value assigned to each tuple by an IR-style relevance scoring function. Candidate Networks, i.e. join expressions used to create joining trees of tuples which are potential answers to the issued query, are then generated based on the tuple sets returned by the IR engine and the database schema. The final step is to identify the top-k results, a task performed by the execution engine module, which uses either: the Naive Algorithm, which takes the top-k results from each CN and combines them in a sort-merge manner to identify the final top-k results of the given query; or the Sparse Algorithm, which computes a bound on the maximum possible score of the tuple trees derived from a CN.

3. IMPLEMENTATION
Students split up in groups of 3-4 people and each group was responsible for the implementation of one algorithm. All the algorithms have been implemented in Java, while PostgreSQL has been used as the relational engine. The implementation of the different algorithms is available as open source under the BSD license (https://bitbucket.org/ks-bd-2015-2016/ks-unipd).

We briefly summarize the main features of each implementation:

• BANKS-I The required graph is created prior to the execution of the algorithm by executing multiple queries on the selected database without any prior knowledge of its schema. In our implementation, for each node in the graph, only the ID of the tuple is saved and an external file is used to store the information. The graph is represented using adjacency lists. In order to speed up the creation of the graph, we used the fastutil library (http://fastutil.di.unimi.it/) in addition to the Java Standard Library.
Finally, to optimize the performance while executing multiple instances of Dijkstra's algorithm (one for each keyword node), we chose to compute the next node to be returned by the algorithm only when it was required. In fact, most of the time, only a handful of nodes were required to build a valid result tree. With this approach we avoided executing Dijkstra's algorithm on the whole graph, optimizing the execution time and the memory usage. For the execution of each query the memory limit was set to 5 GB while the maximum running time was set to 1 hour.

• BANKS-II We divided our implementation into two phases: pre-processing and the BANKS-II execution. We implemented the graph in an object-oriented manner. The vertexes are stored using a two-layered HashMap, to improve performance and to reduce as much as possible the probability of collisions. The first layer refers to the relation name, while the second layer uses as index the primary key of the tuple. The paper suggested a set of different weight assignment strategies (e.g. PageRank); we decided to use equally weighted edges for simplicity. Once the user provides the keywords, the pre-processing ends with the creation of the matching sets. These sets are also organized in a two-layered HashMap data structure: vertexes are inserted in the respective matching sets using SQL queries to find tuples with attributes having matching values. During the BANKS-II execution, the outgoing and ingoing iterators start to explore the graph, using two priority queues to extract the highest activated node. We created these queues using the standard Java PriorityQueue class, defining a custom compare method. When a new solution (a rooted tree connecting all the matching sets) is found, we store it in a PriorityQueue; these results are ranked with a score given by the overall sum of the edge weights. Our implementation creates the graph from the DB once, and then accepts keywords from the user to perform multiple searches.

• DISCOVER-I We first implemented the Master Index to perform the full-text search through the to_tsvector function provided by PostgreSQL and then we created the initial graph G_TS using an adjacency matrix. We observed that a lot of queries were repeated while computing Tuple Sets, because the same Basic Tuple Sets were created many times by the DBMS "on the fly". Thus we decided to store a new table for each Basic Tuple Set, to be reused while performing unions and intersections as in (2.1). To further improve the performance of the full-text search, we preprocessed the main search fields to remove English stopwords and to convert all tokens to lowercase characters. We then stored the lexemes in a dedicated Generalized Inverted Index. The Candidate Network algorithm then prunes, expands and accepts the Candidate Networks using static methods. The accepted ones (treated as graphs) were rewritten by the Plan Generator, replacing repeated joins inside them (i.e. the same couples of nodes and the connecting edge) with a reference (i.e. a new node) to a new temporary table. Finally, queries were evaluated from the resulting graphs with a BFS algorithm in order to guarantee the correct order of joins.

• DISCOVER-II The graph was built using the information schema and the algorithm was implemented in Java using native collections. The IR Engine module identifies all the database tuples that have a non-zero score for a given query; each occurrence of a keyword is recorded as a tuple-attribute pair. Moreover, the IR Engine relies on an inverted index that associates each keyword appearing in the database with the list of its occurrences, which we implemented using the GIN index of PostgreSQL. To obtain better performance, we store as a view the tuple sets identified for a query Q. The Candidate Network Generator combines free and non-free tuple sets: the former are tuple sets with no occurrences of the query keywords, which help to connect the non-free tuple sets via foreign-key joins; the latter contain at least one keyword. The Execution Engine module receives as input the set of CNs along with the non-free tuple sets, and repeatedly contacts the RDBMS's query execution engine to identify the top-k query results. We implemented two algorithms: the Naive algorithm, considering a Boolean-OR semantics, issues a SQL query for each CN to answer a top-k query, then combines the results in a sort-merge manner to identify the final top-k results of the query; the Sparse algorithm discards at any point in time any (unprocessed) CN that is guaranteed not to produce a top-k match for the query: it computes a bound MPS_i on the maximum possible score of a tuple tree derived from a CN C_i, and if MPS_i does not exceed the actual score of k already produced tuple trees, then the CN C_i can be safely removed from further consideration. MPS_i is the score of a hypothetical joining tree of tuples T that contains the top tuples from every non-free tuple set in C_i. The CNs for a query are evaluated in ascending size order, so that the smallest CNs, i.e. the least expensive to process and the most likely to produce high-score tuple trees, are evaluated first.

• DPBF The project has been divided into two phases: creation and serialization of the database graph, and the DPBF implementation. To quickly get all the vertexes containing the query's keywords, we used HashMap data structures to store the graph and the sets of vertexes associated to their words. Once the graph is complete, our algorithm gets the vertexes from which to start building the resulting trees. (i) For each of these we move up one level, removing similar trees with higher costs, and then (ii) all trees with the same root node and different sets of keywords are merged. Steps (i) and (ii) are repeated until at least one result is found. We had to implement some Java data structures to do that correctly and efficiently.

• STAR has been implemented in a schema-agnostic way, so the first step of the algorithm loads all the DB metadata needed to build the tree. Given an i-keyword query, we look for the starting tuples in a multithreaded manner. All those tuples are stored in an ad-hoc graph data structure based on the Java TreeSet class. Considering these tuples as expanding trees, we try to connect them following both ingoing and outgoing foreign keys, dynamically loading the referenced tuples from the DB, until we get a tree containing all the query's keywords. Given that our test DBs were not weighted, we decided to set w(e) = 1 ∀e ∈ E. At this point we worked on this tree to optimize its weight by reshaping its structure: we iteratively remove the highest-weight path, replacing it with a lower-weight one.
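Two data structures recur across the implementations above: the two-layered HashMap that BANKS-II uses to index vertexes by relation name and then by primary key, and a PriorityQueue with a custom compare method that ranks result trees by the overall sum of their edge weights. The following is an illustrative sketch with class and method names of our own (KeywordSearchStructures, ResultTree), not the project's actual API:

```java
import java.util.*;

// Sketch of two structures used by the implementations: a two-layered vertex
// index (relation name -> primary key -> node id) and a priority queue ranking
// result trees by the sum of their edge weights (lower weight ranks first).
public class KeywordSearchStructures {

    // First layer: relation name; second layer: primary key of the tuple.
    private final Map<String, Map<String, Integer>> vertexIndex = new HashMap<>();

    public void addVertex(String relation, String pk, int nodeId) {
        vertexIndex.computeIfAbsent(relation, r -> new HashMap<>()).put(pk, nodeId);
    }

    public Integer lookup(String relation, String pk) {
        Map<String, Integer> byPk = vertexIndex.get(relation);
        return byPk == null ? null : byPk.get(pk);
    }

    // A result tree carries the overall sum of its edge weights.
    public static class ResultTree {
        final String root;
        final int weight;
        public ResultTree(String root, int weight) { this.root = root; this.weight = weight; }
    }

    // Standard Java PriorityQueue with a custom compare method on tree weights.
    public final PriorityQueue<ResultTree> results =
            new PriorityQueue<>(Comparator.comparingInt((ResultTree t) -> t.weight));

    public static void main(String[] args) {
        KeywordSearchStructures s = new KeywordSearchStructures();
        s.addVertex("person", "42", 0);
        s.results.add(new ResultTree("person:42", 3));
        s.results.add(new ResultTree("movie:7", 1));
        System.out.println(s.lookup("person", "42")); // node id of tuple 42 in relation person
        System.out.println(s.results.peek().root);    // the lowest-weight tree comes first
    }
}
```

The two-layered map avoids hashing long composite keys in a single map, which is the collision-reduction rationale mentioned for BANKS-II.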
4. LESSONS LEARNED
For each examined system, we briefly summarize the difficulties and lessons learned in reproducing it.

• BANKS-I During the implementation of the algorithm, we encountered some difficulties due to missing information in the descriptions of crucial parts of the algorithm. The technique chosen to efficiently run multiple instances of Dijkstra's single source shortest path algorithm was not described in the original paper, even though the choice of one implementation over another considerably influenced the memory usage and the execution time. We also encountered some difficulties due to the lack of precision in the explanation of how to evaluate the result trees. In fact, the way to handle some particular cases, such as nodes containing all the keywords in the input query or trees composed of only two nodes, was not described. It became noticeable that the major weakness of the algorithm was the memory occupation, which became more evident when testing it on larger datasets.

• BANKS-II We found the paper sufficiently clear in defining the Bidirectional Search algorithm and the search approach; however, the implementation of some procedures has not been trivial. As an example, the paper does not state a proper algorithm to update the reached ancestors' distances and activations when adding a new node to the new better path (the ATTACH and ACTIVATE functions). Therefore, we opted for a recursive solution for these procedures. To increase the performance of the searches, we opted for two heuristic solutions: we set a lower maximum distance of the answer trees than the one suggested by the paper, and we set a minimum threshold value for the activation of a node, to avoid wasteful spreading through the graph.

• DISCOVER-I The paper provided was heavily focused on a theoretical perspective, therefore including a lot of formal definitions of problems and structures, but lacking practical directives. Certain parts of the algorithm, like the Master Index, were almost treated like a black box. Specifically, Oracle8i interMedia Text was used, but we had to implement our own Master Index using the full-text search provided by PostgreSQL. We made our own assumptions to implement it, for example by searching and indexing text fields only and removing English stop words. This probably led to different execution times, making time performance comparisons with the ones in [4] less significant. This is reasonable because of the larger impact that the indexes have on the query computation compared to the advantages brought by the introduction of the Plan Generator and its execution. Moreover, the Plan Generator was challenging to implement, because Candidate Networks may contain joins of tuples from the same relation through an entry of another one. This operation is refused by the DBMS because the resulting table would have repeated column names, forcing us to change them to be unique. Eventually, there was no suggestion about how to evaluate the final queries: there is the need for an order in considering the tables, with join operations on the correct columns. For this reason we decided to evaluate every Candidate Network with a BFS algorithm, which guarantees correctness.

• DISCOVER-II We observed that in a dense dataset such as IMDB the execution time using the OR semantics grows exponentially, so queries don't give results in a useful time with a number of keywords greater than 5. If we used an AND semantics, since it keeps more CNs, a post-processing on the stream of results would be needed, worsening the performance further. Furthermore, it is not clear whether the authors keep the duplicate CN queries or remove them beforehand (it is not specified whether a Set or Bag semantics is used), and we found biased queries with the inclusion of relation names in them, e.g. char_name. We chose not to implement the AND semantics because we consider such a method highly inefficient. For better performance we decided to use a HashMap in the score function, which was filled when the lists were created.

• DPBF During the algorithm implementation we faced three different problems: making the graph creation efficient, making the algorithm implementation efficient, and using Java and its data structures. We tried to make the graph creation as fast as possible, assuming that it is created only once, so we serialized the graph into a file. The main cause of problems was the lack of details in the paper. In particular, only some general cases were described, ignoring specific ones. The last class of problems deals with Java. DPBF's main data structure is a priority queue, based on the trees' weights. Unfortunately, the Java PriorityQueue doesn't have a useful interface compared to our needs; furthermore, it gave us a lot of problems in sorting weights.

• STAR In our work we had to deal with some difficulties. The lack of implementation details in [11] gave us a wide range of possibilities, especially in the tree-building phase. The most interesting and challenging issue concerned the choice of the keyword-containing tuples: if some of the DB's columns contain repeated strings, searching a keyword results with high probability in many matching tuples, which causes longer execution times and lower accuracy for some queries. To avoid these issues we added some heuristics, sorting the expanding trees by the number of the query's keywords they contain, i.e. trees with more keywords are expanded first. Furthermore, to increase the DB throughput, we used multiple threads and connections to the DB during the query execution. Since the Java standard library does not have a graph data structure, we first tried the JGraphT library but, after some benchmarks, we decided to write our own graph class based on the Java TreeSet.

5. CONCLUSIONS AND FUTURE WORK
The paper briefly summarized an ongoing project for releasing an open source Java implementation of the most common algorithms for keyword-based search over relational data. The project has been initiated as a Master students
project where teams of 3-4 students implemented each algorithm and noted the difficulties and issues in trying to reproduce them. All the developed implementations are available as open source at: https://bitbucket.org/ks-bd-2015-2016/ks-unipd.

We are currently working on testing all these different implementations on the same hardware in order to compare their efficiency. We are also processing the datasets prepared by [4] in order to make them closer to the standard practices adopted in Information Retrieval (IR) evaluation [12], and then we will systematically investigate the effectiveness of the implemented algorithms.

Finally, we will work on harmonizing the different implementations, factoring out commonly used data structures and shared functionalities, in order to move towards a unified open source framework for keyword-based search over relational data.

6. REFERENCES
[1] D. Abadi, R. Agrawal, A. Ailamaki, M. Balazinska, P. A. Bernstein, M. J. Carey, S. Chaudhuri, J. Dean, A. Doan, M. J. Franklin, J. Gehrke, L. M. Haas, A. Y. Halevy, J. M. Hellerstein, Y. E. Ioannidis, H. V. Jagadish, D. Kossmann, S. Madden, S. Mehrotra, T. Milo, J. F. Naughton, R. Ramakrishnan, V. Markl, C. Olston, B. C. Ooi, C. Ré, D. Suciu, M. Stonebraker, T. Walter, and J. Widom. The Beckman Report on Database Research. ACM SIGMOD Record, 43(3):61–70, September 2014.
[2] S. Bergamaschi, N. Ferro, F. Guerra, and G. Silvello. Keyword-based Search over Databases: A Roadmap for a Reference Architecture Paired with an Evaluation Framework. LNCS Transactions on Computational Collective Intelligence (TCCI), 9630:1–20, 2016.
[3] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS. In R. Agrawal, K. Dittrich, and A. H. H. Ngu, editors, Proc. 18th International Conference on Data Engineering (ICDE 2002), pages 431–440. IEEE Computer Society, Los Alamitos, CA, USA, 2002.
[4] J. Coffman and A. C. Weaver. An Empirical Performance Evaluation of Relational Keyword Search Techniques. IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(1):30–42, 2014.
[5] B. Ding, J. Xu Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding Top-k Min-Cost Connected Trees in Databases. In R. Chirkova, A. Dogac, M. T. Özsu, and T. Sellis, editors, Proc. 23rd International Conference on Data Engineering (ICDE 2007), pages 836–845. IEEE Computer Society, Los Alamitos, CA, USA, 2007.
[6] N. Ferro. Reproducibility Challenges in Information Retrieval Evaluation. ACM Journal of Data and Information Quality (JDIQ), 8(2):8:1–8:4, January 2017.
[7] N. Ferro, N. Fuhr, K. Järvelin, N. Kando, M. Lippold, and J. Zobel. Increasing Reproducibility in IR: Findings from the Dagstuhl Seminar on "Reproducibility of Data-Oriented Experiments in e-Science". SIGIR Forum, 50(1):68–82, June 2016.
[8] V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-Style Keyword Search over Relational Databases. In J. C. Freytag, P. C. Lockemann, S. Abiteboul, M. J. Carey, P. G. Selinger, and A. Heuer, editors, Proc. 29th International Conference on Very Large Data Bases (VLDB 2003), pages 850–861. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 2003.
[9] V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword Search in Relational Databases. In P. A. Bernstein, Y. E. Ioannidis, R. Ramakrishnan, and D. Papadias, editors, Proc. 28th International Conference on Very Large Data Bases (VLDB 2002), pages 670–681. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 2002.
[10] V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional Expansion For Keyword Search on Graph Databases. In K. Böhm, C. S. Jensen, L. M. Haas, M. L. Kersten, P.-A. Larson, and B. Chin Ooi, editors, Proc. 31st International Conference on Very Large Data Bases (VLDB 2005), pages 505–516. ACM Press, New York, USA, 2005.
[11] G. Kasneci, M. Ramanath, M. Sozio, F. M. Suchanek, and G. Weikum. STAR: Steiner-Tree Approximation in Relationship Graphs. In Z. Li, P. S. Yu, Y. E. Ioannidis, D. Lee, and R. Ng, editors, Proc. 25th International Conference on Data Engineering (ICDE 2009), pages 868–879. IEEE Computer Society, Los Alamitos, CA, USA, 2009.
[12] M. Sanderson. Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval (FnTIR), 4(4):247–375, 2010.
[13] H. Wang and C. C. Aggarwal. A Survey of Algorithms for Keyword Search on Graph Data. In C. C. Aggarwal and H. Wang, editors, Managing and Mining Graph Data, pages 249–273. Springer-Verlag, New York, USA, 2010.
[14] J. X. Yu, L. Qin, and L. Chang. Keyword Search in Databases. Morgan & Claypool Publishers, USA, 2010.