=Paper=
{{Paper
|id=Vol-2929/paper3
|storemode=property
|title=Let the Database Talk Back: Natural Language Explanations for SQL
|pdfUrl=https://ceur-ws.org/Vol-2929/paper3.pdf
|volume=Vol-2929
|authors=Stavroula Eleftherakis,Orest Gkini,Georgia Koutrika
|dblpUrl=https://dblp.org/rec/conf/vldb/EleftherakisGK21
}}
==Let the Database Talk Back: Natural Language Explanations for SQL==
Stavroula Eleftherakis, Orest Gkini, Georgia Koutrika
Athena Research Center, Athens, Greece
seleftheraki@athenarc.gr, orestg@athenarc.gr, georgia@athenarc.gr
ABSTRACT

Database interaction is often characterized as a non-trivial and time-consuming process due to users' inexperience with the data or the query language. Therefore, there is a need for databases to be able to "talk back" in order to assist users during data exploration and eventually lead them to the desired results. In this paper, we tackle the problem of SQL-to-NL by extending the graph-based model of Logos [3]. Our novel extensions include improvements in the system's translation capabilities and in the fluency of the generated explanations. Finally, we report several challenges, highlighted by experiments on different use cases, i.e., astronomy and policy making.

Reference Format:
Stavroula Eleftherakis, Orest Gkini, and Georgia Koutrika. Let the Database Talk Back: Natural Language Explanations for SQL. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers' authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

1 INTRODUCTION

Nowadays, the availability of data coming from many different application fields urges the deployment of sophisticated tools that help us effectively explore and extract knowledge out of this data. However, the variety of the data sources (astronomical, biomedical, etc.) as well as the complexity of query languages such as SQL and SPARQL often pose obstacles during data exploration, due to users' unfamiliarity with the database content or the query language. Explaining queries in text can help tackle both problems, as it enables users to understand the SQL queries that are used to retrieve the answers throughout the data exploration process [8].

The problem of translating SQL queries to natural language, or SQL-to-NL for short, appears deceivingly simple: SQL queries use a restricted vocabulary, comprising SQL elements (clauses, operators, etc.) and database elements (i.e., relations, attributes), and there is no ambiguity in interpreting such elements. Hence, a straightforward translation initially appears adequate. Reality teaches us quite the opposite: the resulting text should be accurate in capturing the respective SQL query, and effective in allowing its fast and unique interpretation. Achieving both of these qualities is very difficult and raises several technical challenges that need to be addressed [8]. Furthermore, several different explanations can be generated for the same SQL query, making evaluation challenging.

Logos [3] is a well-established system that translates SQL queries into narratives. In contrast to neural approaches, Logos' model can be interpreted. This gives us the advantage of offering precise translations, knowing exactly how translations are produced. Nevertheless, the system's translation capabilities are highly dependent on the query's syntax, meaning that for each type of query the system must have the necessary tools to model it. Moreover, depending on the query's complexity (number of attributes, bridge tables, etc.), translations can become unnatural.

In our work, we provide NL explanations as part of a data exploration platform used by our collaborators from different fields, including astrophysics and policy making [1]. In this platform, NL explanations are provided for the internal SQL queries generated by different system components used for recommendations, NL-to-SQL translation, and data exploration. Thus, we have to translate a variety of queries containing different clauses and operators. Furthermore, these translations need to be as natural as possible.

In an effort to tackle the above-mentioned challenges, we extend Logos in two important directions: (a) translation capabilities, where we focus on the system's ability to translate different types of queries, and (b) fluent explanations, where we focus on improving the fluency of the system's translations. Furthermore, we present experimental results as well as results of a user study over two different databases: astronomical data and policy-making data. Our evaluation shows the effectiveness of our approach and provides several insights regarding challenges that arise due to the ambiguous nature of NL explanations, including: (a) scoring textual explanations, and (b) generating explanations suitable for different groups of people.

The rest of the paper is organised as follows. In Section 2, related work is discussed. In Section 3, we provide background information on Logos. In Section 4, we present our novel extensions. In Section 5, the effectiveness of the system is explored. Finally, concluding remarks are provided in Section 6.

2 RELATED WORK

Existing approaches for SQL-to-NL can be broadly divided into two categories. Template-/rule-based approaches (e.g., [4], [5]) require the design of special templates and rules that are used to compose sentences. Due to the non-ambiguous nature of SQL queries, templates can help limit the syntactic variability of the output and smooth out the explanations [7]. Logos [3] falls into this category. The second line of research tackles the problem as neural machine translation and uses sequence-to-sequence (Seq2Seq) models (e.g., [2], [12]) which treat both the natural language description and
the query as sequences. In contrast to the previous category, these models automatically learn how to translate queries without the need for predefined query patterns. However, they require plenty of training data and fine-tuning, and their effectiveness is still very low.

Lately, hybrid models (e.g., [10], [11]) have been proposed to solve a problem similar to ours, i.e., data to natural language. These models automatically learn templates and use them in order to generate textual explanations.

3 BACKGROUND

Logos' approach to NL generation comprises graph representations of the database and the queries, template phrases associated with parts of the graph, and graph traversal in particular directions to compose the templates found on the way into the final text. We provide an overview of the main ingredients: the database graph, the query graph, labels, templates, and the algorithms.

The database graph captures the relationships between the database relations and attributes (nodes of the graph). Its edges are divided into three types: (a) membership edges (attributes to relations), representing attribute projections in queries, (b) selection edges (relations to attributes), representing attributes in predicates, and (c) join edges (attribute to attribute), representing joins.

The query graph of a query is the part of the database graph that the query refers to, extended with additional nodes (e.g., for functions) and edges (e.g., for group-by's) in order to capture the entire query meaning. Each node or edge of the query graph can be annotated with a label that signifies its meaning in natural language. There are default labels that can be used for any database. Additionally, labels can be provided by a domain expert.

Example. Let us consider the CORDIS database (https://data.europa.eu/euodp/en/data/dataset/cordisH2020projects), which stores information about research projects funded by the European Union. Figure 1 shows a subgraph of the corresponding database graph. Note that, for simplicity, we do not show the attributes used to join the various relations. By default, the name of each node is also its label. We also see how edges are annotated with default labels. We can easily override these. For instance, for the relation PROJECT_MEMBERS, instead of the system's default label "project members", we can use the short label "participants".

Figure 1: A subgraph of the CORDIS database graph.

Figure 2 zooms in on the join path connecting relation PEOPLE with relation PROJECTS and provides a more detailed view. A possible label which expresses this connection is "principal investigators of", while the default one is "associated with". Such designer labels are stored in special tables called designer tables. □

Figure 2: A join on the CORDIS database graph.

NL explanations are created by traversing the query graph, accompanied by a template mechanism [9]. The template mechanism uses the provided labels (designer or default) to form meaningful phrases. A template label l(u) or l((u, v)) is assigned to a node u or a path (u, v), respectively. For instance, a join template label may have the form l((u, v)) = l(u) + expr + l(u, v) + l(v), where u and v are relation nodes of the query graph. Using the example of the previous paragraph (see Figure 2), a template for the path connecting the tables PEOPLE and PROJECTS is the following: l((PEOPLE, PROJECTS)) = l(PEOPLE) + " as " + "principal investigators of" + l(PROJECTS) = "people as principal investigators of projects".

Synthesis of the NL explanations is performed as a traversal of the query graph. There are three traversal strategies: (a) the Binary Search Tree (BST) algorithm, where the translation consists of a composition of clauses, each one focusing on specific query semantics, (b) the Multiple Reference Points (MRP) algorithm, where information from all parts of the query graph is blended in the translation, and (c) the Template composition (TMT) algorithm, where predefined, richer templates corresponding to different query parts are used in an effort to produce more concise translations. In what follows, our examples were created using the MRP traversal strategy and correspond to queries targeting the CORDIS database.

Figure 3: A query with IN and NOT IN operators.

4 TOWARDS RICHER TRANSLATIONS

Logos was extended in two different directions. The first direction aimed to enable the translation of more query types, while the other aimed to create fluent explanations. For this purpose, we have upgraded the query parsing, the graph generation, the traversal strategies, and the template mechanism.

Translation Capabilities. Our first goal was to extend the translation capabilities of Logos towards translating more clauses and operators. To that end, Logos should be able to analyze these new types of queries, enrich (if necessary) the query graph with new types of nodes and edges, and translate the new elements using any of the traversal strategies (BST, MRP, TMT). Thus, we have implemented changes in the system's parser, introduced new graph elements, and created new translation rules. As a result, the system is now capable of translating queries with the SELECT TOP and LIMIT clauses, as well as the (NOT) IN and (NOT) LIKE operators.

Once an input query is given to the system, the parser analyzes it and generates a parse tree that stores important information for the creation of the query graph.
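To make the label mechanism of Section 3 concrete, the join-template composition can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: the dictionaries standing in for the designer tables and all function names are hypothetical stand-ins, not Logos' actual implementation.

```python
# Simplified sketch of Logos-style template-label composition (Section 3).
# The dictionaries below are illustrative stand-ins for the designer tables.

def default_label(node: str) -> str:
    # By default, a node's label is its (prettified) name.
    return node.lower().replace("_", " ")

# Designer labels override the defaults for specific nodes or join paths.
DESIGNER_NODE_LABELS = {"PROJECT_MEMBERS": "participants"}
DESIGNER_JOIN_LABELS = {("PEOPLE", "PROJECTS"): "principal investigators of"}

def label(node: str) -> str:
    return DESIGNER_NODE_LABELS.get(node, default_label(node))

def join_label(u: str, v: str) -> str:
    # Fall back to the generic phrase when no designer label exists.
    return DESIGNER_JOIN_LABELS.get((u, v), "associated with")

def join_template(u: str, v: str, expr: str = " as ") -> str:
    # l((u, v)) = l(u) + expr + l(u, v) + l(v)
    return label(u) + expr + join_label(u, v) + " " + label(v)

print(join_template("PEOPLE", "PROJECTS"))
# people as principal investigators of projects
```

Swapping in the designer label "participants" for PROJECT_MEMBERS, or falling back to "associated with" for an unlabeled path, follows the same lookup-then-compose pattern.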
For the parser, we have introduced several new parsing nodes (e.g., for limit) capturing information about all the aforementioned clauses and operators. Queries having a SELECT TOP or LIMIT clause are marked as "limited" and their associated limit number is temporarily stored in the system. The translation algorithm (BST, MRP, TMT) checks whether the input query is limited and, if that is the case, an explanation is separately created and added to the end of the translation. As an example, we provide that of query 1 in Table 1.

Table 1: Example SQL queries and generated interpretations.

1. SELECT TOP 10 title FROM projects WHERE start_year=2021;
   "Find projects whose start year is 2021. Limit the results to top 10."
2. SELECT total_cost, ec_max_contribution, framework_program, ec_fund_scheme FROM projects WHERE framework_program IN ('FP7') AND ec_fund_scheme NOT IN ('ERC-SG', 'ERC-CG');
   "Find the total costs, max contributions, framework programs and fund schemes of projects whose framework program is FP7 and fund scheme is not ERC-SG and ERC-CG."
3. SELECT title FROM topics WHERE title LIKE '%climate%';
   "Find the titles of topics whose title is like %climate%."
4. SELECT COUNT(*) FROM projects WHERE start_year=2012;
   "Find the number of projects whose start year is 2012."

In order for Logos to translate queries having IN and NOT IN operators, new types of query graph elements were created: the in edge, the not-in edge, and the value-list node. An illustrative example of this case is query 2 in Table 1, whose query graph is given in Figure 3. The translation algorithm detects these new elements of the query graph and produces a textual explanation.

Regarding the translation of queries having the (NOT) LIKE operator, we created two new types of edges: the like edge and the not-like edge. An example of this case is that of query 3 in Table 1.

Our last improvement, in terms of translation capabilities, is the creation of a new query graph node, the star node. In combination with the function node that represents the COUNT operator, we now get explanations like that of query 4 in Table 1.

Fluent Explanations. Our second goal was to improve the translations generated by the system in terms of fluency. This kind of change requires only the modification of the traversal strategies and the template mechanism.

While exploring the system's capabilities, we noticed that many database schemas include bridge tables, i.e., special tables used to transform many-to-many relationships into one-to-many relationships. Bridge tables (usually) appear in queries in order to join two tables. Although these tables play an important role in terms of data modeling, they cause "noisy" explanations. For instance, consider query 1 of Table 2: the previously produced explanation of the system is clearly unnatural. Bridge tables are manually stored in a designer table and excluded from the translation. The template mechanism has been modified so that if u, v, and w are table nodes, with v corresponding to a bridge table, the template label of the path connecting u to w is of the form l((u, w)) = l(u) + l(u, w) + l(w).

Table 2: Example SQL queries and generated interpretations, where B and A denote before and after, respectively.

1. SELECT pr.* FROM projects pr, project_subject_areas psa, subject_areas sa WHERE pr.unics_id = psa.project AND psa.subject_area = sa.code AND sa.title = 'Mathematics and Statistics';
   B: "Find everything about projects associated with project subject areas, and for project subject areas associated with subject areas whose title is Mathematics and Statistics."
   A: "Find everything about projects on subject areas whose title is Mathematics and Statistics."
2. SELECT COUNT(p.title), t.title FROM topics t, project_topics pt, projects p WHERE t.code = pt.topic AND pt.project = p.unics_id GROUP BY t.title;
   B: "Create groups according to the title of topics. Find the number of projects and the title of topics associated with project topics associated with the projects."
   A: "Find the titles of topics and the number of projects on these topics grouped by the titles of topics."
3. SELECT m.member_name FROM project_members m, projects p WHERE m.project = p.unics_id;
   B: "Find the member name of project members associated with projects."
   A: "Find participants participated in projects."

Moreover, a mini dictionary has been developed providing the translation process with the plural form of all the attribute labels. Essentially, for every attribute node u of the query graph we take (if it exists) the plural form p of its label l(u), i.e., p(l(u)).

In addition, we have modified the way the MRP algorithm translates the GROUP BY clause. Previously, the translation of this part would result in a separate sentence. Leveraging the nature of the algorithm, which aims at generating translations by blending different parts of the query, we now blend this translation more naturally. An example that captures all the aforementioned changes (bridge tables, plural form, GROUP BY on MRP) is that of query 2 in Table 2.

Lastly, we have improved the translation of queries which include only heading attributes [4] in the SELECT clause. The heading attribute is the most characteristic attribute of its relation. As an example, we offer that of query 3 in Table 2.

5 EXPERIMENTAL RESULTS

The evaluation of Logos is divided into two parts: (a) the automated evaluation part, where we evaluate our results using the Bilingual Evaluation Understudy (BLEU) automated metric [6], and (b) the human evaluation part, where we evaluate our results with the help of SQL experts. The purpose of the first part is to use a well-established metric to show how good the NL explanations are, while the second part aims at evaluating qualitative aspects of the NL explanations, such as clarity and fluency.

For both types of evaluation, we created 28 queries: 14 for the CORDIS database and 14 for the SDSS database (https://www.sdss.org/); see Tables 6 and 7, respectively. Using the MRP algorithm, we translated these queries twice: once using the default version of the system, which works by considering only the database schema (internal knowledge), and once using both the database schema and the designer tables (internal and external knowledge), which, as mentioned in the previous sections, store information about heading attributes, bridge tables, and node or edge labels.
First of all, we want to investigate the effect of the designer tables on the translations. Moreover, we want to know how close the system's explanations are to the ground truth (textual explanations given by SQL experts of the databases). In what follows, we denote the default version of the system as Logos v.1 and the version that takes advantage of the designer tables as Logos v.2.

5.1 Automated Evaluation

Automated evaluation was carried out to compare the generated explanations to the ground truth, i.e., textual explanations of SQL queries given by SQL experts of the databases, members of the INODE project [1]. The quality of the results is measured using the BLEU-4 score. BLEU is a score for comparing a candidate translation of a text to one or more reference translations. The scores range between 0% and 100%; a score of 100% means that the explanation estimated by the system matches the ground truth completely.

The results are summarized in Table 3. We also report the minimum and the maximum BLEU score. Furthermore, we noticed a large variation between the scores; thus, we decided to report the median BLEU score per translation system instead of the average.

Table 3: BLEU scores for the textual explanations of the 14 CORDIS queries and the 14 SDSS queries.

Query ID | CORDIS Logos v.1 | CORDIS Logos v.2 | SDSS Logos v.1 | SDSS Logos v.2
1 | 3.21 | 3.98 | 4.99 | 12.55
2 | 4.37 | 6.27 | 2.45 | 2.66
3 | 2.26 | 9.26 | 11.71 | 15.73
4 | 3.51 | 12.87 | 4.86 | 5.01
5 | 2.01 | 15.46 | 2.84 | 3.67
6 | 3.40 | 11.20 | 3.38 | 4.07
7 | 4.03 | 4.32 | 3.74 | 4.37
8 | 8.23 | 9.55 | 14.72 | 14.01
9 | 12.30 | 9.29 | 4.46 | 18.80
10 | 18.30 | 7.41 | 4.03 | 4.07
11 | 14.46 | 24.81 | 18.46 | 27.36
12 | 10.70 | 17.40 | 6.87 | 8.56
13 | 4.07 | 4.20 | 4.30 | 22.24
14 | 3.67 | 4.46 | 35.61 | 5.50
MAX | 18.30 | 24.81 | 35.61 | 27.36
MIN | 2.01 | 4.20 | 2.45 | 2.66
MEDIAN | 4.05 | 9.28 | 4.66 | 7.03

For both databases, median scores are under 10%. Looking at the medians, we conclude that Logos v.2 produces translations closer to the ground truth than those obtained from Logos v.1. Especially for the CORDIS database, the median BLEU score of Logos v.2 is more than two times higher than that of Logos v.1.

The low scores do not indicate that the translations produced by Logos are incorrect. BLEU works by counting matching n-grams in the candidate translation against n-grams in the reference text, where a 1-gram (unigram) is each token and a bigram comparison considers each word pair. This means that the score is higher the more parts a NL explanation has in common with the ground truth. Manual examination of the NL explanations that the two versions of Logos generated versus the ground truth showed that the automatically generated translations were in fact correct. However, they looked very different from the ground truth. Indicatively, we show the translations of query 9 (from Table 6).

CORDIS query 9 explanations:
– Logos v.1: "Find the institutions names of institutions associated with countries whose country name is France."
– Logos v.2: "Find institutions located in countries whose name is France."
– Ground truth: "Show names of institutions from France."

We see that Logos would not necessarily produce translations the way a human mind would. And even different people would provide different explanations for the same SQL query (albeit all correct). This shows the opportunity of enhancing the translation capabilities of the system with learning that not only leverages the database schema but is also performed on previously defined human translations. It also shows the challenge of creating automated evaluation metrics that do not judge the quality of the explanations only by estimating their similarity to the ground truth. These observations lead to the need for conducting human evaluation as well, which is presented in the next subsection.

Focusing now on the results for CORDIS, we see that there is significant improvement in the translations of queries with ids 3-6, 11, and 12. This is mainly due to the exclusion of bridge tables from the translation procedure and the heading attribute addition. For queries with ids 9 and 10 we noticed a score reduction. Indicatively, looking at the translations of query 9 above, we observe that although the translation of Logos v.2 is more natural than that of Logos v.1, the presence of the phrase "names of institutions" in the translation of the latter leads to a higher BLEU score.

Let us now focus on the results for SDSS. The scores are lower than those for the CORDIS database. This is due to the nature of the SDSS database, which uses abbreviated names and letter symbols to describe the content of its tables and attributes: for instance, "photoobj" instead of "photometric objects", or the letter "u" to denote the magnitude of a photometric object in the "u" (ultraviolet) filter. During the experiments, we realized that by transforming those names and symbols into meaningful textual sentences, we increase the size of the explanations compared to the size of the explanations provided by the astrophysicist SQL expert. This shows another challenge for the automatic generation of NL explanations: different styles of explanations may be given by domain experts in different fields. Thus, different explanations are suitable for different groups of people (e.g., astronomers and data scientists). For example, for query 14, the BLEU score of Logos v.2 is substantially lower than that of Logos v.1.

SDSS query 14 explanations:
– Logos v.1: "Find the u, g, r, i and z of photoobj associated with specobj whose class is QSO."
– Logos v.2: "Find the magnitude u, magnitude g, magnitude r, magnitude i and magnitude z of photometric objects corresponding to spectroscopic objects whose class is QSO."
– Ground truth: "Show me the u, g, r, i, z magnitudes of spectroscopic quasars."

We concluded that this kind of notation (abbreviated names and letter symbols) for the attributes of the SDSS tables is sometimes preferred over full descriptions.
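The n-gram matching behind these scores can be made concrete with a small sketch of modified n-gram precision, the core ingredient of BLEU. This is a simplified illustration of our own: real BLEU-4 additionally combines the precisions for n = 1 to 4 geometrically and applies a brevity penalty.

```python
from collections import Counter

def modified_ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Fraction of candidate n-grams that also appear in the reference,
    with each reference n-gram usable at most as often as it occurs there."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    # Clipped counts: a candidate n-gram is credited at most as many
    # times as it occurs in the reference.
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

# CORDIS query 9 explanations from Section 5.1 (punctuation stripped).
truth = "show names of institutions from france"
v1 = ("find the institutions names of institutions associated with "
      "countries whose country name is france")
v2 = "find institutions located in countries whose name is france"
print(modified_ngram_precision(v1, truth, 2))
print(modified_ngram_precision(v2, truth, 2))
```

On this example, the Logos v.1 output shares the bigrams "names of" and "of institutions" with the ground truth, while the more natural Logos v.2 output shares no bigram at all, which is exactly the scoring artifact discussed above.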
Table 4: Average clarity, fluency, and precision per database and translation system (standard deviation in parentheses).

CORDIS:
Feature | Logos v.1 | Logos v.2 | Ground truth
Clarity | 4.25 (1.29) | 5.79 (1.13) | 6.79 (0.31)
Fluency | 3.64 (1.43) | 4.75 (1.06) | 6.86 (0.35)
Precision | 6.04 (1.46) | 5.79 (1.10) | 5.18 (1.75)

SDSS:
Feature | Logos v.1 | Logos v.2 | Ground truth
Clarity | 4.32 (1.11) | 6.04 (0.77) | 6.50 (0.57)
Fluency | 3.39 (0.97) | 5.50 (0.87) | 6.57 (0.56)
Precision | 6.50 (0.50) | 6.21 (0.70) | 5.18 (1.01)

Lastly, it has been observed that the SDSS database includes many discrete variables (attributes) that define different types of objects, e.g., stars. A fine example of that case is that of query 7.

SDSS query 7 (Table 7) explanations:
– Logos v.1: "Find the specobjids of specobj whose subclass is OB and class is STAR."
– Logos v.2: "Find spectroscopic objects whose spectroscopic subclass is OB and class is STAR."
– Ground truth: "Find all spectroscopic stars which are massive and hot."

We see that both versions of Logos do not understand that "subclass = OB" and "class = STAR" mean massive and hot stars. This justifies the low BLEU scores for both versions of the system. Additional knowledge is required here; we are in fact in the process of integrating an ontology that provides such mappings.

5.2 Human Evaluation

For this experimental setting, an online survey was conducted. A total of 21 people, all SQL experts, participated in the survey. The experts were members of the INODE project [1]. People who contributed to the creation of the labels (ground truth explanations) were excluded from the evaluation process. From the pool of 28 SQL queries (see Tables 6 and 7), participants were asked to rate the textual explanations of 4 randomly chosen queries (2 per database). The queries were equally distributed among all participants. For each query, the participant rated 3 explanations (1 per translation): (a) the explanation produced by Logos v.1, (b) the explanation produced by Logos v.2, and (c) the ground truth explanation, resulting in a total of 84 explanations rated by humans.

The participants judged the quality of the translations on 7-point Likert scales. They provided ratings for:
• clarity: how clear and understandable the explanation is.
• fluency: how natural the explanation is.
• precision: how well the information of the provided SQL query is captured by its textual explanation.
A score of 7 in these categories means that the provided explanation is extremely clear, natural, and precise, respectively. We score neutral as a 4.

Data associated with respondents who completed the survey in less than 5 minutes (half the approximate time for filling out the survey) were deleted. Furthermore, we deleted the data of participants who selected the same response to every question, regardless of the question. After cleaning the data, we ended up with 2 different scores (per explanation and feature), corresponding to 2 different participants. The final score of an explanation for a given feature is obtained by taking the average of these 2 scores. Therefore, we ended up with 1 single score per explanation and feature. Indicatively, in Table 5 we show the data collected for the explanations of the CORDIS query with id 9, and the obtained final scores. From those final scores, 18 rating sets (2 databases x 3 translation systems x 3 features), consisting of 14 elements each (1 for every query), were created.

Table 5: Final scores per feature for the explanations of the CORDIS query with id 9.

Feature | Rater | Logos v.1 | Logos v.2 | Ground truth
Clarity | Expert A | 2 | 6 | 7
Clarity | Expert B | 4 | 6 | 7
Clarity | Final Score | 3 | 6 | 7
Fluency | Expert A | 1 | 3 | 7
Fluency | Expert B | 4 | 4 | 7
Fluency | Final Score | 2.5 | 3.5 | 7
Precision | Expert A | 7 | 7 | 7
Precision | Expert B | 6 | 7 | 7
Precision | Final Score | 6.5 | 7 | 7

In Table 4, we show the averages of those sets, accompanied by their standard deviation in parentheses. Logos v.2 leads to better translations in terms of clarity and fluency for both databases (the average score increases). However, we see that these scores do not surpass those of the ground truth. An interesting observation is that as the explanations become clearer and more fluent, precision decreases. In other words, as the explanations become more natural, they tend to lose their ability to explicitly explain each part of their associated SQL query. Lastly, we see that the difference between the average fluency scores of Logos v.2 and Logos v.1 increases for the SDSS database. As mentioned in the previous section, this is due to the nature of the SDSS database, which has a less explainable schema in terms of NL explanations. By adopting labels for the components of the database schema (tables, attributes, and joins), we increase the average fluency score (Logos v.2).

6 CONCLUDING REMARKS

In this paper, the translation of SQL queries to text has been discussed as a potential solution to problems that arise in data exploration. In that vein, we have extended the graph-based model of Logos [3]. The system's capabilities have been extended in two different directions: (a) that of translating different types of queries and (b) that of creating fluent explanations. To this end, we have implemented changes in the system's parser, introduced new types of nodes and edges for the query graph, created new translation rules and templates, and finally modified the available traversal strategies (BST, MRP, TMT). In addition, a data set composed of 28 queries coming from different use cases (astronomy, policy making) has been created to perform both automated and human evaluation.
The experiments highlighted the following two challenges: (a) the need for creating automated metrics which do not consider only the ground truth, and (b) the need for generating different explanations for different groups of people. Regarding the latter, we observed that: (a) science-related databases are far more difficult to explain in natural language, and (b) sometimes the scientists themselves ask for less detailed interpretations. Lastly, it has been observed that as the explanations become more natural, they tend to lose their ability to explicitly explain each part of the query. Depending on the application, one may want more precise or more natural explanations.

Having said all this, SQL-to-text is a hard problem and there is no single solution. Different explanations may be suitable for different applications or even different groups of people, albeit all correct. As future work, we plan to improve our system by adding an ontology which provides mappings for the interpretation of several parts of the query graph.

ACKNOWLEDGMENTS

This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 863410. We also thank the Intelligent Open Data Exploration (INODE) team for contributing to the system's evaluation.

REFERENCES

[1] Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian, Theofilos Belmpas, Martin Braschler, Ursin Brunner, Diego Calvanese, Maximilian Fabricius, Orest Gkini, Catherine Kosten, Davide Lanti, Antonis Litke, Hendrik Lücke-Tieke, Francesco Alessandro Massucci, Tarcisio Mendes de Farias, Alessandro Mosca, Francesco Multari, Nikolaos Papadakis, Dimitris Papadopoulos, Yogendra Patil, Aurélien Personnaz, Guillem Rull, Ana Claudia Sima, Ellery Smith, Dimitrios Skoutas, Srividya Subramanian, Guohui Xiao, and Kurt Stockinger. 2021. INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]. CoRR abs/2104.04194 (2021). arXiv:2104.04194 https://arxiv.org/abs/2104.04194
[2] Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing Source Code using a Neural Attention Model. In ACL. The Association

7 APPENDIX

Table 6: The 14 SQL queries of the CORDIS database.

1. SELECT sum(pm.ec_contribution) AS funding_received FROM projects p, project_members pm WHERE pm.project=p.unics_id AND p.framework_program='H2020';
2. SELECT c.name FROM institutions i, countries c WHERE c.unics_id=i.country_id AND i.name='Athena';
3. SELECT pr.title FROM projects pr, project_subject_areas psa, subject_areas sa WHERE pr.unics_id = psa.project AND psa.subject_area = sa.code AND sa.title = 'Mathematics and Statistics';
4. SELECT distinct t.title FROM projects pr, project_topics pt, topics t WHERE pr.unics_id = pt.project AND pt.topic = t.code AND pr.end_year = 2014;
5. SELECT p.full_name FROM people p, projects pr, project_topics pt, topics t WHERE p.unics_id = pr.principal_investigator AND pr.unics_id = pt.project AND pt.topic = t.code AND t.title = 'Systems';
6. SELECT m.title FROM people p, projects pr, project_programmes pm, programmes m WHERE p.unics_id = pr.principal_investigator AND pr.unics_id = pm.project AND pm.programme = m.code AND p.full_name = 'Thomas Bell';
7. SELECT p.acronym FROM projects p, project_members pm, institutions i, countries c WHERE p.unics_id = pm.project AND pm.institution_id = i.unics_id AND i.country_id = c.unics_id AND c.country_name = 'Greece';
8. SELECT pe.full_name FROM projects pr, people pe WHERE pr.principal_investigator = pe.unics_id AND pr.start_year = 2014;
9. SELECT i.institutions_name FROM institutions i, countries c WHERE i.country_id = c.unics_id AND c.country_name = 'France';
10. SELECT mb.member_name FROM project_members mb, activity_types a WHERE a.code = mb.activity_type AND a.description = 'Research Organisations';
11. SELECT count(p.title) FROM projects p GROUP BY p.start_year;
12. SELECT count(i.name) FROM institutions i, countries c WHERE i.country_id=c.unics_id GROUP BY c.name;
13. SELECT title FROM topics WHERE title like '%climate%';
14. SELECT count(p.title) FROM projects p WHERE
p.start_year=2018;
for Computer Linguistics.
[3] Andreas Kokkalis, Panagiotis Vagenas, Alexandros Zervakis, Alkis Simitsis, Geor-
gia Koutrika, and Yannis Ioannidis. 2012. Logos: a system for translating queries
into narratives. In Proceedings of the 2012 ACM SIGMOD International Conference Table 7: The 14 SQL queries of the SDSS database.
on Management of Data. 673–676.
[4] Georgia Koutrika, Alkis Simitsis, and Yannis E Ioannidis. 2010. Explaining struc-
ID SDSS SQL Queries
tured queries in natural language. In ICDE. IEEE, 333–344.
1 SELECT objid FROM photoobj WHERE clean=1;
[5] Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann,
and Daniel Gerber. 2013. Sorry, i don’t speak SPARQL: translating SPARQL SELECT specobjid, z FROM specobj WHERE class = ’QSO’ AND
2
queries into natural language. In Proceedings of the 22nd international conference zwarning = 0;
on World Wide Web. 977–988. SELECT objid FROM photoobj WHERE ra >185 AND ra <185.1
3
[6] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a AND dec <5;
method for automatic evaluation of machine translation. In Proceedings of the 4 SELECT specobjid FROM specobj WHERE survey = ’segue2’;
40th annual meeting of the Association for Computational Linguistics. 311–318. SELECT specobjid FROM specobj WHERE class = ’STAR’ AND
5
[7] Ehud Reiter and Robert Dale. 1997. Building applied natural language gen- zwarning = 0;
eration systems. Nat. Lang. Eng. 3, 1 (1997), 57–87. https://doi.org/10.1017/ SELECT s.specobjid FROM specobj as s WHERE s.subclass =
6
S1351324997001502 ’STARFORMING’ AND s.class= ’GALAXY’;
[8] Alkis Simitsis and Yannis E. Ioannidis. 2009. DBMSs Should Talk Back Too. In SELECT s.specobjid FROM specobj as s WHERE s.subclass =
7
CIDR. ’OB’ AND s.class= ’STAR’;
[9] Alkis Simitsis, Georgia Koutrika, Yannis Alexandrakis, and Yannis Ioannidis. 2008. SELECT * FROM photoobj WHERE ra >100 and dec <100 AND
8
Synthesizing structured text from logical database subsets. In EDBT. 428–439. type = 3;
[10] Chris van der Lee, Emiel Krahmer, and Sander Wubben. 2018. Automated learning 9 SELECT DISTINCT type FROM photoobj;
of templates for data-to-text generation: comparing rule-based, statistical and 10 SELECT class, count(*) FROM specobj GROUP BY class;
neural methods. In Proceedings of the 11th International Conference on Natural SELECT g.* FROM specobj s, galspecline g WHERE
11
Language Generation. 35–45. s.specobjid = g.specobjid and ra <185 AND dec <25;
[11] Sam Wiseman, Stuart M Shieber, and Alexander M Rush. 2018. Learning neural SELECT n.* FROM neighbors n, photoobj p WHERE p.objid =
templates for text generation. arXiv preprint arXiv:1808.10122 (2018). 12
n.objid AND p.b = 1.072 and p.l = 174.535;
[12] Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, and Vadim Sheinin. 2018. SELECT s.* FROM specobj s, galspecline g
SQL-to-Text Generation with Graph-to-Sequence Model. In Proceedings of the 13
WHERE s.specobjid = g.specobjid;
2018 Conference on Empirical Methods in Natural Language Processing, Ellen Riloff, SELECT p.u, p.g, p.r, p.i, p.z FROM specobj s, photoobj p
David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). 931–936. 14
WHERE s.bestobjid = p.objid AND s.class = ’QSO’;
6
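To illustrate the kind of template-based verbalization discussed above, the following is a minimal sketch of how a query such as CORDIS query 14 (SELECT count(p.title) FROM projects p WHERE p.start_year=2018) can be rendered in natural language from schema-to-phrase mappings. The structured query form, the label dictionary, and the templates here are our own illustrative assumptions, not the actual Logos implementation.

```python
# Hypothetical sketch: template-based SQL-to-NL translation.
# The structured query form and schema labels below are illustrative
# assumptions, not the real Logos model or the real CORDIS annotations.

# Simplified structured form of CORDIS query 14:
# SELECT count(p.title) FROM projects p WHERE p.start_year=2018;
query = {
    "select": [("count", "projects", "title")],
    "where": [("projects", "start_year", "=", "2018")],
}

# Schema annotations mapping (relation, attribute) pairs to phrases.
labels = {
    ("projects", "title"): "titles of projects",
    ("projects", "start_year"): "start year",
}

AGG = {"count": "the number of"}  # aggregate -> phrase
OP = {"=": "is"}                  # comparison operator -> phrase


def translate(q):
    """Compose an English sentence from the structured query."""
    agg, rel, attr = q["select"][0]
    head = f"Find {AGG[agg]} {labels[(rel, attr)]}"
    conds = [
        f"whose {labels[(r, a)]} {OP[op]} {val}"
        for r, a, op, val in q["where"]
    ]
    if conds:
        return head + " " + " and ".join(conds) + "."
    return head + "."


print(translate(query))
# -> "Find the number of titles of projects whose start year is 2018."
```

Even this toy version exposes the trade-off noted in the conclusion: every clause is verbalized explicitly and faithfully, but queries with many attributes or bridge tables would chain many "whose ... and ..." fragments and quickly become unnatural.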