Let the Database Talk Back: Natural Language Explanations for SQL

Stavroula Eleftherakis, Orest Gkini, Georgia Koutrika
Athena Research Center, Athens, Greece
seleftheraki@athenarc.gr, orestg@athenarc.gr, georgia@athenarc.gr

ABSTRACT
Database interaction is often characterized as a non-trivial and time-consuming process due to the user's inexperience with the data or the query language. Therefore, databases need to be able to "talk back" in order to assist users during data exploration and eventually lead them to the desired results. In this paper, we tackle the problem of SQL-to-NL by extending the graph-based model of Logos [3]. Our novel extensions include improvements in the system's translation capabilities and in the fluency of the generated explanations. Finally, we report several challenges, highlighted by experiments on different use cases, i.e., astronomy and policy making.

Reference Format:
Stavroula Eleftherakis, Orest Gkini, and Georgia Koutrika. Let the Database Talk Back: Natural Language Explanations for SQL. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers' authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Nowadays, the availability of data coming from many different application fields urges the deployment of sophisticated tools that help us effectively explore and extract knowledge out of these data. However, the variety of the data sources (astronomical, biomedical, etc.) as well as the complexity of query languages such as SQL and SPARQL often pose obstacles during data exploration, due to the users' unfamiliarity with the database content or the query language. Explaining queries in text can help tackle both problems, as it enables users to understand the SQL queries that are used to retrieve the answers through the data exploration process [8].

The problem of translating SQL queries to natural language, or SQL-to-NL in short, appears deceivingly simple: SQL queries use a restricted vocabulary comprising SQL elements (clauses, operators, etc.) and database elements (i.e., relations, attributes), and there is no ambiguity in interpreting such elements. Hence, a straightforward translation initially appears adequate. Reality teaches us quite the opposite, as the resulting text should be accurate in capturing the respective SQL query, and effective, allowing fast and unique interpretation of it. Achieving both of these qualities is very difficult and raises several technical challenges that need to be addressed [8]. Furthermore, several different explanations can be generated for the same SQL query, making evaluation challenging.

Logos [3] is a well-established system which translates SQL queries into narratives. In contrast to neural approaches, Logos' model can be interpreted. This gives us the advantage of offering precise translations, knowing exactly how translations are produced. Nevertheless, the system's translation capabilities are highly dependent on the query's syntax, meaning that for each type of query the system should have the necessary tools to model it. Moreover, depending on the query's complexity (number of attributes, bridge tables, etc.), translations can become unnatural.

In our work, we provide NL explanations as part of a data exploration platform used by our collaborators from different fields, including astrophysics and policy making [1]. In this platform, NL explanations are provided for the internal SQL queries generated by different system components used for recommendations, NL-to-SQL translation, and data exploration. Thus, we have to translate a variety of queries containing different clauses and operators. Furthermore, these translations need to be as natural as possible.

In an effort to tackle the above-mentioned challenges, we extend Logos in two important directions: (a) translation capabilities, where we focus on the system's ability to translate different types of queries, and (b) fluent explanations, where we focus on improving the system's translations. Furthermore, we present experimental results as well as results of a user study over two different databases: astronomical data and policy-making data. Our evaluation shows the effectiveness of our approach and provides several insights regarding challenges that arise due to the ambiguous nature of NL explanations, including: (a) scoring textual explanations, and (b) generating explanations suitable for different groups of people.

The rest of the paper is organised as follows. In Section 2, related work is discussed. In Section 3, we provide background information on Logos. In Section 4, we present our novel extensions. In Section 5, the effectiveness of the system is explored. Finally, concluding remarks are provided in Section 6.

2 RELATED WORK
Existing approaches for SQL-to-NL can be broadly divided into two categories. Template-/rule-based approaches (e.g., [4], [5]) require the design of special templates and rules that are used to compose sentences. Due to the non-ambiguous nature of SQL queries, templates can help limit the syntactic variability of the output and smooth explanations out [7]. Logos [3] falls into this category.
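To give a flavor of the template-based idea, the toy sketch below fills a fixed phrase template with database elements taken from a pre-parsed query. The query representation and the wording are our own illustrative simplifications, not the actual rules of Logos or of [4].

```python
# Toy illustration of a template-based SQL-to-NL rule: a fixed phrase
# template is filled with the projected attributes, the relation, and the
# selection predicates of a (pre-parsed) SELECT ... WHERE query.
# The query dict and the wording are hypothetical simplifications.

def explain_select(query: dict) -> str:
    """Compose an explanation for a simple SELECT ... WHERE query."""
    cols = " and ".join(query["select"])                 # projected attributes
    preds = " and ".join(
        f"{attr} is {val}" for attr, val in query["where"]
    )                                                    # selection predicates
    return f"Find the {cols} of {query['from']} whose {preds}."

q = {
    "select": ["title"],
    "from": "projects",
    "where": [("start year", "2021")],
}
print(explain_select(q))
# Find the title of projects whose start year is 2021.
```

The strength of this approach is that the output is fully predictable; its weakness, as discussed below, is that every query shape needs its own template.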
Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

The second line of research tackles the problem as neural machine translation and uses sequence-to-sequence (Seq2Seq) models (e.g., [2], [12]), which treat both the natural language description and the query as sequences. In contrast to the previous category, those models automatically learn how to translate queries without the need for predefined query patterns. However, they require plenty of training data and fine-tuning, and their effectiveness is still very low.

Lately, hybrid models (e.g., [10], [11]) have been proposed to solve a problem similar to ours, i.e., data-to-natural-language. Those models automatically learn templates and use them in order to generate textual explanations.

3 BACKGROUND
Logos' approach to NL generation comprises graph representations of the database and the queries, template phrases associated with parts of the graph, and graph traversal in particular directions to compose the templates found on the way into the final text. We provide an overview of the main ingredients: the database graph, the query graph, labels, templates, and the algorithms.

The database graph captures the relationships between the database relations and attributes (the nodes of the graph). Its edges are divided into three types: (a) membership edges (attributes to relations), representing attribute projections in queries, (b) selection edges (relations to attributes), representing attributes in predicates, and (c) join edges (attribute to attribute), representing joins.

The query graph of a query is the part of the database graph that the query refers to, extended with additional nodes (e.g., for functions) and edges (e.g., for group-by's) in order to capture the entire query meaning. Each node or edge of the query graph can be annotated with a label that signifies its meaning in natural language. There are default labels that can be used for any database. Additionally, labels can be provided by a domain expert.

Example. Let us consider the CORDIS¹ database, which stores information about research projects funded by the European Union. In Figure 1, we see a subgraph of the corresponding database graph. Note that, for simplicity, we do not show the attributes used to join the various relations. By default, the name of each node is also its label. We also see how edges are annotated with default labels. We can easily override these. For instance, for the relation PROJECT_MEMBERS, instead of the system's default label "project members", we can use the short label "participants". Figure 2 zooms in on the join path connecting relation PEOPLE with relation PROJECTS and provides a more detailed view. A possible label which expresses this connection is "principal investigators of", while the default one is "associated with". Such designer labels are stored in special tables called designer tables. □

Figure 1: A subgraph of the CORDIS database graph.
Figure 2: A join on the CORDIS database graph.
Figure 3: A query with IN and NOT IN operators.

NL explanations are created by traversing the query graph, accompanied by a template mechanism [9]. The template mechanism uses the provided labels (designer or default) to form meaningful phrases. A template label l(u) or l((u, v)) is assigned to a node u or a path (u, v), respectively. For instance, a join template label may have the form l((u, v)) = l(u) + expr + l(u, v) + l(v), where u and v are relation nodes of the query graph. Using the example of the previous paragraph (see Figure 2), a template for the path connecting the tables PEOPLE and PROJECTS is the following: l((PEOPLE, PROJECTS)) = l(PEOPLE) + " as " + "principal investigators of" + l(PROJECTS) = "people as principal investigators of projects".

Synthesis of the NL explanations is performed as a traversal of the query graph. There are three traversal strategies: (a) the Binary Search Tree (BST) algorithm, where the translation consists of a composition of clauses, each of them focusing on specific query semantics, (b) the Multiple Reference Points (MRP) algorithm, where information from all parts of the query graph is blended in the translation, and (c) the Template composition (TMT) algorithm, where predefined, richer templates corresponding to different query parts are used in an effort to produce more concise translations. In what follows, our examples were created using the MRP traversal strategy and correspond to queries targeting the CORDIS database.

4 TOWARDS RICHER TRANSLATIONS
Logos was extended in two different directions. The first direction aimed to enable the translation of more query types, while the other aimed to create fluent explanations. For this purpose, we have upgraded the query parsing, the graph generation, the traversal strategies, and the template mechanism.

Translation Capabilities. Our first goal was to extend the translation capabilities of Logos towards translating more clauses and operators. To that end, Logos should be able to analyze those new types of queries, enrich (if necessary) the query graph with new types of nodes and edges, and translate those new elements using any of the traversal strategies (BST, MRP, TMT). Thus, we have implemented changes in the system's parser, introduced new graph elements, and created new translation rules. As a result, the system is now capable of translating queries with the SELECT TOP and LIMIT clauses, as well as the (NOT) IN and (NOT) LIKE operators.
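To make the template mechanism of Section 3 concrete, here is a minimal sketch of join-template composition for the PEOPLE-PROJECTS example. The label dictionaries are hypothetical simplifications: in Logos, designer labels come from designer tables and default labels from the schema.

```python
# Minimal sketch of join-template composition, following the form
# l((u, v)) = l(u) + expr + l(u, v) + l(v) from Section 3.
# The label dictionaries are hypothetical stand-ins for Logos' labels.

node_labels = {"PEOPLE": "people", "PROJECTS": "projects"}
edge_labels = {("PEOPLE", "PROJECTS"): "principal investigators of"}  # designer label
DEFAULT_EDGE_LABEL = "associated with"  # used when no designer label exists

def join_template(u: str, v: str, expr: str = " as ") -> str:
    """Compose the template label for the join path (u, v)."""
    edge = edge_labels.get((u, v), DEFAULT_EDGE_LABEL)
    return node_labels[u] + expr + edge + " " + node_labels[v]

print(join_template("PEOPLE", "PROJECTS"))
# people as principal investigators of projects
```

Dropping the designer label falls back to the default edge label, yielding the flatter "people as associated with projects", which illustrates why designer labels improve fluency.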
Table 1: Example SQL queries and generated interpretations.

1. SELECT TOP 10 title FROM projects WHERE start_year=2021;
   "Find projects whose start year is 2021. Limit the results to top 10."
2. SELECT total_cost, ec_max_contribution, framework_program, ec_fund_scheme FROM projects WHERE framework_program IN ('FP7') AND ec_fund_scheme NOT IN ('ERC-SG', 'ERC-CG');
   "Find the total costs, max contributions, framework programs and fund schemes of projects whose framework program is FP7 and fund scheme is not ERC-SG and ERC-CG."
3. SELECT title FROM topics WHERE title LIKE '%climate%';
   "Find the titles of topics whose title is like %climate%."
4. SELECT COUNT(*) FROM projects WHERE start_year=2012;
   "Find the number of projects whose start year is 2012."

Table 2: Example SQL queries and generated interpretations, where B and A denote before and after, respectively.

1. SELECT pr.* FROM projects pr, project_subject_areas psa, subject_areas sa WHERE pr.unics_id = psa.project AND psa.subject_area = sa.code AND sa.title = 'Mathematics and Statistics';
   B: "Find everything about projects associated with project subject areas, and for project subject areas associated with subject areas whose title is Mathematics and Statistics."
   A: "Find everything about projects on subject areas whose title is Mathematics and Statistics."
2. SELECT COUNT(p.title), t.title FROM topics t, project_topics pt, projects p WHERE t.code = pt.topic AND pt.project = p.unics_id GROUP BY t.title;
   B: "Create groups according to the title of topics. Find the number of projects and the title of topics associated with project topics associated with the projects."
   A: "Find the titles of topics and the number of projects on these topics grouped by the titles of topics."
3. SELECT m.member_name FROM project_members m, projects p WHERE m.project = p.unics_id;
   B: "Find the member name of project members associated with projects."
   A: "Find participants participated in projects."

¹ https://data.europa.eu/euodp/en/data/dataset/cordisH2020projects

Once an input query is given to the system, the parser analyzes it and generates a parsing tree that stores important information for the creation of the query graph. For the parser, we have introduced several new parsing nodes (e.g., for limit), capturing information about all the aforementioned clauses and operators. Queries having SELECT TOP or LIMIT clauses are marked as "limited" and their associated limitation number is temporarily stored in the system. The translation algorithm (BST, MRP, or TMT) checks whether the input query is limited and, if that is the case, an explanation is separately created and added to the end of the translation. As an example, we provide that of query 1 in Table 1.

In order for Logos to translate queries having IN and NOT IN operators, new types of query graph elements were created: the in edge, the not-in edge, and the value-list node. An illustrative example of this case is query 2 in Table 1, whose query graph is given in Figure 3. The translation algorithm detects those new elements of the query graph and produces a textual explanation.

Regarding the translation of queries having the (NOT) LIKE operator, we created two new types of edges: the like edge and the not-like edge.
An example of this case is that of query 3 in Table 1.

Our last improvement, in terms of translation capabilities, is the creation of a new query graph node, the star node. In combination with the function node that represents the COUNT operator, we now get explanations like that of query 4 in Table 1.

Fluent Explanations. Our second goal was to improve the translations generated by the system in terms of fluency. This kind of change requires only the modification of the traversal strategies and the template mechanism.

While exploring the system's capabilities, we noticed that many database schemas include bridge tables, i.e., special tables used to transform many-to-many relationships into one-to-many relationships. Bridge tables (usually) appear in queries in order to join two tables. Although those tables play an important role in terms of data modeling, they cause "noisy" explanations. For instance, let us consider query 1 of Table 2. Apparently, the previously produced explanation of the system is unnatural. Bridge tables are manually stored in a designer table and excluded from the translation. The template mechanism has been modified so that, if u, v, and w are table nodes with v corresponding to a bridge table, the template label of the path connecting u to w is of the form l((u, w)) = l(u) + l(u, w) + l(w).

Moreover, a mini dictionary has been developed, providing the translation process with the plural form of all the attribute labels. Essentially, for every attribute node u of the query graph we take (if it exists) the plural form p of its label l(u), i.e., p(l(u)).

In addition, we have modified the way the MRP algorithm translates the GROUP BY clause. Previously, the translation for this part would result in a separate sentence. Leveraging the nature of the algorithm, which aims at generating translations by blending different parts of the query, we now blend this translation more naturally. An example that captures all the aforementioned changes (bridge tables, plural form, group by on MRP) is that of query 2 in Table 2.

Lastly, we have improved the translation of queries which include only heading attributes [4] in the SELECT clause. The heading attribute is the most characteristic attribute of its relation. As an example, we offer that of query 3 in Table 2.

5 EXPERIMENTAL RESULTS
The evaluation of Logos is divided into two parts: (a) the automated evaluation part, where we evaluate our results using the Bilingual Evaluation Understudy (BLEU) automated metric [6], and (b) the human evaluation part, where we evaluate our results with the help of SQL experts. The purpose of the first part is to use a well-established metric to show how good the NL explanations are, while the second part aims at evaluating qualitative aspects of the NL explanations, such as clarity and fluency.

For both types of evaluation, we created 28 queries (14 for the CORDIS database, and 14 for the SDSS² database) (see Tables 6 and 7, respectively). Using the MRP algorithm, we translated those queries twice: one time using the default version of the system, which works by considering only the database schema (internal knowledge), and one time using both the database schema and the designer tables (internal and external knowledge), which, as mentioned in the previous sections, store information about heading attributes, bridge tables, and node or edge labels. First of all, we want to investigate the effect of the designer tables on the translations. Moreover, we want to know how close the system's explanations are to the ground truth (textual explanations given by SQL experts of the databases). In what follows, we denote the default version of the system as Logos v.1 and the version that takes advantage of the designer tables as Logos v.2.

² https://www.sdss.org/

5.1 Automated Evaluation
Automated evaluation was carried out to compare the generated explanations to the ground truth, i.e., textual explanations of SQL queries given by SQL experts of the databases, members of the INODE project [1]. The quality of the results is measured using the BLEU-4 score. BLEU is a score for comparing a candidate translation of a text to one or more reference translations. The scores range between 0% and 100%. A score of 100% means that the explanation estimated by the system matches the ground truth completely.

The results are summarized in Table 3. We also report the minimum and the maximum BLEU score. Furthermore, we noticed a large variation between the scores; thus, we decided to report the median BLEU score per translation system instead of the average.

Table 3: BLEU scores for the textual explanations of the 14 CORDIS queries and the 14 SDSS queries.

Query ID | CORDIS Logos v.1 | CORDIS Logos v.2 | SDSS Logos v.1 | SDSS Logos v.2
1        | 3.21             | 3.98             | 4.99           | 12.55
2        | 4.37             | 6.27             | 2.45           | 2.66
3        | 2.26             | 9.26             | 11.71          | 15.73
4        | 3.51             | 12.87            | 4.86           | 5.01
5        | 2.01             | 15.46            | 2.84           | 3.67
6        | 3.40             | 11.20            | 3.38           | 4.07
7        | 4.03             | 4.32             | 3.74           | 4.37
8        | 8.23             | 9.55             | 14.72          | 14.01
9        | 12.30            | 9.29             | 4.46           | 18.80
10       | 18.30            | 7.41             | 4.03           | 4.07
11       | 14.46            | 24.81            | 18.46          | 27.36
12       | 10.70            | 17.40            | 6.87           | 8.56
13       | 4.07             | 4.20             | 4.30           | 22.24
14       | 3.67             | 4.46             | 35.61          | 5.50
MAX      | 18.30            | 24.81            | 35.61          | 27.36
MIN      | 2.01             | 4.20             | 2.45           | 2.66
MEDIAN   | 4.05             | 9.28             | 4.66           | 7.03

For both databases, median scores are under 10%. Looking at the medians, we conclude that Logos v.2 produces translations closer to the ground truth than those obtained from Logos v.1. Especially for the CORDIS database, the median BLEU score of Logos v.2 is more than two times higher than that of Logos v.1.

The low scores do not indicate that the translations produced by Logos are not correct. BLEU works by counting matching n-grams in the candidate translation against n-grams in the reference text, where a 1-gram (unigram) is a single token and a bigram comparison is over word pairs. That means that the score is higher the more parts an NL explanation has in common with the ground truth. Manual examination of the NL explanations that the two versions of Logos generated versus the ground truth showed that the automatically generated translations were in fact correct. However, they looked very different from the ground truth. Indicatively, we show the translations of query 9 (from Table 6).

CORDIS query 9 explanations:
– Logos v.1: "Find the institutions names of institutions associated with countries whose country name is France."
– Logos v.2: "Find institutions located in countries whose name is France."
– Ground truth: "Show names of institutions from France."

We see that Logos does not necessarily produce translations the way that a human would. Moreover, different people would provide different explanations for the same SQL query (albeit all correct). This shows the opportunity of enhancing the translation capabilities of the system with learning that not only leverages the database schema but is also performed on previously defined human translations. It also shows the challenge of creating automated evaluation metrics which do not judge explanation quality only by estimating similarity with the ground truth. These observations lead to the need of conducting human evaluation as well, which is presented in the next subsection.

Focusing now on the results of CORDIS, we see that there is significant improvement in the translations of the queries with ids 3-6, 11, and 12. This is mainly due to the exclusion of bridge tables from the translation procedure and the heading attribute addition. For the queries with ids 9 and 10, we noticed a score reduction. Indicatively, looking at the translations of query 9 above, we observe that although the translation of Logos v.2 is more natural than that of Logos v.1, the presence of the phrase "names of institutions" in the translation of the latter leads to a higher BLEU score.

Let us now focus on the results of SDSS. The scores are lower than those of the CORDIS database. This is due to the nature of the SDSS database, which uses abbreviated names and letter symbols to describe the content of its tables and attributes; for instance, "photoobj" instead of "photometric objects", or the letter "u" to denote the magnitude of a photometric object in the "u" (ultraviolet) filter. During the experiments, we realized that by transforming those names and symbols into meaningful textual sentences, we increase the size of the explanations compared to the size of the explanations provided by the astrophysicist SQL expert. This shows another challenge for the automatic generation of NL explanations: different styles of explanations may be given by domain experts in different fields. Thus, different explanations are suitable for different groups of people (e.g., astronomers and data scientists). For example, for query 14, the BLEU score of Logos v.2 is substantially lower than that of Logos v.1.

SDSS query 14 explanations:
– Logos v.1: "Find the u, g, r, i and z of photoobj associated with specobj whose class is QSO."
– Logos v.2: "Find the magnitude u, magnitude g, magnitude r, magnitude i and magnitude z of photometric objects corresponding to spectroscopic objects whose class is QSO."
– Ground truth: "Show me the u, g, r, i, z magnitudes of spectroscopic quasars."

We concluded that this kind of notation (abbreviated names and letter symbols) for the attributes of the SDSS tables is sometimes preferred over full descriptions.

Lastly, it has been observed that the SDSS database includes many discrete variables (attributes) that define different types of objects, e.g., stars. A fine example of that case is query 7.

SDSS query 7 (Table 7) explanations:
– Logos v.1: "Find the specobjids of specobj whose subclass is OB and class is STAR."
– Logos v.2: "Find spectroscopic objects whose spectroscopic subclass is OB and class is STAR."
– Ground truth: "Find all spectroscopic stars which are massive and hot."

We see that both versions of Logos do not understand that "subclass = OB" and "class = STAR" mean massive and hot stars. This justifies the low BLEU scores for both versions of the system. Additional knowledge is required for this purpose; we are in fact in the process of integrating an ontology that provides such mappings.

5.2 Human Evaluation
For this experimental setting, an online survey was conducted. A total of 21 people, all SQL experts, participated in the survey. The experts were members of the INODE project [1]. People who contributed to the creation of labels (ground truth explanations) were excluded from the evaluation process. From the pool of 28 SQL queries (see Tables 6 and 7), participants were asked to rate the textual explanations of 4 randomly chosen queries (2 per database). The queries were equally distributed to all participants. For each query, the participant rated 3 explanations (1 per translation): (a) the explanation produced by Logos v.1, (b) the explanation produced by Logos v.2, and (c) the ground truth explanation, resulting in a total of 84 explanations rated by humans.

The participants judged the quality of the translations on 7-point Likert scales. They provided ratings for:
• clarity: how clear and understandable the explanation is.
• fluency: how natural the explanation is.
• precision: how well the information of the provided SQL query is captured in its textual explanation.
A score of 7 in each of the aforementioned categories means that the provided explanation is extremely clear, natural, and precise, respectively. We score neutral as a 4.

Data associated with respondents that completed the survey in less than 5 minutes (half the approximate time for filling out the survey) were deleted. Furthermore, we deleted the data of participants who selected the same response to every question, regardless of the question. After cleaning the data, we ended up with 2 different scores (per explanation and feature), corresponding to 2 different participants. The final score of an explanation for a given feature is obtained by taking the average of the 2 different scores. Therefore, we ended up with a single score per explanation and feature. Indicatively, in Table 5 we show the data collected for the explanations of the CORDIS query with id 9, and the obtained final scores. From those final scores, 18 rating sets (2 databases x 3 translation systems x 3 features), consisting of 14 elements each (1 for every query), were created.

Table 5: Final scores per feature for the explanations of the CORDIS query with id 9.

CORDIS QUERY ID 9       | Logos v.1 | Logos v.2 | Ground truth
Clarity – Expert A      | 2         | 6         | 7
Clarity – Expert B      | 4         | 6         | 7
Clarity – Final Score   | 3         | 6         | 7
Fluency – Expert A      | 1         | 3         | 7
Fluency – Expert B      | 4         | 4         | 7
Fluency – Final Score   | 2.5       | 3.5       | 7
Precision – Expert A    | 7         | 7         | 7
Precision – Expert B    | 6         | 7         | 7
Precision – Final Score | 6.5       | 7         | 7

In Table 4, we show the averages of those sets, accompanied by their standard deviation in parentheses.

Table 4: Average clarity, fluency, and precision per database and translation system (standard deviation in parentheses).

Features  | CORDIS Logos v.1 | CORDIS Logos v.2 | CORDIS Ground truth | SDSS Logos v.1 | SDSS Logos v.2 | SDSS Ground truth
Clarity   | 4.25 (1.29)      | 5.79 (1.13)      | 6.79 (0.31)         | 4.32 (1.11)    | 6.04 (0.77)    | 6.50 (0.57)
Fluency   | 3.64 (1.43)      | 4.75 (1.06)      | 6.86 (0.35)         | 3.39 (0.97)    | 5.50 (0.87)    | 6.57 (0.56)
Precision | 6.04 (1.46)      | 5.79 (1.10)      | 5.18 (1.75)         | 6.50 (0.50)    | 6.21 (0.70)    | 5.18 (1.01)

Logos v.2 leads to better translations in terms of clarity and fluency for both databases (the average scores increase). However, we see that these scores do not surpass those of the ground truth. An interesting observation is that as the explanations become clearer and more fluent, precision decreases. In other words, as the explanations become more natural, they tend to lose their ability to explicitly explain each part of the associated SQL query. Lastly, we see that the difference between the average fluency scores of Logos v.2 and Logos v.1 increases for the SDSS database. As mentioned in the previous section, this is due to the nature of the SDSS database, which has a less explainable database schema in terms of NL explanations. By adopting labels for the components of the database schema (tables, attributes, and joins), we increase the average fluency score (Logos v.2).

6 CONCLUDING REMARKS
In this paper, the translation of SQL queries has been discussed as a potential solution to problems that arise in data exploration. In that vein, we have extended the graph-based model of Logos [3]. The system's capabilities have been extended in two different directions: (a) translating different types of queries, and (b) creating fluent explanations. To this end, we have implemented changes in the system's parser, introduced new types of nodes and edges for the query graph, created new translation rules and templates, and finally modified the available traversal strategies (BST, MRP, TMT). In addition, a data set composed of 28 queries coming from different use cases (astronomy, policy making) has been created to perform both automated and human evaluation.

The experiments highlighted the following two challenges: (a) the need for creating automated metrics which do not consider only the ground truth, and (b) the need for generating different explanations for different groups of people. Regarding the latter, we observed that: (a) science-related databases are far more difficult to explain in natural language, and (b) sometimes the scientists themselves ask for less detailed interpretations. Lastly, it has been observed that as the explanations become more natural, they tend to lose their ability to explicitly explain each part of the query. Depending on the application, one may want more precise or more natural explanations.

Having said all this, SQL-to-text is a hard problem to solve and there is no single solution. Different explanations may be suitable for different applications or even different groups of people, albeit all correct. As future work, we plan to improve our system by adding an ontology which provides mappings for the interpretation of several parts of the query graph.

ACKNOWLEDGMENTS
This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 863410. We also thank the Intelligent Open Data Exploration (INODE) team for contributing to the system's evaluation.

7 APPENDIX

Table 6: The 14 SQL queries of the CORDIS database.

1. SELECT sum(pm.ec_contribution) AS funding_received FROM projects p, project_members pm WHERE pm.project=p.unics_id AND p.framework_program='H2020';
2. SELECT c.name FROM institutions i, countries c WHERE c.unics_id=i.country_id AND i.name='Athena';
3. SELECT pr.title FROM projects pr, project_subject_areas psa, subject_areas sa WHERE pr.unics_id = psa.project AND psa.subject_area = sa.code AND sa.title = 'Mathematics and Statistics';
4. SELECT distinct t.title FROM projects pr, project_topics pt, topics t WHERE pr.unics_id = pt.project AND pt.topic = t.code AND pr.end_year = 2014;
5. SELECT p.full_name FROM people p, projects pr, project_topics pt, topics t WHERE p.unics_id = pr.principal_investigator AND pr.unics_id = pt.project AND pt.topic = t.code AND t.title = 'Systems';
6. SELECT m.title FROM people p, projects pr, project_programmes pm, programmes m WHERE p.unics_id = pr.principal_investigator AND pr.unics_id = pm.project AND pm.programme = m.code AND p.full_name = 'Thomas Bell';
7. SELECT p.acronym FROM projects p, project_members pm, institutions i, countries c WHERE p.unics_id = pm.project AND pm.institution_id = i.unics_id AND i.country_id = c.unics_id AND c.country_name = 'Greece';
8. SELECT pe.full_name FROM projects pr, people pe
WHERE pr.principal_investigator = pe.unics_id AND 8 pr.start_year = 2014; SELECT i.institutions_name FROM institutions i, countries REFERENCES 9 c WHERE i.country_id = c.unics_id AND c.country_name = [1] Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian, Theofilos Belmpas, Mar- ’France’; tin Braschler, Ursin Brunner, Diego Calvanese, Maximilian Fabricius, Orest SELECT mb.member_name FROM project_members mb, Gkini, Catherine Kosten, Davide Lanti, Antonis Litke, Hendrik Lücke-Tieke, 10 activity_types a WHERE a.code = mb.activity_type AND Francesco Alessandro Massucci, Tarcisio Mendes de Farias, Alessandro Mosca, a.description = ’Research Organisations’; Francesco Multari, Nikolaos Papadakis, Dimitris Papadopoulos, Yogendra Patil, SELECT count(p.title) FROM projects p GROUP BY Aurélien Personnaz, Guillem Rull, Ana Claudia Sima, Ellery Smith, Dimitrios 11 p.start_year; Skoutas, Srividya Subramanian, Guohui Xiao, and Kurt Stockinger. 2021. INODE: SELECT count(i.name) FROM institutions i, countries c Building an End-to-End Data Exploration System in Practice [Extended Vision]. 12 WHERE i.country_id=c.unics_id GROUP BY c.name; CoRR abs/2104.04194 (2021). arXiv:2104.04194 https://arxiv.org/abs/2104.04194 13 SELECT title FROM topics WHERE title like ’%climate%’; [2] Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Sum- SELECT count(p.title) FROM projects p WHERE marizing Source Code using a Neural Attention Model. In ACL. The Association 14 p.start_year=2018; for Computer Linguistics. [3] Andreas Kokkalis, Panagiotis Vagenas, Alexandros Zervakis, Alkis Simitsis, Geor- gia Koutrika, and Yannis Ioannidis. 2012. Logos: a system for translating queries into narratives. In Proceedings of the 2012 ACM SIGMOD International Conference Table 7: The 14 SQL queries of the SDSS database. on Management of Data. 673–676. [4] Georgia Koutrika, Alkis Simitsis, and Yannis E Ioannidis. 2010. 
Explaining struc- ID SDSS SQL Queries tured queries in natural language. In ICDE. IEEE, 333–344. 1 SELECT objid FROM photoobj WHERE clean=1; [5] Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, i don’t speak SPARQL: translating SPARQL SELECT specobjid, z FROM specobj WHERE class = ’QSO’ AND 2 queries into natural language. In Proceedings of the 22nd international conference zwarning = 0; on World Wide Web. 977–988. SELECT objid FROM photoobj WHERE ra >185 AND ra <185.1 3 [6] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a AND dec <5; method for automatic evaluation of machine translation. In Proceedings of the 4 SELECT specobjid FROM specobj WHERE survey = ’segue2’; 40th annual meeting of the Association for Computational Linguistics. 311–318. SELECT specobjid FROM specobj WHERE class = ’STAR’ AND 5 [7] Ehud Reiter and Robert Dale. 1997. Building applied natural language gen- zwarning = 0; eration systems. Nat. Lang. Eng. 3, 1 (1997), 57–87. https://doi.org/10.1017/ SELECT s.specobjid FROM specobj as s WHERE s.subclass = 6 S1351324997001502 ’STARFORMING’ AND s.class= ’GALAXY’; [8] Alkis Simitsis and Yannis E. Ioannidis. 2009. DBMSs Should Talk Back Too. In SELECT s.specobjid FROM specobj as s WHERE s.subclass = 7 CIDR. ’OB’ AND s.class= ’STAR’; [9] Alkis Simitsis, Georgia Koutrika, Yannis Alexandrakis, and Yannis Ioannidis. 2008. SELECT * FROM photoobj WHERE ra >100 and dec <100 AND 8 Synthesizing structured text from logical database subsets. In EDBT. 428–439. type = 3; [10] Chris van der Lee, Emiel Krahmer, and Sander Wubben. 2018. Automated learning 9 SELECT DISTINCT type FROM photoobj; of templates for data-to-text generation: comparing rule-based, statistical and 10 SELECT class, count(*) FROM specobj GROUP BY class; neural methods. In Proceedings of the 11th International Conference on Natural SELECT g.* FROM specobj s, galspecline g WHERE 11 Language Generation. 35–45. 
s.specobjid = g.specobjid and ra <185 AND dec <25; [11] Sam Wiseman, Stuart M Shieber, and Alexander M Rush. 2018. Learning neural SELECT n.* FROM neighbors n, photoobj p WHERE p.objid = templates for text generation. arXiv preprint arXiv:1808.10122 (2018). 12 n.objid AND p.b = 1.072 and p.l = 174.535; [12] Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, and Vadim Sheinin. 2018. SELECT s.* FROM specobj s, galspecline g SQL-to-Text Generation with Graph-to-Sequence Model. In Proceedings of the 13 WHERE s.specobjid = g.specobjid; 2018 Conference on Empirical Methods in Natural Language Processing, Ellen Riloff, SELECT p.u, p.g, p.r, p.i, p.z FROM specobj s, photoobj p David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). 931–936. 14 WHERE s.bestobjid = p.objid AND s.class = ’QSO’; 6
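To give a flavor of the template-based translation discussed above, the following is a minimal sketch, not Logos' actual parser, query graph, or rules (the function name and templates here are hypothetical). It renders one very simple query shape, a single-table SELECT with an optional equality predicate, such as query 4 of Table 7, into English:

```python
import re

# Hedged illustration of template-based SQL-to-NL translation:
# handle only "SELECT cols FROM table [WHERE col = 'value'];".
def explain(sql: str) -> str:
    m = re.match(
        r"SELECT\s+(?P<cols>[\w.,\s]+?)\s+FROM\s+(?P<table>\w+)"
        r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
        sql, re.IGNORECASE)
    if not m:
        return "Sorry, this query shape is not supported."
    # Fill a fixed English template from the matched query parts.
    cols = [c.strip() for c in m.group("cols").split(",")]
    text = f"Find the {' and '.join(cols)} of {m.group('table')}"
    if m.group("col"):
        text += f" whose {m.group('col')} is '{m.group('val')}'"
    return text + "."

print(explain("SELECT specobjid FROM specobj WHERE survey = 'segue2';"))
# → Find the specobjid of specobj whose survey is 'segue2'.
```

A real system replaces the regular expression with a full SQL parser and traverses a query graph so that joins, aggregates, and nested clauses each contribute their own template fragments; the sketch only shows how a matched query part maps to an English phrase.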