Let the Database Talk Back: Natural Language Explanations for SQL

Stavroula Eleftherakis, Orest Gkini, Georgia Koutrika
Athena Research Center, Athens, Greece
seleftheraki@athenarc.gr, orestg@athenarc.gr, georgia@athenarc.gr

ABSTRACT
Database interaction is often characterized as a non-trivial and time-consuming process due to the user's inexperience with the data or the query language. Therefore, databases need to be able to "talk back" in order to assist users during data exploration and eventually lead them to the desired results. In this paper, we tackle the problem of SQL-to-NL by extending the graph-based model of Logos [3]. Our novel extensions include improvements in the system's translation capabilities and in the fluency of the generated explanations. Finally, we report several challenges, highlighted by experiments on different use cases, i.e., astronomy and policy making.

Reference Format:
Stavroula Eleftherakis, Orest Gkini, and Georgia Koutrika. Let the Database Talk Back: Natural Language Explanations for SQL. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers' authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Nowadays, the availability of data coming from many different application fields urges the deployment of sophisticated tools that help us effectively explore and extract knowledge out of these data. However, the variety of the data sources (astronomical, biomedical, etc.) as well as the complexity of query languages such as SQL and SPARQL often pose obstacles during data exploration, due to the users' unfamiliarity with the database content or the query language. Explaining queries in text can help tackle both problems, as it enables users to understand the SQL queries that are used to retrieve the answers through the data exploration process [8].

The problem of translating SQL queries to natural language, or SQL-to-NL in short, appears deceivingly simple: SQL queries use a restricted vocabulary comprising SQL elements (clauses, operators, etc.) and database elements (i.e., relations, attributes), and there is no ambiguity in interpreting such elements. Hence, a straightforward translation initially appears adequate. Reality teaches us quite the opposite, as the resulting text should be accurate in capturing the respective SQL query, and effective, allowing fast and unique interpretation of it. Achieving both of these qualities is very difficult and raises several technical challenges that need to be addressed [8]. Furthermore, several different explanations can be generated for the same SQL query, making evaluation challenging.

Logos [3] is a well-established system which translates SQL queries into narratives. In contrast to neural approaches, Logos' model can be interpreted. This gives us the advantage of offering precise translations, knowing exactly how translations are produced. Nevertheless, the system's translation capabilities are highly dependent on the query's syntax, meaning that for each type of query the system should have the necessary tools to model it. Moreover, depending on the query's complexity (number of attributes, bridge tables, etc.), translations can become unnatural.

In our work, we provide NL explanations as part of a data exploration platform used by our collaborators from different fields, including astrophysics and policy making [1]. In this platform, NL explanations are provided for the internal SQL queries generated by different system components used for recommendations, NL-to-SQL translation, and data exploration. Thus, we have to translate a variety of queries containing different clauses and operators. Furthermore, these translations need to be as natural as possible.

In an effort to tackle the above-mentioned challenges, we extend Logos in two important directions: (a) translation capabilities, where we focus on the system's ability to translate different types of queries, and (b) fluent explanations, where we focus on improving the system's translations. Furthermore, we present experimental results as well as results of a user study over two different databases: astronomical data and policy-making data. Our evaluation shows the effectiveness of our approach and provides several insights regarding challenges that arise due to the ambiguous nature of NL explanations, including: (a) scoring textual explanations, and (b) generating explanations suitable for different groups of people.

The rest of the paper is organised as follows. In Section 2, related work is discussed. In Section 3, we provide background information on Logos. In Section 4, we present our novel extensions. In Section 5, the effectiveness of the system is explored. Finally, concluding remarks are provided in Section 6.

2 RELATED WORK
Existing approaches for SQL-to-NL can be broadly divided into two categories. Template-/rule-based approaches (e.g., [4], [5]) require the design of special templates and rules that are used to compose sentences. Due to the non-ambiguous nature of SQL queries, templates can help limit the syntactic variability of the output and smooth explanations out [7]. Logos [3] falls into this category.
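To give a flavor of the template-based idea, the toy sketch below fills a fixed phrase template with database elements taken from a pre-parsed query. The query representation and the wording are our own illustrative simplifications, not the actual rules of Logos or of [4].

```python
# Toy illustration of a template-based SQL-to-NL rule: a fixed phrase
# template is filled with the projected attributes, the relation, and the
# selection predicates of a (pre-parsed) SELECT ... WHERE query.
# The query dict and the wording are hypothetical simplifications.

def explain_select(query: dict) -> str:
    """Compose an explanation for a simple SELECT ... WHERE query."""
    cols = " and ".join(query["select"])                 # projected attributes
    preds = " and ".join(
        f"{attr} is {val}" for attr, val in query["where"]
    )                                                    # selection predicates
    return f"Find the {cols} of {query['from']} whose {preds}."

q = {
    "select": ["title"],
    "from": "projects",
    "where": [("start year", "2021")],
}
print(explain_select(q))
# Find the title of projects whose start year is 2021.
```

The strength of this approach is that the output is fully predictable; its weakness, as discussed below, is that every query shape needs its own template.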
Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

The second line of research tackles the problem as neural machine translation and uses sequence-to-sequence (Seq2Seq) models (e.g., [2], [12]), which treat both the natural language description and the query as sequences. In contrast to the previous category, those models automatically learn how to translate queries without the need for predefined query patterns. However, they require plenty of training data and fine-tuning, and their effectiveness is still very low.

Lately, hybrid models (e.g., [10], [11]) have been proposed to solve a problem similar to ours, i.e., data-to-natural-language. Those models automatically learn templates and use them in order to generate textual explanations.

3 BACKGROUND
Logos' approach to NL generation comprises graph representations of the database and the queries, template phrases associated with parts of the graph, and graph traversal in particular directions to compose the templates found on the way into the final text. We provide an overview of the main ingredients: the database graph, the query graph, labels, templates, and the algorithms.

The database graph captures the relationships between the database relations and attributes (the nodes of the graph). Its edges are divided into three types: (a) membership edges (attributes to relations), representing attribute projections in queries, (b) selection edges (relations to attributes), representing attributes in predicates, and (c) join edges (attribute to attribute), representing joins.

The query graph of a query is the part of the database graph that the query refers to, extended with additional nodes (e.g., for functions) and edges (e.g., for group-by's) in order to capture the entire query meaning. Each node or edge of the query graph can be annotated with a label that signifies its meaning in natural language. There are default labels that can be used for any database. Additionally, labels can be provided by a domain expert.

Example. Let us consider the CORDIS¹ database, which stores information about research projects funded by the European Union. In Figure 1, we see a subgraph of the corresponding database graph. Note that, for simplicity, we do not show the attributes used to join the various relations. By default, the name of each node is also its label. We also see how edges are annotated with default labels. We can easily override these. For instance, for the relation PROJECT_MEMBERS, instead of the system's default label "project members", we can use the short label "participants". Figure 2 zooms in on the join path connecting relation PEOPLE with relation PROJECTS and provides a more detailed view. A possible label which expresses this connection is "principal investigators of", while the default one is "associated with". Such designer labels are stored in special tables called designer tables. □

Figure 1: A subgraph of the CORDIS database graph.
Figure 2: A join on the CORDIS database graph.
Figure 3: A query with IN and NOT IN operators.

NL explanations are created by traversing the query graph, accompanied by a template mechanism [9]. The template mechanism uses the provided labels (designer or default) to form meaningful phrases. A template label l(u) or l((u, v)) is assigned to a node u or a path (u, v), respectively. For instance, a join template label may have the form l((u, v)) = l(u) + expr + l(u, v) + l(v), where u and v are relation nodes of the query graph. Using the example of the previous paragraph (see Figure 2), a template for the path connecting the tables PEOPLE and PROJECTS is the following: l((PEOPLE, PROJECTS)) = l(PEOPLE) + " as " + "principal investigators of" + l(PROJECTS) = "people as principal investigators of projects".

Synthesis of the NL explanations is performed as a traversal of the query graph. There are three traversal strategies: (a) the Binary Search Tree (BST) algorithm, where the translation consists of a composition of clauses, each of them focusing on specific query semantics, (b) the Multiple Reference Points (MRP) algorithm, where information from all parts of the query graph is blended in the translation, and (c) the Template composition (TMT) algorithm, where predefined, richer templates corresponding to different query parts are used in an effort to produce more concise translations. In what follows, our examples were created using the MRP traversal strategy and correspond to queries targeting the CORDIS database.

4 TOWARDS RICHER TRANSLATIONS
Logos was extended in two different directions. The first direction aimed to enable the translation of more query types, while the other aimed to create fluent explanations. For this purpose, we have upgraded the query parsing, the graph generation, the traversal strategies, and the template mechanism.

Translation Capabilities. Our first goal was to extend the translation capabilities of Logos towards translating more clauses and operators. To that end, Logos should be able to analyze those new types of queries, enrich (if necessary) the query graph with new types of nodes and edges, and translate those new elements using any of the traversal strategies (BST, MRP, TMT). Thus, we have implemented changes in the system's parser, introduced new graph elements, and created new translation rules. As a result, the system is now capable of translating queries with the SELECT TOP and LIMIT clauses, as well as the (NOT) IN and (NOT) LIKE operators.
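To make the template mechanism of Section 3 concrete, here is a minimal sketch of join-template composition for the PEOPLE-PROJECTS example. The label dictionaries are hypothetical simplifications: in Logos, designer labels come from designer tables and default labels from the schema.

```python
# Minimal sketch of join-template composition, following the form
# l((u, v)) = l(u) + expr + l(u, v) + l(v) from Section 3.
# The label dictionaries are hypothetical stand-ins for Logos' labels.

node_labels = {"PEOPLE": "people", "PROJECTS": "projects"}
edge_labels = {("PEOPLE", "PROJECTS"): "principal investigators of"}  # designer label
DEFAULT_EDGE_LABEL = "associated with"  # used when no designer label exists

def join_template(u: str, v: str, expr: str = " as ") -> str:
    """Compose the template label for the join path (u, v)."""
    edge = edge_labels.get((u, v), DEFAULT_EDGE_LABEL)
    return node_labels[u] + expr + edge + " " + node_labels[v]

print(join_template("PEOPLE", "PROJECTS"))
# people as principal investigators of projects
```

Dropping the designer label falls back to the default edge label, yielding the flatter "people as associated with projects", which illustrates why designer labels improve fluency.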
Table 1: Example SQL queries and generated interpretations.

1. SELECT TOP 10 title FROM projects WHERE start_year=2021;
   "Find projects whose start year is 2021. Limit the results to top 10."
2. SELECT total_cost, ec_max_contribution, framework_program, ec_fund_scheme FROM projects WHERE framework_program IN ('FP7') AND ec_fund_scheme NOT IN ('ERC-SG', 'ERC-CG');
   "Find the total costs, max contributions, framework programs and fund schemes of projects whose framework program is FP7 and fund scheme is not ERC-SG and ERC-CG."
3. SELECT title FROM topics WHERE title LIKE '%climate%';
   "Find the titles of topics whose title is like %climate%."
4. SELECT COUNT(*) FROM projects WHERE start_year=2012;
   "Find the number of projects whose start year is 2012."

Table 2: Example SQL queries and generated interpretations, where B and A denote before and after, respectively.

1. SELECT pr.* FROM projects pr, project_subject_areas psa, subject_areas sa WHERE pr.unics_id = psa.project AND psa.subject_area = sa.code AND sa.title = 'Mathematics and Statistics';
   B: "Find everything about projects associated with project subject areas, and for project subject areas associated with subject areas whose title is Mathematics and Statistics."
   A: "Find everything about projects on subject areas whose title is Mathematics and Statistics."
2. SELECT COUNT(p.title), t.title FROM topics t, project_topics pt, projects p WHERE t.code = pt.topic AND pt.project = p.unics_id GROUP BY t.title;
   B: "Create groups according to the title of topics. Find the number of projects and the title of topics associated with project topics associated with the projects."
   A: "Find the titles of topics and the number of projects on these topics grouped by the titles of topics."
3. SELECT m.member_name FROM project_members m, projects p WHERE m.project = p.unics_id;
   B: "Find the member name of project members associated with projects."
   A: "Find participants participated in projects."

¹ https://data.europa.eu/euodp/en/data/dataset/cordisH2020projects

Once an input query is given to the system, the parser analyzes it and generates a parsing tree that stores important information for the creation of the query graph. For the parser, we have introduced several new parsing nodes (e.g., for limit), capturing information about all the aforementioned clauses and operators. Queries having SELECT TOP or LIMIT clauses are marked as "limited" and their associated limitation number is temporarily stored in the system. The translation algorithm (BST, MRP, or TMT) checks whether the input query is limited and, if that is the case, an explanation is separately created and added to the end of the translation. As an example, we provide that of query 1 in Table 1.

In order for Logos to translate queries having IN and NOT IN operators, new types of query graph elements were created: the in edge, the not-in edge, and the value-list node. An illustrative example of this case is query 2 in Table 1, whose query graph is given in Figure 3. The translation algorithm detects those new elements of the query graph and produces a textual explanation.

Regarding the translation of queries having the (NOT) LIKE operator, we created two new types of edges: the like edge and the not-like edge.
An example of this case is that of query 3 in Table 1.

Our last improvement, in terms of translation capabilities, is the creation of a new query graph node, the star node. In combination with the function node that represents the COUNT operator, we now get explanations like that of query 4 in Table 1.

Fluent Explanations. Our second goal was to improve the translations generated by the system in terms of fluency. This kind of change requires only the modification of the traversal strategies and the template mechanism.

While exploring the system's capabilities, we noticed that many database schemas include bridge tables, i.e., special tables used to transform many-to-many relationships into one-to-many relationships. Bridge tables (usually) appear in queries in order to join two tables. Although those tables play an important role in terms of data modeling, they cause "noisy" explanations. For instance, let us consider query 1 of Table 2. Apparently, the previously produced explanation of the system is unnatural. Bridge tables are manually stored in a designer table and excluded from the translation. The template mechanism has been modified so that, if u, v, and w are table nodes with v corresponding to a bridge table, the template label of the path connecting u to w is of the form l((u, w)) = l(u) + l(u, w) + l(w).

Moreover, a mini dictionary has been developed, providing the translation process with the plural form of all the attribute labels. Essentially, for every attribute node u of the query graph we take (if it exists) the plural form p of its label l(u), i.e., p(l(u)).

In addition, we have modified the way the MRP algorithm translates the GROUP BY clause. Previously, the translation for this part would result in a separate sentence. Leveraging the nature of the algorithm, which aims at generating translations by blending different parts of the query, we now blend this translation more naturally. An example that captures all the aforementioned changes (bridge tables, plural form, group by on MRP) is that of query 2 in Table 2.

Lastly, we have improved the translation of queries which include only heading attributes [4] in the SELECT clause. The heading attribute is the most characteristic attribute of its relation. As an example, we offer that of query 3 in Table 2.

5 EXPERIMENTAL RESULTS
The evaluation of Logos is divided into two parts: (a) the automated evaluation part, where we evaluate our results using the Bilingual Evaluation Understudy (BLEU) automated metric [6], and (b) the human evaluation part, where we evaluate our results with the help of SQL experts. The purpose of the first part is to use a well-established metric to show how good the NL explanations are, while the second part aims at evaluating qualitative aspects of the NL explanations, such as clarity and fluency.

For both types of evaluation, we created 28 queries (14 for the CORDIS database, and 14 for the SDSS² database) (see Tables 6 and 7, respectively). Using the MRP algorithm, we translated those queries twice: one time using the default version of the system, which works by considering only the database schema (internal knowledge), and one time using both the database schema and the designer tables (internal and external knowledge), which, as mentioned in the previous sections, store information about heading attributes, bridge tables, and node or edge labels. First of all, we want to investigate the effect of the designer tables on the translations. Moreover, we want to know how close the system's explanations are to the ground truth (textual explanations given by SQL experts of the databases). In what follows, we denote the default version of the system as Logos v.1 and the version that takes advantage of the designer tables as Logos v.2.

² https://www.sdss.org/

5.1 Automated Evaluation
Automated evaluation was carried out to compare the generated explanations to the ground truth, i.e., textual explanations of SQL queries given by SQL experts of the databases, members of the INODE project [1]. The quality of the results is measured using the BLEU-4 score. BLEU is a score for comparing a candidate translation of a text to one or more reference translations. The scores range between 0% and 100%. A score of 100% means that the explanation estimated by the system matches the ground truth completely.

The results are summarized in Table 3. We also report the minimum and the maximum BLEU score. Furthermore, we noticed a large variation between the scores; thus, we decided to report the median BLEU score per translation system instead of the average.

Table 3: BLEU scores for the textual explanations of the 14 CORDIS queries and the 14 SDSS queries.

Query ID | CORDIS Logos v.1 | CORDIS Logos v.2 | SDSS Logos v.1 | SDSS Logos v.2
1        | 3.21             | 3.98             | 4.99           | 12.55
2        | 4.37             | 6.27             | 2.45           | 2.66
3        | 2.26             | 9.26             | 11.71          | 15.73
4        | 3.51             | 12.87            | 4.86           | 5.01
5        | 2.01             | 15.46            | 2.84           | 3.67
6        | 3.40             | 11.20            | 3.38           | 4.07
7        | 4.03             | 4.32             | 3.74           | 4.37
8        | 8.23             | 9.55             | 14.72          | 14.01
9        | 12.30            | 9.29             | 4.46           | 18.80
10       | 18.30            | 7.41             | 4.03           | 4.07
11       | 14.46            | 24.81            | 18.46          | 27.36
12       | 10.70            | 17.40            | 6.87           | 8.56
13       | 4.07             | 4.20             | 4.30           | 22.24
14       | 3.67             | 4.46             | 35.61          | 5.50
MAX      | 18.30            | 24.81            | 35.61          | 27.36
MIN      | 2.01             | 4.20             | 2.45           | 2.66
MEDIAN   | 4.05             | 9.28             | 4.66           | 7.03

For both databases, median scores are under 10%. Looking at the medians, we conclude that Logos v.2 produces translations closer to the ground truth than those obtained from Logos v.1. Especially for the CORDIS database, the median BLEU score of Logos v.2 is more than two times higher than that of Logos v.1.

The low scores do not indicate that the translations produced by Logos are not correct. BLEU works by counting matching n-grams in the candidate translation against n-grams in the reference text, where a 1-gram (unigram) is a single token and a bigram comparison is over word pairs. That means that the score is higher the more parts an NL explanation has in common with the ground truth. Manual examination of the NL explanations that the two versions of Logos generated versus the ground truth showed that the automatically generated translations were in fact correct. However, they looked very different from the ground truth. Indicatively, we show the translations of query 9 (from Table 6).

CORDIS query 9 explanations:
– Logos v.1: "Find the institutions names of institutions associated with countries whose country name is France."
– Logos v.2: "Find institutions located in countries whose name is France."
– Ground truth: "Show names of institutions from France."

We see that Logos does not necessarily produce translations the way that a human would. Moreover, different people would provide different explanations for the same SQL query (albeit all correct). This shows the opportunity of enhancing the translation capabilities of the system with learning that not only leverages the database schema but is also performed on previously defined human translations. It also shows the challenge of creating automated evaluation metrics which do not judge explanation quality only by estimating similarity with the ground truth. These observations lead to the need of conducting human evaluation as well, which is presented in the next subsection.

Focusing now on the results of CORDIS, we see that there is significant improvement in the translations of the queries with ids 3-6, 11, and 12. This is mainly due to the exclusion of bridge tables from the translation procedure and the heading attribute addition. For the queries with ids 9 and 10, we noticed a score reduction. Indicatively, looking at the translations of query 9 above, we observe that although the translation of Logos v.2 is more natural than that of Logos v.1, the presence of the phrase "names of institutions" in the translation of the latter leads to a higher BLEU score.

Let us now focus on the results of SDSS. The scores are lower than those of the CORDIS database. This is due to the nature of the SDSS database, which uses abbreviated names and letter symbols to describe the content of its tables and attributes; for instance, "photoobj" instead of "photometric objects", or the letter "u" to denote the magnitude of a photometric object in the "u" (ultraviolet) filter. During the experiments, we realized that by transforming those names and symbols into meaningful textual sentences, we increase the size of the explanations compared to the size of the explanations provided by the astrophysicist SQL expert. This shows another challenge for the automatic generation of NL explanations: different styles of explanations may be given by domain experts in different fields. Thus, different explanations are suitable for different groups of people (e.g., astronomers and data scientists). For example, for query 14, the BLEU score of Logos v.2 is substantially lower than that of Logos v.1.

SDSS query 14 explanations:
– Logos v.1: "Find the u, g, r, i and z of photoobj associated with specobj whose class is QSO."
– Logos v.2: "Find the magnitude u, magnitude g, magnitude r, magnitude i and magnitude z of photometric objects corresponding to spectroscopic objects whose class is QSO."
– Ground truth: "Show me the u, g, r, i, z magnitudes of spectroscopic quasars."

We concluded that this kind of notation (abbreviated names and letter symbols) for the attributes of the SDSS tables is sometimes preferred over full descriptions.

Lastly, it has been observed that the SDSS database includes many discrete variables (attributes) that define different types of objects, e.g., stars. A fine example of that case is query 7.

SDSS query 7 (Table 7) explanations:
– Logos v.1: "Find the specobjids of specobj whose subclass is OB and class is STAR."
– Logos v.2: "Find spectroscopic objects whose spectroscopic subclass is OB and class is STAR."
– Ground truth: "Find all spectroscopic stars which are massive and hot."

We see that both versions of Logos do not understand that "subclass = OB" and "class = STAR" mean massive and hot stars. This justifies the low BLEU scores for both versions of the system. Additional knowledge is required for this purpose; we are in fact in the process of integrating an ontology that provides such mappings.

5.2 Human Evaluation
For this experimental setting, an online survey was conducted. A total of 21 people, all SQL experts, participated in the survey. The experts were members of the INODE project [1]. People who contributed to the creation of labels (ground truth explanations) were excluded from the evaluation process. From the pool of 28 SQL queries (see Tables 6 and 7), participants were asked to rate the textual explanations of 4 randomly chosen queries (2 per database). The queries were equally distributed to all participants. For each query, the participant rated 3 explanations (1 per translation): (a) the explanation produced by Logos v.1, (b) the explanation produced by Logos v.2, and (c) the ground truth explanation, resulting in a total of 84 explanations rated by humans.

The participants judged the quality of the translations on 7-point Likert scales. They provided ratings for:
• clarity: how clear and understandable the explanation is.
• fluency: how natural the explanation is.
• precision: how well the information of the provided SQL query is captured in its textual explanation.
A score of 7 in each of the aforementioned categories means that the provided explanation is extremely clear, natural, and precise, respectively. We score neutral as a 4.

Data associated with respondents that completed the survey in less than 5 minutes (half the approximate time for filling out the survey) were deleted. Furthermore, we deleted the data of participants who selected the same response to every question, regardless of the question. After cleaning the data, we ended up with 2 different scores (per explanation and feature), corresponding to 2 different participants. The final score of an explanation for a given feature is obtained by taking the average of the 2 different scores. Therefore, we ended up with a single score per explanation and feature. Indicatively, in Table 5 we show the data collected for the explanations of the CORDIS query with id 9, and the obtained final scores. From those final scores, 18 rating sets (2 databases x 3 translation systems x 3 features), consisting of 14 elements each (1 for every query), were created.

Table 5: Final scores per feature for the explanations of the CORDIS query with id 9.

CORDIS QUERY ID 9       | Logos v.1 | Logos v.2 | Ground truth
Clarity – Expert A      | 2         | 6         | 7
Clarity – Expert B      | 4         | 6         | 7
Clarity – Final Score   | 3         | 6         | 7
Fluency – Expert A      | 1         | 3         | 7
Fluency – Expert B      | 4         | 4         | 7
Fluency – Final Score   | 2.5       | 3.5       | 7
Precision – Expert A    | 7         | 7         | 7
Precision – Expert B    | 6         | 7         | 7
Precision – Final Score | 6.5       | 7         | 7

In Table 4, we show the averages of those sets, accompanied by their standard deviation in parentheses.

Table 4: Average clarity, fluency, and precision per database and translation system (standard deviation in parentheses).

Features  | CORDIS Logos v.1 | CORDIS Logos v.2 | CORDIS Ground truth | SDSS Logos v.1 | SDSS Logos v.2 | SDSS Ground truth
Clarity   | 4.25 (1.29)      | 5.79 (1.13)      | 6.79 (0.31)         | 4.32 (1.11)    | 6.04 (0.77)    | 6.50 (0.57)
Fluency   | 3.64 (1.43)      | 4.75 (1.06)      | 6.86 (0.35)         | 3.39 (0.97)    | 5.50 (0.87)    | 6.57 (0.56)
Precision | 6.04 (1.46)      | 5.79 (1.10)      | 5.18 (1.75)         | 6.50 (0.50)    | 6.21 (0.70)    | 5.18 (1.01)

Logos v.2 leads to better translations in terms of clarity and fluency for both databases (the average scores increase). However, we see that these scores do not surpass those of the ground truth. An interesting observation is that as the explanations become clearer and more fluent, precision decreases. In other words, as the explanations become more natural, they tend to lose their ability to explicitly explain each part of the associated SQL query. Lastly, we see that the difference between the average fluency scores of Logos v.2 and Logos v.1 increases for the SDSS database. As mentioned in the previous section, this is due to the nature of the SDSS database, which has a less explainable database schema in terms of NL explanations. By adopting labels for the components of the database schema (tables, attributes, and joins), we increase the average fluency score (Logos v.2).

6 CONCLUDING REMARKS
In this paper, the translation of SQL queries has been discussed as a potential solution to problems that arise in data exploration. In that vein, we have extended the graph-based model of Logos [3]. The system's capabilities have been extended in two different directions: (a) translating different types of queries, and (b) creating fluent explanations. To this end, we have implemented changes in the system's parser, introduced new types of nodes and edges for the query graph, created new translation rules and templates, and finally modified the available traversal strategies (BST, MRP, TMT). In addition, a data set composed of 28 queries coming from different use cases (astronomy, policy making) has been created to perform both automated and human evaluation.

The experiments highlighted the following two challenges: (a) the need for creating automated metrics which do not consider only the ground truth, and (b) the need for generating different explanations for different groups of people. Regarding the latter, we observed that: (a) science-related databases are far more difficult to explain in natural language, and (b) sometimes the scientists themselves ask for less detailed interpretations. Lastly, it has been observed that as the explanations become more natural, they tend to lose their ability to explicitly explain each part of the query. Depending on the application, one may want more precise or more natural explanations.

Having said all this, SQL-to-text is a hard problem to solve and there is no single solution. Different explanations may be suitable for different applications or even different groups of people, albeit all correct. As future work, we plan to improve our system by adding an ontology which provides mappings for the interpretation of several parts of the query graph.

ACKNOWLEDGMENTS
This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 863410. We also thank the Intelligent Open Data Exploration (INODE) team for contributing to the system's evaluation.

7 APPENDIX

Table 6: The 14 SQL queries of the CORDIS database.

1. SELECT sum(pm.ec_contribution) AS funding_received FROM projects p, project_members pm WHERE pm.project=p.unics_id AND p.framework_program='H2020';
2. SELECT c.name FROM institutions i, countries c WHERE c.unics_id=i.country_id AND i.name='Athena';
3. SELECT pr.title FROM projects pr, project_subject_areas psa, subject_areas sa WHERE pr.unics_id = psa.project AND psa.subject_area = sa.code AND sa.title = 'Mathematics and Statistics';
4. SELECT distinct t.title FROM projects pr, project_topics pt, topics t WHERE pr.unics_id = pt.project AND pt.topic = t.code AND pr.end_year = 2014;
5. SELECT p.full_name FROM people p, projects pr, project_topics pt, topics t WHERE p.unics_id = pr.principal_investigator AND pr.unics_id = pt.project AND pt.topic = t.code AND t.title = 'Systems';
6. SELECT m.title FROM people p, projects pr, project_programmes pm, programmes m WHERE p.unics_id = pr.principal_investigator AND pr.unics_id = pm.project AND pm.programme = m.code AND p.full_name = 'Thomas Bell';
7. SELECT p.acronym FROM projects p, project_members pm, institutions i, countries c WHERE p.unics_id = pm.project AND pm.institution_id = i.unics_id AND i.country_id = c.unics_id AND c.country_name = 'Greece';
8. SELECT pe.full_name FROM projects pr, people pe
WHERE pr.principal_investigator = pe.unics_id AND 8 pr.start_year = 2014; SELECT i.institutions_name FROM institutions i, countries REFERENCES 9 c WHERE i.country_id = c.unics_id AND c.country_name = [1] Sihem Amer-Yahia, Georgia Koutrika, Frederic Bastian, Theofilos Belmpas, Mar- ’France’; tin Braschler, Ursin Brunner, Diego Calvanese, Maximilian Fabricius, Orest SELECT mb.member_name FROM project_members mb, Gkini, Catherine Kosten, Davide Lanti, Antonis Litke, Hendrik Lücke-Tieke, 10 activity_types a WHERE a.code = mb.activity_type AND Francesco Alessandro Massucci, Tarcisio Mendes de Farias, Alessandro Mosca, a.description = ’Research Organisations’; Francesco Multari, Nikolaos Papadakis, Dimitris Papadopoulos, Yogendra Patil, SELECT count(p.title) FROM projects p GROUP BY Aurélien Personnaz, Guillem Rull, Ana Claudia Sima, Ellery Smith, Dimitrios 11 p.start_year; Skoutas, Srividya Subramanian, Guohui Xiao, and Kurt Stockinger. 2021. INODE: SELECT count(i.name) FROM institutions i, countries c Building an End-to-End Data Exploration System in Practice [Extended Vision]. 12 WHERE i.country_id=c.unics_id GROUP BY c.name; CoRR abs/2104.04194 (2021). arXiv:2104.04194 https://arxiv.org/abs/2104.04194 13 SELECT title FROM topics WHERE title like ’%climate%’; [2] Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Sum- SELECT count(p.title) FROM projects p WHERE marizing Source Code using a Neural Attention Model. In ACL. The Association 14 p.start_year=2018; for Computer Linguistics. [3] Andreas Kokkalis, Panagiotis Vagenas, Alexandros Zervakis, Alkis Simitsis, Geor- gia Koutrika, and Yannis Ioannidis. 2012. Logos: a system for translating queries into narratives. In Proceedings of the 2012 ACM SIGMOD International Conference Table 7: The 14 SQL queries of the SDSS database. on Management of Data. 673–676. [4] Georgia Koutrika, Alkis Simitsis, and Yannis E Ioannidis. 2010. 
Explaining struc- ID SDSS SQL Queries tured queries in natural language. In ICDE. IEEE, 333–344. 1 SELECT objid FROM photoobj WHERE clean=1; [5] Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, i don’t speak SPARQL: translating SPARQL SELECT specobjid, z FROM specobj WHERE class = ’QSO’ AND 2 queries into natural language. In Proceedings of the 22nd international conference zwarning = 0; on World Wide Web. 977–988. SELECT objid FROM photoobj WHERE ra >185 AND ra <185.1 3 [6] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a AND dec <5; method for automatic evaluation of machine translation. In Proceedings of the 4 SELECT specobjid FROM specobj WHERE survey = ’segue2’; 40th annual meeting of the Association for Computational Linguistics. 311–318. SELECT specobjid FROM specobj WHERE class = ’STAR’ AND 5 [7] Ehud Reiter and Robert Dale. 1997. Building applied natural language gen- zwarning = 0; eration systems. Nat. Lang. Eng. 3, 1 (1997), 57–87. https://doi.org/10.1017/ SELECT s.specobjid FROM specobj as s WHERE s.subclass = 6 S1351324997001502 ’STARFORMING’ AND s.class= ’GALAXY’; [8] Alkis Simitsis and Yannis E. Ioannidis. 2009. DBMSs Should Talk Back Too. In SELECT s.specobjid FROM specobj as s WHERE s.subclass = 7 CIDR. ’OB’ AND s.class= ’STAR’; [9] Alkis Simitsis, Georgia Koutrika, Yannis Alexandrakis, and Yannis Ioannidis. 2008. SELECT * FROM photoobj WHERE ra >100 and dec <100 AND 8 Synthesizing structured text from logical database subsets. In EDBT. 428–439. type = 3; [10] Chris van der Lee, Emiel Krahmer, and Sander Wubben. 2018. Automated learning 9 SELECT DISTINCT type FROM photoobj; of templates for data-to-text generation: comparing rule-based, statistical and 10 SELECT class, count(*) FROM specobj GROUP BY class; neural methods. In Proceedings of the 11th International Conference on Natural SELECT g.* FROM specobj s, galspecline g WHERE 11 Language Generation. 35–45. 
s.specobjid = g.specobjid and ra <185 AND dec <25; [11] Sam Wiseman, Stuart M Shieber, and Alexander M Rush. 2018. Learning neural SELECT n.* FROM neighbors n, photoobj p WHERE p.objid = templates for text generation. arXiv preprint arXiv:1808.10122 (2018). 12 n.objid AND p.b = 1.072 and p.l = 174.535; [12] Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, and Vadim Sheinin. 2018. SELECT s.* FROM specobj s, galspecline g SQL-to-Text Generation with Graph-to-Sequence Model. In Proceedings of the 13 WHERE s.specobjid = g.specobjid; 2018 Conference on Empirical Methods in Natural Language Processing, Ellen Riloff, SELECT p.u, p.g, p.r, p.i, p.z FROM specobj s, photoobj p David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). 931–936. 14 WHERE s.bestobjid = p.objid AND s.class = ’QSO’; 6
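To give a flavor of the template-based translation discussed above, the following is a minimal sketch, not Logos' actual parser, query graph, or rules (the function name and templates here are hypothetical). It renders one very simple query shape, a single-table SELECT with an optional equality predicate, such as query 4 of Table 7, into English:

```python
import re

# Hedged illustration of template-based SQL-to-NL translation:
# handle only "SELECT cols FROM table [WHERE col = 'value'];".
def explain(sql: str) -> str:
    m = re.match(
        r"SELECT\s+(?P<cols>[\w.,\s]+?)\s+FROM\s+(?P<table>\w+)"
        r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
        sql, re.IGNORECASE)
    if not m:
        return "Sorry, this query shape is not supported."
    # Fill a fixed English template from the matched query parts.
    cols = [c.strip() for c in m.group("cols").split(",")]
    text = f"Find the {' and '.join(cols)} of {m.group('table')}"
    if m.group("col"):
        text += f" whose {m.group('col')} is '{m.group('val')}'"
    return text + "."

print(explain("SELECT specobjid FROM specobj WHERE survey = 'segue2';"))
# → Find the specobjid of specobj whose survey is 'segue2'.
```

A real system replaces the regular expression with a full SQL parser and traverses a query graph so that joins, aggregates, and nested clauses each contribute their own template fragments; the sketch only shows how a matched query part maps to an English phrase.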