Experiencing Hypothetical Datalog in SQL Puzzles
                         Fernando Sáenz-Pérez1
                         1
                             Faculty of Computer Science, Complutense University of Madrid, 28040 Madrid, Spain


                                        Abstract
                                        This work presents some SQL puzzles making use of the application of Hypothetical Datalog (HD) to implement
                                        an SQL system. HD provides a way of declaring SQL Common Table Expressions as used to specify local view
                                        definitions and recursive queries. We present the concepts of HD that will be employed to translate SQL queries
                                        to HD, providing its syntax, an inference system and translation rules. Finally, several SQL puzzles used during
                                        teaching SQL in a database subject are proposed.

                                        Keywords
                                        Hypothetical Datalog, Recursive SQL, Teaching, Puzzles


                         1. Introduction
                         SQL is usually understood as a specific-purpose, declarative programming language, intended to insert
                         and extract data from relational databases. However, since the addition of recursive common table
                         expressions (CTE, included for the first time in the ISO/IEC Standard SQL:1999, also known as SQL3) its
                         nature has evolved to become a Turing-complete language. To acknowledge this, a specific class of tag
                         systems, CTS’s (Cyclic Tag Systems) were proposed in [1], and a corollary of its work was that a CTS
                         can emulate a Turing-complete class of tag systems. Since a CTS can be implemented in standard SQL,1
                         this proves that SQL is Turing complete.
                            This result can be used to pose more advanced problems as puzzles to students with the aim of
                         sharpening their programming skills and promoting open-mindedness. Using puzzles to this end is
                         a well-known technique [2], and in fact they are used all around the world as in the Hour of Code,2
                         Programming Praxis,3 CodeKata,4 ¡Acepta el reto! 5 and also specific for SQL [3].6 Our experience and
                         student feedback when teaching database concepts to advanced students (typically in the double degree
                         of Computer Science and Mathematics) during last years confirms the usefulness of puzzles as an
                         intellectual challenge posed to students (proposed as extra-curricular activities). This paper collects
                         some of those SQL puzzles, ranging from the easier ones to the more complex and appealing ones.
                         Though SQL can not be considered as an esoteric language [4], expressing these puzzles with this
                         language can be viewed as an esoteric usage.
                            While these puzzles can be solved in most SQL systems, when teaching we use instead the Datalog
                         Educational System (DES)7 [5]. This system, which is capable of solving standard SQL queries, is in
                         our view more adequate for students than others because, in particular, it provides better syntax error
                         explanations and even semantic error warnings [6]. Moreover, students can use its SQL declarative
                         debugger [7] for fixing semantically incorrect views. These last two features are absent in commercial
                         SQL systems. Apart from this, DES extends the SQL language beyond the standard, including the

                          Datalog 2.0 2024: 5th International Workshop on the Resurgence of Datalog in Academia and Industry, October 11-14, 2024, Dallas,
                          Texas, USA
                          $ fernan@sip.ucm.es (F. Sáenz-Pérez)
                           0000-0001-6075-4398 (F. Sáenz-Pérez)
                                     © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                         1
                           https://wiki.postgresql.org/index.php?title=Cyclic_Tag_System&oldid=15106
                         2
                           https://hourofcode.com
                         3
                           https://programmingpraxis.com
                         4
                           http://codekata.com
                         5
                           Face the challenge! https://www.aceptaelreto.com
                         6
                           SQLzoo https://sqlzoo.net, HackerRank https://www.hackerrank.com/domains/sql
                         7
                           Desktop applications for different OS’s and Prolog host systems (https://des.sourceforge.io), and an on-line system DESweb
                           (https://desweb.fdi.ucm.es) which includes the last development version are available.

CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings

                                                                                                               54
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                                       54–68


possibility of making assumptions, both positive (assuming data which is not in the database) and
negative (neglecting data which is already in the database) for querying the database. However, since
these puzzles are targeted at database students, we restrict ourselves to uses admitted by the SQL
standard and leave the extension out of the scope of this paper.
   DES is built with concepts from intuitionistic logic programming (ILP) [8] dealing to a Hypothetical
Datalog (HD) language, which makes possible those positive and negative assumptions. As a conse-
quence, it can be used to express temporary relations, similar to what an SQL WITH clause allows [9].
Each SQL query submitted to this system is translated into an HD program and solved by its logic
engine. This paper extends [9] by introducing expressions in conditions and meta-predicates which
are needed to support the translation of standard SQL queries with expressions in WHERE clauses and
aggregates as required to solve those puzzles.8

1.1. Motivation
SQL data-retrieval queries usually start with a SELECT keyword, which builds the outcome in terms
of its expression list. An exception to this is the WITH clause, which permits local view declarations
which can be used in its outcome SQL statement. These declarations are known as Common Table
Expressions (CTE’s), and its syntax in recursive WITH queries can be simplified as:
      WITH RECURSIVE
           R1 AS SQL1 , -- CTE1
           ...,
           R𝑛 AS SQL𝑛 -- CTE𝑛
      SQL               -- Outcome SQL
where each R𝑖 is a local, temporary view name defined by the SQL query SQL𝑖 , and can be referenced
only in the context of each SQL𝑖 and SQL , which is the query that builds the outcome. Each declaration
R𝑖 AS SQL𝑖 is a CTE. Whilst a local view declaration is similar to a CREATE VIEW declaration, its scope
is restricted as already noted, and its lifetime is the same as for SQL .
   Though CTE’s are used for a number of reasons (divide-and-conquer, readability, creating views for
which the user has no rights, . . . ), we are interested in the use that enables SQL as a Turing complete
language: by specifying recursive queries. As an example, the following defines natural numbers:

        WITH RECURSIVE
          nat(n) AS (SELECT 0 UNION ALL SELECT n+1 FROM nat)
        SELECT n FROM nat;

where the keyword RECURSIVE is required by the standard to specify a (potentially) recursive CTE,
the first SELECT is the base case (simply returning the tuple (0) as a FROM-less statement), which is
unionised with the inductive case (second SELECT). This single CTE is used in the outcome SQL (third
SELECT) to deliver all the naturals. Query semantics can be understood as finding its fixpoint for the
application of the unionised queries on an enlarging relation nat, which is increased with a new number
in each fixpoint-search iteration. In this case, the fixpoint would be infinite, though top-𝑁 queries can
limit the number of retrieved tuples.
   Current SQL systems include limitations on recursive queries: mutual and non-linear recursion are
not allowed, only one recursive case can be set, and duplicate elimination is not allowed in the union
of the base and inductive cases. These limitations can be overcome with other database languages
such as Datalog operating over the same relational data, a language capable of expressing mutual and
non-linear recursion, and duplicates [10, 11]. However, local (temporary) view definitions as needed
to represent the behavior of a WITH clause are not part of a Datalog language, which only allows for
non-volatile predicates.


8
    This work was supported by the State Research Agency (AEI) of the Spanish Ministry of Science and Innovation under grant
    PID2019-104735RB-C42 (SAFER).


                                                              55
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                       54–68


   Here is where intuitionistic logic programming (ILP) can be of help because it adds embedded
implications in the body of clauses. Intuitively, in an embedded implication 𝐴 ⇒ 𝐵, the atom 𝐴
is assumed during proving 𝐵 [12], limiting its scope to this proof. A database language based on
ILP is Hypothetical Datalog [8], which is extended in [9, 13, 14] with rules (not only atoms) in such
assumptions, a feature that can be leveraged to represent those SQL local view definitions. Next section
recalls this feature implemented in the DES system, which also includes essential SQL functionalities
such as handling duplicates and aggregates.


2. Intuitionistic Logic Programming for SQL
The Hypothetical Datalog language in [9] implements a system based on intuitionistic logic program-
ming. Here, its syntax and semantics are extended with conditions in expressions and meta-predicates
for supporting common SQL needs: top-𝑁 queries (i.e., at most, 𝑁 solutions to its goal are allowed),
aggregates, and expressions in comparison predicates. Next, 𝑣𝑎𝑟𝑠(𝑇 ) is the set of variables in 𝑇 .
Definition 2.1 (Syntax).
𝜌 := 𝐴 | 𝐴 ← 𝜑1 ∧ . . . ∧ 𝜑𝑛
𝜑 := 𝐴 | ¬𝐴 | 𝐸1 ◇ 𝐸2 | 𝛿(𝐴) | 𝜏 (𝑁, 𝐴) | 𝒢(𝐴, 𝑋𝑠, 𝜑) | 𝜌1 ∧ . . . ∧ 𝜌𝑛 ⇒ 𝜑
where 𝜌 stands for rule, 𝜑 for goal, 𝐴 for atom (possibly containing variables and constants, but no
compound terms), 𝑁 is a natural number, 𝑋𝑠 is a list of variables, 𝐸𝑖 are expressions, ◇ is a comparison
operator (=, <, ≤. . . ), symbols ¬, 𝛿, 𝜏 and 𝒢 are meta-predicates, and 𝑣𝑎𝑟𝑠(𝜌𝑖 ) do not occur out of 𝜌𝑖 .
   Here, the symbol ⇒ denotes the embedded implication (it builds a goal that can occur in the body
of a rule), and 𝛿 is a duplicate elimination meta-predicate. The first condition (on compound terms) is
needed for ensuring a finite fixpoint in absence of infinite relations, and the second one (on variables of
rules 𝜌𝑖 ) ensures that 𝜌𝑖 is not specialized by any substitution out of it along inferencing, so 𝜌𝑖 can be
assumed “as is” for any inference step, which is an adequate requirement for modelling local definitions
in SQL WITH statements (however, it represents a limitation with respect to [12]).
   Symbols ¬, 𝛿, 𝜏 and 𝒢 are meta-predicates, standing respectively for negation, duplicate elimination,
top-𝑁 queries, and group-by (for solving aggregations as counts and summations, which can occur in
expressions as other scalar functions). A group-by goal 𝒢(𝐴, 𝑋𝑠, 𝜑) means that ground instances of
𝐴 are grouped by the list of variables 𝑋𝑠 (grouping criteria) and, for each group, there is at most one
instantation such that it fulfils 𝜑.
   Next, we extend the inference system in [9] with inference rules for comparison operators, top-𝑁
and group-by goals. The well-known concepts of predicate dependency graph (PDG) and stratifiable
                                                                                                       ¬
database Δ [15] are used here (see also page 298 in [9]). Here, the PDG includes a negative arc 𝑝 ← 𝑞
between the head predicate 𝑝 of a rule and a predicate 𝑞 in its body, such that 𝑞 occurs in an argument
                                                                                                       ¬
of a meta-predicate ¬, 𝜏 or 𝒢; otherwise, it includes a positive arc 𝑝 ← 𝑞. A negative arc 𝑝 ← 𝑞
imposes that 𝑞 must be solved before 𝑝 for avoiding non-monotonicity [15]. 𝑠𝑡𝑟(Δ, 𝑃 ) is the stratum
corresponding to predicate 𝑃 in Δ [9] such that for a positive arc p←q, 𝑠𝑡𝑟(Δ, p) ≥ 𝑠𝑡𝑟(Δ, q), and for
                  ¬
a negative arc p←q, 𝑠𝑡𝑟(Δ, p) > 𝑠𝑡𝑟(Δ, q).
   Δ ⊢ 𝜓 is an inference expression, where 𝜓 is an identified ground atom 𝑖𝑑 : 𝐴. Such an 𝑖𝑑 is of
the form 𝑖𝑑1 · · · · · 𝑖𝑑𝑛 (𝑛 ≥ 1) and helps data provenance for supporting duplicates. 𝑝𝑟𝑒𝑑(𝐴) is the
predicate of atom 𝐴. A negative inference expression is of the form Δ ⊢ 𝑖𝑑 : −𝐴. Negative inference
expressions play a key role in the SQL extension for negative assumptions, but are out of the scope
of this paper. Next definition specifies an inference system where each rule is read as: If the formulas
above the line can be inferred, then the ones below the line can be inferred as well. The application of
an inference rule is an inference step.
Definition 2.2 (Inference system). Given a database Δ and a set of input inference expressions ℰ,
the inference system associated to the 𝑠-th stratum is defined as follows, where 𝑑𝑠 (ℰ) is a closure
operator that denotes the set of inference expressions derivable in this system:
Axioms:


                                                    56
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                        54–68


    • Δ ⊢ 𝑖𝑑 : 𝐴 is an axiom for each (ground) atomic formula 𝑖𝑑 : 𝐴 in Δ, where 𝑠𝑡𝑟(Δ, 𝑝𝑟𝑒𝑑(𝐴)) = 𝑠.
    • Δ ⊢ 𝑐1 ◇ 𝑐2 is an axiom, where ◇ ∈ {=, <, >, ≤, ≥}, 𝑐𝑖 are constants, and 𝑐1 ◇ 𝑐2 holds.
    • Each expression in ℰ is an axiom.
Inference Rules:
    • For any rule 𝐴 ← 𝜑1 ∧ . . . ∧ 𝜑𝑛 with identifier 𝑖𝑑 in Δ, where 𝑠𝑡𝑟(Δ, 𝑝𝑟𝑒𝑑(𝐴)) = 𝑠 and for any
      ground substitution 𝜃:
      Δ ⊢ 𝑖𝑑𝑖 : 𝜑𝑖 𝜃 for each 𝑖
           Δ ⊢ 𝑖𝑑′ : 𝐴𝜃                                                                             (Clause)
      where 𝑖𝑑′ is the composition of identifiers 𝑖𝑑 · 𝑖𝑑1 · · · 𝑖𝑑𝑛 .
    • For any constants 𝑐𝑖 and expressions 𝑒𝑖 :
       Δ ⊢ 𝑐1 ◇ 𝑐2
      Δ ⊢ 𝑒1 𝜃 ◇ 𝑒2 𝜃                                                                           (Expression)
      where ◇ ∈ {=, <, >, ≤, ≥}, and 𝑒𝑖 𝜃 evaluates to 𝑐𝑖 .
    • For any atoms 𝐴, 𝐴′ :
        Δ ⊢ 𝑖𝑑 : 𝐴
      Δ ⊢ 𝛼 : 𝛿(𝐴′ 𝜃)                                                                           (Duplicates)
      where 𝐴 = 𝐴′ 𝜃, 𝛼 is a single, unique identifier.
    • For any atoms 𝐴𝑖 , 𝐴, natural number 𝑁 , and ground substitutions 𝜃𝑖 :
         Δ ⊢ 𝑖𝑑𝑖 : 𝐴𝑖
      Δ ⊢ 𝛼𝑖 : 𝜏 (𝑁, 𝐴𝜃𝑖 )                                                                             (Top)
      where 𝐴𝑖 = 𝐴𝜃𝑖 , 𝛼𝑖 are at most 𝑁 single, unique identifiers.
    • For any atoms 𝐴𝑖 , 𝐴, 𝐵𝑗 , 𝐵 and ground substitutions 𝜃𝑖 , 𝜃:
      Δ ⊢ 𝑖𝑑𝑖 : 𝐴𝑖 Δ ⊢ 𝑖𝑑𝑗 : 𝐵𝑗
      Δ ⊢ 𝛼𝑖 : 𝒢(𝐴𝜃𝑖 , 𝑋𝑠𝜃𝑖 , 𝐵𝜃𝑖 )                                                      (Group-By)
      where each 𝛼𝑖 is a unique identifier for each group 𝑋𝑠𝜃𝑖 , and 𝐴 = 𝐴𝑖 𝜃, 𝐵 = 𝐵𝑗 𝜃.
    • For any goal 𝜑:
      Δ ∪ {𝜌1 , . . . , 𝜌𝑛 } ⊢ 𝜑
      Δ ⊢ 𝜌1 ∧ . . . ∧ 𝜌𝑛 ⇒ 𝜑                                                                 (Assumption)

Example 1. Natural numbers can be specified in this language as follows:

                    𝜑1 ≡ (𝑛𝑎𝑡(0) ∧ (𝑛𝑎𝑡(𝑋) ← 𝑛𝑎𝑡(𝑌 ) ∧ 𝑋 = 𝑌 + 1)) ⇒ 𝑛𝑎𝑡(𝑋)
  Starting with Δ = ∅, and applying the Assumption rule:

                      {(𝑛𝑎𝑡(0)), (𝑛𝑎𝑡(𝑋) ← 𝑛𝑎𝑡(𝑌 ) ∧ 𝑋 = 𝑌 + 1)} ⊢ 𝑛𝑎𝑡(𝑋)
                    ∅ ⊢ (𝑛𝑎𝑡(0) ∧ (𝑛𝑎𝑡(𝑋) ← 𝑛𝑎𝑡(𝑌 ) ∧ 𝑋 = 𝑌 + 1)) ⇒ 𝑛𝑎𝑡(𝑋)
   Δ, originally empty, has been enlarged with the two rules in the assumption (let us call it Δ′ ), and each
one receives a unique identifier (as any other possible rule in the program). For this example, let 𝑟1 and
𝑟2 be the respective identifiers for 𝑛𝑎𝑡(0) and 𝑛𝑎𝑡(𝑋) ← 𝑛𝑎𝑡(𝑌 ) ∧ 𝑋 = 𝑌 + 1. In a second inference
step, we can apply the Clause inference rule, giving: Δ′ ⊢ 𝑟1 : 𝑛𝑎𝑡(𝑋){𝑋 ↦→ 0} ≡ Δ′ ⊢ 𝑟1 : 𝑛𝑎𝑡(0),
where 𝜃 = {𝑋 ↦→ 0}, and Δ′ ⊢ 𝑟1 : 𝑛𝑎𝑡(0) represents a first solution of the query. This can be read as:
the atom 𝑛𝑎𝑡(0) identified by 𝑟1 has been inferred for the context Δ′ . Next, using 𝑟2 in Clause for some
𝑖𝑑 after Assumption yields:

                               Δ′ ⊢ 𝑖𝑑 : 𝑛𝑎𝑡(𝑌 )𝜃1 Δ′ ⊢ (𝑋 = 𝑌 + 1)𝜃1
                                         Δ′ ⊢ 𝑟2 · 𝑖𝑑 : 𝑛𝑎𝑡(𝑋)𝜃1


                                                      57
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                        54–68


where axioms above the line can be inferred: First, the axiom Δ′ ⊢ 𝑟1 : 𝑛𝑎𝑡(0) (with 𝜃 = {𝑌 ↦→ 0}) as
before, and the second one because it becomes a trivial axiom Δ′ ⊢ 1 = 1 after applying Expression (i.e.,
the evaluated 𝑋 = 0 + 1 for {𝑋 ↦→ 1}). So 𝜃1 = {𝑌 ↦→ 0, 𝑋 ↦→ 1}, and the axiom Δ′ ⊢ 𝑟2 · 𝑟1 : 𝑛𝑎𝑡(1)
can be inferred. Proceeding in the same way, the following axioms are inferred with 𝑘 > 0:

                                        Δ′ ⊢ 𝑟2 · · · 𝑟2 ·𝑟1 : 𝑛𝑎𝑡(𝑘)
                                             ⏟ ⏞
                                                  𝑘                                                      □

  Negative information is deduced by applying the closed-world assumption to a set of inference
expressions ℰ (written as 𝑐𝑤𝑎(ℰ)) as the union of ℰ and the negative inference expression for Δ ⊢ 𝜑
such that Δ ⊢ 𝜑 ∈
                / ℰ. Thus, a stratified bottom-up construction of the unified stratified semantics can be
specified by:

    • ℰ0 = ∅
    • ℰ 𝑠+1 = 𝑐𝑤𝑎(𝑑𝑠+1 (ℰ 𝑠 )) for 𝑠 ≥ 0.

which builds a set of axioms ℰ that provides a way to assign a meaning to a goal 𝜑 with:

                               𝑠𝑜𝑙𝑣𝑒(𝜑, ℰ) = {Δ ⊢ 𝑖𝑑 : 𝜓 ∈ ℰ | 𝜑𝜃 = 𝜓}
where 𝜃 is a substitution and each axiom in ℰ is mapped to the database Δ it was deduced for, and the
inferred fact 𝜓 is labelled with its data source 𝑖𝑑 (for supporting duplicates). We use Δ(ℰ) to denote
the multiset of facts 𝜓 so that Δ ⊢ 𝑖𝑑 : 𝜓 ∈ ℰ for any 𝑖𝑑. In Example 1, there is only one stratum, so:

                      𝑠𝑜𝑙𝑣𝑒(𝜑1 , ℰ) = {Δ′ ⊢ 𝑟2 · · · 𝑟2 ·𝑟1 : 𝑛𝑎𝑡(𝑘) | 𝑘 ≥ 0} = ℰ𝜑1
                                            ⏟ ⏞
                                                  𝑘

and the outcome would be the multiset Δ(ℰ𝜑1 ) = {𝑛𝑎𝑡(𝑘) | 𝑘 ≥ 0}.
   A top-10 goal for this example would restrict the number of solutions as: Δ(ℰ𝜏 (10,𝜑1 ) ) = {𝑛𝑎𝑡(𝑘) |
0 ≤ 𝑘 ≤ 9}, which corresponds to modifying the outcome SQL query in the WITH statement by SELECT
TOP 10 n FROM nat.
   The stratified construction of inference expressions captures the requirements of SQL clauses such
as EXCEPT, NOT IN, and GROUP BY, which are sources of negative arcs in the construction of a predicate
dependency graph (PDG) [16].


3. Translating SQL to Hypothetical Datalog
Here, the function SQL_to_DL is defined by extending Definition 6 in [9] with top-𝑁 queries, aggregates
and expressions. This function takes a relation name and an SQL statement as input and returns a multiset
of Hypothetical Datalog rules providing the same meaning as the SQL relation for a corresponding
predicate with the same name as the relation. From here on: Without loss of expressivity, we assume
that a group-by statement includes a relation name in its FROM clause; set-related operators and symbols
refer to multisets because SQL relations can contain duplicates; each attribute is qualified by the relation
name it corresponds to so that they can be uniquely identified; an empty conjunctive goal is 𝑡𝑟𝑢𝑒;
where convenient, variables of goals are made explicit; and for the sake of conciseness, Hypothetical
Datalog is simply referred to as Datalog.

Definition 3.1. The function SQL_to_DL takes a relation name and an SQL statement as input and
returns a Datalog program as defined by cases as follows:
% Basic SELECT statement
SQL_to_DL (𝑟, SELECT 𝐸𝑥𝑝𝑠 FROM 𝑅𝑒𝑙 WHERE 𝐶𝑜𝑛𝑑) =
  { 𝑟(𝑋) ← 𝐷𝐿𝑅𝑒𝑙(𝑌 ) ∧ 𝐷𝐿𝐸𝑥𝑝𝑠(𝑋) ∧ 𝐷𝐿𝐶𝑜𝑛𝑑(𝑍) } 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠 𝐶𝑜𝑛𝑑𝑅𝑢𝑙𝑒𝑠,
                                                     ⋃︀         ⋃︀       ⋃︀
where SQLREL_to_DL (𝑅𝑒𝑙) = (𝐷𝐿𝑅𝑒𝑙(𝑌 ), 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠), SQLEXPS_to_DL (𝐸𝑥𝑝𝑠) =


                                                      58
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                        54–68


(𝐷𝐿𝐸𝑥𝑝𝑠(𝑋), 𝑡𝑟𝑢𝑒, 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠), and SQLCOND_to_DL (𝐶𝑜𝑛𝑑) = (𝐷𝐿𝐶𝑜𝑛𝑑(𝑍), 𝐶𝑜𝑛𝑑𝑅𝑢𝑙𝑒𝑠).
Here, each E𝑖 in the list 𝐸𝑥𝑝𝑠 is either an expression or an argument name present in the relation 𝑅𝑒𝑙
with corresponding logic variable 𝑋𝑖 ∈ 𝑋. 𝑅𝑒𝑙 is either a single defined relation (table or view), or a join
of relations, or an SQL statement. Function SQLREL_to_DL (respectively, SQLCOND_to_DL ) takes an SQL
relation (respectively, condition) and returns a goal with as many variables 𝑌 as arguments of 𝑅𝑒𝑙 and,
possibly, additional rules which result from the translation (and similarly for SQLEXPS_to_DL (𝐸𝑥𝑝𝑠)).
Variables 𝑍 ⊆ 𝑌 come as a result of the translation of the condition 𝐷𝐿𝐶𝑜𝑛𝑑 to a goal.
% GROUP BY statement
SQL_to_DL (𝑟, SELECT 𝐸𝑥𝑝𝑠 FROM 𝑅𝑒𝑙 GROUP BY 𝐶𝑜𝑙𝑠) =
   { 𝑟(𝑋) ← 𝐷𝐿𝐸𝑥𝑝𝑠 ∧ 𝒢(𝐷𝐿𝑅𝑒𝑙(𝑌 ), 𝑉 , 𝐷𝐿𝐴𝑔𝑔) } 𝐸𝑥𝑝𝑠𝑅𝑢𝑙𝑒𝑠 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠,
                                                   ⋃︀              ⋃︀
where SQLEXPS_to_DL (𝐸𝑥𝑝𝑠) = (𝐷𝐿𝐸𝑥𝑝𝑠, 𝐷𝐿𝐴𝑔𝑔, 𝐸𝑥𝑝𝑠𝑅𝑢𝑙𝑒𝑠), SQLREL_to_DL (𝑅𝑒𝑙) =
(𝐷𝐿𝑅𝑒𝑙(𝑌 ), 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠). Here, 𝑉 ⊆ 𝑌 are the variables of the predicate 𝐷𝐿𝑅𝑒𝑙 corresponding to the
attributes 𝐶𝑜𝑙𝑠 of the relation 𝑅𝑒𝑙.
% Top-𝑁
SQL_to_DL (𝑟, SELECT TOP N 𝐸𝑥𝑝𝑠 FROM 𝑅𝑒𝑙 WHERE 𝐶𝑜𝑛𝑑) = { 𝑟(𝑋) ← 𝜏 (𝑁, 𝑟 ′ (𝑋)) }
                                                                                            ⋃︀
                                                                                                 𝑅𝑢𝑙𝑒𝑠,
where SQL_to_DL (𝑟′ , SELECT 𝐸𝑥𝑝𝑠 FROM 𝑅𝑒𝑙 WHERE 𝐶𝑜𝑛𝑑) = 𝑅𝑢𝑙𝑒𝑠
% Union
SQL_to_DL (𝑟, SQL1 UNION ALL SQL2 ) = SQL_to_DL (𝑟, SQL1 )          SQL_to_DL (𝑟, SQL2 )
                                                               ⋃︀

% Difference
SQL_to_DL (𝑟, SQL1 EXCEPT SQL2 ) =
  { 𝑟(𝑋) ← 𝑠(𝑋) ∧ ¬𝑡(𝑋) } SQL_to_DL (𝑠, SQL1 ) SQL_to_DL (𝑡, SQL2 )
                            ⋃︀                ⋃︀

% Intersection
SQL_to_DL (𝑟, SQL1 INTERSECT SQL2 ) =
   { 𝑟(𝑋) ← 𝑠(𝑋) ∧ 𝑡(𝑋) } SQL_to_DL (𝑠, SQL1 ) SQL_to_DL (𝑡, SQL2 )
                          ⋃︀                  ⋃︀

% WITH statement
SQL_to_DL (𝑟, WITH 𝑟1 AS SQL1 , ..., 𝑟𝑛 AS SQL𝑛 SQL ) =
  { 𝑟(𝑋) ← ({SQL_to_DL (𝑟𝑖 , SQL 𝑖 ) | 1 ≤ 𝑖 ≤ 𝑛}) ⇒ 𝑠(𝑋) } SQL_to_DL (𝑠, SQL )
            ⋀︀                                             ⋃︀


where        (𝐵𝑎𝑔) denotes 𝐵1 ∧ · · · ∧ 𝐵𝑚 (𝐵𝑖 ∈ 𝐵𝑎𝑔).
        ⋀︀


  Next three definitions specify the functions SQLREL_to_DL , SQLEXPS_to_DL and SQLCOND_to_DL
which are used by SQL_to_DL .

Definition 3.2. The function SQLREL_to_DL takes an SQL relation (either a relation name, or a join of
relations, or an SQL statement) as input and returns both a Datalog goal and program as defined by
cases as follows:
% Extensional/Intensional relation name
SQLREL_to_DL (𝑅𝑒𝑙𝑁 𝑎𝑚𝑒) = (𝑅𝑒𝑙𝑁 𝑎𝑚𝑒(𝑋), {})
where 𝑋 are the 𝑛 variables corresponding to the 𝑛-ary relation 𝑅𝑒𝑙𝑁 𝑎𝑚𝑒.
% Join of relations
SQLREL_to_DL ((𝑅𝑒𝑙1 , . . . , 𝑅𝑒𝑙𝑛 )) = (𝐷𝐿𝑅𝑒𝑙(𝑋1 )∧· · ·∧𝐷𝐿𝑅𝑒𝑙(𝑋𝑛 ), 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠1 ∪· · ·∪𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠𝑛 )
where SQLREL_to_DL (𝑅𝑒𝑙𝑖 ) = (𝐷𝐿𝑅𝑒𝑙(𝑋𝑖 ), 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠𝑖 ).
% SQL statement
SQLREL_to_DL (𝑆𝑄𝐿) = (𝑅𝑒𝑙𝑁 𝑎𝑚𝑒(𝑋), SQL_to_DL (𝑅𝑒𝑙𝑁 𝑎𝑚𝑒, 𝑆𝑄𝐿))
where 𝑋 are the 𝑛 variables corresponding to the 𝑛-ary statement 𝑆𝑄𝐿, and 𝑅𝑒𝑙𝑁 𝑎𝑚𝑒 is a fresh
relation name.


                                                    59
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                        54–68


Definition 3.3. The function SQLEXPS_to_DL takes a sequence of SQL expressions as input and returns
a Datalog conjunctive goal for expressions, another for aggregates, and a program, as defined by cases
as follows:
% Sequence of expressions                 ⋀︀                      ⋀︀
SQLEXPS_to_DL ((𝐸𝑥𝑝1 , . . . , 𝐸𝑥𝑝𝑛 )) = ( {𝐷𝐿𝐸𝑥𝑝𝑖 | 1 ≤ 𝑖 ≤ 𝑛}, {𝐷𝐿𝐴𝑔𝑔𝑖 | 1 ≤ 𝑖 ≤ 𝑛},
                                          𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠1 ∪ · · · ∪ 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠𝑛 ),
where SQLEXPS_to_DL (𝐸𝑥𝑝𝑖 ) = (𝐷𝐿𝐸𝑥𝑝𝑖 , 𝐷𝐿𝐴𝑔𝑔𝑖 , 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠𝑖 ).
% Expression
SQLEXPS_to_DL (𝐸𝑥𝑝) = ((𝑋 = 𝐷𝐿𝐸𝑥𝑝) ∧ 𝐷𝐿𝑄𝑠, 𝐷𝐿𝐴𝑔𝑔, 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠), where 𝑋 is the variable
corresponding to the expression 𝐸𝑥𝑝 at position 𝑝 in the list of SQL expressions (𝐸𝑥𝑝1 , . . . , 𝐸𝑥𝑝𝑛 ),
and 𝐷𝐿𝐸𝑥𝑝 is the result of: 1) replacing each attribute 𝑎𝑖 of a relation 𝑟 in 𝐸𝑥𝑝 by the respective
variable 𝑋𝑖 of the predicate 𝑟; 2) replacing each aggregate function 𝑓 over an attribute 𝑏𝑗 of the relation
𝑠 by a fresh variable 𝑌𝑗 . For each 𝑓 out of 𝑛, 𝐷𝐿𝐴𝑔𝑔 includes a conjunctive goal 𝑌𝑗 = 𝑓 (𝑍𝑗 ), where
𝑍𝑗 is the variable of 𝑠(𝑍) which corresponds to the attribute 𝑏𝑗 ; and 3) replacing each query 𝑄 with
alias 𝑎 by a fresh variable 𝑈 such that SQL_to_DL (𝑎, 𝑄) = (𝑄𝑅𝑢𝑙𝑒𝑠), 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠 contains 𝑄𝑅𝑢𝑙𝑒𝑠,
and 𝐷𝐿𝑄𝑠 includes a conjunctive goal 𝑎(𝑈 ).

Definition 3.4. The function SQLCOND_to_DL takes an SQL condition as input and returns a Datalog
conjunctive goal and a program, as defined by cases as follows:
% Basic condition
SQLCOND_to_DL (𝐸𝑥𝑝1 𝑐𝑜𝑝 𝐸𝑥𝑝2 ) = ((𝑋1 𝑐𝑜𝑝′ 𝑋2 ) {(𝑋𝑖 = 𝐷𝐿𝐸𝑥𝑝𝑖 ) ∧ 𝐷𝐿𝑄𝑠𝑖 | 1 ≤ 𝑖 ≤
                                                     ⋀︀
2}, 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠1 ∪ 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠2 ),
where SQLEXPS_to_DL (𝐸𝑥𝑝𝑖 ) = ((𝑋𝑖 = 𝐷𝐿𝐸𝑥𝑝𝑖 ) ∧ 𝐷𝐿𝑄𝑠𝑖 , 𝑡𝑟𝑢𝑒, 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠𝑖 ), 1 ≤ 𝑖 ≤ 2, and 𝑐𝑜𝑝′
is the Datalog corresponding comparison operator to 𝑐𝑜𝑝.
% Conjunction
SQLCOND_to_DL (𝐶1 AND 𝐶2 ) = (𝐶1′ ∧ 𝐶2′ , 𝐶𝑅𝑢𝑙𝑒𝑠1 ∪ 𝐶𝑅𝑢𝑙𝑒𝑠2 )
where 𝐶𝑖 (1 ≤ 𝑖 ≤ 2) are conditions and SQLCOND_to_DL (𝐶𝑖 )=(𝐶𝑖′ , 𝐶𝑅𝑢𝑙𝑒𝑠𝑖 ).
% Disjunction
SQLCOND_to_DL (𝐶1 OR 𝐶2 ) = (𝛿(𝑑), {𝑑 ← 𝐶1′ , 𝑑 ← 𝐶2′ } ∪ 𝐶𝑅𝑢𝑙𝑒𝑠1 ∪ 𝐶𝑅𝑢𝑙𝑒𝑠2 )
where 𝐶𝑖 (1 ≤ 𝑖 ≤ 2) are conditions, SQLCOND_to_DL (𝐶𝑖 )=(𝐶𝑖′ , 𝐶𝑅𝑢𝑙𝑒𝑠𝑖 ), and 𝑑 is a fresh atom.
% Negated condition
SQLCOND_to_DL (NOT 𝐶 ) = (𝐶 ′ , 𝐶𝑅𝑢𝑙𝑒𝑠)
where 𝐶 ′ =SQLCOND_to_DL (𝐶)=(𝐶 ′ , 𝐶𝑅𝑢𝑙𝑒𝑠), and 𝐶 represents the logical complement of 𝐶 (𝑋 < 𝑌 =
𝑋 ≥ 𝑌 , 𝐶1 ∧ 𝐶2 = 𝐶1 ∨ 𝐶2 , and so on).

  Note that disjunction requires duplicate elimination because, otherwise, there can be more tuples in
the result because a disjunction in a rule is rewritten as two rules, as the next example illustrates:
Example 2. Consider the SQL query SELECT SQRT(2) FROM dual WHERE TRUE OR TRUE defining a
relation 𝑟. The relation dual, as originally defined by Oracle, contains a single row and a single column,
and it was included as a predefined relation to allow for calculations such as this example. DES includes
this as a propositional relation instead (there is no need for an argument because it is not intended to be
used). Translating this query using Definition 3.1-Basic SELECT statement, where 𝑅𝑒𝑙 = dual, 𝐸𝑥𝑝𝑠 =
SQRT(2) and 𝐶𝑜𝑛𝑑 = TRUE OR TRUE results in:
SQL_to_DL (𝑟, SELECT SQRT(2) FROM dual WHERE TRUE OR TRUE) =
   { 𝑟(𝑋) ← 𝐷𝐿𝑅𝑒𝑙 ∧ 𝐷𝐿𝐸𝑥𝑝𝑠(𝑋) ∧ 𝐷𝐿𝐶𝑜𝑛𝑑 } 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠 𝐸𝑥𝑝𝑅𝑢𝑙𝑒𝑠 𝐶𝑜𝑛𝑑𝑅𝑢𝑙𝑒𝑠 =
                                                         ⋃︀           ⋃︀             ⋃︀
   { 𝑟(𝑋) ← 𝑑𝑢𝑎𝑙 ∧ 𝑋 = 𝑠𝑞𝑟𝑡(2) ∧ 𝛿(𝑑), 𝑑 ← 𝑡𝑟𝑢𝑒, 𝑑 ← 𝑡𝑟𝑢𝑒 }
(rules with respective identifiers 𝑟1 , 𝑟2 and 𝑟3 ), because following Definition 3.2-Extensional/Intensional
relation name: SQLREL_to_DL (dual) = (𝐷𝐿𝑅𝑒𝑙, 𝑅𝑒𝑙𝑅𝑢𝑙𝑒𝑠) = (𝑑𝑢𝑎𝑙, {}), following Defini-
tion 3.3-Expression: SQLEXPS_to_DL (SQRT(2)) = (𝐷𝐿𝐸𝑥𝑝, 𝐷𝐿𝐴𝑔𝑔, 𝐸𝑥𝑝𝑠𝑅𝑢𝑙𝑒𝑠) = ((𝑋 =


                                                     60
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                                        54–68


𝑠𝑞𝑟𝑡(2)), 𝑡𝑟𝑢𝑒, {}), and following Definition 3.4-Disjunction: SQLCOND_to_DL (TRUE OR TRUE) =
(𝐷𝐿𝐶𝑜𝑛𝑑, 𝐶𝑜𝑛𝑑𝑅𝑢𝑙𝑒𝑠) = (𝛿(𝑑), {𝑑 ← 𝑡𝑟𝑢𝑒, 𝑑 ← 𝑡𝑟𝑢𝑒}). Thus, 𝑠𝑜𝑙𝑣𝑒(𝑟(𝑋), ℰ) = {Δ ⊢ 𝑟1 · 𝛼 :
𝑟(1.4142)} (where 𝛼 is the fresh identifier introduced by Definition 2.2-Duplicates). Without duplicate
elimination (goal 𝑑 instead of 𝛿(𝑑) in rule 𝑟1 ): 𝑠𝑜𝑙𝑣𝑒(𝑟(𝑋), ℰ) = {Δ ⊢ 𝑟1 · 𝑟2 : 𝑟(1.4142), Δ ⊢ 𝑟1 · 𝑟3 :
𝑟(1.4142)}, which is not the expected outcome for the SQL query.
   The actual system applies other techniques to simplify the translated program, such as partial
evaluation and folding/unfolding [17], which are not presented here. In this case, the resulting translation
is simply {𝑟(1.4142)}, as it can be checked in the system by displaying compilations with the command
/show_compilations on. Observe also that the system emits the warning for this example: [Sem]
Tautological condition: (true OR true), which is one of the results of its semantic analysis [6],
indicating a superfluous condition.                                                                        □
      Next, the translation of a recursive WITH statement is illustrated.
Example 3. Consider the SQL query in Subsection 1.1 defining natural numbers. Translating this
query follows Definition 3.1-WITH statement, where 𝑟1 = nat, SQL 1 = SELECT 0 UNION ALL SELECT
n+1 FROM nat, and SQL = SELECT n FROM nat:
SQL_to_DL (𝑟, WITH RECURSIVE nat(n) AS (SELECT 0 UNION ALL SELECT n+1 FROM nat) SELECT n
FROM nat ) = { 𝑟(𝑋) ← ({SQL_to_DL (𝑟𝑖 , SQL 𝑖 ) | 1 ≤ 𝑖 ≤ 𝑛}) ⇒ 𝑠(𝑋) }           SQL_to_DL (𝑠, SQL ) =
                          ⋀︀                                                  ⋃︀
{(𝑟(𝑋) ← ((𝑛𝑎𝑡(𝑋1 ) ← 𝑑𝑢𝑎𝑙 ∧ 𝑋1 = 0 ∧ 𝑡𝑟𝑢𝑒) ∧ (𝑛𝑎𝑡(𝑋2 ) ← 𝑛𝑎𝑡(𝑌 ) ∧ 𝑋2 = 𝑌 + 1 ∧ 𝑡𝑟𝑢𝑒) ⇒
𝑠(𝑋))), (𝑠(𝑋3 ) ← 𝑛𝑎𝑡(𝑍) ∧ 𝑋3 = 𝑍 ∧ 𝑡𝑟𝑢𝑒)} because, by Definition 3.1-Union: SQL_to_DL (𝑛𝑎𝑡,
SELECT 0 UNION ALL SELECT n+1 FROM nat) = SQL_to_DL (𝑛𝑎𝑡, SELECT 0) ∪ SQL_to_DL (𝑛𝑎𝑡, SELECT
n+1 FROM nat) and, by Definition 3.1-Basic SQL statement, where SELECT 0 is an abbreviation for
SELECT 0 FROM dual WHERE TRUE, the following is get by performing similar steps to Example 2:
SQL_to_DL (𝑛𝑎𝑡, SELECT 0) = { 𝑛𝑎𝑡(𝑋1 ) ← 𝑑𝑢𝑎𝑙 ∧ 𝑋1 = 0 ∧ 𝑡𝑟𝑢𝑒 }, and applying again the same
definition to: SQL_to_DL (𝑛𝑎𝑡, SELECT n+1 FROM nat) = { 𝑛𝑎𝑡(𝑋2 ) ← 𝑛𝑎𝑡(𝑌 ) ∧ 𝑋2 = 𝑌 + 1 ∧ 𝑡𝑟𝑢𝑒 },
and: SQL_to_DL (𝑠, SELECT n FROM nat) = { 𝑠(𝑋3 ) ← 𝑛𝑎𝑡(𝑍) ∧ 𝑋3 = 𝑍 ∧ 𝑡𝑟𝑢𝑒 }.
  Further simplification and unfolding steps done in the system results in a single rule:
{(𝑟(𝑋) ← (𝑛𝑎𝑡(0)) ∧ (𝑛𝑎𝑡(𝑌 ) ← 𝑛𝑎𝑡(𝑍) ∧ 𝑌 = 𝑍 + 1) ⇒ 𝑛𝑎𝑡(𝑋))}.                                      □
   [9] provides a theorem for the semantic equivalence of an SQL relation and its counterpart Datalog
translation for a subset of the language presented here. An SQL relation is translated into an extended
relational algebra relation (ERA) definition (with SQL_to_ERA ), for which an interpretation for a com-
puted answer in terms of the ERA operators is provided. The Datalog program is interpreted with the
unified stratified semantics, which provides the set of axioms ℰ for the original database Δ augmented
with the translation of the query 𝑄 to this Datalog program. So, the proof inductively proceeds by
strata, and for each stratum by cases, showing the semantic equivalence of the translation from the
SQL query 𝑄 into an ERA expression for a relation 𝑟 (with SQL_to_ERA ), and the translation of 𝑄 into
a Datalog program (with SQL_to_DL ).


4. SQL Puzzles
This section provides several SQL puzzles9 provided to students and solved in DES, mainly with its
on-line interface DESweb. We use the concrete SQL syntax that DES supports, allowing for omitting
the verbose RECURSIVE modifier which, though promoted and required by the standard for recursive
queries, it seems to be an unneeded writing burden (a simple program analysis can classify recursive
predicates and apply both specific compilations and optimizations). Also, UNION can be used instead of
UNION ALL if duplicates are not required. Even when it might seem that discarding duplicates results
in a solving overhead, it turns out to be the opposite for recursive queries [18]. For each puzzle, its
problem statement is given, together with an equivalent Datalog formulation, and outcome. A selection
of other puzzles can be found in Appendix A.
9
    At the time of posing these puzzles and up to the best of our knowledge, most of these puzzles were novel in SQL (we only
    found a statement for primes), and solutions to them are reported here for the first time, both in SQL and Datalog.


                                                              61
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                                         54–68


  Here, we use the concrete syntax of DES, where :- stands for the neck of a rule, the comma (,)
for goal conjunction, => for embedded implication, /\ for rule assumptions, distinct(𝐴) for 𝛿(𝐴),
top(𝑁 ,𝐴) for 𝜏 (𝑁, 𝐴), and group_by(𝐴,𝑋𝑠,𝜑) for 𝒢(𝐴, 𝑋𝑠, 𝜑). Relation attributes are not explicitly
qualified with the relation name unless necessary.

4.1. Euler Number
Problem Statement: Compute the Euler number 𝑒 as a Taylor series:
                                               1  1  1  1      ∑︁ 1
                                         𝑒=      + + + + ··· =
                                               0! 1! 2! 3!        𝑖!
                                                                                𝑖≥0

As stop condition implicitly use the system precision for numbers.
A Solution: A straightforward solution is to define a CTE for the factorials with a schema of two
arguments: the index 𝑛 (n) and the value of 𝑛! (factorial). Its base case is (0, 1), and the inductive
case just computes (n + 1, n+1*factorial), given an already-computed tuple (n, factorial). Then,
define each term in the series as another CTE, say taylor, with a single column (euler) with each term
in the series, computed as 1/factorial from the CTE factorials:
WITH
  factorials(n, factorial) AS
    (SELECT 0, 1 UNION ALL SELECT n+1, (n+1)*factorial FROM factorials)
  WITH
    taylor(euler) AS
      (SELECT 1/factorial FROM factorials)
  SELECT SUM(euler) FROM (SELECT TOP 18 euler FROM taylor);

   Note a couple of things: First, there are two nested WITH statements (factorials is only known in
the context of the subsequent WITH statement. Alternatively, two consecutive CTE’s are also possible.
Second, the outcome SQL needs a top-𝑁 clause to limit recursion. Up to now, main RDBMS’s as
PostgreSQL, DB2, SQL Server, and Oracle do not admit nested WITH statements.
Datalog Formulation: Next, it is shown the automatically-generated core Datalog10 query and program
corresponding to the SQL statement. The program is asserted before solving the query and retracted
afterwards.
factorials(0,1) /\ (factorials(B,C) :- factorials(D,E), B=D+1, C=(D+1)*E)
=> answer_1_6(A).
answer_1_6(A) :- (taylor(B) :- factorials(_C,D), B=1/D) => answer_1_8(A).
answer_1_8(A) :- group_by(answer_1_10(B),[],A=sum(B)).
answer_1_10(A) :- top(18,taylor(B)), B=A.

  This translation folds some relations for the deductive engine to be able to solve the query. Instead, a
user would alternative write a query like this:
factorials(0,1) /\ ( factorials(D,E) :- factorials(F,G), D=F+1,                                E=(F+1)*G )
=> (taylor(B) :- factorials(_C,D), B=1/D)
=> group_by(top(18,taylor(B)),[],A=sum(B)).

An alternative solution is to define a CTE with an argument for the index 𝑛 of the 𝑛-th term in
the series, the value of the term (𝑡𝑒𝑟𝑚), and the current value of the running summation (𝑣𝑎𝑙). Thus,
the base case is 𝑛 = 0, 𝑡𝑒𝑟𝑚 = 1, and 𝑣𝑎𝑙 = 1, and the inductive case computes the next term for
presolved cases delivering 𝑛 + 1, 𝑡𝑒𝑟𝑚/(𝑛 + 1) (which equates to divide the 𝑛-th term by 𝑛 + 1), and
accumulating in 𝑣𝑎𝑙 the 𝑛 + 1-th approximation. The stop condition should test if the last term is
strictly greater than 0, since otherwise this means that the system numeric precision is met.
10
     Core Datalog is the language directly processed by the deductive engine. (User) Datalog is the language that the user can
     use at the top-level, and that enables high-level formulations for expressions and some metapredicates.


                                                               62
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                          54–68


WITH
  re(n,term,val) AS
    (SELECT 0, 1.0, 1.0 UNION
     SELECT n+1, term/(n+1), val+term/(n+1) FROM re WHERE term/(n+1) > 0)
  SELECT MAX(val) AS euler FROM re;

Datalog Formulation:
re(0,1,1) /\ ( re(B,C,D) :- re(E,F,G), H=F/(E+1), H>0, B=E+1, C=F/(E+1), D=G+F/(E+1) )
=> group_by(re(_B,_C,D), [], A=max(D)).

  By submitting this user Datalog query at the top-level, the system compiles it to an almost equal core
Datalog query and program as shown above for the SQL case. Such compilations can be inspected with
the command /show_compilations on.
Outcome: The query returns in the SICStus implementation:
answer(euler:float) ->
{ answer(2.7182818284590455) }

   Obviously, the precision of the approximation to the Euler number depends on the underlying system
arithmetics. Here, the last digit should read 2 instead of 5. This outcome is available by submitting the
command display_answer on.

4.2. All Time Greatest Hits
Problem Statement: We are given the table hits(theme string, copies int),11 holding song titles
and number of sold copies for each title. Write a recursive SQL query to assign the rank of each song in
terms of its sold copies (number 1 is for the most sold song, number 2 for the next and so on). If there
are two or more songs with the same number of copies, they will receive the same rank number.
A Solution: An idea to solve this puzzle is to build pairs with an integer representing the rank and the
number of copies for hits such that there no other hits with a higher number of copies. Then, return
the maximum rank for each group of hits with the same number of copies, adding to the output the
song titles (known from the table hits and their number of copies).
   The base case in the recursive part is rank 1 for all hits. The inductive case rec is the next rank in
rec such that the copies in hits is strictly less than the copies in rec, i.e., once all copies receive a rank
1, the rank 2 is received to all copies but the one with the maximum number of copies, and so on. Its
SQL formulation is as follows:
WITH rec(ranking, copies) AS
  (SELECT 1, copies FROM hits
   UNION
   SELECT ranking+1, hits.copies
   FROM hits, rec
   WHERE hits.copies < rec.copies)
SELECT ranking, theme, hits.copies
FROM hits, (SELECT MAX(ranking) AS ranking, copies FROM rec GROUP BY copies) m
WHERE m.copies = hits.copies;

Datalog Formulation: Observe that a Datalog equivalent query is neater than its SQL counterpart:
( rec(1,D) :- hits(_E,D) ) /\ ( rec(F,G) :- hits(_H,G), rec(I,J), G<J, F=I+1 )
=> hits(B,C), group_by(rec(D,C), [C], A=max(D)).

Outcome:
answer(h1.theme:varchar(50),h1.copies:int,pos:int) ->
{ answer(1,’White Christmas’,50),       answer(2,’In the Summertime’,31),
  answer(3,’Silent Night’,30),          answer(4,’My Heart will Go On’,25),
  answer(4,’Rock Around the Clock’,25), answer(5,’I Will Always Love You’,20),
  answer(5,’It\’s Now or Never’,20),    answer(5,’We Are the World’,20),
  answer(6,’If I Didn\’t Care’,19) }
Info: 9 tuples computed.
11
     https://www.countryliving.com/life/news/a45720/white-christmas-song-history.


                                                            63
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                      54–68


4.3. Prime Numbers
Problem Statement: Compute the first 𝑛 prime numbers, where 𝑛 is a natural bound.
A Solution: The sieve of Eratosthenes can be expressed in SQL by generating numbers and getting rid
of those which can be divided by a lesser number. So, a natural number generator can be defined as
a CTE, and then the factors for each number as another CTE. For a number to be a factor of another
number it suffices to check that the division remainder is 0. The CTE factor below delivers pairs of
𝑥,𝑓 , where 𝑥 is a number and 𝑓 is one of its factors. The outcome selects each 𝑛 for which there is only
one factor.
/set bound 100
WITH
  number(n) AS
    (SELECT 0 UNION SELECT n+1 FROM number WHERE n < $bound$),
  factor(x, f) AS
    (SELECT x.n, f.n FROM number x, number f
     WHERE x.n > f.n AND f.n > 0 AND x.n mod f.n = 0)
SELECT x FROM (SELECT x, COUNT(*) c FROM factor GROUP BY x) WHERE c=1;


  Note that DES includes user variables which can be used as parameters, such as bound in this example,
which are set with the command /set Variable Value.
Datalog Formulation: Again, the translation seems to be more compact than the source SQL statement.

     number(0) /\ ( number(B) :- number(C), C<100, B=C+1 )
/\ ( factor(D,E) :- number(D), number(E), D>E, E>0, 0=D mod E )
=> group_by(factor(A,_F), [A], (_G=count,_G=1)).

Outcome:

answer(factor.x:int) ->
{ answer(2), answer(3), answer(5), answer(7), ..., answer(97) }
Info: 25 tuples computed.

4.4. Graph of an Explicit Function
Problem Statement: Plot the sine function as depicted in the figure below.
A Solution: Here we define two CTE’s: One working on the real plane (plot) which relates 𝑥 and 𝑦
coordinates via a function (parametrically specified with the user variable f), with square dimensions.
The second CTE works on the “pixel" plane (bars) by building a string of spaces with as many spaces
as pixels in the ordinates (horizontally represented instead of vertically), which will be truncated to the
required length with respect to the value of the function at each 𝑥 coordinate. Observe that strings in
the display are quoted and ended with the characters “’),”, so that it visually resembles the plot.
   Both plot and bars are built in a similar way natural numbers are defined: a base case with the first
point in the plane, and with a single space, respectively. The recursive case of plot adds pairs (x+dx,
fx) for 𝑥 coordinates up to its bound, whilst for bars, a space is added up to reaching its top length.
   Once both relations are defined, the outcome query truncates (with the function substr: substring)
the longest bar up to the value of the corresponding 𝑦 coordinate for each 𝑥 coordinate found in plot.


-- Dimension of the plot (d+1) rows and (d+1) columns:
/set d 50
-- Start x:
/set x0 0.0
-- End x:
/set x1 4*pi
-- Delta x:
/set dx ($x1$-$x0$)/$d$
-- Function:
/set f sin(xi)
WITH


                                                    64
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                        54–68


  plot(x, fx) AS
   (SELECT $x0$ xi, $f$
    UNION
    SELECT x+$dx$ xi, $f$ FROM plot WHERE x <= $x1$),
  bars(bar) AS
   (SELECT ’ ’
    UNION
    SELECT bar||’ ’ FROM bars WHERE length(bar)<$d$)
SELECT substr((select max(bar) bar from bars), 1,
     round($d$*(fx-(SELECT min(fx) FROM plot))*(SELECT 1/(max(fx)-min(fx)) FROM plot)))
FROM plot;

Datalog Formulation: The following equivalent Datalog query includes the aggregate predicates
max/3 and min/3, which are syntax simplifications for a group_by predicate (e.g., max(R,X,M) ≡
group_by(R,[],M=max(X))). Functions concat(A,B) (concatenation of A and B and written as A||B
in standard SQL), length(A) (length of A), and substr(A,P,N) (substring of N characters of A taken
from position P) are used.

   ( plot($x0$,F) :- F=sin($x0$) )
/\ ( plot(B,C) :- plot(D,_E), D=<$x1$, B=D+$dx$, C=sin(D+$dx$) )
/\ bars(’ ’) /\ ( bars(H) :- bars(G), length(G)<$d$, H=concat(G,’ ’) )
=> plot(_A,_B),
   max(bars(_K),_K,_H),
   min(plot(_L,_M),_M,_C),
   group_by(plot(_N,_O),[],(_F=1/(max(_O)-min(_O)))),
   A=substr(_H,1,round($d$*(_B-_C)*_F)).

4.5. Base Conversion
Problem Statement: Given the table conversion(c:string, fb:int, tb:int), with natural numbers
to be converted (c) from a base (fb) to another base (tb), write an SQL query returning a row for each
element in the table conversion, with schema (number_fb, fb, number_tb, tb), such that number_fb is
the number in base fb equivalent to the number number_tb in base tb.
A Solution: This problem can be subdivided as usually in two subproblems: Convert to base 10,
and then to the target base. We can assume a table digits(n INT, d STRING) containing tuples as {
(0,’0’), (1,’1’), . . . , (10,’A’), . . . }.
   For converting to base 10, the approach is to recursively generate tuples with the inspected position
𝑖 (index of the digit from right to left starting with 0) with a value corresponding to the number up
to its 𝑖-th position. So, we can think of a schema for this first CTE as to_base_10(c, fb, tb, s, i,
n), where original parameters in conversion (c, fb and tb) are kept with their same names, s is the
number truncated up to the 𝑖 − 1-th index, i is the index 𝑖, and n is the value (base 10) for the number c
up to the 𝑖-th position. The original number is required in this schema because there will be in general
several numbers to be converted, so that this first column will discriminate the number for which the
conversion is made. A similar requirement applies to the original base fb and the target base fb.
   For the sake of keeping neat the formulation, the base case for to_base_10 starts at index −1, and
the original number is appended (with the || operator) with a void −1-th position, delivering a value 0
for it. Otherwise, we could start at index 0, but calculations in the recursive case should be also included
in the base case. This recursive case builds tuples from to_base_10(c, fb, tb, s, i, n) itself as
follows: its first 3 columns are the original ones, as found in to_base_10. The number s (recall: a string)
is right trimmed in 1 character; the index i is incremented, and its quantity (weighted with its position:
fb^(i+1)) is added to n, where the value of the digit is taken from the digits table.
   Converting a number in base 10 to another base is a somewhat similar problem, but with schema
to_base_b(c, fb, tb, n, st): the three original columns (c, fb and tb), the number in base 10 (n)
and the most significant digits already processed (st). Here, the base case comes from the results for
the relation to_base_10, selecting the value n for the final calculation (the length of the truncated
number is 1: the last, most-significant character in the string). In this case, the index is not used because
another approach is taken: dividing the current number by the target base and delivering the digit


                                                     65
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                    54–68


corresponding to the remainder of the division as the next digit, from the less to the most significant
digits. Then, the recursive case builds the outcome by appending the calculated digit to the left of the
former outcome.

WITH
 to_base_10(c, fb, tb, s, i, n) AS (
   SELECT c, fb, tb, c || ’ ’, -1, 0
   FROM conversion
   UNION
   SELECT c, fb, tb, SUBSTR(s,1,LENGTH(s)-1), i+1,
          integer(n+(SELECT n
                      FROM digits
                      WHERE d=SUBSTR(to_base_10.s,LENGTH(to_base_10.s)-1,1) )*fb^(i+1))
   FROM to_base_10 WHERE LENGTH(s)>1
   ),
 to_base_b(c, fb, tb, n, st) AS (
   SELECT c, fb, tb, n, ’’
   FROM to_base_10
   WHERE LENGTH(s)=1
   UNION
   SELECT c, fb, tb, n DIV tb,
          (SELECT d FROM digits WHERE n=to_base_b.n MOD to_base_b.tb) || st
   FROM to_base_b
   WHERE n MOD tb > 0
   )
SELECT c AS number_fb, fb AS fb, st AS number_tb, tb AS tb FROM to_base_b WHERE n=0;

Datalog Formulation: The Datalog translation is more elaborated than in previous examples (but still,
neater than its counterpart SQL formulation).

   ( to_base_10(E,F,G,H,-1,0) :- conversion(E,F,G), H=concat(E,’ ’) )
/\ ( to_base_10(I,J,K,L,M,N) :-
       to_base_10(I,J,K,O,P,Q), R=length(O), R>1, T=R-1, L=substr(O,1,T), M=P+1,
       digits(U,V), X=length(O)-1, V=substr(O,X,1), N=integer(Q+U*J**(P+1)))
/\ ( to_base_b(Y,Z,AA,AB,’’) :- to_base_10(Y,Z,AA,AC,_AD,AB), length(AC)=1 )
/\ ( to_base_b(AE,AF,AG,AH,AI) :-
       to_base_b(AE,AF,AG,AJ,AK), AJ mod AG>0, AH=AJ div AG,
       digits(AM,AN), AM=AJ mod AG, AI=concat(AN,AK) )
=> to_base_b(A,B,D,0,C).

4.6. Universal Turing Machine
Problem Statement: Implement a universal Turing machine as a state transition system, following
https://aturingmachine.com/. Possible transitions are specified in the input table transitions(state,
symbol, newstate, newsymbol, move), and the initial state is given in initial_state(state, tape,
pos). Here, state is a numeric value identifying the state, tape is a string containing a sequence of
symbols 0’s and 1’s, pos is the position of the scanned symbol (the leftmost character of the tape is
always the position 1 even when the tape can can grow indefinitely to the right or to the left), move can
be either be ’L’ (to the left) or ’R’ (to the right).
Input Data: The two input tables are filled as shown next to implement a binary counting machine. It
counts from 0 upwards without stopping. Inspecting its outcome can be done by adding a stop condition
on the number of the transitions (steps).
INSERT INTO initial_state VALUES(0,’ ’,1);
-- (0,B) -> (1,B) Left: When a blank (B) is found we change to state 1:
INSERT INTO transitions VALUES(0,’ ’,1,’ ’,’L’);
-- (0,0) -> (0,0) Right: This state moves the tape to the right most digit:
INSERT INTO transitions VALUES(0,’0’,0,’0’,’R’);
-- The following transition is not needed:
-- (0,1) -> (0,1) Right: This state moves the tape to the right most digit
-- (1,B) -> (0,1) Right: If we change a Blank to a 1 we change back to state 0:
INSERT INTO transitions VALUES(1,’ ’,0,’1’,’R’);
-- (1,0) -> (0,1) Right: If we change a 0 to a 1 we change back to state 0:


                                                   66
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                         54–68


INSERT INTO transitions VALUES(1,’0’,0,’1’,’R’);
-- (1,1) -> (1,0) Left: If we change a 1 to a 0 we keep looking to the left:
INSERT INTO transitions VALUES(1,’1’,1,’0’,’L’);

A Solution: Implementing a universal Turing machine can be done with a recursive query where the
base case contains the initial state, and the recursive case selects the next state in terms of the symbol in
the current tape position. This must consider that the tape can grow both the left and to the right. For
the last case, the string tape can be concatenated with a new symbol and the index at the first position
does not change. For the former case, a new blank symbol is prepended and the index 1 is for this
position.
WITH m(state,tape,pos,step) AS (
  SELECT state,tape,pos,1
  FROM initial_state
  UNION ALL
  SELECT
      -- State
      t.newstate,
      -- Tape
      CASE
        WHEN t.move=’L’ AND pos>1 THEN
          SUBSTR(tape,1,pos-1)||t.newsymbol||SUBSTR(tape,pos+1)
        WHEN t.move=’L’ AND pos=1 THEN
          ’ ’||t.newsymbol||SUBSTR(tape,pos+1)
        WHEN t.move=’R’ AND LENGTH(tape)=pos THEN
          SUBSTR(tape,1,pos-1)||t.newsymbol||’ ’
        WHEN t.move=’R’ AND LENGTH(tape)>pos THEN
          SUBSTR(tape,1,pos-1)||t.newsymbol||SUBSTR(tape,pos+1)
      END,
      -- Move
      CASE
        WHEN t.move=’L’ AND pos>1 THEN pos-1
        WHEN t.move=’L’ AND pos=1 THEN 1
        WHEN t.move=’R’           THEN pos+1
      END,
      -- Step
      step+1
  FROM transitions t, m
  WHERE m.state=t.state AND
         SUBSTR(m.tape,m.pos,1)=t.symbol
         -- AND step < $max_step$ -- Use this as a stop condition
    )
SELECT step,state,pos,tape FROM m;

Datalog Formulation:

   (m(E,F,G,1) :-
      initial_state(E,F,G))
/\ (m(H,I,J,K) :-
      transitions(L,M,H,N,O), m(L,P,Q,R), ’$substr’(P,Q,1,M), R<40, S=Q-1,
      ’$substr’(P,1,S,T), ’$concat’(T,N,U), V=Q+1, ’$substr’(P,V,W), ’$concat’(U,W,X),
      ’$concat’(’ ’,N,Y), Z=Q+1, ’$substr’(P,Z,AA), ’$concat’(Y,AA,AB), AC=Q-1,
      ’$substr’(P,1,AC,AD), ’$concat’(AD,N,AE), ’$concat’(AE,’ ’,AF), AG=Q-1,
      ’$substr’(P,1,AG,AH), ’$concat’(AH,N,AI), AJ=Q+1, ’$substr’(P,AJ,AK),
      ’$concat’(AI,AK,AL),
      ’$case’([((O=’L’,Q>1),Q-1),((O=’L’,Q=1),1),(O=’R’,Q+1)],null,J), K=R+1,
      ’$case’([((O=’L’,Q>1),X),((O=’L’,Q=1),AB),((O=’R’,’$length’(P,AM),AM=Q),AF),
               ((O=’R’,’$length’(P,AN),AN>Q),AL)],null,I))
=> m(B,D,C,A).


5. Conclusions and Future Work
SQL puzzles for challenging students’ skills have been presented and solved in the DES system, which
has been selected because it helps students with syntax and semantic warnings about SQL queries.
Since the system is based on intuitionistic logic programming, it allows for neater and simpler SQL


                                                     67
Fernando Sáenz-Pérez CEUR Workshop Proceedings                                                     54–68


formulations to solve those puzzles because SQL queries are translated into a Hypothetical Datalog
program. Both the syntax and the inference system for the Hypothetical Datalog system in [9] have been
extended to support expressions in conditions and other meta-predicates needed for expressing SQL more
complex queries. The implementation can be greatly enhanced in several ways; for example, replacing
its prototype SQL parser with a more performant one, and supporting the semi-naïve differential
optimization [19], which hugely improves performance for recursive problems, and fixing some known
issues. Additionally, more puzzles will be developed for the enjoyment of students (and the teacher).


References
 [1] M. Cook, Universality in elementary cellular automata, Complex Systems 15 (2004).
 [2] A. Levitin, M.-A. Papalaskari, Using puzzles in teaching algorithms, SIGCSE Bull. 34 (2002)
     292–296.
 [3] J. Celko, Joe Celko’s SQL Puzzles and Answers, Second Edition (The Morgan Kaufmann Series in
     Data Management Systems), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.
 [4] S. Wikipedia, Esoteric Programming Languages: Brainfuck, Intercal, Befunge, Esoteric Program-
     ming Language, Kvikkalkul, One Instruction Set Computer, Unlambda, Malbo, University-Press
     Org, 2013. URL: https://books.google.es/books?id=4X_hngEACAAJ.
 [5] F. Sáenz-Pérez, DES: A Deductive Database System, Electronic Notes on Theoretical Computer
     Science 271 (2011) 63–78.
 [6] F. Sáenz-Pérez, Applying constraint logic programming to sql semantic analysis, Theory and
     Practice of Logic Programming 19 (2019) 808–825. doi:10.1017/S1471068419000206.
 [7] R. Caballero, Y. García-Ruiz, F. Sáenz-Pérez, Declarative Debugging of Wrong and Missing Answers
     for SQL Views, in: Eleventh International Symposium on Functional and Logic Programming
     (FLOPS 2012), LNCS 7294, Springer, 2012, pp. 73–87.
 [8] A. J. Bonner, Hypothetical Datalog: Complexity and Expressibility, Theoretical Computer Science
     76 (1990) 3–51.
 [9] F. Sáenz-Pérez, Intuitionistic Logic Programming for SQL, in: M. V. Hermenegildo, P. Lopez-Garcia
     (Eds.), Logic-Based Program Synthesis and Transformation, Springer International Publishing,
     Cham, 2017, pp. 293–308.
[10] F. Arni, K. Ong, S. Tsur, H. Wang, C. Zaniolo, The Deductive Database System LDL++, TPLP 3
     (2003) 61–94.
[11] F. Sáenz-Pérez, Tabling with support for relational features in a deductive database, ECEASST 55
     (2012).
[12] A. J. Bonner, L. T. McCarty, Adding Negation-as-Failure to Intuitionistic Logic Programming,
     in: E. L. Lusk, R. A. Overbeek (Eds.), Proceedings of the North American Conference on Logic
     Programming (NACLP), The MIT Press, 1990, pp. 681–703.
[13] F. Sáenz-Pérez, Restricted predicates for hypothetical datalog, Electronic Proceedings in Theoretical
     Computer Science 200 (2015) 64–79.
[14] F. Sáenz-Pérez, Implementing tabled hypothetical datalog, in: Proceedings of the 25th IEEE
     International Conference on Tools with Artificial Intelligence, ICTAI’13, 2013, pp. 596–601.
[15] J. D. Ullman, Principles of Database and Knowledge-Base Systems, Volume II, Computer Science
     Press, 1988.
[16] ISO/IEC, SQL:2016 ISO/IEC 9075-1:2016 Standard, 2016.
[17] L. Sterling, E. Shapiro, The Art of Prolog (2Nd Ed.): Advanced Programming Techniques, MIT
     Press, Cambridge, MA, USA, 1994.
[18] S. Nieva, F. Sáenz-Pérez, J. Sánchez, HR-SQL: An SQL Database System with Extended Recursion
     and Hypothetical Reasoning. Ongoing Work, in: XV Jornadas sobre Programación y Lenguajes,
     PROLE2015 (SISTEDES), 2015, pp. 1–15.
[19] I. Balbin, K. Ramamohanarao, A generalization of the differential approach to recursive query
     evaluation, The Journal of Logic Programming 4 (1987) 259 – 262.


                                                   68