Differentiating Relational Queries
Paul Peseux, supervised by M. Berar, T. Paquet & V. Nicollet
Litis Normandie & Lokad, Paris, France
paul.peseux@lokad.com

ABSTRACT
This work is about performing automatic differentiation of queries in the context of relational databases, in order to perform optimization through gradient descent within these databases. It describes a form of automatic differentiation for a subset of relational queries.

1 INTRODUCTION
Modern Differentiable Programming applied to Deep Learning concentrates on dense and regular problems such as images [13] [12] and sound [6], or studies ways to project unstructured problems into this framework [11] (e.g. auto-encoders for text data). Its success is partly due to automatic differentiation [21]. In parallel, many business domains have a very well-defined structure, but this structure is relational. For example, supply chain data is organized in relational databases, and experts are used to working with them. A canonical example from this domain: items in the product table come from suppliers in the supplier table and are stored in warehouses in the warehouse table; the problem's structure is completely different from Computer Vision or Natural Language Processing, two hot topics in Machine Learning (ML). As the people who understand supply chain complexity work with relational databases, these seem to be the adequate place to let them build and optimize their own models. One of the main ways to optimize a model is through gradient-based methods; if the model is written with queries, then we need to differentiate those queries in order to optimize the model. Letting experts build white-box models helps them check the sanity of their models, which is very difficult to do with black-box models such as deep neural networks. This is called Interpretable ML in [19] and directly applies to the supply chain, where thousands of orders are placed every day for a single company. Furthermore, [20] has shown the advantages of performing an optimization in the database system itself, limiting data transfer costs, over pulling the data out to an external ML-oriented system.

Many sub-parts of languages have been differentiated: Python [17] [4], C [8], Julia [10], Swift [18], F# [3] . . . A more complete reference can be found at [7]. However, there are only a few attempts at

2 ADSL
[8] [10] [2] [21] [. . . ] propose to differentiate subsets of common programming languages. What these initiatives have in common is the purpose of differentiating a pre-existing language. This is an interesting yet complicated task, as those languages were not crafted for differentiation. This is especially true for relational programming languages.

We introduce ADSL, A Differentiable Sub Language, intended as a lowering target for relational languages. It is a simple language where Automatic Differentiation is a first-class citizen. This idea is similar to [1] [9] [14]. ADSL is closed under differentiation: the adjoint, i.e. the derived program, of an ADSL program is also a differentiable ADSL program. This closure gives immediate access to higher-order derivatives, which are sometimes used [15] [5]. ADSL is a simple SSA language that supports loops and conditionals. Its major specificity is its support for projectors and broadcasts.

According to the definition below, an ADSL program is a list of Statements ⟨S⟩, whose grammar is defined by:

⟨S⟩ ::=
  | ⟨ v ← e ⟩                  Variable assignment
  | ⟨ Cond ( v Ψ P_T P_E Φ ) ⟩  Conditional
  | ⟨ For ( χ P Ξ ) ⟩           Loop
  | ⟨ Return v ⟩                Output of a program

⟨e⟩ ::=
  | ⟨ v ⟩                       Variable
  | ⟨ f ⟩                       Scalar
  | ⟨ b ⟩                       Boolean
  | ⟨ v + w ⟩                   Variable Addition
  | ⟨ Call1 op v ⟩              Function Call
  | ⟨ Call2 op v w ⟩            Function Call (2 parameters)
  | ⟨ Param i ⟩                 Parameter access
  | ⟨ Const i ⟩                 Constant access
  | ⟨ v ⊳ β ⟩                   Broadcast Projector
  | ⟨ v ⊲ α ⟩                   Aggregation Projector
  | ⟨ Pred ⟩                    Predicate

⟨Pred⟩ ::=
  | ⟨ And v w ⟩
  | ⟨ Or v w ⟩
  | ⟨ Not v ⟩
  | ⟨ v
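To make the grammar's most distinctive constructs concrete, here is a minimal sketch in Python of the two projector expressions, ⟨v ⊳ β⟩ (broadcast) and ⟨v ⊲ α⟩ (aggregation), modeled as an algebraic data type with a small evaluator. All class, field, and function names are our own illustration, not the paper's implementation; we assume the usual relational reading in which a projector maps each child-table row to its parent-table row (e.g. each product row to its supplier row).

```python
# Hypothetical sketch of a fragment of the ADSL expression grammar.
# Names are ours; vector values are plain Python lists, one entry per table row.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Var:          # ⟨v⟩ : variable reference
    name: str

@dataclass
class Broadcast:    # ⟨v ⊳ β⟩ : copy each parent-row value onto its child rows
    source: "Expr"
    projector: List[int]   # child row i reads parent row projector[i]

@dataclass
class Aggregate:    # ⟨v ⊲ α⟩ : sum child-row values back into parent rows
    source: "Expr"
    projector: List[int]   # child row i contributes to parent row projector[i]
    parent_size: int

Expr = Union[Var, Broadcast, Aggregate]

def evaluate(expr: Expr, env: dict) -> List[float]:
    """Evaluate an expression against an environment of named vectors."""
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Broadcast):
        parent = evaluate(expr.source, env)
        return [parent[j] for j in expr.projector]
    if isinstance(expr, Aggregate):
        child = evaluate(expr.source, env)
        out = [0.0] * expr.parent_size
        for i, j in enumerate(expr.projector):
            out[j] += child[i]
        return out
    raise TypeError(f"unknown expression: {expr!r}")

# Example: 2 suppliers, 3 products; products 0 and 1 belong to supplier 0,
# product 2 belongs to supplier 1 (projector = [0, 0, 1]).
env = {"margin": [1.0, 2.0], "sales": [10.0, 20.0, 30.0]}
per_product = evaluate(Broadcast(Var("margin"), [0, 0, 1]), env)       # [1.0, 1.0, 2.0]
per_supplier = evaluate(Aggregate(Var("sales"), [0, 0, 1], 2), env)    # [30.0, 30.0]
```

Note that, as linear maps, broadcast and aggregation along the same projector are transposes of each other, which suggests why differentiating one produces the other and why the adjoint of an ADSL program can stay inside ADSL.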