Application of SMT in a Meta-Compiler: A Logic DSL for Specifying Type Systems

Romain Béguet, Raphaël Amiard
AdaCore, 46 Rue d'Amsterdam, 75009 Paris, France

Abstract

Building a compiler for a programming language is a notoriously hard task, especially when starting from scratch. That's why we developed Langkit, a meta-compiler which allows language designers to focus on the high-level aspects of their language. In particular, the specification of type systems is done in a declarative manner by writing equations in a high-level logic DSL. Those equations are then resolved by a custom solver embedded in the generated compiler front-end whenever a code fragment of the target language needs to be analyzed. This framework is successfully used at AdaCore to implement name and type resolution of Ada, powering navigation in IDEs and enabling development of deep static analysis tools. We discuss the implementation of our solver, and how a recent switch to a DPLL(T) solver backend with a custom theory allowed us both to address long-standing combinatorial explosion problems we had with our previous approach, and to gain new insights into how to emit human-readable diagnostics for type errors.

Keywords
Satisfiability Modulo Theory, Meta-Compilation, Logic Programming

1. Introduction

In a typical multi-pass compiler pipeline for a statically typed language, the result of parsing a code fragment of the target language is an abstract syntax tree which does not carry any semantic information: identifiers are mere symbols that are not yet linked to their corresponding definition, expressions are untyped, etc. Thus, one or several semantic analysis phases take place to compute cross-references and validate or infer types.

In some languages (such as C), figuring out the definition a name refers to, or the type of an expression, is not context-sensitive: it merely requires a single name lookup in its lexical environment. In other words, resolving an identifier does not require resolving the name or the type of any other adjacent nodes. In languages such as Java or C++ that support function overloading, the function definition targeted by a function call may depend on the type of one or several of its arguments: resolving this call therefore requires a more involved analysis, often done in cooperation with a type analysis working on several sub-parts of the expression at once. Some languages such as Ada or Swift even allow overloading functions on their return type, which may cause type information to flow forward and backward through sub-expressions of a given statement.

In such complex languages, it is often useful to compute the static semantics of a given program in two stages [1, 2], by first accumulating local constraints specified by each construct appearing in the program according to its syntactic category and lexical context, and then having a dedicated solver try to satisfy a fixed group of constraints at once in order to find a solution – the exact scope of collection and solving depending on the language.
This is the approach taken by compiler implementations such as GHC [3] or Swiftc [4], but also by our meta-compiler framework Langkit [5], which we describe in the next section. This technique offers several advantages: writing, understanding and debugging the type system is easier because typing rules are described explicitly instead of being implied by the execution of an algorithm. Elaborating an optimized implementation of a type system dedicated to a particular language often requires significant effort (many articles have been published over the course of several years about optimization of overload resolution in Ada [6, 7, 8, 9]) and such implementations are hardly reusable. A constraint solver, on the other hand, is generally an independent component that can be reused and optimized without leaking into the implementation of the type system, making the latter easier to maintain and extend. However, using a constraint solver to type programs raises an issue: how to produce good error messages for sets of constraints that have no solution, given that producing good error messages for complex type systems is already a hard problem [10].

We thus present in the following sections our main contributions, which are:

• The description and formalization of Langkit's high-level domain-specific language (DSL) for specifying static semantics of programming languages that support complex features such as subtyping, return-type overloading and preferred interpretations.
• Its implementation, as a lazy encoding of a custom theory of ordered disjunctions on top of the DPLL(T) architecture with minor extensions, and a demonstration that the resulting framework allows generating an efficient semantic analyzer for Ada.
• Extensions to our DSL to allow language implementations to produce concise and relevant error messages for object language expressions that are semantically incorrect – even in presence of overloading – based on explanations produced by our theory.

2. Logic Framework

Our Langkit meta-compiler allows users to specify a language frontend, from syntax to semantics, including the specification of the type system through methods that build sets of logic constraints. The main use-case of Langkit so far has been to implement Libadalang [11], demonstrating that the logic framework we developed is sufficient to implement the type system of Ada, a complex industrial language spanning multiple revisions and featuring generics, ad-hoc polymorphism (including return-type overloading), subtyping, object-oriented features, etc. The entire specification for the Ada language frontend, including description of the syntax tree, the scope graph definition, and the name and type resolution specification, is around 25000 SLOC, which we estimate to be at least 10 times smaller than what is needed by the main Ada compiler GNAT [12] to implement the equivalent functionalities.

As far as we know, Langkit is the only generic language framework that is able to express complex type systems such as those of Swift, Ada or C# (for which typechecking is known to be NP-hard [13]), where types and names flow in both directions in the expression tree, and in some particular cases even require looking deeper down the tree [14].
In particular, we think that one of the features that makes our framework powerful is that there is no distinct stage between name binding and type inference: every bit of semantic information is represented by a logic variable to be resolved. Inter-dependency between name resolution and type inference is represented by type variables being defined as functions of name variables (as in foo_type ← ReturnType(foo_ref)) and vice-versa.

Langkit has also been used to implement more complex type system features in proof-of-concept projects such as Dependz [15], implementing a small dependently typed calculus. However, it remains to be seen how adaptable our approach is to languages with type systems that contain more structural features, like structural subtyping.

2.1. Formalization

In our Langkit meta-compiler, an element of unresolved semantic information – such as a type to be inferred – can be represented by a logic variable. Using Langkit's logic DSL – whose grammar is formalized in Grammar 1 below – users can define constraints on those variables: an Assignment allows assigning constant values to variables, such as foo_type ← Integer. A Propagation assigns to a variable a function of other variables – as in bar_type ← ReturnType(bar_ref) – by evaluating that function at solve-time once all the argument variables have been solved. Such functions are defined by the user in the pure (side-effect free) functional language fragment provided by Langkit. An Aliasing allows unifying two variables together, forcing them to hold the same value. An Assertion can be used to enforce that a predicate over given variables must hold – as in IsSubtype(foo_type, bar_type) – and is evaluated as soon as all input variables are solved. Predicate symbols are also defined by the user as functions returning booleans in Langkit's functional DSL. Finally, it is possible to combine all of these constraints in arbitrarily nested conjunctions and (ordered) disjunctions.

    ℰ ::= 𝒜                    Atom
        | ℰ ∧ ℰ                Conjunction
        | ℰ ×⃗ ℰ               Ordered disjunction

    𝒜 ::= 𝒱 ← 𝒞                Assignment
        | 𝒱 ← ℱ(𝒱, . . . , 𝒱)  Propagation
        | 𝒱 ↔ 𝒱                Aliasing
        | 𝒫(𝒱, . . . , 𝒱)      Assertion
        | ⊤                    True
        | ⊥                    False

    𝒱 ::= 𝑥, 𝑦, . . .          Variables
    𝒞 ::= A, B, . . .          Constants
    ℱ ::= F, G, . . .          Function symbols
    𝒫 ::= P, Q, . . .          Predicate symbols

Grammar 1: Formal syntax of Langkit's logic DSL.
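To make the shape of Grammar 1 concrete, here is a minimal sketch in Python of how such an equation AST could be represented. The class and field names are our own illustration, not Langkit's actual API:

    from dataclasses import dataclass
    from typing import Callable, Tuple

    @dataclass(frozen=True)
    class Var:                # 𝒱: a logic variable
        name: str

    @dataclass(frozen=True)
    class Assign:             # 𝒱 ← 𝒞
        var: Var
        const: object

    @dataclass(frozen=True)
    class Propagate:          # 𝒱 ← ℱ(𝒱, ..., 𝒱)
        var: Var
        func: Callable
        args: Tuple[Var, ...]

    @dataclass(frozen=True)
    class Alias:              # 𝒱 ↔ 𝒱
        left: Var
        right: Var

    @dataclass(frozen=True)
    class Predicate:          # 𝒫(𝒱, ..., 𝒱)
        pred: Callable
        args: Tuple[Var, ...]

    @dataclass(frozen=True)
    class And:                # ℰ ∧ ℰ
        left: object
        right: object

    @dataclass(frozen=True)
    class OrderedOr:          # ℰ ×⃗ ℰ: try `first`, and only then `second`
        first: object
        second: object

The ⊤ and ⊥ atoms can simply be represented by the Python booleans True and False. The subsequent sketches in this paper reuse these definitions.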
Ordered Disjunctions. The ordered disjunction operator (×⃗) resembles the one introduced by Qualitative Choice Logic [16] and LPODs [17, 18]. Intuitively, the meaning of ℰ₁ ×⃗ ℰ₂ is: try ℰ₁ first, but if it doesn't work then try ℰ₂. This behavior allows language designers to encode static preference semantics in their language. For example, in a binary operation X + Y in Ada, if there are multiple possible interpretations of + among which one is the + operator defined on root integers (one of the primitive types in Ada), then this interpretation is preferred [19]. This can also be used, for example, to encode the preferred order for overload resolution in C++, or even the order in which implicit values should be considered when synthesizing an implicit parameter in Scala.

Definition 2.1 (Partially Ordered Split Programs). Formally, we define the semantics of ordered disjunctions by reasoning in terms of partially ordered split programs, where a split program is only a conjunction of atoms. For that, we define the operator ψ(ℰ) reducing an equation ℰ to its partially ordered set of split programs ⟨{P₁, . . . , Pₙ}, ≤⟩, using the rules from Figure 1.

    (Atom)         ψ(𝒜) = ⟨{𝒜}, {𝒜 ≤ 𝒜}⟩

    (Disjunction)  if ψ(ℰ₁) = ⟨S₁, ≤₁⟩ and ψ(ℰ₂) = ⟨S₂, ≤₂⟩, then
                   ψ(ℰ₁ ×⃗ ℰ₂) = ⟨S₁ ∪ S₂, ≤₁ ∪ ≤₂ ∪ {P₁ ≤ P₂ : P₁ ∈ S₁, P₂ ∈ S₂}⟩

    (Conjunction)  if ψ(ℰ₁) = ⟨S₁, ≤₁⟩ and ψ(ℰ₂) = ⟨S₂, ≤₂⟩, then
                   ψ(ℰ₁ ∧ ℰ₂) = ⟨{P₁ ∧ P₂ : P₁ ∈ S₁, P₂ ∈ S₂},
                                 {P₁ ∧ P₂ ≤ Q₁ ∧ Q₂ : P₁ ≤₁ Q₁, P₂ ≤₂ Q₂}⟩

Figure 1: Reduction rules for ψ, mapping a DSL equation to a partially ordered set of split programs.

Intuitively, the resulting partial order allows reasoning about preference on split programs that do not contain ordered disjunctions anymore – as if the ordering information had been extracted out of them. We have for example that:

    ψ((v ← A ×⃗ v ← B) ∧ (w ← C ×⃗ w ← D)) = ⟨{P₁, P₂, P₃, P₄}, ≤⟩

where P₁ ≤ P₂ ≤ P₄ and P₁ ≤ P₃ ≤ P₄, with:

    P₁ = v ← A ∧ w ← C,    P₂ = v ← A ∧ w ← D,
    P₃ = v ← B ∧ w ← C,    P₄ = v ← B ∧ w ← D.

In particular, P₂ and P₃ are not comparable because P₂ contains v ← A which is preferred over v ← B, but P₃ contains w ← C which is preferred over w ← D. However, both P₂ and P₃ are smaller than P₄ (which contains all least preferred alternatives). P₁ on the other hand is smaller than all of them, as it contains all most preferred alternatives.

Definition 2.2 (Evaluation of Split Programs). We now give a method for evaluating equations that do not contain ordered disjunctions. By definition, such an equation is of the form A₁ ∧ · · · ∧ Aₙ (with n ≥ 1). Since conjunctions are commutative and associative, we can actually reason about such equations as an unordered set of atoms {A₁, . . . , Aₙ}.

We therefore define Φ : 𝒫(𝒜) → 𝒫(𝒱 × 𝒞) ∪ {⊥} as reducing a set of atoms to a variable-to-constant mapping μ or a failure term ⊥ using the reduction rules from Figure 2, where predicate calls P(C₁, . . . , Cₙ) ∈ {⊤, ⊥} and function calls F(C₁, . . . , Cₙ) = C may run arbitrary, side-effect free computations.

    (Bot)        Φ(P ∪ {⊥}) = ⊥

    (Top)        if Φ(P) = μ, then Φ(P ∪ {⊤}) = μ

    (Alias)      if Φ({A[v₁ := v₂] : A ∈ P}) = μ, then Φ(P ∪ {v₁ ↔ v₂}) = μ

    (Fail)       if C₁ ≠ C₂, then Φ(P ∪ {v ← C₁} ∪ {v ← C₂}) = ⊥

    (Success)    if n ≥ 0 and v₁ . . . vₙ are pairwise distinct, then
                 Φ({v₁ ← C₁, . . . , vₙ ← Cₙ}) = {(v₁, C₁), . . . , (vₙ, Cₙ)}

    (Propagate)  if (v₁ ← C₁), . . . , (vₙ ← Cₙ) ∈ P and Φ(P ∪ {w ← F(C₁, . . . , Cₙ)}) = μ,
                 then Φ(P ∪ {w ← F(v₁, . . . , vₙ)}) = μ

    (Assert)     if (v₁ ← C₁), . . . , (vₙ ← Cₙ) ∈ P and Φ(P ∪ {P(C₁, . . . , Cₙ)}) = μ,
                 then Φ(P ∪ {P(v₁, . . . , vₙ)}) = μ

Figure 2: Reduction rules for Φ, describing evaluation of split programs.

In particular, rule Success triggers when the only atoms left are assignments to distinct variables. Conversely, rule Fail triggers when there are two conflicting assignments to the same variable. Both rules Propagate and Assert require all their variable arguments to have corresponding assignments, whose values are then used as arguments to dynamically invoke the user-defined operation denoted by the function or predicate symbol. Note that for the sake of brevity we have not included rules to explicitly fail when reduction is stuck (which can only happen if rules Propagate and Assert cannot be applied because some variables are never assigned), but this is easily detectable and we reduce to ⊥ in these cases as well.
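As an illustration, the following sketch shows one way Φ could be computed by saturating the rules of Figure 2: substitute aliased variables by a representative, then repeatedly fire propagations and assertions whose arguments are all assigned, and detect conflicting assignments. It reuses the hypothetical Python representation sketched earlier and is not Langkit's actual implementation:

    BOTTOM = "⊥"  # the failure term

    def phi(atoms):
        """Evaluate a split program (a set of atoms), following Figure 2."""
        # Rule Alias: unify aliased variables under a representative.
        parent = {}
        def find(v):
            while v in parent:
                v = parent[v]
            return v

        pending = []
        for a in atoms:
            if a is True:                 # rule Top: ⊤ atoms are dropped
                continue
            if a is False:                # rule Bot
                return BOTTOM
            if isinstance(a, Alias):
                l, r = find(a.left), find(a.right)
                if l != r:
                    parent[l] = r
            else:
                pending.append(a)

        env = {}                          # variable -> constant (rule Success)
        progress = True
        while progress:
            progress = False
            remaining = []
            for a in pending:
                if isinstance(a, Assign):
                    v = find(a.var)
                    if v in env and env[v] != a.const:
                        return BOTTOM     # rule Fail: conflicting assignments
                    env[v] = a.const
                    progress = True
                elif isinstance(a, Propagate):
                    args = [find(x) for x in a.args]
                    if all(x in env for x in args):
                        # Rule Propagate: replace by an assignment of the result.
                        value = a.func(*[env[x] for x in args])
                        remaining.append(Assign(find(a.var), value))
                        progress = True
                    else:
                        remaining.append(a)
                else:  # Predicate
                    args = [find(x) for x in a.args]
                    if all(x in env for x in args):
                        if not a.pred(*[env[x] for x in args]):
                            return BOTTOM  # failing assertion reduces to ⊥
                        progress = True
                    else:
                        remaining.append(a)
            pending = remaining
        # Leftover atoms mean evaluation is stuck, which we also reduce to ⊥.
        return env if not pending else BOTTOM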
Definition 2.3 (Preferred Solutions). Given an equation ℰ and its partially ordered set of split programs ψ(ℰ) = ⟨S, ≤⟩, we define the set of preferred solutions S* ⊆ S of ℰ as the set of split programs for which evaluation succeeds and for which there exists no smaller split program for which evaluation succeeds as well:

    S* = {P* ∈ S : Φ(P*) ≠ ⊥ ∧ (∄Q ∈ S : Q < P* ∧ Φ(Q) ≠ ⊥)}

Using this definition, we can already sketch a naive algorithm to compute the set of preferred solutions of any given equation: construct its set of split programs in a preferred order (i.e. construct P before Q if P ≤ Q), applying the oracle Φ on each of them. As soon as we find a split program P* on which Φ succeeds, insert it in the resulting set. Then continue the process, but only consider split programs that are not comparable to P*, i.e. Q ∈ S ∧ ¬(P* ≤ Q).

It is common in languages to consider an expression ambiguous when there are several possible non-comparable solutions. This case can be easily detected by checking whether all preferred solutions are equivalent or not: |{Φ(P*) : P* ∈ S*}| = 1.
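A direct (deliberately naive) transcription of this algorithm, building on the phi oracle sketched above, could look as follows. Here ψ returns the split programs together with the order ≤ as an explicit set of index pairs; again, a sketch under our hypothetical representation, not the actual implementation:

    def psi(eq):
        """ψ from Definition 2.1: return (programs, leq) where programs is a
        list of split programs (tuples of atoms) and leq is a set of index
        pairs (i, j) meaning programs[i] ≤ programs[j]."""
        if isinstance(eq, OrderedOr):
            s1, le1 = psi(eq.first)
            s2, le2 = psi(eq.second)
            n1 = len(s1)
            le = set(le1) | {(i + n1, j + n1) for (i, j) in le2}
            # Every program of the first branch is preferred over every
            # program of the second branch (rule Disjunction).
            le |= {(i, n1 + j) for i in range(n1) for j in range(len(s2))}
            return s1 + s2, le
        if isinstance(eq, And):
            s1, le1 = psi(eq.left)
            s2, le2 = psi(eq.right)
            n2 = len(s2)
            s = [p1 + p2 for p1 in s1 for p2 in s2]
            # Product order (rule Conjunction); le1/le2 contain the
            # reflexive pairs, so all combinations are covered.
            le = {(i1 * n2 + j1, i2 * n2 + j2)
                  for (i1, i2) in le1 for (j1, j2) in le2}
            return s, le
        return [(eq,)], {(0, 0)}           # a lone atom (rule Atom)

    def preferred_solutions(eq):
        """Naive enumeration of Definition 2.3, using the phi oracle above."""
        s, le = psi(eq)
        less = {(i, j) for (i, j) in le if i != j}
        visited, kept, solutions = set(), [], []
        while len(visited) < len(s):
            # Next program in a linear extension of ≤ (predecessors first).
            i = next(k for k in range(len(s)) if k not in visited
                     and all(p in visited for (p, q) in less if q == k))
            visited.add(i)
            if any((p, i) in less for p in kept):
                continue  # comparable to an already found solution: skip
            result = phi(set(s[i]))
            if result != BOTTOM:
                kept.append(i)
                solutions.append(result)
        return solutions

On the example of Definition 2.1, preferred_solutions(And(OrderedOr(Assign(v, "A"), Assign(v, "B")), OrderedOr(Assign(w, "C"), Assign(w, "D")))) would return the single mapping {v: "A", w: "C"}, i.e. the evaluation of P₁.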
3. Implementation and Results

Unsurprisingly, the algorithm sketched in Definition 2.3 is very inefficient in practice. In particular, its run-time for an equation that has no solution is always exponential in its number of disjunctions, since it will expand and evaluate all split programs independently before determining that none are feasible. Consider also the following (solvable) equation:

    x ← A ∧ (x ← B ×⃗ x ← A) ∧ (y ← A ×⃗ y ← B ×⃗ y ← C)

The algorithm presented so far will compute the set of preferred solutions (which only contains one solution, evaluating to {(x, A), (y, A)}) after evaluating 4 fully expanded options.

Early Pruning. A first improvement can be made on this algorithm to avoid doing redundant work in many cases: instead of using our oracle Φ only on fully expanded split programs, we can use it on intermediate split programs during the expansion of the equation. This requires a slight modification to Φ to make it output a different outcome when it fails because evaluation is stuck rather than because of an actual conflict: it only makes sense to prune the expansion in the second case, as the first case may simply be due to the incompleteness of the partial equation. In our example, we would get the following intermediate equations by splitting on the first ordered disjunction:

    1. (x ← A ∧ x ← B) ∧ (y ← A ×⃗ y ← B ×⃗ y ← C)
    2. (x ← A ∧ x ← A) ∧ (y ← A ×⃗ y ← B ×⃗ y ← C)

If we decide to run the oracle on the partial solutions (the expanded left-hand conjuncts above) – starting with the first one since it is preferred – we will find a contradiction early, and therefore we can avoid expanding its right-hand side. Thanks to this improvement, we now only require evaluating one partially expanded equation and one fully expanded equation (the solution).

However, this approach has a serious flaw: the optimization can range from being extremely useful to completely ineffective depending on the order in which disjunctions are expanded: if we had decided in our example to expand the ordered disjunctions on the right first, we would not have discovered the contradiction until everything was fully expanded. This is inevitable because no predefined expansion order in the solver will be optimal for all input equations. Thus, the burden of choosing the right order is left to the language designer, which would make our logic DSL not purely declarative¹.

¹ Note that this is precisely how our previous solver was implemented. This not only caused maintenance problems in our Ada specification due to the order-dependent nature of the DSL being implicit, it was also sometimes a challenging task to end up with the optimal ordering when the semantics of multiple Ada constructs interfered, causing the performance of the generated semantic analyzer to degrade.

Conflict-Driven Traversal. In order to design an optimal traversal strategy, we make the following two observations:

1. First, we notice that even if we do not choose the optimal disjunction to split on in our algorithm above, we can compute a posteriori the minimal set of atoms that would have been needed to reject this option. In our example, we can easily extract from option 1's atoms {x ← A, x ← B, y ← A} that the atoms {x ← A, x ← B} were the cause of the conflict. We call this process Explain-Conflict.
2. If we knew in advance the complete set of atoms that are in conflict, we could reformulate the search problem as a satisfiability problem in a propositional logic augmented with ordered disjunctions (such as Qualitative Choice Logic [16]), in which atoms are abstracted by boolean variables. In our example (abstracting x ← A as a, x ← B as b, and y ← A, y ← B, y ← C as d, e, f), this would come down to finding an assignment that satisfies a ∧ (b ×⃗ a) ∧ (d ×⃗ e ×⃗ f) ∧ (¬a ∨ ¬b), where the final clause prevents the conflicting atoms from being set at the same time.

By combining these two ingredients, we can already devise a general solver scheme based on a lazy encoding [20] of our theory: iteratively refine the initial formula with conflicts found by the SAT-with-ordered-disjunctions solver on each candidate (here, to find one solution):

Algorithm 1: Lazy Encoding Scheme

    ℰ# ← Encode(ℰ)                   ▷ Encode the equation into a proposition with ordered disjunctions
    while M ← Solve(ℰ#) do           ▷ Invoke the SAT-with-ordered-disjunctions solver
        C ← Decode(M)                ▷ Reconstruct the candidate from the model
        R ← Φ(C)                     ▷ Evaluate the candidate
        if R = ⊥ then
            η ← Explain-Conflict(C)  ▷ Build a clause that contradicts this candidate
            ℰ# ← ℰ# ∧ Encode(η)      ▷ Append it to the original formula to forbid this model
        else
            return R                 ▷ R is one of the preferred solutions
        end if
    end while
    return Unsat
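The loop of Algorithm 1 translates almost literally into code. In the following sketch, encode, solve, decode and explain_conflict are stand-ins for the components described in the text (hypothetical names of our own), and phi is the oracle sketched earlier:

    def lazy_solve(eq, encode, solve, decode, explain_conflict):
        """Sketch of Algorithm 1's lazy encoding scheme."""
        clauses, mapping = encode(eq)    # CNF clauses + literal-to-atoms mapping
        while True:
            model = solve(clauses)       # None when the formula is unsatisfiable
            if model is None:
                return "Unsat"
            candidate = decode(model, mapping)   # candidate split program
            result = phi(candidate)              # evaluate with the Figure 2 oracle
            if result == BOTTOM:
                # Forbid this candidate (and hopefully many others) and retry.
                conflict_clauses, _ = encode(explain_conflict(candidate))
                clauses += conflict_clauses
            else:
                return result            # one of the preferred solutions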
Encoding of Ordered Disjunctions. Instead of designing an ad-hoc SAT-with-ordered-disjunctions solver, we propose a partial encoding σ to propositional logic, so as to leverage decades of research on optimizations in SAT solvers. The encoding is partial because the ordering information cannot easily be conveyed through the propositional formula itself [16] – thus we will handle it differently. We define σ using the two rules in Figure 3, which reduce a DSL equation to a propositional formula alongside a mapping from each boolean variable appearing in the propositional formula to the set of atoms from the DSL equation that it represents. The mapping allows decoding models into candidate solutions that we can evaluate with Φ.

    (1)  if v ⊢ σ(ℰᵢ) = πᵢ ‖ μᵢ for 1 ≤ i ≤ n, then
         v ⊢ σ(𝒜₁ ∧ · · · ∧ 𝒜ₘ ∧ ℰ₁ ∧ · · · ∧ ℰₙ) =
             π₁ ∧ · · · ∧ πₙ ‖ {v → {𝒜₁, . . . , 𝒜ₘ}} ∪ μ₁ ∪ · · · ∪ μₙ

    (2)  if n > 1, w₁, . . . , wₙ are fresh, and wᵢ ⊢ σ(ℰᵢ) = πᵢ ‖ μᵢ for 1 ≤ i ≤ n, then
         v ⊢ σ(ℰ₁ ×⃗ · · · ×⃗ ℰₙ) =
             π₁ ∧ · · · ∧ πₙ ∧ (v ∨ ¬w₁) ∧ · · · ∧ (v ∨ ¬wₙ)
             ∧ (¬v ∨ w₁ ∨ · · · ∨ wₙ) ∧ AMO(w₁, . . . , wₙ) ‖ μ₁ ∪ · · · ∪ μₙ

Figure 3: Rules for the partial encoding σ, reducing a DSL equation to a propositional formula.

We give the following intuitions about the encoding rules:

• The context variable v in both rules is transferred from parent to child. Apart from an initial context variable (say v₀), only disjunctions create variables – which correspond to the selection of each of their branches. Thus, v represents the current parent branch.
• In the first rule, the order in which atoms 𝒜ᵢ and nested equations ℰᵢ occur in the conjunction does not matter. Note that a single variable is used to represent all the atoms in a conjunction.
• In the second rule, we append to the formulas generated by each branch the following facts:
  1. A branch variable wᵢ can only be selected if its parent branch v is selected.
  2. At least one branch variable wᵢ must be selected if its parent branch v is selected.
  3. At most one branch variable wᵢ can be selected at the same time.

Remark. We do not impose a specific encoding for the generated at-most-one (AMO) constraints themselves, as there exists a wide variety of them [21] which may be more effective in certain scenarios than others – we discuss our approach in a later paragraph.

This implementation of Encode can be directly plugged into Algorithm 1, and since it generates clauses in CNF, Solve may be implemented by most available SAT solvers. Here, using a pairwise encoding for AMO constraints, our running example would be encoded into a formula equivalent to:

    a ∧ (b ∨ c) ∧ (¬b ∨ ¬c) ∧ (d ∨ e ∨ f) ∧ (¬d ∨ ¬e) ∧ (¬d ∨ ¬f) ∧ (¬e ∨ ¬f)

which correctly yields models {a, b, d}, {a, c, d}, {a, b, e}, . . . However, the order in which these models are generated is never specified, therefore a preferred order is not guaranteed.

Enforcing a Preferred Order. Indeed, modern SAT solvers typically rely on complex heuristics for literal decisions [22], and since literals in our proposed encoding directly correspond to branches of ordered disjunctions, it is not possible to enforce that models be generated in an order of our choosing. To remedy this situation, we propose to extend the DPLL(T) interface to allow the theory to interfere in literal decisions. In particular, we propose to replace the standard Decide [20] transition with a stricter rule which we call T-Decide:

    (T-Decide)  if l or ¬l appears in F, l is undefined in M, and M ◁_T l, then
                M ‖ F ⟹ M lᵈ ‖ F

where M ◁_T l signifies that theory T allows literal l to be decided in the context of the current model M. Note that this rule does not prevent the theory from allowing multiple literals to be considered (or from forbidding all of them, for that matter), thus sophisticated decision heuristics of DPLL implementations may still apply.

In our specific case, when the theory is presented with multiple possible literal decisions, it simply makes sure not to select a literal corresponding to the n-th branch of an ordered disjunction if a literal corresponding to selecting the m-th branch (with m < n) of that same disjunction is also among the possible decisions. Moreover, once a first solution P* is found (if ever), the theory may switch its mode of decision in order to force the traversal of candidate solutions that are non-comparable to P* (by forbidding literals that could only end up producing less preferred candidates), so as to generate the set of preferred solutions and detect ambiguities in a smaller number of iterations.
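A minimal sketch of this decision filter, assuming each branch literal can be mapped back to the disjunction it belongs to and its branch index (our own representation, not the actual implementation):

    def t_decide_filter(possible, branch_of):
        """Filter candidate decision literals so that, within each ordered
        disjunction, only the most preferred selectable branch survives.
        `branch_of` maps a literal to a (disjunction_id, branch_index) pair,
        or None for literals that do not select a branch."""
        best = {}   # disjunction_id -> smallest selectable branch index
        for lit in possible:
            tag = branch_of(lit)
            if tag is not None:
                d, i = tag
                best[d] = min(best.get(d, i), i)
        return [lit for lit in possible
                if branch_of(lit) is None
                or branch_of(lit)[1] == best[branch_of(lit)[0]]]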
Theorem 3.1. Our literal decision scheme effectively generates split programs in a preferred order.

Proof. Assume that our procedure does not implement a preferred order of traversal. This implies that at some point, it produced a split program C whereas it should have produced a split program C′ < C. So, out of all the ordered disjunctions of the original equation, the branches selected in C′ are the same as those selected in C, except for k ≥ 1 of them which must be strictly preferred. Consider one of those k ordered disjunctions, encoded to literals w₁ . . . w_l according to rule (2) from Figure 3. Then for some 1 ≤ m < n ≤ l, the model for C contains ¬wₘ and wₙ while that of C′ contains wₘ and ¬wₙ. By definition of our decision scheme, the theory could not have decided wₙ if wₘ was also a possible decision, thus ¬wₘ must have been implied by previous literal assignments in C, a subset of which must not exist in C′ (otherwise they would have implied ¬wₘ in C′ as well). Since literals directly map to branches of ordered disjunctions, these differences directly correspond to different branch selections between C and C′. As implication graphs are directed and acyclic, we can apply this reasoning at most k times until we have to consider the selection of a preferred branch which cannot be explained in terms of other preferred branches. At this stage, the literals that explain this selection necessarily correspond to least preferred branches in C′, which makes C and C′ not comparable, therefore contradicting our initial hypothesis.

The Theory Solver. On top of guiding the selection of literals, our theory solver must be as smart as possible in its implementation of the Explain-Conflict routine, in order to extract minimal explanations out of invalid candidates and make the traversal converge as fast as possible. There are two failure cases to handle when evaluating a candidate solution:

• The first one is when the evaluation results in ⊥ due to rules Fail or Assert. The explanation for such a failure is always a subset of the atoms from the candidate solution. In particular, it suffices to include in the explanation all the atoms that we encounter by traversing the dependency graph of atom assignments backwards (taking aliasing into account), starting from the conflicting assignments or the failing predicate.
• The second case to handle is when the evaluation is stuck because of a missing variable assignment (for example in equation x ← C ∧ P(x, y), y is never assigned a value). In this case we build a clause that forbids the atoms with such unsatisfied dependencies from being part of a model if the model does not also contain atoms that satisfy these dependencies – which may involve atoms that were not part of the candidate solution. In the case of a cyclic dependency (as in equation x ← F(y) ∧ y ← F(x)), we build a clause that breaks the cycle by requiring an assignment to any of the variables involved in the cycle (as this assignment will propagate throughout the cycle).

Optimizing At-Most-One Constraints. We initially encoded our AMO constraints using a naive pairwise encoding, but quickly found instances where it required generating too many clauses, causing a non-negligible overhead. We experimented with the binary encoding and found it to be slightly less efficient in the average case. In the end, we implemented built-in support for at-most-one constraints in our CDCL algorithm. The idea of adding native support for cardinality constraints in SAT solvers is not new [23, 24], but for our use case we exploited the fact that our encoding produces AMO constraints over contiguous literals (see the second rule in Figure 3), and designed an even more compact in-memory representation for them, requiring only a marker flag and two integers determining the range of literals that participate in the constraint. Thus, a big constraint such as AMO(x₁, x₂, . . . , x₉₉₈, x₉₉₉) requires only three integers to represent, whereas a pairwise encoding would generate several hundred thousand binary clauses. This allows keeping memory overhead to a minimum and enables fast iteration over the constraint's literals in the unit propagation and conflict resolution routines.
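To illustrate the idea, here is a sketch of such an interval-based AMO constraint and its unit propagation, under the usual convention that literals are integers and a negative literal denotes a negation (the names are ours; the actual in-memory layout in the solver differs):

    from dataclasses import dataclass

    @dataclass
    class RangeAMO:
        """At-most-one over the contiguous literal range [lo, hi]: a stand-in
        for the marker-flag-plus-two-integers representation described above."""
        lo: int
        hi: int

    def propagate_amo(amo, assignment, lit):
        """Once literal `lit` in the range is assigned True, every other
        literal of the range is implied False. `assignment` maps a literal to
        True, False or None (unassigned); returns the list of implied
        (negative) literals, or None to signal a conflict."""
        if not (amo.lo <= lit <= amo.hi) or assignment[lit] is not True:
            return []
        implied = []
        for other in range(amo.lo, amo.hi + 1):
            if other == lit:
                continue
            if assignment[other] is True:
                return None               # conflict: two range literals are True
            if assignment[other] is None:
                implied.append(-other)    # `other` must be assigned False
        return implied

Iterating over a pair of bounds like this is what makes the representation cheap: no clause database traversal is needed, regardless of the width of the constraint.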
3.1. Performance Results

Our new implementation replaced a legacy solver based on the naive algorithm with early pruning described at the beginning of this section. We first give an overview of the performance improvements obtained on carefully crafted test cases exhibiting known pathological patterns that can be found in actual codebases.

Synthetic Test-Cases. The main source of complexity in typechecking Ada comes from the use of function overloading. Thus, our first synthetic test case simply declares N different overloads and performs M nested calls to the overloaded function (see Figure 4a). Our second synthetic test case extends the first one by adding bridges between types (see Figure 4b), such that while a chain of nested calls of length K < M may be resolved in many different ways, the complete chain of length M has only one possible interpretation. This effectively tricks the solvers into considering candidates that later become dead ends.

    -- (a) Synthetic example 1, for N = M = 3.
    procedure Test is
       type T1 is null record;
       type T2 is null record;
       type T3 is null record;

       function F (X : T1) return T1 is (null record);
       function F (X : T2) return T2 is (null record);
       function F (X : T3) return T3 is (null record);

       procedure P (X : T1) is null;
       procedure P (X : T2) is null;
       procedure P (X : T3) is null;

       X : T1;
    begin
       P (F (F (F (X))));
    end Test;

    -- (b) Synthetic example 2, for N = M = 3.
    procedure Test is
       type T1 is null record;
       type T2 is null record;
       type T3 is null record;

       function F (X : T1) return T1 is (null record);
       function F (X : T1) return T2 is (null record);
       function F (X : T1) return T3 is (null record);

       function F (X : T2) return T2 is (null record);
       function F (X : T2) return T3 is (null record);

       function F (X : T3) return T3 is (null record);

       procedure P (X : T1) is null;
       procedure P (X : T2) is null;
       procedure P (X : T3) is null;

       X : T1;
    begin
       P (F (F (F (X))));
    end Test;

Figure 4: Patterns of code exhibiting known pathological solver behavior.
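The second pattern is easy to generate for arbitrary N and M; here is a possible generator (our own helper for reproducing the pattern, not part of the published benchmark suite):

    def synthetic_case_2(n, m):
        """Generate the Ada source of synthetic example 2 (Figure 4b) for N
        types/overloads and a nested call chain of depth M."""
        lines = ["procedure Test is"]
        lines += [f"   type T{i} is null record;" for i in range(1, n + 1)]
        lines.append("")
        # Bridging overloads: F maps T_i to every T_j with j >= i.
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                lines.append(
                    f"   function F (X : T{i}) return T{j} is (null record);")
        lines.append("")
        lines += [f"   procedure P (X : T{i}) is null;" for i in range(1, n + 1)]
        lines += ["", "   X : T1;", "begin"]
        call = "X"
        for _ in range(m):
            call = f"F ({call})"
        lines += [f"   P ({call});", "end Test;"]
        return "\n".join(lines)

    print(synthetic_case_2(3, 3))   # reproduces Figure 4b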
[Figure 5: Performance measurements on synthetic examples – (a) synthetic case 1 (M = N); (b) synthetic case 2 (M = 3); (c) synthetic case 2 (M = N). A logarithmic scale is needed in plots (a) and (b) to visualize both curves on the same plot.]

For both cases, we report the total number of atoms evaluated by each implementation, which is a deterministic measure of the work done that directly correlates with the actual run-time of the solvers. In Figure 5a, we see that both solvers have polynomial complexity on the first synthetic case, but with very different factors and degrees (O(N⁵) versus O(N³)). On the second synthetic case, the naive implementation completely blows up if we let M = N. Therefore in Figure 5b, we fixed M = 3 and only let N vary. Even then, the naive implementation exhibits O(N⁶) behavior, whereas our new implementation is only quadratic.

Finally, in Figure 5c we compare the run-time of our frontend to the run-time of the main industrial Ada compiler GNAT [12] on that second synthetic example, with M = N. This comparison is by nature flawed, as we have to compare run-times of the complete frontends including parsing, construction of lexical scopes, etc. However, we could verify that on the tested synthetic cases, name and type resolution do dominate the run-time in our frontend. What this measurement shows is that while our solution has much higher constant factors (probably explained by the generic nature of our framework), the asymptotic run-time of the two frontends is O(N⁴).

Real-World Codebases. We have stress-tested our new implementation on a variety of actual Ada codebases featuring different coding styles, from internal 50k SLOC codebases maintained by AdaCore to industrial codebases of several million SLOC, to code excerpts exhibiting known pathological cases – including instances of those shown above. We measured the total time² spent by the frontend to analyze each complete codebase with both the legacy solver backend (t_old) and the DPLL(T)-based solver backend (t_new). In Figure 6a, we plot the ratio t_old/t_new for each codebase, which shows significant performance improvements³. Figure 6b gives an idea of the effective performance of the frontends in terms of number of source lines of code resolved per second on five industrial codebases (averaging 1.8M SLOC).

² Note that since the time spent in the frontend is not exclusively inside the solver, the increase in performance of the solver itself needed to justify these speedups is even greater.
³ G2W-expr is actually one particular source file of the GNAT2Why codebase, and is the only instance which we have found to be negatively impacted by the transition. We have yet to investigate the underlying issue.

[Figure 6: Performance comparison between the legacy frontend and the DPLL(T)-based frontend on real Ada codebases – (a) ratios of run-time between the two frontends (t_old/t_new), logarithmic scale; (b) number of lines of code analyzed per second on various industrial codebases.]

Another consequence of the new solver being able to significantly reduce the search space is that the transition allowed fixing hundreds of timeouts over several codebases, which were due to our previous solver being stuck in an exponential exploration of the problem space (but the timeouts triggered fast enough to have no notable impact on the reported ratios). Finally, while it is difficult to make a fair performance comparison between our analyzer and that of GNAT due to major architectural differences, we have found our implementation to be competitive in everyday usage.

3.2. Producing Error-Case Diagnostics

IDEs and language servers [25] being one of the primary targets of our meta-compiler framework, it is important for our name resolution framework to allow emitting helpful error messages when analyzing user code that is semantically invalid, so as to implement the live feedback features of modern IDEs.
With this in mind, we recently extended our logic DSL to allow language designers to annotate their typing rules with minimal effort, so as to convey the information required for producing a relevant error message for each kind of error of their type system.

1. We extended the atom constructors to allow attaching abstract contextual information to them, which we call logic contexts. Intuitively, this can be used by the language-independent solver to regain language-specific knowledge, and can be exploited when a candidate solution is rejected to provide insight as to what was being tested.
2. We extended the interface of atoms to allow language designers to specify a location and an error message template that should be instantiated if this atom is part of a conflict which caused a candidate solution to be rejected.

We then extended the behavior of our solver in case it cannot resolve a given equation, so as to re-run itself a second time in an error-reporting mode. In this mode, it will encounter the same conflicts as the first time, but will use the explanation produced after each conflict to extract the logic contexts of the atoms that we know minimally explain the conflict. It can then emit an error message by instantiating the failed atom's error message template with the actual solve-time values computed so far and the extracted logic contexts. The key insight is that since explanations only contain atoms that were relevant for a failure, the error reported to the user will also only contain contexts that are relevant for the failure, instead of, for example, showing overloads that were selected in other sub-expressions which cannot in any way intervene in the failure under investigation.

Note that we prioritize maximal performance on satisfiable equations, as they represent the majority of cases even during live code editing. Thus we favor the overhead of re-running the solver a second time when we find an unsatisfiable instance, rather than the overhead of consistently populating and keeping around all the data structures required to report an error message that in most cases will not be needed.

This work is still in progress (for instance, we cannot yet emit errors for cases with multiple solutions) but we have already found concrete scenarios in which error messages produced by our generated Ada frontend were more helpful than messages emitted by the GNAT compiler. This is also a promising lead, using the custom theory we developed, towards a general framework for producing useful error messages from the specification of type systems by Langkit users.
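To illustrate the mechanism, the following sketch shows how an atom carrying a location and a message template could be rendered once the error-reporting run has gathered a conflict explanation. All names here are hypothetical and the actual annotation API differs:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Diagnosed:
        """Hypothetical wrapper attaching diagnostic metadata to a DSL atom."""
        atom: object                 # e.g. a Predicate such as IsSubtype(...)
        location: str                # source location to point the error at
        template: str                # e.g. "expected type {expected}, got {actual}"
        contexts: List[object] = field(default_factory=list)   # logic contexts

    def render_error(failed, values, contexts):
        """Instantiate the failed atom's message template with solve-time
        values and the logic contexts extracted from the explanation."""
        message = failed.template.format(**values)
        notes = "\n".join(f"  note: {c}" for c in contexts)
        return f"{failed.location}: error: {message}" + ("\n" + notes if notes else "")

    # Example: a rejected subtyping assertion inside one overload interpretation.
    d = Diagnosed(atom=None, location="test.adb:16:4",
                  template="expected type {expected}, got {actual}")
    print(render_error(d, {"expected": "T3", "actual": "T1"},
                       ["in the interpretation F (X : T1) return T1"]))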
4. Related Work

Many specific language implementations, such as the aforementioned Swift's Swiftc [4] and Haskell's GHC [3], have a custom unification-based solver at their core. However, such solvers are usually specialized for a given type system, and contain built-in primitives that encode the typing rules of the language they intend to type.

• In the case of Swift, it was shown that it is possible to generate typing problems where the search space is truly pathological [14]. While the design of the solver is not completely formalized, it appears relatively clear from the developer documentation [4] that the solver has a relatively naive design that explores the tree of solutions, similar to the algorithm presented at the beginning of Section 3.
• In the case of GHC, even though Hindley-Milner type inference has an exponential worst case, careful design of the type system seems key in making sure that the solver works with a relatively simple unification algorithm while staying efficient in most real-world use cases.
• The Rust compiler team also has a project to reimplement the trait part of their type system using a constraint solver [26] based on Prolog, using a classical depth-first exploration algorithm for solving.

The takeaway is that languages with complex type systems seem to benefit from a solver-based approach where the accumulation of constraints is separated from the solving of the constraint system. In that light, having a generalized approach based on SMT seems like a good foundation for future work on complex type systems.

On the meta-compiler front, the closest related work that we know of is Spoofax [27]. In [28], they present their framework for modeling the static semantics of languages. However, they mention that their framework is not capable of specifying languages with subtyping and type-based overloading. In a more recent paper [29], they address some of those issues, but their approach still seems more limited in several respects:

• The constraint unification algorithm seems to be a regular unification algorithm, and no effort is yet made to optimize exponential search-space problems, which is the very point of the approach described in this paper.
• No effort was yet made to see whether the approach would scale to real-world use cases.
• Finally, there is not yet a framework to generate error messages when the constraints are unsatisfiable.

MPS [30] (short for Meta-Programming System) is another very powerful framework and language development workbench, by JetBrains. It covers everything from IDE integration to code generation, and provides a language based on constraints to describe the type system of the object language. While the constraint language seems to be powerful enough to express type systems with subtyping, it appears to have built-in support for a lot of notions, and thus a meta type theory that is much more rigid than what is allowed by our framework. To the best of our knowledge there is no support for anything other than simple name-resolution rules, and thus anything like type-based overload resolution seems impossible to express in their framework. Finally, while there is a user guide, there is no formal description of the capabilities of the high-level constraint language, nor of the underlying solver, so many of the observations we are able to make are hard to confirm or refute.

5. Future Work

So far, Langkit has really been used to define one main type system, and a limited number of minor experiments. One of the big long-term goals would be to express other type systems with different paradigms, such as structural subtyping, or other systems with a lot of inference such as the Swift or Rust type systems, and see how efficient and expressive our framework is for those.

In terms of solver optimizations, there are several leads to explore as well:

• Our current solver does not perform theory propagation, which has been shown in some cases to be a game changer [31].
• Although we have been able to take advantage of extensive research on SAT solvers to implement various optimizations in our own implementation (conflict analysis, two-watched literals, blocking literals, etc.), we also want to try plugging in existing state-of-the-art SAT solver backends, after modifying them to support our extensions.
• We want to experiment with encoding some properties of our logic more eagerly, such as dependencies between atoms, in order to minimize the number of rounds needed to converge.

However, these three leads require careful experimentation: the potential overhead brought by each of them on simple instances may affect overall performance negatively even if the benefit is high on complex cases, as semantic analyzers also spend a lot of time resolving trivial equations.

References

[1] R. Milner, A theory of type polymorphism in programming, J. Comput. Syst. Sci. 17 (1978) 348–375. doi:10.1016/0022-0000(78)90014-4.
[2] F. Pottier, D. Rémy, The Essence of ML Type Inference, 2005, pp. 389–489.
[3] S. Peyton Jones, Type inference as constraint solving: how GHC's type inference engine actually works, Zurihac keynote talk, 2019. URL: https://www.microsoft.com/en-us/research/publication/type-inference-as-constraint-solving-how-ghcs-type-inference-engine-actually-works/.
[4] Swift type checker design and implementation, https://apple-swift.readthedocs.io/en/latest/TypeChecker.html, 2017.
[5] Langkit, https://github.com/AdaCore/langkit, 2023.
[6] H. Ganzinger, K. Ripken, Operator identification in Ada: Formal specification, complexity, and concrete implementation, ACM SIGPLAN Notices 15 (1980) 30–42.
[7] T. Pennello, F. DeRemer, R. Meyers, A simplified operator identification scheme for Ada, SIGPLAN Not. 15 (1980) 82–87. doi:10.1145/947680.947688.
[8] T. P. Baker, A one-pass algorithm for overload resolution in Ada, ACM Trans. Program. Lang. Syst. 4 (1982) 601–614.
[9] D. A. Mundie, D. A. Fisher, Optimized overload resolution and type matching for Ada, Ada Lett. XI (1991) 83–90. doi:10.1145/112630.112640.
[10] D. Zhang, A. C. Myers, D. Vytiniotis, S. L. P. Jones, Diagnosing type errors with class, in: D. Grove, S. M. Blackburn (Eds.), Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, ACM, 2015, pp. 12–21. doi:10.1145/2737924.2738009.
[11] Libadalang, https://github.com/AdaCore/libadalang, 2023.
[12] GNAT: The GNU Ada compiler, https://files.adacore.com/gnat-book/, 2004.
[13] Lambda expressions vs. anonymous methods, part five, https://learn.microsoft.com/en-us/archive/blogs/ericlippert/lambda-expressions-vs-anonymous-methods-part-five, 2007.
[14] Exponential time complexity in the Swift type checker, https://www.cocoawithlove.com/blog/2016/07/12/type-checker-issues.html, 2016.
[15] Dependz, https://github.com/Roldak/Dependz, 2020.
[16] G. Brewka, S. Benferhat, D. Le Berre, Qualitative choice logic, Artif. Intell. 157 (2004) 203–237. doi:10.1016/j.artint.2004.04.006.
[17] G. Brewka, Logic programming with ordered disjunction, CoRR cs.AI/0207042 (2002). URL: https://arxiv.org/abs/cs/0207042.
[18] R. Confalonieri, J. Nieves, Nested Logic Programs with Ordered Disjunction, volume 677, 2010, pp. 55–66.
[19] Ada 2022 reference manual, chapter 8.6, paragraph 29, http://www.ada-auth.org/standards/22rm/html/RM-8-6.html, 2022.
[20] R. Nieuwenhuis, A. Oliveras, C. Tinelli, Solving SAT and SAT modulo theories: From an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T), J. ACM 53 (2006) 937–977. doi:10.1145/1217856.1217859.
[21] V.-H. Nguyen, V.-Q. Nguyen, K. Kim, P. Barahona, Empirical study on SAT-encodings of the at-most-one constraint, in: The 9th International Conference on Smart Media and Applications, SMA 2020, Association for Computing Machinery, New York, NY, USA, 2021, pp. 470–475. doi:10.1145/3426020.3426170.
[22] M. W. Moskewicz, C. F. Madigan, Y. Zhao, L. Zhang, S. Malik, Chaff: Engineering an efficient SAT solver, in: Proceedings of the 38th Annual Design Automation Conference, DAC '01, Association for Computing Machinery, New York, NY, USA, 2001, pp. 530–535. doi:10.1145/378239.379017.
[23] J. Maglalang, Native cardinality constraints: More expressive, more efficient constraints, 2012.
[24] M. H. Liffiton, J. C. Maglalang, A cardinality solver: More expressive constraints for free, in: A. Cimatti, R. Sebastiani (Eds.), Theory and Applications of Satisfiability Testing – SAT 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 485–486.
[25] J. K. Rask, F. P. Madsen, N. Battle, H. D. Macedo, P. G. Larsen, The specification language server protocol: A proposal for standardised LSP extensions, in: J. Proença, A. Paskevich (Eds.), Proceedings of the 6th Workshop on Formal Integrated Development Environment, F-IDE@NFM 2021, held online, 24-25th May 2021, volume 338 of EPTCS, 2021, pp. 3–18. doi:10.4204/EPTCS.338.3.
[26] The Chalk book, https://rust-lang.github.io/chalk/book/, 2023.
[27] L. C. L. Kats, E. Visser, The Spoofax language workbench, in: W. R. Cook, S. Clarke, M. C. Rinard (Eds.), Companion to the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, part of SPLASH 2010, October 17-21, 2010, Reno/Tahoe, Nevada, USA, ACM, 2010, pp. 237–238. doi:10.1145/1869542.1869592.
[28] H. van Antwerpen, P. Neron, A. Tolmach, E. Visser, G. Wachsmuth, A constraint language for static semantic analysis based on scope graphs, 2016, pp. 49–60. doi:10.1145/2847538.2847543.
[29] H. van Antwerpen, C. B. Poulsen, A. Rouvoet, E. Visser, Scopes as types, Proc. ACM Program. Lang. 2 (2018) 114:1–114:30. doi:10.1145/3276484.
[30] A. Bucchiarone, A. Cicchetti, F. Ciccozzi, A. Pierantonio (Eds.), Domain-Specific Languages in Practice: with JetBrains MPS, Springer, 2021. doi:10.1007/978-3-030-73758-0.
[31] R. Nieuwenhuis, A. Oliveras, DPLL(T) with exhaustive theory propagation and its application to difference logic, volume 3576, 2005, pp. 321–334. doi:10.1007/11513988_33.