Lifting congruence closure with free variables to
               λ-free higher-order logic via SAT encoding
                                          (work in progress)
    Sophie Tourret1 , Pascal Fontaine2,3 , Daniel El Ouraoui2 , Haniel Barbosa4 ∗†
                          1
                             Max-Planck-Institut für Informatik, Saarbrücken, Germany
                    2
                         University of Lorraine, CNRS, Inria, and LORIA, Nancy, France
                                      3
                                        Université de Liège, Liège, Belgium
                        4
                            Universidade Federal de Minas Gerais, Belo Horizonte, Brazil


                                                    Abstract
            Recent work in extending SMT solvers to higher-order logic (HOL) has not explored
       lifting quantifier instantiation algorithms to perform higher-order unification. As a con-
       sequence, widely used instantiation techniques, such as trigger- and particularly conflict-
       based, can only be applied in a limited manner. Congruence closure with free variables
       (CCFV) is a decision procedure for the E-ground (dis-)unification problem, which is at
       the heart of these instantiation techniques. Here, as a first step towards fully supporting
       trigger- and conflict-based instantiation in HOL, we define the E-ground (dis-)unification
       problem in λ-free higher-order logic (λfHOL), an extension of first-order logic where func-
       tion symbols may be partially applied and functional variables may occur, and extend
       CCFV to solve it. To improve scalability in the context of handling higher-order vari-
       ables, we rely on an encoding of the CCFV search as a propositional formula. We present
       a solution reconstruction procedure so that models for the propositional formula lead to
       solutions for the E-ground (dis-)unification problem. This is instrumental to port trigger-
       and conflict-based instantiation to be fully applied in λfHOL.


1      Introduction
Higher-order (HO) logic is a pervasive setting for reasoning about numerous real-world appli-
cations. In particular, it is widely used in proof assistants to provide trustworthy, formal, and
machine-checkable proofs of theorems. A major challenge in this setting is to automate proofs
as much as possible, thereby making the proof assistants easier to use. A successful approach
to this automation challenge is hammering [10]. It consists in encoding proof obligations into
first-order (FO) logic and use first-order provers to discharge them. A similar layered approach
is also used by HO theorem provers such as Leo-III [21] and Satallax [12], which regularly invoke
FO provers to discharge intermediate goals that depend solely on FO reasoning.
    Such approaches have known performance, soundness, or completeness issues due to the
black-box integration between FO and HO reasoning [8, 16, 23]. To mitigate these problems,
recent work [4,6,7,9,23] has focused on extending FO provers, based on the superposition calcu-
lus [1, 17] or on SMT solving [5], to natively support HOL, so that the integration between the
highly efficient FO component and the new HO one is graceful, i.e. the prover behaves mostly
as its first-order counterpart on FO problems and also handles HO problems natively. The
HO extension of SMT solvers [4] has not explored lifting quantifier instantiation algorithms to
    ∗ The order of authors is inverse alphabetic.
  † The work has received funding from the European Research Council (ERC) under the European Union’s

Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka).


François Bobot, Tjark Weber (eds.); SMT 2020, pp. 3–14
Copyright c 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
Lifting CCFV to λfHOL via SAT encoding (WIP)                  Tourret, Fontaine, El Ouraoui and Barbosa


perform higher-order unification. As a consequence, standard instantiation techniques, such as
trigger-based [13, 14], conflict-based [3, 19], model-based [15, 20] and enumerative [18] instantia-
tion, can only be applied in a limited manner. Moreover, the efficient implementation of these
techniques depends on specific term indices that must be adapted to handle equalities between
functions and a curried representation of terms [4, Sect. 3.4].
    Congruence closure with free variables (CCFV) [3] is a framework for casting instantiation
techniques in SMT. It is based on solving the E-ground (dis)unification problem, i.e. given a
conjunctive set of ground equality literals E and a conjunctive set of (possibly non-ground)
equality literals L, it finds substitutions σ such that E |= Lσ holds. These substitutions lead
to instantiations following the instantiation technique being used.

Example 1. Consider an E-ground (dis)unification problem with E = {f (a) ' f (b), h(a) '
h(c), g(b) 6' h(c)} and L = {h(x1 ) ' h(c), h(x2 ) 6' g(x3 ), f (x1 ) ' f (x3 ), x4 ' g(x5 )}. CCFV
solves E |= Lσ by building a set of equality constraints Eσ so that E ∪ Eσ |= L:
    • h(x1 ) ' h(c): either x1 ' c or x1 ' a belongs to Eσ ;
    • h(x2 ) 6' g(x3 ): either x2 ' c ∧ x3 ' b or x2 ' a ∧ x3 ' b belongs to Eσ ;
    • f (x1 ) ' f (x3 ): either x1 ' x3 or x1 ' a ∧ x3 ' b or x1 ' b ∧ x3 ' a must be in Eσ ;
    • x4 ' g(x5 ): the literal itself must be in Eσ .
One solution is thus Eσ = {x1 ' a, x2 ' a, x3 ' b, x4 ' g(x5 )}, corresponding to the substitu-
tion σ = {x1 7→ a, x2 7→ a, x3 7→ b, x4 7→ g(x5 )}.
     By extending CCFV to perform HO unification effectively we can in one fell swoop remove
the limitations for HOL of all the instantiation techniques supported by the framework. This can
be specially helpful for techniques that heavily depend on unification (as opposed to matching),
such as conflict-based instantiation, which has been shown to be particularly effective for proof
obligations originating from proof assistants [19].
     Given the complexity of the task at hand, we follow previous approaches by proceeding in a
stepwise manner [7,23], first extending CCFV to λ-free HOL (λfHOL), a fragment of HOL with
partial applications and functional variables. In this effort we quickly realized that extending
CCFV as a tableaux-like calculus, as it was originally presented [3], posed a strong limitation
on our ability to add new features to the framework. Moreover, the added complications of
handling functional variables and a curried representation of terms, let alone lambda terms in
the future, indicated that to have a scalable implementation we should handle the combinatorial
nature of the search more efficiently than via regular backtracking. We thus present a lifting of
CCFV to λfHOL via an encoding into an equisatisfiable SAT problem (Sect. 4). The encoding is
based on fully reducing the search for substitutions to SAT based on the entailment conditions
of literals in L (Sect. 4.1), after they have been preprocessed (Sect. 3), while taking into account
the dependencies between variables due to cycles (Sect. 4.2) and congruence (Sect. 4.3). We
present a solution reconstruction procedure (Sect. 5) so that satisfiable assignments lead to
substitutions solving the original problem.1


2     Preliminaries and problem statement
We work in λ-free higher-order logic (λfHOL) with Henkin semantics, following Bentkamp et
al. [7]. We introduce here the relevant notions of this logic.
   1 Proofs of the results are available as an appendix in the extended version of this paper at http:

//matryoshka.gforge.inria.fr/pubs/lfhoccfv_wip.pdf


4
Lifting CCFV to λfHOL via SAT encoding (WIP)                          Tourret, Fontaine, El Ouraoui and Barbosa


     We use a monomorphic type system equipped with a set of atomic types S and a binary
function → such that given two types τ , ν, the type τ → ν is the type of functions from τ to
ν. The sets Σ and V respectively contain the function symbols (a, b, f , g...) and variables (w,
x, y, z) upon which terms are built. Each symbol and variable is annotated with a type, e.g.
f : τ → τ , but the types will be omitted when irrelevant or obvious, i.e. almost all the time
except in the following definition. Terms are defined as: u = a | x | (u1 : τ → ν) (u2 : τ ) where
a ∈ Σ, x ∈ V and u1 , u2 are terms. Note that this entails a curried representation of terms. A
term is ground if it does not contain variables. We always denote terms using u plus various
sub- and superscripts and ground terms using t instead. Subterms of u are u, u1 , u2 and all
subterms of u1 and u2 (if they occur). Strict subterms are all subterms excluding u itself. The
notations u[u0 ] and u[u0 ]s respectively denote that u0 is a subterm or a strict subterm of u.
     Literals are equalities (u1 ' u2 ) and disequalities (u1 6' u2 ) of terms, respectively denoted
positive and negative literals. It is possible to handle predicate literals, e.g. Q and ¬Q, implicitly
as equations and disequations, e.g. Q ' > and Q 6' >, by defining a binary type for Booleans
and an interpreted symbol > of this type. This is used in practice to apply the techniques
presented here to predicates but it does not impact the theory so we do not mention it further
in this paper. Sets of literals are denoted by L, L0 , etc., and sets of ground literals by E. The
set of all terms, subterms included, that occur in a set of literals L is denoted by T(L). The
set of all ground terms in L is denoted by Tg (L).
     Substitutions, denoted σ, are functions that map variables to terms such that only finitely
many of them are not mapped to themselves. They are extended as usual to apply on terms and
sets of terms, written in postfix notation. The domain of σ is dom(σ) = {x | x ∈ V and xσ 6= x}
and its range is ran(σ) = {xσ | x ∈ dom(σ)}. A substitution is ground if its range is a set of
ground terms. It is acyclic if each variable x never occurs as a subterm of xσ n for any n > 0.
The fixpoint of an acyclic substitution always exists and is denoted σ ? .
     An interpretation I assigns: 1. to each atomic type τ a non-empty set I(τ ); 2. to each type
τ → ν a subset I(τ → ν) of the function space from I(τ ) to I(ν); 3. to each function or variable
u : τ in Σ ∪ V an element I(u) in I(τ ). I naturally extends to an interpretation I(u) for each
term u. An equation u1 ' u2 is entailed by an interpretation I if and only if I(u1 ) = I(u2 ),
and a disequation u1 6' u2 is entailed by I if and only if I(u1 ) 6= I(u2 ). An interpretation I is
a model of a set of literals L if it entails all of them. This is denoted by I |= L. By extension,
we write E |= L when every model of E is also a model of L.2
     Given a set of ground literals E, the congruence closure of E is the partition into classes of
all ground terms such that two ground terms t1 and t2 belong to the same class if and only if
E |= t1 ' t2 . Given additionally a set of literals L, the restriction of the congruence closure to
Tg (E ∪ L) is denoted E cc . The notation [t] denotes the E-congruence class in which t occurs.3
The representative of [t] is a chosen element in the class. The notation [t] ∈ E cc denotes that
E |= t ' t0 for some ground term t0 ∈ Tg (E ∪ L). Notice that, if [t] ∈ E cc , then [t0 ] ∈ E cc for
any subterm t0 of t. We abuse this notation by writing [u] ∈ E cc and u ∈ [t] for non-ground
terms u to indicate that we want u, or rather uσ for some grounding substitution σ, to belong
to an E-equivalence class that exists in E cc or to a particular class [t] ∈ E cc respectively.
We denote by [[t]] the set of signatures in [t], that is, all the pairs of classes [t1 ][t2 ] such that
[t1 t2 ] = [t].
Example 2. Let E = {a ' f a, g ' f, g b ' h c} and L = {x ' y d}. Then the congruence
closure E cc = {{a, f a}, {f, g}, {g b, h c}, {b}, {c}, {d}, {h}}. In the full congruence closure of E,
the class [a] is infinite since it includes all terms of the form f n a and g n a for n ≥ 0, among
   2 This is a simplification of the actual formalism by Bentkamp et al. [7].
   3 Note that [t] alone refers to the class of t, while u[t] refers to a term u with a subterm t.


                                                                                                             5
Lifting CCFV to λfHOL via SAT encoding (WIP)                        Tourret, Fontaine, El Ouraoui and Barbosa


others. Entailment also goes beyond E cc , e.g. E |= f b ' h c and E |= a ' f (g a). Moreover
[[a]] = {[f ][a]}, [[g b]] = {[f ][b], [h][c]} and the other signatures are empty.
    Following Barbosa et al. [3], we present below the definition of the E-ground (dis)unification
problem in λfHOL and the theorem characterizing its set of solutions. Although both statements
coincide with their first-order logic (FOL) counterparts, in λfHOL the problem and its solutions,
if they exist, may include functional variables, which are now part of the set of terms over which
substitutions range. Nevertheless, the lifting to λfHOL is straightforward since it can be directly
encoded into FOL (e.g. by means of an applicative encoding).
Definition 3 (E-ground (dis)unification). Given two finite sets of equational literals E and
L, where E is ground, the E-ground (dis)unification problem is that of finding substitutions σ
such that E |= Lσ.
Theorem 4. Given an E-ground (dis)unification problem, if a substitution σ exists such that
E |= Lσ, then there is an acyclic substitution σ 0 such that ran(σ 0 ) ⊆ T(E ∪ L), σ 0? is ground,
and E |= Lσ 0? .

Proof. Let app denote the encoding of terms from λfHOL to FOL.4 Let us assume that there
exists a σ such that E |= Lσ. Then app(E) |= app(Lσ) = app(L)app(σ). The theorem we want
to prove holds in first-order logic (see Theorem 1 in [3]) thus there exists an acyclic substitution
σ 0 such than ran(σ 0 ) ⊆ Tapp (app(E) ∪ app(L)), σ 0? is ground, and app(E) |= app(L)σ 0? . Then
the substitution σ 00 , where app(σ 00 ) = σ 0 is also acyclic and such that ran(σ 00 ) ⊆ T(E ∪ L) and
E |= Lσ 00? .

Example 5. Let E = {f ' g, f a ' b, h c ' b} and L = {f (x a) ' g b}. The E-
(dis)unification problem for E and L admits the solutions σ1 = {x 7→ f } and σ2 = {x 7→ g}.
It is possible to encode this problem in FOL via the applicative encoding. The result is
E@ = {f ' g, @(f, a) ' b, @(h, c) ' b} and L@ = {@(f, @(x, a)) ' @(g, b)}.
   In FOL, CCFV heavily relies on the head symbols to filter potential unification candidates.
This mechanism is greatly hindered in the applicative fragment since all applied (sub-)terms
have @ at their head. This observations motivated the creation of the approach presented here.


3     Preprocessing
To ease the SAT encoding (Sect. 4) we assume that a series of standard preprocessing tech-
niques, shown below, have been applied to L so that it comprises only literals of the form x 6' y,
x 6' t or u0 ' u1 u2 , where at least one of u1 and u2 is a variable.

Normalizing. A set of literals is E-normalized, abbreviated as normalized because E is
always clear from the context, if every ground term it contains is the representative of its
congruence class modulo E. A set of literals can be normalized by replacing all occurrences of
ground terms by their representative.

Example 6. Given the problem E = {(f a) b ' (f b) a, g b ' g c, h1 ' h2 , g ' f a, h1 6' h3 },
L = {h1 x 6' b, x ' (f a) y, h1 ((f x) b) ' a, g b ' (f x) y, (f a) a ' g b}, the non-singleton
     4 WiP note: The applicative encoding allows to encode λfHOL with Henkin semantics to FOL with standard

semantics. In future work, we plan to replace this proof with one that does not rely on the applicative encoding.
It is also described, e.g., by Barbosa et al. [4].


6
Lifting CCFV to λfHOL via SAT encoding (WIP)                  Tourret, Fontaine, El Ouraoui and Barbosa


classes in E cc are [g] = {gg , f a}, [(f a) b] = {g b, g cc, (f a) b, (f b) a} and [h1 ] = {h1 , h2 }
where bold font t identifies the representative term in a class [t]. The normalized L is thus
Lnorm = {hh2 x 6' b, x ' g y, h2 ((f x) b) ' a, g c ' (f x) y, g a ' g cc}.

Removing ground literals. Eliminating ground literals from L amounts to replace them
by > if they are entailed by E and by ⊥ otherwise, followed by removing all occurrences of
> from L. If there is any occurrence of ⊥ then L itself becomes {⊥} since the E-ground
(dis)unification problem is then unsatisfiable.

Example 7. Consider E and L as in the previous example. Removing ground literals from
Lnorm produces Lnon-ground = {⊥} because its last equation is not entailed by E. If we consider
instead L0 = L\{(f a) a ' g b} then L0norm = Lnorm \{g a ' g c} and finally L0non-ground = L0norm .

Flattening. A set of literals is flattened if each of its literals is of the form x ' t, x ' y,
x 6' t, x 6' y, or u0 ' u1 u2 where x and y are variables, t is a ground term, and u0 , u1 , u2 are
either variables or ground terms with at least u1 or u2 being a variable. Any set of literals can
be flattened by introducing new variables.

Example 8. Consider L0 as in the previous example. The flattened version of L0non-ground is
L0flat = {z1 6' b, z1 ' h2 x, x ' g y, a ' h2 z2 , z2 ' z3 b, z3 ' f x, g c ' z3 y}.

Trivial assignments.        A set of literals L has trivial assignments if it contains literals of the
following form:

   • x ' t, where x is a variable and t is ground;
   • x ' y, where x, y are variables;
   • x ' u, where variable x does not occur elsewhere in L including in u, and u is any term
     (not containing x).

As long as there is a trivial assignment x ' u0 in L it is possible to consider the equivalent
problem where L is replaced by L0 = (L \ {x ' u0 })σ instead, where σ = {x 7→ u0 }. In the
third case, this simply amounts to removing the x ' u equation from L.
    Since L0 may contain new ground literals and trivial assignments, as well as ground terms
that are not in normal form, it is necessary to iterate on normalization, ground literals simplifica-
tion and trivial assignments instantiation until all such literals have been removed. Eliminating
trivial assignments might render a literal ground, but otherwise flattening is not impacted by
this transformation. Indeed, none of the three cases of trivial assigments will ever lead to the
replacement of a variable inside a literal by an applied non-ground term. This process termi-
nates since each step eliminates one variable from L among finitely many. Sect. 5 provides a
way to build a solution for L from a solution for L0 .

Example 9. L0flat from the previous example contains no trivial assignments.

Example 10. Let L = {x ' f a, y ' x b, z ' y z}. Assume that L is already normalized for
a given E. Note that L is also flattened and without ground literals. However, it contains the
trivial assignment x ' f a since f a is ground. Applying the previously described procedure
yields L0 = {y ' (f a) b, z ' y z}. This new set is still flattened and without ground literals but
it contains a new trivial assignment, namely y ' (f a) b. Assume (f a) b is normalized. After
another iteration of the procedure, the remaining problem is L00 = {z ' ((f a) b) z}.

                                                                                                     7
Lifting CCFV to λfHOL via SAT encoding (WIP)                   Tourret, Fontaine, El Ouraoui and Barbosa


Example 11. Let L = {y ' x a, z ' x, z ' f x, g 6' x c}. As in the previous example, L
is flattened and without ground literals and we assume it is normalized for a given E. Both
literals y ' x a and z ' x are trivial assignments and are to be eliminated. The preprocessing
chain yields L0 = {x ' f x, g 6' x c}.
   A flattened normalized set of literals without trivial assignments and ground literals is called
a preprocessed set of literals.


4     Encoding CCFV as a SAT problem
Every solution of the E-ground (dis)unification problem is a substitution: it maps variables to
terms. Thanks to Theorem 4 we know that if a generic solution exists, then a ground one can
always be found. Thus it is enough to search for ground solutions to answer the general problem.
Such a ground solution associates each variable to a ground term belonging to a particular E-
congruence class, in or out of E cc .    By considering all classes from E cc and merging all
                        cc
the classes outside of E together, the association between variables and classes turns into a
combinatorial problem with finitely many possibilities. Barbosa et al. [3] presented CCFV, a
decision procedure for E-ground (dis)unification, as a tableaux-like procedure that decomposes
L in a top-down manner so that the possibilities for mapping variables are increasingly reduced.
The decompositions are based on the entailment conditions for the literals in L, according to
their structure. However, considering the entailment conditions in the presence of functional
variables and a curried representation of terms, particularly when we move beyond λfHOL
and need to take higher-order unification into account, opens up more possibilities that cannot
be handled in such a high-level procedure without a severe loss of efficiency or an extremely
intricate design.
    As an alternative solution aiming for greater flexibility and better performance, here we
tackle the combinatorial problem by, from the get-go, fully decomposing each literal in L as a
propositional disjunction over all the conditions for its entailment, relying on a SAT solver to
determine an assignment that respects all conditions (Sect. 4.1). Moreover, by reducing the
problem to SAT we also need to encode properties that were handled silently at the first-order
level by term indexing and by the underlying ground congruence closure component, namely
cyclic dependencies between variables (Sect. 4.2) and variables assigned by derived congruence
reasoning (Sect. 4.3).

4.1    Encoding’s core
We assume L is preprocessed. The set L thus only contains literals of the form x 6' u and
u0 ' u1 u2 where x is a variable and u, u0 , u1 , u2 are variables or ground terms, at least one
of u1 and u2 being a variable. The problem is encoded into a set C of formulas, which is the
union of all C` for ` ∈ L.
                                  W
            Cx6'u :                 [t1 ],[t2 ]∈E cc ,E|=t1 6't2 (x ∈ [t1 ] ∧ u ∈ [t2 ])
                                  W
        Cu0 'u1 u2 : [u0 ] 6∈ E ∨ [t0 ]∈E cc , [t1 ][t2 ]∈[[t0 ]] (u0 ∈ [t0 ] ∧ u1 ∈ [t1 ] ∧ u2 ∈ [t2 ])
                               cc


The intuition behind this encoding is as follows. Assume there exists a solution σ to the E-
ground (dis)unification problem. For negative literals, E |= (x 6' u)σ holds only if there exist
t1 , t2 such that E |= t1 6' t2 and xσ ∈ [t1 ], uσ ∈ [t2 ]. Moreover, for E |= t1 6' t2 to hold it
must be the case that [t1 ], [t2 ] ∈ E cc . For positive literals, E |= (u0 ' u1 u2 )σ holds if u0 σ
and (u1 u2 )σ are in the same congruence class. If [u0 σ] ∈ E cc , we know exactly which kind of

8
Lifting CCFV to λfHOL via SAT encoding (WIP)                         Tourret, Fontaine, El Ouraoui and Barbosa


applied term can belong to this class: any term t1 t2 such that [t1 ][t2 ] ∈ [[u0 σ]]. Thus in that
case it is enough to consider all possible signatures of the class that u0 is assigned to.5
    All membership tests where u or ui is not a variable are simplified in the above formulas
(to true or false). In the literals that remain, u and ui are always variables and literals are of
the form x ∈ [t] or [x] ∈/ E cc for some variable x and some class [t] ∈ E cc . A new propositional
variable Px,[t] is introduced for each literal x ∈ [t] occurring in any C` . Another propositional
variable Qx is introduced for each literal x ∈ / E cc occurring in any C` .
    Note that since a variable cannot be mapped to two distinct classes at the same time, we
also encode that the Px,[t] that occur in the encoding are mutually exclusive between themselves
and with Qx , for each x ∈ V.
    As illustrated by the following example, the above encoding is not enough to represent the
problem in the general case.
Example 12. Let E = {a ' f b, a ' b, f 6' f 0 , k ' h f } and L = {x ' y x, z ' g y, z '
v f, y 6' y 0 } where x, y, y 0 , z, and v are variables. The only non-singleton classes in E cc
are [a] = {a, b, f b} and [k] = {k, h f }. The non-empty signatures are [[a]] = {[f ][a]} and
[[k]] = {[h][f ]} The set L is already preprocessed. Then

                  Cx'y x = Qx ∨ (Px,[a] ∧ Py,[f ] ),        Cz'g y = Qz ,
                   Cz'v f = Qz ∨ (Pz,[k] ∧ Pv,[h] ),        Cy6'y0 = (Py,[f ] ∧ Py0 ,[f 0 ] ) ∨ (Py,[f 0 ] ∧ Py0 ,[f ] ).

and the relevant mutual exclusion constraints are ¬Py,[f ] ∨ ¬Py,[f 0 ] , ¬Qx ∨ ¬Px,[a] , ¬Py0 ,[f ] ∨
¬Py0 ,[f 0 ] , and ¬Qz ∨ ¬Pz,[k] .
    Note that in Cx'y x the disjunct Px,[k] ∧ Py,[h] ∧ Px,[f ] does not occur, because it assigns x
to two distinct classes. Since this trivially contradicts the mutual exclusion constraints, this
disjunct can never be true.6 Another noteworthy point is that Cz'v f is redundant to Cz'g y .
This could also be detected during the encoding and simplified. It would prevent the addition
of the useless mutual exclusion constraint on z-related literals.
    Note that one model of the obtained formula is {Qx , Qz , Py,[f ] , Py0 ,[f 0 ] }, which implies in
particular that [x] 6∈ E cc . However, all the solutions to the current problem must map x to
[a] ∈ E cc because it is the only class that contain both a term and its image by f .
   In fact, this issue happens for a whole family of variables, that we denote as cyclic variables.
An extra constraint is required to handle them separately.

4.2     Cycle-based constraints
Formally, a variable is cyclic in a set of terms L if L |= x ' u[x]. The cycle can be directly
apparent in an equation in L as in the previous example, but it can also be less obvious,
as in L = {x ' y a, y ' g x} where both x and y are cyclic. Cyclic variables in an E-
ground (dis)unification problem must all be mapped to terms in E cc . This is a consequence of
Lemma 13.

Lemma 13. Let E be a set of ground equational terms. Let t and t0 [t]s be ground terms. If
E |= t ' t0 [t]s then [t] ∈ E cc .
    5 WiP note: These informal statements will be formalized as a lifting to λfHOL of Theorem 4.4 in [2], which

captures the conditions in FOL for solving the E-ground (dis)unification problem for a given literal in L based
on its structure.
    6 Even if this was not the case, there would still be a typing incompatibility in allowing x to be mapped to

[a] and to [f ] in different disjuncts since a and f must have distinct types.


                                                                                                                       9
Lifting CCFV to λfHOL via SAT encoding (WIP)                                 Tourret, Fontaine, El Ouraoui and Barbosa


                                         x            y         z        v         y0

                              Figure 1: Dependency graph of L from Ex. 12

Corollary 14. Given an E-ground (dis)unification problem, if x has a cyclic definition in L
then [x] ∈ E cc .
Proof. If L |= x ' u[x]s and E |= Lσ for some ground σ then E |= xσ ' (u[x]s )σ. By Lemma
13, [xσ] ∈ E cc .
Example 15. In Ex. 12, only x is cyclic. It is thus enough to add the unit clause ¬Qx to the
encoding to restrict models to those that map x to a class in E cc .
    The dependency between variables can be encoded as a directed graph where a vertice
represents a variable and an edge from the vertex x to the vertex y indicates the presence of
an equation x ' u1 u2 in L where y is either of u1 and u2 . Figure 4.2 represents this graph for
the set of literals L from Ex. 12.
    Thus it is possible to use any algorithm that enumerates the cycles in a graph to find all
cyclic variables in L. For our prototype implementation, we used a naive algorithm but linear
algorithms are known, e.g. Tarjan’s strongly connected components algorithm [22]. Once the
cycles are detected, the additional constraints ¬Qi must be added to the encoding for each
cyclic variable as well as for all variables reachable from cyclic variables in the dependency
graph.7 The later is required because all subterms of a term in E cc must also be in E cc .
    The core encoding and the cycle-based constraints ensure that all the variables that must
be mapped to terms in E cc indeed are. However, the variables that can be mapped outside of
E cc may also have extra constraints that are not captured by what has been described so far.
Example 16. Consider Ex. 12, with the addition of ¬Qx to the encoding. A model of the
resulting formula is {Px,[a] , Py,[f ] , Py0 ,[f 0 ] , Qz }. This does not tell us anything about v, which
means that this variable could be mapped to any (type-compatible) class. However, mapping v
to [h] does not lead to a valid solution. This would force z to be equal both to g f and to h f ,
although [g f ] 6= [h f ], which is impossible.
   Further constraints are required on the variables that can be mapped outside of E cc . We
denote those variables as floaters.

4.3      Floater-based constraints
For a variable x, being a floater means it occurs on the left-hand side of an equation in L and
that [x] 6∈ E cc . It follows from the latter that any equation x ' u1 u2 ∈ L where x is a floater
must either be a tautology or hold by congruence. In both cases, it means that if another
equation of the same form x ' u01 u02 also occurs in L, necessarily [u1 ] = [u01 ] and [u2 ] = [u02 ].
Of course, before running the SAT solver, we don’t know exactly which variables are floaters,
so we have to encode these constraints conditionally. If there are n equalities x ' u1j u2j in
L where j ∈ {1..n}, x is not cyclic nor occurs in a disequation, and if n > 1, the previously
described constraints can be expressed as:
                             Qx ⇒ (Qu11 ≡ · · · ≡ Qu1n )            and Qx ⇒ (Qu21 ≡ · · · ≡ Qu2n );
                    cc
       for [t] ∈ E , Qx ⇒ (Pu11 ,[t] ≡ · · · ≡ Pu1n ,[t] )          and Qx ⇒ (Pu21 ,[t] ≡ · · · ≡ Pu2n ,[t] ).
     7 Alternatively, all occurrences of Q
                                             i for relevant is are set to false and the formula is simplified accordingly.


10
Lifting CCFV to λfHOL via SAT encoding (WIP)                           Tourret, Fontaine, El Ouraoui and Barbosa


Note that as soon as one of the uij is ground for i ∈ {1, 2} and j ∈ {1..n}, the corresponding
Quij is false and the truth values of the Puij ,[t] s are also known beforehand for all [t] ∈ E cc ,
which simplifies and significantly narrows the constraints.
Example 17. Consider again Ex. 12, the only floater-based constraints to add originate from
the two equations involving z: z ' g y and z ' v f . In their simplified form, because f and
g are ground, they are Qz ⇒ ¬Qv , Qz ⇒ Pv,[g] , Qz ⇒ ¬Pv,[h] , Qz ⇒ ¬Qy , Qz ⇒ Py,[f ] ,
Qz ⇒ ¬Py,[f 0 ] .
    Thanks to the addition of these constraints, the only model of the encoding is now M =
{Px,[a] , Py,[f ] , Py0 ,[f 0 ] , Qz , Pv,[g] }. The substitution σ = {x 7→ a, y 7→ f, y 0 7→ f 0 , z 7→ g f, v 7→ g}
is a solution of the problem. It satisfies all the constraints in M.
Conjecture 18. Given two sets of equational literals E and L such that E is ground, the
E-ground (dis)unification problem has a solution if and only if the encoding of this problem is
satisfiable. The proof of this central result is still work in progress.


5     Reconstruction of a solution from a SAT model
Let E and L form an E-ground (dis)unification problem. Let Lpre be a preprocessed version of
L and Ltriv be the set of trivial assignments that were removed from L during the preprocessing.
Assume that the encoding of the problem is satisfiable and let M be a model of this encoding.
Let tr denote the representative of [t]. To build a ground substitution σ such that E |= Lσ,
we proceed iteratively, starting with the variables in E cc and those that do not depend on any
other variables, then going backward through the variable dependency graph of Lpre ∪ Ltriv ,
until all variables are mapped to ground terms. In more details:
    • σ0 = {x 7→ tr | M |= Px,[t] }, where x occurs in Lpre .
    • σ1 = σ0 ◦ σ 0 such that, for any variable x not grounded by σ0 and that only occurs on
      the right-hand side of equations in Lpre ∪ Ltriv , σ 0 maps x : τ to tτ r where tτ denotes a
      default term of the appropriate type.
    • The mapping of all other variables must be built iteratively. For i > 1 let Ii = {(x, t) |
      x ' t ∈ (Lpre ∪ Ltriv )σi−1 } and let σi = σi−1 ◦ {x 7→ tr | (x, t) ∈ Ii }. The process
      terminates as soon as all variables have been assigned and the final result is σ.
Example 19. For the problem in Ex. 12, considering the model M given in Ex. 17, σ0 = {x 7→
a, y 7→ f, y 0 7→ f 0 , v 7→ g} is determined looking at all the Px,[t] in M. Then σ1 = σ0 because
there are no variables matching the criterion. Finally, one iteration of the last step is enough
to obtain σ = σ0 ◦ {z 7→ f g} that is a ground substitution such that E |= Lσ.
The solution obtained may not satisfy all the [x] 6∈ E cc constraints. They will only be satisfied
if the variables mapped in the second step are assigned default values outside E cc . In practice,
in the context of SMT, we may want to do the exact inverse and map as many of these variables
as possible inside of E cc .
    Another noteworthy point concerning this second step is that there must be only one unique
default value for each type, or at least one default class. To illustrate this, consider a problem
where x ' f y, x ' f z ∈ Lpre , but neither of the three variables occur anywhere else. If y and
z are mapped to two terms that do not belong to the same E-congruence class, the resulting
substitution will not be a solution of the problem.
Conjecture 20. Let E and L form an E-ground (dis)unification problem that admits a so-
lution. Then the substitution σ constructed following the reconstruction method described in
this section is a ground solution of the problem.

                                                                                                                  11
Lifting CCFV to λfHOL via SAT encoding (WIP)                Tourret, Fontaine, El Ouraoui and Barbosa


    It is possible to get a set of distinct solutions by computing all the models of the encoding
instead of just one and then extracting solutions from all of them. This can be done in an
iterative way by adding to the encoding the clause that is the negation of the model previously
found to obtain a different model, until the formula becomes unsatisfiable.
    Note that, for several reasons, there is no one-to-one correspondance between the models
of the SAT encoding and the solutions of the E-ground (dis)unification problem. First, we can
construct a ground substitution from every model, but there are also non-ground ones. Second,
even if we restrict the search to ground substitutions, if there are non-singleton classes, there is
a whole family of substitutions that correspond to one model since the reconstructed solution
always maps a variable to the representative of a class, when it could really be mapped to any
element of the class. Finally, even if we restrict the search to substitutions that map variables
only to the representatives of classes, if there are floaters in the problem, in some cases they
can be mapped to any class of the right type as in, e.g., E = ∅ and L = {x ' y z}.


6    Ongoing and future work
A first-order version of CCFV with support for λfHO terms encoded with the applicative
encoding [4] is already implemented in the λfHOL extension of the lightweight SMT solver
veriT [11]. It has a term-indexing issue because terms are indexed by the symbol at their head
and the applicative symbol occurs at the head of all functional symbols. Thus, contrarily to
what happens in FOL where the head symbol significantly restricts the mapping possibilities,
here only types can be exploited to retrieve suitable terms, which can lead to many more terms
being retrieved at once in comparison with the first-order case. The present approach does not
have this problem because it does not rely on the applicative encoding at all.
    A prototype is under development in veriT. However, model reconstruction is not yet opera-
tional. As soon as it is, we will be able to compare our approach with the applicative version of
CCFV. Given that the later, used by trigger- and conflict-based instantiation, currently takes
around 40% of the overall SMT computation time in the benchmarks used by Barbosa et al. [4],
we expect that any improvement on the efficiency of CCFV will lead to an important speedup in
the SMT process, but it remains to be observed in practice if this is the case for our approach.
    Experiments notwithstanding, there is much that can be improved in our prototype regard-
ing the data-structures as well as the algorithms for the preprocessing and the encoding phases.
In particular, cycle detection is currently implemented with a naive algorithm in O(n3 ). It will
eventually be replaced by Tarjan’s linear algorithm. Furthermore, the encoding itself could be
improved. Each constraint Cx6'u and Cu0 'u1 u2 imposes that variables belong to some classes
among all, and indirectly that they do not belong to the remaining classes. This information
could be propagated to other constraints and lead to further narrowing of the set of possible
classes for some variables. By starting with the most restrictive constraints, one might be able
to significantly restrict the size of the constraints, the number of propositional variables, and
also the number of necessary mutual exclusion constraints. An incremental encoding to SAT,
as well as a heuristic to guide the SAT solver so that it starts its search on the most restricted
variables are also on our list of ideas to be investigated.


7    Acknowledgments
We are grateful to Jasmin Blanchette for many discussions throughout the development of this
work, for providing funding for research visits and for suggesting many improvements. Experi-

12
Lifting CCFV to λfHOL via SAT encoding (WIP)                   Tourret, Fontaine, El Ouraoui and Barbosa


ments were carried out using the Grid’5000 testbed (https://www.grid5000.fr/), supported
by a scientific interest group hosted by Inria and including CNRS, RENATER, and several
universities as well as other organizations.


References
 [1] L. Bachmair and H. Ganzinger. Rewrite-based equational theorem proving with selection and
     simplification. Journal of Logic and Computation, 4(3):217–247, 1994.
 [2] H. Barbosa. New techniques for instantiation and proof production in SMT solving. PhD thesis,
     Université de Lorraine, Universidade Federal do Rio Grande do Norte, 2017.
 [3] H. Barbosa, P. Fontaine, and A. Reynolds. Congruence closure with free variables. In A. Legay and
     T. Margaria, editors, Tools and Algorithms for Construction and Analysis of Systems (TACAS),
     Part II, volume 10206 of Lecture Notes in Computer Science, pages 214–230, 2017.
 [4] H. Barbosa, A. Reynolds, D. E. Ouraoui, C. Tinelli, and C. W. Barrett. Extending SMT solvers
     to higher-order logic. In P. Fontaine, editor, Proc. Conference on Automated Deduction (CADE),
     volume 11716 of Lecture Notes in Computer Science, pages 35–54. Springer, 2019.
 [5] C. Barrett, R. Sebastiani, S. Seshia, and C. Tinelli. Satisfiability modulo theories. In A. Biere,
     M. J. H. Heule, H. van Maaren, and T. Walsh, editors, Handbook of Satisfiability, volume 185 of
     FAIA, chapter 26, pages 825–885. IOS Press, 2009.
 [6] A. Bentkamp, J. Blanchette, S. Tourret, P. Vukmirović, , and U. Waldmann. Superposition with
     lambdas. In P. Fontaine, editor, Proc. Conference on Automated Deduction (CADE), Lecture
     Notes in Computer Science. (Accepted for publication). Springer, 2019.
 [7] A. Bentkamp, J. C. Blanchette, S. Cruanes, and U. Waldmann. Superposition for lambda-free
     higher-order logic. In D. Galmiche, S. Schulz, and R. Sebastiani, editors, Int. Joint Conference
     on Automated Reasoning (IJCAR), volume 10900 of Lecture Notes in Computer Science, pages
     28–46. Springer, 2018.
 [8] A. Bhayat and G. Reger. Set of support for higher-order reasoning. In B. Konev, J. Urban, and
     P. Rümmer, editors, Practical Aspects of Automated Reasoning (PAAR), volume 2162 of CEUR
     Workshop Proceedings, pages 2–16. CEUR-WS.org, 2018.
 [9] A. Bhayat and G. Reger. Restricted combinatory unification. In P. Fontaine, editor, Proc. Con-
     ference on Automated Deduction (CADE), volume 11716 of Lecture Notes in Computer Science,
     pages 74–93. Springer, 2019.
[10] J. C. Blanchette, C. Kaliszyk, L. C. Paulson, and J. Urban. Hammering towards QED. J.
     Formalized Reasoning, 9(1):101–148, 2016.
[11] T. Bouton, D. C. B. de Oliveira, D. Déharbe, and P. Fontaine. veriT: an open, trustable and
     efficient SMT-solver. In R. A. Schmidt, editor, CADE–22, volume 5663 of LNCS, pages 151–156.
     Springer, 2009.
[12] C. E. Brown. Satallax: an automatic higher-order prover. In B. Gramlich, D. Miller, and U. Sattler,
     editors, Int. Joint Conference on Automated Reasoning (IJCAR), volume 7364 of LNCS, pages
     111–117. Springer, 2012.
[13] L. de Moura and N. Bjørner. Efficient e-matching for SMT solvers. In F. Pfenning, editor, CADE–
     21, volume 4603 of LNCS, pages 183–198. Springer, 2007.
[14] D. Detlefs, G. Nelson, and J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM,
     52:365–473, 2005.
[15] Y. Ge and L. de Moura. Complete instantiation for quantified formulas in satisfiabiliby modulo
     theories. In A. Bouajjani and O. Maler, editors, CAV 2009, volume 5643 of LNCS, pages 306–320.
     Springer, 2009.
[16] J. Meng and L. C. Paulson. Translating higher-order clauses to first-order clauses. Journal of
     Automated Reasoning, 40(1):35–60, 2008.

                                                                                                     13
Lifting CCFV to λfHOL via SAT encoding (WIP)                  Tourret, Fontaine, El Ouraoui and Barbosa


[17] R. Nieuwenhuis and A. Rubio. Paramodulation-based theorem proving. In A. Robinson and
     A. Voronkov, editors, Handbook of automated reasoning, volume 1, pages 371–443. Elsevier Science,
     2001.
[18] A. Reynolds, H. Barbosa, and P. Fontaine. Revisiting enumerative instantiation. In D. Beyer and
     M. Huisman, editors, Tools and Algorithms for Construction and Analysis of Systems (TACAS),
     Part II, volume 10806 of Lecture Notes in Computer Science, pages 112–131. Springer, 2018.
[19] A. Reynolds, C. Tinelli, and L. de Moura. Finding conflicting instances of quantified formulas in
     SMT. In FMCAD 2014, pages 195–202. IEEE, 2014.
[20] A. Reynolds, C. Tinelli, A. Goel, S. Krstić, M. Deters, and C. Barrett. Quantifier instantiation
     techniques for finite model finding in SMT. In M. P. Bonacina, editor, CADE–24, volume 7898 of
     LNCS, pages 377–391. Springer, 2013.
[21] A. Steen and C. Benzmüller. The higher-order prover Leo-III. In D. Galmiche, S. Schulz, and
     R. Sebastiani, editors, Int. Joint Conference on Automated Reasoning (IJCAR), volume 10900 of
     LNCS, pages 108–116. Springer, 2018.
[22] R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2):146–160,
     1972.
[23] P. Vukmirovic, J. C. Blanchette, S. Cruanes, and S. Schulz. Extending a brainiac prover to lambda-
     free higher-order logic. In T. Vojnar and L. Zhang, editors, Tools and Algorithms for Construction
     and Analysis of Systems (TACAS), Part I, volume 11427 of Lecture Notes in Computer Science,
     pages 192–210. Springer, 2019.


14