Computing Uniform Interpolants for EUF
via (conditional) DAG-based Compact Representations*

                    Silvio Ghilardi¹, Alessandro Gianola²,³, and Deepak Kapur⁴
                ¹ Dipartimento di Matematica, Università degli Studi di Milano (Italy)
                ² Faculty of Computer Science, Free University of Bozen-Bolzano (Italy)
                                     gianola@inf.unibz.it
                ³ CSE Department, University of California San Diego (USA)
                ⁴ Department of Computer Science, University of New Mexico (USA)


        Abstract. The concept of a uniform interpolant for a quantifier-free formula
        from a given formula with a list of symbols, while well-known in the logic litera-
        ture, has been unknown to the formal methods and automated reasoning commu-
        nity. This concept is precisely defined. Two algorithms for computing the uniform
        interpolant of a quantifier-free formula in EUF endowed with a list of symbols
        to be eliminated are proposed. The first algorithm is non-deterministic and generates
        a uniform interpolant expressed as a disjunction of conjunctions of literals,
        whereas the second algorithm gives a compact representation of a uniform inter-
        polant as a conjunction of Horn clauses. Both algorithms exploit efficient ded-
        icated DAG representations of terms. Correctness and completeness proofs are
        supplied, using arguments combining rewrite techniques with model theory.

        Keywords: Uniform Interpolation · SMT · Term rewriting · Model Theory.


1     Introduction
The theory of equality over uninterpreted symbols, henceforth denoted by EUF, is one
of the simplest theories that have found numerous applications in computer science,
formal methods and logic. Starting with the works of Shostak [26] and Nelson and Oppen
[23] in the early eighties, some of the first algorithms were proposed in the context
of developing approaches for combining decision procedures for quantifier-free theories
including freely constructed data structures and linear arithmetic over the rationals.
EUF was first exploited for hardware verification of pipelined processors by Dill [4]
and more widely subsequently in formal methods and verification using model checking
frameworks. With the popularity of SMT solvers, where EUF serves as a glue for
combining solvers for different theories, numerous new graph-based algorithms have
been proposed in the literature over the last two decades for checking unsatisfiability of
a conjunction of (dis)equalities of terms built using function symbols and constants.
    In [22], the use of interpolants for automatic invariant generation was proposed,
leading to a plethora of research activities to develop algorithms for generating
interpolants for specific theories as well as their combination. This new application is
different from the role of interpolants for analyzing proof theories of various logics, starting
with the pioneering work of [11,16,25] (for a recent survey in the SMT area, see [3,2]).
Approaches like [22,16,25], however, assume access to a proof of α → β for which
an interpolant is being generated. Given that there can in general be many interpolants,
including infinitely many for some theories, little is known about what kind of
interpolants are effective for different applications, even though some research has been
reported on the strength and quality of interpolants.

* Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
  License Attribution 4.0 International (CC BY 4.0).

     In this paper, a different approach is taken, motivated by the insight connecting in-
terpolating theories with those admitting quantifier-elimination, as advocated in [20].
Specifically, the preliminaries introduce the concept of a uniform interpolant (UI) defined by
a formula α, in the context of formal methods and verification, for EUF,
which is well known not to admit quantifier elimination. A uniform interpolant for a
formula α is in particular, for any formula β, an ordinary interpolant [11,21] for the
pair (α, β) such that α → β (as well as a reverse interpolant [22] for an unsatisfiable
pair (α, γ)).⁵ A uniform interpolant can be defined for theories irrespective of whether
they admit quantifier elimination; for theories admitting quantifier elimination, a uniform
interpolant can be obtained using quantifier elimination: indeed, this shows that a
theory enjoying quantifier elimination admits uniform interpolants as well. Then, a UI
is shown to exist for EUF and to be unique. A related concept of a cover is proposed in
[15] (see also [7,8]).
     In the current paper, two different algorithms for generating UIs from a formula in
EUF (with a list of symbols to be eliminated) are proposed with different characteristics.
They share a common subpart based on concepts used in a ground congruence
closure proposed in [17], which flattens the input and generates a canonical rewrite system
on constants along with unique rules of the form f(···), where f is an uninterpreted
symbol and the arguments (···) are canonical forms of constants. Further, eliminated
symbols are represented as a DAG to avoid any exponential blow-up.
     The first algorithm is non-deterministic where undecided equalities on constants
are hypothesized to be true or false, generating a branch in each case, and recursively
applying the algorithm. It could also be formulated as an algorithm similar in spirit
to the use of equality interpolants in the Nelson-Oppen framework for combination,
where different partitions on constants are tried, with each leading to a branch in the
algorithm. New symbols are introduced along each branch to avoid exponential blow-
up. The second algorithm generalizes the concept of a DAG to conditional DAG in
which subterms are replaced by new symbols under a conjunction of equality atoms,
resulting in its compact and efficient representation. A fully or partially expanded form
of a UI can be derived based on their use in applications. Because of their compact
representation, UIs can be kept of polynomial size for a large class of formulas.
     The former algorithm is tableaux-based and produces the output in disjunctive nor-
mal form, whereas the second algorithm is based on manipulation of Horn clauses and
gives the output in (compressed) conjunctive normal form. We believe that the two
algorithms are complementary to each other, especially from the point of view of
applications. Model checkers typically synthesize safety invariants using conjunctions
of clauses and in this sense they might benefit more from the second algorithm;
however, model checkers dually representing sets of backward reachable states as
disjunctions of cubes (i.e., conjunctions of literals) would better adopt the first algorithm.
Non-deterministic manipulations of cubes are also required to match certain PSPACE
lower bounds, as in the case of SAS systems mentioned in [9]. On the other hand,
regarding the overall complexity, it seems to be easier to avoid exponential blow-ups in
concrete examples by adopting the second algorithm.

⁵ The third author recently learned from the first author that this concept has been used
  extensively in logic for decades [14,24], to his surprise, since he had the erroneous impression
  that he came up with the concept in 2012, which he presented in a series of talks [18,19].

    The termination, correctness and completeness of both algorithms are proved
by using results in model theory about model completions; this relies on a basic result
(Lemma 2 below) taken from [7].
    Both our algorithms are simple, intuitive and easy to understand in contrast to other
algorithms in the literature. In fact, the algorithm from [7] requires the full saturation
of all the formulae deductively implied in a version of superposition calculus equipped
with ad hoc settings, whereas the main merit of our second algorithm is to show that
a very light form of completion is sufficient, thus simplifying the whole procedure and
getting seemingly better complexity results.⁶ The algorithm from [15] requires some
bug fixes (as pointed out in [7]) and the related completeness proof is still missing.
    The paper is structured as follows: in the next paragraph we discuss related
work on the use of UIs. In Section 2 we state the main problem, fix some notation, and
discuss DAG representations and congruence closure. In Sections 3 and 4, we respectively
give the two algorithms for computing uniform interpolants in EUF (correctness and
completeness of such algorithms are proved in Section 5). We conclude in Section 6.

Related work on the use of UIs. The use of uniform interpolants in model-checking
safety problems for infinite state systems was already mentioned in [15] and further
exploited in a recent research line on the verification of data-aware processes [6,5,9].
Model checkers need to explore the space of all reachable states of a system; a pre-
cise exploration (either forward starting from a description of the initial states or back-
ward starting from a description of unsafe states) requires quantifier elimination. The
latter is not always available or might have prohibitive complexity; in addition, it is
usually preferable to make over-approximations of reachable states both to avoid di-
vergence and to speed up convergence. One well-established technique for comput-
ing over-approximations consists in extracting interpolants from spurious traces, see
e.g. [22]. One possible advantage of uniform interpolants over ordinary interpolants is
that they do not introduce over-approximations and so abstraction/refinement cycles
are not needed in case they are employed (the precise reason for that goes through the
connection between uniform interpolants, model completeness and existentially closed
structures, see [9] for a full account). In this sense, computing uniform interpolants
has the same advantages and disadvantages as computing quantifier eliminations, with
two remarkable differences. The first difference is that uniform interpolants may be
available also in theories not admitting quantifier elimination (EUF being the typical
example); the second difference is that computing uniform interpolants may be tractable
when the language is suitably restricted, e.g. to unary function symbols (this was already
mentioned in [15], see also Remark 3 below). Restriction to unary function symbols
is sufficient in database-driven verification to encode primary and foreign keys [9]. It
is also worth noticing that, precisely by using uniform interpolants for this restricted
language, in [9] new decidability results have been achieved for interesting classes of
infinite state systems. Notably, such results are also operationally mirrored in the MCMT
[13] implementation since version 2.8.

⁶ Although we feel that some improvement is possible, the termination argument in [7] gives a
  double exponential bound, whereas we have a simple exponential bound for both algorithms
  (with optimal chances to keep the output polynomial in many concrete cases in the second
  algorithm).


2     Preliminaries

We adopt the usual first-order syntactic notions, including signature, term, atom,
(ground) formula; our signatures are always finite or countable and include equality.
Without loss of generality, only functional signatures, i.e. signatures whose only predicate
symbol is equality, are considered. A tuple ⟨x1, ..., xn⟩ of variables is compactly
represented as x. The notation t(x), φ(x) means that the term t, the formula φ has free
variables included in the tuple x. This tuple is assumed to be formed by distinct variables;
thus we stress that, when we write e.g. φ(x, y), we mean that the tuples x, y
are made of distinct variables that are also disjoint from each other. A formula is said to
be universal (resp., existential) if it has the form ∀x(φ(x)) (resp., ∃x(φ(x))), where φ is
quantifier-free. Formulae with no free variables are called sentences.
    From the semantic side, the standard notion of Σ -structure M is used: this is a pair
formed of a set (the ‘support set’, indicated as |M|) and of an interpretation function.
The interpretation function maps n-ary function symbols to n-ary operations on |M|
(in particular, constant symbols are mapped to elements of |M|). A free variables
assignment I on M extends the interpretation function by mapping also variables to
elements of |M|; the notion of truth of a formula in a Σ -structure under a free variables
assignment I is the standard one.
    It may be necessary to expand a signature Σ with a fresh name for every a ∈ |M|:
such an expanded signature is called Σ^{|M|}, and M is by abuse seen as a Σ^{|M|}-structure
itself by interpreting the name of a ∈ |M| as a (the name of a is directly indicated as a
for simplicity).
    A Σ -theory T is a set of Σ -sentences; a model of T is a Σ -structure M where all
sentences in T are true. We use the standard notation T |= φ to say that φ is true in all
models of T for every assignment to the variables occurring free in φ . We say that φ is
T -satisfiable iff there is a model M of T and an assignment to the variables occurring
free in φ making φ true in M.


2.1    Uniform Interpolants

Fix a theory T and an existential formula ∃e φ(e, z); call a residue of ∃e φ(e, z)
any quantifier-free formula θ(z, y) such that T |= ∃e φ(e, z) → θ(z, y) (equivalently,
such that T |= φ(e, z) → θ(z, y)). The set of residues of ∃e φ(e, z) is denoted as
Res(∃e φ(e, z)). A quantifier-free formula ψ(z) is said to be a T-uniform interpolant⁷
(or, simply, a uniform interpolant, abbreviated UI) of ∃e φ(e, z) iff ψ(z) ∈
Res(∃e φ(e, z)) and ψ(z) implies (modulo T) all the formulae in Res(∃e φ(e, z)). It is
immediately seen that UIs are unique (modulo T-equivalence). A theory T has uniform
quantifier-free interpolation iff every existential formula ∃e φ(e, z) has a UI.
Example 1. Consider the existential formula ∃e (f(e, z1) = z2 ∧ f(e, z3) = z4): it can be
shown that its EUF-uniform interpolant is z1 = z3 → z2 = z4.
Notably, if T has uniform quantifier-free interpolation, then it has ordinary quantifier-free
interpolation, in the sense that if we have T |= φ(e, z) → φ′(z, y) (for quantifier-free
formulae φ, φ′), then there is a quantifier-free formula θ(z) such that T |= φ(e, z) → θ(z)
and T |= θ(z) → φ′(z, y). In fact, if T has uniform quantifier-free interpolation, then the
interpolant θ is independent of φ′ (the same θ(z) can be used as interpolant for all
entailments T |= φ(e, z) → φ′(z, y), varying φ′). Uniform quantifier-free interpolation
has a direct connection to an important notion from classical model theory, namely
model completeness (see [7] for more information).

2.2   Problem Statement
In this paper the problem of computing UIs for the case in which T is pure identity
theory in a functional signature Σ is considered; this theory is called EUF(Σ) or just
EUF in the SMT-LIB2 terminology. Two different algorithms are proposed for that
(while proving correctness and completeness of such algorithms, it is simultaneously
shown that UIs exist in EUF). The first algorithm computes a UI in disjunctive normal
form format, whereas the second algorithm supplies a UI in conjunctive normal form
format. Both algorithms use suitable DAG-compressed representation of formulae.
    The following notation is used throughout the paper. Since it is easily seen that
existential quantifiers commute with disjunctions, it is sufficient to compute UIs for
primitive formulae, i.e. for formulae of the kind ∃e φ (e, z), where φ is a constraint, i.e.
a conjunction of literals. We partition all the 0-ary symbols from the input as well as
symbols newly introduced into disjoint sets. We use the following conventions:
  - e = e0, ..., eN (with N integer) are symbols to be eliminated, called variables,
  - z = z0, ..., zM (with M integer) are symbols not to be eliminated, called parameters,
  - symbols a, b, ... stand for both variables and parameters.
In the following we will also use symbols y for indicating variables that changed their
status and do not need to be eliminated anymore: we use symbols a, b, . . . for them as
well. Variables e are usually skolemized during the manipulations of our algorithms and
proofs below, in the sense that they have to be considered as fresh individual constants.
Remark 1. UI computations eliminate symbols which are existentially quantified vari-
ables (or skolemized constants). Elimination of function symbols can be reduced to
elimination of variables in the following way. Consider a formula $\exists f\, \phi(f, z)$, where $\phi$
is quantifier-free. Successively abstracting out functional terms, we get that $\exists f\, \phi(f, z)$
is equivalent to a formula of the kind $\exists e\, \exists f\, (\bigwedge_i (f(t_i) = e_i) \wedge \psi)$, where the $e$ are fresh
variables (with $e_i \in e$), the $t_i$ are tuples of terms, $f$ does not occur in $t_i$, $e_i$, $\psi$, and $\psi$ is quantifier-free.
The latter is semantically equivalent to $\exists e\, (\bigwedge_{i \neq j} (t_i = t_j \to e_i = e_j) \wedge \psi)$, where $t_i = t_j$
is the conjunction of the component-wise equalities of the tuples $t_i$ and $t_j$.
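
The reduction of Remark 1 can be traced on a small instance. The formula below is an illustrative assumption, not an example taken from the paper: a unary $f$ is eliminated from $f(z_1) = z_2 \wedge f(z_3) = z_4$, with $e_1, e_2$ the fresh variables naming the two functional terms. The two symmetric implications for $(i, j) = (1, 2)$ and $(2, 1)$ are merged into one.

```latex
\begin{align*}
\exists f\,\bigl(f(z_1) = z_2 \wedge f(z_3) = z_4\bigr)
 &\equiv \exists e_1 \exists e_2\, \exists f\,
    \bigl(f(z_1) = e_1 \wedge f(z_3) = e_2 \wedge e_1 = z_2 \wedge e_2 = z_4\bigr) \\
 &\equiv \exists e_1 \exists e_2\,
    \bigl((z_1 = z_3 \to e_1 = e_2) \wedge e_1 = z_2 \wedge e_2 = z_4\bigr) \\
 &\equiv (z_1 = z_3 \to z_2 = z_4)
\end{align*}
```

The last step simply substitutes $z_2$ for $e_1$ and $z_4$ for $e_2$; in general the remaining variables $e$ still have to be eliminated by a UI computation.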
⁷ In some literature [15,7] uniform interpolants are called covers.

2.3   Flat Literals, DAGs and Congruence Closure
A flat literal is a literal of one of the following kinds
                              f(a1, ..., an) = b,   a1 = a2,   a1 ≠ a2                   (1)
where a1, ..., an and b are (not necessarily distinct) variables or constants. A formula
is flat iff all literals occurring in it are flat; flat terms are terms that may occur in a flat
literal (i.e. terms like those appearing in (1)).
     We call a DAG-definition (or simply a DAG) any formula $\delta(y, z)$ of the following
form (where $y := y_1, \ldots, y_n$): $\bigwedge_{i=1}^{n} (y_i = f_i(y_1, \ldots, y_{i-1}, z))$. Thus, $\delta(y, z)$ provides an
explicit definition of the $y$ in terms of the parameters $z$.
     Given a DAG δ, we can in fact associate to it the substitution σδ recursively defined
by the mapping (yi)σδ := fi((y1)σδ, ..., (yi−1)σδ, z). DAGs are commonly used to
represent formulae and substitutions in compressed form: in fact a formula like
                                          ∃y (δ(y, z) ∧ Φ(y, z))                                (2)
is equivalent to Φ((y)σδ, z), and is called a DAG-representation. The formula
Φ((y)σδ, z) is said to be the unravelling of (2): notice that computing such an unravelling
in uncompressed form by explicitly performing substitutions can cause an exponential
blow-up. This is why we shall systematically prefer DAG-representations (2) to their
uncompressed forms.
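
The blow-up caused by unravelling can be observed on a tiny example. The following Python sketch is only an illustration under our own assumptions: terms are encoded as nested tuples, a DAG as a dictionary from defined variables to their definitions, and `chain_dag` builds a hypothetical DAG in which each y_i uses y_{i-1} twice. None of these names come from the paper.

```python
# Terms are nested tuples ("f", arg1, ..., argn); variables and parameters are strings.
# A DAG is a dict mapping each defined variable y_i to its defining term.

def unravel(dag, term):
    """Apply the substitution induced by `dag` to `term` (no sharing is kept)."""
    if isinstance(term, str):
        # A defined variable is replaced by the unravelling of its definition.
        return unravel(dag, dag[term]) if term in dag else term
    head, *args = term
    return (head, *(unravel(dag, a) for a in args))

def size(term):
    """Number of symbol occurrences in a term written out as a tree."""
    if isinstance(term, str):
        return 1
    return 1 + sum(size(a) for a in term[1:])

def chain_dag(n):
    """Hypothetical DAG: y1 = f(z, z) and y_i = f(y_{i-1}, y_{i-1}) for i > 1."""
    dag = {"y1": ("f", "z", "z")}
    for i in range(2, n + 1):
        dag[f"y{i}"] = ("f", f"y{i-1}", f"y{i-1}")
    return dag

dag = chain_dag(10)
compressed = sum(size(t) for t in dag.values())   # grows linearly with n
expanded = size(unravel(dag, "y10"))              # grows like 2**(n + 1)
print(compressed, expanded)
```

The compressed representation has three symbols per definition, while the unravelled term for y_n has 2^(n+1) − 1 symbols in this particular family.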
    As stated above, our main aim is to compute the UI of a primitive formula ∃e φ(e, z);
using trivial logical manipulations (that have just linear complexity costs), it can be
shown that, without loss of generality, the constraint φ(e, z) can be assumed to be flat.
To do so, it is sufficient to perform a preprocessing procedure by applying well-known
Congruence Closure Transformations: the reader is referred to [17] for a full account.
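
A minimal sketch of the flattening step may help fix ideas. It is not the procedure of [17]: it only names every non-constant subterm with a fresh constant, sharing identical flat terms, and it leaves the resulting equalities between constants to be handled by congruence closure. The tuple encoding of terms and the names `flatten_term` and `flatten_equation` are assumptions made for this illustration.

```python
from itertools import count

def flatten_term(term, defs, fresh):
    """Return a constant naming `term`, recording flat equations in `defs`."""
    if isinstance(term, str):
        return term                      # constants are already flat
    head, *args = term
    flat = (head, *(flatten_term(a, defs, fresh) for a in args))
    if flat not in defs:                 # share identical flat terms
        defs[flat] = f"e{next(fresh)}"   # fresh constant, to be eliminated later
    return defs[flat]

def flatten_equation(lhs, rhs, start=1):
    """Flatten lhs = rhs into flat literals f(a1, ..., an) = e plus one equality a = b."""
    defs, fresh = {}, count(start)
    a = flatten_term(lhs, defs, fresh)
    b = flatten_term(rhs, defs, fresh)
    return list(defs.items()), (a, b)

# f(z2, e0) = g(z3, e0), as in the input formula of Example 2
literals, equality = flatten_equation(("f", "z2", "e0"), ("g", "z3", "e0"))
print(literals, equality)
```

On this input the sketch introduces two names and the equality e1 = e2, whereas the flattening shown in Example 2 uses a single name e1 for both terms; the two outputs are equivalent once e2 is replaced by e1.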


3     The Tableaux Algorithm
The algorithm proposed in this section is tableaux-like. It manipulates formulae in the
following DAG-primitive format
                               ∃y (δ (y, z) ∧ Φ(y, z) ∧ ∃e Ψ (e, y, z))                          (3)
where δ (y, z) is a DAG and Φ,Ψ are flat constraints (notice that the e do not occur in
Φ). We call a formula of that format a DAG-primitive formula. To make reading easier,
we shall omit in (3) the existential quantifiers, so that (3) will be written simply as
                                    δ (y, z) ∧ Φ(y, z) ∧Ψ (e, y, z) .                            (4)
    Initially the DAG δ and the constraint Φ are the empty conjunction. In the DAG-
primitive formula (4), variables z are called parameter variables, variables y are called
(explicitly) defined variables and variables e are called (truly) quantified variables. Vari-
ables z are never modified; in contrast, during the execution of the algorithm it could
happen that some quantified variables may disappear or become defined variables (in
the latter case they are renamed: a quantified variable ei becoming defined is renamed
as yj, for a fresh yj). Below, letters a, b, ... range over e ∪ y ∪ z.

Definition 1. A term t (resp. a literal L) is e-free when there is no occurrence of any of
the variables e in t (resp. in L). Two flat terms t, u of the kinds

                         t := f (a1 , . . . , an )   u := f (b1 , . . . , bn )            (5)

are said to be compatible iff for every i = 1, ..., n, either ai is identical to bi or both
ai and bi are e-free. The difference set of two compatible terms as above is the set of
disequalities ai ≠ bi, where ai is not identical to bi.
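
Definition 1 translates directly into code. The sketch below assumes flat terms are encoded as tuples `(f, a1, ..., an)` of strings and that the set of quantified variables is passed explicitly as `e_vars`; the function names are ours.

```python
def compatible(t, u, e_vars):
    """Definition 1: same symbol and arity; arguments are identical or both e-free."""
    if t[0] != u[0] or len(t) != len(u):
        return False
    return all(a == b or (a not in e_vars and b not in e_vars)
               for a, b in zip(t[1:], u[1:]))

def difference_set(t, u):
    """Pairs (ai, bi) standing for the disequalities ai != bi with ai not identical to bi."""
    return {(a, b) for a, b in zip(t[1:], u[1:]) if a != b}

E = {"e0", "e1", "e2"}
t, u = ("g", "z4", "e0"), ("g", "z3", "e0")
print(compatible(t, u, E), difference_set(t, u))
```

For the two g-terms of Example 2 the terms are compatible and the difference set contains the single disequality z4 ≠ z3.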


3.1   The Algorithm

Our algorithm applies the transformations below (except the last one) in a “don’t care”
non-deterministic way. The last transformation has lower priority and splits the execu-
tion of the algorithm in several branches: each branch will produce a different disjunct
in the output formula. Each state of the algorithm is a DAG-primitive formula like (4).
We now provide the rules that constitute our ‘tableaux-like’ algorithm.

(1) Simplification Rules:
    (1.0) If an atom like t = t belongs to Ψ, just remove it; if a literal like t ≠ t occurs
          somewhere, delete Ψ, replace Φ with ⊥ and stop.
    (1.i) If t is not a variable and Ψ contains both t = a and t = b, remove the latter
          and replace it with b = a.
    (1.ii) If Ψ contains ei = ej with i > j, remove it and replace everywhere ei by ej.
(2) DAG Update Rule: if Ψ contains ei = t(y, z), remove it, rename everywhere ei as
    yj (for fresh yj) and add yj = t(y, z) to δ(y, z). More formally:

                        δ(y, z) ∧ Φ(y, z) ∧ (Ψ(e, ei, y, z) ∧ ei = t(y, z))
                                                   ⇓
                        (δ(y, z) ∧ yj = t(y, z)) ∧ Φ(y, z) ∧ Ψ(e, yj, y, z)

(3) e-Free Literal Rule: if Ψ contains a literal L(y, z), move it to Φ(y, z). More formally:

                        δ(y, z) ∧ Φ(y, z) ∧ (Ψ(e, y, z) ∧ L(y, z))
                                                   ⇓
                        δ(y, z) ∧ (Φ(y, z) ∧ L(y, z)) ∧ Ψ(e, y, z)

(4) Splitting Rule: If Ψ contains a pair of atoms t = a and u = b, where t and u are
    compatible flat terms like in (5), and no disequality from the difference set of t, u
    belongs to Φ, then non-deterministically apply one of the following alternatives:
    (4.0) remove from Ψ the atom f(b1, ..., bn) = b, add to Ψ the atom b = a and add
        to Φ all equalities ai = bi such that ai ≠ bi is in the difference set of t, u;
    (4.1) add to Φ one of the disequalities from the difference set of t, u (notice that
        the difference set cannot be empty, otherwise Rule (1.i) applies).
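
The branching behaviour of the Splitting Rule can be sketched as a function returning the successor states. This is an illustration only: literals are encoded as tuples `("eq" | "neq", lhs, rhs)`, Φ and Ψ as frozensets, the DAG δ is left out because Rule (4) does not touch it, and the applicability conditions (compatibility, no disequality of the difference set already in Φ) are assumed to have been checked by the caller.

```python
def split(phi, psi, atom_t, atom_u):
    """Successor states (phi, psi) of Rule (4) for the atoms t = a and u = b."""
    (_, t, a), (_, u, b) = atom_t, atom_u
    diff = [(x, y) for x, y in zip(t[1:], u[1:]) if x != y]
    # (4.0): drop u = b, add b = a to psi and every equality ai = bi to phi.
    psi0 = (psi - {atom_u}) | {("eq", b, a)}
    phi0 = phi | {("eq", x, y) for x, y in diff}
    branches = [(phi0, psi0)]
    # (4.1): one branch for each disequality ai != bi that may be added to phi.
    branches += [(phi | {("neq", x, y)}, psi) for x, y in diff]
    return branches

atom_t = ("eq", ("g", "z4", "e0"), "z0")
atom_u = ("eq", ("g", "z3", "e0"), "e1")
for phi, psi in split(frozenset(), frozenset({atom_t, atom_u}), atom_t, atom_u):
    print(sorted(phi), sorted(psi, key=repr))
```

With the two g-atoms of Example 2 this yields one branch where z4 = z3 is assumed and e1 = z0 is derived, and one branch where z4 ≠ z3 is recorded in Φ.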

When no rule is applicable any more, delete Ψ(e, y, z) from the resulting formula
                                   δ (y, z) ∧ Φ(y, z) ∧Ψ (e, y, z)
so as to obtain for any branch an output formula in DAG-representation of the kind
                                     ∃y (δ (y, z) ∧ Φ(y, z)) .
   The following proposition states that, by applying the previous rules, termination is
always guaranteed.
Proposition 1. The non-deterministic procedure presented above always terminates.
Proof. It is sufficient to show that every branch of the algorithm must terminate. In
order to prove that, first observe that the total number of the variables involved never
increases and it decreases if (1.ii) is applied (it might decrease also by the effect of
(1.0)). Whenever such a number does not decrease, there is a bound on the number
of inequalities that can occur in Ψ, Φ. Now transformation (4.1) decreases the number
of inequalities that are actually missing; the other transformations do not increase this
number. Finally, all transformations except (4.1) reduce the length of Ψ.             ⊣
     The following remark will be useful to prove the correctness of our algorithm, since
it gives a description of the kind of literals contained in a state triple that is terminal
(i.e., when no rule applies).
Remark 2. Notice that if no transformation applies to (3), the set Ψ can only contain
inequalities of the kind ei ≠ a, together with equalities of the kind f(a1, ..., an) = a.
However, when it contains f(a1, ..., an) = a, one of the ai must belong to e (otherwise
(2) or (3) applies). Moreover, if f(a1, ..., an) = a and f(b1, ..., bn) = b are both in Ψ,
then either they are not compatible or ai ≠ bi belongs to Φ for some i and for some
variables ai, bi not in e (otherwise (4) or (1.i) applies).
Remark 3. The complexity of the above algorithm is exponential; however, the complexity
of producing a single branch is quadratic. Notice that if function symbols are
all unary, there is no need to apply Rule (4), hence for this restricted case computing UIs
is a tractable problem. The case of unary functions has relevant applications in
database-driven verification [9,6,5] (where unary function symbols are used to encode primary
and foreign keys).
Example 2. Let us compute the UI of the formula ∃e0 (g(z4, e0) = z0 ∧ f(z2, e0) =
g(z3, e0) ∧ h(f(z1, e0)) = z0). Flattening gives the set of literals
       g(z4, e0) = z0 ∧ e1 = f(z2, e0) ∧ e1 = g(z3, e0) ∧ e2 = f(z1, e0) ∧ h(e2) = z0   (6)
where the newly introduced variables e1, e2 need to be eliminated too. Applying (4.0)
removes g(z3, e0) = e1 and introduces the new equalities z3 = z4, e1 = z0. This causes
e1 to be renamed as y1 by (2). Applying again (4.0) removes f(z1, e0) = e2 and adds the
equalities z1 = z2, e2 = y1; moreover, e2 is renamed as y2. To the literal h(y2) = z0 we
can apply (3). The branch terminates with y1 = z0 ∧ y2 = y1 ∧ z1 = z2 ∧ z3 = z4 ∧ h(y2) =
z0 ∧ f(z2, e0) = y1 ∧ g(z4, e0) = z0. This produces z1 = z2 ∧ z3 = z4 ∧ h(z0) = z0 as a
first disjunct of the uniform interpolant. The other branches produce z1 = z2 ∧ z3 ≠ z4,
z1 ≠ z2 ∧ z3 = z4 and z1 ≠ z2 ∧ z3 ≠ z4 as further disjuncts, so that the UI turns out to be
equivalent (by trivial logical manipulations) to z1 = z2 ∧ z3 = z4 → h(z0) = z0.
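
The "trivial logical manipulations" at the end of Example 2 can be double-checked by a truth table. The sketch below abstracts the three atoms z1 = z2, z3 = z4 and h(z0) = z0 as independent Boolean variables `a`, `b`, `c` (an assumption that is harmless here, since propositional equivalence implies equivalence modulo EUF) and compares the disjunction of the four branch outputs with the final implication.

```python
from itertools import product

def dnf(a, b, c):
    """Disjunction of the four branch outputs of Example 2."""
    return (a and b and c) or (a and not b) or (not a and b) or (not a and not b)

def horn(a, b, c):
    """The implication z1 = z2 and z3 = z4 -> h(z0) = z0."""
    return (not (a and b)) or c

equivalent = all(dnf(*v) == horn(*v) for v in product([False, True], repeat=3))
print(equivalent)
```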

4     The Conditional Algorithm
This section discusses a new algorithm with the objective of generating a compact
representation of the UI in EUF: this representation avoids splitting and is based on
conditions in Horn clauses generated from literals whose left sides have the same function
symbol. A by-product of this approach is that the size of the output UI often can be kept
polynomial. Further, the output of this algorithm generates the UI of ∃e φ (e, z) (where
φ (e, z) is a conjunction of literals and e = e0 , . . . , eN , z = z0 , . . . , zM , as usual) in con-
junctive normal form as a conjunction of Horn clauses (we recall that a Horn clause is
a disjunction of literals containing at most one positive literal). Toward this goal, a new
data structure of a conditional DAG, a generalization of a DAG, is introduced so as to
maximize sharing of sub-formulas.
    Using the core preprocessing procedure explained in Subsection 2.3, it is assumed
that $\phi$ is the conjunction $\bigwedge S_1$, where $S_1$ is a set of flat literals containing only literals of
the following two kinds:
                                      $f(a_1, \ldots, a_h) = a$                                      (7)
                                      $a \neq b$                                                     (8)
(recall that we use letters a, b, ... for elements of e ∪ z). In addition we can assume that
variables in e must occur in (8) and in the left side of (7). We do not include equalities
like a = b because they can be eliminated by replacement.

4.1    The Algorithm
The algorithm requires two steps in order to get a set of clauses representing the output
in a suitably compressed format.
    Step 1. Out of every pair of literals $f(a_1, \ldots, a_h) = a$ and $f(a'_1, \ldots, a'_h) = a'$ of the
kind (7) (where $a \not\equiv a'$), we produce the Horn clause
                                  $a_1 = a'_1, \ldots, a_h = a'_h \to a = a'$                               (9)
which can be further simplified by deleting identities in the antecedent. Let us call $S_2$
the set of clauses obtained from $S_1$ by adding to it these new Horn clauses.
      Step 2. We saturate $S_2$ with respect to the following rewriting rule

                  $\dfrac{\Gamma \to e_j = e_i \qquad C}{\Gamma \to C[e_i]_p}$

where $j > i$, $C[e_i]_p$ means the result of the replacement of $e_j$ by $e_i$ in the position $p$ of
the clause $C$ and $\Gamma \to C[e_i]_p$ is the clause obtained by merging $\Gamma$ with the antecedent
of the clause $C[e_i]_p$.
    Notice that we apply the rewriting rule only to conditional equalities of the kind
Γ → e j = ei : this is because clauses like Γ → e j = zi are considered ‘conditional defi-
nitions’ (and the clauses like Γ → z j = zi as ‘conditional facts’).
    We let S3 be the set of clauses obtained from S2 by saturating it with respect to the
above rewriting rule, by removing from antecedents identical literals of the kind a = a
and by removing subsumed clauses.
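
To see one application of the rewriting rule of Step 2 in isolation, consider two hypothetical clauses (chosen for illustration, they do not occur in the examples of this paper): the conditional equality $z_1 = z_2 \to e_2 = e_1$ and the clause $C$ given by $e_2 = z_3 \to z_4 = z_5$. Here $\Gamma$ is $z_1 = z_2$, we have $j = 2 > i = 1$, and $p$ is the position of $e_2$ in the antecedent of $C$.

```latex
\[
\frac{z_1 = z_2 \to e_2 = e_1 \qquad\quad e_2 = z_3 \to z_4 = z_5}
     {z_1 = z_2 \wedge e_1 = z_3 \to z_4 = z_5}
\]
```

The conclusion is obtained by replacing $e_2$ with $e_1$ in $C$ and then merging $\Gamma$ with the antecedent of the rewritten clause.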

Example 3. Let $S_1$ be the set of the following literals

        $f_1(e_0, z_1) = e_1$,   $f_1(e_0, z_2) = z_3$,   $f_2(e_0, z_4) = e_2$,
        $f_2(e_0, z_5) = z_6$,   $g_1(e_0, e_1) = e_2$,   $g_1(e_0, z'_1) = z'_2$,
        $g_2(e_0, e_2) = e_1$,   $g_2(e_0, z''_1) = z''_2$,   $h(e_1, e_2) = z_0$

Step 1 produces the following set $S_2$ of Horn clauses

        $z_1 = z_2 \to e_1 = z_3$,   $z_4 = z_5 \to e_2 = z_6$,
        $e_1 = z'_1 \to e_2 = z'_2$,   $e_2 = z''_1 \to e_1 = z''_2$

Since there are no Horn clauses whose consequent is an equality of the kind $e_i = e_j$,
Step 2 does not produce further clauses and we have $S_3 = S_2$.
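
Step 1 is easy to mechanise. The following sketch uses an encoding of our own: a literal of kind (7) is a pair `(flat_term, rhs)`, a Horn clause is a pair `(antecedent, consequent)` of equalities between constants, and the primed parameters of Example 3 are written `z1p`, `z2p`, `z1pp`, `z2pp`. It returns only the newly generated clauses (9), with identities already deleted from the antecedent.

```python
from itertools import combinations

def step1(literals):
    """Horn clauses (9) generated from pairs of literals with the same function symbol."""
    clauses = []
    for (t, a), (u, b) in combinations(literals, 2):
        if t[0] != u[0] or len(t) != len(u) or a == b:
            continue  # different symbols, or right-hand sides already identical
        # Keep only the argument equalities that are not identities.
        antecedent = tuple((x, y) for x, y in zip(t[1:], u[1:]) if x != y)
        clauses.append((antecedent, (a, b)))
    return clauses

S1 = [
    (("f1", "e0", "z1"), "e1"), (("f1", "e0", "z2"), "z3"),
    (("f2", "e0", "z4"), "e2"), (("f2", "e0", "z5"), "z6"),
    (("g1", "e0", "e1"), "e2"), (("g1", "e0", "z1p"), "z2p"),
    (("g2", "e0", "e2"), "e1"), (("g2", "e0", "z1pp"), "z2pp"),
    (("h", "e1", "e2"), "z0"),
]
for antecedent, consequent in step1(S1):
    print(antecedent, "->", consequent)
```

On the literals of Example 3 the sketch produces exactly the four Horn clauses listed there.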

4.2     Conditional DAGs
In order to be able to extract the output UI in an uncompressed format out of the above
set of clauses S3 , we must identify all the ‘implicit conditional definitions’ it contains.
    Let w be an ordered subset of the e = {e1 , . . . , eN }: that is, in order to specify w we
must take a subset of the e and an ordering of this subset. Intuitively, these w will play
the role of placeholders inside a conditional definition.
    If we let $w$ be $w_1, \ldots, w_s$ (where, say, $w_i$ is some $e_{k_i}$ with $k_i \in \{1, \ldots, N\}$), we let
$L_i$ be the language restricted to $z$ and $w_1, \ldots, w_i$ (for $i \le s$): in other words, an $L_i$-term
or an $L_i$-clause may contain only terms built up from $z, w_1, \ldots, w_i$ by applying to them
function symbols. In particular, $L_s$ (also called $L_w$) is the language restricted to $z \cup w$.
We let $L_0$ be the language restricted to $z$.
    Given a set S of clauses and w as above, a w-conditional DAG δ (or simply a con-
ditional DAG δ ) built out of S is a set of Horn clauses from S

                                   Γ1 → w1 = t1 , . . . , Γs → ws = ts                         (10)

where $\Gamma_i$ is a finite tuple of $L_{i-1}$-atoms and $t_i$ is an $L_{i-1}$-term. Given a w-conditional DAG
δ we can define the formulae $\phi_\delta^i$ (for $i = 1, \dots, s+1$) as follows:

- $\phi_\delta^{s+1}$ is the conjunction of all $L_w$-clauses belonging to S;
- for $i \le s$, the formula $\phi_\delta^i$ is $\Gamma_i \to \forall w_i\,(w_i = t_i \to \phi_\delta^{i+1})$.

It can be seen that $\phi_\delta^i$ is equivalent to a quantifier-free $L_{i-1}$-formula;8 in particular, $\phi_\delta^1$
(abbreviated as $\phi_\delta$) is equivalent to a quantifier-free $L_0$-formula. The explicit computation
of such quantifier-free formulae may however produce an exponential blow-up.
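
The quantifier in each of these formulae can be eliminated by the standard equivalence between $\forall x\,(x = t \to \psi)$ and $\psi[t/x]$, which holds whenever x does not occur in t (here $t_i$ is an $L_{i-1}$-term, so $w_i$ does not occur in it). As an illustration, and not as part of the original development, the unfolding for s = 2 reads:

```latex
\begin{align*}
\phi_\delta^{3} &= \bigwedge \{\, C \in S \mid C \text{ is an } L_w\text{-clause} \,\} \\
\phi_\delta^{2} &\equiv \Gamma_2 \to \phi_\delta^{3}[t_2/w_2] \\
\phi_\delta^{1} &\equiv \Gamma_1 \to \bigl(\Gamma_2 \to \phi_\delta^{3}[t_2/w_2]\bigr)[t_1/w_1]
                 \equiv \Gamma_1 \wedge \Gamma_2[t_1/w_1] \to \phi_\delta^{3}[t_2/w_2][t_1/w_1]
\end{align*}
```

Each substitution may copy $t_i$ into several positions, which is plausibly where the blow-up of the explicit form originates.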

 8 It can be shown that such a formula can be turned, again up to equivalence, into a conjunction
   of Horn clauses.

Example 4. Let us analyze the conditional DAGs δ that can be extracted out of the set
S3 of the Horn clauses mentioned in Example 3 (we disregard those δ such that $\phi_\delta$ is
the empty conjunction $\top$). We can get formulae $\phi_{\delta_1}$ and $\phi_{\delta_2}$ that are not logically equivalent by
considering $\delta_1$ with $w_1 = e_1, e_2$ and conditional definitions $z_1 = z_2 \to e_1 = z_3$, $e_1 = z'_1 \to e_2 = z'_2$,
or $\delta_2$ with $w_2 = e_2, e_1$ and conditional definitions $z_4 = z_5 \to e_2 = z_6$, $e_2 = z''_1 \to e_1 = z''_2$.
In fact, $\phi_{\delta_1}$ is logically equivalent to

    $z_1 = z_2 \wedge z_3 = z'_1 \to \bigwedge S_3^{-0}[z_3/e_1, z'_2/e_2]$                    (11)

whereas φδ2 is logically equivalent to

    $z_4 = z_5 \wedge z_6 = z''_1 \to \bigwedge S_3^{-0}[z_6/e_2, z''_2/e_1]$                    (12)

where we used the notation $\bigwedge S_3^{-0}[z_3/e_1, z'_2/e_2]$ to mean the result of the substitution of
$e_1$ with $z_3$ and of $e_2$ with $z'_2$ in the conjunction of the S3-clauses not involving $e_0$ (a similar
notation is used for $\bigwedge S_3^{-0}[z_6/e_2, z''_2/e_1]$). A third possibility is to use the conditional
definitions $z_1 = z_2 \to e_1 = z_3$ and $z_4 = z_5 \to e_2 = z_6$ with (equivalently) either $w_1$ or $w_2$,
resulting in a conditional DAG $\delta_3$ with $\phi_{\delta_3}$ logically equivalent to

    $z_1 = z_2 \wedge z_4 = z_5 \to \bigwedge S_3^{-0}[z_3/e_1, z_6/e_2]$.                    (13)
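
The computation of a formula such as (11) from a conditional DAG can be sketched in a few lines. The sketch below is illustrative only: the encoding of terms as strings or nested tuples, the encoding of a conditional DAG as a list of triples `(Gamma_i, w_i, t_i)`, and the names `subst`, `subst_eq` and `unfold` are assumptions introduced here. It replaces each quantified block by a substitution and does not simplify the resulting clauses.

```python
# A variable or constant is a str; an application is a tuple (symbol, arg1, ..., argn).
# An equation is a pair (lhs, rhs); a Horn clause is (antecedent_equations, consequent).

def subst(term, var, repl):
    """Replace every occurrence of the variable `var` in `term` by `repl`."""
    if isinstance(term, str):
        return repl if term == var else term
    return (term[0],) + tuple(subst(arg, var, repl) for arg in term[1:])

def subst_eq(eq, var, repl):
    """Apply the substitution to both sides of an equation."""
    return (subst(eq[0], var, repl), subst(eq[1], var, repl))

def unfold(dag, lw_clauses):
    """Return the unfolded formula as a list of Horn clauses without placeholders.

    dag        -- list of (Gamma_i, w_i, t_i), in the order i = 1, ..., s
    lw_clauses -- the clauses over z and w (the conjuncts of the innermost formula)
    """
    result = []
    for antecedent, consequent in lw_clauses:
        antecedent = list(antecedent)
        # Work from i = s down to i = 1: the step
        #   Gamma_i -> forall w_i (w_i = t_i -> psi)
        # is replaced by the equivalent  Gamma_i -> psi[t_i / w_i].
        for gamma, w, t in reversed(dag):
            antecedent = list(gamma) + [subst_eq(eq, w, t) for eq in antecedent]
            consequent = subst_eq(consequent, w, t)
        result.append((antecedent, consequent))
    return result

# delta_1 of Example 4, applied to the literal h(e1, e2) = z0.
delta1 = [([("z1", "z2")], "e1", "z3"), ([("e1", "z1'")], "e2", "z2'")]
literal = ([], (("h", "e1", "e2"), "z0"))
unfolded = unfold(delta1, [literal])
```

On this input the result is the single clause with antecedent z1 = z2, z3 = z'1 and consequent h(z3, z'2) = z0, which has the shape of (11).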

      The next lemma (proved in [12]) shows the relevant property of φδ :

Lemma 1. For every set of clauses S and for every w-conditional DAG δ built out of S,
the formula $\bigwedge S \to \phi_\delta$ is logically valid.


Notice that it is not true that the conjunction of all possible $\phi_\delta$ (varying δ and w) implies
$\bigwedge S$: in fact, such a conjunction can be empty, for instance in case S is just $\{e_1 = e_2\}$.


4.3    Extraction of UI’s

We shall prove below that in order to get a UI of ∃e φ (e, a), one can take the conjunction
of all possible φδ , varying δ among the conditional DAGs that can be built out of the
set of clauses S3 from Step 2 of the above algorithm.

Example 5. If φ is the conjunction of the literals of Example 3, then the conjunction of
(11), (12) and (13) is a UI of ∃e φ; in fact, no further non-trivial conditional DAG δ can
be extracted (if we take $w = e_1$ or $w = e_2$ or $w = \emptyset$ to extract δ, then it happens that $\phi_\delta$
is the empty conjunction $\top$).
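
The enumeration of all conditional DAGs that can be built out of a set of clauses can be sketched as a search over ordered subsets w. The following is a sketch under stated assumptions rather than the procedure of the paper: clauses are pairs `(Gamma, (lhs, rhs))`, a clause is assumed to define `w_i` only when its consequent is oriented as `w_i = t_i`, and the helper names `symbols`, `e_vars` and `conditional_dags` are hypothetical.

```python
from itertools import permutations

def symbols(term):
    """Names of the variables/constants occurring in a term (str or nested tuple)."""
    if isinstance(term, str):
        return {term}
    found = set()
    for arg in term[1:]:
        found |= symbols(arg)
    return found

def e_vars(equations, e):
    """The e-variables occurring in a list of equations."""
    found = set()
    for lhs, rhs in equations:
        found |= (symbols(lhs) | symbols(rhs)) & set(e)
    return found

def conditional_dags(clauses, e):
    """Yield every conditional DAG as a list of (Gamma_i, w_i, t_i), for every ordered subset w of e."""
    for size in range(len(e) + 1):
        for w in permutations(e, size):
            yield from _extend(clauses, e, w, [])

def _extend(clauses, e, w, prefix):
    i = len(prefix)
    if i == len(w):
        yield list(prefix)
        return
    allowed = set(w[:i])  # only earlier placeholders may occur in Gamma_i and t_i
    for gamma, (lhs, rhs) in clauses:
        used = e_vars(gamma, e) | (symbols(rhs) & set(e))
        if lhs == w[i] and used <= allowed:
            yield from _extend(clauses, e, w, prefix + [(gamma, lhs, rhs)])

# The four Horn clauses of Example 3, plus the literal h(e1, e2) = z0.
S3 = [
    ([("z1", "z2")], ("e1", "z3")),
    ([("z4", "z5")], ("e2", "z6")),
    ([("e1", "z1'")], ("e2", "z2'")),
    ([("e2", "z1''")], ("e1", "z2''")),
    ([], (("h", "e1", "e2"), "z0")),
]
dags = list(conditional_dags(S3, ["e1", "e2"]))
```

On the clauses of Example 3 the search also returns the trivial candidates (those based on $w = e_1$, $w = e_2$ or the empty w), and it returns the conditional DAG of (13) once for each of the two orderings.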

Example 6. Let us turn to the literals (6) of Example 2. Step 1 produces out of them the
conditional clauses

                       z3 = z4 → e1 = z0 ,         z1 = z2 → e2 = e1 .                    (14)

Step 2 produces by rewriting the further clauses z1 = z2 → f (z1 , e0 ) = e1 and z1 =
z2 → h(e1 ) = z0 . We can extract two conditional DAGs δ (using both the conditional
definitions (14) or just the first one); in both cases φδ is z1 = z2 ∧ z3 = z4 → h(z0 ) = z0 ,
which is the UI.

    As should be evident from the two examples above, the conditional DAG representation
of the output considerably reduces computational complexity in many cases;
this is a clear advantage of the present algorithm over the algorithm from Section 3 and
over other approaches like, e.g., [7]. Still, the next example shows that in some cases the
overall complexity remains exponential.

Example 7. Let e be $e_0, \dots, e_N$ and let z be $\{z_0, z'_0\} \cup \{z_{ij}, z'_{ij} \mid 1 \le i < j \le N\}$. Let
φ(e, z) be the conjunction of the identities $f(e_0, e_1) = z_0$, $f(e_0, e_N) = z'_0$ and the set of
identities $h_{ij}(e_0, z_{ij}) = e_i$, $h_{ij}(e_0, z'_{ij}) = e_j$, varying i, j such that $1 \le i < j \le N$. After
applying Step 1 of the algorithm presented in Subsection 4.1, we get the Horn clauses
$z_{ij} = z'_{ij} \to e_i = e_j$, as well as the clause $e_1 = e_N \to z_0 = z'_0$. If we now apply Step 2, we
can never produce a conditional clause of the kind $\Gamma \to e_i = t$ with t being e-free (because
we can only rewrite some $e_i$ into some $e_j$). Thus no sequence of clauses like (10)
can be extracted from S3: notice in fact that the term $t_1$ from such a sequence must not
contain the e. In other words, the only w-conditional DAG δ that can be extracted is
based on the empty w ⊆ e and is empty itself. However, such δ produces a formula $\phi_\delta$
that is quite big: it is the conjunction of the clauses from S3 where the e do not occur
(S3 contains in fact $\Gamma \to z_0 = z'_0$ for exponentially many e-free Γ's).


5    Correctness and Completeness Proofs

In this section we prove correctness and completeness of our two algorithms. To this
aim, we need some preliminaries, both from model theory and from term rewriting.
    For model theory, we refer to [10]. We just recall a few definitions. A Σ-embedding
(or, simply, an embedding) between two Σ-structures M and N is a map µ : |M| −→
|N| between the support sets |M| of M and |N| of N satisfying the condition (M |=
ϕ ⇒ N |= ϕ) for all $\Sigma^{|M|}$-literals ϕ (M is regarded as a $\Sigma^{|M|}$-structure by
interpreting each additional constant a ∈ |M| into itself, and N is regarded as a
$\Sigma^{|M|}$-structure by interpreting each additional constant a ∈ |M| into µ(a)). If µ : M −→ N
is an embedding which is just the identity inclusion |M| ⊆ |N|, we say that M is a
substructure of N or that N is an extension of M.
    Extensions and UIs are related to each other by the following result we take from [7]:

Lemma 2 (Cover-by-Extensions). A formula ψ(y) is a UI in T of ∃e φ (e, y) iff it sat-
isfies the following two conditions:

(i) T |= ∀y (∃e φ (e, y) → ψ(y));
(ii) for every model M of T , for every tuple of elements a from the support of M such
     that M |= ψ(a) it is possible to find another model N of T such that M embeds
     into N and N |= ∃e φ (e, a).

     To conveniently handle extensions, we need diagrams. Let M be a Σ -structure.
The diagram of M [10], written ∆Σ (M) (or just ∆ (M)), is the set of ground Σ |M| -
literals that are true in M. An easy but important result, called Robinson Diagram
Lemma [10], says that, given any Σ -structure N , the embeddings µ : M −→ N are in
bijective correspondence with expansions of N to Σ |M| -structures which are models
of ∆Σ (M). The expansions and the embeddings are related in the obvious way: the
name of a is interpreted as µ(a). It is convenient to see ∆Σ (M) as a set of flat literals
as follows: the positive part of ∆Σ (M) contains the Σ |M| -equalities f (a1 , . . . , an ) = b
which are true in M and the negative part of ∆Σ (M) contains the Σ |M| -inequalities
a ≠ b, varying a, b among the pairs of different elements of |M|.
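
For a finite structure, the flat presentation of the diagram described above can be written down directly. The sketch below is only illustrative: the encoding of a structure as a domain plus function tables, the choice of listing each pair of different elements once (unordered), and the name `flat_diagram` are assumptions made here.

```python
from itertools import combinations, product

def flat_diagram(domain, functions):
    """Return the (positive, negative) parts of the flat diagram of a finite structure.

    domain    -- list of elements; each element serves as its own name
    functions -- dict mapping a function symbol to (arity, table), where
                 table maps argument tuples to the value of the function
    """
    # Positive part: one flat equality f(a1, ..., an) = b per argument tuple.
    positive = [
        ((name,) + args, table[args])
        for name, (arity, table) in functions.items()
        for args in product(domain, repeat=arity)
    ]
    # Negative part: one inequality a != b per pair of different elements.
    negative = list(combinations(domain, 2))
    return positive, negative

# A two-element structure with one unary function f, where f(0) = f(1) = 1.
positive, negative = flat_diagram([0, 1], {"f": (1, {(0,): 1, (1,): 1})})
```
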
    For term rewriting we refer to a textbook like [1]; we only recall the following
classical result:
Lemma 3. Let R be a canonical ground rewrite system over a signature Σ . Then there is
a Σ -structure M such that for every pair of ground terms t, u we have that M |= t = u
iff the R-normal form of t is the same as the R-normal form of u. Consequently R is
consistent with a set of negative literals S iff for every t ≠ u ∈ S the R-normal forms of
t and u are different.
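
The consistency test stated in Lemma 3 amounts to comparing normal forms. A minimal sketch follows, assuming that ground terms are encoded as strings or nested tuples and that the rewrite system is given as a dictionary from left-hand sides to right-hand sides; the names `normalize` and `consistent` are hypothetical, and the code relies on the system being canonical (it does not check termination or confluence).

```python
def normalize(term, rules):
    """Normal form of a ground term with respect to ground rewrite rules.

    rules maps left-hand sides to right-hand sides. The system is assumed to
    be canonical, so the innermost strategy used here reaches the unique
    normal form.
    """
    if isinstance(term, tuple):
        # Normalize the arguments first (innermost strategy).
        term = (term[0],) + tuple(normalize(arg, rules) for arg in term[1:])
    while term in rules:
        term = normalize(rules[term], rules)
    return term

def consistent(rules, disequalities):
    """Check that, for every negative literal t != u, the two normal forms differ."""
    return all(normalize(t, rules) != normalize(u, rules) for t, u in disequalities)

# Example system: f(a) -> b and g(b) -> c.
R = {("f", "a"): "b", ("g", "b"): "c"}
```

With these rules the term g(f(a)) normalizes to c, so R is not consistent with the negative literal g(f(a)) ≠ c, whereas it is consistent with f(a) ≠ c.
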
    We are now ready to prove correctness and completeness of our algorithms. We first
give the relevant intuitions for the proof technique, which is the same for both cases.
By Lemma 2 above, what we need to show is that if a model M satisfies the output
formula of the algorithm, then it can be extended to a superstructure N satisfying the
input formula of the algorithm. By Robinson Diagram Lemma, this is achieved if we
show that ∆ (M) is consistent with the output formula of the algorithm. The output
formula is equivalent to a disjunction of constraints and the diagram ∆ (M) is also a
constraint (albeit infinitary). The positive part of ∆ (M) is a canonical rewriting system
(equalities like f (a1 , . . . , an ) = a are obviously oriented from left-to-right) and every
term occurring in ∆ (M) is in normal form. If an algorithm works properly, it will
be easy to see that the completion of the union of ∆ (M) with the relevant disjunct
constraint is trivial and does not produce inconsistencies.
Correctness and Completeness of the Tableaux Algorithm
Theorem 1. Suppose that we apply the algorithm of Subsection 3.1 to the primitive
formula ∃e(φ (e, z)) and that the algorithm terminates with its branches in the states

      δ1 (y1 , z) ∧ Φ1 (y1 , z) ∧Ψ1 (e1 , y1 , z), . . . , δk (yk , z) ∧ Φk (yk , z) ∧Ψk (ek , yk , z)

then the UI of ∃e(φ (e, z)) in EUF is the unravelling (see Subsection 2.3) of the formula
    $\bigvee_{i=1}^{k} \exists y_i\, (\delta_i(y_i, z) \wedge \Phi_i(y_i, z))$.                    (15)

Proof. Since ∃e(φ(e, z)) is logically equivalent to
$\bigvee_{i=1}^{k} \exists y_i\, (\delta_i(y_i, z) \wedge \Phi_i(y_i, z) \wedge \exists e_i\, \Psi_i(e_i, y_i, z))$,
it is sufficient to check that if a formula like (3) is terminal (i.e. no
rule applies to it) then its UI is ∃y (δ(y, z) ∧ Φ(y, z)). To this aim, we apply Lemma 2:
we pick a model M satisfying δ(y, z) ∧ Φ(y, z) via an assignment I to the variables
y, z9 and we show that M can be embedded into a model M' such that, for a suitable
extension I' of I to the variables e, we have that (M', I') satisfies also Ψ(e, y, z). This
is proved in [12], by using the Robinson Diagram Lemma.    ⊣
 9 Actually the values of the assignment I to the z uniquely determine the values of I to the y.

Correctness and Completeness of the Conditional Algorithm
Theorem 2. Let S3 be obtained as in Steps 1-2 from ∃e φ (e, z). Then the conjunction of
all possible φδ (varying δ among the conditional DAGs that can be built out of S3 ) is a
UI of ∃e φ (e, z) in EUF.
Proof. We use Lemma 2. Condition (i) of that Lemma is ensured by Lemma 1 above
because $\bigwedge S_3$ is logically equivalent to φ. So let us take a model M and elements ã
from its support such that we have $M \models \bigwedge_\delta \phi_\delta$ under the assignment of the ã to the
parameters z. We need to expand it to a superstructure N in such a way that we have
$N \models \bigwedge S_1$, under some assignment to z, e extending the assignment z ↦ ã (recall that
$\bigwedge S_1$ is logically equivalent to φ too). The proof is involved and it requires the Robinson
Diagram Lemma and additional lemmas: all the details are reported in [12].    ⊣


6    Conclusions
Two different algorithms for computing uniform interpolants (UIs) from a formula in
EUF with a list of symbols to be eliminated are presented. They share a common
subpart but differ in their overall objectives. The first algorithm is non-deterministic
and generates a UI expressed as a disjunction of conjunctions of literals,
whereas the second algorithm gives a compact representation of a UI as a conjunction of
Horn clauses. The output of both algorithms needs to be expanded if a fully (or partially)
unravelled uniform interpolant is needed for an application. This restriction/feature is
similar in spirit to syntactic unification, where efficient unification algorithms likewise never
produce output in fully expanded form, in order to avoid an exponential blow-up.
    For generating a compact representation of the UI, both algorithms make use of
DAG representations of terms by introducing new symbols to stand for subterms arising
in the full expansion of the UI. Moreover, the second algorithm uses a conditional DAG,
a new data structure introduced in the paper, to represent subterms under conditions.
    The complexity of the algorithms is also analyzed. It is shown that the first algorithm
generates exponentially many branches with each branch of at most quadratic length;
the UIs produced by the second algorithm often have polynomial size in concrete examples
(but the worst-case size is still exponential). A fully expanded UI can easily be of
exponential size. An implementation of both algorithms, along with a comparative
study, is planned as future work. In parallel with the implementation, a characterization
of classes of formulae for which computation of UIs requires polynomial time in
our algorithms (especially in the second one) needs further investigation.
Acknowledgments. The third author has been partially supported by the National Science
Foundation award CCF-1908804.


References
 1. F. Baader and T. Nipkow. Term Rewriting and All That. Cambridge University Press, United
    Kingdom, 1998.
 2. M. P. Bonacina and M. Johansson. Interpolation systems for ground proofs in automated
    deduction: a survey. J. Autom. Reasoning, 54(4):353–390, 2015.
 3. M. P. Bonacina and M. Johansson. On interpolation in automated theorem proving. J. Autom.
    Reasoning, 54(1):69–97, 2015.
 4. J. R. Burch and D. L. Dill. Automatic verification of pipelined microprocessor control. In
    D. L. Dill, editor, Proc. of CAV, volume 818 of LNCS, pages 68–80. Springer, 1994.
 5. D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. Formal modeling and
    SMT-based parameterized verification of data-aware BPMN. In Proc. of BPM, volume
    11675 of LNCS, pages 157–175. Springer, 2019.
 6. D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. From model completeness
    to verification of data aware processes. In Description Logic, Theory Combination, and All
    That, volume 11560 of LNCS, pages 212–239. Springer, 2019.
 7. D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. Model completeness,
    covers and superposition. In Proc. of CADE, volume 11716 of LNCS (LNAI), pages 142–
    160. Springer, 2019.
 8. D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. Combined Covers and
    Beth Definability. In N. Peltier and V. Sofronie-Stokkermans, editors, Proc. of IJCAR, vol-
    ume 12166 of LNCS (LNAI), pages 181–200. Springer, 2020.
 9. D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. SMT-based verification
    of data-aware processes: a model-theoretic approach. Math. Struct. Comput. Sci., 30(3):271–
    313, 2020.
10. C.-C. Chang and J. H. Keisler. Model Theory. North-Holland Publishing Co., Amsterdam-
    London, third edition, 1990.
11. W. Craig. Three uses of the Herbrand-Gentzen theorem in relating model theory and proof
    theory. J. Symbolic Logic, 22:269–285, 1957.
12. S. Ghilardi, A. Gianola, and D. Kapur. Compactly representing uniform interpolants for EUF
    using (conditional) DAGS. CoRR, abs/2002.09784, 2020.
13. S. Ghilardi and S. Ranise. MCMT: A model checker modulo theories. In Proc. of IJCAR,
    pages 22–29, 2010.
14. S. Ghilardi and M. Zawadowski. Sheaves, games, and model completions, volume 14 of
    Trends in Logic—Studia Logica Library. Kluwer Academic Publishers, Dordrecht, 2002.
15. S. Gulwani and M. Musuvathi. Cover algorithms and their combination. In Proc. of ESOP,
    Held as Part of ETAPS, pages 193–207, 2008.
16. G. Huang. Constructing Craig interpolation formulas. In Computing and Combinatorics
    COCOON, pages 181–190. LNCS, 959, 1995.
17. D. Kapur. Shostak’s congruence closure as completion. In Proc. of RTA, pages 23–37, 1997.
18. D. Kapur. Nonlinear polynomials, interpolants and invariant generation for system analy-
    sis. In Proc. of the 2nd International Workshop on Satisfiability Checking and Symbolic
    Computation co-located with ISSAC, 2017.
19. D. Kapur. Conditional congruence closure over uninterpreted and interpreted symbols. J.
    Systems Science & Complexity, 32(1):317–355, 2019.
20. D. Kapur, R. Majumdar, and C. G. Zarba. Interpolation for data structures. In Proc. of
    SIGSOFT FSE, pages 105–116, 2006.
21. R. C. Lyndon. An interpolation theorem in the predicate calculus. Pacific J. Math., 9(1):129–
    142, 1959.
22. K. L. McMillan. Lazy abstraction with interpolants. In Proc. of CAV, pages 123–136, 2006.
23. G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Trans.
    Program. Lang. Syst., 1(2):245–257, 1979.
24. A. M. Pitts. On an interpretation of second order quantification in first order intuitionistic
    propositional logic. J. Symb. Log., 57(1):33–52, 1992.
25. P. Pudlák. Lower bounds for resolution and cutting plane proofs and monotone computations.
    J. Symb. Log., 62(3):981–998, 1997.
26. R. E. Shostak. Deciding combinations of theories. J. ACM, 31(1):1–12, 1984.