Walking through the Forest: Fast EUF
              Proof-Checking Algorithms
        Frédéric Besson, Pierre-Emmanuel Cornilleau and Ronan Saillard
                            Inria Rennes – Bretagne Atlantique, France

                                            Abstract
         The quantifier-free logic of Equality with Uninterpreted Function symbols (EUF) is at
     the core of Satisfiability Modulo Theory (SMT) solvers. There exist several competing
     proof formats for EUF proofs. We propose original proof formats obtained from proof
     forests that are the artifacts proposed by Nieuwenhuis and Oliveras to extract efficiently
     EUF unsatisfiable cores. Our proof formats can be generated by SMT solvers for almost
     free. Moreover, our preliminary experiments show that our novel verifiers outperform other
     existing EUF verifiers and that our proof formats appear to be more concise than existing
     EUF proofs.


1    Introduction
SMT solvers (e.g., Z3 [15], Yices [16], CVC3 [4], veriT [10]) are the cornerstone of software ver-
ification tools (e.g., Dafny [18], Why/Krakatoa/Caduceus [17], VCC [11]). They are capable of
discharging proof obligations of impressive size and make possible verification tasks that would
otherwise be unfeasible. Scalability matters, but when it comes to verifying critical software,
soundness is mandatory. SMT solvers are based on highly optimised decision procedures and
proving the correctness of their implementation is probably not a viable option. To sidestep this
difficulty, several SMT solvers are generating proof witnesses that can be validated by external
verifiers. In order to minimise the Trusted Computing Base (TCB), an ambitious approach
consists in validating SMT proofs using a general purpose proof verifier i.e., a proof-assistant
such as Isabelle/HOL or Coq. Recent works [9, 2] show that SMT proofs are big objects and
that the bottleneck is usually the proof-assistant. Ideally, SMT proofs should be i) generated
by SMT solvers with little overhead; ii) verified quickly by proof assistants.
    Even for the simplest logic such as the quantifier-free logic of equality with uninterpreted
function symbols (EUF) there exist competing proof formats generated by SMT solvers. Several
of those formats have been validated by proof-assistants: Z3 proofs can be reconstructed in
Isabelle/HOL and HOL4 [9]; veriT proofs can be reconstructed in Coq [2]. Even though the
results are impressive, certain SMT proofs cannot be verified because they are too big.
    We propose novel proof formats for EUF and compare their efficiency in terms of i) gen-
eration time; ii) checking time; iii) and proof size. Our comparisons are empirical and do not
compare the formats in terms of proof-complexity. Instead, we have implemented the verifiers
using the functional language of Coq and compared their asymptotic behaviours experimen-
tally over families of handcrafted benchmarks. The contributions of the paper are efficient Coq
verifiers for two novel EUF proof formats and their comparison with existing verifiers. The
code of the verifiers is available [21]. Our first proof format is made of proof forests that are
the artifact proposed by Nieuwenhuis and Oliveras to extract efficiently unsatisfiable cores [20].
Our second proof format is a trimmed down version of the proof forest reduced only to the edges
responsible for triggering a congruence. For the sake of comparison, we have also implemented
a Coq verifier for the EUF proof format of Z3 [14] and compared with the existing Coq verifier
for the EUF proof format of veriT [2]. The main result of our preliminary experiments is that

58                                           David Pichardie, Tjark Weber (eds.); PxTP 2012, pp. 58–64
Walking through the Forest                                  F. Besson, P.-E. Cornilleau, and R. Saillard


the running time of all the verifiers is negligible w.r.t. the type-checking of the EUF proofs.
Another result of our experiments is that proof forests verifiers are very fast and that (trimmed
down) proof forests appear to be more concise than existing EUF proofs. Another advantage
is that proof forests can be generated for almost free by SMT solvers based on proof forests.
    The rest of the paper is organised as follows. Section 2 provides basic definitions and known
facts about the EUF logic. Section 3 recalls the nature of proof forests and explains the workings
of our novel EUF proof verifiers. Section 4 is devoted to experiments. Section 5 concludes.


2     Background
We write T (Σ, V) the smallest set of terms built upon a set of variables V and a ranked alphabet
Σ of uninterpreted function symbols. For EUF, an atomic formula is of the form t = t0 for t,
t0 ∈ T (Σ, ∅) and a literal is an atomic formula or its negation. Equality (=) is reflexive,
symmetric, transitive and closed under congruence:
                        x=y         x = y, y = z           x1 = y1 , . . . , xn = yn
              x=x       y=x             x=z           f (x1 , . . . , xn ) = f (y1 , . . . , yn )
     Ackermann has proved using a reduction approach that the EUF logic is decidable [1]. The
reduction consists in introducing for each term f (~x) a boolean variable f~x and encoding the
congruence rule by adding the boolean formula x1 = y1 ∧ . . . ∧ xn = yn ⇒ f~x = f~y for each
pair of terms f (~x), f (~y ). This encoding is responsible for a quadratic blow up of the formula.
Nelson has proposed an efficient decision procedure for deciding the satisfiability of a set of EUF
literals using a congruence closure algorithm [19]. Nieuwenhuis and Oliveras have shown how to
efficiently generate unsatisfiable cores by instrumenting the congruence closure algorithm with a
proof forest [20] gathering all the necessary information for generating proofs (see Section 3.1).


3     Walking through the proof forest
In the following, we assume w.l.o.g. that EUF terms are flat and curried [20, Section 3.1] i.e.,
are of the form a or f (a1 , a2 ) where f represents an explicit application symbol and a, a1 and a2
are constants. In the rest, we drop the f and write a1 (a2 ) for f (a1 , a2 ). These transformations
can be performed in linear time and simplify the decision procedure.

3.1    Proof forest
Nieuwenhuis and Oliveras have proposed a proof-producing congruence closure algorithm for
deciding EUF [20]. Their main contribution is an efficient Explain operation that outputs a
(small) set of input equations E needed to deduce an equality, say a = b. If a 6= b is also
part of the input, E ∪ a 6= b is a (small) unsatisfiable core that is used by the SMT solver for
backtracking. As a result, SMT solvers using congruence closure run a variant of the Explain
algorithm – whether or not they are proof producing.
    The Explain algorithm is based on a specific data structure: the proof forest. A proof forest
is a collection of trees and each edge a → b in the proof forest is labelled by a reason justifying
why the equality a = b holds. A reason is either an input equation a = b or a pair of input
equations a1 (a2 ) = a and b1 (b2 ) = b. For the second case, there must be justifications in the
forest for a1 = b1 and a2 = b2 .
    We give in Figure 1 a modified version of the original Explain algorithm. The NearestAncestor
and Parent functions return (if it exists) respectively the nearest common ancestor of two nodes

                                                                                                     59
Walking through the Forest                                              F. Besson, P.-E. Cornilleau, and R. Saillard


l e t ExplainAlongPath ( a , c ) :=
      a:= HighestNode ( a ) ;
      c := HighestNode ( c ) ;
      i f a = c then r e t u r n
      else
         b:= Parent ( a ) ;                                                 l e t E x p l a i n ( a , b ) :=
                                         a=b                                  c := N e a r e s t A n c e s t o r ( a , b ) ;
         i f edge has form a −−→ b                                            ExplainAlongPath ( a , c ) ;
         then Output ( a = b )                                                ExplainAlongPath ( b , c )
                                                  b1 (b2 )=b
         e l s e { /∗ edge has form a −−−−−−→ b∗/                                           (b) Explain
                                                 a1 (a2 )=a
            Output a1 ( a2)=a and b1 ( b2)=b ;
            E x p l a i n ( a1 , b1 ) ; E x p l a i n ( a2 , b2 ) } ;
         Union ( a , b ) ; ExplainAlongPath ( b , c )
                      (a) ExplainAlongPath

                                 Figure 1: Recursive Explain algorithm


in the proof forest and the parent of a node in the proof forest. A union-find data structure [23]
is also used: Union(a,b) merges the equivalence classes of a and b and Find(a) returns the
current representive of the class of a. The HighestNode(c) operation returns the highest node
(i.e., the node with minimal depth in the proof forest) belonging to the equivalence class of c.
   Contrary to the original version, our algorithm is recursive. The Union operation has also
been moved. As a consequence output equations are ordered differently. This will ease our
futur proof production. Another difference is that the original algorithm always terminates but
                                                                        b=f (b)
does not detect certain ill-formed proof forests e.g., a −−−−→ b. In this case, the recursive
                                                                        a=f (a)
algorithm does not terminate (this issue is dealt with in section 4.1).


3.2    Proof forest verifier

The Explain algorithm of Figure 1 can be turned into a EUF proof verifier. The verifier is a
version of Explain augmented with additional checks to ensure that the edges obtained from the
SMT solver correspond to a well-formed proof forest. For instance, the verifier checks that edges
                                                                                          b1 (b2 )=b
are only labelled by input equations. Moreover, for edges of the form a −−−−−−→ b, the recursive
                                                                                          a1 (a2 )=a
calls to Explain ensure that a1 = b1 and a2 = b2 have proofs in the proof forest i.e., a1 (resp.
a2 ) is connected with b1 (resp. b2 ) by some valid path in the proof forest. For efficiency and
simplicity, the least common ancestors are not computed by the verifier but used as untrusted
hints. The soundness of the verifier does not depend on the validity of this information as the
proposed least common ancestor is just used to guide the proof. If the return node is not a
common ancestor, the verifier will simply fail.
    For the verifier, a EUF proof is a pruned proof forest corresponding to the edges walked
through during a preliminary run of Explain. As the SMT solver needs to traverse the proof
forest to extract unsatisfiable core, we argue that the proof forest is a EUF proof that requires
no extra-work from the SMT solver.

60
Walking through the Forest                                        F. Besson, P.-E. Cornilleau, and R. Saillard


l e t UFchecker i n p u t e d g e s ( x , y ) :=
     f o r ( a = b ) ∈ i n p u t do Union ( a , b ) done
               b1 (b2 )=b
      f o r a −−−−−−→ b i n e d g e s do
              a1 (a2 )=a
         i f ( a1 ( a2)=a ) ∈ i n p u t & ( b1 ( b2)=b ) ∈ i n p u t
                       & i s E q u a l ( a1 , b1 ) & i s E q u a l ( a2 , b2 )
         then Union ( a , b ) e l s e f a i l
      done
      return isEqual (x , y)

                            Figure 2: Verifier algorithm for trimmed forests


3.3      A verifier using trimmed forests
To avoid traversing the same edge several times, the Explain algorithm and its verifier are
using a union-find data structure. Therefore, the Explain verifier implicitly embeds a decision
procedure for the theory of equality. Our optimised verifier fully exploits this observation and
starts by feeding all the input equalities of the form a = b into its union-find. For the decision
procedure, new equalities are obtained by applying the congruence rule and efficiency crucially
depends on a clever indexing of the equations. The verifier does not require this costly machinery
and takes as argument a trimmed down proof forest reduced to the list of edges of the forest
             b1 (b2 )=b
of form a −−−−−−→ b. The edge labels indicate the equations that need to be paired to derive
             a1 (a2 )=a
a = b by the congruence rule. The algorithm of the verifier is given in Figure 2 where the
predicate isEqual(a,b) checks whether a and b have the same representative in the union-find
i.e., Find(a) = Find(b). Once again, a preliminary run of Explain is sufficient to generate a
proof for this optimised verifier.


4      Implementation and experiments
4.1      EUF verifiers in Coq
Our verifiers are implemented using the native version of Coq [8] which features persistent
arrays [13]. Persistent arrays are purely functional data-structures that ensure constant time
accesses and updates of the array as soon as it is used in a monadic way. For maximum
efficiency, all the verifiers make a pervasive use of those arrays that allow for an efficient union-
find implementation: union and find have their logarithmic asymptotic complexity.
     Compared to other languages, a constraint imposed by Coq is that all programs must be
terminating. The UFchecker (see Section 3.3) is trivially terminating. Termination of the
proof-forest verifier is more intricate because the Explain algorithm (see Figure 3.2) does not
terminate if the proof forest is ill-formed e.g., has cycles. However, if the proof forest is well-
formed, an edge is only traversed once. As a result, at each recursive call, our verifier decrements
an integer initialised to the size of the proof forest. In Coq, the verifier fails after exhausting
the maximum number of allowed recursive calls.
     For the sake of comparison, we have also implemented the EUF proof format of Z3. Z3
refutations are also generated using Explain [14, Section 3.4.2]. Unlike our verifiers, Z3 proofs
use explicit boolean reasoning and modus ponens. As a consequence formulae do not have

                                                                                                           61
Walking through the Forest                                                                                                                                   F. Besson, P.-E. Cornilleau, and R. Saillard


a constant size. As already noticed by others [9, 2], efficient verifiers require a careful han-
dling of sharing. Our terms and formulae are hash-consed ; sharing is therefore maximum and
comparison of terms or formulae is a constant-time operation.

4.2                          Benchmarks
We have assessed the efficiency of our EUF verifiers on several families of handcrafted conjunc-
tive EUF benchmarks. The benchmarks are large and all the literals are necessary to prove
unsatisfiability. The formulae are not representative of conflict clauses obtained from SMT-
LIB [3] benchmarks which are usually very small [7] and would not stress the scalability of
the verifiers. For all our benchmarks the running time of the verifiers is negligible especially
compared to the time spent type-checking the EUF proofs and the proof size is linear in the
size of the formulae.

                        90                                                                                                                           20000
                                    VeriT
                             Proof Forest                                                                                                                           VeriT
                                      Z3                                                                                                                     Proof Forest
                        80    UFchecker                                                                                                              18000            Z3
                                                                                                                                                              UFchecker

                        70                                                                                                                           16000


                                                                                                                        compiled library size (Ko)
                                                                                                                                                     14000
                        60
     coqc time (sec.)


                                                                                                                                                     12000
                        50
                                                                                                                                                     10000
                        40
                                                                                                                                                     8000
                        30
                                                                                                                                                     6000
                        20
                                                                                                                                                     4000

                        10                                                                                                                           2000

                        0                                                                                                                               0
                        0    10000 20000 30000 40000 50000 60000 70000 80000 90000 100000                                                               0    10000 20000 30000 40000 50000 60000 70000 80000 90000 100000
                                          benchmark size (number of variables)                                                                                           benchmark size (number of variables)

                                            (a) Total running time                                                                                                          (b) Size of proofs


                                                                                   100
                                                                                                 VeriT
                                                                                          Proof Forest
                                                                                                   Z3
                                                                                           UFchecker

                                                                                    10
                                                           checking time (sec.)


                                                                                     1


                                                                                    0.1


                                                                                   0.01


                                                                                  0.001
                                                                                      0   10000 20000 30000 40000 50000 60000 70000 80000 90000 100000
                                                                                                             number of variables

                                                                                           (c) Running time of the verifier

                                                                                  Figure 3: Comparison of the verifiers

           Figure 3 shows our experimental results for a family F of formulae of the general form
                            
                               x0 = x1          x0 6= x(j+1)·j
                       Fj =
                               f (xi·j , xi·j ) = xi·j+1 = ... = xi·j+j for i ∈ {0 . . . j}
The benchmarks are indexed by the number of EUF variables and the results are obtained
using a Linux laptop with a processor Intel Core 2 Duo T9600 (2.80GHz) and 4GB of Ram.

62
Walking through the Forest                                  F. Besson, P.-E. Cornilleau, and R. Saillard


Figure 3a shows the time needed to construct and compile Coq proof terms. Figure 3b shows
the size of the compiled proof terms. Figure 3c is focusing on the running time of the verifiers
excluding the time needed to construct and dump proof terms.
    For all our benchmarks, the UFchecker shows a noticeable advantage over the other verifiers.
We can also remark that its behaviour is more predicable. The veriT verifier [2] is using proofs
almost as small as proof forests but the traces generated by veriT are sometimes two orders of
magnitude bigger. In the timings, the pre-processing needed to perform this impressive proof
reduction is accounted for and might explain why the veriT verifier gets slower as the benchmark
size grows. Remark also that for the biggest benchmarks, veriT fails to generate proofs.
    Our Z3 verifier scales well despite being slightly but constantly outrun by the verifiers based
on proof-forests. The running time of the different verifiers is given Figure 3c using a logarithmic
scale. This emphases the fact that, except for the veriT verifier, the running time of the all
the verifiers is below the second and is therefore not the limiting factor of the verification. Not
surprisingly, the UFchecker requires smaller proofs and therefore its global checking time is also
smaller. For big benchmarks, its running time also tends to be smaller than the running time
of other verifiers.


5     Conclusion
Proof generating SMT solvers such as Z3 [15], CVC3 [4] or veriT [10] routinely generate EUF
proofs in various formats. Those formats have in common that proof-checking is essentially
linear in the size of the proof. In theory, our EUF proofs cannot be checked in linear time
and the verifiers need to embed a union-find algorithm. In practice, our experiments conclude
that our UFchecker verifier outperforms existing verifiers for EUF proofs and that succinct
EUF proofs are the key for scalability. Embedding union-find in EUF proofs was previously
proposed by Conchon et al., [12]. However, in their approach, union-find computations are
interleaved with logic inferences that are responsible for an overhead that is absent from our
verifiers. The importance of succinct proofs has been recognised by Stump [22] and its Logical
Framework with Side Conditions (LFSC). Unlike LFSC proofs, our proofs are less flexible but
purely computational and use the principle of proof by reflection [5, Chapter 16]. As future
work, we intend to integrate the UFchecker into the SMT proof verifier developed by several of
the current authors [6] and extend its scope to the logic of constructors.


References
 [1] W. Ackermann. Solvable Cases of the Decision Problem. Studies in Logic and the Foundations of
     Mathematics. North-Holland, Amsterdam, 1954.
 [2] M. Armand, G. Faure, B. Grégoire, C. Keller, L. Théry, and B. Werner. A Modular Integration
     of SAT/SMT Solvers to Coq through Proof Witnesses. In CPP, volume 7086 of LNCS, pages
     135–150. Springer, 2011.
 [3] C. Barret, A. Stump, and C. Tinelli. The SMT-LIB standard: Version 2.0, 2010.
 [4] C. Barrett and C. Tinelli. CVC3. In Proc. of CAV 2007, volume 4590 of LNCS, pages 298–302.
     Springer, 2007.
 [5] Y. Bertot and P. Castéran. Interactive Theorem Proving and Program Development. Coq’Art: The
     Calculus of Inductive Constructions. Texts in Theoretical Computer Science. Springer, 2004.
 [6] F. Besson, P-E. Cornilleau, and D. Pichardie. Modular SMT Proofs for Fast Reflexive Checking
     Inside Coq. In CPP, volume 7086 of LNCS, pages 151–166. Springer, 2011.

                                                                                                     63
Walking through the Forest                                   F. Besson, P.-E. Cornilleau, and R. Saillard


 [7] F. Besson, P-E. Cornilleau, and D. Pichardie. A nelson-oppen based proof system using theory
     specific proof systems. In PxTP 2011, 2011.
 [8] M. Boespflug, M. Dénès, and B. Grégoire. Full Reduction at Full Throttle. In CPP, volume 7086
     of LNCS, pages 362–377. Springer, 2011.
 [9] S. Böhme and T. Weber. Fast LCF-style Proof Reconstruction for Z3. In Proc. of ITP 2010,
     volume 6172 of LNCS, pages 179–194. Springer, 2010.
[10] T. Bouton, D. C. B. de Oliveira, D. Déharbe, and P. Fontaine. veriT: an open, trustable and
     efficient SMT-solver. In Proc. of CADE 2009, LNCS. Springer, 2009.
[11] E. Cohen, M. Dahlweid, M. A. Hillebrand, D. Leinenbach, M. Moskal, T. Santen, W. Schulte, and
     S. Tobies. VCC: A Practical System for Verifying Concurrent C. In TPHOLs, volume 5674 of
     LNCS, pages 23–42. Springer, 2009.
[12] S. Conchon, E. Contejean, J. Kanig, and S. Lescuyer. Lightweight integration of the ergo theo-
     rem prover inside a proof assistant. In Proceedings of the second workshop on Automated formal
     methods, AFM ’07, pages 55–59. ACM, 2007.
[13] S. Conchon and J-C. Filliâtre. Semi-persistent Data Structures. In ESOP, volume 4960 of LNCS,
     pages 322–336. Springer, 2008.
[14] L. M. de Moura and N. Bjørner. Proofs and Refutations, and Z3. In Proc. of the LPAR 2008
     Workshops, Knowledge Exchange: Automated Provers and Proof Assistants, volume 418. CEUR-
     WS.org, 2008.
[15] L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Proc. of TACAS 2008, volume
     4963 of LNCS, pages 337–340. Springer, 2008.
[16] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-
     paper.pdf, 2006.
[17] J-C. Filliâtre and C. Marché. The Why/Krakatoa/Caduceus Platform for Deductive Program
     Verification. In CAV, volume 4590 of LNCS, pages 173–177, 2007.
[18] K. R. M. Leino. Dafny: An Automatic Program Verifier for Functional Correctness. In LPAR-16,
     volume 6355 of LNCS, pages 348–370. Springer, 2010.
[19] G. Nelson and D. C. Oppen. Fast decision procedures based on congruence closure. J. ACM,
     27(2):356–364, April 1980.
[20] R. Nieuwenhuis and A. Oliveras. Proof-Producing Congruence Closure. In Proc. of RTA 2005,
     volume 3467 of LNCS, pages 453–468. Springer, 2005.
[21] R. Saillard. EUF Verifiers in Coq. http://www.irisa.fr/celtique/ext/euf.
[22] A. Stump. Proof checking technology for satisfiability modulo theories. Electron. Notes Theor.
     Comput. Sci., 228:121–133, January 2009.
[23] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215–225,
     April 1975.


64