Boolean Reasoning in a Higher-Order Superposition
Prover
Petar Vukmirovića , Visa Nummelina
a
    Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands


                                         Abstract
                                         We present a pragmatic approach to extending a Boolean-free higher-order superposition calculus to
                                         support Boolean reasoning. Our approach extends inference rules that have been used only in a first-
                                         order setting, uses some well-known rules previously implemented in higher-order provers, as well as
                                         new rules. We have implemented the approach in the Zipperposition theorem prover. The evaluation
                                         shows highly competitive performance of our approach and clear improvement over previous techniques.

                                         Keywords
                                         higher-order logic, theorem proving, superposition




1. Introduction
In the last decades, automatic theorem provers have been used successfully as backends to
β€œhammers” in proof assistants [1, 2] and to software verifiers [3]. Most advanced provers, such as
CVC4 [4], E [5], and Vampire [6], are based on first-order logic, whereas most frontends that use
them are based on versions of higher-order logic. Thus, there is a large gap in expressiveness
between front- and backends. This gap is bridged using well-known translations from higher-
order to first-order logic [7, 8]. However, translations are usually less efficient than native
support [9, 10, 11]. The distinguishing features of higher-order logic used by proof assistants
that the translation must eliminate include πœ†-binders, function extensionality – the property that
functions are equal if they agree on every argument, described by the axiom βˆ€(π‘₯, 𝑦 : 𝜏 β†’ 𝜈).
(βˆ€(𝑧 : 𝜏). π‘₯ 𝑧 β‰ˆ 𝑦 𝑧) β‡’ π‘₯ β‰ˆ 𝑦, and formulas occurring as arguments of function symbols [8].
   A group of authors including Vukmirović [10] recently designed a complete calculus for
extensional Boolean-free higher-order logic. This calculus is an extension of superposition,
the calculus used in most successful provers such as E or Vampire. The extension removes the
need to translate the first two above mentioned features of higher-order logic. Kotelnikov et al.
[12, 13] extended the language of first-order logic to support the third feature of higher-order
logic that requires translation. They described two approaches: one based on calculus-level
treatment of Booleans and the other, which requires no changes to the calculus, based on
preprocessing.


PAAR 2020: Seventh Workshop on Practical Aspects of Automated Reasoning, June 29–30, 2020, Paris, France (virtual)
  p.vukmirovic@vu.nl (P. Vukmirović); visa.nummelin@vu.nl (V. Nummelin)
{ https://petarvukmirovic.github.io/home/ (P. Vukmirović)
 0000-0001-7049-6847 (P. Vukmirović); 0000-0003-0078-790X (V. Nummelin)
                                       Β© 2020 Copyright for this paper by its authors.
                                       Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


   To fully bridge the gap between higher-order and first-order tools, we combine the two
approaches: we use the efficient higher-order superposition calculus and extend it with inference
rules that reason with Boolean terms. In early work, Kotelnikov et al. [12] have described a
FOOL paramodulation rule that, under some order requirements, removes the need for the axiom
describing the Boolean domain – βˆ€(𝑝 : π‘œ). 𝑝 β‰ˆ ⊀ ∨ 𝑝 β‰ˆ βŠ₯. In this approach, it is assumed that
a problem with formulas occurring as arguments of symbols is translated to first-order logic.
   The backbone of our approach is based on an extension of this rule to higher-order logic.
Namely, we do not translate away any Boolean structure that is nested inside non-Boolean
terms and allow our rule to hoist the nested Booleans to the literal level. Then, we clausify the
resulting formula (i.e., a clause that contains formulas in literals) using a new rule.
   An important feature that we inherit by building on top of Bentkamp et al. [10] is support
for (function) extensionality. Moving to higher-order logic with Booleans also means that we
need to consider Boolean extensionality: βˆ€(𝑝 : π‘œ)(π‘ž : π‘œ). (𝑝 ⇔ π‘ž) β‡’ 𝑝 β‰ˆ π‘ž. We extend the rules
of Bentkamp et al. that treat function extensionality to support Boolean extensionality as well.
   Rules that extend the two orthogonal approaches form the basis of our support for Boolean
reasoning (Section 3). In addition, we have implemented rules that are inspired by the ones
used in the higher-order provers Leo-III [14] and Satallax [15], such as elimination of Leibniz
equality, primitive instantiation and treatment of choice operator [16]. We have also designed
new rules that use higher-order unification to resolve Boolean formulas that are hoisted to literal
level, delay clausification of non-atomic literals, reason about formulas under πœ†-binders, and
many others. Even though the rules we use are inspired by the ones of refutationally complete
higher-order provers, we do not guarantee completeness of our extension of πœ†-superposition.
   We compare our native approach with two alternatives based on preprocessing (Section 4).
First, we compare it to an axiomatization of the theory of Booleans. Second, inspired by work
of Kotelnikov et al. [13], we implemented the preprocessing approach that does not require
introduction of Boolean axioms. We also discuss some examples, coming from TPTP [17], that
illustrate advantages and disadvantages of our approach (Section 5).
   Our approach is implemented in the Zipperposition theorem prover [18, 19]. Zipperposition
is an easily extensible open source prover that Bentkamp et al. used to implement their higher-
order superposition calculus. We further extend their implementation.
   We performed an extensive evaluation of our approach (Section 6). In addition to evaluating
different configurations of our new rules, we have compared them to full higher-order provers
CVC4, Leo-III, Satallax and Vampire. The results suggest that it is beneficial to natively support
Boolean reasoning – the approach outperforms preprocessing-based approaches. Furthermore,
it is very competitive with state-of-the-art higher-order provers. We discuss the differences
between our approach and the approaches we build on, as well as related approaches (Section 7).


2. Background
We base our work on Bentkamp et al.’s [10] extensional polymorphic clausal higher-order logic.
We extend the syntax of this logic by adding logical connectives to the language of terms. The
semantics of the logic is extended by interpreting the Boolean type π‘œ as a two-element domain. This
amounts to extending Bentkamp et al.'s fragment of higher-order logic to full higher-order logic





(HOL). Our notation, definitions and the following text are largely based on Bentkamp et al.’s.
   A signature is a quadruple (Ξ£ty, Vty, Ξ£, V) where Ξ£ty is a set of type constructors, Vty is a
set of type variables and Ξ£ and V are sets of constants and term variables, respectively. We
require the nullary type constructors πœ„ and π‘œ, as well as the binary constructor β†’, to be in Ξ£ty. A
type 𝜏, 𝜐 is either a type variable 𝛼 ∈ Vty or of the form πœ…(𝜏1, . . . , πœπ‘›) where πœ… is an 𝑛-ary type
constructor. We write πœ… for πœ…(), 𝜏 β†’ 𝜐 for β†’(𝜏, 𝜐), and we abbreviate tuples (π‘Ž1, . . . , π‘Žπ‘›) as
π‘Žπ‘› for 𝑛 β‰₯ 0. Similarly, we drop parentheses to shorten 𝜏1 β†’ (Β· Β· Β· β†’ (πœπ‘›βˆ’1 β†’ πœπ‘›) Β· Β· Β· ) into
𝜏1 β†’ Β· Β· Β· β†’ πœπ‘›. Each symbol in Ξ£ is assigned a type declaration of the form Π𝛼𝑛. 𝜏 where all
variables occurring in 𝜏 are among 𝛼𝑛.
   Function symbols a, b, f, g, . . . are elements of Ξ£; their type declarations are written as
f : Π𝛼𝑛. 𝜏. Term variables from the set V are written π‘₯, 𝑦, 𝑧, . . . and we denote their types as
π‘₯ : 𝜏. When the type is not important, we omit type declarations. We assume that the symbols
⊀, βŠ₯, Β¬, ∧, ∨, β‡’, ⇔ with their standard meanings and type declarations are elements of Ξ£.
Furthermore, we assume that the polymorphic symbols βˆ€ and βˆƒ with type declarations Π𝛼. (𝛼 β†’
π‘œ) β†’ π‘œ, and β‰ˆ : Π𝛼. 𝛼 β†’ 𝛼 β†’ π‘œ, are in Ξ£, with their standard meanings. All these symbols are
called logical symbols. We write binary logical symbols in infix notation.
   Terms are defined inductively as follows. Variables π‘₯ : 𝜏 are terms of type 𝜏. If f : Π𝛼𝑛. 𝜏
is in Ξ£ and 𝜐𝑛 is a tuple of types, called type arguments, then fβŸ¨πœπ‘›βŸ© (written as f if 𝑛 = 0, or if the
type arguments can be inferred from the context) is a term of type 𝜏{𝛼𝑛 ↦ 𝜐𝑛}, called a constant.
If π‘₯ is a variable of type 𝜏 and 𝑠 is a term of type 𝜐 then πœ†π‘₯. 𝑠 is a term of type 𝜏 β†’ 𝜐. If 𝑠 and 𝑑
are of type 𝜏 β†’ 𝜐 and 𝜏, respectively, then 𝑠 𝑑 is a term of type 𝜐. We call terms of Boolean type
(π‘œ) formulas and denote them by 𝑓, 𝑔, β„Ž, . . .; we use 𝑝, π‘ž, π‘Ÿ, . . . for variables whose result type
is π‘œ and p, q, r for constants with the same result type. We shorten the iterated lambda abstraction
πœ†π‘₯1. . . . πœ†π‘₯𝑛. 𝑠 to πœ†π‘₯𝑛. 𝑠, and the iterated application (𝑠 𝑑1) Β· Β· Β· 𝑑𝑛 to 𝑠 𝑑𝑛. We assume the standard
notions of free and bound variables, capture-avoiding substitutions 𝜎, 𝜌, πœƒ, . . ., and 𝛼-, 𝛽-, πœ‚-
conversion. Unless stated otherwise, we view terms as π›Όπ›½πœ‚-equivalence classes, with the πœ‚-long
𝛽-reduced form as the representative. Each term 𝑠 can be uniquely written as πœ†π‘₯π‘š. π‘Ž 𝑑𝑛 where
π‘Ž is either a variable or a constant and π‘š, 𝑛 β‰₯ 0; we call π‘Ž the head of 𝑠. We say that a term π‘Ž 𝑑𝑛 is
written in spine notation [20]. Following our previous work [21], we define a nonstandard notion
of subterms and positions inductively as a graceful extension of the first-order counterparts: a
term 𝑠 is a subterm of itself at position πœ€. If 𝑠 is a subterm of 𝑑𝑖 at position 𝑝 then 𝑠 is a subterm
of π‘Ž 𝑑𝑛 at position 𝑖.𝑝, where π‘Ž is a head. If 𝑠 is a subterm of 𝑑 at position 𝑝 then 𝑠 is a subterm
of πœ†π‘₯. 𝑑 at position 1.𝑝. We use 𝑠|𝑝 to denote the subterm of 𝑠 at position 𝑝.
   Given a formula 𝑓 we call its Boolean subterm 𝑓|𝑝 a top-level Boolean if for all proper prefixes
π‘ž of 𝑝, the head of 𝑓|π‘ž is a logical constant. Otherwise, we call it a nested Boolean. For example,
in the formula 𝑓 = h a β‰ˆ g (p β‡’ q) ∨ Β¬p, the subterms 𝑓|1 and 𝑓|2 are top-level Booleans, whereas 𝑓|1.2.1
is a nested Boolean (as well as its subterms). Only top-level Booleans are allowed in first-order
logic, whereas nested Booleans are characteristic of higher-order logic. A formula is called an
atom if it is of the form π‘Ž 𝑑𝑛, where π‘Ž is a non-logical head, or of the form 𝑠 β‰ˆ 𝑑, where if 𝑠 or 𝑑
are of type π‘œ, and one of them has a logical head, the other one must be ⊀ or βŠ₯. A literal 𝐿 is
an atom or its negation. A clause 𝐢 is a multiset of literals, interpreted and written (abusing
∨) disjunctively as 𝐿1 ∨ Β· Β· Β· ∨ 𝐿𝑛. We write 𝑠 β‰‰ 𝑑 for Β¬(𝑠 β‰ˆ 𝑑). We say a variable is free in a
clause 𝐢 if it is not bound inside any subterm of a literal in 𝐢.






3. The Native Approach
Some support for Booleans was already present in Zipperposition before we started extending the
calculus of Bentkamp et al. In this section, we start by describing the internals of Zipperposition
responsible for reasoning with Booleans. We continue by describing 15 rules that we have
implemented. For ease of presentation we divide them into three categories. We assume some
familiarity with the superposition calculus [22] and adopt the notation used by Schulz [23].

3.1. Support for Booleans in Zipperposition
Zipperposition is an open source1 prover written in OCaml. From its inception, it was designed
as a prover that supports easy extension of its base superposition calculus to various theories,
including arithmetic, induction and limited support for higher-order logic [18].
   Zipperposition internally stores applications in flattened, spine notation. It also exploits
associativity of ∧ and ∨ to flatten nested applications of these symbols. Thus, the terms
p ∧ (q ∧ r) and (p ∧ q) ∧ r are represented as ∧ p q r. The prover’s support for πœ†-terms is used
to represent quantified nested Booleans: the formulas βˆ€π‘₯. 𝑓 and βˆƒπ‘₯. 𝑓 are represented as βˆ€ (πœ†π‘₯. 𝑓)
and βˆƒ (πœ†π‘₯. 𝑓). After clausification of the input problem, no nested Booleans will be modified or
renamed using fresh predicate symbols.
   The version of Zipperposition preceding our modifications distinguished between equational
and non-equational literals. Following E [5], we modified Zipperposition to represent all literals
equationally: a non-equational literal 𝑓 is stored as 𝑓 β‰ˆ ⊀, whereas ¬𝑓 is stored as 𝑓 β‰‰ ⊀.
Equations of the form 𝑓 β‰ˆ βŠ₯ and 𝑓 β‰‰ βŠ₯ are transformed into 𝑓 β‰‰ ⊀ and 𝑓 β‰ˆ ⊀, respectively.
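This representation is easy to mirror in a few lines of OCaml. The sketch below uses hypothetical types (not Zipperposition's actual data structures) and simply normalizes literals so that the right-hand side is always ⊀, flipping the sign whenever the original equation mentions βŠ₯:

```ocaml
(* Minimal sketch of the equational literal representation described above;
   the types are hypothetical and much simpler than Zipperposition's. *)
type term = Top | Bot | App of string * term list

type literal = { lhs : term; rhs : term; positive : bool }

(* Store f as f β‰ˆ ⊀ and Β¬f as f β‰‰ ⊀; rewrite f β‰ˆ βŠ₯ to f β‰‰ ⊀ and
   f β‰‰ βŠ₯ to f β‰ˆ ⊀. *)
let normalize (l : literal) : literal =
  match l.rhs with
  | Bot -> { l with rhs = Top; positive = not l.positive }
  | _ -> l

let () =
  (* p a β‰ˆ βŠ₯ becomes p a β‰‰ ⊀ *)
  let lit = { lhs = App ("p", [App ("a", [])]); rhs = Bot; positive = true } in
  let lit' = normalize lit in
  assert (lit'.rhs = Top && not lit'.positive)
```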

3.2. Core Rules
Kotelnikov et al. [12], to the best of our knowledge, pioneered the approach of extending a
first-order superposition prover to support nested Booleans. They call the effects of including the
axiom βˆ€(𝑝 : π‘œ). 𝑝 β‰ˆ ⊀ ∨ 𝑝 β‰ˆ βŠ₯ a β€œrecipe for disaster”. To combat the explosive behavior of
the axiom, they imposed the following two requirements on the simplification order ≻ (which
is a parameter to the superposition calculus): ⊀ ≻ βŠ₯, and ⊀ and βŠ₯ are the two smallest ground
terms with respect to ≻. If these two requirements are met, there is no self-paramodulation
of the clause, and the only possible paramodulation is from the literal 𝑝 β‰ˆ ⊀ of the mentioned axiom
into a Boolean subterm of another clause. Finally, Kotelnikov et al. replace the axiom with the
inference rule FOOL Paramodulation (FP):

                                          𝐢[𝑓]
                                  ──────────────────  FP
                                   𝐢[⊀] ∨ 𝑓 β‰ˆ βŠ₯

where 𝑓 is a nested non-variable Boolean subterm of the clause 𝐢, different from ⊀ and βŠ₯. In
addition, they translate the initial problem containing nested Booleans to first-order logic
without interpreted Booleans; this translation introduces proxy symbols for ⊀ and βŠ₯, and a proxy
type for π‘œ.

   1
       https://github.com/sneeuwballen/zipperposition





  We created two rules that are syntactically similar to FP but are adapted for higher-order
logic, with one key distinction – we do not perform any translation:

           𝐢[𝑓]                                         𝐢[𝑓]
   ──────────────────  Cases       ═══════════════════════════════════════  CasesSimp
    𝐢[βŠ₯] ∨ 𝑓 β‰ˆ ⊀                   𝐢[βŠ₯] ∨ 𝑓 β‰ˆ ⊀         𝐢[⊀] ∨ 𝑓 β‰‰ ⊀

The double line in the definition of CasesSimp denotes that the premise is replaced by the conclusions;
obviously, a prover that uses these rules should not include both at the same time. In
addition, since literals 𝑓 β‰ˆ βŠ₯ are represented as negative equations 𝑓 β‰‰ ⊀, which cannot be
used to paramodulate from, we change the first requirement on the order to βŠ₯ ≻ ⊀.
   These two rules hoist Boolean subterms 𝑓 to the literal level; therefore, some results of Cases
and CasesSimp will have literals of the form 𝑓 β‰ˆ ⊀ (or 𝑓 β‰‰ ⊀) where 𝑓 is not an atom. This
introduces the need for the rule called eager clausification (EC):
                                             𝐢
                                      ──────────────  EC
                                       𝐷1 Β· Β· Β· π·π‘š

We say that a clause is standard if all of its literals are of the form 𝑠 β‰ˆΜ‡ 𝑑, where 𝑠 and 𝑑 are not
Booleans, or of the form 𝑓 β‰ˆΜ‡ ⊀, where the head of 𝑓 is not a logical symbol; here β‰ˆΜ‡ denotes β‰ˆ
or β‰‰. The rule EC is applicable if the clause 𝐢 = 𝐿1 ∨ Β· Β· Β· ∨ 𝐿𝑛 is not standard. The resulting
clauses π·π‘š represent the result of clausification of the formula βˆ€π‘₯. 𝐿1 ∨ Β· Β· Β· ∨ 𝐿𝑛, where π‘₯
are all free variables of 𝐢.
   An advantage of leaving nested Booleans unmodified is that the prover will be able to prove
some problems containing them without using the prolific rules described above. For example,
given the two clauses f (p π‘₯ β‡’ p 𝑦) β‰ˆ a and f (p a β‡’ p b) β‰‰ a, the empty clause can easily be
derived without the above rules. A disadvantage of this approach is that the proving process
will periodically be interrupted by expensive calls to the clausification algorithm.
   Naive application of the Cases and CasesSimp rules can result in many redundant clauses.
Consider a clause 𝐢 = p (p (p (p a))) β‰ˆ ⊀ where p : π‘œ β†’ π‘œ, a : π‘œ. Then, the clause
𝐷 = a β‰ˆ ⊀ ∨ p βŠ₯ β‰ˆ ⊀ can be derived from 𝐢 in eight ways using the rules, depending
on which nested Boolean subterm was chosen for the inference. In general, if a clause has
a subterm occurrence of the form p𝑛 a, where both p and a have result type π‘œ, the clause
a β‰ˆ ⊀ ∨ p βŠ₯ β‰ˆ ⊀ can be derived in 2^(π‘›βˆ’1) ways. To combat these issues we implemented
pragmatic restrictions of the rule: only the leftmost-outermost (or innermost) eligible subterm
will be considered. With this modification 𝐷 can be derived in only one way. Furthermore,
some intermediate conclusions of the rules will not be derived, pruning the search space.
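One natural reading of the leftmost-outermost restriction is a pre-order, left-to-right search for the first eligible subterm, as in the following OCaml sketch (the term type is hypothetical, and β€œeligible” is simplified to β€œof Boolean type, not a variable, and different from ⊀ and βŠ₯”):

```ocaml
(* Sketch: pick the leftmost-outermost eligible Boolean subterm below the
   given term, e.g. below each side of a literal. *)
type ty = Bool | Other
type term =
  | Var of string * ty
  | Const of string * ty
  | App of term * term list * ty      (* head, spine arguments, result type *)

let type_of = function Var (_, t) | Const (_, t) | App (_, _, t) -> t
let is_top_or_bot = function Const (("top" | "bot"), Bool) -> true | _ -> false

let eligible t =
  type_of t = Bool && not (is_top_or_bot t)
  && (match t with Var _ -> false | _ -> true)

(* Pre-order, left-to-right traversal: an occurrence is found before its own
   subterms, and a left argument is explored before the next one. *)
let rec pick (t : term) : term option =
  match t with
  | Var _ | Const _ -> None
  | App (_, args, _) ->
    List.find_map (fun a -> if eligible a then Some a else pick a) args
```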
   The clausification algorithm by Nonnengart and Weidenbach [24] aggressively simplifies the
input problem using well-known Boolean equivalences before clausifying it. For example, the
formula p ∧ ⊀ will be replaced by p. To simplify nested Booleans we implemented the rule

                                         𝐢[π‘“πœŽ]
                                    ──────────────  BoolSimp
                                         𝐢[π‘”πœŽ]

where 𝑓 ⟢ 𝑔 ∈ 𝐸 runs over a fixed set of rewrite rules 𝐸, and 𝜎 is any substitution. In the
current implementation of Zipperposition, 𝐸 consists of the rules described by Nonnengart





and Weidenbach [24, Section 3]. This set contains the rules describing how each logical symbol
behaves when either of its arguments is ⊀ or βŠ₯: for example, it includes ⊀ β‡’ 𝑝 ⟢ 𝑝 and
𝑝 β‡’ ⊀ ⟢ ⊀. Leo-III implements a similar rule, called simp [25, Section 4.2.1].
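A few rules of this kind are easy to sketch directly. The OCaml fragment below implements a small, hypothetical subset of 𝐸 (it is not Zipperposition's rule set and covers only some connectives), simplifying connectives whose argument is ⊀ or βŠ₯ in one bottom-up pass:

```ocaml
(* Sketch of a few BoolSimp-style rewrite rules, e.g. ⊀ β‡’ p β€Ά p and
   p β‡’ ⊀ β€Ά ⊀, applied bottom-up.  Hypothetical formula type. *)
type form =
  | Top | Bot | Atom of string
  | Not of form | And of form * form | Or of form * form | Imp of form * form

let rec simp (f : form) : form =
  match f with
  | Top | Bot | Atom _ -> f
  | Not g ->
    (match simp g with Top -> Bot | Bot -> Top | g' -> Not g')
  | And (g, h) ->
    (match simp g, simp h with
     | Top, h' -> h' | g', Top -> g' | Bot, _ | _, Bot -> Bot
     | g', h' -> And (g', h'))
  | Or (g, h) ->
    (match simp g, simp h with
     | Bot, h' -> h' | g', Bot -> g' | Top, _ | _, Top -> Top
     | g', h' -> Or (g', h'))
  | Imp (g, h) ->
    (match simp g, simp h with
     | Top, h' -> h'            (* ⊀ β‡’ p  β€Ά  p *)
     | _, Top -> Top            (* p β‡’ ⊀  β€Ά  ⊀ *)
     | Bot, _ -> Top
     | g', Bot -> Not g'
     | g', h' -> Imp (g', h'))

let () = assert (simp (Imp (Top, Atom "p")) = Atom "p")
```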
   Our decision to represent negative atoms as negative equations was motivated by the need
to alter Zipperposition’s earlier behavior as little as possible. Namely, negative atoms were not
used as literals to paramodulate from, and as such they added to the laziness of the
superposition calculus. However, it might be useful to consider unit clauses of the form 𝑓 β‰‰ ⊀
as 𝑓 β‰ˆ βŠ₯ to strengthen demodulation. To that end, we have introduced the following rule:

                                𝑓 β‰‰ ⊀       𝐢[π‘“πœŽ]
                              ─────────────────────  BoolDemod
                                𝑓 β‰‰ ⊀       𝐢[βŠ₯]

3.3. Higher-Order Considerations
To achieve refutational completeness of higher-order resolution and similar calculi, it is necessary
to instantiate variables with result type π‘œ (predicate variables) with arbitrary formulas [25, 16].
Fortunately, we can approximate the formulas using a complete set of logical symbols (e.g., Β¬,
βˆ€, and ∧). Since such an approximation is not only necessary for completeness of some calculi,
but also very useful in practice, we implemented the primitive instantiation (PI) rule:

                                   𝐢 ∨ πœ†π‘₯π‘š. 𝑝 𝑠𝑛 β‰ˆΜ‡ 𝑑
                            ───────────────────────────────  PI
                             (𝐢 ∨ πœ†π‘₯π‘š. 𝑝 𝑠𝑛 β‰ˆΜ‡ 𝑑){𝑝 ↦ 𝑓}

where 𝑝 is a free variable of type 𝜏1 β†’ Β· Β· Β· β†’ πœπ‘› β†’ π‘œ. By choosing a different 𝑓 that
instantiates 𝑝, we can balance between the explosiveness of approximating a complete set of logical
symbols and the incompleteness of pragmatic approaches. We borrow the notion of imitation
from higher-order unification jargon [21]: we say that the term πœ†π‘₯π‘š. f (𝑦1 π‘₯π‘š) Β· Β· Β· (𝑦𝑛 π‘₯π‘š) is an
imitation of a constant f : 𝜏1 β†’ Β· Β· Β· β†’ πœπ‘› β†’ 𝜏 for some variable 𝑧 of type 𝜈1 β†’ Β· Β· Β· β†’ πœˆπ‘š β†’ 𝜏.
The variables 𝑦𝑛 are fresh free variables, where each 𝑦𝑖 has the type 𝜈1 β†’ Β· Β· Β· β†’ πœˆπ‘š β†’ πœπ‘–; the variable
π‘₯𝑖 is of type πœˆπ‘–.
    Rule PI was already implemented by Simon Cruanes in Zipperposition before we started our
modifications. The rule has different modes that generate sets of possible terms 𝑓 for 𝑝 : 𝜏1 β†’
Β· Β· Β· β†’ πœπ‘› β†’ π‘œ: Full, Pragmatic, and Imit β€Ή, where β€Ή is an element of the set of logical constants
𝑃 = {∧, ∨, β‰ˆβŸ¨π›ΌβŸ©, Β¬, βˆ€, βˆƒ}. Mode Full contains imitations (for 𝑝) of all elements of 𝑃. Mode
Pragmatic contains imitations of Β¬, ⊀ and βŠ₯; if there exist indices 𝑖, 𝑗 such that 𝑖 β‰  𝑗 and πœπ‘– = πœπ‘—,
it contains πœ†π‘₯𝑛. π‘₯𝑖 β‰ˆ π‘₯𝑗; if there exist indices 𝑖, 𝑗 such that 𝑖 β‰  𝑗 and πœπ‘– = πœπ‘— = π‘œ, then it
contains πœ†π‘₯𝑛. π‘₯𝑖 ∧ π‘₯𝑗 and πœ†π‘₯𝑛. π‘₯𝑖 ∨ π‘₯𝑗; and if for some 𝑖, πœπ‘– = π‘œ, then it contains πœ†π‘₯𝑛. π‘₯𝑖. Mode Imit β€Ή
contains imitations of ⊀, βŠ₯ and β€Ή (except for Imit βˆ€βˆƒ, which contains imitations of both βˆ€ and βˆƒ).
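The imitation terms that these modes generate are straightforward to construct. The OCaml sketch below (with a hypothetical term representation, not Zipperposition's) builds πœ†π‘₯π‘š. f (𝑦1 π‘₯π‘š) Β· Β· Β· (𝑦𝑛 π‘₯π‘š) for a constant f with 𝑛 arguments under π‘š bound variables; the modes above differ only in which constants f, and which additional terms such as πœ†π‘₯𝑛. π‘₯𝑖 β‰ˆ π‘₯𝑗, are fed to such a constructor:

```ocaml
(* Sketch: build the imitation Ξ»x_m. f (y1 x_m) ... (yn x_m), where the y_i
   are fresh free variables applied to all Ξ»-bound variables. *)
type term =
  | Var of string
  | Const of string
  | App of term * term list
  | Lam of string list * term      (* Ξ»x1 ... xm. body *)

let counter = ref 0
let fresh_var () = incr counter; Var (Printf.sprintf "y%d" !counter)

let imitation (f : string) (n : int) (xs : string list) : term =
  let bound = List.map (fun x -> Var x) xs in
  let args = List.init n (fun _ -> App (fresh_var (), bound)) in
  Lam (xs, App (Const f, args))

(* e.g. a candidate for a predicate variable p : o β†’ o β†’ o in mode Imit ∧ *)
let _ = imitation "and" 2 ["x1"; "x2"]
```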
    While experimenting with our implementation we have noticed some proof patterns that
led us to come up with the following modifications. First, it often suffices to perform PI only on
initial clauses – which is why we allow the rule to be applied only to the clauses created using
at most π‘˜ generating inferences. Second, if the rule was used in the proof, its premise is usually
only used as part of that inference – which is why we implemented a version of PI that removes






the clause after all possible PI inferences have been performed. We observed that the mode Imit β€Ή
is useful in practice since often only a single approximation of a logical symbol is necessary.
   Efficiently treating the axiom of choice is notoriously difficult for higher-order provers. Andrews
formulates this axiom as βˆ€(𝑝 : 𝛼 β†’ π‘œ). (βˆƒ(π‘₯ : 𝛼). 𝑝 π‘₯) β‡’ 𝑝 (πœ€ 𝑝), where πœ€ : Π𝛼. (𝛼 β†’ π‘œ) β†’ 𝛼
denotes the choice operator [16]. After clausification, this axiom becomes 𝑝 π‘₯ β‰‰ ⊀ ∨ 𝑝 (πœ€ 𝑝) β‰ˆ ⊀.
Since the term 𝑝 π‘₯ matches any Boolean term in the proof state, this axiom is very explosive.
Therefore, Leo-III [14] deals with the choice operator on the calculus level. Namely, whenever a
clause 𝐢 = 𝑝 π‘₯ β‰‰ ⊀ ∨ 𝑝 (f 𝑝) β‰ˆ ⊀ is chosen for processing, 𝐢 is removed from the proof state
and f is added to the set of choice functions CF (which initially contains just πœ€). Later, elements
of CF will be used to heuristically instantiate the axiom of choice. We reused the method of
recognizing choice functions, but generalized the rule for creating the instance of the axiom
(assuming πœ‰ ∈ CF):

                                          𝐢[πœ‰ 𝑑]
                 ─────────────────────────────────────────────────  Choice
                  π‘₯ (𝑑 𝑦) β‰‰ ⊀ ∨ π‘₯ (𝑑 (πœ‰ (πœ†π‘§. π‘₯ (𝑑 𝑧)))) β‰ˆ ⊀

    Let 𝐷 be the conclusion of Choice. The fresh variable π‘₯ in 𝐷 acts as an arbitrary context around
𝑑, the chosen instantiation for 𝑝 from the axiom of choice; the variable π‘₯ can later be replaced by
imitations of logical symbols to create more complex instantiations of the choice axiom. To
generate useful instances early, we create 𝐷{π‘₯ ↦ πœ†π‘§. 𝑧} and 𝐷{π‘₯ ↦ πœ†π‘§. ¬𝑧}. Then, based on
Zipperposition’s parameters, 𝐷 will either be deleted or kept. Note that 𝐷 will not subsume its
instances, since the matching algorithm of Zipperposition is too weak for this.
    Most provers natively support extensionality reasoning: Bhayat et al. [26] modify first-order
unification to return unification constraints consisting of pairs of terms of functional type,
whereas Steen relies on the unification rules of Leo-III’s calculus [25, Section 4.3.3] to deal with
extensionality. Bentkamp et al. [10] altered the core generating inference rules of the superposition
calculus to support extensionality. Instead of requiring that the terms involved in the inference
are unifiable, it is required that they can be decomposed into disagreement pairs such that at
least one of the disagreement pairs is of functional type. Disagreement pairs of terms 𝑠 and
𝑑 of the same type are defined inductively using the function dp: dp(𝑠, 𝑑) = βˆ… if 𝑠 and 𝑑 are equal;
dp(π‘Ž 𝑠𝑛, 𝑏 π‘‘π‘š) = {(π‘Ž 𝑠𝑛, 𝑏 π‘‘π‘š)} if π‘Ž and 𝑏 are different heads; dp(πœ†π‘₯. 𝑠, πœ†π‘¦. 𝑑) = {(πœ†π‘₯. 𝑠, πœ†π‘¦. 𝑑)};
dp(π‘Ž 𝑠𝑛, π‘Ž 𝑑𝑛) = ⋃_{𝑖=1}^{𝑛} dp(𝑠𝑖, 𝑑𝑖). Then the extensionality rules are stated as follows:

                      𝑠 β‰ˆ 𝑑 ∨ 𝐢         𝑒[𝑠′] β‰ˆΜ‡ 𝑣 ∨ 𝐷
          ─────────────────────────────────────────────────────  ExtSup
           (𝑠1 β‰‰ 𝑠′1 ∨ Β· Β· Β· ∨ 𝑠𝑛 β‰‰ 𝑠′𝑛 ∨ 𝑒[𝑑] β‰ˆΜ‡ 𝑣 ∨ 𝐢 ∨ 𝐷)𝜎

                           𝑠 β‰‰ 𝑠′ ∨ 𝐢
          ─────────────────────────────────────────  ExtER
           (𝑠1 β‰‰ 𝑠′1 ∨ Β· Β· Β· ∨ 𝑠𝑛 β‰‰ 𝑠′𝑛 ∨ 𝐢)𝜎

                       𝑠 β‰ˆ 𝑑 ∨ 𝑠′ β‰ˆ 𝑒 ∨ 𝐢
          ─────────────────────────────────────────────────────  ExtEF
           (𝑠1 β‰‰ 𝑠′1 ∨ Β· Β· Β· ∨ 𝑠𝑛 β‰‰ 𝑠′𝑛 ∨ 𝑑 β‰‰ 𝑒 ∨ 𝑠′ β‰ˆ 𝑒 ∨ 𝐢)𝜎

Rules ExtSup, ExtER, and ExtEF are extensional versions of superposition, equality resolution
and equality factoring [23]. The union of these three rules is denoted by Ext. In each rule, 𝜎
is a most general unifier of the types of 𝑠 and 𝑠′, and dp(π‘ πœŽ, π‘ β€²πœŽ) = {(𝑠1, 𝑠′1), . . . , (𝑠𝑛, 𝑠′𝑛)}. All





side conditions for the extensional rules are the same as for the standard rules, except that the condition
that 𝑠 and 𝑠′ are unifiable is replaced by the condition that at least one 𝑠𝑖 is of functional type
and that 𝑛 > 0. These rules are easily extended to support Boolean extensionality by requiring that
at least one 𝑠𝑖 is of functional type or of type π‘œ, and by adding the case β€œdp(𝑓, 𝑔) = {(𝑓, 𝑔)} if 𝑓 and
𝑔 are different formulas” to the definition of dp.
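For concreteness, the disagreement-pair computation can be sketched as follows in OCaml (hypothetical spine-notation term type; the sketch assumes, as in the Ξ·-long setting above, that equal heads carry equally many arguments, and it omits the Boolean-extensionality case just described):

```ocaml
(* Sketch of the disagreement pairs dp(s, t): βˆ… if s = t; {(s, t)} if either
   side is a Ξ»-abstraction or the heads differ; otherwise the union of the
   argument-wise pairs. *)
type term =
  | Var of string
  | Const of string
  | Lam of string * term
  | App of term * term list        (* head applied to its spine arguments *)

let head = function App (h, _) -> h | t -> t
let args = function App (_, a) -> a | _ -> []

let rec dp (s : term) (t : term) : (term * term) list =
  if s = t then []
  else match s, t with
    | (Lam _, _) | (_, Lam _) -> [ (s, t) ]
    | _ when head s <> head t -> [ (s, t) ]
    | _ -> List.concat (List.map2 dp (args s) (args t))
```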
   Consider the clause f (Β¬p ∨ Β¬q) β‰‰ f (Β¬(p ∧ q)). This problem is obviously unsatisfiable,
since the arguments of f on the two sides of the disequation are extensionally equal; however,
without the Ext rules Zipperposition will rely on the Cases(Simp) and EC rules to derive the empty
clause. Rule ExtER will generate 𝐢 = (Β¬p ∨ Β¬q) β‰‰ Β¬(p ∧ q). Then, 𝐢 will get clausified using
EC, effectively reducing the problem to Β¬(Β¬p ∨ Β¬q ⇔ Β¬(p ∧ q)), which is first-order.
   Zipperposition restricts ExtSup by requiring that 𝑠 and 𝑠′ are not of functional or Boolean
type. If the terms are of functional type, our experience is that a better treatment of function
extensionality is to apply fresh free variables (or Skolem terms, depending on the sign [10])
to both sides of a (dis)equation to reduce it to a first-order literal; Boolean extensionality is
usually better supported by applying EC to the top-level Boolean term. Thus, for the following
discussion we can assume that 𝑠 and 𝑠′ are not πœ†-abstractions or formulas. Then, ExtSup is applicable
if 𝑠 and 𝑠′ have the same head and a functional or Boolean subterm. To efficiently retrieve such
terms, we added an index that maps symbols to positions in clauses where they appear as the
head of a term that has a functional or Boolean subterm. This index will be empty for first-order
problems, incurring no overhead if extensionality reasoning is not needed. Furthermore, we
do not apply the Ext rules if all disagreement pairs have at least one side whose head is a variable;
those will be dealt with more efficiently using the standard, non-extensional versions of the rules.
We also eagerly resolve the literals 𝑠𝑖 β‰‰ 𝑠′𝑖 using at most one unifier returned by a terminating,
pragmatic variant of the unification algorithm by Vukmirović et al. [21].
   The expressiveness of higher-order logic allows users to define equality using a single axiom,
called Leibniz equality [16]: βˆ€(π‘₯ : 𝛼)(𝑦 : 𝛼). (βˆ€(𝑝 : 𝛼 β†’ π‘œ). 𝑝 π‘₯ β‡’ 𝑝 𝑦) β‡’ π‘₯ β‰ˆ 𝑦. Leibniz
equality often appears in TPTP problems. Since modern provers have native support for
equality, it is usually beneficial to recognize and replace occurrences of Leibniz equality.
   Before we began our modifications, Zipperposition had a powerful rule that recognizes clauses
that contain variations of Leibniz equality and instantiates them with native equality. This rule
was designed by Simon Cruanes, and to the best of our knowledge, it has not been documented
so far. With his permission we describe this rule as follows:
         𝑝 𝑠^1_𝑛 β‰ˆ ⊀ ∨ Β· Β· Β· ∨ 𝑝 𝑠^𝑖_𝑛 β‰ˆ ⊀ ∨ 𝑝 𝑑^1_𝑛 β‰‰ ⊀ ∨ Β· Β· Β· ∨ 𝑝 𝑑^𝑗_𝑛 β‰‰ ⊀ ∨ 𝐢
        ─────────────────────────────────────────────────────────────────────  ElimPredVar
                        (𝑝 𝑠^1_𝑛 β‰ˆ ⊀ ∨ Β· Β· Β· ∨ 𝑝 𝑠^𝑖_𝑛 β‰ˆ ⊀ ∨ 𝐢)𝜎

where 𝑝 is a free variable, 𝑝 does not occur in any 𝑠^π‘˜_𝑙 or 𝑑^π‘˜_𝑙, or in 𝐢, and 𝜎 is defined as
{𝑝 ↦ πœ†π‘₯𝑛. ⋁_{π‘˜=1}^{𝑗} (β‹€_{𝑙=1}^{𝑛} π‘₯𝑙 β‰ˆ 𝑑^π‘˜_𝑙)}.
   To better understand how this rule removes variable-headed negative literals, consider the
clause 𝐢 = 𝑝 a1 a2 β‰ˆ ⊀ ∨ 𝑝 b1 b2 β‰‰ ⊀ ∨ 𝑝 c1 c2 β‰‰ ⊀. Since all side conditions are fulfilled, the
rule ElimPredVar will generate 𝜎 = {𝑝 ↦ πœ†π‘₯𝑦. (π‘₯ β‰ˆ b1 ∧ 𝑦 β‰ˆ b2) ∨ (π‘₯ β‰ˆ c1 ∧ 𝑦 β‰ˆ c2)}. After
applying 𝜎 to 𝐢 and subsequent 𝛽-reduction, the negative literal 𝑝 b1 b2 β‰‰ ⊀ will reduce to (b1 β‰ˆ
b1 ∧ b2 β‰ˆ b2) ∨ (b1 β‰ˆ c1 ∧ b2 β‰ˆ c2) β‰‰ ⊀, which is equivalent to βŠ₯. Thus, we can remove this
literal and all negative literals of the form 𝑝 𝑑𝑛 β‰‰ ⊀ from 𝐢 and apply 𝜎 to the remaining ones.
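The substitution 𝜎 can be assembled mechanically from the argument tuples of the negative literals, as in this OCaml sketch (hypothetical formula type and helper name, not Zipperposition's API):

```ocaml
(* Sketch: build Ξ»x_n. ⋁_k (β‹€_l x_l β‰ˆ t^k_l) from the argument tuples of the
   negative literals p t^k_n β‰‰ ⊀. *)
type form =
  | Var of string
  | Const of string
  | Eq of form * form
  | And of form * form
  | Or of form * form
  | Lam of string list * form

let conj = function [] -> Const "true" | f :: fs -> List.fold_left (fun a b -> And (a, b)) f fs
let disj = function [] -> Const "false" | f :: fs -> List.fold_left (fun a b -> Or (a, b)) f fs

let elim_pred_var_subst (xs : string list) (tuples : form list list) : form =
  Lam (xs,
       disj (List.map
               (fun tup -> conj (List.map2 (fun x t -> Eq (Var x, t)) xs tup))
               tuples))

(* the example from the text: negative literals p b1 b2 β‰‰ ⊀ and p c1 c2 β‰‰ ⊀ *)
let _ = elim_pred_var_subst ["x"; "y"]
          [ [Const "b1"; Const "b2"]; [Const "c1"; Const "c2"] ]
```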





 The previous rule removes all variables occurring in disequations in one attempt. We imple-
mented two rules that behave more lazily, inspired by the ones present in Leo-III and Satallax:

    𝑝 𝑠𝑛 β‰ˆ ⊀ ∨ 𝑝 𝑑𝑛 β‰‰ ⊀ ∨ 𝐢                        𝑝 𝑠𝑛 β‰‰ ⊀ ∨ 𝑝 𝑑𝑛 β‰ˆ ⊀ ∨ 𝐢
   ───────────────────────────  ElimLeibniz⁺      ───────────────────────────  ElimLeibniz⁻
         (𝑠𝑖 β‰ˆ 𝑑𝑖 ∨ 𝐢)𝜎                                 (𝑠𝑖 β‰ˆ 𝑑𝑖 ∨ 𝐢)𝜎′

where 𝑝 is a free variable, 𝑝 does not occur in 𝑑𝑖, 𝜎 = {𝑝 ↦ πœ†π‘₯𝑛. π‘₯𝑖 β‰ˆ 𝑑𝑖} and 𝜎′ = {𝑝 ↦
πœ†π‘₯𝑛. Β¬(π‘₯𝑖 β‰ˆ 𝑑𝑖)}. This rule differs from ElimPredVar in three ways. First, it acts on occurrences
of variables in both positive and negative literals. Second, due to its simplicity, it usually does
not require EC as the following step. Third, it imposes much weaker conditions on 𝑝. However,
removing all negative variables in one step might improve performance. Coming back to the example
of the clause 𝐢 = 𝑝 a1 a2 β‰ˆ ⊀ ∨ 𝑝 b1 b2 β‰‰ ⊀ ∨ 𝑝 c1 c2 β‰‰ ⊀, we can apply ElimLeibniz⁺
using the substitution 𝜎 = {𝑝 ↦ πœ†π‘₯𝑦. π‘₯ β‰ˆ b1} to obtain the clause 𝐢′ = a1 β‰ˆ b1 ∨ a1 β‰‰ c1.

3.4. Additional Rules
Zipperposition’s unification algorithm [21] uses a flattened representation of terms with the logical
operators ∧ and ∨ as heads to unify terms that are not unifiable modulo π›Όπ›½πœ‚-equivalence,
but are unifiable modulo associativity and commutativity of ∧ and ∨. Let ⋆ denote either ∧
or ∨. When the unification algorithm is given two terms ⋆ 𝑠𝑛 and ⋆ 𝑑𝑛, where neither 𝑠𝑛 nor
𝑑𝑛 contains duplicates, it performs the following steps: First, it removes all terms that appear
in both 𝑠𝑛 and 𝑑𝑛 from the two argument tuples. Next, the remaining terms are sorted first by
their head term and then by their weight. Finally, the sorted lists are unified pairwise. As an
example, consider the problem of unifying the pair (∧ (p a) (q (f a)), ∧ (q (f a)) (π‘Ÿ (f (f a)))),
where π‘Ÿ is a free variable. If the arguments of ∧ are simply sorted as described above, we
would unsuccessfully try to unify p a with q (f a). However, by removing the term q (f a) from the
argument lists, we are left with the problem (p a, π‘Ÿ (f (f a))), which has a unifier.
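A sketch of this preprocessing step in OCaml (with deliberately simplified term and weight definitions, not the actual unification code): drop the arguments shared by both sides, then sort what remains by head symbol and weight before the pairwise unification.

```ocaml
(* Sketch: preprocessing of the argument lists of ∧/∨ before pairwise
   unification.  Terms, heads and weights are simplified placeholders. *)
type term = Sym of string * term list

let head (Sym (h, _)) = h
let rec weight (Sym (_, args)) = 1 + List.fold_left (fun w t -> w + weight t) 0 args

let preprocess (ss : term list) (ts : term list) : term list * term list =
  (* drop arguments occurring in both lists *)
  let ss' = List.filter (fun s -> not (List.mem s ts)) ss in
  let ts' = List.filter (fun t -> not (List.mem t ss)) ts in
  (* sort first by head symbol, then by weight *)
  let cmp a b =
    match compare (head a) (head b) with 0 -> compare (weight a) (weight b) | c -> c in
  (List.sort cmp ss', List.sort cmp ts')

(* the example from the text: ∧ (p a) (q (f a))  vs.  ∧ (q (f a)) (r (f (f a))) *)
let a = Sym ("a", [])
let _ = preprocess
    [ Sym ("p", [a]); Sym ("q", [Sym ("f", [a])]) ]
    [ Sym ("q", [Sym ("f", [a])]); Sym ("r", [Sym ("f", [Sym ("f", [a])])]) ]
```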
   The winner of the THF division of CASC-27 [27], Satallax [15], has one crucial advantage over
Zipperposition: it is based on higher-order tableaux, and as such it does not require formulas
to be converted to clauses. The advantage of tableaux is that once it instantiates a variable with
a term, this instantiation naturally propagates through the whole formula. In Zipperposition,
which is based on higher-order superposition, the original formula is clausified, and instantiating
a variable in a clause 𝐢 does not automatically instantiate it in all clauses that are results of
clausification of the same formula as 𝐢. To mitigate this issue, we have created extensions of
equality resolution and equality factoring that take Boolean extensionality into account:

        𝑠 β‰ˆ 𝑠′ ∨ 𝐢                            𝑝 𝑠𝑛 β‰ˆ ⊀ ∨ 𝑠′ β‰‰ ⊀ ∨ 𝐢
       ─────────────  BoolER                 ─────────────────────────  BoolEF⁺⁻
            𝐢𝜎                                 (𝑝 𝑠𝑛 β‰ˆ ¬𝑠′ ∨ 𝐢)𝜎

        𝑝 𝑠𝑛 β‰‰ ⊀ ∨ 𝑠′ β‰ˆ ⊀ ∨ 𝐢                 𝑝 𝑠𝑛 β‰‰ ⊀ ∨ 𝑠′ β‰‰ ⊀ ∨ 𝐢
       ─────────────────────────  BoolEF⁻⁺   ─────────────────────────  BoolEF⁻⁻
          (𝑝 𝑠𝑛 β‰ˆ ¬𝑠′ ∨ 𝐢)𝜎                     (𝑝 𝑠𝑛 β‰ˆ 𝑠′ ∨ 𝐢)𝜎

All side conditions except for the ones concerning the unifiability of terms are as in the original
equality resolution and equality factoring rules. In rule BoolER, 𝜎 unifies 𝑠 and ¬𝑠′. In the


⁺⁻ and ⁻⁺ versions of BoolEF, 𝜎 unifies 𝑝 𝑠𝑛 and ¬𝑠′, and in the remaining version it unifies
𝑝 𝑠𝑛 and 𝑠′. Intuitively, these rules bring Boolean (dis)equations into the appropriate form for the
application of the corresponding base rules. It suffices to consider literals of the form 𝑠 β‰ˆ 𝑠′
for BoolER since Zipperposition rewrites 𝑠 ⇔ 𝑑 β‰ˆ ⊀ and Β¬(𝑠 ⇔ 𝑑) β‰‰ ⊀ to 𝑠 β‰ˆ 𝑑 (and does the
analogous rewriting into 𝑠 β‰‰ 𝑑).
   Another approach to mitigating the harmful effects of eager clausification is to delay it as long
as possible. Following the approach by Ganzinger and Stuber [28], we represent every input
formula 𝑓 as a unit clause 𝑓 β‰ˆ ⊀ and use the following lazy clausification (LC) rules:

     (𝑓 ∧ 𝑔) β‰ˆ ⊀ ∨ 𝐢                   (𝑓 ∨ 𝑔) β‰ˆ ⊀ ∨ 𝐢              (𝑓 β‡’ 𝑔) β‰ˆ ⊀ ∨ 𝐢
   ────────────────────────  LC∧     ────────────────────  LC∨    ────────────────────  LCβ‡’
    𝑓 β‰ˆ ⊀ ∨ 𝐢     𝑔 β‰ˆ ⊀ ∨ 𝐢           𝑓 β‰ˆ ⊀ ∨ 𝑔 β‰ˆ ⊀ ∨ 𝐢            𝑓 β‰‰ ⊀ ∨ 𝑔 β‰ˆ ⊀ ∨ 𝐢

     (¬𝑓) β‰ˆ ⊀ ∨ 𝐢              (βˆ€π‘₯. 𝑓) β‰ˆ ⊀ ∨ 𝐢             (βˆƒπ‘₯. 𝑓) β‰ˆ ⊀ ∨ 𝐢
   ─────────────────  LCΒ¬    ───────────────────  LCβˆ€     ─────────────────────────  LCβˆƒ
       𝑓 β‰‰ ⊀ ∨ 𝐢               𝑓{π‘₯ ↦ 𝑦} ∨ 𝐢                𝑓{π‘₯ ↦ skβŸ¨π›ΌβŸ© 𝑦𝑛} ∨ 𝐢

                                 𝑓 β‰ˆ 𝑔 ∨ 𝐢
               ──────────────────────────────────────────────  LCβ‰ˆ
                𝑓 β‰‰ ⊀ ∨ 𝑔 β‰ˆ ⊀ ∨ 𝐢     𝑓 β‰ˆ ⊀ ∨ 𝑔 β‰‰ ⊀ ∨ 𝐢

The rules described above are as given by Ganzinger and Stuber (adapted to our setting), with
the omission of the rules for negative literals (𝑓 β‰‰ ⊀), which are easy to derive and which can be
found in their work [28]. In LCβ‰ˆ we require both 𝑓 and 𝑔 to be formulas and at least one of
them not to be ⊀. In LCβˆ€, 𝑦 is a fresh variable, and in LCβˆƒ, sk is a fresh symbol and 𝛼 and 𝑦𝑛
are all the type and term variables occurring freely in βˆƒπ‘₯. 𝑓.
   Naive application of the LC rules can result in an exponential blowup in problem size. To avoid
this, we rename formulas that have repeated occurrences. We keep a count of all non-atomic
formulas occurring as either side of a literal. Before applying an LC rule on a clause 𝑓 β‰ˆΜ‡ ⊀ ∨ 𝐢,
we check whether the number of 𝑓’s occurrences exceeds the threshold π‘˜. If it does, based on
the polarity of the literal 𝑓 β‰ˆΜ‡ ⊀, we add the clause p 𝑦𝑛 β‰‰ ⊀ ∨ 𝑓 β‰ˆ ⊀ (if the literal is positive)
or p 𝑦𝑛 β‰ˆ ⊀ ∨ 𝑓 β‰‰ ⊀ (if the literal is negative), where 𝑦𝑛 are all free variables of 𝑓 and p is
a fresh symbol. Then, we replace the clause 𝑓 β‰ˆΜ‡ ⊀ ∨ 𝐢 by p 𝑦𝑛 β‰ˆΜ‡ ⊀ ∨ 𝐢.
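The renaming policy amounts to a small piece of bookkeeping, sketched below with illustrative names and an arbitrary threshold (the real implementation works on Zipperposition's terms rather than string keys):

```ocaml
(* Sketch of the renaming bookkeeping: count how often each non-atomic
   formula has been seen and, past the threshold k, hand out a fresh
   definition symbol for it.  All names here are illustrative. *)
let k = 4
let counts : (string, int) Hashtbl.t = Hashtbl.create 64
let fresh = ref 0

(* Record one more occurrence of the formula (identified by a printed key)
   and report whether it should now be renamed rather than clausified. *)
let should_rename (key : string) : bool =
  let n = (try Hashtbl.find counts key with Not_found -> 0) + 1 in
  Hashtbl.replace counts key n;
  n > k

(* Fresh predicate symbol p used in the definition clauses described above. *)
let fresh_definition_symbol () : string =
  incr fresh;
  Printf.sprintf "zip_rename_%d" !fresh
```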
   Before the number of occurrences of 𝑓 is checked, we first check (using a fast, incomplete
matching algorithm) whether there is a formula 𝑔 for which a definition was already introduced, such
that π‘”πœŽ = 𝑓 for some substitution 𝜎. This check can have three outcomes. First, if the definition
q π‘₯𝑛 was already introduced for 𝑔 with the polarity matching that of 𝑓 β‰ˆΜ‡ ⊀, then 𝑓 is replaced
by (q π‘₯𝑛)𝜎. Second, if the definition was introduced, but with a different polarity, we create the
clause defining 𝑔 with the missing polarity, and replace 𝑓 with (q π‘₯𝑛)𝜎. Last, if there is no
renamed formula 𝑔 generalizing 𝑓, then we perform the previously described check.
   In addition to reusing names for formula definitions, we reuse the Skolem symbols introduced
by the LCβˆƒ rule. When LCβˆƒ is applied to 𝑓 = βˆƒπ‘₯. 𝑓′ we check whether there is a Skolem symbol skβŸ¨π›Όπ‘šβŸ© 𝑦𝑛
introduced for a formula 𝑔 = βˆƒπ‘₯. 𝑔′ such that π‘”πœŽ = 𝑓. If so, the symbol sk is reused and βˆƒπ‘₯. 𝑓′
is replaced by 𝑓′{π‘₯ ↦ (skβŸ¨π›Όπ‘šβŸ© 𝑦𝑛)𝜎}. The renaming and name-reusing techniques are inspired by
the VCNF algorithm described by Reger et al. [29].
   Rules Cases and CasesSimp deal with Boolean terms, but we need to rely on extensionality
reasoning to deal with πœ†-abstractions whose body has type π‘œ. Using the observation that the





formula @π‘₯𝑛 . 𝑓 implies that πœ†π‘₯𝑛 . 𝑓 is extensionally equal to πœ†π‘₯𝑛 . J (and similarly, if @π‘₯𝑛 . ␣𝑓 ,
then πœ†π‘₯𝑛 . 𝑓 Β« πœ†π‘₯𝑛 . Kq, we designed the following rule (where all free variables of 𝑓 are π‘₯𝑛
and variables occurring freely in 𝐢):

                                     𝐢rπœ†π‘₯𝑛 . 𝑓 s
                                                                             Interpretπœ†
          p@π‘₯𝑛 . 𝑓 q ff J _ 𝐢rπœ†π‘₯𝑛 . Js       p@π‘₯𝑛 . ␣𝑓 q ff J _ 𝐢rπœ†π‘₯𝑛 . Ks


4. Alternative Approaches
An alternative to heavy modifications of the prover needed to support the rules described above
is to treat Booleans as yet another theory. Since the theory of Booleans is finitely axiomatizable,
simply stating those axioms instead of creating special rules might seem appealing. Another
approach is to preprocess nested Booleans by hoisting them to the top level.

4.1. Axiomatization
A simple axiomatization of the theory of Booleans is given by Bentkamp et al. [10]. Following
their approach, we introduce the proxy type bool , which corresponds to π‘œ, to the signature.
We define proxy symbols t, f, not, and, or, impl, equiv, forall, exists, choice, and eq which correspond
to the homologous logical constants from Section 2. In their type declarations π‘œ is replaced by
bool .
   To make this paper self-contained we include the axioms from Bentkamp et al. [10]. Definitions
of symbols are computational in nature: symbols are characterized by their behavior on t and
f. This also reduces interferences between different axioms. Axioms are listed as follows:

      t β‰‰ f                          or t π‘₯ β‰ˆ t
      π‘₯ β‰ˆ t ∨ π‘₯ β‰ˆ f                  or f π‘₯ β‰ˆ π‘₯
      not t β‰ˆ f                      impl t π‘₯ β‰ˆ π‘₯
      not f β‰ˆ t                      impl f π‘₯ β‰ˆ t
      and t π‘₯ β‰ˆ π‘₯                    π‘₯ β‰‰ 𝑦 ∨ eqβŸ¨π›ΌβŸ© π‘₯ 𝑦 β‰ˆ t
      and f π‘₯ β‰ˆ f                    π‘₯ β‰ˆ 𝑦 ∨ eqβŸ¨π›ΌβŸ© π‘₯ 𝑦 β‰ˆ f

      equiv π‘₯ 𝑦 β‰ˆ and (impl π‘₯ 𝑦) (impl 𝑦 π‘₯)
      forallβŸ¨π›ΌβŸ© (πœ†π‘₯. t) β‰ˆ t
      𝑦 β‰ˆ (πœ†π‘₯. t) ∨ forallβŸ¨π›ΌβŸ© 𝑦 β‰ˆ f
      existsβŸ¨π›ΌβŸ© 𝑦 β‰ˆ not (forallβŸ¨π›ΌβŸ© (πœ†π‘₯. not (𝑦 π‘₯)))
      𝑦 π‘₯ β‰ˆ f ∨ 𝑦 (choiceβŸ¨π›ΌβŸ© 𝑦) β‰ˆ t

4.2. Preprocessing Booleans
Kotelnikov et al. extended VCNF, Vampire’s algorithm for clausification, to support nested
Booleans [13]. Vukmirović et al. extended the clausification algorithm of Ehoh, the lambda-free
higher-order version of E, to support nested Booleans inspired by VCNF extension [11, Section
8]. Zipperposition and Ehoh share the same clausification algorithm, enabling us to reuse the
extension with one notable difference: unlike in Ehoh, not all nested Booleans different from
variables, ⊀ and βŠ₯ will be removed. Namely, Booleans that are below a πœ†-abstraction and contain
πœ†-bound variables will not be preprocessed. They cannot be easily hoisted to the level of an atom
in which they appear, since this process might leak any variables bound in the context in which
the nested Boolean appears. Similar preprocessing techniques are used in other higher-order
provers [30].





5. Examples
The TPTP library contains thousands of higher-order benchmarks, many of them hand-crafted
to point out subtle interferences of functional and Boolean properties of higher order logic. In
this section we discuss some problems from the TPTP library that illustrate the advantages and
disadvantages of our approach.
   In the last five instances of the CASC theorem proving competition, the core calculus of the
best performing higher-order prover was tableaux – a striking contrast from first-order part
of the competition dominated by superposition-based provers. TPTP problem SET557^1 (a
statement of Cantor’s theorem) might shed some light on why tableaux-based provers excel on
higher-order problems. This problem conjectures that there is no surjection from a set to its
power set:
                        ␣pDpπ‘₯ : πœ„ Γ‘ πœ„ Γ‘ π‘œq. @p𝑦 : πœ„ Γ‘ π‘œq. Dp𝑧 : πœ„q. π‘₯ 𝑧 Β« 𝑦q
After negating the conjecture and clausification, this problem becomes sk1 (sk2 𝑦) β‰ˆ 𝑦, where sk1
and sk2 are Skolem symbols. Then, we can use the ArgCong rule [10], which applies a fresh variable
𝑀 to both sides of the equation, yielding the clause 𝐢 = sk1 (sk2 𝑦) 𝑀 β‰ˆ 𝑦 𝑀. Most superposition-
or paramodulation-based higher-order theorem provers (such as Leo-III, Vampire and Zip-
perposition) will split this clause into two clauses 𝐢1 = sk1 (sk2 𝑦) 𝑀 β‰‰ ⊀ ∨ 𝑦 𝑀 β‰ˆ ⊀ and
𝐢2 = sk1 (sk2 𝑦) 𝑀 β‰ˆ ⊀ ∨ 𝑦 𝑀 β‰‰ ⊀. This clausification step makes the problem considerably
harder. Namely, the clause 𝐢 instantiated with the substitution {𝑦 ↦ πœ†π‘₯. Β¬(sk1 π‘₯ π‘₯), 𝑀 ↦
sk2 (πœ†π‘₯. Β¬(sk1 π‘₯ π‘₯))} yields the empty clause. However, if the original clause is split into two
as described above, Zipperposition will rely on the PI rule to instantiate 𝑦 with an imitation of Β¬ and
on equality factoring to further instantiate this approximation. These desired inferences need
to be applied to both new clauses and represent only a fraction of the inferences that can be done
with 𝐢1 and 𝐢2, reducing the chance of a successful proof attempt. Rule BoolER imitates the
behavior of a tableaux prover: it essentially rewrites the clause 𝐢 into Β¬(sk1 (sk2 𝑦) 𝑀) β‰‰ 𝑦 𝑀,
which makes finding the necessary substitution easy and does not require a clausification step.
   Combining rule (Bool)ER with lazy clausification is very fruitful, as the problem SYO033^1
illustrates. This problem also consists of a single conjecture

                   βˆƒ(π‘₯ : (πœ„ β†’ π‘œ) β†’ π‘œ). βˆ€(𝑦 : πœ„ β†’ π‘œ). (π‘₯ 𝑦 ⇔ (βˆ€(𝑧 : πœ„). 𝑦 𝑧))

The problem is easily solved if we instantiate the variable π‘₯ with the constant βˆ€. Moreover, the
prover does not have to blindly guess this instantiation for π‘₯, but can obtain it by unifying
π‘₯ 𝑦 with βˆ€ 𝑦 (which is the πœ‚-short form of βˆ€(𝑧 : πœ„). 𝑦 𝑧). However, when the problem is
clausified, all quantifiers are removed. Then, Zipperposition only finds the proof if an appropriate
instantiation mode of PI is used, and if both clauses resulting from clausifying the negated
conjecture are appropriately instantiated. In contrast, lazy clausification will derive the clause
π‘₯ (sk π‘₯) β‰‰ βˆ€ (sk π‘₯) from the negated conjecture in three steps. Then, equality resolution results
in the empty clause, swiftly finishing the proof without any explosive inferences. This effect
is even more pronounced on the problems SYO287^5 and SYO288^5, in which the critical proof step
is the instantiation of a variable with an imitation of ∨ and ∧. In configurations that do not use lazy
clausification and BoolER, Zipperposition times out with any reasonable time limit; with those
two options it solves the mentioned problems in less than 100 ms.





  In some cases, it is better to preprocess the problem. For example, TPTP problem SYO500^1.005
contains many nested Boolean terms:
     f0 pf1 pf1 pf1 pf2 pf3 pf3 pf3 pf4 aqqqqqqq Β« f0 pf0 pf0 pf1 pf2 pf2 pf2 pf3 pf4 pf4 pf4 aqqqqqqqqqqq
In this problem, all functions f𝑖 are of type π‘œ β†’ π‘œ, and constant a is of type π‘œ. FOOL unfolding
of nested Boolean terms will result in exponential blowup in the problem size. However,
superposition-based theorem provers are well-equipped for this issue: their CNF algorithms
use smart simplifications and formula renaming to mitigate these effects. Moreover, when the
problem is preprocessed, the prover is aware of the problem size before the proving process
starts and can adjust its heuristics properly. E, Zipperposition and Vampire, instructed to
perform FOOL unfolding, solve the problem swiftly using their default modes. However, if the
problem is not preprocessed, Zipperposition struggles to prove it using Cases(Simp) and, due to
the large number of (redundant) clauses it creates, succeeds only if specific heuristic choices are
made.


6. Evaluation
We performed an extensive evaluation to determine the usefulness of our approach. As our benchmark
set, we used all 2606 monomorphic theorems from the TPTP library, given in THF format.
All of the experiments were performed on StarExec [31] servers with Intel Xeon E5-2609 0
CPUs clocked at 2.40 GHz. The evaluation is separated in two parts that answer different
questions: How useful are the new rules? How does our approach compare with state-of-the-art
higher-order provers?

6.1. Evaluation of the Rules
For this part of the evaluation, we fixed a single well-performing Zipperposition configuration
called base (b). Since we are testing a single configuration, we used the CPU time limit of 15 s
– roughly the time a single configuration is given in a portfolio mode. Configuration b uses the
pragmatic variant pv21121 of the unification algorithm given by Vukmirović et al. [21]. It enables
BoolSimp rule, EC rule, PI rule in Pragmatic mode with π‘˜ β€œ 2, ElimLeibniz and ElimPredVar
rules, BoolER rule, and BoolEF rules. To evaluate the usefulness of all rules we described above,
we enable, disable or change the parameters of a single rule, while keeping all other parameters
of 𝑏 intact. In figures that contain sufficiently different configurations, cells are of the form 𝑛 (π‘š),
where 𝑛 is the total number of problems proved by a particular configuration and π‘š is the number
of unique problems that the given configuration solved, compared to the other configurations in the
same figure. Intersections of rows and columns denote the corresponding combination of parameters.
The result for the base configuration is written in italics; the best result is written in bold.
   First, we tested different parameters of Cases and CasesSimp rules. In Figure 1 we report
the results. The columns correspond to three possible options to choose subterm on which the
inference is performed: a stands for any eligible subterm, lo and li stands for leftmost outermost
and leftmost innermost subterms, respectively. The rows correspond to two different rules: b is
the base configuration, which uses CasesSimp, and bc swaps this rule for Cases. Although the
margin is slim, the results show it is usually preferable to select leftmost-outermost subterm.





                                              a     lo            li
                                  b        1646    1648       1640
                                  bc       1644    1645       1644
Figure 1: Effect of the Cases(Simp) rule on success rate



                        βˆ’PI      b_p      b_f      b_∧      b_∨      b_β‰ˆ      b_Β¬     b_βˆ€βˆƒ
              π‘˜ = 1              1648     1628     1637     1634     1630     1641     1637
              π‘˜ = 2     1636     1646     1629     1636     1631     1627     1638     1634
              π‘˜ = 8              1643     1625     1633     1631     1623     1637     1635
Figure 2: Effect of PI rule on success rate



                    βˆ’EL          +EL                                        βˆ’BEF       +BEF
       βˆ’EPV        1584 (0)     1644 (0)                    βˆ’BER           1644 (2)   1643 (0)
       +EPV        1612 (0)     1646 (0)                    +BER           1645 (0)   1646 (0)
 Figure 3: Effect of Leibniz equality elimination Figure 4: Effect of BoolER and BoolEF


   Second, we evaluated all the modes of PI rule with 3 values for parameter π‘˜: 1, 2, and 8 (Figure
2). The columns denote, from left to right: disabling the PI rule, Pragmatic mode, Full mode,
and Imit β€Ή modes with appropriate logical symbols. The rows denote different values of π‘˜. The
results show that different values for π‘˜ have a modest effect on the success rate. The raw data reveal
that when we focus our attention on configurations with π‘˜ = 2, mode Full can solve 10 problems
no other mode (including disabling the PI rule) can. Modes Imit ∧ and Pragmatic solve 2 problems
uniquely, whereas Imit ∨ solves one problem uniquely. This result suggests that, even though it is not
evident from Figure 2, the sets of problems solved by different modes differ somewhat.
   Figure 3 gives the results of evaluating the rules that treat Leibniz equality on the calculus level: EL
stands for ElimLeibniz, whereas EPV denotes ElimPredVar; the signs βˆ’ and + denote that the rule is
removed from or added to configuration b, respectively. Disabling both rules severely lowers
the success rate. The results suggest that including ElimLeibniz is beneficial to performance.
   Similarly, Figure 4 discusses the merits of including (+) or excluding (βˆ’) the BoolER (BER) and
BoolEF (BEF) rules. Our expectation was that the inclusion of these two rules would make a bigger
impact on the success rate. It turned out that, in practice, most of the effects of these rules can be
achieved using a combination of the PI rule and the basic superposition calculus rules.
   Combining these two rules with lazy clausification is more useful: when the rule EC is replaced
by the rule LC, the success rate increases (compared to 1646 problems solved by 𝑏) to 1660
problems. We also discovered that reasoning with choice is useful: when rule Choice is enabled,
the success rate increases to 1653. We determined that including or excluding the conclusion 𝐷
of Choice, after it is simplified, makes no difference. Counterintuitively, disabling the BoolSimp
rule results in 1640 problems solved, which is only 6 problems short of configuration 𝑏. Disabling the Ext
and Interpretπœ† rules results in solving 25 and 31 problems fewer, respectively. The raw data show





                      CVC4       Leo-III     Satallax     Vampire      Zipperposition
           pure      1806 (5)    1627 (0)    2067 (0)     1924 (7)        1980 ( 0)
           coop         –        2085 (3)    2214 (9)        –            2190 (17)
Figure 5: Comparison with other higher-order provers




that in total, using configurations from Figure 1 to Figure 4, 1682 problems can be solved.
   Last, we compare our approach to the alternatives. Axiomatizing Booleans brings Zipperposition
to a grinding halt: only 1106 problems can be solved using this mode. On the other hand,
preprocessing is fairly competitive: it solves only 8 problems fewer than the 𝑏 configuration.

6.2. Comparison with Other Higher-Order Provers
We compared Zipperposition with all higher-order theorem provers that took part in the THF
division of CASC-27 [27]: CVC4 1.8 prerelease [4], Leo-III 1.4 [14], Satallax 3.4 [15], and Vampire-
THF 4.4 [6]. In this part of the evaluation, Zipperposition is run in a portfolio mode that runs
configurations in different time slices. We set the CPU time limit to 180 s, the time allotted to
each prover at CASC-27.
   Leo-III and Satallax are cooperative theorem provers – they periodically invoke first-order
provers to finish the proof attempt. Leo-III uses CVC4, E and iProver [32] as backends, while
Satallax uses Ehoh [11] as backend. Zipperposition can use Ehoh as backend as well. To test
effectiveness of each calculus, we run the cooperative provers in two versions: pure, which
disables backends, and coop which uses all supported backends.
   In both pure and cooperative mode, Satallax comes out as the winner. Zipperposition comes
in close second, showing that our approach is a promising basis for further extensions. Leo-
III uses SMT solver CVC4, which features native support for Booleans, as a backend. It is
possible that the use of CVC4 is one of the reasons for massive improvement in success rate of
cooperative configuration of Leo-III, compared with the pure version. Therefore, we conjecture
that including support for SMT backends in Zipperposition might be beneficial.


7. Discussion and Related Work
Our work is primarily motivated by the goal of closing the gap between higher-order β€œhammer”
or software verifier frontends and first-order backends. A considerable amount of research effort
has gone into making the translations of higher-order logic as efficient as possible. Descriptions
of hammers like HOLyHammer [1] and Sledgehammer [2] for Isabelle contain details of these
translations. Software verifiers Boogie [33] and Why3 [34] use similar translations.
   Established higher-order provers like Leo-III and Satallax perform very well on TPTP benchmarks;
however, recent evaluations show that on Sledgehammer problems they are outperformed by
translations to first-order logic [10, 11, 9]. These two provers are built from the ground up as
higher-order provers: treatment of exclusively higher-order issues such as extensionality and
choice is built into them, usually via explosive rules. These explosive rules might contribute to
their suboptimal performance on the mostly first-order Sledgehammer problems.
   In contrast, our approach is to start with a first-order prover and gradually extend it with
higher-order features. The work performed in the context of the Matryoshka project [35], in which
both authors of this paper participate, resulted in adding support for πœ†-free higher-order logic
with Booleans to E [11] and veriT [9], and adding support for Boolean-free higher-order logic
to Zipperposition. Authors of many state-of-the-art first-order provers have implemented some
form of support for higher-order reasoning. This is true both for SMT solvers, witnessed by
the recent extension of CVC4 and veriT [9], and for superposition provers, witnessed by the
extension of Vampire [36]. All of those approaches were arguably more focused on functional
aspects of higher-order logic, such as πœ†-binders and function extensionality, than on Boolean
aspects such as Boolean subterms and Boolean extensionality. A notable exception is work by
Kotelnikov et al. that introduced support for Boolean subterms to first-order Vampire [12, 13].
   The main merit of our approach is that it combines, in a modular way, two successful and
complementary approaches to supporting features of higher-order logic that had not been
combined before. It is based on a higher-order superposition calculus that incurs around 1%
overhead on first-order problems compared with classic superposition [10]. We conjecture that
this efficient reasoning base contributes to its competitive performance.


8. Conclusion
We presented a pragmatic approach to supporting Booleans in a modern automatic prover for
clausal higher-order logic. Our approach combines previous research efforts that extended
first-order provers with complementary features of higher-order logic. It also proposes some
solutions for the issues that emerge with this combination. The implementation shows clear
improvement over previous techniques and competitive performance.
   What our work lacks is an overview of heuristics that can be used to curb the explosion
incurred by some of the rules described in this paper. We plan to address this issue in future
work. Similarly, unlike Bentkamp et al. [10], we do not give any completeness guarantees for
our extension; we plan to develop a refutationally complete calculus that supports Booleans
around core rules such as Cases and LC.

Acknowledgment We are grateful to the maintainers of StarExec for letting us use their
service. We thank Alexander Bentkamp, Jasmin Blanchette and Simon Cruanes for many
stimulating discussions and all the useful advice. Alexander Bentkamp suggested the rule
BoolER. We are also thankful to Ahmed Bhayat, Predrag JaničiΔ‡ and the anonymous reviewers
for suggesting many improvements to this paper. We thank Evgeny Kotelnikov for patiently
explaining the details of FOOL, Alexander Steen for clarifying Leo-III’s treatment of Booleans,
and Giles Reger and Martin Suda for explaining the renaming mechanism implemented in VCNF.
Vukmirović’s research has received funding from the European Research Council (ERC) under
the European Union’s Horizon 2020 research and innovation program (grant agreement No.
713999, Matryoshka). Nummelin has received funding from the Netherlands Organization for
Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward).





References
 [1] C. Kaliszyk, J. Urban, HOL(y)Hammer: Online ATP service for HOL Light, Mathematics in
     Computer Science 9 (2015) 5–22.
 [2] L. C. Paulson, J. C. Blanchette, Three years of experience with Sledgehammer, a practical
     link between automatic and interactive theorem provers, in: G. Sutcliffe, S. Schulz,
     E. Ternovska (Eds.), IWIL-2010, volume 2 of EPiC, EasyChair, 2012, pp. 1–11.
 [3] J.-C. FilliΓ’tre, A. Paskevich, Why3 - where programs meet provers, in: M. Felleisen,
     P. Gardner (Eds.), Programming Languages and Systems - 22nd European Symposium on
     Programming, ESOP 2013, Held as Part of the European Joint Conferences on Theory and
     Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings, volume
     7792 of LNCS, Springer, 2013, pp. 125–128.
 [4] C. W. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds,
     C. Tinelli, CVC4, in: G. Gopalakrishnan, S. Qadeer (Eds.), CAV 2011, volume 6806 of LNCS,
     Springer, 2011, pp. 171–177.
 [5] S. Schulz, S. Cruanes, P. Vukmirović, Faster, higher, stronger: E 2.3, in: P. Fontaine (Ed.),
     CADE 2019, volume 11716 of LNCS, Springer, 2019, pp. 495–507.
 [6] L. Kovács, A. Voronkov, First-order theorem proving and Vampire, in: N. Sharygina,
     H. Veith (Eds.), CAV 2013, volume 8044 of LNCS, Springer, 2013, pp. 1–35.
 [7] J. Robinson, A note on mechanizing higher order logic, in: B. Meltzer, D. Michie (Eds.),
     Machine Intelligence, volume 5, Edinburgh University Press, 1970, pp. 121–135.
 [8] J. Meng, L. C. Paulson, Translating higher-order clauses to first-order clauses, J. Autom.
     Reasoning 40 (2008) 35–60.
 [9] H. Barbosa, A. Reynolds, D. E. Ouraoui, C. Tinelli, C. W. Barrett, Extending SMT solvers
     to higher-order logic, in: P. Fontaine (Ed.), CADE 2019, volume 11716 of LNCS, Springer,
     2019, pp. 35–54.
[10] A. Bentkamp, J. Blanchette, S. Tourret, P. Vukmirović, U. Waldmann, Superposition with
     lambdas, 2020. Submitted to a journal, http://matryoshka.gforge.inria.fr/pubs/lamsup_report.pdf.
[11] P. Vukmirović, J. C. Blanchette, S. Cruanes, S. Schulz, Extending a brainiac prover to
     lambda-free higher-order logic, 2020. Submitted to a journal, http://matryoshka.gforge.inria.fr/pubs/ehoh_article.pdf.
[12] E. Kotelnikov, L. Kovács, A. Voronkov, A first class Boolean sort in first-order theorem
     proving and TPTP, CoRR abs/1505.01682 (2015). arXiv:1505.01682.
[13] E. Kotelnikov, L. KovΓ‘cs, M. Suda, A. Voronkov, A clausal normal form translation for
     FOOL, in: C. BenzmΓΌller, G. Sutcliffe, R. Rojas (Eds.), GCAI 2016, volume 41 of EPiC Series
     in Computing, EasyChair, 2016, pp. 53–71.
[14] A. Steen, C. BenzmΓΌller, The higher-order prover Leo-III, in: D. Galmiche, S. Schulz,
     R. Sebastiani (Eds.), IJCAR 2018, volume 10900 of LNCS, Springer, 2018, pp. 108–116.
[15] C. E. Brown, Satallax: An automatic higher-order prover, in: B. Gramlich, D. Miller,
     U. Sattler (Eds.), IJCAR 2012, volume 7364 of LNCS, Springer, 2012, pp. 111–117.
[16] P. B. Andrews, Classical type theory, in: J. A. Robinson, A. Voronkov (Eds.), Handbook of
     Automated Reasoning, volume II, Elsevier and MIT Press, 2001, pp. 965–1007.
[17] G. Sutcliffe, The TPTP problem library and associated infrastructure - from CNF to TH0,
     TPTP v6.4.0, J. Autom. Reasoning 59 (2017) 483–502.
[18] S. Cruanes, Extending Superposition with Integer Arithmetic, Structural Induction, and
     Beyond, Ph.D. thesis, Γ‰cole polytechnique, 2015.
[19] S. Cruanes, Superposition with structural induction, in: C. Dixon, M. Finger (Eds.), FroCoS
     2017, volume 10483 of LNCS, Springer, 2017, pp. 172–188.
[20] I. Cervesato, F. Pfenning, A linear spine calculus, J. Log. Comput. 13 (2003) 639–688.
[21] P. Vukmirović, A. Bentkamp, V. Nummelin, Efficient full higher-order unification, in: Z. M.
     Ariola (Ed.), FSCD 2020, volume 167 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum fΓΌr
     Informatik, 2020, pp. 5:1–5:17.
[22] L. Bachmair, H. Ganzinger, Rewrite-based equational theorem proving with selection and
     simplification, J. Log. Comput. 4 (1994) 217–247.
[23] S. Schulz, E - a brainiac theorem prover, AI Commun. 15 (2002) 111–126.
[24] A. Nonnengart, C. Weidenbach, Computing small clause normal forms, in: J. A. Robinson,
     A. Voronkov (Eds.), Handbook of Automated Reasoning (in 2 volumes), Elsevier and MIT
     Press, 2001, pp. 335–367.
[25] A. Steen, Extensional paramodulation for higher-order logic and its effective
     implementation Leo-III, Ph.D. thesis, Free University of Berlin, Dahlem, Germany, 2018.
[26] A. Bhayat, G. Reger, A combinator-based superposition calculus for higher-order logic, in:
     N. Peltier, V. Sofronie-Stokkermans (Eds.), IJCAR 2020, volume 12166 of LNCS, Springer,
     2020, pp. 278–296.
[27] G. Sutcliffe, The CADE-27 automated theorem proving system competition - CASC-27, AI
     Commun. 32 (2019) 373–389.
[28] H. Ganzinger, J. Stuber, Superposition with equivalence reasoning and delayed clause
     normal form transformation, Inf. Comput. 199 (2005) 3–23.
[29] G. Reger, M. Suda, A. Voronkov, New techniques in clausal form generation, in:
     C. BenzmΓΌller, G. Sutcliffe, R. Rojas (Eds.), GCAI 2016, volume 41 of EPiC Series in
     Computing, EasyChair, 2016, pp. 11–23.
[30] M. Wisniewski, A. Steen, K. Kern, C. BenzmΓΌller, Effective normalization techniques for
     HOL, in: N. Olivetti, A. Tiwari (Eds.), IJCAR 2016, volume 9706 of LNCS, Springer, 2016,
     pp. 362–370.
[31] A. Stump, G. Sutcliffe, C. Tinelli, StarExec: A cross-community infrastructure for logic
     solving, in: S. Demri, D. Kapur, C. Weidenbach (Eds.), IJCAR 2014, volume 8562 of LNCS,
     Springer, 2014, pp. 367–373.
[32] K. Korovin, iProver - an instantiation-based theorem prover for first-order logic (system
     description), in: A. Armando, P. Baumgartner, G. Dowek (Eds.), IJCAR 2008, volume 5195
     of LNCS, Springer, 2008, pp. 292–298.
[33] K. R. M. Leino, P. RΓΌmmer, A polymorphic intermediate verification language: Design and
     logical encoding, in: J. Esparza, R. Majumdar (Eds.), TACAS 2010, volume 6015 of LNCS,
     Springer, 2010, pp. 312–327.
[34] F. Bobot, J.-C. FilliΓ’tre, C. MarchΓ©, A. Paskevich, Why3: Shepherd Your Herd of Provers,
     in: Boogie 2011: First International Workshop on Intermediate Verification Languages,
     Wroclaw, Poland, 2011, pp. 53–64.
[35] J. Blanchette, P. Fontaine, S. Schulz, S. Tourret, U. Waldmann, Stronger higher-order
     automation: A report on the ongoing Matryoshka project, in: M. Suda, S. Winkler (Eds.),
     EPTCS 311: Proceedings of the Second International Workshop on Automated Reasoning:
     Challenges, Applications, Directions, Exemplary Achievements - Natal, Brazil, August 26,
     2019, EPTCS, 2019, pp. 11–18.
[36] A. Bhayat, G. Reger, Restricted combinatory unification, in: P. Fontaine (Ed.), CADE 2019,
     volume 11716 of LNCS, Springer, 2019, pp. 74–93.



