Boolean Reasoning in a Higher-Order Superposition Prover

Petar Vukmirović, Visa Nummelin
Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands

Abstract
We present a pragmatic approach to extending a Boolean-free higher-order superposition calculus to support Boolean reasoning. Our approach extends inference rules that have previously been used only in a first-order setting, uses some well-known rules already implemented in higher-order provers, and adds new rules. We have implemented the approach in the Zipperposition theorem prover. The evaluation shows highly competitive performance of our approach and a clear improvement over previous techniques.

Keywords: higher-order logic, theorem proving, superposition

1. Introduction

In the last decades, automatic theorem provers have been used successfully as backends to "hammers" in proof assistants [1, 2] and to software verifiers [3]. Most advanced provers, such as CVC4 [4], E [5], and Vampire [6], are based on first-order logic, whereas most frontends that use them are based on versions of higher-order logic. Thus, there is a large gap in expressiveness between front- and backends. This gap is bridged using well-known translations from higher-order to first-order logic [7, 8]. However, translations are usually less efficient than native support [9, 10, 11]. The distinguishing features of higher-order logic used by proof assistants that the translation must eliminate include λ-binders; function extensionality, i.e., the property that functions are equal if they agree on every argument, described by the axiom ∀(x, y : τ → ν). (∀(z : τ). x z ≈ y z) ⇒ x ≈ y; and formulas occurring as arguments of function symbols [8]. A group of authors including Vukmirović [10] recently designed a complete calculus for extensional Boolean-free higher-order logic. This calculus is an extension of superposition, the calculus used in the most successful provers such as E and Vampire.

148 Petar Vukmirović et al. CEUR Workshop Proceedings 148–166
The extension removes the need to translate the first two of the above-mentioned features of higher-order logic. Kotelnikov et al. [12, 13] extended the language of first-order logic to support the third feature of higher-order logic that requires translation. They described two approaches: one based on calculus-level treatment of Booleans and another, which requires no changes to the calculus, based on preprocessing.

PAAR 2020: Seventh Workshop on Practical Aspects of Automated Reasoning, June 29–30, 2020, Paris, France (virtual)
p.vukmirovic@vu.nl (P. Vukmirović); visa.nummelin@vu.nl (V. Nummelin)
https://petarvukmirovic.github.io/home/ (P. Vukmirović)
ORCID: 0000-0001-7049-6847 (P. Vukmirović); 0000-0003-0078-790X (V. Nummelin)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

To fully bridge the gap between higher-order and first-order tools, we combine the two approaches: we use the efficient higher-order superposition calculus and extend it with inference rules that reason about Boolean terms. In early work, Kotelnikov et al. [12] described a FOOL paramodulation rule that, under some order requirements, removes the need for the axiom describing the Boolean domain, ∀(p : o). p ≈ ⊤ ∨ p ≈ ⊥. In their approach, it is assumed that a problem with formulas occurring as arguments of symbols is translated to first-order logic. The backbone of our approach is an extension of this rule to higher-order logic. Namely, we do not translate away any Boolean structure nested inside non-Boolean terms; instead, we allow our rule to hoist the nested Booleans to the literal level. Then, we clausify the resulting formula (i.e., a clause that contains formulas in literals) using a new rule.
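The hoisting idea can be pictured with a toy model (tuple-encoded terms and function names of our own invention, not prover code): a nested Boolean f inside a term is replaced by ⊤ or ⊥, and f itself is moved to the literal level with the matching polarity, anticipating the precise rules of Section 3.

```python
def subterm(t, pos):
    """t|pos for a term encoded as a string (atom) or a tuple (head, arg1, ...)."""
    for i in pos:
        t = t[i]
    return t

def replace(t, pos, new):
    """Replace the subterm of t at pos (a tuple of 1-based argument indices)."""
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i + 1:]

def hoist(t, pos):
    """Case-split on the nested Boolean f = t|pos: the term t is replaced by
    t[⊥] together with the literal f ≈ ⊤, and by t[⊤] together with f ≉ ⊤.
    Literals are encoded as (lhs, rhs, positive)."""
    f = subterm(t, pos)
    return [(replace(t, pos, "⊥"), (f, "⊤", True)),
            (replace(t, pos, "⊤"), (f, "⊤", False))]

# f (p ⇒ q): the implication is a nested Boolean at position 1
t = ("f", ("⇒", "p", "q"))
```

For this t, hoist(t, (1,)) yields the two case results f ⊥ with (p ⇒ q) ≈ ⊤, and f ⊤ with (p ⇒ q) ≉ ⊤; the hoisted formula literals are then handed to clausification.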
An important feature that we inherit by building on Bentkamp et al. [10] is support for (function) extensionality. Moving to higher-order logic with Booleans also means that we need to consider Boolean extensionality: ∀(p : o)(q : o). (p ⇔ q) ⇒ p ≈ q. We extend the rules of Bentkamp et al. that treat function extensionality to support Boolean extensionality as well. Rules that extend the two orthogonal approaches form the basis of our support for Boolean reasoning (Section 3). In addition, we have implemented rules inspired by those used in the higher-order provers Leo-III [14] and Satallax [15], such as elimination of Leibniz equality, primitive instantiation, and treatment of the choice operator [16]. We have also designed new rules that use higher-order unification to resolve Boolean formulas hoisted to the literal level, delay clausification of non-atomic literals, reason about formulas under λ-binders, and more. Even though the rules we use are inspired by those of refutationally complete higher-order provers, we do not guarantee completeness of our extension of λ-superposition. We compare our native approach with two alternatives based on preprocessing (Section 4). First, we compare it to an axiomatization of the theory of Booleans. Second, inspired by the work of Kotelnikov et al. [13], we implemented a preprocessing approach that does not require the introduction of Boolean axioms. We also discuss some examples, coming from TPTP [17], that illustrate advantages and disadvantages of our approach (Section 5). Our approach is implemented in the Zipperposition theorem prover [18, 19]. Zipperposition is an easily extensible open source prover that Bentkamp et al. used to implement their higher-order superposition calculus. We further extend their implementation. We performed an extensive evaluation of our approach (Section 6).
In addition to evaluating different configurations of our new rules, we have compared them to the full higher-order provers CVC4, Leo-III, Satallax, and Vampire. The results suggest that it is beneficial to natively support Boolean reasoning: the native approach outperforms preprocessing-based approaches. Furthermore, it is very competitive with state-of-the-art higher-order provers. We discuss the differences between our approach and the approaches we build on, as well as related approaches (Section 7).

2. Background

We base our work on Bentkamp et al.'s [10] extensional polymorphic clausal higher-order logic. We extend the syntax of this logic by adding logical connectives to the language of terms. The semantics of the logic is extended by interpreting the Boolean type o as a two-element domain. This amounts to extending Bentkamp et al.'s fragment of higher-order logic to full higher-order logic (HOL). Our notation, definitions, and the following text are largely based on Bentkamp et al.'s. A signature is a quadruple (Σ_ty, V_ty, Σ, V) where Σ_ty is a set of type constructors, V_ty is a set of type variables, and Σ and V are sets of constants and term variables, respectively. We require the nullary type constructors ι and o, as well as the binary constructor →, to be in Σ_ty. A type τ, υ is either a type variable α ∈ V_ty or of the form κ(τ1, …, τn), where κ is an n-ary type constructor. We write κ for κ(), τ → υ for →(τ, υ), and we abbreviate tuples (a1, …, an) as ān for n ≥ 0. Similarly, we drop parentheses to shorten τ1 → (⋯ → (τn−1 → τn) ⋯) into τ1 → ⋯ → τn. Each symbol in Σ is assigned a type declaration of the form Πᾱn. τ, where all variables occurring in τ are among ᾱn. Function symbols a, b, f, g, … are elements of Σ; their type declarations are written as f : Πᾱn. τ.
Term variables from the set V are written x, y, z, … and we denote their types as x : τ. When the type is not important, we omit type declarations. We assume that the symbols ⊤, ⊥, ¬, ∧, ∨, ⇒, ⇔, with their standard meanings and type declarations, are elements of Σ. Furthermore, we assume that the polymorphic symbols ∀ and ∃ with type declarations Πα. (α → o) → o, and ≈ : Πα. α → α → o, are in Σ, with their standard meanings. All these symbols are called logical symbols. We write binary logical symbols in infix notation. Terms are defined inductively as follows. Variables x : τ are terms of type τ. If f : Πᾱn. τ is in Σ and ῡn is a tuple of types, called type arguments, then f⟨ῡn⟩ (written f if n = 0, or if the type arguments can be inferred from the context) is a term of type τ{ᾱn ↦ ῡn}, called a constant. If x is a variable of type τ and s is a term of type υ, then λx. s is a term of type τ → υ. If s and t are of type τ → υ and τ, respectively, then s t is a term of type υ. We call terms of Boolean type (o) formulas and denote them by f, g, h, …; we use p, q, r, … for variables whose result type is o and p, q, r for constants with the same result type. We shorten iterated lambda abstraction λx1. … λxn. s to λx̄n. s, and iterated application (s t1) ⋯ tn to s t̄n. We assume the standard notions of free and bound variables, capture-avoiding substitutions σ, ρ, θ, …, and α-, β-, and η-conversion. Unless stated otherwise, we view terms as αβη-equivalence classes, with the η-long β-reduced form as the representative. Each term s can be uniquely written as λx̄m. a t̄n, where a is either a variable or a constant and m, n ≥ 0; we call a the head of s. We say that a term a t̄n is written in spine notation [20].
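The term grammar above can be sketched as a small datatype; the following is a minimal illustration in Python (the names are ours, not Zipperposition's actual OCaml datatypes), including the head of a term λx̄m. a t̄n.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"

@dataclass(frozen=True)
class App:
    head: Union[Var, Const]   # spine notation: the head applied to all args at once
    args: tuple

Term = Union[Var, Const, Lam, App]

def head(t: Term):
    """Every term is λx̄m. a t̄n; strip the abstractions and return the head a."""
    while isinstance(t, Lam):
        t = t.body
    return t.head if isinstance(t, App) else t
```

For instance, the head of λx. f x is the constant f, and the head of a bare variable y is y itself.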
Following our previous work [21], we define a nonstandard notion of subterms and positions inductively as a graceful extension of the first-order counterparts: a term s is a subterm of itself at position ε. If s is a subterm of ti at position p, then s is a subterm of a t̄n at position i.p, where a is a head. If s is a subterm of t at position p, then s is a subterm of λx. t at position 1.p. We use s|p to denote the subterm of s at position p. Given a formula f, we call its Boolean subterm f|p a top-level Boolean if for all proper prefixes q of p, the head of f|q is a logical constant. Otherwise, we call it a nested Boolean. For example, in the formula f = (h a ≈ g (p ⇒ q)) ∨ ¬p, the subterms f|1 and f|2 are top-level Booleans, whereas f|1.2.1 is a nested Boolean (as are its subterms). Only top-level Booleans are allowed in first-order logic, whereas nested Booleans are characteristic of higher-order logic. A formula is called an atom if it is of the form a t̄n, where a is a non-logical head, or of the form s ≈ t, where if s or t is of type o and one of them has a logical head, the other one must be ⊤ or ⊥. A literal L is an atom or its negation. A clause C is a multiset of literals, interpreted and written (abusing ∨) disjunctively as L1 ∨ ⋯ ∨ Ln. We write s ≉ t for ¬(s ≈ t). We say a variable is free in a clause C if it is not bound inside any subterm of a literal in C.

3. The Native Approach

Some support for Booleans was already present in Zipperposition before we started extending the calculus of Bentkamp et al. In this section, we start by describing the internals of Zipperposition responsible for reasoning with Booleans. We continue by describing the 15 rules that we have implemented. For ease of presentation, we divide them into three categories. We assume some familiarity with the superposition calculus [22] and adopt the notation used by Schulz [23].

3.1.
Support for Booleans in Zipperposition

Zipperposition is an open source¹ prover written in OCaml. From its inception, it was designed as a prover that supports easy extension of its base superposition calculus to various theories, including arithmetic, induction, and limited support for higher-order logic [18]. Zipperposition internally stores applications in flattened, spine notation. It also exploits the associativity of ∧ and ∨ to flatten nested applications of these symbols. Thus, the terms p ∧ (q ∧ r) and (p ∧ q) ∧ r are both represented as ∧ p q r. The prover's support for λ-terms is used to represent quantified nested Booleans: the formulas ∀x. f and ∃x. f are represented as ∀ (λx. f) and ∃ (λx. f). After clausification of the input problem, no nested Booleans are modified or renamed using fresh predicate symbols. The version of Zipperposition preceding our modifications distinguished between equational and non-equational literals. Following E [5], we modified Zipperposition to represent all literals equationally: a non-equational literal f is stored as f ≈ ⊤, whereas ¬f is stored as f ≉ ⊤. Equations of the form f ≈ ⊥ and f ≉ ⊥ are transformed into f ≉ ⊤ and f ≈ ⊤, respectively.

3.2. Core Rules

Kotelnikov et al. [12], to the best of our knowledge, pioneered the approach of extending a first-order superposition prover to support nested Booleans. They call the effects of including the axiom ∀(p : o). p ≈ ⊤ ∨ p ≈ ⊥ a "recipe for disaster". To combat the explosive behavior of the axiom, they imposed the following two requirements on the simplification order ≻ (which is a parameter to the superposition calculus): ⊤ ≻ ⊥, and ⊤ and ⊥ are the two smallest ground terms with respect to ≻. If these two requirements are met, there is no self-paramodulation of the clause, and the only paramodulation possible is from the literal p ≈ ⊤ of the mentioned axiom into a Boolean subterm of another clause. Finally, Kotelnikov et al.
replace the axiom with the inference rule FOOL paramodulation (FP):

  C[f]
  ───────────────  FP
  C[⊤] ∨ f ≈ ⊥

where f is a nested non-variable Boolean subterm of the clause C, different from ⊤ and ⊥. In addition, they translate the initial problem containing nested Booleans to first-order logic without interpreted Booleans; this translation introduces proxy symbols for ⊤ and ⊥, and a proxy type for o.

¹ https://github.com/sneeuwballen/zipperposition

We created two rules that are syntactically similar to FP but are adapted for higher-order logic, with one key distinction: we do not perform any translation:

  C[f]                           C[f]
  ───────────────  Cases        ═══════════════════════════════  CasesSimp
  C[⊥] ∨ f ≈ ⊤                  C[⊥] ∨ f ≈ ⊤    C[⊤] ∨ f ≉ ⊤

The double line in the definition of CasesSimp denotes that the premise is replaced by the conclusions; obviously, a prover that uses these rules should not include both of them at the same time. In addition, since literals f ≈ ⊥ are represented as negative equations f ≉ ⊤, which cannot be used to paramodulate from, we change the first requirement on the order to ⊥ ≻ ⊤. These two rules hoist Boolean subterms f to the literal level; therefore, some conclusions of Cases and CasesSimp will have literals of the form f ≈ ⊤ (or f ≉ ⊤) where f is not an atom. This introduces the need for the rule called eager clausification (EC):

  C
  ─────────────  EC
  D1 ⋯ Dm

We say that a clause is standard if all of its literals are of the form s ≈ t, where s and t are not Booleans, or of the form f ≈̇ ⊤, where the head of f is not a logical symbol and ≈̇ denotes ≈ or ≉. The rule EC is applicable if the clause C = L1 ∨ ⋯ ∨ Ln is not standard. The resulting clauses D̄m represent the result of clausifying the formula ∀x̄. L1 ∨ ⋯ ∨ Ln, where x̄ are all free variables of C. An advantage of leaving nested Booleans unmodified is that the prover can prove some problems containing them without using the prolific rules described above.
For example, given the two clauses f (p x ⇒ p y) ≈ a and f (p a ⇒ p b) ≉ a, the empty clause can easily be derived without the above rules. A disadvantage of this approach is that the proving process is periodically interrupted by expensive calls to the clausification algorithm.

Naive application of the Cases and CasesSimp rules can result in many redundant clauses. Consider the clause C = p (p (p (p a))) ≈ ⊤, where p : o → o and a : o. Then the clause D = a ≈ ⊤ ∨ p ⊥ ≈ ⊤ can be derived from C in eight ways using the rules, depending on which nested Boolean subterm is chosen for the inference. In general, if a clause has a subterm occurrence of the form pⁿ a, where both p and a have result type o, the clause a ≈ ⊤ ∨ p ⊥ ≈ ⊤ can be derived in 2^(n−1) ways. To combat this issue we implemented pragmatic restrictions of the rules: only the leftmost outermost (or innermost) eligible subterm is considered. With this modification, D can be derived in only one way. Furthermore, some intermediate conclusions of the rules will not be derived, pruning the search space.

The clausification algorithm by Nonnengart and Weidenbach [24] aggressively simplifies the input problem using well-known Boolean equivalences before clausifying it. For example, the formula p ∧ ⊤ is replaced by p. To simplify nested Booleans we implemented the rule

  C[fσ]
  ─────────  BoolSimp
  C[gσ]

where f ⟶ g ∈ E runs over a fixed set of rewrite rules E, and σ is any substitution. In the current implementation of Zipperposition, E consists of the rules described by Nonnengart and Weidenbach [24, Section 3]. This set contains rules describing how each logical symbol behaves when either of its arguments is ⊤ or ⊥: for example, it includes ⊤ ⇒ p ⟶ p and p ⇒ ⊤ ⟶ ⊤. Leo-III implements a similar rule, called simp [25, Section 4.2.1].
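The effect of such a rewrite set can be illustrated with a bottom-up simplifier over a representative subset of the Nonnengart–Weidenbach equivalences (a toy sketch over tuple-encoded formulas, not the full set E used by Zipperposition):

```python
TOP, BOT = "⊤", "⊥"

def simp(t):
    """Simplify a formula bottom-up; formulas are strings (atoms) or
    tuples (connective, arg1, ..., argn)."""
    if not isinstance(t, tuple):
        return t
    head, args = t[0], tuple(simp(a) for a in t[1:])
    if head == "¬":
        return {TOP: BOT, BOT: TOP}.get(args[0], ("¬", args[0]))
    if head == "∧":
        if BOT in args:                               # p ∧ ⊥ ⟶ ⊥
            return BOT
        args = tuple(a for a in args if a != TOP)     # p ∧ ⊤ ⟶ p
        if not args:
            return TOP
        return args[0] if len(args) == 1 else ("∧",) + args
    if head == "⇒":
        p, q = args
        if p == TOP:                                  # ⊤ ⇒ q ⟶ q
            return q
        if q == TOP or p == BOT:                      # p ⇒ ⊤ ⟶ ⊤, ⊥ ⇒ q ⟶ ⊤
            return TOP
        return ("⇒", p, q)
    return (head,) + args
```

With this sketch, p ∧ ⊤ simplifies to p, and ⊤ ⇒ ¬⊥ collapses to ⊤.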
Our decision to represent negative atoms as negative equations was motivated by the need to alter Zipperposition's earlier behavior as little as possible. Namely, negative atoms were not used as literals to paramodulate from, and as such added to the laziness of the superposition calculus. However, it can be useful to consider unit clauses of the form f ≉ ⊤ as f ≈ ⊥ to strengthen demodulation. To that end, we have introduced the following rule:

  f ≉ ⊤    C[fσ]
  ───────────────  BoolDemod
  f ≉ ⊤    C[⊥]

3.3. Higher-Order Considerations

To achieve refutational completeness of higher-order resolution and similar calculi, it is necessary to instantiate predicate variables, i.e., variables with result type o, with arbitrary formulas [25, 16]. Fortunately, we can approximate the formulas using a complete set of logical symbols (e.g., ¬, ∀, and ∧). Since such an approximation is not only necessary for completeness of some calculi, but also very useful in practice, we implemented the primitive instantiation (PI) rule:

  C ∨ λx̄m. p s̄n ≈̇ t
  ───────────────────────────  PI
  (C ∨ λx̄m. p s̄n ≈̇ t){p ↦ f}

where p is a free variable of type τ1 → ⋯ → τn → o. By choosing different terms f to instantiate p, we can balance between the explosiveness of approximating a complete set of logical symbols and the incompleteness of pragmatic approaches. We borrow the notion of imitation from higher-order unification jargon [21]: we say that the term λx̄m. f (y1 x̄m) ⋯ (yn x̄m) is an imitation of a constant f : τ1 → ⋯ → τn → τ for some variable z of type ν1 → ⋯ → νm → τ. The variables ȳn are fresh free variables, where each yi has the type ν1 → ⋯ → νm → τi; each variable xi is of type νi. Rule PI was already implemented by Simon Cruanes in Zipperposition, before we started our modifications.
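The shape of an imitation, λx̄m. f (y1 x̄m) ⋯ (yn x̄m), can be rendered concretely; the helper below is a sketch of ours (printing imitations as strings, with x1…xm for the bound variables and y1…yn for the fresh free variables), not the prover's term construction.

```python
def imitation(f, n, m):
    """Imitation of an n-ary constant f for a variable of arity m:
    λx1…xm. f (y1 x1…xm) … (yn x1…xm), as a plain string."""
    xs = " ".join(f"x{i}" for i in range(1, m + 1))
    # each fresh variable yj is applied to all the bound variables
    args = " ".join(f"(y{j} {xs})" if m else f"y{j}" for j in range(1, n + 1))
    body = f"{f} {args}".strip()
    return f"λ{xs}. {body}" if m else body
```

For example, the imitation of a binary f for a unary variable is λx1. f (y1 x1) (y2 x1); the imitation of the nullary ⊤ for a binary variable is simply λx1 x2. ⊤, which is how PI can flatten a predicate variable to a truth constant.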
The rule has different modes that generate the set of possible terms f for p : τ1 → ⋯ → τn → o: Full, Pragmatic, and Imit⋄, where ⋄ is an element of the set of logical constants P = {∧, ∨, ≈⟨α⟩, ¬, ∀, ∃}. Mode Full contains imitations (for p) of all elements of P. Mode Pragmatic contains imitations of ¬, ⊤, and ⊥; if there exist indices i, j such that i ≠ j and τi = τj, it contains λx̄n. xi ≈ xj; if there exist indices i, j such that i ≠ j and τi = τj = o, then it contains λx̄n. xi ∧ xj and λx̄n. xi ∨ xj; and if τi = o for some i, then it contains λx̄n. xi. Mode Imit⋄ contains imitations of ⊤, ⊥, and ⋄ (except for Imit∀∃, which contains imitations of both ∀ and ∃). While experimenting with our implementation, we noticed some proof patterns that led us to the following modifications. First, it often suffices to perform PI only on initial clauses, which is why we allow the rule to be applied only to clauses created using at most k generating inferences. Second, if the rule is used in the proof, its premise is usually used only as part of that inference, which is why we implemented a version of PI that removes the clause after all possible PI inferences have been performed. We observed that the mode Imit⋄ is useful in practice, since often only a single approximation of a logical symbol is necessary.

Efficiently treating the axiom of choice is notoriously difficult for higher-order provers. Andrews formulates this axiom as ∀(p : α → o). (∃(x : α). p x) ⇒ p (ε p), where ε : Πα. (α → o) → α denotes the choice operator [16]. After clausification, this axiom becomes p x ≉ ⊤ ∨ p (ε p) ≈ ⊤. Since the term p x matches any Boolean term in the proof state, this axiom is very explosive.
Therefore, Leo-III [14] deals with the choice operator at the calculus level. Namely, whenever a clause C = p x ≉ ⊤ ∨ p (f p) ≈ ⊤ is chosen for processing, C is removed from the proof state and f is added to the set of choice functions CF (which initially contains just ε). Later, elements of CF are used to heuristically instantiate the axiom of choice. We reused this method of recognizing choice functions, but generalized the rule for creating the instance of the axiom (assuming ξ ∈ CF):

  C[ξ t]
  ─────────────────────────────────────────  Choice
  x (t y) ≉ ⊤ ∨ x (t (ξ (λz. x (t z)))) ≈ ⊤

Let D be the conclusion of Choice. The fresh variable x in D acts as an arbitrary context around t, the chosen instantiation for p from the axiom of choice; the variable x can later be replaced by imitations of logical symbols to create more complex instantiations of the choice axiom. To generate useful instances early, we create D{x ↦ λz. z} and D{x ↦ λz. ¬z}. Then, based on Zipperposition's parameters, D is either deleted or kept. Note that D will not subsume its instances, since the matching algorithm of Zipperposition is too weak for this.

Most provers natively support extensionality reasoning: Bhayat et al. [26] modify first-order unification to return unification constraints consisting of pairs of terms of functional type, whereas Steen relies on the unification rules of Leo-III's calculus [25, Section 4.3.3] to deal with extensionality. Bentkamp et al. [10] altered the core generating inference rules of the superposition calculus to support extensionality. Instead of requiring that the terms involved in an inference are unifiable, it is required that they can be decomposed into disagreement pairs such that at least one of the disagreement pairs is of functional type.
Disagreement pairs of terms s and t of the same type are defined inductively using the function dp: dp(s, t) = ∅ if s and t are equal; dp(a s̄n, b t̄m) = {(a s̄n, b t̄m)} if a and b are different heads; dp(λx. s, λy. t) = {(λx. s, λy. t)}; and dp(a s̄n, a t̄n) = ⋃_{i=1..n} dp(si, ti). Then the extensionality rules are stated as follows:

  s ≈ t ∨ C    u[s′] ≈̇ v ∨ D
  ──────────────────────────────────────────────  ExtSup
  (s1 ≉ s′1 ∨ ⋯ ∨ sn ≉ s′n ∨ u[t] ≈̇ v ∨ C ∨ D)σ

  s ≉ s′ ∨ C
  ──────────────────────────────  ExtER
  (s1 ≉ s′1 ∨ ⋯ ∨ sn ≉ s′n ∨ C)σ

  s ≈ t ∨ s′ ≈ u ∨ C
  ────────────────────────────────────────────  ExtEF
  (s1 ≉ s′1 ∨ ⋯ ∨ sn ≉ s′n ∨ t ≉ u ∨ s′ ≈ u ∨ C)σ

Rules ExtSup, ExtER, and ExtEF are extensional versions of superposition, equality resolution, and equality factoring [23]. The union of these three rules is denoted by Ext. In each rule, σ is a most general unifier of the types of s and s′, and dp(sσ, s′σ) = {(s1, s′1), …, (sn, s′n)}. All side conditions for the extensional rules are the same as for the standard rules, except that the condition that s and s′ are unifiable is replaced by the condition that at least one si is of functional type and that n > 0. These rules are easily extended to support Boolean extensionality by requiring that at least one si is of functional type or of type o, and by adding the case "dp(f, g) = {(f, g)} if f and g are different formulas" to the definition of dp. Consider the clause f (¬p ∨ ¬q) ≉ f (¬(p ∧ q)). This problem is obviously unsatisfiable, since the arguments of f on the two sides of the disequation are extensionally equal; however, without the Ext rules, Zipperposition has to rely on the Cases(Simp) and EC rules to derive the empty clause. Rule ExtER generates C = ¬p ∨ ¬q ≉ ¬(p ∧ q). Then, C gets clausified using EC, effectively reducing the problem to ¬(¬p ∨ ¬q ⇔ ¬(p ∧ q)), which is first-order.
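The base definition of dp translates directly into a recursive function; the sketch below uses tuple-encoded spine-notation terms (strings are atoms, "λ" marks an abstraction) and assumes same-type terms, so equal heads carry equally many arguments. The Boolean-extensionality extension, which also stops at differing formulas, is omitted here.

```python
def dp(s, t):
    """Disagreement pairs of two same-type terms in spine notation."""
    if s == t:
        return set()                      # dp(s, s) = ∅
    sh = s[0] if isinstance(s, tuple) else s
    th = t[0] if isinstance(t, tuple) else t
    if sh == "λ" or th == "λ":
        return {(s, t)}                   # dp(λx. s, λy. t) = {(λx. s, λy. t)}
    if sh != th:
        return {(s, t)}                   # different heads: the pair itself
    pairs = set()
    for si, ti in zip(s[1:], t[1:]):      # same head: union of dp over arguments
        pairs |= dp(si, ti)
    return pairs

# The paper's example: f (¬p ∨ ¬q) versus f (¬(p ∧ q))
s = ("f", ("∨", ("¬", "p"), ("¬", "q")))
t = ("f", ("¬", ("∧", "p", "q")))
```

On this pair, dp descends through the shared head f and returns the single disagreement pair (¬p ∨ ¬q, ¬(p ∧ q)), which is exactly the Boolean pair that ExtER turns into the disequation C above.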
Zipperposition restricts ExtSup by requiring that s and s′ are not of functional or Boolean type. If the terms are of functional type, our experience is that the better treatment of function extensionality is to apply fresh free variables (or Skolem terms, depending on the sign [10]) to both sides of a (dis)equation to reduce it to a first-order literal; Boolean extensionality is usually better supported by applying EC to the top-level Boolean term. Thus, for the following discussion we can assume s and s′ are not λ-abstractions or formulas. Then, ExtSup is applicable if s and s′ have the same head, and a functional or Boolean subterm. To efficiently retrieve such terms, we added an index that maps symbols to positions in clauses where they appear as the head of a term that has a functional or Boolean subterm. This index is empty for first-order problems, incurring no overhead if extensionality reasoning is not needed. Furthermore, we do not apply the Ext rules if all disagreement pairs have at least one side whose head is a variable; those are dealt with more efficiently using the standard, non-extensional versions of the rules. We also eagerly resolve the literals si ≉ s′i using at most one unifier returned by a terminating, pragmatic variant of the unification algorithm by Vukmirović et al. [21].

The expressiveness of higher-order logic allows users to define equality using a single axiom, called Leibniz equality [16]: ∀(x : α)(y : α). (∀(p : α → o). p x ⇒ p y) ⇒ x ≈ y. Leibniz equality often appears in TPTP problems. Since modern provers have native support for equality, it is usually beneficial to recognize and replace occurrences of Leibniz equality. Before we began our modifications, Zipperposition had a powerful rule that recognizes clauses containing variations of Leibniz equality and instantiates them with native equality. This rule was designed by Simon Cruanes and, to the best of our knowledge, it has not been documented so far.
With his permission, we describe this rule as follows:

  p s̄n¹ ≈ ⊤ ∨ ⋯ ∨ p s̄nⁱ ≈ ⊤ ∨ p t̄n¹ ≉ ⊤ ∨ ⋯ ∨ p t̄nʲ ≉ ⊤ ∨ C
  ─────────────────────────────────────────────────────────  ElimPredVar
  (p s̄n¹ ≈ ⊤ ∨ ⋯ ∨ p s̄nⁱ ≈ ⊤ ∨ C)σ

where p is a free variable, p does not occur in any sₗᵏ or tₗᵏ, or in C, and σ is defined as {p ↦ λx1 … xn. ⋁_{k=1..j} (⋀_{l=1..n} xₗ ≈ tₗᵏ)}.

To better understand how this rule removes variable-headed negative literals, consider the clause C = p a1 a2 ≈ ⊤ ∨ p b1 b2 ≉ ⊤ ∨ p c1 c2 ≉ ⊤. Since all side conditions are fulfilled, the rule ElimPredVar generates σ = {p ↦ λx y. (x ≈ b1 ∧ y ≈ b2) ∨ (x ≈ c1 ∧ y ≈ c2)}. After applying σ to C and subsequent β-reduction, the negative literal p b1 b2 ≉ ⊤ reduces to (b1 ≈ b1 ∧ b2 ≈ b2) ∨ (b1 ≈ c1 ∧ b2 ≈ c2) ≉ ⊤, which is equivalent to ⊥. Thus, we can remove this literal and all negative literals of the form p t̄n ≉ ⊤ from C and apply σ to the remaining ones.

The previous rule removes all variables occurring in disequations in one attempt. We implemented two rules that behave more lazily, inspired by ones present in Leo-III and Satallax:

  p s̄n ≈ ⊤ ∨ p t̄n ≉ ⊤ ∨ C                 p s̄n ≉ ⊤ ∨ p t̄n ≈ ⊤ ∨ C
  ────────────────────────  ElimLeibniz⁺   ────────────────────────  ElimLeibniz⁻
  (si ≈ ti ∨ C)σ                           (si ≈ ti ∨ C)σ′

where p is a free variable, p does not occur in ti, σ = {p ↦ λx̄n. xi ≈ ti}, and σ′ = {p ↦ λx̄n. ¬(xi ≈ ti)}. These rules differ from ElimPredVar in three ways. First, they act on occurrences of variables in both positive and negative literals. Second, due to their simplicity, they usually do not require EC as the following step. Third, they impose much weaker conditions on p. However, removing all negative variables in one step might improve performance. Coming back to the example of the clause C = p a1 a2 ≈ ⊤ ∨ p b1 b2 ≉ ⊤ ∨ p c1 c2 ≉ ⊤, we can apply ElimLeibniz⁺ using the substitution σ = {p ↦ λx y.
π‘₯ Β« b1 u to obtain the clause 𝐢 1 β€œ a1 Β« b1 _ a1 ff c1 . 3.4. Additional Rules Zipperposition’s unification algorithm [21] uses flattened representation of terms with logical operators ^ and _ for heads to unify terms that are not unifiable modulo π›Όπ›½πœ‚-equivalence, but are unifiable modulo associativity and commutativity of ^ and _. Let Λ› denote either ^ or _. When the unification algorithm is given two terms Λ› 𝑠𝑛 and Λ› 𝑑𝑛 , where neither of 𝑠𝑛 nor 𝑑𝑛 contain duplicates, it performs the following steps: First, it removes all terms that appear in both 𝑠𝑛 and 𝑑𝑛 from the two argument tuples. Next, the remaining terms are sorted first by their head term and then by their weight. Finally, the ` sorted lists are unified pairwise. As an˘ example, consider the problem of unifying the pair ^ pp aq pq pf aqq, ^ pq pf aqq pπ‘Ÿ pf pf aqqq where π‘Ÿ is a free variable. If the arguments of ^ are simply sorted as described above, we would unsuccessfully try to unify p a with q pf aq. However, by removing term q pf aq from the argument lists, we will be left with the problem pp a, π‘Ÿ pf pf aqqq which has a unifier. The winner of THF division of CASC-27 [27], Satallax [15], has one crucial advantage over Zipperposition: it is based on higher-order tableaux, and as such it does not require formulas to be converted to clauses. The advantage of tableaux is that once it instantiates a variable with a term, this instantiation naturally propagates through the whole formula. In Zipperposition, which is based on higher-order superposition, the original formula is clausified and instantiating a variable in a clause 𝐢 does not automatically instantiate it in all clauses that are results of clausification of the same formula as 𝐢. 
To mitigate this issue, we have created extensions of equality resolution and equality factoring that take Boolean extensionality into account:

  s ≈ s′ ∨ C                  p s̄n ≈ ⊤ ∨ s′ ≉ ⊤ ∨ C
  ──────────  BoolER          ─────────────────────  BoolEF⁺⁻
  Cσ                          (p s̄n ≈ ¬s′ ∨ C)σ

  p s̄n ≉ ⊤ ∨ s′ ≈ ⊤ ∨ C      p s̄n ≉ ⊤ ∨ s′ ≉ ⊤ ∨ C
  ─────────────────────  BoolEF⁻⁺    ─────────────────────  BoolEF⁻⁻
  (p s̄n ≈ ¬s′ ∨ C)σ           (p s̄n ≈ s′ ∨ C)σ

All side conditions except those concerning the unifiability of terms are as in the original equality resolution and equality factoring rules. In rule BoolER, σ unifies s and ¬s′. In the ⁺⁻ and ⁻⁺ versions of BoolEF, σ unifies p s̄n and ¬s′, and in the remaining version it unifies p s̄n and s′. Intuitively, these rules bring Boolean (dis)equations into the appropriate form for the application of the corresponding base rules. It suffices to consider literals of the form s ≈ s′ for BoolER, since Zipperposition rewrites (s ⇔ t) ≈ ⊤ and ¬(s ⇔ t) ≉ ⊤ to s ≈ t (and does the analogous rewriting into s ≉ t).

Another approach to mitigating the harmful effects of eager clausification is to delay it as long as possible. Following the approach of Ganzinger and Stuber [28], we represent every input formula f as a unit clause f ≈ ⊤ and use the following lazy clausification (LC) rules:

  (f ∧ g) ≈ ⊤ ∨ C               (f ∨ g) ≈ ⊤ ∨ C       (f ⇒ g) ≈ ⊤ ∨ C
  ───────────────────  LC∧      ─────────────────  LC∨  ─────────────────  LC⇒
  f ≈ ⊤ ∨ C   g ≈ ⊤ ∨ C         f ≈ ⊤ ∨ g ≈ ⊤ ∨ C      f ≉ ⊤ ∨ g ≈ ⊤ ∨ C

  (¬f) ≈ ⊤ ∨ C       (∀x. f) ≈ ⊤ ∨ C       (∃x. f) ≈ ⊤ ∨ C
  ────────────  LC¬   ────────────────  LC∀  ─────────────────────  LC∃
  f ≉ ⊤ ∨ C          f{x ↦ y} ≈ ⊤ ∨ C      f{x ↦ sk⟨ᾱ⟩ ȳn} ≈ ⊤ ∨ C

  f ≈ g ∨ C
  ───────────────────────────────────  LC≈
  f ≉ ⊤ ∨ g ≈ ⊤ ∨ C   f ≈ ⊤ ∨ g ≉ ⊤ ∨ C

The rules described above are as given by Ganzinger and Stuber (adapted to our setting), with the omission of the rules for negative literals (f ≉ ⊤), which are easy to derive and which can be found in their work [28]. In LC≈ we require both f and g to be formulas and at least one of them not to be ⊤.
In LCβˆ€, y is a fresh variable, and in LCβˆƒ, sk is a fresh symbol and α₁…αₘ and y₁…yβ‚™ are all the type and term variables occurring freely in βˆƒx. f.

Naive application of the LC rules can result in an exponential blowup in problem size. To avoid this, we rename formulas that occur repeatedly. We keep a count of all non-atomic formulas occurring as either side of a literal. Before applying an LC rule to a clause f β‰ˆΜ‡ ⊀ ∨ C, where β‰ˆΜ‡ stands for either β‰ˆ or β‰‰, we check whether the number of f's occurrences exceeds a threshold k. If it does, based on the polarity of the literal f β‰ˆΜ‡ ⊀, we add the clause p y₁…yβ‚™ β‰‰ ⊀ ∨ f β‰ˆ ⊀ (if the literal is positive) or p y₁…yβ‚™ β‰ˆ ⊀ ∨ f β‰‰ ⊀ (if the literal is negative), where y₁…yβ‚™ are all the free variables of f and p is a fresh symbol. Then, we replace the clause f β‰ˆΜ‡ ⊀ ∨ C by p y₁…yβ‚™ β‰ˆΜ‡ ⊀ ∨ C.

Before the number of occurrences of f is checked, we first check (using a fast, incomplete matching algorithm) whether there is a formula g for which a definition was already introduced, such that gΟƒ = f for some substitution Οƒ. This check can have three outcomes. First, if the definition q x₁…xβ‚™ was already introduced for g with a polarity matching that of f β‰ˆΜ‡ ⊀, then f is replaced by (q x₁…xβ‚™)Οƒ. Second, if the definition was introduced, but with a different polarity, we create the clause defining g with the missing polarity and replace f with (q x₁…xβ‚™)Οƒ. Last, if there is no renamed formula g generalizing f, we perform the occurrence check described above.

In addition to reusing names for formula definitions, we reuse the Skolem symbols introduced by the LCβˆƒ rule. When LCβˆƒ is applied to f = βˆƒx. f', we check whether there is a Skolem symbol sk⟨α₁…αₘ⟩ y₁…yβ‚™ introduced for a formula g = βˆƒx. g' such that gΟƒ = f. If so, the symbol sk is reused and βˆƒx. f' is replaced by f'{x ↦ (sk⟨α₁…αₘ⟩ y₁…yβ‚™)Οƒ}. The renaming and name-reuse techniques are inspired by the VCNF algorithm described by Reger et al. [29].
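The threshold-based renaming with polarity-aware definition reuse can be sketched as follows. This is a simplified illustration, not Zipperposition's implementation: the threshold value, function names, and clause encoding are ours, free variables are ignored, and the generalization (matching) check is omitted, so only exact formula reuse is modeled.

```python
# Sketch of threshold-based formula renaming (illustrative only).
# A formula is a hashable nested tuple; a defining clause is encoded
# as (lhs, "->", rhs), i.e. "p -> f" for a positive occurrence of f
# and "f -> p" for a negative one.
from collections import Counter
from itertools import count

K = 4                    # renaming threshold k (assumed value)
occurrences = Counter()  # how often each non-atomic formula was seen
definitions = {}         # formula -> (fresh name, polarities defined)
fresh = (f"p{i}" for i in count())

def rename_if_needed(f, polarity, new_clauses):
    """Return the (possibly renamed) formula; emit defining clauses."""
    occurrences[f] += 1
    if f in definitions:                     # definition already exists
        name, pols = definitions[f]
        if polarity not in pols:             # missing polarity: add the
            pols.add(polarity)               # second defining clause
            new_clauses.append((name, "->", f) if polarity
                               else (f, "->", name))
        return name
    if occurrences[f] > K:                   # threshold exceeded:
        name = next(fresh)                   # introduce a definition
        definitions[f] = (name, {polarity})
        new_clauses.append((name, "->", f) if polarity
                           else (f, "->", name))
        return name
    return f                                 # below threshold: unchanged

clauses = []
g = ("or", ("p", "x"), ("q", "x"))
results = [rename_if_needed(g, True, clauses) for _ in range(K + 1)]
# The (K+1)-th positive occurrence triggers a definition p0 -> g.
# A later negative occurrence reuses the name and adds the missing
# defining clause g -> p0.
rename_if_needed(g, False, clauses)
```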
Rules Cases and CasesSimp deal with Boolean terms, but we need to rely on extensionality reasoning to deal with Ξ»-abstractions whose body has type o. Using the observation that the formula βˆ€x₁…xβ‚™. f implies that Ξ»x₁…xβ‚™. f is extensionally equal to Ξ»x₁…xβ‚™. ⊀ (and similarly, if βˆ€x₁…xβ‚™. Β¬f, then Ξ»x₁…xβ‚™. f β‰ˆ Ξ»x₁…xβ‚™. βŠ₯), we designed the following rule, where all free variables of f are among x₁…xβ‚™ and the variables occurring freely in C:

  Interpret-Ξ»:  C[Ξ»x₁…xβ‚™. f]  ⟹  (βˆ€x₁…xβ‚™. f) β‰‰ ⊀ ∨ C[Ξ»x₁…xβ‚™. ⊀]  and  (βˆ€x₁…xβ‚™. Β¬f) β‰‰ ⊀ ∨ C[Ξ»x₁…xβ‚™. βŠ₯]

4. Alternative Approaches

An alternative to the heavy prover modifications needed to support the rules described above is to treat Booleans as yet another theory. Since the theory of Booleans is finitely axiomatizable, simply stating those axioms instead of creating special rules might seem appealing. Another approach is to preprocess nested Booleans by hoisting them to the top level.

4.1. Axiomatization

A simple axiomatization of the theory of Booleans is given by Bentkamp et al. [10]. Following their approach, we introduce the proxy type bool, which corresponds to o, to the signature. We define proxy symbols t, f, not, and, or, impl, equiv, forall, exists, choice, and eq, which correspond to the homologous logical constants from Section 2; in their type declarations, o is replaced by bool. To make this paper self-contained, we include the axioms from Bentkamp et al. [10]. The definitions of the symbols are computational in nature: the symbols are characterized by their behavior on t and f. This also reduces interference between different axioms. The axioms are as follows:

  t β‰‰ f                      x β‰ˆ t ∨ x β‰ˆ f
  not t β‰ˆ f                  not f β‰ˆ t
  and t x β‰ˆ x                and f x β‰ˆ f
  or t x β‰ˆ t                 or f x β‰ˆ x
  impl t x β‰ˆ x               impl f x β‰ˆ t
  equiv x y β‰ˆ and (impl x y) (impl y x)
  forall⟨α⟩ (Ξ»x. t) β‰ˆ t      y β‰ˆ (Ξ»x. t) ∨ forall⟨α⟩ y β‰ˆ f
  exists⟨α⟩ y β‰ˆ not (forall⟨α⟩ (Ξ»x. not (y x)))
  x β‰‰ y ∨ eq⟨α⟩ x y β‰ˆ t      x β‰ˆ y ∨ eq⟨α⟩ x y β‰ˆ f
  y x β‰ˆ f ∨ y (choice⟨α⟩ y) β‰ˆ t

4.2. Preprocessing Booleans

Kotelnikov et al. extended VCNF, Vampire's clausification algorithm, to support nested Booleans [13]. VukmiroviΔ‡ et al. extended the clausification algorithm of Ehoh, the lambda-free higher-order version of E, to support nested Booleans, inspired by the VCNF extension [11, Section 8]. Zipperposition and Ehoh share the same clausification algorithm, enabling us to reuse the extension, with one notable difference: unlike in Ehoh, not all nested Booleans other than variables, ⊀, and βŠ₯ are removed. Namely, Booleans that occur below a Ξ»-abstraction and contain Ξ»-bound variables are not preprocessed: they cannot easily be hoisted to the level of the atom in which they appear, since this process would leak variables bound in the context in which the nested Boolean occurs. Similar preprocessing techniques are used in other higher-order provers [30].

5. Examples

The TPTP library contains thousands of higher-order benchmarks, many of them hand-crafted to point out subtle interferences between the functional and Boolean properties of higher-order logic. In this section we discuss some problems from the TPTP library that illustrate the advantages and disadvantages of our approach.

In the last five editions of the CASC theorem proving competition, the core calculus of the best-performing higher-order prover was tableaux – a striking contrast to the first-order part of the competition, which is dominated by superposition-based provers. TPTP problem SET557^1 (a statement of Cantor's theorem) might shed some light on why tableaux-based provers excel on higher-order problems.
This problem conjectures that there is no surjection from a set to its power set: Β¬(βˆƒ(x : ΞΉ β†’ ΞΉ β†’ o). βˆ€(y : ΞΉ β†’ o). βˆƒ(z : ΞΉ). x z β‰ˆ y). After negating the conjecture and clausifying, the problem becomes sk₁ (skβ‚‚ y) β‰ˆ y, where sk₁ and skβ‚‚ are Skolem symbols. Then, we can use the ArgCong rule [10], which applies a fresh variable w to both sides of the equation, yielding the clause C = sk₁ (skβ‚‚ y) w β‰ˆ y w. Most superposition- or paramodulation-based higher-order theorem provers (such as Leo-III, Vampire, and Zipperposition) will split this clause into two clauses, C₁ = sk₁ (skβ‚‚ y) w β‰‰ ⊀ ∨ y w β‰ˆ ⊀ and Cβ‚‚ = sk₁ (skβ‚‚ y) w β‰ˆ ⊀ ∨ y w β‰‰ ⊀. This clausification step makes the problem considerably harder. Namely, the clause C instantiated with the substitution {y ↦ Ξ»x. Β¬(sk₁ x x), w ↦ skβ‚‚ (Ξ»x. Β¬(sk₁ x x))} yields the empty clause. However, if the original clause is split in two as described above, Zipperposition has to rely on the PI rule to instantiate y with an imitation of Β¬, and on equality factoring to further instantiate this approximation. These desired inferences need to be applied to both new clauses and represent only a fraction of the inferences that can be performed with C₁ and Cβ‚‚, reducing the chance of a successful proof attempt. Rule BoolER imitates the behavior of a tableaux prover: it essentially rewrites the clause C into Β¬(sk₁ (skβ‚‚ y) w) β‰‰ y w, which makes finding the necessary substitution easy and requires no clausification step.

Combining rule (Bool)ER with lazy clausification is very fruitful, as problem SYO033^1 illustrates. This problem also consists of a single conjecture: βˆƒ(x : (ΞΉ β†’ o) β†’ o). βˆ€(y : ΞΉ β†’ o). (x y ↔ βˆ€(z : ΞΉ). y z). The problem is easily solved if we instantiate the variable x with the constant βˆ€.
Moreover, the prover does not have to blindly guess this instantiation for x, but can obtain it by unifying x y with βˆ€ y (which is the Ξ·-short form of βˆ€(z : ΞΉ). y z). However, when the problem is clausified, all quantifiers are removed. Then, Zipperposition finds the proof only if an appropriate instantiation mode of PI is used, and if both clauses resulting from clausifying the negated conjecture are appropriately instantiated. In contrast, lazy clausification derives the clause x (sk x) β‰‰ βˆ€ (sk x) from the negated conjecture in three steps. Then, equality resolution produces the empty clause, swiftly finishing the proof without any explosive inferences. This effect is even more pronounced on problems SYO287^5 and SYO288^5, in which the critical proof step is the instantiation of a variable with an imitation of ∨ or ∧. In configurations that do not use lazy clausification and BoolER, Zipperposition times out under any reasonable time limit; with those two options enabled, it solves the mentioned problems in less than 100 ms.

In some cases, it is better to preprocess the problem. For example, TPTP problem SYO500^1.005 contains many nested Boolean terms: f0 (f1 (f1 (f1 (f2 (f3 (f3 (f3 (f4 a)))))))) β‰ˆ f0 (f0 (f0 (f1 (f2 (f2 (f2 (f3 (f4 (f4 (f4 a)))))))))). In this problem, all functions fα΅’ are of type o β†’ o, and the constant a is of type o. FOOL unfolding of the nested Boolean terms results in an exponential blowup in the problem size. However, superposition-based theorem provers are well equipped for this issue: their CNF algorithms use smart simplifications and formula renaming to mitigate these effects. Moreover, when the problem is preprocessed, the prover is aware of the problem size before the proving process starts and can adjust its heuristics accordingly. E, Zipperposition, and Vampire, instructed to perform FOOL unfolding, solve the problem swiftly using their default modes.
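The exponential blowup of naive FOOL unfolding, and the effect of renaming, can be illustrated with a toy size count. This is our own back-of-the-envelope model, not a measurement of any prover: hoisting a nested Boolean t out of a context C[t] yields (t ∧ C[⊀]) ∨ (Β¬t ∧ C[βŠ₯]), which copies the context, whereas renaming each hoisted subterm adds only a constant-size definition per nesting level.

```python
# Toy symbol-count model of FOOL unfolding (illustrative, not prover
# code). We count symbols when unfolding a chain f1(f2(... fd(a)...))
# of Boolean-valued unary functions, innermost argument first.

def naive_size(d):
    """Naive unfolding duplicates the unfolded context at each level:
    size(d) = 2 * size(d - 1) + c for a small constant c, so the
    result grows exponentially in the nesting depth d."""
    if d == 0:
        return 1                      # just the atom a
    return 2 * naive_size(d - 1) + 4  # two context copies + connectives

def renamed_size(d):
    """With formula renaming, each level contributes one definition
    clause of constant size, so growth is linear in d (the per-level
    constant 6 is an arbitrary illustrative choice)."""
    return 1 + 6 * d

for d in (5, 10, 20):
    print(d, naive_size(d), renamed_size(d))
```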
However, if the problem is not preprocessed, Zipperposition struggles to prove it using Cases(Simp): due to the large number of (redundant) clauses these rules create, it succeeds only if specific heuristic choices are made.

6. Evaluation

We performed an extensive evaluation to determine the usefulness of our approach. As our benchmark set, we used all 2606 monomorphic theorems from the TPTP library given in THF format. All experiments were performed on StarExec [31] servers with Intel Xeon E5-2609 0 CPUs clocked at 2.40 GHz. The evaluation is separated into two parts that answer different questions: How useful are the new rules? How does our approach compare with state-of-the-art higher-order provers?

6.1. Evaluation of the Rules

For this part of the evaluation, we fixed a single well-performing Zipperposition configuration called base (b). Since we are testing a single configuration, we used a CPU time limit of 15 s – roughly the time a single configuration is given in a portfolio mode. Configuration b uses the pragmatic variant pv21121 of the unification algorithm given by VukmiroviΔ‡ et al. [21]. It enables the BoolSimp rule, the EC rule, the PI rule in Pragmatic mode with k = 2, the ElimLeibniz and ElimPredVar rules, the BoolER rule, and the BoolEF rules. To evaluate the usefulness of the rules described above, we enable, disable, or change the parameters of a single rule while keeping all other parameters of b intact. In figures that contain sufficiently different configurations, cells are of the form n (m), where n is the total number of problems proved by a particular configuration and m is the number of problems that the configuration solved uniquely, compared to the other configurations in the same figure. Intersections of rows and columns denote the corresponding combinations of parameters. The result for the base configuration is written in italics; the best result is written in bold.

First, we tested different parameters of the Cases and CasesSimp rules.
Figure 1 reports the results. The columns correspond to three possible options for choosing the subterm on which the inference is performed: a stands for any eligible subterm, while lo and li stand for the leftmost-outermost and leftmost-innermost subterm, respectively. The rows correspond to the two rules: b is the base configuration, which uses CasesSimp, and bc swaps this rule for Cases. Although the margin is slim, the results show that it is usually preferable to select the leftmost-outermost subterm.

Figure 1: Effect of the Cases(Simp) rule on success rate

        a     lo    li
  b     1646  1648  1640
  bc    1644  1645  1644

Figure 2: Effect of the PI rule on success rate

         βˆ’PI   b_p   b_f   b_∧   b_∨   b_β‰ˆ   b_Β¬   b_βˆ€βˆƒ
  k = 1        1648  1628  1637  1634  1630  1641  1637
  k = 2  1636  1646  1629  1636  1631  1627  1638  1634
  k = 8        1643  1625  1633  1631  1623  1637  1635

Figure 3: Effect of Leibniz equality elimination

         βˆ’EL       +EL
  βˆ’EPV   1584 (0)  1644 (0)
  +EPV   1612 (0)  1646 (0)

Figure 4: Effect of BoolER and BoolEF

         βˆ’BEF      +BEF
  βˆ’BER   1644 (2)  1643 (0)
  +BER   1645 (0)  1646 (0)

Second, we evaluated all the modes of the PI rule with three values for the parameter k: 1, 2, and 8 (Figure 2). The columns denote, from left to right: disabling the PI rule, Pragmatic mode, Full mode, and the Imit modes with the appropriate logical symbols; disabling PI does not depend on k, so the βˆ’PI column contains a single entry. The rows denote the different values of k. The results show that different values of k have a modest effect on the success rate. The raw data reveal that, focusing on the configurations with k = 2, mode Full solves 10 problems that no other mode (including disabling the PI rule) can. Modes Imit_∧ and Pragmatic each solve 2 problems uniquely, whereas Imit_∨ solves one problem uniquely. This suggests that, even though it is not evident from Figure 2, the sets of problems solved by the different modes differ somewhat.
Figure 3 gives the results of evaluating the rules that treat Leibniz equality at the calculus level: EL stands for ElimLeibniz, and EPV for ElimPredVar; the signs βˆ’ and + denote that the rule is removed from or added to configuration b, respectively. Disabling both rules severely lowers the success rate, and the results suggest that including ElimLeibniz is beneficial to performance. Similarly, Figure 4 shows the merits of including (+) or excluding (βˆ’) the BoolER (BER) and BoolEF (BEF) rules. Our expectation was that including these two rules would have a bigger impact on the success rate. It turned out that, in practice, most of their effects can be achieved using a combination of the PI rule and the basic superposition calculus rules. Combining the two rules with lazy clausification is more useful: when the rule EC is replaced by the rule LC, the success rate increases (compared to the 1646 problems solved by b) to 1660 problems. We also discovered that reasoning with choice is useful: when the rule Choice is enabled, the success rate increases to 1653. We determined that including or excluding the conclusion D of Choice, after it is simplified, makes no difference. Counterintuitively, disabling the BoolSimp rule results in 1640 solved problems, only 6 fewer than configuration b. Disabling the Ext and Interpret-Ξ» rules results in solving 25 and 31 fewer problems, respectively. The raw data show that, in total, the configurations from Figures 1 to 4 solve 1682 problems.

Figure 5: Comparison with other higher-order provers

         CVC4      Leo-III   Satallax  Vampire   Zipperposition
  pure   1806 (5)  1627 (0)  2067 (0)  1924 (7)  1980 (0)
  coop   –         2085 (3)  2214 (9)  –         2190 (17)

Last, we compare our approach to the alternatives. Axiomatizing Booleans grinds Zipperposition to a halt: only 1106 problems can be solved in this mode.
On the other hand, preprocessing is fairly competitive: it solves only 8 problems fewer than the b configuration.

6.2. Comparison with Other Higher-Order Provers

We compared Zipperposition with all higher-order theorem provers that took part in the THF division of CASC-27 [27]: CVC4 1.8 prerelease [4], Leo-III 1.4 [14], Satallax 3.4 [15], and Vampire-THF 4.4 [6]. In this part of the evaluation, Zipperposition is run in a portfolio mode that runs configurations in different time slices. We set the CPU time limit to 180 s, the time allotted to each prover at CASC-27. Leo-III and Satallax are cooperative theorem provers – they periodically invoke first-order provers to finish the proof attempt. Leo-III uses CVC4, E, and iProver [32] as backends, while Satallax uses Ehoh [11]; Zipperposition can use Ehoh as a backend as well. To test the effectiveness of each calculus, we ran the cooperative provers in two versions: pure, which disables the backends, and coop, which uses all supported backends. In both the pure and the cooperative mode, Satallax comes out as the winner (Figure 5). Zipperposition comes in a close second, showing that our approach is a promising basis for further extensions. Leo-III uses the SMT solver CVC4, which features native support for Booleans, as a backend. It is possible that the use of CVC4 is one of the reasons for the massive improvement in the success rate of Leo-III's cooperative configuration compared with the pure version. We therefore conjecture that adding support for SMT backends to Zipperposition might be beneficial.

7. Discussion and Related Work

Our work is primarily motivated by the goal of closing the gap between higher-order β€œhammer” or software-verifier frontends and first-order backends. A considerable amount of research effort has gone into making translations of higher-order logic as efficient as possible; descriptions of hammers like HOLyHammer [1] and Sledgehammer [2] for Isabelle contain details of these translations.
Software verifiers Boogie [33] and Why3 [34] use similar translations. Established higher-order provers like Leo-III and Satallax perform very well on TPTP benchmarks; however, recent evaluations show that on Sledgehammer problems they are outperformed by translations to first-order logic [9, 10, 11]. Those two provers are built from the ground up as higher-order provers – the treatment of genuinely higher-order issues such as extensionality and choice is built into them, usually via explosive rules. Those explosive rules might contribute to their suboptimal performance on the mostly first-order Sledgehammer problems. In contrast, our approach is to start with a first-order prover and gradually extend it with higher-order features. The work performed in the context of the Matryoshka project [35], in which both authors of this paper participate, resulted in adding support for Ξ»-free higher-order logic with Booleans to E [11] and veriT [9], and support for Boolean-free higher-order logic to Zipperposition.

The authors of many state-of-the-art first-order provers have implemented some form of support for higher-order reasoning. This is true both for SMT solvers, witnessed by the recent extensions of CVC4 and veriT [9], and for superposition provers, witnessed by the extension of Vampire [36]. All of those approaches were arguably more focused on the functional aspects of higher-order logic, such as Ξ»-binders and function extensionality, than on Boolean aspects such as Boolean subterms and Boolean extensionality. A notable exception is the work by Kotelnikov et al., which introduced support for Boolean subterms to the first-order prover Vampire [12, 13]. The main merit of our approach is that it modularly combines two successful, complementary approaches to supporting features of higher-order logic that had not been combined before.
Our approach is based on a higher-order superposition calculus that incurs only around 1% overhead on first-order problems compared with classic superposition [10]. We conjecture that this efficient reasoning base contributes to the approach's competitive performance.

8. Conclusion

We presented a pragmatic approach to supporting Booleans in a modern automatic prover for clausal higher-order logic. Our approach combines previous research efforts that extended first-order provers with complementary features of higher-order logic, and it proposes solutions for the issues that emerge from this combination. The implementation shows a clear improvement over previous techniques and competitive performance.

What our work still lacks is an overview of heuristics that can be used to curb the explosion incurred by some of the rules described in this paper; we plan to address this issue in future work. Similarly, unlike Bentkamp et al. [10], we do not give any completeness guarantees for our extension. In future work, we plan to develop a refutationally complete calculus that supports Booleans around core rules such as Cases and LC.

Acknowledgment

We are grateful to the maintainers of StarExec for letting us use their service. We thank Alexander Bentkamp, Jasmin Blanchette, and Simon Cruanes for many stimulating discussions and all the useful advice. Alexander Bentkamp suggested the rule BoolER. We are also thankful to Ahmed Bhayat, Predrag JaničiΔ‡, and the anonymous reviewers for suggesting many improvements to this paper. We thank Evgeny Kotelnikov for patiently explaining the details of FOOL, Alexander Steen for clarifying Leo-III's treatment of Booleans, and Giles Reger and Martin Suda for explaining the renaming mechanism implemented in VCNF. Vukmirović's research has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka).
Nummelin has received funding from the Netherlands Organization for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward).

References

[1] C. Kaliszyk, J. Urban, HOL(y)Hammer: Online ATP service for HOL Light, Mathematics in Computer Science 9 (2015) 5–22.
[2] L. C. Paulson, J. C. Blanchette, Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers, in: G. Sutcliffe, S. Schulz, E. Ternovska (Eds.), IWIL-2010, volume 2 of EPiC, EasyChair, 2012, pp. 1–11.
[3] J.-C. FilliΓ’tre, A. Paskevich, Why3 – where programs meet provers, in: M. Felleisen, P. Gardner (Eds.), ESOP 2013, volume 7792 of LNCS, Springer, 2013, pp. 125–128.
[4] C. W. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, C. Tinelli, CVC4, in: G. Gopalakrishnan, S. Qadeer (Eds.), CAV 2011, volume 6806 of LNCS, Springer, 2011, pp. 171–177.
[5] S. Schulz, S. Cruanes, P. VukmiroviΔ‡, Faster, higher, stronger: E 2.3, in: P. Fontaine (Ed.), CADE 2019, volume 11716 of LNCS, Springer, 2019, pp. 495–507.
[6] L. KovΓ‘cs, A. Voronkov, First-order theorem proving and Vampire, in: N. Sharygina, H. Veith (Eds.), CAV 2013, volume 8044 of LNCS, Springer, 2013, pp. 1–35.
[7] J. Robinson, A note on mechanizing higher order logic, in: B. Meltzer, D. Michie (Eds.), Machine Intelligence, volume 5, Edinburgh University Press, 1970, pp. 121–135.
[8] J. Meng, L. C. Paulson, Translating higher-order clauses to first-order clauses, J. Autom. Reasoning 40 (2008) 35–60.
[9] H. Barbosa, A. Reynolds, D. E. Ouraoui, C. Tinelli, C. W. Barrett, Extending SMT solvers to higher-order logic, in: P.
Fontaine (Ed.), CADE 2019, volume 11716 of LNCS, Springer, 2019, pp. 35–54.
[10] A. Bentkamp, J. Blanchette, S. Tourret, P. VukmiroviΔ‡, U. Waldmann, Superposition with lambdas, 2020. Submitted to a journal, http://matryoshka.gforge.inria.fr/pubs/lamsup_report.pdf.
[11] P. VukmiroviΔ‡, J. C. Blanchette, S. Cruanes, S. Schulz, Extending a brainiac prover to lambda-free higher-order logic, 2020. Submitted to a journal, http://matryoshka.gforge.inria.fr/pubs/ehoh_article.pdf.
[12] E. Kotelnikov, L. KovΓ‘cs, A. Voronkov, A first class boolean sort in first-order theorem proving and TPTP, CoRR abs/1505.01682 (2015). arXiv:1505.01682.
[13] E. Kotelnikov, L. KovΓ‘cs, M. Suda, A. Voronkov, A clausal normal form translation for FOOL, in: C. BenzmΓΌller, G. Sutcliffe, R. Rojas (Eds.), GCAI 2016, volume 41 of EPiC Series in Computing, EasyChair, 2016, pp. 53–71.
[14] A. Steen, C. BenzmΓΌller, The higher-order prover Leo-III, in: D. Galmiche, S. Schulz, R. Sebastiani (Eds.), IJCAR 2018, volume 10900 of LNCS, Springer, 2018, pp. 108–116.
[15] C. E. Brown, Satallax: An automatic higher-order prover, in: B. Gramlich, D. Miller, U. Sattler (Eds.), IJCAR 2012, volume 7364 of LNCS, Springer, 2012, pp. 111–117.
[16] P. B. Andrews, Classical type theory, in: J. A. Robinson, A. Voronkov (Eds.), Handbook of Automated Reasoning, volume II, Elsevier and MIT Press, 2001, pp. 965–1007.
[17] G. Sutcliffe, The TPTP problem library and associated infrastructure – from CNF to TH0, TPTP v6.4.0, J. Autom. Reasoning 59 (2017) 483–502.
[18] S. Cruanes, Extending Superposition with Integer Arithmetic, Structural Induction, and Beyond, Ph.D. thesis, Γ‰cole polytechnique, 2015.
[19] S. Cruanes, Superposition with structural induction, in: C. Dixon, M. Finger (Eds.), FroCoS 2017, volume 10483 of LNCS, Springer, 2017, pp. 172–188.
[20] I. Cervesato, F. Pfenning, A linear spine calculus, J. Log. Comput. 13 (2003) 639–688.
[21] P.
VukmiroviΔ‡, A. Bentkamp, V. Nummelin, Efficient full higher-order unification, in: Z. M. Ariola (Ed.), FSCD 2020, volume 167 of LIPIcs, Schloss Dagstuhl – Leibniz-Zentrum fΓΌr Informatik, 2020, pp. 5:1–5:17.
[22] L. Bachmair, H. Ganzinger, Rewrite-based equational theorem proving with selection and simplification, J. Log. Comput. 4 (1994) 217–247.
[23] S. Schulz, E – a brainiac theorem prover, AI Commun. 15 (2002) 111–126.
[24] A. Nonnengart, C. Weidenbach, Computing small clause normal forms, in: J. A. Robinson, A. Voronkov (Eds.), Handbook of Automated Reasoning, Elsevier and MIT Press, 2001, pp. 335–367.
[25] A. Steen, Extensional paramodulation for higher-order logic and its effective implementation Leo-III, Ph.D. thesis, Free University of Berlin, 2018.
[26] A. Bhayat, G. Reger, A combinator-based superposition calculus for higher-order logic, in: N. Peltier, V. Sofronie-Stokkermans (Eds.), IJCAR 2020, volume 12166 of LNCS, Springer, 2020, pp. 278–296.
[27] G. Sutcliffe, The CADE-27 automated theorem proving system competition – CASC-27, AI Commun. 32 (2019) 373–389.
[28] H. Ganzinger, J. Stuber, Superposition with equivalence reasoning and delayed clause normal form transformation, Inf. Comput. 199 (2005) 3–23.
[29] G. Reger, M. Suda, A. Voronkov, New techniques in clausal form generation, in: C. BenzmΓΌller, G. Sutcliffe, R. Rojas (Eds.), GCAI 2016, volume 41 of EPiC Series in Computing, EasyChair, 2016, pp. 11–23.
[30] M. Wisniewski, A. Steen, K. Kern, C. BenzmΓΌller, Effective normalization techniques for HOL, in: N. Olivetti, A. Tiwari (Eds.), IJCAR 2016, volume 9706 of LNCS, Springer, 2016, pp. 362–370.
[31] A. Stump, G. Sutcliffe, C. Tinelli, StarExec: A cross-community infrastructure for logic solving, in: S. Demri, D. Kapur, C. Weidenbach (Eds.), IJCAR 2014, volume 8562 of LNCS, Springer, 2014, pp. 367–373.
[32] K.
Korovin, iProver – an instantiation-based theorem prover for first-order logic (system description), in: A. Armando, P. Baumgartner, G. Dowek (Eds.), IJCAR 2008, volume 5195 of LNCS, Springer, 2008, pp. 292–298.
[33] K. R. M. Leino, P. RΓΌmmer, A polymorphic intermediate verification language: Design and logical encoding, in: J. Esparza, R. Majumdar (Eds.), TACAS 2010, volume 6015 of LNCS, Springer, 2010, pp. 312–327.
[34] F. Bobot, J.-C. FilliΓ’tre, C. MarchΓ©, A. Paskevich, Why3: Shepherd your herd of provers, in: Boogie 2011: First International Workshop on Intermediate Verification Languages, Wroclaw, Poland, 2011, pp. 53–64.
[35] J. Blanchette, P. Fontaine, S. Schulz, S. Tourret, U. Waldmann, Stronger higher-order automation: A report on the ongoing Matryoshka project, in: M. Suda, S. Winkler (Eds.), Second International Workshop on Automated Reasoning: Challenges, Applications, Directions, Exemplary Achievements, volume 311 of EPTCS, 2019, pp. 11–18.
[36] A. Bhayat, G. Reger, Restricted combinatory unification, in: P. Fontaine (Ed.), CADE 2019, volume 11716 of LNCS, Springer, 2019, pp. 74–93.