Reuse of Introduced Symbols in Automatic Theorem Provers

Reuse of Introduced Symbols in Automatic Theorem Provers MichaelRawson MartinSuda Robotics Czech Institute of Informatics Cybernetics PetraHozzová GilesReger University of Manchester TUWien Reuse of Introduced Symbols in Automatic Theorem Provers 1613-0073 315019E31B7785670EA5891E27131E2E GROBID - A machine learning software for extracting information from scholarly documents

Automatic theorem provers may introduce fresh function or predicate symbols for various reasons. Sometimes, such symbols can be reused. We describe a simple form of symbol reuse in the first-order system Vampire, investigate its practical effect, and propose future schemes for more aggressive reuse.

Introduction

An automatic theorem prover might introduce a fresh symbol -that is, a symbol it has not previously seen -at any time between the start and end of its execution. During traditional preprocessing [1,2], the two most well-known examples are Skolemization to eliminate ∃-binders, and some variation of the Tseitin transformation 1 to avoid excessive duplication of subformulae. Fresh symbols may also be used in preprocessing to eliminate exotic input features [3], or to optimise inputs with respect to some search algorithm [4]. New symbols may even be introduced during proof-search proper, such as for clause splitting [5] or induction [6]. In some cases, these symbols can be reused. We examine reuse of fresh symbols in the Vampire [7] automatic theorem prover, but results could easily generalise to other similar systems.

For example, consider Skolemizing

∃x.P (x) ∨ Q(x) ∃x.P (x) ∨ Q(x)

Plainly, these are the same formula and could employ the same Skolem constant, but both of Vampire's normal-form routines produce two constants -and therefore two clauses -instead of the possible one:

P (s 1 ) ∨ Q(s 1 ) P (s 2 ) ∨ Q(s 2 )

PAAR'22: 8th Workshop on Practical Aspects of Automated Reasoning, August 11-12, 2022, Haifa, Israel

Vampire could in principle notice that these are the same formula, then only process it once. A natural way to achieve this is by formula sharing [8], but this would be quite a large change to Vampire and has some issues of its own. The example given here is very simple, but symbol reuse can be employed extensively: see §4 for some ideas.

Motivation

It may appear that reusing introduced symbols is a clear win for system performance.

In the case of definition introduction, removing duplicate definitions reduces the total number of clauses, which is considered likely to improve performance. However, reused symbols allow inference between parts of the search space that do not otherwise interact: in some cases this might shorten proofs or eliminate redundancy, but in others it expands the search space needlessly.

There is cause for optimism: Vampire uses an increasing amount of SAT/SMT technology to organise and perform ground reasoning tasks, such as AVATAR [9], AVATAR modulo theories [10], the InstGen calculus and global subsumption [11], MACE-style finite model building [4], theory instantiation [12], and more [13]. Reusing symbols in this context reduces the SAT/SMT model space, which is likely to improve both ground and first-order performance. Additionally, concurrent proof attempts [14] can share information over a common signature: reusing symbols consistently extends this common signature to introduced symbols.

Justification

When is it possible to reuse a symbol? Informally, introduced symbols often represent or stand for some formula. Then, an existing symbol representing the same formula may be reused. This sleight of hand is most alluring with definition introduction, where a fresh predicate P is defined to be equivalent to F : P stands for F . Skolemization, when encountering ∃x.F , introduces a function s with semantics roughly "some s such that F": again, s stands for F .

It is possible to relax the requirement for identical formulae to merely equivalent formulae. Consider introducing a symbol s to represent F : naturally enough, s could also be used to refer to G if F ≡ G. First-order logic is sufficiently expressive to even allow s(x) to stand for F [x], and then use s(t) for F [t]. Such relaxations can be powerful, see §4 for more on this.

Setting

Vampire [7] is a state-of-the-art automatic theorem prover for first-order logic, with extensions to support various theories, induction [15,6], rank-1 polymorphism [16], firstclass booleans [3], and higher-order logic [17]. It implements many different techniques for reasoning efficiently despite these very hard problem domains. These techniques are not equally effective on all problems and have many tunable parameters, which can make the tool tricky for end-users: therefore, pre-tuned strategy schedules are provided that run a series of pre-tuned options.

Vampire operates exclusively within a suitably-extended clause normal form. All formulae not in this form are translated into it before proof search begins, using Vampire's advanced preprocessing routines [8]. This preprocessing frequently introduces fresh symbols.

Implementation and Pitfalls

We implement symbol reuse for Skolemization and definition introduction in Vampire's preprocessing routines. Formulae are identified up to α-equivalence for this initial work, and symbol reuse for Skolem symbols and definitions can be switched on and off separately. We also describe some unpleasant bugs we encountered during testing (subsections beginning Pitfall ) which we hope the reader can now avoid in their own implementation.

Key Functions

The basic implementation idea is to define a "key" function k from a formula to some reasonable key type, such as a formula, string or code sequence. This key can then be used to lookup and reuse symbols that have been used for the same key before. Suppose we need a symbol to stand for some formula F . If we have an existing symbol for k(F ), then we may re-use this symbol. Otherwise, we create a fresh symbol and store it with k(F ) so that we may reuse it later. In this work we consider a relatively simple key function (see below), but the general setting allows more complex key functions in future ( §4).

Detecting α-equivalence

The simplest key function is the identity function k id , but this will only reuse symbols for really trivial cases, and even a renaming of variables will defeat it. Consider introducing a symbol for

F ≡ ∀x∀y. H[x, y] versus G ≡ ∀y∀x. H[y, x]

for example: k id (F ) = k id (G). We implement a slightly more sophisticated key function k α to detect α-equivalence, which canonically renames formulae left-to-right using a sequence x i for each new variable encountered. Then both the above formulae map to the same key,

k α (F ) = k α (G) = ∀x 0 ∀x 1 .H[x 0 , x 1 ]

allowing more generous symbol reuse. This still does not handle more complex cases, such as identifying ∀x∀y. H[x, y] and ∀x∀y. H[y, x]. We note that for our purposes this is equivalent in power to more-complex schemes like de Bruijn indices, as we only wish to identify strictly α-equivalent formulae. We do not need other operations such as capture-avoiding substitution.

Pitfall: Free Variables

Introduced symbols must usually close over the free variables of the formula they represent. For example, consider Skolemizing ∀x∀y∃z.P (x, y, z). The outer two universal quantifiers are removed, and we obtain P (x, y, s(x, y)), where s is a fresh symbol applied to the now-free variables x and y. The reuse key in this case is

k α (∃z.P (x, y, z)) = ∃x 0 .P (x 1 , x 2 , x 0 ).

If we were now to additionally Skolemize ∀x∀y∃z.P (y, x, z), then the key is the same and P (y, x, s(y, x)) is a legitimate instance of symbol reuse. Here we draw the reader's attention to the order of variables in the Skolem term. However, Vampire's existing Skolemization routine created Skolem terms with variables in the order of binding, not of occurrence, and therefore produced P (y, x, s(x, y)), which is unsound. To see this, consider the satisfiable set of formulae

∀x∀y∃z. z = f (x, y) ∀x∀y∃z. z = f (y, x) f (a, b) = f (b, a)

Implementors must ensure that terms are created with variables in the order they occur in subformulae (or at least in a consistent order with respect to keys), lest they fall prey to this mis-reuse.

Pitfall: Sorts and ad-hoc Polymorphism

Some symbols are usually assumed to be ad-hoc polymorphic, even in monomorphic many-sorted systems. A good example is the equality predicate, but Vampire also considers e.g. arithmetic operators to be ad-hoc polymorphic over the integers, rationals, and reals. The problem here is that the reuse key must differentiate between the types of arguments the overloaded symbol is applied to. Consider Skolemizing the input problem ∀x : σ. ∀y : σ. ∃z : ρ. P (z) ∨ x = y ∀x : τ. ∀y : τ. ∃z : ρ. P (z) ∨ x = y where σ, τ, ρ are distinct sorts. The reuse key in both cases is

k α (∃z : ρ. P (z) ∨ x = y) = ∃x 0 : ρ. P (x 0 ) ∨ x 1 = x 2

But the symbol cannot be reused in this case as it would be ill-typed. One solution (which we implement) is to include the types of free variables in the reuse key, so that the keys are

x 1 : σ, x 2 : σ ∃x 0 : ρ. P (x 0 ) ∨ x 1 = x 2 x 1 : τ, x 2 : τ ∃x 0 : ρ. P (x 0 ) ∨ x 1 = x 2

Another solution, perhaps neater, is annotating ad-hoc overloaded symbols with their argument types, resulting in keys

∃x 0 : ρ. P (x 0 ) ∨ x 1 = σ x 2 ∃x 0 : ρ. P (x 0 ) ∨ x 1 = τ x 2

Yet another is to have the reused symbol be polymorphic over the ad-hoc type variables in the reuse key, although this does require rank-1 polymorphism. We implement the first solution for this work, although at some point in the future expect to switch to another.

Summary of Implemented Technique

To recap, we define a key function k α (F ) = Γ F , where Γ is an ordered list of pairs x : τ giving the types of free variables in F , and F is a canonically-renamed copy of F . If two formulae F and G have the same key, we may use the same introduced symbol s in terms/predicates representing F or G. We reuse symbols in this way while performing definition introduction or Skolemization. The term for a given formula F is constructed from a possibly-reused symbol s and the free variables of F in order of their occurrence.

Compromise: Quantifier Blocks and Skolemization

While developing the technique, we noticed that on certain extreme problems attempting to reuse Skolem symbols leads to a degradation of performance. A careful look revealed a prohibitive quadratic complexity of repetitive key computation for deep existential quantifier blocks. For instance, on a formula ∃x 1 x 2 . . . x n . F we would need to compute

k α (∃x 1 x 2 . . . x n . F ), k α (∃x 2 . . . x n . F ), k α (∃x 3 . . . x n . F ) .

. . , which becomes too expensive for a large n and F . We decided to avoid this expensive case by storing Skolem symbols for reuse on perquantifier-block basis. This way, we only need one call to the key function for each (existential) quantifier block and nominally store a whole vector of Skolems with the key. In the actual implementation, however, only one symbol is stored, because we can assume the remaining ones occupy consecutive slots in the symbol table. As a mild concession, this trick gives up on the ability to reuse symbols within blocks: for example, when s 1 , s 2 . . . , s n get jointly stored at k α (∃x 1 x 2 . . . x n . F ), we are not able to later selectively retrieve, e.g., s n alone for k α (∃x n . F ). However, in practice, this seems to be a reasonable compromise.

Practical Effect

With the change described in §2, the example from §1 now works as expected. Practical effectiveness now depends upon two factors: whether benchmarks actually contain αequivalent (sub)formulae for which Vampire can reuse introduced symbols, and whether this reuse is beneficial for proof search in Vampire (or downstream users of Vampire's clausification routines).

Reusable Symbols

We consider the untyped first-order ("FOF") benchmarks of the TPTP [18] problem library, version 7.5.0, which amounts to 9091 problems over a moderate number of domains. Vampire processed these problems using its default clausification routine and attempted to reuse definition and Skolem symbols up to α-equivalence. Of these 9091 problems, Vampire could reuse at least one symbol for 4311 problems. In some cases, a large number of symbols could be reused many times: in the case of ITP024+4, Vampire was able to reuse one of 3978 introduced symbols on 19442 separate occasions, with the most commonly-reused symbol reused in 184 different locations. We consider this relatively strong evidence that attempting to reuse introduced symbols may be worthwhile, even for smaller systems that have otherwise simple preprocessing.

Effect on Proof Search

To measure the effect of symbol reuse on proof search, we ran Vampire2 (version 4.6.1) in its default (single-strategy) setting and its variations using symbol reuse on the mentioned 9091 FOF TPTP problems. The experiment was run on a server with Intel® Xeon® Gold 6140 CPUs clocked at 2.3 GHz with 500 GB RAM. To utilise the parallelism of our server while maintaining stability of the experiment we limited the runs using an instruction limit (rather than the more usual time limit). 3 More specifically, we used a limit of 50 billion instructions, which approximately corresponds to 10 seconds on the machine. The result of our experiment are shown in Table 1. We can see that there is a consistent (although modest) improvement by both symbol reuse applications and that there is a combined benefit of using then jointly. The column with unique solutions reminds us that neither of the techniques is beneficial universally. However, as a sign of complementarity, unique solutions indicate that the newly available symbol reuse options will be useful at constructing more powerful strategy schedules.

To get a more robust version of the results, resistant to possible influence of statistical noise, we rerun the whole experiment in a shuffling mode, randomizing the prover at various don't-care non-deterministic call sites and averaging the results over many independently seeded runs (please check out our previous work [19] for the exact description of the method). Table 2 reports on the computed averages as well as the estimated standard deviation (sigma). We can see that the positive effect of definition reuse on its own could not be confirmed, but the combined strategy using both definition reuse and Skolem symbol reuse is still the best one. All the averages are slightly lower than the values in Table 1 due to the overhead of shuffling (in accord with a remark in [19]).

We could identify a single TPTP problem as a robust gain of symbol reuse, namely the problem LCL667+1.010, which means the probability of solving this problem by default was 0.0 and by skr+dr 1.0, and no robust loss. Vampire reused 344 Skolem symbols on LCL667+1.010.

Bonus: Induction

We were pleasantly surprised to find that Vampire can now also reuse symbols generated during proof search with inductive reasoning. For inductive reasoning, Vampire uses many different inference rules [6,15,20,21]. The simplest is of the form

¬L[t] ∨ C cnf(F → ∀x.L[x])

where t is a ground term, L is a ground literal, C is a clause, cnf(•) denotes a translation to clausal normal form, and F → ∀x.L[x] is a valid induction schema. The schema can be instantiated with various induction axioms, such as structural induction axiom for inductive datatypes or induction axiom with symbolic bounds for integers [6].

All clauses from the clausified conclusion of the rule contain the literal L[x]. After adding these clauses to the search space, Vampire immediately resolves them against the premise of the rule, ¬L[t] ∨ C, obtaining cnf(¬F ) ∨ C.

(

Since the resulting clauses (1) do not contain the term t from ¬L[t] ∨ C, the result of applying induction on ¬L[t]∨C is the same as on ¬L[t ]∨C for different t and t . Vampire therefore reduces redundancy by applying the induction rule on a premise ¬L[t ] ∨ C only if it did not already apply induction with the same axiom on some ¬L[t] ∨ C. However, this redundancy-avoiding heuristic does not work for some induction axioms having a more complex consequent, used in more-complex induction inference rules. One such case is the upward integer induction axiom with default bound [6]:

L[0] ∧ ∀y.(y ≥ 0 ∧ L[y] =⇒ L[y + 1]) =⇒ ∀x.(x ≥ 0 =⇒ L[x])(2)

Consider applying the induction rule with the axiom (2) on the premise ¬L[t]. The result of clausifying and Skolemizing the axiom is:

¬L[0] ∨ s ≥ 0 ∨ x < 0 ∨ L[x] ¬L[0] ∨ L[s] ∨ x < 0 ∨ L[x] (3) ¬L[0] ∨ ¬L[s + 1] ∨ x < 0 ∨ L[x]

where s is a Skolem constant corresponding to y in (2). Resolving the clauses (3) with the premise ¬L[t] results in clauses containing t:

¬L[0] ∨ s ≥ 0 ∨ t < 0 ¬L[0] ∨ L[s] ∨ t < 0 (4) ¬L[0] ∨ ¬L[s + 1] ∨ t < 0

If Vampire later encounters ¬L[t ], it applies induction with the same axiom (2) on it, since the resulting clauses will be different -they will contain t instead of t. However, to do that, the negated antecedent of (2) needs to be clausified and Skolemized twice.

Vampire can therefore apply symbol reuse to obtain the clauses:

¬L[0] ∨ s ≥ 0 ∨ t < 0 ¬L[0] ∨ L[s] ∨ t < 0 ¬L[0] ∨ ¬L[s + 1] ∨ t < 0

with the same Skolem constant s as in (4). In this way, symbol reuse has the potential to reduce work when using some integer induction rules. For example, on one benchmark4 from the inductive benchmark set [22], Vampire can reuse 6 Skolem constants, and generates only 1430 clauses to find a proof compared to 1737 without symbol reuse.

Future Schemes

If F ≡ G, then the same symbol can be used to stand for F or G. However, since this criterion is undecidable for first-order logic, we suggest some more-pragmatic schemes below that we have not yet implemented. If, on the other hand, reusing symbols is of paramount importance, sub-formulae are typically small and the criterion easy compared to the main problem, then determining F ≡ G with the theorem prover itself may be an option.

Normalised Formulae and AC Operators

Our existing scheme to detect α-equivalence ( §2.2) still cannot identify even some triviallyequivalent formulae: consider e.g. F ∧ G ≡ G ∧ F . By a suitable choice of key function it would be possible to identify some such cases with little computational effort. More aggressive schemes are possible in exchange for more computation, such as with the key "sorted conjunctive normal form". We note that even more care must be taken with free variables (refer §2.3) as the order of occurrence of variables may be changed in the resulting keys. We suspect this scheme may be particularly useful when considering clausification during proof search [23].

Generalisation and Instantiation

The schemes discussed so far cannot reuse symbols for the following scenario. Suppose that we have an often-repeated formula F [t i ] containing a series of different terms t i :

F [t 1 ], F [t 2

] and so on, and that we wish to introduce symbols to represent these formulae. Such formulae are quite common in common-sense reasoning over large knowledge bases, such as those exported from SUMO [24].

To fix this shortcoming, we can generalise

F [t i ] to F [x]

for some fresh variable x, introduce a single symbol s(x) and use s(t i ) to represent each F [t i ]. However, producing candidate generalisations from a set of formulae is not easy to implement efficiently, and application of this technique would be necessarily heuristic since it is not clear which generalisation(s) are best for proof search. An alternative is to maximally generalise k gen (F ) such that no non-variable terms occur in it. For example, consider

F ≡ P (c, g(d)) ∧ ¬Q(c, x)

Then, the reuse key is k gen (F ) = P (x 0 , x 1 ) ∧ ¬Q(x 2 , x 3 ) and the term s(c, g(d), c, x) can be used to represent F . While the potential for symbol reuse is very high with this approach (particularly when combined with §4.1), this technique also increases symbol arity enormously, which is generally viewed negatively from the perspective of system performance.

Related Work

Computing normal forms [1] optimised for consumption [2] by automated theorem provers is well-studied. However, there is less work that aims to reduce the number of introduced symbols specifically, rather than reducing number or size of clauses. Other techniques in Vampire have achieved such a reduction as a side-effect [8,15]. In the context of Spass [25], the technique of generalized renaming [26] is effective at reducing the number of introduced definitions. Generalized renaming is similar in spirit to §4.2 but with a greater degree of sophistication.

Conclusion

Reusing symbols is often overlooked as a topic for preprocessing, but can be effective, especially for systems using some amount of ground reasoning. We show that even a simple form of reuse can be practically effective in Vampire and suggest some future schemes for reusing a greater number of symbols. The techniques proposed and evaluated here are relatively easy to implement, despite the hazards that we highlight, and are widely-applicable to other ATP systems.

Table 11Number of TPTP problems solved by Vampire's default strategy under a 50 billion instruction limit, with formula definition reuse (dr), Skolem symbol reuse (skr), and both at once (skr+dr).solved uniquelydefault429011dr429714skr430012skr+dr430515

Table 22An analogue of Table1for problems solved on average (based on the methodology from[19]).av. solved sigmadefault4281.310.4dr4281.210.5skr4287.210.4skr+dr4289.210.5

https://github.com/vprover/vampire See appendix A of our previous work[19] for more details on instruction limiting. https://github.com/vprover/inductive_benchmarks/blob/master/benchmarks/int/val/smt2/ declared_axall_conjall_valconst_unint.smt2

Acknowledgments

Petra Hozzová and Michael Rawson were supported by the ERC CoG ARTIST 101002685 and the FWF grant LogiCS W1255-N23. Martin Suda was supported by the Czech Science Foundation project no. 20-06390Y and the project RICAIP no. 857306 under the EU-H2020 programme.

Normal form transformations MBaaz UEgly ALeitsch Handbook of automated reasoning Elsevier 2001 Computing small clause normal forms ANonnengart CWeidenbach Handbook of automated reasoning Elsevier 2001 The Vampire and the FOOL EKotelnikov LKovács GReger AVoronkov Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs the 5th ACM SIGPLAN Conference on Certified Programs and Proofs ACM 2016 New techniques that improve MACE-style finite model finding KClaessen NSörensson Proceedings of the CADE-19 Workshop: Model Computation-Principles, Algorithms, Applications the CADE-19 Workshop: Model Computation-Principles, Algorithms, Applications Citeseer 2003 Splitting without backtracking ARiazanov AVoronkov Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, IJCAI 2001 BNebel the Seventeenth International Joint Conference on Artificial Intelligence, IJCAI 2001 Morgan Kaufmann 2001 Integer induction in saturation PHozzová LKovács AVoronkov Proceedings of CADE CADE Springer 2021 12699 First-order theorem proving and Vampire LKovács AVoronkov International Conference on Computer Aided Verification Springer 2013 New techniques in clausal form generation GReger MSuda AVoronkov GCAI 2016 EPiC Series in Computing EasyChair 2016 41 AVATAR: The architecture for first-order theorem provers AVoronkov International Conference on Computer Aided Verification Springer 2014 AVATAR modulo theories GReger NSBjørner MSuda AVoronkov GCAI 2016 EPiC Series in Computing EasyChair 2016 41 Instantiation-based automated reasoning: From theory to practice KKorovin International Conference on Automated Deduction Springer 2009 Unification with abstraction and theory instantiation in saturation-based reasoning GReger MSuda AVoronkov International Conference on Tools and Algorithms for the Construction and Analysis of Systems Springer 2018 The uses of SAT solvers in Vampire GReger MSuda Proceedings of the 1st and 2nd Vampire Workshops EPiC Series in Computing the 1st and 2nd Vampire Workshops EasyChair 2015 38 A multithreaded Vampire with shared persistent grounding MRawson GReger FMCAD 2021 IEEE 2021 280 Induction in Saturation-Based Proof Search GReger AVoronkov Proceedings of CADE PFontaine CADE Springer 2019 11716 A polymorphic Vampire ABhayat GReger International Joint Conference on Automated Reasoning Springer 2020 A combinator-based superposition calculus for higher-order logic ABhayat GReger International Joint Conference on Automated Reasoning Springer 2020 The TPTP problem library and associated infrastructure GSutcliffe Journal of Automated Reasoning 43 2009 Vampire getting noisy: Will random bits help conquer chaos? (system description) MSuda 2022 EasyChair 7719 EasyChair Preprint no Induction with Generalization in Superposition Reasoning MHajdú PHozzová LKovács JSchoisswohl AVoronkov 10.1007/978-3-030-53518-6_8 Proc. of CICM CBenzmüller BMiller of CICM Springer 2020 12236 Induction with recursive definitions in superposition MHajdu PHozzová LKovács AVoronkov Formal Methods in Computer Aided Design (FMCAD) IEEE 2021. 2021 Inductive benchmarks for automated reasoning MHajdu PHozzová LKovács JSchoisswohl AVoronkov Intelligent Computer Mathematics FKamareddine CSacerdoti Coen

Cham

Springer International Publishing 2021 Superposition with first-class booleans and inprocessing clausification VNummelin ABentkamp STourret PVukmirovic Proceedings of CADE CADE Springer 2021 12699 Large theory reasoning with SUMO at CASC APease GSutcliffe NSiegel STrac AI communications 23 2010 Spass version 3.5 CWeidenbach DDimova AFietzke RKumar MSuda PWischnewski International Conference on Automated Deduction Springer 2009 Computing tiny clause normal forms NAzmy CWeidenbach Proceedings of CADE CADE Springer 2013 7898