
Efficient Implementation of Large-Scale Watchlists

Constantin Ruhdorfer

Stephan Schulz

Duale Hochschule Baden Württemberg Stuttgart


In this work, we explore techniques for improving the performance of the automated theorem proving system E when dealing with large watchlists. A watchlist can focus the proof search towards so-called hints, likely useful intermediate results provided externally. Recently, hints have been automatically extracted from previous proofs, creating massive watchlists and thus making the evaluation of new clauses against the watchlist a performance bottleneck. We introduce a new index for the frequent special case of unit clause hints, taking advantage of the fact that subsumption can be implemented much more efficiently for unit clauses than for the general case. We implement several strategies for exploiting the structure and properties of equational unit clauses. Additionally, we have added a new soft subsumption mechanism to E that can abstract away differences of constant or Skolem symbols, effectively allowing a less precise match when evaluating a given clause against the watchlist. We have tested the new mechanisms on a large set of problems taken from the Mizar 40 project, using a large watchlist containing over 300 000 clauses. We show that the unit clause index significantly increases performance with this watchlist. The use of soft subsumption shows more mixed results. We believe that most watchlists can take advantage of these techniques and have made them available to the user via E's command line interface.

Keywords: Automated theorem proving, Hints, First-order logic

1. Introduction

Automated theorem provers (ATPs or ATP systems) are programs that accept a set of axioms and a conjecture in a suitable logic, and then try to automatically derive a proof of the conjecture. Many of the most successful theorem provers are based on first-order logic (with equality), an expressive logic with unambiguous semantics for which relatively mature calculi exist. First-order logic is semi-decidable. In theory, proofs for valid conjectures can always be found, but proof search for an invalid conjecture may not terminate. This means that an ATP has to search for proofs in an infinite and highly branching search space. Thus, guiding this search is of critical importance for the success of the system. For systems based on forward deduction, the critical choice is which of the many possible intermediate steps should be taken next, i.e. which new formulas should be deduced. This choice is usually based on simple syntactic criteria (as described, e.g., in [1]). However, these heuristics are often insufficient to find complex proofs.

One way to improve the proof search is via hints. Originally [2], hints are possible intermediate lemmas provided by the user of the prover. If the prover finds such a lemma (or a more general one), it can focus its search on this lemma. User hints can come from the user's domain expertise and intuition, or possibly from simplified settings. In recent years, we have utilized the same mechanism with a very different source of hints, namely intermediate results contributing to proofs of other theorems in the domain [3]. The system can iteratively build a database of results that are often useful in the domain. In contrast to manually provided hints, the number of hints mined from existing proofs can be extremely large, and hence evaluating new formulas against the hint set can become quite expensive.

In this work, we are interested in two aspects of the field at hand. Firstly, we want to improve the efficiency of using large sets of hints in guiding a theorem prover (more concretely, the equational theorem prover E [4, 5]). Secondly, we explore a variety of notions of what it means for a hint to match a new formula (or more specifically, clause), with the aim of broadening the influence of a hint so that it also affects the selection of clauses that are similar to the hint, not only those that are strictly more general.

2. Preliminaries

First-order logic with equality. We will assume two disjoint sets $F$ and $V$, where $F$ is the set of function symbols and $V$ the set of variable symbols. Function symbols have an associated arity, which we denote by $f/n$ for symbol $f$ and arity $n \in \mathbb{N}$. Constants are function symbols with $n = 0$.

We will typically use $a$, $b$, $c$ to denote constants, $f$, $g$, $h$ to denote function symbols and either $x$, $y$, $z$ or $X_1, \ldots, X_n$ to denote variables.

The set of syntactically correct terms is denoted by $\mathrm{Term}(F, V)$, where $\mathrm{Term}(F, V)$ is the smallest set that satisfies the following conditions:

1. $x \in \mathrm{Term}(F, V)$ for all $x \in V$
2. $f/n \in F$ and $t_1, \ldots, t_n \in \mathrm{Term}(F, V)$ implies $f(t_1, \ldots, t_n) \in \mathrm{Term}(F, V)$

An (equational) atom is an unordered pair of terms, written as $s \simeq t$. Observe that we handle the non-equational case as a special case where we encode non-equational atoms as equalities with the reserved constant $\mathtt{\$true}$, e.g. $p(a) \simeq \mathtt{\$true}$. We will typically write non-equational literals in the conventional manner for convenience (e.g. $p(a)$). A literal is either an atom or a negated atom. We write a negative literal as $s \not\simeq t$ and define a negation operator on literals as $\overline{s \simeq t} = s \not\simeq t$ and $\overline{s \not\simeq t} = s \simeq t$. We use $s \mathrel{\dot\simeq} t$ if we do not want to specify the polarity of a literal, or, in a less precise way, let $l, l_1, l_2, \ldots$ stand for arbitrary literals. In this notation $\simeq$ is commutative.

A clause is a multiset of literals $\{l_1, l_2, \ldots, l_n\}$, usually written and always interpreted as a disjunction $l_1 \lor l_2 \lor \ldots \lor l_n$. A unit clause is a clause containing only one literal. We denote the set of all clauses as $\mathrm{Clauses}(F, V)$ and the empty clause as $\square$.

A substitution is a mapping $\sigma : V \to \mathrm{Term}(F, V)$ with the property that $\mathrm{Dom}(\sigma) = \{x \in V \mid \sigma(x) \neq x\}$ is finite. This mapping can be extended to terms, atoms, literals and clauses in the obvious way. If $\sigma$ is a substitution, we call $\sigma(t)$, $\sigma(l)$, $\sigma(C)$ instances of $t$, $l$, or $C$.

Similarly, a match from a term (atom, literal, clause) $s$ to another $t$ is a substitution $\sigma$ such that $\sigma(s) \equiv t$, where $\equiv$ is the syntactic identity.

In most theorem provers for classical first-order logic, proofs are found via contradiction. In other words, proof search tries to establish whether a given set of clauses is unsatisfiable. For generating calculi, new clauses are deduced via a set of inference rules that take one or more (most often two) clauses as premises, and generate a new clause entailed by these premises. If this process eventually derives the empty clause, unsatisfiability has been established (the empty clause is inherently unsatisfiable, and so is any set of clauses that entails it).

Subsumption is a syntactic relation between two clauses. A clause $C$ subsumes another clause $D$ if one of its instances is a multi-subset of the other, i.e. if $\sigma(C) \subseteq D$ for some substitution $\sigma$. A subsuming clause is more general than the subsumed clause, i.e. the subsumed clause is entailed by the subsuming clause (but not, in general, the other way round). Subsumption plays a double role in this work. On the one hand, in most calculi we can ignore subsumed clauses, and subsumption (the removal of subsumed clauses from the proof search) is a major and important optimisation technique. On the other hand, if a clause subsumes another, it is considered "at least as good" as the subsumed one. The original notion of a clause matching a hint is based on subsumption. We do not require a clause to be identical to a hint to prefer it, but we also prefer clauses that subsume a hint (but note that we further generalise this relation later).
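To make the definitions of matching and multiset subsumption concrete, the following Python sketch implements both by simple backtracking. The term encoding (variables as strings, function terms and constants as tuples headed by their symbol), the literal encoding as (polarity, left, right) triples and all function names are assumptions made for this illustration; they are not E's data structures or algorithms.

```python
def is_var(t):
    # Assumption: variables are plain strings, e.g. "X"; f(a, X) is ("f", ("a",), "X").
    return isinstance(t, str)

def match(pattern, target, subst=None):
    """Return sigma with sigma(pattern) == target, or None if no match exists."""
    subst = dict(subst or {})
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        subst[pattern] = target
        return subst
    if is_var(target) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst

def match_literal(lit, target, subst):
    """Yield all extensions of subst that match lit onto target."""
    positive, s, t = lit
    target_positive, u, v = target
    if positive != target_positive:
        return
    for a, b in ((u, v), (v, u)):  # atoms are unordered pairs of terms
        sigma = match(s, a, subst)
        if sigma is not None:
            sigma = match(t, b, sigma)
            if sigma is not None:
                yield sigma

def subsumes(c, d, subst=None, used=frozenset()):
    """True if some instance of clause c is a multi-subset of clause d."""
    if not c:
        return True
    first, rest = c[0], c[1:]
    for i, target in enumerate(d):
        if i in used:  # each literal of d may be used at most once
            continue
        for sigma in match_literal(first, target, subst or {}):
            if subsumes(rest, d, sigma, used | {i}):
                return True
    return False
```

For example, the unit clause f(X) ≃ a subsumes the clause a ≃ f(b) ∨ g(b) ≄ c, but not the other way round.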

Positions in a term. A potential position $p \in \mathbb{N}^*$ in a term is defined as a sequence over natural numbers. The empty position is denoted with the special symbol $\epsilon$.

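The set of positions in a term $t$ is denoted with $\mathrm{pos}(t)$ and defined recursively by case distinction: If $t$ only consists of a variable symbol $x \in V$ then $\mathrm{pos}(t) = \{\epsilon\}$. If on the other hand $t \equiv f(t_1, \ldots, t_n)$ then $\mathrm{pos}(t) = \{\epsilon\} \cup \{i.p \mid 1 \leq i \leq n, p \in \mathrm{pos}(t_i)\}$ with $p \in \mathbb{N}^*$. A position $p \in \mathrm{pos}(t)$ of a term can be used to refer to the subterm of $t$ at $p$. To be more exact: if $p = \epsilon$, then $t|_p = t$. Otherwise, $p \equiv i.p'$ and $t \equiv f(t_1, \ldots, t_n)$. In that case, $t|_p = t_i|_{p'}$. The top symbol of $x \in V$ is $\mathrm{top}(x) = x$ and the top symbol of $f(t_1, \ldots, t_n)$ is $\mathrm{top}(f(t_1, \ldots, t_n)) = f$.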
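The recursive definitions of positions, subterms and top symbols translate almost literally into code. The sketch below assumes an illustrative term encoding (variables as strings, function terms as tuples headed by their symbol) and represents a position as a tuple of integers, with the empty tuple standing for the empty position; none of these names come from E.

```python
def is_var(t):
    # Assumption: variables are strings; f(g(X), a) is ("f", ("g", "X"), ("a",)).
    return isinstance(t, str)

def positions(t):
    """pos(t) as a set of integer tuples; () is the empty position."""
    result = {()}
    if not is_var(t):
        for i, arg in enumerate(t[1:], start=1):
            result |= {(i,) + p for p in positions(arg)}
    return result

def subterm(t, p):
    """The subterm of t at position p; raises KeyError if p is not in pos(t)."""
    if not p:
        return t
    i = p[0]
    if is_var(t) or i < 1 or i >= len(t):
        raise KeyError(p)
    return subterm(t[i], p[1:])

def top(t):
    """The top symbol: the variable itself, or the head function symbol."""
    return t if is_var(t) else t[0]
```

For the term f(g(X), a) this yields the positions ε, 1, 1.1 and 2, and the subterm at 1.1 is the variable X.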

2.1. Proof search

E is a saturating theorem prover based on the superposition calculus [6]. To prove a conjecture, axioms and the negated conjecture are converted to clause normal form, resulting in a set of clauses that is unsatisfiable if and only if the conjecture holds. The proof state thus is a set of clauses, and the proof search is realised by saturating this set of clauses by adding logical consequences that can be deduced from existing clauses by application of a number of inference rules. If this process generates the empty clause as an explicit witness of unsatisfiability, the proof has been concluded.

In practice, this proof search is realised via the given-clause algorithm. The proof state is represented by two disjoint sets of clauses, the set $U$ of unprocessed clauses, and the set $P$ of processed clauses. The algorithm repeatedly picks a clause $g$ from $U$, computes all possible consequences between this given clause and all clauses in $P$, and adds them to $U$. It then adds $g$ (the given clause) to $P$. This maintains the invariant that all direct consequences of clauses in $P$ have been computed. In addition to these generating inferences, the algorithm can use simplifying rules to replace clauses by simpler clauses, or to delete redundant clauses.
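The control flow of the given-clause loop can be sketched on a deliberately simplified setting. The example below is an assumption-laden toy: it uses propositional clauses (sets of non-zero integers, with negative numbers for negated atoms) and binary resolution in place of superposition, omits all simplification, and selects the smallest clause first with clause age as a tie-breaker. It only illustrates how clauses move between the unprocessed and processed sets.

```python
import heapq
from itertools import count

def resolvents(c, d):
    """All binary resolvents of two propositional clauses."""
    for lit in c:
        if -lit in d:
            yield frozenset((c - {lit}) | (d - {-lit}))

def given_clause(clauses, max_steps=1000):
    age = count()
    # U: priority queue ordered by (clause size, age); P: processed clauses.
    unprocessed = [(len(c), next(age), c) for c in map(frozenset, clauses)]
    heapq.heapify(unprocessed)
    processed = []
    for _ in range(max_steps):
        if not unprocessed:
            return "saturated"          # no proof exists
        _, _, given = heapq.heappop(unprocessed)
        if not given:
            return "unsatisfiable"      # empty clause derived
        if given in processed:
            continue                    # trivial redundancy check
        for other in processed:
            for new in resolvents(given, other):
                heapq.heappush(unprocessed, (len(new), next(age), new))
        processed.append(given)
    return "unknown"
```

A watchlist would enter this loop through the ordering of the priority queue: clauses that match a hint receive a better priority and are therefore picked as the given clause earlier.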

The most critical choice point for the given-clause algorithm is the selection of the given clause for each iteration of the main loop. This is traditionally controlled by heuristic evaluations, based on symbol counting (smaller clauses are preferred), clause age (older clauses are preferred), and various combinations and refinements of these measures (compare [7]).

2.2. Watchlist

Large parts of this work centre on the watchlist technique, which was originally developed by Robert Veroff, who named it the hint strategy [2]. The strategy was developed for guiding ATP programs in their proof search by comparing newly generated clauses against a list of hints. Such a list of hints is user-provided and usually contains lemmas, facts or other clauses the user suspects might be relevant to the given problem. This technique was first implemented in Otter [8].

In the E ATP system the watchlist mechanism is implemented in two variants, a dynamic and a static one [9]. Regardless of the variant used, the list is loaded on start-up and stored as a Clause-Set, where it is simplified like processed clauses. A Clause-Set is an internal data structure in E that stores clauses using a doubly linked list and provides access to its members via various indices. Every newly generated or processed clause is compared against the watchlist by checking whether or not the new clause matches one or more clauses in the watchlist. If it does, it is prioritized for processing.

2.3. Indexing techniques

One of the most important factors when it comes to the performance of ATP systems is efficient indexing. Indexing helps to avoid, or at least reduce, time spent on sequential search within large sets of clauses or terms. E has included several different indices for a while, including (perfect) discrimination tree indexing [10], feature vector indexing [11] and fingerprint indexing [12].

E has been using feature vector indexing for non-unit subsumption [11], and indexes only the processed set of clauses $P$. Feature vector indexing is particularly suitable for indexing relatively large multi-literal clauses, since it handles the complexity of equation and literal permutation by using features that are invariant under these permutations. This is a major advantage compared to other approaches of handling subsumption via indices. It does not, however, come into play for relatively small unit clauses, for which better indices exist. One of these is fingerprint indexing, a technique that samples positions in terms for its indexing representation. We can adjust these sampled positions in a way that takes advantage of the fact that every unit clause consists of exactly two terms (more on that later). First, we will introduce fingerprint indexing for this purpose.

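Fingerprint Indexing. A fingerprint index [12] is a trie over fingerprints $\mathrm{fp}$ of terms. The general fingerprint feature function $\mathrm{gfpf} : \mathrm{Term}(F, V) \times \mathbb{N}^* \to F'$, where $F' = F \uplus \{\mathbf{A}, \mathbf{B}, \mathbf{N}\}$, is defined by case distinction:

\[
\mathrm{gfpf}(t, p) =
\begin{cases}
\mathbf{A} & \text{if } p \in \mathrm{pos}(t),\ t|_p \in V \\
\mathrm{top}(t|_p) & \text{if } p \in \mathrm{pos}(t),\ t|_p \notin V \\
\mathbf{B} & \text{if } p = q.r,\ q \in \mathrm{pos}(t) \text{ and } t|_q \in V \text{ for some } q \\
\mathbf{N} & \text{otherwise}
\end{cases}
\]

Here $\mathrm{top}(t)$ is $x$ if $t \equiv x \in V$ and $f$ if $t \equiv f(t_1, \ldots, t_n)$. Given that, a fingerprint feature function is a function $\mathrm{fpf} : \mathrm{Term}(F, V) \to F'$ defined by $\mathrm{fpf}(t) = \mathrm{gfpf}(t, p)$ for a fixed $p \in \mathbb{N}^*$. Lastly, the fingerprint function is defined by $\mathrm{fp} : \mathrm{Term}(F, V) \to (F')^n$ for a fixed $n \in \mathbb{N}$. A fingerprint is a vector of $n$ elements of $F'$ and is calculated by $\mathrm{fp}(t)$ for a given term $t$.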
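The case distinction for the general fingerprint feature function can be computed by walking down the term along the sampled position. The following sketch assumes an illustrative term encoding (variables as strings, function terms as tuples headed by their symbol), positions as integer tuples, and function symbols that never use the names "A", "B" or "N"; the function names are chosen for this example only.

```python
def is_var(t):
    # Assumption: variables are strings; f(g(X), a) is ("f", ("g", "X"), ("a",)).
    return isinstance(t, str)

def gfpf(t, p):
    """General fingerprint feature of term t at position p."""
    for i in p:
        if is_var(t):
            return "B"   # p lies below a variable: it may exist in an instance
        if i < 1 or i >= len(t):
            return "N"   # p exists neither in t nor in any instance of t
        t = t[i]
    # p is a position of t: variable or top symbol of the subterm
    return "A" if is_var(t) else t[0]

def fingerprint(t, sample_positions):
    """Vector of features, one per sampled position."""
    return tuple(gfpf(t, p) for p in sample_positions)
```

For t = f(g(X), a) and the (arbitrarily chosen) sample positions ε, 1, 2, 1.1, 1.1.1 and 2.1, the fingerprint is (f, g, a, A, B, N).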

For an arbitrary $\mathrm{fpf}$ and two terms $s$ and $t$ assume two values $u = \mathrm{fpf}(s)$ and $v = \mathrm{fpf}(t)$. An overview of the compatibility of unification and matching from $s$ onto $t$, given $u$ and $v$, is presented in Figure 1.

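Unification

         f1   f2   A    B    N
    f1   Y    N    Y    Y    N
    f2   N    Y    Y    Y    N
    A    Y    Y    Y    Y    N
    B    Y    Y    Y    Y    Y
    N    N    N    N    Y    Y

Matching

         f1   f2   A    B    N
    f1   Y    N    Y    Y    N
    f2   N    Y    Y    Y    N
    A    N    N    Y    Y    N
    B    N    N    N    Y    N
    N    N    N    N    Y    Y

Figure 1: Compatibility of fingerprint feature values for unification and matching. f1 and f2 stand for two distinct function symbols. For matching, the column value belongs to the term that is matched onto the term with the row value.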
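The compatibility relations of Figure 1 reduce to a few comparisons on feature values. The sketch below is one possible encoding, assuming features are represented as strings, with "A", "B" and "N" reserved and every other string treated as a function symbol; for matching, u is the feature of the term being matched and v the feature of the term it is matched onto.

```python
def unify_compatible(u, v):
    """Can two terms with features u and v at the same position be unifiable?"""
    if u == "B" or v == "B":
        return True              # position below a variable: anything can appear
    if u == "N" or v == "N":
        return u == v            # a non-existent position only fits itself
    if u == "A" or v == "A":
        return True              # a variable unifies with any existing subterm
    return u == v                # two function symbols must be identical

def match_compatible(u, v):
    """Can a term with feature u be matched onto a term with feature v?"""
    if u == "B":
        return True              # below a variable of the pattern
    if u == "A":
        return v not in ("B", "N")   # a variable matches any existing subterm
    return u == v                # function symbols and N must agree exactly
```

A trie lookup would apply one of these tests at every level and prune all branches whose feature is incompatible with the query fingerprint.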

3. Implementation

When we originally introduced the watchlist feature, we expected to work with fairly small watchlists, and decided to use feature vector indexing for all hint matching. However, watchlists now contain several hundred thousand clauses, and evaluating new clauses against the watchlist has become a major bottleneck. To reduce this bottleneck, we have split the watchlist index into a pair of unit and non-unit clause indices. This decreases access times because each index stores fewer clauses and because we can use the most appropriate indexing technique for each set. For a similar idea see [3].

Unit clause index. We have implemented a new unit clause index in E based on fingerprint indexing. Since this is a technique for term indexing and not clause indexing, we exploit the structure of unit clauses to generate an indexing representation for the given unit clause. We use the fact that all unit clauses are of the form $\{\mathit{lterm} \mathrel{\dot\simeq} \mathit{rterm}\}$ to construct a new term, represented by a term cell of the form $\dot\simeq(\mathit{lterm}, \mathit{rterm})$, over which we calculate the fingerprint. We do that by alternating between $\mathit{lterm}$ and $\mathit{rterm}$ when sampling positions. Since all indexed terms start with one kind of equality symbol (e.g. $\simeq$ or $\not\simeq$), we can skip it when constructing the fingerprint for a term. The position $\epsilon$ is therefore never sampled.

Clauses that are not orientable are inserted twice into the index, since changing the orientation of a clause also changes its fingerprint. In the worst case this would double the size of the index. We therefore checked old runs of E and found that around 15% of clauses were not orientable and would therefore be inserted twice. Although this is a considerable increase, we assumed while designing this data structure that it would have a negligible impact on performance. Inserted clauses are stored as pointers in a splay tree at the leaf of the fingerprint trie.
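The construction described above can be sketched as follows. This is a simplified model, not E's implementation: a flat dictionary keyed by polarity and fingerprint stands in for the trie, Python sets stand in for the splay trees at the leaves, orientability is supplied by the caller instead of being computed from a term ordering, and the symbol name "eq" for the combined term cell is hypothetical. The sample positions are those given for FPW4 in the text.

```python
from collections import defaultdict

FPW4 = ((1,), (2,), (1, 1), (2, 1))  # positions 1, 2, 1.1, 2.1; epsilon is skipped

def gfpf(t, p):
    # Variables are strings, function terms are tuples headed by their symbol.
    for i in p:
        if isinstance(t, str):
            return "B"
        if i < 1 or i >= len(t):
            return "N"
        t = t[i]
    return "A" if isinstance(t, str) else t[0]

def compatible(u, v):
    """Can a query feature u be matched onto an indexed feature v?"""
    if u == "B":
        return True
    if u == "A":
        return v not in ("B", "N")
    return u == v

class UnitClauseIndex:
    def __init__(self, sample=FPW4):
        self.sample = sample
        self.leaves = defaultdict(set)   # (polarity, fingerprint) -> clauses

    def fingerprint(self, lhs, rhs):
        cell = ("eq", lhs, rhs)          # combined term over both sides
        return tuple(gfpf(cell, p) for p in self.sample)

    def insert(self, positive, lhs, rhs, orientable):
        clause = (positive, lhs, rhs)
        self.leaves[positive, self.fingerprint(lhs, rhs)].add(clause)
        if not orientable:               # second entry for the other orientation
            self.leaves[positive, self.fingerprint(rhs, lhs)].add(clause)

    def candidates(self, positive, lhs, rhs):
        """Indexed clauses that may be instances of the query clause."""
        query = self.fingerprint(lhs, rhs)
        found = set()
        for (polarity, fp), clauses in self.leaves.items():
            if polarity == positive and all(map(compatible, query, fp)):
                found |= clauses
        return found
```

The returned candidates still have to be confirmed by an actual subsumption test, since the fingerprint comparison only filters out clauses that cannot match.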

We present an example index in Figure 2, which assumes an example fingerprint function FPW4 that samples at (1, 2, 1.1, 2.1) and $F = \{f/2, g/1, a/0, b/0, c/0\}$.

[Figure 2: Example unit clause index: a trie over FPW4 fingerprints with the leaves g,f,A,b; g,f,A,A; g,A,A,N and f,c,a,N, each storing the unit clauses with that fingerprint.]

We have implemented several fingerprint functions to cover a wide variety of needs. We started with functions that assume full equality in the terms they sample, which means that they sample both sides equally: NoIdx (no unit clause index), FPW2, FPW4, FPW6, FPW8 and FPW10 (see Table 1 for details). Although E is an equality-based ATP system, not all problems are purely equational or equational at all. The same also applies to watchlists. E already categorizes problems based on whether they are non-equational, somewhat equational or purely equational. We use the same mechanism to classify the degree of equality in the given watchlists.

Based on that we alter the strategy used to sample the positions. The less equational a watchlist is, the more likely the right side of an indexed term is simply $\mathtt{\$true}$, which carries no useful information to sample. To address this we also introduced a left only (marked with "L", e.g. FPW2L) and a left leaning (marked with "LL", e.g. FPW2LL) version of each fingerprint function. The left only version will skip position 2 and continue to only sample positions on the left side, e.g. FPW2L samples at 1, 1.1 and FPW6L at 1, 1.1, 1.1, 1.1.1, 1.2.1, 1.1.2. The left leaning version will sample roughly between 2/3 and 3/4 of the positions from the left side, depending on the size of the fingerprint function. Since FPW2 samples so few positions, FPW2LL is the same function as FPW2L. For an overview of all strategies please consult Table 1.

[Table 1: Strategy names and the positions they sample. Strategies: NoIdx, FPW2, FPW2L, FPW2Flex, FPW4, FPW4L, FPW4LL, FPW6, FPW6L, FPW6LL, FPW6Flex, FPW8, FPW8L, FPW8LL, FPW10, FPW10L, FPW10LL.]

While the left leaning version surely is more useful for somewhat equational clause sets, using the strategy will still result in many sampled positions that are non-existent and therefore not useful when it comes to matching. We therefore propose yet another strategy to be used for partly equational clause sets, which we will denote by "*Flex" (e.g. FPW2Flex). A flex type strategy is one where we first classify the input based on whether or not the right side of the term is $\mathtt{\$true}$. We then use an L type sampling method on the term if it is, or a balanced one if it is not. On the one hand this allows us to better exploit the structure of the given term. On the other hand the index will return some terms that would not have been returned had all terms been sampled with the same fingerprint function, since their fingerprints might now be sampled from different positions. While that seems troublesome at first, this is not really an issue for two reasons: (i) It is unlikely that this will affect many terms, since one unmatchable symbol in the fingerprint will already reject the match, and (ii) since fingerprint indexing is a non-perfect indexing method to begin with, we need to check whether the given clause subsumes every returned clause anyway. In the worst case we will need to check slightly more results for subsumption.
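The flex idea amounts to choosing the sample positions per term. The sketch below illustrates it for a fingerprint of width two: the left only positions 1 and 1.1 are those given for FPW2L in the text, while the balanced positions 1 and 2 are an assumption made for this example, as are the term encoding, the symbol name "eq" for the combined term cell and the function names.

```python
TRUE = ("$true",)
LEFT_ONLY = ((1,), (1, 1))   # FPW2L as described in the text
BALANCED = ((1,), (2,))      # assumed balanced counterpart, for illustration only

def gfpf(t, p):
    # Variables are strings, function terms are tuples headed by their symbol.
    for i in p:
        if isinstance(t, str):
            return "B"
        if i < 1 or i >= len(t):
            return "N"
        t = t[i]
    return "A" if isinstance(t, str) else t[0]

def flex_positions(lhs, rhs):
    """Sample only the left side if the right side carries no information."""
    return LEFT_ONLY if rhs == TRUE else BALANCED

def flex_fingerprint(lhs, rhs):
    cell = ("eq", lhs, rhs)
    return tuple(gfpf(cell, p) for p in flex_positions(lhs, rhs))
```

With these assumptions, p(a) ≃ $true is fingerprinted as (p, a), whereas f(X) ≃ g(b) is fingerprinted as (f, g). Both vectors live in the same trie although their components stem from different positions, which is why a final subsumption check on the candidates remains necessary.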

We implemented all these options in E and made them available through E's domain-specific language (DSL). On top of that we also implemented an automatic mode (available as "auto") that maps a somewhat equational watchlist to an "LL" type function, a non-equational watchlist to an "L" type function and a purely equational watchlist to a normal strategy. As a basis for that we used "FPW6" since we expect it to perform well across many different watchlists.
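The automatic mode is a small lookup from the watchlist class to a strategy name. The following sketch assumes that the class is already known and passed in as one of the three category names used in the text; how E computes this classification is not shown, and the function name is hypothetical.

```python
# Suffix per watchlist class, appended to the base strategy name.
SUFFIX = {
    "non-equational": "L",        # right sides are mostly $true: sample left only
    "somewhat equational": "LL",  # mixed: lean towards the left side
    "purely equational": "",      # sample both sides equally
}

def auto_strategy(watchlist_class, base="FPW6"):
    """Map the degree of equality of a watchlist to a strategy name."""
    return base + SUFFIX[watchlist_class]
```

For instance, a somewhat equational watchlist is mapped to FPW6LL.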

Clause abstraction. We have also implemented a clause abstraction mechanism in E for the watchlist feature. The mechanism supports two modes of operation: one abstracting constants and one abstracting Skolem symbols. If turned on, our implementation will rewrite all clauses that are inserted into or checked against the watchlist to adhere to the abstraction. This effectively allows for less precise matches against the watchlist.

If constants are to be abstracted, all constants are rewritten to the first constant met during the proof for an untyped problem, and for a typed one to the first constant met with the appropriate sort. That is, given a clause $C_1 = \{f(a, b) \mathrel{\dot\simeq} g(X_1), g(X_1) \mathrel{\dot\simeq} g(c)\}$, $F = \{f/2, g/1, a/0, b/0, c/0\}$ and the first met constant $a$, we will rewrite the clause $C_1$ to $C'_1 = \{f(a, a) \mathrel{\dot\simeq} g(X_1), g(X_1) \mathrel{\dot\simeq} g(a)\}$, assuming $a$, $b$, $c$ share the same sort.

The mechanism works similarly for abstracting Skolem symbols, where we rewrite them to the first met Skolem symbol with the same type. We have also made this option available through E's DSL.
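Constant abstraction for the untyped case can be sketched as a recursive rewrite that remembers the first constant it encounters. The encoding (variables as strings, constants as one-element tuples, literals as (polarity, left, right) triples) and the class name are assumptions for this illustration. The sketch leaves the reserved constant $true untouched, which is an assumption about the intended behaviour, and it does not model sorts.

```python
TRUE = ("$true",)

class ConstantAbstraction:
    def __init__(self):
        self.first = None   # first constant met so far

    def term(self, t):
        if isinstance(t, str) or t == TRUE:   # variables and $true stay as they are
            return t
        if len(t) == 1:                       # a constant
            if self.first is None:
                self.first = t
            return self.first
        return (t[0],) + tuple(self.term(arg) for arg in t[1:])

    def clause(self, literals):
        return [(positive, self.term(lhs), self.term(rhs))
                for positive, lhs, rhs in literals]
```

Applied to the clause {f(a, b) ≃ g(X1), g(X1) ≃ g(c)}, the sketch produces {f(a, a) ≃ g(X1), g(X1) ≃ g(a)}. The same object has to be used for the watchlist clauses and for the clauses checked against them, so that both are rewritten to the same constant. Skolem symbol abstraction would follow the same pattern with a test for Skolem symbols in place of the test for constants.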

4. Experimental Results

We tested on Intel Xeon E5-2698 v3 CPUs at 2.30 GHz using the Linux 3.19.0-25-generic kernel in 64-bit mode. All tests were run with a time limit of 720 seconds and a limit of 10 000 generated clauses. For orchestrating the experiments we used the ATPy library (see footnote 1).

4.1. Unit clause index

For testing we used a strategy (see footnote 2) provided by the automated reasoning group at Czech Technical University in Prague, who also provided a watchlist based on previous runs of the system. The watchlist contained 367 408 clauses, of which 153 997 are unit. The strategy was run against a tenth of the Mizar 40 project [13], amounting to 5787 problems. We have made all of this data and the E version used available at http://eprover.eu/E-eu/SoftWatch.html.

Table 2 shows different versions of the index and their performance. Observe that we compare our implementation against a version of E that only uses feature vector indexing as an indexing technique for the watchlist. This version is referred to as "Conventional" or "NoIdx" (no index) since it is missing the unit clause index. To measure performance we observe the runtime of E on the given set of problems. We chose to compare runtimes since the proof search for any given problem nearly always stays the same between different indexing strategies. To verify that the runs indeed stay the same we compared a random subsample of proof searches. In our comparisons we will differentiate between the runtime for all problems and the runtime for only those that were deemed successful (see footnote 3).

Footnote 1: Written by Jan Jakubův; accessible online at https://github.com/ai4reason/pyprove

Footnote 2: The exact options given were: --definitional-cnf=24 --split-aggressive --simul-paramod --forward-context-sr --destructive-er-aggressive --destructive-er --prefer-initial-clauses -tKBO6 -winvfreqrank -c1 -Ginvfreq -F1 --delete-bad-limit=150000000 -WSelectMaxLComplexAvoidPosPred -H'(1*ConjectureTermPrefixWeight(PreferProcessed,1,3,0.1,5,0,0.1,1,4),1*ConjectureTermPrefixWeight(PreferWatchlist,1,3,0.5,100,0,0.2,0.2,4),1*Refinedweight(PreferWatchlist,4,300,4,4,0.7),1*RelevanceLevelWeight2(PreferWatchlist,0,1,2,1,1,1,200,200,2.5,9999.9,9999.9),1*StaggeredWeight(PreferWatchlist,1),1*SymbolTypeweight(PreferWatchlist,18,7,-2,5,9999.9,2,1.5),2*Clauseweight(PreferWatchlist,20,9999,4),2*ConjectureSymbolWeight(PreferWatchlist,9999,20,50,-1,50,3,3,0.5),2*StaggeredWeight(PreferWatchlist,2))' --free-numbers

[Table 2: Performance of the different index versions (column: All runs).]

We expected to find results similar to those of the original fingerprint paper [12], that is, we expected a fingerprint size of 6 to be a good balance of trie depth and clause distribution. Although comparing FPW6 with the unindexed version yields an improvement of 16.35% for all runs and 19.65% for all successful runs, other strategies were even more successful. Surprisingly, FPW2L yielded the best performance, with an improvement of 28.86% for all runs and 44.87% for all successful runs.

While this index increases performance on average, Figure 3 shows that the actual performance depends on the problem itself. Please note the figure's logarithmic scale. Notice that most strategies perform very similarly. This is very likely an effect of testing on the same watchlist, where all strategies perform well, but some can exploit the inherent structure of the watchlist better. The "auto" strategy is not listed since for the watchlist tested it would evaluate to the performance of FPW6LL.

Lastly, we examined whether the different strategies solve the same problems or different ones. We found that they overwhelmingly solve the same problems: the intersection of the problems solved by all strategies (including the baseline), which is a very restrictive criterion, contains 1668 problems. Given that most strategies solve around 1680 problems, this means that the overlap of solved problems is about 99%. We compiled the runtimes on these 1668 problems in Table 3.

Footnote 3: Runs with exit status "Theorem" or "CounterSatisfiable".

[Figure 3: (a) FPW2L runtime comparison (in seconds). (b) FPW6 runtime comparison (in seconds).]

The problems solved by all strategies are very likely the easier problems in the whole set. We might therefore expect the timings to be better than the runtimes observed in Table 2, but this is not the case.

[Table 3: Runtimes on the problems solved by all strategies. Index variants: FVI (baseline), FPW2, FPW2L, FPW4, FPW4L, FPW4LL, FPW6, FPW6L, FPW6LL, FPW8, FPW8L, FPW8LL, FPW10, FPW10L, FPW10LL.]

4.2. Validating the use of fingerprint indexing

One central claim of this paper is that exploiting the structure of unit clauses leads to better performance compared to a standard feature vector index. This claim can easily be verified by comparing the performance of E when either indexing technique is only filled with unit clauses. This can be achieved by using a watchlist that only contains unit clauses. We do that by removing all non-unit clauses from the watchlist described above. We did not alter the set of problems.

4.3. Clause Abstraction

We used a similar test setup for determining the performance of the clause abstraction feature. We used the previous options for the prover (as stated in footnote 2) and added either the flag --watchlist-clause-abstraction=constant or the flag --watchlist-clause-abstraction=skolem. The results are shown in Table 5.

We find that the performance varies from problem to problem. This is especially true when abstracting constant symbols, where some problems started to run out of time. Compare this to just one problem for all other strategies tested (see Figure 3). This effect is clearly visible in the scatter plot in Figure 5a.

[Figure: (a) All runs (in seconds). (b) Successful runs (in seconds).]

5. Future Work

We have identified at least two more interesting areas of study. Firstly, instead of rewriting a Skolem symbol to one of the same arity, we would also be interested in rewriting complete Skolem terms to a constant. Secondly, we are interested in generalizing the idea of splitting the watchlist index into even more and smaller indices to increase performance.

6. Conclusion

In this work, we have presented a special unit clause index for the watchlist feature based on fingerprint indexing. We explored the performance of several strategies with that index given a large watchlist of over 300 000 clauses and showed that the index substantially increases performance compared to a version without the index. We conclude that the performance of the index depends on the watchlist, its structure and the strategy used. We believe that most watchlists can benefit from this index.

[Figure 5: (a) Constant abstraction (in seconds). (b) Skolem symbol abstraction (in seconds).]

We have also introduced a mechanism to E that allows for less precise matches on the watchlist. While it showed more mixed results in terms of performance, it is an interesting topic that would benefit from additional exploration.

Acknowledgments

Special thanks to the Automated Reasoning Group at Czech Technical University in Prague for providing the watchlist, the problem files and the experimental environment.

[1] Schulz, Möhrmann, Performance of clause selection heuristics for saturation-based theorem proving, in: N. Olivetti, A. Tiwari (Eds.), Proc. of the 8th IJCAR, Coimbra, volume 9706 of LNAI, Springer, 2016, pp. 330-345.

[2] Veroff, Using hints to increase the effectiveness of an automated reasoning program: Case studies, Journal of Automated Reasoning 16 (1996) 223-239. URL: https://doi.org/10.1007/BF00252178. doi:10.1007/BF00252178.

[3] Goertzel, Jakubův, Urban, Enigmawatch: Proofwatch meets ENIGMA, CoRR abs/1905.09565 (2019). URL: http://arxiv.org/abs/1905.09565.

[4] Schulz, E - A Brainiac Theorem Prover, Journal of AI Communications 15 (2002) 111-126.

[5] Schulz, Cruanes, Vukmirović, Faster, higher, stronger: E 2.3, in: P. Fontaine (Ed.), Proc. of the 27th CADE, Natal, Brasil, number 11716 in LNAI, Springer, 2019, pp. 495-507.

[6] Bachmair, Ganzinger, Rewrite-Based Equational Theorem Proving with Selection and Simplification, Journal of Logic and Computation 3 (1994) 217-247.

[7] Schulz, Möhrmann, Performance of clause selection heuristics for saturation-based theorem proving, in: Proceedings of the 8th International Joint Conference on Automated Reasoning - Volume 9706, Springer-Verlag, Berlin, Heidelberg, 2016, pp. 330-345. doi:10.1007/978-3-319-40229-1_23.

[8] McCune, Otter 2.0, in: M. E. Stickel (Ed.), 10th International Conference on Automated Deduction, Springer Berlin Heidelberg, Berlin, Heidelberg, 1990, pp. 663-664.

[9] Goertzel, Jakubův, Schulz, J. Urban, ProofWatch: Watchlist guidance for large theories in E, in: J. Avigad, A. Mahboubi (Eds.), Interactive Theorem Proving: 9th International Conference, Oxford, UK, Springer, 2018, pp. 270-288.

[10] McCune, Experiments with discrimination-tree indexing and path indexing for term retrieval, J. Autom. Reason. 9 (1992) 147-167. URL: https://doi.org/10.1007/BF00245458. doi:10.1007/BF00245458.

[11] Schulz, Simple and Efficient Clause Subsumption with Feature Vector Indexing, in: M. P. Bonacina, M. E. Stickel (Eds.), Automated Reasoning and Mathematics: Essays in Memory of William W. McCune, volume 7788 of LNAI, Springer, 2013, pp. 45-67.

[12] Schulz, Fingerprint Indexing for Paramodulation and Rewriting, in: B. Gramlich, U. Sattler, D. Miller (Eds.), Proc. of the 6th IJCAR, Manchester, volume 7364 of LNAI, Springer, 2012, pp. 477-483.

[13] Kaliszyk, Urban, MizAR 40 for Mizar 40, CoRR abs/1310.2805 (2013). URL: http://arxiv.org/abs/1310.2805.