Handling Common Transitive Relations in First-Order Automated Reasoning Koen Claessen Ann Lillieström Chalmers University of Technology {koen,annl}@chalmers.se Abstract We present a number of alternative ways of handling transitive binary relations that commonly occur in first-order problems, in particular equivalence relations, total orders, and reflexive, transitive relations. We show how such relations can be discovered syntactically in an in- put theory. We experimentally evaluate different treatments on prob- lems from the TPTP, using resolution-based reasoning tools as well as instance-based tools. Our conclusions are that (1) it is beneficial to consider different treatments of binary relations as a user, and that (2) reasoning tools could benefit from using a preprocessor or even built-in support for certain binary relations. 1 Introduction Most automated reasoning tools for first-order logic have some kind of built-in support for reasoning about equal- ity. Equality is one of the most common binary relations, and there are great performance benefits from providing built-in support for equality. Together, these two advantages by far outweigh the cost of implementation. Other common concepts for which there exists built-in support in many tools are associative, commutative operators; and real-valued, rational-valued, and integer-valued arithmetic. Again, these concepts seem to appear often enough to warrant the extra cost of implementing special support in reasoning tools. This paper is concerned with investigating what kind of special treatment we could give to commonly appearing transitive binary relations, and what effect this treatment has in practice. Adding special treatment of transitive relations to reasoning tools has been the subject of study before, in particular by means of chaining [1]. The transitivity axiom @x , y, z . Rpx , yq ^ Rpy, z q ñ Rpx , z q can lead to an expensive proof exploration in resolution and superposition based theorem provers, and can generate a huge number of instances in instance-based provers and SMT-solvers. Transitive relations are also common enough to motivate special built-in support. However, as far as we know, chaining is not implemented in any of the major first-order reasoning tools (at least not in E [6], Vampire [5], Z3 [4], and CVC4 [2], which were used in this paper). As an alternative to adding built-in support, in this paper we mainly look at (1) what a user of a reasoning tool may do herself to optimize the handling of these relations, and (2) how a preprocessing tool may be able to do this automatically. Adding built-in reasoning support in the tools themselves is not a main concern of this paper. Copyright c by the paper’s authors. Copying permitted for private and academic purposes. In: P. Fontaine, S. Schulz, J. Urban (eds.): Proceedings of the 5th Workshop on Practical Aspects of Automated Reasoning (PAAR 2016), Coimbra, Portugal, 02-07-2016, published at http://ceur-ws.org 11 reflexive ” @x . Rpx , x q euclidean ” @x , y, z . Rpx , yq ^ Rpx , z q ñ Rpy, z q antisymmetric ” @x , y . Rpx , yq ^ Rpy, x q ñ x “ y transitive ” @x , y, z . Rpx , yq ^ Rpy, z q ñ Rpx , z q asymmetric ” @x , y . Rpx , yq _ Rpy, x q total ” @x , y . Rpx , yq _ Rpy, x q symmetric ” @x , y . Rpx , yq ñ Rpy, x q coreflexive ” @x , y . Rpx , yq ñ x “ y Figure 1: Definitions of basic properties of binary relations By “treatment” we mean any way of logically expressing the relation. For example, a possible treatment of a binary relation R in a theory T may simply mean axiomatizing R in T . But it may also mean transforming T into a satisfiability-equivalent theory T 1 where R does not even syntactically appear. As an example, consider a theory T in which an equivalence relation R occurs. One way to deal with R is to simply axiomatize it, by means of reflexivity, symmetry, and transitivity: @x . Rpx , x q @x , y . Rpx , yq ñ Rpy, x q @x , y, z . Rpx , yq ^ Rpy, z q ñ Rpx , z q Another way is to “borrow” the built-in equality treatment that exists in most theorem provers. We can do this by introducing a new symbol rep, and replacing all occurrences of Rpx , yq by the formula: reppx q “ reppyq The intuition here is that rep is now the representative function of the relation R. No axioms are needed. As we shall see, this alternative treatment of equivalence relations is satisfiability-equivalent with the original one, and actually is beneficial in practice in certain cases. In general, when considering alternative treatments, we strive to make use of concepts already built-in to the reasoning tool in order to express other concepts that are not built-in. For the purpose of this paper, we have decided to focus on three different kinds of transitive relations: (1) equivalence relations and partial equivalence relations, (2) total orders and strict total orders, and (3) general reflexive, transitive relations. The reason we decided to concentrate on these three are because (a) they appear frequently in practice, and (b) we found well-known ways but also novel ways of dealing with these. The target audience for this paper is thus both people who use reasoning tools and people who implement reasoning tools. 2 Common properties of binary relations In this section, we take a look at commonly occurring properties of binary relations, which combinations of these are interesting for us to treat specially, and how we may go about discovering these. Take a look at Fig. 1. It lists 8 basic and common properties of binary relations. Each of these properties can be expressed using one logic clause, which makes it easy to syntactically identify the presence of such a property in a given theory. When we investigated the number of occurrences of these properties in a subset of the TPTP problem library1 [7], we ended up with the table in Fig. 2. The table was constructed by gathering all clauses from all TPTP problems (after clausification), and keeping every clause that only contained one binary relation symbol and, possibly, equality. Each such clause was then categorized as an expression of a basic property of a binary relation symbol. We found only 163 such clauses that did not fit any of the 8 properties we chose as basic properties, but were instead instances of two new properties. Both of these were quite esoteric and did not seem to have a standard name in mathematics. The table also contains occurrences where a negated relation was stated to have a certain property, and also occurrences where a flipped relation (a relation with its arguments swapped) was stated to have a certain property, 1 For the statistics in this paper, we decided to only look at unsorted TPTP problems with 10.000 clauses or less. 12 4945 reflexive 2082 euclidean 1874 antisymmetric 1567 transitive 784 asymmetric 784 total 388 symmetric 3 coreflexive (163 other) Figure 2: Number of occurrences of binary relation properties in TPTP 430+18 equivalence relations 181+7 partial equivalence relations 328+7 (strict) total orders 573+20 reflexive, transitive relations (excluding the above) Figure 3: Number of occurrences of binary relations in TPTP, divided up into Theo- rem/Unsatisfiable/Unknown/Open problems + Satisfiable/CounterSatisfiable problems. and also occurrences of combined negated and flipped relations. This explains for example why the number of occurrences of total relations is the same as for asymmetric relations; if a relation is total, the negated relation is asymmetric and vice-versa. We adopt the following notation. Given a property of binary relations prop, we introduce its negated version, which is denoted by prop . The property prop holds for R if and only if prop holds for R. Similarly, we introduce the flipped version of a property prop, which is denoted by prop ò . The property prop ò holds for R if and only if prop holds for the flipped version of R. Using this notation, we can for example say that total is equivalent with asymmetric . Sometimes the property we call euclidean here is called right euclidean; the corresponding variant left euclidean can be denoted euclidean ò . Note that prop is not the same as prop! For example, a relation R can be reflexive, or reflexive (which means that R is reflexive), or reflexive, which means that R is not reflexive. Using this notation on the 8 original basic properties from Fig. 1, we end up with 32 new basic properties that we can use. (However, as we have already seen, some of these are equivalent to others.) This paper will look at 5 kinds of different binary relations, which are defined as combinations of basic properties: equivalence relation ” treflexive, symmetric, transitive u partial equivalence relation ” tsymmetric, transitive u total order ” ttotal , antisymmetric, transitive u strict total order ” tantisymmetric , asymmetric, transitive u reflexive, transitive relation ” treflexive, transitive u As a side note, in mathematics, strict total orders are sometimes defined using a property called trichotomous, which means that exactly one of Rpx , yq, x “ y, or Rpy, x q must be true. However, when you clausify this property in the presence of transitivity, you end up with antisymmetric which says that at least one of Rpx , yq, x “ y, or Rpy, x q must be true. There seems to be no standard name in mathematics for the property antisymmetric , which is why we use this name. In Fig. 3, we display the number of binary relations we have found in (our subset of) the TPTP for each category. The next section describes how we found these. 3 Syntactic discovery of common binary relations If our goal is to automatically choose the right treatment of equivalence relations, total orders, etc., we must have an automatic way of identifying them in a given theory. It is easy to discover for example an equivalence relation in a theory by means of syntactic inspection. If we find the presence of the axioms reflexive, symmetric, and transitive, for the same relational symbol R, we know that R is an equivalence relation. 13 total ô total ò total ô asymmetric total ô asymmetric ò reflexive ô reflexive ò reflexive ô reflexive ò symmetric ô symmetric symmetric ô symmetric ò symmetric ô symmetric ò transitive ô transitive ò transitive ô transitive ò coreflexive ô coreflexive ò coreflexive ô coreflexive ò antisymmetric ô antisymmetric ò antisymmetric ô antisymmetric ò Figure 4: Basic properties that are equivalent But there is a problem. There are other ways of axiomatizing equivalence relations. For example, a much more common way to axiomatize equivalence relations in the TPTP is to state the two properties reflexive and euclidean for R. Rather than enumerating all possible ways to axiomatize certain relations by hand, we wrote a program that computes all possible ways for any combination of basic properties to imply any other combination of basic properties. Our program generates a table that can be precomputed in a minute or so and then used to very quickly detect any alternative axiomatization of binary relations using basic properties. Let us explain how this table was generated. We start with a list of 32 basic properties (the 8 original basic properties, plus their negated, flipped, and negated flipped versions). Firstly, we use an automated theorem prover (we used E [6]) to discover which of these are equivalent with other such properties. The result is displayed in Fig. 4. Thus, 17 basic properties can be removed from the list, because they can be expressed using other properties. The list of basic properties now has 15 elements left. Secondly, we want to generate all implications of the form tprop 1 , . . , prop n u ñ prop where the set tprop 1 , . . , prop n u is minimal. We do this separately for each prop. The results are displayed in Fig. 5. The procedure uses a simple constraint solver (a SAT-solver) to keep track of all implications it has tried so far, and consists of one main loop. At every loop iteration, the constraint solver guesses a set tprop 1 , . . , prop n u from the set of all properties P´ tprop u. The procedure then asks E whether or not tprop 1 , . . , prop n u ñ prop is valid. If it is, then we look at the proof that E produces, and print the implication tprop a , . . , prop b u ñ prop, where tprop a , . . , prop b u is the subset of properties that were used in the proof. We then also tell the constraint solver never to guess a superset of tprop a , . . , prop b u again. If the guessed implication can not be proven, we tell the constraint solver to never guess a subset of tprop 1 , . . , prop n u again. The procedure stops when no guesses that satisfy all constraints can be made anymore. After the loop terminates, we may need to clean up the implications somewhat because some implications may subsume others. In order to avoid generating inconsistent sets tprop 1 , . . , prop n u (that would imply any other property), we also add the artificial inconsistent property false to the set, and generate implications for this property first. We exclude any found implication here from the implication sets of the real properties. This procedure generates a complete list of minimal implications. It works well in practice, especially if all guesses are maximized according to their size. The vast majority of the time is spent on the implication proofs, and no significant time is spent in the SAT-solver. To detect a binary relation R with certain properties in a given theory, we simply gather all basic properties about R that occur in the theory, and then compute which other properties they imply, using the pre-generated table. In this way, we never have to do any theorem proving in order to detect a binary relation with certain properties. 14 false ð treflexive, reflexive u transitive ð teuclidean, antisymmetric u false ð treflexive, asymmetric u transitive ð teuclidean, euclidean u false ð ttotal, reflexive u transitive ð ttotal, euclidean ò u false ð ttotal, asymmetric u transitive ð tasymmetric, euclidean ò u reflexive ð ttotal u transitive ð tsymmetric, euclidean ò u euclidean ð tsymmetric, asymmetric u transitive ð tasymmetric, euclidean ò u euclidean ð ttotal, symmetric u transitive ð tantisymmetric, euclidean ò u euclidean ð tantisymmetric, coreflexive u transitive ð teuclidean, reflexive u euclidean ð ttransitive, euclidean u transitive ð teuclidean, euclidean ò u euclidean ð tantisymmetric, euclidean u transitive ð teuclidean, euclidean ò u euclidean ð treflexive, coreflexive u transitive ð tantisymmetric, euclidean ò u euclidean ð treflexive, euclidean u transitive ð teuclidean ò , antisymmetric u euclidean ð tcoreflexive u transitive ð teuclidean , euclidean ò u euclidean ð ttotal, coreflexive u total ð treflexive, coreflexive u euclidean ð treflexive , euclidean ò u total ð treflexive, euclidean ò u ò euclidean ð treflexive, euclidean u total ð treflexive, euclidean u euclidean ð tasymmetric, euclidean ò u total ð treflexive, antisymmetric u euclidean ð tsymmetric, antisymmetric u total ð treflexive, transitive u ò euclidean ð ttransitive, reflexive , euclidean u symmetric ð tcoreflexive u euclidean ð ttotal, euclidean u symmetric ð teuclidean , reflexive u euclidean ð ttransitive, coreflexive u symmetric ð teuclidean, reflexive u euclidean ð tasymmetric, coreflexive u symmetric ð tcoreflexive u euclidean ð tasymmetric, euclidean u symmetric ð ttotal, euclidean u ò euclidean ð ttotal, euclidean u symmetric ð ttotal, euclidean u ò euclidean ð tantisymmetric, reflexive , euclidean u symmetric ð teuclidean, asymmetric u euclidean ð tasymmetric, euclidean ò u symmetric ð tasymmetric, euclidean u euclidean ð tsymmetric, transitive u symmetric ð treflexive, euclidean ò u ò euclidean ð teuclidean , euclidean u symmetric ð treflexive, euclidean ò u euclidean ð tsymmetric, euclidean ò u symmetric ð treflexive , euclidean ò u euclidean ð treflexive, euclidean ò u symmetric ð treflexive , euclidean ò u euclidean ð treflexive, symmetric, antisymmetric u symmetric ð ttotal, euclidean ò u euclidean ð treflexive, symmetric, transitive u symmetric ð tasymmetric, euclidean ò u euclidean ð ttotal, euclidean ò u symmetric ð teuclidean , euclidean ò u euclidean ð teuclidean ò , coreflexive u symmetric ð ttotal, euclidean ò u antisymmetric ð tcoreflexive u symmetric ð teuclidean, euclidean ò u antisymmetric ð tasymmetric u symmetric ð teuclidean, euclidean ò u antisymmetric ð ttransitive, reflexive u symmetric ð teuclidean, reflexive u antisymmetric ð treflexive , euclidean ò u symmetric ð tasymmetric, euclidean ò u antisymmetric ð teuclidean, reflexive u symmetric ð teuclidean , euclidean ò u transitive ð tsymmetric, asymmetric u symmetric ð treflexive, euclidean u transitive ð ttotal, symmetric u coreflexive ð tsymmetric, asymmetric u transitive ð tantisymmetric, coreflexive u coreflexive ð tantisymmetric, coreflexive u transitive ð tcoreflexive u coreflexive ð ttransitive, euclidean , reflexive u transitive ð teuclidean, transitive u coreflexive ð tasymmetric, coreflexive u transitive ð tsymmetric, antisymmetric u coreflexive ð teuclidean, reflexive u transitive ð treflexive, euclidean u coreflexive ð treflexive, antisymmetric, euclidean u transitive ð teuclidean, reflexive u coreflexive ð tasymmetric, euclidean u transitive ð ttotal, euclidean u coreflexive ð teuclidean, asymmetric u transitive ð treflexive, coreflexive u coreflexive ð tsymmetric, antisymmetric u transitive ð tasymmetric, coreflexive u coreflexive ð ttotal, antisymmetric, euclidean u transitive ð tasymmetric, euclidean u coreflexive ð treflexive, antisymmetric, euclidean ò u transitive ð tasymmetric, transitive u coreflexive ð treflexive, antisymmetric, euclidean ò u transitive ð tantisymmetric, transitive u coreflexive ð tsymmetric, transitive, reflexive u transitive ð teuclidean ò , euclidean ò u coreflexive ð ttransitive, reflexive , coreflexive u transitive ð teuclidean, coreflexive u coreflexive ð tantisymmetric, euclidean , euclidean ò u transitive ð ttotal, coreflexive u coreflexive ð ttotal, antisymmetric, euclidean ò u transitive ð teuclidean, asymmetric u coreflexive ð ttotal, euclidean, antisymmetric u transitive ð ttotal, euclidean u coreflexive ð ttotal, antisymmetric, euclidean ò u transitive ð treflexive, euclidean ò u coreflexive ð tantisymmetric, reflexive , euclidean ò u transitive ð teuclidean ò , transitive u coreflexive ð treflexive , euclidean ò u transitive ð teuclidean, symmetric u coreflexive ð tantisymmetric, euclidean , reflexive u transitive ð treflexive , euclidean ò u coreflexive ð tasymmetric, euclidean ò u transitive ð tantisymmetric, euclidean u coreflexive ð tasymmetric, euclidean ò u transitive ð ttotal, euclidean ò u coreflexive ð tantisymmetric, euclidean , euclidean ò u transitive ð teuclidean, antisymmetric u coreflexive ð teuclidean, antisymmetric, euclidean ò u transitive ð treflexive, euclidean ò u coreflexive ð teuclidean, antisymmetric, euclidean ò u transitive ð treflexive, symmetric, antisymmetric u coreflexive ð teuclidean, reflexive, antisymmetric u transitive ð teuclidean ò , coreflexive u coreflexive ð ttransitive, reflexive , euclidean ò u transitive ð treflexive, symmetric, transitive u Figure 5: The complete list of implications between properties 15 4 Handling equivalence relations Equalification As mentioned in the introduction, an alternative way of handling equivalence relations R is to create a new symbol rep and replace all occurrences of R with a formula involving rep: R reflexive R symmetric _ R transitive T r. . Rpx , yq . .s T r. . reppx q “ reppyq . .s To explain the above notation: We have two theories, one on the left-hand side of the arrow, and one on the right-hand side of the arrow. The transformation transforms any theory that looks like the left-hand side into a theory that looks like the right-hand side. We write T r. . e . .s for theories in which e occurs syntactically; in the transformation, all occurrences of e should be replaced. We call the above transformation equalification. This transformation may be beneficial because the reasoning now involves built-in equality reasoning instead of reasoning about an unknown symbol using axioms. The transformation is correct, meaning that it preserves (non-)satisfiability: (ñ) If we have a model of the LHS theory, then R must be interpreted as an equivalence relation. Let rep be the representative function of R, in other words we have Rpx , yq ô reppx q “ reppyq. Thus we also have a model of the RHS theory. (ð) If we have a model of the RHS theory, let Rpx , yq :“ reppx q “ reppyq. It is clear that R is reflexive, symmetric, and transitive, and therefore we have model of the LHS theory. In the transformation, we also remove the axioms for reflexivity, symmetry, and transitivity, because they are not needed anymore. But what if R is axiomatized as an equivalence relation using different axioms? Then we can remove any axiom about R that is implied by reflexivity, symmetry, and transitivity. Luckily we have already computed a table of which properties imply which other ones (shown in Fig. 5). Pequalification There are commonly occurring binary relations called partial equivalence relations that almost behave as equivalence relations, but not quite. In particular, they do not have to obey the axiom of reflexivity. Can we do something for these too? It turns out that a set with a partial equivalence relation R can be partitioned into two subsets: (1) one subset on which R is an actual equivalence relation, and (2) one subset of elements which are not related to anything, not even themselves. Thus, an alternative way of handling partial equivalence relations R is to create two new symbols, rep and P , and replace all occurrences of R with a formula involving rep and P: R symmetric R transitive _ T r. . Rpx , yq . .s T r. . pP px q ^ P pyq ^ reppx q “ reppyqq . .s Here, P is the predicate that indicates the subset on which R behaves as an equivalence relation. We call this transformation pequalification. This transformation may be beneficial because the reasoning now involves built-in equality reasoning instead of reasoning about an unknown symbol using axioms. However, there is also a clear price to pay since the size of the problem grows considerably. The transformation is correct, meaning that it preserves (non-)satisfiability: (ñ) If we have a model of the LHS theory, then R must be interpreted as a partial equivalence relation. Let P px q :“ Rpx , x q, in other words P is the subset on which R behaves like an equivalence relation. Let rep be a representative function of R on P , in other words we have pP px q ^ P pyqq ñ pRpx , yq ô reppx q “ reppyqq. By the definition of P we then also have Rpx , yq ô pP px q ^ P pyq ^ reppx q “ reppyqq. Thus we also have a model of the RHS theory. (ð) If we have a model of the RHS theory, let Rpx , yq :“ P px q ^ P pyq ^ reppx q “ reppyq. This R is symmetric and transitive, and therefore we have model of the LHS theory. 5 Handling total orders Ordification Many reasoning tools have built-in support for arithmetic, in particular they support an order ď on numbers. It turns out that we can “borrow” this operator when handling general total orders. Suppose we have a total order: 16 R : A ˆ A Ñ Bool We now introduce a new injective function: rep : A Ñ R We then replace all occurrences of R with a formula involving rep in the following way: R total R antisymmetric _ @x , y. reppx q “ reppyq ñ x “ y R transitive T r. . Rpx , yq . .s T r. . reppx q ď reppyq . .s (Here, ď is the order on reals.) We call this transformation ordification. This transformation may be beneficial because the reasoning now involves built-in arithmetic reasoning instead of reasoning about an unknown symbol using axioms. The above transformation is correct, meaning that it preserves (non-)satisfiability: (ñ) If we have a model of the LHS theory, then without loss of generality (by Löwenheim-Skolem), we can assume that the domain is countable. Also, R must be interpreted as a total order. We now construct rep recursively as a mapping from the model domain to R, such that we have Rpx , yq ô reppx q ď reppyq, in the following way. Let ta0 , a1 , a2 , . .u be the domain of the model, and set reppa0 q :“ 0. For any n ą 0, pick a value for reppan q that is consistent with the total order R and all earlier domain elements ai , for 0 ď i ă n. This can always be done because there is always extra room for a new, unique element between any two distinct values of R. Thus rep is injective and we also have a model of the RHS theory. (ð) If we have a model of the RHS theory, let Rpx , yq :“ reppx q ď reppyq. It is clear that R is total and transitive, and also antisymmetric because rep is injective, and therefore we have model of the LHS theory. Note on Q vs. R The proof would have worked for Q as well instead of R. The transformation can therefore be used for any tool that supports Q or R or both, and should choose whichever comparison operator is cheapest if there is a choice. Using integer arithmetic would however not have been correct. Note on strict total orders One may have expected to have a transformation specifically targeted to strict total orders, i.e. something like: R antisymmetric R asymmetric _ @x , y. reppx q “ reppyq ñ x “ y R transitive T r. . Rpx , yq . .s T r. . reppx q ă reppyq . .s However, the transformation for total orders already covers this case! Any strict total order R is also recognized as a total order R, and ordification already transforms such theories in the correct way. The only difference is that Rpx , yq is replaced with preppx q ď reppyqq instead of reppx q ă reppyq, which is satisfiability-equivalent. (There may be a practical performance difference, which is not something we have investigated.) Maxification Some reasoning tools do not have orders on real arithmetic built-in, but they may have other concepts that are built-in that can be used to express total orders instead. One such concept is handling of associative, commutative (AC) operators. For such a tool, one alternative way of handling total orders R is to create a new function symbol max and replace all occurrences of R with a formula involving max: R total max associative R antisymmetric _ max commutative R transitive @x , y. maxpx , yq “ x _ maxpx , yq “ y T r. . Rpx , yq . .s T r. . maxpx , yq “ y . .s We call this transformation maxification. This transformation may be beneficial because the reasoning now involves built-in equality reasoning with AC unification (and one extra axiom) instead of reasoning about an unknown relational symbol (using three axioms). 17 The above transformation is correct, meaning that it preserves (non-)satisfiability: (ñ) If we have a model of the LHS theory, then R must be interpreted as a total order. Let max be the maximum function associated with this order. Clearly, it must be associative and commutative, and the third axiom also holds. Moreover, we have Rpx , yq ô maxpx , yq “ y. Thus we also have a model of the RHS theory. (ð) If we have a model of the RHS theory, let Rpx , yq :“ maxpx , yq “ y. Given the axioms in the RHS theory, R is total, antisymmetric, and transitive, and therefore we have model of the LHS theory. (It may be the case that maxification can also be used to express orders that are weaker than total orders. At the time of this writing, we have not figured out how to do this.) 6 Handling reflexive, transitive relations Transification The last transformation we present is designed as an alternative treatment for any relation that is reflexive and transitive. It does not make use of any built-in concept in the tool. Instead, it transforms theories with a transitivity axiom into theories without that transitivity axiom. Instead, transitivity is specialized at each positive occurrence of the relational symbol. As such, an alternative way of handling reflexive, transitive relations R is to create a new symbol Q and replace all positive occurrences of R with a formula involving Q; the negative occurrences are simply replaced by Q: R reflexive Q reflexive R transitive _ T r. . Rpx , yq . . T r. . p@r . Qpr , x q ñ Qpr , yqq . . Rpx , yq . .s Qpx , yq . .s We call this transformation transification. This transformation may be beneficial because reasoning about transitivity in a naive way can be very expensive for theorem provers, because from transitivity there are many possible conclusions to draw that trigger each other “recursively”. The transformation only works on problems where every occurrence of R is either positive or negative (and not both, such as under an equivalence operator). If this is not the case, the problem has to be translated into one where this is the case. This can for example be done by means of clausification. Note that the resulting theory does not blow-up; only clauses with a positive occurrence of R gets one extra literal per occurrence. We replace any positive occurrence of Rpx , yq with an implication that says “for any r , if you could reach x from r , now you can reach y too”. Thus, we have specialized the transitivity axiom for every positive occurrence of R. Note that in the RHS theory, Q does not have to be transitive! Nonetheless, the transformation is correct, meaning that it preserves (non-)satisfiability: (ñ) If we have a model of the LHS theory, then R is reflexive and transitive. Now, set Q px , yq :“ R px , yq. Q is obviously reflexive. We have to show that R px , yq implies @r . Qpr , x q ñ Qpr , yq. This is indeed the case because R is transitive. Thus we also have a model of the RHS theory. (ð) If we have a model of the RHS theory, then Q is reflexive. Now, set Rpx , yq :“ @r . Qpr , x q ñ Qpr , yq. R is reflexive (by reflexivity of implication) and transitive (by transitivity of implication). Finally, we have to show that Qpx , yq implies Rpx , yq, which is the same as showing that @r . Qpr , x q ñ Qpr , yq implies Qpx , yq, which is true because Q is reflexive. Thus we also have a model of the RHS theory. 7 Experimental results We evaluate the effects of the different axiomatizations using two different resolution based theorem provers, E 1.9 [6] (with the xAuto and tAuto options) and Vampire 4.0 [5] (with the casc mode option), and two SMT- solvers, Z3 4.4.2 [4] and CVC4 1.4 [2]. The experiments were performed on a 2xQuad Core Intel Xeon E5620 processor with 24 GB physical memory, running at 2.4 GHz. We use a time limit of 5 minutes on each problem. We started from a set of 11674 test problems from the TPTP, listed as Unsatisfiable, Theorem, Unknown or Open (leaving out the very large theories). For each problem, a new theory was generated for each applicable transformation. For most problems, no relation matching any of the given criteria was detected, and thus no new theories were produced for these problems. Evaluation of reasoning tools on Satisfiable and CounterSatisfiable problems is left as future work. The experimental results are summarized in Fig. 6. 18 E Vampire Z3 CVC4 equalification (430) 422 +4 -33 428 +0 -2 362 +50 -3 370 +18 -39 pequalification (181) 96 +0 -34 93 +4 -9 38 +9 -4 59 +1 -11 transification (573) 324 +2 -26 308 +26 -11 234 +10 -46 255 +13 -42 ordification (328) n/a 296 +16 -12 238 +51 -13 267 +13 -15 maxification (328) 273 +1 -23 296 +2 -0 238 +1 -41 267 +4 -0 Figure 6: Table showing for each theorem prover the number of test problems solved before the transformation, how many new problems are solved after the transformation, and the number of problems that could be solved before but not after the transformation. (Total number of applicable problems for each transformation in parentheses). A +value in boldface indicates that there were hard problems (Rating 1.0) solved with that combination of treatment and theorem prover. An underlined -value indicates that time slicing (running both methods in 50% of the time each) solves a strictly larger superset of problems with that combination of treatment and theorem prover. 7.1 Equivalence relations Equivalence relations were present in 430 of the test problems. The majority of these problems appear in the GEO and SYN categories. Interestingly, among these 430 problems, there are only 23 problems whose equivalence relations are axiomatized with transitivity axioms. The remaining 407 problems axiomatize equivalence relations with euclidean and reflexivity axioms, as discussed in section 3. The number of equivalence relations in each problem ranges from 1 to 40, where problems with many equivalence relations all come from the SYN category. There is no clear correspondence between the number of equivalence relations in a problem and the performance of the prover prior to and after the transformation. Equalification As can be seen in Fig. 6, equalification performs very well with Z3, and somewhat well with CVC4, while it worsens the results of the resolution based provers, which already performed well on the original problems. Using a time slicing strategy, which runs the prover on the original problem for half the time and on the transformed problem for the second half, solves a strict superset of problems than the original for all of the theorem provers used in the evaluation. Fig. 7 shows in more detail the effect on solving times for the different theorem provers. Pequalification In 181 of the test problems, relations that are transitive and symmetric, but not reflexive, were found. The majority of these problems are in the CAT and FLD categories of TPTP. All of the tested theorem provers perform worse on these problems compared to the problems with true equivalence relations. This is also the case after performing pequalification. Pequalification turns out to be particularly bad for E, which solves 34 fewer problems after the transformation. Pequalification makes Z3 perform only slightly better and Vampire and CVC4 slightly worse. Hard problems solved using Pequalification After Pequalification, one problem with rating 1.0 (FLD041- 2) is solved by Vampire. Rating 1.0 means that no known current theorem prover solves the problem in reasonable time. 7.2 Total orders Total orders were found in 328 problems. The majority are in the SWV category, and the remaining in HWV, LDA and NUM. In 77 of the problems, the total order is positively axiomatized, and in the other 251 problems it is negative (and thus axiomatized as a strict total order). There is never more than one order present in any of the problems. Ordification For each of the problems, we ran the theorem provers with built-in support for arithmetic on the problems before and after applying ordification. Vampire was run on a version in TFF format [8], and Z3 and CVC4 on a version in SMT format [3]. The original problems were also transformed into TFF and SMT in order to achieve a relevant comparison. Ordification performs well for Z3, while for Vampire and CVC4 it is good for some problems and bad for some. Fig. 8 shows how solving times are affected, and the diagrams also show great potential for time slicing, in particular for Vampire and Z3. Hard problems solved using Ordification After Ordification, 15 problems, all from the SWV category, with rating 1.0 are solved. (SWV004-1, SWV035+1, SWV040+1, SWV044+1, SWV049+1, SWV079+1, SWV100+1, 19 100 100 equalified-vampire 10 10 equalified-E 1 1 0.1 0.1 0.01 0.01 0.01 0.1 1 10 100 0.01 0.1 1 10 100 original-E original-vampire 100 100 10 10 equalified-cvc4 equalified-z3 1 1 0.1 0.1 0.01 0.01 0.01 0.1 1 10 100 0.01 0.1 1 10 100 original-z3 original-cvc4 Figure 7: The time taken to solve problems, with and without equalification, using E, Vampire, Z3 and CVC4 SWV101+1, SWV108+1, SWV110+1, SWV113+1, SWV118+1, SWV120+1, SWV124+1, SWV130+1) Vam- pire and Z3 each solve 14 hard problems and CVC4 solves 13 of them. One problem (SWV004-1) is even categorized as Unknown, which means that no prover has ever solved it. After ordification, all three provers were able to solve it in less than a second. Maxification Maxification, our second possible treatment of total orders, turned out to be disadvantageous to E and Z3, while having a small positive effect on Vampire and CVC4 (see Fig. 6). 7.3 Transitive and Reflexive relations 573 test problems include relations that are transitive and reflexive, excluding equivalence relations and total orders. In all of them, transitivity occurs syntactically as an axiom. The problems come from a variety of categories, but a vast majority are in SET, SEU and SWV. Only about half of the original problems were solved by each theorem prover, which may indicate that transitivity axioms are difficult for current theorem provers to deal with. We evaluate the performance of the theorem provers before and after transification. Problems that include equivalence relations and total orders are excluded, as the corresponding methods equalification and ordification give better results and should be preferred. Vampire benefits the most from transification, and solves 32 new problems after the transformation, while 10 previously solvable problems become unsolvable within the time limit. For E, the effect of transification is almost exclusively negative. Vampire also has a significantly worse performance than E overall on the original problems that include relations that are transitive and reflexive. Both Z3 and CVC4 perform worse after the transformation, but time slicing can be a good way to improve the results. Hard problems solved using Transification After transification, two new problems with rating 1.0 are solved, both by Vampire (SEU322+2 and SEU372+2). 20 100 100 100 10 10 ordified-vampire 10 ordified-CVC4 ordified-Z3 1 1 1 0.1 0.1 0.1 0.01 0.01 0.01 0.001 0.001 0.01 0.1 1 10 100 0.001 0.01 0.1 1 10 100 0.001 0.01 0.1 1 10 100 original-vampire original-Z3 original-CVC4 Figure 8: Effects of ordification, using Vampire, Z3 and CVC4 Equalification and Transification Since all equivalence relations are transitive and reflexive, the method for transification works also on equivalence relations. Comparing the two methods on the 430 problems with equivalence relations, we concluded that equalification and transification work equally bad for E, Vampire and CVC4. Both transification and equalification improve the results for Z3, but equalification does so significantly. Ordification and Transification We compared ordification and transification on the 328 problems containing total orders. Transification seems to make these problems generally more difficult for theorem provers to solve, while ordification instead improved the results on many of the problems. Transification makes the theorem prover perform worse on these problems also for E, which cannot make use of ordification since it does not provide support for arithmetic. 8 Discussion, Conclusions, and Future Work We have presented 5 transformations that can be applied to theories with certain transitive relations: equalifica- tion, pequalification, transification, ordification, and maxification. We have also created a method for syntactic discovery of binary relations where these transformations are applicable. For users of reasoning tools that create their own theories, it is clear that they should consider using one of the proposed alternative treatments when writing theories. For all of our methods, there are existing theories for which some provers performed better on these theories than others. In particular, there exist 18 TPTP problems that are now solvable that weren’t previously. For implementers of reasoning tools, our conclusions are less clear. For some combinations of treatments and provers (such as transification for Vampire, and equalification for Z3), overall results are clearly better, and we would thus recommend these treatments as preprocessors for these provers. Some more combinations of treatments and provers lend themselves to a time slicing strategy that can solve strictly more problems, and could thusly be integrated in a natural way in provers that already have the time slicing machinery in place. Related Work Chaining [1] is a family of methods that limit the use of transitivity-like axioms in proofs by only allowing certain chains of them to occur in proofs. The result is a complete proof system that avoids the derivation of unnecessary consequences of transitivity. However, chaining is not implemented in any of the reasoning tools we considered for this paper. In personal communication with some of the authors, chaining-like techniques have not been deemed important enough to be considered for implementation, and their preliminary experimental results were mostly negative. Future Work There is a lot of room for improvements and other future work. There are many other relations that are more or less common that could benefit from an alternative treatment like the transformations described in this paper. In particular, maxification seems to be an idea that could be applied to binary relations that are weaker than total orders, which may make this treatment more effective. But there are also other, non-transitive relations that are of interest. Ordification uses total orders as the base-transformation, and treats strict total orders as negated total orders. We would like to investigate more in which cases it may be beneficial to treat strict total orders differently from total orders, and when to use ă on R instead of ď. We would also like to investigate the effect of our transformations on Satisfiable and CounterSatisfiable 21 100 100 transified-vampire 10 10 transified-e 1 1 0.1 0.1 0.01 0.01 0.01 0.1 1 10 100 0.01 0.1 1 10 100 original-e original-vampire 100 100 10 10 transified-cvc4 transified-z3 1 1 0.1 0.1 0.01 0.01 0.01 0.1 1 10 100 0.01 0.1 1 10 100 original-z3 original-cvc4 Figure 9: Effects of transification, using E, Vampire, Z3 and CVC4 problems. What would be the effect on finite model finders? Some transformations change the number of variables per clause, which will influence the performance of finite model finders. What happens to saturation- based tools? Is it easier or harder to saturate a problem after transformation? Evaluating these questions is hard, because we did not find many satisfiable problems in the TPTP that contained relevant relations (see Fig. 3). Instead, we considered using Unsatisfiable problems, and measuring the time it takes to show the absence of models up to a certain size. There are other kinds of relations than binary relations. For example, we can have an ternary relation that behaves as an equivalence relation in its 2nd and 3rd argument. An alternative treatment of this relation would be to introduce a binary function symbol rep. We do not know whether or not this occurs often, and if it is a good idea to treat higher-arity relational symbols specially in this way. Lastly, we would like to look at how these ideas could be used inside a theorem prover; as soon as the prover discovers that a relation is an equivalence relation or a total order, one of our transformations could be applied, on the fly. The details of how to do this remain to be investigated. References [1] Leo Bachmair and Harald Ganzinger. Ordered chaining calculi for first-order theories of transitive relations. Journal of the ACM (JACM), 45(6):1007–1049, 1998. [2] Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV’11, pages 171–177, Berlin, Heidelberg, 2011. Springer-Verlag. [3] Clark Barrett, Aaron Stump, Cesare Tinelli, Sascha Boehme, David Cok, David Deharbe, Bruno Dutertre, Pascal Fontaine, Vijay Ganesh, Alberto Griggio, Jim Grundy, Paul Jackson, Albert Oliveras, Sava Krsti, 22 Michal Moskal, Leonardo De Moura, Roberto Sebastiani, To David Cok, and Jochen Hoenicke. C.: The SMT-LIB Standard: Version 2.0. Technical report, 2010. [4] Leonardo de Moura and Nikolaj Bjørner. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008. [5] Laura Kovács and Andrei Voronkov. First-Order Theorem Proving and Vampire. In Computer Aided Verifi- cation - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings, pages 1–35, 2013. [6] Stephan Schulz. The E Theorem Prover, 2015. http://www.eprover.org/. [7] Geoff Sutcliffe. The TPTP Problem Library, 2015. http://www.tptp.org/. [8] Geoff Sutcliffe, Stephan Schulz, Koen Claessen, and Peter Baumgartner. The TPTP Typed First-order Form and Arithmetic. In Nikolaj Bjørner and Andrei Voronkov, editor, The 18th International Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR-18), pages 406–419, Merida, Venezuela, mar 2012. Springer Verlag. 23