Transforming and Analyzing Proofs in the CERES-system ∗ Stefan Hetzl Alexander Leitsch Daniel Weller Bruno Woltzenlogel Paleo Institute of Computer Languages (E185), Vienna University of Technology Abstract Cut-elimination is the most prominent form of proof transformation in logic. The elimination of cuts in formal proofs corresponds to the removal of intermediate statements (lemmas) in mathe- matical proofs. Cut-elimination can be applied to mine real mathematical proofs, i.e. for extracting explicit and algorithmic information. The system CERES (cut-elimination by resolution) is based on automated deduction and was successfully applied to the analysis of nontrivial mathematical proofs. In this paper we focus on the input-output environment of CERES, and show how users can interact with the system and extract new mathematical knowledge. 1 Introduction Cut-elimination introduced by Gentzen [6] is the most prominent form of proof transformation in logic and plays an important role in automating the analysis of mathematical proofs. The removal of cuts corresponds to the elimination of intermediate statements (lemmas) from proofs resulting in a proof which is analytic in the sense, that all statements in the proof are subformulas of the result. Therefore, the proof of a combinatorial statement is converted into a purely combinatorial proof. In a formal sense Girard’s famous analysis of van der Waerden’s theorem [7] consists in the appli- cation of cut-elimination to the proof of Fürstenberg and Weiss (which uses topological arguments) with the “perspective” of obtaining van der Waerden’s elementary proof. Indeed, an application of a complex proof transformation like cut-elimination by humans requires a goal oriented strategy. CERES [4, 5] is a cut-elimination method that is based on resolution. The method roughly works as follows: The structure of an LK-proof containing cuts is mapped to an unsatisfiable set of clauses C (the characteristic clause set). A resolution refutation of C , which is obtained using a first-order theorem prover, serves as a skeleton for the new proof which contains only atomic cuts (AC normal form). In a final step also these atomic cuts can be eliminated, provided the (atomic) axioms are valid sequents; but this step is of minor mathematical interest and of low complexity. In the system CERES1 this method of cut-elimination has been implemented. The extension of CERES from LK to LKDe, a calculus containing definition introductions and equality rules (see [10] and [1]), moved the system closer to real mathematical proofs. The system CERES has been applied successfully to a well known mathematical proof, namely to Fürstenberg’s proof of the infinity of primes [2]; it was shown that the elimination of topological arguments from the proof resulted in Euclid’s famous proof. Though CERES did not (yet) produce substantial mathematical proofs previously unknown, the analysis of Fürstenberg’s proof demonstrates the potential of the method to handle nontrivial mathematics. The main task of the CERES-method is proof transformation, not just proof verification. For the latter one there are powerful higher-order systems like Isabelle2 (see [12]) and Coq3 (see [11]); these Rudnicki P, Sutcliffe G., Konev B., Schmidt R., Schulz S. (eds.); Proceedings of the Combined KEAPPA - IWIL Workshops, pp. 77-91 ∗ Supported by the Austrian Science Fund (project P19875) 1 available at http://www.logic.at/ceres/ 2 available at http://www.cl.cam.ac.uk/research/hvg/Isabelle/ 3 available at http://coq.inria.fr/ 77 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo systems have been used successfully to verify complicated and famous mathematical proofs like this of the four color theorem (see http://research.microsoft.com/~gonthier/). The core algorithm of CERES, however, works on LK-proofs and performs cut-elimination. For this reason we have developed the higher proof language HLK which is close to the sequent calculus LK, making translations to LK easy and efficient. In this paper we present the input-output environment of CERES and illustrate the interaction of users with the system. In the first step proofs are formalized in HLK. The formalized proof is then compiled to LKDe and analyzed by the CERES-algorithm, and finally the resulting atomic cut normal form can be viewed by Prooftool. Moreover, from the normal form a Herbrand sequent can be extracted which contains mathematical information in a more condensed form. We illustrate all phases of proof specification and analysis by an example, the tape proof of Christian Urban (see also [1]). 2 System overview Figure 1 sketches how HLK, CERES and ProofTool can be used by a mathematician to analyze existing mathematical proofs and obtain new ones. According to the labels in the edges of Figure 1, the following Figure 1: Working with HLK, CERES and ProofTool. steps are executed within the system and in interaction with the user: 1. The user (intended to be a mathematician with a background in logic) selects an interesting infor- mal mathematical proof to be transformed and analyzed. Informal mathematical proofs are proofs in natural language, as they usually occur in mathematics. 2. The user writes the selected proof in HandyLK , a higher proof language, intermediary between natural mathematical language and formal calculi. 3. The HandyLK proof is input to the compiler HLK. 4. HLK generates a formal proof in sequent calculus LKDe. 78 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo 5. The formal proof is input to CERES, which is responsible for all sorts of proof transformations, including cut-elimination. 6. CERES extracts from the formal proof a characteristic clause set, which contains clauses formed from ancestors of cut-formulas in the formal proof. 7. The characteristic clause set is then input to a resolution theorem prover, e.g. Otter4 or Prover95 . 8. The resolution theorem prover outputs a refutation of the characteristic clause set. 9. CERES receives the refutation, which will be used as a skeleton for the transformed proof in atomic- cut normal form (ACNF). 10. CERES outputs the grounded refutation in a tree format and the characteristic clause set. Moreover it extracts projections from the formal proof and plugs them into the refutation in order to generate the ACNF. The projections and the ACNF are also output. A Herbrand sequent is obtained from the instantiation information of the quantifiers of the end-sequent (resulting in an (equationally) valid sequent) and output as well. The Herbrand sequent typically summarizes the creative content of the ACNF. For details, see [8]. 11. All outputs and inputs of CERES can be opened with ProofTool. 12. ProofTool, a graphical user interface, renders all proofs, refutations, projections, sequents and clause sets so that they can be visualized by the user. 13. The information displayed via ProofTool is analyzed by the user. 14. Based on his analysis, the user can formulate new mathematical ideas, e.g. a new informal direct proof corresponding to the ACNF. 3 Proof analysis with CERES Our calculus LKDe is based on standard LK with permutation, contraction and weakening, atomic axioms and multiplicative rules. For example, the following rules deal with the ∧-connective: Γ ` ∆, A Π ` Λ, B A, Γ ` ∆ A, Γ ` ∆ ∧:r ∧ : l1 ∧ : l2 Γ, Π ` ∆, Λ, A ∧ B A ∧ B, Γ ` ∆ B ∧ A, Γ ` ∆ In addition, to bring the calculus closer to mathematical practice, definition introduction and equality handling rules are used: Let A be a first-order formula with the free variables x1 , . . . , xk (denoted by A(x1 , . . . , xk )) and P be a new k-ary predicate symbol (corresponding to the formula A). Then the definition introduction rules are: A(t1 , . . . ,tk ), Γ ` ∆ Γ ` ∆, A(t1 , . . . ,tk ) defP : l defP : r P(t1 , . . . ,tk ), Γ ` ∆ Γ ` ∆, P(t1 , . . . ,tk ) for arbitrary sequences of terms t1 , . . . ,tk . Definition introduction is a simple and very powerful tool in mathematical practice. Note that the introduction of important concepts and notations like groups, integrals etc. can be formally described by introduction of new symbols. There are also definition intro- duction rules for new function symbols which are of similar type. 4 available at http://www-unix.mcs.anl.gov/AR/otter/ 5 available at http://www.cs.unm.edu/ mccune/prover9/ ~ 79 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo The equality rules (also called paramodulation rules) are: Γ1 ` ∆1 , s = t A[s]Λ , Γ2 ` ∆2 Γ1 ` ∆1 ,t = s A[s]Λ , Γ2 ` ∆2 = : l1 = : l2 A[t]Λ , Γ1 , Γ2 ` ∆1 , ∆2 A[t]Λ , Γ1 , Γ2 ` ∆1 , ∆2 for inference on the left and Γ1 ` ∆1 , s = t Γ2 ` ∆2 , A[s]Λ Γ1 ` ∆1 ,t = s Γ2 ` ∆2 , A[s]Λ = : r1 = : r2 Γ1 , Γ2 ` ∆1 , ∆2 , A[t]Λ Γ1 , Γ2 ` ∆1 , ∆2 , A[t]Λ on the right, where Λ denotes a set of positions of subterms where replacement of s by t has to be performed. We call s = t the active equation of the rules. We will now introduce the CERES system by presenting the analysis of a proof from [13]; it was formalized in LKDe and analyzed by CERES in [1]. The end-sequent formalizes the statement: on a tape with infinitely many cells which are all labelled by 0 or by 1 there are two cells labelled by the same number. f (x) = 0 expresses that the cell nr. x is labelled by 0. Indexing of cells is done by number terms defined over 0, 1 and +. The proof ϕ below uses two lemmas: (1) there are infinitely many cells labelled by 0 and (2) there are infinitely many cells labelled by 1. These lemmas are eliminated by CERES and a more direct argument is obtained in the resulting proof ϕ 0 . 3.1 Specifying proofs in HandyLK In this section, we introduce the higher proof language HandyLK by presenting its most important fea- tures and syntax by means of our example proof. HandyLK is based on the idea that many aspects of writing LKDe proofs can be automatized. It supports a many-sorted first-order language. Before starting to write proofs, it is first necessary to define the language and the function and pred- icate definitions one intends to use. In our example, we deal with a language with one sort (natural numbers) and the constants mentioned above. In HandyLK , we write: define type nat; define constant 0, 1 of type nat; define infix function + of type nat,nat to nat with weight 100; define function f of type nat to nat; In the definition of infix functions like +, we may set a weight that allows terms to be written in the usual mathematical way, where superfluous brackets may be dropped. In addition to the constants, we define some variables that will be used in the proof. Here, the function f is the labelling function assigning labels to tape cells. define variable k, l, n, p, q, x, n_0, n_1 of type nat; We also define the axioms we will use. Note that :- represents ` in HandyLK . define axiom :- k + l = l + k; define axiom :- k + (l + n) = (k + l) + n; define axiom k = k + (1 + l) :- ; define axiom :- k = k; Finally, we introduce some predicate definitions. define predicate A by all x ( f(x) = 0 or f(x) = 1 ); define predicate I by all n ex k f( n + k ) = x; 80 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo Note that in the definition of I, x is a free variable. The free variables determine the arity of the defined predicate, and can be instantiated by parameters. If P is a defined predicate, F is its defining formula and x̄ are the free variables of F, then the predicate definition is interpreted as ∀x̄(P(x̄) ⇐⇒ F). In our example, the predicate A is the assumption: All cells are labelled either 0 or 1. The predicate I(x) states that there are infinitely many cells labelled x. We now turn to formalizing our example proof ϕ. In LKDe, ϕ is (τ) (ε(0)) A ` I(0), I(1) I(0) ` ∃p∃q(p 6= q ∧ f (p) = f (q)) (ε(1)) cut A ` ∃p∃q(p 6= q ∧ f (p) = f (q)), I(1) I(1) ` ∃p∃q(p 6= q ∧ f (p) = f (q)) cut A ` ∃p∃q(p 6= q ∧ f (p) = f (q)) Here τ is a proof of the fact that either infinitely many cells are labelled 0, or infinitely many cells are labelled 1. The two applications of cut encode the application of two lemmas ε(n) for n ∈ {0, 1}: if there are infinitely many cells labelled n, then there are two cells labelled by the same number. In HandyLK , we start the specification of the proof by giving an identifier, here we choose the name the-proof: define proof the-proof Next, the end-sequent is fixed: proves A :- ex p ex q ( not p = q and f(p) = f(q) ); Now, we begin by specifying the inferences from the bottom up. In HandyLK , only the active formulas of a rule have to be specified, and no structural rules (except cut) have to be written down explicitly. The HLK compiler keeps track of the formulas used in the proof, and automatically inserts the structural rules necessary to be able to apply the rules specified by the user. The first inference in ϕ is a cut, which is a binary rule. Due to the linear nature of the HandyLK language, one of the subproofs above the binary rule has to be referenced. The following proof reference clauses are available: by proof hproof i auto propositional hsequenti explicit axiom hsequenti The first clause gives the name of the subproof with which to continue, the second clause states that the subproof ends in a propositional tautology whose proof should be computed automatically, and the last clause asserts that the subproof consists solely of an axiom. In our example we reference the proof ε(1). with cut I(1) right by proof \epsilon(1); This specifies an application of cut with cut formula I(1), where the right subproof will be ε(1). In LKDe, this will be a rule application (ε(1)) Γ ` ∆, I(1) I(1), Π ` Λ cut Γ, Π ` ∆, Λ where Γ, Π, ∆, Λ will be populated by the appropriate formulas automatically by the HLK compiler. We continue the HandyLK specification of this proof part with the left subproof of the first applica- tion of cut by encoding the second application: here, we reference both the left subproof τ and the right subproof ε(0). 81 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo with cut I(0) left by proof \tau right by proof \epsilon(0); ; This completes the HandyLK specification of ϕ — but to be able to actually compile the proof into an LKDe-proof, the subproofs τ, ε(0) and ε(1) have to be encoded in HandyLK . We continue by showing the HandyLK rule applications in τ, which are definition introduction and quantifier rules. The proof τ starts by expanding the two occurrences of the defined predicate I: A ` ∀n∃k f (n + k) = 0, ∀n∃k f (n + k) = 1 defI : r A ` I(0), ∀n∃k f (n + k) = 1 defI : r A ` I(0), I(1) In HandyLK , it suffices to specify the predicate to be expanded and give the auxiliary formulas of the rules, the correct main formula is then found by HLK through matching: with undef I :- all n ex k f( n + k ) = 0, all n ex k f( n + k ) = 1; Next, the two strong quantifiers are eliminated by ∀ : r rules, introducing the eigenvariables n0 and n1 : A ` ∃k f (n0 + k) = 0, ∃k f (n1 + k) = 1 ∀: r A ` ∃k f (n0 + k) = 0, ∀n∃k f (n + k) = 1 ∀: r A ` ∀n∃k f (n + k) = 0, ∀n∃k f (n + k) = 1 The corresponding HandyLK code is with all right :- ex k f( n_0 + k ) = 0, ex k f( n_1 + k ) = 1; Again, it suffices to specify the auxiliary formulas, the correct main formulas are found by matching. The same holds for the weak quantifier rules. In our example, we instantiate them by n1 and n0 , respectively: A ` f (n0 + n1 ) = 0, f (n1 + n0 ) = 1 ∃: r A ` ∃k f (n0 + k) = 0, f (n1 + n0 ) = 1 ∃: r A ` ∃k f (n0 + k) = 0, ∃k f (n1 + k) = 1 with ex right :- f( n_0 + n_1 ) = 0, f( n_1 + n_0 ) = 1; Of course, HLK checks whether the quantifier rule applications are legal. Our proof τ is nearly done — it remains to decompose our assumption A and to use the commutativity of +. We only show the latter, which is encoded with an application of paramodulation: ` n0 + n1 = n1 + n0 f (n1 + n0 ) = 1 ` f (n1 + n0 ) = 1 = : l2 f (n0 + n1 ) = 1 ` f (n1 + n0 ) = 1 Such a paramodulation application is common when formalizing proofs from mathematics: an equality axiom from the background theory is used as the active equation. For this reason, it receives special treatment in HandyLK : the active equation is simply specified in a by clause. with paramod by n_0 + n_1 = n_1 + n_0 right f( n_1 + n_0 ) = 1 :- ; 82 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo This rule application can also be encoded in the general syntax for binary rules, using a proof reference: with paramod f( n_1 + n_0 ) = 1 :- left explicit axiom :- n_0 + n_1 = n_1 + n_0 ; Recall that the definition of ϕ in HandyLK contained references to proofs ε(0) and ε(1), which show that under the assumption that there are infinitely many cells labelled 0 and 1, respectively, it follows that two cells are labelled by the same number. Clearly, these proofs have a similar structure — this fact can be exploited in HandyLK by writing a meta proof that can be instantiated with some specific terms: define proof \epsilon with meta term i of type nat; proves I(i) :- ex p ex q ( not p = q and f(p) = f(q) ); We omit the details of the specification of the meta proof ε(i). Additionally, HandyLK supports the definition of proofs in a recursive way, which is a convenient way to encode sequences of proofs. This feature was used to encode a sequence of proofs for the formula scheme from [3]. 3.2 XML format for proofs The programs in the CERES system do not work directly with proofs written in HandyLK — these proofs are compiled into a flexible XML format using the HLK compiler. XML is a well known data representa- tion language which allows the use of arbitrary and well known utilities for editing, transformation and presentation and standardized programming libraries. We are interested in tree-style proofs, formulas and terms. Data is structured in trees in XML, therefore the encoding and decoding of these objects is very straightforward. The format is flexible in the sense that it only assumes that proofs are trees or directed acyclic graphs of rule applications that are labelled by sequents. In our practice, it has been used successfully to encode LKDe and resolution proofs. The following abbreviated XML code represents the proof ϕ: ... ... ... ... 83 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo The symbol attributes are used for the visual presentation of the proof trees: they allow the user to identify proofs and rules. The param attributes contain additional rule information, in the example they specify which formulas are to be contracted (by the contraction rule c:r) and how the formulas are to be permuted (by the permutation rule \pi:r). The prooflink tags reference other proofs by their name and allow the representation of proofs as DAGs. In the above example, the formulas contained in the sequents have been left out. To give an example of how a formula is encoded in the XML format, consider the following XML code corresponding to the formula R(c, f (c)) ∧ P(c): Data is organized in our XML format in a proofdatabase containing a number of proofs. The proof- database may also contain 1. a list of defined predicates and their definitions, 2. a list of sequents specifying the axioms of the background theory in which the proofs are written, 3. arbitrary lists of sequents identified by some symbol (e.g. to store a Herbrand sequent). 3.3 The CERES method To prepare for the following section, we will now introduce the theoretical background for the CERES method of cut-elimination, which is used by the CERES system to compute a proof in atomic cut normal form and, in the end, a Herbrand sequent. The central idea of CERES consists in extracting a so-called characteristic clause set from a proof, and then using a resolution refutation of this set to obtain a proof with only atomic cuts. We consider the proofs in LKDe as directed trees with nodes which are labelled by sequents, where the root is labelled by the end-sequent. According to the inference rules, we distinguish binary and unary nodes. In an inference ν1 : S1 ν2 : S2 x ν: S where ν is labelled by S, ν1 by S1 and ν2 by S2 , we call ν1 , ν2 predecessors of ν. Similarly ν 0 is predecessor of ν in a unary rule if ν 0 labels the premiss and ν the consequent. Then the predecessor relation is defined as the reflexive and transitive closure of the relation above. Every node is predecessor of the root, and the axioms have only themselves as predecessors. For a formal definition of the concepts we refer to [4] and [5]. A similar relation holds between formula occurrences in sequents. Instead of a formal definition we give an example. Consider the rule: ∀x.P(x) ` P(a) ∀x.P(x) ` P(b) ∧: r ∀x.P(x) ` P(a) ∧ P(b) 84 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo The occurrences of P(a) and P(b) in the premiss are ancestors of the occurrence of P(a) ∧ P(b) in the consequent. P(a) and P(b) are called auxiliary formulas of the inference, and P(a) ∧ P(b) the main formula. ∀x.P(x) in the premisses are ancestors of ∀x.P(x) in the consequent. Again the ancestor relation is defined by reflexive transitive closure. Let Ω be the set of all occurrences of cut-formulas in sequents of an LKDe-proof ϕ. The cut-formulas are not ancestors of the formulas in the end-sequent, but they might have ancestors in the axioms (if the cuts are not generated by weakening only). The construction of the characteristic clause set is based on the ancestors of the cuts in the axioms. Note that clauses are just defined as atomic sequents. We define a set of clauses Cν for every node ν in ϕ inductively: • If ν is an occurrence of an axiom sequent S(ν), and S0 is the subsequent of S(ν) containing only the ancestors of Ω then Cν = {S0 }. • Let ν 0 be the predecessor of ν in a unary inference then Cν = Cν 0 . • Let ν1 , ν2 be the predecessors of ν in a binary inference. We distinguish two cases (a) The auxiliary formulas of ν1 , ν2 are ancestors of Ω. Then Cν = Cν1 ∪ Cν2 . (b) The auxiliary formulas of ν1 , ν2 are not ancestors of Ω. Then Cν = Cν1 × Cν2 . where C × D = {C ◦ D | C ∈ C , D ∈ D} and C ◦ D is the merge of the clauses C and D. The characteristic clause set CL(ϕ) of ϕ is defined as Cν0 , where ν0 is the root. Theorem 1. Let ϕ be a proof in LKDe. Then the clause set CL(ϕ) is equationally unsatisfiable. Remark. A clause set C is equationally unsatisfiable if C does not have a model where = is interpreted as equality over a domain. Proof. This proof first appeared in [1]. Let ν be a node in ϕ and S0 (ν) the subsequent of S(ν) which consists of the ancestors of Ω (i.e. of a cut). It is shown by induction that S0 (ν) is LKDe-derivable from Cν . If ν0 is the root then, clearly, S0 (ν0 ) = ` and the empty sequent ` is LKDe-derivable from the axiom set Cν0 , which is just CL(ϕ). As all inferences in LKDe are sound over equational interpretations (where new symbols introduced by definition introduction have to be interpreted according to the defining equiv- alence), CL(ϕ) is equationally unsatisfiable. Note that, without the rules = : l and = : r, the set CL(ϕ) is just unsatisfiable. Clearly the rules = : l and = : r are sound only over equational interpretations. The next steps in CERES are (1) the computation of the proof projections ϕ[C] w.r.t. clauses C ∈ CL(ϕ), (2) the refutation of the set CL(ϕ), resulting in an RP-tree γ, i.e. in a deduction tree defined by the inferences of resolution and paramodulation, and (3) “inserting” the projections ϕ[C] into the leaves of γ. 85 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo For step (1) we skip in ϕ all inferences where the auxiliary resp. main formulas are ancestors of a cut. Instead of the end-sequent S we get S ◦C for a C ∈ CL(ϕ). Step (2) consists in ordinary theorem proving by resolution and paramodulation (which is equation- ally complete). For refuting CL(ϕ) any first-order resolution prover can be used. By the completeness of the methods we find a refutation tree γ as CL(ϕ) is unsatisfiable by Theorem 1. Step (3) makes use of the fact that, after computation of the simultaneous most general unifier of the inferences in γ, the resulting tree γ 0 is actually a derivation in LKDe. Indeed, after computation of the simultaneous unifier, paramodulation becomes = : l and = : r and resolution becomes cut in LKDe. Note that the definition rules, like the logical rules, do not appear in γ 0 . Now for every leaf ν in γ 0 , which is labelled by a clause C0 (an instance of a clause C ∈ CL(ϕ)) we insert the proof projection ϕ[C0 ]. The result is a proof with only atomic cuts. The proof projection is only sound if the proof ϕ is skolemized, i.e. there are no strong quantifiers (i.e. quantifiers with eigenvariable conditions) in the end-sequent. If ϕ is not skolemized a priori it can be transformed into a skolemized proof ϕ 0 in polynomial (at most quadratic) time; for details see [3]. For illustration, consider the following example: ϕ= ϕ1 ϕ2 cut (∀x)(P(x) → Q(x)) ` (∃y)(P(a) → Q(y)) ϕ1 = P(u)? ` P(u) Q(u) ` Q(u)? →: l P(u)? , P(u) → Q(u) ` Q(u)? →: r P(u) → Q(u) ` (P(u) → Q(u))? ∃:r P(u) → Q(u) ` (∃y)(P(u) → Q(y))? ∀:l (∀x)(P(x) → Q(x)) ` (∃y)(P(u) → Q(y))? ∀:r (∀x)(P(x) → Q(x)) ` (∀x)(∃y)(P(x) → Q(y))? ϕ2 = P(a) ` P(a)? Q(v)? ` Q(v) →: l P(a), (P(a) → Q(v))? ` Q(v) →: r (P(a) → Q(v))? ` P(a) → Q(v) ∃:r (P(a) → Q(v))? ` (∃y)(P(a) → Q(y)) ∃:l (∃y)(P(a) → Q(y))? ` (∃y)(P(a) → Q(y)) ∀:l (∀x)(∃y)(P(x) → Q(y))? ` (∃y)(P(a) → Q(y)) We have CL(ϕ) = {P(u) ` Q(u); ` P(a); Q(v) `}. The resolution refutation δ of CL(ϕ) ` P(a) P(u) ` Q(u) R ` Q(a) Q(v) ` R ` does the job of refuting CL(ϕ). By applying the most general unifier σ of δ , we obtain a ground refutation γ = δ σ : ` P(a) P(a) ` Q(a) R ` Q(a) Q(a) ` R ` This will serve as a skeleton for a proof in ACNF. To complete the construction, we combine the skeleton with the following projections (grounded by σ ): 86 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo ϕ(C1 ) = P(a) ` P(a) Q(a) ` Q(a) →: l P(a), P(a) → Q(a) ` Q(a) ∀:l P(a), (∀x)(P(x) → Q(x)) ` Q(a) w:r P(a), (∀x)(P(x) → Q(x)) ` (∃y)(P(a) → Q(y)), Q(a) ϕ(C2 ) = P(a) ` P(a) w:r P(a) ` P(a), Q(v) →: r ` P(a) → Q(v), P(a) ∃:r ` (∃y)(P(a) → Q(y)), P(a) w:l (∀x)(P(x) → Q(x)) ` (∃y)(P(a) → Q(y)), P(a) ϕ(C3 ) = Q(a) ` Q(a) w:l P(a), Q(a) ` Q(a) →: r Q(a) ` P(a) → Q(a) ∃:r Q(a) ` (∃y)(P(a) → Q(y)) w:l Q(a), (∀x)(P(x) → Q(x)) ` (∃y)(P(a) → Q(y)) The composition of skeleton and projections yields ϕ(γ) = ϕ(C2 ) ϕ(C1 ) B ` C, P(a) P(a), B ` C, Q(a) ϕ(C3 ) cut B, B ` C,C, Q(a) Q(a), B ` C cut B, B, B ` C,C,C contractions B`C where B = (∀x)(P(x) → Q(x)), C = (∃y)(P(a) → Q(y)). Clearly, ϕ(γ) is a proof of the end-sequent of ϕ in ACNF. 3.4 Cut-elimination, proof visualization and Herbrand sequent extraction with CERES After compiling the HandyLK source to XML with HLK, it is possible to perform proof analysis using the CERES tool. In our example, we are interested in extracting a Herbrand sequent from a cut-free proof of the end-sequent of ϕ. To this end, the CERES method is used, which extracts an unsatisfiable set of clauses from the proof and refutes it. In theory, any resolution prover can be used to refute this set of clauses; Table 1 lists the format conversions currently supported by our implementation. Table 1: Format conversions supported by CERES From To XML TPTP XML Otter input XML Prover9 input Otter proof object XML Prover9 output XML In our example, we use Otter to find a resolution refutation. Running CERES yields the following characteristic clause set in Otter input format 87 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo -> =("f"(+(x4, x5)), "0"), =("f"(+(x5, x4)), "1"). % (C1) =("f"(+(x4, x6)), "0"), =("f"(+(+(+(x4, x6), "1"), x7)), "0") ->. % (C2) =("f"(+(x4, x6)), "1"), =("f"(+(+(+(x4, x6), "1"), x7)), "1") ->. % (C3) which corresponds to the set of clauses CL(π) = { ` f (x4 + x5 ) = 0, f (x5 + x4 ) = 1; f (x4 + x6 ) = 0, f (((x4 + x6 ) + 1) + x7 ) = 0 `; f (x4 + x6 ) = 1, f (((x4 + x6 ) + 1) + x7 ) = 1 ` } which is refutable in arithmetic. The Otter proof object resulting from the successful refutation is then converted into a resolution tree in XML format. According to the CERES method, this refutation is then transformed into an LKDe proof ϕ 0 of the end-sequent of ϕ in ACNF. ϕ 0 consists of 1262 rules, and is clearly too large to be depicted here (a small part of it can be seen in Figure 2(b)). From ϕ 0 , a Herbrand sequent is extracted and stored in XML as a list of sequents (containing only the Herbrand sequent). All proofs and sequents computed during the analysis can be visualized using ProofTool as seen in Figure 2. (a) Visualizing the input proof ϕ. (b) A bird’s eye view on the output proof ϕ 0 . Figure 2: ProofTool screenshots To make the Herbrand sequent extracted in our example easier to read, the terms have been simplified modulo commutativity and associativity of +, and the following abbreviations are used: p1 = x + z + 2, p2 = x + z + 4, p3 = z + 1, p4 = x + y + z + 3, p5 = w + x + y + z + 6, p6 = w + x + y + z + 8, p7 = w + x + y + z + 4, With this, the Herbrand sequent of the ACNF of the tape proof is f (p1 ) = 0 ∨ f (p1 ) = 1, f (p2 ) = 0 ∨ f (p2 ) = 1, f (p3 ) = 0 ∨ f (p3 ) = 1, f (p4 ) = 0 ∨ f (p4 ) = 1, f (p5 ) = 0 ∨ f (p5 ) = 1, f (p6 ) = 0 ∨ f (p6 ) = 1, f (p7 ) = 0 ∨ f (p7 ) = 1 ` p1 6= p2 ∧ f (p1 ) = f (p2 ), p3 6= p1 ∧ f (p3 ) = f (p1 ), p3 6= p2 ∧ f (p3 ) = f (p2 ), p1 6= p4 ∧ f (p1 ) = f (p4 ), p5 6= p6 ∧ f (p5 ) = f (p6 ), p7 6= p5 ∧ f (p7 ) = f (p5 ), p7 6= p6 ∧ f (p7 ) = f (p6 ), p4 6= p7 ∧ f (p4 ) = f (p7 ). Clearly, the Herbrand sequent is much smaller than the ACNF from which it is extracted. As it contains all quantifier instantiation information from the proof, it is much better suited to human interpretation than the proof itself. The Herbrand sequent we extracted from ϕ 0 can be interpreted as the following proof: 88 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo Theorem 2. On a tape with infinitely many cells where each cell is labelled 0 or 1, there are two distinct cells that are labelled the same. Proof. It is easy to see that the following inequalities hold: p1 6= p2 , p3 6= p1 , p3 6= p2 , p1 6= p4 , p5 6= p6 , p7 = 6 p5 , p7 =6 p6 , p4 =6 p7 . We may therefore delete their occurrences from the right side of the Herbrand sequent and obtain the simplified sequent f (p1 ) = 0 ∨ f (p1 ) = 1, f (p2 ) = 0 ∨ f (p2 ) = 1, f (p3 ) = 0 ∨ f (p3 ) = 1, f (p4 ) = 0 ∨ f (p4 ) = 1, f (p5 ) = 0 ∨ f (p5 ) = 1, f (p6 ) = 0 ∨ f (p6 ) = 1, f (p7 ) = 0 ∨ f (p7 ) = 1 ` f (p1 ) = f (p2 ), f (p3 ) = f (p1 ), f (p3 ) = f (p2 ), f (p1 ) = f (p4 ), f (p5 ) = f (p6 ), f (p7 ) = f (p5 ), f (p7 ) = f (p6 ), f (p4 ) = f (p7 ). Now, we assume that this sequent is false and obtain a contradiction. If the sequent is false, then all formulas on the left side are true and all formulas on the right side are false, so we may assume f (pi ) = 0 ∨ f (pi ) = 1 for i ∈ {1, . . . , 7}. Let f (p1 ) = a with a ∈ {0, 1} and let ā = 1 − a. We have assumed that the first formula on the right side is false, so f (p2 ) = ā. From the second formula we obtain f (p3 ) = ā, and from the third f (p3 ) = ❠= a, which yields the desired contradiction. Note that the proof extracted from the Herbrand sequent uses only 3 cells even though the sequent itself is not minimal. Still, the proof yields a minimal Herbrand sequent directly: f (p1 ) = 0 ∨ f (p1 ) = 1, f (p2 ) = 0 ∨ f (p2 ) = 1, f (p3 ) = 0 ∨ f (p3 ) = 1, ` p1 6= p2 ∧ f (p1 ) = f (p2 ), p3 6= p1 ∧ f (p3 ) = f (p1 ), p3 6= p2 ∧ f (p3 ) = f (p2 ). Comparing the proof obtained from the Herbrand sequent with the original proof, we have gained an important piece of information: in the original proof, it is shown that either infinitely many cells are labelled 0 or 1, while in the cut-free proof, only finitely many cells are used in the proof. 4 Open problems and future work On the theoretical side we are working on the following two major extensions of the CERES-method: The first is an extension of CERES to second-order logic, a first step in that direction has been achieved by extending it to the fragment of second-order proofs defined by containing only quantifier-free com- prehension in [9]. This extension will allow the formalization of a much broader range of mathematical proofs and yield a more convenient proof formalization. The second theoretical extension enables the method to work without skolemization whose main benefit will be the ability to eliminate single cuts or even parts of a cut which is more realistic in mathematical applications than the simultaneous elimination of all cuts. Concerning the practical aspects, it became clear in our experiments with the system that there are two bottlenecks: 1. the formalization of the input proof and 2. finding a resolution refutation of the characteristic clause set. Accordingly also these two points are of major concern for our future work. To ease the formalization of the input proof, we plan to enhance the capabilities of HLK, in particu- lar those of the auto propositional-mode to cover also reasoning in equational background theories 89 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo specified by term rewriting systems. We expect this feature to greatly reduce the amount of time spent on proof formalization, as large parts of formalized proofs consist of equational reasoning. In the long term, however, an interesting option would be to use one of the available proof assistants for the formalization of proofs as these are highly developed tools. The theoretical obstacle to the use of existing proof assis- tants lies in the fact that a translation of the formalized proof is necessary, as proof assistants typically work in set theory or higher-order logic. Moreover, this translation should not be uniform because, de- pending on the formalized proof and the aims of the proof analysis, one logical translation will be more useful than another. The main practical obstacle lies in transforming proof objects generated by the proof assistant (if they are provided at all) into the sequent calculus format as described in Section 3.2. Here, a lot of work will have to go into importing the standard library of basic number systems, basic operations, data structures, etc. The search for a refutation of the characteristic clause set turned out to be a hard problem for current theorem provers (as described in [2]). To handle this problem, we need automated theorem provers with high flexibility, allowing an interactive construction of the resolution refutation. For using several provers in order to combine their respective advantages it would be very beneficial to have a standard output format for encoding resolution refutations, a goal which is partially realized by the TPTP output format. Another approach to enhance the power of the provers is to exploit the fact that the characteristic clause sets of the CERES-method form a very specific subclass of theorem proving problems. In particular, it seems promising to store in the clause set certain information about the structure of the original proof as hints on how to find a refutation and to develop resolution refinements for CERES to use this additional information. Another important area for future improvement is the human-readable representation of the output proofs as well as other results of the analysis (e.g. Herbrand sequents); this aspect is also critical for the method to adapt better to larger proofs. As the terms generated by cut-elimination are frequently very large it would pay out to implement term simplifiers. A general approach to the simplification of Herbrand sequents lies in computing a minimal variant of it by most general unification and a (possibly external) tautology checker. For specific theories, e.g. arithmetic, one can even do better by using term simplification algorithms based on term rewriting systems. Another useful addition is to include a flexible handling of definitions in the extraction and display of Herbrand sequents which would make them an even more powerful tool for understanding the mathematical content of a formal proof. These changes will further contribute to the creation of an interface for proof analysis which is attractive to logicians and mathematicians alike. References [1] Matthias Baaz, Stefan Hetzl, Alexander Leitsch, Clemens Richter, and Hendrik Spohr. Proof transformation by CERES. In Lecture Notes in Artificial Intelligence, volume 4108, pages 82–93. Mathematical Knowledge Management, Springer Berlin, 2006. [2] Matthias Baaz, Stefan Hetzl, Alexander Leitsch, Clemens Richter, and Hendrik Spohr. Ceres: An Analysis of Fürstenberg’s Proof of the Infinity of Primes. Theoretical Computer Science, 403:160–175, August 2008. [3] Matthias Baaz and Alexander Leitsch. Cut normal forms and proof complexity. Annals of Pure and Applied Logic, 97(1–3):127–177, 1999. [4] Matthias Baaz and Alexander Leitsch. Cut-elimination and Redundancy-elimination by Resolution. Journal of Symbolic Computation, 29(2):149–176, 2000. [5] Matthias Baaz and Alexander Leitsch. Towards a clausal analysis of cut-elimination. Journal of Symbolic Computation, 41:381–410, 2006. [6] G. Gentzen. Untersuchungen über das logische Schließen. Mathematische Zeitschrift, 39:176–210,405–431, 1934–1935. [7] Jean-Yves Girard. Proof Theory and Logical Complexity. Elsevier, 1987. 90 Proofs in the CERES-system Hetzl, Leitsch, Weller, Woltzenlogel Paleo [8] Stefan Hetzl, Alexander Leitsch, Daniel Weller, and Bruno Woltzenlogel Paleo. Herbrand sequent extrac- tion. In Lecture Notes in Artificial Intelligence, volume 5144, pages 462–477. Mathematical Knowledge Management, Springer Berlin, 2008. [9] Stefan Hetzl, Alexander Leitsch, Daniel Weller, and Bruno Woltzenlogel Paleo. A Clausal Approach to Proof Analysis in Second-Order Logic. In Symposium on Logical Foundations of Computer Science (LFCS 2009), Lecture Notes in Computer Science. Springer, 2009. to appear. [10] Alexander Leitsch and Clemens Richter. Equational theories in ceres. unpublished (available at http: //www.logic.at/ceres/), 2005. [11] The Coq development team. The Coq proof assistant reference manual. LogiCal Project, 2004. Version 8.0. [12] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL — a proof assistant for Higher- Order Logic, volume 2283 of LNCS. Springer Berlin, 2002. [13] C. Urban. Classical Logic and Computation. PhD thesis, University of Cambridge Computer Laboratory, 2000. 91