1 Introduction

Transforming and Analyzing Proofs in the CERES-system

0 Stefan Hetzl Alexander Leitsch Daniel Weller Bruno Woltzenlogel Paleo Institute of Computer Languages (E185), Vienna University of Technology

77 91

Cut-elimination is the most prominent form of proof transformation in logic. The elimination of cuts in formal proofs corresponds to the removal of intermediate statements (lemmas) in mathematical proofs. Cut-elimination can be applied to mine real mathematical proofs, i.e. for extracting explicit and algorithmic information. The system CERES (cut-elimination by resolution) is based on automated deduction and was successfully applied to the analysis of nontrivial mathematical proofs. In this paper we focus on the input-output environment of CERES, and show how users can interact with the system and extract new mathematical knowledge.

1 Introduction

systems have been used successfully to verify complicated and famous mathematical proofs like this of the four color theorem (see http://research.microsoft.com/~gonthier/). The core algorithm of CERES, however, works on LK-proofs and performs cut-elimination. For this reason we have developed the higher proof language HLK which is close to the sequent calculus LK, making translations to LK easy and efficient.

In this paper we present the input-output environment of CERES and illustrate the interaction of users with the system. In the first step proofs are formalized in HLK. The formalized proof is then compiled to LKDe and analyzed by the CERES-algorithm, and finally the resulting atomic cut normal form can be viewed by Prooftool. Moreover, from the normal form a Herbrand sequent can be extracted which contains mathematical information in a more condensed form. We illustrate all phases of proof specification and analysis by an example, the tape proof of Christian Urban (see also [1]). 2

System overview

3. The HandyLK proof is input to the compiler HLK. 4. HLK generates a formal proof in sequent calculus LKDe. 6. CERES extracts from the formal proof a characteristic clause set, which contains clauses formed from ancestors of cut-formulas in the formal proof. 7. The characteristic clause set is then input to a resolution theorem prover, e.g. Otter4 or Prover95. 8. The resolution theorem prover outputs a refutation of the characteristic clause set. 9. CERES receives the refutation, which will be used as a skeleton for the transformed proof in atomiccut normal form (ACNF). 10. CERES outputs the grounded refutation in a tree format and the characteristic clause set. Moreover it extracts projections from the formal proof and plugs them into the refutation in order to generate the ACNF. The projections and the ACNF are also output. A Herbrand sequent is obtained from the instantiation information of the quantifiers of the end-sequent (resulting in an (equationally) valid sequent) and output as well. The Herbrand sequent typically summarizes the creative content of the ACNF. For details, see [8]. 11. All outputs and inputs of CERES can be opened with ProofTool. 12. ProofTool, a graphical user interface, renders all proofs, refutations, projections, sequents and clause sets so that they can be visualized by the user. 13. The information displayed via ProofTool is analyzed by the user. 14. Based on his analysis, the user can formulate new mathematical ideas, e.g. a new informal direct proof corresponding to the ACNF. 3

Proof analysis with CERES

Our calculus LKDe is based on standard LK with permutation, contraction and weakening, atomic axioms and multiplicative rules. For example, the following rules deal with the ^-connective: G ` D; A P ` L; B G; P ` D; L; A ^ B ^ : r

A; G ` D A ^ B; G ` D ^ : l1

A; G ` D B ^ A; G ` D ^ : l2 In addition, to bring the calculus closer to mathematical practice, definition introduction and equality handling rules are used:

Let A be a first-order formula with the free variables x1; : : : ; xk (denoted by A(x1; : : : ; xk)) and P be a new k-ary predicate symbol (corresponding to the formula A). Then the definition introduction rules are: A(t1; : : : ; tk); G ` D P(t1; : : : ; tk); G ` D defP : l

G ` D; A(t1; : : : ; tk) G ` D; P(t1; : : : ; tk) defP : r for arbitrary sequences of terms t1; : : : ; tk. Definition introduction is a simple and very powerful tool in mathematical practice. Note that the introduction of important concepts and notations like groups, integrals etc. can be formally described by introduction of new symbols. There are also definition introduction rules for new function symbols which are of similar type. 4available at http://www-unix.mcs.anl.gov/AR/otter/ 5available at http://www.cs.unm.edu/~mccune/prover9/ The equality rules (also called paramodulation rules) are: on the right, where L denotes a set of positions of subterms where replacement of s by t has to be performed. We call s = t the active equation of the rules.

We will now introduce the CERES system by presenting the analysis of a proof from [13]; it was formalized in LKDe and analyzed by CERES in [1]. The end-sequent formalizes the statement: on a tape with infinitely many cells which are all labelled by 0 or by 1 there are two cells labelled by the same number. f (x) = 0 expresses that the cell nr. x is labelled by 0. Indexing of cells is done by number terms defined over 0; 1 and +. The proof j below uses two lemmas: ( 1 ) there are infinitely many cells labelled by 0 and ( 2 ) there are infinitely many cells labelled by 1. These lemmas are eliminated by CERES and a more direct argument is obtained in the resulting proof j0. 3.1 Specifying proofs in HandyLK In this section, we introduce the higher proof language HandyLK by presenting its most important features and syntax by means of our example proof. HandyLK is based on the idea that many aspects of writing LKDe proofs can be automatized. It supports a many-sorted first-order language.

Before starting to write proofs, it is first necessary to define the language and the function and predicate definitions one intends to use. In our example, we deal with a language with one sort (natural numbers) and the constants mentioned above. In HandyLK , we write: define type nat; define constant 0, 1 of type nat; define infix function + of type nat,nat to nat with weight 100; define function f of type nat to nat; In the definition of infix functions like +, we may set a weight that allows terms to be written in the usual mathematical way, where superfluous brackets may be dropped. In addition to the constants, we define some variables that will be used in the proof. Here, the function f is the labelling function assigning labels to tape cells. define variable k, l, n, p, q, x, n_0, n_1 of type nat; We also define the axioms we will use. Note that :- represents ` in HandyLK . define axiom :- k + l = l + k; define axiom :- k + (l + n) = (k + l) + n; define axiom k = k + (1 + l) :- ; define axiom :- k = k;

Finally, we introduce some predicate definitions.

define predicate A by all x ( f(x) = 0 or f(x) = 1 ); define predicate I by all n ex k f( n + k ) = x; Note that in the definition of I, x is a free variable. The free variables determine the arity of the defined predicate, and can be instantiated by parameters. If P is a defined predicate, F is its defining formula and x¯ are the free variables of F, then the predicate definition is interpreted as 8x¯(P(x¯) () F). In our example, the predicate A is the assumption: All cells are labelled either 0 or 1. The predicate I(x) states that there are infinitely many cells labelled x.

We now turn to formalizing our example proof j. In LKDe, j is

(t) (e(0)) A ` I(0); I( 1 ) I(0) ` 9p9q(p 6= q ^ f (p) = f (q)) cut (e( 1 ))

A ` 9p9q(p 6= q ^ f (p) = f (q)); I( 1 ) I( 1 ) ` 9p9q(p 6= q ^ f (p) = f (q))

A ` 9p9q(p 6= q ^ f (p) = f (q)) cut Here t is a proof of the fact that either infinitely many cells are labelled 0, or infinitely many cells are labelled 1. The two applications of cut encode the application of two lemmas e(n) for n 2 f0; 1g: if there are infinitely many cells labelled n, then there are two cells labelled by the same number.

In HandyLK , we start the specification of the proof by giving an identifier, here we choose the name the-proof: define proof the-proof

Next, the end-sequent is fixed:

proves A :- ex p ex q ( not p = q and f(p) = f(q) ); Now, we begin by specifying the inferences from the bottom up. In HandyLK , only the active formulas of a rule have to be specified, and no structural rules (except cut) have to be written down explicitly. The HLK compiler keeps track of the formulas used in the proof, and automatically inserts the structural rules necessary to be able to apply the rules specified by the user. The first inference in j is a cut, which is a binary rule. Due to the linear nature of the HandyLK language, one of the subproofs above the binary rule has to be referenced. The following proof reference clauses are available: by proof hproof i auto propositional hsequenti explicit axiom hsequenti The first clause gives the name of the subproof with which to continue, the second clause states that the subproof ends in a propositional tautology whose proof should be computed automatically, and the last clause asserts that the subproof consists solely of an axiom. In our example we reference the proof e( 1 ).

with cut I( 1 )

right by proof \epsilon( 1 ); This specifies an application of cut with cut formula I( 1 ), where the right subproof will be e( 1 ). In LKDe, this will be a rule application (e( 1 )) G ` D; I( 1 ) I( 1 ); P ` L

G; P ` D; L

cut where G; P; D; L will be populated by the appropriate formulas automatically by the HLK compiler.

We continue the HandyLK specification of this proof part with the left subproof of the first application of cut by encoding the second application: here, we reference both the left subproof t and the right subproof e(0). ; with cut I(0) left by proof \tau right by proof \epsilon(0); This completes the HandyLK specification of j — but to be able to actually compile the proof into an LKDe-proof, the subproofs t, e(0) and e( 1 ) have to be encoded in HandyLK . We continue by showing the HandyLK rule applications in t, which are definition introduction and quantifier rules. The proof t starts by expanding the two occurrences of the defined predicate I:

A ` 8n9k f (n + k) = 0; 8n9k f (n + k) = 1

A ` I(0); 8n9k f (n + k) = 1

A ` I(0); I( 1 ) defI : r defI : r In HandyLK , it suffices to specify the predicate to be expanded and give the auxiliary formulas of the rules, the correct main formula is then found by HLK through matching: with undef I

:- all n ex k f( n + k ) = 0, all n ex k f( n + k ) = 1; Next, the two strong quantifiers are eliminated by 8 : r rules, introducing the eigenvariables n0 and n1: A ` 9k f (n0 + k) = 0; 9k f (n1 + k) = 1 A ` 9k f (n0 + k) = 0; 8n9k f (n + k) = 1 8 : r

A ` 8n9k f (n + k) = 0; 8n9k f (n + k) = 1 8 : r

The corresponding HandyLK code is

with all right

:- ex k f( n_0 + k ) = 0, ex k f( n_1 + k ) = 1; Again, it suffices to specify the auxiliary formulas, the correct main formulas are found by matching. The same holds for the weak quantifier rules. In our example, we instantiate them by n1 and n0, respectively: A ` f (n0 + n1) = 0; f (n1 + n0) = 1 A ` 9k f (n0 + k) = 0; f (n1 + n0) = 1 9 : r

A ` 9k f (n0 + k) = 0; 9k f (n1 + k) = 1 9 : r with ex right

:- f( n_0 + n_1 ) = 0, f( n_1 + n_0 ) = 1; Of course, HLK checks whether the quantifier rule applications are legal. Our proof t is nearly done — it remains to decompose our assumption A and to use the commutativity of +. We only show the latter, which is encoded with an application of paramodulation: ` n0 + n1 = n1 + n0 f (n1 + n0) = 1 ` f (n1 + n0) = 1 f (n0 + n1) = 1 ` f (n1 + n0) = 1 = : l2 Such a paramodulation application is common when formalizing proofs from mathematics: an equality axiom from the background theory is used as the active equation. For this reason, it receives special treatment in HandyLK : the active equation is simply specified in a by clause.

with paramod by n_0 + n_1 = n_1 + n_0 right f( n_1 + n_0 ) = 1 :- ; This rule application can also be encoded in the general syntax for binary rules, using a proof reference: with paramod f( n_1 + n_0 ) = 1 :left explicit axiom :- n_0 + n_1 = n_1 + n_0 ; Recall that the definition of j in HandyLK contained references to proofs e(0) and e( 1 ), which show that under the assumption that there are infinitely many cells labelled 0 and 1, respectively, it follows that two cells are labelled by the same number. Clearly, these proofs have a similar structure — this fact can be exploited in HandyLK by writing a meta proof that can be instantiated with some specific terms: define proof \epsilon with meta term i of type nat; proves

I(i) :- ex p ex q ( not p = q and f(p) = f(q) ); We omit the details of the specification of the meta proof e(i). Additionally, HandyLK supports the definition of proofs in a recursive way, which is a convenient way to encode sequences of proofs. This feature was used to encode a sequence of proofs for the formula scheme from [3]. 3.2

XML format for proofs The programs in the CERES system do not work directly with proofs written in HandyLK — these proofs are compiled into a flexible XML format using the HLK compiler. XML is a well known data representation language which allows the use of arbitrary and well known utilities for editing, transformation and presentation and standardized programming libraries. We are interested in tree-style proofs, formulas and terms. Data is structured in trees in XML, therefore the encoding and decoding of these objects is very straightforward.

The format is flexible in the sense that it only assumes that proofs are trees or directed acyclic graphs of rule applications that are labelled by sequents. In our practice, it has been used successfully to encode LKDe and resolution proofs. The following abbreviated XML code represents the proof j: <proof symbol="the-proof" calculus="LK"> <rule symbol="c:r" type="contrr" param="2"> <sequent>...</sequent> <rule symbol="cut" type="cut"> <sequent>...</sequent> <rule symbol="\pi:r" type="permr" param="( 1 2 )"> <sequent>...</sequent> <rule symbol="cut" type="cut"> <sequent>...</sequent> <prooflink symbol="\tau"/> <prooflink symbol="\epsilon(0)"/> </rule> </rule> <prooflink symbol="\epsilon( 1 )"/> </rule> </rule> </proof> The symbol attributes are used for the visual presentation of the proof trees: they allow the user to identify proofs and rules. The param attributes contain additional rule information, in the example they specify which formulas are to be contracted (by the contraction rule c:r) and how the formulas are to be permuted (by the permutation rule \pi:r). The prooflink tags reference other proofs by their name and allow the representation of proofs as DAGs.

In the above example, the formulas contained in the sequents have been left out. To give an example of how a formula is encoded in the XML format, consider the following XML code corresponding to the formula R(c; f (c)) ^ P(c): <conjunctiveformula type="and"> <constantatomformula symbol="R"> <constant symbol="c"/> <function symbol="f">

Data is organized in our XML format in a proofdatabase containing a number of proofs. The proofdatabase may also contain 1. a list of defined predicates and their definitions, 2. a list of sequents specifying the axioms of the background theory in which the proofs are written, 3. arbitrary lists of sequents identified by some symbol (e.g. to store a Herbrand sequent). 3.3

The CERES method To prepare for the following section, we will now introduce the theoretical background for the CERES method of cut-elimination, which is used by the CERES system to compute a proof in atomic cut normal form and, in the end, a Herbrand sequent.

The central idea of CERES consists in extracting a so-called characteristic clause set from a proof, and then using a resolution refutation of this set to obtain a proof with only atomic cuts. We consider the proofs in LKDe as directed trees with nodes which are labelled by sequents, where the root is labelled by the end-sequent. According to the inference rules, we distinguish binary and unary nodes. In an inference n1 : S1 n2 : S2 x

n : S where n is labelled by S, n1 by S1 and n2 by S2, we call n1; n2 predecessors of n. Similarly n0 is predecessor of n in a unary rule if n0 labels the premiss and n the consequent. Then the predecessor relation is defined as the reflexive and transitive closure of the relation above. Every node is predecessor of the root, and the axioms have only themselves as predecessors. For a formal definition of the concepts we refer to [4] and [5]. A similar relation holds between formula occurrences in sequents. Instead of a formal definition we give an example.

Consider the rule: The occurrences of P(a) and P(b) in the premiss are ancestors of the occurrence of P(a) ^ P(b) in the consequent. P(a) and P(b) are called auxiliary formulas of the inference, and P(a) ^ P(b) the main formula. 8x:P(x) in the premisses are ancestors of 8x:P(x) in the consequent. Again the ancestor relation is defined by reflexive transitive closure.

Let W be the set of all occurrences of cut-formulas in sequents of an LKDe-proof j. The cut-formulas are not ancestors of the formulas in the end-sequent, but they might have ancestors in the axioms (if the cuts are not generated by weakening only). The construction of the characteristic clause set is based on the ancestors of the cuts in the axioms. Note that clauses are just defined as atomic sequents. We define a set of clauses Cn for every node n in j inductively:

If n is an occurrence of an axiom sequent S(n), and S0 is the subsequent of S(n) containing only the ancestors of W then Cn = fS0g.

Let n0 be the predecessor of n in a unary inference then Cn = Cn0 .

Let n1; n2 be the predecessors of n in a binary inference. We distinguish two cases (a) The auxiliary formulas of n1; n2 are ancestors of W. Then (b) The auxiliary formulas of n1; n2 are not ancestors of W. Then

Cn = Cn1 [ Cn2 : Cn = Cn1

Cn2 : where C

D = fC D j C 2 C ; D 2 D g and C D is the merge of the clauses C and D.

The characteristic clause set CL(j) of j is defined as Cn0 , where n0 is the root.

Theorem 1. Let j be a proof in LKDe. Then the clause set CL(j) is equationally unsatisfiable. Remark. A clause set C is equationally unsatisfiable if C does not have a model where = is interpreted as equality over a domain.

Proof. This proof first appeared in [1]. Let n be a node in j and S0(n) the subsequent of S(n) which consists of the ancestors of W (i.e. of a cut). It is shown by induction that S0(n) is LKDe-derivable from Cn . If n0 is the root then, clearly, S0(n0) = ` and the empty sequent ` is LKDe-derivable from the axiom set Cn0 , which is just CL(j). As all inferences in LKDe are sound over equational interpretations (where new symbols introduced by definition introduction have to be interpreted according to the defining equivalence), CL(j) is equationally unsatisfiable. Note that, without the rules = : l and = : r, the set CL(j) is just unsatisfiable. Clearly the rules = : l and = : r are sound only over equational interpretations.

The next steps in CERES are

( 1 ) the computation of the proof projections j[C] w.r.t. clauses C 2 CL(j), ( 2 ) the refutation of the set CL(j), resulting in an RP-tree g, i.e. in a deduction tree defined by the inferences of resolution and paramodulation, and ( 3 ) “inserting” the projections j[C] into the leaves of g.

For step ( 1 ) we skip in j all inferences where the auxiliary resp. main formulas are ancestors of a cut. Instead of the end-sequent S we get S C for a C 2 CL(j).

Step ( 2 ) consists in ordinary theorem proving by resolution and paramodulation (which is equationally complete). For refuting CL(j) any first-order resolution prover can be used. By the completeness of the methods we find a refutation tree g as CL(j) is unsatisfiable by Theorem 1.

Step ( 3 ) makes use of the fact that, after computation of the simultaneous most general unifier of the inferences in g, the resulting tree g0 is actually a derivation in LKDe. Indeed, after computation of the simultaneous unifier, paramodulation becomes = : l and = : r and resolution becomes cut in LKDe. Note that the definition rules, like the logical rules, do not appear in g0. Now for every leaf n in g0, which is labelled by a clause C0 (an instance of a clause C 2 CL(j)) we insert the proof projection j[C0]. The result is a proof with only atomic cuts.

The proof projection is only sound if the proof j is skolemized, i.e. there are no strong quantifiers (i.e. quantifiers with eigenvariable conditions) in the end-sequent. If j is not skolemized a priori it can be transformed into a skolemized proof j0 in polynomial (at most quadratic) time; for details see [3].

For illustration, consider the following example: j = j1 = j2 = We have CL(j) = fP(u) ` Q(u); ` P(a); Q(v) `g. The resolution refutation d of CL(j) j1 j2 (8x)(P(x) ! Q(x)) ` (9y)(P(a) ! Q(y))

cut P(u)? ` P(u) Q(u) ` Q(u)?

P(u)?; P(u) ! Q(u) ` Q(u)? !: l

P(u) ! Q(u) ` (P(u) ! Q(u))? !: r

P(u) ! Q(u) ` (9y)(P(u) ! Q(y))? 9 : r (8x)(P(x) ! Q(x)) ` (9y)(P(u) ! Q(y))? 8 : l (8x)(P(x) ! Q(x)) ` (8x)(9y)(P(x) ! Q(y))? 8 : r

P(a) ` P(a)? Q(v)? ` Q(v)

P(a); (P(a) ! Q(v))? ` Q(v) !: l (P(a) ! Q(v))? ` P(a) ! Q(v) !: r (P(a) ! Q(v))? ` (9y)(P(a) ! Q(y)) 9 : r (9y)(P(a) ! Q(y))? ` (9y)(P(a) ! Q(y)) 9 : l (8x)(9y)(P(x) ! Q(y))? ` (9y)(P(a) ! Q(y)) 8 : l

Q(a) ` R does the job of refuting CL(j). By applying the most general unifier s of d , we obtain a ground refutation g = d s : This will serve as a skeleton for a proof in ACNF. To complete the construction, we combine the skeleton with the following projections (grounded by s ): j(C1) = j(C2) = j(C3) =

P(a) ` P(a) Q(a) ` Q(a)

P(a); P(a) ! Q(a) ` Q(a) !: l

P(a); (8x)(P(x) ! Q(x)) ` Q(a) 8 : l P(a); (8x)(P(x) ! Q(x)) ` (9y)(P(a) ! Q(y)); Q(a) w : r P(a) ` P(a)

w : r

P(a) ` P(a); Q(v) ` P(a) ! Q(v); P(a) !: r ` (9y)(P(a) ! Q(y)); P(a) 9 : r (8x)(P(x) ! Q(x)) ` (9y)(P(a) ! Q(y)); P(a) w : l

Q(a) ` Q(a)

P(a); Q(a) ` Q(a) w : l

Q(a) ` P(a) ! Q(a) !: r

Q(a) ` (9y)(P(a) ! Q(y)) 9 : r Q(a); (8x)(P(x) ! Q(x)) ` (9y)(P(a) ! Q(y)) w : l

j(C2) j(C1) B ` C; P(a) P(a); B ` C; Q(a)

cut B; B ` C;C; Q(a)

B; B; B ` C;C;C

B ` C

j(C3) Q(a); B ` C contractions cut The composition of skeleton and projections yields j(g) = where B = (8x)(P(x) ! Q(x)), C = (9y)(P(a) ! Q(y)). Clearly, j(g) is a proof of the end-sequent of j in ACNF. 3.4

Cut-elimination, proof visualization and Herbrand sequent extraction with CERES After compiling the HandyLK source to XML with HLK, it is possible to perform proof analysis using the CERES tool. In our example, we are interested in extracting a Herbrand sequent from a cut-free proof of the end-sequent of j. To this end, the CERES method is used, which extracts an unsatisfiable set of clauses from the proof and refutes it. In theory, any resolution prover can be used to refute this set of clauses; Table 1 lists the format conversions currently supported by our implementation.

In our example, we use Otter to find a resolution refutation. Running CERES yields the following characteristic clause set in Otter input format -> =("f"(+(x4, x5)), "0"), =("f"(+(x5, x4)), "1"). % (C1) =("f"(+(x4, x6)), "0"), =("f"(+(+(+(x4, x6), "1"), x7)), "0") ->. % (C2) =("f"(+(x4, x6)), "1"), =("f"(+(+(+(x4, x6), "1"), x7)), "1") ->. % (C3) which corresponds to the set of clauses which is refutable in arithmetic.

The Otter proof object resulting from the successful refutation is then converted into a resolution tree in XML format. According to the CERES method, this refutation is then transformed into an LKDe proof j0 of the end-sequent of j in ACNF. j0 consists of 1262 rules, and is clearly too large to be depicted here (a small part of it can be seen in Figure 2(b)). From j0, a Herbrand sequent is extracted and stored in XML as a list of sequents (containing only the Herbrand sequent). All proofs and sequents computed during the analysis can be visualized using ProofTool as seen in Figure 2. (a) Visualizing the input proof j.

(b) A bird’s eye view on the output proof j0.

To make the Herbrand sequent extracted in our example easier to read, the terms have been simplified modulo commutativity and associativity of +, and the following abbreviations are used: p1 = x + z + 2; p2 = x + z + 4; p3 = z + 1; p5 = w + x + y + z + 6; p6 = w + x + y + z + 8; p7 = w + x + y + z + 4; p4 = x + y + z + 3; With this, the Herbrand sequent of the ACNF of the tape proof is f (p1) = 0 _ f (p1) = 1; f (p2) = 0 _ f (p2) = 1; f (p3) = 0 _ f (p3) = 1; f (p4) = 0 _ f (p4) = 1; f (p5) = 0 _ f (p5) = 1; f (p6) = 0 _ f (p6) = 1; f (p7) = 0 _ f (p7) = 1 ` p1 6= p2 ^ f (p1) = f (p2); p3 6= p1 ^ f (p3) = f (p1); p3 6= p2 ^ f (p3) = f (p2); p1 6= p4 ^ f (p1) = f (p4); p5 6= p6 ^ f (p5) = f (p6); p7 6= p5 ^ f (p7) = f (p5); p7 6= p6 ^ f (p7) = f (p6); p4 6= p7 ^ f (p4) = f (p7): Clearly, the Herbrand sequent is much smaller than the ACNF from which it is extracted. As it contains all quantifier instantiation information from the proof, it is much better suited to human interpretation than the proof itself. The Herbrand sequent we extracted from j0 can be interpreted as the following proof: Theorem 2. On a tape with infinitely many cells where each cell is labelled 0 or 1, there are two distinct cells that are labelled the same.

Proof. It is easy to see that the following inequalities hold:

p1 6= p2; p3 6= p1; p3 6= p2; p1 6= p4; p5 6= p6; p7 6= p5; p7 6= p6; p4 6= p7: We may therefore delete their occurrences from the right side of the Herbrand sequent and obtain the simplified sequent ` f (p1) = f (p2); f (p3) = f (p1); f (p3) = f (p2); f (p1) = f (p4); f (p5) = f (p6); f (p7) = f (p5); f (p7) = f (p6); f (p4) = f (p7): Now, we assume that this sequent is false and obtain a contradiction. If the sequent is false, then all formulas on the left side are true and all formulas on the right side are false, so we may assume f (pi) = 0 _ f (pi) = 1 for i 2 f1; : : : ; 7g. Let f (p1) = a with a 2 f0; 1g and let a¯ = 1 a. We have assumed that the first formula on the right side is false, so f (p2) = a¯. From the second formula we obtain f (p3) = a¯, and from the third f (p3) = a¯¯ = a, which yields the desired contradiction.

Note that the proof extracted from the Herbrand sequent uses only 3 cells even though the sequent itself is not minimal. Still, the proof yields a minimal Herbrand sequent directly: f (p1) = 0 _ f (p1) = 1; f (p2) = 0 _ f (p2) = 1; f (p3) = 0 _ f (p3) = 1; ` p1 6= p2 ^ f (p1) = f (p2); p3 6= p1 ^ f (p3) = f (p1); p3 6= p2 ^ f (p3) = f (p2): Comparing the proof obtained from the Herbrand sequent with the original proof, we have gained an important piece of information: in the original proof, it is shown that either infinitely many cells are labelled 0 or 1, while in the cut-free proof, only finitely many cells are used in the proof. 4

Open problems and future work

On the theoretical side we are working on the following two major extensions of the CERES-method: The first is an extension of CERES to second-order logic, a first step in that direction has been achieved by extending it to the fragment of second-order proofs defined by containing only quantifier-free comprehension in [9]. This extension will allow the formalization of a much broader range of mathematical proofs and yield a more convenient proof formalization. The second theoretical extension enables the method to work without skolemization whose main benefit will be the ability to eliminate single cuts or even parts of a cut which is more realistic in mathematical applications than the simultaneous elimination of all cuts.

Concerning the practical aspects, it became clear in our experiments with the system that there are two bottlenecks: 1. the formalization of the input proof and 2. finding a resolution refutation of the characteristic clause set. Accordingly also these two points are of major concern for our future work.

To ease the formalization of the input proof, we plan to enhance the capabilities of HLK, in particular those of the auto propositional-mode to cover also reasoning in equational background theories specified by term rewriting systems. We expect this feature to greatly reduce the amount of time spent on proof formalization, as large parts of formalized proofs consist of equational reasoning. In the long term, however, an interesting option would be to use one of the available proof assistants for the formalization of proofs as these are highly developed tools. The theoretical obstacle to the use of existing proof assistants lies in the fact that a translation of the formalized proof is necessary, as proof assistants typically work in set theory or higher-order logic. Moreover, this translation should not be uniform because, depending on the formalized proof and the aims of the proof analysis, one logical translation will be more useful than another. The main practical obstacle lies in transforming proof objects generated by the proof assistant (if they are provided at all) into the sequent calculus format as described in Section 3.2. Here, a lot of work will have to go into importing the standard library of basic number systems, basic operations, data structures, etc.

The search for a refutation of the characteristic clause set turned out to be a hard problem for current theorem provers (as described in [2]). To handle this problem, we need automated theorem provers with high flexibility, allowing an interactive construction of the resolution refutation. For using several provers in order to combine their respective advantages it would be very beneficial to have a standard output format for encoding resolution refutations, a goal which is partially realized by the TPTP output format. Another approach to enhance the power of the provers is to exploit the fact that the characteristic clause sets of the CERES-method form a very specific subclass of theorem proving problems. In particular, it seems promising to store in the clause set certain information about the structure of the original proof as hints on how to find a refutation and to develop resolution refinements for CERES to use this additional information.

Another important area for future improvement is the human-readable representation of the output proofs as well as other results of the analysis (e.g. Herbrand sequents); this aspect is also critical for the method to adapt better to larger proofs. As the terms generated by cut-elimination are frequently very large it would pay out to implement term simplifiers. A general approach to the simplification of Herbrand sequents lies in computing a minimal variant of it by most general unification and a (possibly external) tautology checker. For specific theories, e.g. arithmetic, one can even do better by using term simplification algorithms based on term rewriting systems. Another useful addition is to include a flexible handling of definitions in the extraction and display of Herbrand sequents which would make them an even more powerful tool for understanding the mathematical content of a formal proof. These changes will further contribute to the creation of an interface for proof analysis which is attractive to logicians and mathematicians alike.

[1]

Matthias

Baaz , Stefan Hetzl, Alexander Leitsch, Clemens Richter, and

Hendrik

Spohr . Proof transformation by CERES . In Lecture Notes in Artificial Intelligence , volume 4108 , pages 82 - 93 . Mathematical Knowledge Management, Springer Berlin, 2006 .

[2]

Matthias

Baaz , Stefan Hetzl, Alexander Leitsch, Clemens Richter, and

Hendrik

Spohr . Ceres: An Analysis of Fu¨rstenberg's Proof of the Infinity of Primes . Theoretical Computer Science , 403 : 160 - 175 , August 2008 .

[3]

Matthias

Baaz and

Alexander

Leitsch . Cut normal forms and proof complexity . Annals of Pure and Applied Logic , 97 ( 1-3 ): 127 - 177 , 1999 .

[4]

Matthias

Baaz and

Alexander

Leitsch . Cut-elimination and Redundancy-elimination by Resolution . Journal of Symbolic Computation , 29 ( 2 ): 149 - 176 , 2000 .

[5]

Matthias

Baaz and

Alexander

Leitsch . Towards a clausal analysis of cut-elimination . Journal of Symbolic Computation , 41 : 381 - 410 , 2006 .

[6]

Gentzen . Untersuchungen u¨ber das logische Schließen . Mathematische Zeitschrift , 39 : 176 - 210 , 405 - 431 , 1934 - 1935 .

[7] Jean-Yves Girard . Proof Theory and Logical Complexity. Elsevier , 1987 .

[8]

Stefan

Hetzl , Alexander Leitsch, Daniel Weller, and Bruno Woltzenlogel Paleo. Herbrand sequent extraction . In Lecture Notes in Artificial Intelligence , volume 5144 , pages 462 - 477 . Mathematical Knowledge Management, Springer Berlin, 2008 .

[9]

Stefan

Hetzl , Alexander Leitsch, Daniel Weller, and Bruno Woltzenlogel Paleo. A Clausal Approach to Proof Analysis in Second-Order Logic . In Symposium on Logical Foundations of Computer Science (LFCS 2009 ), Lecture Notes in Computer Science. Springer, 2009 . to appear.

[10]

Alexander

Leitsch and

Clemens

Richter . Equational theories in ceres . unpublished (available at http: //www.logic.at/ceres/), 2005 .

[11] The Coq development team. The Coq proof assistant reference manual . LogiCal Project , 2004 . Version 8.0.

[12] Tobias

Nipkow

, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL - a proof assistant for HigherOrder Logic , volume 2283 of LNCS . Springer Berlin, 2002 .

[13]

Urban . Classical Logic and Computation . PhD thesis , University of Cambridge Computer Laboratory, 2000 .