
LFSC for SMT Proofs: Work in Progress

Aaron Stump

Andrew Reynolds

Cesare Tinelli

Austin Laugesen

Harley Eades

Corey Oliver

Ruoyu Zhang

The University of Iowa, USA

2012


This paper presents work in progress on a new version, for public release, of the Logical Framework with Side Conditions (LFSC), previously proposed as a proof meta-format for SMT solvers and other proof-producing systems. The paper reviews the type-theoretic approach of LFSC, presents a new input syntax which hides the type-theoretic details for better accessibility, and discusses work in progress on formalizing and implementing a revised core language.

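[Figure 1: Syntax and inference rules of minimal propositional logic. Formulas: $\phi ::= p \mid \phi_1 \to \phi_2$. Contexts: $\Gamma ::= \cdot \mid \Gamma, \phi$. Rule Assump: from $\phi \in \Gamma$ conclude $\Gamma \vdash \phi$. The example derivation includes the sequent $\cdot, (p \to (q \to r)), q, p \vdash q$.]

[...] rules for the SMT-LIB logics QF_IDL and QF_LRA, and showed how to use type computation for LFSC to compute interpolants [10, 9]. Optimized LFSC proof checking is quite efficient relative to solving time: for QF_UF benchmarks we found a total average overhead of 30% for proof generation and checking (together) over and above solving alone.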

We are now focusing on providing a high-quality implementation of LFSC for public release. It is our hope that the availability of such an implementation will be the final step which will enable solver implementors to adopt LFSC as a single, semantics-based meta-format for SMT proofs. In the rest of this short paper, we describe our work in progress towards this vision, which consists of devising a more accessible input syntax (Section 2) and formalizing a new core language which is able to accommodate features like local definitions and implicit arguments (Section 3), which we found to be important in practice but which are often not considered in standard treatments of LF.

We would like to acknowledge also the contributions of Jed McClurg and Cuong Thai to earlier work on this new version of LFSC.

A Surface Syntax Based on Grammars and Rules

LF and LFSC are meta-languages based on a dependent type theory. The methodology for using LF/LFSC is to encode object-language constructs using dependently typed data constructors in LF/LFSC. If used directly, this requires unfamiliar type-theoretic notation and conceptualization, a burden we do not wish to impose on SMT solver implementors.

To illustrate these points, let us consider an example proof system in standard mathematical notation. Figure 1 shows the syntax and inference rules of minimal propositional logic. As our proposed syntax for side-condition code has not changed fundamentally, we are deliberately restricting our attention here to a signature without side conditions. For more on side-condition code in LFSC, including examples for a resolution calculus, see [6].

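In the LF methodology, a proof system like the one in Figure 1 is encoded as a sequence of type declarations called a signature. The syntactic categories, here Formula and Context, are encoded as types, whose syntactic constructs are encoded as constructors of those types. The judgments of the proof system are also encoded as types, with the inference rules as constructors. Object-language variables are represented with meta-language variables, object-language binders with LF's $\lambda$-binder, and logical contexts (like $\Gamma$ in this example) with the LF typing context.

[Figure 2: the signature in LF notation, followed by the example proof term.]

$$
\begin{array}{lcl}
\mathsf{formula} & : & \mathsf{Type} \\
\mathsf{imp} & : & \mathsf{formula} \to \mathsf{formula} \to \mathsf{formula} \\
\mathsf{holds} & : & \mathsf{formula} \to \mathsf{Type} \\
\mathsf{ImpIntro} & : & \Pi f_1{:}\mathsf{formula}.\ \Pi f_2{:}\mathsf{formula}.\ ((\mathsf{holds}\ f_1) \to (\mathsf{holds}\ f_2)) \to (\mathsf{holds}\ (\mathsf{imp}\ f_1\ f_2)) \\
\mathsf{ImpElim} & : & \Pi f_1{:}\mathsf{formula}.\ \Pi f_2{:}\mathsf{formula}.\ (\mathsf{holds}\ (\mathsf{imp}\ f_1\ f_2)) \to (\mathsf{holds}\ f_1) \to (\mathsf{holds}\ f_2)
\end{array}
$$

$$
\begin{array}{l}
\mathsf{ImpIntro}\ (\mathsf{imp}\ p\ (\mathsf{imp}\ q\ r))\ (\mathsf{imp}\ q\ (\mathsf{imp}\ p\ r))\ \lambda u. \\
\quad \mathsf{ImpIntro}\ q\ (\mathsf{imp}\ p\ r)\ \lambda v. \\
\quad\quad \mathsf{ImpIntro}\ p\ r\ \lambda w. \\
\quad\quad\quad \mathsf{ImpElim}\ q\ r\ (\mathsf{ImpElim}\ p\ (\mathsf{imp}\ q\ r)\ u\ w)\ v
\end{array}
$$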

Figure 2 shows the corresponding signature in standard type-theoretic notation for LF. The Assump rule does not have a constructor, because assumptions in the object language have been mapped to variables in LF. For the example proof, the $\lambda$-bound variables u, v, and w correspond to the assumptions of $p \to (q \to r)$, q, and p, respectively. Since inference rules are encoded using term constructors with dependent types ($\Pi$-types), there is rather a lot of redundant information in that encoded example proof. Implementations of LF like Twelf allow one to omit some of these arguments in some cases, if they can be inferred from other parts of the proof term [7]. In our first implementation of LFSC, we allowed proofs to contain underscores for omitted arguments, which we then sought to reconstruct in a similar way to what is done in Twelf.
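The redundancy in the fully explicit proof term can be seen in a small checker. The following Python sketch is not LFSC; it only models the Figure 2 signature, under the assumption that hypotheses are named variables stored in a dictionary that stands in for the LF typing context. All class and function names (`Atom`, `Imp`, `Var`, `ImpIntro`, `ImpElim`, `check`) are hypothetical.

```python
from dataclasses import dataclass

# Formulas of minimal propositional logic (the LF type `formula`).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Imp:
    left: object
    right: object

# Proof terms. Every rule application carries all of its formula arguments.
@dataclass(frozen=True)
class Var:                # a lambda-bound hypothesis such as u, v, w
    name: str

@dataclass(frozen=True)
class ImpIntro:           # ImpIntro f1 f2 (lambda var. body)
    f1: object
    f2: object
    var: str
    body: object

@dataclass(frozen=True)
class ImpElim:            # ImpElim f1 f2 major minor
    f1: object
    f2: object
    major: object
    minor: object

def check(proof, ctx=None):
    """Return the formula F such that `proof` has type `holds F`."""
    ctx = ctx or {}
    if isinstance(proof, Var):
        return ctx[proof.name]
    if isinstance(proof, ImpIntro):
        # The body is checked under the extra hypothesis var : holds f1.
        inner = {**ctx, proof.var: proof.f1}
        if check(proof.body, inner) != proof.f2:
            raise TypeError("ImpIntro: body does not prove f2")
        return Imp(proof.f1, proof.f2)
    if isinstance(proof, ImpElim):
        if check(proof.major, ctx) != Imp(proof.f1, proof.f2):
            raise TypeError("ImpElim: major premise is not imp f1 f2")
        if check(proof.minor, ctx) != proof.f1:
            raise TypeError("ImpElim: minor premise is not f1")
        return proof.f2
    raise TypeError("unknown proof term")

p, q, r = Atom("p"), Atom("q"), Atom("r")
example = ImpIntro(Imp(p, Imp(q, r)), Imp(q, Imp(p, r)), "u",
              ImpIntro(q, Imp(p, r), "v",
                  ImpIntro(p, r, "w",
                      ImpElim(q, r,
                              ImpElim(p, Imp(q, r), Var("u"), Var("w")),
                              Var("v")))))
result = check(example)
```

Here `result` is the formula proved by the example term. Note that every `f2` argument is only compared against a formula the checker has already computed, which is the redundancy the text describes.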

We are designing a surface language for LFSC, intended to rely on the more familiar concepts of context-free grammars for describing syntax, and more familiar notation for inference rules, as is done in tools like Ott (and others) [11]. Using this surface language, our sample signature is encoded as in Figure 3. The ImpIntro rule uses square brackets and a turnstile for hypothetical reasoning; in this case, to represent the fact that holds f2 is to be proved under the assumption of holds f1. We use root names like f for formula to avoid specifying types for meta-variables in the rules. Root names are specified when a syntactic category is defined. The example proof term is similar to the one in Figure 2, except that arguments which an analysis determines can be reconstructed can be omitted. Defining this analysis is still work in progress, but in this case, it will determine that of the meta-variables used in the ImpIntro and ImpElim rules, only f1 needs a value in order to guarantee that the types computed for terms are fully instantiated (that is, without free meta-variables). The benefits of omitting arguments in dependently typed languages are well known: notice how much more compact the example proof term is without values for the meta-variables f2 for ImpIntro and f1 and f2 for ImpElim.
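To illustrate why supplying only f1 for ImpIntro is enough in this example, here is a minimal Python sketch that computes the proved formula for the compact proof term. It hard-codes the outcome the text expects from the analysis (f1 explicit for ImpIntro, everything else reconstructed); it is not the analysis itself, which the paper describes as work in progress. The names `Intro`, `Elim`, and `infer` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Imp:
    left: object
    right: object

@dataclass(frozen=True)
class Var:                # a bound hypothesis such as u, v, w
    name: str

@dataclass(frozen=True)
class Intro:              # ImpIntro f1 var. body   (f2 omitted)
    f1: object
    var: str
    body: object

@dataclass(frozen=True)
class Elim:               # ImpElim major minor     (f1 and f2 omitted)
    major: object
    minor: object

def infer(proof, ctx=None):
    """Compute the formula F such that `proof` proves `holds F`."""
    ctx = ctx or {}
    if isinstance(proof, Var):
        return ctx[proof.name]
    if isinstance(proof, Intro):
        # f2 is reconstructed as whatever the body proves.
        f2 = infer(proof.body, {**ctx, proof.var: proof.f1})
        return Imp(proof.f1, f2)
    if isinstance(proof, Elim):
        # f1 and f2 are read off the type of the major premise.
        major = infer(proof.major, ctx)
        if not isinstance(major, Imp):
            raise TypeError("ImpElim: major premise is not an implication")
        if infer(proof.minor, ctx) != major.left:
            raise TypeError("ImpElim: minor premise does not match")
        return major.right
    raise TypeError("unknown proof term")

p, q, r = Atom("p"), Atom("q"), Atom("r")
compact = Intro(Imp(p, Imp(q, r)), "u",
              Intro(q, "v",
                  Intro(p, "w",
                      Elim(Elim(Var("u"), Var("w")), Var("v")))))
proved = infer(compact)
```

The computed formula contains no unknowns because every hypothesis enters the context with a fully given formula, so the types of `Var` nodes, and hence of all `Elim` nodes, are fully instantiated.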

[Figure 3: the sample signature and example proof term in the surface syntax.]

    SYNTAX
    formula f ::= imp f1 f2 .

    JUDGMENTS
    (holds f)

    RULES
    [ holds f1 ] |- holds f2
    ----------------------------- ImpIntro
    holds (imp f1 f2) .

    holds (imp f1 f2) , holds f1
    ----------------------------- ImpElim
    holds f2 .

    ImpIntro (imp p (imp q r)) u. ImpIntro q v. ImpIntro p w. ImpElim (ImpElim u w) v

For encoding SMT logics, we made use, in our previous proposals, of LF type-checking to implement SMT type-checking of SMT terms contained in proofs. This is done in a standard way by using an indexed type term T to represent SMT terms which have (encoded) SMT type T. Our new surface syntax also supports indexed syntactic categories. For example, Figure 4 gives part of a signature for a minimal SMT theory. This syntax definition indexes the syntactic category term by the syntactic category sort. Everywhere a meta-variable for terms is used, it must be listed with its sort. Similarly, the definitions of constructors show the resulting indices. For example, since equality is sort-polymorphic, we see that its meta-variables are indexed by (the same) sort s, while its resulting sort is indicated as bool. This example also demonstrates our surface syntax for binding, namely ^, in the declaration of forall. This notation means that forall binds a variable ranging over terms of sort s, and then has a body which must be a term of sort bool.
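The sort discipline described for equality and forall can be sketched as a small sort checker. This Python sketch is only an illustration under assumptions: sorts are plain strings, and the constructors `Const`, `Var`, `Eq`, and `Forall` are hypothetical stand-ins rather than the constructors of the paper's signature.

```python
from dataclasses import dataclass

BOOL = "bool"

@dataclass(frozen=True)
class Const:              # a constant together with its declared sort
    name: str
    sort: str

@dataclass(frozen=True)
class Var:                # a variable bound by an enclosing Forall
    name: str

@dataclass(frozen=True)
class Eq:                 # sort-polymorphic equality
    lhs: object
    rhs: object

@dataclass(frozen=True)
class Forall:             # binds `var` of sort `sort` in `body`
    var: str
    sort: str
    body: object

def sort_of(term, env=None):
    """Return the sort of `term`, or raise TypeError if it is ill-sorted."""
    env = env or {}
    if isinstance(term, Const):
        return term.sort
    if isinstance(term, Var):
        if term.name not in env:
            raise TypeError(f"unbound variable {term.name}")
        return env[term.name]
    if isinstance(term, Eq):
        # Both sides must share one sort s; the result has sort bool.
        s = sort_of(term.lhs, env)
        if sort_of(term.rhs, env) != s:
            raise TypeError("equality between terms of different sorts")
        return BOOL
    if isinstance(term, Forall):
        # The body is checked with the bound variable at its sort.
        if sort_of(term.body, {**env, term.var: term.sort}) != BOOL:
            raise TypeError("forall body must have sort bool")
        return BOOL
    raise TypeError("unknown term")

c = Const("c", "int")
good = Forall("x", "int", Eq(Var("x"), c))
good_sort = sort_of(good)
```

In LF/LFSC this check comes for free from type checking, because the sort is an index of the type of terms; the sketch makes the same check explicit.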

As a final example of the new syntax, Figure 5 shows the LFSC encodings of SMT rules for universal quantifiers. The rules are complicated somewhat by the need to express the sort information for terms. The first rule uses the ^ binding notation to indicate a parametric judgment in the premise: to apply all_intro, we must supply a proof of holds f{t<s>} for an arbitrary term t of sort s. The fact that the term is arbitrary is indicated by the use of ^ in the premise. We also see a notation for expression contexts, using curly braces. Any meta-variable can be used as a context variable, although if that meta-variable is from an indexed syntactic category, the result index must also be shown. Here, we make use of a defined syntactic category of formulas, so that no result sort need be listed for meta-variable f. All these different binding concepts (expression contexts, parametric proofs, and hypothetical proofs) are ultimately implemented with just the single $\lambda$-binder of LF. But for the surface language, our common informal meta-language tends to distinguish them, and so we follow that approach in our surface language.

[Figure 5: LFSC encodings of SMT rules for universal quantifiers.]

    t<s> ^ holds f { t<s> }
    --------------------------------- all_intro
    holds (forall t<s> ^ f{t<s>} ).

    holds (forall t<s> ^ f { t<s> } )
    ----------------------------------- all_elim
    holds f{t<s>} .

Improving the Core Language

While the surface language distinguishes syntactic constructs and proof rules, LF elegantly unites them (under encoding) as just term constructors for possibly indexed datatypes. So both for conciseness and to give a semantics for the different features sketched above, we are working to compile signatures from our surface language to a core language for LFSC, which we are specifying formally with the help of the Ott tool [11]. We are addressing a number of issues important for practical use of LF/LFSC for large proofs:

We have explicit syntax to indicate which arguments to term constructors are implicit and which are explicit.

We are devising an algorithm to ensure that all higher-order implicit arguments (which our surface language limits to second order) can be automatically reconstructed during proof checking. Other approaches simply delay constraints falling outside the decidable higher-order pattern fragment in the hopes of not needing to solve them.

In general, we only intend to execute side-condition code on concrete terms. So we are designing the language to help ensure that as much as statically possible, the arguments to side-condition programs do not contain meta-variables for omitted implicit arguments. Such meta-variables can be filled in by type checking, but not during execution of side-condition code.

We have support for global and local definitions, both of which are incorporated into definitional equality. Some other treatments of LF do not handle definitions at all, but these are essential for keeping proof sizes down.

[Figure 6: Architecture of the LFSC system. sample.sig is processed by Surface LFSC into sample.core_sig, which Core LFSC compiles into a C++ checker for sample.sig; given sample.pf, the checker reports Pass (pf valid) or Fail (pf invalid).]

Since we use definitions, we cannot so easily make use of the so-called "canonical forms" version of LF, where all terms are assumed to be in $\beta$-short $\eta$-long normal form: applications of defined symbols can be redexes. Instead, we are formulating a system with explicit definitional equality, which can also fill in implicit arguments.

We retain support for compiling proof rules to efficient low-level (C/C++) code by avoiding explicit substitutions in our formalization. Meta-level substitutions will be carried out directly in code emitted by the compiler, so using explicit substitutions merely obscures what will happen in compiled code.

We are designing the language for side-condition programs to respect dependent typing of terms. This does not require implementing a full-blown dependently typed programming language, because indices to types can only contain terms which normalize to constructor terms. (In contrast, in dependently typed programming, indices can include terms with recursion.)
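The items above on definitions and definitional equality can be illustrated with a toy equality check that unfolds global and local definitions and contracts the resulting redexes. This Python sketch is not the core language under design: it assumes untyped terms with de Bruijn indices, closed global definitions, and no $\eta$-rule, and it terminates only on terms that have a normal form. The term tags and function names are hypothetical.

```python
# Terms are tuples: ("var", i), ("const", name), ("def", name),
# ("lam", body), ("app", fn, arg), ("let", value, body).

def shift(t, d, cutoff=0):
    """Add d to every free de Bruijn index >= cutoff in t."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    if tag == "app":
        return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))
    if tag == "let":
        return ("let", shift(t[1], d, cutoff), shift(t[2], d, cutoff + 1))
    return t  # "const" and "def" contain no variables

def subst(t, j, s):
    """Replace index j in t by s."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == j else t
    if tag == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    if tag == "app":
        return ("app", subst(t[1], j, s), subst(t[2], j, s))
    if tag == "let":
        return ("let", subst(t[1], j, s), subst(t[2], j + 1, shift(s, 1)))
    return t

def instantiate(body, arg):
    """Substitute arg for index 0 in body and remove that binder."""
    return shift(subst(body, 0, shift(arg, 1)), -1)

def normalize(t, defs):
    tag = t[0]
    if tag == "def":                      # unfold a global definition
        return normalize(defs[t[1]], defs)
    if tag == "let":                      # unfold a local definition
        return normalize(instantiate(t[2], t[1]), defs)
    if tag == "lam":
        return ("lam", normalize(t[1], defs))
    if tag == "app":
        fn = normalize(t[1], defs)
        if fn[0] == "lam":                # the unfolded head created a redex
            return normalize(instantiate(fn[1], t[2]), defs)
        return ("app", fn, normalize(t[2], defs))
    return t

def def_equal(a, b, defs):
    """Equality modulo definition unfolding and beta-reduction."""
    return normalize(a, defs) == normalize(b, defs)

def app(fn, *args):
    for arg in args:
        fn = ("app", fn, arg)
    return fn

imp, p, q = ("const", "imp"), ("const", "p"), ("const", "q")
defs = {"self_imp": ("lam", app(imp, ("var", 0), ("var", 0)))}
unfolded = app(imp, p, p)
via_global = app(("def", "self_imp"), p)
via_local = ("let", p, app(imp, ("var", 0), ("var", 0)))
```

The term `via_global` shows why canonical forms are awkward here: it looks like an ordinary application, but it becomes a redex once `self_imp` is unfolded.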

While none of these features is terribly complex, designing detailed typing rules that accommodate them all simultaneously has taken time. For example, we are implementing a requirement that all omitted arguments to a term constructor C must be reconstructed by the time we complete the type checking of an application of C. For this, we constrain the syntax of types for term constructors so that implicit arguments are listed first, then explicit arguments and possible invocations of side-condition code, and finally a possibly indexed result type.
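The two constraints in this paragraph can be sketched as simple checks. The Python below is only a sketch under assumptions: a constructor type is flattened to a list of argument kinds, and the state of reconstruction is a dictionary from implicit argument names to values. The names `implicit_first` and `unreconstructed` are hypothetical and are not part of LFSC.

```python
IMPLICIT, EXPLICIT, SIDE_CONDITION = "implicit", "explicit", "side_condition"

def implicit_first(arg_kinds):
    """True if every implicit argument precedes all other arguments."""
    seen_other = False
    for kind in arg_kinds:
        if kind == IMPLICIT:
            if seen_other:
                return False
        elif kind in (EXPLICIT, SIDE_CONDITION):
            seen_other = True
        else:
            raise ValueError(f"unknown argument kind: {kind}")
    return True

def unreconstructed(implicit_names, solution):
    """Implicit arguments still lacking a value after checking an application."""
    return [name for name in implicit_names if solution.get(name) is None]

# A constructor shaped like ImpElim: two implicit formulas, two explicit proofs.
imp_elim_kinds = [IMPLICIT, IMPLICIT, EXPLICIT, EXPLICIT]
# A shape the constraint rules out: an implicit argument after an explicit one.
bad_kinds = [EXPLICIT, IMPLICIT]
# Reconstruction that found f1 but not f2 must be rejected for this application.
missing = unreconstructed(["f1", "f2"], {"f1": "q"})
```

A checker following the stated requirement would reject an application of C whenever `unreconstructed` returns a non-empty list at the point where checking of that application completes.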

Status and Conclusion

Figure 6 shows the architecture for the LFSC system we are currently implementing. A sample signature sample.sig in the surface syntax discussed above is translated to a core-language signature sample.core_sig, which is then compiled to C++ code optimized for checking proofs in that signature. A sample proof sample.pf can then be fed to that checker, which reports whether the proof passes (all inferences correct and all side conditions succeed) or fails.

Co-author Corey Oliver has completed the translator for surface-language signatures ("Surface LFSC" in Figure 6). Co-author Austin Laugesen has completed implementation of the compiler for side-condition code (part of "Core LFSC" in Figure 6). This compiler translates the LFSC side-condition language to C++, and supports either reference counting or regions for managing memory allocated during execution of side-condition code. A preliminary evaluation confirms the performance benefits for this application of region-based memory management, where all new memory is allocated from a single monolithic region of memory which can then be reclaimed at once in constant time. Co-authors Harley Eades and Ruoyu Zhang have worked with Aaron Stump on the design of the core language, which is almost complete. The first three co-authors have worked to devise the surface syntax, which is now complete. Implementation of the core-language type checker and compiler ("Core LFSC" in Figure 6) in OCaml is just beginning, and should be done in Summer 2012. We expect to have an initial public release of the tool in early Fall 2012. We hope that this release will be a decisive step in the quest for a standard meta-format for SMT proofs, and will help the community continue its remarkable success in providing high-performance, usable logic solvers for advanced applications in many fields of Computer Science.
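The region discipline mentioned above can be illustrated with a toy bump allocator. This Python sketch only models the idea; the actual compiler emits C++, and the class `Region` with its methods is a hypothetical illustration rather than the emitted code. Python manages its own memory, so the sketch shows the allocation pattern, not a real performance gain.

```python
class Region:
    """Bump allocator over one preallocated block of cells."""

    def __init__(self, capacity):
        self.cells = [None] * capacity
        self.top = 0                      # index of the next free cell

    def alloc(self, value):
        """Store value in the next free cell and return its index."""
        if self.top == len(self.cells):
            raise MemoryError("region exhausted")
        index = self.top
        self.cells[index] = value
        self.top += 1
        return index

    def release_all(self):
        """Reclaim every cell at once by resetting the bump pointer."""
        self.top = 0                      # old cells are overwritten later

region = Region(4)
first = region.alloc("x")
second = region.alloc("y")
region.release_all()
reused = region.alloc("z")
```

Resetting one counter frees everything allocated during a side-condition run, whereas reference counting pays a small cost on every pointer update.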

[1] C. Barrett, A. Stump, and C. Tinelli. The SMT-LIB Standard: Version 2.0, 2010. Available from www.smtlib.org.

[2] F. Besson, P. Fontaine, and L. Théry. A Flexible Proof Format for SMT: a Proposal. In P. Fontaine and A. Stump, editors, Workshop on Proof eXchange for Theorem Proving (PxTP), 2011.

[3] Sascha Böhme and Tjark Weber. Fast LCF-style proof reconstruction for Z3. In Matt Kaufmann and Lawrence Paulson, editors, Interactive Theorem Proving, volume 6172 of Lecture Notes in Computer Science, pages 179-194. Springer, 2010.

[4] J. Chen, R. Chugh, and N. Swamy. Type-preserving compilation of end-to-end verification of security enforcement. In Programming Language Design and Implementation (PLDI), pages 412-423. ACM, 2010.

[5] R. Harper, F. Honsell, and G. Plotkin. A Framework for Defining Logics. Journal of the Association for Computing Machinery, 40(1):143-184, January 1993.

[6] D. Oe, A. Reynolds, and A. Stump. Fast and Flexible Proof Checking for SMT. In B. Dutertre and O. Strichman, editors, International Workshop on Satisfiability Modulo Theories, 2009.

[7] F. Pfenning and C. Schürmann. System Description: Twelf - A Meta-Logical Framework for Deductive Systems. In 16th International Conference on Automated Deduction, 1999.

[8] S. Ranise and C. Tinelli. The SMT-LIB Standard, Version 1.2, 2006. Available from the "Documents" section of http://www.smtlib.org.

[9] Andrew Reynolds, Liana Hadarean, Cesare Tinelli, Yeting Ge, Aaron Stump, and Clark Barrett. Comparing proof systems for linear real arithmetic with LFSC. In A. Gupta and D. Kroening, editors, International Workshop on Satisfiability Modulo Theories, 2010.

[10] Andrew Reynolds, Cesare Tinelli, and Liana Hadarean. Certified interpolant generation for EUF. In S. Lahiri and S. Seshia, editors, International Workshop on Satisfiability Modulo Theories, 2011.

[11] Peter Sewell, Francesco Zappa Nardelli, Scott Owens, Gilles Peskine, Thomas Ridge, Susmit Sarkar, and Rok Strniša. Ott: Effective tool support for the working semanticist. J. Funct. Program., 20(1):71-122, 2010.

[12] A. Stump. Proof checking technology for satisfiability modulo theories. In A. Abel and C. Urban, editors, Logical Frameworks and Meta-Languages: Theory and Practice, 2008.