=Paper=
{{Paper
|id=Vol-2756/paper9
|storemode=property
|title=Why Proof-Theory Matters in Specification-Based Testing
|pdfUrl=https://ceur-ws.org/Vol-2756/paper_9.pdf
|volume=Vol-2756
|authors=Alberto Momigliano
|dblpUrl=https://dblp.org/rec/conf/ictcs/Momigliano20
}}
==Why Proof-Theory Matters in Specification-Based Testing==
Alberto Momigliano
Dipartimento di Informatica, Università degli Studi di Milano
momigliano@di.unimi.it

Abstract. We survey some recent developments in giving a logical reconstruction of specification-based testing through the lens of structural proof theory.

(Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Partially supported by the GNCS project “METALLIC #2: METodi di prova per il ragionamento Automatico per Logiche non-cLassIChe”.)

1 Introduction

Formal verification of software properties is still a labor-intensive endeavor, notwithstanding recent advances: automation plays only a partial role, and the engineer is heavily involved not only in the specification stage but in the proving one as well, even with the help of a proof assistant. This effort is arguably misplaced in the design phase of a software artifact, when mistakes are inevitable and, even in the best scenario, the specification and its implementation may change. A failed proof attempt is hardly the best way to debug, either.

These remarks are, of course, not novel, as they lie at the basis of model checking and other counter-example generation techniques, where the emphasis is on automatically refuting, rather than proving, that some code respects its specification. Contrary to Dijkstra’s diktat, testing, and more generally validation, has found an increasing niche in formal verification, prior to or even as an alternative to theorem proving [6, 20].

The message of this brief report is that, somewhat surprisingly, structural proof theory [19] offers a unifying approach to the field. While the ideas that I am going to sum up here may have a wider applicability [13], I am going to narrow them to:

– Specification-based testing (SBT), also known as property-based testing [14]: a lightweight validation technique whereby the user specifies executable properties that the code should satisfy, and the system tries to refute them via automatic (typically random) data generation.

– One specific domain of interest: the mechanization of the semantics of programming languages and related artifacts [1, 12], where proofs tend to be shallow but may have hundreds of cases, and are therefore a good candidate for SBT.

2 SBT as Proof-Search

We use as a running example a call-by-value λ-calculus (where values are λ-abstractions and numerals), whose static and big-step dynamic semantics follows; readers should substitute it with a more substantial specification of a programming language whose meta-theory they wish to investigate (we display each inference rule as premises ⟹ conclusion):

  T-N:  ⊢ n : nat
  T-VR: x:A ∈ Γ ⟹ Γ ⊢ x : A
  T-AB: x ∉ dom(Γ), Γ, x:A ⊢ M : B ⟹ Γ ⊢ λx.M : A→B
  T-AP: Γ ⊢ M₁ : A→B, Γ ⊢ M₂ : B ⟹ Γ ⊢ M₁·M₂ : B

  E-V:  value V ⟹ V ⇓ V
  E-AP: M₁ ⇓ λx.M, M₂ ⇓ V₂, M{V₂/x} ⇓ V ⟹ M₁·M₂ ⇓ V

Consider now the type preservation property for closed terms:

  ∀M M′ A. M ⇓ M′ → M : A → M′ : A

In fact, the result does not hold for this calculus, since we have managed to slip a typo into one of the rules. One counter-example is M = (λx. x·n)·n, M′ = n·n, A = nat. How to go from the counter-example to the origin of the bug is another research topic in itself [18]; suffice it to say that it points to a mistake in rule T-AP, namely that the type in the minor premise should be A. A tool that automatically provides such a counter-example would save us from wasting time on a potentially very long failed proof attempt.
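To fix intuitions, here is a minimal sketch of the (buggy) typing rules as a Horn-clause program in plain Prolog. The first-order named representation (n/1, v/1, lam/2, app/2) and the predicate names are assumptions of this sketch, not the paper’s encoding: αCheck and Bedwyr work over nominal logic and higher-order abstract syntax, respectively.

  % type_of(Ctx, Term, Type): Ctx is a list of X-Type bindings;
  % intended for checking closed terms.
  type_of(_,   n(_), nat).                          % T-N
  type_of(Ctx, v(X), A) :- member(X-A, Ctx).        % T-VR
  type_of(Ctx, lam(X, M), arrow(A, B)) :-           % T-AB
      \+ member(X-_, Ctx),
      type_of([X-A|Ctx], M, B).
  type_of(Ctx, app(M1, M2), B) :-                   % T-AP, with the paper's deliberate bug:
      type_of(Ctx, M1, arrow(_A, B)),               % the argument M2 is checked against the
      type_of(Ctx, M2, B).                          % result type B instead of the domain _A.

The two typing facts behind the counter-example can then be checked directly (the evaluation judgment, omitted here, supplies the remaining hypothesis M ⇓ M′):

  ?- type_of([], app(lam(x, app(v(x), n(0))), n(0)), A).   % M : nat holds
  A = nat.
  ?- type_of([], app(n(0), n(0)), nat).                    % M′ : nat fails
  false.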
While this issue can be, and has been, successfully tackled in a functional programming setting [15], at least two factors make a proof-theoretic reconstruction fruitful: 1) it fits nicely with the (co)inductive reading of a rule-based presentation of our system-under-test; 2) it easily generalizes to logics that intrinsically handle issues that are pervasive in the domain of programming language semantics, such as naming and scoping. In fact, as argued in [7], the SBT literature is rediscovering (constraint) logic programming ideas such as narrowing, mode checking, random back-chaining, etc.

If we view a specification (property) as a logical formula ∀x[(τ(x) ∧ P(x)) ⊃ Q(x)], where τ is a typing predicate and P and Q are two other predicates defined using Horn clause specifications (to begin with), then providing a counter-example consists of negating the property and searching for a proof of ∃x[(τ(x) ∧ P(x)) ∧ ¬Q(x)]. Stated in this way, the problem points to a logic programming solution, and since the seminal work of Miller et al. [17], structural proof theory formulates it as a proof-search problem in the sequent calculus, where the specification is a fixed set of assumptions (typically sets of clauses) and the negated property is the goal.

A first solution that I was involved with is αCheck [8], which supplements a nominal logic programming interpreter [10] to account for counter-example search. The tool uses: 1) nominal logic as a logical foundation, which is particularly apt at encoding binding structures; 2) automatically derived type-driven exhaustive generators for data enumeration; 3) two approaches to implementing negation, namely negation-as-failure and negation elimination [9]; 4) a fixed search strategy, namely iterative deepening based on the height of partial proof trees.

3 SBT via FPC

While αCheck is quite effective (see the case studies at https://github.com/aprolog-lang/checker-examples), the approach was unnecessarily rigid, in particular in wiring in a fixed data-generation and search strategy, and did not reflect the richness of features that SBT offers. However, since the proof theory of αCheck is based on the notion of uniform proofs [17], it is easy to generalize it via the subsuming theory of focused proof systems [3]. We can roughly characterize focusing as a complete strategy to organize the rules of the sequent calculus into two phases: 1) a negative phase corresponding to goal-reduction, where we apply rules involving don’t-care nondeterminism; as a result, there is no need to consider backtracking; and 2) a positive phase corresponding to back-chaining (don’t-know nondeterminism): here, inference rules need to be supplied with external information (e.g., which clause to back-chain on) in order to ensure that a completed proof can be found. Thus, when building a proof tree from the conclusion to its leaves, the negative phase corresponds to a simple deterministic computation, while the positive phase may need to be guided by an oracle.

The connection with SBT is that, in a query, the positive phase (which corresponds to the generation of possible counter-examples) is represented by ∃x and (τ(x) ∧ P(x)). It is followed by the negative phase (which corresponds to counter-example testing), represented by ¬Q(x). This formalizes the intuition that generation may be hard, while testing is just computation.
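This generate-and-test reading of ∃x[(τ(x) ∧ P(x)) ∧ ¬Q(x)] can be sketched in a few lines of plain Prolog, with negation-as-failure carrying out the testing phase. The toy instance (is_nat playing the role of τ, a trivially true P, and even as Q) and the bound-driven generator are illustrative assumptions, not taken from the paper:

  is_nat(z).
  is_nat(s(N)) :- is_nat(N).

  even(z).
  even(s(s(N))) :- even(N).

  % Positive phase: generate candidates satisfying the typing predicate,
  % by iterative deepening over a size bound, as in αCheck.
  % Negative phase: test the negated conclusion, a mere computation.
  counterexample(X) :-
      between(0, 5, Bound),
      gen_nat(Bound, X),     % generation: don't-know nondeterminism
      \+ even(X).            % testing: negation-as-failure

  gen_nat(_, z).
  gen_nat(B, s(N)) :- B > 0, B1 is B - 1, gen_nat(B1, N).

  ?- counterexample(X).
  X = s(z).

Here the generator and its bound are hard-wired; the FPCs introduced next make precisely this choice programmable.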
The final ingredient is how to supply the external information to the positive phase: this is where the theory of foundational proof certificates (FPC) [11] comes in. In their full generality, FPCs can be seen as a generalization of proof-terms in the Curry-Howard tradition, and they are able to define a range of proof structures used in various theorem provers (e.g., resolution refutations, Herbrand disjuncts, tableaux, etc.). They can be programmed as clerk and expert predicates that decorate the sequent rules used in an FPC proof-checking kernel. An FPC system is a specification of the certificate format together with the clerks and experts processing it. In our setting, we can view FPCs as simple logic programs that guide the search for potential counter-examples using different generation strategies.

Figure 1 contains a simple proof system for a fragment of Horn clause provability, in which each inference rule is augmented with an additional premise involving an expert predicate, a certificate Ξ, and possibly continuations of certificates (Ξ′, Ξ₁, Ξ₂), if one reads the rules from conclusion to premises:

  Ξ₁ ⊢ G₁, Ξ₂ ⊢ G₂, ∧ₑ(Ξ, Ξ₁, Ξ₂) ⟹ Ξ ⊢ G₁ ∧ G₂
  ttₑ(Ξ) ⟹ Ξ ⊢ tt
  Ξ′ ⊢ G[t/x], ∃ₑ(Ξ, Ξ′, t) ⟹ Ξ ⊢ ∃x.G
  Ξ′ ⊢ G, (A :- G) ∈ grnd(P), unfoldₑ(Ξ, Ξ′) ⟹ Ξ ⊢ A

  n ⊢ G₁, n ⊢ G₂ ⟹ n ⊢ G₁ ∧ G₂
  n ⊢ tt
  n ⊢ G[t/x] ⟹ n ⊢ ∃x.G
  n ⊢ G, (A :- G) ∈ grnd(P), n ≥ 0 ⟹ n+1 ⊢ A

  Fig. 1. FPC for Horn logic (top) and one of its instantiations (bottom)

The logic programmers among us will recognize it as an instrumented version of the vanilla meta-interpreter over a fixed Horn program P. For example, the ∃-expert may be in charge of extracting from Ξ the term t with which to instantiate G, so that we can build the rest of the proof according to the resulting certificate Ξ′. In the bottom part of the figure, we instantiate the framework with the simplest form of proof certificate, namely a positive integer, where the only active expert is a simple non-zero check while back-chaining: this characterizes exhaustive generation bounded by height, which happens to be the generation strategy of αCheck. As detailed in [7], different FPCs capture random generation, via randomized backtracking, as well as diverse features such as δ-debugging, bug provenance, etc.
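As a concrete illustration, here is the kernel of Fig. 1 written as an instrumented vanilla meta-interpreter in plain Prolog, instantiated with the height certificate. The concrete syntax (and/2, tt, some/2, prog/2, height/1) and the sample object program are assumptions of this sketch; the actual implementation of [7] is in λProlog:

  % Object-level Horn program P, one prog(Head, Body) fact per clause.
  prog(is_nat(z),    tt).
  prog(is_nat(s(N)), is_nat(N)).

  % check(Cert, Goal): every rule consults an expert on the certificate.
  check(Cert, and(G1, G2)) :-
      and_e(Cert, Cert1, Cert2), check(Cert1, G1), check(Cert2, G2).
  check(Cert, tt) :-
      tt_e(Cert).
  check(Cert, some(_X, G)) :-       % here the witness is left to unification
      exists_e(Cert, Cert1), check(Cert1, G).
  check(Cert, A) :-                 % back-chain on a clause of P
      A \= and(_, _), A \= tt, A \= some(_, _),
      unfold_e(Cert, Cert1), prog(A, G), check(Cert1, G).

  % Height FPC: the certificate is a bound, and the only active expert is
  % the non-zero check-and-decrement performed while back-chaining.
  and_e(height(N), height(N), height(N)).
  tt_e(height(_)).
  exists_e(height(N), height(N)).
  unfold_e(height(N), height(N1)) :- N > 0, N1 is N - 1.

  ?- check(height(3), some(X, is_nat(X))).
  X = z ; X = s(z) ; X = s(s(z)) ; false.

Swapping in a different set of experts, say one that consults a random number generator before picking a clause in unfold_e, changes the generation strategy without touching the kernel, which is the point of the FPC discipline.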
4 To Infinity and Beyond

Reasoning about infinite computations via coinduction and corecursion has an ever-increasing relevance in formal methods and, in particular, in the semantics of programming languages (see [16] for a compelling example); of course, coinduction also underlies (the meta-theory of) process calculi. To our knowledge, there are no SBT approaches for coinductive specifications, save for the quite limited features provided by Isabelle/HOL’s Nitpick [6].

When addressing potentially infinite computations, where in our setup we strive to model infinite behavior (think divergence of a finite program) rather than infinite objects (e.g., streams), we need to go significantly beyond the simple proof theory of [7] and adopt a much stronger logic with explicit rules for induction and coinduction. A natural choice is the fixed point linear logic µMALL [4], which is associated with the Bedwyr model checker [5]. In fact, this logic has already shown its colors in the proof-theoretic reconstruction of model checking problems such as (non-)reachability [13]. µMALL consists of a sequent calculus presentation of multiplicative-additive linear logic with least and greatest fixed point operators in lieu of exponentials, over a simply-typed term language.

Continuing with our running example, let us now consider a coinductive definition of CBV evaluation following [16]. In other terms, we take the same rules as in Section 2, but we read them as the greatest fixed point of the defined relation. This is represented by the following formula (reminiscent of a linearization of Clark’s completion), where ν is the greatest fixed point operator, Λ is the abstractor in the meta-logic, and val stands for the µ-formula characterizing values:

  coeval ≡ ν(ΛCE. Λm. Λm′.
              (∃V. m = V ⊗ m′ = V ⊗ val V)
            ⊕ (∃M₁ M₂ M V₂ V. m = M₁·M₂ ⊗ m′ = V ⊗
                 (CE M₁ (λx.M)) ⊗ (CE M₂ V₂) ⊗ (CE (M{V₂/x}) V)))

Here ⊗ and ⊕ are multiplicative conjunction and additive disjunction; for the sake of space, we do not make explicit the encoding of the object-level syntax and of substitution inside the meta-logic.

Whether or not this notion of co-evaluation makes sense (see [2] for a fair criticism), we would like to investigate whether standard properties such as type soundness or determinism of evaluation hold. They do not: to refute the latter, just note that a divergent term such as Ω co-evaluates to anything. Type preservation, for a corrected version of the rules in Sec. 2, is falsified by a variant of the Y-combinator.

We have a prototype implementation of SBT for coinductive specifications on top of Bedwyr, which we use both for the generation of test cases (controlled using specific FPCs) and for the testing phase. Such an implementation has the advantage of allowing us to piggyback on Bedwyr’s facilities for efficient proof search via tabling for (co)inductive predicates, thus avoiding costly meta-interpretation. To make this concrete, let me report the query refuting determinism of co-evaluation. We use Bedwyr’s concrete syntax, where check, implementing the kernel rules in the top of Fig. 1, is in charge of controlling the generation of λ-terms (predicate is_exp, parameterized over a context of bound variables), here in the exhaustive fashion detailed above (generator height 4); coeval encodes the coinductive CBV operational semantics using higher-order abstract syntax, and the last conjunct corresponds to ground disequality.

  ?= check (height 4) ((is_exp [] M) && (is_exp [] M1) && (is_exp [] M2))
       /\ coeval M M1 /\ coeval M M2 /\ (M1 = M2 -> false).
  Found a solution (+ 4173ms):
  M2 = con one,
  M1 = con zero,
  M = app (fun (x\ app x x)) (fun (x\ app x x))

The system finds the expected counter-example in reasonable time, where M is the encoding of Ω. Note that we only generate finite terms, but the testing phase now appeals (twice) to the coinductive hypothesis.
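The role of the coinductive hypothesis can be reproduced in miniature with SWI-Prolog’s library(coinduction), whose ancestor-based cycle detection makes a coinductive goal succeed when it unifies with a goal it recursively descends from. This is a rough sketch under that assumption (the prototype itself runs on Bedwyr with tabling, not on this library), reusing the first-order named representation of the earlier sketches:

  :- use_module(library(coinduction)).
  :- coinductive coeval/2.

  value(n(_)).
  value(lam(_, _)).

  coeval(V, V) :- value(V).            % E-V, now read coinductively
  coeval(app(M1, M2), V) :-            % E-AP, now read coinductively
      coeval(M1, lam(X, M)),
      coeval(M2, V2),
      subst(M, X, V2, M3),
      coeval(M3, V).

  % Substitution; capture-avoidance is a non-issue for closed values.
  subst(n(N), _, _, n(N)).
  subst(v(X), X, V, V).
  subst(v(Y), X, _, v(Y)) :- Y \== X.
  subst(lam(X, M), X, _, lam(X, M)).
  subst(lam(Y, M), X, V, lam(Y, M1)) :- Y \== X, subst(M, X, V, M1).
  subst(app(M1, M2), X, V, app(N1, N2)) :-
      subst(M1, X, V, N1), subst(M2, X, V, N2).

  % Ω co-evaluates to two distinct values, refuting determinism:
  ?- Omega = app(lam(x, app(v(x), v(x))), lam(x, app(v(x), v(x)))),
     coeval(Omega, n(0)), coeval(Omega, n(1)).
  Omega = app(lam(x, app(v(x), v(x))), lam(x, app(v(x), v(x)))).

Evaluating Ω reproduces the goal coeval(Ω, V) after one unfolding, so the cycle check closes the derivation with V unconstrained: exactly the “co-evaluates to anything” behavior exploited by the query above.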
Other applications of SBT with respect to infinite behavior lie in separating various notions of equivalence in lambda and process calculi: for example, applicative and ground similarity in PCFL [22], or analogous standard results in the π-calculus. These examples put forward another challenge: the specification of a coinductive notion such as applicative similarity goes beyond the Horn fragment, to wit:

  asim ≡ ν(ΛAS. Λm. Λn. ∀M′. eval m (λx.M′) ⊸
           (∃N′. eval n (λx.N′) ⊗ ∀R. AS (M′{R/x}) (N′{R/x})))

This makes the treatment of negation in the testing phase of an SBT query problematic, since the interpretation of finite failure as provability of falsehood breaks down, at least in an intuitionistic setting. Here, the adoption of linear logic as a meta-logic comes to the rescue, since in linear logic occurrences of negation can be eliminated by using De Morgan duality and inequality.

5 Conclusion

I have tried to delineate a path where structural proof theory reconstructs, unifies, and extends current trends in SBT, with a particular emphasis on ongoing work extending the paradigm to infinite computations. A natural next step is concurrency: logical frameworks such as CLF (and its implementation Celf [23]), based on sub-structural logics, have been designed to encode concurrent calculi, e.g., session types [21]. However, the meta-theory of such frameworks is still in the works, and this so far precludes reasoning about them. On the other hand, an FPC approach to the validation of those properties seems a low-hanging fruit.

Acknowledgment

A shout-out to my co-authors in this line of work: Rob Blanco, James Cheney, Francesco Komauli, Dale Miller and Matteo Pessina.

References

1. A. Abel, G. Allais, A. Hameer, B. Pientka, A. Momigliano, S. Schäfer, and K. Stark. POPLMark reloaded: Mechanizing proofs by logical relations. J. Funct. Program., 29:e19, 2019.
2. D. Ancona, F. Dagnino, and E. Zucca. Reasoning on divergent computations with coaxioms. Proc. ACM Program. Lang., 1(OOPSLA):81:1–81:26, 2017.
3. J.-M. Andreoli. Logic programming with focusing proofs in linear logic. J. of Logic and Computation, 2(3):297–347, 1992.
4. D. Baelde. Least and greatest fixed points in linear logic. ACM Trans. Comput. Log., 13(1):2:1–2:44, 2012.
5. D. Baelde, A. Gacek, D. Miller, G. Nadathur, and A. Tiu. The Bedwyr system for model checking over syntactic expressions. In F. Pfenning, editor, 21st Conf. on Automated Deduction, number 4603 in LNAI, pages 391–397. Springer, 2007.
6. J. C. Blanchette, L. Bulwahn, and T. Nipkow. Automatic proof and disproof in Isabelle/HOL. In C. Tinelli and V. Sofronie-Stokkermans, editors, FroCoS, volume 6989 of Lecture Notes in Computer Science, pages 12–27. Springer, 2011.
7. R. Blanco, D. Miller, and A. Momigliano. Property-based testing via proof reconstruction. In PPDP, pages 5:1–5:13. ACM, 2019.
8. J. Cheney and A. Momigliano. αCheck: A mechanized metatheory model checker. Theory and Practice of Logic Programming, 17(3):311–352, 2017.
9. J. Cheney, A. Momigliano, and M. Pessina. Advances in property-based testing for αProlog. In B. K. Aichernig and C. A. Furia, editors, TAP 2016, volume 9762 of Lecture Notes in Computer Science, pages 37–56. Springer, 2016.
10. J. Cheney and C. Urban. Nominal logic programming. ACM Transactions on Programming Languages and Systems, 30(5):26, August 2008.
11. Z. Chihani, D. Miller, and F. Renaud. A semantic framework for proof evidence. J. of Automated Reasoning, 59(3):287–330, 2017.
12. A. P. Felty, A. Momigliano, and B. Pientka. Benchmarks for reasoning with syntax trees containing binders and contexts of assumptions. Math. Struct. Comput. Sci., 28(9):1507–1540, 2018.
13. Q. Heath and D. Miller. A proof theory for model checking. J. of Automated Reasoning, 63(4):857–885, 2019.
14. J. Hughes. QuickCheck testing for fun and profit. In M. Hanus, editor, PADL 2007, volume 4354 of LNCS, pages 1–32. Springer, 2007.
15. C. Klein et al. Run your research: on the effectiveness of lightweight mechanization. In POPL ’12, pages 285–296. ACM, 2012.
16. X. Leroy and H. Grall. Coinductive big-step operational semantics. Information and Computation, 207(2):284–304, 2009.
17. D. Miller, G. Nadathur, F. Pfenning, and A. Scedrov. Uniform proofs as a foundation for logic programming. Annals of Pure and Applied Logic, 51:125–157, 1991.
18. A. Momigliano and M. Ornaghi. The blame game for property-based testing.
In CILC, volume 2396 of CEUR Workshop Proceedings, pages 4–13. CEUR-WS.org, 2019.
19. S. Negri, J. von Plato, and A. Ranta. Structural Proof Theory. Cambridge University Press, 2001.
20. Z. Paraskevopoulou, C. Hritcu, M. Dénès, L. Lampropoulos, and B. C. Pierce. Foundational property-based testing. In C. Urban and X. Zhang, editors, ITP 2015, volume 9236 of Lecture Notes in Computer Science, pages 325–343. Springer, 2015.
21. F. Pfenning and D. Griffith. Polarized substructural session types. In FoSSaCS, volume 9034 of Lecture Notes in Computer Science, pages 3–22. Springer, 2015.
22. A. M. Pitts. Operationally based theories of program equivalence. In P. Dybjer and A. M. Pitts, editors, Semantics and Logics of Computation, 1997.
23. A. Schack-Nielsen and C. Schürmann. Celf: a logical framework for deductive and concurrent systems (system description). In IJCAR, volume 5195 of Lecture Notes in Computer Science, pages 320–326. Springer, 2008.