=Paper=
{{Paper
|id=Vol-1804/paper-03
|storemode=property
|title=Algebraic Techniques in Software Verification : Challenges and Opportunities
|pdfUrl=https://ceur-ws.org/Vol-1804/paper-03.pdf
|volume=Vol-1804
|authors=Martin Brain,Daniel Kroening,Ryan McCleeary
|dblpUrl=https://dblp.org/rec/conf/synasc/BrainKM16
}}
==Algebraic Techniques in Software Verification : Challenges and Opportunities==
Algebraic Techniques in Software Verification : Challenges and Opportunities Martin Brain Daniel Kroening Ryan McCleeary University of Oxford University of Oxford University of Iowa martin.brain@cs.ox.ac.uk daniel.kroening@cs.ox.ac.uk ryan-mccleeary@uiowa.edu Abstract—One of the main application areas and driving and it is not clear whether automating it is even possible. Con- forces behind the development of Satisfiability Modulo Theory sequentally, compared to directly human generated formulae, (SMT) solvers is software verification. The requirements of the formulae generated by verification tools are much larger software verification are somewhat different to other applications of automated reasoning, posing a number of challenges but and contain a significant amount of ‘simple’ and ‘irrelevant’ also providing some interesting opportunities. This paper brings parts. together and summarises the algebras and structures of interest, Opportunity: Exploit disequalities and if-then-else. Software along with some of the problems that are characteristic of verification problems tend to produce formulae with some software verification. It is hoped that this will allow computer Boolean structure, particularly from the use of if-then-else algebra researchers to assess the applicability of their techniques to this challenging, but rewarding domain. expression to model different paths of execution. One of the Software verification is the prototypical application domain advantages of the DPLL(T)[2] algorithm is that theory solvers for Satisfiability Modulo Theory (SMT) solvers. There are do not need to handle the Boolean structure of the formula. many aspects of the two research fields that show a signif- The SAT solver will provide a partial assignment to the icant degree of co-evolution. For example, the central role theory literals and the theory solver must determine if they are of theories (and the theories that are available – for example consistent. Unlike traditional algebra problems, this includes bit-vectors and arrays) can be seen as a formalisation of the equalities that are assigned to false and if-then-else operations. domain specific decision procedures that were used in early Exploiting this additional information is likely to improve verification systems [1]. Universal quantification is challenging performance but may require (or enable) new approaches to for most SMT solver algorithms, leading to poor performance some problems. and thus many software verification systems avoid generating Challenge: Handle inequalities. Another feature common in quantifiers. Likewise the importance of model generation1 is software verification problems that is not classically algebraic in part driven but the utility of these models for providing is the use of inequalities. Although there are some applica- execution or error traces in verification systems. The co- tions, such as equivalence checking, that are possible without evolution can also be seen in the SMT-LIB benchmarks which inequalities, the frequency of ordering comparisons in most feature many benchmark collections generated by verification real software means these are a critical requirement. tools. II. A LGEBRAS AND S TRUCTURES OF I NTEREST This paper aims to highlight some of the requirements and As well as having formula structure unlike conventional, ‘evolutionary pressures’ that software verification places on human generated algebraic problems, software verification SMT solver development. It is hoped that this context will problems also tend to use unconventional algebras and rela- help computer algebra researchers to identify, develop and tional structures. A minority of software verification systems refine algorithms so that they can demonstrate impact on use integers2 and reals to model variables, particularly those commercial-scale software verification problems. The topics focused on algorithm rather than program verification. How- raised are a mix between challenges that have to be overcome ever this approach seems to becoming less common as these and opportunities in under-explored / critical areas. do not capture the full behaviour of real systems (overflow, I. F ORMULA S HAPE rounding, etc.) and also increase the complexity of the decision procedure. Challenge: Tolerate irrelevant formulae. The formulae gen- erated by verification tools tend to be very different from those A. Bit-Vectors generated by human modelling of problems. One of the real Core to the performance of SMT solvers on software challenges of mathematical modelling is reducing the problem verification problems is their handling of the theory of fixed to a minimal core which captures the key challenge. This is a width bit-vector[3]. Variables in this theory are interpreted as process which requires considerable intuition and experience bit-vectors of a given size, operations are a mix of (modular) 1 Generating a model or witness is often regard as part of a satisfiability 2 Sometimes referred to as “mathematical integers” to distinguish them from check, particularly for Boolean satisfiability but is not formally required. the integer types in programming languages. Copyright c by the paper’s authors. Copying permitted for private and academic purposes. In: E. Ábrahám, J. Davenport, P. Fontaine, (eds.): Proceedings of the 1st Workshop on Satisfiability Checking and Symbolic Computation (SC2 ), Timisoara, Romania, 02-07-2016, published at http://ceur-ws.org arithmetic (plus, multiply, divide, etc.), logical operations (and, with e exponent and s significand bits, plus R∗ an extension xor, shift, etc.) and structural (extract, concatinate, etc.) with of R with infinities and not-a-number. A family of order- predicates for signed and unsigned comparisons. Although preserving functions vFe,s : Fe,s → R∗ give the real value of there have been some interesting alternatives proposed[4], [5], a floating-point number and round picks between the adjoints most solvers still use “bit-blasting”; converting the expressions of v and applies one, to map back into Fe,s , thus: to a Boolean circuit and using a SAT solver. Optimal circuits f ⊕r g ; round(r, v(f) + v(g)). are known for some operations[6] but others such as mul- tiplication, division and variable shifts introduce additional This approach allows bit-exact specification of floating-point difficulty which can, in some cases, cause the solver to numbers with a minimal number of edge-cases and a maximal time-out for even “simple” queries. Given this fundamental amount of algebraic structure. limitation of current solvers and the significance of bit-vectors Opportunity: Algebraic techniques on R∗ . vFe,s has rela- to application performance, better handling of these formula tively little algebraic structure – floating-point addition and would be of major benefit. multiplication are famously non-associative. As not-a-number Opportunity: Sub-algebras. The range of operations in the is absorbing for all operations, R∗ is not a field but it is theory of bit-vectors makes it hard to find decision procedures an additive and multiplicative commutative monoid with the that work well for arbitrary formulae. However for many associativity property [8] and so semi-ring decision procedures applications, groups of variables will only ever use a handful may be applicable. of operations. For example, variables used as counters will Opportunity: Mixed real and float. If v and round can be only assign constants, increment and check against bounds, handled then this may not only be able to handle floating-point variables that are used as bitmaps are rarely used in multi- problems but also mixed float and real problems which are of plications and the bitwise operations form a Boolean algebra, considerable commercial interest. many modern cryptographic algorithms only use add, roll and C. Differential Theories xor (ARX) and floating-point operations make heavy use of a max-plus algebra with signed comparison. Thus finding useful Another interest driven by control systems is that of rea- sub-signatures of the theory of bit-vectors, identifying their soning about differential equations. Most control systems are algebraic structure and then building decision procedures that developed with respect to a model of the environment with exploit this structure seems like a promising research direction. which they are interacting. This is normally formalised as a system of differential equations. However due to limitations B. Floating-Point of the solver technology this is not directly used in the Many software verification applications concern control verification, instead a computational model is built. If solvers systems which must use sensor data to control a real-world could handle differential equations of some form, it could system (aircraft, train, industrial robots, UAVs, autonomous significantly increase the ability of tools to verify these cyber- cars, “cyber-physical systems”, etc.). Pure integer and fixed- physical systems. point control systems are becoming a rarity with most control Opportunity: Symbolic algorithms for differential systems. systems using floating-point. Thus there is a significant com- dReal [9] has made some impressive progress using numerical mercial drive for verification systems to handle floating-point methods for handling differential equations. Using symbolic numbers efficiently and effectively. algorithms has yet to be explored but may have great potential. The so-called “standard model” converts floating-point ex- III. B EYOND S ATISFIABILITY pressions to real expressions. For example, if f and g are Determining formula satisfiability is the core role of SMT floating-point variables and ⊕r denotes floating-point addition solvers. However there are a number of similar and related with rounding mode r: symbolic reasoning tasks, both inside and outside the solver, f ⊕r g ; (f + g)(1 + δ) |δ| 6 that have significant practical impact and present interesting opportunities for applying algebraic techniques. where is a small, format dependent constant. Although this model is simple, it has a number of limitations. An A. Expression Simplification extensive set of side condition are required to correctly model Expression simplification is a vital component in most actual hardware (no overflows, no subnormal numbers, etc.) SMT solvers and program verification systems. It is used for and often these are the precise conditions we are trying to performing constant folding ((x + 1) + 1 ; x + 2), identify. If these are used then the standard model is an over- simplifications (x <= x ; true) and normalisation approximation of the behaviour of hardware, meaning that (!(x >= y) ; x < y). Although the rewriting steps are spurious SAT results can be given. Worse from an application generally simple, their effects can be drastic. For example, point of view is that it is difficult to generate the bit-exact given unsigned integers x, y and z of more than 16 bit: traces needed to assess, diagnose and fix real systems. assert((x * y) * z == x * (y * z)); As a consequence, a theory of floating-point numbers[7] has been developed and added to the SMT-LIB standard. Its will cause most bit-vector decision procedures to time out but formalisation makes use of sets Fe,s of floating-point numbers can be trivially resolved by a distributivity aware simplifier. Opportunity: Improved simplification For such a key com- fixed-points which form a complete lattice [12]. We use lf p ponent of solvers, there is relatively little in the conventional to denote the least fixed-point of a function and define LF P published literature. There are papers that mention the number to give the least fixed-point above a given set (S): of simplification rules used [10] or discuss their importance to particular applications [11]. However there seems to be LF P : M ono(2statesx,v → 2statesx,v ) × 2statesx,v → 2statesx,v little discussion of whether these sets are complete, whether the popular architecture (stateless / context-free rewrites im- LF P (f )(S) = lf p(λX.f (X ∪ S)) plemented as ad-hoc tree-walks) is best, whether they could or should be used during the solve process rather than just For simplicity we will consider safety properties; those before. Given the similarity to rule-based differentiation and which are false if there is an execution trace from an initial integration algorithms, there might be techniques, best prac- state to an unsafe state. P (x) is a formula describing the set tices and ideas from the computer algebra community that of safe states and thus: could revolutionise the role of expression simplification in SMT solver. F Reach = LF P (F RStep, JI(x)K) B. Fixed-Points and Approximation BReach = LF P (BRStep, J¬P (x)K) Current algorithms and verification systems are often built system safe ⇔ F Reach ⊆ JP (x)K “on top of” SMT solvers. The solver (or solvers) are the lowest system safe ⇔ BReach ∩ JI(x)K = ∅ level component in the algorithm and satisfiability queries are used as a way of checking inclusion or intersection between If a formula describing F Reach or BReach can be found, sets (represented symbolically, using equations). This architec- then showing system safety reduces to a single satisfiability ture has proven flexible and effective. However if the reasoning check. However there are no guarantees that the fixed-point system can perform more than just satisfiability queries it can can be described by a formulae and even when they can, be used at a higher level within verification algorithms. This there are few algorithms that can compute them. For example, section presents verification algorithms ‘from the top down’, consider the following two loops, with variables in N, one highlighting the other kinds of symbolic reasoning task that has a simple, computable fix-point, the other remains an open arise. question: 1) The Verification Question: We present a simple model while (i < n) { of software verification using the transition systems view. a[i] = n; This is perhaps idiomatic of the model checking community, } especially hardware model checking but it nicely illustrates some of the key challenges. while (i != 1) { Let x be a set of variables and v a set of values. A map if (i % 2 == 0) i = i/2; s : x → v that assigns value to variables is referred to as a else i = 3*1 + 1; state (in the context of transition systems) or a valuation (in } the context of logical formulae). Let statesx,v denote x → v, the space of all such maps. Critical to symbolic verification Opportunity: Finding exact fixed-points. There are many is the idea that we can use a formula (over x) to represent a interesting questions related to exact descriptions of fixed- set of states. To make the distinction between sets represented points; both theoretical (When does an expression over lan- by formulae and sets described by other means, we will use guage L have a fixed-point representable in L’? Is finding it JF (x)K to denote the set of all valuations (states) that satisfy computable?) and practical (What are algorithms to compute formula F (x). them? Are there subsets of expressions for which fixed-point A symbolic representation of a transition system requires a formula can be easily found?). formula for the initial states, I(x) and a formula that describes 2) Approximate Approachs to Verification: When faced the transition relation between sets T (x, x0 ). As our interest with a intractable or computationally expensive problem, a will be in the sets of states which are reachable from a given common approach in software verification is to approximate. set of states, we define forwards and backwards reachability If: steps: F Reach ⊂ O ∧ O ⊆ JP (x)K ⇒ system safe F RStep : 2statesx,v → 2statesx,v U ⊆ F Reach ∧ ¬(U ⊆ JP (x)K ⇒ ¬system safe F RStep(S) = S ∪ {t ∈ 2statesx,v |T (s, t) ∧ s ∈ S} BRStep : 2statesx,v → 2statesx,v Thus we have reduced the problem from computing fixed- BRStep(S) = S ∪ {t ∈ 2 statesx,v |T (t, s) ∧ s ∈ S} points to finding (formulae that describe) sets of states that approximate the fix-points. From the definition of F Reach Both of these are monotonic and increasing on 2statesx,v , a we have necessary and sufficient conditions for over and under complete lattice, so by the Knaster-Tarski theorem they have approximations: Merging of approximations is needed when multiple control flow paths converge, for example after an if statement or after LF P (F RStepo , Inito ) ⊆ O a loop. It is sufficient to take the disjunction of the two formula JI(x)K ⊆ Inito ∧ F RStep 6 F RStepo (1) but this tends to lead to unacceptable growth in formula size. Thus it is useful to be able to find the smallest O(x) or largest U ⊆ LF P (F RStepu , Initu ) U (x) such that: Initu ⊆ JI(x)K ∧ F RStepu 6 F RStep (2) O1 (x) ∨ O2 (x) ⇒ O(x) U (x) ⇒ U1 (x) ∨ U2 (x) Different approaches to verification can be seen as different Opportunity: Step-wise Approximations. Computing abstract ways of finding solutions to equations (1) or (2). Some of pre and post-images can clearly be seen in algebraic terms. these are known as property directed and make of P (x) so Quantifier elimination, either via CAD [13] or virtual term they produce an approximation that allows the safety of the substitution [14], can give exact results for these, is it possible system to be determined. For example, the Hoare logic system to modify them to give faster under or over approximate uses inductive invariants; formula Inv(x) such that: answers? Computing a convex hull (or the dual, ‘convex I(x) ⇒ Inv(x) Inv(x) ∧ T (x, x0 ) =⇒ Inv(x0 ). interior’) is a way of merging, as is formula simplification. This is a sufficient condition for (1) and has the advantage that Out of all of the opportunities high-lighted this seems to be it is stated purely in terms of formulae, reducing the problem to the one the is most directly accessible and closest to existing existential second-order logic. Bounded model checking aims work in computer algebra. using a sequence of solutions of (2), first I(x), then 4) Reducing Second-Order Quantification: The preceding sections have reduced computing system correctness to finding I(x) ∨ (I(x∗ ) ∧ T (x∗ , x)), formulae with the required properties; effectively solving exis- next tential second-order logic. Templates are a commonly used (for example [15]) technique to reduce second-order quantification I(x) ∨ (I(x ) ∧ T (x , x)) ∨ (I(x∗∗ ) ∧ T (x∗∗ , x∗ ) ∧ T (x∗ , x)), ∗ ∗ to first-order quantification. For example, using a template of and so on. These do not have fixed-points or second-order l 6 x ∧ x 6 u, computing a (property directed) invariant can variables and thus can be solved with simple satisfiability be reduced to finding (vectors of) constants l and K such that: calls. Abstract interpretation can be seen as picking F RStepo I(x) ⇒ l 6x∧x6u (abstract transformer) and Inito (initial abstract state) in 0 such a way that LF P (F RStepo , Inito ) is computable in the l 6 x ∧ x 6 u ∧ T (x, x ) ⇒ l 6 x0 ∧ x0 6 u abstract domain. l 6x∧x6u ⇒ P (x) Opportunity: Finding fixed-point approximations. As with computing exact fixed-points, there are both theoretical and which requires a solver that can handle quantification alter- practical challenges. Finding new sufficient solutions to (1) nation but is purely first-order logic. In abstract interpretation and (2), especially those without fixed-points or, ideally, terms the template can be seen as characterising an abstract second-order quantification would likely yield new algorithms. domain [15]; the one given in the example is an interval A classification theorem of what solutions exist or necessary abstraction. conditions would also be of great interest. Practically, algo- Opportunity: Template algorithms. Algorithms for working rithms that compute over or under approximations, either using with specific templates are of great utility as evidenced by the one of the sufficient conditions (such as finding invariants) or range of abstract domains that are currently in use. Algorithms directly (as abstract interpretation does) are of great interest. that can work with arbitrary templates or templates that 3) Pre-Image, Post-Image and Merging: One approach to are monotonic with respect to the constants have received computing solutions to (1) and (2) is to perform stepwise some initial investigation but there are likely to be significant approximations. This requires approximating the pre or post theoretical and practical gains to be made. Efficient implemen- image of the transition relation and being able to merge two tations of such algorithms allow user specified templates but or more approximations. Pre and post image approximations also allow templates to be chosen automatically, effectively can be described as finding the strongest (setwise smallest) refining the abstraction. O(x) or the weakest (setwise largest) U (x) that satisfies the appropriate formulae (given either P re or P ost): IV. C ONCLUSION 0 0 • Forwards Over : P re(x) ∧ T (x, x ) ⇒ O(x ) Modern software verification techniques are heavily depen- 0 0 • Forwards Under : U (x ) ⇒ P re(x) ∧ T (x, x ) dent on efficient SMT solvers. Correspondingly SMT solver 0 0 • Backwards Over : P ost(x ) ∧ T (x, x ) ⇒ O(x) development is often motivated, inspired and justified by 0 0 • Backwards Under : U (x) ⇒ P ost(x ) ∧ T (x, x ) software verification applications. It is hoped that this paper Although these involve second-order quantification (we are acts as a guide for computer algebra researchers to understand searching for formula), they do not involve fixed-points and this synergy, and appreciate some of the places algebraic are significantly more practical. approaches could be fruitfully deployed and to get involved! R EFERENCES [1] D. Detlefs, G. Nelson, and J. B. Saxe, “Simplify: A theorem prover for program checking,” J. ACM, vol. 52, no. 3, pp. 365–473, May 2005. [2] H. Ganzinger, G. Hagen, R. Nieuwenhuis, A. Oliveras, and C. Tinelli, “DPLL(T): Fast decision procedures,” ser. LNCS, R. Alur and D. A. Peled, Eds., vol. 3114. Heidelberg, Germany: Springer-Verlag, July 2004, pp. 175–188. [3] C. Barrett, P. Fontaine, and C. Tinelli, “The SMT-LIB standard: Version 2.5.” [Online]. Available: http://smt-lib.org [4] L. Hadarean, K. Bansal, D. Jovanović, C. Barrett, and C. Tinelli, A Tale of Two Solvers: Eager and Lazy Approaches to Bit-Vectors. Cham: Springer International Publishing, 2014, pp. 680–695. [5] A. Zeljić, C. M. Wintersteiger, and P. Rümmer, Deciding Bit-Vector Formulas with mcSAT. Cham: Springer International Publishing, 2016, pp. 249–266. [6] M. Brain, L. Hadarean, D. Kroening, and R. Martins, “Automatic gen- eration of propagation complete sat encodings,” in Verification, Model Checking, and Abstract Interpretation, ser. Lecture Notes in Computer Science, B. Jobstmann and M. K. R. Leino, Eds., vol. 9583. Heidelberg, Germany: Springer-Verlag, January 2016, pp. 536–556. [7] M. Brain, C. Tinelli, P. Ruemmer, and T. Wahl, “An automatable formal semantics for ieee-754 floating-point arithmetic,” SMT-LIB, Tech. Rep., 2014. [Online]. Available: http://smt-lib.org/papers/BTRW14.pdf [8] ——, “An automatable formal semantics for ieee-754 floating-point arithmetic,” in IEEE Symposium on Computer Arithmetic, 2015. [9] S. Gao, S. Kong, and E. M. Clarke, dReal: An SMT Solver for Nonlinear Theories over the Reals. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 208–214. [10] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani, The MathSAT5 SMT Solver. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 93–107. [11] V. Ganesh and D. L. Dill, “A decision procedure for bit-vectors and arrays,” ser. Lecture Notes in Computer Science, W. Damm and H. Her- manns, Eds., vol. 4590. Heidelberg, Germany: Springer-Verlag, July 2007, pp. 519–531. [12] A. Tarski, “A lattice-theoretical fixpoint theorem and its applications.” Pacific J. Math., vol. 5, no. 2, pp. 285–309, 1955. [Online]. Available: http://projecteuclid.org/euclid.pjm/1103044538 [13] J. Davenport and M. England, “Recent Advances in Real Geometric Reasoning,” in Proceedings ADG 2014, F. Botana and P. Quaresma, Eds., 2015, pp. 37–52. [14] M. Kosta and T. Sturm, “A Generalized Framework for Virtual Substi- tution,” http://arxiv.org/abs/1501.05826, 2015. [15] M. Brain, S. Joshi, D. Kroening, and P. Schrammel, “Safety verification and refutation by k-invariants and k-induction ,” in Static Analysis, ser. Lecture Notes in Computer Science, S. Blazy and T. Jensen, Eds., no. 9291. Heidelberg, Germany: Springer-Verlag, September 2015, pp. 145–161.