=Paper= {{Paper |id=Vol-1804/paper-03 |storemode=property |title=Algebraic Techniques in Software Verification : Challenges and Opportunities |pdfUrl=https://ceur-ws.org/Vol-1804/paper-03.pdf |volume=Vol-1804 |authors=Martin Brain,Daniel Kroening,Ryan McCleeary |dblpUrl=https://dblp.org/rec/conf/synasc/BrainKM16 }} ==Algebraic Techniques in Software Verification : Challenges and Opportunities== https://ceur-ws.org/Vol-1804/paper-03.pdf
        Algebraic Techniques in Software Verification :
                 Challenges and Opportunities
                     Martin Brain                                 Daniel Kroening                                  Ryan McCleeary
                University of Oxford                           University of Oxford                              University of Iowa
               martin.brain@cs.ox.ac.uk                     daniel.kroening@cs.ox.ac.uk                      ryan-mccleeary@uiowa.edu



    Abstract—One of the main application areas and driving                       and it is not clear whether automating it is even possible. Con-
 forces behind the development of Satisfiability Modulo Theory                   sequentally, compared to directly human generated formulae,
 (SMT) solvers is software verification. The requirements of                     the formulae generated by verification tools are much larger
 software verification are somewhat different to other applications
 of automated reasoning, posing a number of challenges but                       and contain a significant amount of ‘simple’ and ‘irrelevant’
 also providing some interesting opportunities. This paper brings                parts.
 together and summarises the algebras and structures of interest,                   Opportunity: Exploit disequalities and if-then-else. Software
 along with some of the problems that are characteristic of                      verification problems tend to produce formulae with some
 software verification. It is hoped that this will allow computer                Boolean structure, particularly from the use of if-then-else
 algebra researchers to assess the applicability of their techniques
 to this challenging, but rewarding domain.                                      expression to model different paths of execution. One of the
    Software verification is the prototypical application domain                 advantages of the DPLL(T)[2] algorithm is that theory solvers
 for Satisfiability Modulo Theory (SMT) solvers. There are                       do not need to handle the Boolean structure of the formula.
 many aspects of the two research fields that show a signif-                     The SAT solver will provide a partial assignment to the
 icant degree of co-evolution. For example, the central role                     theory literals and the theory solver must determine if they are
 of theories (and the theories that are available – for example                  consistent. Unlike traditional algebra problems, this includes
 bit-vectors and arrays) can be seen as a formalisation of the                   equalities that are assigned to false and if-then-else operations.
 domain specific decision procedures that were used in early                     Exploiting this additional information is likely to improve
 verification systems [1]. Universal quantification is challenging               performance but may require (or enable) new approaches to
 for most SMT solver algorithms, leading to poor performance                     some problems.
 and thus many software verification systems avoid generating                       Challenge: Handle inequalities. Another feature common in
 quantifiers. Likewise the importance of model generation1 is                    software verification problems that is not classically algebraic
 in part driven but the utility of these models for providing                    is the use of inequalities. Although there are some applica-
 execution or error traces in verification systems. The co-                      tions, such as equivalence checking, that are possible without
 evolution can also be seen in the SMT-LIB benchmarks which                      inequalities, the frequency of ordering comparisons in most
 feature many benchmark collections generated by verification                    real software means these are a critical requirement.
 tools.                                                                                 II. A LGEBRAS AND S TRUCTURES OF I NTEREST
    This paper aims to highlight some of the requirements and
                                                                                    As well as having formula structure unlike conventional,
 ‘evolutionary pressures’ that software verification places on
                                                                                 human generated algebraic problems, software verification
 SMT solver development. It is hoped that this context will
                                                                                 problems also tend to use unconventional algebras and rela-
 help computer algebra researchers to identify, develop and
                                                                                 tional structures. A minority of software verification systems
 refine algorithms so that they can demonstrate impact on
                                                                                 use integers2 and reals to model variables, particularly those
 commercial-scale software verification problems. The topics
                                                                                 focused on algorithm rather than program verification. How-
 raised are a mix between challenges that have to be overcome
                                                                                 ever this approach seems to becoming less common as these
 and opportunities in under-explored / critical areas.
                                                                                 do not capture the full behaviour of real systems (overflow,
                          I. F ORMULA S HAPE                                     rounding, etc.) and also increase the complexity of the decision
                                                                                 procedure.
    Challenge: Tolerate irrelevant formulae. The formulae gen-
 erated by verification tools tend to be very different from those               A. Bit-Vectors
 generated by human modelling of problems. One of the real                          Core to the performance of SMT solvers on software
 challenges of mathematical modelling is reducing the problem                    verification problems is their handling of the theory of fixed
 to a minimal core which captures the key challenge. This is a                   width bit-vector[3]. Variables in this theory are interpreted as
 process which requires considerable intuition and experience                    bit-vectors of a given size, operations are a mix of (modular)
   1 Generating a model or witness is often regard as part of a satisfiability     2 Sometimes referred to as “mathematical integers” to distinguish them from
 check, particularly for Boolean satisfiability but is not formally required.    the integer types in programming languages.


Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: E. Ábrahám, J. Davenport, P. Fontaine, (eds.): Proceedings of the 1st Workshop on Satisfiability Checking and Symbolic Computation (SC2 ),
Timisoara, Romania, 02-07-2016, published at http://ceur-ws.org
arithmetic (plus, multiply, divide, etc.), logical operations (and,   with e exponent and s significand bits, plus R∗ an extension
xor, shift, etc.) and structural (extract, concatinate, etc.) with    of R with infinities and not-a-number. A family of order-
predicates for signed and unsigned comparisons. Although              preserving functions vFe,s : Fe,s → R∗ give the real value of
there have been some interesting alternatives proposed[4], [5],       a floating-point number and round picks between the adjoints
most solvers still use “bit-blasting”; converting the expressions     of v and applies one, to map back into Fe,s , thus:
to a Boolean circuit and using a SAT solver. Optimal circuits
                                                                             f ⊕r g          ;         round(r, v(f) + v(g)).
are known for some operations[6] but others such as mul-
tiplication, division and variable shifts introduce additional        This approach allows bit-exact specification of floating-point
difficulty which can, in some cases, cause the solver to              numbers with a minimal number of edge-cases and a maximal
time-out for even “simple” queries. Given this fundamental            amount of algebraic structure.
limitation of current solvers and the significance of bit-vectors        Opportunity: Algebraic techniques on R∗ . vFe,s has rela-
to application performance, better handling of these formula          tively little algebraic structure – floating-point addition and
would be of major benefit.                                            multiplication are famously non-associative. As not-a-number
   Opportunity: Sub-algebras. The range of operations in the          is absorbing for all operations, R∗ is not a field but it is
theory of bit-vectors makes it hard to find decision procedures       an additive and multiplicative commutative monoid with the
that work well for arbitrary formulae. However for many               associativity property [8] and so semi-ring decision procedures
applications, groups of variables will only ever use a handful        may be applicable.
of operations. For example, variables used as counters will              Opportunity: Mixed real and float. If v and round can be
only assign constants, increment and check against bounds,            handled then this may not only be able to handle floating-point
variables that are used as bitmaps are rarely used in multi-          problems but also mixed float and real problems which are of
plications and the bitwise operations form a Boolean algebra,         considerable commercial interest.
many modern cryptographic algorithms only use add, roll and           C. Differential Theories
xor (ARX) and floating-point operations make heavy use of a
max-plus algebra with signed comparison. Thus finding useful             Another interest driven by control systems is that of rea-
sub-signatures of the theory of bit-vectors, identifying their        soning about differential equations. Most control systems are
algebraic structure and then building decision procedures that        developed with respect to a model of the environment with
exploit this structure seems like a promising research direction.     which they are interacting. This is normally formalised as a
                                                                      system of differential equations. However due to limitations
B. Floating-Point                                                     of the solver technology this is not directly used in the
   Many software verification applications concern control            verification, instead a computational model is built. If solvers
systems which must use sensor data to control a real-world            could handle differential equations of some form, it could
system (aircraft, train, industrial robots, UAVs, autonomous          significantly increase the ability of tools to verify these cyber-
cars, “cyber-physical systems”, etc.). Pure integer and fixed-        physical systems.
point control systems are becoming a rarity with most control            Opportunity: Symbolic algorithms for differential systems.
systems using floating-point. Thus there is a significant com-        dReal [9] has made some impressive progress using numerical
mercial drive for verification systems to handle floating-point       methods for handling differential equations. Using symbolic
numbers efficiently and effectively.                                  algorithms has yet to be explored but may have great potential.
   The so-called “standard model” converts floating-point ex-                         III. B EYOND S ATISFIABILITY
pressions to real expressions. For example, if f and g are
                                                                        Determining formula satisfiability is the core role of SMT
floating-point variables and ⊕r denotes floating-point addition
                                                                      solvers. However there are a number of similar and related
with rounding mode r:
                                                                      symbolic reasoning tasks, both inside and outside the solver,
    f ⊕r g          ;          (f + g)(1 + δ)          |δ| 6         that have significant practical impact and present interesting
                                                                      opportunities for applying algebraic techniques.
where  is a small, format dependent constant. Although
this model is simple, it has a number of limitations. An              A. Expression Simplification
extensive set of side condition are required to correctly model          Expression simplification is a vital component in most
actual hardware (no overflows, no subnormal numbers, etc.)            SMT solvers and program verification systems. It is used for
and often these are the precise conditions we are trying to           performing constant folding ((x + 1) + 1 ; x + 2),
identify. If these are used then the standard model is an over-       simplifications (x <= x ; true) and normalisation
approximation of the behaviour of hardware, meaning that              (!(x >= y) ; x < y). Although the rewriting steps are
spurious SAT results can be given. Worse from an application          generally simple, their effects can be drastic. For example,
point of view is that it is difficult to generate the bit-exact       given unsigned integers x, y and z of more than 16 bit:
traces needed to assess, diagnose and fix real systems.
                                                                           assert((x * y) * z == x * (y * z));
   As a consequence, a theory of floating-point numbers[7]
has been developed and added to the SMT-LIB standard. Its             will cause most bit-vector decision procedures to time out but
formalisation makes use of sets Fe,s of floating-point numbers        can be trivially resolved by a distributivity aware simplifier.
    Opportunity: Improved simplification For such a key com-        fixed-points which form a complete lattice [12]. We use lf p
ponent of solvers, there is relatively little in the conventional   to denote the least fixed-point of a function and define LF P
published literature. There are papers that mention the number      to give the least fixed-point above a given set (S):
of simplification rules used [10] or discuss their importance
to particular applications [11]. However there seems to be          LF P : M ono(2statesx,v → 2statesx,v ) × 2statesx,v → 2statesx,v
little discussion of whether these sets are complete, whether
the popular architecture (stateless / context-free rewrites im-
                                                                               LF P (f )(S)    =    lf p(λX.f (X ∪ S))
plemented as ad-hoc tree-walks) is best, whether they could
or should be used during the solve process rather than just            For simplicity we will consider safety properties; those
before. Given the similarity to rule-based differentiation and      which are false if there is an execution trace from an initial
integration algorithms, there might be techniques, best prac-       state to an unsafe state. P (x) is a formula describing the set
tices and ideas from the computer algebra community that            of safe states and thus:
could revolutionise the role of expression simplification in
SMT solver.                                                                      F Reach      =    LF P (F RStep, JI(x)K)
B. Fixed-Points and Approximation                                                BReach       =    LF P (BRStep, J¬P (x)K)
   Current algorithms and verification systems are often built            system safe         ⇔ F Reach ⊆ JP (x)K
“on top of” SMT solvers. The solver (or solvers) are the lowest           system safe         ⇔ BReach ∩ JI(x)K = ∅
level component in the algorithm and satisfiability queries are
used as a way of checking inclusion or intersection between           If a formula describing F Reach or BReach can be found,
sets (represented symbolically, using equations). This architec-    then showing system safety reduces to a single satisfiability
ture has proven flexible and effective. However if the reasoning    check. However there are no guarantees that the fixed-point
system can perform more than just satisfiability queries it can     can be described by a formulae and even when they can,
be used at a higher level within verification algorithms. This      there are few algorithms that can compute them. For example,
section presents verification algorithms ‘from the top down’,       consider the following two loops, with variables in N, one
highlighting the other kinds of symbolic reasoning task that        has a simple, computable fix-point, the other remains an open
arise.                                                              question:
   1) The Verification Question: We present a simple model          while (i < n) {
of software verification using the transition systems view.           a[i] = n;
This is perhaps idiomatic of the model checking community,          }
especially hardware model checking but it nicely illustrates
some of the key challenges.                                         while (i != 1) {
   Let x be a set of variables and v a set of values. A map           if (i % 2 == 0) i = i/2;
s : x → v that assigns value to variables is referred to as a         else i = 3*1 + 1;
state (in the context of transition systems) or a valuation (in     }
the context of logical formulae). Let statesx,v denote x → v,
the space of all such maps. Critical to symbolic verification           Opportunity: Finding exact fixed-points. There are many
is the idea that we can use a formula (over x) to represent a       interesting questions related to exact descriptions of fixed-
set of states. To make the distinction between sets represented     points; both theoretical (When does an expression over lan-
by formulae and sets described by other means, we will use          guage L have a fixed-point representable in L’? Is finding it
JF (x)K to denote the set of all valuations (states) that satisfy   computable?) and practical (What are algorithms to compute
formula F (x).                                                      them? Are there subsets of expressions for which fixed-point
   A symbolic representation of a transition system requires a      formula can be easily found?).
formula for the initial states, I(x) and a formula that describes       2) Approximate Approachs to Verification: When faced
the transition relation between sets T (x, x0 ). As our interest    with a intractable or computationally expensive problem, a
will be in the sets of states which are reachable from a given      common approach in software verification is to approximate.
set of states, we define forwards and backwards reachability        If:
steps:
                                                                         F Reach ⊂ O ∧ O ⊆ JP (x)K ⇒ system safe
                       F RStep : 2statesx,v → 2statesx,v
                                                                      U ⊆ F Reach ∧ ¬(U ⊆ JP (x)K ⇒           ¬system safe
   F RStep(S)      =   S ∪ {t ∈ 2statesx,v |T (s, t) ∧ s ∈ S}
                       BRStep : 2statesx,v → 2statesx,v               Thus we have reduced the problem from computing fixed-
   BRStep(S)       =   S ∪ {t ∈ 2 statesx,v
                                              |T (t, s) ∧ s ∈ S}    points to finding (formulae that describe) sets of states that
                                                                    approximate the fix-points. From the definition of F Reach
Both of these are monotonic and increasing on 2statesx,v , a        we have necessary and sufficient conditions for over and under
complete lattice, so by the Knaster-Tarski theorem they have        approximations:
                                                                          Merging of approximations is needed when multiple control
                                                                       flow paths converge, for example after an if statement or after
                     LF P (F RStepo , Inito ) ⊆ O
                                                                       a loop. It is sufficient to take the disjunction of the two formula
         JI(x)K ⊆ Inito ∧ F RStep 6 F RStepo                    (1)    but this tends to lead to unacceptable growth in formula size.
                                                                       Thus it is useful to be able to find the smallest O(x) or largest
                     U ⊆ LF P (F RStepu , Initu )                      U (x) such that:
         Initu ⊆ JI(x)K ∧ F RStepu 6 F RStep                    (2)        O1 (x) ∨ O2 (x) ⇒ O(x)          U (x) ⇒ U1 (x) ∨ U2 (x)
   Different approaches to verification can be seen as different
                                                                          Opportunity: Step-wise Approximations. Computing abstract
ways of finding solutions to equations (1) or (2). Some of
                                                                       pre and post-images can clearly be seen in algebraic terms.
these are known as property directed and make of P (x) so
                                                                       Quantifier elimination, either via CAD [13] or virtual term
they produce an approximation that allows the safety of the
                                                                       substitution [14], can give exact results for these, is it possible
system to be determined. For example, the Hoare logic system
                                                                       to modify them to give faster under or over approximate
uses inductive invariants; formula Inv(x) such that:
                                                                       answers? Computing a convex hull (or the dual, ‘convex
  I(x) ⇒ Inv(x)            Inv(x) ∧ T (x, x0 ) =⇒ Inv(x0 ).            interior’) is a way of merging, as is formula simplification.
This is a sufficient condition for (1) and has the advantage that      Out of all of the opportunities high-lighted this seems to be
it is stated purely in terms of formulae, reducing the problem to      the one the is most directly accessible and closest to existing
existential second-order logic. Bounded model checking aims            work in computer algebra.
using a sequence of solutions of (2), first I(x), then                    4) Reducing Second-Order Quantification: The preceding
                                                                       sections have reduced computing system correctness to finding
                   I(x) ∨ (I(x∗ ) ∧ T (x∗ , x)),                       formulae with the required properties; effectively solving exis-
                               next                                    tential second-order logic. Templates are a commonly used (for
                                                                       example [15]) technique to reduce second-order quantification
I(x) ∨ (I(x ) ∧ T (x , x)) ∨ (I(x∗∗ ) ∧ T (x∗∗ , x∗ ) ∧ T (x∗ , x)),
            ∗         ∗
                                                                       to first-order quantification. For example, using a template of
and so on. These do not have fixed-points or second-order              l 6 x ∧ x 6 u, computing a (property directed) invariant can
variables and thus can be solved with simple satisfiability            be reduced to finding (vectors of) constants l and K such that:
calls. Abstract interpretation can be seen as picking F RStepo
                                                                                                    I(x) ⇒       l 6x∧x6u
(abstract transformer) and Inito (initial abstract state) in                                           0
such a way that LF P (F RStepo , Inito ) is computable in the                 l 6 x ∧ x 6 u ∧ T (x, x ) ⇒        l 6 x0 ∧ x0 6 u
abstract domain.                                                                        l 6x∧x6u ⇒               P (x)
   Opportunity: Finding fixed-point approximations. As with
computing exact fixed-points, there are both theoretical and           which requires a solver that can handle quantification alter-
practical challenges. Finding new sufficient solutions to (1)          nation but is purely first-order logic. In abstract interpretation
and (2), especially those without fixed-points or, ideally,            terms the template can be seen as characterising an abstract
second-order quantification would likely yield new algorithms.         domain [15]; the one given in the example is an interval
A classification theorem of what solutions exist or necessary          abstraction.
conditions would also be of great interest. Practically, algo-            Opportunity: Template algorithms. Algorithms for working
rithms that compute over or under approximations, either using         with specific templates are of great utility as evidenced by the
one of the sufficient conditions (such as finding invariants) or       range of abstract domains that are currently in use. Algorithms
directly (as abstract interpretation does) are of great interest.      that can work with arbitrary templates or templates that
   3) Pre-Image, Post-Image and Merging: One approach to               are monotonic with respect to the constants have received
computing solutions to (1) and (2) is to perform stepwise              some initial investigation but there are likely to be significant
approximations. This requires approximating the pre or post            theoretical and practical gains to be made. Efficient implemen-
image of the transition relation and being able to merge two           tations of such algorithms allow user specified templates but
or more approximations. Pre and post image approximations              also allow templates to be chosen automatically, effectively
can be described as finding the strongest (setwise smallest)           refining the abstraction.
O(x) or the weakest (setwise largest) U (x) that satisfies the
appropriate formulae (given either P re or P ost):                                            IV. C ONCLUSION
                                         0         0
   • Forwards Over : P re(x) ∧ T (x, x ) ⇒ O(x )                          Modern software verification techniques are heavily depen-
                              0                      0
   • Forwards Under : U (x ) ⇒ P re(x) ∧ T (x, x )                     dent on efficient SMT solvers. Correspondingly SMT solver
                                  0           0
   • Backwards Over : P ost(x ) ∧ T (x, x ) ⇒ O(x)                     development is often motivated, inspired and justified by
                                            0          0
   • Backwards Under : U (x) ⇒ P ost(x ) ∧ T (x, x )                   software verification applications. It is hoped that this paper
Although these involve second-order quantification (we are             acts as a guide for computer algebra researchers to understand
searching for formula), they do not involve fixed-points and           this synergy, and appreciate some of the places algebraic
are significantly more practical.                                      approaches could be fruitfully deployed and to get involved!
                              R EFERENCES
 [1] D. Detlefs, G. Nelson, and J. B. Saxe, “Simplify: A theorem prover for
     program checking,” J. ACM, vol. 52, no. 3, pp. 365–473, May 2005.
 [2] H. Ganzinger, G. Hagen, R. Nieuwenhuis, A. Oliveras, and C. Tinelli,
     “DPLL(T): Fast decision procedures,” ser. LNCS, R. Alur and D. A.
     Peled, Eds., vol. 3114. Heidelberg, Germany: Springer-Verlag, July
     2004, pp. 175–188.
 [3] C. Barrett, P. Fontaine, and C. Tinelli, “The SMT-LIB standard: Version
     2.5.” [Online]. Available: http://smt-lib.org
 [4] L. Hadarean, K. Bansal, D. Jovanović, C. Barrett, and C. Tinelli, A Tale
     of Two Solvers: Eager and Lazy Approaches to Bit-Vectors. Cham:
     Springer International Publishing, 2014, pp. 680–695.
 [5] A. Zeljić, C. M. Wintersteiger, and P. Rümmer, Deciding Bit-Vector
     Formulas with mcSAT. Cham: Springer International Publishing, 2016,
     pp. 249–266.
 [6] M. Brain, L. Hadarean, D. Kroening, and R. Martins, “Automatic gen-
     eration of propagation complete sat encodings,” in Verification, Model
     Checking, and Abstract Interpretation, ser. Lecture Notes in Computer
     Science, B. Jobstmann and M. K. R. Leino, Eds., vol. 9583. Heidelberg,
     Germany: Springer-Verlag, January 2016, pp. 536–556.
 [7] M. Brain, C. Tinelli, P. Ruemmer, and T. Wahl, “An automatable formal
     semantics for ieee-754 floating-point arithmetic,” SMT-LIB, Tech. Rep.,
     2014. [Online]. Available: http://smt-lib.org/papers/BTRW14.pdf
 [8] ——, “An automatable formal semantics for ieee-754 floating-point
     arithmetic,” in IEEE Symposium on Computer Arithmetic, 2015.
 [9] S. Gao, S. Kong, and E. M. Clarke, dReal: An SMT Solver for
     Nonlinear Theories over the Reals. Berlin, Heidelberg: Springer Berlin
     Heidelberg, 2013, pp. 208–214.
[10] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani, The
     MathSAT5 SMT Solver. Berlin, Heidelberg: Springer Berlin Heidelberg,
     2013, pp. 93–107.
[11] V. Ganesh and D. L. Dill, “A decision procedure for bit-vectors and
     arrays,” ser. Lecture Notes in Computer Science, W. Damm and H. Her-
     manns, Eds., vol. 4590. Heidelberg, Germany: Springer-Verlag, July
     2007, pp. 519–531.
[12] A. Tarski, “A lattice-theoretical fixpoint theorem and its applications.”
     Pacific J. Math., vol. 5, no. 2, pp. 285–309, 1955. [Online]. Available:
     http://projecteuclid.org/euclid.pjm/1103044538
[13] J. Davenport and M. England, “Recent Advances in Real Geometric
     Reasoning,” in Proceedings ADG 2014, F. Botana and P. Quaresma,
     Eds., 2015, pp. 37–52.
[14] M. Kosta and T. Sturm, “A Generalized Framework for Virtual Substi-
     tution,” http://arxiv.org/abs/1501.05826, 2015.
[15] M. Brain, S. Joshi, D. Kroening, and P. Schrammel, “Safety verification
     and refutation by k-invariants and k-induction ,” in Static Analysis, ser.
     Lecture Notes in Computer Science, S. Blazy and T. Jensen, Eds., no.
     9291. Heidelberg, Germany: Springer-Verlag, September 2015, pp.
     145–161.