=Paper= {{Paper |id=Vol-1804/paper-03 |storemode=property |title=Algebraic Techniques in Software Verification : Challenges and Opportunities |pdfUrl=https://ceur-ws.org/Vol-1804/paper-03.pdf |volume=Vol-1804 |authors=Martin Brain,Daniel Kroening,Ryan McCleeary |dblpUrl=https://dblp.org/rec/conf/synasc/BrainKM16 }} ==Algebraic Techniques in Software Verification : Challenges and Opportunities== https://ceur-ws.org/Vol-1804/paper-03.pdf

Algebraic Techniques in Software Verification :
Challenges and Opportunities
Martin Brain Daniel Kroening Ryan McCleeary
University of Oxford University of Oxford University of Iowa
martin.brain@cs.ox.ac.uk daniel.kroening@cs.ox.ac.uk ryan-mccleeary@uiowa.edu

Abstract—One of the main application areas and driving and it is not clear whether automating it is even possible. Con-
forces behind the development of Satisfiability Modulo Theory sequentally, compared to directly human generated formulae,
(SMT) solvers is software verification. The requirements of the formulae generated by verification tools are much larger
software verification are somewhat different to other applications
of automated reasoning, posing a number of challenges but and contain a significant amount of ‘simple’ and ‘irrelevant’
also providing some interesting opportunities. This paper brings parts.
together and summarises the algebras and structures of interest, Opportunity: Exploit disequalities and if-then-else. Software
along with some of the problems that are characteristic of verification problems tend to produce formulae with some
software verification. It is hoped that this will allow computer Boolean structure, particularly from the use of if-then-else
algebra researchers to assess the applicability of their techniques
to this challenging, but rewarding domain. expression to model different paths of execution. One of the
Software verification is the prototypical application domain advantages of the DPLL(T)[2] algorithm is that theory solvers
for Satisfiability Modulo Theory (SMT) solvers. There are do not need to handle the Boolean structure of the formula.
many aspects of the two research fields that show a signif- The SAT solver will provide a partial assignment to the
icant degree of co-evolution. For example, the central role theory literals and the theory solver must determine if they are
of theories (and the theories that are available – for example consistent. Unlike traditional algebra problems, this includes
bit-vectors and arrays) can be seen as a formalisation of the equalities that are assigned to false and if-then-else operations.
domain specific decision procedures that were used in early Exploiting this additional information is likely to improve
verification systems [1]. Universal quantification is challenging performance but may require (or enable) new approaches to
for most SMT solver algorithms, leading to poor performance some problems.
and thus many software verification systems avoid generating Challenge: Handle inequalities. Another feature common in
quantifiers. Likewise the importance of model generation1 is software verification problems that is not classically algebraic
in part driven but the utility of these models for providing is the use of inequalities. Although there are some applica-
execution or error traces in verification systems. The co- tions, such as equivalence checking, that are possible without
evolution can also be seen in the SMT-LIB benchmarks which inequalities, the frequency of ordering comparisons in most
feature many benchmark collections generated by verification real software means these are a critical requirement.
tools. II. A LGEBRAS AND S TRUCTURES OF I NTEREST
This paper aims to highlight some of the requirements and
As well as having formula structure unlike conventional,
‘evolutionary pressures’ that software verification places on
human generated algebraic problems, software verification
SMT solver development. It is hoped that this context will
problems also tend to use unconventional algebras and rela-
help computer algebra researchers to identify, develop and
tional structures. A minority of software verification systems
refine algorithms so that they can demonstrate impact on
use integers2 and reals to model variables, particularly those
commercial-scale software verification problems. The topics
focused on algorithm rather than program verification. How-
raised are a mix between challenges that have to be overcome
ever this approach seems to becoming less common as these
and opportunities in under-explored / critical areas.
do not capture the full behaviour of real systems (overflow,
I. F ORMULA S HAPE rounding, etc.) and also increase the complexity of the decision
procedure.
Challenge: Tolerate irrelevant formulae. The formulae gen-
erated by verification tools tend to be very different from those A. Bit-Vectors
generated by human modelling of problems. One of the real Core to the performance of SMT solvers on software
challenges of mathematical modelling is reducing the problem verification problems is their handling of the theory of fixed
to a minimal core which captures the key challenge. This is a width bit-vector[3]. Variables in this theory are interpreted as
process which requires considerable intuition and experience bit-vectors of a given size, operations are a mix of (modular)
1 Generating a model or witness is often regard as part of a satisfiability 2 Sometimes referred to as “mathematical integers” to distinguish them from
check, particularly for Boolean satisfiability but is not formally required. the integer types in programming languages.

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: E. Ábrahám, J. Davenport, P. Fontaine, (eds.): Proceedings of the 1st Workshop on Satisfiability Checking and Symbolic Computation (SC2 ),
Timisoara, Romania, 02-07-2016, published at http://ceur-ws.org
arithmetic (plus, multiply, divide, etc.), logical operations (and, with e exponent and s significand bits, plus R∗ an extension
xor, shift, etc.) and structural (extract, concatinate, etc.) with of R with infinities and not-a-number. A family of order-
predicates for signed and unsigned comparisons. Although preserving functions vFe,s : Fe,s → R∗ give the real value of
there have been some interesting alternatives proposed[4], [5], a floating-point number and round picks between the adjoints
most solvers still use “bit-blasting”; converting the expressions of v and applies one, to map back into Fe,s , thus:
to a Boolean circuit and using a SAT solver. Optimal circuits
f ⊕r g ; round(r, v(f) + v(g)).
are known for some operations[6] but others such as mul-
tiplication, division and variable shifts introduce additional This approach allows bit-exact specification of floating-point
difficulty which can, in some cases, cause the solver to numbers with a minimal number of edge-cases and a maximal
time-out for even “simple” queries. Given this fundamental amount of algebraic structure.
limitation of current solvers and the significance of bit-vectors Opportunity: Algebraic techniques on R∗ . vFe,s has rela-
to application performance, better handling of these formula tively little algebraic structure – floating-point addition and
would be of major benefit. multiplication are famously non-associative. As not-a-number
Opportunity: Sub-algebras. The range of operations in the is absorbing for all operations, R∗ is not a field but it is
theory of bit-vectors makes it hard to find decision procedures an additive and multiplicative commutative monoid with the
that work well for arbitrary formulae. However for many associativity property [8] and so semi-ring decision procedures
applications, groups of variables will only ever use a handful may be applicable.
of operations. For example, variables used as counters will Opportunity: Mixed real and float. If v and round can be
only assign constants, increment and check against bounds, handled then this may not only be able to handle floating-point
variables that are used as bitmaps are rarely used in multi- problems but also mixed float and real problems which are of
plications and the bitwise operations form a Boolean algebra, considerable commercial interest.
many modern cryptographic algorithms only use add, roll and C. Differential Theories
xor (ARX) and floating-point operations make heavy use of a
max-plus algebra with signed comparison. Thus finding useful Another interest driven by control systems is that of rea-
sub-signatures of the theory of bit-vectors, identifying their soning about differential equations. Most control systems are
algebraic structure and then building decision procedures that developed with respect to a model of the environment with
exploit this structure seems like a promising research direction. which they are interacting. This is normally formalised as a
system of differential equations. However due to limitations
B. Floating-Point of the solver technology this is not directly used in the
Many software verification applications concern control verification, instead a computational model is built. If solvers
systems which must use sensor data to control a real-world could handle differential equations of some form, it could
system (aircraft, train, industrial robots, UAVs, autonomous significantly increase the ability of tools to verify these cyber-
cars, “cyber-physical systems”, etc.). Pure integer and fixed- physical systems.
point control systems are becoming a rarity with most control Opportunity: Symbolic algorithms for differential systems.
systems using floating-point. Thus there is a significant com- dReal [9] has made some impressive progress using numerical
mercial drive for verification systems to handle floating-point methods for handling differential equations. Using symbolic
numbers efficiently and effectively. algorithms has yet to be explored but may have great potential.
The so-called “standard model” converts floating-point ex- III. B EYOND S ATISFIABILITY
pressions to real expressions. For example, if f and g are
Determining formula satisfiability is the core role of SMT
floating-point variables and ⊕r denotes floating-point addition
solvers. However there are a number of similar and related
with rounding mode r:
symbolic reasoning tasks, both inside and outside the solver,
f ⊕r g ; (f + g)(1 + δ) |δ| 6 that have significant practical impact and present interesting
opportunities for applying algebraic techniques.
where is a small, format dependent constant. Although
this model is simple, it has a number of limitations. An A. Expression Simplification
extensive set of side condition are required to correctly model Expression simplification is a vital component in most
actual hardware (no overflows, no subnormal numbers, etc.) SMT solvers and program verification systems. It is used for
and often these are the precise conditions we are trying to performing constant folding ((x + 1) + 1 ; x + 2),
identify. If these are used then the standard model is an over- simplifications (x <= x ; true) and normalisation
approximation of the behaviour of hardware, meaning that (!(x >= y) ; x < y). Although the rewriting steps are
spurious SAT results can be given. Worse from an application generally simple, their effects can be drastic. For example,
point of view is that it is difficult to generate the bit-exact given unsigned integers x, y and z of more than 16 bit:
traces needed to assess, diagnose and fix real systems.
assert((x * y) * z == x * (y * z));
As a consequence, a theory of floating-point numbers[7]
has been developed and added to the SMT-LIB standard. Its will cause most bit-vector decision procedures to time out but
formalisation makes use of sets Fe,s of floating-point numbers can be trivially resolved by a distributivity aware simplifier.
Opportunity: Improved simplification For such a key com- fixed-points which form a complete lattice [12]. We use lf p
ponent of solvers, there is relatively little in the conventional to denote the least fixed-point of a function and define LF P
published literature. There are papers that mention the number to give the least fixed-point above a given set (S):
of simplification rules used [10] or discuss their importance
to particular applications [11]. However there seems to be LF P : M ono(2statesx,v → 2statesx,v ) × 2statesx,v → 2statesx,v
little discussion of whether these sets are complete, whether
the popular architecture (stateless / context-free rewrites im-
LF P (f )(S) = lf p(λX.f (X ∪ S))
plemented as ad-hoc tree-walks) is best, whether they could
or should be used during the solve process rather than just For simplicity we will consider safety properties; those
before. Given the similarity to rule-based differentiation and which are false if there is an execution trace from an initial
integration algorithms, there might be techniques, best prac- state to an unsafe state. P (x) is a formula describing the set
tices and ideas from the computer algebra community that of safe states and thus:
could revolutionise the role of expression simplification in
SMT solver. F Reach = LF P (F RStep, JI(x)K)
B. Fixed-Points and Approximation BReach = LF P (BRStep, J¬P (x)K)
Current algorithms and verification systems are often built system safe ⇔ F Reach ⊆ JP (x)K
“on top of” SMT solvers. The solver (or solvers) are the lowest system safe ⇔ BReach ∩ JI(x)K = ∅
level component in the algorithm and satisfiability queries are
used as a way of checking inclusion or intersection between If a formula describing F Reach or BReach can be found,
sets (represented symbolically, using equations). This architec- then showing system safety reduces to a single satisfiability
ture has proven flexible and effective. However if the reasoning check. However there are no guarantees that the fixed-point
system can perform more than just satisfiability queries it can can be described by a formulae and even when they can,
be used at a higher level within verification algorithms. This there are few algorithms that can compute them. For example,
section presents verification algorithms ‘from the top down’, consider the following two loops, with variables in N, one
highlighting the other kinds of symbolic reasoning task that has a simple, computable fix-point, the other remains an open
arise. question:
1) The Verification Question: We present a simple model while (i < n) {
of software verification using the transition systems view. a[i] = n;
This is perhaps idiomatic of the model checking community, }
especially hardware model checking but it nicely illustrates
some of the key challenges. while (i != 1) {
Let x be a set of variables and v a set of values. A map if (i % 2 == 0) i = i/2;
s : x → v that assigns value to variables is referred to as a else i = 3*1 + 1;
state (in the context of transition systems) or a valuation (in }
the context of logical formulae). Let statesx,v denote x → v,
the space of all such maps. Critical to symbolic verification Opportunity: Finding exact fixed-points. There are many
is the idea that we can use a formula (over x) to represent a interesting questions related to exact descriptions of fixed-
set of states. To make the distinction between sets represented points; both theoretical (When does an expression over lan-
by formulae and sets described by other means, we will use guage L have a fixed-point representable in L’? Is finding it
JF (x)K to denote the set of all valuations (states) that satisfy computable?) and practical (What are algorithms to compute
formula F (x). them? Are there subsets of expressions for which fixed-point
A symbolic representation of a transition system requires a formula can be easily found?).
formula for the initial states, I(x) and a formula that describes 2) Approximate Approachs to Verification: When faced
the transition relation between sets T (x, x0 ). As our interest with a intractable or computationally expensive problem, a
will be in the sets of states which are reachable from a given common approach in software verification is to approximate.
set of states, we define forwards and backwards reachability If:
steps:
F Reach ⊂ O ∧ O ⊆ JP (x)K ⇒ system safe
F RStep : 2statesx,v → 2statesx,v
U ⊆ F Reach ∧ ¬(U ⊆ JP (x)K ⇒ ¬system safe
F RStep(S) = S ∪ {t ∈ 2statesx,v |T (s, t) ∧ s ∈ S}
BRStep : 2statesx,v → 2statesx,v Thus we have reduced the problem from computing fixed-
BRStep(S) = S ∪ {t ∈ 2 statesx,v
|T (t, s) ∧ s ∈ S} points to finding (formulae that describe) sets of states that
approximate the fix-points. From the definition of F Reach
Both of these are monotonic and increasing on 2statesx,v , a we have necessary and sufficient conditions for over and under
complete lattice, so by the Knaster-Tarski theorem they have approximations:
Merging of approximations is needed when multiple control
flow paths converge, for example after an if statement or after
LF P (F RStepo , Inito ) ⊆ O
a loop. It is sufficient to take the disjunction of the two formula
JI(x)K ⊆ Inito ∧ F RStep 6 F RStepo (1) but this tends to lead to unacceptable growth in formula size.
Thus it is useful to be able to find the smallest O(x) or largest
U ⊆ LF P (F RStepu , Initu ) U (x) such that:
Initu ⊆ JI(x)K ∧ F RStepu 6 F RStep (2) O1 (x) ∨ O2 (x) ⇒ O(x) U (x) ⇒ U1 (x) ∨ U2 (x)
Different approaches to verification can be seen as different
Opportunity: Step-wise Approximations. Computing abstract
ways of finding solutions to equations (1) or (2). Some of
pre and post-images can clearly be seen in algebraic terms.
these are known as property directed and make of P (x) so
Quantifier elimination, either via CAD [13] or virtual term
they produce an approximation that allows the safety of the
substitution [14], can give exact results for these, is it possible
system to be determined. For example, the Hoare logic system
to modify them to give faster under or over approximate
uses inductive invariants; formula Inv(x) such that:
answers? Computing a convex hull (or the dual, ‘convex
I(x) ⇒ Inv(x) Inv(x) ∧ T (x, x0 ) =⇒ Inv(x0 ). interior’) is a way of merging, as is formula simplification.
This is a sufficient condition for (1) and has the advantage that Out of all of the opportunities high-lighted this seems to be
it is stated purely in terms of formulae, reducing the problem to the one the is most directly accessible and closest to existing
existential second-order logic. Bounded model checking aims work in computer algebra.
using a sequence of solutions of (2), first I(x), then 4) Reducing Second-Order Quantification: The preceding
sections have reduced computing system correctness to finding
I(x) ∨ (I(x∗ ) ∧ T (x∗ , x)), formulae with the required properties; effectively solving exis-
next tential second-order logic. Templates are a commonly used (for
example [15]) technique to reduce second-order quantification
I(x) ∨ (I(x ) ∧ T (x , x)) ∨ (I(x∗∗ ) ∧ T (x∗∗ , x∗ ) ∧ T (x∗ , x)),
∗ ∗
to first-order quantification. For example, using a template of
and so on. These do not have fixed-points or second-order l 6 x ∧ x 6 u, computing a (property directed) invariant can
variables and thus can be solved with simple satisfiability be reduced to finding (vectors of) constants l and K such that:
calls. Abstract interpretation can be seen as picking F RStepo
I(x) ⇒ l 6x∧x6u
(abstract transformer) and Inito (initial abstract state) in 0
such a way that LF P (F RStepo , Inito ) is computable in the l 6 x ∧ x 6 u ∧ T (x, x ) ⇒ l 6 x0 ∧ x0 6 u
abstract domain. l 6x∧x6u ⇒ P (x)
Opportunity: Finding fixed-point approximations. As with
computing exact fixed-points, there are both theoretical and which requires a solver that can handle quantification alter-
practical challenges. Finding new sufficient solutions to (1) nation but is purely first-order logic. In abstract interpretation
and (2), especially those without fixed-points or, ideally, terms the template can be seen as characterising an abstract
second-order quantification would likely yield new algorithms. domain [15]; the one given in the example is an interval
A classification theorem of what solutions exist or necessary abstraction.
conditions would also be of great interest. Practically, algo- Opportunity: Template algorithms. Algorithms for working
rithms that compute over or under approximations, either using with specific templates are of great utility as evidenced by the
one of the sufficient conditions (such as finding invariants) or range of abstract domains that are currently in use. Algorithms
directly (as abstract interpretation does) are of great interest. that can work with arbitrary templates or templates that
3) Pre-Image, Post-Image and Merging: One approach to are monotonic with respect to the constants have received
computing solutions to (1) and (2) is to perform stepwise some initial investigation but there are likely to be significant
approximations. This requires approximating the pre or post theoretical and practical gains to be made. Efficient implemen-
image of the transition relation and being able to merge two tations of such algorithms allow user specified templates but
or more approximations. Pre and post image approximations also allow templates to be chosen automatically, effectively
can be described as finding the strongest (setwise smallest) refining the abstraction.
O(x) or the weakest (setwise largest) U (x) that satisfies the
appropriate formulae (given either P re or P ost): IV. C ONCLUSION
0 0
• Forwards Over : P re(x) ∧ T (x, x ) ⇒ O(x ) Modern software verification techniques are heavily depen-
0 0
• Forwards Under : U (x ) ⇒ P re(x) ∧ T (x, x ) dent on efficient SMT solvers. Correspondingly SMT solver
0 0
• Backwards Over : P ost(x ) ∧ T (x, x ) ⇒ O(x) development is often motivated, inspired and justified by
0 0
• Backwards Under : U (x) ⇒ P ost(x ) ∧ T (x, x ) software verification applications. It is hoped that this paper
Although these involve second-order quantification (we are acts as a guide for computer algebra researchers to understand
searching for formula), they do not involve fixed-points and this synergy, and appreciate some of the places algebraic
are significantly more practical. approaches could be fruitfully deployed and to get involved!
R EFERENCES
[1] D. Detlefs, G. Nelson, and J. B. Saxe, “Simplify: A theorem prover for
program checking,” J. ACM, vol. 52, no. 3, pp. 365–473, May 2005.
[2] H. Ganzinger, G. Hagen, R. Nieuwenhuis, A. Oliveras, and C. Tinelli,
“DPLL(T): Fast decision procedures,” ser. LNCS, R. Alur and D. A.
Peled, Eds., vol. 3114. Heidelberg, Germany: Springer-Verlag, July
2004, pp. 175–188.
[3] C. Barrett, P. Fontaine, and C. Tinelli, “The SMT-LIB standard: Version
2.5.” [Online]. Available: http://smt-lib.org
[4] L. Hadarean, K. Bansal, D. Jovanović, C. Barrett, and C. Tinelli, A Tale
of Two Solvers: Eager and Lazy Approaches to Bit-Vectors. Cham:
Springer International Publishing, 2014, pp. 680–695.
[5] A. Zeljić, C. M. Wintersteiger, and P. Rümmer, Deciding Bit-Vector
Formulas with mcSAT. Cham: Springer International Publishing, 2016,
pp. 249–266.
[6] M. Brain, L. Hadarean, D. Kroening, and R. Martins, “Automatic gen-
eration of propagation complete sat encodings,” in Verification, Model
Checking, and Abstract Interpretation, ser. Lecture Notes in Computer
Science, B. Jobstmann and M. K. R. Leino, Eds., vol. 9583. Heidelberg,
Germany: Springer-Verlag, January 2016, pp. 536–556.
[7] M. Brain, C. Tinelli, P. Ruemmer, and T. Wahl, “An automatable formal
semantics for ieee-754 floating-point arithmetic,” SMT-LIB, Tech. Rep.,
2014. [Online]. Available: http://smt-lib.org/papers/BTRW14.pdf
[8] ——, “An automatable formal semantics for ieee-754 floating-point
arithmetic,” in IEEE Symposium on Computer Arithmetic, 2015.
[9] S. Gao, S. Kong, and E. M. Clarke, dReal: An SMT Solver for
Nonlinear Theories over the Reals. Berlin, Heidelberg: Springer Berlin
Heidelberg, 2013, pp. 208–214.
[10] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani, The
MathSAT5 SMT Solver. Berlin, Heidelberg: Springer Berlin Heidelberg,
2013, pp. 93–107.
[11] V. Ganesh and D. L. Dill, “A decision procedure for bit-vectors and
arrays,” ser. Lecture Notes in Computer Science, W. Damm and H. Her-
manns, Eds., vol. 4590. Heidelberg, Germany: Springer-Verlag, July
2007, pp. 519–531.
[12] A. Tarski, “A lattice-theoretical fixpoint theorem and its applications.”
Pacific J. Math., vol. 5, no. 2, pp. 285–309, 1955. [Online]. Available:
http://projecteuclid.org/euclid.pjm/1103044538
[13] J. Davenport and M. England, “Recent Advances in Real Geometric
Reasoning,” in Proceedings ADG 2014, F. Botana and P. Quaresma,
Eds., 2015, pp. 37–52.
[14] M. Kosta and T. Sturm, “A Generalized Framework for Virtual Substi-
tution,” http://arxiv.org/abs/1501.05826, 2015.
[15] M. Brain, S. Joshi, D. Kroening, and P. Schrammel, “Safety verification
and refutation by k-invariants and k-induction ,” in Static Analysis, ser.
Lecture Notes in Computer Science, S. Blazy and T. Jensen, Eds., no.
9291. Heidelberg, Germany: Springer-Verlag, September 2015, pp.
145–161.