Satisfaction Meets Practice and Confidence Tom Bienmüller Tino Teige SVP Products Chief Research Engineer Innovation & Technology BTC Embedded Systems AG BTC Embedded Systems AG Oldenburg, Germany Oldenburg, Germany bienmueller@btc-es.de teige@btc-es.de Abstract—The industrial application of formal methods and These two partners and BTC-ES strive to providing products in particular of software verification tools, e.g. based on and services for modeling, auto code generation, test satisfiability checking and symbolic computation as being in the automation, and formal verification to customers. dSPACE main focus of the SC2 network, necessitates two main GmbH and BTC-ES support the tool chain Matlab®, requirements. The methods and tools, first, need to actually aim Simulink®, Stateflow® and dSPACE’s TargetLink®. IBM at the problem class that occur in practice and, second, have to Rational and BTC-ES provide solutions for the tool chains guarantee a high level of confidence. In this position paper, we concerning the modeling tools Statemate® and Rhapsody® raise two challenges in the domain of software verification: UML. Together with dSPACE GmbH, BTC-ES has a strong 1) enhancing the field of application of software verification tools focus on the automotive domain with key customers in in practice by means of efficient support of IEEE-754-based floating-point models and 2) enhancing confidence of software Germany and Japan. Due to the cooperation with IBM verification tools by means of generating certificates for their Rational the BTC-ES products are also widespread across computed analysis results. several other domains, e.g. aerospace, defense, rail, telecommunication and consumer. The core competences of Keywords—formal specification; formal verification; model BTC Embedded Systems AG are: checking; floating-point; proof certificates  model-driven development process I. INTRODUCTION  formal specification and verification of requirements As an industrial associate partner of the SC2 network, we  automatic test case generation for models and certainly play a minor role in the academic research activities requirements performed in that network. Even though having a matching academic background for this network, the benefit we see in  automatic test case generation based on code our participation lays merely in the contribution we can give:  automatic test case, trace, and vector execution show which topics are very close to the emerging needs real end users have who successfully apply formal test and  model and code coverage verification tools already today. This could enable researchers to align their focus of work. Furthermore, as we as a tool III. USE CASES vendor are close to real world applications, we can also provide data on these models coming from practice, allowing As tool provider for automated testing and formal academic partners to validate the efficiency of their verification methods in safety critical development BTC-ES is approaches in a real world setting. The benefit for us is challenged to keep pace with the increasing demands from end obvious: getting in touch with leading researchers in a very users. There are two major aspects relevant to the SC2 important domain and get insight into ideas and technologies network: on how to answer to the emerging needs from end users.  enhanced C-Code language support derived from the increasing use of IEEE-754-based floating-point II. BTC EMBEDDED SYSTEMS AG processors, and BTC Embedded Systems AG (BTC-ES), a majority  higher demands on the confidence in formal shareholding of BTC Business Technology Consulting AG, verification tools in safety-critical development provides products and services for formal verification, projects. automated validation, and automated testing of embedded systems software. BTC-ES products significantly reduce A. IEEE-754-Based Floating-Point Support required efforts for testing of embedded systems software, and Even though not thoroughly established yet, production considerably increase the quality of the developed systems. application of formal verification techniques more and more Additionally, BTC-ES offers on-site and off-site services requests to efficiently support floating-point arithmetic for around the testing products and testing in general in order to basic operations such as addition and multiplication, but also support customers in using the BTC-ES products successfully for complex operations such as square-root calculation, and efficiently. BTC-ES cooperates with two larger exponential and trigonometric functions. Current and well- enterprises, dSPACE GmbH and IBM Rational Software. Copyright c by the paper’s authors. Copying permitted for private and academic purposes. In: E. Ábrahám, J. Davenport, P. Fontaine, (eds.): Proceedings of the 1st Workshop on Satisfiability Checking and Symbolic Computation (SC2 ), Timisoara, Romania, 02-07-2016, published at http://ceur-ws.org established model checking techniques based on propositional increasing trend of using floating-point processors and satisfiability (SAT) checking or binary decision diagrams can production code in industrial applications. deal with those floating-point operations, but the computation performance is very much negatively impacted by the B. Confidence in Formal Verification Tools in Safety-Critical necessity to reduce these operations to some Boolean decision Development Projects problem aka bit-blasting. Even though it is a good approach to In reactive embedded software development for reuse the generic and well-established techniques for this type automotive applications, formal verification techniques such of problem, the obvious drawback is that it can be a hard as model checking reached a high technology readiness level3. problem for a Boolean solver to deal even with a small amount Off-the-shelf products implementing those techniques are of combined floating-point variables. available and applied not only in pilot projects and research, To illustrate the problem of bit-blasting for more complex but also in production. This evolution comes along with arithmetic floating-point functions like the exponential increasing demands in the technology confidence. function exp() as provided by the standard C library math.h, Furthermore, applying exhaustive formal verification consider the following rather small formula to be solved: techniques are viewed under return on investment constraints, obviously balancing efforts for formal verification with 1.4 == exp(x) traditional testing approaches. Besides that, as model checking with x being a 64-bit floating-point variable ranging in the claims to be a “complete test”, it is assumed that all bugs in a interval [-104.0, 89.0]. system under verification are detected using that technology. More precisely, no witness trace showing a violation of a To solve above formula, one can apply cbmc1, a bounded formal requirement specification remains unrevealed. Even model checker for C and C++ programs [1][2]. Before passing though the techniques indeed are invented to give the ultimate the formula to cbmc, one has to cope with a suitable answer, this expectation leaves one important aspect out of definition of the function exp() which is not provided by consideration: what about semantic bugs in the model checker cbmc. We remark that the C standard does not precisely itself influencing that ultimate answer? Albeit it is well define library functions like exp() but just gives a rough understood that a software product of a size of a model description. The implementation of exp() in our experiment checker contains bugs itself, occurrences of such bugs in consists of about 700 LOC including comments. Using cbmc production projects obviously have disproportionately with a SAT solver as its backend, the C code is translated into dramatic impact on both the quality of the system and the cost a SAT formula containing about quarter of a million variables to establish them, but also in the product’s confidence. Note and more than a million of clauses, provoking a rather high that we are not talking about tool crashes here, but about bugs runtime of about 5 minutes. which are based on semantically wrong computations on a In recent years there is a trend to extend SAT by dedicated system under verification: the model checking result is wrong. theories leading to the problem of satisfiability modulo Software products of this importance and impact require theories (SMT). The SMT community currently provides a dedicated measures to assure quality. Traditional software large set of theories accompanied with powerful SMT solvers. quality assurance techniques such as systematic testing are Concerning above problem, the interval-based SMT solver applied before shipping the product to end users. This kind of iSAT32 [3][4] is a promising choice as it supports non-linear offline quality assurance works well but it can of course not real arithmetic involving complex mathematical functions like exhaustively simulate the application at the end-user’s site. exp(). Due to this built-in support, the internal formula This means that still some bugs in the tool may remain representation has no space penalty, yielding a very fast undetected even though tremendous effort is spent for this solving time of above formula of fractions of a second. The kind of offline quality assurance. The arising question obvious drawback of the iSAT3 approach (currently) is the therefore is: what about the remaining bugs in model checkers lack of floating-point interpretations of these complex which are not detected offline to the end-user’s application? mathematical functions, though some first steps towards How can we detect the occurrence of semantic bugs in a floating-point interpretation of basic operations were model checker in some kind of online quality assurance in undertaken most recently [5]. parallel to the production operation? The SMT approach perfectly fits into the challenge we are The question of online quality assurance during real facing. Though there already are first attempts to standardize production application of model checkers is essentially the the theory of floating-point arithmetic within the SMT same as the question that the system under verification has community [6][7], powerful tool support–in particular not been translated correctly to a model checkers input plus relying on bit-blasting–is still a challenging and ongoing task, correct implementation of the model checking algorithms. In e.g. [8]. We strongly believe that strengthening the effort on terms of a model checker, the above question can also be the development of efficient floating-point SMT solvers being reduced to answer the following two queries: able to handle complex mathematical functions will be very beneficial for our customers, in particular with regard to the 1) Is a detected counter-example a genuine counter- example of the system under verification? 1 http://www.cprover.org/cbmc 2 3 https://projects.avacs.org/projects/isat3 https://en.wikipedia.org/wiki/Technology_readiness_level 2) How can be made sure that claiming absence of a reducing to the cone of influence, unwinding internal loops, counter-example is a trustful answer that a counter- and rewriting to single assignment code. example indeed does not exist? Intuitively, 1) is quite easy to answer. If a counter-example is generated, then this counter-example can be viewed as simulation trace. A simulation trace contains both stimulus and computed values of the representation in the model checker. Replaying that trace on the original system under investigation by checking if some stimulus indeed leads to the computed reactions as stated in the trace allows to safely judge whether the counter-example is a genuine one also showing malfunction of the original design4. Giving an answer to 2) is much harder. No counter- example is generated which would allow for straight-forward validation of the model checker’s outcome. Alternatively, one could argue to simply let a second, different model checker answer the same inquiry in parallel to the first one. Upon successful application of the second model checker, diversity arguments could be applied: whenever two different model checkers return the same result then the probability of a wrong result is very low. That is true and theoretically the problem is solved with this argumentation. However and in particular in real world applications, there could be some constraints Fig. 1. Typical verification tool chain starting with a C program and a hindering the approach of really being applicable: specification and leading to a corresponding SAT formula (left) and idea of lifting back a potential proof certificate from the SAT to the C layer (right).  As we are in a proprietary setting: is there a second model checker available at all? When employing an SMT and/or SAT solver as the  Even if there is a second model checker available, is backend engines of the analysis tool chain, the code is finally that model checker able to produce the same result in translated into an SMT and/or SAT formula. It usually holds terms of complexity? that the SMT/SAT formula is unsatisfiable if and only if the  Would an end user accept a potential run time and state where the observer variable is true is unreachable (for space increase? some bounded search depth). An alternative approach might be based on formal proofs Whenever it turns out that the SMT/SAT formula is of absence of bugs, preferably on the original software layer. unsatisfiable, we know that the given formal requirement is When reducing the original formal requirement specification fulfilled–at least in theory. In practice however the verification to (state or line) reachability in a program then we aim at a tool chain mentioned above is a very complex collection of certificate of unreachability (of the corresponding state or software being prone to error. Efficient SMT and SAT solvers line). The definition of such certificates desirably should not in particular are highly optimized, even on bit level. In order rely on any concrete analysis technique and should be to enhance trustworthiness of an unreachability result of the revisable by a small stand-alone proof checker. This ultimate overall verification tool chain, it is desirable to generate a goal seems to be a very challenging and hardly investigated certificate which facilitates a separate validation of the result, scientific topic but would have a very big impact on the namely independent from the actual model checking software. confidence level of the system under verification and on One natural approach is to generate a certificate on the formal verification methods in general due to the possibility of SMT/SAT solver level and then to lift this certificate back validating verification results online on customer site. through all the intermediate translation and transformation Let us roughly consider a typical verification tool chain as layers up to the original code level as indicated in Fig. 1. depicted in Fig. 1: the original C program is instrumented Though there is an extensive work on certificates of according to the formal requirement specification, e.g., by unsatisfiability for SAT formulas (based on resolution proofs introducing a new Boolean observer variable which is set to of unsatisfiability, e.g. [9]), it seems that certificates of true if and only if the given requirement becomes violated. unreachability for imperative programming languages like C is Thereafter, the C program is typically translated into some hardly investigated. We believe that certification of internal language in order to perform several transformation verification results will become a major topic in future, in steps into a syntactically simpler form. Some usual rules are particular when establishing formal methods and their inlining of function calls, flattening of structural data types, trustworthiness in industry. 4 Here, we abstract from bugs in simulators enabling to replay a counter-example on the original system under investigation. IV. CONCLUSION an inquired formal requirement? We believe that providing 2 As industrial associate partner of the SC network we evidence for results a model checker produces is very described some major challenges which we believe to be very beneficial when talking about bringing formal verification into important to be addressed in order to bring formal verification production. This is why we propose to think about some kind alive in broad production application in automotive embedded of proof certificates, which enable to probe whether a model software development. checker’s result “is true with almost absolute certainty”. First, with raising attractiveness of IEEE-754 floating- ACKNOWLEDGMENT point processors in the automotive domain, adjusted model checking techniques will be needed. Here SMT solving This work was supported by the H2020-FETOPEN-2016- avoiding bit-blasting holds a lot of promise, not only for basic 2017-CSA project SC² (712689). floating-point operations but also for complex ones like exponential function, square-root, etc. We strongly believe REFERENCES that SMT is the right choice to approach this new problem [1] Daniel Kroening, Michael Tautschnig: CBMC - C Bounded Model class as we obtained promising results here already also in Checker - (Competition Contribution). TACAS 2014: 389-391 cooperations with other academic partners of the SC2 network. [2] Peter Schrammel, Daniel Kroening, Martin Brain, Ruben Martins, Tino Teige, Tom Bienmüller: Successful Use of Incremental BMC in the Second, application of model checkers in broad software Automotive Industry. FMICS 2015: 62-77 production dramatically increases the demand of confidence in [3] Martin Fränzle, Christian Herde, Tino Teige, Stefan Ratschan, Tobias the applied tool suites. Obviously, errors in one of a chain of Schubert: Efficient Solving of Large Non-linear Arithmetic Constraint tools used for developing and testing, in particular, safety Systems with Complex Boolean Structure. JSAT 1(3-4): 209-236 (2007) critical software may have tremendous impact in efforts and [4] Karsten Scheibler, Bernd Becker: Implication Graph Compression inside costs on end-user’s side for quality assurance. In worst case, the SMT Solver iSAT3. MBMV 2014: 25-36 the bug is not detected at all leading to issues in the produced [5] Karsten Scheibler, Felix Neubauer, Ahmed Mahdi, Martin Fränzle, Tino system, the car, later on with effects hopefully not causing Teige, Tom Bienmüller, Detlef Fehrer, Bernd Becker: Accurate ICP- Based Floating-Point Reasoning. FMCAD 2016 (to be published). harm to human beings. [6] Philipp Rümmer, Thomas Wahl: An SMT-LIB Theory of Binary For model checkers, the worst case scenario is semantic Floating-Point Arithmetic. SMT 2010. bugs: the model checker’s analyzed semantics of a system [7] Martin Brain, Cesare Tinelli, Philipp Rümmer, Thomas Wahl: An under verification differs from the semantics of the original Automatable Formal Semantics for IEEE-754 Floating-Point Arithmetic. ARITH 2015: 160-167 design, leading to wrong verification results. Returning a wrong witness trace for a formal requirement violation is the [8] Martin Brain, Vijay D’Silva, Alberto Griggio, Leopold Haller, Daniel Kroening: Deciding floating-point logic with abstract conflict driven easy scenario here–this can simply be tackled by subsequent clause learning. Formal Methods in System Design 45(2): 213-245 simulation against the original semantics. But what is about a (2014) model checker returning no counter witness, claiming that [9] Lintao Zhang, Sharad Malik: Validating SAT Solvers Using an there is no bug in the system under verification with respect to Independent Resolution-Based Checker: Practical Implementations and Other Applications. DATE 2003: 10880-10885