Towards Using Formal Methods in Prototyping: Advantage or Impediment? Sebastian Schirmer Tino Teige Christoph Torens Udo Brockmeyer Unmanned Aircraft BTC Embedded Systems AG Unmanned Aircraft BTC Embedded Systems AG German Aerospace Center (DLR) Oldenburg, Germany German Aerospace Center (DLR) Oldenburg, Germany Brunswick, Germany tino.teige@btc-es.de Brunswick, Germany udo.brockmeyer@btc-es.de sebastian.schirmer@dlr.de christoph.torens@dlr.de Abstract—In aviation and other safety-critical domains, soft- At the department Unmanned Aircraft of German Aerospace ware faults are unacceptable. A means of detecting and avoiding Center (DLR), prototypes are built to foster autonomy of these faults is to use formal methods. Although formal methods unmanned aircraft system. This also incorporates to consider strongly contribute to the reliability and robustness of the system, some drawbacks prevent their general usage. A drawback is their AI methods and their validation. One problem in AI is that the reputation to be hard to apply for non-experts. Non-experts have function is not written as transparent and human-readable code to be familiarized with the tools to efficiently make use of them. which can be manually checked and traced. Instead, the func- But is this reputation still valid? Over the years, formal methods tion is learned and tested based on data and, therefore, often tools have evolved. They are capable to analyze more complex considered as black-box. Currently, there is a trend towards system properties. Further, their user experience was addressed by industrial companies to actually allow non-experts to profit simulation validation of AI due to the lack of other means. To from the advantages of formal methods. check black-box systems, e.g. during simulation, one approach This paper represents the first step towards putting the men- is runtime monitoring [3], depicted in Fig. 1. There, the inputs tioned assumption under test by trying to use formal methods for and outputs of the black-box are send to a monitor. The prototyping. We propose an approach for software prototyping monitor is generated based on a formal specification of desired which makes use of the formalization of requirements. We depict advantages and discuss first results of evaluating the commercial system properties and evaluates whether the requirements tool BTC EmbeddedPlatform R that we were able to use without are met by the black-box [4]. Further, in case of an AI cost in a project cooperation. We plan to continue the project component, a monitor feedback loop, as additional qualitative cooperation to answer the headline in future. or quantitative features, can also enhance the AI learning Index Terms—prototyping, formal specification, monitoring phase [5]. For runtime assurance purposes, monitoring plays an essential role. Regulatory agencies are investigating the I. I NTRODUCTION means to ensure comparable safety of AI components, which In software development, validation is a difficult and error- are in line with common practices. One promising means prone task. There are several different development models seems to be some version of the simplex architecture [6] to that lead through the software life cycle and support the sandbox the untrusted AI function. There, a decision module development of the software product. Common to most of decides whether to switch from an untrusted function to a these methods is a set of requirements from which code trusted function. Similar to the fuzzy values mentioned above, and test cases are created. Complementary, when developing the decision module requires accurate threshold values to safety-critical software, recommended software standards such switch safely and efficiently. An extension to the simplex as DO-178C [1] also recommend traceability in both directions architecture [5] was recently presented which incorporates from requirement to code and from code to test cases. Further, neural aspects of AI. it is known that the earlier in the development process a bug is found the cheaper it can be fixed. Often a proof of concept is developed with reduced rigor in form of a prototype before the final software product is created. In aviation, ARP4754A [2] even proposes prototyping to iden- tify incorrect or missing requirements. One of the main reasons for developing a prototype is improving the understanding of the target product. Only with this knowledge it is possible to define good software requirements. Otherwise, the values used in requirements are unknown or too fuzzy. To derive these values, we think that a structured way of prototyping is desirable during the prototyping process, i.e. consistent and documented adaptation of threshold values. Fig. 1. Using runtime monitoring to check black-box systems. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In summary, an approach which allows to early identify requirement violations and to safely (documented, consistent) adapt requirement threshold values to capture or even bound the behavior of the target (AI) application is highly desirable for both developing an end-product as well as prototyping. In this paper, we propose an approach based on the formalization of requirements and depict its possible advantages. Further, we point out first insights whether the commercial tool BTC EmbeddedPlatform R can be used for the proposed approach. II. R ELATED W ORK Imprecise or incorrect requirements is one of the root causes of software errors and the associated costs to fix a software error in late stages of development is highest for errors introduced during requirements elicitation [7][8]. A requirements-based testing process that was supported by formal methods has been shown to significantly reduce the Fig. 2. Proposed approach for prototyping. number of issues before the implementation phase [9][10]. Previous work targeted this by formalizing requirements first in a semi-formal natural language template and finally in be automatically generated. Forth, monitors can be generated a temporal logic [11]. The formal requirements were then for each property and attached to the code, checking the model-checked against a manually developed model of the adherence to the property online. In the next subsection, we software. Although this approach was extremely helpful to describe one way how the formalization can be achieved. The understand the system, it required a lot of manual development approach is shown in Fig. 2 and explained in more detail in effort and the approach did not include support for code and Subsection III-B. test development. With this work, we continue these efforts A. Tool Support and include additional aspects. BTC EmbeddedPlatform R (EP) is a tool suite for specifica- III. A PPROACH tion, testing, and verification of requirements for Simulink R and TargetLink R models and production code, currently Often during prototyping, assumptions and requirements on mainly used in the automotive domain. EP has been success- the system are implicitly given. Still, we think they should fully certified by German TÜV Süd as fit for purpose for the be explicitly stated during the prototyping process. Especially usage in safety-critical software development projects accord- functional requirements are interesting for prototyping: every ing to IEC 61508-3:2010, ISO 26262, EN 50128, IEC 62304 100Hz the controller has to output a value; control-flow as well as ISO 25119. However, there is no dedicated tool properties like if the pilot requests ascending of the drone qualification support for the aerospace domain, as of yet. then the drone shall increase height within 0.2 to 0.5 seconds; The workflow for the use case of this paper is shown in or if an obstacle is detected above then the drone shall stop Fig. 3. Textual requirements are first imported into EP. Then to increase height within 0.1 seconds; or even high-level the BTC specification method guides the user during the com- assumptions like during daytime tracking of an object works. plex conversion phases from an informal textual requirement to They can actually help developers to avoid false, e.g. outdated a uniquely defined and correctly formalized requirement. One due to system changes, assumptions on the current software particularity of the method is that the underlying formalism state. Of course, writing and checking requirements on the called Simplified Universal Pattern (SUP) inherently reflects prototype level imply an additional workload. However, it can the intrinsic nature of an informal requirement by describing guide towards the source of error. Also, after prototyping, a temporal trigger-action relationship, namely in an intuitive written requirements offer a good starting point for creating and graphical way. the full set of requirements. In general, prototyping should not turn into an end-product development, therefore a lightweight approach is desirable. The approach focuses on the formalization of such func- tional requirements. As a result, the requirements could be depicted as an automaton or formula. This is an additional step which needs to be light and efficient. But the main advantages of formalized requirements are: First, they offer an unambiguous documentation. Second, at an early stage consistency can be automatically checked and, therefore, later code iterations potentially avoided. Third, basic test cases can Fig. 3. Workflow using BTC EmbeddedPlatform R . We briefly introduce the idea behind SUP by means of an example. For more details about the syntax and semantics of SUP the reader is referred to [12][13]. Fig. 4 gives two examples of formal requirements using the temporal trigger- action relation of the graphical SUP language. The SUPs FR-1 and FR-2 formalize the textual requirements from Section III “If the pilot requests ascending of the drone then the drone shall increase height within 0.2 to 0.5 seconds.” and “If an obstacle is detected above then the drone shall stop to increase height within 0.1 seconds.”, respectively. An example Fig. 5. Example executions of SUPs. execution of FR-1 is shown in the upper part of Fig. 5: in step 1 the pilot requests ascending of the drone and thus the trigger of the SUP is activated. This in turn means that the action is actually test the requirements, most often is a manual and thus expected to occur two to five steps later (where the step size very time-consuming task. As shown in Fig. 3, EP facilitates is defined to be 0.1 seconds). In fact, the height is increased the automatic generation of test cases by taking into account in step 5 meaning that the action part of the SUP fires which the intended behavior described by the requirements, leading leads to a fulfillment of the SUP in the same step. to high-quality test cases, cf. [14]. For the SUPs FR-1 and FR- 2, a generated test case is shown in the upper part of Fig. 5. Based on formalized specifications, fully automated formal checks can be executed in order to find and fix problems Another capability is to export executable monitors, also in the requirements set at a very early design stage, which called observers, from formal requirements as depicted in leads to greater product quality with less cost compared to Fig. 3. These observers implement the behavior of the formal traditional processes. As indicated by Fig. 3, we consider the requirements and establish a very flexible way of monitoring consistency check of requirements within this paper, i.e. we the correctness of the behavior of the actual system under aim at finding any contradiction within the requirements fully test in external simulation environments. Technically, these automatically [14]. When applying consistency analysis on the observers are exported as a functional mock-up unit (FMU) two SUPs of Fig. 4, EP reveals an inconsistency witnessed in the tool independent FMI standard1 , cf. [15]. by a violating execution shown in the lower part of Fig. 5: B. Prototyping Using Formal Requirements the pilot requests ascending of the drone in step 0. Due to FR-1, this enforces to increase height at least in step 2 We propose an iterative prototyping approach (cf. Fig. 2): and at most in step 5. In the concrete inconsistency witness, 1 Informal requirements on the application are collected. height is not increased in steps 2, 3, and 4, i.e. the only 2 We formalize the requirements using EP. Note that EP chance to fulfill FR-1 is to increase height in step 5. However, focuses on functional requirements. Requirements on the in step 5 an obstacle occurs which triggers FR-2. A valid system design, e.g. structural redundancies, cannot be behavior by FR-2 is to stop to increase height in the same expressed in general. step which in turn contradicts with finishing the action of FR- 3 Then, formal requirements are checked for consistency. 1. This establishes an inconsistency between FR-1 and FR- An automatic check is highly desirable since the software 2. Though the detection of inconsistencies is fully automatic, is build iteratively and variables can change at any point their resolution is a manual task but with the support of a in the prototype development. Also, especially, wrong concrete counterexample. In the example above, it was most timing assumptions are hard to find during later iteration. likely forgotten to consider detected obstacles for the pilot Imagine the requirement on component c when input request in FR-1. One potential solution would be to define an a is set, output b should be set within 50ms. At a exit (cf. [12][13]) in FR-1 whenever an obstacle is detected, later development phase, component c is divided into with the effect that observation of FR-1 is restarted (without consecutive subcomponents c1 and c2 with requirements violating or fulfilling the SUP) whenever an obstacle occurs. if input a is set then a1 is set after 25ms and when input Finding requirement-specific test cases, i.e. test cases that a1 is set then b is set within 25ms, respectively. Due to the automated check of the formal requirements, we can automatically detect an inconsistency by finding a trace falsifying the initial requirement on c. The required time for the second property is unbounded, after 25ms and could lead to a violation, i.e. b could be set after 50ms. (FR-1) 4 Based on formal requirements Test Cases and FMU Observers are generated by EP. 5 We instrument the system code to send required data (FR-2) to the generated monitors. The monitors evaluate the Fig. 4. Two SUPs FR-1 (upper) and FR-2 (lower). 1 Details about FMU and FMI can be found at https://fmi-standard.org/. specified properties and return an early verdict whether # −−−−−−−−−−−−−−−− M o n i t o r−s i d e −−−−−−−−−−−−−−−− 1 the system adheres to the system requirements. Note fmu = FMU( p a t h ) 2 that it is preferable to monitor the system outline on a fmu . i n s t a n t i a t e ( . . . ) 3 mapping = { V a r i a b l e −> ( ID , Type ) } 4 dedicated hardware to avoid impacts on the system. s o c k e t = o p e n S o c k e t ( i p , p o r t , mapping ) 5 6 Next, we run our system. First, we use the generated Test while ( t r u e ) : 6 Cases. Then, simulations are used. ID , d a t a , e v a l u a t e = s o c k e t . r c v ( ) 7 fmu . s e t V a l u e ( ID , d a t a ) 8 After each step, the artifacts are checked for violations, if evaluate : 9 e.g. during execution the outcome of the FMU Observers are fmu . d o S t e p ( . . . ) 10 f o r key i n mapping : 11 checked whether any requirements are violated. p r i n t ( key , fmu . g e t B o o l e a n V a l u e ( key ) ) 12 . . . Teardown 13 IV. D ISCUSSION # −−−−−−−−−−−−−−−− System−s i d e −−−−−−−−−−−−−−−− 14 Currently, EP is able to realize the required functions. But ... s t a r t up . . . i n i t i a l i s e s o c k e t 15 while ( t r u e ) : 16 its scalability on real-world example from aviation domain is hgt cmd , c u r h g t =getHgtCmd ( ) , g e t C u r H e i g h t ( ) 17 unclear and needs to be addressed in future. First tests indicate obs = checkObstacleAbove ( ) 18 that the formalization of textual requirements is easy to use. cmd hgt = c a l c u l a t i o n ( hgt cmd , c u r h g t , o b s ) 19 s e t C m d H e i g h t ( cmd hgt ) 20 However, currently, one is limited to SUP which represent sendData ( 21 temporal trigger-action relationships. Still, based on experi- [ ’request_ascending’ , ’increase_hgt’ , ’obstacle’ ] 22 ence, most of the functional requirements do actually fit in this [ hgt cmd > 0 , cmd hgt > 0 , o b s t a c l e a b o v e ] ) 23 . . . Teardown 24 pattern. For others, other languages exist like the stream-based specification language Lola [16] which was already used to Fig. 6. Simplified code fragments for the data exchange between the monitor the status of unmanned aircraft systems during flight. instrumented system and the generated monitors. In future work, we have to investigate the trade-off of the approach in order to estimate the cost-benefit ratio between writing and formalizing requirements and the added value of R EFERENCES better code at early stages, e.g. test cases and monitors. Clear [1] RTCA, “DO-178C/ED-12C Software Considerations in Airborne Sys- metrics have to be found to estimate the benefit. Formalization tems and Equipment Certification,” Radio Technical Commission for overhead also incorporates mistakes during formalization. To Aeronautics, 2011 [2] SAE International, “ARP 4754A Guidelines for Development of Civil which degree these kinds of mistakes are automatically de- Aircraft and Systems,” SAE International, 2010 tected by the consistency checks is also important. [3] Y. Falcone, K. Havelund, and G. Reger, “A tutorial on runtime verifica- During simulation, the use of generated monitors offer tion,” In Engineering Dependable Software Systems, 2013 [4] K. Y. Rozier, “From Siumlation to Runtime Verification and Back: clear benefits like early feedback and avoidance of inline Connecting Single-Run Verification Techniques,” In SpringSim, 2019 assertions. There, the main concern is the integration into [5] D. Phan, N. Paoletti, R. Grosu, N. Jansen, S. Smolka, and S. Stoller, the system setup, especially the system instrumentation. The “Neural Simplex Architecture,” arXiv 1908.00528. 2019 [6] D. Seto, B. Krogh, L. Sha, and A. Chutinan, “The Simplex Architecture monitors need to be able to run along the system while having for Safe Online Control System Upgrades,” In Proc. 1998 American a low impact on its execution. Figure 6 exemplary shows Control Conference Vol. 6. our current system instrumentation. We are not limited on [7] R. R. Lutz, “Analyzing software requirements errors in safety-critical, embedded systems,” Proceedings of the IEEE International Symposium passively reading databuses. We also instrumented the code to on Requirements Engineering, 1993 send information about the internal software status via UDP. [8] Davis, Alan. “Just enough requirements management: where software development meets marketing.” Addison-Wesley, 2013. V. C ONCLUSION [9] M. Whalen, D. Cofer, S. Miller, B. Krogh, and W. Storm, ‘Integration of formal analysis into model-based software development process,” We motivated a structured approach for prototyping. The Formal Methods for Industrial Critical Systems, 2008 approach is centered around the formalization of require- [10] P. Skokovic, “Requirements-Based Testing Process in Practice,” Interna- ments. It allows to automatically check their consistency and tional Journal of Industrial Engineering and Management (IJIEM), Vol.1 No 4, 2010, pp. 155 - 161, ISSN: 2217-2661, 2010 to generate test cases and monitors. The approach can be [11] C. Torens, F.-M. Adolf, “Fail-Safe Systems from a UAS Guidance implemented using open-source tools. As formal specification Perspective,” In: Encyclopedia of Aerospace Engineering UAS. Wiley. language, linear temporal logic (LTL) could be a candidate ISBN 9780470686652 [12] T. Teige, T. Bienmüller, and H.J. Holberg, “Universal Pattern: Formal- for which tools for model checking, runtime monitoring, and ization, Testing, Coverage, Verification, and Test Case Generation for test generation do exist. We decided to use a commercial Safety-Critical Requirements,” MBMV 2016: 6-9 tool that combines all of these approaches. We showed how [13] J.S. Becker, “Analyzing Consistency of Formal Requirements,” ECE- ASST 76 (2018) the integration into development could be done. Then, we [14] T. Bienmüller, T. Teige, A. Eggers, and M. Stasch, “Modeling Require- discussed the barriers for an actual integration. In future, we ments for Quantitative Consistency Analysis and Automatic Test Case plan to actually apply the approach and to report findings to Generation,” FM&MDD 2016 [15] H. Esen, M. Kneissl, A. Molin, S. vom Dorff, B. Böddeker, E. come closer to answering the question whether formal methods Möhlmann, U. Brockmeyer, T. Teige, G.G. Padilla, S. Kalisvaart, pay off for prototyping. “Validation of Automated Valet Parking,” Validation and Verification of Automated Systems, Springer, 2019 [16] F-M. Adolf, P. Faymonville, B. Finkbeiner, S. Schirmer, and C. Torens, “Stream Runtime Monitoring on UAS,” In Runtime Verification, 2017