=Paper=
{{Paper
|id=Vol-3056/paper-02
|storemode=property
|title=From Source Code to Crash Test-Case through Software Testing Automation
|pdfUrl=https://ceur-ws.org/Vol-3056/paper-02.pdf
|volume=Vol-3056
|authors=Robin DAVID,Jonathan SALWAN,Justin BOURROUX
}}
==From Source Code to Crash Test-Case through Software Testing Automation==
Robin David (1), Jonathan Salwan (2) and Justin Bourroux (3)

(1) Quarkslab, 13 rue Saint Ambroise, Paris, France
(2) Pirate, Atlantic Ocean, Earth
(3) DGA-MI, Bruz, France

'''Abstract.''' Finding weaknesses and vulnerabilities in source code is a difficult task. One approach to tackle this issue is static analysis. However, existing solutions and tools tend to generate numerous alerts and, especially, false positives. This paper presents an approach that automates the software testing process from the source code up to the dynamic testing of the compiled program. More specifically, starting from a static analysis report indicating alerts on source lines, it tries to cover these lines dynamically and opportunistically checks whether or not they can trigger a crash. The result is a test corpus that covers the alerts and triggers them if they happen to be true positives. This paper discusses the methodology employed to track alerts down into the compiled binary, the selection process for the testing engines, and the results obtained on a TCP/IP stack implementation for embedded and IoT systems.

'''Keywords.''' Software Testing, Static Analysis, Fuzzing, Dynamic Symbolic Execution, Vulnerability Research

===1. Introduction===

'''Context.''' Evaluating the security of source code and finding flaws in it is a tedious task in software testing. As a baseline, multiple guidelines have been published for a wide range of industries such as automotive [1], aircraft [2] or aerospace [3] to identify weak and vulnerable code constructs. For C code, the best known are MISRA C [4] and CERT C [5]. These standards are integrated in off-the-shelf static analyzers [6, 7, 8], which usually generate numerous alarms with substantially high false-positive rates. Analyzing the results is therefore a lengthy and cumbersome process. Little research in the literature attempts to solve the issue of validating alarms, as generating a crashing or violating test-case is an open research question: it requires solving both a reachability and a satisfiability problem in the program. Our research does not address this issue directly but aims at bridging the gap between alerts identified at source level and dynamic testing, a process that opportunistically covers and validates these alerts. We intend to automate the process as much as possible so that the analyst can focus on hard-to-reach corner-case alerts. This research is performed in the context of the PASTIS project (Programme d'Analyse Statique et de Tests Instrumentés pour la Sécurité), financed by DGA-MI, which focuses on C and C++ programs and more specifically on network-related services.

'''Contributions.''' We present an automated testing infrastructure combining different testing techniques, namely fuzzing and Dynamic Symbolic Execution (DSE). Combining heterogeneous testing engines to fuzz the same target is now usually called ensemble fuzzing [9]. We implemented our own fuzzing infrastructure and performed an experimental study of existing testing techniques, namely fuzzing and DSE. We developed a benchmark test suite designed to reveal idiosyncratic behaviors of the tested tools. Based on the results obtained, we selected Honggfuzz [10] and Triton [11], respectively for fuzzing and DSE. To summarize, our research provides the following contributions:

* an experimental study of existing techniques and tools on a dedicated benchmark;
* the combination of a static analyzer with an ensemble fuzzer aggregating heterogeneous software testing engines (greybox fuzzing and DSE);
* the consolidation of this combination in a semi-automated workflow that starts from alerts on source code lines, tracks them back in the compiled binary and triggers automated testing to cover them and to trigger the bug, if any; that process leads to the generation of a test corpus;
* a benchmark assessing the robustness of two TCP/IP stacks, which enabled uncovering a remote Denial-of-Service (DoS) that got assigned the identifier CVE-2021-26788 (https://nvd.nist.gov/vuln/detail/CVE-2021-26788).
===2. Experimental study of techniques and tools===

====2.1. Software testing techniques====

In the past decade, fuzzing [12] and dynamic symbolic execution [13], two software testing techniques, have proven to be very efficient at detecting and triggering bugs. While fuzzing tends to be very fast, it can be hindered by code constructs that prevent it from progressing in the program exploration. Conversely, DSE reasons more precisely, on a per-path basis, but is significantly slower. Hence, we assessed various fuzzers and DSE engines to select one candidate of each to be combined together. The criteria and methodology are described in Section 2.2.

'''Fuzzing''' consists in feeding pseudo-random inputs to the program in order to trigger unexpected behaviors. Inputs can be generated randomly or using some feedback mechanism. The most commonly used feedback is coverage, but other feedbacks have been proposed in the literature [14]. So-called greybox fuzzers like AFL [15], libFuzzer [16] or Honggfuzz [10] use compile-time static instrumentation of the program to obtain feedback at runtime.

{|
|+ Table 1: Comparison of selected fuzzers
! !! AFL !! Honggfuzz !! AFL/QBDI !! PULSAR
|-
| Version || 2.52b || 1.7 || - || -
|-
| Language || C || C || C || Python
|-
| Open-source || ✓ || ✓ || ✓ || ✓
|-
| Binary fuzzing || ✗ || ✗ || ✓ || ✗
|-
| Static instr. || ✓ || ✓ || ✓ || ✓
|-
| Dynamic instr. || ✗ || ✗ || ✓ || ✗
|-
| Seed scheduling || ✓ || ✓ || ✓ || ✗
|-
| Model input gen. || ✗ || ✗ || ✗ || ✓
|-
| Mutation input gen. || ✓ || ✓ || ✓ || ✓
|-
| In-memory fuzzing || ✓ || ✓ || ✓ || ✗
|-
| Crash dedup/prio. || ✓ || ✓ || ✓ || ✗
|}

Table 1 shows the fuzzers that have been assessed. AFL and Honggfuzz are two leading implementations of greybox fuzzers (now superseded by AFL++ [17]). AFL/QBDI enables binary-only fuzzing by interfacing AFL with QBDI [18]. This combination also enables on-the-fly optimizations, for instance breaking comparisons with constants, which are notoriously hard for mutational approaches. PULSAR has been selected for its ability to test network protocols: its input generation is based on (partially inferred) models, in contrast to AFL and Honggfuzz which use genetic algorithms [19].

'''Dynamic Symbolic Execution''', also called whitebox fuzzing, uses a model of the instruction semantics to perform the execution. Instructions are disassembled and lifted into a semantic representation, called intermediate representation, which is used for emulation. A path 𝜋 in the program is then represented as a first-order logic formula (usually over bitvectors) that is given to an SMT solver [20]. A solution of this formula is an input covering the path 𝜋.
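To make the path-to-formula step concrete, the sketch below (illustrative, not taken from the paper's tooling) encodes the branch conditions of a made-up two-branch path as a bitvector formula and asks the z3 SMT solver for a covering input; the variable name and the conditions are assumptions chosen for the example.

<pre>
# Minimal sketch (illustrative, not PASTIS code): the branch conditions collected
# along a hypothetical path, e.g. "if (x > 100 && x * 4 == 408) { ... }", are
# encoded over bitvectors and handed to an SMT solver. Requires the z3-solver package.
from z3 import BitVec, Solver, sat

x = BitVec("x", 32)                        # symbolic 32-bit input value
path_predicate = [x > 100, x * 4 == 408]   # conjunction of branch conditions along the path

solver = Solver()
solver.add(*path_predicate)
if solver.check() == sat:
    print("covering input:", solver.model()[x])   # e.g. 102
else:
    print("path is infeasible")
</pre>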
{|
|+ Table 2: Comparison of selected DSE tools
! !! manticore !! KLEE !! angr !! Triton
|-
| Version || 0.2.5 || 2.1 || 8.18 || 0.7
|-
| Language || Py || C++ || Py || C++, Py
|-
| Open-source || ✓ || ✓ || ✓ || ✓
|-
| Base || binary || source || binary || binary
|-
| Intermediate Repr. || custom || LLVM || VEX || custom
|-
| Variadic argv size || ✗ || ✗ || ✓ || ✗
|-
| Library calls || ∼ || ✓ || ✓ || ∼
|-
| Syscalls || ✓ || ✓ || ✓ || ✗
|-
| Symbolic mem. read || ✓ || ✓ || ✓ || ✗
|-
| Symbolic mem. write || ✗ || ∼ || ✓ || ✗
|-
| Bit-vectors || ✓ || ✓ || ✓ || ✓
|-
| Arrays || ✓ || ✓ || ✓ || ✗
|}

Table 2 shows the DSE engines tested in this study. Both manticore [21] and angr [22] are developed in Python and provide similar features: they implement a wide range of library calls and syscalls, and support symbolic memory reads and writes to some extent. Triton [11] provides more elementary functionalities but is designed to be modular, in order to be embedded in a whole set of other utilities. KLEE [23] works on LLVM and is the reference in DSE.

====2.2. Methodology & Benchmarking====

To bring out two final candidates for fuzzing and DSE, we designed a test suite. It enables checking specific behaviors on small snippets (atomic tests) as well as testing scalability on larger programs. Atomic tests assess the behavior on symbolic pointers, the handling of non-deterministic instructions, and a variety of vulnerability categories (buffer overflow, integer overflow, use-after-free, etc.). For the scalability benchmark, the uniq and base64 binaries of the LAVA-M project [24] have been used. This suite provides the ground truth along with some quantitative results (72 bugs in total in the two binaries).

To smooth statistical discrepancies caused by the random nature of fuzzing, tests were run multiple times and the mean value was computed. Each of the 70 atomic tests was run 3 times for a maximum duration of 300 seconds, while the scale binaries were run 3 times for 6 hours. Table 3 shows synthetic results of all utilities on this test suite (tool versions are slightly outdated as the experiment was performed in 2018). Every tool has been configured with the best parameters available to give it fair chances. The PULSAR results have been excluded because it was not possible to run it correctly on the benchmark targets, due to its focus on network protocols.

{|
|+ Table 3: Test suite benchmark results
! !! Atomic (70) !! Scale (72) !! Total (142)
|-
| AFL || 48 || 0 || 48/142
|-
| Honggfuzz || 54 || 44 || 100/142
|-
| AFL/QBDI || 47 || 33 || 80/142
|-
| manticore || 34 || 0 || 34/142
|-
| KLEE || 47 || 1 || 48/142
|-
| angr || 37 || 0 || 37/142
|-
| Triton || 47 || 0 || 47/142
|}

The fuzzing results show that Honggfuzz outperformed the other engines, and it has consequently been kept as the reference engine for fuzzing. For symbolic execution, while KLEE [23] outperformed the other engines, Triton has been selected. Being the developers of Triton, its code is familiar to us, which makes it very easy to extend and to modify for PASTIS needs. Also, KLEE comes with two main issues when combining it with other fuzzing engines. First, as it works at the LLVM-IR level, it requires the program at source level (while this is the case in this study, our goal was to make the PASTIS framework applicable to binary-only targets). Second, its code base evolves fast and contains many research-related features, making it difficult to integrate in a fully-automated workflow.

===3. Testing Automation===

====3.1. Overview====

The process of automating the dynamic testing of a source code is depicted in Figure 1. First, the code has to be harnessed to target the components of interest (explaining this process is left out of scope for this paper). It has to be prepared for both fuzzing and symbolic execution, which have to be compiled differently. That step is highly manual and usually requires a good understanding of the target. Then the harnessed code can be provided to a static application security testing (SAST) tool that will generate a report of suspicious lines of code. These data are used to embed intrinsic function calls (ad-hoc functions added to the code base, which receive specific processing at runtime by a third-party tool) in the target in an automated manner. The code is compiled and provided to both testing engines, which will try to cover the faulty lines and to generate crashing inputs. During that process they communicate to help each other. The final output is a report of alerts indicating, for each of them, whether it has been covered or not and whether a crash has been associated to it. Automating most of these steps enables pruning some alerts, letting the analyst focus on the deepest uncovered ones.

Figure 1: Full Analysis Workflow (source code, harnessing, SAST report, intrinsic insertion, compilation for the fuzzer and for DSE, coverage and validation report).

For the purpose of this research, a fuzzing campaign is expected to run for at most 24 hours. That time cap was arbitrarily set at the beginning of the PASTIS project.
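Regarding the final report of alerts mentioned above, the paper does not specify its exact format; purely as an illustration, one entry could carry the following information, where every field name is an assumption.

<pre>
# Hypothetical sketch of one entry of the final alert report (field names are
# assumptions, the paper does not specify the format): each SAST alert ends up
# marked as covered and/or validated, together with the associated test-case.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertStatus:
    alert_id: int                      # identifier of the inserted intrinsic call
    kind: str                          # issue type reported by the SAST
    covered: bool = False              # some input reached the intrinsic call
    validated: bool = False            # a crash was mapped back to this alert
    test_case: Optional[str] = None    # input that covered or triggered the alert

report = {12: AlertStatus(12, "integer-overflow")}
report[12].covered, report[12].test_case = True, "corpus/seed_042.bin"
</pre>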
====3.2. Collaborative architecture====

A challenge in designing an automated workflow is making fuzzing and DSE collaborate and determining what kind of information to exchange. As these two approaches work rather differently and have different notions of coverage, exchanging coverage information directly is inherently complicated; moreover, it makes integrating new engines difficult. Hence, each engine solely exchanges the input seeds it generates with regard to its own coverage metric. The remote engine is in charge of deciding whether it is valuable to keep an input or not. The exchange medium is described in Section 4.

The communication is performed through a central authority called the broker, which enables connecting multiple instances of the engines. Figure 2 shows the general overview of the collaborative architecture. At startup, the broker provides the binary with appropriate parameters to the engines. Then, during the execution, it forwards all the inputs received from one engine to the others. During the fuzzing campaign, if an engine covers or validates an alert, it sends its identifier and the associated input to the broker, which centralizes all data.

Figure 2: Global Collaborative Architecture Overview (the pastis-broker master holds the workspace: binary, SAST report from Klocwork, corpus, crashes, hangs, logs and client statistics, configurations such as the coverage strategy, and a CSV of results; the engine drivers pastis-honggfuzz and pastis-triton are connected through the libpastis communication layer and follow five steps: 1. connection (idle), 2. reception of the binary and options, 3. seed exchange and logs, 4. information on alert validation, 5. stop).
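The concrete transport is described in Section 4 (ZMQ). The snippet below is only a minimal sketch of the broker's forwarding role using pyzmq, with a made-up three-frame message format that is not the actual libpastis protocol.

<pre>
# Minimal broker sketch (the message framing is an assumption, not the libpastis
# protocol): a ROUTER socket registers connected engines and forwards every seed
# received from one engine to all the others. Requires the pyzmq package.
import zmq

context = zmq.Context()
socket = context.socket(zmq.ROUTER)
socket.bind("tcp://*:5555")

engines = set()                                    # ZMQ identities of connected engines
while True:
    identity, msg_type, payload = socket.recv_multipart()
    engines.add(identity)
    if msg_type == b"HELLO":                       # step 1: connection
        socket.send_multipart([identity, b"START", b"target.bin"])   # step 2: binary
    elif msg_type == b"SEED":                      # step 3: seed exchange
        for other in engines - {identity}:
            socket.send_multipart([other, b"SEED", payload])
    elif msg_type == b"ALERT":                     # step 4: alert covered/validated
        print("alert update:", payload.decode())
</pre>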
'''Alert validation.''' To validate alerts discovered at source level, they have to be trackable down to the compiled binary by the test engines. Compiling binaries in debug mode makes it possible to track the associated line of code for each assembly instruction; however, that requires each engine to be able to leverage debug information. Hence, our workflow uses intrinsic functions.

Listing 1 shows the intrinsic function used. It takes an identifier, a format string and an arbitrary number of values as arguments. Its sole purpose is printing the identifier given as parameter. Then, at each alert location, a call to this function is added with a unique identifier, the type of issue identified by the static analyzer, and contextual parameters. For instance, this makes it possible to retrieve buffer sizes that are known to the compiler but lost once compiled.

<pre>
#ifdef QB_INTRINSIC
int __klocwork_alert_placeholder(int id, const char* fmst, ...) {
    printf("REACHED ID %d\n", id);
    return id;
}
#endif
</pre>
Listing 1: Intrinsic function

At runtime, a test engine either has to parse stdout to find covered alerts, or to directly hook the intrinsic functions (depending on its inner working). When detecting a crash or a violation, engines are in charge of mapping it to a previously covered alert, if applicable. Tracking the root cause of a crash is still an open research problem [25], so it is done here in an empirical manner: the last encountered alert is considered to be the cause of the crash and is thus considered validated. Note that we cannot invalidate an alert as being a false positive, because of the potentially infinite number of paths leading to that code location (path combinatorial problem).

Crashes or violations are detected differently by fuzzing and DSE. Modern fuzzers use sanitizers like ASan [26], UBSan [27] or TSan [28], which respectively detect memory corruptions, undefined behavior and race conditions. DSE engines usually implement their own sanitizers, leveraging the analysis precision of symbolic execution to implement fine-grained checks. In this case, the sanitizers can also use the contextual information provided as arguments of the intrinsic functions to implement their checks.
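As a sketch of the stdout-parsing option mentioned above, the following driver-side snippet (function and file names are illustrative) runs the instrumented target on one test-case, collects the identifiers printed by the intrinsic of Listing 1, and applies the empirical rule described above: if the process crashes, the last alert encountered is marked as validated.

<pre>
# Illustrative driver-side snippet: parse the "REACHED ID %d" lines printed by the
# intrinsic of Listing 1 and, on a crash, blame the last alert encountered.
import re
import subprocess

REACHED = re.compile(rb"REACHED ID (\d+)")

def run_testcase(target, testcase, covered, validated):
    proc = subprocess.run([target, testcase], capture_output=True)
    last_alert = None
    for match in REACHED.finditer(proc.stdout):
        last_alert = int(match.group(1))
        covered.add(last_alert)                            # the alert location has been reached
    if proc.returncode < 0 and last_alert is not None:     # killed by a signal: crash
        validated.add(last_alert)                          # empirical mapping to the last alert

covered, validated = set(), set()
run_testcase("./target_harness", "corpus/seed_042.bin", covered, validated)
</pre>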
===4. Implementation===

'''Target setup.''' Once the harness of the target program is implemented for all fuzzers included in the platform, the code is given to the SAST tool. In this research, the Klocwork software [7], developed by Perforce, has been used. As output of its analysis, it provides an HTML file indicating faulty lines along with some additional contextual data (variable names, buffer sizes, ...). Figure 3 provides an example of such a report.

Figure 3: Sample alert report (Klocwork).

Our semi-automated workflow takes the report as input, translates it into JSON for easier processing, and uses the result to automatically add intrinsic function calls in the source code. This code addition is made syntactically, on a per-line basis, and thus needs to be double-checked by the analyst (an implementation using the clang AST is being studied). Then the various variants of the program are compiled for each engine and target architecture (x86_64, ARM). The target program is now ready to be tested with the PASTIS framework.
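The insertion tool itself is not shown in the paper; the following simplified sketch illustrates the per-line, purely syntactic insertion described above, assuming a JSON report whose field names and file names are invented for the example.

<pre>
# Simplified sketch of the per-line intrinsic insertion (JSON field names and file
# names are assumptions): for each alert, a call to the intrinsic of Listing 1 is
# inserted just before the incriminated line. Being purely syntactic, the output
# still has to be reviewed by the analyst.
import json

def insert_intrinsics(source_path, report_path):
    alerts = json.load(open(report_path))      # e.g. [{"id": 1, "line": 42, "code": "ABV.GENERAL"}, ...]
    lines = open(source_path).readlines()
    for alert in sorted(alerts, key=lambda a: a["line"], reverse=True):
        call = f'__klocwork_alert_placeholder({alert["id"]}, "{alert["code"]}");\n'
        lines.insert(alert["line"] - 1, call)   # insert bottom-up so line numbers stay valid
    open(source_path + ".instrumented", "w").writelines(lines)

insert_intrinsics("http_server.c", "klocwork_report.json")
</pre>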
'''PASTIS framework.''' The main interface with the analyst is the broker, called pastis-broker. It is implemented in Python and handles all communications between engines. The communication protocol is based on the message-queuing framework ZMQ (https://github.com/zeromq), so that it is interoperable with almost all existing programming languages.

The analyst launches pastis-broker with all the target binary variants, an initial corpus if needed, some configuration parameters, and the Klocwork report, which is used to stop the campaign when all alerts are covered or validated. Then the various test engines have to be launched with the broker IP address to receive all the fuzzing campaign data. If an engine supports different coverage strategies (block, edge, path, etc.) and multiple instances are connected to the broker, the broker automatically balances the coverage strategies between them.

'''Honggfuzz integration.''' Honggfuzz [10] is a modern greybox fuzzer developed in C++. Besides being very efficient on many targets, it has not been designed for collaborative fuzzing. As a consequence, small modifications have been made to its core, the most important being the ability to receive new inputs while it is already fuzzing (the feature has been submitted upstream as a merge request). On top of it, a Python wrapper has been developed to perform all communications with the broker, to parse stdout, to inject the external inputs received, and to send the broker all generated inputs.

Such an overlay is called a driver, as it enables interfacing an existing engine with the PASTIS framework. Figure 4 summarizes the main interactions between Honggfuzz and the wrapper: all intercommunications are performed through filesystem monitoring (inotify on Linux). The whole component is called pastis-honggfuzz.

Figure 4: Honggfuzz engine (the HF wrapper of pastis-honggfuzz launches Honggfuzz via execve, monitors the workspace with inotify, reads the corpus and crashes written by Honggfuzz in a loop, injects new inputs and initial seeds, replays the Klocwork report, and adds logs, statistics and telemetry).

'''Triton integration.''' Triton [11] is a DSE framework library designed to perform symbolic execution of a given path. The whole logic of loading the program, scheduling input seeds and covering different paths is left to the developer. To address this issue, a fully-featured DSE engine called TritonDSE has been developed on top of Triton. For the purpose of the PASTIS framework, a program called PastisDSE has also been built on top of TritonDSE. This program performs all the communications with the broker, which include receiving external inputs and sending the ones generated by Triton. Figure 5 summarizes the interactions of these components within the so-called pastis-triton component.

Figure 5: pastis-triton engine overview (PastisDSE, with its configuration, metadata, callbacks and strategy (ALERT_ONLY, CHECK_ALL), drives TritonDSE for the exploration of paths, which relies on Triton for the symbolic execution of one path; the workspace holds the corpus, crashes, hangs, worklist and seeds).

That component implements code, edge and path coverage strategies. It also implements different sanitizers for each category of vulnerability considered. As such, memory operations are tracked at the bit level, which enables precisely detecting off-by-one errors (OB1). Use-after-free bugs are detected by tracking the malloc and free primitives.

In this setting, pastis-triton is launched in pure emulation (thus not as a concolic engine) to better control all side effects and to allow the execution of AArch64 binaries on x86 hosts. In essence, it has to emulate all the side effects performed on the system (libc functions, syscalls, etc.). As it cannot be exhaustive, it only supports a limited number of libc functions and syscalls.
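As an illustration of the driver concept introduced above for pastis-honggfuzz, the sketch below reproduces such a loop in a simplified form: it watches the directories written by Honggfuzz and exchanges seeds with the broker. The real component relies on inotify; this version merely polls, and the directory names and the broker object are assumptions.

<pre>
# Simplified driver loop (assumptions: directory layout and the `broker` object;
# the real pastis-honggfuzz uses inotify instead of polling): new test-cases
# produced by Honggfuzz are pushed to the broker, and seeds received from the
# other engines are dropped where Honggfuzz picks up additional inputs.
import time
from pathlib import Path

def driver_loop(broker, corpus_dir="hfuzz_workspace/corpus", inject_dir="hfuzz_workspace/inputs"):
    seen, received = set(), 0
    while True:
        for path in Path(corpus_dir).glob("*"):        # forward locally generated inputs
            if path not in seen:
                seen.add(path)
                broker.send_seed(path.read_bytes())
        for seed in broker.poll_seeds():               # inject inputs coming from the DSE
            received += 1
            (Path(inject_dir) / f"ext_{received:06d}.bin").write_bytes(seed)
        time.sleep(1)
</pre>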
===5. Experimental Results===

====5.1. CycloneTCP target====

While this technique is applicable to any kind of software, the PASTIS project is centered on testing low-level network TCP/IP stacks. Among existing open-source implementations, CycloneTCP (https://www.oryx-embedded.com/products/CycloneTCP), developed by Oryx Embedded, provides an implementation of a wide variety of protocols. Recent publications have shown it to be robust in comparison to other TCP/IP stacks [29].

The stack provides a driver mechanism to receive network frames for various MCUs and OSes. The target program is a simple HTTP server with a single static page. Only standard protocols are activated in the target (Ethernet, IP, TCP, HTTP, ARP). Other protocols like DNS, LLMNR and MBNS are not activated, in order to focus on assessing the ability of the engines to handle full TCP communications. The harness implements a driver which reads input frames from a file; a single input is thus a sequence of frames representing the incoming messages from a client. The harness also tears down the multi-threading logic into a single-threaded application, enabling the processing of network frames in a sequential manner. While this prevents finding potential race conditions, it strongly reduces non-reproducible test-cases. As part of the harness, various patches were made in the code to remove checksum verification, add a pre-registered ARP lease (for the client) and remove the randomness of the TCP Initial Sequence Number (ISN).

====5.2. Controlled environment====

To assess the effectiveness of the workflow, defects and vulnerabilities have been added to the CycloneTCP code. Defects are code constructs raising a SAST alert but which are structurally not triggerable, while vulnerabilities are defects that can be triggered. Such a controlled benchmark enables checking the effectiveness of the framework at covering secluded locations of the code and at triggering vulnerabilities by creating the appropriate test-case (input).

Adding relevant defects is tedious, as they have to fulfill the following properties:

* reachability: they have to be reachable by a test-case;
* conditionality: they should be reachable only under some conditions (not covered systematically);
* non-interference: a defect should not alter the reachability or detectability of another one;
* detectability: vulnerabilities should be triggerable;
* expressiveness: the coverage shall express the exhaustiveness of the exploration (e.g., managing to craft a DHCP header, managing to enter the HTTP parser, IPv4 reassembly, etc.).

To diversify the vulnerabilities, six types are considered: BoF for buffer overflow, IoF for integer overflow, OB1 for off-by-one, FMT for format string (handling user input as a format), UaF for use-after-free, and SIGS for memory corruption (null pointer dereference, etc.). Among the 20 defects added (shown in Table 4), 5 were not detected by the SAST (Klocwork); the weaknesses of SAST tools are left out of scope for this research. In the automated process, no intrinsic functions are added for these issues, which therefore cannot be covered or validated. As a consequence, the benchmark contains 15 issues for which the ground truth is available. The PASTIS framework then has to cover and validate the alerts within the 24-hour time slot. The test engines start the campaign with a single input in their initial corpus, which represents a complete TCP connection.
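The exact on-disk format of such an input is not given in the paper; the sketch below builds a plausible initial seed with Scapy, assuming a simple length-prefixed concatenation of client frames (handshake followed by an HTTP request), with illustrative addresses and ports.

<pre>
# Illustrative construction of the initial corpus input: a sequence of client frames
# for a complete TCP exchange (handshake then HTTP request). The length-prefixed
# framing, addresses and ports are assumptions; the paper only states that one input
# is a sequence of frames read from a file. Requires the scapy package.
import struct
from scapy.all import Ether, IP, TCP, Raw, raw

client, server = "192.168.1.2", "192.168.1.1"
frames = [
    Ether() / IP(src=client, dst=server) / TCP(sport=1234, dport=80, flags="S", seq=0),
    Ether() / IP(src=client, dst=server) / TCP(sport=1234, dport=80, flags="A", seq=1, ack=1),
    Ether() / IP(src=client, dst=server) / TCP(sport=1234, dport=80, flags="PA", seq=1, ack=1)
        / Raw(b"GET /index.html HTTP/1.1\r\nHost: target\r\n\r\n"),
]

with open("initial_tcp_connection.bin", "wb") as out:
    for frame in frames:
        data = raw(frame)
        out.write(struct.pack(">H", len(data)) + data)   # 2-byte length prefix per frame
</pre>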
====5.3. Results====

Table 4 shows the coverage and detection results. Within 24 hours, all the intrinsic function calls corresponding to identified alerts have been covered, and 77% of the vulnerabilities were correctly validated. Multiple vulnerabilities are validated in less than a minute, and a few of them took more than 3 hours to be detected.

{|
|+ Table 4: Inserted vulnerabilities and detection by Honggfuzz and Triton
! Id !! Type !! D/V !! Proto. !! Function !! Honggfuzz Cov !! Honggfuzz Val. !! Triton Cov !! Triton Val.
|-
| 1 || OB1 || ∙ || HTTP || httpParseRequestLine || ✓ || ✓ || ✗ || ✗
|-
| 2 || FMT || ∙ || HTTP || httpSendErrorResponse || ✓ || ✓ || ✗ || ✗
|-
| 3 || IoF || ∙ || HTTP || httpSendRedirectResponse || ✓ || - || ✓ || -
|-
| 4 || BoF || ∙ || HTTP || httpSendRedirectResponse || - || - || - || -
|-
| 5 || FMT || ∙ || HTTP || httpReadRequestHeader || ✓ || ✗ || ✗ || ✗
|-
| 6 || UaF || ∙ || HTTP || httpSendRedirectResponse || - || - || - || -
|-
| 7 || BoF || ∙ || HTTP || httpParseRequestLine || ✓ || ✓ || ✓ || ✓
|-
| 8 || BoF || ∙ || HTTP || httpParseContentTypeField || ✓ || ✓ || ✓ || ✓
|-
| 9 || FMT || ∙ || HTTP || httpFormatResponseHeader || ✓ || - || ✗ || -
|-
| 10 || FMT || ∙ || HTTP || httpParseContentTypeField || ✓ || ✗ || ✗ || ✗
|-
| 11 || OB1 || ∙ || HTTP || httpDecodePercentEncoded. || - || - || - || -
|-
| 12 || IoF || ∙ || IPv4 || ipv4ProcessPacket || ✓ || - || ✓ || -
|-
| 13 || SIGS || ∙ || ARP || arpProcessReply || ✓ || ✓ || ✓ || ✓
|-
| 14 || SIGS || ∙ || ICMP || icmpProcessEchoRequest || ✓ || - || ✓ || -
|-
| 15 || BoF || ∙ || ICMP || icmpSendErrorMessage || - || - || - || -
|-
| 16 || UaF || ∙ || IPv4 || ipv4FragmentDatagram || ✓ || ✓ || ✓ || ✗
|-
| 17 || OB1 || ∙ || core || formatDate || ✓ || - || ✓ || -
|-
| 18 || SIGS || ∙ || ETH. || ethSendFrame || ✓ || - || ✗ || -
|-
| 19 || UaF || ∙ || IGMP || igmpProcessMessage || ✓ || ✓ || ✓ || ✓
|-
| 20 || IoF || ∙ || ICMP || icmpUpdateInStats. || - || - || - || -
|}
D: Defect, V: Vulnerability, Cov: Covered, Val.: Validated.

The generated test corpora cover 42% of the whole code lines. While this seems low, it represents almost all the coverable code, the rest being client-side functions only called when the stack is used as a client. Besides that, the code is written in a defensive manner, which implies that several error-handling paths are never covered. For instance, the code handling malloc errors is never called, as no out-of-memory condition was triggered. Quantitative results and experiments measuring the improvement brought by combining both testing engines have not yet been evaluated and are left as future work.

Depending on the class of defect, the validation difficulty varies. For example, FMT appeared to be harder to trigger, as it requires the engine to generate faulty format strings (e.g., %s). Conversely, IoF generates multiple false positives, as the engine does not know whether the operation is performed on signed or unsigned integers.

As part of the testing, many test-cases were causing the program to hang forever. While this strongly reduced the fuzzing speed, it revealed a true 0-day in the parsing of TCP options. It has been responsibly disclosed to Oryx Embedded, which quickly published a patch. The vulnerability obtained the CVE identifier CVE-2021-26788 (https://blog.quarkslab.com/remote-denial-of-service-on-cyclonetcp-cve-2021-26788.html, https://nvd.nist.gov/vuln/detail/CVE-2021-26788).

====5.4. Limitations====

Most of the analysis steps depicted in Figure 1 can be automated, but as of now, the most difficult ones still require an analyst. As expected, the analyst has to write the harness for the target, has to make it compilable for all testing engines, and has to check that the automatic insertion of intrinsics does not break the program semantics.

While this research shows that automating most of the workflow is possible, combining a fuzzer and DSE raises multiple issues that are yet to be addressed. Indeed, such heterogeneous algorithms hardly work together. Experiments show that fuzzing generates numerous test-cases that DSE replays significantly more slowly. It thus spends a significant amount of time performing its dry-run (a corpus replay that updates the engine's internal coverage: the inputs are not mutated, and the dry-run typically decides whether an input is worth being kept or not) to update its coverage with the inputs received. The coverage synchronisation between engines is thus a bottleneck for symbolic execution.

Also, DSE in pure emulation requires modeling a large number of syscalls and external libraries to scale to significantly larger code bases. From the side-effect modeling perspective, this can be addressed using a concolic execution mode. Such an approach relies more heavily on concrete values during the execution, which then do not need to be modeled; conversely, the reasoning power of the symbolic part is reduced, as side effects are not modeled symbolically. Because of these DSE limitations, the current benchmark results do not show a clear gain in combining fuzzing and DSE rather than running them separately.
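To make the dry-run bottleneck more tangible, the sketch below (every object and method name is hypothetical) shows what the DSE side has to do with each seed received from the fuzzer before any symbolic reasoning takes place.

<pre>
# Hypothetical illustration of the dry-run (all names are made up): every seed
# received from the fuzzer is first replayed concretely in the DSE emulator, only
# to update coverage and decide whether to keep it; no solver is involved, yet the
# replay time is spent, which makes this queue the bottleneck described above.
def dry_run(dse, received_seeds, corpus):
    for seed in received_seeds:
        trace = dse.emulate(seed)          # concrete replay, much slower than the fuzzer
        if trace.new_coverage():           # the seed reaches code the DSE had not seen
            dse.coverage.update(trace)
            corpus.append(seed)            # kept: worth symbolizing later
        # otherwise the seed is dropped, but only after paying the replay cost
</pre>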
===6. Related work===

'''Static analysis warning driven exploration.''' Combining static analysis and dynamic testing to obtain better results than each technique taken separately has already been studied. From an error condition inferred by a static checker, Check 'n' Crash [30] aims at generating a test-case to validate whether the error truly exists. Another combination, called SANTE [31], uses the static analyzer Frama-C [32] to detect potential runtime errors; the result is combined with PathCrawler [33], a DSE engine, to generate a test-case and confirm the alarms. DyTa [34], another utility, follows a similar approach.

Another category of related work relies on directed approaches. Gerasimov [35] uses static analysis warnings as targets for a directed DSE algorithm that iteratively reduces the distance to the warnings in order to cover them; they use their own static analyzer, Svace [36]. In another publication [37] they also study the reachability of the security warnings. The work of Li et al. [38] suggests an approach dedicated to use-after-free vulnerabilities, where the alloc and free primitives are used to drive the exploration. In a more general manner, multiple existing research works focus on directed approaches to cover specific locations of the program [14, 39, 40], but these are not necessarily driven by a SAST.

'''Fuzzing & symbolic execution combination.''' Various approaches combining these two testing techniques have been proposed in the past. Koushik Sen published in 2007 a hybrid concolic testing approach combining the two [41]. Later, Driller [42] suggested a selective DSE algorithm launching angr only when the fuzzing gets stuck. More recently, QSYM [43] intertwines the concolic execution with the fuzzing in a very light yet fast manner. Finally, multiple collaborative approaches allowing the combination of heterogeneous fuzzing engines have been proposed under the term ensemble fuzzing; among them, we can highlight ClusterFuzz [44] by Google, EnFuzz [9], DeepState [45], CollabFuzz [46] and, more recently, OneFuzz [47] by Microsoft. To our knowledge, none of these ensemble fuzzers uses a static analyzer as an input of test objectives.

===7. Future work===

These preliminary results open the way to further experiments and benchmarks. Multiple experiments can be made to optimize the collaboration of the test engines. We are working on improving the PASTIS framework by adding new fuzzing engines like AFL++ [17], adding slicing features to better guide the exploration with more directed strategies, and enlarging the scope of the project to binary-only targets.

===8. Conclusion===

This paper summarizes what has been done as part of the PASTIS project and its implementation in the PASTIS framework. We describe a test suite enabling us to discriminate and choose a fuzzing and a DSE engine for the PASTIS platform. We then describe the full workflow that we intend to automate. Namely, the paper discusses the process of analysing a source code with a SAST tool, how to embed these data in the final compiled program, and how to automate the process of testing it with various heterogeneous testing engines. The result is a test corpus that can be integrated as tests in the project. An analyst can use these results to prune and ignore irrelevant alerts, perform root-cause analysis on crashes, and focus on the remaining alerts that have not been covered. This process is required in a wide range of industries like aerospace, automotive, defense and energy, or in any context that requires a higher level of assurance.

===Acknowledgments===

This research was realized by Quarkslab in the context of the PASTIS project financed by DGA-MI (Direction Générale de l'Armement, Maîtrise de l'Information).

===References===

[1] ISO, Road vehicles – Functional safety, 2011.

[2] Lockheed Martin Corporation, Joint Strike Fighter Air Vehicle C++ Coding Standards for the System Development and Demonstration Program, Lockheed Martin Corporation, 2005.

[3] Jet Propulsion Laboratory, JPL Institutional Coding Standard for the C Programming Language, 2009.

[4] Motor Industry Software Reliability Association, MISRA C:2012: Guidelines for the Use of the C Language in Critical Systems, Motor Industry Research Association, 2013.

[5] R. C. Seacord, The CERT C Secure Coding Standard, 1st ed., Addison-Wesley Professional, 2008.

[6] GrammaTech Inc., CodeSonar C/C++ SAST when safety and security matter, 2021.

[7] Perforce, Klocwork static code analysis for C, C++ and Java, 2021.

[8] P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, X. Rival, The Astrée analyzer, in: M. Sagiv (Ed.), Programming Languages and Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 21–30.

[9] Y. Chen, Y. Jiang, F. Ma, J. Liang, M. Wang, C. Zhou, X. Jiao, Z. Su, EnFuzz: Ensemble fuzzing with seed synchronization among diverse fuzzers, in: 28th USENIX Security Symposium, Santa Clara, CA, USA, USENIX Association, 2019, pp. 1967–1983.

[10] R. Swiecki, F. Gröbert, honggfuzz, https://github.com/google/honggfuzz, 2009.
[11] F. Saudel, J. Salwan, Triton: A dynamic symbolic execution framework, in: Symposium sur la sécurité des technologies de l'information et des communications, SSTIC, Rennes, France, June 3-5, 2015, SSTIC, 2015, pp. 31–54.

[12] B. P. Miller, L. Fredriksen, B. So, An empirical study of the reliability of UNIX utilities, Commun. ACM 33 (1990) 32–44. doi:10.1145/96267.96279.

[13] C. Cadar, K. Sen, Symbolic execution for software testing: Three decades later, Commun. ACM 56 (2013) 82. doi:10.1145/2408776.2408795.

[14] Y. Wang, X. Jia, Y. Liu, K. Zeng, T. Bao, D. Wu, P. Su, Not all coverage measurements are equal: Fuzzing by coverage accounting for input prioritization, in: 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020, The Internet Society, 2020.

[15] M. Zalewski, American fuzzy lop, http://lcamtuf.coredump.cx/afl/, 2018.

[16] LLVM Team, libFuzzer – a library for coverage-guided fuzz testing, 2018.

[17] A. Fioraldi, D. Maier, H. Eißfeldt, M. Heuse, AFL++: Combining incremental steps of fuzzing research, in: 14th USENIX Workshop on Offensive Technologies (WOOT 20), USENIX Association, 2020.

[18] E. Geretto, C. Tessier, F. Massacci, A QBDI-based fuzzer taming magic bytes, in: Italian Conference on Cyber Security, ITASEC 2019, Pisa, Italy, February 13-15, 2019, CEUR Workshop Proceedings, 2019.

[19] S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, H. Bos, VUzzer: Application-aware evolutionary fuzzing, in: 24th Annual Network and Distributed System Security Symposium, NDSS 2017, San Diego, California, USA, February 26 - March 1, 2017. URL: https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/vuzzer-application-aware-evolutionary-fuzzing/.

[20] L. M. de Moura, N. Bjørner, Satisfiability modulo theories: introduction and applications, Commun. ACM 54 (2011) 69–77. doi:10.1145/1995376.1995394.

[21] Trail of Bits, Manticore: Symbolic execution for humans, 2017. https://blog.trailofbits.com/2017/04/27/manticore-symbolic-execution-for-humans.

[22] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, G. Vigna, SoK: (State of) the art of war: Offensive techniques in binary analysis, 2016.

[23] C. Cadar, D. Dunbar, D. R. Engler, KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs, in: 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008, December 8-10, 2008, San Diego, California, USA, Proceedings, 2008, pp. 209–224.
[24] B. Dolan-Gavitt, P. Hulin, E. Kirda, T. Leek, A. Mambretti, W. Robertson, F. Ulrich, R. Whelan, LAVA: Large-scale automated vulnerability addition, in: 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 110–121. doi:10.1109/SP.2016.15.

[25] W. Cui, M. Peinado, S. K. Cha, Y. Fratantonio, V. P. Kemerlis, RETracer: Triaging crashes by reverse execution from partial memory dumps, in: Proceedings of the 38th International Conference on Software Engineering, ICSE '16, ACM, New York, NY, USA, 2016, pp. 820–831. doi:10.1145/2884781.2884844.

[26] K. Serebryany, D. Bruening, A. Potapenko, D. Vyukov, AddressSanitizer: A fast address sanity checker, in: 2012 USENIX Annual Technical Conference (USENIX ATC 12), USENIX, Boston, MA, 2012, pp. 309–318.

[27] W. Dietz, P. Li, J. Regehr, V. Adve, Understanding integer overflow in C/C++, in: Proceedings of the 34th International Conference on Software Engineering, ICSE '12, IEEE Press, Piscataway, NJ, USA, 2012, pp. 760–770.

[28] K. Serebryany, T. Iskhodzhanov, ThreadSanitizer: Data race detection in practice, in: Proceedings of the Workshop on Binary Instrumentation and Applications, WBIA '09, ACM, New York, NY, USA, 2009, pp. 62–71. doi:10.1145/1791194.1791203.

[29] D. dos Santos, S. Dashevskyi, J. Wetzels, A. Amri, How embedded TCP/IP stacks breed critical vulnerabilities, 2020.

[30] C. Csallner, Y. Smaragdakis, Check 'n' crash: Combining static checking and testing, 2005, pp. 422–431. doi:10.1109/ICSE.2005.1553585.

[31] O. Chebaro, N. Kosmatov, A. Giorgetti, J. Julliand, Combining static analysis and test generation for C program debugging, in: Tests and Proofs - 4th International Conference, TAP@TOOLS 2010, Málaga, Spain, July 1-2, 2010, Proceedings, volume 6143 of Lecture Notes in Computer Science, Springer, 2010, pp. 94–100. doi:10.1007/978-3-642-13977-2_9.

[32] F. Kirchner, N. Kosmatov, V. Prevosto, J. Signoles, B. Yakobowski, Frama-C: A software analysis perspective, Formal Asp. Comput. 27 (2015) 573–609. doi:10.1007/s00165-014-0326-7.

[33] N. Williams, B. Marre, P. Mouy, M. Roger, PathCrawler: Automatic generation of path tests by combining static and dynamic analysis, in: Dependable Computing - EDCC-5, 5th European Dependable Computing Conference, Budapest, Hungary, April 20-22, 2005, Proceedings, 2005, pp. 281–292. doi:10.1007/11408901_21.

[34] X. Ge, K. Taneja, T. Xie, N. Tillmann, DyTa: Dynamic symbolic execution guided with static verification results, in: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu, HI, USA, May 21-28, 2011, ACM, 2011, pp. 992–994. doi:10.1145/1985793.1985971.

[35] A. Y. Gerasimov, Directed dynamic symbolic execution for static analysis warnings confirmation, Program. Comput. Softw. 44 (2018) 316–323. doi:10.1134/S036176881805002X.

[36] V. P. Ivannikov, A. A. Belevantsev, A. E. Borodin, V. N. Ignatiev, D. M. Zhurikhin, A. Avetisyan, Static analyzer Svace for finding defects in a source program code, Program. Comput. Softw. 40 (2014) 265–275. doi:10.1134/S0361768814050041.

[37] A. Y. Gerasimov, L. V. Kruglov, M. K. Ermakov, S. P. Vartanov, An approach to reachability determination for static analysis defects with the help of dynamic symbolic execution, Program. Comput. Softw. 44 (2018) 467–475. doi:10.1134/S0361768818060051.

[38] M. Li, Y. Chen, L. Wang, G. Xu, Dynamically validating static memory leak warnings, in: Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, Association for Computing Machinery, New York, NY, USA, 2013, pp. 112–122. doi:10.1145/2483760.2483778.

[39] M.-D. Nguyen, S. Bardin, R. Bonichon, R. Groz, M. Lemerre, Binary-level directed fuzzing for use-after-free vulnerabilities, in: 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), USENIX Association, San Sebastian, 2020, pp. 47–62.

[40] M. Böhme, V. Pham, M. Nguyen, A. Roychoudhury, Directed greybox fuzzing, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, ACM, 2017, pp. 2329–2344. doi:10.1145/3133956.3134020.
[41] R. Majumdar, K. Sen, Hybrid concolic testing, in: 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20-26, 2007, pp. 416–426. doi:10.1109/ICSE.2007.41.

[42] N. Stephens, J. Grosen, C. Salls, A. Dutcher, R. Wang, J. Corbetta, Y. Shoshitaishvili, C. Kruegel, G. Vigna, Driller: Augmenting fuzzing through selective symbolic execution, in: 23rd Annual Network and Distributed System Security Symposium, NDSS, 2016.

[43] I. Yun, S. Lee, M. Xu, Y. Jang, T. Kim, QSYM: A practical concolic execution engine tailored for hybrid fuzzing, in: 27th USENIX Security Symposium (USENIX Security 18), USENIX Association, Baltimore, MD, 2018, pp. 745–761.

[44] Google, ClusterFuzz - scalable fuzzing infrastructure, 2021.

[45] P. Goodman, G. Grieco, A. Groce, Tutorial: DeepState: Bringing vulnerability detection tools into the development cycle, in: 2018 IEEE Cybersecurity Development, SecDev 2018, Cambridge, MA, USA, September 30 - October 2, 2018, pp. 130–131. doi:10.1109/SecDev.2018.00028.

[46] S. Österlund, E. Geretto, A. Jemmett, E. Güler, P. Görz, T. Holz, C. Giuffrida, H. Bos, CollabFuzz: A framework for collaborative fuzzing, in: Proceedings of the 14th European Workshop on Systems Security, EuroSec '21, 2021, pp. 1–7.

[47] Microsoft, OneFuzz - a self-hosted fuzzing-as-a-service platform, 2021.