=Paper=
{{Paper
|id=Vol-3056/paper-02
|storemode=property
|title=From Source Code to Crash Test-Case through Software Testing Automation
|pdfUrl=https://ceur-ws.org/Vol-3056/paper-02.pdf
|volume=Vol-3056
|authors=Robin DAVID,Jonathan SALWAN,Justin BOURROUX
}}
==From Source Code to Crash Test-Case through Software Testing Automation==
<pdf width="1500px">https://ceur-ws.org/Vol-3056/paper-02.pdf</pdf>
<pre>
From source code to crash test-cases through software testing automation
Robin David¹, Jonathan Salwan² and Justin Bourroux³
¹ Quarkslab, 13 rue Saint Ambroise, Paris, France
² Pirate, Atlantic Ocean, Earth
³ DGA-MI, Bruz, France


Abstract
Finding weaknesses and vulnerabilities in source code is a difficult task. One approach to tackle this issue is static analysis. However, existing solutions and tools tend to generate numerous alerts, many of them false positives. This paper presents an approach automating the software testing process from source code up to the dynamic testing of the compiled program. More specifically, starting from a static analysis report indicating alerts on source lines, it attempts to cover these lines dynamically and opportunistically checks whether or not they can trigger a crash. The result is a test corpus that covers alerts and triggers them if they happen to be true positives. This paper discusses the methodology employed to track alerts down in the compiled binary, the selection process of the testing engines, and the results obtained on a TCP/IP stack implementation for embedded and IoT systems.

Keywords
Software Testing, Static Analysis, Fuzzing, Dynamic Symbolic Execution, Vulnerability Research


C&ESAR'21: Automation in cybersecurity, November 16–17, 2021, Couvent des Jacobins, Rennes, France
rdavid@quarkslab.com (R. David); jsalwan@quarkslab.com (J. Salwan)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Proceedings of the 28th C&ESAR (2021), 27

1. Introduction

Context Evaluating the security of source code and finding its flaws is a tedious task in software testing. As a baseline, multiple guidelines have been published for a wide range of industries such as automotive [1], aircraft [2] or aerospace [3] to identify weak and vulnerable code constructs. For C code, the best known are MISRA C [4] and CERT C [5]. These standards are integrated in off-the-shelf static analyzers [6, 7, 8], which usually generate numerous alarms with substantially high false-positive rates. Therefore, analyzing the results is a lengthy and cumbersome process. Little research in the literature intends to solve the issue of validating alarms, as generating a crashing or violating test-case is an open research question: it requires solving both a reachability and a satisfiability problem in the program.
   Our research does not address this issue directly but aims at bridging the gap between alerts identified at source level and dynamic testing. That process aims at opportunistically covering and validating these alerts. We intend to automate the process as much as possible so that the analyst can focus on hard-to-reach corner-case alerts. This research is performed in the context of the PASTIS project (Programme d'Analyse Statique et de Tests Instrumentés pour la Sécurité), financed by DGA-MI, which focuses on C and C++ programs, and more specifically on network-related services.

Contributions We present an automated testing infrastructure combining different testing techniques, namely fuzzing and Dynamic Symbolic Execution (DSE). Combining heterogeneous testing engines to fuzz the same target is now usually called ensemble fuzzing [9].
   We implemented our own fuzzing infrastructure and performed an experimental study of existing testing techniques, namely fuzzing and DSE. We developed a benchmark test suite trying to reveal idiosyncratic behaviors of the tested tools. Based on the results obtained, we selected honggfuzz [10] and triton [11], respectively for fuzzing and DSE. To summarize, our research provides the following contributions:

   • experimental study of existing techniques and tools on a dedicated benchmark;
   • combination of a static analyzer with an ensemble fuzzer aggregating heterogeneous software testing engines (greybox fuzzing and DSE);
   • consolidation of this combination in a semi-automated workflow that starts from alerts on source code lines, tracks them back in the compiled binary and triggers automated


     testing to cover them and to trigger the bug, if any. That process leads to the generation of a test corpus;
   • benchmark assessing the robustness of two TCP/IP stacks, which enabled uncovering a remote Denial-of-Service (DoS) that was assigned the identifier CVE-2021-26788¹.

¹ https://nvd.nist.gov/vuln/detail/CVE-2021-26788


2. Experimental study of techniques and tools

2.1. Software testing techniques

In the past decade, fuzzing [12] and dynamic symbolic execution [13], two software testing techniques, have proven very efficient at detecting and triggering bugs. While fuzzing tends to be very fast, it can be hindered by code constructs that prevent it from progressing in the exploration of the program. Conversely, DSE reasons more precisely on a per-path basis but is significantly slower. Hence, we assessed various fuzzers and DSE engines to select one candidate of each to be combined together. Criteria and methodology are described in Section 2.2.

Fuzzing is the means of feeding pseudo-random inputs to the program in order to trigger unexpected behaviors. Inputs can be generated randomly or using some feedback mechanism. The most commonly used feedback is coverage, but other feedbacks have been proposed in the literature [14]. So-called greybox fuzzers like AFL [15], libfuzzer [16] or honggfuzz [10] use compilation-time static instrumentation of the program to obtain feedback at runtime.

                      AFL       Honggfuzz AFL/QBDI  PULSAR
 Version              2.52b     1.7       -         -
 Language             C         C         C         Python
 Open-source          ✓         ✓         ✓         ✓
 Binary fuzzing       ✗         ✗         ✓         ✗
 Static instr.        ✓         ✓         ✓         ✓
 Dynamic instr.       ✗         ✗         ✓         ✗
 Seed scheduling      ✓         ✓         ✓         ✗
 Model input gen.     ✗         ✗         ✗         ✓
 Mutation input gen.  ✓         ✓         ✓         ✓
 In-memory fuzzing    ✓         ✓         ✓         ✗
 Crash dedup/prio.    ✓         ✓         ✓         ✗
Table 1
Comparison of selected fuzzers

   Table 1 shows the fuzzers that have been assessed. AFL and Honggfuzz are two leading implementations of greybox fuzzers (AFL being now superseded by AFL++ [17]). AFL/QBDI enables binary-only fuzzing by interfacing AFL with QBDI [18]. This combination also enables on-the-fly optimizations, for instance breaking comparisons with constants, which are notoriously hard to get past in mutational approaches. PULSAR has been selected for its ability to test network protocols: it generates inputs based on (partially inferred) models, whereas AFL and Honggfuzz use genetic algorithms [19].

Dynamic Symbolic Execution, also called whitebox fuzzing, relies on a model of the semantics of instructions to perform its execution. Instructions are disassembled and lifted into a semantic representation, called an intermediate representation, which is used for emulation. A path 𝜋 in the program is then represented as a first-order logic formula (usually over bitvectors) that is given to an SMT solver [20]. A solution of this formula is an input covering the path 𝜋.

                      manticore KLEE      angr      Triton
 Version              0.2.5     2.1       8.18      0.7
 Language             Py        C++       Py        C++, Py
 Open-source          ✓         ✓         ✓         ✓
 Base                 binary    source    binary    binary
 Intermediate Repr.   custom    LLVM      VEX       custom
 Variadic argv size   ✗         ✗         ✓         ✗
 Library calls        ∼         ✓         ✓         ∼
 Syscalls             ✓         ✓         ✓         ✗
 Symbolic mem. read   ✓         ✓         ✓         ✗
 Symbolic mem. write  ✗         ∼         ✓         ✗
 Bit-vectors          ✓         ✓         ✓         ✓
 Arrays               ✓         ✓         ✓         ✗
Table 2
Comparison of selected DSE tools

   Table 2 shows the DSE engines tested in this study. Both manticore [21] and angr [22] are developed in Python and provide similar features. They implement a wide range of library calls and syscalls, and support, to some extent, symbolic reads and writes in memory. Triton [11] provides more elementary functionalities but is designed to be modular in order to be embedded in a whole set of other utilities. KLEE [23] works on LLVM and is the reference in DSE.

2.2. Methodology & Benchmarking
To bring out two final candidates for fuzzing and DSE, we designed a test suite. It enables checking specific behaviors on small snippets (atomic tests) as well as testing the scale on larger programs. Atomic




tests assess the behavior on symbolic pointers, the handling of non-deterministic instructions and a variety of vulnerability categories (buffer-overflow, integer-overflow, use-after-free, etc.). For the scalability benchmark, the uniq and base64 binaries of the LAVA-M project [24] have been used. This suite provides the ground truth along with some quantitative results (72 bugs in total in the two binaries).
   To smooth statistical discrepancies in the results caused by the random nature of fuzzing, tests were run multiple times and the mean value was computed. Each of the 70 atomic tests was run 3 times for a maximum duration of 300 seconds, while the scale binaries were run 3 times for 6h. Table 3 shows synthetic results of all utilities on this test suite². Every tool has been configured opportunistically with the best parameters to give it a fair chance. The PULSAR results have been excluded because it was not possible to run it correctly on the benchmark targets due to its focus on network protocols.

               Atomic (70)  Scale (72)  Total (142)
 AFL                48           0         48/142
 Honggfuzz          54          44        100/142
 AFL/QBDI           47          33         80/142
 manticore          34           0         34/142
 KLEE               47           1         48/142
 angr               37           0         37/142
 Triton             47           0         47/142
Table 3
Test suite benchmark results

   Fuzzing results show that honggfuzz outperformed the other engines; it has consequently been kept as the reference engine for fuzzing. For symbolic execution, although KLEE [23] outperformed the other engines, Triton has been selected. Being developers of Triton, we are familiar with its code, which makes it very easy to extend and to modify for the needs of PASTIS. Also, KLEE comes with two main issues when combining it with other fuzzing engines. First, as it works at the LLVM-IR level, it requires the program at source level³. Second, its code base evolves fast and contains many research-related features, which makes it difficult to integrate into a fully-automated workflow.


3. Testing Automation

3.1. Overview
The process of automating the dynamic testing of source code is depicted in Figure 1. First, the code has to be harnessed⁴ to target the components of interest. It has to be prepared for both fuzzing and symbolic execution, which have to be compiled differently. That step is highly manual and usually requires a good understanding of the target. Then, the harnessed code can be provided to a Static Application Security Testing (SAST) tool that generates a report of suspicious lines of code. These data are used to embed intrinsic function⁵ calls in the target in an automated manner. The code is compiled and provided to both testing engines, which attempt to cover the faulty lines and to generate crashing inputs. During that process, the engines communicate to help each other. The final output is a report of alerts indicating whether each alert has been covered or not and whether a crash has been associated with it. Automating most of these steps enables pruning some alerts, so that the analyst can focus on the deepest uncovered ones.
   For the purpose of this research, a fuzzing campaign is expected to run for at most 24h. That time cap was arbitrarily set at the beginning of the PASTIS project.

3.2. Collaborative architecture
A challenge in designing an automated workflow is making fuzzing and DSE collaborate and determining what kind of information to exchange. As these two approaches work rather differently and have different notions of coverage, exchanging coverage information directly is inherently complicated. Moreover, it would make integrating new engines difficult. Hence, each engine solely exchanges the input seeds it generates with regard to its own coverage metric. The receiving engine is in charge of deciding whether an input is worth keeping or not. The exchange medium is described in Section 4.
   The communication is performed through a central authority called the broker, which enables connecting multiple instances of the engines. Figure 2 shows the general overview of the collaborative architecture. At startup, the broker provides the binary with appropriate parameters to the engines. Then
during the execution, it forwards all the inputs received from one engine to the others. During the fuzzing campaign, if an engine covers or validates an alert, it sends the alert identifier and the associated input to the broker, which centralizes all data.

² Versions of tools are slightly outdated as the experiment was performed in 2018.
³ While it is the case in this study, our goal was to make the PASTIS framework applicable to binary-only targets.
⁴ Explaining the process is left out of scope for this paper.
⁵ Ad-hoc function added in a code base that receives specific processing at runtime by a third-party tool.

Figure 1: Full Analysis Workflow (source code → code harnessing → SAST → alerts report → intrinsic insertion → compilation for fuzzer → fuzzing, and compilation for DSE → DSE; both engines feed the final report: coverage, validation)

Figure 2: Global Collaborative Architecture Overview (master: pastis-broker, started with an initial configuration: binary, SAST report (klocwork), configurations such as the coverage strategy; it maintains a workspace: corpus / crashes / hangs, log and client statistics, CSV of results. It communicates through libpastis with the fuzzing client pastis-honggfuzz, a Python driver launching Honggfuzz via execve, and with the DSE client pastis-triton, made of Pastis-DSE on top of TritonDSE (exploration of paths) on top of Triton (symbolic execution of one path). Message sequence: 1. connection (idle); 2. reception of binary (+opts); 3. seed exchange (+logs); 4. infos of alert validation; 5. stop)

Alert Validation To validate alerts discovered at source level, the test engines must be able to track them down in the compiled binary. Compiling binaries in debug mode enables tracking the associated line of code for each assembly instruction. However, that requires each engine to be able to leverage debug information. Hence, our workflow uses intrinsic functions. Listing 1 shows the intrinsic function used. It takes an identifier, a format string and an arbitrary number of values as arguments. Its sole purpose is printing the identifier given as parameter. Then, at each alert location, a call to this function is added with a unique identifier, the type of issue identified by the static analyzer and contextual parameters. For instance, this enables retrieving buffer sizes, which are known to the compiler but lost once the program is compiled.

#ifdef QB_INTRINSIC
#include <stdio.h>

/* Inserted at each alert location: prints the alert identifier so that
   test engines can detect (via stdout or a hook) that the alert was reached.
   The format string and variadic values carry contextual data for engines
   that hook this function; they are not used here. */
int __klocwork_alert_placeholder(int id, const char* fmst, ...) {
    printf("REACHED ID %d\n", id);
    return id;
}
#endif

            Listing 1: Intrinsic function

   At runtime, a test engine either has to parse stdout to find covered alerts, or to directly hook the intrinsic functions (depending on its inner workings). When detecting a crash or violation, engines are in charge of mapping it to a previously covered alert, if applicable. Tracking the root cause of a crash is still an open research problem [25], so it is done here in an empirical manner: the last encountered alert is considered to be the cause of the crash and is thus considered validated. Note that we cannot invalidate an alert as being a false positive because of the potentially infinite number of paths leading to that code location (path combinatorial problem).
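The stdout-parsing variant of alert tracking, combined with the "last encountered alert" heuristic, can be sketched in C as below. This is a minimal illustration under stated assumptions, not the PASTIS implementation: the names alert_trace, trace_init, trace_feed_line, trace_crash_alert and the MAX_ALERTS bound are hypothetical, and the sketch assumes each "REACHED ID <n>" marker printed by the intrinsic function of Listing 1 starts its own line of the target's stdout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical upper bound on alert identifiers (assumption). */
#define MAX_ALERTS 1024

/* Hypothetical per-execution bookkeeping kept by an engine driver. */
typedef struct {
    bool covered[MAX_ALERTS]; /* alerts reached at least once */
    int last_alert;           /* last alert reached, -1 if none yet */
} alert_trace;

void trace_init(alert_trace *t) {
    memset(t->covered, 0, sizeof t->covered);
    t->last_alert = -1;
}

/* Feed one stdout line of the target. Lines printed by the intrinsic
   function ("REACHED ID <n>") mark alert <n> as covered and as the most
   recent alert; any other output, or an out-of-range id, is ignored. */
void trace_feed_line(alert_trace *t, const char *line) {
    int id;
    if (sscanf(line, "REACHED ID %d", &id) == 1 && id >= 0 && id < MAX_ALERTS) {
        t->covered[id] = true;
        t->last_alert = id;
    }
}

/* Empirical root-cause heuristic: a crash is attributed to the last alert
   reached before it. Returns -1 when no alert was reached, in which case
   the crash cannot be associated with any alert. */
int trace_crash_alert(const alert_trace *t) {
    return t->last_alert;
}
```

A driver would call trace_feed_line on every stdout line of a test-case run and, if the run crashes, report the crashing input together with the identifier returned by trace_crash_alert. An engine that hooks the intrinsic function instead of parsing stdout could update the same kind of structure directly from its hook.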




   Crashes or violations are detected differently by fuzzing and DSE. Modern fuzzers use sanitizers like ASan [26], UBSan [27] or TSan [28], which respectively detect memory corruptions, undefined behaviors and race conditions. DSE engines usually implement their own sanitizers, which leverage the analysis precision of symbolic execution to implement fine-grained checks. In this case, the sanitizers can also use the contextual information provided as arguments of the intrinsic functions to implement their checks.


4. Implementation

Target Setup Once the harness of the target program is implemented for all fuzzers included in the platform, the code is given to the SAST tool. In this research, the Klocwork [7] software developed by Perforce has been used. As output of its analysis, it provides an HTML file indicating faulty lines along with some additional contextual data (variable names, buffer sizes, etc.). Figure 3 provides an example of such a report.
   Our semi-automated workflow takes the report as input, translates it into JSON for easier processing and uses the result to automatically add intrinsic function calls in the source code. This code addition is made syntactically on a per-line basis and thus has to be double-checked by the analyst⁶. Then, the various variants of the program are compiled for each engine or target architecture (x86_64, ARM). The target program is now ready to be tested using the PASTIS framework.

Figure 3: Sample alert report Klocwork

PASTIS Framework The main interface with the analyst is the broker, called pastis-broker. It is implemented in Python and handles all communications between engines. The communication protocol is based on the message-queuing framework ZMQ⁷, so that it is interoperable with almost all existing programming languages.
   The analyst launches pastis-broker with all the target binary variants, an initial corpus if needed, some configuration parameters, and the Klocwork report, so that the campaign stops when all alerts are covered or validated.
   Then, the various test engines have to be launched with the broker IP address to receive all the fuzzing campaign data. If an engine supports different coverage strategies (block, edge, path, etc.) and multiple instances of it are connected to the broker, the broker automatically balances the coverage strategies among these instances.

Honggfuzz Integration Honggfuzz [10] is a modern greybox fuzzer developed in C. Although it is very efficient on many targets, it has not been designed for collaborative fuzzing. As a consequence, small modifications have been made to its core, the most important being the ability to receive new inputs while it is already fuzzing⁸. On top of that, a Python wrapper has been developed to perform all communications with the broker, to parse stdout, to inject the external inputs received and to send all generated inputs to the broker.
   Such an overlay is called a driver, as it enables interfacing an existing engine with the PASTIS framework. Figure 4 summarizes the main interactions between Honggfuzz and the wrapper, all of which are performed through filesystem monitoring (inotify on Linux). The whole component is called pastis-honggfuzz.

⁶ An implementation using the clang AST is being studied.
⁷ https://github.com/zeromq
⁸ The feature had been submitted as a merge request.

Triton Integration Triton [11] is a DSE framework library designed to perform symbolic execution on a given path. The whole logic of loading the program, scheduling input seeds, covering different




paths is left to the developer. To address this issue, a fully-featured DSE engine called TritonDSE has been developed on top of Triton. For the purpose of the PASTIS framework, a program called PastisDSE has also been built on top of TritonDSE. This program handles all communications with the broker, which include receiving external inputs and sending the ones generated by Triton. Figure 5 summarizes the interactions of these components within the so-called pastis-triton component.

Figure 4: Honggfuzz engine
[Diagram of the pastis-honggfuzz component: the HF-Wrapper (libpastis communication, Replay, KlocworkReport) launches Honggfuzz through execve and monitors the workspace (target, stats.log, initial and dynamic inputs, coverage, crashes) with inotify.]

That component implements code, edge and path coverage strategies. It also implements different sanitizers for each category of vulnerability considered. In particular, memory operations are tracked at the bit level, which enables precise detection of off-by-one errors (OB1). Use-After-Free vulnerabilities are detected by tracking the malloc and free primitives.

In this setting, pastis-triton is launched in pure emulation (thus not as a concolic engine) to better control all side-effects and to allow the execution of AArch64 binaries on x86 hosts. In essence, it has to emulate all the side-effects performed on the system (libc functions, syscalls, etc.). As this emulation cannot be exhaustive, it only supports a limited number of libc functions and syscalls.

Figure 5: pastis-triton engine overview
[Diagram: Pastis-DSE (libpastis communication, callbacks, strategy ALERT_ONLY or CHECK_ALL) runs on top of TritonDSE (exploration of paths), itself built on Triton (symbolic execution of one path); the workspace holds the target, config.json, metadata, corpus, crashes, worklist, hangs and seeds.]

5. Experimental Results

5.1. CycloneTCP target

While this technique is applicable to any kind of software, the PASTIS project is centered on testing low-level network TCP/IP stacks. Among existing open-source implementations, CycloneTCP⁹, developed by Oryx Embedded, provides an implementation for a wide variety of protocols. Recent publications have shown it to be robust in comparison to other TCP/IP stacks [29].

⁹ https://www.oryx-embedded.com/products/CycloneTCP

The stack provides a driver mechanism to receive network frames for various MCUs and OSes. The target program is a simple HTTP server with a single static page. Only standard protocols (Ethernet, IP, TCP, HTTP, ARP) are activated in the target. Other protocols, like DNS, LLMNR and MBNS, are not activated, so as to focus on assessing the ability of engines to handle full TCP communications. The harness implements a driver which reads input frames from a file. A single input is thus a sequence of frames representing incoming messages from a client. The harness also collapses the multi-threading logic into a single-threaded application, so that network frames are processed sequentially. While this prevents finding potential race conditions, it strongly reduces the number of non-reproducible test-cases. As part of the harness, various patches were made to the code to remove checksum verification, to add a pre-registered ARP lease (for the client) and to remove the randomness of the TCP Initial Sequence Number (ISN).

5.2. Controlled environment

To assess the workflow effectiveness, defects and vulnerabilities have been added to the CycloneTCP code. Defects are code constructs raising a SAST alert but which are structurally not triggerable, and vulnerabilities are defects that can be triggered. Such a controlled benchmark enables checking the effectiveness of the framework at covering secluded locations of the code and at triggering vulnerabilities by creating the appropriate test-case (input).

Adding relevant defects is tedious as they have to fulfill the following properties:
    • reachability: they have to be reachable by a test-case
    • conditionality: they should be reachable under some conditions (not covered systematically)
    • non-interference: a defect should not alter the reachability or detectability of another one
    • detectability: vulnerabilities should be triggerable
    • expressiveness: the coverage shall express how exhaustive the exploration is (e.g., managing to craft a DHCP header, managing to enter the HTTP parser, IPv4 reassembly, etc.)

To diversify vulnerabilities, six types are considered: BoF for buffer overflow, IoF for integer overflow, OB1 for off-by-one, FMT for format string (handling user input as a format), UaF for Use-After-Free and SIGS for memory corruption (null pointer dereferences, etc.). Among the 20 defects added (shown in Table 4), 5 were not detected by the SAST tool (Klocwork). Weaknesses of SAST tools are left out of scope for this research. In the automated process, no intrinsic functions are added for these issues, which thus cannot be detected and validated. As a consequence, the benchmark contains 15 issues for which the ground truth is available. The PASTIS framework then has to cover and validate alerts within a 24-hour time slot. Also, each test engine starts its campaign with a single input in its initial corpus, representing a complete TCP connection.

5.3. Results

Table 4 shows coverage and detection results. Within 24 hours, all intrinsic function calls corresponding to identified alerts have been covered and 77% of vulnerabilities correctly validated. Multiple vulnerabilities are validated in less than a minute, while a few of them took more than 3 hours to be detected.

                                                           Honggfuzz      Triton
Id   Type   D   V   Proto.   Function                      Cov   Val.     Cov   Val.
1    OB1        ∙   HTTP     httpParseRequestLine          ✓     ✓        ✗     ✗
2    FMT        ∙   HTTP     httpSendErrorResponse         ✓     ✓        ✗     ✗
3    IoF    ∙       HTTP     httpSendRedirectResponse      ✓     -        ✓     -
4    BoF    ∙       HTTP     httpSendRedirectResponse      -     -        -     -
5    FMT        ∙   HTTP     httpReadRequestHeader         ✓     ✗        ✗     ✗
6    UaF    ∙       HTTP     httpSendRedirectResponse      -     -        -     -
7    BoF        ∙   HTTP     httpParseRequestLine          ✓     ✓        ✓     ✓
8    BoF        ∙   HTTP     httpParseContentTypeField     ✓     ✓        ✓     ✓
9    FMT    ∙       HTTP     httpFormatResponseHeader      ✓     -        ✗     -
10   FMT        ∙   HTTP     httpParseContentTypeField     ✓     ✗        ✗     ✗
11   OB1        ∙   HTTP     httpDecodePercentEncoded.     -     -        -     -
12   IoF    ∙       IPv4     ipv4ProcessPacket             ✓     -        ✓     -
13   SIGS       ∙   ARP      arpProcessReply               ✓     ✓        ✓     ✓
14   SIGS   ∙       ICMP     icmpProcessEchoRequest        ✓     -        ✓     -
15   BoF        ∙   ICMP     icmpSendErrorMessage          -     -        -     -
16   UaF        ∙   IPv4     ipv4FragmentDatagram          ✓     ✓        ✓     ✗
17   OB1    ∙       core     formatDate                    ✓     -        ✓     -
18   SIGS   ∙       ETH.     ethSendFrame                  ✓     -        ✗     -
19   UaF        ∙   IGMP     igmpProcessMessage            ✓     ✓        ✓     ✓
20   IoF        ∙   ICMP     icmpUpdateInStats.            -     -        -     -
D: Defect, V: Vulnerability, Cov: Covered, Val: Validated

Table 4
Inserted vulnerabilities and detection by Honggfuzz and Triton

The generated test corpora cover 42% of all code lines. While this seems low, it represents almost all the coverable code, the rest being client-side functions only called when the stack is used as a client. Besides that, the code is written in a defensive manner, which implies that a lot of error-handling code is never covered. For instance, code handling malloc errors is never called as no out-of-memory condition was triggered. Quantitative results and experiments revealing the improvement brought by combining both testing engines have not yet been evaluated and are left as future work.

Depending on the class of defects, validation difficulty varies. For example, FMT appeared to be harder to trigger as it requires the engine to generate faulty format strings (e.g., %s). Conversely, IoF generates multiple false positives as the engine does not know whether the operation is performed on signed or unsigned integers.

As part of the testing, many test-cases were causing the program to hang forever. While this strongly reduced the fuzzing speed, it turned out to be a true 0-day in the parsing of TCP options. It has been responsibly disclosed to Oryx Embedded, which quickly published a patch. The vulnerability obtained the CVE identifier CVE-2021-26788¹⁰ ¹¹.

¹⁰ https://blog.quarkslab.com/remote-denial-of-service-on-cyclonetcp-cve-2021-26788.html
¹¹ https://nvd.nist.gov/vuln/detail/CVE-2021-26788

5.4. Limitations

Most of the analysis steps depicted in Figure 1 can be automated, but as of now, the most difficult ones still require an analyst. As expected, the analyst has to write the harness for the target, make it compilable for all testing engines and check that the automatic insertion of intrinsics does not break the program semantics.

While this research shows that automating most of the workflow is possible, combining a fuzzer and DSE raises multiple issues that are yet to be addressed. Indeed, such heterogeneous algorithms hardly work together. Experiments show that fuzzing generates numerous test-cases that DSE replays significantly more slowly. DSE thus spends a significant amount of time performing its dry-run¹² to update its coverage with the inputs received. The coverage synchronization between engines is thus a bottleneck for symbolic execution.

¹² Corpus replay to update the engine internal coverage. Inputs are run without being mutated. The dry-run typically decides whether the input is worth keeping or not.

Also, DSE in pure emulation requires modeling a large number of syscalls and external libraries to scale to significantly larger code bases. From the side-effect modeling perspective, scaling to significantly larger code bases can be addressed using a concolic execution mode. Such an approach relies more heavily on concrete values during the execution, which do not need to be modeled. Conversely, the reasoning power of the symbolic aspect is reduced as side-effects are not modeled symbolically.

Because of DSE limitations, current benchmark results do not reflect a clear gain in combining fuzzing and DSE rather than running them separately.

6. Related work

Static analysis warning driven exploration. Combining static analysis and dynamic testing to obtain better results than each technique taken separately
has already been studied. From an error condition inferred by a static checker, Check ’n’ Crash [30] aims at generating a test-case to validate whether the error truly exists.

Another combination, called SANTE [31], uses the static analyzer Frama-C [32] to detect potential runtime errors. The result is combined with PathCrawler [33], a DSE tool, to generate test-cases and confirm the alarms. DyTa [34], another utility, follows a similar approach.

Another category of related work relies on directed approaches. Gerasimov [35] uses static analysis warnings as targets for a directed DSE algorithm that iteratively reduces the distance to the warnings in order to cover them. They use their own static analyzer, Svace [36]. In another publication [37], they also study the reachability of the security warnings. The work of Li et al. [38] suggests an approach dedicated to Use-After-Free vulnerabilities where alloc and free primitives are used to drive the exploration. More generally, multiple existing research works focus on directed approaches to cover specific locations of the program [14, 39, 40], but they are not necessarily driven by a SAST.

Fuzzing & Symbolic Execution combination. Various approaches combining these two testing techniques have been proposed in the past. In 2007, Majumdar and Sen published a Hybrid Concolic Testing approach combining the two [41]. Later, Driller [42] suggested a selective DSE algorithm launching angr solely when fuzzing gets stuck. More recently, QSYM [43] intertwines concolic execution within fuzzing in a very light yet fast manner. Finally, multiple collaborative approaches allowing heterogeneous fuzzing engines to be combined have been proposed under the term ensemble fuzzing. Among them, we can highlight ClusterFuzz [44] by Google, EnFuzz [9], DeepState [45], collabfuzz [46] or, more recently, OneFuzz [47] by Microsoft. To our knowledge, none of these ensemble fuzzers uses a static analyzer as a source of test objectives.

7. Future work

These preliminary results open the way to further experiments and benchmarks. Multiple experiments can be made to optimize the collaboration of test engines. We are working on improving the PASTIS framework by adding new fuzzing engines like AFL++ [17], adding slicing features to better
guide the exploration with more directed strate-          [7] Perforce, Klocwork static code analysis for c,
gies or to enlarge the project scope to binary-only           c++ and java, 2021. [site].
targets.                                                  [8] P. Cousot, R. Cousot, J. Feret, L. Mauborgne,
                                                              A. Miné, D. Monniaux, X. Rival, The astreé
                                                              analyzer, in: M. Sagiv (Ed.), Programming
8. Conclusion                                                 Languages and Systems, Springer Berlin Hei-
                                                              delberg, Berlin, Heidelberg, 2005, pp. 21–30.
This paper summarizes what has been done as part
                                                          [9] Y. Chen, Y. Jiang, F. Ma, J. Liang, M. Wang,
of the PASTIS project and its implementation in
                                                              C. Zhou, X. Jiao, Z. Su, Enfuzz: Ensemble
the PASTIS framework. We depict a test suite
                                                              fuzzing with seed synchronization among di-
enabling to discriminate and to choose a fuzzing
                                                              verse fuzzers, in: 28th USENIX Security Sym-
and DSE engine for the PASTIS plateform. We
                                                              posium, Santa Clara, CA, USA, 2019, USENIX
then describe the full workflow that we intend to
                                                              Association, 2019, pp. 1967–1983. [site].
automate. Namely, the paper discusses the process
                                                         [10] R. Swiecki, F. Gröbert, honggfuzz, https://
of analysing a source code with a SAST tool, how
                                                              github.com/google/honggfuzz, 2009.
to embed this data in the final compiled program
                                                         [11] F. Saudel, J. Salwan, Triton: A dynamic sym-
and how to automate the process of testing it with
                                                              bolic execution framework, in: Symposium sur
various heterogenous testing engines. The result
                                                              la sécurité des technologies de l’information et
is a test corpus that can be integrated as tests in
                                                              des communications, SSTIC, France, Rennes,
the project. An analyst, can use these results, to
                                                              June 3-5 2015, SSTIC, 2015, pp. 31–54.
prune and ignore irrelevant alerts, performing the
                                                         [12] B. P. Miller, L. Fredriksen, B. So, An empir-
root-cause on crashes and focusing on the remaining
                                                              ical study of the reliability of UNIX utilities,
alerts that have not been covered. This process is
                                                              Commun. ACM 33 (1990) 32–44. doi:10.1145/
required in a wide range of industries like aerospace,
                                                              96267.96279.
automative, defense, energy or any context that
                                                         [13] C. Cadar, K. Sen, Symbolic execution for
requires a higher level of insurance.
                                                              software testing: Three decades later, Com-
                                                              munications of the ACM 56 (2013) 82. doi:10.
Acknowledgments                                               1145/2408776.2408795.
                                                         [14] Y. Wang, X. Jia, Y. Liu, K. Zeng, T. Bao,
This research was realized by Quarkslab in the con-           D. Wu, P. Su, Not all coverage measurements
text of the PASTIS project financed by DGA-MI                 are equal: Fuzzing by coverage accounting for
(Direction Générale de l’Armement, Maîtrise de                input prioritization, in: 27th Annual Network
l’Information).                                               and Distributed System Security Symposium,
                                                              NDSS 2020, San Diego, California, USA, Febru-
                                                              ary 23-26, 2020, The Internet Society, 2020.
References                                               [15] M. Zalewski, American fuzzy lop, http://
                                                              lcamtuf.coredump.cx/afl/, 2018.
 [1] ISO, Road vehicles – Functional safety, 2011.
                                                         [16] L. Team, libfuzzer – a library for coverage-
 [2] L. M. Corporation, Joint Strike Fighter Air Ve-
                                                              guided fuzz testing, 2018. [site].
     hicle C++ Coding Standards For The System
                                                         [17] A. Fioraldi, D. Maier, H. Eißfeldt, M. Heuse,
     Development And Demonstration Program,
                                                              Afl++ : Combining incremental steps of
     Lockheed Martin Corporation, 2005. [PDF].
                                                              fuzzing research, in: 14th USENIX Work-
 [3] J. P. Laboratory, JPL Institutional Coding
                                                              shop on Offensive Technologies (WOOT 20),
     Standard for the C Programming Language,
                                                              USENIX Association, 2020. [site].
     2009.
                                                         [18] E. Geretto, C. Tessier, F. Massacci, A qbdi-
 [4] M. I. S. R. Association, M. I. S. R. A. Staff,
                                                              based fuzzer taming magic bytes, in: Italian
     MISRA C:2012: Guidelines for the Use of the C
                                                              Conference on Cyber Security, ITASEC 2019,
     Language in Critical Systems, Motor Industry
                                                              Pisa, Italy, February 13-15 2019, CEUR Work-
     Research Association, 2013. [book].
                                                              shop Proceedings, 2019. [PDF].
 [5] R. C. Seacord, The CERT C Secure Cod-
                                                         [19] S. Rawat, V. Jain, A. Kumar, L. Cojocar,
     ing Standard, 1st ed., Addison-Wesley Pro-
                                                              C. Giuffrida, H. Bos, Vuzzer: Application-
     fessional, 2008.
                                                              aware evolutionary fuzzing, in: 24th Annual
 [6] G. Inc., Codesonar c/c++ sast when safety
                                                              Network and Distributed System Security
     and security matter, 2021. [site].
                                                              Symposium, NDSS 2017, San Diego, Cali-
                                                              fornia, USA, February 26 - March 1, 2017,


Proceedings of the 28th C&ESAR (2021)                                                                     35
From Source Code to Crash Test-Case through Software Testing Automation


     2017. URL: https://www.ndss-symposium.                doi:10.1145/1791194.1791203, [PDF].
     org/ndss2017/ndss-2017-programme/                [29] D. dos Santos, S. Dashevskyi, J. Wetzels,
     vuzzer-application-aware-evolutionary-fuzzing/,       A. Amri, How embedded tcp/ip stacks breed
     [PDF] [code] [video].                                 critical vulnerabilities, 2020. [slide].
[20] L. M. de Moura, N. Bjørner, Satisfiabil- [30] C. Csallner, Y. Smaragdakis,                   Check ’n’
     ity modulo theories: introduction and appli-          crash: Combining static checking and test-
     cations, Commun. ACM 54 (2011) 69–77.                 ing, 2005, pp. 422–431. doi:10.1109/ICSE.
     doi:10.1145/1995376.1995394.                          2005.1553585.
[21] T. of Bits, Manticore:           Symbolic ex- [31] O. Chebaro, N. Kosmatov, A. Giorgetti,
     ecution     for    humans,      2017.     https:      J. Julliand, Combining static analysis and
     //blog.trailofbits.com/2017/04/27/                    test generation for C program debugging,
     manticore-symbolic-execution-for-humans.              in: Tests and Proofs - 4th International
[22] Y. Shoshitaishvili, R. Wang, C. Salls,                Conference, TAP@TOOLS 2010, Málaga,
     N. Stephens, M. Polino, A. Dutcher, J. Grosen,        Spain, July 1-2, 2010. Proceedings, volume
     S. Feng, C. Hauser, C. Kruegel, G. Vigna, Sok:        6143 of Lecture Notes in Computer Science,
     (state of) the art of war: Offensive techniques       Springer, 2010, pp. 94–100. URL: https://
     in binary analysis (2016).                            doi.org/10.1007/978-3-642-13977-2_9. doi:10.
[23] C. Cadar, D. Dunbar, D. R. Engler, KLEE:              1007/978-3-642-13977-2\_9.
     unassisted and automatic generation of high- [32] F. Kirchner, N. Kosmatov, V. Prevosto,
     coverage tests for complex systems programs,          J. Signoles, B. Yakobowski, Frama-c: A
     in: 8th USENIX Symposium on Operating Sys-            software analysis perspective, Formal Asp.
     tems Design and Implementation, OSDI 2008,            Comput. 27 (2015) 573–609. doi:10.1007/
     December 8-10, 2008, San Diego, California,           s00165-014-0326-7.
     USA, Proceedings, 2008, pp. 209–224. [PDF] [33] N. Williams, B. Marre, P. Mouy, M. Roger,
     [site].                                               Pathcrawler: Automatic generation of path
[24] B. Dolan-Gavitt, P. Hulin, E. Kirda, T. Leek,         tests by combining static and dynamic anal-
     A. Mambretti, W. Robertson, F. Ulrich,                ysis, in: Dependable Computing - EDCC-5,
     R. Whelan, Lava: Large-scale automated vul-           5th European Dependable Computing Confer-
     nerability addition, in: 2016 IEEE Symposium          ence, Budapest, Hungary, April 20-22, 2005,
     on Security and Privacy (SP), 2016, pp. 110–          Proceedings, 2005, pp. 281–292. URL: https:
     121. doi:10.1109/SP.2016.15, [PDF].                   //doi.org/10.1007/11408901_21. doi:10.1007/
[25] W. Cui, M. Peinado, S. K. Cha, Y. Fratantonio,        11408901\_21.
     V. P. Kemerlis, RETracer: Triaging crashes by
     reverse execution from partial memory dumps,
     in: Proceedings of the 38th International
     Conference on Software Engineering, ICSE ’16,
     ACM, New York, NY, USA, 2016, pp. 820–831.
     doi:10.1145/2884781.2884844. [PDF].
[26] K. Serebryany, D. Bruening, A. Potapenko,
     D. Vyukov, AddressSanitizer: A fast address
     sanity checker, in: Presented as part of the
     2012 USENIX Annual Technical Conference
     (USENIX ATC 12), USENIX, Boston, MA,
     2012, pp. 309–318. [PDF] [code].
[27] W. Dietz, P. Li, J. Regehr, V. Adve,
     Understanding integer overflow in C/C++, in:
     Proceedings of the 34th International
     Conference on Software Engineering, ICSE ’12,
     IEEE Press, Piscataway, NJ, USA, 2012,
     pp. 760–770. [PDF].
[28] K. Serebryany, T. Iskhodzhanov, ThreadSanitizer:
     Data race detection in practice, in: Proceedings
     of the Workshop on Binary Instrumentation and
     Applications, WBIA ’09, ACM, New York, NY, USA,
     2009, pp. 62–71.
[34] X. Ge, K. Taneja, T. Xie, N. Tillmann, DyTa:
     Dynamic symbolic execution guided with static
     verification results, in: Proceedings of the 33rd
     International Conference on Software Engineering,
     ICSE 2011, Waikiki, Honolulu, HI, USA,
     May 21–28, 2011, ACM, 2011, pp. 992–994.
     doi:10.1145/1985793.1985971.
[35] A. Y. Gerasimov, Directed dynamic symbolic
     execution for static analysis warnings
     confirmation, Program. Comput. Softw. 44 (2018)
     316–323. doi:10.1134/S036176881805002X.
[36] V. P. Ivannikov, A. A. Belevantsev, A. E.
     Borodin, V. N. Ignatiev, D. M. Zhurikhin,
     A. Avetisyan, Static analyzer Svace for finding
     defects in a source program code, Program.
     Comput. Softw. 40 (2014) 265–275.
     doi:10.1134/S0361768814050041.
[37] A. Y. Gerasimov, L. V. Kruglov, M. K. Ermakov,
     S. P. Vartanov, An approach to reachability
     determination for static analysis defects with
     the help of dynamic symbolic execution, Program.
     Comput. Softw. 44 (2018) 467–475.
     doi:10.1134/S0361768818060051.
[38] M. Li, Y. Chen, L. Wang, G. Xu, Dynamically
     validating static memory leak warnings, in:
     Proceedings of the 2013 International Symposium
     on Software Testing and Analysis, ISSTA 2013,
     ACM, New York, NY, USA, 2013, pp. 112–122.
     doi:10.1145/2483760.2483778.
[39] M.-D. Nguyen, S. Bardin, R. Bonichon,
     R. Groz, M. Lemerre, Binary-level directed
     fuzzing for use-after-free vulnerabilities, in:
     23rd International Symposium on Research in
     Attacks, Intrusions and Defenses (RAID 2020),
     USENIX Association, San Sebastian, 2020,
     pp. 47–62. [site].
[40] M. Böhme, V. Pham, M. Nguyen,
     A. Roychoudhury, Directed greybox fuzzing, in:
     Proceedings of the 2017 ACM SIGSAC Conference
     on Computer and Communications Security,
     CCS 2017, Dallas, TX, USA, October 30 –
     November 3, 2017, ACM, 2017, pp. 2329–2344.
     doi:10.1145/3133956.3134020.
[41] R. Majumdar, K. Sen, Hybrid concolic testing,
     in: 29th International Conference on Software
     Engineering (ICSE 2007), Minneapolis, MN, USA,
     May 20–26, 2007, 2007, pp. 416–426.
     doi:10.1109/ICSE.2007.41.
[42] N. Stephens, J. Grosen, C. Salls, A. Dutcher,
     R. Wang, J. Corbetta, Y. Shoshitaishvili,
     C. Kruegel, G. Vigna, Driller: Augmenting
     fuzzing through selective symbolic execution,
     in: 23rd Annual Network and Distributed System
     Security Symposium, NDSS, 2016.
[43] I. Yun, S. Lee, M. Xu, Y. Jang, T. Kim, QSYM:
     A practical concolic execution engine tailored
     for hybrid fuzzing, in: 27th USENIX Security
     Symposium (USENIX Security 18), USENIX
     Association, Baltimore, MD, 2018, pp. 745–761.
     [site].
[44] Google, ClusterFuzz: Scalable fuzzing
     infrastructure, 2021. [code].
[45] P. Goodman, G. Grieco, A. Groce, Tutorial:
     DeepState: Bringing vulnerability detection
     tools into the development cycle, in: 2018
     IEEE Cybersecurity Development, SecDev 2018,
     Cambridge, MA, USA, September 30 –
     October 2, 2018, 2018, pp. 130–131.
     doi:10.1109/SecDev.2018.00028.
[46] S. Österlund, E. Geretto, A. Jemmett,
     E. Güler, P. Görz, T. Holz, C. Giuffrida, H. Bos,
     CollabFuzz: A framework for collaborative
     fuzzing, in: Proceedings of the 14th European
     Workshop on Systems Security, EuroSec ’21,
     2021, pp. 1–7.
[47] Microsoft, OneFuzz: A self-hosted
     fuzzing-as-a-service platform, 2021. [code].
</pre>