=Paper=
{{Paper
|id=Vol-3612/QuASoQ_2023_Paper_01
|storemode=property
|title=Coyote C++: An Industrial-Strength Fully Automated Unit Testing Tool
|pdfUrl=https://ceur-ws.org/Vol-3612/QuASoQ_2023_Paper_01.pdf
|volume=Vol-3612
|authors=Sanghoon Rho,Philipp Martens,Seungcheol Shin,Yeoneo Kim,Hoon Heo, Seunghyun Oh
|dblpUrl=https://dblp.org/rec/conf/apsec/RhoMSKHO23
}}
==Coyote C++: An Industrial-Strength Fully Automated Unit Testing Tool==
Coyote C++: An Industrial-Strength Fully Automated Unit Testing Tool Sanghoon Rho1 , Philipp Martens1 , Seungcheol Shin1 , Yeoneo Kim1 , Hoon Heo2 and SeungHyun Oh2 1 CODEMIND Corporation, Seoul, South Korea 2 Hyundai KEFICO Corporation, Gyeonggi-Do, South Korea Abstract Coyote C++ is an automated testing tool that uses a sophisticated concolic-execution-based approach to realize fully automated unit testing for C and C++. While concolic testing has proven effective for languages such as C and Java, tools have struggled to achieve a practical level of automation for C++ due to its many syntactical intricacies and overall complexity. Coyote C++ is the first automated testing tool to breach the barrier and bring automated unit testing for C++ to a practical level suitable for industrial adoption, consistently reaching around 90% code coverage. Notably, this testing process requires no user involvement and performs test harness generation, test case generation and test execution with “one-click” automation. In this paper, we introduce Coyote C++ by outlining its high-level structure and discussing the core design decisions that shaped the implementation of its concolic execution engine. Finally, we demonstrate that Coyote C++ is capable of achieving high coverage results within a reasonable timespan by presenting the results from experiments on both open-source and industrial software. Keywords automated unit test, coverage testing, concolic execution, C++, LLVM 1. Introduction Coyote C++ streamlines the entire testing process, from harness generation and test case generation to test execu- The significance of testing in software engineering is con- tion. The automated test case generation is based on con- tinuously escalating, necessitating thorough validation colic execution, a modern variant of symbolic execution, methods such as white-box testing. However, given the and features exquisite harness generation capabilities. rapid increase in code scale and complexity in the soft- The paper outlines the underlying technologies on ware industry, white-box testing can be time-consuming which Coyote C++ achieves a practical level of high cov- and resource-intensive[1], often leading to budget con- erage through test case generation. In order to practically straints. For this reason, there has been a long-standing utilize automated unit testing tools in the field, we pro- need for automation in white-box testing. pose that a testing speed of around 10,000 logical LOC of Lately, efforts to automate white-box unit testing are executable statements per hour with statement coverage approaching practical feasibility, with automated test- above 90% and branch coverage above 80% should be ing showing promising results for Java [2, 3], C [4, 5, 6], desirable. Currently, Coyote C++ is achieving elevated binary code [7, 8], and a few other programming lan- levels of coverage and performance according to these guages [9, 10, 11]. Conversely, adopting this technol- criteria, and is thus being effectively applied and utilized ogy for C++ has proven to be challenging due to the by our customers in the automotive industry. language’s unique features and overall complexity [12]. The rest of this paper is organized as follows. We Implicitly invoked copy or move constructors and tem- first look at research on concolic-execution-based unit plates with all their intricacies are just two examples testing and then examine design decisions made by exist- of C++ language features that are especially difficult to ing systems to build efficient concolic execution engines handle in automated white-box unit testing. in related works. Next, we provide an overview of the In this paper, we introduce Coyote C++, an automated implementation of Coyote C++, and present test results unit testing tool designed for C/C++. With a single click, obtained from open-source projects and real-world indus- trial projects. Finally, we conclude the paper by outlining QuASoQ 2023: 11th International Workshop on Quantitative Approaches to Software Quality, December 04, 2023, Seoul, South our plans for further improving Coyote C++. Korea Envelope-Open rho@codemind.co.kr (S. Rho); philipp.m@codemind.co.kr (P. Martens); shin@codemind.co.kr (S. Shin); 2. Related works yeoneo@codemind.co.kr (Y. Kim); hoon.heo@hyundai-kefico.com (H. Heo); seunghyun.oh@hyundai-kefico.com (S. Oh) Symbolic execution [13] is a static program analysis tech- © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). nique that interprets programs with symbolic values CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 45 rather than concrete values. Due to scalability issues while QSYM [21] and CREST [22] are instrumentation- with symbolic execution, this technique has been ex- based. tended into concolic execution [5, 6]. The main idea of concolic execution is to compute test inputs from path 2.3. Mitigating Path Explosion conditions which are obtained by tracking both concrete values and symbolic values. Concolic execution has been Another important design decision is how to deal with the anticipated in the automated testing domain due to its path explosion problem commonly encountered when known success in test case generation. However, this performing concolic execution on programs with com- research has not yet reached a practical level of test gen- plex control flow. In such situations, the search space eration for whole programs. of concolic execution can grow exponentially due to the Nevertheless, concolic execution is known to be re- many possible combinations of branches. To avoid this is- markably successful in unit test generation, e.g. for sue, concolic execution engines use a variety of heuristic Java [2, 3] and C [4, 5, 6]. For C++ however, automated search strategies. Notable search strategies include DFS testing has still been far from viable for industrial pur- (depth-first search), BFS (breadth-first search), random poses despite recent research efforts [14, 12]. path selection, coverage-optimized search, and adaptive When implementing concolic execution there are many heuristics [15, 23]. options for realizing various aspects of the engine [15]. Especially the engine’s execution mode, analysis target, 2.4. Memory Model handling of the path explosion problem, and its memory model can largely affect the performance of the concolic When modelling the symbolic memory of a concolic exe- execution engine in terms of coverage and execution cution engine, one can choose between treating memory time. addresses as symbolic or concrete values. The symbolic approach can theoretically handle all possible paths, but this approach may cause path constraints to become too 2.1. Online/Offline Mode complex for current SMT solvers. On the other hand, us- Concolic execution can be implemented in online or of- ing concrete addresses might not cover all possible paths fline mode. In online mode, the concolic execution en- due to overly simplified path conditions. In practice, a gine explores multiple paths in a single run by forking on fully symbolic model is used by tools like KLEE [16], and branch points. The advantage of this method is that there a concrete address model is used by SAGE [7] among oth- is no need to re-execute the common prefixes of multi- ers. Additionally, there are tools like Mayhem [17] that ple paths. However, it requires a substantial amount of use a combination of symbolic and concrete addressing memory to store all the states of multiple paths. Offline schemes. mode on the other hand explores only one path in a sin- gle run. This method requires less memory than online mode, making it better suited for parallelization. How- 3. The Design of Coyote C ++ ever, since offline mode always starts at the beginning of the program for every path, it spends a considerable 3.1. Overview amount of time on re-examining common path prefixes. In this chapter, we present an overview of Coyote C++ Prominent tools using online mode are KLEE [16], May- and discuss the core decisions that influenced its design. hem [17], and S2 E [18], whereas SAGE [7] utilizes offline As shown in the diagram in Fig. 1, the Coyote C++ tool concolic execution. is divided into two main parts. The first part builds ex- ecutable test files based on harness generation, while 2.2. Emulation/Instrumentation the second part handles generating test cases through concolic execution. There are two main methods for collecting information In the first phase, Coyote C++ uses a harness genera- about the execution path taken during concrete execution tor module to automatically generate test stubs and test of the program under test. The first method performs drivers for test execution and inserts instrumentation symbolic execution at the same time as concrete exe- code for concolic execution. This instrumentation is per- cution by running the program under test inside of an formed on LLVM IR level. Next, the binary generation emulator such as QEMU [19]. The second method in- module compiles the created testbed to executable files stead instruments the program under test with code that used in the second part. handles symbolic execution and the collection of informa- While running the executable test file in the second tion about the concrete execution of the program. Well- phase, the instrumentation code produces trace files con- known emulator-based tools are angr [20] and KLEE [16], taining information about the concrete program execu- tion on the level of LLVM IR instructions. These trace 46 Figure 1: Overview of Coyote C++. files are then used to reconstruct their respective execu- during testable binary generation. The main reason for tion paths, and with this information symbolic execution choosing offline testing over online testing is that it is is performed on the LLVM IR level to generate new test more suitable for parallelization, which is essential for input data. When this concolic execution cycle termi- providing good testing performance. Additionally, offline nates, the achieved test coverage is calculated based on testing is more advantageous from a memory manage- the generated trace files. ment standpoint. A key factor for achieving high code coverage is the 3.2. Design Decisions of Coyote C++ search strategy that controls in which order the possi- ble execution paths of a program are explored. During While implementing Coyote C++, many important de- testcase generation, the test files are initially executed sign decisions had to made. In the modules responsi- with all test inputs set to default values. The trace files ble for the testable binary generation, these decisions generated from this are then analyzed using concolic were generally made with the goal of enabling a wide execution techniques to create new test case inputs for range of transformations on intermediate code models visiting new paths. As our search strategy for exploring while retaining a sufficiently strong connection between of candidate paths, we adopted a hybrid approach that these models and the original source code. Most design combines CCS (Code Coverage Search) and DFS. CCS decisions affecting the testcase generation phase were focuses on exploring code areas that have not been tra- strongly influenced by the need to find a suitable tradeoff versed yet, making it advantageous for quickly reaching between the achieved code coverage and performance in high coverage. However, because CCS performs rather terms of test time or resource consumption. aggressive pruning on execution paths, it may produce A fundamental design decision made in Coyote C++ is unsatisfiable path conditions in certain situations. To using LLVM IR as its symbolic execution target. This al- make up for these issues, we also use the DFS strategy lows for more precision than doing source level symbolic in addition to CCS. DFS is a search strategy that has the execution while retaining more information about the potential to cover code areas not covered by CCS, but it original source code that would be lost when lowering comes with the drawback of substantial time consump- even further to the assembly level. Also, using LLVM as tion. Usually, either of these strategies terminates once a foundation for Coyote C++ allows for greater freedom every branch it has discovered has been explored. Con- in code transformations during harness generation, by- colic execution may however also be terminated early if passing syntactic constraints present on the source code a designated amount of test cases has been generated or level. if a timeout has been reached. We decided to implement offline testing by inserting in- Finally, a significant factor influencing the performance strumentation code into the LLVM IR code of the testbed of concolic execution in C++ is the memory model. Sim- 47 Table 1 Results on Open-Source Projects Project Info Coverage Test Time Name (C/C++) Files Functions Statements Branches Statement Branch [m] nuklear (C) 39 609 9,284 4,309 93.7% 87.1% 55 libsodium (C) 94 887 8,003 1,651 96.5% 89.7% 6 mathc (C) 1 843 4,192 190 99.9% 100.0% 3 aubio (C) 53 520 5,916 1,797 95.7% 92.4% 14 s2n-tls (C) 175 1,621 16,734 15,512 86.7% 81.3% 68 yaml-cpp (C++) 32 367 3,050 1,300 96.9% 95.5% 11 qnite (C++) 48 637 4,294 1,035 95.2% 89.1% 37 json-voorhees (C++) 21 451 2,507 764 92.5% 88.7% 5 QPULib (C++) 24 278 3,561 1,398 87.8% 83.8% 3 jsoncpp (C++) 3 309 2,802 1,148 91.2% 86.3% 11 Total 490 6,522 60,343 29,104 93.6% 89.4% 213 Table 2 Coverage Results from Hyundai KEFICO Project Info Coverage Test Time Name Files Functions Statements Branches Statement Branch Target A 1,855 5,129 129,131 40,718 92.8% 86.8% Target B 83 1,774 11,828 3,078 97.4% 90.7% N/A Target C 69 375 6,526 2,339 85.5% 79.9% Total 2,007 7,278 147,485 46,135 92.9% 86.7% ilar to Mayhem, the approach implemented in Coyote 4.1. Experiment on Open-Source Projects C++reads values from memory symbolically but writes For the first evaluation, we chose to reuse the test set values to concrete memory addresses. Utilizing sym- curated by Shin and Yoo for a survey on white-box au- bolic reads in contrast to reading from concrete addresses tomated testing tools [24], as it contains open-source leads to a more faithful representation of path constraints, projects written in C and C++ from a wide variety of thereby enhancing the potential for generating appropri- application domains and was composed specifically for ate test cases. For write operations however, we chose the evaluation of automated testing tools such as Coy- to rely on concrete addresses because symbolic writes ote C++. This survey also concluded that currently no are prone to making the process of solving the path con- other commercial tools truly support automated testing straints overly expensive. for C++ programs. Among open-source tools for C++, CITRUS [12] is no longer publicly available, and we were 4. Experimental Results not able to successfully apply UTBot [14] to the selected test projects due to its rather limited support for the To showcase the performance of Coyote C++, we present C++ syntax. Thus, unfortunately there were no suitable experimental results for a set of diverse open-source candidates to compare Coyote C++ against in terms of projects as well as several industrial software projects coverage and test time. from one of our customers, Hyundai KEFICO. While our Table 1 shows the statement1 and branch coverage tool allows user to add test cases and write driver func- results achieved by Coyote C++ on the ten open-source tions for achieving higher coverage, all experimental re- projects in the test set as well as the time needed for sults were obtained through one-click automation with- conducting the automated test generation and execution out any user intervention. for each project. Coyote C++ achieves statement cover- ages between 86.7% (s2n-tls) and 99.9% (mathc) as well 1 As statements we consider only executable lines of code. In contrast to physical lines of code, this excludes e.g. whitespace, comments, and type declarations. 48 as branch coverages between 81.3% (s2n-tls) and 100% 5. Conclusion and Future Work (mathc). Summing up the number of overall covered lines/branches and dividing them by the total number In this paper, we presented Coyote C++, an industry- of lines and branches in all ten projects yields a remark- grade automated testing tool based on concolic execu- able combined statement coverage of 92.5% and branch tion. After describing the general tool architecture, we coverage of 84.9%. discussed the core design decisions for our implementa- The test times presented in table 1 were attained from tion of its concolic testing engine. Finally, we evaluated an Intel Core i7-13700 system with 64GB of RAM run- the performance of Coyote C++ in terms of achieved ning Ubuntu 20.04. Overall, the test of all ten projects coverage and testing time on both a test set of diverse combined only took about three and a half hours, with open-source projects and industry code from one of our individual testing times ranging between three minutes corporate customers. We were able to demonstrate that (mathc) and just above one hour (s2n-tls). That makes it Coyote C++ can achieve high statement/branch coverage more than six times faster than the test times reported of around 90% or higher in a reasonable amount of time in the previously mentioned study [24], which we con- for software projects from a wide variety of application sider a significant improvement despite possible minor domains. differences between test setups. Furthermore, with the While Coyote C++ is already yielding promising re- exception of the qnite project, the testing speed on all sults both on open-source projects and in real industry projects surpasses our definition of practicality, with an applications, it is our plan to continuously improve the overall testing speed of roughly 17,000 statements per tool both in terms of reliably achieving high coverage hour. results and broadening its capabilities in the field of au- tomated testing. One goal for the near future is target testing for embed- 4.2. Results on Industry Projects ded software. Our tool currently performs host testing, Table 2 presents testing results produced by Coyote C++ meaning tests are not executed on the hardware that on automotive control software projects from our cus- would run the program under test in a production envi- tomer Hyundai KEFICO, a member of Hyundai Motors ronment, but rather on a separate computer, e.g., a test Group. As details about these projects such as their ac- engineer’s computer or a test server. Especially in the tual names are strictly internal information, we will refer embedded domain however, the discrepancy between em- to them as target A, B and C. bedded hardware in the production environment and the The coverage results for these industrial projects are consumer or server hardware in the testing environment quite similar to the open-source projects, with an average may lead to inaccurate test results. Thus, we are planning statement coverage of 92.9% and an average branch cov- to implement target testing support so that tests may be erage of 86.7%. At our customer, Coyote C++ is employed run directly on production hardware. not in a controlled test environment but rather in a busi- Approaching the goal of increasing automated test ness setting on multiple machines with varying hardware coverage from a different perspective, we also strive to specifications. Due to these circumstances and the fact provide users of our tool with feedback as to how they that a subset of the test results were produced incremen- should change their code so that Coyote C++ will likely tally over a longer period of time, we presently do not yield better coverage results for it. While we would like have any meaningful test time measurements available to give such guidance on the basis of code metrics, our to report for these projects. initial investigations have shown that traditional code While project C individually yields a slightly subpar metrics such as cyclomatic complexity have little to no coverage, our notion of practicality in terms of cover- correlation with automated test coverage. Thus, we see age achieved (statement coverage >90%, branch coverage the need for more thorough research involving the de- >80%) is upheld both by projects A and B individually velopment of new code metrics that can serve as a better as well as all three projects combined. This again rein- estimate for the coverage results produced by automated forces our claim that Coyote C++ is not simply a research testing and Coyote C++ in particular. prototype which only works on a limited set of specially curated programs but is rather a mature tool that can also handle more challenging industry software. Also, it References should be noted that automated testing with such high [1] L. Luo, Software testing techniques, Institute for coverage results for these projects is only possible be- software research international Carnegie mellon cause Coyote C++ has explicit handling for some common university Pittsburgh, PA 15232 (2001) 19. code patterns in embedded software that would usually [2] G. Fraser, A. Arcuri, A large-scale evaluation of au- make automated testing difficult or plainly impossible, tomated unit test generation using EvoSuite, ACM such as the usage of fixed memory addresses in code. 49 Trans. Softw. Eng. Methodol. 24 (2014). URL: https: nik, D. Mordvinov, S. Morozov, et al., UnitTest- //doi.org/10.1145/2685612. doi:10.1145/2685612 . Bot: Automated unit test generation for C code in [3] K. Sen, G. Agha, Cute and jcute: Concolic unit integrated development environments, in: 2023 testing and explicit path model-checking tools, in: IEEE/ACM 45th International Conference on Soft- T. Ball, R. B. Jones (Eds.), Computer Aided Verifica- ware Engineering: Companion Proceedings (ICSE- tion, Springer Berlin Heidelberg, Berlin, Heidelberg, Companion), IEEE, 2023, pp. 380–384. 2006, pp. 419–423. [15] R. Baldoni, E. Coppa, D. C. D’elia, C. Demetrescu, [4] Y. Kim, D. Lee, J. Baek, M. Kim, Concolic testing I. Finocchi, A survey of symbolic execution tech- for high test coverage and reduced human effort niques, ACM Comput. Surv. 51 (2018). URL: https: in automotive industry, in: 2019 IEEE/ACM 41st //doi.org/10.1145/3182657. doi:10.1145/3182657 . International Conference on Software Engineering: [16] C. Cadar, D. Dunbar, D. R. Engler, et al., KLEE: Unas- Software Engineering in Practice (ICSE-SEIP), IEEE, sisted and automatic generation of high-coverage 2019, pp. 151–160. tests for complex systems programs., in: OSDI, [5] K. Sen, D. Marinov, G. Agha, CUTE: A concolic volume 8, 2008, pp. 209–224. unit testing engine for C, ACM SIGSOFT Software [17] S. K. Cha, T. Avgerinos, A. Rebert, D. Brum- Engineering Notes 30 (2005) 263–272. ley, Unleashing Mayhem on binary code, in: [6] P. Godefroid, N. Klarlund, K. Sen, DART: Directed IEEE Symposium on Security and Privacy, SP automated random testing, in: Proceedings of 2012, 21-23 May 2012, San Francisco, California, the 2005 ACM SIGPLAN conference on Program- USA, IEEE Computer Society, 2012, pp. 380–394. ming language design and implementation, 2005, URL: http://doi.ieeecomputersociety.org/10.1109/ pp. 213–223. SP.2012.31. doi:10.1109/SP.2012.31 . [7] P. Godefroid, M. Y. Levin, D. Molnar, SAGE: white- [18] V. Chipounov, V. Kuznetsov, G. Candea, The S2E box fuzzing for security testing, Communications platform: Design, implementation, and applica- of the ACM 55 (2012) 40–44. tions, ACM Transactions on Computer Systems [8] F. Saudel, J. Salwan, Triton: A dynamic symbolic ex- (TOCS) 30 (2012) 1–49. ecution framework, in: Symposium sur la sécurité [19] F. Bellard, QEMU, a fast and portable dynamic des technologies de l’information et des communi- translator., in: USENIX annual technical confer- cations, SSTIC, France, Rennes, 2015, pp. 31–54. ence, FREENIX Track, volume 41, Califor-nia, USA, [9] N. Tillmann, J. de Halleux, Pex–white box test 2005, p. 46. generation for .net, in: B. Beckert, R. Hähnle (Eds.), [20] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, Tests and Proofs, Springer Berlin Heidelberg, Berlin, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, Heidelberg, 2008, pp. 134–153. C. Kruegel, G. Vigna, SoK: (state of) the art of war: [10] A. Giantsios, N. Papaspyrou, K. Sagonas, offensive techniques in binary analysis, in: IEEE Concolic testing for functional languages, Sci- Symposium on Security and Privacy, 2016. ence of Computer Programming 147 (2017) [21] I. Yun, S. Lee, M. Xu, Y. Jang, T. Kim, QSYM: A 109–134. URL: https://www.sciencedirect.com/ practical concolic execution engine tailored for hy- science/article/pii/S0167642317300837. doi:https: brid fuzzing, in: 27th USENIX Security Symposium //doi.org/10.1016/j.scico.2017.04.008 . (USENIX Security 18), 2018, pp. 745–761. [11] K. Sen, S. Kalasapur, T. Brutch, S. Gibbs, Jalangi: [22] J. Burnim, K. Sen, Heuristics for scalable dynamic A selective record-replay and dynamic analysis test generation, in: 2008 23rd IEEE/ACM Inter- framework for javascript, in: Proceedings of the national Conference on Automated Software Engi- 2013 9th Joint Meeting on Foundations of Soft- neering, 2008, pp. 443–446. doi:10.1109/ASE.2008. ware Engineering, ESEC/FSE 2013, Association for 69 . Computing Machinery, New York, NY, USA, 2013, [23] S. Cha, S. Hong, J. Bak, J. Kim, J. Lee, H. Oh, En- p. 488–498. URL: https://doi.org/10.1145/2491411. hancing dynamic symbolic execution by automat- 2491447. doi:10.1145/2491411.2491447 . ically learning search heuristics, IEEE Transac- [12] R. S. Herlim, Y. Kim, M. Kim, CITRUS: Automated tions on Software Engineering 48 (2022) 3640–3663. unit testing tool for real-world C++ programs, in: doi:10.1109/TSE.2021.3101870 . 2022 IEEE Conference on Software Testing, Veri- [24] K. Shin, Y. Ryu, Performance and functionality fication and Validation (ICST), 2022, pp. 400–410. evaluation of white-box software testing tools, doi:10.1109/ICST53961.2022.00046 . part 2, https://csrc.kaist.ac.kr/blog/2023/01/ [13] J. C. King, A new approach to program testing, 25/performance-and-functionality-evaluation- ACM Sigplan Notices 10 (1975) 228–233. of-white-box-software-testing-tools-part-2/, 2023. [14] D. Ivanov, A. Babushkin, S. Grigoryev, P. Iatchenii, Accessed: 2023-10-11. V. Kalugin, E. Kichin, E. Kulikov, A. Misonizh- 50