=Paper=
{{Paper
|id=Vol-2650/paper12
|storemode=property
|title=Comprehensive Evaluation of Cross Translation Unit Symbolic Execution
|pdfUrl=https://ceur-ws.org/Vol-2650/paper12.pdf
|volume=Vol-2650
|authors=Endre Fülöp,Norbert Pataki
|dblpUrl=https://dblp.org/rec/conf/icai3/FulopP20
}}
==Comprehensive Evaluation of Cross Translation Unit Symbolic Execution==
Proceedings of the 11th International Conference on Applied Informatics, Eger, Hungary, January 29–31, 2020, published at http://ceur-ws.org

Endre Fülöp, Norbert Pataki

ELTE Eötvös Loránd University, Budapest, Hungary
Faculty of Informatics, 3in Research Group, Martonvásár, Hungary
gamesh411@gmail.com, patakino@elte.hu

'''Abstract.''' Static analysis is a great approach to find bugs and code smells. Some of the errors span multiple translation units (TUs). Symbolic execution is a primary static analysis technique: symbols are used to represent unknown values (e.g. user input), and symbolic calculations are carried out on them. Clang Static Analyzer (SA) is an open-source symbolic execution engine for C/C++/Objective-C. The default behaviour of the SA does not support cross translation unit analysis, but it can be parametrized to enable analysis techniques spanning many TUs. In this paper, we evaluate cross translation unit symbolic execution in a comprehensive way. Different caching methods and loading approaches are considered, and we compare the analysis of open-source projects. The aim is to find an optimal configuration for the tool.

'''Keywords:''' symbolic execution, cross translation unit, Clang

'''MSC:''' 68N15 Programming languages

''The research has been supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.2-16-2017-00013, Thematic Fundamental Research Collaborations Grounding Innovation in Informatics and Infocommunications).''

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1. Introduction===

Static analysis is a well-known method to detect bugs without executing the code [7]. Static analysis works on the source code and mainly focuses on bug detection [3]. However, refactoring, obfuscation and complexity metrics tools also use static analysis. Static analysis tools build up the abstract syntax tree (AST) in order to run AST-consumers on it, which are used to implement algorithms over syntax trees [5]. Symbolic execution is a major static analysis technique in which symbols are used to represent unknown values, and calculations are carried out on them [9].

Unfortunately, separate compilation makes cross translation unit analysis challenging for C family languages. Therefore, many tools do not support cross translation unit analysis [1]. Unity build is a technique for using a single translation unit, but creating unity builds also has many difficulties [10]. However, the scope of the analysis has a significant impact on its precision [6].

Let us consider the code snippets in Figure 1 that belong to two different translation units:

```cpp
// b.cpp
int f( int& r );

void g()
{
    int x = 14;
    f( x );
    if ( x != 15 )
    {
        int * p = new int;
    }
}
```

```cpp
// a.cpp
int f( int& r )
{
    ++r;
    return 100 / ( r - 15 );
}
```

Figure 1: TUs with cross-referencing function definitions

If one were to analyze the translation unit containing the definition of f via single-TU analysis, starting the symbolic execution from function f, the value of the parameter could not be reasoned about. Producing a warning for the return statement, stating that the expression is potentially a division by zero, would therefore not be in line with the conservative policy of static analysis. The goal of the analyzer is to identify real bugs and help the programmer fix error-prone code constructs; at the same time, too many bug reports are discouraged for practical reasons.
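To make the contrast between the two analysis modes concrete, the following annotated version of f walks through both cases (the comments are our illustration, not the analyzer's actual diagnostics):

```cpp
// a.cpp, as seen by the symbolic execution engine
int f( int& r )
{
    ++r;                      // single-TU: r starts as an unconstrained symbol
    return 100 / ( r - 15 );  // single-TU: r - 15 == 0 is merely possible, so a
                              //   conservative analyzer stays silent;
                              // CTU: inlined at the call f( x ) in g, r == 14 on
                              //   entry, ++r makes it 15, and the division by
                              //   zero becomes a definite, reportable bug
}
```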
Analogously, in function g, there is no knowledge about the value of x right after the call to f. In this case, one gets a warning about the memory leak, but the allocation is dead code; thus the finding is a false positive. C/C++ programmers have been eager for a more precise solution; therefore, we improved the Clang SA to support cross translation unit analysis. However, the potential configuration settings of the new version have not been evaluated yet.

The rest of this paper is organized as follows. We present the approach of cross translation unit symbolic execution in Section 2. We describe the parameters of the improved analysis in Section 3. We evaluate the analysis processes and present the results in Section 4. Finally, the paper concludes in Section 5.

===2. Cross Translation Unit Symbolic Execution===

Clang Static Analyzer (SA) is a powerful symbolic execution engine for the C/C++ and Objective-C languages, based on the Clang compiler infrastructure [2]. However, for a long time it was not able to perform cross translation unit (CTU) analysis, even though many problems span multiple translation units. We improved it to achieve a more sophisticated approach [6].

The SA initially used a one-pass analysis; CTU analysis, however, needs a preprocessing step over the project. Because of this dependency, we had to extend the analysis driver to support a two-pass analysis. In the first phase, an index file is created based on the compilation database and the source code. This index file contains the mapping from function definitions to translation units. The source code is parsed, the ASTs are built, and the ASTs are serialized in a binary format.

In the second phase, the SA analyzes all translation units. When it reaches a function call that has no definition in the current unit, the SA finds the callee's serialized AST snippet based on the index file. In this case, a unique merging approach is required: both the compiled in-memory form of the current file's AST and the binary-serialized external one contain their own distinct symbol tables and type representations, and they also have different source location managers. Loading and merging ASTs has a runtime cost; therefore, we developed caching mechanisms.

===3. Evaluation===

The symbolic execution engine of Clang SA implements interprocedural analysis by inlining the definition of the called function when the analysis reaches the function call. This inlining is not always performed, however, for example when the definition is not available inside the translation unit. Single-TU analysis disregards such a call expression and invalidates the values possibly reachable by the called function (e.g. parameters taken by reference, global variables). CTU analysis makes the definitions from other TUs available to the analyzer, thereby increasing the number of inlined functions and, at the same time, decreasing the number and effect of invalidations. The analysis proceeds with the consumption of the statements of the function body as if they were lifted into the current scope. The analyzer employs thresholds to limit the execution time.

The configuration of CTU symbolic execution offers several settings, which significantly affect runtime performance and memory consumption [4]. With inlined function definitions, the analyzer is able to explore a bigger part of the project code. This does not necessarily lead to more bugs found, but our measurements indicate that a higher number of bugs is detected when the analysis is run in CTU mode.
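Before turning to the individual settings, it is worth making the import mechanism of Section 2 concrete. The following sketch shows the kind of lookup the second phase performs when a call has no local definition; the textual index format and every name in it are hypothetical simplifications, not the actual Clang implementation:

```cpp
#include <fstream>
#include <map>
#include <optional>
#include <string>

// Hypothetical index: maps a function's unique identifier to the
// translation unit that contains its definition.
using DefinitionIndex = std::map<std::string, std::string>;

// Assume one "<function-id> <tu-path>" pair per line in the index file
// produced by the first phase.
DefinitionIndex loadIndex(const std::string &indexPath) {
    DefinitionIndex index;
    std::ifstream in(indexPath);
    std::string id, tu;
    while (in >> id >> tu)
        index.emplace(id, tu);
    return index;
}

// Second phase: given a call with no definition in the current TU, find
// the TU whose serialized AST has to be loaded and merged.
std::optional<std::string> definingTU(const DefinitionIndex &index,
                                      const std::string &functionId) {
    auto it = index.find(functionId);
    if (it == index.end())
        return std::nullopt; // unknown externally; fall back to invalidation
    return it->second;
}
```

Each successful lookup triggers loading and merging another TU's AST, which is exactly the cost that the settings discussed below try to bound.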
One of the most critical parameters is the maximal number of translation units to process when external code snippets are required. On the one hand, increasing this parameter means a more precise analysis. On the other hand, the symbolic execution engine has internal limitations that can cancel the analysis even if the maximal number of allowed TUs is not exceeded.

There are two different modes for loading external ASTs: users can select between loading the ASTs serialized during the two-pass analysis and parsing the external source code on demand.

We also provide caching mechanisms; function-wise and translation-unit-wise caching are supported. When an external code snippet is required, the function-wise solution caches only the AST of the called function, while the translation-unit-wise approach caches the AST of the entire translation unit, not just the part that belongs to the called function.

We tested the CTU symbolic execution on the following projects:

* Tmux (https://github.com/tmux/tmux), an open-source project written in C
* Xerces (https://github.com/apache/xerces-c), an open-source project written in C++

The following metrics were collected:

* wall time of the execution,
* resident memory usage,
* disk usage of the analysis.

We collected the metrics for the following parameters:

* method used:
** non-CTU as the baseline,
** AST-dump-based CTU,
** on-demand-parsed CTU;
* TU import threshold.

===4. Results===

The measurements were run on an Intel(R) Xeon(R) X5670 CPU @ 2.93 GHz workstation with 24 virtual cores. Each measurement was driven by CodeChecker (https://github.com/Ericsson/CodeChecker), with the help of the run orchestrator CSA-testbench (https://github.com/Xazax-hun/csa-testbench). The runtime metrics, wall clock time and memory usage, were taken with the GNU time tool (https://www.gnu.org/software/time).

The evaluation of the CTU analysis methods shows a definite increase in both analysis time and result count for both the dump-based (simple) and the on-demand mode, as seen for Tmux in Table 1 and for Xerces in Table 2. The CTU modes make a more significant part of the project available to the analyzer, thus increasing the amount of information accessible during the analysis.

Table 1: Tmux analysis method comparison

| Method | Threshold | Bugs | Time (s) | Max memory (kB) | Disk (kB) |
|---|---|---|---|---|---|
| Non-CTU | N/A | 21 | 662.04 | 190128 | 1294 |
| Dump-based CTU | 0 | 21 | 779.71 | 190948 | 166403 |
| Dump-based CTU | 8 | 36 | 1143.99 | 240500 | 167481 |
| Dump-based CTU | 16 | 121 | 1715.64 | 290636 | 173590 |
| Dump-based CTU | 24 | 154 | 1946.96 | 332740 | 176691 |
| Dump-based CTU | 32 | 159 | 2073.77 | 376432 | 177159 |
| Dump-based CTU | 40 | 161 | 2096.85 | 421852 | 177185 |
| Dump-based CTU | 48 | 162 | 2074.01 | 442892 | 177253 |
| On-demand CTU | 0 | 21 | 719.13 | 190636 | 1405 |
| On-demand CTU | 8 | 36 | 1261.06 | 245156 | 2577 |
| On-demand CTU | 16 | 121 | 1975.93 | 300012 | 8733 |
| On-demand CTU | 24 | 154 | 2281.72 | 347184 | 11954 |
| On-demand CTU | 32 | 159 | 2426.03 | 393232 | 12473 |
| On-demand CTU | 40 | 161 | 2482.41 | 442000 | 12512 |
| On-demand CTU | 48 | 162 | 2489.72 | 467704 | 12586 |

Table 2: Xerces analysis method comparison

| Method | Threshold | Bugs | Time (s) | Max memory (kB) | Disk (kB) |
|---|---|---|---|---|---|
| Non-CTU | N/A | 109 | 1437.24 | 236404 | 6492 |
| Dump-based CTU | 0 | 109 | 1607.94 | 239104 | 637517 |
| Dump-based CTU | 8 | 225 | 10858.03 | 316660 | 683215 |
| Dump-based CTU | 16 | 289 | 17461.60 | 363252 | 693284 |
| Dump-based CTU | 24 | 335 | 21300.30 | 426760 | 697806 |
| Dump-based CTU | 32 | 362 | 22549.00 | 488412 | 699926 |
| Dump-based CTU | 40 | 366 | 23630.78 | 536008 | 701161 |
| Dump-based CTU | 48 | 368 | 23938.71 | 559980 | 701594 |
| On-demand CTU | 0 | 109 | 1544.32 | 237916 | 6597 |
| On-demand CTU | 8 | 225 | 12084.36 | 323416 | 52295 |
| On-demand CTU | 16 | 296 | 21402.23 | 382732 | 62364 |
| On-demand CTU | 24 | 329 | 25186.48 | 456744 | 66886 |
| On-demand CTU | 32 | 355 | 26988.73 | 536848 | 69006 |
| On-demand CTU | 40 | 360 | 28349.59 | 591512 | 70241 |
| On-demand CTU | 48 | 362 | 28790.37 | 614456 | 70674 |

[Figure 2: Analysis time vs TU threshold (Tmux); y-axis: analysis CPU time (seconds); series: simple (dump-based) CTU and on-demand CTU.]
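The Threshold column in the tables above is the cap on imported TUs described at the beginning of this section. The following minimal sketch pictures the bookkeeping such a cap implies; the class and its names are our own illustration, not the analyzer's actual code:

```cpp
#include <set>
#include <string>

// Illustrative model of a per-analysis TU import budget: once the number of
// distinct TUs imported for the current top-level analysis reaches the
// threshold, further external definitions are treated as unknown, exactly as
// in single-TU mode (the call is invalidated instead of inlined).
class CTUImportBudget {
    std::set<std::string> importedTUs; // TUs whose ASTs were loaded so far
    unsigned threshold;                // the configured TU import threshold

public:
    explicit CTUImportBudget(unsigned limit) : threshold(limit) {}

    // Returns true if the definition residing in `tu` may be imported and
    // inlined; false means the analysis must fall back to invalidation.
    bool allowImport(const std::string &tu) {
        if (importedTUs.count(tu) > 0)
            return true;                     // this TU was already paid for
        if (importedTUs.size() >= threshold)
            return false;                    // budget exhausted
        importedTUs.insert(tu);
        return true;
    }
};
```

Under this model, a threshold of 0 disables all imports, which is consistent with the tables: the 0-threshold rows reproduce the non-CTU baseline bug counts.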
The runtime cost of the different analysis methods can be seen in Figure 2 and Figure 3 for Tmux and Xerces, respectively. The increase in result count could also mean that more false positives are produced. The programmer must decide on an individual basis whether a finding is a true positive. This means that CTU analysis could provide more results for the project at the cost of an increase in development time.

The results also show that the nature of the bugs found varies, as CTU connects multiple domains that modularization keeps separated. For example, bugs concerning memory access are found along deeper bug paths, as memory handling logic is usually separated into different translation units.

CTU analysis introduces more statements to be analyzed, and these statements use the same budget as the non-imported ones. Even with the same statement-budget value, the exact characteristics of the found bugs depend on many factors, including the path-exploration strategy employed, the position of the inlined calls, and the structure of the inlined functions' bodies. CTU analysis can therefore lead to deeper bug paths as well as shallower ones. Consider the example of a function that makes a call very early during its execution. In the non-CTU case, the call is ignored, and execution proceeds with the statements after it; some bugs are found with either deep or shallow paths. Now consider the analysis of the same function with CTU mode enabled. The aforementioned call is now matched with a definition from another TU and is inlined. It is now possible that the inlined function is very complex, or that more functions are inlined during its evaluation. This could lead to early exhaustion of the analysis budget and, potentially, to the exclusion of the latter half of the original function's body. Totally disjoint sets of results are therefore possible.

[Figure 3: Analysis time vs TU threshold (Xerces); y-axis: analysis CPU time (seconds); series: simple (dump-based) CTU and on-demand CTU.]

There is also an interesting behaviour when analyzing projects up to their CTU-threshold-limit. Tmux was analyzed with a TU threshold of 100 in the cases where the threshold value is not explicitly mentioned. During this analysis, we tracked the messages of the analyzer. We found that even though a maximum of 100 was given, no TU triggered an import of more than 47 other TUs. This means that with default settings, the analyzer considered no more than this number of TUs during the analysis; we call this value the CTU-threshold-limit. So with thresholds ranging from 0 (which is equivalent to non-CTU behaviour) up to 48, the whole range of effective values was measured. This explains the saturation-like shape of the bugs-found curves over the threshold range.

===5. Limitations and Conclusion===

The disk usage of CTU analysis could prove to be a significant hindrance, as there is usually an order-of-magnitude difference between the size of the results produced and the size of the dumped ASTs in the case of simple (dump-based) CTU analysis. On-demand CTU analysis can mitigate this, but at the cost of parsing the source files on the fly. These tradeoffs should be considered before switching analysis modes.

On-demand CTU analysis has further weaknesses. The produced AST is not precisely equivalent to the dumped AST. We have identified three possible causes of the non-equivalence.
One could be that the serialization and deserialization steps do not give back the original AST; this is still under investigation. Another could be the liberal handling of language elements when creating serialized AST dumps, as opposed to the stricter policy employed during on-demand parsing. This is more likely, but the exact implementation details of narrowing the gap are under revision. The last possible cause is the liberal detection of compilation command flags, which is deemed the least probable. Further investigation is needed, but recent discussions suggest that an architecturally more robust and more scalable approach could circumvent the first two causes. In the case of medium-sized projects, the AST differences are not numerous enough to produce different results; however, in the case of more complex projects, even the bugs found differ. We are currently verifying whether the AST serialization is responsible for this phenomenon.

We found that caching is necessary for the analysis to be successful. Clang relies on every part of the AST being held in memory. When caching was switched off, the analysis hit assertions on multiple violated invariant properties of the AST, as the AST nodes had already been destructed by the time the analysis reached them.

===References===

[1] Anand, S., Godefroid, P., Tillmann, N., Demand-driven Compositional Symbolic Execution, in Proc. of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 367–381.

[2] Arroyo, M., Chiotta, F., Bavera, F., An user configurable Clang Static Analyzer taint checker, in Proc. of the 2016 35th International Conference of the Chilean Computer Science Society (SCCC), pp. 1–12.

[3] Babati, B., Horváth, G., Májer, V., Pataki, N., Static Analysis Toolset with Clang, in Proc. of the 10th International Conference on Applied Informatics (ICAI 2017), pp. 23–29.

[4] Baldoni, R., Coppa, E., D'Elia, D. C., Demetrescu, C., Finocchi, I., A Survey of Symbolic Execution Techniques, ACM Computing Surveys, Vol. 51(3) (2018), Article No. 50.

[5] Emanuelsson, P., Nilsson, U., A Comparative Study of Industrial Static Analysis Tools, Electronic Notes in Theoretical Computer Science, Vol. 217 (2008), pp. 5–21.

[6] Horváth, G., Szécsi, P., Gera, Z., Krupp, D., Pataki, N., Challenges of Implementing Cross Translation Unit Analysis in Clang Static Analyzer, in Proc. of the 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM 2018), pp. 171–176.

[7] Johnson, B., Song, Y., Murphy-Hill, E., Bowdidge, R., Why don't software developers use static analysis tools to find bugs?, in Proc. of the 2013 International Conference on Software Engineering (ICSE '13) (2013), pp. 672–681.

[8] Joshi, A., Tewari, A., Kumar, V., Bordoloi, D., Integrating static analysis tools for improving operating system security, International Journal of Computer Science and Mobile Computing, Vol. 3(4) (2014), pp. 1251–1258.

[9] King, J. C., Symbolic execution and program testing, Communications of the ACM, Vol. 19 (1976), pp. 385–394.

[10] Mihalicza, J., How #includes Affect Build Times in Large Systems, in Proc. of the 8th International Conference on Applied Informatics (ICAI 2010), Vol. 2, pp. 343–350.