Methods and Means of Searching Errors When Working With Dynamic Memory Andrey Dergachev1[0000−0002−1754−7120] , Daniil Sadyrin2[0000−0001−5002−3639] , Aglaia Ilina3[0000−0003−1866−7914] , Ivan Loginov4[0000−0002−6254−6098] , and Iurii Korenkov5[0000−0002−8948−2776] 1 ITMO University, Kronverkskiy prospekt, 49, St. Petersburg, 197101, Russia http://www.ifmo.ru/, dam600@gmail.com 2 cyberguru007@yandex.ru 3 agilina@itmo.ru 4 ivan.p.loginov@gmail.com 5 idkorenkov@itmo.ru Abstract. The subject The work discusses methods and means of finding errors when working with dynamic memory that arise as a result of exploiting vulnerabilities in implementations of dynamic memory allo- cation algorithms in the C language - allocators. Such vulnerabilities are common for software systems of various levels and purposes, including system software. Techniques for their operation are easily implemented, their descriptions are publicly available on the Internet, which explains their widespread use. The purpose of the work The goal is to develop an integrated approach and software that can detect vulnerabilities of dynamic memory allocators both at the compilation stage and during the operation of the software, issue appropriate warnings and recom- mendations, and also at the compilation stage, edit the code so that exploitation of vulnerabilities is impossible. One of the important qual- ity criteria of the developed approach is the minimization of the over- head during the verification of software products. The method of the work Based on the studies and analysis of the exploitation techniques of the vulnerabilities Poisoned Null-byte, Overlapped Chunks, Fastbin At- tack, Unsafe Unlink, House of Einherjar, House of Force, House of Spirit, House of Lore, Unsorted Bin Attack, a conclusion was drawn on the need for checking during the verification process software applications of the following conditions: – the possibilities of manipulating the fields of data structures that store service information, up to the creation of fake sections of the main memory on the heap or on the stack; – possibilities of accessing arbitrary sections of the computer memory due to intentional violation of the logic of the memory allocation algorithms. The analysis of modern verification methods and existing software prod- ucts aimed at detecting vulnerabilities in the operation of memory allo- cation algorithms is carried out. Their advantages and disadvantages are Copyright c 2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 A.Dergachev et al. revealed. The results As a result of the work done, a software solution to the problem of detecting the potential exploitation of vulnerabilities of dynamic memory allocators in the form of a low-level debugger based on the method of symbolic execution of program code is proposed. Keywords: Verification · Bugs in software · Symbolic execution · Model checking · Dynamic memory · C language. 1 Introduction The presence of vulnerabilities in the operation of dynamic memory allocation algorithms can lead to loss and/or intentional data corruption and, among other negative consequences, to property damage. Obviously, the detection of appro- priate errors in the program code is necessary to minimize unwanted effects and their misuse. Therefore, the development of strategies for the automatic detec- tion of such errors within the life cycle of software and products based on them is an urgent task. 2 Vulnerabilities in the realization of the algorithms of dynamic memory allocation Table 1. Vulnerabilities in different versions of glibc. Poison Overla House House House Unsort Fastbin Unsafe House ed Null pped of Einh of For of Spir ed bin Attack Unlink of Lore -byte Chunks erjar ce it Attack glibc 2.25 + + + + + + + + + glibc 2.26 – + + + + – – + + The basic concepts of dynamic memory allocation in C [1] are the following. Each thread is assigned with some memory area – “arena” in which the memory chunks will be allocated and freed upon application requests. Each arena is owned by one or more heaps consist of chunks, and a new heap is allocated with full use of the previous heap. ”Malloc state” - (the arena header, is a structure from which the memory for storing the initial heap for this arena is usually taken) stores information about bins (linked lists of chunks in the heap), top chunk (a chunk on the upper memory boundary requested from the OS) and so on. The heap is described by a structure called “heap info”, which stores pointers to its arena, previous heap, etc. Chunk (the “malloc chunk” structure) is the range of memory on the heap allocated to the application. It can be combined with other chunks to get a larger chunk if needed. The metadata for the allocated and free chunks are different. Free chunks are stored in singly connected or doubly connected lists – i.e. bins. There are: Dynamic Memory Errors 3 – 10 of fastbin bins. They store chunks ranging in size from 32 to 160 bytes. The fastbin list works on the principle of LIFO (Last Input First Output). – 64 of smallbin bins. They store chunks smaller than 1024 bytes. Each bin gets chunks of the appropriate size. Smallbins are based on the FIFO principle (First Input First Output). – 63 of largebin bins. They store chunks larger than 1024 bytes. Largebin chunks are stored in descending order of size. – 1 unsorted bin - all released chunks get into it. When malloc is called, chunks are retrieved from the unsorted bin and transferred to the commensurate bins, or returned to the user. All beans are stored in the malloc state structure. The glibc library implements algorithms for efficient work with memory. The requested chunks, depending on the size, are extracted from different bins in the order established by the corresponding algorithms of the glibc library. At present, there is a set of known vulnerabilities in the realization of the algorithms of dy- namic memory allocation in the glibc library, which are considered on the Inter- net sites [2,3] and are illustrated with detailed examples [4]. As can be seen from table 1, glibc developers introduced patches that make it impossible to conduct a number of attacks, including a patch was proposed in 2017 to combat over- writing metadata in a single byte of the heap: https://sourceware.org/ml/libc- alpha/2017-10/msg00773.html. Table 1 shows a comparison between versions 2.25 and 2.26 of the glibc library, depending on the presence of vulnerabilities listed in the first row of the table. According to the CWE (Common Weakness Enumeration) classification [5], the list of attacks in table 1 exploits vulnerabilities that fall into the following categories: 1. CWE-122 Heap-based Buffer Overflow 2. CWE-415 Double Free 3. CWE-416 Use after free 4. CWE-476 NULL Pointer Dereference If we classify the presented vulnerabilities according to the technique of use, the main list can be presented in the following edition (table 2) depending on: – attacker’s capabilities to overwrite top chunk (see heap organization in glibc) and fields: prev size, size, fd, bk; – the ability to create a special “fake” chunk on the stack or heap; – from the presence of “double free” vulnerability; – from the ability to free a pointer to an arbitrary address (”arbitrary free”). Let us illustrate the undesirable behavior of programs(table 3) due to the use for example of the Overlapped chunks and Fastbin Attack techniques, based on some of the vulnerabilities like CWE-415. In the first column of table 3 the code of the first example is placed. In this example after deliberately recording an invalid value of the size of chunk p2 and releasing this chunk if a new memory chunk (p6) is allocated in its 4 A.Dergachev et al. Table 2. Comparative characteristics of attacks. Dema Dema Demands nds of nds of Tech Overwrite area of “fake” Doub Arbitra niques chunk le Free ry Free PREV prev Top INUSE size fd bk On the On the size chunk (1byte) stack heap Pois oned + null- byte Over lapped + chunks Fast bin + + Attack Unsafe + + + Unlink House of Ein + + + herjar House of For + ce House of Spi + + rit House of Lo + + re Unsor ted bin + + + + Attack place, the effect of overlapping two chunks (p3 and p6) appears, which clearly demonstrates the output of the program placed under the code. The second example demonstrates the usage of the double free vulnerability: the chunk p1 is released twice and entered in a fastbin. After that the rewriting in its field fd of the position of ”stack var” variable allows one after the new calling of malloc to return the chunk (pp4) to an arbitrary address, in this case, at address ”8 + (char *) & stack var”, which is reflected in the output of the program. It is obvious the occurrence of similar situations during the software operation is unacceptable and leads to the inevitable search for solutions to the problem Dynamic Memory Errors 5 Table 3. Code examples Code example 1 Code example 2 unsigned long long stack var; intptr t *p1,*p2,*p3,*p4,*p5,*p6; int prev in use = 0x1; fprintf(stderr, ”The address we want p1 = malloc(100); malloc() to return is %p.”, p2 = malloc(100); 8+(char *)&stack var); p3 = malloc(100); p4 = malloc(100); unsigned long long *p1 = malloc(8); p5 = malloc(100); unsigned long long *p2 = malloc(8); free(p1); int evil chunk size = malloc usable free(p2); size(p2) + malloc usable size(p3) free(p1); + prev in use + sizeof(size t) * 2; unsigned long long *pp1 = malloc(8); unsigned long long *pp2 = malloc(8); (unsigned int *)((unsigned char *)p1 + stackv ar = 0x20; malloc usable size(p1) ) = evil chunk size; pp1 = (unsigned long long) free(p4); (((char*)&stack var) - sizeof(pp1)); free(p2); p6 = malloc(200); unsigned long long *pp3 = malloc(8); memset(p3, ’3’, 100); unsigned long long *pp4 = malloc(8); memset(p6, ’6’, 150); fprintf(stderr, ”p6 = %s”, (char *)p6); fprintf(stderr, ”allocated chunk: fprintf(stderr, ”p3 = %s”, (char *)p3); %p, %p, %p, %p”, pp1, pp2, pp3, pp4); Program result 1 Program result 2 p6 = 66666666666666666666666666666666 6666666666666666666666666666666666666 6666666666666666666666666666666666666 6666666666666666666666666666666666666 6666666633333333333333333333333333333 The address we want malloc() to re- 333333333333333333333333333333333 turn is 0x7ffc4099d018. allocated chunk: p3 = 6666666666666666666666666666666 0x1edb010, 0x1edb030, 0x1edb010, 666666633333333333333333333333333333 0x7ffc4099d018 333333333333333333333333333333333 of automatization of the fight against the considered phenomena at all stages of the software life cycle. 3 Existed approaches and solutions At presence there are a number of approaches to solving the problem, among which, as the most promising, we can distinguish the following: fuzzing method, dynamic analysis approach, model checking approach, symbolic execution ap- proach, Binary Decision Diagrams application and SAT task solution. 6 A.Dergachev et al. It should be noted that the use of each of the methods implies the overhead of the execution of the program, which in some cases can be up to 500%. Let us consider in more detail some of them. In [6] and [7], the use of “smart” fuzzing to search for buffer overflow vul- nerabilities was proposed. Here, fuzzing is understood as an approach in which, instead of the expected input data, random or specially generated data is trans- mitted to the program. The objects of interest are crashes and freezes, violations of internal logic and checks in the application code, memory leaks. Dynamic analysis is one of the code verification methods and is performed during program execution. The disadvantages of this approach are the impossi- bility of checking all the ways the program can be executed and the slowdown of the program due to the parallel execution of dynamic analysis. In [8], a verification method is considered using the created models of pro- grams and algorithms (model checking). The specification for the program is written in the language of temporal logic, and then special algorithms automat- ically check whether the created model matches the specification. An interesting approach is the symbolic execution of a program [9]. This technique of simulation execution of a program allows to represent some of the input variables used in the program in symbolic form. Such a symbol denotes the set of values of the input variable of the program from the scope of its definition. Each symbolic execution is equivalent to executing a program on a set of specific test values of input variables, which reduces the power of the set of necessary tests. Via symbolic execution approach, it is required to select input data on which an error will occur. Since the task is relevant, today there are several products, such as CBMC, HAIT, Heap Hopper, ArcHeap and MOPS, dedicated to solving the problem. CBMC is a verifier that provides the possibility of limited model checking (Bounded Model Checker) for the languages ANSI-C and C ++. It allows to verify overflow of the array (buffer overflow), pointers safety, exceptions, and user-specified assertions. A tool called HAIT (Heap Analyzer with Input Tracing) [10] implements the approach of automatic collecting of the information about the state of the heap and the operations that are performed on it. The prototype is based on the Triton framework created for dynamic binary analysis of programs. HAIT logs memory operations and tagged values. Symbolic expressions are stored in the form of abstract syntax trees (AST), and the analysis of parsed tagged data is used to track memory operations affected by user input. Heap Hopper applies the principles of symbolic execution to search for buffer overflow vulnerabilities in the heap [11] . Heap Hopper is based on the Angr framework. At each step of the program execution, an object of the SimState class is created, which stores the state of the registers and memory of the program at the moment. Registers and memory can have a specific or symbolic value. Each symbolic variable is represented as a class of BitVectorSymbol. It is also possible to manually mark the necessary input data as symbolic - it can be symbolic memory, represented as a SimSymbolicMemory class, or a symbolic Dynamic Memory Errors 7 file, represented by a SimFile class. When the conditional branch instruction is reached, a constraint on the symbolic variable is added. When calling the ”malloc” operator with a symbolic parameter for the size of allocated memory, a memory chunk with symbolic metadata is created. At each step in the SimState class, the heap state is saved. The heap model presented in the Heap Hopper is considered approximate. ArcHeap [12] is an automatic tool for detecting unexplored heap exploita- tion techniques, regardless of their realization. For its operation it is necessary to describe the parameters of the memory allocator as well as set of possible actions on the heap. During the study ArcHeap checks to see if combinations of these actions can potentially be used to perform maintenance techniques, such as random storage or overlapping chunks. As proof ArcHeap generates a PoC that demonstrates a discovered exploitation technique. MOPS Modelchecking Programs for Security properties [13] - verifier of models extracted from the code of programs written in C. The correctness re- quirements are specified in a special form and correspond to the statements of the so-called “defensive” programming. During compilation, all possible execution paths are analyzed without regard to conditions. All possible tracks are collected. Of these, operators important to safety are highlighted. Having a context-free grammar of the C language, the program is presented as a pushdown automa- ton. The security model is represented as a finite-state machine that accepts a sequence of security operations. The sequence of security operators is “suitable” if it is received at the input by a state machine and puts it in a “safe” state. Since only certain aspects of the task are implemented in all the presented products, the authors of this article consider it possible and expedient to search for a more comprehensive approach that would allow creating a product whose capabilities would include methods to detect the most complete list of vulnerabil- ities presented in program codes combined with possible overhead minimization for its implementation. 4 The proposed solution It is known when verifying C / C ++ programs several scenarios are possible: – emulation of code execution in a virtual machine. – instrumentation of the executable file after the building and its execution on a real processor. – instrumentation of the program source code during compilation and subse- quent execution of the file [14]. – performing changes to the source code of the program separately before compilation, for example, by facilities of the TXL language [15]. – the symbolic execution of the internal representation of the code during the compilation [16]. – execution using the debugger after the building. 8 A.Dergachev et al. In addition, an important step is to obtain an abstract representation of reg- ister information. The most used are the internal representations of VEX and REIL. VEX uses an intermediate representation of SSA (Static single assign- ment), in which each variable is assigned a value only once. The following comprehensive approach to solving the problem is proposed: dynamic symbolic (concolic - concrete and symbolic) execution of the executable file which combines the real execution of the program with symbolic execution should be carried out. Dynamic symbolic execution will allow to applicate tech- niques of program execution paths investigation and by adding security predi- cates to path constraints to check the potentially dangerous operations for real errors included in a program. To create a more precise representation of the heap during symbolic execution, it is proposed to work directly with the “mal- loc state” structure. All the abilities and vulnerabilities from the table 1 should to be checked for each state of symbolic execution. This approach will allow to understand the applicability of attacks to errors while working with the dynamic memory. In addition, in order to reduce overhead, instead of emulating, one should run the tested application by using a special debugger. In the future, it is also planned to investigate methods for finding errors in programs with interactive input. 5 Conclusion The article examined the known techniques of attacks on dynamic memory, ap- proaches to software verification, briefly characterized the existing solutions for finding errors in program code. A new approach to finding errors in memory allocators has been proposed. In the future, this approach is planned for imple- mentation with subsequent comparison with existing solutions. References 1. Understanding glibc malloc, https://sploitfun.wordpress.com/2015/02/10/ understanding-glibc-malloc/comment-page-1/. Last accessed 25 May 2019 2. Yet another free() exploitation technique, http://phrack.org/issues/66/6.html. Last accessed 25 May 2019 3. Malloc Des-Maleficarum, http://www.phrack.org/issues/66/10.html. Last ac- cessed 25 May 2019 4. A repository for learning various heap exploitation techniques, https://github. com/shellphish/how2heap. Last accessed 08 Nov 2019 5. Common Weakness Enumeration, https://cwe.mitre.org/index.html. Last ac- cessed 25 May 2019 6. Bhardwaj, M., Bawa, S.: Fuzz testing in stack-based buffer overflow. Advances in Intelligent Systems and Computing 759, 23–36 (2019) 7. Mouzarani, M., Sadeghiyan, B., Zolfaghari, M. A.: Smart Fuzzing Method for De- tecting Heap-Based Buffer Overflow in Executable Codes.Proceedings - 2015 IEEE 21st Pacific Rim International Symposium on Dependable (2016) Dynamic Memory Errors 9 8. Karna, A.K., Chen, Y., Yu, H., Zhong, H., Zhao, J.: The role of model checking in software engineering. Frontiers of Computer Science 12(4), 642–668 (2018) 9. Dudina, I.A., Belevantsev, A.A.: Using static symbolic execution to detect buffer overflows. Programming and Computer Software.43(5), 277–288 (2017) 10. Atzeni, A., Marcelli, A., Muroni, F., Squillero, G.: HAIT: Heap analyzer with input tracing. ICETE 2017 - Proceedings of the 14th International Joint Conference on e-Business and Telecommunications, 4, pp. 327–334. (2017). 11. Eckert, M., Bianchi, A., Wang R, Shoshitaishvili Y, Kruegel C, Vigna G.: Heap Hopper: Bringing Bounded Model Checking to Heap Implementation Security, 27th USENIX Security Symposium, 2018 12. Automatic Techniques to Systematically Discover New Heap Exploitation Primi- tives, https://arxiv.org/pdf/1903.00503.pdf. Last accessed 25 May 2019 13. MOPS: an Infrastructure for Examining Security Properties of Software, http: //web.cs.iastate.edu/~hridesh/teaching/610/06/02/papers/mops-ccs02.pdf. Last accessed 25 May 2019 14. Jang, Y.-S., Choi, J.-Y.: Automatic prevention of buffer overflow vulnerability using candidate code generation. IEICE Transactions on Information and Sys- tems.E101D(12), 3005–3018 (2018) 15. Dahn, C., Mancoridis, S.: Using program transformation to secure C programs against buffer overflows. Proceedings - Working Conference on Reverse Engineering, WCRE. 2003-January 16. Loding, H., Peleska, J. Symbolic and Abstract Interpretation for C/C++ Programs / Electronic Notes in Theoretical Computer Science 217 (2008), https://www. sciencedirect.com/science/article/pii/S1571066108003885. Last accessed 25 May 2019