=Paper= {{Paper |id=Vol-2400/paper-35 |storemode=property |title=Exotic Compilers as a Malware Evasion Technique |pdfUrl=https://ceur-ws.org/Vol-2400/paper-35.pdf |volume=Vol-2400 |authors=Michele Ianni,Elio Masciari,Domenico Saccà |dblpUrl=https://dblp.org/rec/conf/sebd/IanniMS19 }} ==Exotic Compilers as a Malware Evasion Technique== https://ceur-ws.org/Vol-2400/paper-35.pdf
         Exotic Compilers as a Malware Evasion
             Technique (Discussion Paper)

               Michele Ianni1 , Elio Masciari2 , and Domenico Saccà1
1
    DIMES, University of Calabria, Rende, Italy {mianni, sacca}@dimes.unical.it
                            2
                              ICAR, CNR, Rende, Italy
                           elio.masciari@icar.cnr.it



        Abstract. The increasing complexity of new malware and the constant
        refinement of detection mechanisms are driving malware writers to re-
        think the malware development process. In this respect, compilers play a
        key role and can be used to implement evasion techniques able to defeat
        even the new generation of detection algorithms. In this paper we provide
        an overview of the endless battle between malware writers and detectors
        and we discuss some considerations on the benefits of using high level
        languages and even exotic compilers (e.g. single instruction compilers)
        in the process of writing malicious code.

        Keywords: Metamorphic malware · Obfuscation · Malware detection ·
        Single instruction compilers.


1     Introduction
History of malware, short for malicious software, is characterized by the endless
battle between malware writers and detectors. Since detection strategies are be-
coming more and more complex, malware writers have to invent new techniques
in order to evade detection. Today we can find many viruses that can be consid-
ered as pieces of art because they employ several clever ideas in order to keep
themselves as stealth as possible. The increasing complexity of new malware is
posing new intriguing challenges both from the malware writer perspective and
detection mechanisms. In order to implement complex malware, able to spread
itself on various operating systems and architectures, it could be useful to move
from pure assembly implementations, to malware written using high level lan-
guages. In this respect the use of compilers is a key concept to take into account.
Compilers, in fact, can be used to implement metamorphic techniques and obfus-
cation and can build executables able to defeat new detection mechanisms based
on the extraction of semantic patterns from the binaries. The paper is organized
as follows: in section 2 we describe several techniques used by malware writers
in order to avoid detection. In section 3 we show some of the strategies used
by antivirus software to detect malicious code and we focus our attention on
CFG based detection. In section 4 we discuss the benefits related to the use of
compilers in the process of writing malicious code and we show the advantages
    Copyright c 2019 for the individual papers by the papers authors. Copying permit-
    ted for private and academic purposes. This volume is published and copyrighted by
    its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy.
of using single instruction compilers as an evasion technique. Finally in section 5
we draw our conclusions.


2   Evasion techniques
The most widespread technique used by commercial anti-malware systems in
order to detect viruses is based on malware signatures [21]. They are invariant
patterns, usually taken from the program’s code or raw file content, used to
uniquely identify the given malware. To evade signature based scanners many
today viruses (called metamorphic) are able to transform themselves during
the propagation phase, without losing their capabilities [47]. To achieve this
goal several metamorphic transformation are used, including code permutation,
garbage code insertion, code shrinking and expansion, register renaming, en-
cryption [2, 3, 15]. The result of these transformations is a brand new virus that,
while keeping the functionality of its predecessor, present a different structure
and then a different signature, thus evading detection [47]. As explained in [4],
we define M ⊂ P to be the set of malicious programs, where P is the set of
all programs and S to be the set of signatures. A detector is then a function
D : P × M → {0, 1}. A program p is detected if there is a signature m ∈ S such
that D(p, m) = 1. In the case of malwares D(p, m) = 1 ⇐⇒ m is a pattern
derived from p.
    Using signature for malware detection is efficient only if it is applied to known
malware. This technique, compared to dynamic analysis techniques, has less
scanning time, lower false-positive ratio and doesn’t suffer of the risks of system
infection due to the execution of the malware. The use of code obfuscation
techniques allows to easily generate variants of known malware resulting in new
malware that cannot be detected by signature based scanners and are harder to
comprehend for an human analyst. As stated in [31] more than 80% of malware
is packed, in accordance to [41] almost 50% of new malware in 2006 were existing
malware obfuscated with packing techniques. The same trend is linked to the use
of other obfuscation approaches. There exist many examples of code obfuscation
designed to avoid AV scanners detection [17, 23, 32, 34] and all of them are able
to easily evade signature based malware detectors.
    As described in [47] except for packing, the first obfuscation technique his-
torically used is encryption [36, 37, 45]. The executable body is crypted and the
malware adds to it a decryptor that provides decryption of the body at program
runtime. Since at every infection the cryptographic keys used are different the
crypted virus body will be always different. One of the first viruses that devel-
oped this strategy is Cascade, followed by Win95/Mad and Win95 Zombie [42].
Some of this kind of viruses use also multiple layers of encryption (Win32/Coke).
The major limitation of this approach is that in most cases the decryption is in
clear text and, since it is always the same, it can be used in order to generate
a signature. To overcome this limitation malware writers created new malware
able to change also decryptor code. This led to the birth of polymorphic mal-
ware [34], that, using many different techniques [11, 26, 45] are able to generate
always different decryptors, without invariant patterns that could be used as
signatures. The first viruses that used a real 32-bit polymorphic engine were
Win95/Marburg and Win95/HPS. They have been developed and spreaded on-
line many polymorphic engines, among these we can cite “The Mutation Engine”
(MtE ) [36] that is able to easily convert non obfuscated code into polymorphic
malware. Altough polymorphic malware are effective against signature based de-
tection, they can be detected using more refined techniques. After the decryption
phase, in fact, the body of the virus will be always the same. Using sandbox-
ing techniques [36, 37] the detectors are able to emulate malware in a controlled
environment, allowing the decryptor to decrypt malware body in memory. At
this point it is still possible to use signature based techniques on the decrypted
body. To prevent malware emulation several armoring techniques have been pro-
posed [36] but the improvements in sandboxing mechanisms brought many of
them to be ineffective. To overcome all these limitations malware writers brought
obfuscation to a new level: metamorphic malware [17, 23] [11, 26, 37, 45]. They
are malware able to transform their own body during infection phase. At each
iteration metamorphic malware is rewritten so that each succeeding version of
the code is different from the preceding one.
    There are numerous techniques used by this kind of malware (and often by
polymorphic malware too):

 – Register swapping: is a technique used, for example, by Vecna’s Win95/Regswap
   virus [45]. It consists in changing registers used in various instructions during
   the evolution from generation to generation. Wildcard searching makes this
   technique ineffective.
 – Instruction substitution: it is based on substituting single, or groups, of in-
   structions with other instructions (or groups of them). The new instructions
   will be equivalent to the previous ones in functionalities but syntactically
   different [26].
 – Garbage instructions insertion: is a technique based on the insertion of
   garbage instructions, that are useless for the execution of the program. Their
   only goal is to vary malware body [1, 26, 45]. They can be single instructions
   or sequences that perform useless operations leaving unaltered the state of
   the program or even instructions located in areas of the program that will
   never be executed. In this case we are talking about dead-code.
 – Transposition: this name is used to define many instruction reordering tech-
   niques that leave unaltered the flow of the program [11]. One of these tech-
   nique consist in randomly reordering some instructions then using uncon-
   ditional jumps to reconstruct the original flow. In a more elegant way it is
   possible to isolate independent groups of instructions then modify their or-
   der. In this case, since the sequences are independent, there is no need to
   make use of unconditional jumps. Finding independent groups of instructions
   is not an easy task to perform, so, often developers use easier techniques,
   which can be considered a variant of the first one. It is based on the re-
   ordering of the various subroutines that are present inside the malware [11].
   One of the malware that used this approach is Win32/Ghost. If we have n
   subroutines we are able to generate up to n! different variants.
 – Code integration: is the most sophisticated technique used to obfuscate code.
   It has been introduced by the virus writer Zombie in Win95/Zmist (Zombie
   Mistfall). It consists in decompiling the program to infect in distinct parts
   and then inserting malware code between them. At the end the original
   program and the malware are reassembled in a single executable.
3   Malware detection
Although several theoretical studies [10, 14] have proved that an algorithm able
to detect all types of malware can not exist, a lot of effort has been put to
improve detection mechanisms. Several methods have been proposed, varying
from support vector machine (SVM) [7], to decision trees [25,33], to Naive Bayes
Method [38]. There are two different approaches in malware detection: static and
dynamic analysis. The first analyzes a binary without executing it. Since this
technique is safer, faster and easier to implement than dynamic analysis, it is
the most widespread approach, even if it is more limited than the latter. Some
examples of operations that are performed by static analysis are finding patterns
on executables and code flow analysis. In addition to be fast and safe, static
analysis has the advantage of reaching complete application code coverage, thus
reducing the number of false positives in malware detection. The main problem
in using static analysis is that it is very difficult to detect unknown malware.
Dynamic analysis, instead, runs malware in a sandbox simulating the behaviour
of a real environment, monitoring all system calls. It is clear that the speed of
performing this kind of analysis is much slower than static analysis techniques. In
addition to that, several techniques have been proposed from malware writers in
order to check if the infected executable is running inside a virtual machine, and,
in that case, changing its own behaviour in a non offensive one. To overcome
the limitation of static analysis they have been introduced many techniques
based on the concept of code normalization. These techniques try to reduce
obfuscated code to a base form that is the same for all obfuscated variants of the
same executable. Many different approaches to code normalization have been
proposed. Among them we can cite Christodorescu et al. [12], Walenstein et
al. [44] and Lakhotia et al. [28]. In the latter the base form is called “zero form”
and the process to reduce an executable to that form is called “zeroing”. As
herm1t states in [20] if this kind of approach was perfect a tool that could perform
zeroing reducing all possible variants of a virus into a single “normalized” form
(not necessarily optimal), could then use this strategy with every algorithm thus
proving or refuting their identity. That’s known to be indecidable. The effort of
the creators of detectors is then focused on capturing semantic pattern inside
an executable rather than invariant synctactic features. These techniques vary
from API call analysis [35] to Control Flow Graph analysis.
    As explained in [40] a Control Flow Graph (CFG) is a graph in which the
nodes are basic blocks of execution and the edges are possible control flow trans-
fers between them. They are used both for malware detection and vulnerabil-
ity discovery. CFG recovery is a widely discussed topic in literature, we can
cite [13, 24, 27, 39, 43, 46].
    Many CFG recovery algorithms deal with the problem of indirect jumps.
An indirect jump occurs when the control flow is transferred to a target repre-
sented by a value in a register or to a memory location. This makes flow analysis
much harder because the destination of the jump is not easily resolvable. This
is because it could depend from computations specified in code, from the appli-
cation context or even from function pointers used in object oriented languages
to implement object polymorphism [40].

Control Flow Graph based malware detection As previously stated, due to the
difficulty on isolating invariant synctactic features of a self-mutating malware,
the effort of the creators of detectors is focused on capturing semantic patterns.
CFG are widely used to find similiarities among executables [18] and, more in
detail, among malwares [4–6,8,9,19,22,30]. The techniques proposed in literature
are based on generating the CFG of a program P to analyze. The generated CFG
is the compared to a set of CFGs of known viruses in order to find isomorphic
components. Usually ( [6]) before extracting the CFG from a program P a set
of normalization operations [28] are performed on the binary. This step aims to
reduce the effects of mutation techniques. The normalized binary is then used to
extract the CFG which is then compared to CFG extracted by normalized known
malware. If the CFG of the normalized program contain a subgraph isomorphic
to the CFG of a malware, then the executable is marked as malicious.


4     Using compilers to evade detection
The advances in malware detection and the plethora of different devices and
operating systems in use nowadays, pose new intriguing challenges to malware
writers. The use of assembly language is becoming more and more painful, be-
cause of the difficulties involved to write portable and easy-to-support code. In
the forward-looking article “Recompiling the metamorphism” [20], the author,
herm1t, suggests to make use of high level languages in order to overcome the
difficulties involved in developing malware in pure assembly. He outlines how us-
ing a high level language gives to the developer the opportunity to easily extract
additional information from the code, rather than builtin support for features
like hashes, iterators or objects. The idea of ”recompiling the metamorphism”
is without any doubt interesting and introduces many advantages to the virus
writer, however, not all the benefits of using compilers have been considered in
detail. In order to evade new anti-malware techniques based on the extraction
of semantic patterns from executables, compilers could play an important role.
They, in fact, are able to generate executables characterized by very different
structures and could be used in order to defeat detection mechanisms. In this
respect we take into accounts the benefits introduced by using exotic compilers,
like the M/o/Vfuscator2 3 in order to obtain different CFGs, thus fooling the
CFG based detection.

Single instruction compilers as an evasion technique In [16], Dolan demonstrates
the Turing-completeness of the x86 instruction mov. After the publication of the
article many people started to implement, most of the time for fun, single in-
struction compilers, capable to compile arbitrary programs into lists of only mov
instructions. Several different instruction turned out to be Turing-complete, so
many different single instruction compilers arose 4 . Even if at first sight it may
seem just like a funny fact, the use of single instruction compilers has very inter-
esting consequences. Since in single instruction compiled programs comparisons,
jumps, function calls are all implemented with a single instruction, the resulting
CFG is a single (usually long) basic block. This result is very interesting, because
having a CFG composed by a single basic block make ineffective all detection
mechanisms based on CFG isomorphism detection. Using a single instruction
3
    https://github.com/xoreaxeaxeax/movfuscator
4
    https://github.com/xoreaxeaxeax/movfuscator/tree/master/post
compiler, however, has its drawbacks. First of all a program compiled with a
single instruction compiler could be considered suspicious, thus marked as mali-
cious. In addition to that, the size of a program compiled with a single instruction
compiler, is, most of the time, much bigger than the size of the same program
compiled using the entire set of instructions. This introduces some problems to
virus writers that in many cases have a limited space in the binary they are going
to infect, so they should keep the malicious code as small as possible. Sometimes
even the performances of the malicious code is important and programs com-
piled with a single instruction compiler usually are slower than traditional ones
in terms of execution speed. These problems lead to rethink the way single in-
struction compilers are used for evasion purposes. A simple but effective solution
could be splitting the source code at compilation time. In our implementation we
mark with source code annotation the functions that we want to compile with
a single instruction compiler, then at compilation time, making use of the tools
provided by LLVM [29] we are able to build an executable that uses the full set
of instructions for the code that cannot be used to determine its malicious be-
haviour and a single instruction for the malware routines. In this way we are able
to build a much smaller executable that is still able to fool CFG based malware
detection. It is important to underline that the single block of single instruction
compiled code can be furtherly modified using the metamorphic techniques de-
scribed in section 2. This single block can be also manipulated in order to obtain
different CFGs, this can be easily done by inserting jumps or comparisons, thus
creating branches in the graph. To further variate the result of the obfuscation,
several different single instruction compilers can be adopted, resulting in a great
variety of CFGs, making very hard for the detectors to extract signatures, both
syntactically, due to metamorphic transformation, and semantically, thanks to
the always changing CFG.


5   Conclusions

In this paper we presented and overview of the techniques used by malware in
order to avoid detection as well as some detection mechanisms. We showed the
benefits related to the use of compilers on the process of malware creation and
we proposed the use of single instruction compilers as an evasion mechanisms for
CFG based malware detection. In order to overcome the limitations of this kind
of compilers we proposed several solutions that can greatly increase the ability
of malware to hide itself from detectors.


References
 1. Balakrishnan, A., Schulze, C.: Code obfuscation literature survey. CS701 Construc-
    tion of compilers 19 (2005)
 2. Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S., Yang,
    K.: On the (im) possibility of obfuscating programs. In: Annual International Cryp-
    tology Conference. pp. 1–18. Springer (2001)
 3. Beaucamps, P., Filiol, É.: On the possibility of practically obfuscating programs
    towards a unified perspective of code protection. Journal in Computer Virology
    3(1), 3–21 (2007)
 4. Bonfante, G., Kaczmarek, M., Marion, J.Y.: Control flow graphs as malware sig-
    natures. In: International workshop on the Theory of Computer Viruses (2007)
 5. Briones, I., Gomez, A.: Graphs, entropy and grid computing: Automatic compari-
    son of malware. Virus Bulletin pp. 1–12 (2008)
 6. Bruschi, D., Martignoni, L., Monga, M.: Detecting self-mutating malware using
    control-flow graph matching. In: International Conference on Detection of Intru-
    sions and Malware, and Vulnerability Assessment. pp. 129–143. Springer (2006)
 7. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data
    mining and knowledge discovery 2(2), 121–167 (1998)
 8. Cesare, S., Xiang, Y.: Classification of malware using structured control flow. In:
    Proceedings of the Eighth Australasian Symposium on Parallel and Distributed
    Computing-Volume 107. pp. 61–70. Australian Computer Society, Inc. (2010)
 9. Cesare, S., Xiang, Y.: A fast flowgraph based classification system for packed and
    polymorphic malware on the endhost. In: 2010 24th IEEE International Conference
    on Advanced Information Networking and Applications. pp. 721–728. IEEE (2010)
10. Chess, D.M., White, S.R.: An undetectable computer virus. In: Proceedings of
    Virus Bulletin Conference. vol. 5, pp. 1–4 (2000)
11. Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious pat-
    terns. Tech. rep., WISCONSIN UNIV-MADISON DEPT OF COMPUTER SCI-
    ENCES (2006)
12. Christodorescu, M., Kinder, J., Jha, S., Katzenbeisser, S., Veith, H.: Malware nor-
    malization. Tech. rep., University of Wisconsin (2005)
13. Cifuentes, C., Van Emmerik, M.: Recovery of jump table case statements from
    binary code. Science of Computer Programming 40(2-3), 171–188 (2001)
14. Cohen, F.: Computer viruses. Computers & security 6(1), 22–35 (1987)
15. Collberg, C., Thomborson, C., Low, D.: A taxonomy of obfuscating transforma-
    tions. Tech. rep., Department of Computer Science, The University of Auckland,
    New Zealand (1997)
16. Dolan, S.: mov is turing-complete. Tech. rep., Tech. rep. 2013 (cit. on p. 153) (2013)
17. Driller, M.: Metamorphism in practice. 29A Magazine 1(6) (2002)
18. Dullien, T., Rolles, R.: Graph-based comparison of executable objects (english
    version). SSTIC 5, 1–3 (2005)
19. Eskandari, M., Hashemi, S.: Metamorphic malware detection using control flow
    graph mining. Int. J. Comput. Sci. Network Secur 11(12), 1–6 (2011)
20. herm1t: Recompiling the metamorphism. https://83.133.184.251/virensimulation.org/lib/vhe11.html
    (2002), accessed: 2018-11-13
21. Idika, N., Mathur, A.P.: A survey of malware detection techniques. Purdue Uni-
    versity 48 (2007)
22. Jeong, K., Lee, H.: Code graph for malware detection. In: 2008 International Con-
    ference on Information Networking. pp. 1–5. IEEE (2008)
23. Julus, L.: Metamorphism. 29A Magazine 1(5) (2000)
24. Kinder, J., Veith, H.: Jakstab: A static analysis platform for binaries. In: Interna-
    tional Conference on Computer Aided Verification. pp. 423–427. Springer (2008)
25. Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild.
    In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge
    discovery and data mining. pp. 470–478. ACM (2004)
26. Konstantinou, E., Wolthusen, S.: Metamorphic virus: Analysis and detection. Royal
    Holloway University of London 15, 15 (2008)
27. Kruegel, C., Robertson, W., Valeur, F., Vigna, G.: Static disassembly of obfuscated
    binaries. In: USENIX security Symposium. vol. 13, pp. 18–18 (2004)
28. Lakhotia, A., Mohammed, M.: Imposing order on program statements to assist
    anti-virus scanners. In: Reverse Engineering, 2004. Proceedings. 11th Working Con-
    ference on. pp. 161–170. IEEE (2004)
29. Lattner, C., Adve, V.: Llvm: A compilation framework for lifelong program anal-
    ysis & transformation. In: Proceedings of the international symposium on Code
    generation and optimization: feedback-directed and runtime optimization. p. 75.
    IEEE Computer Society (2004)
30. Lee, J., Jeong, K., Lee, H.: Detecting metamorphic malwares using code graphs. In:
    Proceedings of the 2010 ACM symposium on applied computing. pp. 1970–1977.
    ACM (2010)
31. Lyda, R., Hamrock, J.: Using entropy analysis to find encrypted and packed mal-
    ware. IEEE Security & Privacy 5(2), 40–45 (2007)
32. Mohanty, D.: Anti-virus evasion techniques and countermeasures. Published online
    at http://www. hackingspirits. com/eth-hac/papers/whitepapers. asp. 18 (2005)
33. Moser, A., Kruegel, C., Kirda, E.: Exploring multiple execution paths for malware
    analysis. In: Security and Privacy, 2007. SP’07. IEEE Symposium on. pp. 231–245.
    IEEE (2007)
34. Rajaat: Polimorphism. 29A Magazine 1(3) (1999)
35. Sathyanarayan, V.S., Kohli, P., Bruhadeshwar, B.: Signature generation and de-
    tection of malware families. In: Australasian Conference on Information Security
    and Privacy. pp. 336–349. Springer (2008)
36. Schiffman,        M.:      A      brief    history      of    malware       obfusca-
    tion:     Part     1    of    2.    Published    online    at    https://blogs.cisco.
    com/security/a brief history of malware obfuscation part 1 of 2, accessed: 2018-
    11-13
37. Schiffman,        M.:      A      brief    history      of    malware       obfusca-
    tion:     Part     2    of    2.    Published    online    at    https://blogs.cisco.
    com/security/a brief history of malware obfuscation part 2 of 2, accessed: 2018-
    11-13
38. Schultz, M.G., Eskin, E., Zadok, F., Stolfo, S.J.: Data mining methods for de-
    tection of new malicious executables. In: Security and Privacy, 2001. S&P 2001.
    Proceedings. 2001 IEEE Symposium on. pp. 38–49. IEEE (2001)
39. Schwarz, B., Debray, S., Andrews, G.: Disassembly of executable code revisited. In:
    Reverse engineering, 2002. Proceedings. Ninth working conference on. pp. 45–54.
    IEEE (2002)
40. Shoshitaishvili, Y., Wang, R., Salls, C., Stephens, N., Polino, M., Dutcher, A.,
    Grosen, J., Feng, S., Hauser, C., Kruegel, C., et al.: Sok:(state of) the art of war:
    Offensive techniques in binary analysis. In: 2016 IEEE Symposium on Security and
    Privacy (SP). pp. 138–157. IEEE (2016)
41. Stepan, A.: Improving proactive detection of packed malware. Virus Bulletin 1
    (2006)
42. Szor, P., Ferrie, P.: Hunting for metamorphic. In: Virus bulletin conference. Prague
    (2001)
43. Troger, J., Cifuentes, C.: Analysis of virtual method invocation for binary trans-
    lation. In: Reverse Engineering, 2002. Proceedings. Ninth Working Conference on.
    pp. 65–74. IEEE (2002)
44. Walenstein, A., Mathur, R., Chouchane, M.R., Lakhotia, A.: Normalizing meta-
    morphic malware using term rewriting. In: Source Code Analysis and Manipula-
    tion, 2006. SCAM’06. Sixth IEEE International Workshop on. pp. 75–84. IEEE
    (2006)
45. Wong, W., Stamp, M.: Hunting for metamorphic engines. Journal in Computer
    Virology 2(3), 211–229 (2006)
46. Xu, L., Sun, F., Su, Z.: Constructing precise control flow graphs from binaries.
    University of California, Davis, Tech. Rep (2009)
47. You, I., Yim, K.: Malware obfuscation techniques: A brief survey. In: Broadband,
    Wireless Computing, Communication and Applications (BWCCA), 2010 Interna-
    tional Conference on. pp. 297–300. IEEE (2010)