<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VMorph: A Virtualization/Metamorphic Framework for Binary Obfuscation and Intellectual Property Protection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierciro Caliandro</string-name>
          <email>p.caliandro@ing.uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Ciccaglione</string-name>
          <email>m.ciccaglione@ing.uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Pepe</string-name>
          <email>pepe.andmj@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Bianchi</string-name>
          <email>giuseppe.bianchi@uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Pellegrini</string-name>
          <email>a.pellegrini@ing.uniroma2.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tor Vergata University of Rome</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we analyse the efectiveness of combining obfuscation and metamorphism techniques to evade antivirus detection and protect intellectual property. We do so by introducing a new framework called VMorph, which jointly utilises emulation and metamorphic techniques to thwart attempts at reconstructing the application's behaviour and accessing internal details or secrets. We assess the performance of VMorph to determine its suitability for safeguarding the intellectual property of the application. The findings indicate a (bounded) decrease in performance, which is still acceptable for securing applications. Additionally, we investigate the stealth capabilities of the proposed technique, which enhances its ability to bypass common static analysis techniques. Based on the results, we also suggest detection techniques that can be employed to mitigate the risk that this technique is used to evade antivirus detection.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Binary obfuscation</kwd>
        <kwd>Metamorphic techniques</kwd>
        <kwd>Virtualization-based obfuscation</kwd>
        <kwd>Intellectual property protection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Binary obfuscation encompasses a range of methodologies aimed at manipulating applications to
conceal their internal logic by means of code confusion [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The ultimate goal of binary obfuscation
is to make the analysis process more dificult and time-consuming [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These techniques serve a dual
purpose: they can be employed to secure applications against reverse engineering [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ], or they can
be abused for malicious intents, such as evading antivirus detection by malware [7, 8].
      </p>
      <p>Several attempts have been made to provide techniques that enable automatic obfuscation of generic
applications. Binary code permutation is a simple reorganisation of the control flow graph of the
application [9], while encryption and packing are strong obfuscation techniques largely difused in
malicious application [10]. Encryption is based on the use of cryptography to hide the representation
of the program [11], and packing [12] is the art of embedding the application into some container that
cannot be executed directly but needs the support of another piece of code, the unpacker, which brings
the payload into executable form and activates it.</p>
      <p>Binary code virtualization [13, 14, 15] is essentially the conversion of executable code from a
wellknown (typically native) Instruction Set Architecture (ISA) into a new one, which is often hand-crafted
and undocumented, and that can be interpreted by an underlying interpreter or Virtual Machine (VM).
This technique results in a file that cannot be executed directly on the CPU but must be loaded as a
guest on the VM and then interpreted and emulated by it.</p>
      <p>All of these techniques sufer from a common pitfall: they are inherently static in that, once generated,
the program’s binary image is not subject to any change. This means that static analysis and debugging
techniques can allow uncovering the code’s original version or providing insights on the program’s
logic [16, 17, 18]. In some cases, the process can also be automated [15].</p>
      <p>In contrast, metamorphism [19, 20] is a technique that exceeds this limitation by changing the
binary form of the program at each generation without impacting the program’s semantics. Therefore,
any static or automatic analysis will face diferent versions of the same program whenever the same
application is started, making it more dificult to get insights into what the program is doing. Both
static and dynamic analysis techniques face significant challenges. However, more advanced detection
methods, particularly those that analyse similarities between diferent generation of programs, may
efectively identify metamorphic applications [21, 22].</p>
      <p>Jointly using obfuscation and metamorphic techniques may ofer a more covert approach to hiding
application internals. To the best of our knowledge, the joint use of these techniques has never been
explored. In this paper, we analyse the efectiveness of combining these techniques by introducing
VMorph, a novel framework for code obfuscation that employs both virtualization and metamorphism.</p>
      <p>VMorph allows the automatic building of an executable obfuscated using emulation, either starting
from the application’s sources or the compiled version, thanks to binary transformation techniques.
Metamorphism is applied to the virtualised ISA used by the emulator and the emulator itself. This
approach results in a final binary that mutates itself entirely upon each execution. Thus, the metamorphic
transformations are concealed, making it dificult to correlate the diferent versions.</p>
      <p>We evaluated the stealth capabilities of this approach by exercising existing malware samples
transformed with VMorph against of-the-shelf antiviral solutions. Based on the results, we propose possible
detection approaches that can be studied to reduce the impact of this kind of threat. Also, we study
the performance impact of VMorph on transformed applications to understand the viability of the
approach to protect benign software from reverse engineering for intellectual property (IP) protection
purposes.</p>
      <p>The remainder of this paper is structured as follows. In Section 2, we discuss related work. Our
metamorphic virtual machine is presented in Section 3. We present an experimental assessment in
Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Obfuscation techniques are widely discussed in the literature, both from the evasion and detection point
of view. The panorama is vast, with techniques such as ROP-based obfuscation [23], opcode hiding [24],
code virtualization [25, 26], control flow redirection based on exceptions [ 27] and mimimorphic
techniques [28] based on attacking both semantic and statistical analysis. Another type of hiding technique
is based on the use of packers, i.e. compressing or encrypting binary code in the executable file, adding
an unpacking stub that, when activated, reverses the packing operation before giving control to the
entry point of the original code [29, 30, 31, 29].</p>
      <p>All of these obfuscation techniques share the same problem: they are static and do not change
executable files over time. Therefore, once reverse engineers have recognised the underlying pattern
(e.g. how to unpack or how the code is virtualised), they can construct a deobfuscated version of the
executable to analyse it without impediments. Our solution to this problem does not involve including
additional levels of complexity in the binary representation, as many of the above contributions did.
Instead, we switched focus, allowing the obfuscated application to modify itself over time. This is done
through a metamorphic engine that can drastically rewrite itself, based on some source of randomness,
to prevent reverse engineers from observing two identical versions of the same program.</p>
      <p>Malware writers have applied metamorphism to protect their code against detection activities. The
most relevant example is Metaphor [32, 33]. Indeed, the author showed how to build a metamorphic
engine to efectively change the binary code of malware. The techniques reported in [ 32] were a
starting point for this work. VMorph extends current literature results by combining virtualization and
metamorphism, allowing developers to obfuscate and metamorphose their applications even if they
were not built natively to do so. The result is that VMorph’s main improvement to the state of the art
introduces an obfuscation technique which can be resilient to static analysis by converting the original
binary code into a self-modifying ISA.</p>
      <p>Guest program input
program vector
0
8191</p>
      <p>Fetch
instruction</p>
      <p>MorphVM
memory vector
0</p>
      <p>In the field of binary manipulation, various proposals in the literature ofer the capability to manipulate
executable files (see, e.g.,[ 34, 35, 36]) with difering levels of flexibility. These proposals primarily focus
on studying or enhancing the non-functional properties of the applications, which limits the scope of
modifications they can support. Conversely, to create an efective metamorphic engine, a significantly
broader range of alterations to binary files must be supported. This necessitates the development of
meticulous solutions to avoid compromising the correctness of the application and to extract information
that is not explicitly available in the program, possibly through ad-hoc heuristics. This forms a crucial
part of the proposal on which VMorph is based.</p>
      <p>The problem of the static structure of virtualised program emulators has already been acknowledged
in the literature, and several papers have attempted to address it. In [26], the authors propose a technique
that employs diversified dispatchers to decode the program. This approach makes it more complicated
to analyse the structure of the emulator, but it still relies on a static structure, which we overcome
with VMorph. In [37], the authors enhance the protection of virtualised code by leveraging encryption
based on dynamically generated keys, which depend on an anti-tamper code that, if modified during the
analysis, no longer generates the correct keys. This approach relies on an additional layer of security,
which we consider orthogonal to the mechanism we present in this paper.</p>
    </sec>
    <sec id="sec-3">
      <title>3. VMorph</title>
      <p>The VMorph metamorphic virtualization framework includes a retargetable virtual machine interpreter
and a metamorphic engine. VMorph can be integrated into the source code as a development library,
or used on compiled applications. When used as a library, developers can obfuscate specific parts of the
application using VMorph’s API, while leaving optimised code untouched. In this way,
performancecritical parts of the applications can be left native. Binary transformation can be useful for
privacypreserving purposes, obfuscating and protecting applications which were not explicitly designed to
natively rely on code protection mechanisms. With this approach, the VMorph framework can also
be included in a CI/CD pipeline. We implemented binary transformation by means of a transpiling
approach, but for the sake of space, we do not discuss this aspect in this paper.</p>
      <sec id="sec-3-1">
        <title>3.1. Interpreter and ISA</title>
        <p>The general architecture of the VMorph interpreter is shown in Figure 1. The interpreter relies on two
large arrays representing the virtualised program and memory. The former contains instructions for
the virtualised program, which are fetched one by one, decoded, and executed; the latter holds data for
the guest program. The interpreter maps to the memory array diferent logical regions necessary to
support the correct execution of generic applications. In the initial part, we keep global data, read-only
variables and optional command line arguments used by the guest program, in this order. Then we
place contiguously the heap and the stack of the virtualised application.</p>
        <p>As mentioned earlier, one of the goals of VMorph is to enable the metamorphism of both the
interpreter and the virtualised program. To achieve virtualised program metamorphism, we do not rely
on a single ISA; instead, we have defined a generic structure for the ISA, upon which multiple variations
can be created. Therefore, the program uses only one of the possible ISAs that the interpreter can run.
When a metamorphic transformation is applied to the virtualised program, a new representation of
the instructions is randomly generated, making it more dificult to correlate the results of previous
iterations of static analysis. Each byte of the program can either be part of a legitimate instruction
or a padding byte. In fact, padding is one way in which multiple versions of the ISA are implemented,
as there can be several diferent encodings for each logical instruction while maintaining the same
meaning.</p>
        <p>The ISA register file is divided into several categories. There are 16 general-purpose registers, namely
r0 through r15. The tgt register keeps the destination address of every jump the guest program
performs. The v0 register keeps the return value of subroutines, while ret keeps the return address. ip
holds the program counter, while sp points to the top of the stack. The brk register keeps the program
break to identify the valid portion of the virtualised heap.</p>
        <p>All registers are 64-bit. They are a mixture of diferent architectural organisations of real-world CPUs,
taking some ideas from MIPS, x86, RISC-V, and ARM architectures [38]. This choice avoids abiding by a
single architectural organisation in the attempt to make it less evident what certain logical structures
are used for in static analysis.</p>
        <p>Registers are used with specific calling conventions. When invoking functions (belonging to the
virtualised program, to some library or a syscall), parameters are placed into registers r0 through r15
in sequential order. This means that there must be a mapping between the native ISA and the VMorph
one. To accommodate the diferent calling conventions used by libc and syscalls, before each invocation,
the value in each register used to pass a function parameter is saved by pushing it onto the stack. After
the function returns, the saved values are restored back into their respective registers.</p>
        <p>The libc register maintains a second-order opcode to select any standard C library function with
a single instruction. Dedicated support for the standard library is essential because the interpreter
mediates access to the underlying system, ensuring the program remains under the interpreter’s control
and cannot bypass it by calling the regular standard library directly. Additionally, providing personalised
access to the standard library, where most functions are internally reimplemented by VMorph, can
reduce the chances of studying a virtualised program through behavioural analysis at the standard
library level.</p>
        <p>The instructions supported by the interpreter are grouped in classes, such as arithmetic-logic ones or
load/store to access guest memory. We provide a lea (load efective address) instruction to convert
addresses from the guest to the host address space. All jumps are indirect and use the dedicated tgt
register to keep the destination in the program array. Conditional jump operations combine comparison
and jump functionalities into one instruction, resulting in the modification of the ip register (based on
the tgt register’s value) when the specified condition is satisfied.</p>
        <p>The arit instruction indexes another second-order opcode table that implements arithmetic
operations, both integer and floating point. The brk instruction grows the heap in the memory array when
dealing with memory allocation. syscall and libc instructions are, respectively, the access points to
invoke a system call or a standard library function (relying on the libc register mentioned above). The
morph instruction explicitly activates the VMorph metamorphic engine.</p>
        <p>The brk register and instruction are used by a custom memory allocator to support dynamic memory
allocation and deallocation through malloc and free. The need for a custom dynamic memory
manager is similar to the reasons for having a custom standard library: to prevent the virtualised
program from escaping the control of the interpreter.</p>
        <p>The basic allocator implemented in VMorph is quite simple: the entire available heap space is
divided into chunks, each associated with metadata that includes the chunk’s size and a flag indicating
whether the chunk is in use. Whenever the virtualised program needs a new block of dynamic memory,
the allocator first checks if there is a previously allocated block with a size greater than or equal to
the requested size that has been freed. If so, the block is split into two if its size is more than twice
the requested size. A new block is allocated using the brk instruction if no free blocks exist. If the
brk instruction fails, an ENOMEM error is returned to the virtualised program. When free is called,
the “in-use” flag is reset, and the memory content is zeroed out before making it available for future
allocations1.</p>
        <p>The binary format of the instructions is shown in Figure 2: bold red bits are random padding, while
regular black ones are semantically valid. Random padding bytes can appear in the middle of instructions
to increase their confusion. The meaning of each (non-padding) byte is the following:
• The first byte is the primary opcode: its first 5 bits encode the first-level ISA operation, while the
last 3 specify how many padding bytes immediately follow.
• The second is the mod byte, whose first 4 bits are the size of the source and destination operands
(if present), and the last 4 bits tell how many padding bytes will be placed before each operand.
• Then, there are the operands preceded by padding bytes, if any.</p>
        <p>Each instruction can have 0 to 2 operands, which come in various types and sizes. The register
encodings are 8 bits in size, and their value corresponds to the register number. The size of a memory
operand is 16 bits, representing an ofset in the memory array. The size of the value loaded/written
from/to memory depends on the current value of the size register, which specified the size of the data
transfer in bits (up to 64). An immediate also has a size of 16 bits, representing the immediate value
itself. We do not support immediate sizes other than 16 bits.2.</p>
        <p>It is noteworthy to emphasize the importance of padding. As depicted in Figure 2, a byte can be used
to encode a valid ISA opcode or act as a padding byte. In this sense, the same value can have diferent
meanings in the program’s byte stream. Padding is also helpful in creating a CISC-like ISA for VMorph
because the size of an instruction can vary depending on the number of padding bytes used to encode
it. However, the interpreter’s implementation follows a RISC-like approach, which can be confusing to
analysts.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Metamorphic Transformations</title>
        <p>The VMorph framework allows for automatic metamorphism of both the virtualised program (from
now on, the “guest”) and the interpreter (from now on, the “host”). The guest and host undergo diferent
transformations due to their technical nature. Our reference implementation targets Linux systems
adhering to the System V ABI [39]. We discuss separately how we handle host and guest metamorphism.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Host Metamorphism</title>
          <p>For the host program, metamorphism involves rewriting the binary code to change the order and type
of instructions without altering its functionality. This approach takes inspiration from previous work
in [32] and divides the metamorphic engine into functional blocks, as shown in Figure 3. The process
begins with the disassembler, which reads and interprets ELF binaries [39], and outputs an intermediate
representation composed of multiple linked data structures, inspired by [34]. This representation
includes function information, ELF metadata detailing the position and size of sections and segments,
and cross-reference information between instructions and data. Since metamorphic transformations
must preserve references, any modifications must respect these relationships. Transformations are
1This last step is purposely not compliant with standard operations of the malloc library.
2Larger values can be constructed in the 64-bit registers using arithmetic/logic instructions. This further complicates the
generated virtualised code.</p>
          <p>main:</p>
          <p>...push $rbp
create</p>
          <p>Elf intermediate
representation</p>
          <p>metadata
Disassembler</p>
          <p>Expander
Shrinker</p>
          <p>Assembler</p>
          <p>Engine
always performed on this intermediate representation to ensure that relocated instructions or data
remain semantically correct and consistent.</p>
          <p>The expander reorders the instruction blocks, inserting additional jumps to preserve the logical flow
of the program. More importantly, it replaces specific instruction sequences with one or more equivalent
statements, guided by a set of predefined rules. For instance, a jump instruction can be transformed into
a push followed by a return, maintaining semantic equivalence but altering the structure of the binary.
Each type of statement is associated with transformation rules identified by the opcode mnemonic or
the behaviour of the instruction. Because certain transformations only apply when particular conditions
hold, some rules must be carefully chosen, such as a move↔push/pop replacement that only works
when the source operand is a register.</p>
          <p>Since applying these transformations repeatedly could cause unlimited binary growth, the shrinker
component then reverts some of the changes introduced in previous metamorphic steps. By restoring
certain instructions to their original form, the shrinker maintains a balance between complexity and
size, ensuring that the binary remains manageable while still varying enough to prevent easy pattern
recognition.</p>
          <p>Finally, the assembler reconstructs a functional ELF file from the instructions produced by the
expander and the shrinker. It uses Keystone [40] to generate valid machine code and updates the section
and program headers, reassigning the entry point and ensuring that all references between instructions
and data remain correct. This process yields a binary that preserves its original functionality, but has
been rewritten in such a way that it can resist static analysis and code-reconstruction attempts.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Guest Metamorphism</title>
          <p>Because the program is interpreted by a custom interpreter, it is possible to manipulate both the ISA
and its semantics during guest metamorphism. Here, metamorphism involves modifying not only the
program array but also the memory array. Within the program array, the opcode-to-handler mapping
can be completely changed. The set of valid opcode bits illustrated in Figure 2 and any padding values
are fully remapped, including both the number of padding bytes embedded within each instruction
and the values of those bytes. Consequently, the ISA employed by VMorph varies from execution
to execution. The reassignment of opcodes obscures the type of each instruction, and variations in
padding bytes mask operand types and values.</p>
          <p>In the memory array, the objective is to conceal the guest program’s data by introducing arbitrary</p>
          <p>Extract a
transformation
randomly</p>
          <p>Verify that this
transformation can
be applied to the
selected instruction</p>
          <p>Extract parameters
from the source
instruction like
registers and
immediate values</p>
          <p>Build the new
instructions using
keystone and the
information gathered
from previous phase</p>
          <p>Put the generated
instructions in the
proper list of
instructions
amounts of padding bytes before each piece of actual data. Although the addresses used by the
instructions remain unchanged, it becomes impossible to distinguish real data from garbage. To ensure
that the virtual machine can still resolve correct locations, a simple symbol table indexes addresses
and variable sizes. This table is an array of pointers to unsigned long values, with one entry per
memory address. An entry contains a valid memory area only if a variable starts at that address. In
such cases, the first 32 bits indicate the number of padding bytes preceding the variable, and the last 32
bits indicate the variable’s size. During interpretation, any memory access instruction is resolved by
converting the virtual address to a physical address, adding the padding bytes specified by the symbol
table. During metamorphic transformations, this table identifies variables and their lengths, and the
amount of padding is adjusted based on how many variables are inserted.</p>
          <p>These modifications are not visible while the program is loaded in memory. During metamorphosis,
only the program’s disk image is modified. The application that uses VMorph must rewrite itself on
disk to persist these transformations. This presents a challenge, since the operating system denies write
permissions to a running binary. To address this issue, VMorph uses a separate hook executable that
rewrites the binary file. The interpreter constructs the new version of the virtualised program in a
shared memory bufer. At the end of the metamorphic transformation, the hook is triggered, waits
for the interpreter to exit, and then acquires write access to the executable file to finalise the updated
version.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Detection and Countermeasures</title>
        <p>VMorph can be leveraged to obfuscate and metamorphose malicious code, evading both static and
dynamic analysis methods. Even the most sophisticated static checks, such as those proposed in [41], are
unable to detect the transformed payload produced by VMorph, which alters both its semantics and its
representation. Although an adversary may gain advantage by adopting VMorph, modern approaches
to counter such threats can rely on dynamic analysis, often employing sandboxing mechanisms. Since
a VMorph-obfuscated application executes its logic through an interpreter, heuristic-based measures
involving system call tracing or other run-time behaviours can label the running instance as malicious.</p>
        <p>A further strategy to detect metamorphosis is to identify recurring patterns that point to
transformation activities, perhaps by integrating static and dynamic observations to ascertain the characteristics of
the current ISA and the manner in which the emulator is generated before metamorphosis takes place.
Several studies have introduced a variety of static analysis approaches, for instance the “Smart Scanning”
method presented in [42], which removes redundant instructions to apply signatures more efectively.
However, such methods fail against VMorph due to the complexity and breadth of its transformations.
Another method described in [43] relies on pattern matching between multiple versions of the same file,
deriving signatures from system calls and library functions, and quarantining programs with scores
exceeding a given threshold. Similarly, the work presented in [44] proposes static analysis aided by
execution traces. Enhancing these latter approaches to include dynamic analysis, possibly before any
metamorphosis is performed, could produce detailed insight into the current ISA and the emulator’s
structure. Statistical [45] and machine learning techniques [46] have also shown promise, allowing
classifiers to identify and flag metamorphosed malware samples as malicious.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Assessment</title>
      <p>We conducted an experimental assessment of VMorph to evaluate its performance, the correctness of
virtualized programs, and its ability to conceal the implementation of the original application to reduce
the risk of reverse engineering. We isolated diferent parts of VMorph in the experiments, to better
understand their efects on the overall architecture.</p>
      <sec id="sec-4-1">
        <title>4.1. Metamorphic Transformation Performance</title>
        <p>To measure the impact of metamorphic transformations, experiments have been carried out using two
compute intensive benchmark applications, namely mthtest2 [47], which calculates the position of the
moon, and a simplified version of Google’s pagerank algorithm [48]. Both applications were evaluated
in their original binary forms and subsequently metamorphosed multiple times to assess changes in
their structure after repeated transformations. The tests were run on a machine equipped with an Intel
Core i5 11ℎ generation CPU and 16 GB of DDR4 memory, with a Linux operating system using kernel
version 6.7.4.</p>
        <p>Metamorphic transformations are inherently random and may produce negligible or
counterproductive modifications. To ensure meaningful results, a sampling methodology was defined. One hundred
metamorphic transformations were applied to each benchmark. Starting from the initial binary, ten
diferent versions were generated, collectively referred to as step 1. Each version produced in step 1 was
then metamorphosed again to yield ten new variants, known as step 2. This operation was repeated
until reaching step 10. For each step, a unique seed (yet diferent from other steps) was used to guide the
metamorphic transformations, increasing the probability that the newly generated binaries would difer,
even if only slightly, from one another. The execution time for each step was derived by computing the
average execution time of five runs for each variant at that step, and then averaging the results across
all ten variants. This procedure was followed for each of the ten steps.</p>
        <p>Figures 5a and 5b report the results. The mthtest2 benchmark exhibits negligible impact from
metamorphism, likely due to its limited code size, which restricts the scope and complexity of the
transformations. In contrast, pagerank shows a substantial increase in execution time, nearly doubling
after the first set of transformations. In subsequent steps, execution times remain roughly constant,
confirming the eficacy of the shrinking module that prevents unchecked code bloat.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation of Obfuscation Capabilities</title>
        <p>The obfuscation capabilities were evaluated using two benchmark applications, the Prime Count
Benchmark (PCB) and an RSA-based test, alongside two malware samples (2d8e89* and 4b060a*)
obtained from a publicly available repository [49]. PCB identifies prime numbers in the range [,  · ],
where  and  are provided as command-line parameters; its performance scales with increasing . The
RSA benchmark encrypts and decrypts ten randomly generated integers, relying on a public key (,  )
and a private key (), with execution time measured while varying  and adjusting the keys accordingly.
These tests were compared against other obfuscation tools, including UPX [50], VMProtect [51], and
ShiftyLoader [52].</p>
        <p>We used the program’s entropy to measure their obfuscation level. This metric, a statistical measure
of byte distribution, allows quick recognition of packed or encrypted samples [53]. The ENT tool [54]
was used to compute two indicators from the generated executables. The first is the binary program
entropy, expressed in bits per character. A value near 8 suggests random, high-density information.
The second is the arithmetic mean value of the file’s bytes, computed by summing all byte values and
dividing by the file length. When data are close to random, this value approaches 127.5, reflecting a
uniform byte distribution [55].</p>
        <p>Table 1 presents the results for the selected benchmarks. The values show that VMorph’s
transformations do not substantially alter entropy or arithmetic mean. In contrast, binaries protected with UPX
or VMProtect reach entropy levels near 8 and display arithmetic means close to 100 or 120, respectively,
indicating that obfuscation is indeed taking place. Conversely, VMorph yields values significantly
lower than those produced by other obfuscation techniques, thereby preventing a direct correlation
between its transformations and the presence of a virtualization layer. ShiftyLoader behaves similarly to
VMorph in terms of entropy, but its arithmetic mean remains closer to that produced by UPX. Overall,
these observations suggest that VMorph can efectively conceal virtualization and produce obfuscation
patterns that are not easily detected by entropy-based analysis.</p>
        <p>To complete the evaluation, the detection rates of commonly available static and dynamic detection
tools were assessed for both unobfuscated and obfuscated malware samples. This analysis was conducted
using VirusTotal [56]. For these tests, no samples protected with ShiftyLoader were included due to
the authors’ restrictions on public disclosure. The comparison involved the original binaries and their
obfuscated counterparts produced using diferent solutions.</p>
        <p>Table 2 reports the detection results. The artefact 4b060a* shows a reduction in detection rate of
approximately 40% after virtualization through VMorph. This version’s detection rate is 0.07, compared
to 0.45 for the native binary. Nonetheless, a few antivirus scanners can still recognise the obfuscated
sample. This persistence of detection is likely related to the presence of strings unchanged from the
original binary, suggesting that certain signatures rely on these strings to identify malicious content.
Conversely, detection for the artefact 2d8e89* is reduced to zero once it is processed by VMorph. Since
this shellcode-based sample has no explicit strings in memory, virtualization disrupts any signature
that might have been anchored to identifiable patterns in the code.</p>
        <p>Some antivirus engines remain efective against binaries protected using UPX, probably due to the
packer’s well-known strategy. This is evident in non-zero detection rates for both artefacts obfuscated
with UPX. On the other hand, binaries protected through VMProtect have a detection rate of zero in
both cases, presumably because its encryption disrupts string-based detection entirely.</p>
        <p>To further highlight these trends, five antivirus solutions were selected from those with the strongest
detection capabilities and the lowest decline in detection rates when facing obfuscated samples. Figure 6
shows that most of these solutions fail to identify malware that has been processed with VMorph
as malicious. These findings indicate that VMorph’s transformations can significantly hinder the
automated classification methods employed by modern antivirus systems, thus extending the obfuscation
capabilities of transformed binaries beyond those ofered by existing solutions.
0</p>
        <p>ZoneAlarmMicroso Avira Google MicroWorld-eScanset</p>
        <p>Antivirus</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>We have presented VMorph, a framework that can protect binaries from source code reconstruction
using virtualization and metamorphic techniques. Our experiments showed a performance degradation
comparable to that of other existing solutions, but improved obfuscation capabilities. Our future plans
include developing a back-end LLVM compiler, supporting multithreading, and porting VMorph to
other operating systems.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially funded by the project I-Nest (G.A. 101083398—CUP B97H22004950001)—Italian
National hub Enabling and enhancing networked applications and Services for digitally Transforming
Small, Medium Enterprises and Public Administration.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Writefull in order to: Grammar and spelling
check. After using this tool/service, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.
[7] D. Javaheri, P. Lalbakhsh, M. Hosseinzadeh, A novel method for detecting future generations of
targeted and metamorphic malware based on genetic algorithm, IEEE access: practical innovations,
open solutions 9 (2021) 69951–69970. doi:10.1109/access.2021.3077295.
[8] Z. Mumtaz, M. Afzal, W. Iqbal, W. Aman, N. Iltaf, Enhanced metamorphic techniques-a case study
against havex malware, IEEE access: practical innovations, open solutions 9 (2021) 112069–112080.
doi:10.1109/access.2021.3102073.
[9] M. Sikorski, Practical Malware Analysis: The hands-on guide to dissecting malicious software, No</p>
      <p>Starch Press, San Francisco, CA, 2012.
[10] K. Kaushik, H. S. Sandhu, N. K. Gupta, N. Sharma, R. Tanwar, A systematic approach for evading
antiviruses using malware obfuscation, in: Marriwala N., Tripathi C.C., Jain S., Mathapathi S. (Eds.),
International Conference on Emergent Converging Technologies and Biomedical Systems, ETBS
2021, volume 841, Springer Science and Business Media Deutschland GmbH, Berlin, Germany,
2022, pp. 29–37. doi:10.1007/978-981-16-8774-7\_3.
[11] P. Ször, The art of computer virus research and defense, Addison-Wesley Professional, Boston, MA,</p>
      <p>USA, 2005.
[12] M.-J. Kim, J.-Y. Lee, H.-Y. Chang, S. Cho, Y. Park, M. Park, P. A. Wilsey, Design and performance
evaluation of binary code packing for protecting embedded software against reverse engineering,
in: 2010 13th IEEE International Symposium on Object/Component/Service-Oriented Real-Time
Distributed Computing, IEEE, Madeira Island, Portugal, 2010, pp. 80–86.
[13] C. Collberg, C. Thomborson, D. Low, A taxonomy of obfuscating transformations, Technical Report
148, Department of Computer Science, The University of Auckland, 1997.
[14] J. Hwang, T. Han, Identifying Input-Dependent jumps from obfuscated execution using dynamic
data flow graphs, in: Proceedings of the 8th Software Security, Protection, and Reverse Engineering
Workshop, number Article 3 in SSPREW-8, Association for Computing Machinery, New York, NY,
USA, 2018, pp. 1–12.
[15] M. Sharif, A. Lanzi, J. Gifin, W. Lee, Automatic reverse engineering of malware emulators, in:
Proceedings of the 30th IEEE Symposium on Security and Privacy, IEEE, Piscataway, NJ, USA,
2009, pp. 94–109. doi:10.1109/sp.2009.27.
[16] J. H. Anderson, Y.-J. Kim, GDB: The GNU project debugger, Distributed Computing 14 (2001)
17–29. doi:10.1007/PL00008923.
[17] NSA, Ghidra - a software reverse engineering (SRE) suite of tools developed by NSA’s research
directorate in support of the cybersecurity mission, https://ghidra-sre.org/, 2019.
[18] H. Rays, Hex rays - state-of-the-art binary code analysis solutions, 1997.
[19] J.-M. Borello, L. Mé, Code obfuscation techniques for metamorphic viruses, Journal in Computer</p>
      <p>Virology 4 (2008) 211–220. doi:10.1007/s11416-008-0084-2.
[20] J.-M. Borello, Étude du métamorphisme viral: modélisation, conception et détection, Ph.D. thesis,</p>
      <p>Université Rennes 1, Rennes, France, 2011.
[21] Nikitchenko M., Spivakovsky A., Ermolayev V., Bassiliades N., Kharchenko V., Shyshkina M.,
Peschanenko V., Mayr H.C., Fill H.-G., Yakovyna V. (Eds.), Metamorphic viruses’ detection
technique based on the equivalent functional block search, volume 1844, CEUR-WS, 2017.
[22] A. G. Kakisim, M. Nar, I. Sogukpinar, Metamorphic malware identification using engine-specific
patterns based on co-opcode graphs, Computer standards &amp; interfaces 71 (2020) 103443. doi:10.
1016/j.csi.2020.103443.
[23] P. Borrello, E. Coppa, D. C. D’Elia, Hiding in the particles: When return-oriented programming
meets program obfuscation, in: Proceedings of the 51st Annual IEEE/IFIP International Conference
on Dependable Systems and Networks (DSN), DSN ’21, IEEE, Piscataway, NJ, USA, 2021, pp. 555–
568. doi:10.1109/dsn48987.2021.00064.
[24] W. Mahoney, J. Franco, G. Hof, J. T. McDonald, Leave it to weaver, in: Proceedings of the 8th
Software Security, Protection, and Reverse Engineering Workshop, SSPREW ’18, ACM, New York,
NY, USA, 2018, pp. 1–9. doi:10.1145/3289239.3291459.
[25] J. Kinder, Towards static analysis of virtualization-obfuscated binaries, in: Proceedings of the 19th
Working Conference on Reverse Engineering, WCRE ’12, IEEE, Piscataway, NJ, USA, 2012, pp.
61–70. doi:10.1109/wcre.2012.16.
[26] J. H. Suk, D. H. Lee, VCF: Virtual code folding to enhance virtualization obfuscation, IEEE Access
8 (2020) 139161–139175. doi:10.1109/ACCESS.2020.3012684.
[27] B. Lee, Y. Kim, J. Kim, binOb+: a framework for potent and stealthy binary obfuscation, in:
Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security,
ASIACCS ’10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 271–281.
doi:10.1145/1755688.1755722.
[28] Z. Wu, S. Gianvecchio, M. Xie, H. Wang, Mimimorphism: a new approach to binary code
obfuscation, in: Proceedings of the 17th ACM conference on Computer and communications
security, CCS ’10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 536–546.
doi:10.1145/1866307.1866368.
[29] J. Oberheide, M. Bailey, F. Jahanian, PolyPack: an automated online packing service for optimal
antivirus evasion, in: Proceedings of the 3rd USENIX conference on Ofensive technologies,
WOOT’09, USENIX Association, USA, 2009, p. 9.
[30] G. Bernardinetti, D. Di Cristofaro, G. Bianchi, PEzoNG: Advanced packer for automated evasion
on windows, Journal of computer virology and hacking techniques 18 (2022) 315–331. doi:10.
1007/s11416-022-00417-2.
[31] M. Gaudesi, A. Marcelli, E. Sanchez, G. Squillero, A. Tonda, Malware obfuscation through
evolutionary packers, in: Proceedings of the Companion Publication of the 2015 Annual Conference
on Genetic and Evolutionary Computation, GECCO Companion ’15, Association for Computing
Machinery, New York, NY, USA, 2015, pp. 757–758. doi:10.1145/2739482.2764940.
[32] The Mental Driller, Metamorphism in practice or How I made MetaPHOR and what I’ve learnt,</p>
      <p>Technical Report 29A 6, 29A, 2002.
[33] P. Ször, P. Ferrie, Hunting For Metamorphic, Technical Report, Symantec, 2001.
[34] A. Pellegrini, Hijacker: Eficient static software instrumentation with applications in high
performance computing, in: Proceedings of the 2013 International Conference on High
Performance Computing &amp; Simulation (HPCS), IEEE, Piscataway, NJ, USA, 2013, pp. 650–655.
doi:10.1109/hpcsim.2013.6641486.
[35] V. Bala, E. Duesterwald, S. Banerjia, Dynamo: a transparent dynamic optimization system,</p>
      <p>SIGPLAN Notices 35 (2000) 1–12. doi:10.1145/358438.349303.
[36] C.-K. Luk, B. C. Ed, F. C. G. Hi, E. D. Q. Rs, A. Tu, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney,
S. Wallace, V. J. Reddi, K. Hazelwood, Pin: building customized program analysis tools with
dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN conference on Programming
language design and implementation - PLDI ’05 40 (2005) 190. doi:10.1145/1065010.1065034.
[37] J.-Y. Lee, J. H. Suk, D. H. Lee, VODKA: Virtualization obfuscation using dynamic key approach, in:
Kang B.B., Jang J. (Eds.), Information Security Applications, volume 11402 LNCS of Lecture notes
in computer science, Springer International Publishing, Cham, 2019, pp. 131–145. doi:10.1007/
978-3-030-17982-3\_11.
[38] D. Chisnall, How to design an isa, Commun. ACM 67 (2024) 60–66. URL: https://doi.org/10.1145/
3640538. doi:10.1145/3640538.
[39] System V Application Binary Interface AMD64 Architecture Processor Supplement, 2006.
[40] Keystone, Keystone, 2016.
[41] P. Caporaso, G. Bianchi, F. Quaglia, JITScanner: Just-in-Time executable page check in the linux
operating system, in: Proceedings of the 18th International Conference on Availability, Reliability
and Security, number Article 78 in ARES ’23, Association for Computing Machinery, New York,
NY, USA, 2023, pp. 1–8.
[42] B. B. Rad, M. Masrom, S. Ibrahim, Evolution of computer virus concealment and Anti-Virus
techniques: A short survey, arXiv [cs.CR] 8 (2011) 8.
[43] Q. Zhang, D. S. Reeves, MetaAware: Identifying metamorphic malware, in: Twenty-Third Annual
Computer Security Applications Conference (ACSAC 2007), IEEE, San Diego, California, 2007, pp.
411–420.
[44] A. Kalysch, J. Götzfried, T. Müller, VMAttack: Deobfuscating Virtualization-Based packed binaries,
in: Proceedings of the 12th International Conference on Availability, Reliability and Security,
number Article 2 in ARES ’17, Association for Computing Machinery, New York, NY, USA, 2017,
pp. 1–10.
[45] A. H. Toderici, M. Stamp, Chi-squared distance and metamorphic virus detection, Journal of</p>
      <p>Computer Virology and Hacking Techniques 9 (2013) 1–14.
[46] R. K. Jidigam, T. H. Austin, M. Stamp, Singular value decomposition and metamorphic detection,</p>
      <p>Journal of Computer Virology and Hacking Techniques 11 (2015) 203–216.
[47] P. S. Paulrho, mthtest2: maths benchmark test (numerical position of moon), 2017.
[48] A. Carrara, pagerank: C implementation of google’s PageRank algorithm, 2022.
[49] M. Malware, Linux Malware Samples - a collection of linux malware binaries, https://github.com/</p>
      <p>MalwareSamples/Linux-Malware-Samples, 2021. Accessed: 2023-NA-NA.
[50] UPX Team, UPX, https://upx.github.io/, 1996.
[51] VMProtect Software, VMProtect software, https://vmpsoft.com/, 2005. Accessed: 2024-NA-NA.
[52] M. Salvatori, G. Bernardinetti, F. Quaglia, G. Bianchi, ShiftyLoader: Syscall-free reflective code
injection in the linux operating system, in: Proceedings of the 2024 Italian Conference on
Cybersecurity, ITASEC ’24, CEUR-WS.org, Aachen, Germany, 2024, pp. 1–14.
[53] R. Lyda, J. Hamrock, Using entropy analysis to find encrypted and packed malware, IEEE security
&amp; privacy 5 (2007) 40–45. doi:10.1109/msp.2007.48.
[54] J. Walker, Ent program, https://www.unix.com/man-page/linux/1/ent, 1998.
[55] Z. Mengdi, Z. Xiaojuan, Z. Yayun, M. Siwei, Overview of randomness test on cryptographic
algorithms, J. Phys. Conf. Ser. 1861 (2021) 012009.
[56] VirusTotal, VirusTotal, 2004.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wróblewski</surname>
          </string-name>
          ,
          <source>General Method of Program Code Obfuscation, Ph.D. thesis</source>
          , Wrocław University of Science and Technology,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Drape</surname>
          </string-name>
          ,
          <article-title>Intellectual property protection using obfuscation</article-title>
          ,
          <source>Technical Report CS-RR-10-02</source>
          , Oxford University Computing Laboratory,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guillot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gazet</surname>
          </string-name>
          ,
          <article-title>Automatic binary deobfuscation</article-title>
          ,
          <source>Journal in computer virology 6</source>
          (
          <year>2010</year>
          )
          <fpage>261</fpage>
          -
          <lpage>276</lpage>
          . doi:
          <volume>10</volume>
          .1007/s11416-009-0126-4.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>PreEmptive</given-names>
            <surname>Solutions</surname>
          </string-name>
          ,
          <string-name>
            <surname>LLC</surname>
          </string-name>
          ,
          <article-title>The unoficial R8 documentation</article-title>
          , https://r8-docs.preemptive.com/,
          <year>2018</year>
          . Accessed:
          <fpage>2023</fpage>
          -9-4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>What did you pack in my app? a systematic analysis of commercial android packers</article-title>
          ,
          <source>in: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering</source>
          , ESEC/FSE 2022,
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <year>2022</year>
          , pp.
          <fpage>1430</fpage>
          -
          <lpage>1440</lpage>
          . doi:
          <volume>10</volume>
          .1145/3540250.3558969.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Sebastian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malgaonkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kapoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Parekhji</surname>
          </string-name>
          ,
          <article-title>A study &amp; review on code obfuscation</article-title>
          ,
          <source>in: Proceedings of the 2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare</source>
          , IEEE, Piscataway, NJ, USA,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:
          <volume>10</volume>
          .1109/startup.
          <year>2016</year>
          .
          <volume>7583913</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>