<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Metamorphic Viruses Detection Technique Based on the Modified Emulators</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oksana Pomorova</string-name>
          <email>o.pomorova@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Savenko</string-name>
          <email>savenko_oleg_st@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergii Lysenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Nicheporuk</string-name>
          <email>andrey.nicheporuk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Key Terms. Machine Intelligence, KE, and KM for ICT, Software Component, Software System</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Khmelnitsky National University</institution>
          ,
          <addr-line>11, Institutskaya St., 29016, Khmelnitsky</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>21</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>This article presents a new technique for detecting metamorphic viruses using modified emulators placed on the hosts of a network. The proposed technique uses fuzzy logic to classify metamorphic viruses into classes and makes it possible to detect metamorphic viruses that use obfuscation techniques. Experimental studies showed that the proposed method detects metamorphic virus copies with an efficiency of 85%.</p>
      </abstract>
      <kwd-group>
        <kwd>Malware detection</kwd>
        <kwd>metamorphic viruses</kwd>
        <kwd>polymorphic viruses</kwd>
        <kwd>obfuscation</kwd>
        <kwd>modified network emulators</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The detection of computer viruses is one of the main challenges of information security today. Viruses perform harmful activities on infected hosts, such as stealing system information, accessing private information, corrupting data, spamming and logging keystrokes. Metamorphic viruses are among the leading computer viruses, and the damage they cause amounts to millions of dollars. According to Symantec, in 2011 the metamorphic virus Sality infected about 3 million computers worldwide [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and at the end of 2015 it was among the five most widespread viruses (1.43% of all detected threats) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Detecting metamorphic viruses is a very complicated task because they obfuscate their program code. Obfuscation allows a virus to create multiple copies of itself that look different, which makes detection very difficult. In addition, software developers often use obfuscation in trusted applications to complicate reverse engineering and to protect data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Therefore, an urgent problem is to develop a new technique for metamorphic virus detection that uses fuzzy logic to classify suspicious programs into one of the metamorphic virus classes, using modified emulators placed on each host of the network.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Known virus detection techniques based on signature analysis cannot detect altered copies of a metamorphic virus [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4-7</xref>
        ]. To detect viruses of this type, most antivirus scanners use heuristic methods that rely on the sequence of API function calls, the control flow graph of the program, the structural features of PE .EXE files, opcode instructions and their combinations.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a method for metamorphic malware detection is presented. A malware signature is described by the set of control flow graphs that the malware contains, and the technique uses a metric based on the distance between feature vectors. The drawback of the approach is its computational inefficiency.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a statistical technique is used that compares the similarity between two files infected by two morphed versions of a given metamorphic virus. The solution is based on static analysis and uses a histogram of machine instruction frequencies in various offspring of obfuscated viruses. Its disadvantage is that it is ineffective against viruses that use code transposition techniques.
      </p>
      <p>
        A malware detection system based on API call graphs is proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Each malware sample is represented as a data-dependent API call graph, and a graph matching algorithm calculates the similarity between the input sample and the malware API call graph samples stored in a database. The main drawback is that most malware samples are generated from previously existing samples, so their API call sequences are similar.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a technique for detecting botnets whose bots use polymorphic code is proposed. Detection is performed by a multi-agent system of antivirus agents that contain sensors, and the technique makes it possible to perform provocative actions against a possibly infected file. The disadvantage of this technique is the high computational complexity of the behavior analysis.
      </p>
      <p>This review of the state of the art shows the need to develop new techniques for metamorphic virus detection and to improve existing ones.</p>
    </sec>
    <sec id="sec-method">
      <title>Metamorphic Virus Detection Method Based on the Modified Emulators</title>
      <p>A new metamorphic virus detection technique based on modified emulators is proposed. It uses emulation on each host in the network. The main function of the hosts is to perform a single-time emulation and execution of an unknown, potentially malicious program and to send the results to the server.</p>
      <p>The server processes the emulation results obtained from the hosts. Since software developers often apply obfuscation to trusted applications to complicate reverse engineering and to protect data, obfuscated code alone does not prove that a program is malicious. Therefore, the main task of the server is to classify the feature vectors obtained from the network hosts by comparing copies of metamorphic viruses.</p>
      <p>The proposed technique specifies three classes into which metamorphic viruses are classified. In addition, there is a fourth class of programs whose behavior is similar to that of metamorphic viruses but which are not malicious. Fig. 1 shows the scheme of metamorphic virus detection.</p>
      <p>Let us consider the proposed technique. To detect suspicious program activity on each host of the corporate network, an analyzer of suspicious programs is used. No single operation performed by a suspicious program is dangerous in itself; however, a sequence of such operations may indicate a possible virus infection.</p>
      <p>Each application that enters the system is marked as suspicious or unsuspicious. The feature vector that determines whether a program belongs to one of these two classes is defined as follows:</p>
      <p>
        <disp-formula><tex-math>U = (M, Q, J, Y, L, N, H),</tex-math></disp-formula>
        where M is an attempt by the program to obtain system administrator rights; Q is an attempt to open or close a system port; J is an attempt to delete a file; Y is the creation of a file or process; L is a keylogging attempt; N is the sending of messages to the network; and H is the creation of a registry key or entry.</p>
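      <p>Each feature takes the value 0 or 1, where 1 indicates that the feature is activated and 0 that it is not. A program is considered suspicious if at least two of its features are activated:</p>
      <p>
        <disp-formula><tex-math>P = \text{suspicious}, \quad \text{if } \exists\, u_i, u_j \in U,\ i \neq j : (u_i = 1) \wedge (u_j = 1).</tex-math></disp-formula>
      </p>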
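      <p>The suspicion rule can be illustrated with a minimal Python sketch. It assumes that the analyzer stores each feature of U as a 0/1 value under its letter; the function name <monospace>is_suspicious</monospace> and this encoding are hypothetical, not part of the described system.</p>
      <preformat preformat-type="code">
```python
# Feature letters of U = (M, Q, J, Y, L, N, H) as defined in the text.
FEATURES = ("M", "Q", "J", "Y", "L", "N", "H")


def is_suspicious(u: dict) -> bool:
    """Return True if at least two distinct features of U equal 1.

    `u` maps feature letters to 0 or 1 (hypothetical encoding);
    letters that are absent are treated as 0.
    """
    for name, value in u.items():
        if name not in FEATURES or value not in (0, 1):
            raise ValueError(f"invalid feature {name}={value}")
    activated = sum(u.get(name, 0) for name in FEATURES)
    # A single activated feature is not considered dangerous on its own.
    return activated >= 2


# A program that creates a file (Y) and writes a registry entry (H).
print(is_suspicious({"Y": 1, "H": 1}))
print(is_suspicious({"Y": 1}))
```
      </preformat>
      <p>Requiring two or more activated features mirrors the observation that a single operation is not dangerous on its own, while a combination of operations may indicate infection.</p>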
      <p>Thus, if program P is suspicious, it is passed to the metamorphic virus detection system. To obtain the modified code sample F<sub>S</sub>, program P is emulated. The emulation process consists of executing instructions in a virtual environment and extracting instructions from the software package.</p>
      <p>Using emulators of a single type on all network hosts does not guarantee highly efficient metamorphic virus detection, because identical emulators would produce identical code samples. To reveal the properties and features of metamorphic viruses, the malicious code has to be executed under different conditions.</p>
      <p>Therefore, a modified emulator is created on each host. The emulator includes a virtual processor, which can execute instruction sets such as MMX, SSE and SSE2 and contains a set of virtual registers. The emulator also includes RAM, a virtual stack, a virtual network controller, and an operating system that supports API functions, the registry and ports.</p>
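      <p>One way to picture the "modified" emulators is as a common emulator structure whose parameters differ from host to host, so that a virus meets different execution conditions on each host. The Python sketch below assumes exactly that; the class <monospace>EmulatorConfig</monospace>, the helper <monospace>config_for_host</monospace>, the field names and all concrete values are hypothetical and do not reproduce the settings in Table 1.</p>
      <preformat preformat-type="code">
```python
import random
from dataclasses import dataclass

# Instruction-set extensions named in the text; which subset a host gets is illustrative.
ISA_EXTENSIONS = ("MMX", "SSE", "SSE2")


@dataclass(frozen=True)
class EmulatorConfig:
    """Hypothetical per-host emulator parameters (not the settings of Table 1)."""
    host_id: int
    isa_extensions: tuple   # instruction sets the virtual CPU can execute
    register_count: int     # number of virtual registers
    ram_mb: int             # size of the virtual RAM
    stack_kb: int           # size of the virtual stack
    network_enabled: bool   # whether the virtual network controller is present
    os_services: tuple = ("api", "registry", "ports")  # emulated OS features


def config_for_host(host_id: int) -> EmulatorConfig:
    """Derive a reproducible configuration that differs from host to host."""
    rng = random.Random(host_id)  # seeding by host id keeps each host stable
    n_ext = rng.randint(1, len(ISA_EXTENSIONS))
    return EmulatorConfig(
        host_id=host_id,
        isa_extensions=ISA_EXTENSIONS[:n_ext],
        register_count=rng.choice((8, 16)),
        ram_mb=rng.choice((64, 128, 256, 512)),
        stack_kb=rng.choice((64, 256, 1024)),
        network_enabled=rng.random() > 0.5,
    )


configs = [config_for_host(h) for h in range(80)]  # 80 hosts, as in the experiments
```
      </preformat>
      <p>Seeding the generator with the host identifier keeps each host's emulator reproducible between runs while still varying the execution conditions across the network.</p>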
      <p>To counter the anti-emulation techniques used by metamorphic viruses, the emulator includes a heuristics module. For each operation performed by the virtual CPU, a fixed processing time is determined, and the module checks whether operations are repeated.</p>
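      <p>The text does not specify how the heuristics module uses the fixed processing time or the repetition check. The sketch below assumes one plausible reading: each virtual-CPU operation is charged a fixed cost, emulation stops once a time budget is used up, and an (address, opcode) pair that repeats more often than a threshold is flagged as a likely anti-emulation delay loop. The function <monospace>run_with_heuristics</monospace> and its parameters are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def run_with_heuristics(trace, cost_per_op=1, time_budget=10_000, repeat_limit=1_000):
    """Scan an emulated instruction trace of (address, opcode) pairs.

    Returns (virtual_time, stopped_by_budget, flagged), where flagged lists
    the (address, opcode) pairs executed more than `repeat_limit` times.
    """
    seen = Counter()
    virtual_time = 0
    flagged = []
    for step in trace:
        if virtual_time + cost_per_op > time_budget:
            return virtual_time, True, flagged  # time budget exhausted
        virtual_time += cost_per_op  # fixed processing time per operation
        seen[step] += 1
        if seen[step] == repeat_limit + 1:  # flag once, when the limit is first exceeded
            flagged.append(step)
    return virtual_time, False, flagged


# A tight delay loop of the kind used to outlast emulators.
loop = [(0x401000, "dec"), (0x401001, "jnz")] * 2000
print(run_with_heuristics(loop))
```
      </preformat>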
      <p>To obtain the original code sample F<sub>P</sub>, program P is disassembled. The result of disassembly is a set of x86/x64 assembler instructions. Only opcodes are used to construct the feature vector; operands are discarded.</p>
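      <p>Discarding operands can be illustrated on a textual disassembly listing. The sketch assumes one instruction per line in Intel syntax, optionally preceded by an address or label and a colon, and optionally followed by a comment introduced with a semicolon; since the disassembler used is not named, this listing format and the helper <monospace>extract_opcodes</monospace> are assumptions. Instruction prefixes such as rep are kept with their mnemonic.</p>
      <preformat preformat-type="code">
```python
import string

# Instruction prefixes kept together with the mnemonic they modify.
PREFIXES = {"rep", "repe", "repz", "repne", "repnz", "lock"}


def _is_address(token: str) -> bool:
    """True if token looks like a hexadecimal address (00401000 or 0x401000)."""
    t = token.strip().lower()
    if t.startswith("0x"):
        t = t[2:]
    return bool(t) and all(ch in string.hexdigits for ch in t)


def extract_opcodes(listing: str) -> list:
    """Return the mnemonic of every instruction in a listing, dropping operands."""
    opcodes = []
    for raw in listing.splitlines():
        line = raw.split(";", 1)[0].strip()        # drop trailing comments
        if not line or line.endswith(":"):         # skip blank lines and bare labels
            continue
        head, sep, rest = line.partition(":")
        if sep and (_is_address(head) or head.isidentifier()):
            line = rest.strip()                    # drop a leading address or label
        tokens = line.lower().split()
        if not tokens:
            continue
        if tokens[0] in PREFIXES and len(tokens) > 1:
            opcodes.append(tokens[0] + " " + tokens[1])
        else:
            opcodes.append(tokens[0])              # operands are discarded
    return opcodes


listing = """
00401000: push ebp
00401001: mov ebp, esp
00401003: rep movsb        ; copy bytes
00401005: jnz 0x401000
"""
print(extract_opcodes(listing))
```
      </preformat>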
      <p>The resulting disassembly listing is partitioned into functional blocks (FB). One of the techniques used for instruction obfuscation is moving program blocks around, which is achieved with control transfer (jump) instructions such as jz, jnz and jmp.</p>
      <p>The main mechanisms for creating copies of metamorphic viruses are the insertion, deletion and transposition of instructions. To find similarities between two FB code samples, F<sub>P</sub> and F<sub>S</sub>, the Damerau–Levenshtein distance was used.</p>
      <p>The Damerau–Levenshtein distance was computed with the Wagner–Fischer dynamic programming algorithm, whose complexity is polynomial (proportional to the product of the two sequence lengths). This made it possible to build a shortest transformation chain that turns the opcode set of the program after emulation into the opcode set of the program before emulation.</p>
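      <p>The Wagner–Fischer table with the additional transposition case, together with a traceback that recovers one shortest transformation chain, can be sketched as follows. This minimal illustration uses a unit cost for every operation and computes the restricted (optimal string alignment) variant of the Damerau–Levenshtein distance; the text does not state which variant or weights were used, and the function name <monospace>transformation_chain</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
def transformation_chain(src: list, dst: list):
    """Return (distance, ops) turning opcode list `src` into `dst`.

    ops is a list of (operation, detail) tuples, where operation is one of
    'match', 'replace', 'delete', 'insert' or 'transpose'.
    """
    n, m = len(src), len(dst)
    # d[i][j] = cost of turning src[:i] into dst[:j] (Wagner-Fischer table).
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete src[i-1]
                          d[i][j - 1] + 1,         # insert dst[j-1]
                          d[i - 1][j - 1] + cost)  # match or replace
            if (i > 1 and j > 1 and src[i - 1] == dst[j - 2]
                    and src[i - 2] == dst[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    # Trace back from the bottom-right cell to recover one optimal chain.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 1 and j > 1 and src[i - 1] == dst[j - 2]
                and src[i - 2] == dst[j - 1] and d[i][j] == d[i - 2][j - 2] + 1):
            ops.append(("transpose", (src[i - 2], src[i - 1])))
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != dst[j - 1]):
            kind = "match" if src[i - 1] == dst[j - 1] else "replace"
            ops.append((kind, (src[i - 1], dst[j - 1])))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", src[i - 1]))
            i -= 1
        else:
            ops.append(("insert", dst[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


f_s = ["push", "mov", "xor", "add", "nop", "call"]  # opcodes after emulation
f_p = ["push", "xor", "mov", "add", "call"]         # opcodes before emulation
print(transformation_chain(f_s, f_p))
```
      </preformat>
      <p>In this example the emulated copy contains a swapped pair of opcodes and an extra nop, so turning it back into the original takes one transposition and one deletion.</p>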
      <p>Consider a program P that consists of a set of assembler commands p<sub>i</sub>: P = {p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>k</sub>}. Let us partition program P into functional blocks of arbitrary length that start and end with control transfer instructions such as jmp, jz, etc., so that P = {B<sub>1</sub>, B<sub>2</sub>, ..., B<sub>l</sub>}. Then we can write
        <disp-formula><tex-math>P = \{B_1 = \{p_1, p_2, \dots, p_j\}, \dots, B_l = \{p_i, p_{i+1}, \dots, p_k\}\}.</tex-math></disp-formula>
      </p>
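      <p>Partitioning an opcode sequence into functional blocks at jump instructions can be sketched as follows. The sketch assumes that a block ends with (and includes) a jump instruction and that the next block starts right after it; this boundary convention, the jump set <monospace>JUMPS</monospace> (an illustrative, non-exhaustive subset) and the name <monospace>split_blocks</monospace> are assumptions.</p>
      <preformat preformat-type="code">
```python
# Control transfer mnemonics treated as block boundaries (illustrative subset).
JUMPS = {"jmp", "jz", "jnz", "je", "jne", "ja", "jae", "jb", "jbe",
         "jg", "jge", "jl", "jle", "loop"}


def split_blocks(opcodes):
    """Split an opcode sequence P into functional blocks B1, ..., Bl.

    Each block ends with a jump instruction (inclusive); any opcodes
    after the last jump form a final block.
    """
    blocks, current = [], []
    for op in opcodes:
        current.append(op)
        if op in JUMPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


program = ["push", "mov", "cmp", "jz", "xor", "add", "jmp", "pop", "ret"]
print(split_blocks(program))
```
      </preformat>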
      <p>Let us denote the program before emulation as F<sub>P</sub> and the program after emulation as F<sub>S</sub>.</p>
      <p>Let a functional block B consisting of opcodes, of length |B| = m, be written as p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>m</sub>. The subsequence of opcodes p<sub>i</sub>, p<sub>i+1</sub>, ..., p<sub>j</sub> of block B is denoted B(i, j).</p>
      <p>Let us denote the weight of transforming opcode a into opcode b as w(a, b). Thus, w(a, b) is the weight of replacing one opcode with another when a ≠ b, w(b, a) is the weight of a transposition, w(a, ε) is the weight of deleting opcode a, and w(ε, b) is the weight of inserting opcode b.</p>
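      <p>Let us assume that B<sub>g</sub> and B<sub>h</sub> are two FBs consisting of opcode sequences of lengths n and m, respectively, over a finite alphabet of assembler instructions A = (a<sub>1</sub>, a<sub>2</sub>, ..., a<sub>k</sub>). Block B<sub>g</sub> of program F<sub>P</sub> is denoted B<sub>g</sub><sup>FP</sup>, and block B<sub>h</sub> of program F<sub>S</sub> (after emulation) is denoted B<sub>h</sub><sup>FS</sup>. The Damerau–Levenshtein distance dL(B<sub>g</sub><sup>FP</sup>, B<sub>h</sub><sup>FS</sup>) is then calculated as
        <disp-formula><label>(1)</label><tex-math>dL\left(B_g^{F_P}, B_h^{F_S}\right) = \mathrm{OPT}(n, m), \qquad
\mathrm{OPT}(i, j) =
\begin{cases}
0, &amp; i = 0,\ j = 0,\\
i, &amp; i > 0,\ j = 0,\\
j, &amp; i = 0,\ j > 0,\\
\min\left\{
\begin{array}{l}
\mathrm{OPT}(i-1, j) + w(a, \varepsilon),\\
\mathrm{OPT}(i, j-1) + w(\varepsilon, b),\\
\mathrm{OPT}(i-1, j-1) + w(a, b),\\
\mathrm{OPT}(i-2, j-2) + w(b, a)
\end{array}
\right. &amp; i > 0,\ j > 0,
\end{cases}</tex-math></disp-formula>
        where a is the i-th opcode of B<sub>g</sub><sup>FP</sup> and b is the j-th opcode of B<sub>h</sub><sup>FS</sup>; w(a, b) = 0 when the two opcodes match; and the transposition term applies only when i ≥ 2, j ≥ 2 and the last two opcodes are swapped, i.e. a equals the (j−1)-th opcode of B<sub>h</sub><sup>FS</sup> and b equals the (i−1)-th opcode of B<sub>g</sub><sup>FP</sup>.</p>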
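      <p>The Damerau–Levenshtein recurrence above, with configurable weights, can be sketched as follows. Because the text does not publish the weight values, the sketch assumes a constant weight per operation type (the hypothetical <monospace>Weights</monospace> class) and a weight of zero for matching opcodes; the border cells accumulate deletion and insertion weights, which reduces to the values i and j of the recurrence when all weights equal 1.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Weights:
    """Hypothetical constant weights; the paper does not publish its values."""
    replace: float = 1.0    # w(a, b), a != b
    delete: float = 1.0     # w(a, eps)
    insert: float = 1.0     # w(eps, b)
    transpose: float = 1.0  # w(b, a)


def opt(bg: Sequence[str], bh: Sequence[str], w: Weights = Weights()) -> float:
    """dL(Bg^FP, Bh^FS) = OPT(n, m) for the restricted Damerau-Levenshtein recurrence."""
    n, m = len(bg), len(bh)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # OPT(i, 0): i deletions
        table[i][0] = i * w.delete
    for j in range(1, m + 1):          # OPT(0, j): j insertions
        table[0][j] = j * w.insert
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = bg[i - 1], bh[j - 1]
            best = min(table[i - 1][j] + w.delete,
                       table[i][j - 1] + w.insert,
                       table[i - 1][j - 1] + (0.0 if a == b else w.replace))
            # Adjacent opcodes swapped between the two blocks.
            if i > 1 and j > 1 and a == bh[j - 2] and bg[i - 2] == b:
                best = min(best, table[i - 2][j - 2] + w.transpose)
            table[i][j] = best
    return table[n][m]


fp_block = ["push", "mov", "xor", "call"]         # block of F_P
fs_block = ["push", "xor", "mov", "nop", "call"]  # same block after emulation
print(opt(fp_block, fs_block))
print(opt(fp_block, fs_block, Weights(transpose=0.5)))
```
      </preformat>
      <p>Lowering the transposition weight, as in the second call, makes reordered opcodes count as a smaller difference than replaced ones.</p>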
      <p>After the Damerau–Levenshtein distance has been evaluated for the blocks B<sub>g</sub> and B<sub>h</sub>, the weighted averages of the corresponding feature-vector parameters over all code blocks are formed. To obtain these weighted averages, the weighted arithmetic mean (2) is used:</p>
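      <p>
        <disp-formula><label>(2)</label><tex-math>dL = \frac{\sum_{i=1}^{n} dL_i \cdot f_i}{\sum_{i=1}^{n} f_i},</tex-math></disp-formula>
        where dL<sub>i</sub> is the Damerau–Levenshtein distance for functional block B<sub>i</sub> and f<sub>i</sub> is the number of FBs with the value dL<sub>i</sub>.</p>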
      <p>The Damerau–Levenshtein distance estimates the minimum number of replacement, insertion, deletion and transposition operations required, so it is an integer. Therefore, to find the smallest difference between copies of metamorphic viruses, the obtained mean values are rounded down.</p>
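      <p>The weighted mean (2) followed by rounding down can be sketched in a few lines. The sketch assumes that the per-block distances are available as a list, from which the frequencies f<sub>i</sub> are counted; the function name <monospace>mean_distance</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def mean_distance(block_distances):
    """Weighted arithmetic mean (2) of per-block distances, rounded down."""
    freq = Counter(block_distances)          # dL_i -> f_i
    total = sum(freq.values())
    if total == 0:
        raise ValueError("no functional blocks")
    weighted = sum(value * count for value, count in freq.items())
    return math.floor(weighted / total)      # integer, as the distance itself is


print(mean_distance([0, 2, 2, 3, 5]))  # (0 + 2*2 + 3 + 5) / 5 = 2.4, rounded down to 2
```
      </preformat>
      <p>Grouping equal distances by their frequency gives the same value as the plain arithmetic mean over all blocks; the frequency form simply matches the notation of (2).</p>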
      <p>The remaining features are normalized in the same way.</p>
      <p>Thus, the similarity feature vector for copies of metamorphic viruses based on the Damerau–Levenshtein metric is presented as follows:</p>
      <p>
        <disp-formula><label>(3)</label><tex-math>S = (dL, T, D, I, R, M),</tex-math></disp-formula>
        where dL is the Damerau–Levenshtein distance for a functional block between programs F<sub>P</sub> and F<sub>S</sub>; T is the number of opcode exchange (transposition) operations required to transform a program block of F<sub>P</sub> into the corresponding block of F<sub>S</sub> (so that F<sub>P</sub> = F<sub>S</sub>); D is the number of opcode deletions required; I is the number of opcode insertions required; R is the number of opcode replacements required; and M is the number of matches between the opcodes of the functional blocks of programs F<sub>P</sub> and F<sub>S</sub>.</p>
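      <p>Assembling the vector S = (dL, T, D, I, R, M) from an edit script can be sketched as below. The sketch assumes that the transformation chain of a functional-block pair is available as a list of operation names such as "transpose" or "match", and that dL is the unit-weight count of non-match operations; both the input format and the name <monospace>similarity_vector</monospace> are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from typing import NamedTuple


class Similarity(NamedTuple):
    dL: int  # Damerau-Levenshtein distance (unit weights)
    T: int   # transpositions (opcode exchanges)
    D: int   # deletions
    I: int   # insertions
    R: int   # replacements
    M: int   # matching opcodes


def similarity_vector(ops):
    """Build S from the list of operation names for one functional-block pair."""
    allowed = {"transpose", "delete", "insert", "replace", "match"}
    unknown = set(ops) - allowed
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    c = Counter(ops)
    edits = c["transpose"] + c["delete"] + c["insert"] + c["replace"]
    return Similarity(edits, c["transpose"], c["delete"], c["insert"],
                      c["replace"], c["match"])


chain = ["match", "transpose", "match", "delete", "match"]
print(similarity_vector(chain))
```
      </preformat>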
      <p>
        To decide whether a host is infected by a metamorphic virus, the constructed similarity feature vectors are sent to the server, where a fuzzy inference system analyzes and classifies them [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The input linguistic variables of the fuzzy inference system are the components of the similarity feature vector (3) of metamorphic virus copies. The terms of each linguistic variable are Low, Medium and High.</p>
      <p>Trapezoidal membership functions were chosen for the input variables and triangular ones for the output. For feature dL, the equations can be presented as follows:</p>
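      <p>The specific breakpoints used for dL are not given here. For reference, the standard trapezoidal and triangular membership functions can be sketched as follows; the breakpoints in <monospace>DL_TERMS</monospace> for the terms Low, Medium and High are placeholders, not the values used by the authors.</p>
      <preformat preformat-type="code">
```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if x >= b and c >= x:
        return 1.0
    if a >= x or x >= d:
        return 0.0
    if b > x:
        return (x - a) / (b - a)   # rising edge between a and b
    return (d - x) / (d - c)       # falling edge between c and d


def triangle(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership peaking at b: a trapezoid whose top is one point."""
    return trapezoid(x, a, b, b, c)


# Placeholder breakpoints for the terms of dL (NOT the authors' values).
INF = float("inf")
DL_TERMS = {
    "Low": (0.0, 0.0, 2.0, 4.0),
    "Medium": (2.0, 4.0, 6.0, 8.0),
    "High": (6.0, 8.0, INF, INF),
}

degrees = {term: trapezoid(3.0, *params) for term, params in DL_TERMS.items()}
print(degrees)
```
      </preformat>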
      <p>The fuzzy inference system uses 38 rules to decide which class of metamorphic viruses a copy belongs to. Examples of these rules are:</p>
      <list list-type="simple">
        <list-item><p>if (dL is Low) and (T is Low) and (D is Medium) and (I is High) and (R is Low) and (M is Medium) then class1;</p></list-item>
        <list-item><p>if (dL is Low) and (T is Medium) and (D is Medium) and (I is High) and (R is Low) and (M is Medium) then class1;</p></list-item>
        <list-item><p>if (dL is High) and (T is Low) and (D is High) and (I is High) and (R is Medium) and (M is Low) then class3.</p></list-item>
      </list>
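      <p>Evaluating such rules can be sketched with the minimum operator for "and" and the maximum for combining rules with the same consequent, the usual choices in a Mamdani system like the one described in the experiments. Only the three example rules are encoded, since the other 35 are not listed; the membership degrees in <monospace>mu</monospace> are made-up values, and the names <monospace>RULES</monospace> and <monospace>fire_rules</monospace> are hypothetical.</p>
      <preformat preformat-type="code">
```python
# The three example rules from the text: (antecedent, consequent).
RULES = [
    ({"dL": "Low", "T": "Low", "D": "Medium", "I": "High", "R": "Low", "M": "Medium"}, "class1"),
    ({"dL": "Low", "T": "Medium", "D": "Medium", "I": "High", "R": "Low", "M": "Medium"}, "class1"),
    ({"dL": "High", "T": "Low", "D": "High", "I": "High", "R": "Medium", "M": "Low"}, "class3"),
]


def fire_rules(memberships, rules=RULES):
    """Return the firing strength per output class.

    memberships[var][term] is the degree of input `var` in `term`.
    AND is min; rules with the same consequent are combined with max.
    """
    strength = {}
    for antecedent, consequent in rules:
        degree = min(memberships[var][term] for var, term in antecedent.items())
        strength[consequent] = max(strength.get(consequent, 0.0), degree)
    return strength


# Made-up fuzzified inputs for one similarity vector S.
mu = {
    "dL": {"Low": 0.8, "Medium": 0.2, "High": 0.0},
    "T": {"Low": 0.6, "Medium": 0.4, "High": 0.0},
    "D": {"Low": 0.1, "Medium": 0.7, "High": 0.3},
    "I": {"Low": 0.0, "Medium": 0.1, "High": 0.9},
    "R": {"Low": 0.9, "Medium": 0.1, "High": 0.0},
    "M": {"Low": 0.2, "Medium": 0.5, "High": 0.3},
}
print(fire_rules(mu))
```
      </preformat>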
      <p>The result of the fuzzy inference system is the degree of membership of each virus copy in one of the metamorphic virus classes.</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>
        Several experiments were conducted to determine the efficiency of the proposed method. For this purpose, a set of metamorphic viruses was generated with the metamorphic generators Next Generation Virus Construction Kit, Second Generation Virus Generator and Virus Creation Lab for Win32 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These generators can infect .EXE and .DLL files and apply obfuscation operations to them.
      </p>
      <p>The experiments used a university network with 80 user hosts and one server. Each host uses a modified emulator; the emulator settings are presented in Table 1. The fuzzy inference system was built with the Fuzzy Logic Toolbox in MATLAB and has the following parameters: the Mamdani inference algorithm, aggregation method, accumulation method and defuzzification method. The system has six inputs and one output.</p>
      <p>The fuzzy inference system concludes whether an unknown program belongs to one of the three metamorphic virus classes or is trusted. If the membership degree of the unknown object lies in the range from 0 to 0.25, it is classified as a trusted application; if it lies in the range from 0.26 to 1, the unknown object belongs to one of the metamorphic virus classes. Values from 0.26 to 0.49 correspond to the first class of metamorphic viruses, values from 0.5 to 0.74 to the second class, and values from 0.75 to 1 to the third class.</p>
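      <p>The mapping from membership degree to verdict can be sketched as follows. The stated ranges leave small gaps (for example, between 0.25 and 0.26); the sketch assumes half-open intervals [0, 0.26), [0.26, 0.5), [0.5, 0.75) and [0.75, 1], which is one reasonable way to close them. The function name <monospace>classify</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
def classify(degree: float) -> str:
    """Map a defuzzified membership degree in [0, 1] to a verdict."""
    if degree > 1.0 or 0.0 > degree:
        raise ValueError("membership degree must lie in [0, 1]")
    if degree >= 0.75:
        return "metamorphic class 3"
    if degree >= 0.5:
        return "metamorphic class 2"
    if degree >= 0.26:
        return "metamorphic class 1"
    return "trusted"


for value in (0.1, 0.3, 0.6, 0.9):
    print(value, classify(value))
```
      </preformat>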
      <p>Table 2 and Fig. 2 show the results of the fuzzy inference system for a suspicious file. As a result, 15% of the copies were not changed, 5% were classified as the first class, 11.25% as the third class and 68.75% as the second class.</p>
      <p>The analysis of the subject area revealed the need to improve existing techniques for metamorphic virus detection. This article proposes a new technique for metamorphic virus detection using modified emulators on network hosts. Viruses are assigned to classes of metamorphic viruses using a fuzzy inference system. The proposed technique makes it possible to detect metamorphic viruses that obfuscate their program code, which increases the efficiency of metamorphic virus detection. Experimental studies demonstrated a detection efficiency of about 85%.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Falliere</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Sality: Story of a Peer-to-Peer Viral Network</article-title>
          .
          <source>Technical report, Symantec Labs</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Virus statistics.
          <source>Overview cyber November</source>
          <year>2014</year>
          . online: https://www.esetnod32.ru/company/viruslab/statistics/?id= 896397 [in Russian]
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Attaluri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGhee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Profile Hidden Markov Models and Metamorphic Virus Detection</article-title>
          .
          <source>Journal in Computer Virology</source>
          ,
          <volume>5</volume>
          ,
          <fpage>151</fpage>
          --
          <lpage>169</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Rad</surname>
            ,
            <given-names>B. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masrom</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Metamorphic Virus Variants Classification Using Opcode Frequency Histogram</article-title>
          .
          <source>In: Proc. 14th WSEAS Int Conf on Computers</source>
          , pp.
          <fpage>147</fpage>
          --
          <lpage>155</lpage>
          , WSEAS (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Elhadi</surname>
            ,
            <given-names>A. A. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maarof</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barry</surname>
            ,
            <given-names>B. I. A.</given-names>
          </string-name>
          :
          <article-title>Improving the Detection of Malware Behaviour Using Simplified Data Dependent API Call Graph</article-title>
          .
          <source>Int. J. of Security and Its Applications</source>
          ,
          <volume>7</volume>
          (
          <issue>5</issue>
          ),
          <fpage>29</fpage>
          --
          <lpage>42</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kaushal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swadas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prajapati</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Metamorphic Malware Detection Using Statistical Analysis</article-title>
          .
          <source>Int. J. of Soft Computing and Engineering</source>
          ,
          <volume>2</volume>
          ,
          <fpage>49</fpage>
          --
          <lpage>53</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cesare</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Malware variant detection using similarity search over sets of control flow graphs</article-title>
          .
          <source>In: Proc. 10th Int. Conf. on Trust, Security and Privacy in Computing and Communications</source>
          , Washington, DC, USA, pp.
          <fpage>181</fpage>
          --
          <lpage>189</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pomorova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lysenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kryshchuk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nicheporuk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A technique for detection of bots which are using polymorphic code</article-title>
          .
          <source>In: Proc. 21st Int. Conf, CN</source>
          , Springer, Brunów, Poland, pp.
          <fpage>265</fpage>
          --
          <lpage>276</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Shtovba</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pankevich</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagorna</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Analyzing the criteria for fuzzy classifier learning</article-title>
          .
          <source>Automatic control and computer sciences</source>
          ,
          <volume>49</volume>
          (
          <issue>3</issue>
          ),
          <fpage>123</fpage>
          --
          <lpage>132</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>VX</given-names>
            <surname>Heavens</surname>
          </string-name>
          <article-title>Computer virus collection</article-title>
          . online: http://vx.netlux.org
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>