<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Research on Visual Malicious Code Classification Based on Improved Faster R-CNN</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liang Zhen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuntao Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Shenyang Ligong University, College of Information Science and Engineering</institution>
          ,
          <addr-line>Liaoning Shenyang 110159</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>With the development of Internet technology, the amount of malicious software is increasing and malicious attacks are becoming rampant, so research on malicious software has strong practical value. This paper proposes a visual malicious code classification method based on an improved Faster R-CNN. Rank &amp; Sort Loss is used to optimize the loss function of the Faster R-CNN model in order to reduce the number of hyperparameters, improve the performance of the model, and make training more robust to class imbalance. The experimental results show that the improved Faster R-CNN detection method achieves higher recognition accuracy on malicious code grayscale images than the classic Faster R-CNN detection method.</p>
      </abstract>
      <kwd-group>
        <kwd>Malicious code visualization</kwd>
        <kwd>Rank &amp; Sort loss</kwd>
        <kwd>Faster R-CNN network</kwd>
        <kwd>object detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>According to the monitoring of malicious program detection and dissemination by CNCERT/CC
(National Internet Emergency Center), more than 42.98 million malicious program samples were found
in 2020[1]. Malicious code at this scale causes great harm, so studying it is of great significance.
Many analysis and detection methods for malicious code have already been studied. In 2011,
L. Nataraj et al. proposed a malicious code visualization and classification method that converts the
binary files of malicious code into texture images and classifies malicious software with a KNN model
based on Euclidean distance[2]. Their experiments showed that this method effectively improves the
detection speed of malicious code while maintaining the accuracy of traditional static detection
methods. On this basis, many researchers have begun to train malicious code classifiers with
image-based detection and classification methods. For example, Zhang Jinglian et al. proposed a
malicious code classification technique based on feature fusion, which extracts the opcode
instructions and the grayscale image texture of malicious code, fuses the two kinds of features, and
uses Random Forest (RF) to classify malicious code families[3]. The above methods all visualize
malicious code by converting it into a grayscale image for further classification, and they achieve
good results.</p>
      <p>With the application of Convolutional Neural Networks (CNN) to object detection, researchers
have proposed a series of detection models. For example, Ren et al. proposed the Faster R-CNN
network[5]. For object detection in grayscale images of malicious code, we need to locate the section
(.text) that contains the core feature opcodes in the grayscale image. The loss of the traditional
Faster R-CNN model is composed of two parts: classification loss and regression loss. Balancing these
terms introduces many hyperparameters, and tuning them costs considerable human and material
resources. At the same time, imbalance in the data distribution degrades the detection performance of
the model. To address these problems, this paper optimizes the loss function by introducing
Rank &amp; Sort Loss to achieve better detection results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Research on Malicious Code Visualization</title>
      <p>Malicious code is visualized on the basis of static disassembly. IDA Pro is used for static
disassembly: importing the malicious software into IDA Pro yields a binary file (.bytes file) and an
assembly file (.asm file). The binary file is then used as input data and treated as a raw byte
stream. Each hexadecimal digit corresponds to four binary digits and takes a value between 0 and 15,
so two hexadecimal digits (one 8-bit byte) take a value between 0 and 255, which matches the 256 gray
levels of an 8-bit grayscale pixel. The byte stream is therefore split into 8-bit units, each unit is
used as the gray value of one pixel, and the pixels are arranged in sequence to form the corresponding
grayscale image. Figure 1 shows the schematic diagram of malicious code visualization.</p>
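      <p>The byte-to-pixel mapping described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names, the default image width of 256 pixels, the zero padding of the last row, and the assumed layout of a hex-dump line (a leading address followed by two-digit hexadecimal tokens, with "??" marking an unknown byte) are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from typing import List


def parse_bytes_line(line: str) -> bytes:
    """Parse one hex-dump line, e.g. '00401000 56 8D ?? FF'.

    Assumption: the first token is an address and is skipped;
    '??' (unknown byte) is mapped to 0.
    """
    tokens = line.split()[1:]
    return bytes(0 if tok == "??" else int(tok, 16) for tok in tokens)


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Arrange a byte stream into rows of 8-bit gray values (0..255).

    Each byte (two hexadecimal digits) becomes one pixel. The last row
    is zero-padded to the chosen width (an assumption of this sketch).
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


if __name__ == "__main__":
    stream = parse_bytes_line("00401000 56 8D ?? FF 10")
    for row in bytes_to_grayscale(stream, width=2):
        print(row)
```
]]></preformat>
      <p>The resulting rows can be written to an image file with any imaging library; the choice of width only changes the aspect ratio of the image, not the pixel values.</p>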
    </sec>
    <sec id="sec-4">
      <title>2.2. Faster R-CNN Network Model</title>
      <p>Faster R-CNN is a deep learning network model based on a region proposal network. It adds
object localization to the CNN network model. Compared with the R-CNN and Fast R-CNN detection
networks, the Faster R-CNN network can be trained end to end, so that the CNN that generates the
proposal windows and the CNN that performs object detection share their computation. The structure of
Faster R-CNN mainly includes the backbone feature extraction network, the region proposal network
(RPN), the region of interest (ROI) pooling layer, and the classifier. The Faster R-CNN network first
processes the input image to obtain a shared feature map and then generates proposal boxes. The
proposal boxes are used to crop the shared feature map, the cropped feature regions are resized to
the same size by the ROI pooling layer, and finally classification and regression are performed. The
network structure is shown in Figure 2.</p>
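      <p>To make the ROI pooling step concrete, the following Python sketch pools one proposal box on a single-channel feature map to a fixed output size. It is a simplified illustration rather than the network used in this paper: the function name, the integer box coordinates given in feature-map cells, and the bin boundaries (floor for the start, ceiling for the end, so neighbouring bins may overlap) are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import List, Tuple


def roi_max_pool(
    feature: List[List[float]],
    box: Tuple[int, int, int, int],
    out_size: int = 2,
) -> List[List[float]]:
    """Max-pool the region box = (x0, y0, x1, y1) to out_size x out_size.

    Assumptions: x1 and y1 are exclusive, the box covers at least one
    cell, and the feature map has a single channel.
    """
    x0, y0, x1, y1 = box
    height, width = y1 - y0, x1 - x0
    pooled = []
    for by in range(out_size):
        # Bin rows: floor for the start, ceiling for the end.
        ys = y0 + (by * height) // out_size
        ye = max(ys + 1, y0 + ((by + 1) * height + out_size - 1) // out_size)
        row = []
        for bx in range(out_size):
            xs = x0 + (bx * width) // out_size
            xe = max(xs + 1, x0 + ((bx + 1) * width + out_size - 1) // out_size)
            row.append(
                max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe))
            )
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(roi_max_pool(fmap, (0, 0, 4, 4)))
    print(roi_max_pool(fmap, (1, 1, 4, 3)))
```
]]></preformat>
      <p>Because every proposal is reduced to the same output size, boxes of different shapes can be passed to the same classification and regression layers.</p>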
    </sec>
    <sec id="sec-5">
      <title>2.3. Rank &amp; Sort Loss</title>
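      <p>Rank &amp; Sort (RS) Loss [8] was proposed by K. Oksuz et al. in 2021. It is composed of a rank
loss and a sort loss. The rank loss pushes all positive samples ahead of the negative samples in the
score ranking, and only the negative samples that score higher than a positive sample enter the
calculation. The sort loss uses the IoU as a continuous classification label, so that the positive
prediction boxes are sorted among themselves according to this label. In addition, Rank &amp; Sort
Loss does not require multi-task weights or coefficient tuning.</p>
      <p>The definition of the loss function is shown in formula (1):</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>\mathcal{L}_{\mathrm{RS}} = \frac{1}{|P|} \sum_{i \in P} \left( \ell_{\mathrm{RS}}(i) - \ell_{\mathrm{RS}}^{*}(i) \right)</tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>P</tex-math></inline-formula> is the set of positive samples, <inline-formula><tex-math>\ell_{\mathrm{RS}}(i)</tex-math></inline-formula> is the sum of the current rank error and sort error, and <inline-formula><tex-math>\ell_{\mathrm{RS}}^{*}(i)</tex-math></inline-formula> is the sum of the target rank error and the target sort error. They are given by equations (2) and (3), respectively:</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math>\ell_{\mathrm{RS}}(i) = \frac{\mathrm{rank}^{-}(i)}{\mathrm{rank}(i)} + \frac{\sum_{j \in P} H(x_{ij}) (1 - y_j)}{\mathrm{rank}^{+}(i)}</tex-math>
      </disp-formula>
      <disp-formula>
        <label>(3)</label>
        <tex-math>\ell_{\mathrm{RS}}^{*}(i) = \ell_{\mathrm{R}}^{*}(i) + \frac{\sum_{j \in P} H(x_{ij}) [y_j \geq y_i] (1 - y_j)}{\sum_{j \in P} H(x_{ij}) [y_j \geq y_i]}</tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>j</tex-math></inline-formula> are sample indices, <inline-formula><tex-math>y</tex-math></inline-formula> is the score label, <inline-formula><tex-math>s_i</tex-math></inline-formula> is the classification score, <inline-formula><tex-math>x_{ij} = s_j - s_i</tex-math></inline-formula> is the score difference, and <inline-formula><tex-math>H(\cdot)</tex-math></inline-formula> is the unit step function. <inline-formula><tex-math>\mathrm{rank}(i)</tex-math></inline-formula> is the number of all positive and negative samples whose classification score is greater than or equal to that of positive sample <inline-formula><tex-math>i</tex-math></inline-formula>, <inline-formula><tex-math>\mathrm{rank}^{-}(i)</tex-math></inline-formula> is the number of such negative samples, <inline-formula><tex-math>\mathrm{rank}^{+}(i)</tex-math></inline-formula> is the number of such positive samples, and <inline-formula><tex-math>\ell_{\mathrm{R}}^{*}(i)</tex-math></inline-formula> is the target rank error.</p>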
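      <p>The value of the RS loss defined by formulas (1) to (3) can be worked through with a small Python sketch. It only evaluates the loss with a hard step function and does not reproduce the training procedure of the original RS Loss work. The function name, the convention that a label of 0 marks a negative sample, and the target rank error of 0 (following the formulation of Oksuz et al.) are assumptions of this sketch.</p>
      <preformat><![CDATA[
```python
from typing import List


def rs_loss(scores: List[float], labels: List[float]) -> float:
    """Evaluate the RS loss value for one batch of predictions.

    scores: classification scores s_i.
    labels: IoU label y_i in (0, 1] for positives, 0 for negatives
            (assumed convention).
    """
    pos = [i for i, y in enumerate(labels) if y > 0]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        return 0.0
    total = 0.0
    for i in pos:
        # H(x_ij) = 1 when s_j >= s_i (hard unit step, x_ij = s_j - s_i).
        pos_above = [j for j in pos if scores[j] >= scores[i]]
        neg_above = [j for j in neg if scores[j] >= scores[i]]
        rank_pos = len(pos_above)            # rank+(i), includes i itself
        rank_neg = len(neg_above)            # rank-(i)
        rank = rank_pos + rank_neg           # rank(i)
        # Equation (2): current rank error + current sort error.
        current = rank_neg / rank + sum(1 - labels[j] for j in pos_above) / rank_pos
        # Equation (3): target rank error (assumed 0) + target sort error.
        better = [j for j in pos_above if labels[j] >= labels[i]]
        target = 0.0 + sum(1 - labels[j] for j in better) / len(better)
        total += current - target
    # Equation (1): average over the positive samples.
    return total / len(pos)


if __name__ == "__main__":
    print(rs_loss([0.9, 0.8, 0.1], [0.9, 0.6, 0.0]))  # ranking already ideal
    print(rs_loss([0.2, 0.9], [1.0, 0.0]))            # negative ranked first
    print(rs_loss([0.9, 0.8], [0.5, 1.0]))            # positives mis-sorted
```
]]></preformat>
      <p>The loss is zero when every positive sample outranks every negative sample and the positive samples are ordered by their IoU labels; it grows when a negative sample is ranked above a positive one or when a low-IoU positive outranks a high-IoU positive.</p>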
    </sec>
    <sec id="sec-6">
      <title>3. Task Description</title>
      <p>The training process for the whole network mainly includes the following steps:</p>
      <list list-type="order">
        <list-item>
          <p>Convert the malicious code into a grayscale image dataset and preprocess the dataset. The
disassembly tool IDA Pro is first used to disassemble the program under detection into a binary file,
which is then mapped to the corresponding grayscale image through visualization. The label names are
set to the labels of the 6 types of malicious code images.</p>
        </list-item>
        <list-item>
          <p>Extract features from the input image through the backbone feature extraction network. In
this paper, ResNet50 is used as the backbone. ResNet50 has two basic blocks, Conv Block and Identity
Block. The main difference between the two is that Conv Block performs a convolution on the residual
edge, while Identity Block does not. ResNet50 stacks its residual blocks in four stages containing 3,
4, 6, and 3 blocks, respectively; each stage starts with one Conv Block and the remaining blocks are
Identity Blocks.</p>
        </list-item>
        <list-item>
          <p>Send the features output by the backbone to the region proposal network. One convolution
with 18 output channels (2 scores for each of 9 prior boxes) predicts whether each prior box contains
an object, and another convolution with 36 output channels (4 offsets for each of 9 prior boxes)
adjusts the prior boxes to obtain the proposal boxes. Classification and regression are then carried
out after the ROI pooling layer. The improvement is to replace the cross-entropy loss in the
classification loss with the RS loss and to adopt the GIoU loss as the regression loss. The weight of
the regression loss in the improved model is the RS loss divided by the regression loss.</p>
        </list-item>
      </list>
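      <p>The regression part of the improved loss can be illustrated with a short Python sketch of the GIoU loss and of the weighting rule stated above (RS loss divided by regression loss). This is a sketch under assumptions, not the training code of this paper: boxes are given as (x0, y0, x1, y1) with positive area, the function names are hypothetical, and in a deep learning framework the weight would normally be treated as a constant during backpropagation.</p>
      <preformat><![CDATA[
```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two axis-aligned boxes with positive area."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    iou = inter / union
    # Smallest box that encloses both a and b.
    enclose = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    return iou - (enclose - union) / enclose


def giou_loss(pred: Box, target: Box) -> float:
    """GIoU regression loss: 0 for a perfect match, up to 2 for distant boxes."""
    return 1.0 - giou(pred, target)


def regression_weight(rs_loss_value: float, reg_loss_value: float) -> float:
    """Weight of the regression loss: RS loss divided by regression loss."""
    if reg_loss_value == 0:
        return 0.0  # assumption: nothing to balance when the boxes are perfect
    return rs_loss_value / reg_loss_value


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
    print(regression_weight(0.5, 0.25))
```
]]></preformat>
      <p>With this rule the two loss terms are balanced by their current magnitudes, which is why no manually chosen coefficient is needed for the regression term.</p>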
    </sec>
    <sec id="sec-7">
      <title>4. Experimental Process and Analysis</title>
    </sec>
    <sec id="sec-8">
      <title>4.1. Experimental Data Set</title>
      <p>This experiment selects malicious sample data from the Kaggle platform of the Microsoft Malware
Security Defense Center. There are 1839 samples from 6 malicious code sample families, as shown in</p>
    </sec>
    <sec id="sec-9">
      <title>4.2. Experimental Environment</title>
    </sec>
    <sec id="sec-10">
      <title>4.3. Analysis of Experimental Results</title>
      <p>In the experiment, the data were randomly divided into a training set and a test set at a
ratio of 8:2. The accuracy rate and recall rate are selected as the evaluation criteria. Table 3 shows
the accuracy rate and recall rate of each class for the traditional machine learning classifiers, the
traditional Faster R-CNN network framework, and the improved Faster R-CNN network framework. The
experimental results show that the Faster R-CNN network model outperforms the traditional machine
learning classification models, and that the improved Faster R-CNN network model improves further on
the traditional Faster R-CNN network model. After the introduction of RS Loss, the accuracy of the
model increases by 1.9 percentage points compared with the original model. The complexity of the
model is also measured by floating-point operations (FLOPs) and the number of parameters. The results
show that the number of parameters is reduced by 15.37% compared with the original model, and the
number of FLOPs is reduced by 23.58%. This indicates that the introduction of RS Loss reduces the
number of parameters and the complexity of the model.
Table 3</p>
      <sec id="sec-10-1">
        <title>Malicious sample data</title>
      </sec>
      <sec id="sec-10-2">
        <title>Classification method KNN RF</title>
      </sec>
      <sec id="sec-10-3">
        <title>Faster R-CNN</title>
      </sec>
      <sec id="sec-10-4">
        <title>RS+Faster R-CNN</title>
      </sec>
      <sec id="sec-10-5">
        <title>Accuracy</title>
        <p>0.611
0.889
0.894
0.427
0.836
0.877
0.895</p>
      </sec>
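      <p>The evaluation protocol described in this section (a random 8:2 split and per-class scores) can be sketched in Python as follows. The sketch is illustrative only: the function names and the fixed random seed are assumptions, and the per-class "accuracy rate" is interpreted here as precision, which the text does not state explicitly.</p>
      <preformat><![CDATA[
```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def split_80_20(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Randomly split items into a training set (80%) and a test set (20%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed is an assumption
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def per_class_precision_recall(
    y_true: List[Hashable], y_pred: List[Hashable]
) -> Dict[Hashable, Tuple[float, float]]:
    """Return {class: (precision, recall)} for every class that occurs."""
    result = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[cls] = (precision, recall)
    return result


if __name__ == "__main__":
    train, test = split_80_20(range(1839))
    print(len(train), len(test))
    print(per_class_precision_recall(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```
]]></preformat>
      <p>For imbalanced families, a split that preserves the class proportions in both subsets may be preferable to a purely random one; the text does not say which variant was used.</p>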
    </sec>
    <sec id="sec-11">
      <title>5. Conclusion</title>
      <p>To further improve the detection and classification of malicious code, this paper proposes a
malicious code classification model based on an improved Faster R-CNN. Using the Rank &amp; Sort loss
function effectively reduces the number of hyperparameters, so the hyperparameters do not need to be
adjusted repeatedly during model training. Only the learning rate needs to be tuned to improve model
performance, which avoids a complex parameter adjustment process and the situation in which one loss
term dominates the other. The experimental results show that the model is feasible and effective. In
future work, the detection ability of the model can be further improved by adding an attention
mechanism and data augmentation.</p>
    </sec>
    <sec id="sec-12">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          CNCERT.
          <source>China Internet network security report 2020</source>
          [R/OL]. [2021-07-21]. https://www.cert.org.cn/publish/main/upload/File/2020%20Annual%20Report.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Nataraj</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karthikeyan</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacob</surname>
            <given-names>G</given-names>
          </string-name>
          , et al.
          <source>Malware Images: Visualization and Automatic Classification. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>Zhang Jinglian</string-name>
          ,
          <string-name>Peng Yanbing</string-name>
          .
          <article-title>Research on malicious code classification based on feature fusion[J]</article-title>
          .
          <source>Computer Engineering</source>
          ,
          <year>2019</year>
          ,
          <volume>45</volume>
          (
          <issue>08</issue>
          ):
          <fpage>281</fpage>
          -
          <lpage>286</lpage>
          +295. DOI: 10.19678/j.issn.1000-3428.0051790.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Liu</given-names>
            <surname>Yashu</surname>
          </string-name>
          , Wang Zhihai, Hou Yueran,
          <string-name>
            <given-names>Yan</given-names>
            <surname>Hanbing</surname>
          </string-name>
          .
          <article-title>Visualization and automatic classification of malicious code with enhanced information density[J]</article-title>
          .
          <source>Journal of Tsinghua University (Natural Science Edition)</source>
          ,
          <year>2019</year>
          ,
          <volume>59</volume>
          (
          <issue>01</issue>
          ):
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ren</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            <given-names>R</given-names>
          </string-name>
          , et al.
          <article-title>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          ,
          <year>2017</year>
          ,
          <volume>39</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1137</fpage>
          -
          <lpage>1149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Saxe</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlin</surname>
            <given-names>K</given-names>
          </string-name>
          .
          <article-title>Deep neural network based malware detection using two dimensional binary program features[C]</article-title>
          //
          <source>International Conference on Malicious &amp; Unwanted Software</source>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>Ibrahim Ghafir</string-name>
          ,
          <string-name>Vaclav Prenosil</string-name>
          .
          <article-title>Malicious File Hash Detection and Drive-by Download Attacks[J]</article-title>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Oksuz</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cam</surname>
            <given-names>B C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akbas</surname>
            <given-names>E</given-names>
          </string-name>
          , et al.
          <article-title>Rank &amp; Sort Loss for Object Detection and Instance Segmentation[J]</article-title>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Wang</given-names>
            <surname>Yinglong</surname>
          </string-name>
          , Huang Zuyuan, Liu Ailian,
          <article-title>Lichuan Testing of malicious code detection method based on texture feature[J]</article-title>
          .
          <source>Mobile communication</source>
          ,
          <year>2017</year>
          ,
          <volume>41</volume>
          (
          <issue>13</issue>
          ):
          <fpage>46</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kunwar</surname>
            <given-names>R S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            <given-names>P</given-names>
          </string-name>
          .
          <article-title>Malware Analysis: Tools and Techniques[C]</article-title>
          //
          <source>International Conference on Information &amp; Communication Technology for Competitive Strategies</source>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>Jinpei Yan</string-name>
          ,
          <string-name>Yong Qi</string-name>
          ,
          <string-name>Qifan Rao</string-name>
          .
          <article-title>Detecting Malware with an Ensemble Method Based on Deep Neural Network[J]</article-title>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>