<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, Arlington, VA, USA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Static Malware Detection &amp; Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>William Fleshman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward Raff</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Zak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark McLean</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles Nicholas</string-name>
          <email>nicholas@umbc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Booz Allen Hamilton</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratory for Physical Sciences</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Maryland</institution>,
          <addr-line>Baltimore County</addr-line>,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>As machine-learning (ML) based systems for malware detection become more prevalent, it becomes necessary to quantify the benefits compared to the more traditional anti-virus (AV) systems widely used today. It is not practical to build an agreed upon test set to benchmark malware detection systems on pure classification performance. Instead we tackle the problem by creating a new testing methodology, where we evaluate the change in performance on a set of known benign &amp; malicious files as adversarial modifications are performed. The change in performance combined with the evasion techniques then quantifies a system's robustness against that approach. Through these experiments we are able to show in a quantifiable way how purely ML based systems can be more robust than AV products at detecting malware that attempts evasion through modification, but may be slower to adapt in the face of significantly novel attacks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The threat and impact of malicious software (malware) has
continued to grow every year. The annual financial impact
is already measured in the hundreds of billions of dollars
(Hyman 2013; Anderson et al. 2013). Simultaneously, there
are worries that the classical anti-virus approach may not
continue to scale and fail to recognize new threats (Spafford
2014).</p>
      <p>
        Anti-virus systems were historically built around signature
based approaches. Wressnegger et al. (
        <xref ref-type="bibr" rid="ref10 ref22 ref24 ref25 ref28 ref32 ref43 ref6">2016</xref>
        ) discussed a
number of issues with signatures, but the primary
shortcoming is an intrinsically static nature and inability to generalize.
Most anti-virus companies have likely incorporated machine
learning into their software, but to what extent remains
unclear due to the nature of proprietary information.
Regardless, adversaries can still successfully evade current detection
systems with minimal effort. The success of methods like
obfuscation and polymorphism is evident from their prevalence,
and recent work has suggested that the majority of unique
malicious files are due to the use of polymorphic malware
(Li et al. 2017).
      </p>
      <p>Given these issues and their urgency, machine learning would
appear to be a potential solution to the problem of
detecting new malware. Malware detection can be directly phrased
as a classification problem. Given a binary, we define some
feature set, and learn a binary classifier that outputs either
benign or malicious. The output of this model could be
calibrated to reach any desired false-positive ratio.</p>
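      <p>As an illustrative sketch of such calibration (not a description of any tested product), one could pick the decision threshold from classifier scores on a held-out benign set; the function name and interface below are hypothetical:</p>

```python
def threshold_for_fpr(benign_scores, target_fpr):
    """Illustrative sketch: choose a decision threshold so that at most
    a target_fpr fraction of held-out benign scores would be flagged as
    malicious (score >= threshold). Assumes distinct scores and a
    nonzero false-positive budget."""
    scores = sorted(benign_scores)
    n = len(scores)
    allowed = int(n * target_fpr)        # benign files we may misflag
    k = min(n - 1, max(0, n - allowed))  # index of the chosen threshold
    return scores[k]
```

      <p>A file whose score meets or exceeds the returned threshold would then be labeled malicious, keeping the benign false-positive rate within the target on the held-out set.</p>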
      <p>However, one should never switch to a new technology
for its own sake. It is necessary to have empirical and
quantifiable reasons to adopt a new approach for malware
detection. Ideally, this would be based on the accuracy of
current anti-virus systems compared to their machine-learning
counterparts. In reality, this is non-trivial to estimate, and
many arguments currently exist as to how this should be done
(Jordaney et al. 2016; Deo et al. 2016; Sommer and Paxson
2010). Designing a corpus and test protocol to determine
accuracy at large is hampered by issues like concept drift
(Jordaney et al. 2017), label uncertainty, cost (Miller et al.
2016), and correlations over time (Kantchelian et al. 2013;
Singh, Walenstein, and Lakhotia 2012).</p>
      <p>Toward remedying this issue, we propose a new testing
methodology that can highlight a detector’s strength or
weakness with regard to specific attack types. We first utilize the
framework from Biggio et al., to specify a model for our
adversary (Biggio, Fumera, and Roli 2014). After defining the
adversary’s goal, knowledge, and capabilities, we look at how
difficult it is for an adversary with known malware to evade
malware detection systems under comparison. As examples,
we compare two machine learning approaches, one a simpler
n-gram model and one neural network based, to a quartet of
production AV systems. While we do not exhaustively test
all possible evasion techniques, we show the potential for our
protocol using a handful of relevant tests: non-destructive
automated binary modification, destructive adversarial attacks,
and malicious injection. We are specifically looking only
at black box attacks which can be applied to any potential
malware detector, assuming that the adversary is looking for
maximal dispersion and not a targeted attack. This new
evaluation protocol is our primary contribution, and a black-box
adversarial attack for byte-based malware detectors our
second contribution. This evaluation protocol allows us to make
new conclusions on the potential benefits and weakness of
two byte based machine learning malware detectors.</p>
      <p>We will review previous work in section 2. The
experiments and methodology we use to compare our ML based
systems to anti-virus will be given in section 3 and section 4,
followed by their results in section 5 and conclude in
section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>Malware detection and analysis has been an area of active
research for several years. One class of techniques that is of
particular note is dynamic analysis, where the binary itself
is run to observe its behavior. Intuitively, malicious behavior
is the best indicator of a malicious binary — making
dynamic analysis a popular approach. However, dynamic
analysis has many complications. It requires significant effort and
infrastructure to make it accurately reflect user environments,
which are often not met in practice (Rossow et al. 2012).
Furthermore, the malware author can use a number of techniques
to detect dynamic execution, hide behavior, or otherwise
obfuscate such analysis. This means effective dynamic analysis
systems are often a cat-and-mouse game of technical issues,
and can require running binaries across many layers of
emulation (Egele et al. 2017).</p>
      <p>It is for these reasons that we focus on the static analysis
case. This removes the unlimited degrees of freedom
provided by actually running code. However, this does not mean
the malicious author has no recourse. Below we will review
the related work in this area.</p>
      <p>Questions regarding the security of machine learning based
systems have been an active area of research for over a decade
(Barreno et al. 2010), but have recently received increased
attention. This is in particular due to the success in generating
adversarial inputs to neural networks, which are inputs that
induce an incorrect classification despite only minor changes
to the input (Szegedy et al. 2014). These approaches generally
work by taking the gradient of the network with respect to
the input, and adjusting the input until the network’s output is
altered. This does not directly translate to malware detection
when using the raw bytes, as bytes are discontinuous in nature.
That is to say, any change in a single byte’s value is an equally
“large” change, whereas in images, adjusting a pixel value
can result in visually imperceptible changes. Arbitrary byte
changes may also result in a binary that does not execute,
making it more difficult to apply these attacks in this space
(Grosse et al. 2016; Russu et al. 2016). While such attacks
on individual models are possible (Kolosnjaji et al. 2018;
Kreuk et al. 2018), we are concerned with attacks that can be
applied to any detector, therefore these methods are out of
the scope of this work.</p>
      <p>
        To overcome the issue with arbitrary byte changes breaking
executables, Anderson, Filar, and Roth (
        <xref ref-type="bibr" rid="ref11 ref19 ref20 ref27 ref29 ref3 ref7">2017</xref>
        ) developed
a set of benign actions that can modify a binary without
changing its functionality. Selecting these actions at random
allowed malware to evade an ML model 13% of the time.
Introducing a Reinforcement Learning agent to select the
actions increased this to 16% of the time. Their ML model
used features from the PE header as well as statistics on
strings and bytes from the whole binary. We will replicate
this type of attack as one of the comparisons between our ML
systems and anti-virus.
      </p>
      <p>
        Another approach is to modify the bytes of a binary, and
run the malware detection system after modification. While
this risks breaking the binary, it can still be effective, and it
is important to still quarantine malware even if it does not
execute as intended. Wressnegger et al. (
        <xref ref-type="bibr" rid="ref10 ref22 ref24 ref25 ref28 ref32 ref43 ref6">2016</xref>
        ) modified each
binary one byte at a time, and found that single byte changes
could evade classical AV detectors. They used this approach
to find which bytes were important to the classification, and
reverse-engineered signatures used from multiple AV
products. We extend this technique in subsection 4.2 to create a
new adversarial approach against our models, which finds a
contiguous byte region which can be altered to evade
detection.
      </p>
      <p>
        For the machine learning based malware detection
methods, we look at two approaches: byte n-grams and neural
networks. The byte n-gram approach was one of the first
machine learning methods proposed for malware detection
(Schultz et al. 2001), and has received significant attention
(Kolter and Maloof 2006). We use the same approach for
training such models as presented in Raff et al. (
        <xref ref-type="bibr" rid="ref10 ref22 ref24 ref25 ref28 ref32 ref43 ref6">2016</xref>
        ). For
the neural network approach we use the recently proposed
MalConv, which processes an entire binary by using an
embedding layer to map bytes to feature vectors followed by
a convolutional network (Raff et al. 2017). Because the
embedding layer used by MalConv is non-differentiable with
respect to the input bytes, the aforementioned adversarial
approaches applied to image classification tasks are not as
easily applicable. Similarly, anti-virus products do not
provide a derivative for such attacks. This is why we develop a
new black-box adversarial approach in subsection 4.2.
      </p>
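      <p>To make the byte n-gram representation concrete: the model described above selects the top one million n-grams found in training data. As a rough, hypothetical stand-in (hashing instead of top-k selection; the choice of n here is only illustrative), featurization can be sketched as:</p>

```python
import zlib
from collections import Counter

def hashed_byte_ngrams(data, n=6, num_features=1_000_000):
    """Rough stand-in for byte n-gram featurization: slide a length-n
    window over the raw bytes and hash each window into a fixed-size
    index space, counting occurrences. (The paper selects the most
    frequent n-grams from training data rather than hashing.)"""
    counts = Counter()
    for i in range(len(data) - n + 1):
        counts[zlib.crc32(data[i:i + n]) % num_features] += 1
    return counts
```

      <p>A linear classifier trained over such sparse counts requires no parsing of the PE format, which is what we mean by applying these models with no domain knowledge.</p>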
      <p>Using these modeling methods is beneficial for our
analysis because they are trained with minimal assumptions about
the underlying data. Other approaches to malware detection
that process the PE-header (Shafiq et al. 2009) or use
disassembly (Moskovitch et al. 2008) exist. We avoid these under
the belief that their use of explicit feature sets aids in evading
them (e.g., modifying the PE-Header is easy and allows for
direct evasion of models using those features).</p>
      <p>We leverage the existing framework from (Biggio, Fumera,
and Roli 2014) to model our adversary for each comparison
made. This framework prescribes defining the adversary’s
goals, knowledge, and capabilities with regard to the classifiers.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Adversarial Model</title>
      <p>For each experiment we specify the model of our adversary
by outlining the goals, knowledge, and capabilities available
(Biggio, Fumera, and Roli 2014). These make clear the
assumptions and use cases presented by each scenario.</p>
      <p>The goal of our adversary is the same across all
experiments – using a single attack to maximize the
misclassification rate of malicious files across many classifiers. The one
caveat is that in subsection 4.2 we only go as far as
identifying a small portion of the file which can be altered to evade
detection. A real attacker would then have to alter those bytes
while retaining the malicious functionality of the file. It is
unlikely that authors of benign software would seek a malicious
classification – therefore we limit ourselves to experiments
on malware only.</p>
      <p>We severely constrain the knowledge of our adversary. In
all cases we assume the adversary has no knowledge of the
classifiers’ algorithms, parameters, training data, features, or
decision boundaries. This is the best case scenario for security
applications, as knowledge of any of the previous attributes
would increase the sophistication of possible attacks.</p>
      <p>Similarly, the experiments outlined in subsection 4.1 and
subsection 4.3 require no prior interaction with the
classifiers before deploying a malicious file to the wild. In
subsection 4.2, the adversary has the capability of querying our
n-gram model only. The inner workings of the model remain
hidden, but the output of the model is available for deductive
reasoning.</p>
      <p>These assumptions are restrictive to the adversary’s actions
and knowledge, but are important because current adversaries
are able to succeed under such restrictions today. This means
malware authors can obtain success with minimal effort or
expense. Through our evaluation procedure we are able to
better understand which techniques / products are most robust
to this restricted scenario, and thus will increase the effort
that must be expended by malware authors.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments and Methodology</title>
      <p>We now perform experiments under our proposed testing
methodology to compare machine learning models to
commercial anti-virus systems. To do so, we compare two
machine learning classifiers and four commercial anti-virus
products: AV1, AV2, AV3, and AV4. We use these anonymous
names to be compliant with their End User License
Agreements (EULA), which bars us from naming the products in
any comparative publication. We are prohibited from using
internet-based mass scanners, such as VirusTotal, for similar
reasons. Ideally, we could test many products, but with our
large corpus of files we limit ourselves to these four – which
we believe to be representative of the spectrum of
commercially used AV platforms. We purposely did not choose any
security products which advertise themselves as being
primarily powered by machine learning or artificial intelligence.
Additionally, any cloud based features which could upload
our files were disabled to protect the proprietary nature of our
dataset. Our machine learning based approaches will be built
upon n-grams and neural networks which process a file’s raw
bytes. We will compare these systems on a number of tasks
to evaluate their robustness to evasion.</p>
      <p>We now describe our new testing protocol. Given a
corpus {x_1, …, x_n} of n files with known labels y_i ∈
{Benign, Malicious}, and a detector C(·), we first measure
the accuracy of C(x_i) for all i. This gives us our baseline
performance. We then use a suite of evasion techniques
Ψ_1(·), …, Ψ_K(·) and look at the difference in accuracy
between C(x_i) = y_i and C(Ψ(x_i)) = y_i. The difference in
these scores tells us the ability of the detector C(·) to
generalize past its known data and catch evasive modifications. Any
approach for which C(x_i) ≠ C(Ψ_j(x_i)) for many different
evasion methods Ψ_j that preserve the label y_i is an
intrinsically brittle detector, even if it obtains perfect accuracy
before Ψ is used.</p>
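      <p>The protocol can be expressed compactly in code; the detector and evasion callables below are placeholders for any system and attack under test:</p>

```python
def robustness_report(detector, files, labels, evasions):
    """Sketch of the evaluation protocol: measure baseline accuracy,
    then the drop in accuracy under each evasion technique. `evasions`
    maps a technique name to a label-preserving transform."""
    n = len(files)
    baseline = sum(detector(x) == y for x, y in zip(files, labels)) / n
    report = {"baseline": baseline}
    for name, evade in evasions.items():
        acc = sum(detector(evade(x)) == y for x, y in zip(files, labels)) / n
        report[name] = baseline - acc  # a large drop marks a brittle detector
    return report
```

      <p>Note that the quantity of interest is the per-technique drop, not the baseline itself, which is why systems trained on very different corpora can still be compared.</p>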
      <p>The raw detection accuracy is not the point of this test.
Each system will have been built using different and distinct
corpora of benign and malicious files. For the AV products,
these are used to ensure that a certain target level of false
positives and false negatives are met. In addition, the training
corpus we use for our models is orders of magnitude smaller
than what the AV companies will have access to. For this
reason we would not necessarily expect our approaches to
have better accuracy. The goal is to quantify the robustness
of any classifier to a specific threat model and attack.</p>
      <p>One novel and three pre-existing evasion techniques will
be used to perform this evaluation. These are not exhaustive,
but show how our approach can be used to quantify relative
performance differences. Below we will review their
methodology, and the information that we can glean from their
results. In section 5 we will review the results of running these
experiments.</p>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>We take only a brief moment to expound upon the machine
learning models and data used for training, as the information
disparity with other AV products makes direct comparison
difficult. We trained the machine learning models on a large
dataset from industry discussed in (Raff and Nicholas 2017)
which contains approximately 2 million samples balanced
almost evenly between benign and malicious. Our test set
consists of approximately 80,000 files held out for our
post-training experimentation. The test set is also almost perfectly
balanced. This set was not seen by our machine learning
models during training, but could have files previously seen by
the anti-virus corporations. The n-gram model follows the
procedure used in (Raff et al. 2016), but uses one million
features instead of 100,000. The MalConv approach is specified
in (Raff et al. 2017). Both of these approaches require no
domain knowledge to apply.
4.1</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.1 Randomly Choosing Benign Modifications</title>
        <p>
          In this experiment, we use a subset1 of the modifications used
by Anderson, Filar, and Roth (
          <xref ref-type="bibr" rid="ref11 ref19 ref20 ref27 ref29 ref3 ref7">2017</xref>
          ) to alter malicious files
before scanning them with the classifiers. The objective is to
test the brittleness of the classifiers’ decision mechanisms by
making small changes that do not alter the functionality of
the malware. This experiment is ideal because it produces a
new file that is still functional, and should have no impact on
an ideal malware detector.
        </p>
        <p>There are nine different modification actions, which, grouped
by type, are:
• rename or create new sections
• append bytes to the end of a section or the file
• add an unused function to the import table
• create a new entry point (which jumps to the old entry)
• modify the header checksum, the signature, or debug info</p>
        <p>Each malicious file was scanned before the experiment
to see if it was already being misclassified as benign (a false
negative). If it was correctly classified as malicious, then a
modification was randomly selected from the set and applied.
This process was repeated up to ten times or until the file
evaded the classifier. Approaches that rely too heavily on
exact signatures or patterns should suffer in this test, and we
would expect any approach that uses purely dynamic features
to be unaffected by the modifications.</p>
        <p>This experiment is time intensive, so we use a subset of
1,000 files randomly chosen from our test set. Anderson, Filar,
and Roth limited themselves to a small 200-file sample, so our
larger test set may provide additional confidence in the results.
The original paper proposed using a reinforcement learning
algorithm (RLA) to select the modifications in an adversarial
manner. We found that including an RLA did not change the
results, but did limit our ability to test with all products in a
timely manner.</p>
        <p><sup>1</sup>A UPX (un)packing option did not operate correctly.</p>
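        <p>The random-modification loop described above can be sketched in Python; this is an illustrative stand-in rather than the authors' tooling, and the detect callable and the actions list of functionality-preserving transforms are hypothetical:</p>

```python
import random

def evade_by_modification(detect, binary, actions, max_tries=10):
    """Sketch of the random-modification attack: apply up to max_tries
    randomly chosen benign modifications, stopping early if the
    detector's verdict flips to benign."""
    if not detect(binary):
        return binary, 0              # already a false negative
    for attempt in range(1, max_tries + 1):
        binary = random.choice(actions)(binary)
        if not detect(binary):
            return binary, attempt    # evasion succeeded
    return binary, None               # still detected after the budget
```
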
      </sec>
      <sec id="sec-4-3">
        <title>4.2 Targeted Occlusion of Important Bytes</title>
        <p>The first approach described requires extensive domain
knowledge based tooling, and would be difficult to adapt to
new file types. We introduce a novel approach to find
important byte regions of a malicious file given a working detector.
Identifying the most important region of a file gives feedback
to adversaries, and allows us to occlude that region as an
evasion technique.</p>
        <p>Our approach works by occluding certain bytes of a file,
and then running the malware detector on the occluded file.
By finding a contiguous region of bytes that reduced the
detector’s confidence that a file is malicious, we can infer
that the bytes in that region are important to the detector. If
occluding a small region causes the detector to change its vote
from malicious to benign, then we can infer that the detector
is too fragile. In this case it is plausible for a malware author
to determine what is stored in the small region detected, and
then modify the binary to evade detection.</p>
        <p>Manually editing groups of bytes one at a time would
be computationally intractable. Given a file F of |F| bytes
and a desired contiguous region size of t, it would take
O(|F|) calls to the detector to find the most important region.
We instead develop a binary-search style procedure, which
is outlined in Algorithm 1. Here C(·) returns the malware
detector’s confidence that a given file is malicious, and D is a
source of bytes to use for the occlusion of the original bytes.
This approach allows us to find an approximate region of
importance in only O(log |F|) calls to the detector C(·). In
the event that C(F_l) = C(F_r), ties can be broken arbitrarily
– which implicitly covers the boolean decisions from an
anti-virus product.</p>
        <p>This method starts with a window size equal to half of the
file size. Both halves are occluded and the file is analyzed
by the classifier. The half that results in the largest drop in
classification confidence is chosen to be split for the next time
step. This binary search through the file is continued until the
window size is at least as small as the target window size.</p>
        <p>The last component of our approach is to specify what
method D should be used to replace the bytes of the original
file. One approach, which we call Random Occlusion, is
to simply select each replacement byte at random. This is
similar to prior work in finding the important regions of an
image according to an object detector, where it was found
that randomly blocking out the pixels was effective — even if
the replacement values were nonsensical (Zeiler and Fergus
2014). We also look at Adversarial Occlusion, where we use
a contiguous region selected randomly from one of our own
benign training files to occlude the bytes in the file under
test. Since we use this approach only with malicious files,
Adversarial Occlusion is an especially challenging test for
our machine learning based methods: they have seen these
byte regions before with an explicit label of benign. This
test also helps to validate that all approaches aren’t simply
detecting high-entropy regions that “look” packed, and then
defaulting to a decision of malicious.</p>
        <sec id="sec-4-3-1">
          <title>Algorithm 1 Occlusion Binary Search</title>
          <p>Require: A file F of length |F|, a classifier C(·), target
occlusion size t, byte replacement distribution D
1: split ← |F|/2, size ← |F|/2
2: start ← 0, end ← |F|
3: while size &gt; t do
4:    F_l ← F, F_r ← F
5:    F_l[split−size : split] ← contiguous sample from D
6:    F_r[split : split+size] ← contiguous sample from D
7:    if C(F_l) &lt; C(F_r) then
8:       split ← split − size/2
9:       start ← split − size, end ← split
10:   else
11:      split ← split + size/2
12:      start ← split, end ← split + size
13:   size ← size/2
14: return start, end</p>
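          <p>One way to realize this search in code is sketched below. This is an interpretation of Algorithm 1, not the authors' implementation; the confidence callable and the sample byte source are placeholders, and the final window length falls in (target/2, target]:</p>

```python
def occlusion_search(confidence, data, target, sample):
    """Sketch of the binary-search occlusion: occlude each half of the
    current window with bytes drawn from `sample`, then descend into
    the half whose occlusion lowers the detector's confidence more."""
    start, end = 0, len(data)
    while end - start > target:
        mid = (start + end) // 2
        left, right = bytearray(data), bytearray(data)
        left[start:mid] = sample(mid - start)    # occlude left half
        right[mid:end] = sample(end - mid)       # occlude right half
        if confidence(bytes(left)) < confidence(bytes(right)):
            end = mid    # the left half held the important bytes
        else:
            start = mid
    return start, end
```

          <p>Each halving costs two detector calls, giving the O(log |F|) total stated above.</p>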
          <p>To show that our search procedure provides meaningful
improvements for the Random and Adversarial Occlusions,
we will also compare against an Undirected Occlusion
approach that eschews our search procedure. In this case we
randomly select a region of the binary to occlude with
highentropy random bytes.</p>
          <p>Recall that this method identifies the critical region – for a
successful attack the adversary would then have to manually
alter the portion of the file at that location while maintaining
their intended outcome. For this reason, we use a single
target window size of 2048 bytes. The resulting occluded area
will fall in the range of (1024-2048] bytes. On average, this
restricts the occluded section to under 0.5% of the file size for
our testing set. We also limit ourselves to searching for critical
regions with our n-gram model. It would be infeasible for an
adversary to conduct this search across all possible classifiers,
and we would expect there to be some overlap among which
regions are critical. This has been shown to be true in gradient
based classifiers, where an adversarial example generated by
one model can fool many others (Papernot, McDaniel, and
Goodfellow 2016).</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.3 ROP Injection</title>
        <p>The last method we will consider is of a different variety.
Instead of modifying known malicious files in an attempt to
evade detection, we will inject malicious functionality into
otherwise benign applications.</p>
        <p>
          Many such techniques for this exist. We used the Return
Oriented Programming (ROP) Injector (Poulios, Ntantogian,
and Xenakis 2015) for our work. The ROP Injector converts
malicious shellcode into a series of special control flow
instructions that are already found inside the file. These
instructions are inherently benign, as they already exist in the benign
file. The technique patches the binary in order to modify the
memory stack so that the targeted instructions, which are
noncontiguous in the file, are executed in an order equivalent to
the original shellcode. The result is that the functionality of
the file is maintained, with the added malicious activity
executing right before the process exits. We use the same reverse
Meterpreter shellcode as Poulios, Ntantogian, and Xenakis
(
          <xref ref-type="bibr" rid="ref26">2015</xref>
          ) for our experiment.
        </p>
        <p>We note that not all benign executables in our testing set
were injectable. The files must either have enough
instructions to represent the shellcode already, or have room for
further instructions to be added in a non-contiguous
manner so as to prevent raising suspicion. Additionally, if the
malware requires external functionality such as networking
capabilities, and these are not inherent to the original
executable, then portions of the PE header such as the import
table must be adjusted. For these reasons, only 13,560 files
were successfully infected.</p>
        <p>Poulios, Ntantogian, and Xenakis claim they were able to
evade anti-virus over 99% of the time using this technique.
This approach should be nearly undetectable with static
analysis alone, as all instructions in the file remain benign in
nature. This makes it an extremely difficult test for both the
machine learning and anti-virus detectors.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>Before delving into results, we make explicit that we
considered Malware the positive class and Benign the negative
class. We remind the reader that not all tested systems are on
equal footing. Our n-gram and MalConv models are trained
on the same corpus, but we have no knowledge of the
corpora used by the AV companies and their products. In all
likelihood, they will be using datasets orders of magnitude
larger than our own to ensure the quality of their products and
to reach a desired false-positive rate. From this perspective
alone, we would not necessarily expect the machine learning
based methods to have better accuracies, as there is a
disparity of information at training time. Our methods are also not
optimized for the same low false-positive goal.</p>
      <p>The accuracies and metrics of each method are presented
in Table 1. Because of the mentioned information disparity
between each system, we do not present this information
for direct comparison. Our goal is to use these metrics as
a base-line measure of performance for each method, and
look at how these baselines are affected by various evasion
techniques.</p>
      <sec id="sec-5-1">
        <title>5.1 Benign Modifications</title>
        <p>The results of the experiment described in subsection 4.1 are
shown in Figure 1. All four anti-virus products were
significantly degraded by making seemingly insignificant changes
to the malware files. The machine learning models were
immune to this evasion technique. Both models’ confidence that
these files were malware changed by less than 1% on average.
The few files that did evade were very close to the models’
decision boundaries before the modifications were performed.</p>
        <sec id="sec-5-1-1">
          <title>Classifier</title>
        </sec>
        <sec id="sec-5-1-2">
          <title>N-Gram MalConv AV1 AV2</title>
          <p>AV3
AV4
1;000
s
e
l
i
F 400
800
600
200</p>
          <p>0
g
in 100
d
a
v
E 80
s
e
iF 60
l
f
eo 40
g
a
tn 20
e
c
r
eP 0
ram</p>
          <p>v
on
alC
M</p>
          <p>The anti-virus products did not perform as well in this test.
AV3 and AV4 were the most robust to these modifications,
with 130 and 215 files evading each respectively. While this
is better than products like AV1, where 783 out of 1000
malicious files where able to evade, it is still an order of
magnitude worse than the 3 and 20 files that could evade the
n-gram and MalConv models, respectively.</p>
          <p>Given these results, one might wonder — could the ML
based models be evaded by this approach if more
modifications were performed? In particular, a number of the
modifications have multiple results for a single action. So re-applying
them is not unreasonable. To answer this question, we look at
the evasion rate as a function of the number of modifications
performed in Figure 2. In this plot we can see that all four
AV products have an upward trending slope as the number
of modifications is increased. In contrast, the n-gram model's
evasion rate is unchanged after 3 or more modifications, and
MalConv's after 4 or more. This would appear to indicate that the
ML approaches are truly immune to this evasion technique,
whereas the AVs are not.</p>
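<p>The measurement behind Figure 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: <monospace>scan</monospace> is an assumed stand-in for a detector's verdict function, and <monospace>modify</monospace> for one randomly chosen benign modification.</p>

```python
import random

def evasion_rate_by_modifications(files, scan, modify, max_mods=10, seed=0):
    """For each cumulative modification count k, report the fraction of
    malware files the detector no longer flags after k modifications."""
    rng = random.Random(seed)
    rates = []
    for k in range(1, max_mods + 1):
        evaded = 0
        for original in files:
            mutated = original
            for _ in range(k):
                mutated = modify(mutated, rng)
            if not scan(mutated):  # detector misses the modified file
                evaded += 1
        rates.append(evaded / len(files))
    return rates
```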
          <p>These results also speak to the use of dynamic evaluation,
or lack thereof, in the AV products at scan time. Any dynamic
analysis based approach should be intrinsically immune to
the modifications performed in this test. Because we see all
AV products fail to detect 13-78% of modified malware, we
can postulate that if any dynamic analysis is being done its
application is ineffectual.</p>
          <p>
            We make note that in the original work of Anderson,
Filar, and Roth (
            <xref ref-type="bibr" rid="ref11 ref19 ref20 ref27 ref29 ref3 ref7">2017</xref>
            ), they applied this attack to a simple
machine learning based model that used only features from
the PE header. This attack was able to evade the PE-feature
based model, but not the byte based models tested under our
framework.
          </p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Targeted Occlusion</title>
        <p>The experiment detailed in subsection 4.2 attempts to modify
the most important region for detecting the maliciousness of
a binary as deduced from querying only the n-gram model
and attempting to transfer that knowledge to all others.</p>
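<p>One way to locate such a region, sketched here under the assumption of a <monospace>score</monospace> function returning the model's malicious probability; the window size, stride, and filler byte are illustrative rather than the paper's actual parameters:</p>

```python
def most_important_region(data, score, window=2048, stride=512, filler=0):
    """Slide an occlusion window over the binary and return the offset
    whose occlusion causes the largest drop in the malicious score."""
    base = score(data)
    best_offset, best_drop = 0, float("-inf")
    for offset in range(0, max(1, len(data) - window + 1), stride):
        length = min(window, len(data) - offset)
        occluded = (data[:offset]
                    + bytes([filler]) * length
                    + data[offset + length:])
        drop = base - score(occluded)
        if drop > best_drop:
            best_offset, best_drop = offset, drop
    return best_offset, best_drop
```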
        <p>The results for the byte occlusion tests are given in
Figure 3, where the blue bar shows the accuracy on 40,000
malicious files when no bytes are occluded. For both the n-gram
and MalConv models, we see that the Random, Undirected,
and Adversarial occlusion attacks have almost no impact on
classification accuracy. In the case of the n-gram model, only
0.13% of the files were able to evade its detection after
occlusion. Again evasions were highly correlated with files
close to the decision boundary.</p>
        <p>In particular, we remind the reader that the Adversarial
occlusion is replacing up to 2KB of the malicious file with
bytes taken from benign training data. This is designed to
maximally impact the ML approaches, as the model was
trained with the given bytes as having an explicit label of
benign. Yet, the Adversarial choice has almost no impact on
results. This is a strong indicator of the potential robustness of
the ML approach, and that simply adding bytes from benign
programs is not sufficient to evade their detection.</p>
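<p>Such an adversarial occlusion can be sketched as below; the 2KB window mirrors the description above, while the occlusion offset is assumed to come from the targeted search and the benign byte pool from concatenated benign training files.</p>

```python
import random

def adversarial_occlude(data, offset, benign_pool, window=2048, seed=0):
    """Overwrite a window of the file with a random slice of bytes
    drawn from benign training data."""
    rng = random.Random(seed)
    length = min(window, len(data) - offset)
    start = rng.randrange(len(benign_pool) - length + 1)
    patch = benign_pool[start:start + length]
    return data[:offset] + patch + data[offset + length:]
```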
        <p>Considering the AV products we again see a different story.
AV1 had the highest detection rate when no occlusion
occurred, but also had the largest drop in detection rate for
all three types of occlusion. AV4 had the best performance.</p>
        <sec id="sec-5-2-2">
          <title>Occlusion and Injection Results</title>
          <p>[Figure 3: Detection accuracy of the n-gram, MalConv, and AV classifiers under Baseline, Random, Undirected, and Adversarial occlusion.]</p>
          <p>From the AV results it is also clear that the Targeted
Random occlusion is an improvement over the Undirected
occlusion of random bytes, showing that Algorithm 1 is effective
at identifying important regions. This was not discernible
from the n-gram and MalConv models, which are relatively
immune to this scenario.</p>
          <p>Although on average the occlusion was limited to 0.5%
of a file’s bytes, we acknowledge that one could argue this
transformation may have altered the true label of the file. The
search could potentially be removing the malicious payload
of the application, rendering it inert (if it still runs at all). We accept
this possibility under the assumption that the adversary now
has enough information to alter this small portion of the file
in order to break possible signatures.</p>
          <p>The results from the machine learning models would
suggest that maliciousness is not contained to a small contiguous
section of most files. This makes intuitive sense, as malicious
functionality being employed in executable sections of a file
might also require functions to be referenced in the import
table. The non-contiguous nature of the PE file format allows
for many similar circumstances, and so we believe it unlikely
that all of a file’s maliciousness would be contained within a
small contiguous region of a binary. In addition, this results
in binaries that are analogous to deployed malware with bugs.
Just because malware may not function properly, and thus not
negatively impact users, doesn’t remove its malicious intent
and the benefit in detecting it.</p>
          <p>In Table 2 we show the results of running the ROPInjector on
the benign test files. The results are shown only for the 13,560
files that the ROPInjector was able to infect. Before infection,
the Pre-ROP Accuracy column indicates the percentage of
files correctly labeled benign. After infection, the Post-ROP
Accuracy column indicates what percentage were correctly
labeled as malicious. The Post-ROP Lift then shows how
many percentage points of Post-ROP Accuracy came from
the classifier correctly determining that a formerly benign file
is now malicious, rather than simply being a false-positive.
That is to say, if a model incorrectly called a benign file
malicious, it’s not impressive when it calls an infected version
of the file malicious as well.</p>
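<p>The lift computation described above can be made concrete; this is a sketch, with verdicts represented as booleans where True means the detector flagged the file as malicious.</p>

```python
def post_rop_lift(pre_verdicts, post_verdicts):
    """Split Post-ROP accuracy into lift (files correctly flagged only
    after infection) versus carry-over false positives (files already
    flagged while still benign).  Returns percentages."""
    n = len(pre_verdicts)
    post_acc = 100.0 * sum(post_verdicts) / n
    # Files the detector already called malicious when they were benign:
    carry_over = 100.0 * sum(p and q for p, q in zip(pre_verdicts, post_verdicts)) / n
    return post_acc, post_acc - carry_over
```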
          <p>Most malware detectors had significant difficulty with this
task. The machine learning based models, n-gram and
MalConv, showed only modest improvements of 0.4 and 1.2
percentage points. AV1 had a similarly small 0.6 improvement,
but surprisingly AV2 and AV3 had negative improvements.
That means for 1.4% of the benign files, AV3 called the
original benign file malicious. But when that same file was
infected by the ROPInjector, AV3 changed its decision to
benign. This is a surprising failure case, and may be caused
by the AV system relying too heavily on a signature based
approach (i.e., a signature had a false-positive on the benign
file, but the infection process broke the signature — turning
a would-be true positive into a false-negative).</p>
          <p>Overall these models showed no major impact from the
ROPInjector. Only AV4 was able to significantly adjust its
decision for 22.1% of the infected files to correctly label them
as malicious. Though the evasion rate for AV4 remains high
at 77%, it performed best in this scenario. Given the muted
and negative performance of the other AV systems in this test,
we suspect AV4 has developed techniques specifically for the
ROPInjector approach.</p>
          <p>We also notice that the benign files that were injectable
include 67% of our original false positives for the n-gram
model. We suspect that this is due to those files already
containing the networking functionality required by the
shellcode. The majority of malware requires communication over
the Internet, therefore our machine learning detectors may
view networking related features as somewhat malicious in
nature.</p>
          <p>Overall the ROPInjection of malicious code into
otherwise benign applications presents an apparent weakness for
our machine learning approaches. An important question
is whether the ROPInjection is functionally invisible to the
machine learning based models (i.e., it could never detect
features of the injection), or is simply not sufficiently
represented in our training data for the model to learn.</p>
          <p>This question was tested by applying the ROPInjector to all
of our training data, both benign and malicious files, which
resulted in 235,865 total ROPInjected binaries. We then trained
an n-gram model that tried to distinguish between
ROPInjected vs Not-ROPInjected binaries. The n-gram model was
able to obtain 97.6% accuracy at this task with an AUC of
99.6%. This indicates that the models could learn to detect
ROPInjection, but that it is not sufficiently prevalent in our
training corpus for the models to have learned.</p>
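<p>A sketch of the byte n-gram featurization such a model rests on; the choice of n here is illustrative, and feature selection plus the classifier itself are omitted.</p>

```python
from collections import Counter

def byte_ngrams(data, n=6):
    """Count the overlapping n-byte substrings of a binary.  In practice
    the counts are typically binarized and restricted to the most
    informative n-grams before training a classifier."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))
```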
          <p>Overall this case highlights a potential advantage of the
classical AV systems with extensive domain knowledge.
When sufficiently novel attacks are developed for which
current models have almost no predictive power, it may be easier
to develop and deploy new signatures to catch the attacks —
while the machine learning approaches may require waiting
for data, or creating synthetic data (which has its own risks)
to adjust the model.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We have demonstrated a new testing methodology for
comparing the robustness of machine learning classifiers to current
anti-virus software. Furthermore, we have provided evidence
that machine learning approaches may be more successful at
catching malware that has been manipulated in an attempt to
evade detection. Anti-virus products do a good job of
catching known and static malicious files, but their rigid decision
boundaries prevent them from generalizing to new threats or
catching evolutionary malware. We demonstrated that top-tier
anti-virus products can be fooled by simple modifications
including changing a few bytes or importing random functions.
The machine learning models appear to better generalize
maliciousness — leading to an increased robustness to evasion
techniques compared to their anti-virus counterparts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [2013]
          <string-name><surname>Anderson</surname>, <given-names>R.</given-names></string-name>;
          <string-name><surname>Barton</surname>, <given-names>C.</given-names></string-name>;
          <string-name><surname>Böhme</surname>, <given-names>R.</given-names></string-name>;
          <string-name><surname>Clayton</surname>, <given-names>R.</given-names></string-name>;
          <string-name><surname>van Eeten</surname>, <given-names>M. J. G.</given-names></string-name>;
          <string-name><surname>Levi</surname>, <given-names>M.</given-names></string-name>;
          <string-name><surname>Moore</surname>, <given-names>T.</given-names></string-name>; and
          <string-name><surname>Savage</surname>, <given-names>S.</given-names></string-name>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2013.
          <article-title>Measuring the Cost of Cybercrime</article-title>
          . Berlin, Heidelberg: Springer Berlin Heidelberg. 265-
          <fpage>300</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2017]
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>H. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Filar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and Roth,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Evading Machine Learning Malware Detection</article-title>
          . Black Hat USA.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [2010]
          <string-name>
            <surname>Barreno</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nelson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tygar</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>The security of machine learning</article-title>
          .
          <source>Machine Learning</source>
          <volume>81</volume>
          (
          <issue>2</issue>
          ):
          <fpage>121</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [2014]
          <string-name>
            <surname>Biggio</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fumera</surname>
          </string-name>
          , G.; and
          <string-name>
            <surname>Roli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Security evaluation of pattern classifiers under attack</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>26</volume>
          (
          <issue>4</issue>
          ):
          <fpage>984</fpage>
          -
          <lpage>996</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Deo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dash</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Suarez-Tangil</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vovk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Cavallaro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Prescience: Probabilistic Guidance on the Retraining Conundrum for Malware Detection</article-title>
          .
          <source>In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security</source>
          , AISec '
          <volume>16</volume>
          ,
          <fpage>71</fpage>
          -
          <lpage>82</lpage>
          . New York, NY, USA: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [2017]
          <string-name>
            <surname>Egele</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Scholte</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kirda</surname>
          </string-name>
          , E.; and
          <string-name>
            <surname>Barbara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A Survey On Automated Dynamic Malware Analysis Evasion and Counter-Evasion</article-title>
          .
          <source>Proceedings of Reversing and Offensive-oriented Trends Symposium</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Grosse</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Manoharan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Backes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>McDaniel</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Adversarial perturbations against deep neural networks for malware classification</article-title>
          .
          <source>CoRR abs/1606</source>
          .04435.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [2013]
          <string-name><surname>Hyman</surname>, <given-names>P.</given-names></string-name>
          <year>2013</year>
          .
          <article-title>Cybercrime</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>56</volume>
          (
          <issue>3</issue>
          ):
          <fpage>18</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Jordaney</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Papini</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nouretdinov</surname>
            ,
            <given-names>I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cavallaro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2016</year>
          . Misleading Metrics :
          <article-title>On Evaluating Machine Learning for Malware with Confidence</article-title>
          .
          <source>Technical report</source>
          , University of London.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [2017]
          <string-name>
            <surname>Jordaney</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Sharad,
          <string-name>
            <given-names>K.</given-names>
            ;
            <surname>Dash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            ;
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ;
            <surname>Papini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ;
            <surname>Nouretdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.;</given-names>
            and
            <surname>Cavallaro</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Transcend: Detecting Concept Drift in Malware Classification Models</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>In 26th USENIX Security Symposium (USENIX Security 17)</source>
          ,
          <fpage>625</fpage>
          -
          <lpage>642</lpage>
          . Vancouver, BC: {USENIX} Association.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [2013]
          <string-name>
            <surname>Kantchelian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Afroz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tschantz</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Greenstadt</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tygar</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          <year>2013</year>
          . Approaches to Adversarial Drift.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>In Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security</source>
          , AISec '
          <volume>13</volume>
          ,
          <fpage>99</fpage>
          -
          <lpage>110</lpage>
          . New York, NY, USA: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [2018]
          <string-name>
            <surname>Kolosnjaji</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Demontis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Biggio</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Maiorca</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; Giacinto,
          <string-name>
            <surname>G.</surname>
          </string-name>
          ; Eckert,
          <string-name>
            <given-names>C.</given-names>
            ; and
            <surname>Roli</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [2006]
          <string-name>
            <surname>Kolter</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Maloof</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Learning to Detect and Classify Malicious Executables in the Wild</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>Journal of Machine Learning Research</source>
          <volume>7</volume>
          :
          <fpage>2721</fpage>
          -
          <lpage>2744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [2018]
          <string-name>
            <surname>Kreuk</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aviv-Reuven</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Baruch,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Pinkas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ; and
            <surname>Keshet</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>Adversarial Examples on Discrete Sequences for Beating Whole-Binary Malware Detection</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [2017]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Roundy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gates</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and Vorobeychik,
          <string-name>
            <surname>Y.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          2017.
          <article-title>Large-Scale Identification of Malicious Singleton Files</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy - CODASPY '17</source>
          <fpage>227</fpage>
          -
          <lpage>238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kantchelian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tschantz</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Afroz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Bachwani,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Faizullabhoy,
          <string-name>
            <given-names>R.</given-names>
            ;
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ;
            <surname>Shankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ;
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ;
            <surname>Yiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ;
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            ; and
            <surname>Tygar</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. D.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Reviewer Integration and Performance Measurement for Malware Detection</article-title>
          .
          <source>Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment - Volume 9721</source>
          , DIMVA 2016,
          <fpage>122</fpage>
          -
          <lpage>141</lpage>
          . New York, NY, USA: Springer-Verlag New York, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [2008]
          <string-name>
            <surname>Moskovitch</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Feher,
          <string-name>
            <given-names>C.</given-names>
            ;
            <surname>Tzachar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ;
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ;
            <surname>Gitelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Dolev</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          ; and Elovici,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <year>2008</year>
          .
          <article-title>Unknown Malcode Detection Using OPCODE Representation</article-title>
          .
          <source>In Proceedings of the 1st European Conference on Intelligence and Security Informatics</source>
          , EuroISI '
          <volume>08</volume>
          ,
          <fpage>204</fpage>
          -
          <lpage>215</lpage>
          . Berlin, Heidelberg: Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McDaniel</surname>
            ,
            <given-names>P. D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          2016.
          <article-title>Transferability in machine learning: from phenomena to black-box attacks using adversarial samples</article-title>
          .
          <source>CoRR abs/1605</source>
          .07277.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [2015]
          <string-name><surname>Poulios</surname>, <given-names>G.</given-names></string-name>;
          <string-name><surname>Ntantogian</surname>, <given-names>C.</given-names></string-name>; and
          <string-name><surname>Xenakis</surname>, <given-names>C.</given-names></string-name>
          <year>2015</year>
          .
          <article-title>ROPInjector: Using Return Oriented Programming for Polymorphism and Antivirus Evasion</article-title>
          . Black Hat USA.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [2017]
          <string-name>
            <surname>Raff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nicholas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Malware Classification and Class Imbalance via Stochastic Hashed LZJD</article-title>
          .
          <source>In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security</source>
          , AISec '
          <volume>17</volume>
          ,
          <fpage>111</fpage>
          -
          <lpage>120</lpage>
          . New York, NY, USA: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Raff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sylvester</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yacci</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ward</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tracy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McLean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Nicholas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>An investigation of byte n-gram features for malware classification</article-title>
          .
          <source>Journal of Computer Virology and Hacking Techniques.</source>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [2017]
          <string-name>
            <surname>Raff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sylvester</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brandon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Catanzaro</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Nicholas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Malware Detection by Eating a Whole EXE</article-title>
          .
          <source>arXiv preprint arXiv:1710.09435</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [2012]
          <string-name>
            <surname>Rossow</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dietrich</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kreibich</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Paxson</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pohlmann</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bos</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>van Steen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <article-title>Prudent Practices for Designing Malware Experiments: Status Quo and Outlook</article-title>
          .
          <source>In 2012 IEEE Symposium on Security and Privacy</source>
          ,
          <fpage>65</fpage>
          -
          <lpage>79</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Russu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Demontis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Biggio</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fumera</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Roli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Secure kernel machines against evasion attacks</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security</source>
          , AISec '
          <volume>16</volume>
          ,
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          . New York, NY, USA: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [2001]
          <string-name>
            <surname>Schultz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Eskin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zadok</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Stolfo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <article-title>Data Mining Methods for Detection of New Malicious Executables</article-title>
          .
          <source>In Proceedings 2001 IEEE Symposium on Security and Privacy. S&amp;P</source>
          <year>2001</year>
          ,
          <fpage>38</fpage>
          -
          <lpage>49</lpage>
          .
          <source>IEEE Comput. Soc.</source>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [2009]
          <string-name>
            <surname>Shafiq</surname>
            ,
            <given-names>M. Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tabish</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Farooq</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          2009.
          <article-title>PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime</article-title>
          .
          <source>In Recent Advances in Intrusion Detection</source>
          .
          <fpage>121</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [2012]
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Walenstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lakhotia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <article-title>Tracking Concept Drift in Malware Families</article-title>
          .
          <source>In Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence</source>
          , AISec '
          <volume>12</volume>
          ,
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          . New York, NY, USA: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [2010]
          <string-name>
            <surname>Sommer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Paxson</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Outside the Closed World: On Using Machine Learning for Network Intrusion Detection</article-title>
          .
          <source>In Proceedings of the 2010 IEEE Symposium on Security and Privacy</source>
          , SP '
          <volume>10</volume>
          ,
          <fpage>305</fpage>
          -
          <lpage>316</lpage>
          . Washington, DC, USA: IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [2014]
          <string-name>
            <surname>Spafford</surname>
            ,
            <given-names>E. C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Is Anti-virus Really Dead?</article-title>
          <source>Computers &amp; Security</source>
          <volume>44</volume>
          :
          <fpage>iv</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [2014]
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zaremba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bruna</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Intriguing properties of neural networks</article-title>
          .
          <source>In ICLR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [2016]
          <string-name>
            <surname>Wressnegger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yamaguchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Rieck</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>From Malware Signatures to Anti-Virus Assisted Attacks</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [2014]
          <string-name>
            <surname>Zeiler</surname>
            ,
            <given-names>M. D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Visualizing and understanding convolutional networks</article-title>
          .
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8689 LNCS(PART 1)</source>
          :
          <fpage>818</fpage>
          -
          <lpage>833</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>