Exploring Adversarial Examples in Malware Detection

Octavian Suciu (University of Maryland, College Park), Scott E. Coull and Jeffrey Johns (FireEye, Inc.)

Abstract

The Convolutional Neural Network (CNN) architecture is increasingly being applied to new domains, such as malware detection, where it is able to learn malicious behavior from raw bytes extracted from executables. These architectures reach impressive performance with no feature engineering effort involved, but their robustness against active attackers is yet to be understood. Such malware detectors could face a new attack vector in the form of adversarial interference with the classification model. Existing evasion attacks, intended to cause misclassification on test-time instances and extensively studied for image classifiers, are not applicable because of the input semantics that prevents arbitrary changes to the binaries. This paper explores the area of adversarial examples for malware detection. By training an existing model on a production-scale dataset, we show that some previous attacks are less effective than initially reported, while simultaneously highlighting architectural weaknesses that facilitate new attack strategies for malware classification. Finally, we explore more generalizable attack strategies that increase the potential effectiveness of evasion attacks.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: Joseph Collins, Prithviraj Dasgupta, Ranjeev Mittu (eds.): Proceedings of the AAAI Fall 2018 Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, Arlington, VA, USA, 18-19 October, 2018, published at http://ceur-ws.org

Introduction

The popularity of Convolutional Neural Network (CNN) classifiers has led to their adoption in fields which have been historically adversarial, such as malware detection (Raff et al. 2017; Krčál et al. 2018). Recent advances in adversarial machine learning have highlighted weaknesses of classifiers when faced with adversarial samples. One such class of attacks is evasion (Biggio et al. 2013), which acts on test-time instances. The instances, also called adversarial examples, are modified by the attacker such that they are misclassified by the victim classifier even though they still resemble their original representation. State-of-the-art attacks focus mainly on image classifiers (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2014; Papernot et al. 2017; Carlini and Wagner 2017), where attacks add small perturbations to input pixels that lead to a large shift in the victim classifier feature space, potentially shifting it across the classification decision boundary. The perturbations do not change the semantics of the image, as a human oracle easily identifies the original label associated with the image.
In the context of malware detection, adversarial examples could represent an additional attack vector for an attacker determined to evade such a system. However, domain-specific challenges limit the applicability of existing attacks designed against image classifiers on this task. First, the strict semantics of binary files disallows arbitrary perturbations in the input space: there is a structural interdependence between adjacent bytes, and any change to a byte value could potentially break the functionality of the executable. Second, the limited availability of representative datasets or robust public models limits the generality of existing studies. Existing attacks (Kolosnjaji et al. 2018) use victim models trained on very small datasets and avoid the semantic issues entirely by appending adversarial noise at the end of the binary files, and their generalization effectiveness is yet to be evaluated.

This paper sheds light on the generalization properties of adversarial examples against CNN-based malware detectors. By training on a production-scale dataset of 12.5 million binaries, we are able to observe interesting properties of existing attacks and to propose a more effective and generalizable attack strategy. Our contributions are as follows:
• We measure the generalization of existing adversarial attacks and highlight the limitations that prevent them from being widely applicable.
• We unearth an architectural weakness of a published CNN architecture that facilitates the existing append-based attack by Kolosnjaji et al. (Kolosnjaji et al. 2018).
• We propose a new attack which, by modifying the existing bytes of a binary, has the potential to outperform append-based attacks without semantic inconsistencies.

Background

The Convolutional Neural Network (CNN) architecture has proven to be very successful across popular vision tasks, such as image classification (He et al. 2016). This led to an increased adoption in other fields and domains, with one such example being text classification from character-level features (Zhang, Zhao, and LeCun 2015), which turns out to be extremely similar to the malware classification problem discussed in this paper. In this setting, a natural language document is represented as a sequence of characters, and the CNN is applied on that one-dimensional stream of characters. The intuition behind this approach is that a CNN is capable of automatically learning complex features, such as words or word sequences, by observing compositions of raw signals extracted from single characters. This approach also avoids the requirement of defining language semantic rules, and is able to tolerate anomalies in features, such as word misspellings. The classification pipeline first encodes each character into a fixed-size embedding vector. The sequence of embeddings acts as input to a set of convolutional layers, intermixed with pooling layers, then followed by fully connected layers. The convolutional layers act as receptors, picking particular features from the input instance, while the pooling layers act as filters to down-sample the feature space. The fully connected layers act as a non-linear classifier on the internal feature representation of instances.

Figure 1: Architecture for the MalConv model: raw PE file bytes are embedded, passed through a gated convolution and a temporal max-pooling layer, then classified by a fully connected layer with a softmax over labels.

CNNs for Malware Classification. Similar to this approach, the security community explored the applicability of CNNs to the task of malware detection on binary files (Raff et al. 2017; Krčál et al. 2018). Analogous to text, a file could be conceptualized as a sequence of bytes that are arranged into higher-level features, such as instructions or functions. By allowing the classifier to automatically learn features indicative of maliciousness, this approach avoids the labor-intensive feature engineering process typical of malware classification tasks. Manual feature engineering proved to be challenging in the past and led to an arms race between antivirus developers and attackers aiming to evade them (Ugarte-Pedrero et al. 2015). However, the robustness of these automatically learned features in the face of evasion is yet to be understood.

In this paper, we explore evasion attacks by focusing on a byte-based convolutional neural network for malware detection, called MalConv (Raff et al. 2017), whose architecture is shown in Figure 1. MalConv reads up to 2MB of raw byte values from a Portable Executable (PE) file as input, appending a distinguished padding token to files smaller than 2MB and truncating extra bytes from larger files. The fixed-length sequences are then transformed into an embedding representation, where each byte is mapped to an 8-dimensional embedding vector. These embeddings are then passed through a gated convolutional layer, followed by a temporal max-pooling layer, before being classified through a final fully connected layer. Each convolutional layer uses a kernel size of 500 bytes with a stride of 500 (i.e., non-overlapping windows), and each of the 128 filters is passed through a max-pooling layer. This results in a unique architectural feature that we will revisit in our results: each pooled filter is mapped back to a specific 500-byte sequence, and there are at most 128 such sequences that contribute to the final classification across the entire input. Their reported results on a testing set of 77,349 samples achieved a Balanced Accuracy of 0.909 and an Area Under the Curve (AUC) of 0.982.
Adversarial Binaries. Unlike evasion attacks on images (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2014; Papernot et al. 2017; Carlini and Wagner 2017), attacks that alter the raw bytes of PE files must maintain the syntactic and semantic fidelity of the original file. The Portable Executable (PE) standard (Microsoft 2018) defines a fixed structure for these files. A PE file contains a leading header enclosing file metadata and pointers to the sections of the file, followed by the variable-length sections which contain the actual program code and data. Changing bytes arbitrarily could break the malicious functionality of the binary or, even worse, prevent it from loading at all.

Recent work (Kolosnjaji et al. 2018) avoided this problem by appending adversarial noise to the end of the binary. Since the appended adversarial bytes are not within the defined boundaries of the PE file, their existence does not impact the binary's functionality and there are no inherent restrictions on the syntax of the bytes (i.e., valid instructions and parameters). The trade-off, however, is that the impact of the appended bytes on the final classification is offset by the features present in the original sample, which remain unchanged. As we will see, these attacks take advantage of certain vulnerabilities in the position-independent feature detectors present in the MalConv architecture.

Datasets. To evaluate the success of evasion attacks against the MalConv architecture, we collected 16.3M PE files from a variety of sources, including VirusTotal, ReversingLabs, and proprietary FireEye data. The data was used to create a production-quality dataset of 12.5M training samples and 3.8M testing samples, which we refer to as the Full dataset. It contains 2.2M malware samples in the training set, and 1.2M in testing, which represents a realistic ratio of goodware to malware. The dataset was created from a larger pool of more than 33M samples using a stratified sampling technique based on VirusTotal's vhash clustering algorithm, which is based on dynamic and static properties of the binaries. Use of stratified sampling ensures uniform coverage over the canonical 'types' of binaries present in the dataset, while also limiting bias from certain overrepresented types (e.g., popular malware families). In addition, we created a smaller dataset whose size and distribution is more in line with Kolosnjaji et al.'s evaluation (Kolosnjaji et al. 2018), which we refer to as the Mini dataset. The Mini dataset was created by sampling 4,000 goodware and 4,598 malware samples from the Full dataset. Note that both datasets follow a strict temporal split where test data was observed strictly later than training data. We use the Mini dataset in order to explore whether the attack results demonstrated by Kolosnjaji et al. would generalize to a production-quality model, or whether they are artifacts of the dataset properties.

Baseline Performance

To validate our implementation of the MalConv architecture (Raff et al. 2017), we train the classifier on both the Mini and the Full datasets, leaving out the DeCov regularization addition suggested by the authors. Our implementation uses a momentum-based optimizer with decay and a batch size of 80 instances. We train on the Mini dataset for 10 full epochs. We also trained on the Full dataset for 10 epochs, but stopped the process early due to a small validation loss.¹ To assess and compare the performance of the two models, we test them on the entire Full testing set. The model trained on the Full dataset achieves an accuracy of 0.89 and an AUC of 0.97, which is similar to the results published in the original MalConv paper. Unsurprisingly, the Mini model is much less robust, achieving an accuracy of 0.73 and an AUC of 0.82.

¹ This was also reported in the original MalConv study.

Append Attacks

In this section we present various attack strategies that address the semantic integrity constraints of PE files by appending adversarial noise to the original file. We start by presenting two attacks first introduced by Kolosnjaji et al. (Kolosnjaji et al. 2018) and evaluated against MalConv.

Random Append. The Random Append attack works by appending byte values sampled from a uniform distribution. This baseline attack measures how easily an append attack could offset features derived from the file length, and helps compare the actual adversarial gains from more complex append strategies over randomly appended noise.
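A minimal sketch of this baseline is shown below; the MAX_LEN constant and function name are illustrative, not part of the attack as published. It also makes explicit the constraint, discussed later, that files already at the model's input limit cannot be attacked by appending.

    import numpy as np

    MAX_LEN = 2_000_000  # MalConv reads at most 2MB of the file

    def random_append(pe_bytes: bytes, num_bytes: int, rng=None) -> bytes:
        # Append uniformly random byte values after the end of the original file.
        rng = rng or np.random.default_rng()
        budget = min(num_bytes, MAX_LEN - len(pe_bytes))
        if budget <= 0:
            raise ValueError("file already exceeds the model's maximum input size")
        noise = rng.integers(0, 256, size=budget, dtype=np.uint8).tobytes()
        return pe_bytes + noise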
Gradient Append. The Gradient Append strategy uses the input gradient value to guide the changes in the appended byte values. The algorithm appends numBytes bytes to the candidate sample and updates their values over numIter iterations or until the victim classifier is evaded. The gradient of the input layer with respect to the classification loss ∇l indicates the direction in the input space of the change required to shift the instance towards the other class. The representation of all appended bytes is iteratively updated, starting from random values. However, as the input bytes are mapped to a discrete embedding representation in MalConv, the end-to-end architecture becomes non-differentiable and its input gradient cannot be computed analytically. Therefore, this attack uses a heuristic to instead update the embedding vector and discretize it back in the byte space to the closest byte value along the direction of the embedding gradient. We refer interested readers to the original paper for details of this discretization process (Kolosnjaji et al. 2018). The attack requires numBytes × numIter gradient computations and updates to the appended bytes in the worst case, which could be prohibitively expensive for large networks.

Benign Append. We propose two new append strategies, one intended to highlight a vulnerability specific to the MalConv architecture, and the second to address the potentially long convergence time of the previously-proposed gradient-based attack. First, the Benign Append attack allows us to observe how the MalConv architecture encodes positional features extracted from the input byte sequences. The attack appends leading bytes extracted from benign instances that are correctly classified with high confidence by the victim classifier. The intuition behind this attack uses the observation that the leading bytes of a file are the most influential towards the classification decision (Raff et al. 2017). Therefore, it signals whether the maliciousness of the target could be offset by appending highly influential benign bytes.

FGM Append. Based on the observation that the convergence time of the Gradient Append attack grows linearly with the number of appended bytes, we propose the "one-shot" FGM Append attack, an adaptation of the Fast Gradient Method (FGM) originally described in (Goodfellow, Shlens, and Szegedy 2014). The pseudocode is described in Algorithm 1. Our attack starts by appending numBytes random bytes to the original sample x0 and updating them using a policy dictated by FGM. The FGM attack updates each embedding value by a user-specified amount ε in a direction that minimizes the classification loss l on the input, as dictated by the sign of the gradient. In order to avoid the non-differentiability issue, our attack performs the gradient-based updates of the appended bytes in the embedding space, while mapping the updated values to the closest byte value representation in EMBEDDINGMAPPING using the L2 distance metric. Unlike Gradient Append, the FGM Append attack applies a single update to the appended bytes, which makes it very fast regardless of the number of appended bytes.

Algorithm 1 The FGM Append attack
1: function FGMAPPEND(x0, numBytes, ε)
2:   x ← PADRANDOM(x0, numBytes)
3:   e ← GETEMBEDDINGS(x)
4:   eu ← GRADIENTATTACK(e, ε)
5:   for i in |x0| ... |x0| + numBytes − 1 do
6:     e[i] ← eu[i]
7:   end for
8:   x∗ ← EMBEDDINGMAPPING(e)
9:   return x∗
10: end function
11: function GRADIENTATTACK(e, ε)
12:   eu ← e − ε · sign(∇l(e))
13:   return eu
14: end function
15: function EMBEDDINGMAPPING(ex)
16:   e ← ARRAY(256)
17:   for byte in 0 ... 255 do
18:     e[byte] ← GETEMBEDDINGS(byte)
19:   end for
20:   for i in 0 ... |ex| do
21:     x∗[i] ← argmin_{b ∈ 0...255} ||ex[i] − e[b]||₂
22:   end for
23:   return x∗
24: end function
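To make the mechanics of Algorithm 1 concrete, the sketch below performs the one-shot FGM update of the appended bytes in embedding space, followed by the nearest-byte mapping under the L2 metric. It assumes PyTorch and a model exposing its embedding layer (model.embed) and a forward pass over embeddings (model.forward_from_embeddings); both names are placeholders for whatever a real implementation exposes, and the loss is taken toward the benign target so that the signed step of Algorithm 1 moves the sample across the decision boundary.

    import torch
    import torch.nn.functional as F

    def fgm_append(model, x, orig_len, num_bytes, eps):
        # x: (1, length) byte indices, already padded with num_bytes random bytes at orig_len.
        e = model.embed(x).detach().requires_grad_(True)       # (1, length, emb_dim)
        logit = model.forward_from_embeddings(e)                # (1, 1) maliciousness logit
        loss = F.binary_cross_entropy_with_logits(logit, torch.zeros_like(logit))
        loss.backward()
        e_adv = e - eps * e.grad.sign()                         # single FGM step (Algorithm 1, line 12)

        # Map each updated appended embedding back to its closest byte value (L2 distance).
        byte_table = model.embed(torch.arange(256, device=x.device))  # (256, emb_dim)
        appended = e_adv[0, orig_len:orig_len + num_bytes]            # (num_bytes, emb_dim)
        closest = torch.cdist(appended, byte_table).argmin(dim=1)     # (num_bytes,)

        x_adv = x.clone()
        x_adv[0, orig_len:orig_len + num_bytes] = closest
        return x_adv

Only the appended region is re-mapped; the original bytes of the file are left untouched, which is exactly the limitation the slack attack below removes.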
Slack Attacks

Besides the inability to append bytes to files that already exceed the model's maximum input size (e.g., 2MB for MalConv), append-based attacks suffer from an additional limitation. Figure 2 plots the average frequency of each of the 4,195 input byte sequences as input to the fully connected layer across a random set of 200 candidate malware samples. This shows that, for example, while the first 1,000 byte sequences (0.5 MB) in binaries correspond to 79% of the actual features for the classifier, only 55% of the files are smaller than that. Additionally, 13% of the instances cannot be attacked at all because they are larger than the maximum file size for the classifier. The result shows not only that appended bytes need to offset a large fraction of the discriminative features, but also that attacking the byte sequences of these discriminative features directly will likely amplify the attack effectiveness due to their importance. Driven by this intuition, we proceed to describe an attack strategy that exploits the existing bytes of binaries with no side effects on the functionality of the program.

Figure 2: CDF of the file sizes and activation locations after the max-pooling layer. Each byte sequence corresponds to 500 bytes from the original file and could be selected at most 128 times for each instance.

Slack FGM. Our strategy defines a set of slack bytes where an attack algorithm is allowed to freely modify bytes in the existing binary without breaking the PE. Once identified, the slack bytes are then modified using a gradient-based approach. The SLACKATTACK function in Algorithm 2 highlights the architecture of our attack. The algorithm is independent of the strategy SLACKINDEXES employed for extracting slack bytes and of the gradient-based method GRADIENTATTACK used to update the bytes.

In our experiments we use a simple technique that empirically proves to be effective in finding sufficiently large slack regions. This strategy extracts the gaps between neighboring PE sections of an executable by parsing the executable's section headers. The gaps are inserted by the compiler and exist due to misalignments between the virtual addresses and the multiples of the block size used on disk. We compute the size of the gap between consecutive sections in a binary as RawSize − VirtualSize, and define its byte start index in the binary by the section's RawAddress + VirtualSize. By combining all the slack regions, SLACKINDEXES returns a set of indexes over the existing bytes of a file, indicating that they can be modified.

Although more complex byte update strategies are possible, potentially accounting for the limited leverage imposed by the slack regions, we use the technique introduced for the FGM Append attack in Algorithm 1, which proved to be effective. As in the case of FGM Append, updates are performed on the embeddings of the allowed byte indexes and the updated values are mapped back to byte values using the L2 distance metric.

Algorithm 2 The Slack FGM attack
1: function SLACKATTACK(x0)
2:   m ← SLACKINDEXES(x0)
3:   e ← GETEMBEDDINGS(x0)
4:   eu ← GRADIENTATTACK(e)
5:   xu ← EMBEDDINGMAPPING(eu)
6:   x∗ ← x0
7:   for idx in m do
8:     x∗[idx] ← xu[idx]
9:   end for
10:  return x∗
11: end function
12: function SLACKINDEXES(x)
13:   s ← GETPESECTIONS(x)
14:   m ← ARRAY(0)
15:   for i in 0 ... |s| do
16:     if s[i].RawSize > s[i].VirtualSize then
17:       rs ← s[i].RawAddress + s[i].VirtualSize
18:       re ← s[i].RawAddress + s[i].RawSize
19:       for idx in rs ... re do
20:         m ← APPEND(m, idx)
21:       end for
22:     end if
23:   end for
24:   return m
25: end function
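The SLACKINDEXES step can be realized directly on top of a standard PE parser. The sketch below uses the pefile library (an assumption on our part; any parser exposing the section header fields would do) and mirrors the gap computation described above.

    import pefile

    def slack_indexes(path):
        # Collect file offsets that fall in the gap between a section's virtual size
        # and its (larger) raw size on disk; these bytes are padding inserted by the
        # compiler and do not affect the behavior of the loaded program.
        pe = pefile.PE(path)
        indexes = []
        for section in pe.sections:
            raw_size = section.SizeOfRawData
            virtual_size = section.Misc_VirtualSize
            raw_addr = section.PointerToRawData
            if raw_size > virtual_size:
                indexes.extend(range(raw_addr + virtual_size, raw_addr + raw_size))
        return indexes

The returned offsets are the only positions Algorithm 2 is allowed to overwrite with the discretized FGM values.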
Results

Here, we evaluate the attacks described in the previous section in the same adversarial settings using both our Mini and Full datasets. Our evaluation seeks to answer the following three questions:
• How do existing attacks generalize to classifiers trained on larger datasets?
• How vulnerable is a robust MalConv architecture to adversarial samples?
• Are slack-based attacks more effective than append attacks?

In an attempt to reproduce prior work, we select candidate instances from the test set if they have a file size smaller than 990,000 bytes and are correctly classified as malware by the victim. We randomly pick 400 candidates and test the effectiveness of the attacks using the Success Rate (SR): the percentage of adversarial samples that successfully evaded detection.

Append Attacks. We evaluate the append-based attacks on both the Mini and the Full datasets by varying the number of appended bytes. Table 1 summarizes these results.

              Random            Benign            FGM
# Bytes     Mini    Full      Mini    Full      Mini    Full
500         0%      0%        4%      0%        1%      13%
2000        0%      0%        5%      0%        2%      30%
5000        0%      0%        6%      1%        2%      52%
10000       0%      0%        9%      1%        1%      71%

Table 1: Success Rate of the append attacks for an increasing number of appended bytes on both the Mini and Full datasets.

We observe that the Random Append attack fails on both datasets, regardless of the number of appended bytes. This result is in line with our expectations, demonstrating that the MalConv model is immune to random noise and that the input size is not among the learned features. However, our results do not reinforce previously reported success rates of up to 15% in (Kolosnjaji et al. 2018).

The SR of the Benign Append attack seems to progressively increase with the number of added bytes on the Mini dataset, but fails to show the same behavior on the Full dataset. Conversely, for the FGM Append attack we observe that the attack fails on the Mini dataset, while reaching up to 71% SR on the Full dataset. This paradoxical behavior highlights the importance of large, robust datasets in evaluating adversarial attacks. One reason for the discrepancy in attack behaviors is that the MalConv model trained using the Mini dataset (modeled after the dataset used by Kolosnjaji et al.) has a severe overfitting problem. In particular, the success of appending specific benign byte sequences from the Mini dataset could be indicative of poor generalizability, and this is further supported by the disconnect between the model's capacity and the number of samples in the Mini dataset. When we consider the one-shot FGM Attack's success on the Full dataset and failure on the Mini dataset, this can also be explained by poor generalizability in the Mini model; the single gradient evaluation does not provide enough information for the sequence of byte changes made in the attack. Recomputing the gradient after each individual byte change is expected to result in a higher attack success rate.
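As the previous sentence suggests, recomputing the gradient after each committed byte change should improve on the one-shot attack. The following is a rough sketch of what such an iterative variant could look like, reusing the fgm_append helper sketched earlier; model.predict is again a placeholder for the victim's decision function, and this is an illustration of the suggested direction rather than an attack evaluated in this paper.

    def iterative_fgm_append(model, x, orig_len, num_bytes, eps, num_iter=10):
        # Recompute the gradient after each individual byte change, at a worst-case cost
        # of num_bytes * num_iter gradient computations (as discussed for Gradient Append).
        x_adv = x.clone()
        for _ in range(num_iter):
            for i in range(orig_len, orig_len + num_bytes):
                candidate = fgm_append(model, x_adv, orig_len, num_bytes, eps)
                x_adv[0, i] = candidate[0, i]      # commit only byte i this round
                if model.predict(x_adv) == 0:      # stop once the sample evades detection
                    return x_adv
        return x_adv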
Additionally, we implemented the Gradient Append attack proposed by Kolosnjaji et al., but failed to reproduce the reported results. We aimed to follow the original description, with one difference: our implementation, in line with the original MalConv architecture, uses a special token for padding, while Kolosnjaji et al. use the byte value 0 instead. We evaluated our implementation under the same settings as the other attacks, but none of the generated adversarial samples were successful. One limitation of the Gradient Append attack that we identified is the necessity to update the value of each appended byte at each iteration. However, different byte indexes might converge to their optimal value after a varying number of iterations. Therefore, successive and unnecessary updates may even lead to divergence of some of the byte values. Indeed, empirically investigating individual byte updates across iterations revealed an interesting oscillating pattern, where some bytes receive the same sequence of byte values cyclically in later iterations.

Aside from the methodological issues surrounding dataset size and composition, our results also show that even a robustly trained MalConv classifier is vulnerable to append attacks when given a sufficiently large degree of freedom. Indeed, the architecture uses 500-byte convolutional kernels with a stride size of 500 and a single max-pooling layer over the entire file, which means that not only is it looking at a limited set of relatively coarse features, but it also selects the best 128 activations irrespective of their location. That is, once a sufficiently large number of appended bytes are added in the FGM attack, they quickly replace legitimate features from the original binary in the max-pooling operation. Therefore, the architecture does not encode positional information, which is a significant vulnerability that we demonstrate can be exploited.

Slack Attacks. We evaluate the Slack FGM attack over the Full dataset for the same experimental settings as above. In order to control the amount of adversarial noise added in the slack bytes, we use the ε parameter to define an L2 ball around the original byte value in the embedding space. Only those values provided by the FGM attack that fall within the ε ball are considered for the slack attack; otherwise the original byte value remains. The upper bound for the SR is 28% for ε = 1.0, where we observed all the available slack bytes being modified according to the gradient. In order to compare it with the append attacks, in Figure 3 we plot the SR as a function of the number of modified bytes.

Figure 3: The success rate of the attacks (Random Append, Benign Append, FGM Append, and Slack FGM) as a function of the average number of modified bytes on the Full dataset.

The results show that, while the FGM Append attack could achieve a higher SR, it also requires a much larger number of extra byte modifications. The Slack FGM attack achieves an SR of 28% for an average of 1,005 modified bytes, while the SR of the FGM Append attack lies around 20% for the same setting. This result confirms our initial intuition that the coarse nature of MalConv's features requires consideration of the surrounding contextual bytes within the convolutional window. In the slack attack, we make use of existing contextual bytes to amplify the power of our FGM attack without having to generate a full 500-byte convolutional window using appended bytes.
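The ε constraint on the slack bytes can be sketched as a simple filter in embedding space: a slack byte is only re-mapped to a new value when its FGM-updated embedding stays within an L2 ball of radius ε around the original byte's embedding. The helper below is illustrative (it assumes a 1-D tensor of byte indices, the FGM-updated embeddings, and access to the model's embedding layer), not the exact implementation used in the experiments.

    import torch

    def apply_slack_constraint(x, e_adv, slack_idx, embed, eps):
        # x: (length,) original byte indices; e_adv: (length, emb_dim) FGM-updated embeddings.
        byte_table = embed(torch.arange(256))                     # (256, emb_dim)
        x_out = x.clone()
        for i in slack_idx:
            # Only accept the update if it stays within the L2 ball around the original byte.
            if torch.norm(e_adv[i] - embed(x[i]), p=2) <= eps:
                x_out[i] = torch.cdist(e_adv[i:i + 1], byte_table).argmin()
        return x_out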
Related Work

The work by Barreno et al. (Barreno et al. 2010) was among the first to systematize attack vectors against machine learning, where they distinguished evasion as a type of test-time attack. Since then, several evasion attacks have been proposed against malware detectors. Many of these attacks focus on additive techniques for evasion, where new capabilities or features are added to cause misclassification. For instance, Biggio et al. (Biggio et al. 2013) use a gradient-based approach to evade malware detectors by adding new features to PDFs, while Grosse et al. (Grosse et al. 2017) and Hu and Tan (Hu and Tan 2018) add new API calls to evade detection. More recently, Anderson et al. (Anderson et al. 2018) used reinforcement learning to evade detectors by selecting from a pre-defined list of semantics-preserving transformations. Similarly, Xu et al. (Xu, Qi, and Evans 2016) propose a genetic algorithm for manipulating PDFs while maintaining the necessary syntax. Closest to our work is the gradient-based append attack by (Kolosnjaji et al. 2018) against the CNN-based MalConv architecture. In comparison to earlier work, our slack-based attack operates on the raw bytes of the binary and modifies them without requiring the expensive feedback loop of a reinforcement learning agent, and it has the potential to outperform append-based attacks.

Conclusion

In this paper, we explored the space of adversarial examples against deep learning-based malware detectors. Our experiments indicate that the effectiveness of adversarial attacks on models trained using small datasets does not always generalize to robust models. We also observe that the MalConv architecture does not encode positional information about the input features and is therefore vulnerable to append-based attacks. Finally, we proposed the Slack FGM attack, which modifies existing bytes without affecting semantics, with greater efficacy than append-based attacks.
References

Anderson, H. S.; Kharkar, A.; Filar, B.; Evans, D.; and Roth, P. 2018. Learning to evade static PE machine learning malware models via reinforcement learning. arXiv preprint arXiv:1801.08917.

Barreno, M.; Nelson, B.; Joseph, A. D.; and Tygar, J. D. 2010. The security of machine learning. Machine Learning 81(2):121–148.

Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; and Roli, F. 2013. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 387–402. Springer.

Carlini, N., and Wagner, D. 2017. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE.

Goodfellow, I.; Shlens, J.; and Szegedy, C. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Grosse, K.; Papernot, N.; Manoharan, P.; Backes, M.; and McDaniel, P. 2017. Adversarial examples for malware detection. In European Symposium on Research in Computer Security, 62–79. Springer.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

Hu, W., and Tan, Y. 2018. Black-box attacks against RNN based malware detection algorithms. In The Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018.

Kolosnjaji, B.; Demontis, A.; Biggio, B.; Maiorca, D.; Giacinto, G.; Eckert, C.; and Roli, F. 2018. Adversarial malware binaries: Evading deep learning for malware detection in executables. 26th European Signal Processing Conference (EUSIPCO '18).

Krčál, M.; Švec, O.; Bálek, M.; and Jašek, O. 2018. Deep convolutional malware classifiers can learn from raw executables and labels only. International Conference on Learning Representations.

Microsoft. 2018. PE format. https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format.

Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z. B.; and Swami, A. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 506–519. ACM.

Raff, E.; Barker, J.; Sylvester, J.; Brandon, R.; Catanzaro, B.; and Nicholas, C. 2017. Malware detection by eating a whole EXE. arXiv preprint arXiv:1710.09435.

Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

Ugarte-Pedrero, X.; Balzarotti, D.; Santos, I.; and Bringas, P. G. 2015. SoK: Deep packer inspection: A longitudinal study of the complexity of run-time packers. In 2015 IEEE Symposium on Security and Privacy (SP), 659–673. IEEE.

Xu, W.; Qi, Y.; and Evans, D. 2016. Automatically evading classifiers. In Proceedings of the 2016 Network and Distributed Systems Symposium.

Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, 649–657.