<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Empirical Quantification of Spurious Correlations in Malware Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bianca Perasso</string-name>
          <email>bianca.perasso@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludovico Lozza</string-name>
          <email>ludovico.lozza@hotmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Ponte</string-name>
          <email>andrea.ponte@edu.unige.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Demetrio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Oneto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Roli</string-name>
          <email>fabio.roli@unige.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Cagliari</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Genova</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>End-to-end deep learning exhibits unmatched performance for detecting malware, but such an achievement is reached by exploiting spurious correlations: features with high relevance at inference time, but known to be useless through domain knowledge. While previous work highlighted that deep networks mainly focus on metadata, none investigated the phenomenon further or quantified its impact on the decision. In this work, we deepen our understanding of how spurious correlations affect deep learning for malware detection by highlighting how much models rely on empty spaces left by the compiler, which diminishes the relevance of the compiled code. Through our preliminary analysis on a small-scale balanced dataset, we introduce a ranking of two end-to-end models to better understand which is more suitable to be put in production.</p>
      </abstract>
      <kwd-group>
        <kwd>Malware Detection</kwd>
        <kwd>Spurious Correlations</kwd>
        <kwd>Deep Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Antivirus programs are now deeply integrated with machine learning components, extending the
defensive capabilities of endpoints. In particular, one prominent approach is posed by end-to-end
techniques, which directly digest the raw bytes of programs, thus learning an intermediate representation
directly from data. This is opposed to regular feature extraction processes, which pre-process each sample
to extract aggregated and compressed information, later used at training time. While being
time-consuming [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], feature extraction in cybersecurity contexts is also prone to pre-processing errors [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
since malicious actors routinely complicate their programs to impede the analysis. Hence, even
though they require more training samples, end-to-end models avoid this problem entirely, while still
exhibiting excellent performance in production [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3, 4, 5, 6</xref>
        ]. As a downside, however, these end-to-end
models are opaque about which features they use to compute predictions.
In particular, previous work [
        <xref ref-type="bibr" rid="ref5 ref7 ref8">7, 5, 8</xref>
        ] debated whether Windows malware detectors implemented with
deep neural networks are affected by spurious correlations, that is, correlations between the predicted class and
features that are known to be irrelevant according to domain knowledge. While they all acknowledge an
excessive focus on the metadata of input samples, none of them (i) tried to quantify
how much these models rely on spurious correlations, or (ii) considered other spurious correlations that
might be learned by those models. Hence, in this preliminary analysis, we investigate and propose
an empirical quantification of three spurious correlations that can be learned by Windows malware
detectors, while also highlighting the relevance of features known to be important. This is achieved
by leveraging integrated gradients [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (IG), a gradient-based method for computing the relevance of
each input feature of deep learning models. Relying on domain knowledge, we select the locations
inside samples where there might be spurious correlations, and we estimate their relevance through the
ℓ2 norm. Such aggregated relevance is then normalized to gain an understanding of its impact on
the decision. Through an experimental analysis on a balanced small-scale dataset, we show how two
state-of-the-art end-to-end deep networks are affected by spurious correlations, impacting the relevance
attributed to the compiled code which should be, in theory, the most relevant portion of any executable.
      </p>
      <p>[Figure: Contribution of Spurious Correlations, with regions labeled DOS, Code, and Slack.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>Before describing our research work, we briefly present the key concepts needed to understand our
manuscript. In the following, we explain the type of data we deal with, how a malicious sample can be
detected, and how this decision can be explained.</p>
      <p>[Figure 2: Representation of the Windows PE File Format.]</p>
      <p>
        Windows PE File Format. In this work, we focus on malware detection for Windows operating systems.
Each Windows program is stored in the Portable Executable (PE) format<sup>1</sup>.
The PE format is composed of several parts (Figure 2), containing everything the OS needs to store, load,
and execute the program: (A) the DOS header and stub, which compose a valid DOS program, are kept only for backward
compatibility and are not informative about the program structure; (B) the PE header
is the real header of the PE format and, together with (C) the Optional Header, contains essential
information for storing and loading the program; (D) the Section Table instructs the OS where to find
(E) the Sections, which hold the real content of the program, such as .text, the compiled code of
the program (for all the others, we refer to the documentation of the format). While
each program must comply with this structure, the format itself leaves room for modifications. The regions that can be modified are
often named code caves and, together with the backward-compatibility parts (A), leave an open field to
malware developers [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. For instance, due to alignment requirements, the compiler allocates more
space than needed to store sections: these uninitialized bytes between sections are called slack bytes,
represented in gray in Figure 2 (E). Moreover, the PE format does not forbid the appending of bytes at
the end of the last section (overlay space, depicted in F).
      </p>
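      <p>As an illustration of the regions described above, the following minimal sketch computes slack and overlay ranges from an already-parsed section table. The fields of <monospace>Section</monospace> loosely mirror the VirtualSize, SizeOfRawData, and PointerToRawData entries of a PE section header, while the helper names <monospace>slack_ranges</monospace> and <monospace>overlay_range</monospace> are hypothetical. Treating slack as the raw bytes that exceed the virtual size is a simplifying assumption of this sketch, and not necessarily the procedure used in the experiments.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Section:
    """Subset of a PE section header (field names are illustrative)."""
    name: str
    virtual_size: int  # size of the section content once loaded in memory
    raw_size: int      # bytes reserved on disk, rounded up by alignment
    raw_offset: int    # file offset where the section data starts


def slack_ranges(sections: List[Section]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) file ranges left unused inside sections."""
    ranges = []
    for sec in sections:
        unused = sec.raw_size - sec.virtual_size
        if unused > 0:
            start = sec.raw_offset + sec.virtual_size
            ranges.append((start, sec.raw_offset + sec.raw_size))
    return ranges


def overlay_range(sections: List[Section], file_size: int) -> Optional[Tuple[int, int]]:
    """Return the [start, end) range appended after the last section, if any."""
    end_of_sections = max((s.raw_offset + s.raw_size for s in sections), default=0)
    if file_size > end_of_sections:
        return (end_of_sections, file_size)
    return None


# Toy section table (values are made up for the example).
sections = [
    Section(".text", virtual_size=0x1A30, raw_size=0x1C00, raw_offset=0x400),
    Section(".data", virtual_size=0x200, raw_size=0x200, raw_offset=0x2000),
]
print(slack_ranges(sections))           # [(7728, 8192)]
print(overlay_range(sections, 0x2300))  # (8704, 8960)
```
      </preformat>
      <p>In this toy table only .text leaves unused bytes, and the 256 bytes after the last section form the overlay.</p>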
      <p>
        End-to-end Malware Detectors. In recent years, malware detection methods have fully embraced the
AI paradigm, training Machine Learning (ML) and Deep Learning (DL) models on past data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Models
can learn from different kinds of malware analysis, which can extract knowledge from the sole
program structure (static analysis) or from the behavior of malicious samples, previously executed in isolated
environments (dynamic analysis).<fn id="fn1"><label>1</label><p>https://learn.microsoft.com/en-us/windows/win32/debug/pe-format</p></fn> In this work, we leverage end-to-end static detectors, which do not
rely on feature extraction, which can be faulty [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], but can learn directly from raw bytes, as proposed
by Raff et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Coull et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Krčál et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Generally, these DL architectures treat each PE
byte as an input for the network, which is mapped to a vector by an embedding layer. This
approach produces matrices in which each row is the embedding of one byte.
After embedding, the input matrices pass through a series of convolutional
and pooling layers, followed by fully connected layers that generate the final output probability.
      </p>
      <p>
        Integrated Gradients. Explaining predictions of deep neural networks is a thorny challenge, due to
the black-box nature of these complex models. In their work, Sundararajan et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] propose Integrated
Gradients, an attribution method that computes the relevance of each input feature for the
prediction on an input sample. These contributions are computed with respect to a baseline, representing
a null signal from which the relevant features “emerge”. To calculate them, the method accumulates the
gradients of the model with respect to the input along a straight-line path from the baseline
to the actual input, averages them, and scales the average by the difference between the input and the baseline to find the relevance of each feature.
      </p>
      <p>
        Related Work on Spurious Correlations in Malware Detection. Recent work has raised the debate on the
presence of spurious correlations in malware detection [
        <xref ref-type="bibr" rid="ref12 ref5 ref7 ref8">7, 5, 8, 12</xref>
        ]. While these works analyze peculiar
behaviors of malware detectors, supporting either the presence of useless features that might cause a drop
in robustness [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], or the ability of deep networks to learn relevant locations inside headers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], none of
them proposes an empirical way to quantify these awkward phenomena.
      </p>
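      <p>The Integrated Gradients procedure summarized in this section can be sketched on a toy model whose gradient is known in closed form. This is only an illustration under stated assumptions: the function <monospace>integrated_gradients</monospace>, the quadratic model <monospace>f</monospace>, and its weights are hypothetical, and the integral is approximated with a midpoint Riemann sum. For byte-based detectors the path would presumably be taken in the embedding space rather than on raw byte values.</p>
      <preformat preformat-type="code">
```python
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position along the straight-line path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        total = [t + g for t, g in zip(total, grads)]
    # Average the accumulated gradients, then scale by (input - baseline).
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy model f(x) = sum_i w_i * x_i^2 (an assumption made for the example).
weights = [1.0, 0.5, 2.0]


def f(x: Sequence[float]) -> float:
    return sum(w * xi * xi for w, xi in zip(weights, x))


def grad_f(x: Sequence[float]) -> List[float]:
    return [2.0 * w * xi for w, xi in zip(weights, x)]


x = [1.0, -2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
print(attributions)
print(sum(attributions), f(x) - f(baseline))
```
      </preformat>
      <p>The final check reflects the completeness property of the method: up to numerical error, the attributions add up to the difference between the model output on the input and on the baseline.</p>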
    </sec>
    <sec id="sec-3">
      <title>3. Measuring Spurious Correlations for Malware Detectors</title>
      <sec id="sec-3-1">
        <p>
          We now illustrate how we quantify the relevance attributed by end-to-end Windows malware
detectors to known spurious correlations, and we report our methodology in Algorithm 1.
First, we leverage domain knowledge on the Windows PE file format to isolate parts of each
executable file that are known to be semantically irrelevant for malware detection. While
there might be plenty, in this paper we focus on three specific parts that should not be
relevant at inference time: the DOS header, the slack bytes, and the overlay space. Then, given
an end-to-end Windows malware detector, for each sample (line 2) we compute feature
attributions using Integrated Gradients (IG) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] (line 3). If the attributions computed by IG on the selected regions are different from zero, it
means that the model is considering those regions, together with other features, to
compute predictions, thus confirming the presence of spurious correlations.
These attributions, divided into the selected regions through domain knowledge, are characterized by
their ℓ2 norm (lines 5 to 8), and normalized by the total ℓ2 norm of the attributions (line 4). In
this way, we obtain a score for each spurious correlation that is upper-bounded by the total norm
of the attribution: if this ratio leans towards 1, it means that the whole prediction relies on that portion
of the executable. We also select the .text section to show how much relevance is given to the most
important part of an executable, expecting it to be higher than the other numbers. Finally, we compute
a score based on the previous steps, by subtracting the average relevance of the spurious correlations
from the average relevance of the compiled code (line 10). If this metric is positive, the analyzed model
is attributing more relevance to the truly important features than to the spurious ones. If it is zero or negative, it means that
the analyzed model is giving more relevance to spurious correlations, harming its reliability.
        </p>
        <preformat>
Algorithm 1: Impact of Spurious Correlation in Windows malware detectors
Data: f, a malware detector; D, a set of Windows programs
Result: reliance on spurious correlation
 1  r_dos, r_slack, r_overlay, r_text ← 0
 2  for x ∈ D do
 3      a ← IG(f, x)
 4      n ← 1 / (|D| · ‖a‖₂²)
 5      r_dos ← r_dos + n · ‖select_dos(a)‖₂²
 6      r_slack ← r_slack + n · ‖select_slack(a)‖₂²
 7      r_overlay ← r_overlay + n · ‖select_overlay(a)‖₂²
 8      r_text ← r_text + n · ‖select_text(a)‖₂²
 9  end
10  return r_text − (r_dos + r_slack + r_overlay)
        </preformat>
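        <p>The steps of Algorithm 1 can be sketched in plain Python as follows. The sketch assumes that the per-byte attributions have already been computed (for instance with Integrated Gradients) and that each region is given as a list of half-open byte ranges. The names <monospace>spurious_score</monospace>, <monospace>select</monospace>, and <monospace>squared_l2</monospace> are hypothetical, and skipping samples whose attributions are all zero is an assumption added here to avoid a division by zero.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[int, int]  # half-open [start, end) byte range
Sample = Tuple[Sequence[float], Dict[str, List[Range]]]


def squared_l2(values: Sequence[float]) -> float:
    """Squared l2 norm of a sequence of attributions."""
    return sum(v * v for v in values)


def select(attributions: Sequence[float], ranges: Sequence[Range]) -> List[float]:
    """Collect the attributions that fall inside the given byte ranges."""
    picked: List[float] = []
    for start, end in ranges:
        picked.extend(attributions[start:end])
    return picked


def spurious_score(samples: Sequence[Sample]) -> Dict[str, float]:
    """Average normalized relevance per region, plus the final score.

    Each sample is (attributions, regions), where regions maps the names
    "dos", "slack", "overlay", and "text" to lists of byte ranges.
    """
    names = ("dos", "slack", "overlay", "text")
    relevance = {name: 0.0 for name in names}
    for attributions, regions in samples:
        total = squared_l2(attributions)
        if total == 0.0:
            continue  # assumption: samples without attributions are skipped
        scale = 1.0 / (len(samples) * total)
        for name in names:
            picked = select(attributions, regions.get(name, []))
            relevance[name] += scale * squared_l2(picked)
    spurious = relevance["dos"] + relevance["slack"] + relevance["overlay"]
    relevance["score"] = relevance["text"] - spurious
    return relevance


# Toy example: six "bytes" with made-up attributions and regions.
attr = [3.0, 0.0, 4.0, 0.0, 0.0, 0.0]
regions = {"dos": [(0, 1)], "text": [(2, 3)], "slack": [(3, 5)], "overlay": [(5, 6)]}
result = spurious_score([(attr, regions)])
print(result)
```
        </preformat>
        <p>In the toy example the DOS region receives 9/25 of the squared norm and the code region 16/25, so the final score is positive (about 0.28).</p>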
        <p>[Table 1: average relevance attributed by MalConv and BBDNN to the analyzed PE regions, for goodware and malware.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Analysis</title>
      <p>Experimental Setup. We detail here how we set up our analysis, describing the data and deep
neural networks we used.</p>
      <p>
        Dataset. We use a dataset composed of 210 malware samples and 210 goodware samples. We take
malware samples from the Speakeasy testset [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], sampling 30 PEs for each family (Backdoor, Coinminer,
Dropper, Keylogger, Ransomware, RAT and Trojan). We take benign samples from a fresh installation of
Windows 11, inside the sys32 folder.
      </p>
      <p>
        Models. In our experimental work, we leverage two CNN architectures, mentioned in Sect. 2: the first is
MalConv, proposed by Raff et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the second is BBDNN, proposed by Coull et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. MalConv embeds
input sequences (truncated to 1 MB) with an 8-dimensional embedding space, then it implements a
single gated convolutional layer [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] with global max-pooling, followed by a single fully-connected layer
and softmax. BBDNN is quite similar to MalConv, but it has a larger embedding space (10 dimensions),
and forwards input sequences to five alternating convolutional and pooling layers and finally to a fully
connected layer with a sigmoid function. BBDNN results in a deeper architecture, and it pays this
price by truncating input byte sequences to 100 KB to have reasonable training times. Both models are
trained on the EMBER dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and we use the pretrained versions included in the maltorch<sup>2</sup> library.
      </p>
      <p>
        Setup of Integrated Gradients. The method requires two hyper-parameters: the baseline from which
to compute the attributions, and the number of interpolation steps. Regarding the baseline, we adopt the same
setting provided by previous work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and we set the baseline as the empty file, thus filled with the
special character 256. We then set the number of interpolation steps so that 50 gradients are evaluated.
      </p>
      <p>Setup of Algorithm 1. Since many malware samples are obfuscated, and thus have a renamed or removed
.text section, we use as code the first section that is marked as executable.
In this way, we ensure that each sample is correctly treated as specified in Algorithm 1.</p>
      <p>Experimental Results. We now present the results of our analysis, obtained by computing the relevance
of the four analyzed PE regions with Integrated Gradients (IG), applied to both MalConv
and BBDNN. For each model, we report the average values computed for the four analyzed
regions of PE files, both for goodware and for malware; all results are presented in Table 1. Even
in this simplified setting, we can observe that: (i) on average, both models rely more on the bytes
found in the executable section, as evidenced by the positive aggregated scores; nevertheless,
these scores are diminished by the presence of the spurious correlations; (ii) there is a notable distinction
between goodware and malware samples: in particular, the values in the Slack regions are higher for
goodware than for malware, suggesting that the models might focus
too much on unused space; (iii) the aggregated scores of the two models appear largely
similar on average, although this analysis does not account for the models’ different input sizes;
(iv) for both models and across both datasets, the relevance attributed to the overlay region is always
zero, which may indicate that the models are not attributing relevance to those bytes. However, it is
important to notice that many samples are longer than the models’ input sizes: there are 283
files larger than 100 KB (BBDNN’s input space) and 100 files larger than 1 MB (MalConv’s input space),
so part of their content, possibly including the overlay at the end of the file, never reaches the model.
      </p>
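      <p>To make the effect of the input window concrete, the following sketch shows how a byte sequence could be truncated and padded before reaching a model, and how a region lying beyond the window disappears. It is an illustration only: the constants assume that 1 MB and 100 KB mean 2**20 and 100 * 1024 bytes, the padding value 256 is assumed to match the special character used for the empty-file baseline, and the helper names are hypothetical rather than taken from maltorch.</p>
      <preformat preformat-type="code">
```python
from typing import List, Tuple

PAD_VALUE = 256                # assumed padding value, outside the byte range 0..255
MALCONV_MAX_LEN = 2 ** 20      # 1 MB window (assumed to mean 2**20 bytes)
BBDNN_MAX_LEN = 100 * 2 ** 10  # 100 KB window (assumed to mean 100 * 1024 bytes)


def to_model_input(raw: bytes, max_len: int) -> List[int]:
    """Truncate a byte string to max_len and pad shorter inputs with PAD_VALUE."""
    values = list(raw[:max_len])
    values.extend([PAD_VALUE] * (max_len - len(values)))
    return values


def visible_part(region: Tuple[int, int], max_len: int) -> Tuple[int, int]:
    """Clip a half-open [start, end) region to the model input window."""
    start, end = region
    return (min(start, max_len), min(end, max_len))


raw = bytes(range(10))
print(to_model_input(raw, 4))   # first four bytes only
print(to_model_input(raw, 12))  # ten bytes followed by two padding values

# Hypothetical overlay located after the first 150,000 bytes of a file.
overlay = (150_000, 160_000)
print(visible_part(overlay, BBDNN_MAX_LEN))    # empty range: overlay is cut off
print(visible_part(overlay, MALCONV_MAX_LEN))  # unchanged: overlay fits the window
```
      </preformat>
      <p>Under these assumptions, a region that starts beyond the window is clipped to an empty range, so no attribution can be assigned to it regardless of what the model has learned.</p>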
      <sec id="sec-4-1">
        <p>These trends are summarized by the average response of Integrated
Gradients in Figure 3, where it is clear that plenty of relevance is
attributed to bytes in executable sections, but there are peaks in other
regions as well. Also, this aggregated view highlights that it is very
likely that many other spurious correlations have been learned by these
models, located in places other than the ones we have analyzed.</p>
        <p><fn id="fn2"><label>2</label><p>https://github.com/zangobot/maltorch</p></fn></p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>
        Limitations and Future Work. Our preliminary analysis is timely, but it has some limitations. First,
we tested our methodology on a small-scale dataset using only two state-of-the-art end-to-end neural
networks. Since this analysis depends on both data and trained models, results might change accordingly.
Nevertheless, the dataset we have used is balanced, both in terms of the ratio between goodware and
malware and in terms of the malware families we have considered. Second, our results do not
consider the length of the input window of each network. Hence, larger sections, which are kept
without being cut, contribute more to the total score computed for the model. However, with our
methodology we are characterizing how much the spurious correlations contribute to the norm of the
calculated attribution, showing that they absorb a substantial fraction of it. Thus, as future work, we
will consider larger datasets, such as the whole Speakeasy dataset [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and other state-of-the-art
data sources, also including other end-to-end neural networks for malware detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Lastly, while
we only covered spurious correlations in static analysis, we will extend our work to also cover spurious
correlations in dynamic analysis, which tracks the execution of programs and trains models on summary
reports of their activity.
      </p>
      <p>Final Remarks. In this preliminary analysis, we proposed a methodology for quantifying the impact
of spurious correlations in end-to-end Windows malware detectors. While most of the relevance
is focused inside the section containing code, a substantial amount of the attribution is also given to
spurious correlations. Through our analysis, we introduce a way to characterize such reliance on spurious
correlations in malware detectors, paving the road towards novel techniques that can, in the near future,
serve as the first benchmark of these technologies, providing a more reliable way to deploy them.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors acknowledge the help of Daniel Gibert for releasing trained models in the maltorch library.
This work was partially supported by projects SERICS (PE00000014), FAIR (PE00000013) under the
NRRP MUR program funded by the EU - NGEU, and FISA-2023-00128 funded by the MUR program
“Fondo italiano per le scienze applicate”.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors declare that no generative AI tools were used to write the manuscript.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Ember: an open dataset for training static pe malware machine learning models</article-title>
          , arXiv preprint arXiv:1804.04637
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ponte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Trizna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Demetrio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. T.</given-names>
            <surname>Ogbu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roli</surname>
          </string-name>
          ,
          <article-title>Slifer: Investigating performance and robustness of malware detection pipelines</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>150</volume>
          (
          <year>2025</year>
          )
          <fpage>104264</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Raff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sylvester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brandon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catanzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Nicholas</surname>
          </string-name>
          ,
          <article-title>Malware detection by eating a whole EXE</article-title>
          ,
          <source>in: Workshops at the 32nd AAAI Conf. on Art. Int.</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Krčál</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Švec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bálek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Jašek</surname>
          </string-name>
          ,
          <article-title>Deep convolutional malware classifiers can learn from raw executables and labels only</article-title>
          ,
          <source>in: ICLR Workshop</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Coull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gardner</surname>
          </string-name>
          ,
          <article-title>Activation analysis of a byte-based deep neural network for malware classification</article-title>
          ,
          <source>in: 2019 IEEE Security and Privacy Workshops (SPW)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gibert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mateu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Planes</surname>
          </string-name>
          ,
          <article-title>The rise of machine learning for detection and classification of malware: Research developments, trends and challenges</article-title>
          ,
          <source>Journal of Network and Computer Applications</source>
          <volume>153</volume>
          (
          <year>2020</year>
          )
          <fpage>102526</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Demetrio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lagorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Armando</surname>
          </string-name>
          ,
          <article-title>Explaining vulnerabilities of deep learning to adversarial malware binaries</article-title>
          ,
          <source>in: ITASEC</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Explaining ai for malware detection: Analysis of mechanisms of malconv</article-title>
          ,
          <source>in: 2020 International Joint Conference on Neural Networks (IJCNN)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Axiomatic attribution for deep networks</article-title>
          ,
          <source>in: Int. Conf. on Machine Learning (ICML)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3319</fpage>
          -
          <lpage>3328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kharkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Filar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Evading machine learning malware detection</article-title>
          ,
          <source>Black Hat</source>
          <year>2017</year>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Demetrio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lagorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Roli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Armando</surname>
          </string-name>
          ,
          <article-title>Functionality-preserving black-box optimization of adversarial windows malware</article-title>
          ,
          <source>IEEE Transactions on Information Forensics and Security</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          <fpage>3469</fpage>
          -
          <lpage>3478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Arp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Quiring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pendlebury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Warnecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pierazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wressnegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cavallaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rieck</surname>
          </string-name>
          ,
          <article-title>Dos and don'ts of machine learning in computer security</article-title>
          ,
          <source>in: 31st USENIX Security Symposium (USENIX Security 22)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3971</fpage>
          -
          <lpage>3988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Trizna</surname>
          </string-name>
          ,
          <article-title>Quo vadis: hybrid machine learning meta-model based on contextual and behavioral malware representations</article-title>
          ,
          <source>in: Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <article-title>Language modeling with gated convolutional networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>933</fpage>
          -
          <lpage>941</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>