<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Quarantine for Suspicious Mail?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikita Benkovich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Dedenok</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Golubev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kaspersky</institution>
          ,
          <addr-line>Moscow 125212</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Nikita.Benkovich</institution>
          ,
          <addr-line>Roman.Dedenok, Dmitry.S.Golubev</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we introduce DeepQuarantine (DQ), a cloud technology to detect and quarantine potential spam messages. Spam attacks are becoming more diverse and can be harmful to email users. Despite the high quality and performance of spam filtering systems, detecting a new spam campaign can take some time, and in the meantime some unwanted messages get delivered to users. To solve this problem, we created DQ, which detects potential spam and keeps it in a special Quarantine folder for a while. The time gained allows us to double-check the messages and improve the reliability of the anti-spam solution. Due to the high precision of the technology, most of the quarantined mail is spam, which allows clients to use email without delay. Our solution is based on applying Convolutional Neural Networks to MIME headers to extract deep features from large-scale historical data. We evaluated the proposed method on real-world data and showed that DQ enhances the quality of spam detection.</p>
      </abstract>
      <kwd-group>
        <kwd>spam filtering</kwd>
        <kwd>spam detection</kwd>
        <kwd>deep learning</kwd>
        <kwd>cloud technology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
Nowadays it is hard to imagine life without e-mail communication, particularly
in business. The growth of e-mail's popularity is explained by the low cost and
high effectiveness of exchanging messages. The same factors contribute to the
increasing amount of spam. According to a report by Kaspersky [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the average
percentage of spam in global mail traffic in Q1-Q2 2019 was 57.64%, up 1.67
p.p. compared to the previous reporting period. The largest share of spam was
recorded in May (58.71%). In Q2 2019, Kaspersky alone detected more than 43
million malicious email attachments and about 130 million phishing attacks.
These statistics show that spam campaigns are a serious threat today. A large
amount of spam in the mailbox degrades performance, wastes storage space and
makes e-mail inconvenient to use. Moreover, spam messages can carry malicious
content, phishing and fraud schemes, which can harm both casual users and
businesses around the world.
      </p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Anti-spam software companies aim to protect users against malicious mail
and, crucially, to ensure that all legitimate messages are delivered. Otherwise,
even one misclassified message, for example from a business conversation, can
lead to significant reputational risks. To keep the false positive rate low, anti-spam
decisions must be very reliable, which inevitably reduces the detection rate. To solve
this problem, commercial anti-spam products delay potential spam messages and
recheck them after a certain time to improve the reliability of the anti-spam
solution. Axway Inc. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] described a delay technique in an e-mail filtering
system, which provides storage and a transmission path for quarantined data.
This mechanism has proven reliable, and its various modifications are now used
by many companies such as Cisco, Barracuda and others.
      </p>
      <p>
        In this paper, we describe a novel approach to quarantining messages. Our
work focuses on applying Deep Learning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] techniques to MIME (Multipurpose
Internet Mail Extensions) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] headers to classify potential spam. Unlike most
research papers, our solution does not process the body content of a message. The
proposed architecture has three inputs: the character sequence of the Message-ID,
the sequence of headers, and the X-Mailer. To extract information from sequential
data, we use a one-dimensional convolutional neural network (CNN). This method has
been applied to character-level text classification [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. It has been shown that this approach can
be competitive with traditional solutions, for example with a simple long short-term
memory network (LSTM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Moreover, unlike an LSTM, a CNN does not depend on the computations
of previous states, so the positions of a sequence can be processed in parallel. This
speeds up inference, which is extremely important in real-time services.
      </p>
      <p>We evaluated our approach on a large-scale dataset. The experiments
show that combining our proposed model with traditional spam filters
improves classification rates.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Cybercriminals continue to look for new ways to spread spam and to improve
previous techniques. Traditional signature-based approaches are becoming less effective
than in previous years. The reasons are their poor generalization ability and the
need for human resources to find new attacks and develop signatures to block
them.</p>
      <p>
        Machine learning techniques have recently become very effective in fighting
spam. Most research papers propose different methods to handle the body content
of a message. The authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] suggested a defence strategy against poisoning attacks, in which
spammers enrich messages with legitimate words to defeat filters. They showed
that bagging ensembles could be very promising for this task. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] the authors
applied deep learning and transfer learning techniques to detect different attacks
such as phishing, social engineering, propaganda and others. The authors of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] demonstrated a
phishing content classifier based on a recurrent neural network.
      </p>
      <p>
        There are also related works that use non-content features for spam
detection. The authors of [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] noted that message headers are a powerful source of features for spam
filtering. Their experiments showed that using only header features could
achieve comparable or better performance than using body content. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the authors proposed hand-crafted methods to extract features from e-mail
headers and evaluated the performance of various machine learning classifiers on
a prepared corpus.
      </p>
      <p>
        Publicly available benchmark datasets for e-mail spam, reviewed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], are
not regularly updated and thus do not reflect current threats. Publication of real
email collections is almost impossible since this data is subject to numerous
confidentiality and legal restrictions. Moreover, available datasets can be highly
biased because they contain conversations within a small group of users. For
example, the popular Enron corpus [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is deemed to be in the public domain
as a result of an investigation after the company's collapse and contains only
communications between Enron employees. These factors complicate research in
this area and the application of the proposed methods in the real world.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>In this section, we introduce the design of DQ. We describe three main parts of
the new technology. First, we focus on the backend logic, which is responsible for
message transactions and the system-customer relationship. Then we describe
preprocessing of message headers. Finally, we present the design of the spam
classification model.</p>
      <sec id="sec-3-1">
        <title>Backend logic</title>
        <p>As shown in Figure 1a, in the origin-based scheme messages are processed by a
complex system of spam filters before being delivered to the user. Moreover, spam filters are
regularly updated because the statistical properties of spam campaigns change over
time. This approach works and can provide a high detection
rate. In practice, however, most missed spam is detected shortly after the filters are updated.
Unfortunately, this scheme still delivers these messages to users because the
spam decision is made only once, when a message is received.</p>
        <p>To solve this problem, we implemented DQ, illustrated in Figure 1b. DQ
is a cloud technology that provides request-response logic for the anti-spam
service installed on users' machines. The main objective of DQ is to classify
potential spam. After a message has passed through the filters, the service sends
a request with the message headers to DQ and waits for a response. Meanwhile,
DQ processes the input data and returns true if the message should be delayed, or false
otherwise. Of course, in practice, organizing this communication to process
the big data accumulated from different user nodes is not a trivial task. We
do not go into implementation details and focus on the logic of the technology.
As shown in Figure 1b, suspicious mail is put in the Quarantine folder for a
while, and other messages are delivered to the user. When the time is over, quarantined
messages pass through the filters again. Note that DQ only receives the required
headers and returns the quarantine decision; all delayed mail in the Quarantine
folder is stored on the user's PC.</p>
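        <p>The request-response flow above can be sketched as a minimal client-side illustration in Python. The function names (<monospace>handle_incoming</monospace>, <monospace>recheck</monospace>) and the callables passed to them are hypothetical stand-ins for the locally installed filters and the request to the DQ cloud service. Only headers are passed around, mirroring the fact that DQ never sees the message body, and the Quarantine folder is kept locally.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation: DQ only ever receives headers, so a message
# is modelled here as a dict mapping header names to values.
Headers = Dict[str, str]
Verdict = Callable[[Headers], bool]


@dataclass
class QuarantineFolder:
    """Quarantine folder kept locally on the user's PC, not in the cloud."""
    held: List[Headers] = field(default_factory=list)


def handle_incoming(headers: Headers, local_filters: Verdict,
                    dq_request: Verdict, folder: QuarantineFolder) -> str:
    """Route one message to 'spam', 'quarantine' or 'deliver'."""
    if local_filters(headers):
        return "spam"                 # already caught by the regular filters
    if dq_request(headers):           # DQ answered True: delay the message
        folder.held.append(headers)
        return "quarantine"
    return "deliver"                  # DQ answered False: deliver now


def recheck(folder: QuarantineFolder,
            updated_filters: Verdict) -> Dict[str, List[Headers]]:
    """After the hold time, pass quarantined mail through the filters again."""
    outcome: Dict[str, List[Headers]] = {"spam": [], "deliver": []}
    for headers in folder.held:
        key = "spam" if updated_filters(headers) else "deliver"
        outcome[key].append(headers)
    folder.held.clear()
    return outcome
```
        </preformat>
        <p>In a real deployment the two verdict callables would be the installed filter engine and a network call to the DQ service; in this sketch any function returning a bool will do.</p>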
        <p>The proposed scheme gains time to update the filters and
double-check suspicious messages, which improves the reliability of the anti-spam solution.
Moreover, this implementation provides a low-cost way to update the model, which
is extremely important for adapting to new spam tactics or changing mail transfer
protocols.</p>
        <fig id="fig1">
          <label>Figure 1</label>
          <caption>
            <p>(a) Origin-based; (b) DQ implementation.</p>
          </caption>
        </fig>
        <p>It is well known that feature selection plays a big role in model performance.
An e-mail provides a large amount of information about the sender and the content
of the message. Some of this data can be completely useless and add unwanted
noise that lowers model performance. Our solution is based on
non-content classification. This allows us to transfer data to the cloud
service and collect it in a simpler way than in the
content-based case. Another important aspect is the ability to quickly extract features
from a message. Since DQ is a real-time service, performance is very important
to ensure email communication without delay.</p>
        <p>At present, the model takes three inputs: the Message-ID, the sequence of message headers
(HeaderSeq) and the X-Mailer. To bypass protection systems and spread malicious
mail, spammers often use their own Mail User Agent (MUA). MUAs are
responsible for preparing email messages for transfer to a Mail Transfer Agent
(MTA). One of the MUA's tasks is to create and fill in correct MIME headers. Some
attackers ignore this and may use random content for headers. Others try to
fake headers to make them look like real ones. We focus on the Message-ID and
HeaderSeq for several reasons. Firstly, these features have a non-trivial structure.
Secondly, the form of the Message-ID and the order of headers in HeaderSeq can vary
depending on the type of MUA, which creates a tight connection between the
features. This makes the headers harder to fake consistently, which helps the model to
detect spam. We also added the X-Mailer to identify the MUA. Below we describe the features
and their representation for the classifier.</p>
        <p>The Message-ID provides an identifier for a message and looks like a sequence
of US-ASCII characters between a pair of angle brackets.
The Message-ID consists of two parts separated by @. The left part of the
Message-ID is a hash that has a specific structure for different MUAs. The right part
is a domain. The Message-ID is transformed into a tensor with l rows, where each row
vector is a character embedding. For encoding, we build a vocabulary that maps
US-ASCII characters (without special characters) to trainable embeddings. In addition,
we added two symbols: &lt;EOS&gt; for the end of a string and &lt;UNK&gt; for
unknown characters. If the Message-ID is longer than l, the first
l characters are taken. If it is shorter than l, the sequence
is padded with &lt;EOS&gt; to length l.</p>
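        <p>A minimal sketch of this Message-ID encoding follows. It assumes l = 64 and a 16-dimensional embedding (the paper gives neither value) and reads "without special characters" as excluding whitespace and control characters, so that the @ separator stays in the vocabulary; that reading is an assumption. The strings "EOS" and "UNK" stand in for the two extra symbols, and the embedding table is randomly initialised here, whereas in the model it is learned.</p>
        <preformat preformat-type="code">
```python
import random
import string

# Illustrative assumptions, not values from the paper:
MAX_LEN = 64      # fixed length l of the encoded Message-ID
EMB_DIM = 16      # size of each character embedding
EOS, UNK = "EOS", "UNK"   # end-of-string and unknown-character symbols

# Letters, digits and punctuation; whitespace and control characters map to UNK.
VOCAB = [EOS, UNK] + list(string.ascii_letters + string.digits + string.punctuation)
INDEX = {tok: i for i, tok in enumerate(VOCAB)}


def encode_message_id(message_id, max_len=MAX_LEN):
    """Return exactly max_len vocabulary indices for one Message-ID."""
    ids = [INDEX.get(ch, INDEX[UNK]) for ch in message_id[:max_len]]  # first l chars
    ids += [INDEX[EOS]] * (max_len - len(ids))                        # pad with EOS
    return ids


def init_embeddings(vocab_size, dim, seed=0):
    """Gaussian-initialised table; during training these rows are learned."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """One embedding row per position, giving an l x d matrix."""
    return [table[i] for i in ids]
```
        </preformat>
        <p>For a hypothetical identifier such as 1a2b3c@mail.example.com, the encoder yields 23 character indices followed by 41 EOS indices.</p>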
        <p>The HeaderSeq is the sequence of MIME headers in the message. The order of
headers can vary depending on the type of MUA. HeaderSeq is encoded with the same
scheme as the Message-ID; the only difference is that we operate on header names
rather than characters. For example,
<monospace>subject:from:to:date:message-id:content-type:</monospace>
is a possible HeaderSeq. The final representation is a tensor with a fixed shape,
where each row is an encoded header. The number of rows was estimated from
statistics as the 95th percentile of HeaderSeq length.</p>
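        <p>The HeaderSeq pipeline can be sketched in the same way. This illustration assumes that header names are lower-cased and read in order of appearance with the standard-library <monospace>email.parser.HeaderParser</monospace>, that the vocabulary is built from header names seen in the training data, and that the 95th percentile is computed with the nearest-rank method; the paper does not specify any of these details.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter
from email.parser import HeaderParser

EOS, UNK = "EOS", "UNK"   # same special symbols as for the Message-ID


def header_names(raw_headers):
    """Header names in order of appearance, lower-cased (duplicates kept)."""
    msg = HeaderParser().parsestr(raw_headers)
    return [name.lower() for name in msg.keys()]


def percentile_length(seqs, q=0.95):
    """Nearest-rank q-percentile of sequence lengths; used as the row count."""
    lengths = sorted(len(s) for s in seqs)
    rank = max(1, math.ceil(q * len(lengths)))
    return lengths[rank - 1]


def build_vocab(seqs, min_count=1):
    """Map every header name seen at least min_count times to an index."""
    counts = Counter(name for s in seqs for name in s)
    names = sorted(n for n, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate([EOS, UNK] + names)}


def encode_header_seq(seq, vocab, max_len):
    """Truncate or pad with EOS so that every HeaderSeq has max_len rows."""
    ids = [vocab.get(name, vocab[UNK]) for name in seq[:max_len]]
    return ids + [vocab[EOS]] * (max_len - len(ids))
```
        </preformat>
        <p>In practice <monospace>percentile_length</monospace> would be run once over the training set and its result passed as <monospace>max_len</monospace> for every message.</p>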
        <p>The X-Mailer is the name of the MUA. Before encoding, we preprocess the
X-Mailer to keep only information about the type of MUA. For a known e-mail
program, we drop information about the version and release. For example,
<monospace>Microsoft Windows Live Mail 14.0.8117.416</monospace>
is transformed to Microsoft. This significantly reduces the size of the feature
space. We also conducted experiments that used both the name and version of the MUA,
but this did not improve performance. For unknown e-mail programs, we
created a special category. The resulting categories are one-hot encoded.</p>
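        <p>A minimal sketch of the X-Mailer normalisation and one-hot encoding. The list of known MUA families is hypothetical (the paper does not publish its list), and keeping only the first word of the header is an assumed rule that reproduces the Microsoft example above. A missing or empty header is also mapped to the unknown category, which is likewise an assumption.</p>
        <preformat preformat-type="code">
```python
# Hypothetical list of known MUA families (lower-cased); not the paper's list.
KNOWN_MUAS = ["microsoft", "apple", "thunderbird", "phpmailer", "mutt"]
UNKNOWN = "unknown"                 # special category for unknown programs
CATEGORIES = KNOWN_MUAS + [UNKNOWN]


def normalize_x_mailer(value):
    """Keep only the MUA family, dropping product name, version and release."""
    tokens = (value or "").split()
    if not tokens:
        return UNKNOWN              # missing or empty X-Mailer header
    family = tokens[0].lower()
    return family if family in KNOWN_MUAS else UNKNOWN


def one_hot(category):
    """One-hot vector over CATEGORIES."""
    return [1 if c == category else 0 for c in CATEGORIES]
```
        </preformat>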
      </sec>
      <sec id="sec-3-2">
        <title>Classifier design</title>
        <p>In this section, we describe the architecture of the spam classifier. Although
DQ does not block messages, we cannot delay all of them for re-checking,
because this would significantly delay e-mail delivery. Moreover, to ensure
that e-mail works without delays, DQ has a time limit for the response. If the
limit is exceeded, the message is delivered to the user without applying DQ. For these
reasons, there is a trade-off between model complexity and computation time.</p>
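        <p>The response time limit can be illustrated with the standard <monospace>concurrent.futures</monospace> module. In this sketch the 200 ms budget, the pool size and the function name are hypothetical; the point is only the fallback rule from the text: if DQ does not answer in time, the message is delivered without a quarantine decision.</p>
        <preformat preformat-type="code">
```python
import concurrent.futures

RESPONSE_LIMIT_S = 0.2   # hypothetical budget; the paper does not state the value
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def quarantine_decision(dq_request, headers, limit=RESPONSE_LIMIT_S):
    """Return True to quarantine; fall back to delivery if DQ is too slow."""
    future = _POOL.submit(dq_request, headers)
    try:
        return bool(future.result(timeout=limit))
    except concurrent.futures.TimeoutError:
        # Time is over: deliver the message without applying DQ. The pending
        # request keeps running in the pool, but its answer is ignored.
        return False
```
        </preformat>
        <p>A tight budget bounds the extra delivery latency that DQ can add, which is one reason the model itself has to stay cheap to evaluate.</p>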
        <p>
          Figure 2 shows the model architecture. Following [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], for the Message-ID
and HeaderSeq we applied a temporal CNN to extract features from sequential
data. This kind of CNN applies convolutional filters along one dimension (time) and
covers all units along the other (the embedding dimension). We also used the
one-dimensional version of the max-pooling module applied in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
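        <p>To make "filters along one dimension that cover all units of the other" concrete, here is a dependency-free sketch of a temporal convolution and a temporal max-pooling layer in pure Python. It uses valid padding and stride 1 for the convolution and non-overlapping windows for pooling; these are illustrative choices, and a production model would use a deep learning framework.</p>
        <preformat preformat-type="code">
```python
def temporal_conv1d(x, weights, bias):
    """Temporal convolution with 'valid' padding and stride 1, followed by ReLU.

    x:       T x d input, one row per time step (e.g. a character embedding)
    weights: F filters, each k x d; a filter spans k time steps and ALL d units
    bias:    F bias terms
    returns: (T - k + 1) x F feature map
    """
    k, d = len(weights[0]), len(x[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for w, b in zip(weights, bias):
            s = b + sum(w[j][c] * x[t + j][c] for j in range(k) for c in range(d))
            row.append(max(0.0, s))            # ReLU activation
        out.append(row)
    return out


def temporal_max_pool(x, size):
    """Max over non-overlapping windows of `size` time steps, per feature."""
    n_feat = len(x[0])
    return [[max(x[t + j][f] for j in range(size)) for f in range(n_feat)]
            for t in range(0, len(x) - size + 1, size)]
```
        </preformat>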
        <p>
          For the Message-ID we designed a subnet with four temporal convolution
layers, each with a fixed number of filters. We used ReLU as the activation
function. The first layer has the largest filter size, to extract information
from longer subsequences. After the first and last layers, we inserted a temporal
max-pooling layer to ensure stability of training. In the HeaderSeq branch, we
used two layers: a temporal convolution layer and a temporal max-pooling layer.
This shallow architecture is a result of the short length of HeaderSeq. The
outputs of the convolutional nets are concatenated with the encoded X-Mailer into
a one-dimensional tensor, as illustrated in Figure 2. Finally, we added two fully
connected layers with dropout [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] between them for regularization. We
used a sigmoid activation to obtain the spam probability as the model's output.
        </p>
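        <p>The tensor sizes flowing through this architecture can be traced with a few lines of arithmetic. All hyperparameters below (sequence lengths, filter counts, kernel and pooling sizes, the size of the X-Mailer one-hot vector and the hidden width) are hypothetical, since the paper does not report them. The sketch only encodes the structure described above: four convolutions with the widest kernel first and pooling after the first and last, one convolution plus pooling for HeaderSeq, concatenation with the X-Mailer vector, and a two-layer fully connected head.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConvLayer:
    filters: int                 # number of temporal filters
    kernel: int                  # filter width in time steps
    pool: Optional[int] = None   # size of a following temporal max-pool, if any


def branch_size(seq_len: int, layers: List[ConvLayer]) -> int:
    """Flattened output size of a stack of 'valid' convolutions and pools."""
    length, channels = seq_len, 0
    for layer in layers:
        length = length - layer.kernel + 1
        if layer.pool:
            length //= layer.pool
        if not length > 0:
            raise ValueError("input sequence too short for this stack")
        channels = layer.filters
    return length * channels


def head_params(in_dim: int, hidden: int) -> int:
    """Weights and biases of FC(in_dim, hidden), dropout, FC(hidden, 1)."""
    return in_dim * hidden + hidden + hidden + 1


# Hypothetical configuration, chosen only to illustrate the structure.
MSG_ID_LAYERS = [ConvLayer(64, 7, pool=3), ConvLayer(64, 5),
                 ConvLayer(64, 3), ConvLayer(64, 3, pool=3)]
HEADER_LAYERS = [ConvLayer(32, 3, pool=2)]
X_MAILER_DIM = 6
HIDDEN = 128

concat_dim = (branch_size(64, MSG_ID_LAYERS) + branch_size(20, HEADER_LAYERS)
              + X_MAILER_DIM)
```
        </preformat>
        <p>With these numbers the Message-ID branch yields 192 features, the HeaderSeq branch 288, and the concatenated vector passed to the fully connected head has 486 entries.</p>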
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>Many research papers state that a CNN usually requires large-scale
datasets to achieve competitive performance in different areas.
Unfortunately, public datasets for spam classification are fairly small and do not reflect
current threats because they are not regularly updated.</p>
      <p>In this work, we used a collection that consists of metadata from tens of
millions of real-time e-mail scans. We split the data into training and test datasets
by timestamp to avoid leaking information from the future into the past. We
sampled 120 million objects for training and 40 million objects for testing. In
both datasets, the proportion of spam is about 40 percent. We optimized the weights
of the model using SGD with a momentum of 0.9 to minimize the cross-entropy
loss. We initialized the model weights from a Gaussian distribution, trained
all layers jointly for nine epochs, and halved the learning rate every
three epochs.</p>
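      <p>The optimisation schedule described above can be sketched as follows. The base learning rate of 0.01 is a hypothetical value (the paper reports only the momentum, the number of epochs and the halving interval), and the momentum update uses the classical formulation, one of several equivalent ways of writing SGD with momentum. The time-based split mirrors the idea of training only on messages seen before a cutoff timestamp.</p>
      <preformat preformat-type="code">
```python
BASE_LR = 0.01   # hypothetical; not reported in the paper
MOMENTUM = 0.9
EPOCHS = 9


def lr_at(epoch, base_lr=BASE_LR, step=3, gamma=0.5):
    """Step schedule: halve the learning rate every `step` epochs (0-based)."""
    return base_lr * gamma ** (epoch // step)


def sgd_momentum_step(w, g, v, lr, mu=MOMENTUM):
    """Classical momentum: v = mu * v - lr * g, then w = w + v."""
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v


def time_split(records, cutoff):
    """Everything before `cutoff` is training data; the rest is test data."""
    train = [r for r in records if cutoff > r["ts"]]
    test = [r for r in records if not cutoff > r["ts"]]
    return train, test
```
      </preformat>
      <p>With these values the learning rate is 0.01 for epochs 1 to 3, 0.005 for epochs 4 to 6 and 0.0025 for epochs 7 to 9.</p>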
      <p>The PR curve in Figure 3 shows the model performance
on the test data. As mentioned earlier, the classifier should have high precision
so that legitimate e-mail messages are delivered without delay. We chose a probability
threshold at which the precision is 0.998 and the recall is 0.823. We
tested DQ with this classifier for 4 weeks in the real world.
Our internal tests showed that the proposed technology detects up to 30% of
previously missed spam.</p>
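      <p>Choosing an operating point on the PR curve amounts to sweeping the decision threshold. The sketch below, with a hypothetical function name, returns the lowest threshold whose precision reaches a target such as 0.998; because recall can only grow as the threshold decreases, this is also the qualifying threshold with the highest recall. Tied scores are treated as a single threshold.</p>
      <preformat preformat-type="code">
```python
def threshold_for_precision(scores, labels, target=0.998):
    """Return (threshold, precision, recall), or None if no threshold qualifies.

    A message is quarantined when its spam probability is >= threshold.
    labels are 1 for spam and 0 for legitimate mail.
    """
    total_spam = sum(labels)
    if not total_spam:
        return None
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best, tp, fp = None, 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Only evaluate once all messages sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            precision = tp / (tp + fp)
            if precision >= target:
                best = (score, precision, tp / total_spam)
    return best
```
      </preformat>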
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>This article proposes a non-content-based classification approach to delay
potential spam messages in real time. On the one hand, we demonstrated a novel
feature set and a way to handle it for a spam classification task. On the other
hand, we showed that this method is well-suited for enterprise solutions because it
has a simple update scheme, high performance and a low false positive rate.
Furthermore, by combining this technology with resource-intensive checks that require
additional time for verification or response (such as Whois requests, in-depth
content verification, etc.), we can obtain a fast and cost-effective system for detecting
spam messages.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bhowmick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hazarika</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          :
          <article-title>Machine Learning for E-mail Spam Filtering: Review, Techniques and Trends</article-title>
          .
          <source>arXiv preprint arXiv:1606.01042</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Biggio</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corona</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fumera</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacinto</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Bagging Classifiers for Fighting Poisoning Attacks in Adversarial Classification Tasks</article-title>
          . In:
          <source>Multiple Classifier Systems (MCS 2011)</source>
          , LNCS, vol.
          <volume>6713</volume>
          , pp.
          <fpage>350</fpage>
          -
          <lpage>359</lpage>
          , Springer-Verlag (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Boureau</surname>
            ,
            <given-names>Y.-L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponce</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Learning mid-level features for recognition</article-title>
          . In:
          <source>IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pp.
          <fpage>2559</fpage>
          -
          <lpage>2566</lpage>
          , IEEE, Finland (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dhamani</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azunre</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gleason</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcoran</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Honke</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kramer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Using Deep Networks and Transfer Learning to Address Disinformation</article-title>
          .
          <source>arXiv preprint arXiv:1905.10412</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <article-title>Delay technique in e-mail filtering system</article-title>
          .
          <source>Google Patents</source>
          , https://patents.google.com/patent/US20090157708A1/en. Last accessed 15 Sep 2019
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Halgas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrafiotis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nurse</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Catching the Phish: Detecting Phishing Attacks using Recurrent Neural Networks (RNNs)</article-title>
          . In:
          <source>20th World Conference on Information Security Applications</source>
          , Korea
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R. R.</given-names>
          </string-name>
          :
          <article-title>Improving neural networks by preventing co-adaptation of feature detectors</article-title>
          .
          <source>arXiv preprint arXiv:1207.0580</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngai</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A Scalable Intelligent Non-content-based Spam-filtering Framework</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>37</volume>
          (
          <issue>12</issue>
          ),
          <fpage>8557</fpage>
          -
          <lpage>8565</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Klimt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>The Enron corpus: A new dataset for email classification research</article-title>
          .
          <source>In: Proceedings of the 15th European Conference on Machine Learning</source>
          , pp.
          <fpage>217</fpage>
          -
          <lpage>226</lpage>
          , Pisa, Italy
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Qaroush</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khater</surname>
            ,
            <given-names>I. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Washaha</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Identifying spam e-mail based-on statistical header features and sender behavior</article-title>
          .
          <source>In: CUBE International Information Technology Conference</source>
          , pp.
          <fpage>771</fpage>
          -
          <lpage>778</lpage>
          , ACM, USA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <article-title>RFC 1521: Mechanisms for Specifying and Describing the Format of Internet Message Bodies</article-title>
          , https://tools.ietf.org/html/rfc1521. Last accessed 15 Sep 2019
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deep Learning in Neural Networks: An Overview</article-title>
          .
          <source>Neural Networks</source>
          <volume>61</volume>
          ,
          <fpage>85</fpage>
          -
          <lpage>117</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <article-title>Securelist: Spam and phishing in Q2 2019</article-title>
          , https://securelist.com/spam-and-phishing-in-q2-2019/92379/. Last accessed 15 Sep 2019
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>An Evaluation of Statistical Spam Filtering Techniques Spam Filtering as Text Categorization</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing (TALIP)</source>
          <volume>3</volume>
          (
          <issue>4</issue>
          ),
          <fpage>243</fpage>
          -
          <lpage>269</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Character-level convolutional networks for text classification</article-title>
          .
          <source>In: NIPS</source>
          , pp.
          <fpage>649</fpage>
          -
          <lpage>657</lpage>
          , Montreal, Canada (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>