<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DeepAnti-PhishNet: Applying Deep Neural Networks for Phishing Email Detection CEN-AISecurity@IWSPA-2018</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vinayakumar R</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barathi Ganesh HB</string-name>
          <email>barathiganesh.hb@arnekt.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anand Kumar M</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soman KP</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amrita School of Engineering</institution>
          ,
          <addr-line>Coimbatore Amrita Vishwa Vidyapeetham</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Arnekt Solutions Pvt. Ltd.</institution>
          ,
          <addr-line>Pentagon P-3, Magarpatta City Pune, Maharashtra</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Center for Cyber Security Systems and Networks, Amrita School of Engineering</institution>
          ,
          <addr-line>Kollam Amrita Vishwa Vidyapeetham</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>4</volume>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract</title>
      <p>Phishing represents a genuine risk to the Internet economy. Email has turned out to be a necessary communication tool in contemporary life, and it remains the medium most widely used to dispatch phishing attacks. As a result, detection of phishing emails is considered an important task in the field of Cybersecurity. In this working note, we use word embedding and Neural Bag-of-ngrams with deep learning methods such as convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM) and multi-layer perceptron (MLP) to detect phishing email. Both word embedding and Neural Bag-of-ngrams facilitate the extraction of the syntactic and semantic similarity of emails. Deep learning algorithms extract an abstract and optimal feature representation, which a fully connected layer with a non-linear activation function uses for classification. All experiments are done on the anti-phishing shared task corpus at IWSPA-AP 2018 (dasavisha.github.io/IWSPA-sharedtask/). All the models performed well during the training phase. Moreover, word embedding with LSTM obtains a 10-fold cross validation accuracy of 0.991 on sub task 1 (email without header) and 0.971 on sub task 2 (email with header). Based on the experimental results, we claim that word embedding with deep learning, specifically LSTM, is appropriate for the anti-phishing task.</p>
      <p>Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: R. Verma, A. Das (eds.): Proceedings of the 1st AntiPhishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA 2018), Tempe, Arizona, USA, 21-03-2018, published at http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        Email, or electronic mail, is an effective type of communication through electronic devices over the Internet. It is one of the fastest and most affordable methods of information exchange, varying from the interpersonal to the inter-corporate level, and ranging across continents as well as to outer space missions. The first message was sent through ARPANET (Advanced Research Projects Agency Network) from computer to computer on October 29th, 1969 by the US Department of Defense. In 1971, the currently known electronic mail was developed by Ray Tomlinson while creating ARPANET's networked email system. In the modern era, the usage of email has been increasing rapidly because it is fast, low cost, effective and very convenient to use. Nowadays, as smart phones and network access are abundant in most places, the usage of email has increased accordingly. As email became a primary communication tool, it finds application in almost all fields; hence marketers have also found its potential as a primary marketing tool. In 2018, the number of email users worldwide increased to 3.8 billion as per the Radicati research group Inc. and is expected to rise to 4.1 billion users by 2021 if the trend continues (Radicati Email Statistics Report 2017-2021, https://www.radicati.com/wp/wpcontent/uploads/2017/01/Email-Statistics-Report-2017-2021Executive-Summary.pdf). The number of business and consumer emails produced and delivered per day in 2018 reached 279 billion and is set to grow at a rate of 4.4% annually, resulting in 319.6 billion emails by the end of 2021. So almost half of the population uses email as a mode of communication these days. With its increasing popularity and ease of use, many people use it for inappropriate activities by sending illegitimate or spam emails [CL98].
      </p>
      <p>Through spam emails, people deliver all kinds of malicious attacks. The most frequently used type of malware attack through spam emails is the blended attack, which uses more than one method to deliver malware to an internal network. Blended attacks often start from illegitimate emails, which may not contain malware but provide links to compromised websites. Usually attackers craft emails so that they look legitimate to a normal user, mixing authentic links with false links whose URLs point to fake websites. As per the survey produced by IBM's X-Force research team, more than half of the emails produced worldwide are scam: the percentage of spam email amounted to 55.9% in the first quarter of 2017 and shows a gradual increase in the coming years. Spam mails may also contain phishing mails, at times resulting in leakage of sensitive information. As reported by APWG (APWG attack trends report, 2014), the number of phishing emails increased from 68,270 in 2014 to 106,421 in 2015. According to a Gartner report (Gartner Survey Shows Phishing Attacks Escalated in 2007), 109 million users received phishing email. Phishing can be delivered in several ways, by attaching files with malicious content or by sending a link to a compromised website. Various types of phishing attacks exist, and these are discussed in detail by PCWorld (2016 Mobile World Congress, 2016).</p>
      <p>In most cases, internet users fail to check the authenticity of emails and land on compromised websites due to a lack of security education. No method found so far provides 100% accuracy in checking whether someone has fallen into the trap of a phishing email. However, analyzing the header, checking the content of the email body for spelling or grammar mistakes, and identifying emails seeking personal information will help in most cases. Many works have been proposed to handle such scenarios [CNU06], [FST07], [ZDL07], [ANNWN07], [S+08], [TC10], [HAK13]. Recently, [CYT18] conducted a comprehensive literature survey on phishing attacks and approaches against them. Additionally, [GAP18] discussed the current issues, future directions and a taxonomy of methods for defending against phishing attacks. Methods based on blacklisting and heuristic approaches completely fail to detect new phishing emails or variants of existing ones [AGA+13].</p>
      <p>Alongside these traditional methods, artificial intelligence (AI) has become popular in the last few decades. AI uses supervised learning classification algorithms to perform binary classification of phishing emails. Machine learning methods rely on feature engineering to extract body-based features, header-based features, or a hybrid of both to detect phishing email. This can perform well in comparison to any of the previous methods used for phishing email detection, because training and building a classifier from given data is much easier than building a set of filtering rules. Additionally, such classifiers have the capability to detect variants of existing phishing emails or entirely new ones. Moreover, the recent development of machine learning models, typically called 'deep learning' models, has performed well in various long-standing artificial intelligence (AI) tasks in the fields of natural language processing, image processing and speech recognition. The application of deep learning techniques has been transferred to various Cybersecurity tasks [VSP18b], [VSP18a], [VSPSK18a], [VSPSK18b], [VSP17d], [VSP17i], [VSP17f], [VSVG17], [VSP17b], [VSP17h], [VSP17c], [VSP17a], [VSP17e], [VSP17g]. Word embedding is widely used in text classification across various domains [VKS17], [BVP]. Recently, deep learning with word embedding has been applied to email spam detection [RK], [EC]. Following this line, in this paper we use word embedding and Neural Bag-of-ngrams with deep learning models and MLP respectively to distinguish an email as phishing or legitimate.</p>
      <p>The sections in this paper are arranged as follows: Section 2 discusses the mathematical details of the algorithms. Section 3 includes the task description, email representation and proposed architecture. Section 4 provides the results. Finally, the conclusion is placed in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <title>Background</title>
      <p>The purpose of this section is to concisely discuss the various deep learning approaches, namely the multi-layer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN) and long short-term memory (LSTM) architectures.</p>
      <sec id="sec-3-1">
        <title>Artificial neural networks (ANNs)</title>
        <p>An artificial neural network (ANN) is a computational model influenced by the characteristics of biological neural networks. Feed forward neural network (FFN), convolutional neural network (CNN), recurrent neural network (RNN) and long short-term memory (LSTM) are various types of ANN.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Feed forward neural network (FFN)</title>
        <p>A feed forward neural network (FFN) forms a directed graph composed of nodes and edges. An FFN passes information along edges from one node to another without forming a cycle. The multi-layer perceptron (MLP) is one type of feed forward neural network that contains 3 or more layers, specifically an input layer, one or more hidden layers and an output layer, in which each layer has many neurons, called units in mathematical notation. The number of hidden layers is selected through a fine-tuning mechanism. The information is transformed from one layer to another in the forward direction without considering past values. Moreover, neurons in adjacent layers are fully connected. MLP is defined mathematically as O : R^m -&gt; R^n, where m is the size of the input vector x = (x_1, x_2, x_3, ..., x_m) and n is the size of the output vector O(x). The computation in each hidden layer hi_i is mathematically defined as</p>
        <p>hi_i = f(w_i^T x + b_i)   (1)</p>
        <p>hi_i : R^{d_{i-1}} -&gt; R^{d_i}, f : R -&gt; R, where w_i is the weight matrix of size d_i × d_{i-1}, b_i is the bias vector of size d_i, and f is a non-linear activation function, either the sigmoid (values in the range [0, 1]) or the tangent function (values in the range [-1, 1]). The sigmoid is defined as</p>
        <p>σ(z) = 1 / (1 + exp(-z))   (2)</p>
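As an illustration, the hidden-layer computation above with the sigmoid activation can be sketched in a few lines of plain Python (the weight and bias values below are arbitrary placeholders, not trained parameters):

```python
import math

def sigmoid(z):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, W, b):
    # One MLP hidden layer: hi = f(W x + b), one sigmoid unit per row of W.
    return [sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

# Toy example: 3-dimensional input, 2 hidden units (placeholder weights).
x = [1.0, 0.5, -0.5]
W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]
b = [0.1, -0.1]
h = hidden_layer(x, W, b)
```

Stacking several such layers, each feeding its output to the next, gives the full MLP forward pass.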
      </sec>
      <sec id="sec-3-3">
        <title>Convolutional neural network (CNN)</title>
        <p>The convolutional neural network (CNN) is most commonly used in the field of computer vision [LBH15]. It has also been used for text classification tasks [Kim14], [KGB14] and has been transferred to email spam detection [LNRW]. CNN is fairly effective and considerably faster for training and predictive evaluation in sequential data modeling problems. A CNN network contains a convolution1d layer, a pooling1d layer (maxpooling1d or minpooling1d) and a fully connected network with a non-linear activation function. Generally, the convolution1d layer extracts the optimal features, maxpooling1d reduces the dimension of the CNN layer features and the fully connected layer is used for classification. A CNN may involve varying numbers of such convolutional layers and may finally be terminated with linear fully connected or partially connected layers. The number of layers and the number of filters decide the performance of these networks. More and more abstract features are extracted in the higher layers of such networks, and thus the number of layers required heavily depends on the complexity and non-linearity of the data under analysis. Further, the number of filters in each stage decides the number of features extracted from that stage. Proper choice of these numbers and their tuning is a difficult task, and several works in the literature discuss this [Che90]. The more layers and filters, the more computation is required; hence it is important that concise designs are selected. Also, there is a high chance of selecting an overfitted model, which results in poor prediction accuracy. Techniques like 'dropout' are implemented during training to avoid this [SHK+14].</p>
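To make the convolution1d/maxpooling1d pipeline concrete, here is a minimal pure-Python sketch of one filter sliding over a sequence followed by max pooling (a toy illustration, not the paper's actual Keras configuration):

```python
def conv1d(seq, kernel):
    # "Valid" 1-D convolution (cross-correlation, as in deep learning
    # libraries): slide the kernel over the sequence and take dot products.
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def maxpool1d(feature_map, pool_size):
    # Non-overlapping max pooling: keep the largest value in each window,
    # shrinking the feature map by roughly a factor of pool_size.
    return [max(feature_map[i:i + pool_size])
            for i in range(0, len(feature_map), pool_size)]

signal = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
fmap = conv1d(signal, [1.0, -1.0])   # this filter responds to local changes
pooled = maxpool1d(fmap, 2)
```

In a real CNN many such filters run in parallel, their weights are learned, and the pooled outputs feed the fully connected classifier.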
      </sec>
      <sec id="sec-3-4">
        <title>Recurrent neural network (RNN)</title>
        <p>The recurrent neural network (RNN) is a variant of the traditional FFN, introduced in the 1980s for time-series data modeling [Elm90]. As an RNN has cyclic connections in its units, it can carry previous time-step information into the computation of the current states. It has obtained good performance in long-standing artificial intelligence tasks in the fields of computer vision, natural language processing, speech processing and others [BSF94]. The values of the hidden layer units are estimated recurrently by a transition function tf according to the current input vector x_t and the previous hidden state hi_{t-1}:</p>
        <p>hi_t = 0 if t = 0, otherwise hi_t = tf(hi_{t-1}, x_t)   (3)</p>
        <p>where tf is a mix of an affine transformation of x_t and hi_{t-1} with an element-wise non-linearity. This transition function tf is trained using backpropagation through time (BPTT). While backpropagating the error across many time-steps, the weight matrix has to be multiplied with the gradient signal repeatedly. This causes the vanishing gradient issue when a gradient becomes too small and the exploding gradient issue when a gradient becomes too large [HS97]. Long short-term memory (LSTM) is an extension of the RNN [GSC99], [GSS02], [MCCD13]; the memory block in LSTM handles the vanishing and exploding gradient problems by enforcing constant error flow.</p>
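The recurrence in equation (3) can be sketched for a scalar hidden state as follows (the weights are arbitrary toy values, and tanh stands in for the element-wise non-linearity):

```python
import math

def rnn_step(hi_prev, x_t, w_h, w_x, b):
    # Elman-style transition: hi_t = tanh(w_h * hi_{t-1} + w_x * x_t + b),
    # mixing an affine transformation of the input and the previous state.
    return math.tanh(w_h * hi_prev + w_x * x_t + b)

hi = 0.0                        # hi_0 = 0, as in the recurrence above
for x in [1.0, 0.5, -1.0]:      # unroll the cell over a short input sequence
    hi = rnn_step(hi, x, w_h=0.5, w_x=1.0, b=0.0)
```

Repeated multiplication by w_h during backpropagation through this loop is exactly what makes gradients vanish (|w_h| &lt; 1) or explode (|w_h| &gt; 1) over long sequences.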
        <p>Generally, a memory block in LSTM is composed of an input gate (ig), a forget gate (fg), an output gate (og), a memory cell (m) and a hidden state vector (hi) at each time step t. The values of the input gate (ig), forget gate (fg) and output gate (og) are in the range [0, 1]. The transition function (tf) for each LSTM unit is written as follows:</p>
        <p>ig_t = σ(w_ig x_t + P_ig hi_{t-1} + Q_ig m_{t-1} + b_ig)   (4)</p>
        <p>fg_t = σ(w_fg x_t + P_fg hi_{t-1} + Q_fg m_{t-1} + b_fg)   (5)</p>
        <p>og_t = σ(w_og x_t + P_og hi_{t-1} + Q_og m_{t-1} + b_og)   (6)</p>
        <p>m_t = fg_t ⊙ m_{t-1} + ig_t ⊙ tanh(w_m x_t + P_m hi_{t-1} + b_m)   (7)</p>
        <p>hi_t = og_t ⊙ tanh(m_t)   (8)</p>
        <p>where x_t is the input at time step t, σ is the sigmoid non-linear activation function and ⊙ denotes element-wise multiplication.
[Table 1 lists the word2vec configuration: batch size, embedding size, skip window, num skips, num samples and learning rate.]
All experiments are run on GPU-enabled TensorFlow [ABC+16] in conjunction with Keras [C+15]. All deep learning architectures are trained using the backpropagation through time (BPTT) [Wer90] technique. TensorFlow is an open source library for numeric computation using data flow graphs; it is the second generation of machine learning platforms developed by the Google Brain team after DistBelief. As the name suggests, TensorFlow represents a problem as a data flow model acting on N-dimensional arrays (tensors). The key advantage of the framework is its flexibility: the model can be mapped onto a range of hardware platforms, from a mobile device to massive GPU clusters.</p>
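As an illustration of the LSTM memory-block updates, a scalar LSTM step can be written in plain Python (the gate weights below are arbitrary toy values, and the peephole terms use m_{t-1} as in the gate equations; this is a sketch, not the Keras LSTM used in the experiments):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, hi_prev, m_prev, w):
    # One scalar LSTM step; `w` holds per-gate weights [w, P, Q, b]
    # (the candidate 'm' has no peephole term, so only [w, P, b]).
    ig = sigmoid(w['ig'][0]*x_t + w['ig'][1]*hi_prev + w['ig'][2]*m_prev + w['ig'][3])
    fg = sigmoid(w['fg'][0]*x_t + w['fg'][1]*hi_prev + w['fg'][2]*m_prev + w['fg'][3])
    og = sigmoid(w['og'][0]*x_t + w['og'][1]*hi_prev + w['og'][2]*m_prev + w['og'][3])
    cand = math.tanh(w['m'][0]*x_t + w['m'][1]*hi_prev + w['m'][2])
    m_t = fg * m_prev + ig * cand        # memory cell update
    hi_t = og * math.tanh(m_t)           # new hidden state
    return hi_t, m_t

weights = {'ig': [0.5, 0.1, 0.0, 0.0], 'fg': [0.5, 0.1, 0.0, 0.0],
           'og': [0.5, 0.1, 0.0, 0.0], 'm': [1.0, 0.1, 0.0]}
hi, m = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:               # run the cell over a short sequence
    hi, m = lstm_step(x, hi, m, weights)
```

Because the cell state m_t is carried forward additively, gated by fg_t, the error signal can flow back through many time steps without vanishing.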
      </sec>
      <sec id="sec-3-5">
        <title>Task description</title>
        <p>
          Phishing email detection is a task in the anti-phishing shared task at the first Security and Privacy Analytics Anti-Phishing shared task (IWSPA-AP 2018), co-located with the 8th ACM Conference on Data and Application Security and Privacy (http://www.ycheng.org/codaspy/2018/index.html). The anti-phishing shared task is an exercise in applied machine learning and text analysis in the domain of Cybersecurity. The email corpus was provided by the organizers of IWSPA-AP 2018 [EDMB+18]. The aim of the anti-phishing shared task is to build a classifier to detect phishing email among spam and legitimate ones. The given email corpus is highly unbalanced, primarily to make the task relatable to a real-world situation. Both of the sub tasks belong to the unconstrained category, which means participants can use any other external corpus during training. The anti-phishing shared task contains two sub tasks: sub task 1 contains email samples without headers and sub task 2 contains email samples with headers. The detailed statistics of the training and testing email corpus of each task are summarized in Table 5 and Table 6. The detailed description of both the sub task 1 and sub task 2 corpora and the baseline detection methodologies is summarized by the shared task organizers [EDB+18].
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>Problem de nition</title>
        <p>Let E = {e_1, e_2, ..., e_n} be a set of email samples and C = {c_1, c_2, ..., c_n} be the set of corresponding email types, legitimate or phishing, where n denotes the number of email samples. The task is to classify each given email sample as either legitimate or phishing.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Text Representation</title>
        <p>The first step is to map each email sample into a corresponding numeric vector representation. Two types of distributed text representation are used for email representation: (1) Word2vec and (2) Neural Bag-of-ngrams.</p>
        <p>1. Word2vec: Representing a word in dense vector form is called word embedding, which is a projection of words (tokens) into a vector space. The vector obtained from the projection captures semantic properties, which helps a natural language processing (NLP) system achieve better results than traditional bag-of-words representations. For this work, we implement the skip-gram model [MCCD13], [MSC+13], [Ron14], which predicts the context given the current word.</p>
        <p>2. Neural Bag-of-ngrams: The traditional bag-of-words is a representation of a multiset of words in which word order and grammar are ignored; in a bag-of-ngrams, on the other hand, a token is represented by a one-hot representation as the sum of n-gram vectors. Bag-of-ngrams is a sparse vector representation in which the semantics of the text is disregarded. To overcome this, Neural Bag-of-ngrams was introduced [LLZ+17], [JGBM16]. Neural Bag-of-ngrams vectors are dense, real-valued vector representations that also capture the semantics of the context. It is the combination of bag-of-ngrams and neural word embedding, and it is robust, simple and flexible.</p>
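As a small illustration of the skip-gram idea, the (current word, context word) training pairs within a window can be generated like this (a toy sketch; the actual word2vec training uses the configuration in Table 1):

```python
def skipgram_pairs(tokens, window):
    # For each position, pair the center word with every context word within
    # `window` positions; skip-gram learns to predict context from center.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "click here to verify your account".split()
pairs = skipgram_pairs(tokens, window=1)
```

Training then adjusts each word's embedding so that it scores its observed context words highly, which is what makes semantically similar words end up with nearby vectors.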
      </sec>
      <sec id="sec-3-8">
        <title>Proposed Architecture</title>
        <p>The proposed architecture to identify phishing emails is shown in Fig. 1. The same architecture is used for both sub tasks. The proposed tool, DeepAnti-PhishNet (github.com/vinayakumarr/IWSPA-AP-2018), is made publicly available. It can be adapted to any security text classification task and can work on other languages as well as code-mixed language. The architecture contains the following modules.</p>
        <p>Representation of emails: Two types of email representation are used: (1) word2vec and (2) Neural Bag-of-ngrams. Table 1 includes the detailed configuration of word2vec. In word embedding, we append every word's embedding, so an email's representation is a variable-length sequence of embeddings. Neural Bag-of-ngrams sums every word's embedding into a fixed-length one.</p>
        <p>Deep learning: The dense vectors obtained from word2vec and Neural Bag-of-ngrams are passed as input to the CNN/RNN/LSTM and MLP networks. All these algorithms capture the appropriate feature representation and pass it into the fully connected layer for classification. The detailed configurations of the MLP, CNN, RNN and LSTM architectures are reported in Tables 7, 8, 9 and 10 respectively.</p>
        <p>Classification: The units in this layer have connections to every unit in the succeeding layer; that is why this layer is called the fully-connected layer. It uses the sigmoid non-linear activation function, which gives values near 0 for legitimate and near 1 for phishing. The prediction loss for both sub tasks is estimated using binary cross entropy, given below:</p>
        <p>loss(p, e) = -(1/N) Σ_{i=1}^{N} [e_i log(p_i) + (1 - e_i) log(1 - p_i)]   (9)</p>
        <p>where p is the vector of predicted probabilities for all samples in the testing corpus and e is the vector of expected class labels, whose values are either 0 or 1.</p>
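A direct pure-Python transcription of this loss (with toy values; in practice the framework computes it internally during training):

```python
import math

def binary_cross_entropy(p, e):
    # loss = -(1/N) * sum(e_i*log(p_i) + (1-e_i)*log(1-p_i))
    n = len(p)
    return -sum(ei * math.log(pi) + (1 - ei) * math.log(1 - pi)
                for pi, ei in zip(p, e)) / n

preds  = [0.9, 0.2, 0.8]   # predicted probability of "phishing"
labels = [1,   0,   1]     # expected class labels
loss = binary_cross_entropy(preds, labels)
```

The loss shrinks toward 0 as the predicted probabilities approach the true labels and grows without bound for confident wrong predictions, which is what drives the gradient updates.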
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Though the sub tasks are unconstrained, only the provided training corpus is used with word2vec and Neural Bag-of-ngrams. Initially, preprocessing is done on the corpus. Preprocessing includes conversion of all characters to lower case, ignoring punctuation marks and special characters, and assigning a unique number to unknown words. This is primarily because distinguishing between lowercase and uppercase letters might end up causing a regularization issue. Word vectors of dimension 200 are estimated on the preprocessed data using word2vec and Neural Bag-of-ngrams. These methods capture the syntactic and semantic similarity of the words that exist in phishing and legitimate emails. The word vectors are passed to the deep learning algorithms, which learn an abstract, high-level feature representation and in turn pass it into the fully connected layer for classification.</p>
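The preprocessing step described above can be sketched as follows (the token-to-id scheme and the reserved unknown-word id are illustrative assumptions, not the paper's exact implementation):

```python
import re

def preprocess(text, vocab, unk_id=0):
    # Lower-case, replace punctuation/special characters with spaces, then
    # map each token to its vocabulary id; unknown words share one id.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [vocab.get(tok, unk_id) for tok in cleaned.split()]

vocab = {"verify": 1, "your": 2, "account": 3}
ids = preprocess("VERIFY your account NOW!!!", vocab)
```

The resulting id sequences are what the embedding layer looks up before the deep learning model sees the email.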
      <p>To find the hyperparameters of word2vec and the deep learning models, the given training data was randomly shuffled and split into 73% training and 27% testing. The best hyperparameters of word2vec, MLP, CNN, RNN and LSTM are displayed in Tables 1, 7, 8, 9 and 10 respectively. The 10-fold cross validation accuracy was computed for both email representations with the deep learning models.</p>
      <p>We submitted two runs for sub task 1: the first run is based on RNN with word2vec and the second run is based on LSTM with word2vec. This is because the 10-fold cross validation scores of the RNN and LSTM models are close. We submitted one run for sub task 2, based on MLP with Neural Bag-of-ngrams. The performance of the submitted runs was evaluated by the shared task organizers based on the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), reported in Table 3. From Table 3, we estimated the accuracy, precision, recall and f1-score reported in Table 2, using the standard definitions:</p>
      <p>Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1-score = 2 × Precision × Recall / (Precision + Recall).</p>
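For reference, these metrics can be computed from a confusion matrix as follows (the TP/TN/FP/FN values below are illustrative, not the shared-task results):

```python
def metrics(tp, tn, fp, fn):
    # Standard confusion-matrix metrics used to score the submitted runs.
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=80, tn=90, fp=10, fn=20)
```

On an imbalanced corpus such as this one, precision, recall and f1-score are more informative than accuracy alone, since a classifier that always predicts the majority class can still score high accuracy.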
      <p>From Tables 2, 3 and 4 we can observe that the model shows lower performance during testing than during training. This is because the email corpus used during the training process is very small. Moreover, the unbalanced email corpus has made the model biased. This can be alleviated by training a word2vec model on a larger email corpus and by training a more complex deep learning model.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Detecting phishing email among spam and legitimate ones is considered an important aspect of Cybersecurity. This is primarily because most of the internet's traffic was caused by phishing emails in the previous years. In this work, we use word embedding and Neural Bag-of-ngrams with deep learning algorithms such as CNN/RNN/LSTM and a traditional neural network, MLP. The performance of the deep learning methods with word embedding and of Neural Bag-of-ngrams with MLP is close. Moreover, the LSTM network performed well on both sub tasks. Due to computational cost and other constraints, we were not able to train more complex deep learning architectures. The performance of the system can be enhanced with a more complex deep learning architecture; such architectures can be trained using advanced hardware and a distributed training approach that we were unable to try.</p>
      <p>Both sub tasks belong to the unconstrained category, which means any other corpus can be used during training. The given corpora for both sub tasks are highly imbalanced. Even though the tasks are unconstrained, we did not use any external data sources. With this highly imbalanced corpus, the proposed methodology achieves a considerable phishing email detection rate in both sub tasks. The detection rate of the proposed methodology can easily be enhanced by adding extra publicly available or private data sources; this will be considered a significant direction for future work.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was supported in part by Paramount Computer Systems. We are grateful to NVIDIA India for the GPU hardware support through the research grant. We are grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.</p>
      <sec id="sec-6-1">
        <title>Mart n Abadi, Paul Barham, Jianmin</title>
        <p>Chen, Zhifeng Chen, Andy Davis,
Jeffrey Dean, Matthieu Devin, Sanjay
Ghemawat, Geo rey Irving, Michael Isard,
et al. Tensor ow: A system for
largescale machine learning. In OSDI,
volume 16, pages 265{283, 2016.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Ammar Almomani, BB Gupta, Samer</title>
        <p>Atawneh, A Meulenberg, and Eman
Almomani. A survey of phishing email</p>
        <p>ltering techniques. IEEE
communications surveys &amp; tutorials, 15(4):2070{
2090, 2013.
[ANNWN07] Saeed Abu-Nimeh, Dario Nappa,
Xinlei Wang, and Suku Nair. A
comparison of machine learning techniques for
phishing detection. In Proceedings of the
anti-phishing working groups 2nd annual
eCrime researchers summit, pages 60{
69. ACM, 2007.
[BSF94]
[BVP]</p>
      </sec>
      <sec id="sec-6-3">
        <title>Yoshua Bengio, Patrice Simard, and</title>
        <p>Paolo Frasconi. Learning long-term
dependencies with gradient descent is
difcult. IEEE transactions on neural
networks, 5(2):157{166, 1994.</p>
      </sec>
      <sec id="sec-6-4">
        <title>Barathi Ganesh Hullathy Balakrishnan,</title>
        <p>Anand Kumar Madasamy
Vinayakumar, and Soman Kotti Padannayil. Nlp
cen amrita@ smm4h: Health care text
classi cation through class embeddings.
[C+15]</p>
        <p>Francois Chollet et al. Keras, 2015.</p>
      </sec>
      <sec id="sec-6-5">
        <title>Felix A Gers, Jurgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with lstm. 1999. [GSS02]</title>
      </sec>
      <sec id="sec-6-6">
        <title>R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Applying convolutional neural network for network intrusion detection. In Advances in</title>
        <p>Computing, Communications and
Informatics (ICACCI), 2017 International
Conference on, pages 1222{1228. IEEE,
2017.</p>
        <p>R Vinayakumar, KP Soman, and
Prabaharan Poornachandran. Applying deep
learning approaches for network tra c
prediction. In Advances in
Computing, Communications and Informatics
(ICACCI), 2017 International
Conference on, pages 2353{2358. IEEE, 2017.
R Vinayakumar, KP Soman, and
Prabaharan Poornachandran. Deep android
malware detection and classi cation.
[VSP17d]
[VSP17e]
[VSP17f]
[VSP17g]
[VSP17h]
[VSP17i]</p>
        <p>In Advances in Computing,
Communications and Informatics (ICACCI),
2017 International Conference on, pages
1677{1683. IEEE, 2017.</p>
      </sec>
      <sec id="sec-6-7">
        <title>R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Deep encrypted text categorization. In Advances in</title>
        <p>Computing, Communications and
Informatics (ICACCI), 2017 International
Conference on, pages 364{370. IEEE,
2017.</p>
        <p>R Vinayakumar, KP Soman, and
Prabaharan Poornachandran. Evaluating
effectiveness of shallow and deep networks
to intrusion detection system. In
Advances in Computing, Communications
and Informatics (ICACCI), 2017
International Conference on, pages 1282{
1289. IEEE, 2017.</p>
      </sec>
      <sec id="sec-6-8">
        <title>R Vinayakumar, KP Soman, and Praba</title>
        <p>haran Poornachandran. Evaluating
shallow and deep networks for secure
shell (ssh) tra c analysis. In Advances
in Computing, Communications and
Informatics (ICACCI), 2017 International
Conference on, pages 266{274. IEEE,
2017.</p>
      </sec>
      <sec id="sec-6-9">
        <title>R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Evaluation of recurrent neural network and its variants for intrusion detection system (ids).</title>
        <p>International Journal of Information
System Modeling and Design (IJISMD),
8(3):43{63, 2017.</p>
        <p>R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Long short-term memory based operation log anomaly detection. In Advances in Computing, Communications and Informatics (ICACCI), 2017 International Conference on, pages 236-242. IEEE, 2017.
R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Secure shell (ssh) traffic analysis with flow based features using shallow and deep networks. In Advances in Computing, Communications and Informatics (ICACCI), 2017 International Conference on, pages 2026-2032. IEEE, 2017.
</p>
      </sec>
      <sec id="sec-6-10">
        <title>R Vinayakumar, KP Soman, KK Senthil Velan, and Shaunak Ganorkar. Evaluating shallow and deep networks for ransomware detection and classification.</title>
        <p>In Advances in Computing, Communications and Informatics (ICACCI), 2017 International Conference on, pages 259-265. IEEE, 2017.</p>
      </sec>
      <sec id="sec-6-11">
        <title>Paul J Werbos. Backpropagation through time: what it does and how to do it.</title>
        <p>Proceedings of the IEEE, 78(10):1550-1560, 1990.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>R</given-names>
            <surname>Vinayakumar</surname>
          </string-name>
          , KP Soman, and
          <string-name>
            <given-names>Prabaharan</given-names>
            <surname>Poornachandran</surname>
          </string-name>
          .
          <article-title>Detecting malicious domain names using deep learning approaches at scale</article-title>
          .
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1355</fpage>
          -
          <lpage>1367</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>R</given-names>
            <surname>Vinayakumar</surname>
          </string-name>
          , KP Soman, and
          <string-name>
            <given-names>Prabaharan</given-names>
            <surname>Poornachandran</surname>
          </string-name>
          .
          <article-title>Evaluating deep learning approaches to characterize and classify malicious urls</article-title>
          .
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1333</fpage>
          -
          <lpage>1343</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>R</given-names>
            <surname>Vinayakumar</surname>
          </string-name>
          , KP Soman, Prabaharan Poornachandran, and
          <string-name>
            <given-names>S Sachin</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Detecting android malware using long short-term memory (lstm)</article-title>
          .
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1277</fpage>
          -
          <lpage>1288</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>R</given-names>
            <surname>Vinayakumar</surname>
          </string-name>
          , KP Soman, Prabaharan Poornachandran, and
          <string-name>
            <given-names>S Sachin</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Evaluating deep learning approaches to characterize and classify the dgas at scale</article-title>
          .
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1265</fpage>
          -
          <lpage>1276</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>