A.R.E.S: Automatic Rogue Email Spotter
Crypt Coyotes


Vysakh S Mohan, Naveen J R, Vinayakumar R, Soman KP
Center for Computational Engineering and Networking (CEN),
Amrita School of Engineering, Coimbatore
Amrita Vishwa Vidyapeetham, India
vsmo92@gmail.com, naveenaksharam@gmail.com


Abstract

Be it formal or casual, email is undoubtedly the most popular means of communication in modern times. Its popularity owes to the fact that it is reliable, fast and, moreover, free to use. One issue that plagues this otherwise solid technology is the phishing email. Phishing emails have always bothered users, as they are a huge waste of storage, time, money and resources. Many previous attempts to eradicate, or at least block, phishing emails have been deemed futile. This work uses word embeddings as the text representation in a supervised classification approach to identify phishing emails. Rule-based and machine learning models with feature engineering were attempted, but failed due to the ever-increasing variety of threats and the lack of scalability of the models. Deep learning based models have been shown to surpass the older techniques in spam email detection. This work attempts the same using a CNN/RNN/MLP network with Word2vec embeddings on a phishing email corpus, where Word2vec helps to capture the syntactic and semantic similarity of phishing and legitimate emails in the corpus. This work aims to show the ability of word embeddings to solve issues related to cybersecurity use cases.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: R. Verma, A. Das (eds.): Proceedings of the 1st AntiPhishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA 2018), Tempe, Arizona, USA, 21-03-2018, published at http://ceur-ws.org

1   Introduction

The Internet, and staying connected through it, is what distinguishes this era from the previous ones. More and more people rely on the internet for their communication as well as their data transaction requirements. Email has revolutionized the way people communicate over the web. Since its inception, electronic mail has outgrown its real-world counterpart to become mainstream, serving as both a casual and an official way of passing a message. Several service providers now offer email platforms for free, with a plethora of features, so the number of people taking advantage of these services has grown dramatically. This mass adoption is one aspect a malicious adversary could use to his benefit. Such malicious emails are called spam [CM01]: they are unsolicited, junk information usually unwanted by the user. They are commonly characterized by the following: they are mass mailed, and may contain explicit content, useless advertisements, fraudulent content, hidden links to phishing websites, etc. On a personal front, the user could face issues like annoyance due to irrelevant information, unwanted use of bandwidth, waste of storage, a less productive communication channel due to time lost sorting junk mail, unnecessary use of computing power, the spread of viruses, loss of money via phishing, etc.

These issues have brought immense focus on the safety of users against spam emails. The massive pool of users on these platforms is one reason they are targeted so often: email is an inexpensive means to gain access to millions of people, which makes it an attractive target for adversaries. The most dangerous type of emails are the spam emails [KRA+ 07]. They may arrive via a spam email server or from personal servers, and contain malicious URLs that could direct the users to phishing sites. This is a challenging task, and many solutions have been devised to solve this problem over the
past few years, but they all come with some downsides. One reason it is challenging is the variety of ways in which an attacker can serve a spam email. A frequently used method is the blended attack, through which malware delivery may vary. Usually the email itself does not contain the malware, but may contain a link to a compromised website. Such emails may look normal, but contain a mix of legitimate and malicious content. Earlier research by IBM's X-Force team found that more than 50% of the emails produced worldwide are fraudulent, and these figures are expected to increase in the subsequent years.

One reason such attacks are successful is the carelessness of the generic user. Most internet users are unaware of cybersecurity practices and simply ignore the safety precautions that need to be exercised in the online space. There is no sure-shot way to check whether a person has fallen victim to such attacks, but they can be prevented by being a bit cautious, for example by checking the email headers and looking for grammatical mistakes. These checks may not be sufficient, however, when the scale of such attacks escalates. Such situations require an automated solution to detect spam email.

Email headers can help to a certain extent. They can be used as features for machine learning based classifiers [LT04, S+ 09]. The advantages of using header features compared to body features have been detailed in [ZZY04]. Header features like the sender address, message ID, etc. were used in [WC07] for detection.

Most of the popular machine learning techniques consist of two steps: obtaining a proper feature representation from the data, and using these features for learning and prediction. The first step focuses on extracting useful information from the given URL, which is stored as a vector so that the algorithm can fit different machine learning based models to it. Different categories of features have been used [SLH17]; lexical features, content features, host-based features and context features are some of the popular ones. An algorithm requires some form of mathematical representation to work with. This work uses Word2vec embedding methods for an effective representation of the data.

Spam filtering is a supervised classification problem, treated as a binary classification task with two classes: legitimate (good) emails and spam emails. Tretyakov used machine learning algorithms like naive Bayes and K-NN for spam detection [Tre04]; that work does not deal with feature selection, but is beneficial for beginners. Spam detection, or automatic email filtering, started primarily with statistical approaches. The development began with the popular naive Bayes approaches, which reduced the problem into a space where dependencies and correlations between the data are ignored [SDHH98]; that is, the multivariate nature of the problem breaks down into a univariate one without compromising on accuracy. Different authors have tried to incorporate modifications on top of the naive Bayes pipeline, but the approach was unable to find the correlation between words, and the algorithm failed in certain tasks. In 2004, Chih-Chin Lai and Tsai [LT04] employed TF-IDF, K-NN and SVM to overcome the issues in the email filtering task. SVM and TF-IDF achieved satisfactory results, while K-NN gave the worst result among them. Blanzieri and Bryl presented feature extraction methods along with SVM in [BB08]. During this time, unsupervised machine learning techniques were also developed, in which data were clustered into spam and ham. In 2011, Whissell and Clarke [WC11] proposed a novel spam clustering approach, which attained state-of-the-art results compared to all the previous methods. Since spam filtering is a diverse area, ensemble methods (combining different algorithms on the same problem), like boosting and bagging [GGWM+ 10], have been applied to obtain effective classification. Caruana and Li (2012) [CL12] focused on distributed computing paradigms using SVM and ANN, removing the interoperability and implementation issues.

Machine learning models usually rely on some sort of engineered features generated from the data, and have been shown to surpass the accuracy of their predecessors in spam email classification [FRID+ 07, AAY11]; however, very few machine learning models for phishing emails exist today, and most of them are in their infancy. Using acquired domain knowledge, various feature engineering strategies are employed on the data to build the model [SAZ18], [PHS18], [VH13], [VSH12], [MG18], [HDC+ 18], [MBA18]. A main advantage of this approach is the reduced effort to train the classifier, compared with developing complex rules for a filter. However, such feature engineering can also leave the system vulnerable to manipulation, and the model may not scale well to newer threats. Deep learning models can be used to overcome this issue, as they learn the features themselves and adapt them to newer inputs. On top of that, these models are comparatively more accurate and scalable. Deep learning models combined with word embeddings have recently given good performance for various cybersecurity use cases [VSP18a], [VSP18b], [VSP17], [LF17], [SKP18]. This motivated the use of word embeddings with deep learning models like the Multi-Layer Perceptron (MLP), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM).
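The independence assumption behind the naive Bayes filters discussed above can be made concrete with a short sketch. Below is a minimal multinomial naive Bayes spam/ham classifier in plain Python, assuming whitespace tokenization and add-one (Laplace) smoothing. It illustrates the general technique only and is not the method of [Tre04], [SDHH98] or this work; the function names `train_naive_bayes` and `predict` and the toy messages are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model with add-one smoothing.

    docs: list of token lists; labels: parallel list of class names
    (hypothetical "spam" / "ham" labels in the example below).
    Returns (log_priors, log_likelihoods, vocab).
    """
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    log_priors, log_likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values())
        # Each word gets its own smoothed P(word | class);
        # no joint statistics between words are estimated.
        log_likelihoods[c] = {
            tok: math.log((counts[tok] + 1) / (total + len(vocab))) for tok in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict(model, doc):
    """Return the class with the highest posterior score for a token list."""
    log_priors, log_likelihoods, vocab = model
    scores = {}
    for c in log_priors:
        # Independence assumption: the score is log P(c) plus a sum of
        # per-word terms; words outside the training vocabulary are skipped.
        scores[c] = log_priors[c] + sum(
            log_likelihoods[c][tok] for tok in doc if tok in vocab
        )
    return max(scores, key=scores.get)


docs = [
    "verify your account password now".split(),
    "click this link to claim your prize".split(),
    "meeting agenda for monday attached".split(),
    "lunch with the project team on friday".split(),
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict(model, "please verify your password".split()))  # spam
print(predict(model, "agenda for the team meeting".split()))  # ham
```

Because each word contributes its own log-probability term, the classifier never models how words co-occur, which is why such filters cannot capture the correlation between words.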
2   Background

This section details the theory behind the various deep learning models used.

2.1   Word2vec

Word2vec is a model proposed by Mikolov [MSC+ 13] to learn word embeddings. It is inspired by the distributed representations introduced by Hinton [HMR+ 86], but in the Word2vec framework, word representations are learned using a shallow neural network. The fundamental assumption in word embedding, or distributional, methods is that words with similar meanings tend to occur in similar contexts, so the learned vectors capture the similarity between words [BG17], [BG18]. Word2vec is a popular model for generating word embeddings from text data; it reproduces the linguistic context of words by training a shallow two-layer architecture. The input to the Word2vec model may be a huge corpus, and the generated outputs are vectors in a multi-dimensional space, with each unique word in the corpus having a corresponding vector. This makes learning the word representation significantly faster than with previous methods. In the Word2vec framework, the distributed representation of the words in the vocabulary is learned in an unsupervised way. Learning can be done via two architectures: skip-gram and continuous bag of words (CBOW).

$$\frac{1}{N}\sum_{n=1}^{N}\ \sum_{-s \le k \le s,\ k \ne 0} \log p(Q_{n+k} \mid Q_n) \tag{1}$$

The skip-gram method tries to maximize the average log probability in Eq. (1) over the training word sequence $Q_1, Q_2, \ldots, Q_N$. Here $s$ is the size of the training context around the center word $Q_n$, and $p(Q_{n+k} \mid Q_n)$ is given by the softmax function. In the skip-gram model, the context (surrounding) words are predicted given the center word as input, whereas in the Continuous Bag of Words (CBOW) model, the center word is predicted given the surrounding words.

2.2   Convolutional Neural Networks (CNN)

CNNs are commonly used for computer vision tasks, where their local receptive fields are advantageous for feature learning in images. CNN models are also used for text classification tasks. A CNN can be thought of as an artificial neural network that has the ability to pick out, or detect, patterns and make sense of them; this pattern detection makes CNNs useful for data analysis. A CNN has hidden layers, called convolutional layers, which are a little different from the layers of an MLP. For each convolutional layer, the number of filters needs to be specified; each filter then slides over the rows and columns of the input matrix. In this matrix, each row is a vector representing one word; more precisely, the rows come from word embedding models like GloVe¹ or Word2vec². This work used a Word2vec model before applying the CNN. CNNs perform well on sequential data, with faster training times, and are well suited to predictive analysis. A CNN normally consists of an input layer followed by convolutional layers, max-pooling layers for dimensionality reduction, and fully connected layers with a specific non-linear activation function (ReLU in this work). In this phishing email detection task, which is text based, one-dimensional max-pooling layers and fully connected layers are used. The filters slide over the embedding vectors and output a continuous value at each step, producing better representations of the word vectors. For text-based applications, a 1D CNN is used.

¹ https://nlp.stanford.edu/projects/glove
² https://www.tensorflow.org/tutorials/Word2vec

2.3   Multi-layer perceptron (MLP)

Rosenblatt introduced the concept of a single perceptron. A multi-layer perceptron (MLP) is typically a network of perceptrons, or simple neurons. An MLP consists of one input layer and one output layer, whose dimensions depend on the dimensions of the input sample vectors and of the label vectors present in the data. In between these two layers, many hidden layers are present. The output of each layer is fed as input to the following hidden layer, and each unit does a relatively straightforward computation: it takes the input X, multiplies it by a weight W, performs a summation and passes the result through an activation function to yield the output. Perceptrons compute a score, or single output, from a set of inputs that are usually real valued. This calculated score is used in the backward pass, where the cost function is calculated by comparing the predicted output with the true label value, expressed as a root mean square (RMS) error value. This RMS error is minimized using the gradient descent technique, and the optimum weight and bias values are found for the network. Activation functions like sigmoid or tanh are used to produce the output. A defining characteristic of an MLP is the fully connected architecture of its deep layers.

2.4   Recurrent neural network (RNN)

The problem associated with the MLP and CNN models is that all input and output vectors are treated as independent; in other words, these models cannot capture the sequential information between words. In phishing email
detection tasks, it is highly useful to identify the associated words for classification purposes. The RNN model is popular in time series and sequence data analysis. It can take variable size inputs and return a variable size output. The state of a recurrent neural network at time T is a function of its old state and the input at time T. Since it stores the previous state of the system, an RNN can be said to have a 'memory' that captures sequential information between words. A recurrent neural network is a variant of the feed-forward network: its cyclic connections between neurons allow the result from the previous time step to be used in computing the current state, in a way remembering the temporal information about the input data. This makes RNNs learn well on data with long-term dependencies, as in natural language processing and speech processing applications.

Table 1: Hyperparameters for the Word2vec model
Hyperparameter    Value   Description
Batch-Size        250     Number of training samples per batch
Embedding-Size    300     Word vector dimension
Skip-Window       7       Context window: number of words taken before and after each word
Num-skips         12      How many prediction pairs are selected from the window
Num-sampled       128     Number of negative samples
Learning rate     0.1     Determines how quickly or slowly the model updates its parameters
n-epoch           50      Number of epochs (forward + backward passes)
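Eq. (1) sums over the offsets $-s \le k \le s$, $k \ne 0$ around each center word, and the Skip-Window and Num-skips entries of Table 1 control how those context words are turned into training examples. The following is a minimal Python sketch of that pair-generation step, assuming the sampling scheme of the TensorFlow Word2vec tutorial referenced earlier (whose `skip_window` and `num_skips` settings Table 1 appears to mirror). The function name `skipgram_pairs` and the toy sentence are hypothetical, and the exact preprocessing used in this work is not specified in the text.

```python
import random


def skipgram_pairs(tokens, skip_window, num_skips, seed=0):
    """Build (center, context) training pairs for a skip-gram model.

    skip_window: number of words considered on each side of the center
                 word (s in Eq. (1)).
    num_skips:   number of (center, context) pairs drawn per center word.
    """
    if num_skips > 2 * skip_window:
        raise ValueError("num_skips cannot exceed 2 * skip_window")
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    pairs = []
    for n, center in enumerate(tokens):
        # Offsets -s <= k <= s, k != 0, clipped at the sequence boundaries.
        context = [
            tokens[n + k]
            for k in range(-skip_window, skip_window + 1)
            if k != 0 and 0 <= n + k < len(tokens)
        ]
        # Near the edges fewer than 2 * skip_window neighbours exist,
        # so fewer than num_skips pairs may be produced there.
        for word in rng.sample(context, min(num_skips, len(context))):
            pairs.append((center, word))
    return pairs


tokens = "verify your account password now".split()
pairs = skipgram_pairs(tokens, skip_window=1, num_skips=2)
print(len(pairs))  # 8
```

With the Table 1 settings (`skip_window=7`, `num_skips=12`), every word of a long sequence yields 12 pairs except the first and last five, which have fewer than 12 neighbours inside the window.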

3   Dataset Description
The dataset [EDMB+ 18] used is provided by the 4th ACM International Workshop on Security and Privacy Analytics shared task [EDB+ 18]. The task was to detect phishing emails. Details of the dataset are shown in Tables 2 and 3.

Table 2: Training Dataset details
Category          Legitimate   Phishing   Total
With No header    5088         612        5700
With header       4082         501        4583

Table 3: Testing Dataset details
Category          Data Samples
With No header    4300
With header       4195

4   Experiments and Results
The proposed tool is christened A.R.E.S, which stands for Automatic Rogue Email Spotter. A detailed visualization of the model is shown in Fig 1. The architecture is a combination of word embedding with a CNN, RNN, and MLP. This task is divided into 2 sub tasks: emails with 'no header' and emails 'with header'. We did not extract any other features from the header, and the methodology used to convert raw email samples into feature vectors is the same for both sub tasks. In both sub tasks, the raw email corpus is fed to the embedding layer, which uses a Word2vec model to generate distributed word embeddings. The learned word embedding model is used to represent the input data, which is then fed to the deep learning models. The hyperparameters used to create the Word2vec model are detailed in Table 1.

Figure 1: Proposed Architecture

The deep learning models learn additional features, which are pushed to the fully connected layer. Previous work on similar problems suggests using an RNN to solve such tasks, but in order to have a better analysis of the performance of different models, we incorporated CNN and MLP into this work. Finally, due to the binary nature of this task, we used a sigmoid output to separate legitimate emails from phishing emails based on a threshold, and used binary cross-entropy as the loss function.

Table 4: Statistics of training results
Method                  Task         10-fold cross-validation accuracy
Word embedding + MLP    Sub task 1   0.921
Word embedding + CNN    Sub task 1   0.952
Word embedding + RNN    Sub task 1   0.951
Word embedding + MLP    Sub task 2   0.901
Word embedding + CNN    Sub task 2   0.912
Word embedding + RNN    Sub task 2   0.931

Table 5: Statistics of test results
Method                  Task         TP     TN    FP    FN
Word embedding + CNN    Sub task 1   3479   237   238   346
Word embedding + RNN    Sub task 1   3446   224   251   379
Word embedding + RNN    Sub task 2   3193   363   133   506

From the statistics shown in Tables 4 and 5, the word embedding model along with an MLP network gives a commendable score for both sub tasks. Further, when the same word embedding model is passed
through CNN and RNN, it registered an overall improved score over the previous MLP model. Specifically, the CNN gave the highest score for sub task 1, whereas the RNN gave the best score for sub task 2, over the validation set. An MLP with 6 hidden layers of size 300 was used primarily to build the base model. Its activation function is ReLU and the dropout is 0.01. The model was implemented in Keras, and the best validation score over 500 epochs was used. The model structure was then extended into CNN and RNN models. The CNN is implemented with 256 filters, and max-pooling is used for dimensionality reduction between the dense layers. All experiments were performed on GPU-enabled TensorFlow [ABC+ 16] in conjunction with the Keras framework [C+ 15]. All models are trained using backpropagation.

5   Conclusion
Phishing emails have always plagued even the average user, and classifying them properly is a challenging task. Where former machine learning techniques failed, deep learning models have provided state-of-the-art performance. The CNN/RNN/MLP architecture, along with the Word2vec embeddings used in this work, has outperformed former rule-based and machine learning based models. During training the model gave high accuracy, while the test accuracy was comparatively low due to the highly unbalanced nature of the dataset. In the proposed system, no external data was provided to train the model. On the test data, the CNN performed slightly better than the RNN model on subtask 1, and the RNN performed well on subtask 2. For subtask 1, the CNN achieved a 10-fold cross-validation accuracy of 95.2%, almost the same as the RNN, and for subtask 2, the RNN achieved 93.1%, making the RNN a better and more versatile overall performer. Higher accuracy could be achieved with these trained models by expanding the training corpus and by adding deeper layers to give the model more feature learning capability. This work also demonstrates the possibilities of amalgamating techniques from text analytics and deep learning for cybersecurity use cases.

Acknowledgements

This research was supported in part by Paramount Computer Systems. We are grateful to NVIDIA India for the GPU hardware support to the research grant. We are grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.

References
[AAY11]     Tiago A Almeida, Jurandy Almeida, and Akebo Yamakami. Spam filtering: how the dimensionality reduction affects the accuracy of naive bayes classifiers. Journal of Internet Services and Applications, 1(3):183–200, 2011.

[ABC+ 16]   Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.

[BB08]      Enrico Blanzieri and Anton Bryl. A survey of learning-based techniques of email spam filtering. Artificial Intelligence Review, 29(1):63–92, 2008.

[BG17]      Reshma U. Anand Kumar M. Soman K.P. Barathi Ganesh, H.B. Representation of target classes
            for text classification-amrita-cen-nlp@rusprofiling pan 2017. In CEUR Workshop Proceedings, pages 25–27, 2017.

[BG18]      Anand Kumar M. Soman K.P. Barathi Ganesh, H.B. From vector space models to vector space models of semantics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10478 LNCS., pages 50–60, 2018.

[C+ 15]     François Chollet et al. Keras, 2015.

[CL12]      Godwin Caruana and Maozhen Li. A survey of emerging approaches to spam filtering. ACM Computing Surveys (CSUR), 44(2):9, 2012.

[CM01]      Xavier Carreras and Lluis Marquez. Boosting trees for anti-spam email filtering. arXiv preprint cs/0109015, 2001.

[EDB+ 18]   Ayman Elaassal, Avisha Das, Shahryar Baki, Luis De Moraes, and Rakesh Verma. Iwspa-ap: Anti-phishing shared task at acm international workshop on security and privacy analytics. In Proceedings of the 1st IWSPA Anti-Phishing Shared Task. CEUR, 2018.

[EDMB+ 18]  Ayman Elaassal, Luis De Moraes, Shahryar Baki, Rakesh Verma, and Avisha Das. Iwspa-ap shared task email dataset, 2018.

[FRID+ 07]  Florentino Fdez-Riverola, Eva Lorenzo Iglesias, Fernando Díaz, José Ramon Méndez, and Juan M Corchado. Applying lazy learning algorithms to tackle concept drift in spam filtering. Expert Systems with Applications, 33(1):36–48, 2007.

[GGWM+ 10]  Pedro H Calais Guerra, Dorgival Guedes, J Wagner Meira, Cristine Hoepers, MHPC Chaves, and Klaus Steding-Jessen. Exploring the spam arms race to characterize spam evolution. In Proceedings of the 7th Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS), Redmond, WA, 2010.

[HDC+ 18]   Reza Hassanpour, Erdogan Dogdu, Roya Choupani, Onur Goker, and Nazli Nazli. Phishing e-mail detection by using deep learning algorithms. In Proceedings of the ACMSE 2018 Conference, page 45. ACM, 2018.

[HMR+ 86]   Geoffrey E Hinton, James L McClelland, David E Rumelhart, et al. Distributed representations. Parallel distributed processing: Explorations in the microstructure of cognition, 1(3):77–109, 1986.

[KRA+ 07]   Ponnurangam Kumaraguru, Yong Rhee, Alessandro Acquisti, Lorrie Faith Cranor, Jason Hong, and Elizabeth Nunge. Protecting people from phishing: the design and evaluation of an embedded training email system. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 905–914. ACM, 2007.

[LF17]      Ruidan Li and Errin W Fulp. Evolutionary approaches for resilient surveillance management. In 2017 IEEE Security and Privacy Workshops (SPW), pages 23–28. IEEE, 2017.

[LT04]      Chih-Chin Lai and Ming-Chi Tsai. An empirical performance comparison of machine learning methods for spam e-mail categorization. In Hybrid Intelligent Systems, 2004. HIS'04. Fourth International Conference on, pages 44–48. IEEE, 2004.

[MBA18]     Youness Mourtaji, Mohammed Bouhorma, and Daniyal Alghazzawi. New phishing hybrid detection framework. Journal of Theoretical & Applied Information Technology, 96(6), 2018.

[MG18]      Ankur Mishra and BB Gupta. Intelligent phishing detection system using similarity matching algorithms. International Journal of Information and Communication Technology, 12(1-2):51–73, 2018.

[MSC+ 13]   Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
[PHS18]    Tianrui Peng, Ian Harris, and Yuki Sawa. Detecting phishing attacks
           using natural language processing and machine learning. In Semantic
           Computing (ICSC), 2018 IEEE 12th International Conference on,
           pages 300–301. IEEE, 2018.

[S+09]     Jyh-Jian Sheu et al. An efficient two-phase spam filtering method
           based on e-mails categorization. IJ Network Security, 9(1):34–43,
           2009.

[SAZ18]    Sami Smadi, Nauman Aslam, and Li Zhang. Detection of online phishing
           email using dynamic evolving neural network based on reinforcement
           learning. Decision Support Systems, 2018.

[SDHH98]   Mehran Sahami, Susan Dumais, David Heckerman, and Eric Horvitz. A
           Bayesian approach to filtering junk e-mail. In Learning for Text
           Categorization: Papers from the 1998 Workshop, volume 62, pages
           98–105, 1998.

[SKP18]    Vysakh S Mohan, Soman KP, Vinayakumar R, and Prabaharan
           Poornachandran. S.P.O.O.F Net: Syntactic patterns for
           identification of ominous online factors. In 2018 IEEE Security
           and Privacy Workshops (SPW). IEEE, [In Press], 2018.

[SLH17]    Doyen Sahoo, Chenghao Liu, and Steven CH Hoi. Malicious URL
           detection using machine learning: A survey. arXiv preprint
           arXiv:1701.07179, 2017.

[Tre04]    Konstantin Tretyakov. Machine learning techniques in spam filtering.
           In Data Mining Problem-oriented Seminar, MTAT, volume 3, pages
           60–79, 2004.

[VH13]     Rakesh Verma and Nabil Hossain. Semantic feature selection for text
           with application to phishing email detection. In International
           Conference on Information Security and Cryptology, pages 455–468.
           Springer, 2013.

[VSH12]    Rakesh Verma, Narasimha Shashidhar, and Nabil Hossain. Detecting
           phishing emails the natural language way. In European Symposium on
           Research in Computer Security, pages 824–841. Springer, 2012.

[VSP17]    R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Deep
           encrypted text categorization. In Advances in Computing,
           Communications and Informatics (ICACCI), 2017 International
           Conference on, pages 364–370. IEEE, 2017.

[VSP18a]   R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Detecting
           malicious domain names using deep learning approaches at scale.
           Journal of Intelligent & Fuzzy Systems, 34(3):1355–1367, 2018.

[VSP18b]   R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Evaluating
           deep learning approaches to characterize and classify malicious
           URLs. Journal of Intelligent & Fuzzy Systems, 34(3):1333–1343, 2018.

[WC07]     Chih-Chien Wang and Sheng-Yi Chen. Using header session messages to
           anti-spamming. Computers & Security, 26(5):381–390, 2007.

[WC11]     John S Whissell and Charles LA Clarke. Clustering for semi-supervised
           spam filtering. In Proceedings of the 8th Annual Collaboration,
           Electronic messaging, Anti-Abuse and Spam Conference, pages
           125–134. ACM, 2011.

[ZZY04]    Le Zhang, Jingbo Zhu, and Tianshun Yao. An evaluation of statistical
           spam filtering techniques. ACM Transactions on Asian Language
           Information Processing (TALIP), 3(4):243–269, 2004.