=Paper=
{{Paper
|id=Vol-2124/paper_13
|storemode=property
|title=A.R.E.S:
Automatic Rogue Email Spotter Crypt
Coyotes
|pdfUrl=https://ceur-ws.org/Vol-2124/paper_13.pdf
|volume=Vol-2124
|authors=Vysakh Mohan,Naveen JR,Vinayakumar R,Soman KP
}}
==A.R.E.S:
Automatic Rogue Email Spotter Crypt
Coyotes==
A.R.E.S : Automatic Rogue Email Spotter
Crypt Coyotes
Vysakh S Mohan, Naveen J R, Vinayakumar R, Soman KP
Center for Computational Engineering and Networking(CEN),
Amrita School of Engineering, Coimbatore
Amrita Vishwa Vidyapeetham, India
vsmo92@gmail.com,naveenaksharam@gmail.com
===Abstract===
Be it formal or casual, email is undoubtedly the most popular means of communication in modern times. Its popularity owes to the fact that it is reliable, fast and, moreover, free to use. One issue that plagues this otherwise solid technology is the phishing emails received by users. Phishing emails have always bothered users, as they are a huge waste of storage, time, money and resources. Many previous attempts to eradicate or at least block phishing emails have proved futile. This work uses word embedding as the text representation in a supervised classification approach to identify phishing emails. Rule based and machine learning models with feature engineering were attempted but failed due to the ever increasing variety of threats and the lack of scalability of the models. Deep learning based models have been shown to surpass the older techniques in spam email detection. This work attempts the same using CNN/RNN/MLP networks with Word2vec embeddings on a phishing email corpus, where Word2vec helps to capture the syntactic and semantic similarity of phishing and legitimate emails in an email corpus. This work aims to show the ability of word embeddings to solve issues related to cybersecurity use cases.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: R. Verma, A. Das (eds.): Proceedings of the 1st AntiPhishing Shared Pilot at 4th ACM International Workshop on Security and Privacy Analytics (IWSPA 2018), Tempe, Arizona, USA, 21-03-2018, published at http://ceur-ws.org

===1 Introduction===
The Internet, and staying connected through it, is what distinguishes this era from the previous ones. More and more people rely on the internet for their communication as well as their data transaction requirements. Email has revolutionized the way people communicate over the web. Since its inception, electronic mail has outgrown its real world counterpart to become mainstream, serving as both a casual and an official way of passing a message. Several service providers now offer email platforms for free and with a plethora of features, so the number of people taking advantage of these services has grown dramatically. This mass adoption is one aspect any malicious adversary could use to his benefit. Such malicious emails are called spam [CM01]; they are unsolicited, junk information usually unwanted by the user. They are commonly characterized by the following: they are mass mailed, may contain explicit content or useless advertisements, may be fraudulent, and may contain hidden links to phishing websites. On a personal front, the user could face issues such as annoyance due to irrelevant information, unwanted use of bandwidth, waste of storage, a less productive communication channel owing to the time lost sorting junk mail, unnecessary use of computing power, the spread of viruses, and loss of money via phishing.

These issues have brought immense focus on the safety of users against spam emails. The massive pool of users on these platforms is one reason they are targeted so often: email is an inexpensive means of gaining access to millions of people, which draws adversaries to it. The most dangerous type of emails are the spam emails [KRA+07]. They may arrive via a spam email server or from personal servers, and may contain malicious URLs that direct users to phishing sites. This is a challenging task and many solutions have been devised to solve this problem over the
past few years, but they all come with some downsides. One reason it gets challenging is the variety of ways in which an attacker can serve a spam email. A frequently used method is the blended attack. Malware delivery through such attacks may vary. Usually the email itself does not contain the malware, but it may contain a link to some compromised website. These emails may look normal, but contain a mix of legitimate and malicious content. Earlier research by IBM's X-Force team found that more than 50% of the emails produced worldwide are fraudulent. These figures are going to increase in the subsequent years.

One reason such attacks are successful is the carelessness of the generic user. Most internet users are illiterate when it comes to cybersecurity, and they simply ignore the safety precautions that need to be exercised in the online space. There are no sure shot ways to check whether a person has been a victim of such attacks, but they can be prevented by being a bit cautious. One could check the email headers and look for grammatical mistakes. But these measures may not be sufficient when the scale of such attacks escalates. Such situations require an automated solution to detect spam email.

Email headers can help to a certain extent. They can be used as features for some machine learning based classifiers [LT04, S+09]. The advantage of using header features compared to body features has been detailed in [ZZY04]. Header features like sender address, message ID etc. were used in [WC07] to make the detection.

Most of the popular machine learning techniques consist of two steps: obtaining a proper feature representation from the data, and using these features for learning and prediction. The first step focuses on extracting useful information from the given URL, which is stored as a vector so that different machine learning models can be fitted to it. Different categories of features have been used [SLH17]. Lexical features, content features, host based features and context features are some of the popular ones. An algorithm requires some form of mathematical representation to work with. This work uses Word2vec embedding methods for effective representation of the data.

Spam filtering is a supervised classification problem, treated as a binary classification task with 2 classes: legitimate (good) emails and spam emails. Tretyakov used methods like naive Bayes and K-NN machine learning algorithms for spam detection [Tre04], a treatment that does not deal with feature selection but is beneficial for beginners. Spam detection, or automatic email filtering, started primarily with statistical approaches. The development began with the popular naive Bayes approaches, which reduced the problem into a space where dependencies between the data and correlation issues are ignored [SDHH98]; that is, the multivariate nature of the problem breaks down to a univariate one without compromising on accuracy. Different authors have tried to incorporate modifications on top of the naive Bayes pipeline, but the approach was unable to find the correlation between words and the algorithm failed in certain tasks. In 2004, Chih-Chin Lai and Tsai [LT04] introduced TF-IDF, K-NN and SVM to overcome the issues in the email filtering task. SVM and TF-IDF gave satisfactory results, while K-NN gave the worst result among them. Blanzieri and Bryl came up with feature extraction methods in [BB08], along with SVM. During this time, unsupervised machine learning techniques were also developed, in which data were clustered into spam and ham. In 2011, Whissell and Clarke [WC11] came up with novel research on spam clustering, which attained state-of-the-art results compared to all the previous methods. Since spam filtering is a diverse area, ensemble methods (combining different algorithms on the same problem), like boosting and bagging [GGWM+10], are applied to get effective classification. Caruana and Li (2012) focused [CL12] on the distributed computing paradigm using SVM and ANN by removing the interoperability and implementation issues.

Machine learning models usually rely on some sort of engineered features that are generated from the data, and have been proved to surpass the accuracy of their predecessors in spam email classification [FRID+07, AAY11], whereas very few machine learning models for phishing emails exist today and most of them are in their infancy. With acquired domain knowledge, various feature engineering strategies are employed on the data to build the model [SAZ18], [PHS18], [VH13], [VSH12], [MG18], [HDC+18], [MBA18]. A main plus of this method is the reduced effort of training a classifier rather than developing complex rules for a filter. However, feature engineering could also leave the system vulnerable to manipulation, and the model may not scale well to newer threats. Deep learning models can be used to overcome this issue, as they learn the features themselves and modify them according to newer inputs. On top of that, these models are comparatively more accurate and scalable. Nowadays deep learning models combined with word embeddings have given good performance for various cybersecurity use cases [VSP18a], [VSP18b], [VSP17], [LF17], [SKP18]. This motivated the use of word embeddings with deep learning models like Multi-Layer Perceptron (MLP), Recurrent Neural Network (RNN), Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM).
===2 Background===
This section details the theory behind the various deep learning models used.

====2.1 Word2vec====
Word2vec is a model proposed by Mikolov [MSC+13] to learn word embeddings. It is inspired by the distributed representation introduced by Hinton [HMR+86], but in the Word2vec framework the word representation is learned using a shallow neural network. The fundamental assumption in word embedding or distributional methods is that words with a similar sense tend to occur in similar contexts, so the embeddings capture the similarity between words [BG17], [BG18]. Word2vec is a popular model for generating word embeddings on text data. It can reproduce the linguistic context of words by training a shallow two layer architecture. The input to the Word2vec model may be a huge corpus, and the generated outputs are vectors in some multi-dimensional space, with each unique word in the corpus having a corresponding vector associated with it. This makes learning the word representation significantly faster than the previous methods. In the Word2vec framework the distributed representation of the words in the vocabulary is learned in an unsupervised way. Learning can be done via two architectures: skip-gram and continuous bag of words.

<math>\frac{1}{N} \sum_{n=1}^{N} \sum_{-s \le k \le s,\; k \ne 0} \log p(Q_{n+k} \mid Q_n) \qquad (1)</math>

The skip-gram method tries to maximize the average log probability in (1) over the word sequence <math>Q_1, Q_2, \ldots, Q_N</math>. Here <math>s</math> is the training context size around the centre word <math>Q_n</math>, and <math>p(Q_{n+k} \mid Q_n)</math> is a softmax function. In the skip-gram model, the context or surrounding word is predicted given the centre word as the input; in the Continuous Bag of Words (CBOW) model, the centre word is predicted given the surrounding words.

====2.2 Convolutional neural nets (CNN)====
CNNs are commonly used for computer vision tasks, where their local receptive field is advantageous for feature learning in images. CNN models are also used for text classification tasks. A CNN can be thought of as an artificial neural network that has the ability to pick out or detect patterns and make sense of them. This pattern detection makes CNNs useful for data analysis. A CNN has hidden layers called convolutional layers, which are slightly different from those of an MLP. For each convolutional layer the number of filters needs to be specified; each filter then slides over the rows and columns of the input matrix. In this matrix each individual row is a vector representing one word; more precisely, these vectors come from word embedding models like GloVe (https://nlp.stanford.edu/projects/glove) or Word2vec (https://www.tensorflow.org/tutorials/Word2vec). This work used the Word2vec model before applying the CNN. CNNs perform well on sequential data with faster training times and are exceptional for predictive analysis. A CNN normally consists of an input layer followed by convolutional layers, maxpooling layers for dimensionality reduction, and fully connected layers with a specific non-linear activation function (ReLU in this work). In this text based phishing email detection task, one dimensional maxpooling layers and fully connected layers are used. The filters used in this network slide over the embedding vectors to output a continuous value at each step, which yields better representations of the word vectors. For text based applications a 1D CNN is used.

====2.3 Multi-layer perceptron (MLP)====
Rosenblatt introduced the concept of a single perceptron. A multi-layer perceptron (MLP) is typically a network of perceptrons or simple neurons. An MLP has one input layer and one output layer. The dimensions of the input and output nodes depend on the number of sample vectors and the number of label vectors present in the input data. In between these two layers, many hidden layers are present. The output of each layer is fed as input to the following hidden layer, and each unit does a relatively straightforward computation: it takes an input X, multiplies it by a weight W, performs a summation and passes the result through an activation function to yield the output. Perceptrons compute a score or a single output from their inputs, which are usually real valued. This score is used in the backward pass, where the cost function is calculated by comparing the predicted output with the true label value and is expressed as a root mean square (RMS) error. This RMS error is minimized using the gradient descent technique, and the optimum weight and bias values are determined for the network. It uses activation functions like sigmoid or tanh to produce the output. One characteristic of an MLP is the fully connected architecture within its deep layers.

====2.4 Recurrent neural network (RNN)====
The problem associated with the MLP and CNN models is that every input and output vector is treated as independent. In other words, these models cannot capture the sequential information between the words. In the phishing email detection task it is highly useful to identify the associated words for classification purposes. The RNN model is popular in time series and sequence data analysis. It can take variable size inputs and return a variable size output. The state of a recurrent network at time T is a function of its old state and the input at time T. Since it stores the previous state of the system, we can say that an RNN has a 'memory' to capture sequential information between words. A recurrent neural net is a variation of feed-forward nets. The cyclic connections between the neurons allow results from the previous time step to be used in computing the current state, in a way remembering the temporal information about the input data. This makes RNNs learn well on data with long term dependencies, as in natural language processing and speech processing applications.

{| class="wikitable"
|+ Table 1: Hyperparameters for the Word2vec model
! Hyperparameter !! Value !! Description
|-
| Batch-Size || 250 || Number of training samples required per batch
|-
| Embedding-Size || 300 || Word vector dimension
|-
| Skip-Window || 7 || Context window, five words before and after each word
|-
| Num-skips || 12 || How many prediction pairs are selected from the window
|-
| Num-sampled || 128 || Number of negative samples
|-
| Learning rate || 0.1 || Determines how quickly or slowly the model updates its parameters
|-
| n-epoch || 50 || Number of epochs (forward + backward passes)
|}
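To make the skip-gram setup of Section 2.1 concrete, the following minimal Python sketch generates (centre word, context word) training pairs, the pairs whose log probabilities are averaged in Equation (1). It is an illustration under stated assumptions, not the authors' code: the function name `skipgram_pairs`, the sample sentence and the small window are hypothetical, and the parameter names simply mirror the Skip-Window and Num-skips rows of Table 1.

```python
import random


def skipgram_pairs(tokens, skip_window, num_skips, seed=0):
    """Return (centre, context) pairs for a list of tokens.

    skip_window: number of words considered on each side of the centre word.
    num_skips:   number of context words sampled per centre word
                 (at most 2 * skip_window).
    """
    if num_skips > 2 * skip_window:
        raise ValueError("num_skips must not exceed 2 * skip_window")
    rng = random.Random(seed)
    pairs = []
    for n, centre in enumerate(tokens):
        # Positions n + k with -skip_window <= k <= skip_window and k != 0,
        # clipped to the bounds of the sequence.
        positions = [
            n + k
            for k in range(-skip_window, skip_window + 1)
            if k != 0 and 0 <= n + k < len(tokens)
        ]
        # Sample without replacement; words near the edges have fewer neighbours.
        chosen = rng.sample(positions, min(num_skips, len(positions)))
        pairs.extend((centre, tokens[p]) for p in chosen)
    return pairs


# Hypothetical example sentence; a real run would use the tokenized email corpus.
tokens = "please verify your account details now".split()
pairs = skipgram_pairs(tokens, skip_window=2, num_skips=2)
for centre, context in pairs:
    print(centre, "->", context)
```

With the values reported in Table 1, the call would be `skipgram_pairs(tokens, skip_window=7, num_skips=12)`, which satisfies the constraint that Num-skips is at most twice Skip-Window.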
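The statement in Section 2.4 that the state of a recurrent network at time T is a function of its old state and the input at time T can be sketched in a few lines of Python. The sketch assumes the common Elman-style update h_T = tanh(W_x x_T + W_h h_(T-1) + b); the paper does not specify the exact recurrence or activation, and the weights and two-dimensional "word vectors" below are toy values chosen only for illustration.

```python
import math


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t


def rnn_forward(sequence, W_x, W_h, b):
    """Run the recurrence over a sequence of word vectors; return the final state."""
    h = [0.0] * len(b)  # initial state: no memory yet
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h


# Toy parameters (hypothetical): 2-dimensional inputs, 2 hidden units.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [-0.4, 0.3]]
b = [0.0, 0.1]

email_a = [[1.0, 0.0], [0.0, 1.0]]
email_b = [[0.0, 1.0], [1.0, 0.0]]  # same vectors in reversed order

state_a = rnn_forward(email_a, W_x, W_h, b)
state_b = rnn_forward(email_b, W_x, W_h, b)
print(state_a)
print(state_b)
```

The two inputs contain the same vectors in a different order yet end in different final states, which is the 'memory' of word order that the MLP and CNN descriptions above lack.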
===3 Dataset Description===
The dataset [EDMB+18] used is the one provided for the shared task at the 4th ACM International Workshop on Security and Privacy Analytics [EDB+18]. The task was to detect phishing emails. Details of the dataset are shown in Tables 2 and 3.

{| class="wikitable"
|+ Table 2: Training dataset details
! Category !! Legitimate !! Phishing !! Total
|-
| With no header || 5088 || 612 || 5700
|-
| With header || 4082 || 501 || 4583
|}

{| class="wikitable"
|+ Table 3: Testing dataset details
! Category !! Data samples
|-
| With no header || 4300
|-
| With header || 4195
|}

===4 Experiments and Results===
The proposed tool is christened A.R.E.S, which stands for Automatic Rogue Email Spotter. A detailed visualization of the model is shown in Fig. 1. The architecture combines word embedding with a CNN, an RNN and an MLP. The task is divided into 2 sub tasks: emails with 'no header' and emails 'with header'. We did not extract any other features from the header, and the methodology used to convert raw email samples to feature vectors is the same for both sub tasks. In both sub tasks, the raw email corpus is fed to the embedding layer, which uses the Word2vec model to generate distributed word embeddings. The learned word embedding model is used to represent the input data, which is then fed to a deep learning model. The hyperparameters used to create the Word2vec model are detailed in Table 1.

Figure 1: Proposed Architecture

The deep learning models learn additional features, which are passed to the fully connected layer. Previous work on similar problems suggests using an RNN to solve such tasks, but in order to have a better analysis of the performance of different models we incorporated a CNN and an MLP in this work. Finally, due to the binary nature of the task, we used a sigmoid output, thresholded to separate legitimate emails from phishing ones, and binary cross entropy as the loss function.

From the statistics shown in Tables 4 and 5, the word embedding model along with an MLP network gives a commendable score for both sub tasks. Further, when the same word embedding model is passed

{| class="wikitable"
|+ Table 4: Statistics of training results
! Method !! Task !! 10-fold cross validation accuracy
|-
| Word embedding + MLP || Sub task 1 || 0.921
|-
| Word embedding + CNN || Sub task 1 || 0.952
|-
| Word embedding + RNN || Sub task 1 || 0.951
|-
| Word embedding + MLP || Sub task 2 || 0.901
|-
| Word embedding + CNN || Sub task 2 || 0.912
|-
| Word embedding + RNN || Sub task 2 || 0.931
|}

{| class="wikitable"
|+ Table 5: Statistics of test results
! Method !! Task !! TP !! TN !! FP !! FN
|-
| Word embedding + CNN || Sub task 1 || 3479 || 237 || 238 || 346
|-
| Word embedding + RNN || Sub task 1 || 3446 || 224 || 251 || 379
|-
| Word embedding + RNN || Sub task 2 || 3193 || 363 || 133 || 506
|}
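Table 5 reports raw confusion counts rather than rates, so the short Python sketch below derives the usual summary metrics from them. The counts are copied from Table 5; the helper name `confusion_metrics` is hypothetical, and because the paper does not state which class is treated as positive, precision, recall and F1 here are simply relative to whatever class the TP column refers to.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive standard metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# (TP, TN, FP, FN) rows taken from Table 5.
table5 = {
    ("CNN", "Sub task 1"): (3479, 237, 238, 346),
    ("RNN", "Sub task 1"): (3446, 224, 251, 379),
    ("RNN", "Sub task 2"): (3193, 363, 133, 506),
}

results = {key: confusion_metrics(*counts) for key, counts in table5.items()}

for (model, task), m in results.items():
    print(
        f"{model} {task}: n={m['total']} "
        f"accuracy={m['accuracy']:.3f} precision={m['precision']:.3f} "
        f"recall={m['recall']:.3f} f1={m['f1']:.3f}"
    )
```

The row totals come to 4300, 4300 and 4195, matching the test set sizes in Table 3, and the resulting test accuracies are roughly 0.864, 0.853 and 0.848, noticeably below the cross validation accuracies in Table 4.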
through CNN and RNN, it registered an overall improved score over the previous MLP model. Specifically, the CNN gave the highest score for sub task 1, whereas the RNN gave the best score for sub task 2, over the validation set. An MLP model with 6 hidden layers of size 300 is used primarily for building the base model. The activation function is ReLU and the dropout is 0.01. The model is implemented in Keras, and the best validation score among 500 epochs was used. The model structure is then extended into CNN and RNN models. The CNN is implemented with 256 filters, and maxpooling is used for dimensionality reduction between the dense layers. All experiments were performed on GPU enabled TensorFlow [ABC+16] in conjunction with the Keras framework [C+15]. All models are trained using backpropagation.

===5 Conclusion===
Phishing emails have always plagued even the average user, and classifying them properly is a challenging task. Where former machine learning techniques failed, deep learning models have provided state-of-the-art performance. The CNN/RNN/MLP architecture along with the Word2vec embeddings used in this work has outperformed former rule based and machine learning based models. During training the models gave high accuracy, while the test accuracy was comparatively low owing to the highly unbalanced nature of the dataset. In the proposed system, no external data was provided to train the model. On the test data, the CNN performed slightly better than the RNN on subtask 1 and the RNN performed well on subtask 2. In 10-fold cross validation (Table 4), the CNN managed a score of 95.2% on subtask 1, almost comparable to the RNN, and the RNN managed a score of 93.1% on subtask 2, making the RNN a better and more versatile overall performer. Higher accuracy can be achieved with these models by extending the training corpus and by adding deeper layers to infuse more feature learning capability into the model. This work also demonstrates the possibilities of amalgamating techniques from text analytics and deep learning for cybersecurity use cases.

===Acknowledgements===
This research was supported in part by Paramount Computer Systems. We are grateful to NVIDIA India for the GPU hardware support to the research grant. We are grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.

===References===
[AAY11] Tiago A Almeida, Jurandy Almeida, and Akebo Yamakami. Spam filtering: how the dimensionality reduction affects the accuracy of naive bayes classifiers. Journal of Internet Services and Applications, 1(3):183–200, 2011.

[ABC+16] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.

[BB08] Enrico Blanzieri and Anton Bryl. A survey of learning-based techniques of email spam filtering. Artificial Intelligence Review, 29(1):63–92, 2008.

[BG17] H.B. Barathi Ganesh, Reshma U., Anand Kumar M., and Soman K.P. Representation of target classes
for text classification-amrita-cen-nlp@rusprofiling pan 2017. In CEUR Workshop Proceedings, pages 25–27, 2017.

[BG18] H.B. Barathi Ganesh, Anand Kumar M., and Soman K.P. From vector space models to vector space models of semantics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10478 LNCS, pages 50–60, 2018.

[C+15] François Chollet et al. Keras, 2015.

[CL12] Godwin Caruana and Maozhen Li. A survey of emerging approaches to spam filtering. ACM Computing Surveys (CSUR), 44(2):9, 2012.

[CM01] Xavier Carreras and Lluis Marquez. Boosting trees for anti-spam email filtering. arXiv preprint cs/0109015, 2001.

[EDB+18] Ayman Elaassal, Avisha Das, Shahryar Baki, Luis De Moraes, and Rakesh Verma. Iwspa-ap: Anti-phising shared task at acm international workshop on security and privacy analytics. In Proceedings of the 1st IWSPA Anti-Phishing Shared Task. CEUR, 2018.

[EDMB+18] Ayman Elaassal, Luis De Moraes, Shahryar Baki, Rakesh Verma, and Avisha Das. Iwspa-ap shared task email dataset, 2018.

[FRID+07] Florentino Fdez-Riverola, Eva Lorenzo Iglesias, Fernando Díaz, José Ramon Méndez, and Juan M Corchado. Applying lazy learning algorithms to tackle concept drift in spam filtering. Expert Systems with Applications, 33(1):36–48, 2007.

[GGWM+10] Pedro H Calais Guerra, Dorgival Guedes, J Wagner Meira, Cristine Hoepers, MHPC Chaves, and Klaus Steding-Jessen. Exploring the spam arms race to characterize spam evolution. In Proceedings of the 7th Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS), Redmond, WA, 2010.

[HDC+18] Reza Hassanpour, Erdogan Dogdu, Roya Choupani, Onur Goker, and Nazli Nazli. Phishing e-mail detection by using deep learning algorithms. In Proceedings of the ACMSE 2018 Conference, page 45. ACM, 2018.

[HMR+86] Geoffrey E Hinton, James L McClelland, David E Rumelhart, et al. Distributed representations. Parallel distributed processing: Explorations in the microstructure of cognition, 1(3):77–109, 1986.

[KRA+07] Ponnurangam Kumaraguru, Yong Rhee, Alessandro Acquisti, Lorrie Faith Cranor, Jason Hong, and Elizabeth Nunge. Protecting people from phishing: the design and evaluation of an embedded training email system. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 905–914. ACM, 2007.

[LF17] Ruidan Li and Errin W Fulp. Evolutionary approaches for resilient surveillance management. In 2017 IEEE Security and Privacy Workshops (SPW), pages 23–28. IEEE, 2017.

[LT04] Chih-Chin Lai and Ming-Chi Tsai. An empirical performance comparison of machine learning methods for spam e-mail categorization. In Hybrid Intelligent Systems, 2004. HIS'04. Fourth International Conference on, pages 44–48. IEEE, 2004.

[MBA18] Youness Mourtaji, Mohammed Bouhorma, and Daniyal Alghazzawi. New phishing hybrid detection framework. Journal of Theoretical & Applied Information Technology, 96(6), 2018.

[MG18] Ankur Mishra and BB Gupta. Intelligent phishing detection system using similarity matching algorithms. International Journal of Information and Communication Technology, 12(1-2):51–73, 2018.

[MSC+13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013.
[PHS18] Tianrui Peng, Ian Harris, and Yuki Sawa. Detecting phishing attacks using natural language processing and machine learning. In Semantic Computing (ICSC), 2018 IEEE 12th International Conference on, pages 300–301. IEEE, 2018.

[S+09] Jyh-Jian Sheu et al. An efficient two-phase spam filtering method based on e-mails categorization. IJ Network Security, 9(1):34–43, 2009.

[SAZ18] Sami Smadi, Nauman Aslam, and Li Zhang. Detection of online phishing email using dynamic evolving neural network based on reinforcement learning. Decision Support Systems, 2018.

[SDHH98] Mehran Sahami, Susan Dumais, David Heckerman, and Eric Horvitz. A bayesian approach to filtering junk e-mail. In Learning for Text Categorization: Papers from the 1998 workshop, volume 62, pages 98–105, 1998.

[SKP18] Vysakh S Mohan Soman Kp, Vinayakumar R and Prabaharan Poornachandran. S.p.o.o.f net: Syntactic patterns for identification of ominous online factors. In 2018 IEEE Security and Privacy Workshops (SPW). IEEE, [In-Press], 2018.

[SLH17] Doyen Sahoo, Chenghao Liu, and Steven CH Hoi. Malicious url detection using machine learning: A survey. arXiv preprint arXiv:1701.07179, 2017.

[Tre04] Konstantin Tretyakov. Machine learning techniques in spam filtering. In Data Mining Problem-oriented Seminar, MTAT, volume 3, pages 60–79, 2004.

[VH13] Rakesh Verma and Nabil Hossain. Semantic feature selection for text with application to phishing email detection. In International Conference on Information Security and Cryptology, pages 455–468. Springer, 2013.

[VSH12] Rakesh Verma, Narasimha Shashidhar, and Nabil Hossain. Detecting phishing emails the natural language way. In European Symposium on Research in Computer Security, pages 824–841. Springer, 2012.

[VSP17] R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Deep encrypted text categorization. In Advances in Computing, Communications and Informatics (ICACCI), 2017 International Conference on, pages 364–370. IEEE, 2017.

[VSP18a] R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Detecting malicious domain names using deep learning approaches at scale. Journal of Intelligent & Fuzzy Systems, 34(3):1355–1367, 2018.

[VSP18b] R Vinayakumar, KP Soman, and Prabaharan Poornachandran. Evaluating deep learning approaches to characterize and classify malicious urls. Journal of Intelligent & Fuzzy Systems, 34(3):1333–1343, 2018.

[WC07] Chih-Chien Wang and Sheng-Yi Chen. Using header session messages to anti-spamming. Computers & Security, 26(5):381–390, 2007.

[WC11] John S Whissell and Charles LA Clarke. Clustering for semi-supervised spam filtering. In Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, pages 125–134. ACM, 2011.

[ZZY04] Le Zhang, Jingbo Zhu, and Tianshun Yao. An evaluation of statistical spam filtering techniques. ACM Transactions on Asian Language Information Processing (TALIP), 3(4):243–269, 2004.