DDoS Botnet Detection Technique Based on the Use of
      the Semi-Supervised Fuzzy C-Means Clustering

 Sergii Lysenko1[0000-0001-7243-8747], Oleg Savenko1[0000-0002-4104-745X] and
                        Kira Bobrovnikova1[0000-0002-1046-893X]
              1 Department of Computer Engineering and System Programming

                          Khmelnitsky National University,
                        Instytutska, 11, Khmelnitsky, Ukraine
                 sirogyk@ukr.net,savenko_oleg_st@ukr.net,
                       bobrovnikova.kira@gmail.com,
                            http://ki.khnu.km.ua


       Abstract. A new technique for DDoS botnet detection based on the analysis of
       botnet network features is proposed. It uses semi-supervised fuzzy c-means
       clustering. The proposed approach includes a learning stage and a detection
       stage. The analysis is based on features extracted from the network traffic that
       may indicate the presence of DDoS botnets in the network. Experimental results
       demonstrated that the proposed technique detects DDoS botnets at a rate of
       about 95%.

       Keywords: Botnet, Botnet Detection, DDoS, DDoS Botnet, Corporate Area
       Networks, Fuzzy C-means Clustering, Cyber Attack.


1      Introduction

Today one of the most dangerous types of malware is the botnet: a group of
Internet-connected devices that are infected with malware and controlled from a remote
location without the knowledge of the devices' owners. Among the malicious purposes for
which botnets are used are spam distribution and DDoS attacks [1, 2]. DDoS botnet attacks
are considered the biggest threat to the IT industry, and their intensity, size and
frequency increase every year. DDoS botnet attacks primarily compromise the availability
of system services, which leads to financial damage or harms the reputation of the
corporate area networks. That is why there is a strong need for efficient DDoS botnet
detection techniques that impede these attacks.


2      Related Works

Today there are many attempts to develop DDoS botnet detection techniques. In [3] an
overview of DDoS attacks that can be carried out in a cloud environment, together with
possible defensive mechanisms and tools, is presented.
    In [4] an approach for the detection and mitigation of known and unknown DDoS
attacks in real-time environments is proposed. An Artificial Neural Network (ANN)
algorithm was chosen to detect DDoS attacks based on specific characteristic features
(patterns) that separate DDoS attack traffic from genuine traffic.
    The research in [5] is related to DDoS attack mitigation solutions in the cloud. In
particular, a comprehensive survey with a detailed insight into the characterization,
prevention, detection, and mitigation mechanisms of these attacks was presented. A
comprehensive solution taxonomy to classify DDoS attack solutions was presented. A
definite guideline on effective solution building and detailed solution requirements for
the design of defense mechanisms was provided.
    In [6] the following contributions were offered: an abstract model for a class of
DDoS attacks, in which the botnet emulates normal traffic by continually learning
admissible patterns from the environment, was introduced, and an inference algorithm that
is shown to provide a consistent estimate of the botnet possibly hidden in the network
was devised.
    The work [7] outlines an evaluation tool and evaluates an amplification attack based
on the Trivial File Transfer Protocol (TFTP). Mitigation methods for this threat have
been considered and a variety of countermeasures are proposed. The work covers the
tuning of the attack, its detection and its mitigation.
    In [8] an analysis of the Mirai botnet was provided. Mirai represents a new type
of botnet that can compromise enough low-end devices to threaten even some of
the best-defended targets. To address this risk, technical and nontechnical interventions
were proposed.
    The main drawback of the approaches described above is their low efficiency in
detecting DDoS botnets when the network is employed to carry out attacks whose traffic
is very similar to legitimate traffic.


3      Previous Work

During the last years, several attempts to solve the problem of botnet detection in
corporate area networks (CAN) were made. The approaches [9, 10] proposed botnet
detection using a multi-agent system. The conclusion about a botnet's presence was
drawn using fuzzy logic, taking into account the botnet features observed on several
network hosts. The botnet detection technique [11] involved DNS-based analysis. The
approach employed passive DNS monitoring and active DNS probing in the network. That
made it possible to detect botnets that used the cycling of IP mapping, "domain flux",
"fast flux" and DNS-tunneling evasion techniques. Based on the proposed technique, the
botnet detection tool BotGRABBER was developed. It enabled the gathering of DNS traffic
and the analysis of the features obtained from the payload. The conclusion about a
possible botnet's presence was drawn using cluster analysis. The approach [12] presented
an evolution of the BotGRABBER system. It was enhanced with the ability to localize
botnets in the CAN by means of a combined analysis of the DNS traffic and the behavior
of the malicious software on the network hosts. Nevertheless, the main drawback of the
BotGRABBER system is that it deals only with malicious DNS traffic and does not take
into account the features of DDoS botnets, which may employ the networks for DDoS
execution. Further research is to extend the functionality of the BotGRABBER system with
the ability to analyze network traffic and to detect botnets that execute DDoS attacks.


4      DDoS Botnet Detection Technique Based on the Use of
       the Semi-Supervised Fuzzy C-Means Clustering

A new technique for DDoS botnet detection based on the analysis of botnet network
features is proposed. It uses semi-supervised fuzzy c-means clustering. The proposed
approach includes a learning stage and a detection stage. The learning stage consists of
the following steps:
    1. knowledge formation based on the features that may indicate DDoS botnet attacks
in the network;
    2. presentation of the knowledge about the cyberattacks as a set of feature vectors.
    The detection stage of the technique consists of the following steps:
    1. gathering of the inbound and outbound network traffic;
    2. extraction from the network traffic of the features that may indicate the
presence of DDoS botnets in the network;
    3. construction of the feature vectors based on the information obtained from the
network traffic;
    4. semi-supervised fuzzy c-means clustering of the obtained feature vectors in order
to assign each of them to one of the clusters, each of which corresponds to a specified
DDoS botnet attack;
    5. localization of the hosts infected with DDoS botnets.


4.1    Knowledge Formation Based on the Features that May Indicate
       DDoS Botnet Attacks in the Network

Let us denote the set of attacks performed by the DDoS botnet as
$A = \{a_m\}_{m=1}^{N_A}$, where $a_1$ – the ping flooding attack; $a_2$ – the smurf
attack; $a_3$ – the TCP SYN flood attack; $a_4$ – the fragmented UDP flood attack;
$a_5$ – the DNS amplification attack; $a_6$ – the TCP reset attack; $a_7$ – the ICMP
flood attack; $a_8$ – the SIP INVITE flood attack; $a_9$ – the encrypted SSL DDoS
attack; $a_{10}$ – the ping sweep attack; $a_{11}$ – the DNS spoofing attack;
$a_{12}$ – the ping of death attack; $a_{13}$ – the R-U-Dead-Yet (R.U.D.Y.) DDoS attack;
$N_A$ – the number of attacks performed by DDoS botnets.
   Let us denote the set of features that may indicate DDoS botnet attacks and are to
be analyzed as $B = \{b_j\}_{j=1}^{N_B}$, where $N_B$ – the number of features. The list
of features is presented in Table 1. Let us denote the set of network hosts attacked by
the DDoS botnets as $H = \{h_i\}_{i=1}^{N_H}$, where $N_H$ – the number of network
hosts. Thus, the function $f$ that identifies the DDoS botnet attack can be presented
as $f : h_i \times b_j \to a_m$.

              Table 1. The features used in the DDoS botnet detection process.

  Feature                 Description
  $p$                     transmission protocol
  $f_{IO}$                a boolean feature that indicates whether the inbound traffic has
                          an associated outbound traffic record
  $p_{OD}$                the number of packets transmitted from origin to destination
  $b_{OD}$                the number of bytes transmitted from origin to destination
  $d_C$                   the duration of the connection
  $d_{EL}$                the duration of the connection, observed from the earliest of the
                          associated inbound or outbound traffic until the end of the
                          latter traffic
  $l_p$                   the average payload length per connection
  $b_{TC}$                the total number of bytes transmitted per connection
  $b_{EH}$                the total number of bytes per connection excluding the header
  $n_{PSF}$               the ratio of the number of different packet sizes to the total
                          number of frames per connection
  $p_s$                   the total number of packets in the session
  $b_s$                   the total size of the session in bytes
  $d_{PSB}$               the standard deviation of the packet size within the session,
                          measured in bytes
  $v_{OBP}$, $v_{IBP}$    the velocity of outbound/inbound traffic, in bytes per packet
  $v_{OBS}$, $v_{IBS}$    the velocity of outbound/inbound traffic, in bits per second
  $v_{OPS}$, $v_{IPS}$    the velocity of outbound/inbound traffic, in packets per second
  $o_{SS}$, $i_{SS}$      the self-similarity of the outbound/inbound packets in the
                          session, determined by examining the variance in size of the
                          outbound/inbound packets using the Hurst exponent
  $n_{DP}$                the number of denied packets
  $n_{NAT}$               the number of records in the NAT/PAT table
  $n_{ARP}$               the number of ARP requests
  $f_{TCP}$               invalid values of TCP flags seen in the session
  $f_{GEO}$               the geolocation feature defined by the IP address
  $p_R$                   the router's processor time, %
  $m_R$                   the amount of router memory used, megabytes
  $s_{RT}$                the server response time, milliseconds

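
As an illustration of how some of the session-level features of Table 1 might be derived from captured packets, consider the following minimal sketch. The `Packet` record, its field names, the dictionary keys and the use of the population standard deviation are assumptions made for the example; they are not taken from the described implementation.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Packet:
    # Hypothetical packet record; the field names are assumptions.
    timestamp: float  # capture time, seconds
    size: int         # packet size, bytes
    outbound: bool    # True if the packet leaves the monitored host


def session_features(packets):
    """Compute a few Table 1 style features for one session."""
    if not packets:
        raise ValueError("session has no packets")
    times = [p.timestamp for p in packets]
    sizes = [p.size for p in packets]
    out_sizes = [p.size for p in packets if p.outbound]
    duration = max(times) - min(times)
    return {
        "p_s": len(packets),         # total number of packets in the session
        "b_s": sum(sizes),           # total size of the session, bytes
        "d_C": duration,             # duration, seconds
        "d_PSB": pstdev(sizes),      # std. deviation of packet size, bytes
        # outbound velocity in bytes per packet
        "v_OBP": sum(out_sizes) / len(out_sizes) if out_sizes else 0.0,
        # outbound velocity in packets per second
        "v_OPS": len(out_sizes) / duration if duration > 0 else 0.0,
        # outbound velocity in bits per second
        "v_OBS": 8 * sum(out_sizes) / duration if duration > 0 else 0.0,
    }


demo = [Packet(0.0, 100, True), Packet(1.0, 300, False), Packet(2.0, 100, True)]
features = session_features(demo)
```

The inbound counterparts ($v_{IBP}$, $v_{IPS}$, $v_{IBS}$) would follow the same pattern with the `outbound` flag inverted.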
4.2     Presentation of the Knowledge About the Cyberattacks As the Set
        of the Feature Vectors

All the above-mentioned features form the basis of the set of feature vectors
$X = \{x_k\}_{k=1}^{N_X}$, where each feature vector $x_k$ describes either a botnet
attack or legitimate traffic, and $N_X$ – the number of the feature vectors. Using the
features obtained from the network traffic, which are presented as feature vectors, the
set of rules R is constructed. Each rule describes a specified DDoS botnet attack. The
set of feature vectors forms the training set, which is used for the semi-supervised
learning.
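For instance, the rule R that describes the R.U.D.Y. DDoS botnet attack ($a_{13}$) can
be presented as follows:
$$R: \text{if } \big( ((d_C > \delta) \lor (d_{EL} > \delta')) \land (d_{PSB} \in [\varphi, \varphi']) \land ((v_{OBP} < \sigma) \land (v_{OPS} < o) \land (v_{OBS} < \kappa)) \land ((v_{IBP} < \varepsilon) \land (v_{IPS} < \beta) \land (v_{IBS} < \gamma)) \land ((o_{SS} > \tau) \lor (i_{SS} > \tau)) \big) \Rightarrow a_{13} \qquad (1)$$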
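
Rule (1) can be read as a boolean predicate over a feature vector. The sketch below encodes it in Python under stated assumptions: the dictionary keys follow the feature names of Table 1, and all threshold values are hypothetical placeholders chosen only to make the example run, since the source does not give them.

```python
def rule_r(x, th):
    """Evaluate rule (1); True means the vector matches attack a13."""
    long_lived = x["d_C"] > th["delta"] or x["d_EL"] > th["delta_prime"]
    size_spread = th["phi"] <= x["d_PSB"] <= th["phi_prime"]
    slow_outbound = (x["v_OBP"] < th["sigma"]
                     and x["v_OPS"] < th["omicron"]
                     and x["v_OBS"] < th["kappa"])
    slow_inbound = (x["v_IBP"] < th["epsilon"]
                    and x["v_IPS"] < th["beta"]
                    and x["v_IBS"] < th["gamma"])
    self_similar = x["o_SS"] > th["tau"] or x["i_SS"] > th["tau"]
    return (long_lived and size_spread and slow_outbound
            and slow_inbound and self_similar)


# Hypothetical thresholds, for illustration only.
thresholds = {
    "delta": 60.0, "delta_prime": 60.0, "phi": 0.0, "phi_prime": 50.0,
    "sigma": 200.0, "omicron": 5.0, "kappa": 8000.0,
    "epsilon": 200.0, "beta": 5.0, "gamma": 8000.0, "tau": 0.5,
}
# Hypothetical feature vector of a long, low-rate session.
slow_session = {
    "d_C": 300.0, "d_EL": 310.0, "d_PSB": 12.0,
    "v_OBP": 80.0, "v_OPS": 1.0, "v_OBS": 640.0,
    "v_IBP": 60.0, "v_IPS": 1.0, "v_IBS": 480.0,
    "o_SS": 0.8, "i_SS": 0.4,
}
matches = rule_r(slow_session, thresholds)
```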


4.3     Labeling the Obtained Feature Vectors of the DDoS Botnets
        Attacks for the Purpose of the Clusters' Formation

Let c denote the number of the predefined clusters of feature vectors. Each cluster
corresponds to a specified DDoS botnet attack, and one cluster corresponds to the
legitimate network traffic. The membership of the feature vector $x_k$ in the i-th
cluster indicates the presence or absence of a DDoS botnet attack in the network.
   In order to construct the centroid (the prototype) $v_i$ of the i-th cluster, labeled
data are required. They are based on the knowledge about the features that may indicate
DDoS botnet attacks in the network and are presented as the set of feature vectors.
Each feature vector $x_k$ of the labeled data belongs to one of the predefined clusters.
The semi-supervised fuzzy c-means clustering is based on the minimization of the
following objective function [13]:
$$J = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{p} d_{ik}^{2} + \alpha \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik} - f_{ik} b_k)^{p} d_{ik}^{2}, \qquad (2)$$

where $N$ – the total number of the feature vectors to be clustered (labeled and
unlabeled feature vectors), $p$ – the fuzzification exponent, $u_{ik}$ – the membership
value of the k-th feature vector in the i-th cluster, $f_{ik}$ – the membership value of
the k-th labeled feature vector in the i-th cluster, $d_{ik}$ – the distance between the
k-th feature vector and the prototype of the i-th cluster, $b = [b_k]$ – a boolean
indicator which distinguishes the labeled and unlabeled feature vectors:
$$b_k = \begin{cases} 1, & \text{if feature vector } x_k \text{ is labeled}, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$
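
To make the objective function (2) concrete, the following sketch evaluates it for given memberships, labels and distances. The nested-list layout (`u[i][k]`, `f[i][k]`, `d[i][k]`), the function name and the sample numbers are assumptions for illustration; the exponent defaults to p = 2.

```python
def objective(u, f, b, d, alpha, p=2):
    """Evaluate J from (2).

    u[i][k]: membership of vector k in cluster i
    f[i][k]: label membership of vector k in cluster i
    b[k]:    1 if vector k is labeled, 0 otherwise
    d[i][k]: distance between vector k and the prototype of cluster i
    """
    c, n = len(u), len(b)
    unsupervised = sum(u[i][k] ** p * d[i][k] ** 2
                       for i in range(c) for k in range(n))
    supervised = sum((u[i][k] - f[i][k] * b[k]) ** p * d[i][k] ** 2
                     for i in range(c) for k in range(n))
    return unsupervised + alpha * supervised


# Two clusters, two vectors; vector 0 is labeled as belonging to cluster 0.
u = [[1.0, 0.5], [0.0, 0.5]]
f = [[1.0, 0.0], [0.0, 0.0]]
b = [1, 0]
d = [[1.0, 2.0], [3.0, 2.0]]
j_value = objective(u, f, b, d, alpha=2.0)
```

With these numbers the unsupervised term equals 3 and the supervised term equals 2, so the value of J is 3 + 2 · 2 = 7.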
The centroid of the i-th cluster, $v_i$, and the elements of the partition matrix,
$u_{ik}$, are calculated using the formulas (4) [13]:
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{2} x_k}{\sum_{k=1}^{N} u_{ik}^{2}}, \qquad u_{ik} = \frac{1}{1+\alpha} \left[ \frac{1 + \alpha \left( 1 - b_k \sum_{l=1}^{c} f_{lk} \right)}{\sum_{l=1}^{c} \left( \frac{d_{ik}}{d_{lk}} \right)^{2}} + \alpha f_{ik} b_k \right], \qquad (4)$$
where α denotes a scaling factor to maintain a balance between the supervised and un-
supervised component within the optimization mechanism [13].
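
One update step according to the formulas (4) can be sketched in plain Python as shown below. This is an illustrative sketch for p = 2, not the authors' implementation: the function names and the list-of-lists layout are assumptions, the label sum inside the membership update is taken over the clusters as in [13], and all distances are assumed to be strictly positive.

```python
def update_memberships(d, f, b, alpha):
    """Membership update from (4); d[i][k] > 0 is assumed."""
    c, n = len(d), len(b)
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        # Total label membership of vector k (zero if it is unlabeled).
        label_mass = b[k] * sum(f[l][k] for l in range(c))
        for i in range(c):
            denom = sum((d[i][k] / d[l][k]) ** 2 for l in range(c))
            u[i][k] = ((1 + alpha * (1 - label_mass)) / denom
                       + alpha * f[i][k] * b[k]) / (1 + alpha)
    return u


def update_centroids(u, x):
    """Centroid update from (4): mean of x weighted by squared memberships."""
    c, n, dim = len(u), len(x), len(x[0])
    centroids = []
    for i in range(c):
        w = [u[i][k] ** 2 for k in range(n)]
        total = sum(w)
        centroids.append([sum(w[k] * x[k][j] for k in range(n)) / total
                          for j in range(dim)])
    return centroids


# Two clusters, two vectors; vector 0 is labeled as belonging to cluster 0.
d = [[1.0, 1.0], [2.0, 1.0]]
f = [[1.0, 0.0], [0.0, 0.0]]
b = [1, 0]
u = update_memberships(d, f, b, alpha=1.0)
v = update_centroids(u, [[0.0, 0.0], [2.0, 2.0]])
```

In this example the labeled vector receives the memberships 0.9 and 0.1, while the unlabeled vector, which is equidistant from both prototypes, receives 0.5 and 0.5. The memberships of each vector sum to one.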
   As the distance metric between the k-th feature vector and the centroid of the i-th
cluster, the Mahalanobis distance was used:
$$d_{ik}^{2} = (x_k - v_i)^{T} A (x_k - v_i), \qquad (5)$$
with $A$ being a positive definite matrix in $R^{n \times n}$.
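
The quadratic form $(x_k - v_i)^T A (x_k - v_i)$ from (5) can be evaluated as in the sketch below. The function name and the sample matrices are assumptions. With $A$ equal to the identity matrix the form reduces to the squared Euclidean distance; with $A$ equal to the inverse covariance matrix of the data it gives the squared Mahalanobis distance.

```python
def quadratic_form(x, v, a):
    """Return (x - v)^T A (x - v) for vectors x, v and a square matrix a."""
    diff = [xi - vi for xi, vi in zip(x, v)]
    n = len(diff)
    return sum(diff[r] * a[r][s] * diff[s] for r in range(n) for s in range(n))


identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 0.5]]   # hypothetical positive definite matrix

q_euclid = quadratic_form([3.0, 4.0], [0.0, 0.0], identity)
q_scaled = quadratic_form([3.0, 4.0], [0.0, 0.0], scaled)
```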


4.4    Gathering the Inbound and the Outbound Network Traffic
At this stage of the method, the network activity that may indicate DDoS botnet attacks
is monitored in order to detect them. The gathered information is sent to the classifier
for further analysis.


4.5    Construction of the Feature Vectors and the Implementation of the
       Semi-Supervised Fuzzy C-Means Clustering for the DDoS Botnets
       Attack Classification
The features that may indicate the presence of DDoS botnets in the network are
extracted from the data gathered at the previous stage and are then analyzed. The result
of the analysis is a conclusion about the presence or absence of a DDoS botnet attack.
Semi-supervised fuzzy c-means clustering was used as the means of classification.
The objects of the clustering are the feature vectors $x_k$, obtained from the analysis
of the payload of the inbound and outbound traffic, which describe the possible infection
of the network hosts. The result of the clustering is the set of membership values
$u_{ik}$ of the feature vector $x_k$ in each cluster i. The membership of the feature
vector $x_k$ in the i-th cluster determines the type of the DDoS botnet attack.
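
One possible way to turn the membership values of a feature vector into a decision is to pick the cluster with the largest membership. The source does not state the decision rule, so both the maximum-membership rule and the `min_membership` cut-off in the sketch below are assumptions, as are the function name and the cluster names.

```python
def classify(memberships, cluster_names, min_membership=0.5):
    """Return (cluster name, membership) for the strongest cluster.

    If no membership reaches min_membership (a hypothetical cut-off),
    the vector is reported as "undecided".
    """
    best = max(range(len(memberships)), key=memberships.__getitem__)
    if memberships[best] < min_membership:
        return "undecided", memberships[best]
    return cluster_names[best], memberships[best]


names = ["legitimate", "TCP SYN flood", "R.U.D.Y."]
decision = classify([0.1, 0.7, 0.2], names)
ambiguous = classify([0.4, 0.3, 0.3], names)
```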


4.6    Localization of Hosts Infected with DDoS Botnets
Based on the membership of a feature vector in a cluster of malicious traffic, the
localization of the infected network host or hosts is carried out. It is performed using
the logs with the MAC and IP addresses of the hosts that issued the malicious network
requests.
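
The localization step can be illustrated by grouping the classified log records by host address. The record layout (`mac`, `ip`, `cluster` keys), the function name and the sample addresses below are hypothetical; the sketch only counts, per host, the records that were assigned to a non-legitimate cluster.

```python
from collections import Counter


def localize(log_records, legitimate="legitimate"):
    """Return (MAC, IP) pairs with malicious records, most active first."""
    hits = Counter(
        (rec["mac"], rec["ip"])
        for rec in log_records
        if rec["cluster"] != legitimate
    )
    return hits.most_common()


# Hypothetical classified log records.
records = [
    {"mac": "aa:bb:cc:00:00:01", "ip": "10.0.0.5", "cluster": "TCP SYN flood"},
    {"mac": "aa:bb:cc:00:00:02", "ip": "10.0.0.7", "cluster": "legitimate"},
    {"mac": "aa:bb:cc:00:00:01", "ip": "10.0.0.5", "cluster": "TCP SYN flood"},
    {"mac": "aa:bb:cc:00:00:03", "ip": "10.0.0.9", "cluster": "smurf"},
]
suspects = localize(records)
```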


5      Experiments

In order to determine the efficiency of the proposed technique, several experiments were
held. For the experiments, the DDoS dataset [14] of malicious network traffic was used.
A network of 80 hosts was employed, and each of the above-mentioned types of DDoS botnet
attacks was executed (simulated).
   Each experiment lasted 24 hours. Network traffic was captured by means of the tcpdump
utility. As the training set, 15% of the feature vectors of the inbound and outbound
network traffic were labeled. The experimental results are presented in Table 2 and in
Figure 1. The results demonstrated that the efficiency of DDoS botnet detection is about
95%, while the rate of false positives is about 6%.

                Table 2. Experimental results: detected attacks and false negatives.

   Type of the DDoS botnet attack  Number of attacks      Detected attacks    False negatives
   ping flooding                           10                   10                    0
   smurf                                   10                    9                    1
   TCP SYN flood                           10                   10                    0
   fragmented UDP flood                    10                    9                    1
   DNS amplification                       10                   10                    0
   TCP reset                               10                   10                    0
   ICMP flood                              10                   10                    0
   SIP INVITE flood                        10                   10                    0
   encrypted SSL DDoS                      10                    9                    1
   ping sweep                              10                    9                    1
   DNS spoofing                            10                   10                    0
   ping of death                           10                   10                    0
   R.U.D.Y.                                10                    8                    2
   Total                                  130             124 (95.38%)                6
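
The totals of Table 2 can be re-derived from its per-attack rows, as the short sketch below shows. The dictionary layout and the variable names are assumptions; the numbers are copied from the table.

```python
# (number of attacks, detected attacks) per attack type, from Table 2.
table2 = {
    "ping flooding": (10, 10),
    "smurf": (10, 9),
    "TCP SYN flood": (10, 10),
    "fragmented UDP flood": (10, 9),
    "DNS amplification": (10, 10),
    "TCP reset": (10, 10),
    "ICMP flood": (10, 10),
    "SIP INVITE flood": (10, 10),
    "encrypted SSL DDoS": (10, 9),
    "ping sweep": (10, 9),
    "DNS spoofing": (10, 10),
    "ping of death": (10, 10),
    "R.U.D.Y.": (10, 8),
}

total = sum(launched for launched, _ in table2.values())
detected = sum(found for _, found in table2.values())
missed = total - detected                  # false negatives
detection_rate = 100.0 * detected / total  # percent
miss_rate = 100.0 * missed / total         # percent
```

This gives 124 detected attacks out of 130, that is a detection rate of 95.38%, with 6 missed attacks (4.62%).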


  Fig. 1. Results of clustering.


6      Conclusions

A new technique for DDoS botnet detection based on the analysis of botnet network
features is proposed. It uses semi-supervised fuzzy c-means clustering. The proposed
approach includes a learning stage and a detection stage. The analysis is based on
features extracted from the network traffic that may indicate the presence of DDoS
botnets in the network. Experimental results demonstrated that the detection rate is
about 95% and the false positive rate is about 6%.


References
 1. Virus Bulletin, https://www.virusbulletin.com/, last accessed 2018/03/26.
 2. Cisco. A Cisco Guide to Defending Against Distributed Denial of Service Attacks,
    https://www.cisco.com/c/en/us/about/security-center/guide-ddos-defense.html, last ac-
    cessed 2018/03/26.
 3. Gupta, B. B., Badve, O. P. Taxonomy of DoS and DDoS attacks and desirable defense mech-
    anism in a cloud computing environment. Neural Computing and Applications, vol. 28, No.
    12, pp. 3655-3682 (2017).
 4. Saied, A., Overill, R. E., Radzik, T. Detection of known and unknown DDoS attacks using
    Artificial Neural Networks. Neurocomputing, vol. 172, pp. 385-393 (2016).
 5. Somani, G., Gaur, M. S., Sanghi, D., Conti, M., Buyya, R. DDoS attacks in cloud computing:
    Issues, taxonomy, and future directions. Computer Communications, vol. 107, pp. 30-48 (2017).
 6. Matta, V., Di Mauro, M., Longo, M. DDoS attacks with randomized traffic innovation: bot-
    net identification challenges and strategies. IEEE Transactions on Information Forensics and
    Security, vol. 12, No. 8, pp. 1844-1859 (2017).
 7. Sieklik, B., Macfarlane, R., Buchanan, W. J. Evaluation of TFTP DDoS amplification attack.
    Computers & Security, vol. 57, pp. 67-92 (2016).
 8. Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J., Kumar, D.
    Understanding the Mirai botnet. In: USENIX Security Symposium, pp. 1092-1110 (2017).
 9. Lysenko, S., Savenko, O., Kryshchuk, A., Kljots, Y. Botnet detection technique for corporate
    area network. In: Proceedings of the 2013 IEEE 7th International Conference on Intelligent
    Data Acquisition and Advanced Computing Systems (IDAACS), pp. 363-368 (2013).
10. Savenko, O., Lysenko, S., Kryshchuk, A. Multi-agent Based Approach for Botnet Detection
    in a Corporate Area Network Using Fuzzy Logic. In: International Conference on Computer
    Networks: Springer, pp. 146-156 (2013).
11. Pomorova, O., Savenko, O., Lysenko, S., Kryshchuk, A., Bobrovnikova, K. Antievasion
    technique for the botnets detection based on the passive DNS monitoring and active DNS
    probing. In: International Conference on Computer Networks: Springer International Pub-
    lishing, pp. 83-95 (2016).
12. Lysenko, S., Savenko, O., Bobrovnikova, K., Kryshchuk, A., Savenko, B. Information Tech-
    nology for Botnets Detection Based on Their Behaviour in the Corporate Area Network. In:
    International Conference on Computer Networks: Springer, Cham, pp. 166-181 (2017).
13. Pedrycz, W., Waletzky, J. Fuzzy clustering with partial supervision. IEEE Transactions on
    Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 27, No. 5, pp. 787-795 (1997).
14. Canadian Institute for Cybersecurity. Botnet dataset,
    https://www.unb.ca/cic/datasets/botnet.html, last accessed 2018/03/26.