Semantic Models for Network Intrusion Detection
                                                     Peter Bednar, Martin Sarnovsky, Pavol Halas
                                                  Department of Artificial Intelligence and Cybernetics
                                                            Technical University of Kosice
                                                                   Kosice, Slovakia
                                                              {name.surname}@tuke.sk


    Abstract—The presented paper describes the design and                         proposed combined approach. In this chapter we at first define
validation of the hierarchical intrusion detection system (IDS),                  quantitative evaluation metrics and then summarize the
which combines a machine learning approach with knowledge-                        performance of the system on the standard benchmark dataset
based methods. As the knowledge model, we have proposed an                        from the KDD Cup competition.
ontology of network attacks, which allows us to decompose the
detection and classification of existing attack types or to
formalize detection rules for new types. The designed IDS was
evaluated on a widely used KDD 99 dataset and compared to                         II. HIERARCHICAL KNOWLEDGE-BASED INTRUSION DETECTION
similar approaches.                                                                                                SYSTEM

    Keywords—ontologies, network security incidents, machine                      A. Overall system architecture
learning                                                                              The main objective of the proposed architecture is to
                                                                                  hierarchically decompose detection and classification of the
                          I. INTRODUCTION                                         intrusions according to the types of the attacks. For the
   With the extensive use of information and                                      decomposition we have proposed the Network Intrusion
communication technologies, the number and variety of                             Ontology, whose main part is formalized as a taxonomy of
security attacks grow. This is also reflected in the growing                      attack types. This ontology captures the knowledge
budgets that companies and public institutions invest in                          related to the known types of attacks, including the
security. To cope with this situation, new and                                    description of rare cases which are difficult to detect using
innovative techniques are applied to automate                                     machine learning methods.
security management [1].                                                             The main decomposition of the detection and classification
    Recently, we can observe two main approaches to the                           process can be divided into the following phases:
security of the ICT: the first approach is data-oriented, and it is                   1.   Coarse attack/normal classification - this phase is
based on the application of machine learning techniques to                                 implemented using a machine learning algorithm
proactively achieve the best possible prediction of new                                    which distinguishes normal traffic from attacks. If a
attacks [2][3][4][5]. The second approach is more user-centric                             network connection is labelled as normal, then no
and is based on the application of knowledge modelling                                     alarm is raised. Otherwise, the suspicious
techniques to model user behavior and the ICT environment                                  connection is processed by a set of models to determine
[8][9][10].                                                                                the class of attack in phase 2.
    The presented article tries to combine these two approaches                       2.   Attack class and type prediction - this phase is guided
into a single system, where the domain knowledge about the                                 by the taxonomy of the attacks from the Network
types, effects and severity of the attacks is used to decompose                            Intrusion Ontology. The system hierarchically processes
intrusion detection task into the classification subtasks which                            the taxonomy and selects the appropriate model to
can be handled more efficiently with less training data. The                               classify the instance on a particular level of a class
design of the proposed intrusion detection system is symmetrical                           hierarchy. The model can be a machine learning model
in the sense that both approaches (machine learning and                                    statistically inferred from the training data, or a rule-based
knowledge-based) are equal and mutually contribute to addressing                           model formalized using the classes and relations from
the challenges of the detection and prevention of the security                             the ontology.
threats.
                                                                                      3.   When a class of attack is predicted, the ontology is queried
    The rest of this paper is organized as follows: in the                                 for all relevant sub-types of the attack type and for
following section we present the hierarchical knowledge model                              the suitable model to predict the particular sub-
in the form of an ontology, which is used for the                                          type. The knowledge model can also be used to extract
decomposition of the detection problem and which provides                                  specific domain-related information as a new attribute,
additional contextual information. The next part describes                                 which could be used either to improve the classifier’s
the implemented machine learning models and how these models                               performance or to provide contextual, domain-specific
are combined with the knowledge in the ontology. The subsequent                            information which could complement the predictive
section then presents the experimental evaluation of the                                   model.
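
The three phases listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name classify_hierarchically, the dictionary of sub-type models (standing in for the ontology lookup) and the toy stand-in models with their threshold are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, float]
Model = Callable[[Record], str]


def classify_hierarchically(
    record: Record,
    detector: Model,
    class_model: Model,
    subtype_models: Dict[str, Model],
) -> Dict[str, Optional[str]]:
    """Run the three phases on a single connection record."""
    # Phase 1: coarse attack/normal decision; normal traffic raises no alarm.
    if detector(record) == "normal":
        return {"label": "normal", "attack_class": None, "attack_type": None}
    # Phase 2: predict the main attack class (DoS, R2L, U2R or Probe).
    attack_class = class_model(record)
    # Phase 3: use a sub-type model for that class if one is registered.
    # (Hypothetical stand-in: a plain dict replaces the ontology query.)
    subtype_model = subtype_models.get(attack_class)
    attack_type = subtype_model(record) if subtype_model else None
    return {"label": "attack", "attack_class": attack_class, "attack_type": attack_type}


# Toy stand-ins for trained models; the 0.5 threshold is invented.
def toy_detector(record: Record) -> str:
    return "attack" if record["serror_rate"] > 0.5 else "normal"


def toy_class_model(record: Record) -> str:
    return "DoS"


def toy_dos_model(record: Record) -> str:
    return "neptune"


result = classify_hierarchically(
    {"serror_rate": 0.9}, toy_detector, toy_class_model, {"DoS": toy_dos_model}
)
print(result)
```

When no sub-type model is registered for a predicted class, the sketch returns the class alone, which mirrors the idea that the hierarchy can stop at the most specific level it can handle.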


Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The details about the predictive models and their evaluation are           system. The main concepts and relations of the ontology are
presented in the subsequent section.                                       shown in Figure 1.
B. Network Intrusion Ontology                                                  The central part of the proposed semantic model is the
    The proposed knowledge model captures all essential                    taxonomy of Attacks, which is summarized in
concepts required to describe network intrusion systems. We                Figure 2. The taxonomy was extracted from the types of attacks
have designed our semantic model according to the                          described in the KDD 99 dataset. Attacks are divided into
methodology proposed by Grüninger and Fox and with some                    four main groups: DOS, R2L, U2R and PROBE. The
extensions from Methontology.                                              main types of attacks are further specified on an additional
                                                                           level of the hierarchy.
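
As an illustration of the two-level taxonomy, the following sketch encodes the four main groups and their sub-types as a plain Python dictionary, using the attack labels listed in Table I. The paper stores this hierarchy in an OWL ontology; the dictionary and the helper names subtypes_of and class_of are assumptions introduced only for this example.

```python
from typing import Dict, List, Optional

# Main attack classes mapped to their sub-types (labels as in Table I).
ATTACK_TAXONOMY: Dict[str, List[str]] = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["satan", "ipsweep", "nmap", "portsweep"],
    "R2L": ["guess_passwd", "ftp_write", "imap", "phf", "multihop",
            "warezmaster", "warezclient", "spy"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}


def subtypes_of(attack_class: str) -> List[str]:
    """Return the sub-types on the second level for a main attack class."""
    return list(ATTACK_TAXONOMY.get(attack_class, []))


def class_of(attack_type: str) -> Optional[str]:
    """Return the main class of a sub-type, or None if it is not in the taxonomy."""
    for attack_class, subtypes in ATTACK_TAXONOMY.items():
        if attack_type in subtypes:
            return attack_class
    return None


print(class_of("smurf"), subtypes_of("U2R"))
```

A lookup that returns None for an unknown label makes it explicit when a connection carries an attack type that the taxonomy does not yet cover.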
    The designed ontology is formalized using the OWL 2 RL
profile, which supports common constructs such as
multiple hierarchies and at the same time provides compatibility
with rule languages for automatic reasoning. As the objective
of the knowledge model was to use it in data analytical tasks,
the concepts and properties map directly to the data used in the
process. Moreover, the ontology was extended with concepts
related to the classification models, to relate
each classifier to the specific level of the target attribute
hierarchy on which it can be used. The main classes of the ontology
include:
   •    Connections - This class represents the status of each
        connection record. It specifies Attack connection or
        normal traffic. Attack connections are further
        conceptualized using the Attack hierarchy described
        below.
   •    Effects - This class contains subclasses that represent all
        possible consequences of individual attacks (e.g., slows
        down server response, execute commands as root, etc.).             Fig. 1. The main concepts of the proposed semantic model.

   •    Mechanisms - The subclasses represent all possible
        causes of individual attacks (poor environment
        sanitation, misconfiguration, etc.).
   •    Flags - The subclasses represent normal or error states
        of individual connections (established, responder
        aborted, connection attempt rejected, etc.). Each of
        these subclasses has one equivalent instance.
   •    Protocols - The class contains subclasses that represent
        the types of the communication protocols on which the              Fig. 2. The hierarchy of Attacks.
        connection is running (TCP, UDP, and ICMP).
                                                                           C. Machine learning models
   •    Services - The subclasses represent each type of
        connection service (http, telnet, etc.). Each of these                 To evaluate the proposed approach, we used the KDD Cup
        subclasses has one equivalent instance.                            1999 competition dataset, which is a commonly accepted
                                                                           benchmark for the intrusion detection task. The dataset consists
   •    Severities - This class represents the severity of the             of the records from the device logs in a LAN network collected
        attack, its subclasses represent the severity level (weak,         over nine weeks. For the evaluation, we have used 10% sample
        medium, and high).                                                 with the 494,021 records in total. Each record is labeled as the
                                                                           normal communication or it is assigned to the major attack class
   •    Targets - The subclasses represent possible targets of a
                                                                           and specific attack types. There are 22 different attack types
        given type of attack (user, network).
                                                                           which correspond to the classes in the proposed ontology.
   •    Models - This class covers the classification models used to
                                                                               The common problem with the diagnostic tasks such as
        predict the given target attribute.
                                                                           intrusion detection systems is that the target attribute (i.e. in our
    The instances of the specified classes represent the network           case type of the attack) is highly unbalanced with the majority
connections (e.g., connection records from the data set). Trained          of normal communication. Table I presents the taxonomy of
and serialized classification models are instantiated as the               attack types together with the number of cases in the dataset.
instances of the Model class. The models are represented as                Some attack classes such as Probe are more balanced, but
web resources and can be accessed by their URI property,                   generally for each attack class we can find some minor types
which points to the location where the model is serialized in the          with only a few training examples. The lack of cases is


problematic not only for the training of statistical models but             for the classification which were identified in the work of [4].
also for the evaluation. On the other hand, rare cases can still be         The final list of features includes: service, src_bytes, dst_bytes,
very critical and can have a big overall impact on the security of the      logged_in, num_file_creations, srv_diff_host_rate,
system.                                                                     dst_host_count, dst_host_diff_srv_rate,
                                                                            dst_host_srv_diff_host_rate, srv_count, serror_rate, rerror_rate,
         TABLE I.      ATTACK TYPES AND NUMBER OF SAMPLES
                                                                                Since the data of diagnostic tasks are commonly highly
 Attack                Attack class          # of samples                   unbalanced towards the normal cases, the proposed approach is
                                                                            based on the decomposition of the diagnostic classification task
 back                                        2203
                                                                            into the hierarchy of classifiers. At the top level of the class
 land                                        21                             hierarchy, an attack detection model is used for the prediction to
                                                                            distinguish between the attack connections and normal traffic.
 neptune                                     107,201                        The classifier on this level was trained on the whole dataset and
                              DoS
 pod                                         264                            target attribute was transformed to the binary indicator. The
                                                                            main goal of this top-level classifier is to reliably separate
 smurf                                       280,790                        normal connections from the attack ones.
 teardrop                                    979                                If the top-level model detects an attack connection, the cases
                                                                            are further classified by the ensemble models into one of the
 satan                                       1589
                                                                            four types of the attack on the second level of the taxonomy
 ipsweep                                     1247                           (DoS, R2L, U2R, Probe). On this level, we use an ensemble
                             Probe                                          classifier with voting scheme trained on all attack instances (i.e.
 nmap                                        231                            without the normal communication cases). We found that the
 portsweep                                   1040                           proposed ensemble model is more efficient in the case of
                                                                            unbalanced target classes. The standard machine learning
 guess_passwd                                53                             models proposed in the previous works were able to gain good
                                                                            accuracy, achieved mostly on the dominant class (in our case on
 ftp_write                                   8
                                                                            KDD 99 dataset, on the most common DoS attack). However,
 imap                                        12                             the simple models struggled to predict minor classes such as
                                                                            U2R, which can be even more serious from the point of view of
 phf                                         4                              network security. For example, when training a decision tree
                              R2L
 multihop                                    7                              model, the model has very good performance for the DoS and
                                                                            R2L classes but missed a significant amount of the Probe attacks
 warezmaster                                 20                             and was not able to detect the U2R class at all.
 warezclient                                 1020                               The proposed weighting scheme is based on the idea of
                                                                            complementary classifiers, weighted by the performance of
 spy                                         2
                                                                            a particular model on a particular class. This weighting scheme
 buffer_overflow                             30                             is presented in Table II. The wi,j terms represent the weight
                                                                            associated with the i-th model and j-th class.
 loadmodule                                  9
                              U2R                                                 TABLE II.          WEIGHTING SCHEME OF THE ENSEMBLE MODEL
 perl                                        3
 rootkit                                     10                              Model            DoS            R2L         U2R         Probe

 normal                     Normal           97,227                          model 1          w1,1            w1,2        w1,3        w1,4
                                                                             model 2          w2,1            w2,2        w2,3        w2,4

    The records for each connection are described by a set of                model 3          w3,1            w3,2        w3,3        w3,4
features, which are represented in the ontology as the data                      ...           ...             ...         ...         ...
attributes. The features can be divided into the basic features,
content features and traffic features. Overall there are 32
features. The first group describes the type of the communication               After the binary classification and classification of the attack
protocol, duration of the connection, service on the destination            class by ensemble weighted classifier, we have trained particular
network node and other standard attributes describing the TCP               models to further classify specific type of the attack on the most
connection. Content features are attributes that can be linked to           specific level of the taxonomy. Four different models were
the domain specific knowledge depending on the applications                 trained using only the records of particular attack classes (i.e.
and environment in which communication occurs. The last                     models for DoS, R2L, U2R and Probe). The most problematic
group of features (traffic) describes the communication attributes          was the minority U2R class, as the dataset contains very few records
captured during a 2-second time window, e.g. the number of                  of that type. The final implemented classification schema is
hosts communicating with the target host. For the data                      presented in Figure 3. All models were implemented in the
preprocessing, we have selected only the most relevant features             Python environment using the standard pandas and scikit-learn


stack. Predictive models were then persistently stored and the               was in fact an attack, etc.). The entire system was also evaluated
model URIs (Uniform Resource Identifiers) were added as                      in terms of missed attacks and raised false alarms using the
data properties to the knowledge model.                                      FAR metric (False Alarm Rate), which corresponds to the false
                                                                             positive records divided by the total number of normal traffic
                                                                             records (true negative + false positive).
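
The precision, recall and FAR definitions can be checked with a short Python sketch. The counts below are read from Table III under the assumption that its rows are the actual labels and its columns the predictions, with attack as the positive class; the paper does not state the orientation, so the derived values are illustrative only. The function name binary_metrics is hypothetical.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the binary evaluation metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # False Alarm Rate: false positives over all normal traffic records.
        "far": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Assumed reading of Table III: 11 normal records flagged as attacks (FP),
# 35 attacks labelled as normal traffic (FN).
metrics = binary_metrics(tp=119_066, fp=11, fn=35, tn=29_095)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under the opposite orientation of the table the roles of the 11 and 35 off-diagonal records swap, which changes FAR but leaves accuracy untouched.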
                                                                                 For the evaluation of the binary classification on the top level
                                                                             of the taxonomy, we directly used the precision and recall metrics.
                                                                             In the subsequent stages, on the more specific levels of the taxonomy,
                                                                             we have computed precision and recall for each class and used
                                                                             macro-averaging for the overall evaluation. Additionally, we have
                                                                             computed a multi-class confusion matrix to further investigate the
                                                                             types of errors produced by the system.
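
Per-class precision and recall with macro-averaging can be sketched as follows. The matrix holds the counts reported in Table IV; treating its rows as predicted classes and its columns as actual classes is an assumption, chosen because it reproduces the U2R and R2L figures in that table. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def per_class_scores(
    matrix: List[List[int]], labels: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return (precision, recall) per class.

    matrix[i][j] counts records predicted as labels[i] whose actual class
    is labels[j] (assumed orientation).
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_total = sum(matrix[k])
        actual_total = sum(row[k] for row in matrix)
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Average precision and recall over classes, each class weighted equally."""
    n = len(scores)
    return (
        sum(p for p, _ in scores.values()) / n,
        sum(r for _, r in scores.values()) / n,
    )


LABELS = ["Probe", "U2R", "DoS", "R2L"]
TABLE_IV = [
    [1279, 0, 1, 0],
    [0, 15, 0, 0],
    [6, 0, 117385, 0],
    [4, 2, 0, 331],
]

scores = per_class_scores(TABLE_IV, LABELS)
macro_precision, macro_recall = macro_average(scores)
print(scores["U2R"], macro_precision, macro_recall)
```

Because every class contributes equally to the macro average, the two missed U2R records lower the macro recall far more than they would lower a record-weighted average dominated by DoS.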
                                                                             A. Training and evaluation
Fig. 3. The implemented hierarchical classification schema.
                                                                                 For the binary classification for the attack detection, we used
    The main role of the semantic model in the proposed                      a decision tree classifier. The dataset includes all records and
detection system is to navigate through the target class taxonomy            the target attribute was transformed to a binary indicator
and decompose the classification problem into sub-problems                   (attack/normal traffic). The classifier was trained without a limit
handled by the particular models for the specific type of                    on maximum depth, with default settings for pruning and the Gini
attack. The system is implemented in the Python language                     index as the splitting criterion. We split the dataset randomly into
with the RDFLib package, which provides integration with the                 a 70/30 training/testing ratio. The testing data were also used for
ontology through the SPARQL query interface. When classifying                the overall evaluation of the entire system. The model for the binary
an unknown connection, the system queries the ontology with a                classification achieved an accuracy of 0.9997. The detailed
SPARQL query and retrieves the corresponding model for the                   confusion matrix is presented in Table III.
particular class of attacks according to the URL stored in the
                                                                               TABLE III.        PERFORMANCE OF THE BINARY ATTACK CLASSIFICATION
hasTargetAttribute property. Once the classification of the main
type is performed, the system checks in the ontology if there is a                          Normal           Attack     Precision   Recall
classifier able to process the record further and to detect subtype
of the attack.                                                                Normal            29,095         11
                                                                                                                          0.999         0.999
    Besides the hierarchical decomposition of the detection                   Attack              35         119,066
process, the knowledge model also provides additional context
which can be leveraged during the classification to improve
detection of the minor classes. We have mainly extended the                      For the training of the ensemble classifier, we have selected only
context with the potential effect of the attack. Additionally, if the        the attack records from the training set. As the base classifiers
models are not reliable enough to predict the concrete attack sub-           we have used various configurations of the Naive Bayes and
type, the system can be used to classify attacks at least according          Decision Tree models. The experiments showed that the Decision
to the severity which is retrieved from the knowledge model for              Tree classifier performed well on the Probe, DoS and R2L
the particular main class of the attack. This could serve as a               attacks. On the other hand, for the U2R class the model produced
supporting source of information, complementing the attack type              many false alarms or (depending on pruning) was not
classification.                                                              able to detect U2R attacks at all. For this reason, we have trained
                                                                             a one-vs-all model just to separate the U2R class. We have then
                          III. EVALUATION                                    combined both types of the models into the ensemble classifier.
    For the evaluation, we used the most common metrics                      The weights of the base classifiers were computed according to
employed in classification tasks, such as recall and precision.              the accuracies of the models on the training data. For the
We have also computed confusion matrices for the particular                  evaluation we have used the same 70/30 dataset split as for the
classes of attacks. The confusion matrices were especially                   binary classification, but we have further selected only the attack
informative since they record the number of correctly and                    records (since the normal communication is already filtered by
incorrectly classified examples and also the types of error.                 the binary classifier). In total, models were trained on 396,743
For the binary classification on the top level of the taxonomy               records. The confusion matrix of the ensemble classifier is
hierarchy we used standard evaluation metrics:                               presented in Table IV.

         •     Precision: P = TP / (TP + FP)                                 TABLE IV.          PERFORMANCE OF THE ENSEMBLE ATTACK CLASSIFICATION

         •     Recall: R = TP / (TP + FN)                                                Probe       U2R      DoS       R2L       Prec.    Rec.
    where TP, TN, FP, FN are the numbers of true positive, true               Probe      1279            0       1         0      0.992    0.992
negative, false positive and false negative records (e.g. true
positive is the number of records where the predicted attack was in fact      U2R           0          15        0         0        1      0.882
an attack, false positive where the predicted attack was in fact              DoS           6            0    117,385      0      0.999    0.999
normal traffic, false negative where the predicted normal traffic


 R2L          4           2            0           331        0.982       1        model for the detection of the attack class. Overall achieved
                                                                                   performance was 0.999 precision and recall with very good
                                                                                   accuracy for the high and low severity. Table VIII presents
    On the most specific level of the taxonomy, each major                         the confusion matrix for the severity detection
attack class has one dedicated model for the further classification                for each class of attack.
of subtypes. The performance of each model was evaluated
using the precision and recall macro-averaged over the subtypes.                       TABLE VIII.     CONFUSION MATRIX FOR THE SEVERITY DETECTION
The overall performance of the models is summarized in Table                                    High        Low      Medium      Prec.     Recall
V.
                                                                                    DoS        117695        0          0
     TABLE V.         PERFORMANCE OF THE SUBTYPE CLASSIFICATION
                                                                                    Probe        443         0         779
               Probe           U2R                DoS           R2L                                                              0.999      0.999
                                                                                    R2L           0         346         6
 Accuracy         0.991             0.937          0.999          0.989
                                                                                    U2R           0          0          20
 Precision        0.989             0.927          0.999          0.879
 Recall           0.989             0.875          0.999          0.833
                                                                                       Our model biased medium severity towards high
                                                                                   severity, which has a similar effect to a higher false positive
                                                                                   rate. Further details and information about the designed model
    The overall system with the hierarchical classification was                    were published in [9].
evaluated using the standard precision, recall, F-measure and
FAR (False Alarm Rate) metrics. Comparison of the proposed                                     IV. CONCLUSION AND FUTURE WORK
system and models published in previous works [4][6][7][11] is
presented in Table VI.                                                                 In this paper we have proposed an approach based on the
                                                                                   combination of knowledge-based and machine learning methods
          TABLE VI.       OVERALL PERFORMANCE OF THE SYSTEM                        for intrusion detection. The proposed knowledge model in the
                                                                                   form of the ontology is used for the hierarchical decomposition
 Classifier             Acc.           Prec.        F1           FAR               of the detection process according to the types of attack. This
                                                                                   decomposition helps to overcome the problems with
 C4.5                   0.969          0.947        0.970        0.005
                                                                                   unbalanced training data, which are typical for diagnostic
 Random forests         0.964          0.998        0.986        0.025             machine learning tasks. By leveraging domain
                                                                                   knowledge, our combined approach also provides additional
 Forest PA              0.975          0.998        0.998        0.002             context, which includes for example the effects and severity of
 Ensemble model         0.976          0.998        0.998        0.001             the attacks.

 Our approach           0.998          0.998        0.998        0.001                 The performance of the proposed IDS is 0.998 in terms of
                                                                                   precision as well as recall and 0.001 in terms of the FAR metric,
                                                                                   which on the standard benchmark dataset outperforms other
                                                                                   state-of-the-art methods. Moreover, the proposed method also has
   Additionally, we have computed the confusion matrix, which
                                                                                   the potential to partially detect new emerging types of attacks
summarizes the performance for each attack class. The
                                                                                   based on the contextual information stored in the knowledge
confusion matrix is presented in Table VII.
                                                                                   model.
 TABLE VII.       CONFUSION MATRIX FOR THE OVERALL PERFORMANCE OF                      In future work, we plan to extend the role of the
                                   THE SYSTEM
                                                                                   knowledge model by introducing a rule-based classifier which
              Probe           U2R           DoS          R2L      Normal           will be based on declarative rules and the application of
                                                                                   automatic reasoning techniques and logic programming. We
 Probe        1176             0             5            0           7            hope that this will allow us to further improve accuracy for minority
 U2R              0           15             0            0           5            classes with a low number of training examples. Additionally,
                                                                                   the extended knowledge model will allow us to create a formalized
 DoS              4            0        117547            0           1            knowledge base of the existing cases.
 R2L              3            1             0           346          7                                   ACKNOWLEDGMENT
 Normal           1            0             3            1       48454               This work was supported by the Slovak Research and
                                                                                   Development Agency under contract No. APVV-16-0213
                                                                                   and by the VEGA project under grant No. 1/0493/16.
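
   As an illustration of how per-class figures can be derived from the
confusion matrix in Table VII, the following minimal Python sketch computes
precision, recall and F1 for each class, together with the overall accuracy
and a false alarm rate. It rests on two assumptions that the text does not
state: rows hold the actual classes and columns the predicted ones, and FAR is
taken as the share of normal records classified as any attack. All function
names are illustrative; this is not the evaluation code used for the reported
results, and the values it prints need not coincide with Table VI, because the
averaging behind that table is not specified here.

```python
# Confusion matrix copied from Table VII.
# Assumption: rows are actual classes, columns are predicted classes.
CLASSES = ["Probe", "U2R", "DoS", "R2L", "Normal"]
MATRIX = [
    [1176, 0, 5, 0, 7],
    [0, 15, 0, 0, 5],
    [4, 0, 117547, 0, 1],
    [3, 1, 0, 346, 7],
    [1, 0, 3, 1, 48454],
]


def per_class_metrics(matrix, classes):
    """Return {class: (precision, recall, f1)} for a square confusion matrix."""
    metrics = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # row sum: records of this class
        predicted = sum(row[i] for row in matrix)  # column sum: records labelled as it
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics


def accuracy(matrix):
    """Share of all records that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def false_alarm_rate(matrix, classes, normal="Normal"):
    """Assumed definition: share of normal records flagged as any attack."""
    n = classes.index(normal)
    normal_total = sum(matrix[n])
    return (normal_total - matrix[n][n]) / normal_total


if __name__ == "__main__":
    for name, (p, r, f1) in per_class_metrics(MATRIX, CLASSES).items():
        print(f"{name:<7} precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
    print(f"accuracy={accuracy(MATRIX):.4f}  "
          f"FAR={false_alarm_rate(MATRIX, CLASSES):.5f}")
```

   Under the assumed orientation, the U2R class contains only 20 records, of
which 15 are classified correctly (recall 0.75), which illustrates the
minority-class problem addressed by the hierarchical decomposition.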
   Besides the classification of attack types, we have also
implemented and evaluated the classification of attack
severity. To train the severity detector, we used 10 % of the
KDD 99 dataset with a 70/30 training/testing ratio. The
severity classifier was applied complementarily to the ensemble


                                REFERENCES

[1]   Park, J. Advances in Future Internet and the Industrial Internet of Things.
      Symmetry 2019, 11, 244.
[2]   Javaid, A.; Niyaz, Q.; Sun, W.; Alam, M. A Deep Learning Approach for
      Network Intrusion Detection System. In Proceedings of the 9th EAI
      International Conference on Bio-inspired Information and
      Communications Technologies (formerly BIONETICS), New York, NY,
      USA, 3–5 December 2016.
[3]   Khan, M.A.; Karim, M.d.R.; Kim, Y. A Scalable and Hybrid Intrusion
      Detection System Based on the Convolutional-LSTM Network.
      Symmetry 2019, 11, 583.
[4]   Zhou, Y.; Cheng, G.; Jiang, S.; Dai, M. An efficient detection system based
      on feature selection and ensemble classifier. arXiv 2019,
      arXiv:1904.01352.
[5]   Aljawarneh, S.; Aldwairi, M.; Yassein, M.B. Anomaly-based intrusion
      detection system through feature selection analysis and building hybrid
      efficient model. J. Comput. Sci. 2018, 25, 152–160.
[6]   Sharma, N.; Mukherjee, S. A Novel Multi-Classifier Layered Approach
      to Improve Minority Attack Detection in IDS. Procedia Technol. 2012, 6,
      913–921.
[7]   Ahmim, A.; Ghoualmi Zine, N. A new hierarchical intrusion detection
      system based on a binary tree of classifiers. Inf. Comput. Secur. 2015, 23,
      31–57.
[8]   Abdoli, F.; Kahani, M. Ontology-based distributed intrusion detection
      system. In Proceedings of the 2009 14th International CSI Computer
      Conference, Tehran, Iran, 20–21 October 2009; pp. 65–70.
[9]   Sarnovsky, M.; Paralic, J. Hierarchical Intrusion Detection Using
      Machine Learning and Knowledge Model. Symmetry 2020, 12, 203.
[10]  More, S.; Matthews, M.; Joshi, A.; Finin, T. A Knowledge-Based
      Approach to Intrusion Detection Modeling. In Proceedings of the 2012
      IEEE Symposium on Security and Privacy Workshops, San Francisco,
      CA, USA, 24–25 May 2012; pp. 75–81.
[11]  Özgür, A.; Erdem, H. A review of KDD99 dataset usage in intrusion
      detection and machine learning between 2010 and 2015. PeerJ Preprints
      2016, 4, e1954v1.

