=Paper=
{{Paper
|id=Vol-2859/paper18
|storemode=property
|title=Rule-oriented Method of Cyber Incidents Detection by SIEM Based on Fuzzy Logical Inference
|pdfUrl=https://ceur-ws.org/Vol-2859/paper18.pdf
|volume=Vol-2859
|authors=Ihor Subach,Volodymyr Kubrak,Artem Mykytiuk,Stanislav Korotayev
|dblpUrl=https://dblp.org/rec/conf/its2/SubachKMK20
}}
==Rule-oriented Method of Cyber Incidents Detection by SIEM Based on Fuzzy Logical Inference==
<pdf width="1500px">https://ceur-ws.org/Vol-2859/paper18.pdf</pdf>
<pre>
210


      Rule-oriented method of cyber incidents detection by
             SIEM based on fuzzy logical inference

 © Ihor Subach, © Volodymyr Kubrak, © Artem Mykytiuk, © Stanislav Korotayev

    Institute of Special Communication and Information Protection of the National Technical
            University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine

                                  igor_subach@ukr.net


        Abstract. We consider the role of SIEM in the protection circuit of information
        and telecommunication system for proactive cyber incident management. We
        provide the main mechanisms of the process of correlation of events on the de-
        tection of cyber-attacks, malicious activity, and violations of security policy. We
        analyze identification methods of signs of deletion, integration, and connection
        of the processed information, as well as the establishment of its reasons and pri-
        orities. We outline the main disadvantages of the rule-oriented method. We pro-
        pose the implementation of the model and method of cyber incident recognition
        under incompleteness or inaccuracy of information about the incidents based on
        the application of fuzzy set theory and fuzzy inference. We present the formal
        statement of the problem of cyber incident detection by the SIEM and propose
        its solution. The problem of incident identification is solved by finding a mapping
        between the set of signs of cyber incidents and the set of their possible classes.
        Graphical interpretation of the problem of cyber incident identification is pre-
        sented and the main difficulties that arise during its solution are formulated. Em-
        phasis is placed on the expediency of creating a subsystem of intelligent decision
        support in the SIEM, which should be based on the model of cyber incident iden-
        tification based on fuzzy rules and fuzzy inference, where the causal relationship
        between a cyber incident and its features are described by an expert in natural
        language, and then formalized as a set of fuzzy logical rules. An algorithm for
        deciding on cyber incident identification is proposed. The data on the practical
        effectiveness of the proposed method is presented.

        Keywords: cybersecurity, cyber defense, cyber-attack, cyber incident, SIEM,
        fuzzy set theory, tuple recognition model, rule-oriented method


1       Introduction

Building of an effective cyber defense system should be based on proactive Security
Information and Event Management (SIEM) [1]. The use of SIEM in the protection
circuit allows for effective proactive management of cyber incidents, based on auto-
mated mechanisms that use information about events that have already occurred in the
system, predict future events that will occur in it, and adapt system protection parame-
ters to its current status.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
                                                                                        211


    A cyber incident is an event or a series of adverse events that bear signs of a possible
cyberattack, which threaten the security of electronic communications systems, process
control systems, create a possibility of violation of the normal operation of such sys-
tems, including failure and/or blocking of the system and/or unauthorized management
of its resources, and endanger the security of electronic information resources [2].
    The architecture and functional model of the proactive SIEM were considered in [1].
According to the tasks performed by this system (collection, processing, and analysis
of security events coming to it from many disparate distributed sources), the basis of
its operation includes the following mechanisms: normalization, filtering, classifica-
tion, aggregation, correlation, prioritization, and analysis of events and cyber incidents
and their consequences, as well as generation of various reports, messages and visual
presentation of data for prompt and informed decision-making [1]. The methodology
of rational selection of SIEM for the construction of SOC (Security Operation Center)
is given in [3].
    In some sources [4, 5] these mechanisms are considered as stages of the general
process, which is called the correlation process. It has a special place in the SIEM, as
its purpose is to detect cyberattacks, malicious activity, security policy violations, and
others [6]. This purpose is achieved by addressing a wide range of tasks that it covers:
identifying potential relationships between disparate security information; grouping
low-level events into higher-level events; detecting potential incidents based on analy-
sis of the behavior of various infrastructure objects, and others.
    Technologically, as part of the SIEM, the correlation method includes a sequence of
actions on the data, which aims to identify, in a certain way, signs of deletion, integra-
tion, and linking of processed information, as well as establishing its causality and pri-
ority [4, 5]. These features are called correlation features.
    To achieve these objectives, at different stages of the correlation process, a wide
variety of methods are used [7, 8, 9, 10, 11], such as: the method based on finite ma-
chine states (finite state machines), which is used to identify dangerous states of the
system; rule-oriented method, which is based on rules that have clear syntax and se-
mantics; the method of reasoning based on precedents; Bayesian network method,
which is used at the stage of multi-step event correlation, loss analysis, and prioritiza-
tion; artificial neural networks, which are also used for event correlation, loss analysis,
and prioritization, and others.
    Analysis shows that the most common method is the rule-oriented method, but due
to the fact that it is based on classical production rules, which do not always give the
expected result in terms of incomplete and inaccurate information about cyber inci-
dents, its application is not always effective.
    Therefore, the task of developing models and methods for recognizing cyber inci-
dents in conditions of incompleteness or inaccuracy of information about them is rele-
vant.
    The aim of the work is to develop a model and rule-oriented method of detecting
cyber incidents by SIEM based on fuzzy inference.
212


2      Statement of the problem of cyber incident detection by SIEM

  Any cyber incident is characterized by a set of information features, on the basis of
which, in turn, it can be recognized.
       O  oi  i  1, n 
  Let                       the set of information features of cyber incidents that occur
in  the       system           and  are              represented    by      the    set
 C   C j C j  o j1 , o j 2 , , o jm , j  1, J ,
                                                       where information signs are associated
                        Cj
with a cyber incident          .
    Then the model of cyber incident recognition can be represented by a tuple [12]:
                                     M  K , Oi , R, C
                                                             ,                             (1)
                                 o O                                       R  Ri 
where K is a feature classifier; i        is a set of the observed features;           is a
set of cyber incidents recognition rules; C – a cyber incident.
   The process of recognizing cyber incidents is carried out based on rules (usually,
production rules):
                     R1 : ( K , Oi ), R2 : ( K , Oi ), , Rl : ( K , Oi )  C
                                                                                .

   However, in traditional production systems, the rules are classic products that do not
fully meet the conditions of incompleteness and inaccuracy of information about cyber
incidents that occur during operation of information and telecommunications systems.
As a rule, for this purpose, methods, and models of fuzzy set theory on fuzzy inference
are used [13].


3      The method of solving the problem

   Based on above-listed considerations, model (1) can be developed and presented as
follows:
                                   MF  KF , Oi , RF , C
                                                                 ,                         (2)
                                   RF  RFi 
where KF is a fuzzy classifier,                    is a set of fuzzy cyber incident recognition
rules:
                  RF1 : ( K , Ov ), RF2 : ( K , Ov ), , RFl : ( K , Ov )  C
                                                                                    .

    On the other hand, based on the works [15, 16], the problem of recognizing cyber
incidents can be considered as a problem of their identification, the solution of which
is to find a mapping:
                                           
                    O *  o1* , o2* , , on*  c j  C  c1 , c2 , , cm ,
                                                                                           (3)
                                                                                             213

         *
where O  is a set of signs a cyber incident; a set of possible      cyber incidents.
                                                 oi  o i , o i 
   Range of change of signs of cyber incidents                 , i  1, n , and the original
                                   k   k,k
value of the identification result            are considered known. Accordingly,
   
oi oi
      is the lower (upper) value of the cyber incidence parameters, i
                                                                                             
                                                                               o , i  1, n, k, k

is the lower (upper) value of the identification result k.
    Graphically, the problem of identifying cyber incidents can be represented as follows
(see Fig. 1):


              Input Parameters
                                                                         Solution:
          (Signs of Cyber Incidents)
                                                                    Cyber Incident Class
                      o1                                                        C1
                                     Fuzzy
                      o2
                                   knowledge               Logic                 C2
                       y *  D.
                                      base                 output              y *  D.


                                                                                             y
                                    (fuzzy                 unit               
                      on                                                       Cm
                                     rules)


        Fig. 1. Graphical interpretation of the problem of cyber incident identification.

   In this case, the main difficulties that arise when solving the problem (3) are as
follows:
   first: for correct identification of a cyber incident, it is necessary to take into account
a large number of heterogeneous parameters of the system (quantitative and
qualitative), which, in turn, requires a highly qualified cybersecurity officer, as well as
appropriate time;
   second: the lack of analytical dependence between the cyber incident and its signs.
   These difficulties confirm the expediency of creating an intelligent decision support
subsystem within the SIEM. Its operation should be based on a model of cyber incident
identification based on fuzzy rules and fuzzy inference.
   At the same time, for its development, it is necessary to take into account the
linguistic nature of the type of cyber incident (output variable) and its features (input
variables). In turn, the causal relationship between a cyber incident and its signs must
be described by an expert in plain language and then formalized as a set of fuzzy logical
rules.
   It should be noted that with a large number of signs of cyber incidents, it is advisable
to build a tree of inference, which determines the order of embedding statements in
each other.
   Figure 2 shows the inference tree for the correlation rule [16]: if on one computer
with the same IP address, seven user authentication attempts using different user IDs
have failed within ten minutes, and a successful login of a user into the system from
214


any computer with the same IP address has been successful, then this event must be
addressed by a security officer.


        o1           o2          o3             o4              o5               o6


      Failed login          ξα                                        ξβ              Successful
                                                                                        login

                                   α                 ξс              β


                Cyber incident            с1               с2              Normal state

                      Fig. 2. Logical inference tree for the correlation rule.


  Here through 𝑜          𝑜 marked signs of a cyber incident (Table 1).

                             Table 1. Signs of a cyber incident.
 Sign                       The content of the sign                                       Type
 O1           The number of failed login attempts                                        numerical
 O2           The number of users IDs                                                    numerical
 O3           The duration of login attempts                                             numerical
 O4           The number of IP-addresses involved during login                           numerical
 O5           The number of computers involved during login                              numerical
 O6           The number of successful login attempts                                    numerical

  In turn, c1 and c2 indicate the type of event occurring in the system (Table 2).


                     Table 2. The type of event occurring in the system.
             Event                               Event content
               α                       Successful login
               β                       Failed login
              С1                       Cyber incident
              С2                       Successful login
                                                                                     215


  The structure of the logical output tree corresponds to relations (4)-(6):
                                      c   c  ,  ,
                                                                                    (4)
                                  o1 , o2 , o3 , o4 , o5 ,                    (5)
                                     o3 , o4 , o5 , o6 .
                                                                                    (6)

       Table 3. Multidimensional matrix of knowledge about cyber incidents.
                                                Type               Cyber incident
              Sign       Value
                                                 α
              O1         L
              O2         H
              O3         L                        H
              O4         L
              O5         aA                                                С1
              Sign       Value                     β
              O3         bA
              O4         L
                                                  H
              O5         H
              O6         L
              Sign       Value                     α
              O1         aA
              O2         aA
              O3         bA                       aA
              O4         L
              O5         A
                                                                         С1
              Sign       Value                     β
              O3         bA
              O4         L
                                                  aA
              O5         aA
              O6         L

                                         o  o6 ,  , 
   A single scale of qualitative terms 1                is used to estimate the values of
linguistic variables: L - low; bA - below average; A - average; aA - above average; H -
high. Each of these terms is given by the corresponding membership function.
   From a formal point of view, the problem of cyber incident identification based on
fuzzy rules and fuzzy inference corresponds to the mathematical model of object
216

                                                                                                  c1
identification with a discrete output [14, 15]. Thus, to identify a cyber incident                     , the
ratio is as follows:
                  с1 с     H     H       aA     aA   ,
                                                                         
                                                                                                        (7)

where
                                                                            
 H      H o1    H o2    L o3    L o4    aA o5 ,
                                                                            

                                                                 
       H      bA o3    L o4    H o5    L o6 ;
                                                                 

                                                                                 
       aA      aA o1    aA o2    bA o3    L o4    A o5 ,
                                                                                 

                                                                     
       aA      bA o3    L o4    aA o5    L o6 ;
                                                                     


   and
         с ,   ,   ,  оi   are corresponding membership functions.
   These fuzzy logical equations allow us to make a decision in favor of identification
of a cyber incident based on the following algorithm:
   Step 1. The values of the signs of cyber incidents are recorded
                                                                                          
                                                                          O *  o1* , o2* , , o6*        
.                                                              k *
   Step 2. The values of membership functions  oi are determined at fixed  
parameter values oi , i  1, 6; k  L, bA, A, aA, H .
                         *


   Step 3. Based on logical equations (7), the values of membership functions
  c
                                                                                           
 j o1* , o2* ,, o6* are calculated by the vector of attributes O *  o1* , o2* , , o6* for all
                           c ,c
types of cyber incidents 1 2 . Logical operations AND
                                                                          and OR  on
membership functions are replaced by operations min and max:
                                       
                                                                         
                  k o *i   k o *j  min   k o *i ,  k o *j  ; i  j,
                                                                 
                                                                                        (8)

                                                                     
                           k o *i   k o *j  max   k o *i ,  k o*j  ; i  j,
                                                       
                                                                                                        (9)

                                       c *j
   Step 4. Choice of solution                 (the type of cyber incident) provided:
                                                                                      217
                      c
                                                  c
                                                                       
                     j o1* , o2* ,, o6*  max   j o1* , o2* , , o6* .
                                                                                   (10)


   It should be noted that the adequacy of this model and the effectiveness of the
method of detecting cyber incidents, which is based on the proposed model,
respectively, are determined by the quality of membership functions, through which
linguistic estimates are quantified. Due to the fact that these membership functions are
determined by experts, the adequacy of the fuzzy knowledge base will depend entirely
on the qualifications of experts.
   However, it should be noted that as a result of SIEM operation, statistics on cyber
incidents will be collected, which makes it possible to assess the adequacy of the
proposed model and the method developed on its basis.
   Thus, it is quite expedient to perform additional training (system settings). This, in
turn, will allow the identification of cyber incidents that were not previously identified
by the system during its operation.
   Comparative analysis of the proposed method showed that, in comparison with
existing methods (the method of reference vectors, neural networks, k-nearest
neighbors, the method based on immune systems), it can increase the accuracyof cyber
incident detection (11) by 2-15 % (Table 4).
                                              TP
                                       P           ,
                                            TP  FP                                  (11)
where P (precision) is the accuracy of cyber incident detection;
  TP – the number of cyber incidents that are properly classified;
  FP – the number of cyber incidents classified as a normal state of the system [17].

Table 4. Comparative analysis of the proposed method of cyber incidents detection.
           Method           The accuracy of cyber incident detection               Δ
Method of reference vectors                  0,83                                +0,15
K-nearest neighbors                          0,85                                +0,13
Method based on immune sys-                                                      +0,02
                                             0,96
tems
Neural networks                              0,86                                +0,12
The proposed method                          0,98                                  -


4      Conclusions

   As a result of the conducted research, it is shown that the main role in a protection
circuit of information and telecommunication system for proactive management of
cyber incidents belongs to SIEM.
218


    The results of the analysis indicate the feasibility of using a rule-oriented method to
identify signs of deletion, aggregation, and linking of information processed, as well as
to establish its causality and priority.
    To increase the efficiency of the rule-oriented method of recognizing cyber incidents
in conditions of incompleteness and inaccuracy of information about them, a model
based on the theory of fuzzy sets and fuzzy inference is proposed. Based on the model,
a rule-oriented method of cyber incident identification, based on mapping of the set of
incident features to the set of possible classes of cyber incidents, and the algorithm for
its implementation have been developed.
    To implement the developed model and method, it is advisable to modify the
structure of the SIEM-system by introducing an intelligent decision support subsystem,
which should be based on the model of cyber incident identification based on fuzzy
rules and fuzzy inference, where causal relationships of a cyber incident and its sighs
are described by the expert in plain language and then formalized as a set of fuzzy
logical rules.
    The simulation results show that the proposed method allows us to increase the
accuracy of cyber incident detection by 2-15%.
    The obtained results can be used in practice for solving the problem of detecting
cyber incidents by SIEM, which is part of the SOC software and hardware.


References

 1. I. Subach, V. Kubrak, and A. Mykytiuk, “Architecture and functional model of a promising
    proactive intelligent system SIEM-system for cyber protection of critical infrastructure
    objects”, Information Technology and Security, Vol 7., Iss. 2., 2019, pp. 208-215, DOI:
    10.20535 / 2411-1031.2019.7.2.190570, Access mode: https://doi.org/10.20535/2411-
    1031.2019.7.2.190570
 2. Law of Ukraine On the Basic Principles of Cyber Security of Ukraine: Official Publication:
    Vidomosti Verkhovnoi Rady, 2017, № 45, Art. 403.
 3. I. Subach, V. Kubrak, and A. Mykytiuk, “Methodology of rational choice of security
    incident management system for building operational security center”, CEUR Workshop
    Proceedings, 2019, 2577, р.p. 11-20, Режим          доступу:        http://ceur-ws.org/Vol-
    2577/paper2.pdf
 4. A. Fedorchenko, D. Levshun, A. Chechulin, and I. Kotenko, “Analysis of methods for
    correlating security events in SIEM systems. Part 1 ”, Proceedings of SPIIRAN, issue 4
    (47), 2016, pp. 5-27, DOI: 10.15622 / sp.47.1.
 5. A. Fedorchenko, D. Levshun, A. Chechulin, and I. Kotenko, “Analysis of methods for
    correlating security events in SIEM systems. Part 2 ”, Proceedings of SPIIRAN, issue 6
    (49), 2016, pp. 208-225, DOI: 10.15622 / sp.49.11.
 6. Elshoush H.T., Osman I.M. Alert correlation in collaborative intelligent intrusion detection
    systems — A survey // Applied Soft Computing, 2011, pp. 4349–4365.
 7. Muller A. Event Correlation Engine. Master`s Thesis. Swiss Federal Institute of
    Technology Zurich. 2009. 165 p.
                                                                                         219


8. Jakobson G., Weissman M.D. Alarm correlation // IEEE Network. 1993. no. 7(6). pp. 52
    59.
9. Tiffany M. A survey of event correlation techniques and related topics. URL:
    http://www.tiffman.com/netman/netman.html (дата обращения: 26.04.2016).
10. Sadoddin R., Ghorbani A. Alert Correlation Survey: Framework and Techniques //
    Proceedings of 2006 International Conference on Privacy, Security and Trust: Bridge the
    Gap Between PST Technologies and Business Services (PST`06). 2006. Article no. 37.
11. Hanemann A., Marcu P. Algorithm Design and Application of Service-Oriented Event
    Correlation // Proceedings of Conference BDIM 2008, 3rd IEEE/IFIP International
    Workshop on Business-Driven IT Management. 2008. pp. 61–70.
12. Yu. Samokhvalov, and S. Tolyupa. “Correlation of events in SIEM-systems based on non-
    monotone inference”, Information protection, .Volume 19, № 1, 2017, pp. 5-9.
13. L. Zade, The concept of a linguistic variable and its application to approximate decision
    making, Moscow, Russia: Mir, 1976.
14. A.P. Rothstein, Medical diagnostics on fuzzy logic, Vinnytsia, Ukraine: Continent-PRIM,
    1996.
15. A.P. Rothstein, Intelligent identification technologies: fuzzy sets, genetic algorithms,
    neural networks, Vinnytsia, Ukraine: UNIVERSUM, 1999.
16. SIEM Rules or Models for Threat Detection? Exabeam, 2018.[Online]. Available:
    https://www.exabeam.com/siem/siem-threat-detection-rules-or-models/. Accessed on:
    November29, 2020.
17. F. Salo, M. Injadat, A. Nassif, A. Shami, and A. Essex, ‘‘Data Mining Techniques in
    Intrusion Detection Systems: A Systematic Literature Review,’’ in Proc. IEEEAccess,
    September 2018, Vol. 6, pp. 56046–56058. DOI:10.1109/ACCESS.2018.2872784.

</pre>