Rule-oriented method of cyber incidents detection by SIEM based on fuzzy logical inference

© Ihor Subach, © Volodymyr Kubrak, © Artem Mykytiuk, © Stanislav Korotayev

Institute of Special Communication and Information Protection of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine
igor_subach@ukr.net

Abstract. We consider the role of SIEM in the protection circuit of an information and telecommunication system for proactive cyber incident management. We describe the main mechanisms of the event correlation process used to detect cyber-attacks, malicious activity, and violations of security policy. We analyze methods for identifying signs of deletion, integration, and connection of the processed information, as well as for establishing its causes and priorities. We outline the main disadvantages of the rule-oriented method. We propose a model and method of cyber incident recognition under incompleteness or inaccuracy of information about the incidents, based on fuzzy set theory and fuzzy inference. We present the formal statement of the problem of cyber incident detection by the SIEM and propose its solution. The problem of incident identification is solved by finding a mapping between the set of signs of cyber incidents and the set of their possible classes. A graphical interpretation of the problem of cyber incident identification is presented, and the main difficulties that arise during its solution are formulated. Emphasis is placed on the expediency of creating an intelligent decision support subsystem in the SIEM, which should rest on a model of cyber incident identification based on fuzzy rules and fuzzy inference, where the causal relationship between a cyber incident and its features is described by an expert in natural language and then formalized as a set of fuzzy logical rules.
An algorithm for deciding on cyber incident identification is proposed. Data on the practical effectiveness of the proposed method are presented.

Keywords: cybersecurity, cyber defense, cyber-attack, cyber incident, SIEM, fuzzy set theory, tuple recognition model, rule-oriented method

1 Introduction

An effective cyber defense system should be built on a proactive Security Information and Event Management (SIEM) system [1]. The use of SIEM in the protection circuit allows for effective proactive management of cyber incidents, based on automated mechanisms that use information about events that have already occurred in the system, predict future events that will occur in it, and adapt system protection parameters to its current status.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

A cyber incident is an event or a series of adverse events that bear signs of a possible cyberattack, threaten the security of electronic communications systems and process control systems, create a possibility of disrupting the normal operation of such systems (including failure and/or blocking of the system and/or unauthorized management of its resources), and endanger the security of electronic information resources [2].

The architecture and functional model of the proactive SIEM were considered in [1]. According to the tasks performed by this system (collection, processing, and analysis of security events coming to it from many disparate distributed sources), its operation rests on the following mechanisms: normalization, filtering, classification, aggregation, correlation, prioritization, and analysis of events and cyber incidents and their consequences, as well as generation of various reports and messages and visual presentation of data for prompt and informed decision-making [1].
The methodology of rational selection of a SIEM for the construction of a SOC (Security Operation Center) is given in [3].

In some sources [4, 5] these mechanisms are considered as stages of a general process, which is called the correlation process. It has a special place in the SIEM, as its purpose is to detect cyberattacks, malicious activity, security policy violations, and similar events [6]. This purpose is achieved by addressing a wide range of tasks: identifying potential relationships between disparate security information; grouping low-level events into higher-level events; detecting potential incidents based on analysis of the behavior of various infrastructure objects; and others.

Technologically, as part of the SIEM, the correlation method is a sequence of actions on the data that aims to identify signs of deletion, integration, and linking of the processed information, as well as to establish its causality and priority [4, 5]. These features are called correlation features.

To achieve these objectives, a wide variety of methods are used at different stages of the correlation process [7, 8, 9, 10, 11], such as: the method based on finite state machines, which is used to identify dangerous states of the system; the rule-oriented method, which is based on rules that have clear syntax and semantics; the method of reasoning based on precedents; the Bayesian network method, which is used at the stage of multi-step event correlation, loss analysis, and prioritization; artificial neural networks, which are also used for event correlation, loss analysis, and prioritization; and others.

Analysis shows that the rule-oriented method is the most common one. However, because it is based on classical production rules, which do not always give the expected result when information about cyber incidents is incomplete or inaccurate, its application is not always effective.
Therefore, the task of developing models and methods for recognizing cyber incidents in conditions of incompleteness or inaccuracy of information about them is relevant.

The aim of the work is to develop a model and a rule-oriented method of detecting cyber incidents by SIEM based on fuzzy inference.

2 Statement of the problem of cyber incident detection by SIEM

Any cyber incident is characterized by a set of information features, on the basis of which, in turn, it can be recognized.

Let $O = \{o_i\}$, $i = \overline{1, n}$, be the set of information features of cyber incidents that occur in the system, and let the incidents be represented by the set

$$C = \left\{ C_j \mid C_j = \{o_{j1}, o_{j2}, \dots, o_{jm}\},\ j = \overline{1, J} \right\},$$

where the information signs $o_{j1}, o_{j2}, \dots, o_{jm}$ are associated with a cyber incident $C_j$. Then the model of cyber incident recognition can be represented by a tuple [12]:

$$M = \langle K, O_i, R, C \rangle, \tag{1}$$

where $K$ is a feature classifier; $O_i \subseteq O$ is a set of the observed features; $R = \{R_i\}$ is a set of cyber incident recognition rules; $C$ is a cyber incident.

The process of recognizing cyber incidents is carried out based on rules (usually, production rules):

$$R_1 : (K, O_i),\ R_2 : (K, O_i),\ \dots,\ R_l : (K, O_i) \to C.$$

However, in traditional production systems the rules are classical productions, which do not fully cope with the incompleteness and inaccuracy of information about cyber incidents that occur during operation of information and telecommunications systems. To handle such conditions, methods and models of fuzzy set theory and fuzzy inference are typically used [13].

3 The method of solving the problem

Based on the above considerations, model (1) can be developed and presented as follows:

$$MF = \langle KF, O_i, RF, C \rangle, \tag{2}$$

where $KF$ is a fuzzy classifier and $RF = \{RF_i\}$ is a set of fuzzy cyber incident recognition rules:

$$RF_1 : (K, O_v),\ RF_2 : (K, O_v),\ \dots,\ RF_l : (K, O_v) \to C.$$
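The difference between a classical production rule in model (1) and a fuzzy rule in model (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the threshold of seven failed logins (borrowed from the correlation rule discussed later in the paper), and the linear ramp between 3 and 7 attempts are assumptions chosen only for illustration.

```python
def crisp_rule(failed_logins: int, threshold: int = 7) -> bool:
    """Classical production rule: fires only once the threshold is reached."""
    return failed_logins >= threshold


def fuzzy_high(failed_logins: float, low: float = 3.0, high: float = 7.0) -> float:
    """Degree (0..1) to which the count is 'high'.

    Assumed shape: a linear ramp that is 0 at `low` and 1 at `high`.
    """
    if failed_logins <= low:
        return 0.0
    if failed_logins >= high:
        return 1.0
    return (failed_logins - low) / (high - low)


for n in (2, 6, 7):
    # Print the count, the crisp decision, and the fuzzy degree side by side.
    print(n, crisp_rule(n), round(fuzzy_high(n), 2))
```

With six failed attempts the crisp rule stays silent, while the fuzzy rule already reports a degree of 0.75. This graded response to incomplete or imprecise evidence is what the move from model (1) to model (2) is meant to provide.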
On the other hand, based on the works [15, 16], the problem of recognizing cyber incidents can be considered as a problem of their identification, the solution of which is to find a mapping:

$$O^* = \left(o_1^*, o_2^*, \dots, o_n^*\right) \to c_j \in C = \{c_1, c_2, \dots, c_m\}, \tag{3}$$

where $O^*$ is a set of signs of a cyber incident and $C$ is a set of possible cyber incidents.

The ranges of the signs of cyber incidents, $o_i \in [\underline{o}_i, \overline{o}_i]$, $i = \overline{1, n}$, and of the output value of the identification result, $k \in [\underline{k}, \overline{k}]$, are considered known. Accordingly, $\underline{o}_i$ ($\overline{o}_i$) is the lower (upper) value of the cyber incident parameter $o_i$, $i = \overline{1, n}$, and $\underline{k}$ ($\overline{k}$) is the lower (upper) value of the identification result $k$.

Graphically, the problem of identifying cyber incidents can be represented as follows (see Fig. 1):

[Figure 1: the input parameters (signs of cyber incidents) $o_1, o_2, \dots, o_n$ feed a fuzzy knowledge base (fuzzy rules) and a logic output unit, which yields the solution, a cyber incident class $C_1, C_2, \dots, C_m$.]

Fig. 1. Graphical interpretation of the problem of cyber incident identification.

In this case, the main difficulties that arise when solving problem (3) are as follows. First, correct identification of a cyber incident requires taking into account a large number of heterogeneous system parameters (quantitative and qualitative), which, in turn, requires a highly qualified cybersecurity officer as well as appropriate time. Second, there is no analytical dependence between a cyber incident and its signs.

These difficulties confirm the expediency of creating an intelligent decision support subsystem within the SIEM. Its operation should rest on a model of cyber incident identification based on fuzzy rules and fuzzy inference. Its development must take into account the linguistic nature of the type of cyber incident (output variable) and of its features (input variables).
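The mapping (3) and the scheme in Fig. 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation. The evenly spaced triangular membership functions, the two-sign knowledge base `KB`, the ranges, and all function names are hypothetical. AND is taken as min and OR as max, which matches the replacement the paper applies later in its algorithm, and the term labels follow the five-term scale introduced later in the paper.

```python
from typing import Dict, List, Sequence, Tuple

TERMS = ("L", "bA", "A", "aA", "H")  # low ... high, five-term scale

Rule = Dict[int, str]  # sign index -> required linguistic term


def membership(term: str, x: float, lo: float, hi: float) -> float:
    """Triangular membership of x in `term`; peaks evenly spaced on [lo, hi].

    Assumes hi > lo. The shape is an illustrative assumption, since the paper
    leaves the membership functions to experts.
    """
    step = (hi - lo) / (len(TERMS) - 1)
    peak = lo + TERMS.index(term) * step
    x = min(max(x, lo), hi)  # clamp the sign to its known range
    return max(0.0, 1.0 - abs(x - peak) / step)


def classify(
    signs: Sequence[float],
    ranges: Sequence[Tuple[float, float]],
    knowledge_base: Dict[str, List[Rule]],
) -> Tuple[str, Dict[str, float]]:
    """Map the observed signs O* to the class with the largest membership degree."""
    degrees: Dict[str, float] = {}
    for cls, rules in knowledge_base.items():
        # OR over the rules of a class (max), AND inside each rule (min).
        degrees[cls] = max(
            min(membership(term, signs[i], *ranges[i]) for i, term in rule.items())
            for rule in rules
        )
    best = max(degrees, key=lambda c: degrees[c])
    return best, degrees


# Hypothetical knowledge base over two signs, each ranging over [0, 10].
KB: Dict[str, List[Rule]] = {
    "c1": [{0: "H", 1: "H"}, {0: "aA", 1: "aA"}],
    "c2": [{0: "L", 1: "L"}],
}
RANGES = [(0.0, 10.0), (0.0, 10.0)]

print(classify((9.0, 8.0), RANGES, KB))
```

For the sign vector (9, 8) the second rule of `c1` dominates, so the sketch selects `c1`. Replacing the hypothetical `KB` with expert-defined rules and membership functions is where the expert's linguistic description enters the model.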
In turn, the causal relationship between a cyber incident and its signs must be described by an expert in plain language and then formalized as a set of fuzzy logical rules.

It should be noted that with a large number of signs of cyber incidents, it is advisable to build an inference tree, which determines the order in which statements are nested in each other. Figure 2 shows the inference tree for the following correlation rule [16]: if seven user authentication attempts using different user IDs fail within ten minutes on one computer with the same IP address, and a user login from any computer with the same IP address succeeds, then this event must be addressed by a security officer.

[Figure 2: inference tree. Signs $o_1, \dots, o_5$ feed node $\xi_\alpha$ (failed login, $\alpha$); signs $o_3, \dots, o_6$ feed node $\xi_\beta$ (successful login, $\beta$); $\alpha$ and $\beta$ feed node $\xi_c$, whose outputs are $c_1$ (cyber incident) and $c_2$ (normal state).]

Fig. 2. Logical inference tree for the correlation rule.

Here $o_1, \dots, o_6$ denote the signs of a cyber incident (Table 1).

Table 1. Signs of a cyber incident.

| Sign | The content of the sign | Type |
|------|-------------------------|------|
| O1 | The number of failed login attempts | numerical |
| O2 | The number of user IDs | numerical |
| O3 | The duration of login attempts | numerical |
| O4 | The number of IP addresses involved during login | numerical |
| O5 | The number of computers involved during login | numerical |
| O6 | The number of successful login attempts | numerical |

In turn, $c_1$ and $c_2$ indicate the type of event occurring in the system (Table 2).

Table 2. The type of event occurring in the system.

| Event | Event content |
|-------|---------------|
| α | Failed login |
| β | Successful login |
| C1 | Cyber incident |
| C2 | Normal state |

The structure of the logical inference tree corresponds to relations (4)-(6):

$$c = \xi_c(\alpha, \beta), \tag{4}$$

$$\alpha = \xi_\alpha(o_1, o_2, o_3, o_4, o_5), \tag{5}$$

$$\beta = \xi_\beta(o_3, o_4, o_5, o_6). \tag{6}$$

Table 3. Multidimensional matrix of knowledge about cyber incidents.
| Type | Rule | Event | Event value | O1 | O2 | O3 | O4 | O5 | O6 |
|------|------|-------|-------------|----|----|----|----|----|----|
| C1 | 1 | α | H | H | H | L | L | aA | - |
| C1 | 1 | β | H | - | - | bA | L | H | L |
| C1 | 2 | α | aA | aA | aA | bA | L | A | - |
| C1 | 2 | β | aA | - | - | bA | L | aA | L |

A single scale of qualitative terms is used to estimate the values of the linguistic variables $o_1, \dots, o_6$, $\alpha$, $\beta$: L - low; bA - below average; A - average; aA - above average; H - high. Each of these terms is given by the corresponding membership function.

From a formal point of view, the problem of cyber incident identification based on fuzzy rules and fuzzy inference corresponds to the mathematical model of object identification with a discrete output [14, 15]. Thus, to identify a cyber incident $c_1$, the relation is as follows:

$$\mu^{c_1}(c) = \left[\mu^{H}(\alpha) \wedge \mu^{H}(\beta)\right] \vee \left[\mu^{aA}(\alpha) \wedge \mu^{aA}(\beta)\right], \tag{7}$$

where

$$\mu^{H}(\alpha) = \mu^{H}(o_1) \wedge \mu^{H}(o_2) \wedge \mu^{L}(o_3) \wedge \mu^{L}(o_4) \wedge \mu^{aA}(o_5),$$

$$\mu^{H}(\beta) = \mu^{bA}(o_3) \wedge \mu^{L}(o_4) \wedge \mu^{H}(o_5) \wedge \mu^{L}(o_6);$$

$$\mu^{aA}(\alpha) = \mu^{aA}(o_1) \wedge \mu^{aA}(o_2) \wedge \mu^{bA}(o_3) \wedge \mu^{L}(o_4) \wedge \mu^{A}(o_5),$$

$$\mu^{aA}(\beta) = \mu^{bA}(o_3) \wedge \mu^{L}(o_4) \wedge \mu^{aA}(o_5) \wedge \mu^{L}(o_6);$$

and $\mu^{c_1}(c)$, $\mu(\alpha)$, $\mu(\beta)$, $\mu(o_i)$ are the corresponding membership functions.

These fuzzy logical equations allow us to decide on the identification of a cyber incident using the following algorithm:

Step 1. The values of the signs of cyber incidents are recorded: $O^* = \left(o_1^*, o_2^*, \dots, o_6^*\right)$.

Step 2. The values of the membership functions $\mu^{k}(o_i^*)$ are determined at the fixed parameter values $o_i^*$, $i = \overline{1, 6}$; $k \in \{L, bA, A, aA, H\}$.

Step 3. Based on logical equations (7), the values of the membership functions $\mu^{c_j}\left(o_1^*, o_2^*, \dots, o_6^*\right)$ are calculated from the vector of attributes $O^* = \left(o_1^*, o_2^*, \dots, o_6^*\right)$ for all types of cyber incidents $c_1, c_2$.
Logical operations AND ($\wedge$) and OR ($\vee$) on membership functions are replaced by the operations min and max:

$$\mu^{k}(o_i^*) \wedge \mu^{k}(o_j^*) = \min\left[\mu^{k}(o_i^*), \mu^{k}(o_j^*)\right], \quad i \neq j, \tag{8}$$

$$\mu^{k}(o_i^*) \vee \mu^{k}(o_j^*) = \max\left[\mu^{k}(o_i^*), \mu^{k}(o_j^*)\right], \quad i \neq j. \tag{9}$$

Step 4. The solution (the type of cyber incident) $c_j^*$ is chosen such that:

$$\mu^{c_j^*}\left(o_1^*, o_2^*, \dots, o_6^*\right) = \max_{j}\left[\mu^{c_j}\left(o_1^*, o_2^*, \dots, o_6^*\right)\right]. \tag{10}$$

It should be noted that the adequacy of this model, and hence the effectiveness of the cyber incident detection method based on it, is determined by the quality of the membership functions through which linguistic estimates are quantified. Because these membership functions are determined by experts, the adequacy of the fuzzy knowledge base depends entirely on the experts' qualifications.

However, as a result of SIEM operation, statistics on cyber incidents will be collected, which makes it possible to assess the adequacy of the proposed model and of the method developed on its basis. Thus, it is expedient to perform additional training (tuning) of the system. This, in turn, will allow the identification of cyber incidents that were not previously identified by the system during its operation.

Comparative analysis of the proposed method showed that, in comparison with existing methods (support vector machines, neural networks, k-nearest neighbors, the method based on immune systems), it can increase the accuracy of cyber incident detection (11) by 0.02-0.15, that is, by 2-15 percentage points (Table 4):

$$P = \frac{TP}{TP + FP}, \tag{11}$$

where $P$ (precision) is the accuracy of cyber incident detection; $TP$ is the number of cyber incidents that are properly classified; $FP$ is the number of normal system states incorrectly classified as cyber incidents [17].

Table 4. Comparative analysis of the proposed method of cyber incidents detection.
| Method | The accuracy of cyber incident detection | Δ |
|--------|------------------------------------------|---|
| Support vector machines | 0.83 | +0.15 |
| K-nearest neighbors | 0.85 | +0.13 |
| Method based on immune systems | 0.96 | +0.02 |
| Neural networks | 0.86 | +0.12 |
| The proposed method | 0.98 | - |

4 Conclusions

The research shows that SIEM plays the main role in the protection circuit of an information and telecommunication system for proactive management of cyber incidents.

The results of the analysis indicate the feasibility of using a rule-oriented method to identify signs of deletion, aggregation, and linking of the processed information, as well as to establish its causality and priority.

To increase the efficiency of the rule-oriented method of recognizing cyber incidents in conditions of incompleteness and inaccuracy of information about them, a model based on the theory of fuzzy sets and fuzzy inference is proposed.

Based on the model, a rule-oriented method of cyber incident identification, which maps the set of incident features to the set of possible classes of cyber incidents, and an algorithm for its implementation have been developed.

To implement the developed model and method, it is advisable to modify the structure of the SIEM by introducing an intelligent decision support subsystem. It should rest on the model of cyber incident identification based on fuzzy rules and fuzzy inference, where the causal relationships between a cyber incident and its signs are described by the expert in plain language and then formalized as a set of fuzzy logical rules.

The simulation results show that the proposed method increases the accuracy of cyber incident detection by 2-15 percentage points.

The obtained results can be used in practice for solving the problem of detecting cyber incidents by SIEM, which is part of the SOC software and hardware.

References

1. I. Subach, V. Kubrak, and A.
Mykytiuk, “Architecture and functional model of a promising proactive intelligent system SIEM-system for cyber protection of critical infrastructure objects”, Information Technology and Security, Vol. 7, Iss. 2, 2019, pp. 208-215, DOI: 10.20535/2411-1031.2019.7.2.190570, Access mode: https://doi.org/10.20535/2411-1031.2019.7.2.190570
2. Law of Ukraine On the Basic Principles of Cyber Security of Ukraine: Official Publication: Vidomosti Verkhovnoi Rady, 2017, № 45, Art. 403.
3. I. Subach, V. Kubrak, and A. Mykytiuk, “Methodology of rational choice of security incident management system for building operational security center”, CEUR Workshop Proceedings, 2019, 2577, pp. 11-20, Access mode: http://ceur-ws.org/Vol-2577/paper2.pdf
4. A. Fedorchenko, D. Levshun, A. Chechulin, and I. Kotenko, “Analysis of methods for correlating security events in SIEM systems. Part 1”, Proceedings of SPIIRAN, issue 4 (47), 2016, pp. 5-27, DOI: 10.15622/sp.47.1.
5. A. Fedorchenko, D. Levshun, A. Chechulin, and I. Kotenko, “Analysis of methods for correlating security events in SIEM systems. Part 2”, Proceedings of SPIIRAN, issue 6 (49), 2016, pp. 208-225, DOI: 10.15622/sp.49.11.
6. Elshoush H.T., Osman I.M. Alert correlation in collaborative intelligent intrusion detection systems - A survey // Applied Soft Computing, 2011, pp. 4349–4365.
7. Muller A. Event Correlation Engine. Master's Thesis. Swiss Federal Institute of Technology Zurich. 2009. 165 p.
8. Jakobson G., Weissman M.D. Alarm correlation // IEEE Network. 1993. no. 7(6). pp. 52–59.
9. Tiffany M. A survey of event correlation techniques and related topics. URL: http://www.tiffman.com/netman/netman.html (accessed: 26.04.2016).
10. Sadoddin R., Ghorbani A. Alert Correlation Survey: Framework and Techniques // Proceedings of 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services (PST'06). 2006. Article no. 37.
11. Hanemann A., Marcu P. Algorithm Design and Application of Service-Oriented Event Correlation // Proceedings of Conference BDIM 2008, 3rd IEEE/IFIP International Workshop on Business-Driven IT Management. 2008. pp. 61–70.
12. Yu. Samokhvalov and S. Tolyupa, “Correlation of events in SIEM-systems based on non-monotone inference”, Information protection, Vol. 19, № 1, 2017, pp. 5-9.
13. L. Zadeh, The concept of a linguistic variable and its application to approximate decision making, Moscow, Russia: Mir, 1976.
14. A.P. Rothstein, Medical diagnostics on fuzzy logic, Vinnytsia, Ukraine: Continent-PRIM, 1996.
15. A.P. Rothstein, Intelligent identification technologies: fuzzy sets, genetic algorithms, neural networks, Vinnytsia, Ukraine: UNIVERSUM, 1999.
16. SIEM Rules or Models for Threat Detection? Exabeam, 2018. [Online]. Available: https://www.exabeam.com/siem/siem-threat-detection-rules-or-models/. Accessed on: November 29, 2020.
17. F. Salo, M. Injadat, A. Nassif, A. Shami, and A. Essex, “Data Mining Techniques in Intrusion Detection Systems: A Systematic Literature Review,” IEEE Access, September 2018, Vol. 6, pp. 56046–56058. DOI: 10.1109/ACCESS.2018.2872784.