Invasion Detection Model using Two-Stage Criterion of Detection of Network Anomalies

Volodymyr Buriachok1[0000-0002-4055-1494], Dmytro Ageyev2[0000-0002-2686-3854], Oleksii Zhyltsov1[0000-0002-7253-5990], Pavlo Skladannyi1[0000-0002-7775-6039], and Volodymyr Sokolov1[0000-0002-9349-7946]

1 Borys Grinchenko Kyiv University, Ukraine
2 Kharkiv National University of Radio Electronics, Ukraine
p.skladannyi@kubg.edu.ua

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The article considers the methods that affect the operation of an intrusion detection system. Using the "disorder problem," a two-stage criterion for detecting anomalies in computer networks is formed, which provides an analysis of network infrastructure characteristics, their identification with specific computer attacks, and the ability to respond to possible attacks in real time.

Keywords: Intrusion Detection Systems, Anomalies, Signature Analysis Method, Statistical Analysis Method, Neural Networks, Invasion Detection Model, Attack.

1 Introduction and Task Setting

Based on the annual rate of increase in the number of incidents in cyberspace, we can conclude that it is necessary to include in the integrated information security system of critical infrastructure automated means of detecting computer attacks and other dangerous actions, including threats of a man-made and natural (random) nature.

To achieve information security goals, modern Intrusion Detection Systems (IDS) constantly monitor the functioning of the hardware and software platforms of the network infrastructure and record a number of quantitative and qualitative indicators of their operation:

$\{X_1(t), \ldots, X_N(t)\}$,    (1)

where t is the time at which the indicator $X_k(t)$ is measured. Such indicators, in particular in the NIDES system [1], are:

- CPU usage by the system and by the user separately.
- Time to complete the process.
- Total amount of memory used during the execution of the process and its maximum size during execution.
- Number of open files at runtime; the number of page faults.
- Amount of information read from the disk.
- Number of I/O characters during application execution.
- Whether the username changed during the execution of the application.
- Start time of the application.
- Number of signals received during the execution of the application.
- Whether the application was running on a remote station, and the name of that station.
- Name of the application that was used on the remote station.
- Whether the application was running on the local station, the name of that station, the name of the application used on the local station, etc.

2 Problem Formulation

With this in mind, this article aims to build a model of an intrusion detection system that detects network anomalies in real time and with the least error, as well as to find answers to the following questions [2]:

- What happened on the network?
- What was attacked and how dangerous is the attack?
- When and where did the attack start?
- Who is the malefactor?
- How and as a result of what did the invasion occur?

Accordingly, the attacking party's task of creating cryptographic software tools for attacks is significantly simplified, while the effectiveness of solving the problems of recovering (decrypting) cryptosystem data encrypted with such software tools shows an objective tendency to decline from year to year.
At the same time, the following groups of threats related to reliability can be distinguished [3–5]:

- Programming errors in cryptographic primitives.
- Errors in the use of cryptographic primitives in crypto providers.
- Errors in transferring parameters to the crypto provider and returning processing results.
- Failures of the hardware platform.
- Accidental and intentional violations of the integrity of programs and data of crypto providers.

These factors may be prerequisites for the formation of hidden channels of leakage of information about the operation of cryptographic software tools for attacks. Consider two variants of attacks on the implementation of such software tools that do not require critical information about their operation.

3 Crypto Attack Mechanics

Premise 1. There is a certain cryptographic software tool for attacks (cryptosystem) with secret keys based on a strong block symmetric cryptographic algorithm that provides encryption in output feedback (OFB) mode, also called gamma (keystream) generation with feedback [6]. It is necessary to build an attack on the implementation of the specified cryptosystem that makes partial or complete recovery of information (decryption) possible.

This mode of operation of the crypto algorithm is widely used to build fully connected communication networks, in which each subscriber must be able to send a message to any other subscriber of the network. The mode is characterized by the presence of an initialization vector IV, a random number that keeps the probability of gamma overlap at an acceptable level when a single key K is used for some time [6]. The idea of the attack on this cryptosystem is to implement a fictitious random data generator which, substituted for the real pseudorandom number generator, provides a hidden repetition of initialization vectors or keys. As a result, some messages will be encrypted identically:

$\tilde{S}_1 = \tilde{T}_1 \oplus \tilde{\Gamma}_j, \;\ldots,\; \tilde{S}_{m_j} = \tilde{T}_{m_j} \oplus \tilde{\Gamma}_j$,    (2)

where $\tilde{T}_1, \ldots, \tilde{T}_{m_j}$ are open (plaintext) messages; $\tilde{\Gamma}_j$, $j = \overline{0, 2^k - 1}$, is a sequence of gamma characters formed from a single initialization vector and the algorithm key; $\tilde{S}_1, \ldots, \tilde{S}_{m_j}$ are the corresponding encrypted messages.

In this case, depending on the number $m_j$ of identically encrypted messages and the redundancy of the source texts, the corresponding encrypted messages can be partially or completely decrypted [7].

To repeat the initialization vectors IV, the random bit generator should be used as follows:

$IV = \varphi(\alpha_1, \alpha_2, \ldots, \alpha_k) = \langle \beta_1, \beta_2, \ldots, \beta_b \rangle$,    (3)

where $\alpha_1, \alpha_2, \ldots, \alpha_k$ is a sequence of random bits; $\varphi(\ldots)$ is some unambiguous function that expands the random sequence to a given size b; $\beta_1, \beta_2, \ldots, \beta_b$ are the coordinates of the initialization vector.

To solve the problem, we first calculate the allowable number of random bits that provides the required number of repetitions of identically encrypted messages. Let the network have M subscribers, each of whom sends on average $\mu$ messages daily for each connection direction, and let T be the public key validity period (in days). Then the average number N of sent messages is:

$N = \binom{M}{2} \cdot \mu \cdot T$.    (4)

During the specified period, the average number of repetitions R of each of the $2^k$ values of the vector of random bits $\langle \alpha_1, \alpha_2, \ldots, \alpha_k \rangle$, in the case of their even distribution, will be:

$R = \dfrac{N}{2^k} = \dfrac{\binom{M}{2} \cdot \mu \cdot T}{2^k}$.    (5)

Based on equation (5) we obtain an estimate of the allowable number of random bits:

$k \le \log_2\!\left(\binom{M}{2} \cdot \mu \cdot T\right) - \log_2 R$.    (6)
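As an illustration of equations (4)–(6), the following minimal Python sketch (not part of the original paper; names and values are chosen only for readability) computes the largest allowable number of random bits k for given M, μ, T, and R:

```python
from math import comb, log2, floor

def max_random_bits(m_subscribers: int, mu: float, t_days: float, r_repeats: float) -> int:
    """Allowable number of random bits k per equations (4)-(6).

    m_subscribers -- number of subscribers M in the network
    mu            -- average messages per day per connection direction
    t_days        -- validity period T of the key, in days
    r_repeats     -- desired average number R of identically encrypted messages
    """
    n_messages = comb(m_subscribers, 2) * mu * t_days      # equation (4)
    # R = N / 2^k  =>  k <= log2(N) - log2(R), cf. (5)-(6)
    return floor(log2(n_messages) - log2(r_repeats))

# With M = 10, mu * T = 10, R = 2 (the example considered below) it returns 7,
# i.e. k <= 7.
print(max_random_bits(10, 1.0, 10.0, 2.0))
```

The result only bounds the expected number of gamma repetitions; whether they can be exploited depends on the redundancy of the plaintexts, as noted above.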
Using inequality (6), for each particular attacked network it is possible to calculate the allowable number of random bits that provides, on average, the required number R of identically encrypted messages. Of course, random data used for cryptographic transformations are subject to statistical testing, but the rather small length of the vector IV (the block length of most standard cryptographic algorithms is 64 or 128 bits) significantly limits the applicability of statistical criteria. To check the uniformity of the distribution of sequences of this length, mainly the criteria of frequencies of signs and of bigrams are used [8], which puts forward the corresponding requirements for the expansion function. Therefore, to ensure the equal occurrence of bigrams in the binary vector $\langle \alpha_1, \alpha_2, \ldots, \alpha_k \rangle$, the value of k should be chosen as a multiple of four (there are four bigram variants: 00, 01, 10, 11).

For example, for a special network that includes M = 10 subscribers with an average intensity of sending messages during the validity of one network key of $\mu \cdot T = 10$, to obtain an average number of encryption repetitions R = 2 it suffices to have no more than $k \le 7$ random bits.

To hide statistical dependencies in the vector IV and obtain the required number of bits, the expansion function $\varphi$ in (3) can be implemented with a hash function such as MD5, which is part of the standard set of Windows algorithms, or a similar one. Of the 128 bits of the digest obtained with the hash function, the necessary number of bits is then used.

Premise 2. Suppose that there is a software tool for implementing attacks (cryptosystem) built on a strong symmetric block cryptographic encryption algorithm in OFB mode, with public key distribution based on the El Gamal protocol [6]. The task is to organize an attack on the implementation of the cryptosystem in such a way as to ensure complete decryption.

In the El Gamal protocol, the public key y is formed from a cyclic element g of some field and the secret key x of the asymmetric algorithm:

$y = g^x \bmod p$,    (7)

where p is some large prime number, 1024 bits long.

To encrypt the session key of the symmetric algorithm, a random number k is generated, the value

$y_1 = g^k \bmod p$    (8)

is calculated, and the secret key K of the symmetric algorithm is encrypted:

$\delta = K \cdot y^k \bmod p$.    (9)

The pair $(\delta, y_1)$, together with the encrypted message, is transmitted to the owner of the secret key x, who, based on the obtained pair $(\delta, y_1)$, calculates the secret key K of the symmetric algorithm:

$K = \delta \cdot y_1^{-x} \bmod p$.    (10)

We organize an attack on the specified protocol by substituting the random number k with a pseudo-random one in such a way that the testing system does not reveal this fact. Let $k \in \{t_i, \; i = 1, 2, \ldots\}$, where the cardinality of the set of admissible values of the pseudorandom number k is chosen fairly large, but smaller than what a specialized computer system for key search can enumerate within a reasonable period. Possible variants of the key K are then calculated from equation (10) over the set of admissible values $\{t_i\}$ of the pseudorandom number [9].
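The key-recovery step of Premise 2 can be sketched as follows (illustrative Python, not the authors' implementation; the toy parameters are assumptions chosen only to keep the example small):

```python
def recover_session_key(p: int, g: int, y: int, y1: int, delta: int, candidates):
    """Recover the symmetric session key K from an El Gamal pair (delta, y1),
    assuming the 'random' exponent k was drawn from a small known set.

    p, g, y    -- public parameters and public key y = g^x mod p
    y1, delta  -- intercepted pair: y1 = g^k mod p, delta = K * y^k mod p
    candidates -- iterable of admissible pseudorandom values of k
    """
    for k in candidates:
        if pow(g, k, p) == y1:                  # candidate k reproduces y1
            yk_inv = pow(pow(y, k, p), -1, p)   # (y^k)^-1 = y1^-x, cf. eq. (10); Python 3.8+
            return k, (delta * yk_inv) % p      # recovered session key K
    return None

# Toy example with a small prime (a real p is about 1024 bits)
p, g, x = 467, 2, 127
y = pow(g, x, p)
k_true, K_true = 53, 321
y1, delta = pow(g, k_true, p), K_true * pow(y, k_true, p) % p
print(recover_session_key(p, g, y, y1, delta, range(1, 200)))  # -> (53, 321)
```

Here the check pow(g, k, p) == y1 filters candidates without knowing x, and the inverse of $y^k$ plays the role of $y_1^{-x}$ in equation (10).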
As a result, it should be noted that access to information about the "true" random data generator and the methods of its substitution depend in each case on the hardware and software platforms on which the software tool for implementing attacks runs, the peculiarities of its implementation, the presence of an automated system of protection against unauthorized interference in its operation, etc. To create a fictitious random number generator, it is possible, for example, to use a sequence of Mersenne numbers $M_p = 2^p - 1$, where p is a prime number (for some such p the number $M_p$ is itself prime).

4 Methods of Analysis

It is known that the efficiency of an IDS significantly depends on the applied method of analysis of the original data. Currently, the following main methods are distinguished [1]:

1. Signature methods of analysis.
2. Statistical methods of analysis.
3. Hybrid methods of analysis with the function of self-learning.

4.1 Signature Method

Signature methods of analysis are based on the fact that most attacks and their scenarios are generally known. In this approach, intrusion signatures determine the characteristics and conditions of objects, the occurrence of events that are signs of attempted attacks (invasions), and their relationships. Naturally, signature methods of analysis use intrusion signature databases that are maintained by the security system. In this case, the order (sequence) of actions performed or initiated by a user or an information process (program) is compared with the known signatures. Even partial compliance of the sequence of events with a signature may be a sign of an attempted security breach.

Anti-virus scanners that work with a database of virus signatures are typical representatives implementing this idea. Their advantage is a fairly high speed of analysis. The effectiveness of signature methods in IDS can be increased through the use of artificial intelligence methods.

4.2 Statistical Method

The advantages of statistical methods of analysis in IDS include:

- No need for a large amount of memory to store the controlled variables.
- Ease of detecting deviations in the data characterizing the behavior of users and processes.
- Ability to use quantitative and qualitative data of different natures as parameters in the analysis.

The main disadvantage of statistical methods is the difficulty of forming statistics of the normal behavior of users and processes.

4.3 Hybrid Method

Hybrid methods of analysis with the function of self-learning include:

- Learning from classified examples.
- Neural networks.
- Genetic algorithms.

To achieve the goal set in this work, it is advisable to use precisely the statistical methods. Using the well-known "disorder" (change-point) problem [10], this makes it possible to form a two-stage criterion for the detection of anomalies in computer networks and to overcome the binding of statistical methods to a model of usual (normal) user behavior.

Step 1. Suppose that two composite hypotheses are considered concerning the analyzed random sequence $X = \{x_t, \; x_t \in R, \; t = \overline{1, N}\}$: H0: the sequence X is stationary with a single probability distribution function; H1: the sequence X is a concatenation (the result of "gluing") of two stationary random sequences with different distribution functions:

$X = X_1 \| X_2$,    (11)

where $X_1 = \{x_t, \; t = \overline{1, n^*}\}$, $X_2 = \{x_t, \; t = \overline{n^* + 1, N}\}$, $n^* = [\theta N]$, $0 < \theta < 1$. It is necessary to estimate the point of "gluing" $n^*$.
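For illustration only, hypothesis H1 can be simulated with a short Python sketch (not from the paper; the normal distributions and the mean shift are arbitrary assumptions) that glues two stationary segments at $n^* = [\theta N]$:

```python
import random

def glued_sequence(n_total: int, theta: float, seed: int = 1):
    """Concatenation X = X1 || X2 of two stationary sequences (hypothesis H1).

    X1 covers t = 1..n*, X2 covers t = n*+1..N, where n* = [theta * N];
    here the two parts differ only in the mean of a normal distribution.
    """
    rng = random.Random(seed)
    n_star = int(theta * n_total)                        # "gluing" point n* = [theta N]
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n_star)]    # first stationary segment
    x2 = [rng.gauss(1.5, 1.0) for _ in range(n_total - n_star)]  # shifted mean
    return x1 + x2, n_star

X, n_star = glued_sequence(n_total=500, theta=0.6)
print(len(X), n_star)  # 500 300
```

The detection task described below is to recover $n^*$ from the sequence X alone.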
It is believed that the sequences X1 and X2 differ in one of the two-dimensional distribution functions, namely, in the probability distribution of the vector $(x_t, x_{t+2})$:

$F(u_0, u_1) = P\{x_t \le u_0, \; x_{t+2} \le u_1\}$,    (12)

which up to the moment $t_1^* = n^* - 2$ inclusive is equal to $F_1(u)$, and at $t \ge t_2^* = n^* + 1$ is equal to $F_2(u)$, with

$\|F_1(u) - F_2(u)\| \ge \varepsilon > 0$,    (13)

where $\|\ldots\|$ is the usual sup-norm.

It is known that the distribution function of a finite-dimensional random vector can be approximated uniformly, with any accuracy, by the probability distribution function of a random vector with a finite number of values. It follows that if the set R is partitioned into a sufficiently large number of non-intersecting domains $\{A_j, \; j = \overline{1, r}\}$, $A_i \cap A_j = \emptyset$ for $i \ne j$, then the vector $(x_t, x_{t+2})$ can be approximated by a vector of a distribution with a finite number of values [11]. Therefore, if we introduce the new random sequences

$V_t^{ij} = I(x_t \in A_i, \; x_{t+2} \in A_j), \quad 1 \le i \le r, \; 1 \le j \le r$,    (14)

where $I(A)$ is the indicator of the set A, then at least in one of them there is a change in the mathematical expectation.

Therefore, if we use an algorithm that detects a change in the mathematical expectation, the same algorithm will detect a change in the distribution function. This fact allows the work to be limited to the development of only one, basic, algorithm that detects changes in the mathematical expectation. To do this, a family of statistics of the following form is proposed for identifying the moments of "disorder":

$Y_N(n, \delta) = \left[\dfrac{n}{N}\left(1 - \dfrac{n}{N}\right)\right]^{\delta} \cdot \left[\dfrac{1}{n}\sum_{k=1}^{n} x_k - \dfrac{1}{N-n}\sum_{k=n+1}^{N} x_k\right]$,    (15)

where $0 < \delta \le 1$, $1 \le n \le N - 1$, and $X = \{x_k, \; k = \overline{1, N}\}$ is the sequence under study.

In the case of a fixed n, this family of statistics is a generalized variant of the Kolmogorov-Smirnov statistic, which is used to test hypotheses about the coincidence or difference of distribution functions in two samples. It is also proved that the statistic with $\delta = 1$, for $N \to \infty$ and with the ratio between the volumes of the "glued" implementations maintained, minimizes the maximum possible probability of error in estimating the moment of "disorder" (it is minimax in order):

$P\left\{\max_{1 \le n \le N-1} \sqrt{N}\,|Y_N(n, 1)| > C^{(1)}\right\} \to 2\sum_{k=1}^{\infty} (-1)^{k+1} \exp\left(-2k^2\left(\dfrac{C^{(1)}}{\sigma_*}\right)^2\right) \equiv f(C^{(1)})$,    (16)

where the parameter $\sigma_*$ is the standard deviation and $C^{(1)}$ is the threshold of the criterion, the excess of which is interpreted as the occurrence of a "disorder"; the value of n at which the excess occurred is taken as the desired moment $n^*$ of "disorder."

Step 2. Having fixed the probability level $\alpha$ of a "false alarm" about the disorder, during the statistical processing of real data we determine the first-level threshold $C^{(1)}$ from

$\alpha = f\left(C^{(1)} \sqrt{N} / \bar{\sigma}_*\right)$,    (17)

where $\bar{\sigma}_*$ is the estimate of the parameter $\sigma_*$ (standard deviation) and N is the sample volume of the sequence under study [12].
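A minimal sketch of this first-stage detector, covering equations (15)–(17) with $\delta = 1$, might look as follows (illustrative Python, not the authors' implementation; the threshold used in the demo is an arbitrary value rather than one derived from (17)):

```python
import random

def first_stage_disorder(x, c1):
    """First-stage detector: statistic (15) with delta = 1 and threshold C(1).

    x  -- measured sequence x_1..x_N
    c1 -- first-level threshold C(1), chosen from the false-alarm level via (17)
    Returns the estimated disorder moment n*, or None if no disorder is detected.
    """
    n_total = len(x)
    total = sum(x)
    prefix = 0.0
    best_n, best_val = None, 0.0
    for n in range(1, n_total):                   # 1 <= n <= N - 1
        prefix += x[n - 1]
        mean_left = prefix / n
        mean_right = (total - prefix) / (n_total - n)
        y = (n / n_total) * (1 - n / n_total) * (mean_left - mean_right)  # Y_N(n, 1)
        val = (n_total ** 0.5) * abs(y)           # sqrt(N) * |Y_N(n, 1)|
        if val > best_val:
            best_n, best_val = n, val
    return best_n if best_val > c1 else None

# Synthetic "glued" sequence: the mean shifts from 0 to 1.5 at n* = 300
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(1.5, 1.0) for _ in range(200)]
print(first_stage_disorder(x, c1=2.0))  # c1 is arbitrary for the demo; prints a value near 300
```

A single pass of this kind is linear in N, which is consistent with the O(N) complexity estimate given below.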
In the case of hypothesis H1, where the original sequence of measurements is represented as the concatenation of several stationary random sequences with different probability distribution functions, we apply the criterion of the allowable number of triggers ("disorders") Z. That is, we assume that

$n_1 + n_2 + \ldots + n_Z = N, \quad 2 \le Z < N - 2$.    (18)

For a small number of sequences, the Chebyshev inequality can be used to determine the boundary of the criterion: if the random variable Z has a mathematical expectation $\mu$ and a standard deviation $\sigma$, then for a given $\varepsilon > 0$

$P\{|Z - \mu| \ge \varepsilon\} \le \dfrac{\sigma^2}{\varepsilon^2}$.    (19)

If Z is a random variable with a unimodal probability distribution, mathematical expectation $\mu$, and standard deviation $0 < \sigma < \infty$, then for any $\lambda > \sqrt{8/3} \approx 1.63299$ the Vysochansky-Petunin inequality holds, which improves the estimate of the deviation probability:

$P\{|Z - \mu| \ge \lambda\sigma\} \le \dfrac{4}{9\lambda^2}$.    (20)

In particular, for a typical deviation of $3\sigma$ (three sigmas), the significance level of the criterion, i.e., the probability of a "false alarm," is [13]:

$\alpha = \dfrac{4}{9\lambda^2} \approx 0.0494$.    (21)

A more accurate result can be obtained for the case of a random variable Z that is the sum of a sufficiently large number of independent random variables ($m \to \infty$):

$Z = z_1 + z_2 + \ldots + z_m$.    (22)

According to the integral limit theorem, the deviation probability can then be approximated using the standard normal probability distribution function $N(0, 1)$:

$P\left\{\dfrac{Z - \mu}{\sigma} < t_{1-\alpha}\right\} = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{t_{1-\alpha}} \exp\left(-\dfrac{t^2}{2}\right) dt = 1 - \alpha$.    (23)

Based on (23), we calculate the boundary of the second-level criterion:

$C^{(2)} = \mu + t_{1-\alpha}\sigma$.    (24)

Thus, the algorithm for calculating the criterion includes the following steps (Fig. 1).

Fig. 1. Intrusion detection algorithm.

It can be seen that the complexity of the algorithm implementing the proposed criterion is estimated as O(N). Thus, its implementation on the principle of a "sliding window" will not cause a significant load on the computer system.

The most difficult issue in implementing this anomaly detection method, as with other statistical methods, is the formation of a model of regular behavior of the network infrastructure and its users.

5 Conclusions

In this regard, of particular interest are modern technologies that increase the efficiency of classical decryption methods [6], which, using vulnerabilities in a particular cryptographic algorithm and/or protocol, provide recovery of the secret key used to create encrypted messages and/or disclosure of the original value of the encrypted information. Recovery of the initial information can be full or partial; accordingly, one speaks of full or partial decryption.

A two-level statistical criterion for detecting anomalies based on measurements of network infrastructure characteristics is proposed, which provides the possibility of their further analysis to identify them with certain computer attacks. The application of the criterion does not require significant computing resources and makes it possible to respond to possible attacks in almost real time.

References

1. Court, S. S.: Intruder Detection Methods (2004). [Publication in Ukrainian]
2. Kostrov, D.: Intrusion detection systems. Byte 8(49): 14–21 (2002). [Publication in Russian]
3. Zubov, A. Y.: Cryptographic Methods of Information Protection. Perfect Ciphers (2005). [Publication in Russian]
4. Duhin, A. A.: Information Theory (2007). [Publication in Russian]
5. Gorbenko, I. D., Gorbenko, Y. I.: Applied Cryptology: Theory. Practice. Application (2012); Buriachok, V. L., Gulak, G. M., Horoshko, V. O.: On the issue of organizing and conducting intelligence in cyberspace. Sci. Def. 2: 19–23 (2011). [Publication in Ukrainian]
6. Oliynikov, R., et al.: Principles of construction and main properties of the new national standard of block encryption of Ukraine. Inf. Prot. 17(2): 142–157 (2015). [Publication in Ukrainian]
7. Kelsey, J., Schneier, B., Wagner, D., Hall, C.: Side channel cryptanalysis of product ciphers. In: 5th European Symposium on Research in Computer Security, pp. 97–111 (1998)
8. Safety and Security Consultancy, https://www.ippf.org/about-us/jobs-and-opportunities/consultancy/safety-and-security-consultancy, last accessed 2020/05/11.
9. Lloyd, E., et al.: Reference Book on Applied Statistics, Vol. 2 (1990). [Publication in Russian]
10. Brodsky, B. E., Darkhovsky, B. S.: Non-Parametric Statistical Diagnosis: Problems and Methods (2000)
11. Shiryaev, A. N.: Probability, Vol. 1 (2007). [Publication in Russian]
12. Vysochansky, D. F., Petunin, Y. I.: Justification of the 3-sigma rule for unimodal distributions. Probab. Theor. Math. Stat. 21: 23–35 (1979)