Exposing Deviations in Information Processes using Multifractal Analysis Yevhen Ivanichenko1, Valerii Kozachok1, Yurii Dreis2, Olena Nesterova1,3, and Kate Dmytriienko1 1 Borys Grinchenko Kyiv University, 18/2 Bulvarno-Kudriavska str., Kyiv, 04053, Ukraine 2 Polissia National University, 7 Staryi ave., Zhytomyr, 10008, Ukraine 3 National Pedagogical Dragomanov University, 9 Pyrohova str., Kyiv, 01601, Ukraine Abstract The main requirement for modern systems of intrusion detection is the possibility of identifying deviations in information processes in order to detect unknown attack types. An overview of existing approaches to identifying network deviations based on multifractal analysis methods is given. The results of the calculation of the Hurst exponent for the time series of CPU usage for different types of user activity are presented. Keywords1 Fractal analysis, the Hurst exponent, network deviation detection. 1. Introduction Signature methods of analysis used in modern intrusion detection systems aimed at identifying known and more specific methods of attacks, appear not to be able to detect their modifications or new types, which makes the use of such systems ineffective. Existing solutions to individual cases of detection of network deviations to this time do not allow to develop a single universal mechanism for detecting previously unknown attack types. The current task at the moment is to find more effective universal methods for detecting network deviations that are a consequence of technical failures or unauthorized impacts. The main requirement for these methods is the possibility of identifying arbitrary types of intruders, including distributed in time. Statistical studies of network traffic indicate that it has the properties of fractality or self- similarity, as well as the variability of these characteristics in the event of deviations in the network, which allows the use of fractal analysis to detect attacks [1, 2]. The purpose of this study is an overview of modern existing approaches to identifying network deviations based on the method of fractal analysis. 2. Methods of Detecting Attacks Attacks are deliberate actions of the offender, leading to violation of confidentiality, integrity or accessibility of the system. Methods of detection of attacks are divided into: 1. Signature. 2. Behavioral. Signature methods are intended to detect known and clearly described attacks and founded on a reference verification of symbol sequences and events with the database of attack signatures. Advantages of signature methods include low demands for computing power and probability of detection of attacks. Disadvantages of signature methods are the impossibility of identifying new types CPITS-II-2021: Cybersecurity Providing in Information and Telecommunication Systems, October 26, 2021, Kyiv, Ukraine EMAIL: y.ivanichenko@kubg.edu.ua (Y. Ivanichenko); v.kozachok@kubg.edu.ua (V. Kozachok); dreisyuri@gmail.com (Y. Dreis); o.d.nesterova@npu.edu.ua (O. Nesterova); k.dmytriienko.asp@kubg.edu.ua (K. Dmytriienko) ORCID: 0000-0002-6408-443X (Y. Ivanichenko); 0000-0003-0072-2567 (V. Kozachok); 0000-0003-2699-1597 (Y. Dreis); 0000-0002- 0402-0370 (O. Nesterova); 0000-0001-7984-7279 (K. Dmytriienko) ©️ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 251 of attacks and modifications that exist without a clear formalization of keywords of network traffic and updating the signature database. Behavioral methods are intended to identify unknown attacks and are based on detection of deviations from normal operation mode. Advantages of behavioral methods comprise the possibility of analyzing the dynamics of processes and the possibility of identifying new types of attacks. Disadvantages of behavioral methods include higher requirements for computing resources and capacities and lower probability of detection. 3. Fractal Analysis The time series is a sequence of values of the studied magnitude measured at regular intervals. The central concepts of fractal analysis are fractal dimension (D) and the Hurst exponent (H). The fractal dimension of the set (according to Hausdorff) is determined by: lg⁡[N(ε)] D = − lim , (1) ε→∞ lg⁡(ε) where N(ε) – the minimum number of non-empty cells ε that cover a given set. The Hurst exponent characterizes the degree of similarity of the process: 1. 0 0.5 – a trend-resistant process that has a long memory and is self-similar. The fractal dimension is directly related to the Hurst exponent: D = 2 – H. This ratio is fair when the structure of the curve that describes the fractal function is investigated with high resolution, that is, in the local limits. One of the popular methods of finding fractal dimension is R / S analysis [2]: R(n) M [ S(n) ] ~cnH , n → ∞ (2) where n – high resolution; с – a positive finite constant that does not depend on n; Н – the Hurst exponent; R(n) – the scope of the time series. R(n) = max ∆j − ⁡ min ∆j (3) 1≤j≤n 1≤j≤n ∆j = ∑i=n xi − kx̅ , k = ⃗⃗⃗⃗⃗⃗ n 1, n (4) 1 n x̅ = n ∑i=1 xi (5) 1 S(n) = n−1 ∑ni=1(xi − x̅)2 (6) 4. Review of Existing Methods and Approaches In [1], a method of maximum modules of wavelet transform (MMWT) is used to detect traffic deviations, which allows us to detect the singularity of the signal. The network traffic collected on the boundary router of the university network was taken as the analyzed data. Each sequence is about 24 hours long with a sampling step of 1 second. Samples of "pure" traffic without attacks, and also with various deviations are presented: at DDoS-attacks of different types of scanning. The algorithm for estimating the parameters of the multifractal spectrum is as follows: The output signal f (t) is decomposed by means of a wavelet transform by the mother wavelet Ψ (t) into the corresponding coefficients: j t−u Wf (u, i) = (f(t), Ψu,s (t)) = 2−2 ∫ 2j dt (7) The partition function is calculated: q S(q, j) = ∑p|Wf (up , j| (8) For each value of q∈R it is necessary to calculate the scale indicator: lnS(q,j) τ(q, j) = log j→0 inf ln2j (9) Then the multifractal spectrum fl (a), using the Legendre transformation is calculated: 252 1 fl (a) = min [q (a + 2) − τ(q)] (10) q∈R For each octave j, the multifractal dimension of the order q is calculated: 1 Dq,j = q−1 [q(a(q, j) − f(a(q), j))] (11) When q <0, the value of S (q, j) depends on small maxima of the amplitude | W_f (u_p, j) |, as a result, the calculations may not be stable. In order to avoid the emergence of false maxima of modules created by computing errors in areas where f is almost constant, wavelet-maxima are combined in a chain to form a maximum curve depending on the scale. −t 2 P (p) 1 If Ψ = (−1) θ , where θ = 2π e 2 – Gaussian function, then all lines of maxima up (j) are √ determined by curves that are limited by j = 0. Therefore, all maximum lines that do not extend to the smallest scale are deleted when calculating S (q, j). Formalizing the difference in traffic spectra with certain deviations and without them, it is possible to compare the fractal dimensions D1, correlation dimensions D2 and intervals characterizing the “width” of the Lezhandr spectrum for each of the implementations for each octave of decomposition of j. Information dimensions of the comparative implementations D1 are distinguished by a small stable value and practically do not depend on the number of levels of sampling. This allows us to conclude that the presence of long-term attacks in the signal and non-predicted activity changes the self-similar nature of traffic, and this property can be used in the future to detect attacks. In [3] it is proposed to determine the deviations, based on their identity and distribution of "heavy tails". Network deviations may occur as a result of overloads, errors by network devices, DDOS attacks, attempts of unauthorized access. To reduce the impact of the periodicity of network traffic on the estimation of the Hurst exponent, the time series is divided into 24 sets of values. For each set, a histogram is built for 24 equal intervals. For each group there are a packet number and the average package length for the same time intervals. At the next stage, the Hurst exponent is calculated by the method of periodogram that use the slope of the power spectrum. The Hurst exponent is calculated from the ratio: β − 1 = 1 − 2Н, (12) where β -the slope of the line on a logarithmic scale. In practice, you must first analyze traffic in the regular network operation mode during the day. When the deviation detection mode is turned on, at first the Hurst exponent is calculated and compared with the corresponding reference value calculated in normal mode, for each parameter separately. In the paper [4], an algorithm for detecting deviations based on a discrete stationary wavelet transform and fractal dimension is used. As the first step, a time series filtering with a discrete stationary wavelet transform is performed. This preliminary processing is necessary to increase the accuracy of the proposed method: the main components are allocated, the details are filtered. The main advantage of the discrete stationary wavelet transform over a classic one is the preservation of the time information of the output signal at each level. In the second step, the time series is bypassed by two adjacent windows R and S. For each window, a fractal dimension is calculated according to the algorithm: L lg⁡( ) a FD = d (13) lg( ) a where L – the length of the time series, d – the distance between the first point of the series and the farthest from it, a - the average distance between two adjacent points of the series. Changes in the statistical parameters of the signal are reflected in the fractal dimension, to account for which the following function is introduced: Gk = |FDk+1 − FDk |, k = 1, … , n (14) where n = number of points G. The third step is the search for local maxima G that exceed a given threshold, which are considered as deviations from normal behavior. The accuracy of the method is significantly affected by the length of the window. For the analyzed window of length l, the energy of the function⁡Gl ⁡⁡⁡⁡ is calculated as: 2 ∑ k|Gl | k Gl = N (15) 253 The window length is calculated as the minimum of the standard energy function EGl . In [5], a method for detecting DDOS attacks is offered based on the estimation of the Hurst exponent using the Fourier fractional transformation, which makes the transition to the frequency-time area. For the signal x (t) the fractional Fourier transform is determined as: ∞ X a (u) = Fa (u) = ∫−∞ x(t)K a (t, u)dt (16) K a (t, u) = ⁡ √1 − i ∙ cot(α) ∙ exp⁡[iπ(t 2 cot⁡(α) − 2ut ∙ csc(α) + u2 cot⁡(α))], α ≠ nπ (17) K a (t, u) = δ(t − u), a = (2 ∓ 1)π (18) aπ n ∈ Z, α = 2 (19) where a – the order of fractional Fourier transformation, provided that a=1, then the formula changes to the usual Fourier transformation. Using a discrete wavelet transform and a multi-scale method of analysis, we can calculate the Hurst exponent if we analyze the expression: G(j) ↔ (2H + 1)j + constant, where j – scale. Next, the optimal selection of the range of scale intervals is made, using the method of one- dimensional weighted estimation of least squares [8]. Experimental verification of the proposed method showed its high accuracy, which reduced the number of false positives and omissions during the detection of the attack. It was stated that network traffic is divided into several disjoint segments. The Hurst exponent for each segment is estimated. When the threshold values are exceeded, the traffic loses the property of self-similarity, which is regarded as a DDoS attack. But the intensity of the DDoS attack can change, which leads to a change in the Hurst exponent, so detection methods based on a fixed threshold require flexibility and adaptability. This article proposes a method consisting of two stages: 1. Statistical analysis of the time series of network traffic using discrete wavelet transform and the Schwartz information criterion to find the change point of the Hurst exponent, which signals the start of a DDoS attack. 2. Adaptive regulation of the intensity of a DDoS-attack on the basis of fuzzy logic, by analyzing the Hurst exponent and the rate of its change. The Schwartz information criterion is based on the maximum likelihood function for the model and can be used to detect the presence of a threshold point by comparing the probability of a null hypothesis (no point) and an alternative (point of presence). The Hurst exponent is estimated using a discrete wavelet transform, because in practice this method is one of the most reliable, as it is more resistant to gentle polynomial trends and noise. 5. The Evaluation is Performed According to the Following Algorithm For the time series of network traffic X in real time, the wavelet coefficients d (j, k) are calculated for each scale j and position k. Next, it is necessary to perform a detailed assessment of the dispersion at each scale j: nj Sj = ∑k=1 d2 (j, k) (20) where nj ⁡– the number of wavelet coefficients that are available in scale j We assume that a new sample of traffic is received, then the amount will be updated as follows: nj ≠ nj + 1 (21) Sj = Sj + d2 (j, nj ) (22) Estimation of variance in scale j: S εj = nj (23) j Next, the dependence of log 2 (εj ) on scaling j is constructed and a weighted linear regression is performed for the linear section, α is calculated.You do not need to build this dependency every time you receive a new segment of traffic, this action is performed only when necessary. Then the Hurst exponent is calculated α+1 H= 2 (24) 254 6. The Principle of Detecting Attacks Let X be a time series of normal traffic, Y is a time series of traffic with deviations, Z - a time series of deviations, i.e. the relation Y = X + Z holds. Based on the theorems, we can conclude that regardless of the presence of self-similarity in Z, if X is a stationary self-similar process of the second order, then Y will still be a self-similar process. But the degree of self-similarity may change. Let ry , rz be autocorrelation functions X, Y, Z, respectively. Then a ‖ry − rx ‖, is of interest during the attack, and ry = rx + rz . For each value of H ∈ (0.5,1] there is only one autocorrelation function on self-similarity. Thus, we examine ‖Hy − Hx ‖, where Hx ⁡and Hx is an average values of the Hurst exponents Y and X, respectively. The disadvantage of this method is that the wavelet transform coefficients and statistics based on the Schwartz information criterion are updated at the moment when new traffic values arrive, and the detection of the traffic self-similarity threshold will be restarted for each scale. Thus, a signal of the change in the point of self-similarity will be given, even if this change occurred on a different scale at the same moment. After an attack is detected, close to the detection time the traffic is divided into parts. By analyzing the Hurst exponent and the speed of its change (the difference between the Hurst exponents of traffic parts before and after the moment of detection), we can determine the intensity of DDoS-attack, using the rules of fuzzy logic. Determining the point of change of traffic self-similarity by the Schwartz information criterion which is based on the assumption that the entropy of a sequence with a variable self-similarity boundary point is greater than the entropy of the sequence in which this point is fixed. Suppose there is a sequence of length M. It is assumed that there is only one point of the self-similarity boundary at position 1