<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modeling of Pseudo-Random Sequences Generated by Data Encryption and Compression Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander V. Kozachok</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasiliy I. Kozachok</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey A. Spirin</string-name>
          <email>spirin_aa@bk.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey K. Trofimenkov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Academy of the Federal Guard Service of the Russian Federation</institution>
          ,
          <addr-line>35 Priborostroitelnaya ul., Orel, 302015</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Orel Branch of the Federal Research Center "Informatics and Management" of the Russian Academy of Sciences (OF FIC IU RAS)</institution>
          ,
          <addr-line>137 Moskovskoe shosse., Orel, 302025</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>98</fpage>
      <lpage>106</lpage>
      <abstract>
        <p>Reports of information and analytical agencies indicate a high proportion of internal violators as sources of confidential data leaks in Russia. One of the possible channels of data leaks can be their transmission in encrypted form. Modern data analysis tools are not able to reliably detect the transfer of information after its processing by cryptographic algorithms. In addition, an attacker can embed digital signatures specific to compressed data in encrypted data, thereby disguising them as legitimate file types. The paper presents an approach to the classification of encrypted and compressed data based on the developed model of pseudorandom sequences and the algorithm for their classification. The accuracy of the proposed method was 0.97.</p>
      </abstract>
      <kwd-group>
        <kwd>Statistical data analysis</kwd>
        <kwd>machine learning</kwd>
        <kwd>classification of encrypted and compressed data</kwd>
        <kwd>binary sequence classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        According to a report by the expert and analytical center of the InfoWatch group of companies, the
share of internal violators among the sources of confidential data leaks in Russia increased in 2020 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: in more
than 79% of cases, confidential data leaks were caused by an internal violator.
      </p>
      <p>To protect information from leaks, organizations use data leak prevention (DLP) tools, which detect and block leaks of confidential
data, together with various deep traffic analysis systems.</p>
      <p>
        Depending on the feature space used for classification, traffic analysis methods can be divided
into several groups: methods based on the entropy of all or part of the data [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2–6</xref>
        ]; methods based on service information of data
transmission protocols [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7–10</xref>
        ]; and methods based on statistical characteristics and byte distributions [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11–14</xref>
        ]. However, these
security methods can be circumvented, for example by encryption, data compression, or
encapsulation in other protocols [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ], the authors conclude that the entropy
approach reduces the information about the distribution to a single number, thereby reducing the
features available for analysis. To overcome this problem, they used deep neural networks to classify
data of two classes: data encrypted with AES and data compressed in the RAR, ZIP/GZIP, JPEG, PNG, MP3,
and PDF formats. According to the authors, this approach automatically identifies the most
significant features of the analyzed sequences and improves their classification accuracy.
      </p>
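      <p>To make the objection raised in [17, 18] concrete, the following Python sketch (an illustration, not code from the cited works) computes the Shannon entropy of a byte sequence in bits per byte. Two sequences built from entirely different byte values receive the same score, which shows how a single number discards the shape of the 256-bin distribution. The function name <monospace>byte_entropy</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # H = sum over observed byte values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(byte_entropy(b"aaaa"))            # 0.0: a single byte value
print(byte_entropy(bytes(range(256))))  # 8.0: all 256 values equally often
# Different byte values, identical histogram shape, identical entropy:
print(byte_entropy(b"\x00\x01" * 10), byte_entropy(b"\xfe\xff" * 10))  # 1.0 1.0
```
      </preformat>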
      <p>
        In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], a combined approach based on recurrent neural networks and extremely randomized trees is used
to detect potentially malicious actions and intrusions into information systems, with service network
information used as features.
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] note that byte values in encrypted and compressed sequences tend to be
uniformly distributed. This is because encryption algorithms disperse the statistics of the original
messages, while compression algorithms reduce their redundancy. The article [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] describes the use of a feature space based on the byte distribution for detecting
potentially malicious software.
      </p>
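      <p>The uniformity observation can be checked with Pearson's chi-square statistic of the byte histogram against a uniform distribution. In the sketch below (an illustration, not the method of [20]), the output of SHA-256 in counter mode stands in for encrypted data; this stand-in is an assumption chosen only to keep the example deterministic and free of dependencies, and all function names are hypothetical. With 256 bins, uniform-looking data yields a statistic near 255 (the number of degrees of freedom), whereas plain text yields a far larger value.</p>
      <preformat preformat-type="code">
```python
import hashlib
from collections import Counter


def byte_histogram(data: bytes) -> list[int]:
    """Number of occurrences of each byte value 0..255 (the byte distribution)."""
    counts = Counter(data)
    return [counts.get(i, 0) for i in range(256)]


def chi_square_uniform(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against a uniform one."""
    expected = len(data) / 256
    return sum((obs - expected) ** 2 / expected for obs in byte_histogram(data))


def pseudo_random_bytes(n: int, seed: bytes = b"demo") -> bytes:
    """Deterministic stand-in for ciphertext: SHA-256 run in counter mode."""
    out = bytearray()
    counter = 0
    while n > len(out):
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


text = b"Meaningful plain text is far from uniform. " * 2000
noise = pseudo_random_bytes(len(text))
print(round(chi_square_uniform(text)), round(chi_square_uniform(noise)))
```
      </preformat>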
      <p>
        The reviewed studies use the following machine learning algorithms: decision tree (DT),
support vector machine (SVM), random forest (RF), Markov chain (MC), hidden Markov chain (HMC),
boosting (BG), convolutional neural networks (CNN), recurrent neural networks (RNN), and
autocorrelation of distributions (AC). The methods considered rely on container and file headers for data
analysis, which contain "magic" bytes: digital signatures that uniquely identify the transmission
protocol or the compressed data container. At the same time, software and hardware tools for protection against
information leaks lack mechanisms for analyzing encrypted or compressed data when
information about the compression algorithm is absent [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. An overview of the methods considered is
presented in Table 1.
      </p>
      <p>
        Machine learning algorithms are also used in related areas of information security. For example,
the authors of one study used the XGBoost algorithm to detect abnormal behavior of stock exchange
traders who possess insider information and use it improperly [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The features were various specific attributes
and financial indicators of exchange transactions. Despite the different
subject area, the authors solve a similar problem: optimizing machine learning algorithms to
improve the accuracy of classifying illegitimate actions of agents.
      </p>
      <p>
        In a number of studies, the extremely randomized trees algorithm is used to detect intrusions into
corporate networks. Like random forest, it is an ensemble method,
but it differs in how the split threshold is selected. Instead of searching for the optimal
threshold, as the random forest algorithm does, one threshold is drawn at random for each
candidate feature, and the best of these random splits is chosen as the rule for dividing the node. Extremely
randomized trees reduce the variance of the model somewhat further, at the cost of a slightly greater increase in bias
[
        <xref ref-type="bibr" rid="ref23 ref24 ref25">23–25</xref>
        ].
      </p>
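      <p>The split rule described above can be sketched for a binary problem as follows. Gini impurity as the split criterion and all names are assumptions for illustration; library implementations such as scikit-learn's extra-trees classifier typically also draw a random subset of candidate features at each node.</p>
      <preformat preformat-type="code">
```python
import random


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def extra_trees_split(X: list[list[float]], y: list[int], rng: random.Random):
    """Choose a node split the extremely-randomized-trees way.

    For every feature, ONE threshold is drawn uniformly between that
    feature's minimum and maximum at this node (no search over all cut
    points, as a random forest would do). The (feature, threshold) pair with
    the lowest weighted Gini impurity is returned as
    (impurity, feature_index, threshold), or None if no feature can split.
    """
    best = None
    n = len(y)
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature at this node: no split possible
        t = rng.uniform(lo, hi)
        left = [label for value, label in zip(column, y) if t >= value]
        right = [label for value, label in zip(column, y) if value > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or best[0] > score:
            best = (score, f, t)
    return best


rng = random.Random(0)
X = [[0.0, 3.0], [0.0, 5.0], [1.0, 4.0], [1.0, 6.0]]
y = [0, 0, 1, 1]
print(extra_trees_split(X, y, rng))
```
      </preformat>
      <p>Feature 0 separates the classes for any threshold drawn from [0, 1), so it wins here; feature 1 cannot produce a pure split for any threshold.</p>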
      <p>
        Gradient boosting is an ensemble machine learning algorithm that can be used for classification
or regression tasks. The ensemble is built from decision trees, which are added one at a time
and trained to correct the prediction or classification errors made by the preceding models.
The models are fitted using an arbitrary differentiable loss function and a gradient descent
optimization algorithm. This explains the name "gradient boosting": the loss
gradient is minimized as the model is trained, as in a neural network, and the successive addition of trees
constitutes boosting [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>The work [<xref ref-type="bibr" rid="ref21">21</xref>] pays special attention to selecting the hyperparameters of the classifier. The
maximum tree depth limits how deep the trees can grow: a higher value increases the
complexity of the model but can lead to overfitting. The learning rate controls the
weight of each individual model in the ensemble prediction; smaller learning rates may require more
decision trees in the ensemble.</p>
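      <p>A toy sketch of gradient boosting on one-dimensional data, using one-split regression trees (stumps) and the squared loss, whose negative gradient is simply the residual. It is not the implementation used in the cited works; the names and the data are hypothetical. The <monospace>learning_rate</monospace> parameter scales each tree's contribution, so the usage lines illustrate the trade-off described above: with a smaller rate, more trees are needed to reach the same training error.</p>
      <preformat preformat-type="code">
```python
def fit_stump(x: list[float], residuals: list[float]):
    """Least-squares regression stump (one split) on 1-D inputs."""
    best = None
    for t in sorted(set(x))[:-1]:  # split between consecutive distinct values
        left = [r for xi, r in zip(x, residuals) if t >= xi]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or best[0] > sse:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if t >= xi else rv


def gradient_boost(x, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting for the squared loss L = (y - F)^2 / 2.

    The negative gradient of L with respect to F is the residual y - F, so
    each new stump is fitted to the current residuals and added to the
    ensemble scaled by the learning rate. Requires at least two distinct x.
    """
    base = sum(y) / len(y)  # F_0: the constant that minimises squared loss
    trees = []

    def predict(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)

    for _ in range(n_estimators):
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        trees.append(fit_stump(x, residuals))
    return predict


x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 2.0, 2.0]


def mse(model):
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


fast = gradient_boost(x, y, n_estimators=20, learning_rate=0.5)
slow = gradient_boost(x, y, n_estimators=20, learning_rate=0.05)
slow_more = gradient_boost(x, y, n_estimators=200, learning_rate=0.05)
print(mse(fast), mse(slow), mse(slow_more))
```
      </preformat>
      <p>Because each stump is the least-squares fit to the current residuals, adding it with a rate between 0 and 1 never increases the training error; the rate only controls how quickly that error falls.</p>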
      <p>
        In [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], the authors tune the hyperparameters of a classifier based on the LightGBM
algorithm to increase its accuracy. The main ones are the maximum tree depth, which
determines the complexity of the model (higher values fit the training data more closely but can
lead to overfitting and lower accuracy on test data), and the learning rate, which sets the
weight of each tree in the ensemble (lower values give each tree less weight).
The authors also stress the role of feature selection and normalization, since erroneous data
and missing feature values can significantly reduce the accuracy of the classification algorithms
used. The features were statistical characteristics of the resulting distributions of feature values:
the minimum and maximum values, the arithmetic mean, and the variance.
      </p>
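      <p>The statistical features listed above can be illustrated with a short sketch that computes the minimum, maximum, arithmetic mean and variance of a feature's values, together with min-max normalization to [0, 1]. The use of population variance and of min-max scaling are assumptions; [24] may use other variants. The function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import statistics


def summary_features(values: list[float]) -> dict[str, float]:
    """Minimum, maximum, arithmetic mean and population variance of a feature."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
    }


def min_max_normalize(column: list[float]) -> list[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


print(summary_features([2, 4, 4, 4, 5, 5, 7, 9]))
print(min_max_normalize([10.0, 15.0, 20.0]))  # [0.0, 0.5, 1.0]
```
      </preformat>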
      <p>The analysis of these works allows us to conclude that machine learning
algorithms are widely used in information security and achieve high accuracy in
classifying data of various kinds. Many works form the feature space from entropy, the distribution
of bytes, and the distribution of subsequences of different lengths. Based on this analysis,
it was hypothesized that a feature space can be formed from byte distributions and subsequences
of limited bit length.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Pseudo-random sequences model</title>
      <p>The analysis of sources in this subject area revealed that the byte distribution, its statistical
characteristics, and the frequencies of subsequences of different lengths counted with different steps
are frequently used as a feature space.</p>
      <p>It was hypothesized that a feature space combining the byte distribution with the frequencies of occurrence of
independent bit subsequences of length N (in bits), counted without overlap between occurrences of the same
subsequence, can improve the accuracy of PRS classification and thus existing solutions for protecting
information from leaks. For example, the frequencies of occurrence of subsequences of length N = 3 bits
in the sequence 100101111010 are shown in Table 2.</p>
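      <p>The counting rule can be sketched in Python. The text does not spell out how overlaps are handled, so counting non-overlapping occurrences of each pattern (which is what <monospace>str.count</monospace> does) is an assumption. It was chosen because it reproduces the Table 2 values for the patterns 100, 101, 110 and 111: for example, the two overlapping matches of 111 starting at zero-based positions 5 and 6 count once. The function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
from itertools import product


def subsequence_counts(bits: str, n: int) -> dict[str, int]:
    """Occurrences of every n-bit pattern in a bit string.

    Assumption: overlapping occurrences of the SAME pattern are counted once,
    which is what str.count does (it scans left to right and counts
    non-overlapping matches). Occurrences of different patterns may overlap.
    """
    patterns = ("".join(p) for p in product("01", repeat=n))
    return {pattern: bits.count(pattern) for pattern in patterns}


counts = subsequence_counts("100101111010", 3)
for pattern in ("100", "101", "110", "111"):
    print(pattern, counts[pattern])  # 100 1 / 101 2 / 110 1 / 111 1
```
      </preformat>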
      <p>To prevent evasion through forged digital signatures and to correctly classify encrypted and compressed
sequences, which have a uniform byte distribution and are called pseudorandom sequences (PRS), a PRS model based
on statistical features was developed. When the model is built, the header part of the file (the first 10
KB) is discarded and the rest is subjected to statistical analysis. Formally, the PRS model is defined
by expression (1).</p>
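      <disp-formula>
        <label>(1)</label>
        <tex-math>\mathrm{PRS} = \left\langle \{x(i)\}_{i \in [0,\dots,255]},\; \bar{x},\; \sigma,\; x_{\min},\; x_{\max},\; \{f_j\}_{j \in [0,\dots,511]} \right\rangle</tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>\{x(i)\},\ i \in [0,\dots,255]</tex-math></inline-formula> is the byte distribution in the analyzed PRS;
<inline-formula><tex-math>\bar{x}</tex-math></inline-formula> is the mean of the byte frequency;
<inline-formula><tex-math>\sigma</tex-math></inline-formula> is the standard deviation of the byte distribution;
<inline-formula><tex-math>x_{\min}</tex-math></inline-formula> and <inline-formula><tex-math>x_{\max}</tex-math></inline-formula> are the minimum and maximum byte frequency values;
<inline-formula><tex-math>\{f_j\},\ j \in [0,\dots,511]</tex-math></inline-formula> is the distribution of frequencies of occurrence of subsequences of length 9 bits in the analyzed PRS.</p>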
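      <p>The mean byte frequency in the PRS and the standard deviation of the byte distribution are calculated
according to expression (2):</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math>\bar{x} = \frac{1}{256}\sum_{i=0}^{255} x(i), \qquad \sigma = \sqrt{\frac{1}{256}\sum_{i=0}^{255}\left(x(i) - \bar{x}\right)^{2}}</tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>x(i)</tex-math></inline-formula> is the number of occurrences of byte <italic>i</italic> in the analyzed PRS.</p>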
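      <p>The frequency of occurrence of subsequences in the PRS is calculated according to expression (3),
where <inline-formula><tex-math>n_j</tex-math></inline-formula> is the number of occurrences of subsequence <italic>j</italic> in the PRS, <italic>M</italic> is the PRS length in bits,
and <italic>N</italic> is the length of subsequence <italic>j</italic> in bits.</p>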
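      <p>Putting the pieces of the PRS model together, a feature-extraction sketch might look as follows. It assumes the population standard deviation and the normalization <inline-formula><tex-math>f_j = n_j N / M</tex-math></inline-formula>, i.e. the count divided by the number of non-overlapping N-bit blocks; both are assumptions of this sketch rather than the authors' stated formulas. The function and constant names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import statistics
from itertools import product

HEADER_BYTES = 10 * 1024  # header part discarded before analysis, per the text
N_BITS = 9  # subsequence length used by the PRS model


def prs_features(data: bytes, skip_header: int = HEADER_BYTES) -> list[float]:
    """Feature vector of the PRS model (772 values).

    Layout: 256 byte counts x(i); their mean, population standard deviation,
    minimum and maximum; then 512 frequencies f_j of 9-bit subsequences,
    j = 0..511 in binary order. Assumed normalisation: f_j = n_j * N / M,
    i.e. occurrences divided by the number of non-overlapping N-bit blocks.
    """
    body = data[skip_header:]
    byte_counts = [0] * 256
    for b in body:
        byte_counts[b] += 1
    stats = [
        statistics.fmean(byte_counts),
        statistics.pstdev(byte_counts),
        float(min(byte_counts)),
        float(max(byte_counts)),
    ]
    bits = "".join(format(b, "08b") for b in body)
    m = len(bits)  # M: PRS length in bits
    freqs = []
    for pattern in product("01", repeat=N_BITS):
        n_j = bits.count("".join(pattern))  # non-overlapping occurrences
        freqs.append(n_j * N_BITS / m if m else 0.0)
    return [float(c) for c in byte_counts] + stats + freqs


vector = prs_features(bytes(range(256)) * 4, skip_header=0)
print(len(vector))  # 772
```
      </preformat>
      <p>Calling <monospace>str.count</monospace> 512 times keeps the sketch short; a production version would more likely count all patterns in a single pass over the bit string.</p>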
      <p>To test the adequacy of the model, experiments were conducted on a generated set of
encrypted and compressed files, each 600 KB in size. First, 2,000 files containing meaningful text in
Russian were generated; each was then processed by five encryption algorithms (AES, DES, RC4,
Camellia, GOST 34.12 "Kuznechik") and six compression algorithms (RAR, ZIP, 7Z, GZ, BZ2, XZ). The result
was a two-class data sample of 22,000 files (2,000 × 11).</p>
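      <p>The construction of the compressed class can be sketched for the formats available in Python's standard library (GZ, BZ2, XZ and ZIP). RAR and 7Z need external tools and the five ciphers need a cryptographic library, so they are deliberately left out; the helper names are hypothetical. The last line checks the sample-size arithmetic: each of the 2,000 texts is processed by 5 + 6 = 11 algorithms.</p>
      <preformat preformat-type="code">
```python
import bz2
import gzip
import io
import lzma
import zipfile


def zip_compress(data: bytes) -> bytes:
    """Wrap data in a single-entry ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("text.txt", data)
    return buffer.getvalue()


# Only the formats from the list above that the standard library handles;
# RAR and 7Z (external tools) and the five ciphers are left out of this sketch.
COMPRESSORS = {
    "gz": gzip.compress,
    "bz2": bz2.compress,
    "xz": lzma.compress,
    "zip": zip_compress,
}


def build_compressed_class(texts: list[bytes]) -> list[tuple[str, bytes, str]]:
    """Return (format, payload, label) triples, all labelled 'compressed'."""
    return [
        (name, compress(text), "compressed")
        for text in texts
        for name, compress in COMPRESSORS.items()
    ]


N_TEXTS, N_CIPHERS, N_COMPRESSORS = 2000, 5, 6  # counts stated in the text
print(N_TEXTS * (N_CIPHERS + N_COMPRESSORS))  # 22000
```
      </preformat>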
    </sec>
    <sec id="sec-3">
      <title>3. Practical application of the developed PRS model</title>
      <p>
        To select the most suitable machine learning algorithm, experiments were conducted to
evaluate the accuracy of PRS classification. Since the classification accuracy depends
on the number of features used in the PRS model, the experiments used different numbers
of features ranked by their discriminating ability [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
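      <p>The ranking criterion of [26] is not described here, so the sketch below substitutes the Fisher score, a common measure of a feature's discriminating ability in two-class problems: the squared difference of the class means divided by the sum of the class variances. Selecting the top k features by this score is one plausible way to run experiments with varying numbers of features; the function names and data are hypothetical.</p>
      <preformat preformat-type="code">
```python
import statistics


def fisher_scores(X: list[list[float]], y: list[int]) -> list[float]:
    """Fisher score of each feature for two classes labelled 0 and 1:
    (mean_0 - mean_1)^2 / (var_0 + var_1); larger means more discriminative."""
    scores = []
    for f in range(len(X[0])):
        a = [row[f] for row, label in zip(X, y) if label == 0]
        b = [row[f] for row, label in zip(X, y) if label == 1]
        gap = (statistics.fmean(a) - statistics.fmean(b)) ** 2
        spread = statistics.pvariance(a) + statistics.pvariance(b)
        if spread:
            scores.append(gap / spread)
        else:  # zero within-class variance: perfect separation or constant
            scores.append(float("inf") if gap else 0.0)
    return scores


def top_k_features(X: list[list[float]], y: list[int], k: int) -> list[int]:
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]


X = [[0.0, 1.0, 5.0], [0.1, 0.0, 5.0], [1.0, 1.0, 5.0], [0.9, 0.0, 5.0]]
y = [0, 0, 1, 1]
print(fisher_scores(X, y))
print(top_k_features(X, y, 2))  # [0, 1]
```
      </preformat>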
      <p>The accuracy of PRS classification by the machine learning algorithms, depending on the number of features, is shown in Figure 1.</p>
      <p>Machine learning algorithms differ in time complexity, so to assess their applicability in real
systems, the training time of a model based on each algorithm was measured. The results of
these experiments are shown in Figure 2.</p>
      <p>
        Used together with the developed PRS classification algorithm, the proposed PRS model classifies
encrypted and compressed data with an accuracy of 0.97, taking 0.5 seconds for a 600 KB sequence.
The classification ignores digital signatures, file extensions, and other
service information and relies only on statistical features of byte distributions and 9-bit subsequences [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
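      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Parameter</th>
              <th>Value</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Number of estimators</td><td>100</td></tr>
            <tr><td>Number of features</td><td>200</td></tr>
            <tr><td>Max depth of the trees</td><td>38</td></tr>
            <tr><td>Length of the subsequences</td><td>9 bits</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Frequency of occurrence of 3-bit subsequences in the sequence 100101111010</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Subsequence</th>
              <th>Frequency</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>100</td><td>1</td></tr>
            <tr><td>101</td><td>2</td></tr>
            <tr><td>110</td><td>1</td></tr>
            <tr><td>111</td><td>1</td></tr>
          </tbody>
        </table>
      </table-wrap>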
      <p>The developed PRS model, together with the PRS classification algorithm, can be implemented in the
statistical data analysis module of existing DLP systems. Cloud storage poses the greatest threat to modern enterprise
systems. An internal intruder can use encryption and data-masking
tools, for example by inserting the digital signatures of known file formats: "50 4B" for ZIP,
"52 61 72 21 1A" for RAR, and "25 50 44 46 2D 31 2E" for PDF. Deployed in a DLP system, the developed solution
performs statistical analysis of data transmitted outside the controlled perimeter of the corporate data network;
the corresponding block diagram is shown in Figure 5.</p>
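      <p>Why header-based checks are easy to defeat can be shown with the signatures quoted above. The naive detector in this sketch is hypothetical and not part of any DLP product: it trusts the leading bytes, so prepending the RAR signature to an arbitrary payload is enough to relabel it. The PRS model instead discards the first 10 KB and relies on statistics of the remaining data.</p>
      <preformat preformat-type="code">
```python
# Signatures quoted in the text, written as hex strings.
SIGNATURES = {
    "zip": bytes.fromhex("50 4B"),
    "rar": bytes.fromhex("52 61 72 21 1A"),
    "pdf": bytes.fromhex("25 50 44 46 2D 31 2E"),
}


def detect_by_signature(data: bytes) -> str:
    """Naive type detection from leading 'magic' bytes (easy to spoof)."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


payload = bytes(range(100, 200))  # placeholder for an encrypted payload
disguised = SIGNATURES["rar"] + payload
print(detect_by_signature(payload))    # unknown
print(detect_by_signature(disguised))  # rar
print(SIGNATURES["pdf"])               # b'%PDF-1.'
```
      </preformat>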
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The study examines existing approaches to forming feature spaces and the machine learning
algorithms used to build classifiers of encrypted and compressed data. Existing approaches show high
classification accuracy, but all of them use the headers of the analyzed files or data packets, which
contain digital signatures that uniquely determine the type of information transmitted. An attacker can
exploit this weakness and transmit information in encrypted form by altering the file header
or by encapsulating the traffic.</p>
      <p>To improve the accuracy of classifying PRS generated by data encryption and compression
algorithms, a PRS model was developed that uses the byte distribution and the frequency of
occurrence of 9-bit subsequences in the analyzed PRS. The choice of the random forest algorithm as the
mathematical apparatus was justified. The experiments confirmed the adequacy of the
model and its applicability to classifying the data types considered in this work,
with a precision of 0.98. A place for integrating the developed PRS model into
tools for detecting and preventing information leaks was proposed.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgements</title>
      <p>The reported study was funded by the Russian Ministry of Science (information security), project
number 18/2020.</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[1] InfoWatch analytics report</source>
          ,
          <year>2021</year>
          . URL: https://www.infowatch.ru/analytics/reports/30708.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Mamun</surname> <given-names>M.S.I.</given-names></string-name>,
          <string-name><surname>Ghorbani</surname> <given-names>A.A.</given-names></string-name>,
          <string-name><surname>Stakhanova</surname> <given-names>N.</given-names></string-name>.
          <article-title>An Entropy Based Encrypted Traffic Classifier</article-title>.
          In: Qing S., Okamoto E., Kim K., Liu D. (eds)
          <source>Information and Communications Security. ICICS 2015. Lecture Notes in Computer Science</source>,
          vol. <volume>9543</volume>. Springer, Cham.
          DOI: <pub-id pub-id-type="doi">10.1007/978-3-319-29814-6_23</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Tang</surname> <given-names>Z.</given-names></string-name>,
          <string-name><surname>Zeng</surname> <given-names>X.</given-names></string-name>,
          <string-name><surname>Sheng</surname> <given-names>Y.</given-names></string-name>.
          <article-title>Entropy-based feature extraction algorithm for encrypted and nonencrypted compressed traffic classification</article-title>.
          <source>International Journal of ICIC</source>, <year>2019</year>, vol. <volume>15</volume>, no. <issue>3</issue>, pp. <fpage>845</fpage>-<lpage>860</lpage>.
          DOI: <pub-id pub-id-type="doi">10.24507/ijicic.15.03.845</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Belyaev</surname> <given-names>S.</given-names></string-name> et al.
          <article-title>Development of a Pseudo-Random Sequence Generation Function Based on the “Kuznechik” Cryptographic Algorithm</article-title>.
          <source>Voprosy kiberbezopasnosti [Cybersecurity issues]</source>, <year>2021</year>, No. <issue>4</issue> (44), pp. <fpage>25</fpage>-<lpage>34</lpage>.
          DOI: <pub-id pub-id-type="doi">10.21681/2311-3456-2021-4-25-34</pub-id>. (In Russ.)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Livshitz</surname> <given-names>I.</given-names></string-name>,
          <string-name><surname>Neklydov</surname> <given-names>A.</given-names></string-name>.
          <article-title>Assessment of Entropy of Information Security Systems</article-title>.
          <source>Voprosy kiberbezopasnosti [Cybersecurity issues]</source>, <year>2017</year>, No. <issue>5</issue> (24), pp. <fpage>30</fpage>-<lpage>41</lpage>.
          DOI: <pub-id pub-id-type="doi">10.21681/2311-3456-2017-5-30-41</pub-id>. (In Russ.)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Zhou</surname> <given-names>K.</given-names></string-name>,
          <string-name><surname>Wang</surname> <given-names>W.</given-names></string-name>,
          <string-name><surname>Wu</surname> <given-names>C.</given-names></string-name>,
          <string-name><surname>Hu</surname> <given-names>T.</given-names></string-name>.
          <article-title>Practical evaluation of encrypted traffic classification based on a combined method of entropy estimation and neural networks</article-title>.
          <source>ETRI Journal</source>, <year>2020</year>, vol. <volume>42</volume>, no. <issue>3</issue>, pp. <fpage>311</fpage>-<lpage>323</lpage>.
          DOI: <pub-id pub-id-type="doi">10.4218/etrij.2019-0190</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><surname>Shen</surname> <given-names>M.</given-names></string-name>,
          <string-name><surname>Wei</surname> <given-names>M.</given-names></string-name>,
          <string-name><surname>Zhu</surname> <given-names>L.</given-names></string-name>,
          <string-name><surname>Wang</surname> <given-names>M.</given-names></string-name>.
          <article-title>Classification of encrypted traffic with second-order Markov chains and application attribute bigrams</article-title>.
          <source>IEEE Transactions on Information Forensics and Security</source>, <year>2017</year>, vol. <volume>12</volume>, no. <issue>8</issue>, pp. <fpage>1830</fpage>-<lpage>1843</lpage>.
          DOI: <pub-id pub-id-type="doi">10.1109/TIFS.2017.2692682</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><surname>Chen</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Zang</surname> <given-names>T.</given-names></string-name>,
          <string-name><surname>Zhang</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Zhou</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Wang</surname> <given-names>Y.</given-names></string-name>.
          <article-title>Rethinking Encrypted Traffic Classification: A Multi-Attribute Associated Fingerprint Approach</article-title>.
          <source>2019 IEEE 27th International Conference on Network Protocols (ICNP)</source>, Chicago, IL, USA, <year>2019</year>, pp. <fpage>1</fpage>-<lpage>11</lpage>.
          DOI: <pub-id pub-id-type="doi">10.1109/ICNP.2019.8888043</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><surname>Obasi</surname> <given-names>T.C.</given-names></string-name>.
          <article-title>Encrypted Network Traffic Classification using Ensemble Learning Techniques</article-title>.
          Doctoral dissertation, Carleton University, <year>2020</year>.
          DOI: <pub-id pub-id-type="doi">10.22215/etd/2020-14171</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><surname>Yao</surname> <given-names>Z.</given-names></string-name>,
          <string-name><surname>Ge</surname> <given-names>J.</given-names></string-name>,
          <string-name><surname>Wu</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Lin</surname> <given-names>X.</given-names></string-name>,
          <string-name><surname>He</surname> <given-names>R.</given-names></string-name>,
          <string-name><surname>Ma</surname> <given-names>Y.</given-names></string-name>.
          <article-title>Encrypted traffic classification based on Gaussian mixture models and Hidden Markov Models</article-title>.
          <source>Journal of Network and Computer Applications</source>, <year>2020</year>, vol. <volume>166</volume>, p. <fpage>102711</fpage>.
          DOI: <pub-id pub-id-type="doi">10.1016/j.jnca.2020.102711</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><surname>Wang</surname> <given-names>F.</given-names></string-name>,
          <string-name><surname>Quach</surname> <given-names>T.T.</given-names></string-name>,
          <string-name><surname>Wheeler</surname> <given-names>J.</given-names></string-name>,
          <string-name><surname>Aimone</surname> <given-names>J.B.</given-names></string-name>,
          <string-name><surname>James</surname> <given-names>C.D.</given-names></string-name>.
          <article-title>Sparse coding for n-gram feature extraction and training for file fragment classification</article-title>.
          <source>IEEE Transactions on Information Forensics and Security</source>, <year>2018</year>, vol. <volume>13</volume>, no. <issue>10</issue>, pp. <fpage>2553</fpage>-<lpage>2562</lpage>.
          DOI: <pub-id pub-id-type="doi">10.1109/TIFS.2018.2823697</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><surname>Choudhury</surname> <given-names>P.</given-names></string-name>,
          <string-name><surname>Kumar</surname> <given-names>K.P.</given-names></string-name>,
          <string-name><surname>Nandi</surname> <given-names>S.</given-names></string-name>,
          <string-name><surname>Athithan</surname> <given-names>G.</given-names></string-name>.
          <article-title>An empirical approach towards characterization of encrypted and unencrypted VoIP traffic</article-title>.
          <source>Multimedia Tools and Applications</source>, <year>2020</year>, vol. <volume>79</volume>, no. <issue>1-2</issue>, pp. <fpage>603</fpage>-<lpage>631</lpage>.
          DOI: <pub-id pub-id-type="doi">10.1007/s11042-019-08088-w</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><surname>Baldini</surname> <given-names>G.</given-names></string-name>,
          <string-name><surname>Hernandez-Ramos</surname> <given-names>J.L.</given-names></string-name>,
          <string-name><surname>Nowak</surname> <given-names>S.</given-names></string-name>,
          <string-name><surname>Neisse</surname> <given-names>R.</given-names></string-name>,
          <string-name><surname>Nowak</surname> <given-names>M.</given-names></string-name>.
          <article-title>Mitigation of Privacy Threats due to Encrypted Traffic Analysis through a Policy-Based Framework and MUD Profiles</article-title>.
          <source>Symmetry</source>, <year>2020</year>, vol. <volume>12</volume>, no. <issue>9</issue>, p. <fpage>1576</fpage>.
          DOI: <pub-id pub-id-type="doi">10.3390/sym12091576</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Shen</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guizani</surname>
            <given-names>N.</given-names>
          </string-name>
          <article-title>Optimizing Feature Selection for Efficient Encrypted Traffic Classification: A Systematic Approach</article-title>
          //
          <source>IEEE Network</source>
          ,
          <year>2020</year>
          , vol.
          <volume>34</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>27</lpage>
          . DOI: 10.1109/MNET.011.1900366.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Huang</surname>
            <given-names>X.</given-names>
          </string-name>
          et al.
          <article-title>A novel mechanism for fast detection of transformed data leakage</article-title>
          //
          <source>IEEE Access</source>
          ,
          <year>2018</year>
          , vol.
          <volume>6</volume>
          , pp.
          <fpage>35926</fpage>
          -
          <lpage>35936</lpage>
          . DOI: 10.1109/ACCESS.2018.2851228.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Cheng</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            <given-names>D. D.</given-names>
          </string-name>
          <article-title>Enterprise data breach: causes, challenges, prevention, and future directions</article-title>
          //
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          ,
          <year>2017</year>
          , vol.
          <volume>7</volume>
          , no.
          <issue>5</issue>
          . DOI: 10.1002/widm.1211.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>De Gaspari</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitaj</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pagnotta</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Carli</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mancini</surname>
            <given-names>L. V.</given-names>
          </string-name>
          <article-title>EnCoD: Distinguishing Compressed and Encrypted File Fragments</article-title>
          .
          In
          <source>International Conference on Network and System Security</source>
          (pp.
          <fpage>42</fpage>
          -
          <lpage>62</lpage>
          ).
          <year>2020</year>
          . Springer, Cham. DOI: 10.1007/978-3-030-65745-1_3.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Lee</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>S. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yim</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Machine learning based file entropy analysis for ransomware detection in backup systems</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>7</volume>
          , pp.
          <fpage>110205</fpage>
          -
          <lpage>110215</lpage>
          .
          <year>2019</year>
          . DOI: 10.1109/ACCESS.2019.2931136.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Kasongo</surname>
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>A deep gated recurrent unit based model for wireless intrusion detection system</article-title>
          .
          <source>ICT Express</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>87</lpage>
          .
          <year>2021</year>
          . DOI: 10.1016/j.icte.2020.03.002.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Raff</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zak</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cox</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sylvester</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yacci</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ward</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nicholas</surname>
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>An investigation of byte n-gram features for malware classification</article-title>
          .
          <source>Journal of Computer Virology and Hacking Techniques</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . DOI: 10.1007/s11416-016-0283-1.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>De Gaspari</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitaj</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pagnotta</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Carli</surname>
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mancini</surname>
            <given-names>L. V.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Reliable Detection of Compressed and Encrypted Data</article-title>
          .
          <source>arXiv preprint arXiv:2103.17059</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Deng</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>T</given-names>
          </string-name>
          .
          <article-title>Identification of Insider Trading Using Extreme Gradient Boosting and Multi-Objective Optimization</article-title>
          .
          <source>Information</source>
          ,
          <volume>10</volume>
          (
          <issue>12</issue>
          ),
          <fpage>367</fpage>
          .
          <year>2019</year>
          . DOI: 10.3390/info10120367.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Ke</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finley</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>T. Y.</given-names>
          </string-name>
          <article-title>LightGBM: A highly efficient gradient boosting decision tree</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>30</volume>
          ,
          <fpage>3146</fpage>
          -
          <lpage>3154</lpage>
          .
          <year>2017</year>
          . DOI: 10.5555/3294996.3295074.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Minastireanu</surname>
            <given-names>E. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mesnita</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Light GBM machine learning algorithm to online click fraud detection</article-title>
          .
          <source>J. Inform. Assur. Cybersecur</source>
          ,
          <year>2019</year>
          . DOI: 10.5171/2019.263928.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Ge</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Credit Card Fraud Detection Using Lightgbm Model</article-title>
          . In
          <source>2020 International Conference on E-Commerce and Internet Technology (ECIT)</source>
          (pp.
          <fpage>232</fpage>
          -
          <lpage>236</lpage>
          ). IEEE.
          <year>2020</year>
          . DOI: 10.1109/ECIT50008.2020.00060.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Kozachok</surname>
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spirin</surname>
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golembiovskaya</surname>
            <given-names>O.M.</given-names>
          </string-name>
          <article-title>Algorithm for classification of pseudorandom sequences based on random forest construction</article-title>
          .
          <source>Proceedings of Tomsk State University of Control Systems and Radioelectronics</source>
          . -
          <year>2020</year>
          . - V.
          <volume>23</volume>
          . - no.
          <issue>3</issue>
          . - P.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          . DOI: 10.21293/1818-0442-2020-23-3-55-60. (In Russ.)
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Kozachok</surname>
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spirin</surname>
            <given-names>A.A.</given-names>
          </string-name>
          <article-title>Algorithm for classification of pseudorandom sequences</article-title>
          .
          <source>Systems Analysis and Information Technologies</source>
          . -
          <year>2020</year>
          . - no.
          <issue>1</issue>
          , pp.
          <fpage>87</fpage>
          -
          <lpage>98</lpage>
          . DOI: 10.17308/sait.2020.1/2595. (In Russ.)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>