Hyperplane Clustering of the Data in the Vector Space of
Features Based on Pseudo Inversion Tools
Iurii Krak (a,b), Hrygorii Kudin (b), Veda Kasianiuk (a) and Mykola Efremov (a,b)

(a) Taras Shevchenko National University of Kyiv, 64/13 Volodymyrska str., Kyiv, 01601, Ukraine
(b) Glushkov Cybernetics Institute, 40 Glushkov ave., Kyiv, 03187, Ukraine

Abstract
A method of hyperplane clustering of data in the vector space of characteristic features is proposed, based on results from the perturbation theory of pseudo-inverse and projection matrices and on solutions of systems of linear algebraic equations (SLAE). An algorithm of hyperplane clustering with verification of a given efficiency criterion for the proposed clustering method is developed. An example of using the method to scale characteristic features for recognizing the fingerspelling alphabet of sign language is given.

Keywords
clustering, pseudo-inverse matrix, SLAE, optimization

1. Introduction
    One of the important problems in the classification and clustering of information is minimizing the dimension of the feature space and choosing criteria for optimal solutions in practical use. Such problems are effectively solved by the method of multidimensional scaling of empirical data on the proximity of objects, with the help of which the dimension of the space of essential characteristics of the measured objects is determined and the configuration of points (objects) in this space is constructed. This space is a multidimensional scale, similar to the scales commonly used in various applications, in the sense that the values of the specially generated essential characteristics of the measured objects correspond to certain positions on the axes of the new space [1]-[5].
    The purpose of this work is to develop mathematical methods for the synthesis of systems that solve classification and clustering problems based on information about the characteristic features of objects [6]-[10]. These problems are solved by constructing hyperplanes in a space derived from the original feature space, using the perturbation theory of pseudo-inverse and projection matrices and solutions of systems of linear algebraic equations. The paper proposes a method for synthesizing a piecewise hyperplane cluster to isolate the most effective characteristic features, and an algorithm for constructing piecewise hyperplane clusters that finds an effective solution to these problems. The performance and efficiency of the proposed approach are shown on the example of scaling characteristic features for recognizing the letters of the fingerspelling alphabet of sign language [8], [9], [11].

2. Method of synthesis of the hyperplane cluster
    The problem of synthesis of a piecewise hyperplane cluster for a training sample of vectors $\Omega_0 = \{x : x(j) \in E^m,\ j = \overline{1,n}\}$, where $x(1), \ldots, x(n)$ are vectors from the Euclidean feature space $E^m$,
is to build a cluster so that the training sample points in this space are located quite close, in the sense
of a given distance criterion, to some set of hyperplanes that are formed according to this sample.
    Note that in this formulation of the clustering problem, the components of the set of hyperplanes are not known in advance. Therefore, for the correct construction of piecewise hyperplane clustering procedures, it is assumed that the vectors $x(1), \ldots, x(n)$ from the feature space $E^m$ can belong to one of several hyperplanes $L(A(k), b(k))$, where $A(k) \in E^{s \times m}$, $b(k) \in E^s$, $k = 1, 2, \ldots$, for some given dimension $s$ ($s \le m$). Here $A(k)$ and $b(k)$ are the matrix and vector parameters, respectively, of a fixed hyperplane $L(A(k), b(k))$, $k = 1, 2, \ldots$.
    The proposed method of piecewise hyperplane cluster synthesis is based on representing hyperplanes by means of the set of solutions (pseudo-solutions) of systems of linear algebraic equations:

$$A(k)\, x = b(k), \qquad (1)$$

$$L(A(k), b(k)) = \{x \in E^m : x = A^+(k)\, b(k) + Z(A(k))\, z,\ z \in E^m\}. \qquad (2)$$

    Here $A^+$ is the pseudo-inverse matrix and $Z$ is the projection matrix.
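    To make representation (2) concrete, the following minimal sketch (illustrative code, not the authors' implementation; the matrix $A$ and vector $b$ are arbitrary examples) builds points of $L(A, b)$ as $x = A^+ b + Z(A)z$ and checks that each satisfies $Ax = b$:

```python
import numpy as np

def hyperplane_point(A, b, z):
    """A point of L(A, b) = {A+ b + Z(A) z : z in E^m}, per representation (2)."""
    A_pinv = np.linalg.pinv(A)             # pseudo-inverse A+
    Z = np.eye(A.shape[1]) - A_pinv @ A    # Z(A): projector onto the null space of A
    return A_pinv @ b + Z @ z

A = np.array([[1.0, 2.0, 0.0]])            # s = 1, m = 3: a plane in E^3
b = np.array([3.0])
for z in np.random.randn(3, 3):           # any z in E^3 yields a solution
    x = hyperplane_point(A, b, z)
    assert np.allclose(A @ x, b)
```

    Every choice of $z$ traverses the same hyperplane, which is exactly why the pair $(A^+(k) b(k),\ Z(A(k)))$ suffices to parameterize the $k$-th component of the cluster.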
    Let us present some mathematical results on the inversion (pseudo-inversion) of matrices and the construction of projection matrices, which are important for solving the problem of synthesizing a piecewise hyperplane cluster.
    Let a matrix $A = (a_{ij})$, $i = \overline{1,m}$, $j = \overline{1,n}$, be given, and let us write down its representations by columns and by rows, respectively, which are important in further studies:

$$A = (a(1) \ \ldots \ a(n)), \quad a(j) \in E^m,\ j = \overline{1,n},$$
$$A = (a_{(1)}^T \ \ldots \ a_{(m)}^T)^T, \quad a_{(i)}^T \in E^n,\ i = \overline{1,m},$$

where $T$ is the transpose symbol.
    We consider the singular value decomposition of an arbitrary matrix $A$ of dimension $m \times n$ and rank $r \le \min(m, n)$ in the form

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T,$$

where $\sigma_1^2 \ge \ldots \ge \sigma_r^2 > 0$ are the non-zero eigenvalues of the matrices $A A^T$ and $A^T A$; $v_i \in E^n$, $i = \overline{1,r}$, is the orthonormal set of eigenvectors of the matrix $A^T A$ corresponding to the non-zero eigenvalues $\sigma_i^2$: $A^T A v_i = \sigma_i^2 v_i$, $v_i^T v_j = \delta_{ij}$; and $u_i \in E^m$, $i = \overline{1,r}$, is the orthonormal set of eigenvectors of the matrix $A A^T$, which also correspond to the non-zero eigenvalues $\sigma_i^2$: $A A^T u_i = \sigma_i^2 u_i$, $u_i^T u_j = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker symbol.


    Let us give the definition of a pseudo-inverse matrix in the Penrose optimization form [12]. For a matrix $A \in E^{m \times n}$, the pseudo-inverse matrix $A^+ \in E^{n \times m}$ is defined by the relation

$$\forall b \in E^m: \quad A^+ b = \arg\min_{x \in \Omega_A(b)} \|x\|^2,$$

where $\Omega_A(b) = \operatorname{Arg}\min_{x \in E^n} \|Ax - b\|^2$.

    Also, using the singular value decomposition of the matrix $A \in E^{m \times n}$, the pseudo-inverse matrix $A^+ \in E^{n \times m}$ can be represented as [13]

$$A^+ = \sum_{j=1}^{r} \sigma_j^{-1} v_j u_j^T.$$
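    As a quick sanity check of this representation, the following sketch (hypothetical code, not from the paper) assembles $A^+ = \sum_{j=1}^{r} \sigma_j^{-1} v_j u_j^T$ from the SVD factors returned by numpy and compares it with the library pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))        # an arbitrary m x n matrix

U, s, Vt = np.linalg.svd(A)            # A = sum_j sigma_j u_j v_j^T
r = int(np.sum(s > 1e-12))             # numerical rank r
A_pinv = sum(Vt[j][:, None] @ U[:, j][None, :] / s[j] for j in range(r))

assert np.allclose(A_pinv, np.linalg.pinv(A))   # matches the Penrose A+
```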


    Let us consider matrices, important for practical applications, that are defined and calculated using the matrices $A$ and $A^+$:

    1) the projection matrix $P(A) = A^+ A \equiv \sum_{i=1}^{r} v_i v_i^T$, which is the orthogonal projector onto the subspace $L_{A^T}$ generated by the row vectors of the matrix $A$;

    2) the projection matrix $Z(A) = I_n - P(A)$, the orthogonal projector onto the subspace orthogonal to $L_{A^T}$, where $I_n$ is the identity matrix;

    3) the matrix $R(A) = A^+ (A^+)^T \equiv \sum_{j=1}^{r} v_j v_j^T \sigma_j^{-2}$.

    Note the important properties of the projection matrices $P$ and $Z$:

$$P(A) + Z(A) = I_n, \quad P(A) = A^+ A, \quad P(A) = \sum_{i=1}^{r} v_i v_i^T, \quad Z(A) = \sum_{i=r+1}^{n} v_i v_i^T.$$

    Note that the calculation of the pseudo-inverse of an arbitrary matrix can be reduced to the calculation of a corresponding (pseudo-)inverse matrix using the following relations:

$$A^+ = (A^T A)^+ A^T = A^T (A A^T)^+, \qquad A^+ = R(A) A^T.$$
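    These projector and $R(A)$ identities are easy to verify numerically; the sketch below (an illustration under the same notation, not the authors' implementation) checks idempotency, the decomposition $P(A) + Z(A) = I_n$, and the relation $A^+ = R(A) A^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
A_pinv = np.linalg.pinv(A)

P = A_pinv @ A                # P(A): orthogonal projector onto the row space of A
Z = np.eye(A.shape[1]) - P    # Z(A): projector onto the orthogonal complement
R = A_pinv @ A_pinv.T         # R(A) = A+ (A+)^T

assert np.allclose(P @ P, P) and np.allclose(Z @ Z, Z)   # both are idempotent
assert np.allclose(P + Z, np.eye(A.shape[1]))            # P(A) + Z(A) = I_n
assert np.allclose(R @ A.T, A_pinv)                      # A+ = R(A) A^T
```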
    Using the above mathematical relations for the inversion and pseudo-inversion of matrices, we write down the formulas necessary for the synthesis of a piecewise hyperplane cluster.
    The distance $\rho(x(j), L(A, b))$ from the point $x(j)$ to the hyperplane $L(A, b)$ is found from the relation

$$\rho^2(x(j), L(A, b)) = \|A^+(b - A x(j))\|^2, \qquad \|x\|^2 = x^T x.$$
    To calculate the sum of the squared distances of the set of points $x(j)$, $j = \overline{1,n}$, to the hyperplane $L(A, b)$, we use the following formula:

$$\rho^2(\{x : x(j),\ j = \overline{1,n}\}, L(A, b)) = \sum_{j=1}^{n} (b - A x(j))^T R(A^T) (b - A x(j)) = \operatorname{tr}\Big[ R(A^T) \sum_{j=1}^{n} (b - A x(j))(b - A x(j))^T \Big].$$

    Here $\operatorname{tr}(\cdot)$ denotes the matrix trace.
    Then, for given values $A$, $x(j)$, $j = \overline{1,n}$, the optimal value of the right-hand-side vector of the system of equations determining the hyperplane is found from the conditions

$$b_{opt} = A \hat{x} = \arg\min_{b \in E^s} \rho^2(\{x : x(j),\ j = \overline{1,n}\}, L(A, b)), \qquad \hat{x} = \frac{1}{n} \sum_{j=1}^{n} x(j).$$
    From here, the distance $\rho(\{x : x(j),\ j = \overline{1,n}\}, L(A, b))$ of the set of points $x(j)$, $j = \overline{1,n}$, to the hyperplane with the optimal vector $b_{opt}$ is calculated by the following formula:

$$\rho(\{x : x(j),\ j = \overline{1,n}\}, L(A, b_{opt}(A))) = \big(\operatorname{tr} A^+ A \tilde{X} \tilde{X}^T\big)^{1/2},$$

    where $\tilde{X} = (\tilde{x}(1) \ \ldots \ \tilde{x}(n))$, $\tilde{x}(j) = x(j) - \hat{x}$, $j = \overline{1,n}$.
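    In code, this trace formula gives a one-line evaluation of how well a candidate hyperplane fits a sample; the sketch below (function and variable names are illustrative) centers the columns of $X$ and computes $\rho^2 = \operatorname{tr}(A^+ A \tilde{X}\tilde{X}^T)$:

```python
import numpy as np

def cluster_distance(A, X):
    """Squared distance of the columns of X (m x n) to L(A, b_opt(A))."""
    x_hat = X.mean(axis=1, keepdims=True)   # sample mean x^
    X_t = X - x_hat                         # centered sample matrix X~
    P = np.linalg.pinv(A) @ A               # P(A) = A+ A
    return np.trace(P @ X_t @ X_t.T)        # rho^2 = tr(A+ A X~ X~^T)
```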
    The optimal matrix $A \in E^{s \times m}$ is defined as a solution of the problem

$$A_{opt} = \arg\min_{A A^T = I_s,\ A \in E^{s \times m}} \rho^2(\{x : x(j),\ j = \overline{1,n}\}, L(A, b_{opt}(A))) = (u_{m-s+1}^T \ \ldots \ u_m^T)^T,$$

    wherein

$$\operatorname{tr} A_{opt}^+ A_{opt} \tilde{X} \tilde{X}^T = \sum_{j=m-s+1}^{m} \sigma_j^2, \qquad (u_1, \ldots, u_m)^T (u_1, \ldots, u_m) = I_m.$$
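    A minimal sketch of this optimal fit (assuming, as the formula above indicates, that the rows of $A_{opt}$ are the eigenvectors of $\tilde{X}\tilde{X}^T$ with the $s$ smallest eigenvalues; the helper name is hypothetical):

```python
import numpy as np

def fit_hyperplane(X, s):
    """Optimal (A_opt, b_opt) for samples X (m x n) under A A^T = I_s."""
    x_hat = X.mean(axis=1)
    X_t = X - x_hat[:, None]                # centered samples X~
    w, U = np.linalg.eigh(X_t @ X_t.T)      # eigenvalues in ascending order
    A_opt = U[:, :s].T                      # s directions of least variance
    return A_opt, A_opt @ x_hat             # b_opt = A_opt x^
```

    With orthonormal rows, $A_{opt}^+ = A_{opt}^T$, so the residual $\operatorname{tr} A_{opt}^+ A_{opt} \tilde{X}\tilde{X}^T$ is exactly the sum of the $s$ smallest eigenvalues, matching the expression above.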

    Using the above results on the pseudo-inversion of matrices and the calculation of distances, a piecewise hyperplane clustering method is proposed. The idea of the method is to perform a sequence of steps, at each of which the parameters of the hyperplanes $L(A(k), b(k))$, $k = 1, 2, \ldots$, are found. These hyperplanes are constructed so as to fulfill the requirements of the adopted hyperplane clustering efficiency criterion. As the initial setting of piecewise hyperplane clustering, it is assumed that all training sample vectors $x(1), \ldots, x(n)$ from the feature space $E^m$ are optimally approximated by the hyperplane $L(A_{opt}(1), b_{opt}(1))$. At these values $A_{opt}(1)$, $b_{opt}(1)$, the efficiency criterion of hyperplane clustering is checked. If the efficiency criterion is satisfied, the construction of the piecewise hyperplane cluster is complete with a single hyperplane, and the corresponding characteristic features form the optimal set. If the conditions of the efficiency criterion are not met within this (first) hyperplane, the construction of the second hyperplane of the cluster is carried out. For this purpose, the vectors that cause the non-fulfillment of the efficiency criterion are excluded from the training sample, that is, a subset $\Omega_1 = \{x : x(j_1) \in E^m,\ j_1 = \overline{1,n_1}\}$ is formed. Then the optimal approximation is repeated for the subset $\Omega_1$ with the hyperplane $L(A_{opt}(2), b_{opt}(2))$, and at the new optimal values $A_{opt}(2)$, $b_{opt}(2)$ the fulfillment of the clustering efficiency criterion is checked. When the criterion is met, the construction of the piecewise hyperplane cluster is complete, and the resulting vectors of characteristic features are added to the optimal feature set. When it is not met, the vectors that cause the non-fulfillment of the efficiency criterion must be excluded from the training sample and the procedure repeated. Note that this finite recurrent process always ensures the construction of an optimal piecewise hyperplane cluster.

3. Algorithm of synthesis of the piecewise hyperplane cluster
   Using the proposed method of synthesizing piecewise hyperplane clusters, the synthesis algorithm can be represented as the following sequence of actions:
   1. Formation of a single-link cluster (the number of a link in the cluster is the index in brackets):
   1) All vectors of the training sample $\Omega(0) = \{x : x(j) \in E^m,\ j = \overline{1,n}\}$ from the feature space are optimally approximated by a hyperplane $L(A_{opt}(1), b_{opt}(1))$, defined as the set of solutions (pseudo-solutions) of the systems of algebraic equations (1), (2), where $A(k)$ and $b(k)$ are respectively the matrix and vector parameters of a certain hyperplane (the basic formulas for constructing a hyperplane cluster are implemented).
   2) Form the set $\Omega(1) = \{x : x(j_1) \in E^m,\ j_1 = \overline{1,n_1}\}$ of those points of $\Omega(0)$ whose squared distance to the hyperplane exceeds the admissible value, namely:

$$(b_{opt}(1) - A_{opt}(1)\, x(j_1))^T R(A_{opt}^T(1)) (b_{opt}(1) - A_{opt}(1)\, x(j_1)) > h_{min},$$

    where $h_{min}$ is the admissible distance of vectors from the components of the cluster of hyperplanes. The linear dependence or independence of the vectors removed from the set makes it possible to simplify the form of the formulas for the distances of these vectors from the corresponding hyperplanes.
    3) The algorithm stops at the stage of the first cluster link once the distance of each vector of the training sample $\Omega(0) = \{x : x(j) \in E^m,\ j = \overline{1,n}\}$ to the hyperplane $L(A_{opt}(1), b_{opt}(1))$ does not exceed the admissible distance.
   4) If the set $\Omega(1) = \{x : x(j_1) \in E^m,\ j_1 = \overline{1,n_1}\}$ includes at least two vectors, go on to form the second link of the cluster.
   2. Formation of the second cluster link consists of the following:
   1) From the set $\Omega(1) = \{x : x(j_1) \in E^m,\ j_1 = \overline{1,n_1}\}$ obtained in the process of building the first link, a hyperplane $L(A_{opt}(2), b_{opt}(2))$ is constructed, defined as the set of solutions (pseudo-solutions) of a system of algebraic equations

$$A_k(2)\, x = b_k(2), \quad k = 1, 2, \ldots \qquad (3)$$

   2) Calculate the optimal $A_{k\,opt}(2)$, $b_{k\,opt}(2)$ for $L(A_k(2), b_k(2))$, $k = 1, 2, \ldots$
   3) Form the sets $\Omega_k(2)$, $k = 1, 2, \ldots$, according to the distance of each vector $x(j) \in \Omega_0(2)$ to each of the hyperplanes (3):

$$\rho^2(x : x(j), L(A_{k\,opt}(2), b_{k\,opt}(2))) = (b_{k\,opt}(2) - A_{k\,opt}(2)\, x(j))^T R(A_{k\,opt}^T(2)) (b_{k\,opt}(2) - A_{k\,opt}(2)\, x(j)),$$

   i.e., perform actions similar to step 3 of the construction of the first link.
   4) Go to step 2 with the new subsets $\Omega_1^j(1)$, $\Omega_2^j(1)$, $j = 1, 2, \ldots$, where the superscript $j$ denotes the number of the iteration at the second-link stage.
   The algorithm stops at the stage of the second cluster link when the distances of each of the vectors of the corresponding partition element to the corresponding hyperplane can no longer be improved.
   The efficiency criterion of the performed hyperplane clustering is then checked (for example, a required level of compactness of the cluster links). When the criterion is fulfilled, the cluster is completed; if it is not fulfilled, the formation of the third cluster link is carried out. The process then repeats and, since it is finite, a solution to the clustering problem is always found.
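    Putting the pieces together, the following sketch (building on the hypothetical fit_hyperplane helper above; not the authors' implementation) runs the recurrent loop of the algorithm: fit a hyperplane, keep the vectors whose squared distance is within $h_{min}$, and restart on the excluded vectors until every point is assigned to a link:

```python
import numpy as np

def piecewise_clusters(X, s, h_min, max_links=20):
    """Synthesize a piecewise hyperplane cluster for samples X (m x n)."""
    links, remaining = [], X
    for _ in range(max_links):
        A, b = fit_hyperplane(remaining, s)
        res = b[:, None] - A @ remaining        # residuals b - A x(j)
        d2 = np.einsum('ij,ij->j', res, res)    # squared distances (rows of A orthonormal)
        near = d2 <= h_min                      # the squared-distance criterion of step 1.2
        links.append((A, b, remaining[:, near]))
        remaining = remaining[:, ~near]         # excluded vectors form the next subset
        if remaining.shape[1] < 2:              # fewer than two vectors left: stop (step 1.4)
            break
    return links
```

    The max_links guard is an added safeguard for the sketch; in the paper's formulation the process is finite by construction.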


4. Experimental studies
   To test the effectiveness of the proposed method of scaling information, we used characteristic features for recognizing the dactyl letters of the Ukrainian sign language alphabet [6], [7]. As characteristic features, 52 features were taken, divided into 6 groups depending on the method of obtaining them. The construction of piecewise hyperplane clusters was carried out on the groups of features that characterize the geometric and topological parameters of the human hand when showing the letters of the dactyl alphabet and for which an acceptable recognition quality was obtained [6]. Using the example of the classification of nine dactylemes (А, Б, В, Г, Ж, І, Є, И, Й) by three and by five characteristic features, the separation of these dactylemes on the scaling plane was obtained. The three characteristic features were compactness, directionality, and elongation; in the experiments with five features, they were the ratio of width to height and the values of four angles between vectors drawn from the center of the hand to its most distant points.
   The experimental results using the three characteristic features are given in Fig. 1, where the image of the dactylemes' location on the scaling plane is normalized to the interval [0, 1]. Without loss of generality, dactyleme А was placed at the origin.

                         Dactyleme           X            Y           D
                             А            0.0000       0.0000      0.0354
                             Б            0.7196       0.9517      0.0073
                             В            0.4579       0.6644      0.0519
                             Г            0.4495       0.6088      0.1107
                             Ж            0.4547       0.7062      0.0536
                             І            0.3257       0.4415      0.0306
                             Є            0.2998       0.5237      0.1274
                             И            0.4083       0.5908      0.0643
                             Й            0.4067       0.6252      0.0707

[Scatter plot: dactyleme positions (X, Y) on the scaling plane, both axes normalized to [0, 1].]



Figure 1: Location of the dactylemes on the scaling hyperplane and their distances from the scaling hyperplane, for three characteristic features.

   The results of clustering using five characteristic features are shown in Fig. 2, where dactyleme А was also placed at the origin.

                         Dactyleme                X                 Y           D
                             А                  0.0000            0.0000      0.2719
                             Б                  1.0345            0.6393      0.0177
                             В                  0.5065            0.4320      0.3419
                             Г                  0.5271            0.3670     0.3828
                             Ж                  0.3878            0.3289      0.2511
                              І                 0.5353            0.4159      0.1825
                             Є                  0.3718            0.2753      0.1976
                             И                  0.5851            0.4470      0.1580
                             Й                  0.4585            0.2876      0.1647




Figure 2: Location of the dactylemes on the scaling hyperplane and their distances from the scaling hyperplane, for five characteristic features.

    The results of the experiments showed that using five characteristic features makes it possible to obtain a clearer separability (the distance of the dactylemes from the scaling plane ranged from 0.1580 to 0.3828), while with three features the distances from the scaling plane were significantly smaller (ranging from 0.0306 to 0.1274), that is, three to five times less. The exception was dactyleme Б: its distance from the scaling plane was insignificant in both cases (0.0177 with five features and 0.0073 with three).
5. Discussion and conclusions
   The paper proposes a method and an algorithm for multidimensional scaling of the characteristic features of recognition objects using the means of matrix pseudo-inversion, building piecewise hyperplane clusters that provide a solution to the problem [14]-[18]. The proposed method makes it possible to analyze information about the set of characteristic features and to identify those that are essential for solving the recognition problem, which is important given their significant number and for classes that are difficult to separate [19]-[24]. The effectiveness of the proposed approach is shown by the example of obtaining clusters for sign language dactylemes in order to obtain the optimal number of characteristic features for effective recognition.

6. References
[1] M. Devison. Multidimensional scaling. Moscow: Finansy I Statistyka, 1988. p. 254.
[2] P. Perera, P. Oza, V. M. Patel. One-Class Classification: A Survey. (2021) arXiv preprint
     arXiv:2101.03064.
[3] M. Z. Zaheer, J. H. Lee, M. Astrid, A. Mahmood, & S. I. Lee. Cleaning label noise with clusters
     for minimally supervised anomaly detection, 2021. arXiv preprint arXiv:2104.14770.
[4] W. Sultani, C. Chen, & M. Shah. Real-world anomaly detection in surveillance videos. In
     Proceedings of the IEEE conference on computer vision and pattern recognition, (2018) 6479-
     6488.
[5] I.V.Krak, G.I.Kudin & A.I.Kulias. Multidimensional Scaling by Means of Pseudoinverse
     Operations, Cybernetics and Systems Analysis, 55(1) (2019) 22-29. doi: 10.1007/s10559-019-
     00108-9.
[6] L. Cheng, Y. Wang, X. Liu, & B. Li. Outlier Detection Ensemble with Embedded Feature
     Selection. In Proceedings of the AAAI Conference on Artificial Intelligence 34(04), (2020)
     3503-3512. https://doi.org/10.1609/aaai.v34i04.5755.
[7] S. A. N. Alexandropoulos, S. B. Kotsiantis, V. E. Piperigou, & M. N. Vrahatis. A new ensemble
     method for outlier identification. In 2020 10th International Conference on Cloud Computing,
     Data        Science       &        Engineering,        IEEE,     (2020)     769-774.          doi:
     10.1109/Confluence47617.2020.9058219.
[8] Iu.G. Kryvonos, Iu.V. Krak, O.V. Barmak, D.V. Shkilniuk. Construction and identification of
     elements of sign communication. Cybernetics and Systems Analysis. 49 (2) (2013). p. 163-172.
[9] Yu.V. Krak, A.A.Golik, V.S.Kasianiuk. Recognition of dactylemes of Ukrainian sign language
     based on the geometric characteristics of hand contours defects. Journal of Automation and
     Information Sciences. 48(4) (2016). p. 90-98.
[10] J. T. O'Brien, C. Nelson. Assessing the Risks Posed by the Convergence of Artificial Intelligence
     and         Biotechnology.        Health        security,     18(3),     (2020)         219-227.
     https://doi.org/10.1089/hs.2019.0122.
[11] I.V. Krak, O.V. Barmak & S.O. Romanyshyn. The method of generalized grammar structures for
     text to gestures computer-aided translation. Cybernetics and Systems Analysis, 50(1) (2014)
     116-123. doi: 10.1007/s10559-014-9598-4.
[12] R. Penrose. A generalized inverse for matrices. Proceedings of the Cambridge Philosophical
     Society. 51. 1955. p. 406-413.
[13] A. Ben-Israel, T.N.E. Greville. Generalized Inverses: Theory and Applications. (2-nd Ed.). –
     Springer-Verlag, New York, 2003. p. 420.
[14] G. Markowsky, O. Savenko, S. Lysenko, A. Nicheporuk. The Technique for Metamorphic
     Viruses' Detection Based on Its Obfuscation Features Analysis. In ICTERI Workshops. CEUR
     Workshop Proceedings, Vol. 2104, (2018), 680-687
[15] I. Krak, O. Barmak, & E. Manziuk. Using visual analytics to develop human and machine-
     centric models: A review of approaches and proposed information technology, Computational
     Intelligence (2020) 1-26. https://doi.org/10.1111/coin.12289.
[16] A.V. Barmak, Y.V. Krak, E.A. Manziuk & V.S. Kasianiuk. Information technology of
     separating hyperplanes synthesis for linear classifiers. Journal of Automation and Information
     Sciences, 51(5) (2019) 54-64. doi: 10.1615/JAutomatinfScien.v51.i5.50.
[17] D. Oosterlinck, D. F. Benoit & P. Baecke. From one-class to two-class classification by
     incorporating expert knowledge: Novelty detection in human behaviour. European Journal of
     Operational Research, 282(3), (2020) 1011-1024. https://doi.org/10.1016/j.ejor.2019.10.015.
[18] D. Abati, A. Porrello, S. Calderara, R. Cucchiara. Latent space autoregression for novelty
     detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
     (2019) 481-490.
[19] D. Gong, L. Liu, V. Le, B. Saha, M.R. Mansour, S. Venkatesh, A.v.d. Hengel. Memorizing
     normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly
     detection. In: Proceedings of the IEEE International Conference on Computer Vision, (2019)
     1705 - 1714
[20] C. You, D. P. Robinson, R. Vidal. Provable self-representation based outlier detection in a union
     of subspaces. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR
     2017, Honolulu, HI, USA, July 21-26, (2017) 4323 – 4332.
[21] C.-H. Lai, D. Zou, G. Lerman. Robust subspace recovery layer for unsupervised anomaly
     detection. in Proc. Int. Conf. Learn. Represent., (2020) 1 - 28. arXiv:1904.00152.
[22] Z. Cheng, E. Zhu, S. Wang, P. Zhang & W. Li. Unsupervised Outlier Detection via
     Transformation Invariant Autoencoder. IEEE Access, 9, (2021) 43991-44002. doi:
     10.1109/ACCESS.2021.3065838.
[23] H. Wang, M. J. Bah, & M. Hammad. Progress in outlier detection techniques: A survey. IEEE
     Access, 7, (2019) 107964-108000. doi: 10.1109/ACCESS.2019.2932769.
[24] L. Ruff, J. R. Kauffmann, R. A. Vandermeulen, G. Montavon, W. Samek, M. Kloft, ... & K. R.
     Müller. A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE.
     2021. doi: 10.1109/JPROC.2021.3052449.