=Paper= {{Paper |id=Vol-3101/Short10 |storemode=property |title=Hyperplane clasterization of the small data based on pseudo-inverse and projective matrices (short paper) |pdfUrl=https://ceur-ws.org/Vol-3101/Short10.pdf |volume=Vol-3101 |authors=Iurii Krak,Hrygorii Kudin,Mykola Efremov,Alexander Samoylov,Vladislav Kuznetsov,Yedilkhan Amirgaliyev,Veda Kasianiuk |dblpUrl=https://dblp.org/rec/conf/citrisk/KrakKESKAK21 }} ==Hyperplane clasterization of the small data based on pseudo-inverse and projective matrices (short paper)== https://ceur-ws.org/Vol-3101/Short10.pdf
Hyperplane Clusterization of Small Data Based on
Pseudo-Inverse and Projective Matrices
Iurii Krak1,3, Hrygorii Kudin1, Mykola Efremov1,3, Alexander Samoylov1,3, Vladislav
Kuznetsov1, Yedilkhan Amirgaliyev2 and Veda Kasianiuk3
1 Glushkov Cybernetics Institute, Kyiv, 40, Glushkov ave., 03187, Ukraine

2 Institute of Information and Computer Technologies, 125, Pushkin str., Almaty, 050010, Republic of Kazakhstan
3 Taras Shevchenko National University of Kyiv, 60, Volodymyrska Street, Kyiv, 01033, Ukraine




            Abstract
            Based on developed mathematical methods for solving systems of linear algebraic equations, an
            approach to solving problems of classification and clustering of information using the characteristic
            features of objects is proposed. An algorithm of hyperplane clustering with verification of a given
            efficiency criterion is developed; it constructs hyperplanes in a space derived from the original
            feature space using the perturbation theory of pseudo-inverse and projection matrices. A method of
            piecewise hyperplane cluster synthesis for selecting the most effective characteristic features and an
            algorithm for constructing piecewise hyperplane clusters are also proposed, which make it possible
            to find an effective solution of the given problems. The productivity and efficiency of the proposed
            approach are shown by the example of scaling the characteristic features used for recognizing the
            letters of the fingerspelling alphabet of sign language.

           Keywords
            clustering, classification, pseudo-inverse operations, SLAE, optimization.




1. Introduction
One of the important problems in the classification and clustering of information is minimizing
the dimension of the feature space and choosing the criteria for optimal solutions in practical
use. Such problems are effectively solved by the method of multidimensional scaling of
empirical data on the proximity of objects, with the help of which the dimension of the space of
essential characteristics of the measured objects is determined and the configuration of point
objects in this space is constructed. This space is a multidimensional scale, similar to the scales
commonly used in various applications, in the sense that the values of


CITRisk’2021: 2nd International Workshop on Computational & Information Technologies for Risk-Informed Systems, September
16–17, 2021, Kherson, Ukraine
EMAIL:      krak@univ.kiev.ua    (I.Krak);  gkudin@ukr.net    (H.Kudin);    nick.yefremov.in@gmail.com      (M.Yefremov);
SamoylovSasha@gmail.com (A.Samoylov); kuznetsov.wlad@incyb.kiev.ua (V.Kuznetsov); amir_ed@mail.ru (Y.Amirgaliyev);
veda.kasianiuk@gmail.com (V.Kasianiuk)
ORCID: 0000-0002-8043-0785 (I.Krak); 0000-0002-7310-2126 (H.Kudin); 0000-0001-8698-3957 (M.Yefremov);
0000-0002-7423-5596 (A.Samoylov); 0000-0002-1068-769X (V.Kuznetsov); 0000-0002-6528-0619 (Y.Amirgaliyev);
0000-0003-3268-303X (V.Kasianiuk)
            © 2021 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)
the specially generated essential characteristics of the measured objects correspond to certain
positions on the axes of the new space [1]-[5].
   The purpose of this work is the development of mathematical methods for the synthesis of
systems for solving problems of classification and clustering based on information about the
characteristic features of objects [6]-[10]. These problems are proposed to be solved by
constructing hyperplanes in a space derived from the original space of features using theory of
perturbation of pseudo-inverse and projection matrices and solving systems of linear algebraic
equations. The paper proposes a method for synthesizing a piecewise hyperplane cluster to
isolate the most effective characteristic features and an algorithm for constructing piecewise
hyperplane clusters that allow one to find an effective solution to these problems. The
performance and efficiency of the proposed approach are shown on the example of scaling the
characteristic features for recognizing the letters of the fingerspelling alphabet of sign
language [8], [9], [11].


2. Related works
The problem of synthesis of a piecewise hyperplane cluster for a training sample of vectors
Ω₀ = {x : x(j) ∈ E^m, j = 1,…,n}, where x(1),…,x(n) are vectors from the Euclidean feature
space E^m, is to build a cluster such that the training sample points in this space are located
sufficiently close, in the sense of a given distance criterion, to some set of hyperplanes formed
from this sample.
    Note that in the formulation of this clustering problem the components of the set of
hyperplanes are not known in advance. Therefore, for the correct construction of piecewise
hyperplane clustering procedures, it is assumed that the vectors x(1),…,x(n) from the feature
space E^m can belong to one of several hyperplanes L(A(k), b(k)), where A(k) ∈ E^{s×m},
b(k) ∈ E^s, k = 1,2,…, for some given dimension s (s < m). Here A(k) and b(k) are the matrix
and vector parameters, respectively, of a fixed hyperplane L(A(k), b(k)), k = 1,2,…
   The proposed method of piecewise hyperplane cluster synthesis is based on representing
hyperplanes by means of the set of solutions (pseudo-solutions) of systems of algebraic
equations:

    A(k) x = b(k),                                                            (1)

    L(A(k), b(k)) = { x ∈ E^m : x = A⁺(k) b(k) + Z(A(k)) z, z ∈ E^m }.        (2)

Here A⁺ is the pseudo-inverse matrix and Z is the projection matrix.

   Let us give some mathematical results on the inversion (pseudo-inversion) of matrices and the
construction of projection matrices, which are important for solving the problem of synthesizing
a piecewise hyperplane cluster.
   Let a matrix A = (a_ij), i = 1,…,m, j = 1,…,n, be given, and let us write down the
representations of this matrix by columns and by rows, respectively, which are important in
further studies:

    A = (a(1) … a(n)),  a(j) ∈ E^m, j = 1,…,n,
    A = (a_(1)^T … a_(m)^T)^T,  a_(i) ∈ E^n, i = 1,…,m,

where T is the transposition symbol.
   We consider the singular value decomposition of an arbitrary matrix A of dimension m × n
and rank r ≤ min(m, n) in the form

    A = Σ_{i=1}^{r} λ_i u_i v_i^T,

where λ₁² ≥ … ≥ λ_r² > 0 are the non-zero eigenvalues of the matrices A A^T and A^T A;
v_i ∈ E^n, i = 1,…,r, is the orthonormal set of eigenvectors of the matrix A^T A corresponding
to the non-zero eigenvalues λ_i², i.e. A^T A v_i = λ_i² v_i, v_i^T v_j = δ_ij; u_i ∈ E^m,
i = 1,…,r, is the orthonormal set of eigenvectors of the matrix A A^T, which also correspond to
the non-zero eigenvalues λ_i², i.e. A A^T u_i = λ_i² u_i, u_i^T u_j = δ_ij; δ_ij is the
Kronecker symbol.
   Let us give the definition of a pseudo-inverse matrix in the Penrose optimization form [12].
For a matrix A ∈ E^{m×n}, the pseudo-inverse matrix A⁺ ∈ E^{n×m} is defined by the relation:

    ∀ b ∈ E^m:  A⁺b = arg min_{x ∈ Ω_A(b)} ‖x‖².

Here Ω_A(b) = Arg min_{x ∈ E^n} ‖Ax − b‖².

   Also, using the singular value decomposition of the matrix A ∈ E^{m×n}, the pseudo-inverse
matrix A⁺ ∈ E^{n×m} can be represented as [13]:

    A⁺ = Σ_{j=1}^{r} λ_j⁻¹ v_j u_j^T.
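As a quick illustration, this singular-value form of the pseudo-inverse can be checked numerically. The following sketch is not from the paper; the matrix size and data are arbitrary assumptions, and the result is compared against NumPy's reference implementation:

```python
# Illustrative sketch: pseudo-inverse from the singular value decomposition,
# A+ = sum_{j=1}^{r} (1/lambda_j) v_j u_j^T, checked against numpy.linalg.pinv.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))                 # arbitrary m x n matrix (m=4, n=6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))                      # numerical rank

# Sum of rank-one terms (1/lambda_j) * v_j u_j^T over the non-zero triplets
A_pinv = sum((1.0 / s[j]) * np.outer(Vt[j], U[:, j]) for j in range(r))

assert A_pinv.shape == (6, 4)                   # A+ is n x m
assert np.allclose(A_pinv, np.linalg.pinv(A))
```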

We will also consider the following matrices, important for practical applications, which are
defined and calculated using the matrices A and A⁺:
   1) the projection matrix P(A) = A⁺A ≡ Σ_{i=1}^{r} v_i v_i^T, which is the orthogonal
projector onto the subspace L_{A^T} generated by the row vectors of the matrix A;
   2) the projection matrix Z(A) = I_n − P(A), the orthogonal projector onto the subspace
orthogonal to L_{A^T}, where I_n is the identity matrix;
   3) the matrix R(A) = A⁺(A⁺)^T ≡ Σ_{j=1}^{r} λ_j⁻² v_j v_j^T.

   Note the important properties of the projection matrices P and Z:

    P(A) + Z(A) = I_n,  P(A) = A⁺A,
    P(A) = Σ_{i=1}^{r} v_i v_i^T,  Z(A) = Σ_{i=r+1}^{n} v_i v_i^T,

where v_{r+1},…,v_n complete v_1,…,v_r to an orthonormal basis of E^n. Note also that the
calculation of the pseudo-inverse of an arbitrary matrix can be reduced to the calculation of a
corresponding pseudo-inverse matrix using the following relations:

    A⁺ = (A^T A)⁺ A^T = A^T (A A^T)⁺,  A⁺ = R(A) A^T.
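These properties are easy to verify numerically. A minimal sketch (the dimensions and data are illustrative assumptions) computing P(A), Z(A) and R(A) with NumPy:

```python
# Illustrative sketch: the projectors P(A) = A+ A, Z(A) = I_n - P(A) and the
# matrix R(A) = A+ (A+)^T, with checks of the stated properties.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))                 # s x m matrix, s < m

A_pinv = np.linalg.pinv(A)
P = A_pinv @ A                                  # projector onto the row space of A
Z = np.eye(A.shape[1]) - P                      # projector onto its orthogonal complement
R = A_pinv @ A_pinv.T

assert np.allclose(P + Z, np.eye(5))            # P(A) + Z(A) = I_n
assert np.allclose(P @ P, P)                    # P is idempotent
assert np.allclose(Z @ A.T, 0.0)                # Z annihilates the row space of A
assert np.allclose(R @ A.T, A_pinv)             # A+ = R(A) A^T
```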
   Using the above mathematical relations for the inversion and pseudo-inversion of matrices,
we write the formulas necessary for the synthesis of a piecewise hyperplane cluster.
   The distance ρ(x(j), L(A, b)) from the point x(j) to the hyperplane L(A, b) is found from the
relation

    ρ²(x(j), L(A, b)) = ‖A⁺(b − A x(j))‖²,  ‖x‖² = x^T x.
To calculate the sum of the squares of the distances from the set of points x(j), j = 1,…,n, to
the hyperplane L(A, b), we use the following formula:

    ρ²({x : x(j), j = 1,…,n}, L(A, b)) =
    = Σ_{j=1}^{n} (b − A x(j))^T R(A^T) (b − A x(j)) =
    = tr[ R(A^T) Σ_{j=1}^{n} (b − A x(j))(b − A x(j))^T ].

Here tr(·) denotes the matrix trace.
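The equality between the direct sum of squared distances and the trace form can be checked on random data; the sketch below (the dimensions and data are illustrative assumptions) does exactly that:

```python
# Illustrative check: sum_j ||A+(b - A x(j))||^2 equals
# tr[R(A^T) * sum_j (b - A x(j))(b - A x(j))^T], with R(A^T) = (A+)^T A+.
import numpy as np

rng = np.random.default_rng(2)
s, m, n = 2, 4, 10
A = rng.standard_normal((s, m))
b = rng.standard_normal(s)
X = rng.standard_normal((m, n))                 # columns are the points x(j)

A_pinv = np.linalg.pinv(A)
resid = b[:, None] - A @ X                      # columns are b - A x(j)

# Direct sum of squared point-to-hyperplane distances
direct = sum(np.sum((A_pinv @ resid[:, j]) ** 2) for j in range(n))

# Trace form with R(A^T) = (A^T)+ ((A^T)+)^T = (A+)^T A+
R_At = A_pinv.T @ A_pinv
trace_form = np.trace(R_At @ (resid @ resid.T))

assert np.isclose(direct, trace_form)
```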
    Then, for given values A, x(j), j = 1,…,n, the optimal value of the right-hand-side vector of
the system of equations determining the hyperplane is found from the conditions

    b_opt = A x̂ = arg min_{b ∈ E^s} ρ²({x : x(j), j = 1,…,n}, L(A, b)),  x̂ = (1/n) Σ_{j=1}^{n} x(j).

From here, the distance ρ({x : x(j), j = 1,…,n}, L(A, b)) of the set of points x(j), j = 1,…,n,
to the hyperplane with the optimal vector b_opt is calculated by the following formula:

    ρ({x : x(j), j = 1,…,n}, L(A, b_opt(A))) = ( tr[ A⁺A X̃X̃^T ] )^{1/2},

where X̃ = (x̃(1) … x̃(n)), x̃(j) = x(j) − x̂, j = 1,…,n.
                                           s ×m
    The optimal matrix A_opt ∈ E^{s×m} is defined as the solution of the problem

    A_opt = arg min_{A A^T = E_s, A ∈ E^{s×m}} ρ²({x : x(j), j = 1,…,n}, L(A, b_opt(A))) = (u_{m−s+1} … u_m)^T,

wherein

    tr[ A_opt⁺ A_opt X̃X̃^T ] = Σ_{j=m−s+1}^{m} λ_j²,  (u_1, …, u_m)^T (u_1, …, u_m) = I_m,

where λ_j² and u_j are the eigenvalues and eigenvectors of the matrix X̃X̃^T.
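Putting the last two results together, the optimal hyperplane for a point set can be fitted from the singular directions of the centered data. The sketch below is one possible reading of the formulas above, not the authors' code; the dimensions and data are illustrative assumptions:

```python
# Illustrative sketch: fit the optimal hyperplane L(A_opt, b_opt) to a point
# set -- b_opt = A_opt x_hat with x_hat the sample mean, and A_opt built from
# the left singular vectors of the centered data with the smallest singular values.
import numpy as np

rng = np.random.default_rng(3)
m, n, s = 5, 40, 2
X = rng.standard_normal((m, n))                 # columns are the points x(j)

x_hat = X.mean(axis=1)
Xc = X - x_hat[:, None]                         # centered data, X tilde

U, lam, _ = np.linalg.svd(Xc, full_matrices=False)
A_opt = U[:, m - s:].T                          # rows u_{m-s+1}^T, ..., u_m^T
b_opt = A_opt @ x_hat

# Squared distance of the set to the fitted hyperplane equals the sum of the
# s smallest squared singular values of the centered data matrix.
dist2 = np.trace(np.linalg.pinv(A_opt) @ A_opt @ Xc @ Xc.T)
assert np.isclose(dist2, np.sum(lam[m - s:] ** 2))
```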

Using the above results on the pseudo-inversion of matrices and the calculation of distances, a
piecewise hyperplane clustering method is proposed. The idea of the method is to perform a
sequence of steps, at each of which the parameters of the hyperplanes L(A(k), b(k)), k = 1,2,…,
are found. These hyperplanes are constructed so as to meet the requirements of the adopted
hyperplane clustering efficiency criterion. As the initial step of piecewise hyperplane clustering,
all the training set vectors x(1),…,x(n) from the feature space E^m are optimally approximated
by the hyperplane L(A_opt(1), b_opt(1)). At these values A_opt(1), b_opt(1), the efficiency
criterion of hyperplane clustering is checked. If the efficiency criterion is satisfied, the
construction of the piecewise hyperplane cluster is complete: the cluster consists of one
hyperplane, and the corresponding characteristic features form the optimal set. If the conditions
of the efficiency criterion are not met within this (first) hyperplane, the construction of the
second hyperplane of the cluster is carried out. For this purpose, the vectors that violate the
efficiency criterion are excluded from the training sample, forming a subset
Ω₁ = {x : x(j₁) ∈ E^m, j₁ = 1,…,n₁}. The optimal approximation of the subset Ω₁ by the
hyperplane L(A_opt(2), b_opt(2)) is then repeated, and at the new optimal values A_opt(2),
b_opt(2) the fulfillment of the clustering efficiency criterion is checked again. When the
criterion is met, the construction of the piecewise hyperplane cluster is complete, and the
resulting vectors of characteristic features are added to the optimal feature set. When it is not
met, the vectors that cause the non-fulfillment of the efficiency criterion are excluded from the
training sample and the procedure is repeated. Note that this finite recurrent process will
always ensure the construction of an optimal piecewise hyperplane cluster.
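The loop described above can be sketched as follows. This is an illustrative reading of the method, not the authors' implementation; the helper fit_hyperplane, the threshold h_min, and the synthetic data are assumptions made for the example:

```python
# Illustrative sketch of piecewise hyperplane clustering: fit an optimal
# hyperplane to the current sample, keep the points within the admissible
# squared distance h_min, and repeat on the excluded points.
import numpy as np

def fit_hyperplane(X, s):
    """Optimal (A, b): rows of A are the smallest-variance directions."""
    x_hat = X.mean(axis=1)
    U, _, _ = np.linalg.svd(X - x_hat[:, None], full_matrices=True)
    A = U[:, X.shape[0] - s:].T
    return A, A @ x_hat

def piecewise_hyperplane_clusters(X, s, h_min, max_links=10):
    clusters, remaining = [], X
    for _ in range(max_links):
        A, b = fit_hyperplane(remaining, s)
        resid = b[:, None] - A @ remaining
        d2 = np.sum((np.linalg.pinv(A) @ resid) ** 2, axis=0)
        close = d2 <= h_min                      # efficiency criterion per point
        clusters.append((A, b, remaining[:, close]))
        remaining = remaining[:, ~close]         # re-cluster the excluded points
        if remaining.shape[1] < 2:               # too few points for another link
            break
    return clusters

# Synthetic data: two noisy planes in E^3
rng = np.random.default_rng(4)
P1 = np.vstack([rng.uniform(-1, 1, (2, 30)), 0.01 * rng.standard_normal((1, 30))])
P2 = np.vstack([2.0 + 0.01 * rng.standard_normal((1, 30)), rng.uniform(-1, 1, (2, 30))])
links = piecewise_hyperplane_clusters(np.hstack([P1, P2]), s=1, h_min=0.01)
```

By construction, every point kept in a link lies within the admissible distance of that link's hyperplane, which mirrors the stopping condition of the method.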


3. Algorithm of synthesis of the piecewise hyperplane cluster
Using the proposed method of synthesizing piecewise hyperplane clusters, the algorithm of such
a synthesis can be represented as the following sequence of actions:
    1. Formation of a single-link cluster (the number of a link in the cluster is the index in
brackets):
    1) All vectors of the training sample Ω(0) = {x : x(j) ∈ E^m, j = 1,…,n} from the feature
space are optimally approximated by a hyperplane L(A_opt(1), b_opt(1)), which is defined as the
set of solutions (pseudo-solutions) of the systems of algebraic equations (1), (2), where A(k)
and b(k) are respectively the matrix and vector parameters of a certain hyperplane (the basic
formulas for constructing a hyperplane cluster are applied here).
    2) Form the set Ω(1) = {x : x(j₁) ∈ E^m, j₁ = 1,…,n₁} of those points of Ω(0) for which the
following condition is satisfied:

    (b_opt(1) − A_opt(1) x(j₁))^T R(A_opt^T(1)) (b_opt(1) − A_opt(1) x(j₁)) > h_min,

where h_min is the admissible distance of vectors from the components of the cluster of
hyperplanes. The linear dependence or independence of the vectors that can be removed from the
set makes it possible to simplify the form of the formulas for the distances of these vectors
from the corresponding hyperplanes.
    3) The algorithm stops at the stage of the first cluster link once the distance of each of
the vectors of the training sample Ω(0) = {x : x(j) ∈ E^m, j = 1,…,n} to the hyperplane
L(A_opt(1), b_opt(1)) does not exceed the admissible distance.
    4) If the set Ω(1) = {x : x(j₁) ∈ E^m, j₁ = 1,…,n₁} contains at least two vectors, go on to
form the second link of the cluster.
    2. Formation of the second cluster link consists of the following:
    1) From the set Ω(1) = {x : x(j₁) ∈ E^m, j₁ = 1,…,n₁}, obtained in the process of building
the first link, a hyperplane L(A_opt(2), b_opt(2)) is constructed, defined as the set of
solutions (pseudo-solutions) of the systems of algebraic equations

    A_k(2) x = b_k(2),  k = 1,2,…                                             (3)

    2) Calculate the optimal A_k,opt(2), b_k,opt(2) for L(A_k(2), b_k(2)), k = 1,2,…
    3) Form the sets Ω_k⁰(2), k = 1,2,…, according to the distance of each vector x(j) ∈ Ω⁰(2)
to each of the hyperplanes (3):

    ρ²(x(j), L(A_k,opt(2), b_k,opt(2))) = (b_k,opt(2) − A_k,opt(2) x(j))^T R(A_k,opt^T(2)) (b_k,opt(2) − A_k,opt(2) x(j)),

i.e., perform actions similar to step 3 of the construction of the first link.
    4) Go to step 2 with the new subsets Ω_1^j(1), Ω_2^j(1), j = 1,2,…. The superscript j
denotes the number of the iteration at the second-link stage.
The algorithm stops at the stage of the second cluster link once the distances of each of the
vectors of the corresponding partition element to the corresponding hyperplane no longer
improve.
    The efficiency criterion of the performed hyperplane clustering is then checked (for example,
a required level of compactness of the cluster links). When the criterion is fulfilled, the
construction of the cluster is completed; if it is not fulfilled, the transition to the formation
of the third cluster link is carried out. The process then repeats and, since it is finite, a
solution to the clustering problem will always be found.


4. Experimental studies
To test the effectiveness of the proposed information scaling method, we used characteristic
features for recognizing the dactyl letters of the Ukrainian sign language alphabet [6], [7]. As
characteristic features, 52 features were taken, divided into 6 groups depending on the method
of obtaining them.




Figure 1: Location of dactylemes on the scaling hyperplane and their distances from the scaling
hyperplane for three characteristic features



The construction of piecewise hyperplane clusters was carried out with groups of features that
characterize the geometric-topological parameters of the human hand when showing the letters of
the dactyl alphabet and for which an acceptable recognition quality was obtained [6]. Using the
example of the classification of nine dactylemes (А, Б, В, Г, Ж, І, Є, И, Й) according to three
and five characteristic features, the separation of these dactylemes on the scaling plane was
obtained. The three characteristic features taken were compactness, directionality, and
elongation; in the experiments with five features they were the ratio of width to height and the
values of four angles between vectors drawn from the center of the hand to its most distant
points.
   The experimental results using the three characteristic features are given in Fig. 1, where
the location of the dactylemes on the scaling plane is normalized to the interval [0, 1]. Without
loss of generality, dactyleme А was placed at the origin.
   The results of clustering using five characteristic features are shown in Fig. 2, where
dactyleme А was also placed at the origin.




Figure 2: Location of dactylemes on the scaling hyperplane and their distances from the scaling
hyperplane for five characteristic features


The results of the experiments showed that the use of five characteristic features makes it
possible to obtain a clearer separability (the distances of the dactylemes from the scaling plane
ranged from 0.1580 to 0.3828), while with three features the distances from the scaling plane
were significantly smaller (ranging from 0.0306 to 0.1274), that is, three to five times smaller.
The exception was dactyleme Б, with distances of 0.0177 in the first case and 0.0073 in the
second: its separability from the scaling plane was insignificant in both cases.


5. Conclusions
The paper proposes a method and an algorithm for multidimensional scaling of the characteristic
features of recognition objects using the means of matrix pseudo-inversion and the construction
of piecewise hyperplane clusters that provide a solution to the problem [14]-[18]. The proposed
method makes it possible to analyze information about the set of characteristic features and to
identify those that are essential for solving the recognition problem, which is important given
their significant number and for difficult-to-separate classes [19]-[25]. The effectiveness of
the proposed approach is shown by the example of obtaining clusters for sign language dactylemes
in order to determine the optimal number of characteristic features for effective recognition.
   Subsequent research will focus on the study of different types of characteristic features and
their impact on the quality of recognition.


    References
[1] M.Davison, Multidimensional Scaling, Moscow: Finansy i Statistika, 1988, 254 p.
[2] P.Perera, P.Oza, V.M.Patel, One-Class Classification: A Survey, 2021, arXiv preprint.
     arXiv:2101.03064
[3] M.Z.Zaheer, J.H.Lee, M.Astrid, A.Mahmood, S.I.Lee, Cleaning label noise with clusters for
     minimally supervised anomaly detection, 2021. arXiv preprint, arXiv:2104.14770
[4] W.Sultani, C.Chen, M.Shah, Real-world anomaly detection in surveillance videos, In
     Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp.
     6479-6488
[5] I.V.Krak, G.I.Kudin, A.I.Kulias, Multidimensional Scaling by Means of Pseudoinverse
     Operations, Cybernetics and Systems Analysis, 55(1), 2019, pp. 22-29. doi:
     10.1007/s10559-019-00108-9
[6] L.Cheng, Y.Wang, X.Liu, B.Li, Outlier Detection Ensemble with Embedded Feature
     Selection. In Proceedings of the AAAI Conference on Artificial Intelligence, 34(04), 2020,
     pp. 3503-3512. https://doi.org/10.1609/aaai.v34i04.5755
[7] S.A.N.Alexandropoulos, S.B.Kotsiantis, V.E.Piperigou, M.N.Vrahatis, A new ensemble
     method for outlier identification, In 2020 10th International Conference on Cloud
     Computing, Data Science & Engineering, IEEE, 2020, pp. 769-774. doi:
     10.1109/Confluence47617.2020.9058219
[8] Iu.G.Kryvonos, Iu.V.Krak, O.V.Barmak, D.V.Shkilniuk, Construction and identification of
     elements of sign communication, Cybernetics and Systems Analysis, 49 (2), 2013. pp. 163-
     172
[9] Yu.V.Krak, A.A.Golik, V.S.Kasianiuk, Recognition of dactylemes of Ukrainian sign
     language based on the geometric characteristics of hand contours defects. Journal of
     Automation and Information Sciences, 48(4), 2016, pp. 90-98
[10] J.T.O'Brien, C.Nelson, Assessing the Risks Posed by the Convergence of Artificial
     Intelligence and Biotechnology, Health security, 18(3), 2020, pp. 219-227.
     https://doi.org/10.1089/hs.2019.0122
[11] I.V.Krak, O.V.Barmak, S.O.Romanyshyn, The method of generalized grammar structures
     for text to gestures computer-aided translation, Cybernetics and Systems Analysis, 50(1),
     2014, pp.116-123. doi: 10.1007/s10559-014-9598-4
[12] R.Penrose, A generalized inverse for matrices. Proceeding of the Cambridge Philosophical
     Society, 51, 1955, pp. 406-413
[13] A.Ben-Israel, T.N.E.Greville, Generalized inverse: Theory and Applications, (2-nd Ed.),
     Springer-Verlag, New York, 2003, 420 p.
[14] G.Markowsky, O.Savenko, S.Lysenko, A.Nicheporuk, The Technique for Metamorphic
     Viruses' Detection Based on Its Obfuscation Features Analysis, In ICTERI Workshops,
     CEUR Workshop Proceedings, Vol. 2104, 2018, pp. 680-687
[15] I.Krak, O.Barmak, E.Manziuk, Using visual analytics to develop human and machine-
     centric models: A review of approaches and proposed information technology,
     Computational Intelligence, 2020, pp. 1-26. https://doi.org/10.1111/coin.12289
[16] A.V.Barmak, Y.V.Krak, E.A.Manziuk, V.S.Kasianiuk, Information technology of
     separating hyperplanes synthesis for linear classifiers, Journal of Automation and
     Information Sciences, 51(5), 2019, pp. 54-64. Doi: 10.1615/JAutomatinfScien.v51.i5.50
[17] D.Oosterlinck, D.F.Benoit, P.Baecke, From one-class to two-class classification by
     incorporating expert knowledge: Novelty detection in human behaviour. European Journal
     of        Operational        Research,       282(3),        2020,        pp.      1011-1024.
     https://doi.org/10.1016/j.ejor.2019.10.015
[18] D.Abati, A.Porrello, S.Calderara, R.Cucchiara, Latent space autoregression for novelty
     detection, In: Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition, 2019, pp. 481-490
[19] D.Gong, L.Liu, V.Le, B.Saha, M.R.Mansour, S.Venkatesh, A.v.d.Hengel, Memorizing
     normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised
     anomaly detection, In: Proceedings of the IEEE International Conference on Computer
     Vision, 2019, pp. 1705 - 1714
[20] C.You, D.P.Robinson, R.Vidal, Provable self-representation based outlier detection in a
     union of subspaces, In 2017 IEEE Conference on Computer Vision and Pattern
     Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 4323 – 4332
[21] C.-H.Lai, D.Zou, G.Lerman, Robust subspace recovery layer for unsupervised anomaly
     detection. in Proc. Int. Conf. Learn. Represent, 2020, pp. 1 - 28. arXiv:1904.00152
[22] Z.Cheng, E.Zhu, S.Wang, P.Zhang, W.Li, Unsupervised Outlier Detection via
     Transformation Invariant Autoencoder, IEEE Access, 9, 2021, pp. 43991-44002. doi:
     10.1109/ACCESS.2021.3065838
[23] H.Wang, M.J.Bah, M.Hammad, Progress in outlier detection techniques: A survey, IEEE
     Access, 7, 2019, pp. 107964-108000. doi: 10.1109/ACCESS.2019.2932769
[24] L.Ruff, J.R.Kauffmann, R.A.Vandermeulen, G.Montavon, W.Samek, M.Kloft, K.R.Müller,
     A unifying review of deep and shallow anomaly detection, Proceedings of the IEEE, 2021.
     doi: 10.1109/JPROC.2021.3052449
[25] P.Perera, P.Oza, V.M.Patel, One-Class Classification: A Survey, 2021, arXiv preprint.
     arXiv:2101.03064