
Adaptive ADALINE Robust Training Algorithm Under the Maximum Correntropy Criterion With Variable Center

Oleg G. Rudenko (0), oleh.rudenko@nure.ua

Oleksandr O. Bezsonov (0), oleksandr.bezsonov@nure.ua

Andrzej Szajna (1), a.szajna@dtpoland.com

(0) Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine

(1) Uniwersytet Zielonogórski, ul. Licealna 9, Zielona Góra, 65-417, Poland

The problem of training ADALINE in the presence of non-Gaussian interference is considered. The learning algorithm is a gradient procedure for maximizing the correntropy functional. In contrast to the commonly used Gaussian kernels, which are centered at zero and are effective for zero-mean distributions, the paper considers a modification of the criterion suitable for distributions with nonzero mean. The modification consists in using correntropy with a variable center. Gaussian kernels with a variable center make it possible to estimate unknown parameters under Gaussian and non-Gaussian noise with zero and nonzero mean. The convergence properties of the algorithm in the stationary and non-stationary cases under Gaussian and non-Gaussian noise are investigated.

Keywords: correntropy, maximization, functional, gradient algorithm, asymptotic convergence, non-stationary, steady state

1. Introduction

The adaptive linear element (ADALINE), proposed by B. Widrow and M. Hoff, was the first linear neural network and became an alternative to the perceptron [1]. Since then, this element and its learning algorithm have been widely used in problems of identification, control, filtering, etc. The Widrow–Hoff learning algorithm is the Kaczmarz algorithm for solving systems of linear algebraic equations [2]. The properties of this algorithm as applied to the identification problem are described in sufficient detail in [3].

2. The problem of ADALINE training

ADALINE is described by the equation

$$y_{n+1} = c^{*T} x_{n+1} + \xi_{n+1}, \tag{1}$$

where $y_{n+1}$ is the observed output signal; $x_{n+1} = (x_{1,n+1}, x_{2,n+1}, \ldots, x_{N,n+1})^T$ is the $N \times 1$ vector of input signals; $c^* = (c_1^*, c_2^*, \ldots, c_N^*)^T$ is the $N \times 1$ vector of desired parameters; $\xi_{n+1}$ is the noise; $n$ is the discrete time.

The learning task consists in estimating the parameter vector $c^*$ and reduces to minimizing a performance functional (identification criterion) chosen in advance,

$$F(c) = \sum_{i=1}^{n} \rho(e_i), \tag{2}$$

where $e_i = y_i - \hat{y}_i$; $\hat{y}_i = c_{i-1}^T x_i$ is the model output signal; $c$ is the estimate of the vector $c^*$; $\rho(e_i)$ is a differentiable loss function satisfying the conditions $\rho(e_i) \ge 0$; $\rho(0) = 0$; $\rho(e_i) = \rho(-e_i)$; $\rho(e_i) \ge \rho(e_j)$ for $|e_i| \ge |e_j|$.

The training objective is to find the estimate $c$ defined as the solution of the extremal problem

$$F(c) = \min \tag{3}$$

or as the solution of the system of equations

$$\frac{\partial F(e)}{\partial c_j} = \sum_{i=1}^{n} \rho'(e_i)\,\frac{\partial e_i}{\partial c_j} = 0, \tag{4}$$

where $\rho'(e_i) = \partial\rho(e_i)/\partial e_i$ is the influence function.

If we introduce the weight function $\omega(e) = \rho'(e)/e$, the system of equations (4) can be written as

$$\sum_{i=1}^{n} \omega(e_i)\,e_i\,\frac{\partial e_i}{\partial c_j} = 0, \tag{5}$$

while minimization of functional (2) becomes equivalent to minimizing the weighted quadratic functional most often seen in practice:

$$\min \sum_{i=1}^{n} \omega(e_i)\,e_i^2. \tag{6}$$

The quadratic functional, the one most widely used in parameter estimation, relies on the second-order statistics of the error signal and is optimal under the assumptions of linearity and Gaussian signals. Indeed, for $\rho(e_i) = 0.5e_i^2$ the influence function is $\rho'(e_i) = e_i$, i.e. it grows linearly with $e_i$, which explains the sensitivity of the least-squares estimate to outliers and to noise distributions with heavy tails.

A robust M-estimate is also an estimate $c$ defined as the solution of the extremal problem (3) or of the system of equations (4), but with a loss function $\rho(e_i)$ other than the quadratic one.

Quite a number of functionals provide robust M-estimates, but the most common are the combined functionals proposed by Huber [4] and Hampel [5]. They consist of a quadratic part, which ensures optimal estimates for the Gaussian distribution, and a modular (absolute-value) part, which yields an estimate that is more robust to distributions with heavy tails (outliers). However, the effectiveness of the resulting robust estimates depends significantly on the parameters used in these criteria, which are chosen on the basis of the researcher's experience.
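As a rough illustration of why the choice of $\rho$ matters, the sketch below compares the influence function $\rho'(e)$ and the weight $\omega(e) = \rho'(e)/e$ for the quadratic loss and for Huber's loss. The threshold `delta`, the sample errors, and the function names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def influence_quadratic(e):
    """Influence function of rho(e) = 0.5 * e**2, i.e. rho'(e) = e."""
    return np.asarray(e, dtype=float)

def influence_huber(e, delta=1.0):
    """Influence function of Huber's loss: linear near zero, clipped at +/- delta."""
    return np.clip(np.asarray(e, dtype=float), -delta, delta)

def weight(influence, e, **kwargs):
    """Weight function omega(e) = rho'(e) / e, defined for e != 0."""
    e = np.asarray(e, dtype=float)
    return influence(e, **kwargs) / e

# Hypothetical errors: two ordinary residuals and two gross outliers.
errors = np.array([0.5, 1.0, 10.0, 100.0])

print(influence_quadratic(errors))                   # grows without bound
print(influence_huber(errors, delta=1.0))            # saturates at delta
print(weight(influence_quadratic, errors))           # every sample weighted equally
print(weight(influence_huber, errors, delta=1.0))    # outliers weighted by delta/|e|
```

With the quadratic loss every residual enters criterion (6) with the same weight, whereas the Huber weight shrinks as the residual grows, which is the mechanism behind the robustness discussed above.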

The practical application of these functionals to the identification problem was considered in many works, in particular in [6, 7].

Another approach to obtaining robust estimates, free of this drawback, is to use the fourth-degree criterion [8] or combined criteria: a combination of the quadratic criterion and the criterion of smallest moduli [9–11], of the quadratic criterion and the fourth-degree criterion [12], or of the fourth-degree criterion and the criterion of smallest moduli [13]. It should be noted that the combined criteria turned out to be very effective and much simpler to implement in the identification procedure.

One more approach that is now widely used is based on information characteristics of signals, in particular entropy. The functional used in this case is an explicit functional of the probability density function (PDF) and includes all the higher-order statistical properties defined by the PDF. Since entropy measures the mean uncertainty contained in a given PDF, minimizing it reduces the error. In [14, 15], the concept of information theoretic learning (ITL) was introduced, using as a criterion the Rényi quadratic entropy, for which a nonparametric estimate based on Parzen windows with Gaussian kernels is determined directly from data samples. In these works, it was proved that training with the Rényi entropy minimizes the Rényi distance between the conditional probability density functions of the desired and actual output signals for the given input signals.

The results of numerous studies indicate that, when the measurements contain non-Gaussian noise, in particular impulse noise, an approach based on information characteristics of signals is very effective, since a criterion that takes into account all higher-order statistics of the error signal turns out to be more appropriate. Correntropy was introduced in [16] as a generalized measure of similarity, the maximization of which underlies the development of sufficiently simple and efficient robust algorithms.

3. Correntropy as a measure of similarity

Correntropy, defined as a localized measure of similarity, has proven to be very efficient for obtaining robust estimates owing to its lower sensitivity to outliers. Its name emphasizes the relationship with correlation and also indicates that its average value over time or over measurements is associated with entropy, more precisely, with the argument of the logarithm in the quadratic Rényi entropy estimated with the help of Parzen windows [17].

For two random variables $X$ and $Y$, the correntropy is defined as

$$V(X, Y) = M\{k_\sigma(X, Y)\}, \tag{7}$$

where $M\{\cdot\}$ is the expectation symbol; $k_\sigma(\cdot)$ is a rotation-invariant Mercer kernel; $\sigma$ is the kernel width.

The kernels most widely used in calculating the correntropy are Gaussian ones, defined by the formula

$$k_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\|X - Y\|^2}{2\sigma^2}\right). \tag{8}$$

Calculating the correntropy requires the joint distribution of the random variables $X$ and $Y$, which, as a rule, is not known. Since in practice only a finite number of samples $\{x_i, y_i\}$, $i = 1, 2, \ldots, N$, is usually available, the simplest estimate of the correntropy is calculated as follows:

$$\hat{V}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} k_\sigma(x_i - y_i). \tag{9}$$
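The sample estimate of the correntropy can be sketched in a few lines. The example below is only an illustration: the data, the kernel width, and the fraction and size of the outliers are assumptions chosen to show how differently the correntropy estimate and the mean squared error react to a few gross outliers.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Gaussian kernel of width sigma evaluated at the difference u."""
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def sample_correntropy(x, y, sigma):
    """Sample mean of the Gaussian kernel over the pairs (x_i, y_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(gaussian_kernel(x - y, sigma)))

rng = np.random.default_rng(0)
d = rng.normal(size=1000)                    # hypothetical desired signal
y_clean = d + 0.1 * rng.normal(size=1000)    # model output with small errors
y_outliers = y_clean.copy()
y_outliers[:20] += 50.0                      # 2% of the samples become gross outliers

v_clean = sample_correntropy(d, y_clean, sigma=1.0)
v_out = sample_correntropy(d, y_outliers, sigma=1.0)
mse_clean = float(np.mean((d - y_clean) ** 2))
mse_out = float(np.mean((d - y_outliers) ** 2))

print(v_clean, v_out)      # the correntropy estimate changes only slightly
print(mse_clean, mse_out)  # the mean squared error changes by orders of magnitude
```

Each outlier can lower the estimate by at most the kernel maximum divided by the sample size, so its influence is bounded, whereas its contribution to the mean squared error grows quadratically.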

In identification, filtering, and similar tasks, the functional used is the correntropy between the required output signal $d_i$ and the (real) model output signal $y_i$. With Gaussian kernels, the optimized functional takes the form

$$J_{corr}(n) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\frac{1}{N}\sum_{i=n-N+1}^{n}\exp\left(-\frac{e_i^2}{2\sigma^2}\right), \tag{10}$$

where $e_i = d_i - y_i$ is the identification (filtering) error.

The Taylor series expansion of the Gaussian kernel makes it possible to write the correntropy as follows:

$$V(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^n\sigma^{2n}n!}\,M\{(X - Y)^{2n}\}. \tag{11}$$

4. Correntropy maximization algorithms

The gradient algorithm for optimizing (10) at $N = 1$ has the form [18, 19]

$$w_{n+1} = w_n + \gamma\exp\left(-\frac{e_{n+1}^2}{2\sigma^2}\right)e_{n+1}x_{n+1}, \tag{12}$$

where $\gamma$ is the parameter affecting the rate of convergence.

A significant drawback of this algorithm is its low convergence rate, which considerably limits its use in identifying nonstationary objects. It should be noted that choosing the optimal value of the parameter $\gamma$, which provides the maximum convergence rate of the algorithm and, as is easy to show, equals

$$\gamma_{n+1} = \left(\psi_{n+1}\|x_{n+1}\|^2\right)^{-1}, \tag{13}$$

where $\psi_{n+1} = \exp\left(-\frac{e_{n+1}^2}{2\sigma^2}\right)$, leads to an analogue of the Kaczmarz (Widrow–Hoff) algorithm.

In [20–23], to reduce the effect of impulse noise, a recurrent weighted least squares (RWLS) method was proposed, which minimizes a weighted quadratic criterion and has the form

$$c_{n+1} = c_n + \frac{\psi_{n+1}P_n x_{n+1}}{\lambda + \psi_{n+1}x_{n+1}^T P_n x_{n+1}}\left(y_{n+1} - c_n^T x_{n+1}\right), \tag{15}$$

$$P_{n+1} = \lambda^{-1}\left[P_n - \frac{\psi_{n+1}P_n x_{n+1}x_{n+1}^T P_n}{\lambda + \psi_{n+1}x_{n+1}^T P_n x_{n+1}}\right], \tag{16}$$

where $\psi_{n+1} = \exp\left(-\frac{e_{n+1}^2}{2\sigma^2}\right)$ and $0 < \lambda \le 1$ is the weighting factor.

In deriving formula (16) for $P_{n+1}$, the approximation

$$P_{n+1}^{-1} = \lambda P_n^{-1} + \psi_{n+1}x_{n+1}x_{n+1}^T \tag{17}$$

was used.

As is known, introducing the parameter $\lambda$ into the algorithm is advisable when identifying nonstationary parameters.
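A minimal sketch of one RWLS step (15)–(16) is given below. It assumes a symmetric matrix $P_n$; the initial value of $P$, the kernel width, the weighting factor, the target parameters, and the outlier model are illustrative assumptions rather than settings reported in the paper.

```python
import numpy as np

def rwls_step(c, P, x, y, sigma=2.0, lam=0.995):
    """One RWLS step: the Gaussian weight psi down-weights samples with large error."""
    e = y - c @ x                                # a priori error
    psi = np.exp(-e**2 / (2.0 * sigma**2))       # kernel weight psi_{n+1}
    Px = P @ x
    denom = lam + psi * (x @ Px)
    c_new = c + psi * Px / denom * e             # parameter update (15)
    # P is assumed symmetric, so P x x^T P equals the outer product of P x with itself.
    P_new = (P - psi * np.outer(Px, Px) / denom) / lam   # matrix update (16)
    return c_new, P_new

rng = np.random.default_rng(0)
c_true = np.array([1.0, -2.0, 0.5])              # hypothetical target parameters
c_hat = np.zeros(3)
P = 100.0 * np.eye(3)                            # assumed initialization

for _ in range(2000):
    x = rng.standard_normal(3)
    noise = 0.1 * rng.standard_normal()
    if rng.random() < 0.05:                      # 5% impulsive outliers (assumed model)
        noise += 50.0 * rng.choice([-1.0, 1.0])
    y = c_true @ x + noise
    c_hat, P = rwls_step(c_hat, P, x, y)

print(np.linalg.norm(c_hat - c_true))            # distance to the target parameters
```

For a sample hit by an impulse the weight $\psi_{n+1}$ is practically zero, so the parameter estimate is left almost unchanged by that sample.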

Since $G_\sigma(e)$ is a local function of the error $e$, correntropy can be used as an error measure in information processing and machine learning problems:

$$G_\sigma(e) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{e^2}{2\sigma^2}\right). \tag{18}$$

It can be seen from (18) that the Gaussian kernel is centered at zero. As a result, if the distribution of errors (noise) has a nonzero mean, function (18) will not match this distribution. Therefore, the problem arises of choosing a correntropy function that is effective for noise with a nonzero mean.

One of the approaches to solving this problem is the use of correntropy with a variable center [24]:

$$V_{\sigma,c}(T, Y) = M\{G_{\sigma,c}(e)\}, \qquad G_{\sigma,c}(e) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(e - c)^2}{2\sigma^2}\right), \tag{19}$$

where $c \in R$ is the center.

In this case

$$V_{\sigma,c}(T, Y) = \frac{1}{\sqrt{2\pi}\,\sigma}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^n n!}\,M\left\{\frac{(e - c)^{2n}}{\sigma^{2n}}\right\}. \tag{20}$$

As $\sigma$ increases, the higher-order moments about the center decay faster, so the second-order moment dominates the value of $V_{\sigma,c}(T, Y)$. In particular, for $c = M\{e\}$ and $\sigma \to \infty$, maximizing the correntropy with center $c$ is equivalent to minimizing the error variance.
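The last statement can be made explicit by keeping only the first two terms of the series for the correntropy with a variable center. The following is a sketch of that step rather than a full proof; it assumes that the higher-order moments of $e - c$ are finite.

```latex
V_{\sigma,c}(T,Y) \approx \frac{1}{\sqrt{2\pi}\,\sigma}
  \left(1 - \frac{M\{(e-c)^2\}}{2\sigma^2}\right),
  \qquad \sigma \to \infty .
% For c = M{e} the quantity M{(e-c)^2} is the variance of e,
% so maximizing V over the model parameters amounts to minimizing that variance.
```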

In [27], complex correntropy with a variable center was proposed; in [28], a generalized correntropy criterion was introduced; and in [29], the maximum mixture correntropy criterion was considered.

The solution of practical problems based on the minimization of the corresponding criteria was considered in [30–33].

The sparsity-constrained recursive generalized maximum correntropy criterion (MCC) algorithm with variable center was studied in [34]. Work [35] addresses distributed MCC algorithms based on a divide-and-conquer strategy.

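Minimizing functional (19) with respect to the parameters of the model, we obtain

$$\frac{\partial E_{n+1}}{\partial w} = -\exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)\frac{(e_{n+1} - c)}{2\sigma^2}\,x_{n+1}; \tag{21}$$

$$\frac{\partial E_{n+1}}{\partial c} = w\exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)\frac{(e_{n+1} - c)}{\sigma^2}; \tag{22}$$

$$\frac{\partial E_{n+1}}{\partial \sigma^2} = -w\exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)\frac{(e_{n+1} - c)^2}{\sigma^3}. \tag{23}$$

Taking these expressions into account, the algorithms for correcting the network parameters will have the form

$$w_{n+1} = w_n + \gamma_w\exp\left(-\frac{(e_{n+1} - c_{n+1})^2}{2\sigma_{n+1}^2}\right)(e_{n+1} - c_{n+1})\,x_{n+1}, \tag{24}$$

$$c_{n+1} = c_n + \gamma_c\exp\left(-\frac{(e_{n+1} - c_{n+1})^2}{2\sigma_{n+1}^2}\right)(e_{n+1} - c_n); \tag{25}$$

$$\sigma_{n+1}^2 = \sigma_n^2 - \gamma_\sigma w_{n+1}\exp\left(-\frac{(e_{n+1} - c_{n+1})^2}{2\sigma_n^2}\right)\frac{(e_{n+1} - c_{n+1})^2}{\sigma_n^3}, \tag{26}$$

where $\gamma_w$, $\gamma_c$, $\gamma_\sigma$ are the parameters of the algorithm that regulate the step size and affect the rate of its convergence.

4.1. Multidimensional object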

If the object under study has several outputs, then the output signal is a vector, the error is also a vector, and the learning algorithm has the form

$$w_{n+1} = w_n + \gamma\exp\left(-\|e_{n+1} - c\|^2_{R^{-1}}\right)e_{n+1}x_{n+1}, \tag{27}$$

where $\|e_{n+1} - c\|^2_{R^{-1}} = (e_{n+1} - c)^T R^{-1}(e_{n+1} - c)$; $R$ is the covariance matrix of the input vector, and

$$R_{n+1}^{-1} = R_n^{-1} - \gamma_R w_{n+1}\exp\left(-\|e_{n+1} - c_{n+1}\|^2_{R_n^{-1}}\right)(e_{n+1} - c_{n+1})(e_{n+1} - c_{n+1})^T. \tag{28}$$

4.2. Investigation of the convergence of the algorithm

Consider the estimation error

$$\theta_{n+1} = c_{n+1} - c^*. \tag{29}$$

Then

$$e_{n+1} = \theta_{n+1}^T x_{n+1} + \xi_{n+1} = e_{n+1}^a + \xi_{n+1}, \tag{30}$$

where $e_{n+1}^a = \theta_{n+1}^T x_{n+1}$ is the a priori error.

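In this case, the estimation algorithm can be written as

$$w_{n+1} = w_n + \gamma f(e_{n+1})\,x_{n+1}, \tag{31}$$

where $f(e_{n+1}) = \exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)(e_{n+1} - c)$.

Writing down algorithm (31) with respect to estimation errors, we have

$$\theta_{n+1} = \theta_n - \gamma f(e_{n+1})\,x_{n+1}.$$

Multiplying both sides of the given expression on the left by $\theta_{n+1}^T$, we get

$$\|\theta_{n+1}\|^2 = \|\theta_n\|^2 - 2\gamma f(e_{n+1})\,e_{n+1}^a + \gamma^2 f^2(e_{n+1})\|x_{n+1}\|^2. \tag{32}$$

Averaging both sides of (32) and requiring that $M\{\|\theta_{n+1}\|^2\} \le M\{\|\theta_n\|^2\}$, we obtain the condition for the convergence of algorithm (31) in the mean square:

$$0 < \gamma \le \frac{2M\{f(e_{n+1})\,e_{n+1}^a\}}{M\{f^2(e_{n+1})\|x_{n+1}\|^2\}}. \tag{33}$$

Consider a steady state. Since in steady state

$$\lim_{n\to\infty} M\{\|\theta_{n+1}\|^2\} = \lim_{n\to\infty} M\{\|\theta_n\|^2\},$$

it follows from (33) that

$$2\lim_{n\to\infty} M\{e_{n+1}^a f(e_{n+1})\} = \gamma\,\mathrm{tr}R_x\lim_{n\to\infty} M\{f^2(e_{n+1})\}, \tag{34}$$

where tr denotes the trace operator.

To calculate the steady-state value of the estimation error, we need $M\{f^2(e_{n+1})\|x_{n+1}\|^2\}$ and $M\{e_{n+1}^a f(e_{n+1})\}$. Consider the case of Gaussian noise $\xi \sim N(0, \sigma_\xi^2)$. Using Price's theorem [36], we obtain

$$\lim_{n\to\infty} M\{e_{n+1}^a f(e_{n+1})\} = \lim_{n\to\infty} M\{e_{n+1}^a f(e_{n+1}^a + \xi_{n+1})\} = \lim_{n\to\infty} M\{(e_{n+1}^a)^2\}\,M\{f'(e_{n+1})\} = S\lim_{n\to\infty} M\left\{\exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)\left[1 - \frac{(e_{n+1} - c)^2}{\sigma^2}\right]\right\}, \tag{35}$$

$$\lim_{n\to\infty} M\{f^2(e_{n+1})\} = \frac{1}{\sqrt{2\pi}\,\sigma_e}\lim_{n\to\infty}\int_{-\infty}^{+\infty}\exp\left(-\frac{(e_{n+1} - c)^2}{\sigma^2}\right)(e_{n+1} - c)^2\exp\left(-\frac{(e_{n+1} - c_e)^2}{2\sigma_e^2}\right)de_{n+1}, \tag{36}$$

where $S = \lim_{n\to\infty} M\{(e_{n+1}^a)^2\}$, and $c_e$ and $\sigma_e^2$ are the mean and variance of the Gaussian error $e_{n+1}$.

Substitution of (35) and (36) into (34) gives the expression for the steady-state error

$$\lim_{n\to\infty} M\{(e_{n+1}^a)^2\} = \frac{A}{2B},$$

where

$$A = \gamma\,\mathrm{tr}R_x\lim_{n\to\infty}\int_{-\infty}^{+\infty}\exp\left(-\frac{(e_{n+1} - c)^2}{\sigma^2}\right)(e_{n+1} - c)^2\exp\left(-\frac{(e_{n+1} - c_e)^2}{2\sigma_e^2}\right)de_{n+1};$$

$$B = \lim_{n\to\infty}\int_{-\infty}^{+\infty}\exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)\left[1 - \frac{(e_{n+1} - c)^2}{\sigma^2}\right]\exp\left(-\frac{(e_{n+1} - c_e)^2}{2\sigma_e^2}\right)de_{n+1}.$$

This expression shows that $\lim_{n\to\infty} M\{(e_{n+1}^a)^2\} \to 0$ as $\gamma \to 0$.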
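The dependence of the steady-state error on $\gamma$ can be checked with a small simulation of algorithm (31). The sketch below is an illustration under assumed settings: a fixed center $c = 0$, a fixed kernel width, zero-mean Gaussian noise, white inputs with unit covariance, and hypothetical target parameters. With unit input covariance the mean squared parameter error coincides with the mean squared a priori error.

```python
import numpy as np

def f(e, c, sigma):
    """Nonlinearity of algorithm (31): f(e) = exp(-(e - c)^2 / (2 sigma^2)) * (e - c)."""
    return np.exp(-(e - c) ** 2 / (2.0 * sigma**2)) * (e - c)

def steady_state_misalignment(gamma, n_iter=20000, n_tail=5000, seed=0):
    """Run algorithm (31) and average ||w - w_true||^2 over the last n_tail steps."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -0.5, 2.0, 0.3])     # hypothetical target parameters
    w = np.zeros_like(w_true)
    c, sigma, noise_std = 0.0, 2.0, 0.5          # assumed center, width, noise level
    tail = []
    for n in range(n_iter):
        x = rng.standard_normal(w_true.size)     # white input, R_x = I
        y = w_true @ x + noise_std * rng.standard_normal()
        e = y - w @ x                            # error before the update
        w = w + gamma * f(e, c, sigma) * x       # update (31)
        if n >= n_iter - n_tail:
            tail.append(np.sum((w - w_true) ** 2))
    return float(np.mean(tail))

s_large = steady_state_misalignment(gamma=0.05)
s_small = steady_state_misalignment(gamma=0.005)
print(s_large, s_small)   # the smaller step gives the smaller steady-state error
```

The price of a small $\gamma$ is slower convergence, which is why the choice of the step matters in the non-stationary setting.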

Consider the case of non-Gaussian interference. In this case, we use the Taylor series expansion. In the steady state, the estimated parameters change (are corrected) insignificantly. Therefore, we can rewrite (34) as follows:

$$2M\{e^a f(e)\} = \gamma\,\mathrm{tr}R_x M\{f^2(e)\}. \tag{40}$$

The derivatives of $f$ are

$$f'(\xi) = \exp\left(-\frac{(\xi - c)^2}{2\sigma^2}\right)\left[1 - \frac{(\xi - c)^2}{\sigma^2}\right]; \tag{41}$$

$$f''(\xi) = \exp\left(-\frac{(\xi - c)^2}{2\sigma^2}\right)\left[\frac{(\xi - c)^3}{\sigma^4} - \frac{3(\xi - c)}{\sigma^2}\right]. \tag{42}$$

Assuming that the interference does not correlate with the signals and with the a priori error $e^a$, we can write

$$M\{e^a f(e)\} = M\{e^a f(\xi) + f'(\xi)(e^a)^2 + o((e^a)^2)\} \approx S\,M\{f'(\xi)\}; \tag{43}$$

$$M\{f^2(e)\} \approx M\{f^2(\xi)\} + S\,M\{f(\xi)f''(\xi) + (f'(\xi))^2\}. \tag{44}$$

Substituting (43) and (44) into (40), we have

$$S = \frac{\gamma\,\mathrm{tr}R_x M\{f^2(\xi)\}}{2M\{f'(\xi)\} - \gamma\,\mathrm{tr}R_x M\{f(\xi)f''(\xi) + (f'(\xi))^2\}}. \tag{45}$$

Substitution of (41), (42) into (45) gives

$$S = \frac{\gamma\,\mathrm{tr}R_x M\{K(\xi - c)^2\}}{2M\left\{K'\left[1 - \frac{(\xi - c)^2}{\sigma^2}\right]\right\} - \gamma\,\mathrm{tr}R_x M\left\{K\left[1 + \frac{2(\xi - c)^4}{\sigma^4} - \frac{5(\xi - c)^2}{\sigma^2}\right]\right\}}, \tag{46}$$

where

$$K = \exp\left(-\frac{(\xi - c)^2}{\sigma^2}\right); \qquad K' = \exp\left(-\frac{(\xi - c)^2}{2\sigma^2}\right).$$

4.3. Non-stationary case

Let us assume that the estimated parameters are non-stationary, i.e.

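$$c_{n+1}^* = c_n^* + \Delta c^*, \tag{47}$$

where $\Delta c^* = (\Delta c_1^*, \Delta c_2^*, \ldots, \Delta c_N^*)^T$ is an $N \times 1$ random vector whose components have zero mathematical expectation and whose correlation matrix is $R_c = M\{\Delta c^*\Delta c^{*T}\}$.

Consider the error vector $\theta_{n+1} = c_{n+1} - c_{n+1}^*$.

Then, taking into account (30), the estimation algorithm can be written as

$$\theta_{n+1} = c_n - c_{n+1}^* + \gamma f(e_{n+1})\,x_{n+1} = \theta_n - \Delta c^* + \gamma f(e_{n+1})\,x_{n+1}. \tag{48}$$

Multiplying both sides of (48) on the left by $\theta_{n+1}^T$ and calculating the mathematical expectation, we obtain an expression that, besides the terms of the stationary case, contains $M\{\|\Delta c^*\|^2\}$ and cross terms with $\Delta c^*$. Taking into account the statistical properties of the signals and noise, under which the cross terms have zero expectation, we have

$$M\{\|\theta_{n+1}\|^2\} = M\{\|\theta_n\|^2\} - 2\gamma M\{e_{n+1}^a f(e_{n+1})\} + \gamma^2 M\{f^2(e_{n+1})\|x_{n+1}\|^2\} + M\{\|\Delta c^*\|^2\}. \tag{49}$$

For Gaussian interference, using Price's theorem gives

$$\lim_{n\to\infty} M\{e_{n+1}^a f(e_{n+1})\} = \lim_{n\to\infty} M\{e_{n+1}^a f(e_{n+1}^a + \xi_{n+1})\} = \lim_{n\to\infty} M\{(e_{n+1}^a)^2\}\,M\{f'(e_{n+1})\} = S\lim_{n\to\infty} M\left\{\exp\left(-\frac{(e_{n+1} - c)^2}{2\sigma^2}\right)\left[1 - \frac{(e_{n+1} - c)^2}{\sigma^2}\right]\right\} = \frac{S\sigma^3}{(\sigma^2 + \sigma_\xi^2 + S)^{3/2}}; \tag{50}$$

$$\lim_{n\to\infty} M\{f^2(e_{n+1})\} = \lim_{n\to\infty} M\left\{\exp\left(-\frac{(e_{n+1} - c)^2}{\sigma^2}\right)(e_{n+1} - c)^2\right\} = \frac{1}{\sqrt{2\pi}\,\sigma_e}\lim_{n\to\infty}\int_{-\infty}^{+\infty}\exp\left(-\frac{(e_{n+1} - c)^2}{\sigma^2}\right)(e_{n+1} - c)^2\exp\left(-\frac{(e_{n+1} - c_e)^2}{2\sigma_e^2}\right)de_{n+1} = \frac{\sigma^3(S + \sigma_\xi^2)}{(2\sigma_\xi^2 + \sigma^2 + 2S)^{3/2}}. \tag{51}$$

Considering that

$$M\{\|\Delta c^*\|^2\} = \mathrm{tr}\,M\{\Delta c^*\Delta c^{*T}\} = \mathrm{tr}R_c,$$

for the steady state, when $\lim_{n\to\infty} M\{\|\theta_{n+1}\|^2\} = \lim_{n\to\infty} M\{\|\theta_n\|^2\}$, from expression (49) we obtain

$$\frac{2S}{(\sigma^2 + \sigma_\xi^2 + S)^{3/2}} = \frac{\gamma\,\mathrm{tr}R_x(\sigma_\xi^2 + S)}{(\sigma^2 + 2\sigma_\xi^2 + 2S)^{3/2}} + \frac{\mathrm{tr}R_c}{\gamma\sigma^3}. \tag{52}$$

From this relation, we can determine the value of $S$:

$$S = \frac{\gamma\,\mathrm{tr}R_x(\sigma_\xi^2 + S)(\sigma^2 + \sigma_\xi^2 + S)^{3/2}}{2(\sigma^2 + 2\sigma_\xi^2 + 2S)^{3/2}} + \frac{\mathrm{tr}R_c(\sigma^2 + \sigma_\xi^2 + S)^{3/2}}{2\gamma\sigma^3}. \tag{53}$$

For $\sigma^2 \to \infty$, we obtain the value of $S$ for the least squares method:

$$\lim_{\sigma\to\infty} S = \frac{\gamma\,\mathrm{tr}R_x\sigma_\xi^2 + \gamma^{-1}\mathrm{tr}R_c}{2 - \gamma\,\mathrm{tr}R_x}.$$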
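Because $S$ enters both sides of the steady-state relation for Gaussian noise, it has to be found numerically. The sketch below uses a simple fixed-point iteration and then checks that, for a very wide kernel, the result approaches the least-squares limit given above. All numerical values are hypothetical, and the fixed-point scheme is an assumption of this example rather than a procedure from the paper.

```python
def steady_state_S(gamma, tr_Rx, tr_Rc, var_noise, sigma, n_iter=200):
    """Solve S = g(S) by fixed-point iteration, where g is the right-hand side
    of the steady-state relation for Gaussian noise."""
    S = 0.0
    for _ in range(n_iter):
        a = sigma**2 + var_noise + S
        b = sigma**2 + 2.0 * var_noise + 2.0 * S
        S = (gamma * tr_Rx * (var_noise + S) * a**1.5 / (2.0 * b**1.5)
             + tr_Rc * a**1.5 / (2.0 * gamma * sigma**3))
    return S

def lms_limit_S(gamma, tr_Rx, tr_Rc, var_noise):
    """Limit of S for sigma -> infinity (least-squares case)."""
    return (gamma * tr_Rx * var_noise + tr_Rc / gamma) / (2.0 - gamma * tr_Rx)

# Hypothetical settings: step, trace of R_x, trace of R_c, noise variance.
params = dict(gamma=0.05, tr_Rx=4.0, tr_Rc=1e-4, var_noise=0.25)

S_wide = steady_state_S(sigma=1e3, **params)     # very wide kernel
S_narrow = steady_state_S(sigma=2.0, **params)   # moderate kernel width
S_limit = lms_limit_S(**params)
print(S_wide, S_limit, S_narrow)
```

The iteration converges here because the slope of the right-hand side with respect to $S$ stays well below one for these values; other settings may need a different solver.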

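In the case of non-Gaussian noise, we have

$$M\{e_{n+1}^a f(e_{n+1})\} \approx M\{e_{n+1}^a f(\xi_{n+1}) + (e_{n+1}^a)^2 f'(\xi_{n+1})\} \approx S\,M\{f'(\xi_{n+1})\}; \tag{54}$$

$$M\{f^2(e_{n+1})\} \approx M\left\{\left(f(\xi_{n+1}) + e_{n+1}^a f'(\xi_{n+1}) + 0.5f''(\xi_{n+1})(e_{n+1}^a)^2\right)^2\right\} \approx M\{f^2(\xi_{n+1})\} + S\,M\{f(\xi_{n+1})f''(\xi_{n+1}) + (f'(\xi_{n+1}))^2\}, \tag{55}$$

where

$$f'(\xi_{n+1}) = \exp\left(-\frac{(\xi_{n+1} - c)^2}{2\sigma^2}\right)\left[1 - \frac{(\xi_{n+1} - c)^2}{\sigma^2}\right]; \qquad f''(\xi_{n+1}) = \exp\left(-\frac{(\xi_{n+1} - c)^2}{2\sigma^2}\right)\left[\frac{(\xi_{n+1} - c)^3}{\sigma^4} - \frac{3(\xi_{n+1} - c)}{\sigma^2}\right].$$

Substituting (54) and (55) into (49), after simple transformations we obtain

$$S = \frac{\gamma A + \gamma^{-1}B}{C - \gamma D}, \tag{56}$$

where

$$A = \mathrm{tr}R_x M\left\{(\xi_{n+1} - c)^2\exp\left(-\frac{(\xi_{n+1} - c)^2}{\sigma^2}\right)\right\}; \qquad B = \mathrm{tr}R_c;$$

$$C = 2M\left\{\left[1 - \frac{(\xi_{n+1} - c)^2}{\sigma^2}\right]\exp\left(-\frac{(\xi_{n+1} - c)^2}{2\sigma^2}\right)\right\};$$

$$D = \mathrm{tr}R_x M\left\{\left[1 + \frac{2(\xi_{n+1} - c)^4}{\sigma^4} - \frac{5(\xi_{n+1} - c)^2}{\sigma^2}\right]\exp\left(-\frac{(\xi_{n+1} - c)^2}{\sigma^2}\right)\right\}.$$

This expression shows that $S$ depends on the parameter $\gamma$ non-monotonically: the term $\gamma A$ increases with $\gamma$, while the term $\gamma^{-1}B$ decreases.

From the condition $\partial S/\partial\gamma = 0$, an equation can be obtained to determine the optimal value of the parameter $\gamma$ that provides the minimum value of $S$:

$$AC\gamma^2 + 2BD\gamma - BC = 0.$$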
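The optimal step can be illustrated numerically. Differentiating $S(\gamma) = (\gamma A + \gamma^{-1}B)/(C - \gamma D)$ with respect to $\gamma$ and setting the derivative to zero gives a quadratic equation in $\gamma$. The sketch below takes its positive root and compares it with a brute-force grid search over $S(\gamma)$. The values of $A$, $B$, $C$, $D$ are hypothetical and serve only to exercise the formulas.

```python
import numpy as np

def S_of_gamma(gamma, A, B, C, D):
    """Steady-state error as a function of the step size, valid for C - gamma * D > 0."""
    return (gamma * A + B / gamma) / (C - gamma * D)

def optimal_gamma(A, B, C, D):
    """Positive root of A*C*g**2 + 2*B*D*g - B*C = 0, i.e. of dS/dgamma = 0."""
    disc = (B * D) ** 2 + A * B * C**2
    return (-B * D + np.sqrt(disc)) / (A * C)

A, B, C, D = 1.0, 0.01, 1.5, 2.0                 # hypothetical coefficients

gamma_root = optimal_gamma(A, B, C, D)

# Brute-force check on a grid inside the admissible range 0 < gamma < C / D.
gammas = np.linspace(1e-4, 0.7, 200001)
gamma_grid = gammas[np.argmin(S_of_gamma(gammas, A, B, C, D))]

print(gamma_root, gamma_grid)                    # the two estimates should agree closely
```

In the stationary case $B = \mathrm{tr}R_c = 0$ the root collapses to zero, in agreement with the earlier observation that the steady-state error vanishes as $\gamma \to 0$.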

5. Numerical experiments

The problem of adjusting ADALINE parameters was considered. Sequences of normally distributed quantities $x(k) \sim N(0, 1)$ were chosen as the input signal $x(k)$. To test the robustness of the algorithms, independent noise distributed according to the Rayleigh law with $\sigma = 1$ was added to the output signal of the object. The histogram of this noise is shown in Fig. 2. The simulation results for various values of the parameter are shown in Fig. 3. Fig. 4 shows the error curves for the RWLS algorithm (15)–(16) and algorithm (31), respectively; here
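The data for such an experiment can be generated as sketched below. The number of parameters, their values, and the sample size are assumptions made for illustration; only the input distribution and the Rayleigh noise with $\sigma = 1$ follow the description above. The sketch also confirms that this noise has a clearly nonzero mean, which is the situation the variable center is meant to handle.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 100_000
c_true = np.array([1.0, -2.0, 0.5])                  # hypothetical target parameters

x = rng.standard_normal((n_samples, c_true.size))    # inputs x(k) ~ N(0, 1)
noise = rng.rayleigh(scale=1.0, size=n_samples)      # Rayleigh noise with sigma = 1
y = x @ c_true + noise                               # noisy output of the object

noise_mean = float(noise.mean())
theoretical_mean = float(np.sqrt(np.pi / 2.0))       # mean of the Rayleigh law for sigma = 1
print(noise_mean, theoretical_mean)
```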

RMSE = 1 2 2 cn − c* , where cn and c* denote estimated and target parameters vectors respectively.

6. Conclusion

The work considered an adaptive robust learning algorithm for ADALINE that uses the information criterion of correntropy with a variable center as the learning criterion.

The properties of its convergence in the stationary and non-stationary cases in conditions of non-Gaussian noises are investigated.

The importance of choosing the width of the Gaussian kernel, which affects the rate of convergence of estimation algorithms and the error in the steady state, is noted, and the expediency of developing procedures for adaptive correction of the kernel width is indicated.

The estimates obtained are quite general and depend both on the degree of nonstationarity of the object and on the statistical characteristics of useful signals and interference.

7. References

[1] Widrow, B., Hoff, M. Adaptive switching circuits. IRE WESCON Convention Record, Part 4. New York: Institute of Radio Engineers, pp. 96–104 (1960).

[2] Kaczmarz, S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. Lett., Cl. Sci. Math. Nat., Ser. A, pp. 355–357 (1937). ... Control, 57, pp. 1269–1271 (1993).

[3] Liberol, B.D., Rudenko, O.G., Bessonov, A.A. Investigation of the convergence of one-step ... 32 (2018) (in Russian).

[4] Huber, P. Robust Statistics. Moscow: Mir, 304 p. (1984) (in Russian).

[5] Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A. Robust Statistics. The Approach Based on Influence Functions. N.Y.: John Wiley and Sons, 526 p. (1986).

[6] Rudenko, O., Bezsonov, O. Function approximation using robust radial basis function networks. J. of Intelligent Learning Systems and Applications, 3, pp. 17–25 (2011).

[7] Rudenko, O.G., Bessonov, A.A. M-learning of radial basis networks using ... (2012) (in Russian).

[8] Walach, E., Widrow, B. The least mean fourth (LMF) adaptive algorithm and its family. IEEE Trans., IT-30, pp. 275–283 (1984).

[9] Chambers, J., Avlonitis, A. A Robust Mixed-Norm Adaptive Filter Algorithm. IEEE Signal Processing Letters, 4, 2, pp. 46–48 (1997).

[10] Papoulis, E.V., Stathaki, T. A Normalized Robust Mixed-Norm Adaptive Algorithm for System Identification. IEEE Signal Processing Letters, 11, 1, pp. 56–59 (2004).

[11] Chambers, J., Tanrikulu, O., Constantinides, A.G. Least mean mixed-norm adaptive filtering. Electronics Letters, 30, 19, pp. 1574–1575 (1994).

[12] Zerguine, A. A variable-parameter normalized mixed-norm (VPNMN) adaptive algorithm. EURASIP Journal on Advances in Signal Processing, 55, 13 p. (2012).

[13] Rudenko, O.G., Bezsonov, O.O., Serdiuk, N.M., Oliinyk, K.O., Romaniuk, O.S. Robust ... of information, 1 (160), pp. 80–88 (2020) (in Ukrainian).

[14] Principe, J.C., Xu, D., Zhao, Q., Fisher, J.W. Learning from examples with information theoretic criteria. J. VLSI Signal Process. Syst., 26, 1–2, pp. 61–77 (2000).

[15] Principe, J.C., Xu, D., Fisher, J. Information theoretic learning. In: S. Haykin (Ed.), Unsupervised Adaptive Filtering. New York: Wiley, pp. 265–319 (2000).

[16] Liu, W., Pokharel, P.P., Principe, J.C. Correntropy: Properties and Applications in Non-Gaussian Signal Processing. IEEE Trans. on Signal Processing, 1, pp. 5286–5298 (2007).

[17] Wang, W., Zhao, J., Qu, H., Chen, B., Principe, J.C. An adaptive kernel width update method of ... (DSP), pp. 916–920 (2015).

[18] Chen, B., Xing, L., Liang, J., Zheng, N., Principe, J.C. Steady-state mean-square error analysis ... 21 (7), pp. 880–884 (2014).

[19] Ma, W., Qu, H., Gui, G., Li, L., Zhao, J., Chen, B. Maximum correntropy criterion based ... environments. J. of the Franklin Institute, 352, 2, pp. 2708–2727 (2015).

[20] Xiong, W., Schindelhauer, C., So, H., Wang, Z. Maximum Correntropy Criterion for Robust ... Processing, 40, pp. 6325–6339 (2021).

[21] Hu, C., Wang, G., Ho, K.C., Liang ... Robust Ellipse Fitting With Laplacian Kernel Based Maximum Correntropy Criterion. IEEE Transactions on Image Processing, 30, pp. 3127–3141 (2021).

[22] Flores, T.K.S., Villanueva, J.M.M., Gomes, H.P., Catunda, S.Y.C. Adaptive Pressure Control System Based on the Maximum Correntropy Criterion. Sensors, 21 (15), 5156 (2021).

[23] Hu, T. Kernel-based maximum correntropy criterion with gradient method. CPAA, 19 (8), pp. 4159–4177 (2020).

[24] Chen, B., Wang, X., Li, Y., Principe, J.C. Maximum Correntropy Criterion with Variable Center. IEEE Signal Process. Letters, 26 (5), pp. 1212–1216 (2019).

[25] Zhu, L., Song, C., Pan, L., Li, J. Adaptive Filtering Under the Maximum Correntropy Criterion With Variable Center. IEEE Access, pp. 105902–105908 (2019).

[26] Wang, X., Han, J. Affine Projection Algorithm by Employing Maximum Correntropy Criterion for System Identification of Mixed Noise. IEEE Access, 7, pp. 182515–182526 (2019).

[27] Dong, F., Qian, G., Wang, S. Complex Correntropy with Variable Center: Definition, Properties, and Application to Adaptive Filtering. Entropy (Basel), 22 (1), 70 (2020).

[28] Yang, J., Cao, J., Xue, A. Robust Maximum Mixture Correntropy Criterion-Based Semi-Supervised ELM With Variable Center. IEEE Transactions on Circuits and Systems II: Express Briefs, 67, 12, pp. 3572–3576 (2020).

[29] Zhang, J., Huang, G., Zhan, L. Generalized Correntropy Criterion-Based Performance Assessment for Non-Gaussian Stochastic Systems. Entropy, 23, 764 (2021).

[30] Li, Y., Wang, Y., Sun, L. A Proportionate Normalized Maximum Correntropy Criterion Algorithm with Correntropy Induced Metric Constraint for Identifying Sparse Systems. Symmetry, 10, 683 (2018).

[31] Zhang, J., Huang, G., Zhan, L. Generalized Correntropy Criterion-Based Performance Assessment for Non-Gaussian Stochastic Systems. Entropy, 23, 764 (2021).

[32] Wang, X., Han, J. Affine Projection Algorithm by Employing Maximum Correntropy Criterion for System Identification of Mixed Noise. IEEE Access, 7, pp. 182515–182526 (2019).

[33] Sun, Q., Zhang, H., Wang, X., Ma, W., Chen, B. Sparsity Constrained Recursive Generalized Maximum Correntropy Criterion With Variable Center Algorithm. IEEE Transactions on Circuits and Systems II: Express Briefs, 67, 12, pp. 3517–3521 (2020).

[34] Xie, F., Hu, T., Wang, S., Wang, D. Maximum Correntropy Criterion with Distributed Method. Mathematics, 10, 304, 17 p. (2022).

[35] Wang, X., Han, J. Affine Projection Algorithm by Employing Maximum Correntropy Criterion for System Identification of Mixed Noise. IEEE Access, 7, pp. 182515–182526 (2019).

[36] Price, R. A useful theorem for nonlinear devices having Gaussian inputs. IEEE Transactions on Information Theory, 4 (2), pp. 69–72 (1958).