<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Ellipsoid Method for Nonsmooth Regression</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Petro Stetsyuk</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Volodymyrska Street, 60, Kyiv, 01033</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>V.M. Glushkov Institute of Cybernetics of the NASU</institution>
          ,
          <addr-line>Academician Glushkov Avenue, 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Regression-type problems are of great interest and importance due to their numerous applications in modern science and technology. Both smooth and nonsmooth models of that type require effective methods for finding unknown parameters with sufficient accuracy. Presented in the paper is a general model that covers several well-known methods, such as LASSO and RIDGE regression. The empq algorithm, based on Shor's ellipsoid method, is proposed for minimizing the model function. A brief description of the ellipsoid method is given, as well as a convergence theorem, which allows one to estimate the computational complexity of the empq algorithm. Results of three computational experiments on using the general model for solving the 1D total variation (TV) denoising problem are presented. In the first two experiments piecewise constant and piecewise linear functions are restored; in the third experiment the bitcoin open price curve is denoised. The obtained results show the prospects of using the general model and the empq algorithm for solving regression and image processing problems.</p>
      </abstract>
      <kwd-group>
        <kwd>ellipsoid method</kwd>
        <kwd>nonsmooth regression models</kwd>
        <kwd>convex optimization</kwd>
        <kwd>total variation denoising</kwd>
        <kwd>ROF model</kwd>
        <kwd>regularization</kwd>
        <kwd>image processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Regression-type problems with nonsmooth functions are known to be one of the main areas of research in mathematical programming and its applications. This is largely motivated by the constant emergence of new application areas and by progress in computing technologies, which provides new computing paradigms – cluster computing architectures, grid and cloud computing – that correspond well to the mathematical ideas underlying nonsmooth optimization [1].</p>
      <p>Classical regression models, which are an important type of supervised learning problem, have many practical applications. They are among the most statistically justified models, and intensive work on their generalization and on improving the accuracy of the solutions obtained is actively underway [2]. In particular, one issue of that kind is the study of the family of regression model training methods between the least moduli method and the least squares method. The least moduli method ensures robust estimation of model parameters, but is more difficult to use than the least squares method due to the nonsmoothness of its loss function.</p>
      <p>Using generalized regression models with the ability to control several specific parameters permits a qualitative impact on solutions, changing them depending on needs and external conditions. This makes such models extremely useful in areas such as digital signal and image processing [3], image compression, compressed sensing, etc. Noise of various types appears continuously, and even a small amount of it can significantly complicate obtaining accurate solutions. Since these models remain convex for all used parameter intervals (though they may be nonsmooth), the development of general and special efficient methods for minimizing them is of great importance. Moreover, denoising processes for images require significant computational power because of the size and detail of images, so these methods must be able to deal with high-dimensional problems and obtain solutions with sufficient accuracy.</p>
      <p>This paper presents a general model that covers some well-known methods like LASSO and
RIDGE regression, and proposes a new algorithm based on Shor’s ellipsoid method for its
minimization. Further, the total variation (TV) denoising problem is considered, as well as the way it
can be formulated using the general model proposed. Finally, results of three computational
experiments dedicated to 1D TV denoising are presented that show computational possibilities of the
algorithm and the model proposed.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Formulation of the general model</title>
      <p>Let matrices A = {a_ij}, i = 1,…,m, j = 1,…,n, and C = {c_ij}, i = 1,…,k, j = 1,…,n, and a vector y = {y_i}, i = 1,…,m, be given. We consider the following convex optimization problem:
f_pq(x) = Σ_{i=1}^{m} |y_i − Σ_{j=1}^{n} a_ij x_j|^p + λ Σ_{i=1}^{k} |Σ_{j=1}^{n} c_ij x_j|^q → min_{x ∈ R^n},   (1)
where f*_pq = f_pq(x*_pq) = min_{x ∈ R^n} f_pq(x). Here x = {x_j}, j = 1,…,n, is a vector of variables; λ ≥ 0 and 1 ≤ p, q ≤ 2 are given scalars. The function f_pq(x) is nonsmooth if p = 1 or q = 1, and smooth if p &gt; 1 and q &gt; 1. The problem (1) can be rewritten in matrix form as follows:
f_pq(x) = ‖y − Ax‖_p^p + λ‖Cx‖_q^q → min_{x ∈ R^n},   (2)
where ‖∙‖_p^p is the p-th power of the Lp-norm of a vector x ∈ R^n, which is defined as ‖x‖_p^p = Σ_{i=1}^{n} |x_i|^p.</p>
      <p>In the problem of finding the parameters of regression models, the m × n matrix A is usually called a regression or observation matrix, and the first summand of the function f_pq(x) is the p-th power of the Lp-norm of the residuals y − Ax of a regression model. The second summand of the function f_pq(x) is considered as a regularization term, where λ ≥ 0 is a regularization parameter. The k × n matrix C defines the interconnections between the variables x_j, j = 1,…,n, in the regularization term. For particular values of the parameters p, q and a particular choice of the matrix C, the model (1) covers several well-known models. In particular, if C = I, p = 2 and q = 1, we get the so-called LASSO regression model [4]:
f^LASSO_{2,1}(x) = ‖y − Ax‖_2^2 + λ Σ_{i=1}^{n} |x_i| → min_{x ∈ R^n},   (3)
which minimizes the nonsmooth function f^LASSO_{2,1}(x) and uses L1-regularization. This method assumes that the coefficients of the model are sparse, meaning that few of them are non-zero. LASSO is closely related to basis pursuit denoising [5] in the field of signal processing and has potential applications in image compression and compressed sensing.</p>
      <p>If C = I, p = 2 and q = 2, we get the so-called RIDGE regression model [6]:
f^RIDGE_{2,2}(x) = ‖y − Ax‖_2^2 + λ Σ_{i=1}^{n} x_i^2 → min_{x ∈ R^n},   (4)
which minimizes the strictly convex smooth function f^RIDGE_{2,2}(x) and uses L2-regularization. This model is usually used for estimating the coefficients of a regression model when the independent variables are highly correlated. It is also useful for dealing with multicollinearity, which commonly occurs in models with a large number of parameters [7].</p>
      <p>Finally, when λ = 0, the problem (1) corresponds to the well-known least squares method (LSM) if p = 2, and to the least moduli method (LMM) if p = 1. LSM plays a key role in obtaining estimates and finding unknown parameters in statistics, while LMM has proven to be robust to anomalous observations, or outliers [8, 9].</p>
      <p>The problem (1) is a problem of unconstrained minimization of the convex function f_pq(x), a subgradient of which at the point x has the components
(gf_pq(x))_j = p Σ_{i=1}^{m} |(Ax − y)_i|^{p−1} sign((Ax − y)_i) a_ij + λq Σ_{i=1}^{k} |(Cx)_i|^{q−1} sign((Cx)_i) c_ij, j = 1,…,n,   (5)
or, in matrix-vector form,
gf_pq(x) = p A^T (|Ax − y|^{p−1} ⊙ sign(Ax − y)) + λq C^T (|Cx|^{q−1} ⊙ sign(Cx)),   (6)
where |t| is the modulus (absolute value) of the number t, applied elementwise to vectors, and ⊙ denotes the elementwise Hadamard product.</p>
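      <p>To make the matrix form (2) and the subgradient formula (6) concrete, here is a minimal NumPy sketch (Python rather than the paper's Octave; the names fpq and grad_fpq are illustrative, not from the paper):</p>

```python
import numpy as np

def fpq(x, y, A, C, p, q, lam):
    """Objective (2): ||y - Ax||_p^p + lam * ||Cx||_q^q."""
    r = A @ x - y
    t = C @ x
    return np.sum(np.abs(r) ** p) + lam * np.sum(np.abs(t) ** q)

def grad_fpq(x, y, A, C, p, q, lam):
    """Subgradient (6): p*A^T(|Ax-y|^{p-1} . sign(Ax-y)) + lam*q*C^T(|Cx|^{q-1} . sign(Cx))."""
    r = A @ x - y
    t = C @ x
    return (p * A.T @ (np.abs(r) ** (p - 1) * np.sign(r))
            + lam * q * C.T @ (np.abs(t) ** (q - 1) * np.sign(t)))
```

      <p>For p = q = 2 this subgradient reduces to the ordinary gradient 2A^T(Ax − y) + 2λC^T Cx, which gives a quick correctness check.</p>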
      <p>So, the general model (1) allows us to cover several well-known models and methods with different properties and capabilities. Since the function f_pq(x) is convex and can be either smooth or nonsmooth, we can consider using efficient nonsmooth optimization methods for its minimization. In particular, Shor's ellipsoid method [10, 11, 12] can be used, which is implemented as the emshor program [13]. In the next section, a brief description of the ellipsoid method is given, as well as an Octave implementation of the empq algorithm, which is designed for minimizing the function f_pq(x).</p>
    </sec>
    <sec id="sec-3">
      <title>3. Ellipsoid method and the empq algorithm</title>
      <p>The Yudin–Nemirovsky–Shor ellipsoid method is based on using the ellipsoid of minimal volume in E^n that contains the semi-ball obtained as the intersection of an n-dimensional ball and a half-space passing through its center. The ellipsoid has a flattened shape in the direction of the normal to the hyperplane that passes through the center of the ball with radius r. Its parameters (see Figure 1) are as follows: a is the length of the minor semi-axis in the direction of the normal, which defines the semi-ball; b is the length of a major semi-axis (the number of such axes equals n−1); h is the distance from the center of the ball to the center of the ellipsoid in the direction of its minor semi-axis.</p>
      <p>An iteration of the ellipsoid method proceeds from the current ellipsoid to the next one with a constant coefficient of volume decrease. This coefficient is determined by the ratio of the volume of the ellipsoid with semi-axes a and b to the volume of the ball with radius r in E^n and can be written as follows:
q_n = (a/r)(b/r)^{n−1} = (n/(n+1)) (n/√(n²−1))^{n−1} &lt; 1.
It is shown in [13] that q_n &lt; exp{−1/(2n)} &lt; 1, so for big values of n the coefficient q_n is approximated by the asymptotic formula q_n ≈ 1 − 1/(2n).</p>
      <p>To use the ellipsoid method for finding the minimum point x*_pq of the problem (1), we must ensure that it is localized in an n-dimensional ball of radius r_0 centered at the point x_0 ∈ R^n, i.e. ‖x_0 − x*_pq‖ ≤ r_0. The algorithm to be used is called the empq algorithm; its description is given below.</p>
      <p>The empq algorithm. The input parameter of the algorithm is the accuracy ε_f &gt; 0 with which f*_pq = f_pq(x*_pq) is to be found.</p>
      <p>Initialization. Consider an n × n matrix B and set B_0 := I_n, where I_n is the n × n identity matrix. We move to the first iteration with the values x_0, r_0 and B_0. Let the values x_k ∈ R^n, r_k, B_k be found at iteration k. The transition to iteration k+1 consists of the following sequence of steps.</p>
      <p>Step 1. Calculate f_pq(x_k) and the subgradient gf_pq(x_k) at the point x_k using formula (5) or (6). If r_k‖B_k^T gf_pq(x_k)‖ ≤ ε_f, then stop: k* = k and x*_pq = x_k. Otherwise, proceed to Step 2.</p>
      <sec id="sec-3-1">
        <title>Step 2. Calculate the normalized subgradient</title>
        <p>ξ_k := B_k^T gf_pq(x_k) / ‖B_k^T gf_pq(x_k)‖.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Steps 3–5 and the convergence theorem</title>
        <p>Step 3. Calculate the next point x_{k+1} := x_k − h_k B_k ξ_k, where h_k = r_k / (n + 1).</p>
        <p>Step 4. Calculate B_{k+1} := B_k + (√((n−1)/(n+1)) − 1) (B_k ξ_k) ξ_k^T and r_{k+1} := r_k n / √(n²−1).</p>
        <p>Step 5. Go to iteration k+1 with the values x_{k+1}, r_{k+1}, B_{k+1}.</p>
        <p>Theorem. The sequence of points {x_k}_{k=0}^{k*} satisfies the inequalities ‖B_k^{−1}(x_k − x*_pq)‖ ≤ r_k, k = 0, 1, 2, …, k*. On each iteration k &gt; 0 the coefficient of volume decrease of the ellipsoid E_k = {x ∈ R^n : ‖B_k^{−1}(x_k − x)‖ ≤ r_k}, which localizes the point x*_pq, is constant and equal to
q_n = vol(E_k) / vol(E_{k−1}) = √((n−1)/(n+1)) (n / √(n²−1))^n &lt; exp{−1/(2n)} &lt; 1.</p>
        <p>The theorem implies that the ellipsoid method converges at the rate of a geometric progression with coefficient q_n &lt; exp{−1/(2n)} &lt; 1 [13]. It also allows one to estimate the computational complexity of the empq algorithm for finding x*_pq and to affirm that it can be successfully run on modern computers if n = 30–100. Indeed, to decrease the volume of the ellipsoid localizing the point x*_pq by a factor of 10, we need to perform K iterations, where K = −ln 10 / ln q_n ≈ (2 ln 10) n ≈ 4.6 n. This means that in order to improve the deviation of the found record value of the function f_pq(x) from its optimal value f*_pq by a factor of 10, it is necessary to perform about 4.6 n iterations of the algorithm for finding x*_pq.</p>
        <p>If n = 30 and ε_f = 10^{−6} × f(x_0), then the maximal number of iterations of the algorithm equals 46 × 900 = 41 400; if n = 100, the maximal number equals 460 000 iterations. Therefore, even a straightforward matrix-vector implementation of the calculation of the function value f_pq(x) and its subgradient according to formula (5) provides fast operation of the algorithm on modern computers. Below we confirm this with the results of computational experiments performed using an Intel Core i7-10750H processor (2.6 GHz, 16 GB RAM) and the GNU Octave 6.3.0 language.</p>
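        <p>The volume-decrease coefficient q_n and the iteration count K can be checked numerically; a small Python sketch (the helper names q_n and iters_per_decade are illustrative):</p>

```python
import math

def q_n(n):
    """Volume-decrease coefficient q_n = sqrt((n-1)/(n+1)) * (n/sqrt(n^2-1))^n."""
    return math.sqrt((n - 1) / (n + 1)) * (n / math.sqrt(n * n - 1)) ** n

def iters_per_decade(n):
    """K = -ln(10)/ln(q_n): iterations needed to shrink the localizing volume 10-fold."""
    return -math.log(10.0) / math.log(q_n(n))
```

        <p>For n = 30 this gives K ≈ 138 ≈ 4.6 × 30, and q_30 indeed lies just below exp{−1/60}, matching the estimates in the text.</p>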
        <p>The algorithm for finding an approximation to the point x*_pq is implemented in the Octave language. The code of the algorithm is given below.
# Input parameters: #com01
# A(m,n) – observation matrix; #com02
# C(k,n) – regularization summand matrix; #com03
# y(m,1) – output vector; #com04
# p,q – scalar parameters, 1&lt;=p&lt;=2, 1&lt;=q&lt;=2; #com05
# lambda – regularization rate; #com06
# x0(n,1) – starting point; #com07
# r0 – radius of the ball centered at x0 that localizes x_{pq}^*; #com08
# epsf, maxitn – stop parameters: #com09
# epsf – precision to stop by the value of the function fpq, #com10
# maxitn – maximal number of iterations; #com11
# intp – print information for every intp iteration. #com12
# Output parameters: #com13
# xpq(n,1) – approximation to x_{pq}^*; #com14
# fpq – value of the function f_{pq} at the point xpq; #com15
# itn – the number of iterations; #com16
# ist – exit code: 1 – epsf, 4 – maxitn. #com17
function [xpq,fpq,itn,ist] = empq(y,A,C,p,q,lambda,x0,r0,... #row01
                                  epsf,maxitn,intp) #row02
  n = columns(A); xpq = x0; B = eye(n); r = r0; #row03
  dn = double(n); beta = sqrt((dn-1.d0)/(dn+1.d0)); #row04
  for (itn = 0:maxitn) #row05
    temp1 = A*xpq - y; temp2 = C*xpq; #row06
    fpq = sum(abs(temp1).^p) + lambda*sum(abs(temp2).^q); #row07
    g1 = p*A'*(abs(temp1).^(p-1).*sign(temp1)) + ... #row08
         lambda*q*C'*(abs(temp2).^(q-1).*sign(temp2)); #row09
    if((mod(itn,intp)==0)&amp;&amp;(intp&lt;=maxitn)) #row10
      printf(" itn %4d fpq %14.6e\n",itn,fpq); #row11
    endif #row12
    g = B'*g1; dg = norm(g); #row13
    if(r*dg &lt; epsf) ist = 1; return; endif #row14
    xi = (1.d0/dg)*g; dx = B*xi; #row15
    hs = r/(dn+1.d0); xpq -= hs*dx; #row16
    B += (beta - 1.d0)*dx*xi'; #row17
    r = r/sqrt(1.d0-1.d0/dn)/sqrt(1.d0+1.d0/dn); #row18
  endfor #row19
  ist = 4; #row20
endfunction #row21</p>
        <p>The core of the empq program is located in the for loop (rows 5–19). First, the value of the function f_pq (row 7) and its subgradient at the current point xpq (rows 8–9) are calculated, and the normalized subgradient in the transformed space is formed (rows 13 and 15). If the stop condition (row 14) is satisfied, the algorithm stops its work: the stop in the empq algorithm occurs when the condition r_k‖B_k^T gf_pq(x_k)‖ ≤ ε_f is fulfilled, which guarantees f_pq(x_k) − f*_pq ≤ ε_f. Otherwise, the next point x_{k+1} is calculated (row 16), and the space transformation matrix B_{k+1} (row 17) and the radius r_{k+1} (row 18) are recalculated.</p>
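        <p>For readers who prefer Python, the Octave routine above transcribes almost line by line to NumPy. The following is a sketch under that transcription, not the authors' code:</p>

```python
import numpy as np

def empq(y, A, C, p, q, lam, x0, r0, epsf, maxitn):
    """Ellipsoid method for min f_pq(x) = ||y - Ax||_p^p + lam*||Cx||_q^q.

    Returns (x, f, itn, ist) with ist = 1 (epsf reached) or 4 (maxitn exceeded)."""
    n = A.shape[1]
    x = np.asarray(x0, dtype=float).copy()
    B = np.eye(n)                      # space transformation matrix B_k
    r = float(r0)                      # radius localizing x* in transformed space
    beta = np.sqrt((n - 1.0) / (n + 1.0))
    f = np.inf
    for itn in range(maxitn + 1):
        t1, t2 = A @ x - y, C @ x
        f = np.sum(np.abs(t1) ** p) + lam * np.sum(np.abs(t2) ** q)
        g1 = (p * A.T @ (np.abs(t1) ** (p - 1) * np.sign(t1))
              + lam * q * C.T @ (np.abs(t2) ** (q - 1) * np.sign(t2)))
        g = B.T @ g1
        dg = np.linalg.norm(g)
        if r * dg < epsf:              # guarantees f(x_k) - f* <= epsf
            return x, f, itn, 1
        xi = g / dg                    # normalized subgradient in transformed space
        dx = B @ xi
        x -= (r / (n + 1.0)) * dx      # step to the next point
        B += (beta - 1.0) * np.outer(dx, xi)  # rank-one update of B
        r /= np.sqrt(1.0 - 1.0 / n) * np.sqrt(1.0 + 1.0 / n)
    return x, f, itn, 4
```

        <p>Running it on min ‖y − x‖² (A = C = I, λ = 0) converges to x ≈ y; with C = C_1 or C_2, p = 2, q = 1 and λ &gt; 0 the same routine can perform the 1D TV denoising considered below.</p>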
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Total variation denoising</title>
      <p>In addition to machine learning (in particular, regression and classification models, SVM, etc.), models of the type (1) are commonly used in various applied fields, such as signal processing. This domain belongs to electrical engineering and focuses on processing, analyzing and synthesizing signals of different nature, such as scientific measurements, sounds, images, currency fluctuations [15], etc.</p>
      <p>The problem of interest is total variation (TV) denoising (filtering), which is a noise removal process. Its main principle is that signals with excessive and probably incorrect details have a rather high total variation. The problem considered is to reduce the variation (and therefore unnecessary details) subject to staying as close as possible to the original signal and preserving important details, such as edges. This allows us to achieve, in particular, digital storage efficiency, since preserving a signal in such a form requires much less memory, as well as simplicity of analysis. The concept described is known today as the ROF model [16].</p>
      <p>The total variation problem can be formulated using the described model (1). We need to find an approximation x_i, i = 1,…,n, to a known signal y_i, i = 1,…,n; the first summand of the function f_pq(x) measures the closeness between them, while the second summand represents the total variation and can be defined in different ways through the choice of the matrix C. So, the total variation problem can be rewritten using model (1) in the following way:
f_pq(x) = Σ_{i=1}^{n} |y_i − x_i|^p + λ Σ_{i=1}^{k} |Σ_{j=1}^{n} c_ij x_j|^q → min_{x ∈ R^n}.   (10)
Here the matrix A is of size n × n and equals the identity matrix, and λ ≥ 0 is a regularization parameter used to regulate the intensity of denoising. For instance, let p = 2, q = 1, and let C_1 be the following (n−1) × n matrix:</p>
      <p>C_1 =
( 1 −1  0  ⋯  0  0
  0  1 −1  ⋯  0  0
  ⋯  ⋯  ⋯  ⋯  ⋯  ⋯
  0  0  0  ⋯  1 −1 ),
which is two-diagonal with band (1, −1) (all other elements equal zero). With this choice the model f_pq(x) turns into the following model:
f_{2,1}(x; C_1) = Σ_{i=1}^{n} (y_i − x_i)² + λ Σ_{i=1}^{n−1} |x_i − x_{i+1}| → min_{x ∈ R^n}.   (11)
As a result of the multiplication C_1 x, we get the vector C_1 x = (x_1 − x_2, x_2 − x_3, …, x_{n−1} − x_n)^T, which consists of pairwise differences between consecutive elements of the vector x. Thus, the second summand in (11) represents the total variation, and since the function f_{2,1}(x; C_1) is minimized, the vector C_1 x is expected to be sparse if the parameter λ is big enough. So, we can regulate the denoising level using the parameter λ: if λ = 0, there is no smoothing and we simply recover the signal y_i, i = 1,…,n, precisely; as λ → ∞, the total variation decreases and the filtering works more intensively. As a result of this process, we get a piecewise constant function.</p>
      <p>Another form of matrix C that can be used is as follows:</p>
      <p>C_2 =
( 1 −2  1  ⋯  0  0  0
  0  1 −2  ⋯  0  0  0
  ⋯  ⋯  ⋯  ⋯  ⋯  ⋯  ⋯
  0  0  0  ⋯  1 −2  1 ).
The matrix C_2 is a tridiagonal matrix of size (n−2) × n with band (1, −2, 1). If p = 2 and q = 1, we get the following model:
f_{2,1}(x; C_2) = Σ_{i=1}^{n} (y_i − x_i)² + λ Σ_{i=1}^{n−2} |x_i − 2x_{i+1} + x_{i+2}| → min_{x ∈ R^n}.   (12)
In this case, the elements of the sparse vector C_2 x can be interpreted as difference analogues of the second derivative, and we get a piecewise linear function as a result of the filtering process. Moreover, if q = 2, we obtain a smooth second-derivative approximation, which yields a smoother piecewise function as a result.</p>
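      <p>The difference matrices C_1 and C_2 are easy to build programmatically; a small Python/NumPy sketch (the helper names make_C1 and make_C2 are illustrative):</p>

```python
import numpy as np

def make_C1(n):
    """(n-1) x n bidiagonal matrix with band (1, -1): (C1 x)_i = x_i - x_{i+1}."""
    C = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    C[idx, idx] = 1.0
    C[idx, idx + 1] = -1.0
    return C

def make_C2(n):
    """(n-2) x n tridiagonal matrix with band (1, -2, 1): second differences."""
    C = np.zeros((n - 2, n))
    idx = np.arange(n - 2)
    C[idx, idx] = 1.0
    C[idx, idx + 1] = -2.0
    C[idx, idx + 2] = 1.0
    return C
```

      <p>Note that C_2 annihilates affine signals (C_2 x = 0 whenever x is linear in the index), which is why minimizing (12) yields piecewise linear reconstructions.</p>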
      <p>The total variation denoising technique is of particular interest in the field of image processing, where noisy (stained) or corrupted images are to be cleaned and restored. TV-regularization allows one to smooth away noise while preserving edges, unlike the well-known linear smoothing or median filtering techniques [17]. In this case, a 2D signal y_{i,j} is considered, and the total variation, according to [16], can be written in the isotropic form
V(y) = Σ_{i,j} √(|y_{i+1,j} − y_{i,j}|² + |y_{i,j+1} − y_{i,j}|²),   (14)
or in its anisotropic version
V_an(y) = Σ_{i,j} |y_{i+1,j} − y_{i,j}| + |y_{i,j+1} − y_{i,j}|.   (15)</p>
      <p>Such a problem can still be formulated using model (1) with p = 2 and q = 1. Common methods used for solving this problem are primal-dual interior-point methods [18] or the split Bregman method [19]. It should also be noted that there is a popular approach based on reducing the nonsmooth problems (11) and (12) to a saddle-point form. For instance, we can rewrite (11) as
min_{x ∈ R^n} max_{p ∈ B_∞} (‖x − y‖_2² + (C_1 x, p)),
where B_∞ = {p : max_i |p_i| ≤ λ}. Then, to obtain an approximate solution of the smooth min-max problem, one can use extragradient-type methods [20, 21] or proximal primal-dual algorithms [22].</p>
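      <p>As an illustration, the isotropic and anisotropic total variations above can be computed for a 2D array as follows (a Python sketch; in the isotropic sum, boundary terms where i+1 or j+1 leaves the grid are simply omitted):</p>

```python
import numpy as np

def tv_iso(y):
    """Isotropic TV (14): sum of sqrt(dv^2 + dh^2) over pixels with both neighbors."""
    dv = y[1:, :-1] - y[:-1, :-1]   # vertical differences  y[i+1,j] - y[i,j]
    dh = y[:-1, 1:] - y[:-1, :-1]   # horizontal differences y[i,j+1] - y[i,j]
    return np.sum(np.sqrt(dv ** 2 + dh ** 2))

def tv_aniso(y):
    """Anisotropic TV (15): sum of |vertical| + |horizontal| differences."""
    return np.sum(np.abs(y[1:, :] - y[:-1, :])) + np.sum(np.abs(y[:, 1:] - y[:, :-1]))
```

      <p>Both variants vanish on constant images and grow with the amount of oscillation, which is exactly the behavior the regularization term penalizes.</p>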
      <p>Since the problems considered remain convex (but not necessarily smooth) if p ≥ 1 and q ≥ 1, some efficient nonsmooth optimization methods can be used for solving them, such as the ellipsoid method (see Section 3) or Shor's r-algorithms [12].</p>
    </sec>
    <sec id="sec-5">
      <title>5. Computational experiments</title>
      <p>In this section, we present the results of computational experiments obtained using the empq algorithm for the 1D TV denoising problem.</p>
      <p>Test 1. Let us consider a piecewise constant function f_1(x) whose domain is divided into three intervals [0, 40], [40, 70], and [70, 100]. For each of the intervals j, j = 1, 2, 3, the value of the function f_1(x) is calculated via the formula
f_1^j(x) = randint_j(30) + sign(u)(|u| + 3),
where randint_j(30) denotes a random integer from the interval [1, 30], and u is a random number from the uniform distribution on the interval [−1, 1]; sign and |∙| are the signum function and the absolute value function, respectively. Such a construction of the range of the function f_1(x) means that to each of the constant functions on the intervals j, j = 1, 2, 3, a random noise is added. To restore the original piecewise constant function f_1(x), model (11) is used, which corresponds to model (1) with p = 2, q = 1 and C = C_1. Results of the empq algorithm's work for this problem with two values of λ are presented in Figure 2.</p>
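      <p>A sketch of how such a noisy test signal can be generated in Python (the helper name is illustrative; drawing the level randint_j(30) once per interval is an assumption about the construction described above):</p>

```python
import numpy as np

def make_test1_signal(rng=None):
    """Piecewise constant levels on [0,40), [40,70), [70,100) plus sign(u)*(|u|+3) noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    bounds = [(0, 40), (40, 70), (70, 100)]
    clean = np.empty(100)
    for lo, hi in bounds:
        clean[lo:hi] = rng.integers(1, 31)   # randint_j(30): assumed one level per interval
    u = rng.uniform(-1.0, 1.0, size=100)     # u ~ Uniform[-1, 1]
    noisy = clean + np.sign(u) * (np.abs(u) + 3.0)
    return clean, noisy
```

      <p>By construction, every noise term has magnitude between 3 and 4, so the noisy curve oscillates around the three constant levels.</p>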
      <p>In Figure 2, the initial piecewise constant function is colored red, its noised version f_1(x) is colored dark blue, and its denoised version, obtained using the empq algorithm, is colored green. As can be seen, with λ = 10 the restored curve stays close to the initial noisy curve: it tends to duplicate its ups and downs, showing its trend. On the contrary, the parameter value λ = 50 allows one to restore the initial piecewise constant function; thus the selection of λ makes it possible to regulate the intensity of noise filtering.</p>
      <p>Test 2. Consider the following piecewise linear function f_2(x):
f_2(x) = { x, if x ∈ [1, 30];  −3x + 120, if x ∈ [31, 60];  −60, if x ∈ [61, 80];  2x − 220, if x ∈ [81, 100] }.
To simulate noise, we use the value sign(u)(|u| + 4), analogous to the previous test, where u is a random number from the uniform distribution on the interval [−1, 1]. We use model (12), i.e. the parameters p = 2, q = 1 and C = C_2. Results of filtering the noisy piecewise linear function using the empq algorithm for two values of λ are presented in Figure 3. Here, the initial piecewise linear function is colored red, its noisy version f_2(x) is colored blue, and the denoised version of the initial function is colored green.</p>
      <p>As can be seen from Figure 3, using the matrix C_2 works better when a piecewise linear function is denoised rather than a piecewise constant one. With λ = 10 we observe how the denoised curve tends to duplicate the noised one (left picture in Figure 3), but with λ = 50 we get a denoised piecewise linear function much closer to the initial one (right picture in Figure 3). It should also be noted that choosing the parameter q = 2 in model (1) means using a smooth second-derivative approximation, so the denoised curve obtained has its edges smoothed away. So, the parameters p and q, as well as the regularization coefficient λ and the matrix C, allow one to control the level of the denoising process.</p>
      <p>Test 3. Along with somewhat synthetic examples, which are nevertheless quite important for demonstrating the structure of the models described, let us consider the results of computational experiments with a real dataset, namely bitcoin price evolution. The dataset was collected from the Binance API, with open price data captured at one-minute intervals from 11:40 to 13:19 on 01/08/2023. We use model (1) with p, q ∈ {1, 2}, λ ∈ {10, 30, 50, 70}, and matrix C ∈ {C_1, C_2}. The results of using the empq algorithm for denoising the given bitcoin dataset are presented in Figure 4 and Figure 5. Here, the bitcoin open price data is colored blue, and the denoised curve is colored green.</p>
      <p>As mentioned before, using matrix C = C_1 we obtain a piecewise constant function as a result of the denoising process (see Figure 4), while using matrix C = C_2 we get a piecewise linear approximation (see Figure 5), which fits the data much better. In the first case, a further increase of λ leads to ignoring important details and edges in the data, since the resulting curve tends toward a constant. In the second case, this tendency remains, but more details are preserved in the ups and downs of the piecewise linear function. Finally, using a smooth second-derivative approximation with p = 2, q = 2 and matrix C = C_2 gives a smoothed curve as a result (see Figure 6).</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The use of various smooth and nonsmooth regression models is rather common not only in modern areas of artificial intelligence and machine learning, but also in a number of applied fields, such as image processing and compression, compressed sensing, noise reduction, etc. Problems that appear in these fields are often formulated as nonsmooth or convex optimization problems, so corresponding methods of that kind can be applied.</p>
      <p>The proposed empq algorithm, which is based on the ellipsoid method, is capable of solving the convex optimization problems corresponding to total variation denoising if the number of points in the signal sample is not greater than n = 100–120. This number can be increased to some extent by using more advanced computing capabilities. The general model considered allows one to regulate the denoising level and the smoothness of the resulting curve by changing the parameters p, q and the coefficient λ, and by choosing the proper form of the matrix C to determine the interconnections between elements of the variables vector in the regularization term. The results of the computational experiments demonstrate the prospects and flexibility of such models.</p>
      <p>As mentioned before, for solving the problems considered it is also appropriate to use other types of methods, such as splitting methods [23], various extragradient schemes, Shor's r-algorithms [11, 12], etc. The latter are based on a space transformation procedure in the direction of two successive subgradients and provide accelerated convergence when minimizing nonsmooth convex functions of thousands of variables. But the main advantage of the algorithm proposed is obtaining a solution with any given accuracy, whereas other methods are usually limited in this respect. Nevertheless, the application of these methods to the problem considered is of great interest; an investigation and comparison of the methods' efficiency will be conducted in further publications.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>
        The paper is supported by the NASU grant for research laboratories/groups of young scientists № 02/01-2024(5), and the DTT TS KNU NASU project № 0124U002162.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.V.</given-names>
            <surname>Sergienko</surname>
          </string-name>
          ,
          <source>Methods of Optimization and Systems Analysis for Problems of Transcomputational Complexity</source>
          , Springer, New York, NY, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Friedman,</surname>
          </string-name>
          <article-title>The Elements of Statistical Learning: Data Mining, Inference, and</article-title>
          <string-name>
            <surname>Prediction</surname>
          </string-name>
          , 2nd ed., Springer Series in Statistics, Textbook, Springer, Berlin/Heidelberg, Germany,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          , 
          <source>Digital Image Processing</source>
          , Pearson, New York, NY,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>Regression Shrinkage and Selection via the Lasso</article-title>
          ,
          <source>Journal of the Royal Statistical Society, Series B (Methodological) 58 (1)</source>
          (
          <year>1996</year>
          )
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          . doi:10.1111/j.2517-6161.1996.tb02080.x. JSTOR 2346178.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Donoho</surname>
          </string-name>
          ,
          <article-title>Basis pursuit</article-title>
          , in:
          <source>Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers</source>
          , Vol.
          <volume>1</volume>
          ,
          <year>1994</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>44</lpage>
          . doi:10.1109/ACSSC.1994.471413.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Hoerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Kennard</surname>
          </string-name>
          ,
          <article-title>Ridge regression: Biased estimation for nonorthogonal problems</article-title>
          ,
          <source>Technometrics 12 (1)</source>
          (
          <year>1970</year>
          )
          <fpage>55</fpage>
          -
          <lpage>67</lpage>
          . doi:10.1080/00401706.1970.10488634.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          ,
          <source>A Guide to Econometrics</source>
          , 5th ed., The MIT Press, Cambridge,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.J.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.M.</given-names>
            <surname>Ronchetti</surname>
          </string-name>
          ,
          <source>Robust Statistics</source>
          , 2nd ed., John Wiley &amp; Sons,
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <article-title>On curve estimation by minimizing mean absolute deviation and its implications</article-title>
          ,
          <source>The Annals of Statistics</source>
          (
          <year>1994</year>
          )
          <fpage>867</fpage>
          -
          <lpage>885</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.Z.</given-names>
            <surname>Shor</surname>
          </string-name>
          ,
          <article-title>Cutting-off Method with Space Dilation for Solving Convex Programming Problems</article-title>
          ,
          <source>Cybernetics</source>
          (
          <year>1977</year>
          )
          <fpage>94</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.Z.</given-names>
            <surname>Shor</surname>
          </string-name>
          ,
          <source>Nondifferentiable Optimization and Polynomial Problems</source>
          , Kluwer, Amsterdam,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.Z.</given-names>
            <surname>Shor</surname>
          </string-name>
          ,
          <source>Minimization Methods for Non-Differentiable Functions</source>
          , Springer-Verlag, Berlin,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grötschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lovász</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schrijver</surname>
          </string-name>
          ,
          <source>Geometric Algorithms and Combinatorial Optimization</source>
          , 2nd ed., Springer-Verlag, Berlin, Heidelberg,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Stetsyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khomyak</surname>
          </string-name>
          ,
          <article-title>The Generalized Ellipsoid Method and Its Implementation</article-title>
          ,
          <source>Communications in Computer and Information Science</source>
          (
          <year>2020</year>
          )
          <fpage>355</fpage>
          -
          <lpage>370</lpage>
          . doi:10.1007/978-3-030-38603-0_26.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Oppenheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <source>Discrete-Time Signal Processing</source>
          , Prentice Hall,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fatemi</surname>
          </string-name>
          ,
          <article-title>Nonlinear total variation based noise removal algorithms</article-title>
          ,
          <source>Physica D</source>
          <volume>60</volume>
           (
          <issue>1-4</issue>
          ) (
          <year>1992</year>
          ) 
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Strong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>Edge-preserving and scale-dependent properties of total variation regularization</article-title>
          ,
          <source>Inverse Problems 19 (6)</source>
          (
          <year>2003</year>
          ) 
          <fpage>S165</fpage>
          -
          <lpage>S187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <article-title>On the Implementation of a Primal-Dual Interior Point Method</article-title>
          ,
          <source>SIAM Journal on Optimization 2 (4)</source>
          (
          <year>1992</year>
          ) 
          <fpage>575</fpage>
          -
          <lpage>601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osher</surname>
          </string-name>
          ,
          <article-title>The Split Bregman Method for L1-Regularized Problems</article-title>
          ,
          <source>SIAM J. Imaging Sci. 2 (2)</source>
          (
          <year>2008</year>
          ) 
          <fpage>323</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Yu.</given-names>
            <surname>Malitsky</surname>
          </string-name>
          ,
          <article-title>Projected Reflected Gradient Methods for Monotone Variational Inequalities</article-title>
          ,
          <source>SIAM Journal on Optimization</source>
          <volume>25</volume>
          (
          <year>2015</year>
          )
          <fpage>502</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Denisov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Nomirovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. V.</given-names>
            <surname>Rublyov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Semenov</surname>
          </string-name>
          ,
          <article-title>Convergence of Extragradient Algorithm with Monotone Step Size Strategy for Variational Inequalities and Operator Equations</article-title>
          ,
          <source>Journal of Automation and Information Sciences</source>
          <volume>51</volume>
          (
          <year>2019</year>
          )
          <fpage>12</fpage>
          -
          <lpage>24</lpage>
          . doi:10.1615/JAutomatInfScien.v51.i6.20.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chambolle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pock</surname>
          </string-name>
          ,
          <article-title>A first-order primal-dual algorithm for convex problems with applications to imaging</article-title>
          ,
          <source>J. Math. Imaging Vis.</source>
          <volume>40</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          )
          <fpage>120</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Splitting Methods for Nonconvex and Nonsmooth Optimization</article-title>
          ,
          <source>Encyclopedia of Optimization</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . doi:10.1007/978-3-030-54621-2_847-1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>