<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Ellipsoid Method for Nonsmooth Regression</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Petro Stetsyuk</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Volodymyrska Street, 60, Kyiv, 01033</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>V.M. Glushkov Institute of Cybernetics of the NASU</institution>
          ,
          <addr-line>Academician Glushkov Avenue, 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Regression-type problems are of great interest and importance due to their numerous applications in modern science and technology. Both smooth and nonsmooth models of that type require effective methods for finding unknown parameters with sufficient accuracy. Presented in the paper is a general model that covers several well-known methods, such as LASSO and RIDGE regression. The empq algorithm, based on Shor's ellipsoid method, is proposed for minimizing the model function. A brief description of the ellipsoid method is given, as well as a convergence theorem, which allows one to estimate the computational complexity of the empq algorithm. Results of three computational experiments on using the general model for solving the 1D total variation (TV) denoising problem are presented. In the first two experiments piecewise constant and piecewise linear functions are restored; in the third experiment the bitcoin open price curve is denoised. The obtained results show the prospects of using the general model and the empq algorithm for solving regression and image processing problems.</p>
      </abstract>
      <kwd-group>
        <kwd>ellipsoid method</kwd>
        <kwd>nonsmooth regression models</kwd>
        <kwd>convex optimization</kwd>
        <kwd>total variation denoising</kwd>
        <kwd>ROF model</kwd>
        <kwd>regularization</kwd>
        <kwd>image processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Regression-type problems with nonsmooth functions are known to be one of the main areas of research in mathematical programming and its applications. This is largely motivated by the constant emergence of new application areas and by progress in computing technologies, which provides new computing paradigms – cluster computing architectures, grid and cloud computing – that correspond well to the mathematical ideas underlying nonsmooth optimization [1].</p>
      <p>Classical regression models, which are an important type of supervised learning problem, have many practical applications. They are among the most statistically justified models, and intensive work on their generalization and on improving the accuracy of the solutions obtained is actively underway [2]. In particular, one issue of that kind is the study of the family of regression model training methods between the least moduli method and the least squares method. The least moduli method ensures robust estimation of model parameters, but is more difficult to use than the least squares method due to the nonsmoothness of its loss function.</p>
      <p>Using generalized regression models with the ability to control several specific parameters permits a qualitative impact on solutions, changing them depending on needs and external conditions. This makes such models extremely useful in areas such as digital signal and image processing [3], image compression, compressed sensing, etc. Noise of various types appears continuously, and even a small amount of it can significantly complicate obtaining accurate solutions. Since these models remain convex for all used parameter intervals (though they may be nonsmooth), the development of general and special efficient methods for minimizing them is of great importance. Moreover, denoising processes for images require significant computational power because of the size and detail of images, so these methods must be able to deal with high-dimensional problems and obtain solutions with sufficient accuracy.</p>
      <p>This paper presents a general model that covers some well-known methods like LASSO and
RIDGE regression, and proposes a new algorithm based on Shor’s ellipsoid method for its
minimization. Further, the total variation (TV) denoising problem is considered, as well as the way it
can be formulated using the general model proposed. Finally, results of three computational
experiments dedicated to 1D TV denoising are presented that show computational possibilities of the
algorithm and the model proposed.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Formulation of the general model</title>
      <p>Let matrices A = {a_ij}, i = 1,…,m, j = 1,…,n, and C = {c_ij}, i = 1,…,k, j = 1,…,n, and a vector y = {y_i}, i = 1,…,m, be given. We consider the following convex optimization problem:
f_pq(x) = Σ_{i=1}^{m} |y_i − Σ_{j=1}^{n} a_ij x_j|^p + λ Σ_{i=1}^{k} |Σ_{j=1}^{n} c_ij x_j|^q → min_{x ∈ R^n},   (1)
where f*_pq = f_pq(x*_pq) = min_{x ∈ R^n} f_pq(x). Here x = {x_j}, j = 1,…,n, is a vector of variables; λ ≥ 0 and 1 ≤ p, q ≤ 2 are given scalars. The function f_pq(x) is nonsmooth if p = 1 or q = 1, and smooth if p &gt; 1 and q &gt; 1. The problem (1) can be rewritten in matrix form as follows:
f_pq(x) = ‖y − Ax‖_p^p + λ‖Cx‖_q^q → min_{x ∈ R^n},   (2)
where ‖∙‖_p^p is the p-th power of the Lp-norm of a vector x ∈ R^n, which is defined as ‖x‖_p^p = Σ_{i=1}^{n} |x_i|^p.</p>
      <p>In the problem of finding the parameters of regression models, the m × n matrix A is usually called a regression or observation matrix, and the first summand of the function f_pq(x) is the p-th power of the Lp-norm of the residuals y − Ax of a regression model. The second summand of the function f_pq(x) is considered as a regularization term, where λ ≥ 0 is a regularization parameter. The k × n matrix C defines the interconnections between the variables x_j, j = 1,…,n, in the regularization term. For particular values of the parameters p, q and a particular choice of the matrix C, the model (1) covers several well-known models. In particular, if C = I, p = 2 and q = 1, we get the so-called LASSO regression model [4]:
f^LASSO_{2,1}(x) = ‖y − Ax‖_2^2 + λ Σ_{i=1}^{n} |x_i| → min_{x ∈ R^n},   (3)
which minimizes the nonsmooth function f^LASSO_{2,1}(x) and uses L1-regularization. This method assumes that the coefficients of the model are sparse, meaning that few of them are non-zero. LASSO is closely related to basis pursuit denoising [5] in the field of signal processing and has potential applications in image compression and compressed sensing.</p>
      <p>If C = I, p = 2 and q = 2, we get the so-called RIDGE regression model [6]:
f^RIDGE_{2,2}(x) = ‖y − Ax‖_2^2 + λ Σ_{i=1}^{n} x_i^2 → min_{x ∈ R^n},   (4)
which minimizes the strictly convex smooth function f^RIDGE_{2,2}(x) and uses L2-regularization. This model is usually used for estimating the coefficients of a regression model when the independent variables are highly correlated. It is also useful for dealing with multicollinearity, which commonly occurs in models with a large number of parameters [7].</p>
      <p>Finally, when λ = 0, the problem (1) corresponds to the well-known least squares method (LSM) if p = 2, and to the least moduli method (LMM) if p = 1. LSM plays a key role in obtaining estimates and finding unknown parameters in statistics, while LMM has proven to be robust to anomalous observations, or outliers [8, 9].</p>
      <p>The problem (1) is a problem of unconstrained minimization of the convex function f_pq(x), a subgradient of which at the point x has the components
(gf_pq(x))_j = p Σ_{i=1}^{m} |(Ax − y)_i|^{p−1} sign((Ax − y)_i) a_ij + λq Σ_{i=1}^{k} |(Cx)_i|^{q−1} sign((Cx)_i) c_ij, j = 1,…,n,   (5)
or, in matrix-vector form,
gf_pq(x) = p A^T (|Ax − y|^{p−1} ⊙ sign(Ax − y)) + λq C^T (|Cx|^{q−1} ⊙ sign(Cx)),   (6)
where |t| is the modulus (absolute value) of the number t, applied elementwise to vectors, and ⊙ denotes the elementwise Hadamard product.</p>
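      <p>To make the matrix form (2) and the subgradient formula (6) concrete, here is a minimal NumPy sketch (Python rather than the paper's Octave; the names fpq and grad_fpq are illustrative, not from the paper):</p>

```python
import numpy as np

def fpq(x, y, A, C, p, q, lam):
    """Objective (2): ||y - Ax||_p^p + lam * ||Cx||_q^q."""
    r = A @ x - y
    t = C @ x
    return np.sum(np.abs(r) ** p) + lam * np.sum(np.abs(t) ** q)

def grad_fpq(x, y, A, C, p, q, lam):
    """Subgradient (6): p*A^T(|Ax-y|^{p-1} . sign(Ax-y)) + lam*q*C^T(|Cx|^{q-1} . sign(Cx))."""
    r = A @ x - y
    t = C @ x
    return (p * A.T @ (np.abs(r) ** (p - 1) * np.sign(r))
            + lam * q * C.T @ (np.abs(t) ** (q - 1) * np.sign(t)))
```

      <p>For p = q = 2 this subgradient reduces to the ordinary gradient 2A^T(Ax − y) + 2λC^T Cx, which gives a quick correctness check.</p>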
      <p>So, the general model (1) allows us to cover several well-known models and methods with different properties and capabilities. Since the function f_pq(x) is convex and can be either smooth or nonsmooth, we can consider using efficient nonsmooth optimization methods for its minimization. In particular, Shor's ellipsoid method [10, 11, 12] can be used, which is implemented as the emshor program [13]. In the next section, a brief description of the ellipsoid method is given, as well as an Octave implementation of the empq algorithm, which is designed for minimizing the function f_pq(x).</p>
    </sec>
    <sec id="sec-3">
      <title>3. Ellipsoid method and the empq algorithm</title>
      <p>The Yudin–Nemirovsky–Shor ellipsoid method is based on using the ellipsoid of minimal volume in E^n that contains the semi-ball obtained as the intersection of an n-dimensional ball and a half-space passing through its center. The ellipsoid has a flattened shape in the direction of the normal to the hyperplane that passes through the center of the ball with radius r. Its parameters (see Figure 1) are as follows: a is the length of the minor semi-axis in the direction of the normal, which defines the semi-ball; b is the length of a major semi-axis (the number of such axes equals n−1); h is the distance from the center of the ball to the center of the ellipsoid in the direction of its minor semi-axis.</p>
      <p>An iteration of the ellipsoid method proceeds from the current ellipsoid to the next one with a constant coefficient of volume decrease. This coefficient is determined by the ratio of the volume of the ellipsoid with semi-axes a and b to the volume of the ball with radius r in E^n and can be written as follows:
q_n = (a/r)(b/r)^{n−1} = (n/(n+1)) (n/√(n²−1))^{n−1} &lt; 1.
It is shown in [13] that q_n &lt; exp{−1/(2n)} &lt; 1, so for big values of n the coefficient q_n is approximated by the asymptotic formula q_n ≈ 1 − 1/(2n).</p>
      <p>To use the ellipsoid method for finding the minimum point x*_pq of the problem (1), we must ensure that it is localized in an n-dimensional ball of radius r_0 centered at the point x_0 ∈ R^n, i.e. ‖x_0 − x*_pq‖ ≤ r_0. The algorithm to be used is called the empq algorithm; its description is given below.</p>
      <p>The empq algorithm. The input parameter of the algorithm is the accuracy ε_f &gt; 0 with which f*_pq = f_pq(x*_pq) is to be found.</p>
      <p>Initialization. Consider an n × n matrix B and set B_0 := I_n, where I_n is the n × n identity matrix. We move to the first iteration with the values x_0, r_0 and B_0. Let the values x_k ∈ R^n, r_k, B_k be found at iteration k. The transition to iteration k+1 consists of the following sequence of steps.</p>
      <p>Step 1. Calculate f_pq(x_k) and the subgradient gf_pq(x_k) at the point x_k using formula (5) or (6). If r_k‖B_k^T gf_pq(x_k)‖ ≤ ε_f, then stop: k* = k and x*_pq = x_k. Otherwise, proceed to Step 2.</p>
      <sec id="sec-3-1">
        <title>Step 2. Calculate the normalized subgradient</title>
        <p>ξ_k := B_k^T gf_pq(x_k) / ‖B_k^T gf_pq(x_k)‖.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Steps 3–5 and the convergence theorem</title>
        <p>Step 3. Calculate the next point x_{k+1} := x_k − h_k B_k ξ_k, where h_k = r_k / (n + 1).</p>
        <p>Step 4. Calculate B_{k+1} := B_k + (√((n−1)/(n+1)) − 1) (B_k ξ_k) ξ_k^T and r_{k+1} := r_k n / √(n²−1).</p>
        <p>Step 5. Go to iteration k+1 with the values x_{k+1}, r_{k+1}, B_{k+1}.</p>
        <p>Theorem. The sequence of points {x_k}_{k=0}^{k*} satisfies the inequalities ‖B_k^{−1}(x_k − x*_pq)‖ ≤ r_k, k = 0, 1, 2, …, k*. On each iteration k &gt; 0 the coefficient of volume decrease of the ellipsoid E_k = {x ∈ R^n : ‖B_k^{−1}(x_k − x)‖ ≤ r_k}, which localizes the point x*_pq, is constant and equal to
q_n = vol(E_k) / vol(E_{k−1}) = √((n−1)/(n+1)) (n / √(n²−1))^n &lt; exp{−1/(2n)} &lt; 1.</p>
        <p>The theorem implies that the ellipsoid method converges at the rate of a geometric progression with coefficient q_n &lt; exp{−1/(2n)} &lt; 1 [13]. It also allows one to estimate the computational complexity of the empq algorithm for finding x*_pq and to affirm that it can be successfully run on modern computers if n = 30–100. Indeed, to decrease the volume of the ellipsoid localizing the point x*_pq by a factor of 10, we need to perform K iterations, where K = −ln 10 / ln q_n ≈ (2 ln 10) n ≈ 4.6 n. This means that in order to improve the deviation of the found record value of the function f_pq(x) from its optimal value f*_pq by a factor of 10, it is necessary to perform about 4.6 n iterations of the algorithm for finding x*_pq.</p>
        <p>If n = 30 and ε_f = 10^{−6} × f(x_0), then the maximal number of iterations of the algorithm equals 46 × 900 = 41 400; if n = 100, the maximal number equals 460 000 iterations. Therefore, even a straightforward matrix-vector implementation of the calculation of the function value f_pq(x) and its subgradient according to formula (5) provides fast operation of the algorithm on modern computers. Below we confirm this with the results of computational experiments performed using an Intel Core i7-10750H processor (2.6 GHz, 16 GB RAM) and the GNU Octave 6.3.0 language.</p>
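        <p>The volume-decrease coefficient q_n and the iteration count K can be checked numerically; a small Python sketch (the helper names q_n and iters_per_decade are illustrative):</p>

```python
import math

def q_n(n):
    """Volume-decrease coefficient q_n = sqrt((n-1)/(n+1)) * (n/sqrt(n^2-1))^n."""
    return math.sqrt((n - 1) / (n + 1)) * (n / math.sqrt(n * n - 1)) ** n

def iters_per_decade(n):
    """K = -ln(10)/ln(q_n): iterations needed to shrink the localizing volume 10-fold."""
    return -math.log(10.0) / math.log(q_n(n))
```

        <p>For n = 30 this gives K ≈ 138 ≈ 4.6 × 30, and q_30 indeed lies just below exp{−1/60}, matching the estimates in the text.</p>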
        <p>The algorithm for finding an approximation to the point x*_pq is implemented in the Octave language. The code of the algorithm is given below.
# Input parameters: #com01
# A(m,n) – observation matrix; #com02
# C(k,n) – regularization summand matrix; #com03
# y(m,1) – output vector; #com04
# p,q – scalar parameters, 1&lt;=p&lt;=2, 1&lt;=q&lt;=2; #com05
# lambda – regularization rate; #com06
# x0(n,1) – starting point; #com07
# r0 – radius of the ball centered at x0 that localizes x_{pq}^*; #com08
# epsf, maxitn – stop parameters: #com09
# epsf – precision to stop by the value of the function fpq, #com10
# maxitn – maximal number of iterations; #com11
# intp – print information for every intp iteration. #com12
# Output parameters: #com13
# xpq(n,1) – approximation to x_{pq}^*; #com14
# fpq – value of the function f_{pq} at the point xpq; #com15
# itn – the number of iterations; #com16
# ist – exit code: 1 – epsf, 4 – maxitn. #com17
function [xpq,fpq,itn,ist] = empq(y,A,C,p,q,lambda,x0,r0,... #row01
                                  epsf,maxitn,intp) #row02
  n = columns(A); xpq = x0; B = eye(n); r = r0; #row03
  dn = double(n); beta = sqrt((dn-1.d0)/(dn+1.d0)); #row04
  for (itn = 0:maxitn) #row05
    temp1 = A*xpq - y; temp2 = C*xpq; #row06
    fpq = sum(abs(temp1).^p) + lambda*sum(abs(temp2).^q); #row07
    g1 = p*A'*(abs(temp1).^(p-1).*sign(temp1)) + ... #row08
         lambda*q*C'*(abs(temp2).^(q-1).*sign(temp2)); #row09
    if((mod(itn,intp)==0)&amp;&amp;(intp&lt;=maxitn)) #row10
      printf(" itn %4d fpq %14.6e\n",itn,fpq); #row11
    endif #row12
    g = B'*g1; dg = norm(g); #row13
    if(r*dg &lt; epsf) ist = 1; return; endif #row14
    xi = (1.d0/dg)*g; dx = B*xi; #row15
    hs = r/(dn+1.d0); xpq -= hs*dx; #row16
    B += (beta - 1.d0)*dx*xi'; #row17
    r = r/sqrt(1.d0-1.d0/dn)/sqrt(1.d0+1.d0/dn); #row18
  endfor #row19
  ist = 4; #row20
endfunction #row21</p>
        <p>The core of the empq program is located in the for loop (rows 5–19). First, the value of the function f_pq (row 7) and its subgradient at the current point xpq (rows 8–9) are calculated, and the normalized subgradient in the transformed space is formed (rows 13 and 15). If the stop condition (row 14) is satisfied, the algorithm stops its work: the stop in the empq algorithm occurs when the condition r_k‖B_k^T gf_pq(x_k)‖ ≤ ε_f is fulfilled, which guarantees f_pq(x_k) − f*_pq ≤ ε_f. Otherwise, the next point x_{k+1} is calculated (row 16), and the space transformation matrix B_{k+1} (row 17) and the radius r_{k+1} (row 18) are recalculated.</p>
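        <p>For readers who prefer Python, the Octave routine above transcribes almost line by line to NumPy. The following is a sketch under that transcription, not the authors' code:</p>

```python
import numpy as np

def empq(y, A, C, p, q, lam, x0, r0, epsf, maxitn):
    """Ellipsoid method for min f_pq(x) = ||y - Ax||_p^p + lam*||Cx||_q^q.

    Returns (x, f, itn, ist) with ist = 1 (epsf reached) or 4 (maxitn exceeded)."""
    n = A.shape[1]
    x = np.asarray(x0, dtype=float).copy()
    B = np.eye(n)                      # space transformation matrix B_k
    r = float(r0)                      # radius localizing x* in transformed space
    beta = np.sqrt((n - 1.0) / (n + 1.0))
    f = np.inf
    for itn in range(maxitn + 1):
        t1, t2 = A @ x - y, C @ x
        f = np.sum(np.abs(t1) ** p) + lam * np.sum(np.abs(t2) ** q)
        g1 = (p * A.T @ (np.abs(t1) ** (p - 1) * np.sign(t1))
              + lam * q * C.T @ (np.abs(t2) ** (q - 1) * np.sign(t2)))
        g = B.T @ g1
        dg = np.linalg.norm(g)
        if r * dg < epsf:              # guarantees f(x_k) - f* <= epsf
            return x, f, itn, 1
        xi = g / dg                    # normalized subgradient in transformed space
        dx = B @ xi
        x -= (r / (n + 1.0)) * dx      # step to the next point
        B += (beta - 1.0) * np.outer(dx, xi)  # rank-one update of B
        r /= np.sqrt(1.0 - 1.0 / n) * np.sqrt(1.0 + 1.0 / n)
    return x, f, itn, 4
```

        <p>Running it on min ‖y − x‖² (A = C = I, λ = 0) converges to x ≈ y; with C = C_1 or C_2, p = 2, q = 1 and λ &gt; 0 the same routine can perform the 1D TV denoising considered below.</p>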
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Total variation denoising</title>
      <p>In addition to machine learning (in particular, regression and classification models, SVM, etc.), models of the type (1) are commonly used in various applied fields, such as signal processing. This domain belongs to electrical engineering and focuses on processing, analyzing and synthesizing signals of different nature, such as scientific measurements, sounds, images, currency fluctuations [15], etc.</p>
      <p>The problem of interest is total variation (TV) denoising (filtering), which is a noise removal process. Its main principle is that signals with excessive and probably incorrect details have a rather high total variation. The problem considered is to reduce the variation (and therefore unnecessary details) subject to staying as close as possible to the original signal and preserving important details, such as edges. This allows us to achieve, in particular, digital storage efficiency, since preserving a signal in such a form requires much less memory, as well as simplicity of analysis. The concept described is known today as the ROF model [16].</p>
      <p>The total variation problem can be formulated using the described model (1). We need to find an approximation x_i, i = 1,…,n, to a known signal y_i, i = 1,…,n; the first summand of the function f_pq(x) measures the closeness between them, while the second summand represents the total variation and can be defined in different ways through the choice of the matrix C. So, the total variation problem can be rewritten using model (1) in the following way:
f_pq(x) = Σ_{i=1}^{n} |y_i − x_i|^p + λ Σ_{i=1}^{k} |Σ_{j=1}^{n} c_ij x_j|^q → min_{x ∈ R^n}.   (10)
Here the matrix A is of size n × n and equals the identity matrix, and λ ≥ 0 is a regularization parameter used to regulate the intensity of denoising. For instance, let p = 2, q = 1, and let C_1 be the following (n−1) × n matrix:</p>
      <p>C_1 =
( 1 −1  0  ⋯  0  0
  0  1 −1  ⋯  0  0
  ⋯  ⋯  ⋯  ⋯  ⋯  ⋯
  0  0  0  ⋯  1 −1 ),
which is two-diagonal with band (1, −1) (all other elements equal zero). With this choice the model f_pq(x) turns into the following model:
f_{2,1}(x; C_1) = Σ_{i=1}^{n} (y_i − x_i)² + λ Σ_{i=1}^{n−1} |x_i − x_{i+1}| → min_{x ∈ R^n}.   (11)
As a result of the multiplication C_1 x, we get the vector C_1 x = (x_1 − x_2, x_2 − x_3, …, x_{n−1} − x_n)^T, which consists of pairwise differences between consecutive elements of the vector x. Thus, the second summand in (11) represents the total variation, and since the function f_{2,1}(x; C_1) is minimized, the vector C_1 x is expected to be sparse if the parameter λ is big enough. So, we can regulate the denoising level using the parameter λ: if λ = 0, there is no smoothing and we simply recover the signal y_i, i = 1,…,n, precisely; as λ → ∞, the total variation decreases and the filtering works more intensively. As a result of this process, we get a piecewise constant function.</p>
      <p>Another form of matrix C that can be used is as follows:</p>
      <p>C_2 =
( 1 −2  1  ⋯  0  0  0
  0  1 −2  ⋯  0  0  0
  ⋯  ⋯  ⋯  ⋯  ⋯  ⋯  ⋯
  0  0  0  ⋯  1 −2  1 ).
The matrix C_2 is a tridiagonal matrix of size (n−2) × n with band (1, −2, 1). If p = 2 and q = 1, we get the following model:
f_{2,1}(x; C_2) = Σ_{i=1}^{n} (y_i − x_i)² + λ Σ_{i=1}^{n−2} |x_i − 2x_{i+1} + x_{i+2}| → min_{x ∈ R^n}.   (12)
In this case, the elements of the sparse vector C_2 x can be interpreted as difference analogues of the second derivative, and we get a piecewise linear function as a result of the filtering process. Moreover, if q = 2, we obtain a smooth second-derivative approximation, which yields a smoother piecewise function as a result.</p>
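      <p>The difference matrices C_1 and C_2 are easy to build programmatically; a small Python/NumPy sketch (the helper names make_C1 and make_C2 are illustrative):</p>

```python
import numpy as np

def make_C1(n):
    """(n-1) x n bidiagonal matrix with band (1, -1): (C1 x)_i = x_i - x_{i+1}."""
    C = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    C[idx, idx] = 1.0
    C[idx, idx + 1] = -1.0
    return C

def make_C2(n):
    """(n-2) x n tridiagonal matrix with band (1, -2, 1): second differences."""
    C = np.zeros((n - 2, n))
    idx = np.arange(n - 2)
    C[idx, idx] = 1.0
    C[idx, idx + 1] = -2.0
    C[idx, idx + 2] = 1.0
    return C
```

      <p>Note that C_2 annihilates affine signals (C_2 x = 0 whenever x is linear in the index), which is why minimizing (12) yields piecewise linear reconstructions.</p>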
      <p>The total variation denoising technique is of particular interest in the field of image processing, where noisy (stained) or corrupted images are to be cleaned and restored. TV-regularization allows one to smooth away noise while preserving edges, unlike the well-known linear smoothing or median filtering techniques [17]. In this case, a 2D signal y_{i,j} is considered, and the total variation, according to [16], can be written in the isotropic form
V(y) = Σ_{i,j} √(|y_{i+1,j} − y_{i,j}|² + |y_{i,j+1} − y_{i,j}|²),   (14)
or in its anisotropic version
V_an(y) = Σ_{i,j} |y_{i+1,j} − y_{i,j}| + |y_{i,j+1} − y_{i,j}|.   (15)</p>
      <p>Such a problem can still be formulated using model (1) with p = 2 and q = 1. Common methods used for solving this problem are primal-dual interior-point methods [18] or the split Bregman method [19]. It should also be noted that there is a popular approach based on reducing the nonsmooth problems (11) and (12) to a saddle-point form. For instance, we can rewrite (11) as
min_{x ∈ R^n} max_{p ∈ B_∞} (‖x − y‖_2² + (C_1 x, p)),
where B_∞ = {p : max_i |p_i| ≤ λ}. Then, to obtain an approximate solution of the smooth min-max problem, one can use extragradient-type methods [20, 21] or proximal primal-dual algorithms [22].</p>
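      <p>As an illustration, the isotropic and anisotropic total variations above can be computed for a 2D array as follows (a Python sketch; in the isotropic sum, boundary terms where i+1 or j+1 leaves the grid are simply omitted):</p>

```python
import numpy as np

def tv_iso(y):
    """Isotropic TV (14): sum of sqrt(dv^2 + dh^2) over pixels with both neighbors."""
    dv = y[1:, :-1] - y[:-1, :-1]   # vertical differences  y[i+1,j] - y[i,j]
    dh = y[:-1, 1:] - y[:-1, :-1]   # horizontal differences y[i,j+1] - y[i,j]
    return np.sum(np.sqrt(dv ** 2 + dh ** 2))

def tv_aniso(y):
    """Anisotropic TV (15): sum of |vertical| + |horizontal| differences."""
    return np.sum(np.abs(y[1:, :] - y[:-1, :])) + np.sum(np.abs(y[:, 1:] - y[:, :-1]))
```

      <p>Both variants vanish on constant images and grow with the amount of oscillation, which is exactly the behavior the regularization term penalizes.</p>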
      <p>Since the problems considered remain convex (but not necessarily smooth) if p ≥ 1 and q ≥ 1, some efficient nonsmooth optimization methods can be used for solving them, such as the ellipsoid method (see Section 3) or Shor's r-algorithms [12].</p>
    </sec>
    <sec id="sec-5">
      <title>5. Computational experiments</title>
      <p>In this section, we present the results of computational experiments obtained using the empq algorithm for the 1D TV denoising problem.</p>
      <p>Test 1. Let us consider a piecewise constant function f_1(x) whose domain is divided into three intervals [0, 40], [40, 70], and [70, 100]. For each of the intervals j, j = 1, 2, 3, the value of the function f_1(x) is calculated via the formula
f_1^j(x) = randint_j(30) + sign(u)(|u| + 3),
where randint_j(30) denotes a random integer from the interval [1, 30], and u is a random number from the uniform distribution on the interval [−1, 1]; sign and |∙| are the signum function and the absolute value function, respectively. Such a construction of the range of the function f_1(x) means that to each of the constant functions on the intervals j, j = 1, 2, 3, a random noise is added. To restore the original piecewise constant function f_1(x), model (11) is used, which corresponds to model (1) with p = 2, q = 1 and C = C_1. Results of the empq algorithm's work for this problem with two values of λ are presented in Figure 2.</p>
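      <p>A sketch of how such a noisy test signal can be generated in Python (the helper name is illustrative; drawing the level randint_j(30) once per interval is an assumption about the construction described above):</p>

```python
import numpy as np

def make_test1_signal(rng=None):
    """Piecewise constant levels on [0,40), [40,70), [70,100) plus sign(u)*(|u|+3) noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    bounds = [(0, 40), (40, 70), (70, 100)]
    clean = np.empty(100)
    for lo, hi in bounds:
        clean[lo:hi] = rng.integers(1, 31)   # randint_j(30): assumed one level per interval
    u = rng.uniform(-1.0, 1.0, size=100)     # u ~ Uniform[-1, 1]
    noisy = clean + np.sign(u) * (np.abs(u) + 3.0)
    return clean, noisy
```

      <p>By construction, every noise term has magnitude between 3 and 4, so the noisy curve oscillates around the three constant levels.</p>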
      <p>In Figure 2, the initial piecewise constant function is colored red, its noised version f_1(x) is colored dark blue, and its denoised version, obtained using the empq algorithm, is colored green. As can be seen, with λ = 10 the restored curve stays close to the initial noisy curve: it tends to duplicate its ups and downs, showing its trend. On the contrary, the parameter value λ = 50 allows one to restore the initial piecewise constant function; thus the selection of λ makes it possible to regulate the intensity of noise filtering.</p>
      <p>Test 2. Consider the following piecewise linear function f_2(x):
f_2(x) = { x, if x ∈ [1, 30];  −3x + 120, if x ∈ [31, 60];  −60, if x ∈ [61, 80];  2x − 220, if x ∈ [81, 100] }.
To simulate noise, we use the value sign(u)(|u| + 4), analogous to the previous test, where u is a random number from the uniform distribution on the interval [−1, 1]. We use model (12), i.e. the parameters p = 2, q = 1 and C = C_2. Results of filtering the noisy piecewise linear function using the empq algorithm for two values of λ are presented in Figure 3. Here, the initial piecewise linear function is colored red, its noisy version f_2(x) is colored blue, and the denoised version of the initial function is colored green.</p>
      <p>As can be seen from Figure 3, using the matrix C_2 works better when a piecewise linear function is denoised rather than a piecewise constant one. With λ = 10 we observe how the denoised curve tends to duplicate the noised one (left picture in Figure 3), but with λ = 50 we get a denoised piecewise linear function much closer to the initial one (right picture in Figure 3). It should also be noted that choosing the parameter q = 2 in model (1) means using a smooth second-derivative approximation, so the denoised curve obtained has its edges smoothed away. So, the parameters p and q, as well as the regularization coefficient λ and the matrix C, allow one to control the level of the denoising process.</p>
      <p>Test 3. Along with somewhat synthetic examples, which are nevertheless quite important for demonstrating the structure of the models described, let us consider the results of computational experiments with a real dataset, namely bitcoin price evolution. The dataset was collected from the Binance API, with open price data captured at one-minute intervals from 11:40 to 13:19 on 01/08/2023. We use model (1) with p, q ∈ {1, 2}, λ ∈ {10, 30, 50, 70}, and matrix C ∈ {C_1, C_2}. The results of using the empq algorithm for denoising the given bitcoin dataset are presented in Figure 4 and Figure 5. Here, the bitcoin open price data is colored blue, and the denoised curve is colored green.</p>
      <p>As mentioned before, using matrix C = C_1 we obtain a piecewise constant function as a result of the denoising process (see Figure 4), while using matrix C = C_2 we get a piecewise linear approximation (see Figure 5), which fits the data much better. In the first case, a further increase of λ leads to ignoring important details and edges in the data, since the resulting curve tends toward a constant. In the second case, this tendency remains, but more details are preserved in the ups and downs of the piecewise linear function. Finally, using a smooth second-derivative approximation with p = 2, q = 2 and matrix C = C_2 gives a smoothed curve as a result (see Figure 6).</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The use of various smooth and nonsmooth regression models is rather common not only in modern areas of artificial intelligence and machine learning, but also in a number of applied fields, such as image processing and compression, compressed sensing, noise reduction, etc. Problems that appear in these fields are often formulated as nonsmooth or convex optimization problems, so corresponding methods of that kind can be applied.</p>
      <p>The proposed empq algorithm, which is based on the ellipsoid method, is capable of solving the convex optimization problems corresponding to total variation denoising if the number of points in the signal sample is not greater than n = 100–120. This number can be increased to some extent by using more advanced computing capabilities. The general model considered allows one to regulate the denoising level and the smoothness of the resulting curve by changing the parameters p, q and the coefficient λ, and by choosing the proper form of the matrix C to determine the interconnections between elements of the variables vector in the regularization term. The results of the computational experiments demonstrate the prospects and flexibility of such models.</p>
      <p>As mentioned before, for solving the problems considered it is also appropriate to use other types of methods, such as splitting methods [23], various extragradient schemes, Shor's r-algorithms [11, 12], etc. The latter are based on a space transformation procedure in the direction of two successive subgradients and provide accelerated convergence when minimizing nonsmooth convex functions of thousands of variables. But the main advantage of the algorithm proposed is obtaining a solution with any given accuracy, whereas other methods are usually limited in this respect. Nevertheless, the application of these methods to the problem considered is of great interest; an investigation and comparison of the methods' efficiency will be conducted in further publications.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>
        The paper is supported by the NASU grant for research laboratories/groups of young scientists № 02/01-2024(5), and the DTT TS KNU NASU project № 0124U002162.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.V.</given-names>
            <surname>Sergienko</surname>
          </string-name>
          ,
          <source>Methods of Optimization and Systems Analysis for Problems of Transcomputational Complexity</source>
          , Springer, New York, NY, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Friedman,</surname>
          </string-name>
          <article-title>The Elements of Statistical Learning: Data Mining, Inference, and</article-title>
          <string-name>
            <surname>Prediction</surname>
          </string-name>
          , 2nd ed., Springer Series in Statistics, Textbook, Springer, Berlin/Heidelberg, Germany,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          , 
          <source>Digital Image Processing</source>
          , Pearson, New York, NY,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <article-title>Regression Shrinkage and Selection via the Lasso</article-title>
          ,
          <source>Journal of the Royal Statistical Society, Series B (Methodological) 58 (1)</source>
          (
          <year>1996</year>
          )
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          . doi:10.1111/j.2517-6161.1996.tb02080.x. JSTOR 2346178.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Donoho</surname>
          </string-name>
          ,
          <article-title>Basis pursuit</article-title>
          , in:
          <source>Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers</source>
          , Vol.
          <volume>1</volume>
          ,
          <year>1994</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>44</lpage>
          . doi:10.1109/ACSSC.1994.471413.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Hoerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Kennard</surname>
          </string-name>
          ,
          <article-title>Ridge regression: Biased estimation for nonorthogonal problems</article-title>
          ,
          <source>Technometrics 12 (1)</source>
          (
          <year>1970</year>
          )
          <fpage>55</fpage>
          -
          <lpage>67</lpage>
          . doi:10.1080/00401706.1970.10488634.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          ,
          <source>A Guide to Econometrics</source>
          , 5th ed., The MIT Press, Cambridge,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.J.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.M.</given-names>
            <surname>Ronchetti</surname>
          </string-name>
          ,
          <source>Robust Statistics</source>
          , 2nd ed., John Wiley &amp; Sons,
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <article-title>On curve estimation by minimizing mean absolute deviation and its implications</article-title>
          ,
          <source>The Annals of Statistics</source>
          (
          <year>1994</year>
          )
          <fpage>867</fpage>
          -
          <lpage>885</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.Z.</given-names>
            <surname>Shor</surname>
          </string-name>
          ,
          <article-title>Cutting-off Method with Space Dilation for Solving Convex Programming Problems</article-title>
          ,
          <source>Cybernetics</source>
          (
          <year>1977</year>
          )
          <fpage>94</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.Z.</given-names>
            <surname>Shor</surname>
          </string-name>
          ,
          <source>Nondifferentiable Optimization and Polynomial Problems</source>
          , Kluwer, Amsterdam,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.Z.</given-names>
            <surname>Shor</surname>
          </string-name>
          ,
          <source>Minimization Methods for Non-Differentiable Functions</source>
          , Springer-Verlag, Berlin,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grötschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lovász</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schrijver</surname>
          </string-name>
          ,
          <source>Geometric Algorithms and Combinatorial Optimization</source>
          , 2nd ed., Springer-Verlag, Berlin, Heidelberg,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Stetsyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khomyak</surname>
          </string-name>
          ,
          <article-title>The Generalized Ellipsoid Method and Its Implementation</article-title>
          ,
          <source>Communications in Computer and Information Science</source>
          (
          <year>2020</year>
          )
          <fpage>355</fpage>
          -
          <lpage>370</lpage>
          . doi:10.1007/978-3-030-38603-0_26.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Oppenheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <source>Discrete-Time Signal Processing</source>
          , Prentice Hall,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fatemi</surname>
          </string-name>
          ,
          <article-title>Nonlinear total variation based noise removal algorithms</article-title>
          ,
          <source>Physica D</source>
          <volume>60</volume>
           (
          <issue>1-4</issue>
          ) (
          <year>1992</year>
          ) 
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Strong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>Edge-preserving and scale-dependent properties of total variation regularization</article-title>
          ,
          <source>Inverse Problems 19 (6)</source>
          (
          <year>2003</year>
          ) 
          <fpage>S165</fpage>
          -
          <lpage>S187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <article-title>On the Implementation of a Primal-Dual Interior Point Method</article-title>
          ,
          <source>SIAM Journal on Optimization 2 (4)</source>
          (
          <year>1992</year>
          ) 
          <fpage>575</fpage>
          -
          <lpage>601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osher</surname>
          </string-name>
          ,
          <article-title>The Split Bregman Method for L1-Regularized Problems</article-title>
          ,
          <source>SIAM J. Imaging Sci. 2 (2)</source>
          (
          <year>2008</year>
          ) 
          <fpage>323</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Yu.</given-names>
            <surname>Malitsky</surname>
          </string-name>
          ,
          <article-title>Projected Reflected Gradient Methods for Monotone Variational Inequalities</article-title>
          ,
          <source>SIAM Journal on Optimization</source>
          <volume>25</volume>
          (
          <year>2015</year>
          )
          <fpage>502</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Denisov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Nomirovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. V.</given-names>
            <surname>Rublyov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Semenov</surname>
          </string-name>
          ,
          <article-title>Convergence of Extragradient Algorithm with Monotone Step Size Strategy for Variational Inequalities and Operator Equations</article-title>
          ,
          <source>Journal of Automation and Information Sciences</source>
          <volume>51</volume>
          (
          <year>2019</year>
          )
          <fpage>12</fpage>
          -
          <lpage>24</lpage>
          . doi:10.1615/JAutomatInfScien.v51.i6.20.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chambolle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pock</surname>
          </string-name>
          ,
          <article-title>A first-order primal-dual algorithm for convex problems with applications to imaging</article-title>
          ,
          <source>J. Math. Imaging Vis.</source>
          <volume>40</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          )
          <fpage>120</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Splitting Methods for Nonconvex and Nonsmooth Optimization</article-title>
          ,
          <source>Encyclopedia of Optimization</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . doi:10.1007/978-3-030-54621-2_847-1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>