<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ProfIT AI</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Using the Ellipsoid Method and Linear Regression with L1-Regularization for Medical Data Investigation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petro Stetsyuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktor Stovba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Senko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Illya Chaikovsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>V.M. Glushkov Institute of Cybernetics of the NASU</institution>
          ,
          <addr-line>Academician Glushkov Avenue, 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>4</volume>
      <fpage>25</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>The problem of finding the parameters of a linear regression model with l1-regularization and the least moduli criterion powered to p, 1 ≤ p ≤ 2, is considered. To solve the problem, Shor's ellipsoid method is used, which is implemented as the emlmpr algorithm. A series of three computational experiments is conducted, which demonstrates the solving time of the emlmpr algorithm and the robustness of the least moduli criterion if p is close to 1. The third experiment considers the situation when the model contains linearly dependent features and shows the effect of l1-regularization on the quality of the solutions obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>linear regression</kwd>
        <kwd>least moduli criterion</kwd>
        <kwd>l1-regularization</kwd>
        <kwd>non-smooth optimization problem</kwd>
        <kwd>Shor's ellipsoid method</kwd>
        <kwd>dependent factors</kwd>
        <kwd>data prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Regression models are an extremely prevalent tool for effective prediction both in machine learning
and in artificial intelligence in general. The application of linear regression models for building effective
forecasting models, which describe linear relationships between factors, has been studied in such fields as
statistics, medicine, economics, ecology, identification of parameters of complex systems, etc. This type of
model has proven to be flexible in construction and to provide a clear interpretation of the relationships
between the dependent variable and the model factors, sometimes even outperforming more complex
nonlinear models [1].</p>
      <p>When working with regression models, it is rather important to choose a correct criterion for
estimating the model parameters. The most well-known and common variants are the criteria based on
least squares and on least moduli. The effectiveness of the first variant is confirmed by theoretical
studies [2] and numerous statistical experiments. Nevertheless, one of the most significant
disadvantages of the least squares criterion is that large errors are amplified when they are squared,
which makes the model extremely sensitive to anomalous observations (outliers). An important
condition for using this criterion is the standard normal distribution of model errors, which is not
always fulfilled in practice. A well-known and effective alternative is the criterion based on least
moduli, which is robust to outliers [3, 4] and assumes a Laplacian distribution of model errors.</p>
      <p>Another important aspect of working with linear regression models is the presence of dependencies
between two or more factors of a model, which negatively affect the quality of the obtained
parameter estimates. Usually, such dependencies are detected at the stage of data preprocessing and
model building by selecting an optimal set of model factors that best describes the relationship between
the dependent variable and the factors. However, in practice, situations often occur when a certain group
of factors collectively affects the dependent variable. As a result, both the least squares and the least
moduli criteria determine the parameters of the model incorrectly, often significantly overestimating or
underestimating them. Therefore, it is expedient to develop methods and criteria that make it possible to
detect such dependencies between factors and drive their coefficients close to zero. One of the most
famous so-called shrinkage methods in machine learning [1] is the regularization approach, which
permits balancing the model and reducing the effect of dependent factors on the quality of parameter
determination.</p>
      <p>The article is dedicated to applying Shor's ellipsoid method for finding the parameters of a
linear regression model with l1-regularization and the least moduli criterion powered to p with
1 ≤ p ≤ 2. This criterion includes the least moduli (p = 1) and the least squares (p = 2) criteria as
special cases, and allows any intermediate value of the parameter p. Some results of applying the
ellipsoid method to this type of problem are given in [5].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Finding linear regression model parameters using the least moduli criterion powered to p</title>
      <p>Let us consider a classical linear regression problem: find n unknown parameters x_1, …, x_n given m
known observations (a_i, y_i), a_i = (a_i1, a_i2, …, a_in) ∈ R^n, y_i ∈ R, i = 1, …, m, which are related as
follows:</p>
      <p>"
# = 5</p>
      <p>#%% + #,  = 1444,444,
.(̅) =
⎧ 5
⎪
⎪
'
#&amp;!</p>
      <p>"
 O5
%&amp;!
#% %̅ − #P C5</p>
      <p>#% %̅ − #C
"
%&amp;!
) /!</p>
      <p>⎫
#!,⎪</p>
      <p>where a_ij are known coefficients and ξ_i are unknown random variables, which have (approximately)
the same distribution functions, m &gt; n. The equation (1) can be rewritten in matrix form
y = Ax + ξ,   (2)
where y = (y_1, …, y_m)^T ∈ R^m and ξ = (ξ_1, …, ξ_m)^T ∈ R^m are m-dimensional vectors, A is an m × n
matrix, and x = (x_1, …, x_n)^T ∈ R^n is an n-dimensional vector that is to be estimated.</p>
      <p>The least moduli method powered to p, which corresponds to finding the unknown vector x_p^*
according to the least moduli criterion powered to p (1 ≤ p ≤ 2), is a mathematical programming
problem:
f_p^* = f_p(x_p^*) = min_{x ∈ R^n} { f_p(x) = Σ_{i=1}^{m} | Σ_{j=1}^{n} a_ij x_j − y_i |^p },   (3)
where |·| is the absolute value of a number. The function f_p(x) is non-smooth if p = 1 and smooth
if p &gt; 1.</p>
      <p>The problem (3) is a problem of unconditional minimization of the convex function f_p(x), a
subgradient of which at the point x̄ is calculated using the following formula:
g_f(x̄) = p Σ_{i=1}^{m} sign( Σ_{j=1}^{n} a_ij x̄_j − y_i ) | Σ_{j=1}^{n} a_ij x̄_j − y_i |^{p−1} a_i.   (4)
For p = 1 we obtain the problem
f_1^* = min_{x ∈ R^n} { f_1(x) = Σ_{i=1}^{m} | Σ_{j=1}^{n} a_ij x_j − y_i | },   (5)
which is a problem of unconditional minimization of the convex piecewise-linear function f_1(x) and
corresponds to the least moduli method, which has proven to be robust to anomalous observations or
outliers [3, 6]. Finding the vector x^* that is best according to the least moduli criterion, where x^* is a
solution of the problem (5), can be formulated as the following LP-problem: find
f_1^* = min_{x ∈ R^n, u ∈ R^m} Σ_{i=1}^{m} u_i
subject to
y_i − Σ_{j=1}^{n} a_ij x_j ≤ u_i,   −y_i + Σ_{j=1}^{n} a_ij x_j ≤ u_i,   i = 1, …, m.   (6)</p>
      <p>For solving the LP-problem (6) one can use appropriate standard linear programming tools. At the
same time, as we find the vector x^*, we also find the optimal values of the vector u^* = (u_1^*, …, u_m^*)^T,
the elements of which define estimates for the independent random variables ξ_i, i = 1, …, m.</p>
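      <p>For illustration, here is a minimal Octave sketch (not part of the original paper's code) of solving the LP-problem (6) with Octave's built-in glpk solver; the variables are stacked as z = [x; u], and the sizes and data are assumed for the example.</p>
      <p># Sketch: least moduli estimation via the LP-problem (6) using glpk
m = 20; n = 3;                        # illustrative sizes (assumed)
A = rand(m,n); xtrue = [1; -2; 3]; y = A*xtrue;
c  = [zeros(n,1); ones(m,1)];         # minimize sum of u_i
Ac = [ A, -eye(m);                    #  A*x - u &lt;= y
      -A, -eye(m)];                   # -A*x - u &lt;= -y
b  = [y; -y];
lb = [-Inf(n,1); zeros(m,1)];         # x free, u &gt;= 0
ctype = repmat("U", 1, 2*m);          # all rows are "&lt;=" constraints
vartype = repmat("C", 1, n+m);        # continuous variables
[z, f1] = glpk(c, Ac, b, lb, [], ctype, vartype, 1);
x1 = z(1:n)                           # least moduli estimate of x</p>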
      <p>If p = 2, the problem (3) can be written as the following mathematical programming problem:
f_2^* = min_{x ∈ R^n} { f_2(x) = Σ_{i=1}^{m} ( y_i − Σ_{j=1}^{n} a_ij x_j )^2 }.   (7)
The problem (7) is a problem of unconditional minimization of the convex quadratic function f_2(x),
which corresponds to the least squares method. Linear independence of the columns of the matrix A
provides the existence of the analytical solution x^* = (A^T A)^{−1} A^T y of the problem (7). Otherwise,
if the columns of the matrix A are linearly dependent or n &gt; m, it is impossible to obtain an analytical
solution. In that case one can use methods for balancing the model, in particular, regularization.</p>
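      <p>As a quick illustration (a sketch with assumed data, not taken from the paper), the analytical solution of the problem (7) can be computed in Octave via the normal equations when the columns of A are linearly independent:</p>
      <p># Sketch: analytical least squares solution of the problem (7)
m = 300; n = 30;
A = 10*rand(m,n); xtrue = rand(n,1); y = A*xtrue;
x2 = (A'*A) \ (A'*y);   # equals (A^T A)^(-1) A^T y
norm(x2 - xtrue)        # ~1e-12 for consistent data
# if the columns of A are dependent or n &gt; m, A'*A is singular
# and one has to regularize, e.g. as in the problem (8) below</p>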
      <p>Let us consider the problem (3) with l1-regularization:
f_R^* = f_R(x_R^*) = min_{x ∈ R^n} { f_R(x) = Σ_{i=1}^{m} | Σ_{j=1}^{n} a_ij x_j − y_i |^p + λ Σ_{j=1}^{n} |x_j| }.   (8)
The problem (8) is a problem of unconditional minimization of the convex function f_R(x), which is
piecewise-linear when p = 1. Here λ is a regularization parameter, and if λ = 0 the function f_R(x)
coincides with the function f_p(x). To calculate a subgradient of the function f_R(x) at the point x̄ one
can use the following formula:</p>
      <p>g_R(x̄) = g_f(x̄) + λ sign(x̄),   (9)
where g_f(x̄) is calculated using the expression (4).</p>
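      <p>The formula (9) is easy to check numerically. The following Octave sketch (our own illustration, with assumed data) compares the subgradient (9) with central finite differences at a point where f_R is differentiable (p &gt; 1, nonzero residuals and coordinates):</p>
      <p># Sketch: numeric check of the subgradient formula (9)
p = 1.5; lambda = 0.1; m = 40; n = 5;
A = rand(m,n); y = rand(m,1); x = rand(n,1) + 0.5;
fR = @(x) sum(abs(A*x - y).^p) + lambda*sum(abs(x));
t = A*x - y;
g = p*A'*(sign(t).*abs(t).^(p-1)) + lambda*sign(x);  # formula (9)
gfd = zeros(n,1); h = 1e-6;
for j = 1:n
  e = zeros(n,1); e(j) = h;
  gfd(j) = (fR(x+e) - fR(x-e)) / (2*h);  # central difference
endfor
norm(g - gfd)   # should be of order 1e-8 or smaller</p>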
      <p>For solving the problem (8) Shor's ellipsoid method [7, 8, 9] can be used, which is implemented
as the emshor program [10]. We will apply it to the problem of minimizing the function f_R(x),
provided that its minimum point x_R^* is localized in an n-dimensional ball of radius r_0 centered
at the point x_0 ∈ R^n, i.e. ‖x_0 − x_R^*‖ ≤ r_0. The algorithm to be used is called emlmpr; its
description is given below.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The emlmpr algorithm and its Octave implementation</title>
      <p>The input parameter of the algorithm is the accuracy ε &gt; 0 with which f_R^* = f_R(x_R^*) is to be found.</p>
      <p>Initialization. Consider an n × n matrix B and set B_0 := I_n, where I_n is the n × n identity matrix.
We go to the first iteration with the values x_0, r_0 and B_0. Let the values x_k ∈ R^n, B_k, r_k be found at
iteration k. Passing to iteration k + 1 consists of the following sequence of actions.</p>
      <p>Step 1. Calculate f_R(x_k) and the subgradient g_R(x_k) at the point x_k using formula (9). If
r_k ‖B_k^T g_R(x_k)‖ ≤ ε, then "Stop: x^* = x_k and f_R^* = f_R(x_k)". Otherwise, go to Step 2.</p>
      <p>Step 2. Set ξ_k := B_k^T g_R(x_k) / ‖B_k^T g_R(x_k)‖.</p>
      <p>Step 3. Calculate the next point x_{k+1} := x_k − h_k B_k ξ_k, where h_k = r_k / (n + 1).</p>
      <p>Step 4. Calculate
B_{k+1} := B_k + ( sqrt((n − 1)/(n + 1)) − 1 ) (B_k ξ_k) ξ_k^T
and r_{k+1} := r_k · n / sqrt(n² − 1).</p>
      <p>Step 5. Go to iteration k + 1 with the values x_{k+1}, B_{k+1}, r_{k+1}.</p>
      <p>Theorem. The sequence of points {x_k}, k = 0, 1, 2, …, k^*, satisfies the following inequalities:</p>
      <p>‖B_k^{−1}(x_k − x_R^*)‖ ≤ r_k,   k = 0, 1, 2, …, k^*.</p>
      <p>On each iteration k &gt; 0 the volume reduction factor of the ellipsoid
E_k = { x ∈ R^n : ‖B_k^{−1}(x_k − x)‖ ≤ r_k }, which localizes the point x_R^*, is constant and equal to
q = vol(E_k) / vol(E_{k−1}) = sqrt( (n − 1)/(n + 1) ) · ( n / sqrt(n² − 1) )^n &lt; exp( −1 / (2(n + 1)) ) &lt; 1.</p>
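      <p>The factor q and the bound from the theorem can be checked directly; the following short Octave computation (our own check) reproduces them for n = 30:</p>
      <p># Sketch: volume reduction factor for n = 30
n = 30;
q = sqrt((n-1)/(n+1)) * (n/sqrt(n^2-1))^n   # vol(E_k)/vol(E_{k-1}), approx. 0.9835
exp(-1/(2*(n+1)))                           # theorem's upper bound, approx. 0.9840</p>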
      <p>The theorem implies that the algorithm for finding x_R^* can be successfully run on modern
computers if n = 10 ÷ 30 and m = 100 ÷ 1000. Indeed, to decrease the volume of the ellipsoid
localizing the point x_R^* by 10 times, it is necessary to perform k iterations, where
k = ln 10 / ln(1/q) ≈ (2 ln 10)(n + 1) ≈ 4.6(n + 1). It means that in order to improve the deviation of
the found record value of the function f_R(x) from its optimal value f_R^* by 10 times, it is necessary
to perform 4.6(n + 1)² iterations of the algorithm for finding x_R^*.</p>
      <p>If n = 30 and ε = 10^{−10} × f_R(x_0), then the maximal number of iterations of the algorithm is equal
to 10 · 4.6(n + 1)² = 46 × 961 = 44206. Therefore, even a straightforward matrix-vector
implementation of the calculation of the function f_R(x) value and its subgradient according to
formula (9) provides fast algorithm work on modern computers.</p>
      <p>The algorithm emlmpr for finding an approximation to the point x_R^* is implemented in the Octave language. Its code is given below.
# Input parameters: #com01
# A(m,n) – observation matrix; #com02
# y(m,1) – vector of tags (output vector); #com03
# p – power for least moduli criterion, 1&lt;=p&lt;=2; #com04
# lambda – regularization rate; #com05
# x0(n,1) – starting point; #com06
# r0 – radius of the ball centered at x0 that localizes x_p^*; #com07
# epsf, maxitn – stop parameters: #com08
# epsf – precision to stop by the value of the function fp, #com09
# maxitn – maximal number of iterations; #com10
# intp – print information for every intp iteration. #com11
# Output parameters: #com12
# xp(n,1) – approximation to x_p^*; #com13
# fp – the value of the function f_R at the point xp; #com14
# itn – the number of iterations; #com15
# ist – exit code: 1 – epsf, 4 – maxitn. #com16
function [xp,fp,itn,ist] = emlmpr(A,y,p,lambda,x0,r0,epsf,maxitn,intp) #row01</p>
      <p>n = columns(A); xp = x0; B = eye(n); r = r0; #row02
dn = double(n); beta = sqrt((dn-1.d0)/(dn+1.d0)); #row03
for (itn = 0:maxitn) #row04
temp = A*xp-y; fp = sum(abs(temp).^p) + lambda*sum(abs(xp)); #row05
if((mod(itn,intp)==0)&amp;&amp;(intp&lt;=maxitn)) #row06</p>
      <p>printf(" itn %4d fp %14.6e\n",itn,fp); #row07
endif #row08
g1 = p*A'*(sign(temp).*(abs(temp)).^(p-1)) + lambda*sign(xp);#row09
g = B'*g1; dg = norm(g); #row10
if(r*dg &lt; epsf) ist = 1; return; endif #row11
xi = (1.d0/dg)*g; dx = B * xi; #row12
hs = r/(dn+1.d0); xp -= hs * dx; #row13
B += (beta - 1) * B * xi * xi'; #row14
r = r/sqrt(1.d0-1.d0/dn)/sqrt(1.d0+1.d0/dn); #row15
endfor #row16
ist = 4; #row17
endfunction #row18
</p>
      <p>The core of the emlmpr program is the for loop (rows 4–16). First, the value of the function f_R (row 5)
and its normalized subgradient at the point x_k (row 10) are calculated. If the stop condition is satisfied
(row 11), the algorithm stops its work. The stop in the emlmpr algorithm occurs when the condition
r_k ‖B_k^T g_R(x_k)‖ ≤ ε is fulfilled, which is equivalent to the condition f_R(x_k) − f_R^* ≤ ε. Otherwise,
the next point x_{k+1} is calculated (row 13), and the space transformation matrix B_{k+1} (row 14) and
the radius r_{k+1} (row 15) are recalculated.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Computational experiments without regularization</title>
      <p>To demonstrate the effectiveness of the emlmpr algorithm we present the results of three
computational experiments conducted for solving the problem (8). For the first and the second
experiments the parameters are n = 30 and m = 10 × n = 300. The purpose of the first experiment is to
estimate the time of solving the problem (8) for the specified parameters. The purpose of the second
experiment is to demonstrate the robustness of the least moduli method, and therefore of solutions of
the problem (8) without regularization (λ = 0), if p is close to one. The third experiment is dedicated
to finding the parameters of a linear regression model using real medical data for further prediction
of psychological indicators.</p>
      <p>All the calculations are performed on a computer with an Intel Core i7-10750H processor (2.6 GHz)
and 16 Gb RAM under Windows 10/64 using GNU Octave, version 6.3.0. For the first two experiments
the regularization parameter λ is chosen equal to zero.</p>
      <p>Test example 1. For the first experiment the input data for the problem (8) are the matrix A and the
vector y, which are generated randomly with a standard uniform distribution according to the following
formulas: A = 10*rand(m,n), y = A*xstar, xstar = round(10*rand(n,1) + 0.5). The starting
point is chosen according to the rule x0 = round(5*rand(n,1)), and the radius of the ball, in which
the point x_p^* = xstar is located, is chosen according to the rule r0 = 5*norm(x0 - xstar), i.e.
r_0 = 5‖x_0 − xstar‖. The first experiment is implemented by the following Octave code.</p>
      <p># Test 1: emlmpr running time for n = 30 and m = 300
n = 30, m = 10*n,
rand("seed", 2024);
A = 10*rand(m,n);
xstar = round(10.0*rand(n,1) + 0.5); y = A*xstar;
x0 = round(5.0*rand(n,1)); r0 = 5*norm(x0 - xstar),
maxitn = 50000, intp = 10000, lambda = 0.0,
# running the emlmpr algorithm for p = 1.0; 1.25; 1.5; 1.75; 2.0
printf("\n Test 1: emlmpr running time for n = 30 and m = 300 \n");
epsf0 = 1.e-6; ntest = 5; table = [];
for (i = 1:ntest)
p = 1.d0 + (i - 1.d0)/(ntest - 1.d0),
epsf = epsf0**(p); time0 = time();
[xp,fp,itn,ist] = emlmpr(A,y,p,lambda,x0,r0,epsf,maxitn,intp);
time1 = time() - time0,
dx = norm(xp - xstar);
table = [table; p epsf time1 itn ist fp dx];
itn, fp,
endfor
n, m,
printf("    p   epsf   time    itn ist      fp         dx \n");
for (i = 1:ntest)
printf(" %4.1f %6.1e %4.2f %6d %2d %10.5e %10.1e\n", table(i, 1:7))
endfor</p>
      <p>The results of the emlmpr program for the first experiment — the time required to solve the
problem (8) with accuracy ε, the number of iterations itn of the method, the minimum value of the
function f_p found, and the norm of deviation dx of the found approximation from the known
minimum point xstar — are given in Table 1. Here ε is chosen as follows: if p = 1, the value
ε = 10^{−6}; if p &gt; 1, we choose ε = (10^{−6})^p. Table 1 presents the results of solving the
problem (8) with n = 30, m = 300 and λ = 0.</p>
      <p>It is easy to see from Table 1 that, to obtain a solution with accuracies 10^{−6} ÷ 10^{−12} for different p,
the emlmpr algorithm requires approximately 40 000 iterations and no more than 7 seconds.
The least deviation dx equals 2.3e–11 and is obtained for p = 1.</p>
      <p>Test example 2. The purpose of the second experiment is to demonstrate the robustness of the least
moduli method, which means that the same robustness will characterize solutions of the problem (8)
if p is close to one. Here, the matrix A, the starting point x_0 and the ball radius r_0 are chosen to be
the same as in the first test; the vector y is adjusted so that its odd components remain the same as in
the first test, and its even components are multiplied by the value
q = (1.0 + 1.0*sign(0.5 - rand)). Thus, the even components of the vector y can be
considered anomalous (incorrect) results of observations.</p>
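      <p>A possible Octave sketch of this adjustment (our reconstruction; only the multiplier rule is quoted from the experiment) is:</p>
      <p># Sketch: forming the perturbed vector y for Test 2
rand("seed", 2024);
A = 10*rand(300,30); xstar = round(10*rand(30,1) + 0.5);
y = A*xstar;
for i = 2:2:rows(y)                  # even components only
  q = 1.0 + 1.0*sign(0.5 - rand);    # q is 0 or 2 with equal probability
  y(i) = q*y(i);                     # anomalous observation
endfor</p>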
      <p>Calculation results for n = 30 and m = 300 are given in Table 2. Here, ist is the exit code of the
emlmpr program and dx is the norm of deviation of the found approximation from the point xstar.
The 5th column contains the values of the function f_p at the found point x_p, and the 7th column
contains the p-th root of the 5th column. For all values of the parameter p the code ist = 1, which
indicates successful completion of the program.</p>
      <p>The results of Table 2 show that the function value f_p grows as the parameter p increases: from
1.34e+05 if p = 1 to 1.09e+08 if p = 2. The deviation dx of the solution found from the minimum point
with p = 1 is significantly smaller than if p &gt; 1, which confirms the robustness of the least moduli
method corresponding to the p = 1 case. It is important to emphasize that this situation is typical
for all values of the parameter p close enough to 1. The time used for finding solutions for each of
the parameter p values does not exceed 4 seconds.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Computational experiments with regularization</title>
      <p>To show the effectiveness of the emlmpr algorithm applied to real data, we consider the problem of
predicting psychological indicators of a patient's condition based on cardiological data obtained
using the complex described in [11]. There were 90 patients studied, with more than 200 features
including cardiological and basic ones (like age and ordinal number). Wishing to exclude the choice
of a categorical feature recoding method from the analysis, we omit categorical features as well as
ordinal ones. In practice, using ordinal features instead of numerical ones could increase the quality
of linear modelling, see [12]; however, we need to simplify the experiment in order to study only the
use of the ellipsoid method, while the ability of the medical complex [11] to create binning good
enough for linear modelling is out of the scope of the current research. So we take just the 175
numerical features that we have. Then we apply a feature selection procedure to test the ellipsoid
method on a dataset that is optimal at least in some sense.</p>
      <p>We want to select the features that best describe the relationship between the medical and
psychological data using the R² metric [1]. Since the goal of studying the medical data includes feature
interpretability, we take these data as is. In other words, we do not apply transformations like PCA
and similar ones to obtain linearly independent features. Undoubtedly, it is possible to get some
interpretation even after such transformations, but our approach is to take the features as is. Taking into
account that internal metrics for feature importance in the case of a linear regression model work best
when the features are either linearly independent or at least have normal distributions, we cannot rely
on internal linear regression metrics, so we use the "wrapper" approach for model feature
selection [13]. For the quality metric, we use 5-fold cross-validation [1]. Since the initial dataset holds
missing values, we use simple imputation via the median strategy using only the training subsample,
to avoid distortion due to a whole-set median calculation; a sketch of this step is given below.
Moreover, in our situation the initial number of features, which is 200, is greater than the number of
observations, which is 90, so we start from the first feature and increase the number of features until
the quality metric R² stops growing. Also, we consider non-transformed features to decrease the
number of experiments to perform and the variability of the whole scheme. Selection of the optimal
transformation is an additional task, which is out of the scope of the current paper. In general, the
feature selection procedure is described in Figure 1.</p>
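      <p>A minimal Octave sketch of the imputation step (our own illustration; it assumes missing values are encoded as NaN and is applied separately inside each of the 5 cross-validation folds) is:</p>
      <p># Sketch: median imputation using the training subsample only
function [Xtr, Xte] = impute_median(Xtr, Xte)
  for j = 1:columns(Xtr)
    med = median(Xtr(!isnan(Xtr(:,j)), j));  # training-fold median only
    Xtr(isnan(Xtr(:,j)), j) = med;
    Xte(isnan(Xte(:,j)), j) = med;           # reuse it for the test fold
  endfor
endfunction</p>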
      <p>The calculations for feature selection are made in Python 3 [14] using Google Colab with the
Sequential Feature Selection and Linear Regression classes with the embedded R² metric taken from
the Scikit-learn library [15]. We also used the Pandas library [16] for keeping feature names during
the calculations.</p>
      <p>The observation matrix A consists of the values of the following 16 numerical features for 90 patients:
(1) observation number; (2) amplitude (wd. II); (3) amplitude (wd. III); (4) amplitude (wd. III);
(5) amplitude (wd. AvL); (6) amplitudes ratio (wd. II); (7) amplitude (wd. AvF); (8) LFn;
(9) amplitude (wd. AvF); (10) ECG phase ratio index; (11) state of regulation reserves;
(12) withdrawal code AvR_init; (13) comprehensive assessment of occurrence of significant
cardiovascular events_init; (14) functional condition according to Baevsky; (15) withdrawal code
I_univ; (16) HFn; (17) target: Beck anxiety scale. The last feature is the target and is to be predicted.</p>
      <p>To determine the parameters of the linear regression model and for further prediction, the emlmpr
algorithm is used with the parameters p = 1 and p = 2, where the first case corresponds to the least
moduli method and the second to the least squares method. The results of the emlmpr program are
given in Table 3. It contains the problem solving time (line 3), the number of iterations (line 4), the
value of the function at the point x_R^* (line 5) and the solution of the problem x_R^* (line 6) for
four accuracies and two values of the parameter p.</p>
      <p>Table 4 shows that to solve the problem with p = 2 for ε = 10^{−6} and ε = 10^{−20} the emlmpr
program requires approximately 8 thousand iterations. If we use p = 1, for the same accuracies
11700 iterations are required, and their number increases to 29719 iterations when using
ε = 10^{−30}. The f_R value for fixed p remains unchanged.</p>
      <p>As can be seen from Table 4, the emlmpr program successfully finds the linear regression model
coefficients when using λ = 0 (see Table 5). However, some of the coefficients are considerably larger
than the others (bold values in Table 5), which can indicate the presence of a dependency between the
corresponding features in the observation matrix. To reduce their effect on the quality of coefficient
restoration we apply l1-regularization, which allows setting the model parameters corresponding to
dependent columns to zero. In practice, it is difficult to obtain exactly zero values of the corresponding
parameters, so we have to settle for values close to zero with a certain accuracy.</p>
      <p>Table 6 contains the coefficients of the linear regression model found by the emlmpr program with
p = 1.0; 2.0, different accuracies ε and the regularization rate λ = 0.1. The values corresponding to the
large coefficients from Table 5, as well as any changes in coefficient digits, are highlighted in bold. It is
easy to see that now these coefficients are rather close to zero with sufficient accuracy: 10^{−2} for
feature 7 with any values of p and ε, 10^{−7} for feature 14 with p = 1 and ε = 10^{−6}, and even
10^{−28} for feature 16 with p = 2 and ε = 10^{−30}. The rest of the coefficients remained almost
unchanged except for several digits. It is also worth noting that increasing the regularization rate
decreases the coefficient values of dependent features even more. This gives an instrument to
adjust the impact of regularization and obtain coefficients at dependent features close enough to
zero, thus improving the quality of the solutions obtained.</p>
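      <p>In terms of the emlmpr program, the comparison amounts to running it twice on the same data with λ = 0 and λ = 0.1 (a usage sketch with assumed inputs A, y, x0, r0, not the original experiment code):</p>
      <p># Sketch: effect of l1-regularization on the coefficients
[x_plain, f_plain] = emlmpr(A, y, 1.0, 0.0, x0, r0, 1.e-6, 50000, 10000);
[x_reg,   f_reg  ] = emlmpr(A, y, 1.0, 0.1, x0, r0, 1.e-6, 50000, 10000);
printf("feature   lambda=0      lambda=0.1\n");
for j = 1:length(x_plain)
  printf("%7d  %12.4e  %12.4e\n", j, x_plain(j), x_reg(j));
endfor</p>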
      <p>The prediction results obtained using the model with parameters calculated by the emlmpr
algorithm show that using the least moduli method (p = 1) we obtain many more zero values (which
means that the solution is found with the required accuracy) than in the case of the least squares
method (p = 2). Thus, using p = 1 is more appropriate than p = 2.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>The paper investigates the problem of finding the parameters of a linear regression model with the
least moduli criterion powered to p, 1 ≤ p ≤ 2, and l1-regularization. The problem is formulated as a
problem of unconditional minimization of a convex piecewise-linear function. For solving this problem,
Shor's ellipsoid method is used, which is implemented as the emlmpr program in the Octave
programming language.</p>
      <p>A series of three computational experiments with the emlmpr program is considered. The results of
the first experiment show that the problem of finding the parameters of a linear regression model with
n = 30 and m = 300 can be solved within 7 seconds on a modern laptop of average performance. The
second experiment shows that the least moduli criterion is robust if p is close to one, and thus solutions
of the problem are robust as well. The third experiment is dedicated to using l1-regularization to
decrease the effect of linearly dependent features, which the model can include, on the quality of the
solutions. The results of the experiment, where real cardiological data are used for prediction of
psychological indicators of the patient's condition, show that the emlmpr algorithm can successfully
compute the linear regression model parameters with p = 1 and p = 2 within 3 seconds, and set the
coefficients at dependent features to zero with sufficient accuracy using the l1-regularization approach.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The paper is supported by the National Research Foundation of Ukraine (grants № 2021.01/0136 and
№ 2023.04/0094), Volkswagen Foundation grant № 97775, the project of research works of young
scientists № 07-02/03-2023, the NASU grant for research laboratories/groups of young scientists
№ 02/01-2024, and the DTT TS KNU NASU project № 0124U002162.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] G. James, D. Witten, T. Hastie, R. Tibshirani, J. Taylor, An Introduction to Statistical Learning: with Applications in Python, Springer Texts in Statistics, Springer Cham, New York, NY, 2023. doi:10.1007/978-3-031-38747-0.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. Deisenroth, A. Faisal, C. Soon Ong, Mathematics for Machine Learning, 1st Edition, Cambridge University Press, Cambridge, 2020.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] P.J. Huber, E.M. Ronchetti, Robust Statistics, 2nd Edition, John Wiley &amp; Sons, 2011.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] F.H. Clarke, Optimization and Nonsmooth Analysis, SIAM, 1990.</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] P. Stetsyuk, M. Budnyk, I. Sen'ko, V. Stovba, I. Chaikovsky, Using the Ellipsoid Method to Study Relationships in Medical Data, Cybernetics and Computer Technologies (2023) 23-43. doi:10.34229/2707-451X.23.3.3.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] J. Fan, P. Hall, On curve estimation by minimizing mean absolute deviation and its implications, The Annals of Statistics (1994) 867-885.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] N.Z. Shor, Cutting-off Method with Space Dilation for Solving Convex Programming Problems, Cybernetics (1977) 94-95.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] N.Z. Shor, Nondifferentiable Optimization and Polynomial Problems, Kluwer, Amsterdam, 1998.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] N.Z. Shor, Minimization Methods for Non-Differentiable Functions, Springer-Verlag, Berlin, 1985.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] P. Stetsyuk, A. Fischer, O. Khomyak, The Generalized Ellipsoid Method and Its Implementation, Communications in Computer and Information Science (2020) 355-370. doi:10.1007/978-3-030-38603-0_26.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] I. Chaikovsky, M. Primin, A. Kazmirchuk, Development and implementation into medical practice new information technologies and metrics for analysis of small changes in electromagnetic field of human heart, Visnyk of the National Academy of Sciences of Ukraine (2021) 33-43. doi:10.15407/visn2021.02.033.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] R. Persson, Weight of evidence transformation in credit scoring models: How does it affect the discriminatory power? Master's thesis, Lund University, Lund, Sweden, 2021. https://lup.lub.lu.se/luur/download?func=downloadFile&amp;recordOId=9066332&amp;fileOId=9067075</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] J. Li, K. Cheng, S. Wang, F. Morstatter, R.P. Trevino, J. Tang, H. Liu, Feature Selection: A Data Perspective, ACM Computing Surveys (2017) 1-45. doi:10.1145/3136625.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] G. Van Rossum, F.L. Drake, Python 3 Reference Manual, CreateSpace, Scotts Valley, CA, 2009.</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] F. Pedregosa et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12.85 (2011) 2825-2830. URL: http://jmlr.org/papers/v12/pedregosa11a.html.</mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] W. McKinney, Data structures for statistical computing in python, in: Proceedings of the 9th Python in Science Conference, Austin, 28 June - 3 July 2010, 56-61. doi:10.25080/Majora-92bf1922-00a.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>