<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using of Ellipsoid Method and Linear Regression with L 1 -Regularization for Medical Data Investigation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Petro</forename><surname>Stetsyuk</surname></persName>
							<email>stetsyukp@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">V.M. Glushkov Institute of Cybernetics of the NASU</orgName>
								<address>
									<addrLine>Academician Glushkov Avenue, 40</addrLine>
									<postCode>03187</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Viktor</forename><surname>Stovba</surname></persName>
							<email>vik.stovba@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">V.M. Glushkov Institute of Cybernetics of the NASU</orgName>
								<address>
									<addrLine>Academician Glushkov Avenue, 40</addrLine>
									<postCode>03187</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ivan</forename><surname>Senko</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">V.M. Glushkov Institute of Cybernetics of the NASU</orgName>
								<address>
									<addrLine>Academician Glushkov Avenue, 40</addrLine>
									<postCode>03187</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Illya</forename><surname>Chaikovsky</surname></persName>
							<email>illya.chaikovsky@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">V.M. Glushkov Institute of Cybernetics of the NASU</orgName>
								<address>
									<addrLine>Academician Glushkov Avenue, 40</addrLine>
									<postCode>03187</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using of Ellipsoid Method and Linear Regression with L 1 -Regularization for Medical Data Investigation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6FC165EAE79D78A29F0BC41BF37443D9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>linear regression</term>
					<term>least moduli criterion</term>
					<term>L1-regularization</term>
					<term>non-smooth optimization problem</term>
					<term>Shor&apos;s ellipsoid method</term>
					<term>dependent factors</term>
					<term>data prediction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The problem of finding of parameters of linear regression model with 𝐿 ! -regularization and the least moduli criterion with 1 ≤ 𝑝 ≤ 2 is considered. To solve the problem the Shor's ellipsoid method is used, which is implemented as the emlmpr algorithm. A series of three computational experiments is conducted, which demonstrate solving time of the emlmpr algorithm and robustness of the least moduli criterion if 𝑝 is close to 1. The third experiment considers situation when the model contains linearly dependent features and shows the effect of 𝐿 ! -regularization on the quality of solutions obtained.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Regression models are an extremely prevalent tool for effective prediction both in machine learning and artificial intelligence in general. Applying of linear regression models for building effective forecasting models, which describe linear relationships between factors, in such fields as statistics, medicine, economics, ecology, identification of parameters of complex systems etc. is studied and investigated. This type of models has proven themselves to be flexible in construction and to provide clear interpretation of relationships between dependent variable and model factors, sometimes even outperforming more complex nonlinear models <ref type="bibr" target="#b0">[1]</ref>.</p><p>When working with regression models, it is rather important to choose correct criteria for estimating model parameters. The most well-known and common variants are the criterion based on least squares and based on least moduli. Effectiveness of the first variant is confirmed by theoretical studies <ref type="bibr" target="#b1">[2]</ref> and numerous statistical experiments. Nevertheless, one of the most significant disadvantages of the least squares criterion is the increase of the effect of large errors when they are squared, which makes the model extremely sensitive to anomalous observations (or outliers). An important condition for using this criterion is the standard normal distribution of model errors, which is not always fulfilled in practice. A well-known and effective alternative to this criterion is the criterion based on the least moduli, which is robust to outliers <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref> and assumes a Laplacian distribution of model errors.</p><p>Another important aspect of work with linear regression models is the presence of dependencies between two or more factors of a model, which negatively affect the quality of the obtained parameter estimates. Usually, such dependencies are detected at the stage of data preprocessing and model building by selecting optimal set of model factors that best describe relationship between the dependent variable and the factors. However, in practice, situations often occur when a certain group of factors collectively affects the dependent variable. As a result, both the criterion based on the least squares and the least moduli incorrectly determines parameters of the model, often significantly overestimating or underestimating them. Therefore, it is expedient to develop methods and criteria that make it possible to detect such dependencies between factors and make their coefficients be close to zero. One of the most famous so-called shrinkage methods in machine learning <ref type="bibr" target="#b0">[1]</ref> is regularization approach that permits to balance the model and reduce the effect of dependent factors on the quality of parameter determination.</p><p>The article is dedicated to applying of the Shor's ellipsoid method for finding parameters of a linear regression model with 𝐿 ! -regularization and the least moduli criterion with 1 ≤ 𝑝 ≤ 2. This criterion includes the use of the least moduli (𝑝 = 1) and the least squares (𝑝 = 2) criteria, as well as allows to use any value of the parameter 𝑝. Certain work results of applying the ellipsoid method for this type of problems are given in <ref type="bibr" target="#b4">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Finding of linear regression model parameters using the least moduli criterion powered to p</head><p>Let us consider a classical linear regression problem: to find 𝑛 unknown parameters 𝑥 ! , … , 𝑥 " with known observations</p><formula xml:id="formula_0">(𝐚 # , 𝑦 # ), 𝐚 # = (𝑎 #! , 𝑎 #$ , … , 𝑎 #" ) ∈ 𝑹 " , 𝑦 # ∈ 𝑹, 𝑖 = 1, 𝑚</formula><p>444444 , which are related as follows:</p><formula xml:id="formula_1">𝑦 # = 5 𝑎 #% 𝑥 % " %&amp;! + 𝜀 # , 𝑖 = 1, 𝑚 444444 ,<label>(1)</label></formula><p>where 𝑎 #% are known coefficients, 𝜀 # are unknown random variables, which have (approximately) the same distribution functions, 𝑚 &gt; 𝑛. The equation ( <ref type="formula" target="#formula_1">1</ref>) can be rewritten in matrix form</p><formula xml:id="formula_2">𝑦 = 𝐴𝑥 + 𝜀,<label>(2)</label></formula><p>where</p><formula xml:id="formula_3">𝑦 = (𝑦 ! , … , 𝑦 ' ) ( ∈ 𝑹 ' and 𝜀 = (𝜀 ! , … , 𝜀 ' ) ( ∈ 𝑹 ' are 𝑚-dimentional vectors, 𝐴 is a 𝑚 × 𝑛- matrix, 𝑥 = (𝑥 ! , … , 𝑥 " ) ( ∈ 𝑹 " is a 𝑛-dimentional</formula><p>vector that is to be evaluated. The least moduli method powered to 𝑝, which corresponds to finding the unknown vector 𝑥 ) * according to the least moduli criterion powered to 𝑝 (1 ≤ 𝑝 ≤ 2), is a mathematical programming problem:</p><formula xml:id="formula_4">𝑓 * = 𝑓=𝑥 ) * &gt; = min +∈𝑹 " B𝑓(𝑥) = 5 C𝑦 # − 5 𝑎 #% 𝑥 % " %&amp;! C ) ' #&amp;! E,<label>(3)</label></formula><p>where |•| is an absolute value of a number. The function 𝑓(𝑥) is non-smooth, if 𝑝 = 1 and smooth, if 𝑝 &gt; 1.</p><p>The problem (3) is a problem of unconditional minimization of the convex function 𝑓(𝑥), subgradient of which at the point 𝑥̅ is calculated using the following formula:</p><formula xml:id="formula_5">𝑔 . (𝑥̅ ) = ⎩ ⎪ ⎪ ⎨ ⎪ ⎪ ⎧ 𝑝 5 𝑠𝑖𝑔𝑛 O5 𝑎 #% " %&amp;! 𝑥̅ % − 𝑦 # P C5 𝑎 #% " %&amp;! 𝑥̅ % − 𝑦 # C )/! ' #&amp;! 𝑎 #! , … 𝑝 5 𝑠𝑖𝑔𝑛 O5 𝑎 #% " %&amp;! 𝑥̅ % − 𝑦 # P C5 𝑎 #% " %&amp;! 𝑥̅ % − 𝑦 # C )/! ' #&amp;! 𝑎 #" , ⎭ ⎪ ⎪ ⎬ ⎪ ⎪ ⎫<label>(4)</label></formula><p>If 𝑝 = 1, the problem (3) can be formulated as a following mathematical programming problem:</p><formula xml:id="formula_6">𝑓 ! * = min +∈𝑹 " B𝑓 ! (𝑥) = 5 C𝑦 # − 5 𝑎 #% 𝑥 % " %&amp;! C ' #&amp;! E.<label>(5)</label></formula><p>The problem ( <ref type="formula" target="#formula_6">5</ref>) is a problem of unconditional minimization of convex piecewise-linear function 𝑓 ! (𝑥), which corresponds to the least moduli method, which has proven to be robust to anomalous observations or outliers <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b5">6]</ref>. Finding the best according to the least moduli criterion vector 𝑥 * , where 𝑥 * is a solution of the problem <ref type="bibr" target="#b4">(5)</ref>, can be formulated as the following LP-problem: to find</p><formula xml:id="formula_7">E.<label>(8)</label></formula><p>The problem ( <ref type="formula" target="#formula_7">8</ref>) is a problem of unconditional minimization of a convex piecewise-linear function 𝑓 ) (𝑥). Here 𝜆 is a regularization parameter, and if 𝜆 = 0 the function 𝑓 ) (𝑥) coincides with the function 𝑓(𝑥). To calculate the subgradient of the function 𝑓 ) (𝑥) at the point 𝑥̅ one can use the following formula:</p><formula xml:id="formula_8">𝑔 . # (𝑥̅ ) = 𝑔 . (𝑥̅ ) + 𝜆 𝑠𝑖𝑔𝑛(𝑥̅ ),<label>(9)</label></formula><p>where 𝑔 . 
(𝑥̅ ) is calculated using the expression (4).</p><p>For solving the problem (8) the Shor's ellipsoid method <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref> can be used, which is implemented as the emshor program <ref type="bibr" target="#b9">[10]</ref>. We will apply it for the problem of the function 𝑓 ) (𝑥) minimization, providing that its minimum point 𝑥 ) * is localized in 𝑛-dimensional ball with radius 𝑟 4 , which is centered at the point 𝑥 4 ∈ 𝑹 " , i.e. \𝑥 4 − 𝑥 ) * \ ≤ 𝑟 4 . The algorithm to be used is called emlmpr, description of which is given below.</p></div>
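<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the LP reformulation (6)-(7) concrete, the following minimal Octave sketch (our illustration, not part of the paper's code) solves the p = 1 problem on a small random instance with Octave's glpk solver, introducing the auxiliary variables z_i:</p><formula># A minimal sketch of the LP reformulation (6)-(7) of problem (5),
# solved with Octave's glpk on a small random instance (illustrative only)
m = 40; n = 5; rand("seed", 1);
A = rand(m, n); xstar = round(10*rand(n, 1)); y = A*xstar;
c  = [zeros(n, 1); ones(m, 1)];      # minimize the sum of z_i
Ac = [A, -eye(m); -A, -eye(m)];      # A*x - z &lt;= y and -A*x - z &lt;= -y
b  = [y; -y];
lb = [-Inf(n, 1); zeros(m, 1)];      # x free, z &gt;= 0
ub = Inf(n + m, 1);
ctype   = repmat("U", 1, 2*m);       # all constraints are upper-bound inequalities
vartype = repmat("C", 1, n + m);     # continuous variables
[sol, f1] = glpk(c, Ac, b, lb, ub, ctype, vartype, 1);
x1 = sol(1:n);                       # least moduli estimate of the parameters
printf("f1 = %g, ||x1 - xstar|| = %.1e\n", f1, norm(x1 - xstar));</formula></div>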
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The emlmpr algorithm and its Octave implementation</head><p>The input parameter of the algorithm is 𝜀 . &gt; 0 -accuracy, with which 𝑓 ) * = 𝑓 ) =𝑥 ) * &gt; is to be found.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Initialization.</head><p>Let us consider 𝑛 × 𝑛-matrix 𝐵 and set 𝐵 4 ≔ 𝐼 " , where 𝐼 " is 𝑛 × 𝑛 identity matrix. We go to the first iteration with values 𝑥 4 , 𝑟 4 and 𝐵 4 . Let values 𝑥 5 ∈ 𝑹 " , 𝑟 5 , 𝐵 5 be found at the iteration 𝑘. Passing to the iteration 𝑘 + 1 consists of the following sequence of actions.</p><p>Step 1. Calculate 𝑓 ) (𝑥 5 ) and subgradient 𝑔 . # (𝑥 5 ) at the point 𝑥 5 using formula <ref type="bibr" target="#b8">(9)</ref>. If 𝑟 5 a𝐵 5 ( 𝑔 . # (𝑥 5 )a ≤ 𝜀 . , then "Stop: 𝑘 * = 𝑘 and 𝑥 ) * = 𝑥 5 ". Otherwise, go to the step 2.</p><p>Step 2. Set 𝜉 5 ≔</p><formula xml:id="formula_9">6 $ % 7 &amp; # (+ $ ) :6 $ % 7 &amp; # (+ $ ):</formula><p>.</p><p>Step </p><formula xml:id="formula_10">\𝐵 5 /! =𝑥 5 − 𝑥 ) * &gt;\ ≤ 𝑟 5 , 𝑘 = 0,1,2, … , 𝑘 * .</formula><p>On each iteration 𝑘 &gt; 0 the value of decreasing of volume of the ellipsoid 𝐸 5 = i𝑥 ∈ 𝑅 " : \𝐵 5 /! (𝑥 5 − 𝑥)\ ≤ 𝑟 5 k, which localizes point 𝑥 ) * , is constant and equal to</p><formula xml:id="formula_11">𝑞 = 𝑣𝑜𝑙(𝐸 5 ) 𝑣𝑜𝑙(𝐸 5/! ) = p 𝑛 − 1 𝑛 + 1 q 𝑛 √𝑛 $ − 1 s " &lt; 𝑒𝑥𝑝 v− 1 2(𝑛 + 1) w &lt; 1.</formula><p>Theorem implies the fact that the algorithm of finding 𝑥 ) * can be successfully run on modern computers, if 𝑛 = 10 ÷ 30 and 𝑚 = 100 ÷ 1000. Indeed, to decrease in 10 times volume of the ellipsoid localizing the point 𝑥 ) * , it is needed to perform 𝐾 iterations, where 𝐾 = =" !4</p><p>=" &gt; ≈ (2 𝑙𝑛 10)(𝑛 + 1) ≈ 4.6(𝑛 + 1). It means that in order to improve deviation of found record value of the function 𝑓 ) (𝑥) from its optimal value 𝑓 ) * by 10 times, it is necessary to perform 4.6(𝑛 + 1) $ iterations of the algorithm for finding 𝑥 ) * .</p><p>If 𝑛 = 30 and 𝜀 . = 10 /? × 𝑓(𝑥 4 ), then the maximal number of iterations of the algorithm is equal to 4.6(𝑛 + 1) $ = 46 × 961 = 44206. Therefore, even the straight-up matrix-vector implementation of calculation of the function 𝑓 ) (𝑥) value and its subgradient according to the formula (9) allows to provide fast algorithm work on modern computers.</p><p>The algorithm emlmpr for finding an approximation to the point 𝑥 ) * is implemented using Octave language. Its code is given below. Core of the emlmpr program is the for loop (rows 4-16). First, the value of the function 𝑓 (line 5) and its normalized subgradient at the point 𝑥 ) (row 10) are calculated. If the stop condition is satisfied (row 11), the algorithm stops its work. Stop in the emlmpr algorithm occurs when a condition 𝑟 5 a𝐵 5 ( 𝑔 . # (𝑥 5 )a ≤ 𝜀 . is fulfilled, which is equivalent to condition 𝑓 ) (𝑥 5 ) − 𝑓 ) * ≤ 𝜀 . . Otherwise, the next point 𝑥 5;! is calculated (row 13), the space transformation matrix 𝐵 5;! (row 14) and the radius 𝑟 5;! (row 15) are recalculated.</p><formula xml:id="formula_12"># Input parameters: #com01 # A(m,n) -observation matrix; #com02 # y(m,1) -vector of tags (output vector); #com03 # p -power for least moduli criterion, 1&lt;=p&lt;=2; #com04 # lambda -regularization rate; #com05 # x0(n,1) -starting point; #com06 # r0 -</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Computational experiments without regularization</head><p>To demonstrate the effectiveness of the emlmpr algorithm work we present results of three computational experiments conducted for solving the problem <ref type="bibr" target="#b7">(8)</ref>. For the first and the second experiments parameters 𝑛 = 30 and 𝑚 = 10 × 𝑛 = 300. The purpose of the first experiment is to estimate time of solving the problem (8) for specified parameters on a personal computer with Intel Core i7-10750H processor (2.6 GHz), and 16 Gb RAM. The purpose of the second experiment is to demonstrate robustness of the least moduli method, and therefore solutions of the problem (8) without regularization (𝜆 = 0), if 𝑝 is close to one. Third experiment is dedicated to finding parameters of linear regression model using real medical data for further prediction psychological indicators.</p><p>All the calculations are performed on a computer with Intel Core i7-10750H processor (2.6 GHz), 16 Gb RAM in Windows 10/64 system using GNU Octave, version 6.3.0. For the first two experiments regularization parameter 𝜆 is chosen equal to zero.</p><p>Test example 1. For the first experiment input data for the problem (8) are matrix 𝐴 and vector 𝑦, which are generated randomly with a standard uniform distribution according to the following formulas:  It is easy to see from Table <ref type="table">1</ref> that to get solution with accuracies 10 /? ÷ 10 /!$ for different 𝑝 the emlmpr algorithm requires approximately 40 000 iterations and no more than 7 seconds of time. The least deviation 𝑑𝑥 equals 2.3e-11 and is obtained for 𝑝 = 1.</p><formula xml:id="formula_13">A =</formula><p>Test example 2. The purpose of the second experiment is to demonstrate robustness of the least moduli method, which means that the same robustness will characterize solutions of the problem (8), if 𝑝 is close to one. Here, the matrix 𝐴, the starting point 𝑥 4 , ball radius 𝑟 4 are chosen to be the same as in the first test, the vector 𝑦 is adjusted so that its odd components remain the same as in the first test, and even components are multiplied by the value q = (1.0 + 1.0*sign(0.5 -rand)). Thus, even components of the vector 𝑦 can be considered anomalous (incorrect) results of observations.  Calculation results for 𝑛 = 30 and 𝑚 = 300 are given in Table <ref type="table">2</ref>. Here, 𝑖𝑠𝑡 is an exit code of the emlmpr program, 𝑑𝑥 is a norm of deviation of found approximation to the minimum point from the point xstar. The 5th column contains values of the function 𝑓 ) at the found point 𝑥 ) , the 7th column contains the 𝑝-th root of the 5th column. For all the values of the parameter 𝑝 code 𝑖𝑠𝑡 = 1, which indicates successful completion of the program.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Results of solving the problem (8) with 𝒏 = 𝟑𝟎, 𝒎 = 𝟑𝟎𝟎, 𝝀 = 𝟎, and different 𝒑 Results of Table <ref type="table">2</ref> show that the function value 𝑓 C grows as the parameter 𝑝 increases: from 1.34e+05 if 𝑝 = 1 to 1.09e+08 if 𝑝 = 2. Deviation 𝑑𝑥 of the solution found from the minimum point with 𝑝 = 1 is significantly smaller than if 𝑝 &gt; 1, which confirms robustness of the least moduli method corresponding to 𝑝 = 1 situation. It is important to emphasize that this situation is typical for all the values of the parameter 𝑝 close enough to 1. Time used for finding solutions for each of the parameter 𝑝 values does not exceed 4 seconds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Computational experiments with regularization</head><p>To show effectiveness of the emlmpr algorithm applied to real data we consider the problem of prediction of psychological indicators of the patient's condition based on cardiological data obtained using complex <ref type="bibr" target="#b10">[11]</ref>. There were 90 patients studied with more than 200 features including cardiological and basic ones (like age and ordinal number). Willing to exclude choice of categorial features recoding method from analysis so we are omitting categorial feature as well as ordinal. Practically, usage of ordinal features instead of numerical could increase the quality of linear modelling, see <ref type="bibr" target="#b11">[12]</ref>, however, we need to simplify experiment in order to research only the ellipsoid method usage. While ability of the medical complex <ref type="bibr" target="#b10">[11]</ref> to create binning good enough for the linear modelling is out the scope of the current research. So, we are taking just 175 numerical features that we have. Then, we apply the feature selection procedure to test the ellipsoid method on the dataset being optimal at least at some sense.</p><p>We want to select features that describe relationship between medical and psychological data in the best way using the 𝑅 $ metric <ref type="bibr" target="#b0">[1]</ref>. While the goal of the studying the medical data includes feature interpretability, we take these data as is. In other words, we do not make transformations like PCA and similar ones to get linear independent features. Undoubtedly, it is possible to get some interpretation even after the transformations, but our approach is to take features as is. Taking into account that internal metrics for feature importance in the case of linear regression model work are the best when features are either linearly independent or have normal distributions at least, we cannot rely on internal linear regression metrics, so we try to use "wrapper" approach for the model feature selection <ref type="bibr" target="#b12">[13]</ref>. For the quality metric, we use 5-fold cross-validation <ref type="bibr" target="#b0">[1]</ref>. Since the initial dataset holds missing values, we use simple imputation via median strategy using only training subsample to avoid distortion due to the whole-set median calculation. Moreover, in our situation the initial number of features, which is 200, is greater than number of observations, which is 90, so we start from the first feature, increase number of features until the quality metric 𝑅 $ stops to grow. Also, we consider non-transformed features to decrease the number of experiments to perform and the variability of the whole scheme. Selection of the optimal transformation is an additional task, which is out of scope of the current paper. In general, the feature selection procedure is described at Figure <ref type="figure" target="#fig_2">1</ref>.</p><p>The calculations for feature selection are made in Python 3 <ref type="bibr" target="#b13">[14]</ref> using Google Colab with Sequential Feature Selection and Linear Regression classes with embedded 𝑅 $ -metric taken from Scikit-learn library <ref type="bibr" target="#b14">[15]</ref>. We also used Pandas library <ref type="bibr" target="#b15">[16]</ref> for keeping feature names during calculations.   Table <ref type="table">4</ref> shows that to solve the problem with 𝑝 = 1 with 𝜀 . = 10 /? and 𝜀 . 
= 10 /$4 the emlmpr program requires approximately 8 thousand operations. If we use 𝑝 = 1 for the same accuracies 11700 iterations are required, and their number is increased to 29719 iterations when using 𝜀 . = 10 /F4 . The 𝑓 ) value for fixed 𝑝 remains unchanged.</p></div>
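<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection itself was done in Python with Scikit-learn, as described above. Purely for illustration, a compact Octave sketch of the same wrapper scheme (greedy forward selection scored by 5-fold cross-validated R², with training-only median imputation) might look as follows; all function and variable names here are our own:</p><formula># Illustrative sketch of wrapper-style forward selection (our own code; the
# paper used scikit-learn's SequentialFeatureSelector and LinearRegression)
function r2 = cv_r2(X, y, k)
  # mean R^2 over k folds; round-robin fold assignment for simplicity
  m = rows(X); idx = mod((0:m-1)', k) + 1; r2 = 0;
  for fold = 1:k
    tr = (idx != fold); te = !tr;
    Xtr = X(tr, :); Xte = X(te, :);
    for j = 1:columns(X)              # median imputation, fit on training folds only
      med = median(Xtr(!isnan(Xtr(:, j)), j));
      Xtr(isnan(Xtr(:, j)), j) = med;
      Xte(isnan(Xte(:, j)), j) = med;
    endfor
    w = [ones(sum(tr), 1), Xtr] \ y(tr);       # least squares fit on training folds
    e = y(te) - [ones(sum(te), 1), Xte] * w;
    r2 += 1 - sumsq(e) / sumsq(y(te) - mean(y(te)));
  endfor
  r2 /= k;
endfunction

# greedy forward selection: grow the feature set while cross-validated R^2 improves
function selected = forward_select(X, y)
  selected = []; remaining = 1:columns(X); best = -Inf;
  do
    improved = false;
    for j = remaining
      r2 = cv_r2(X(:, [selected, j]), y, 5);
      if (r2 &gt; best) best = r2; bestj = j; improved = true; endif
    endfor
    if (improved)
      selected = [selected, bestj];
      remaining(remaining == bestj) = [];
    endif
  until (!improved)
endfunction</formula></div>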
<div xmlns="http://www.tei-c.org/ns/1.0"><head>𝑝</head><p>As it can be seen from Table <ref type="table">4</ref>, the emlmpr program successfully finds linear regression model coefficients when using 𝜆 = 0 (see Table <ref type="table" target="#tab_4">5</ref>). However, some of the coefficients are rather larger than others (bold values in Table <ref type="table" target="#tab_4">5</ref>), which can indicate presence of dependency between the following features in the observation matrix. To reduce their effect on the quality of coefficients restoration we apply 𝐿 ! -regularization, which allows to set model parameters corresponding to dependent columns to zero. In practice, it is difficult to obtain exactly zero values of the corresponding parameters, so we have to settle for values close to zero with a certain accuracy.   <ref type="table" target="#tab_4">5</ref>, as well as any changes in coefficients digits are highlighted in bold. It is easy to see that now these coefficients are rather close to zero with sufficient accuracy: 10 /$ for the feature 7 with any values of 𝑝 and 𝜀 . , 10 /G for the feature 14 with 𝑝 = 1 and 𝜀 . = 10 /? and even 10 /$H for the feature 16 with 𝑝 = 2 and 𝜀 . = 10 /F4 . The rest of the coefficients remained almost unchanged except several digits. It is also worth noting that increasing of the regularization rate leads to decreasing coefficients values of dependent features even more. It gives an instrument to adjust the impact of regularization and obtain coefficients at dependent features close enough to zero, thus improving quality of the solutions obtained.</p><p>The prediction results obtained using the model with parameters calculated with the emlmpr algorithm show that using the least moduli method (𝒑 = 𝟏) we obtain many more zero values (which means that solution is found with required accuracy) than in case of using the least square method (𝒑 = 𝟐). Thus, using 𝒑 = 𝟏 is more appropriate than 𝒑 = 𝟐. 𝑝 = 1 𝑝 = 2 𝜀 . = 10 /?</p><p>𝜀 . = 10 </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>The paper investigates the problem of finding parameters of linear regression model with the least moduli criterion with 𝟏 ≤ 𝒑 ≤ 𝟐 and 𝑳 𝟏 -regularization. The problem is formulated as a problem of unconditional minimization of a convex piecewise-linear function. For solving this problem, Shor's ellipsoid method is used, which is implemented by the emlmpr program using Octave programming language.</p><p>Series of three computational experiments with the emlmpr program are considered. Results of the first experiment show that the problem of finding parameters of linear regression model with 𝒏 = 𝟑𝟎 and 𝒎 = 𝟑𝟎𝟎 can be solved within 7 seconds being run on modern laptop of average performance. The second experiment shows that the least moduli criterion is robust if 𝒑 is close to one, thus solutions of the problem are robust as well. The third experiment is dedicated to using of 𝑳 𝟏 -regularization for decreasing effect of linearly dependent features that the model can include on the solutions quality. Results of the experiment, where real cardiological data are used for prediction of psychological indicators of the patient's condition, show that the emlmpr algorithm can successfully compute linear regression model parameters with 𝒏 = 𝟏𝟔, 𝒎 = 𝟗𝟎 within 3 seconds, and set coefficients at dependent features to zero with sufficient accuracy using 𝑳 𝟏 -regularization approach. 𝑝 = 1 𝑝 = 2 𝜀 . = 10 /?</p><p>𝜀 . = 10 /$4 𝜀 . = 10 /!$ 𝜀 . = 10 </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>3 .Step 5 .</head><label>35</label><figDesc>Calculate the next point 𝑥 5;! ≔ 𝑥 5 − ℎ 5 𝐵 5 𝜉 5 , where ℎ 5 = ! ";! 𝑟 5 . Step 4. Calculate 𝐵 5;! : = 𝐵 5 + Oe "/! ";! − 1P (𝐵 5 𝜉 5 )𝜉 5 ( and 𝑟 5;! : = 𝑟 5 " √" ' /! . Go to the iteration 𝑘 + 1 with values 𝑥 5;! , 𝑟 5;! , 𝐵 5;! . Theorem. Sequence of points {𝑥 5 } 5&amp;4 5 * satisfy the following inequalities:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>radius of the ball centered at x0 that localizes x_p^*; #com07 # epsf, maxitn -stop parameters: #com08 # epsf -precision to stop by the value of the function fp, #com09 # maxitn -maximal number of iterations; #com10 # intp -print information for every intp iteration. #com11 # Output parameters: #com12 # xp(n,1) -approximation to x_p^*; #com13 # fp -the value of the function f_R at the point xp; #com14 # itn -the number of iterations; #com15 # ist -exit code: 1 -epsf, 4 -maxitn. #com16 function [xp,fp,itn,ist] = emlmpr(A,y,p,lambda,x0,r0, epsf,maxitn,intp); #row01 n = columns(A); xp = x0; B = eye(n); r = r0; #row02 dn = double(n); beta = sqrt((dn-1.d0)/(dn+1.d0)); #row03 for (itn = 0:maxitn) #row04 temp = A*xp-y; fp = sum(abs(temp).^p) + lambda*sum(abs(xp)); #row05 if((mod(itn,intp)==0)&amp;&amp;(intp&lt;=maxitn)) #row06 printf(" itn %4d fp %14.6e\n",itn,fp); #row07 endif #row08 g1 = p*A'*(sign(temp).*(abs(temp)).^(p-1)) + lambda*sign(xp);#row09 g = B'*g1; dg = norm(g); #row10 if(r*dg &lt; epsf) ist = 1; return; endif #row11 xi = (1.d0/dg)*g; dx = B * xi; #row12 hs = r/(dn+1.d0); xp -= hs * dx; #row13 B += (beta -1) * B * xi * xi'; #row14 r = r/sqrt(1.d0-1.d0/dn)/sqrt(1.d0+1.d0/dn);</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head># Test 1 :</head><label>1</label><figDesc>10*rand(m,n), y = A*xstar(n,1), xstar(n,1) = round(10*rand(n,1) + 0.5). Starting point is chosen according to the rule x0(n,1) = round(5*rand(n,1)), and radius of the sphere, in which the point 𝑥 ) * = 𝑥 @ABC is located, is chosen according to the rule r0 = 5*norm(x0 -xstar), i.e. 𝑟 4 = ‖𝑥 4 − 𝑥 @ABC ‖. The first experiment is implemented by the following Octave code. emlmpr running time for n = 30 and m = 300 n = 30, m = 10*n, rand("seed", 2024); A = 10*rand(m,n); xstar = round(10.0*rand(n,1) + 0.5); y = A*xstar; x0 = round(5.0*rand(n,1)); r0 = 5*norm(x0xstar), maxitn = 50000, intp = 10000, lambda = 0.0, # running the emlmpr algorithm for p=1.0;1.1.2;1.5;1.8;2.0 printf("\n Test 1: emlmpr runnning time for n = 30 and m = 300 \n"); epsf0 = 1.e-6; ntest = 5;</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head># Test 2 :</head><label>2</label><figDesc>robustness of the least moduli method for n = 30 and m = 300 n = 30, m = 10*n, rand("seed", 2024); A = 10*rand(m,n); # test example generation xstar = round(10.0*rand(n,1) + 0.5); y = A*xstar; x0 = round(5.0*rand(n,1)); r0 = 5*norm(x0xstar), m1 = m/2, for i = 1:m1 ind = (i-1)*2 + 1; y(ind) = y(ind)*(1.0 + 1.0*sign(0.5rand)); endfor # running the emlmpr algorithm for p=1.0;1.1.2;1.5;1.8;2.0 printf("\nTest 2: robustness of the Least Moduli Method \n"); maxitn = 50000, intp = 10000, lambda = 0.0, epsf0 = 1.e-6; ntest = 5;</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Feature selection workflow</figDesc><graphic coords="8,162.03,63.55,278.80,246.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Results of the emlmpr program work for the first experiment are 𝑡𝑖𝑚𝑒 required to solve the problem (8) with accuracy 𝜀 . , the number of iterations 𝑖𝑡𝑛 of the method, the minimum value of the function 𝑓 ) found, norm of deviation 𝑑𝑥 of the found approximation to the minimum point from the known minimum point xstar are given in Table1. Here 𝜀 . is chosen as follows: if 𝑝 = 1 the value 𝜀 . = 10 /? , if 𝑝 &gt; 1 we choose 𝜀 . = (10 /? ) ) .</figDesc><table><row><cell>Table 1</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="8">Results of solving the problem (8) with 𝒏 = 𝟑𝟎, 𝒎 = 𝟑𝟎𝟎 and 𝝀 = 𝟎</cell></row><row><cell></cell><cell>𝑝</cell><cell>𝜀 .</cell><cell></cell><cell>time (sec)</cell><cell>itn</cell><cell></cell><cell>𝑓 )</cell><cell>𝑑𝑥</cell></row><row><cell></cell><cell>1.0</cell><cell cols="2">1.0e-06</cell><cell>5.17</cell><cell>45375</cell><cell></cell><cell>1.71062e-08</cell><cell>2.3e-11</cell></row><row><cell></cell><cell>1.2</cell><cell cols="2">3.2e-08</cell><cell>6.99</cell><cell>42148</cell><cell></cell><cell>3.58266e-10</cell><cell>1.2e-10</cell></row><row><cell></cell><cell>1.5</cell><cell cols="2">1.0e-09</cell><cell>5.39</cell><cell>40061</cell><cell></cell><cell>9.87425e-12</cell><cell>4.1e-10</cell></row><row><cell></cell><cell>1.8</cell><cell cols="2">3.2e-11</cell><cell>5.97</cell><cell>38260</cell><cell></cell><cell>1.81563e-13</cell><cell>7.2e-10</cell></row><row><cell></cell><cell>2.0</cell><cell cols="2">1.0e-12</cell><cell>3.91</cell><cell>37216</cell><cell></cell><cell>7.45098e-15</cell><cell>1.8e-09</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>dx];</cell><cell></cell><cell></cell></row><row><cell>itn, fp,</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>endfor</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>n,m,</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>printf("</cell><cell>p</cell><cell>epsf</cell><cell>time</cell><cell>itn ist</cell><cell>fp</cell><cell></cell><cell>dx \n");</cell></row><row><cell cols="2">for (i = 1:ntest)</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>printf("</cell><cell cols="2">%4.1f</cell><cell cols="2">%6.1e</cell><cell>%4.2f</cell><cell>%6d</cell><cell>%2d</cell><cell>%10.5e</cell><cell>%10.1e\n",</cell></row><row><cell cols="3">table(i, 1:7))</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>endfor</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note>table = []; for (i = 1:ntest) p = 1.d0 + (i -1.d0)/(ntest -1.d0), epsf = epsf0**(p); time0 = time(); [xp,fp,itn,ist] = emlmpr(A,y,p,lambda,x0,r0,epsf,maxitn,intp); time1 = time() -time0, dx = norm(xpxstar); table = [table; p epsf time1 itn ist fp</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>table = []; for (in = 1:ntest) p = 1.d0 + (in -1.d0)/(ntest -1.d0), epsf</head><label>=</label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>];</cell><cell></cell><cell></cell></row><row><cell>itn, fp,</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>endfor</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>n,m,</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>printf("</cell><cell>p</cell><cell>epsf</cell><cell>time</cell><cell>itn ist</cell><cell>fp</cell><cell>dx</cell><cell>r(fp)\n");</cell></row><row><cell cols="2">for (i = 1:ntest)</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="5">printf(" %4.1f %6.1e %4.2f %6d %2d</cell><cell>%10.5e %10.1e</cell><cell cols="2">%10.5e\n", table(i, 1:8))</cell></row><row><cell>endfor</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note>= epsf0**(p); time0 = time(); [xp,fp,itn,ist] = emlmpr(A,y,p,lambda,x0,r0,epsf,maxitn,intp); time1 = time() -time0, dx = norm(xpxstar); table = [table; p epsf time1 itn ist fp dx fp^(1/p)</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc></figDesc><table><row><cell cols="5">Results of the emlmpr program work with 𝒏 = 𝟏𝟔, 𝒎 = 𝟗𝟎, 𝝀 = 𝟎, 𝒑 = 𝟏. 𝟎; 𝟐. 𝟎 and different</cell></row><row><cell>accuracies 𝜺 𝒇</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>𝑝</cell><cell>𝜀 .</cell><cell>time (sec)</cell><cell>itn</cell><cell>𝑓 )</cell></row><row><cell cols="2">1.0 1.0e-06</cell><cell>0.56</cell><cell>7640</cell><cell>2.61732e+02</cell></row><row><cell cols="2">1.0 1.0e-20</cell><cell>0.69</cell><cell>8383</cell><cell>2.61732e+02</cell></row><row><cell cols="2">2.0 1.0e-12</cell><cell>0.95</cell><cell>11700</cell><cell>1.38880e+03</cell></row><row><cell cols="2">2.0 1.0e-40</cell><cell>2.15</cell><cell>29719</cell><cell>1.38880e+03</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 Linear regression model parameters found by the emlmpr program with</head><label>5</label><figDesc>𝒑 = 𝟏. 𝟎; 𝟐. 𝟎, 𝝀 = 𝟎 and different accuracies 𝜺 𝒇</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6</head><label>6</label><figDesc>contains coefficients of linear regression model found by the emlmpr program with 𝑝 = 1.0; 2.0, different accuracies 𝜀 . and regularization rate 𝜆 = 0.1. Corresponding values to large coefficients from Table</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6 Linear regression model parameters found by the emlmpr program with</head><label>6</label><figDesc>𝒑 = 𝟏. 𝟎; 𝟐. 𝟎, 𝝀 = 𝟎. 𝟏 and different accuracies 𝜺 𝒇</figDesc><table><row><cell></cell><cell>/$4</cell><cell>𝜀 . = 10 /!$</cell><cell>𝜀 . = 10 /F4</cell></row><row><cell>-8.9453e-02</cell><cell>-8.9453e-02</cell><cell>-1.1440e-01</cell><cell>-1.1440e-01</cell></row><row><cell>-2.5798e-03</cell><cell>-2.5798e-03</cell><cell>-7.0401e-03</cell><cell>-7.0401e-03</cell></row><row><cell>-3.2033e-02</cell><cell>-3.2033e-02</cell><cell>-2.4929e-02</cell><cell>-2.4929e-02</cell></row><row><cell>2.0159e-02</cell><cell>2.0159e-02</cell><cell>2.7608e-02</cell><cell>2.7608e-02</cell></row><row><cell>2.4191e-01</cell><cell>2.4191e-01</cell><cell>2.6629e-01</cell><cell>2.6629e-01</cell></row><row><cell>6.8819e-03</cell><cell>6.8820e-03</cell><cell>3.5181e-02</cell><cell>3.5181e-02</cell></row><row><cell>1.4368e+08</cell><cell>1.4868e+08</cell><cell>7.5756e+06</cell><cell>6.3818e+06</cell></row><row><cell>-1.4482e-02</cell><cell>-1.4482e-02</cell><cell>-1.2610e-02</cell><cell>-1.2610e-02</cell></row><row><cell>3.7213e-02</cell><cell>3.7213e-02</cell><cell>3.7520e-02</cell><cell>3.7520e-02</cell></row><row><cell>-8.9244e-03</cell><cell>-8.9244e-03</cell><cell>-4.5045e-02</cell><cell>-4.5045e-02</cell></row><row><cell>1.4087e+00</cell><cell>1.4087e+00</cell><cell>2.1146e+00</cell><cell>2.1146e+00</cell></row><row><cell>-1.0175e+00</cell><cell>-1.0175e+00</cell><cell>-2.4560e+00</cell><cell>-2.4560e+00</cell></row><row><cell>-3.2982e-01</cell><cell>-3.2982e-01</cell><cell>-1.0611e-01</cell><cell>-1.0611e-01</cell></row><row><cell>-3.9263e+08</cell><cell>-3.7817e+08</cell><cell>-7.8529e+06</cell><cell>-4.2703e+07</cell></row><row><cell>1.4368e+08</cell><cell>1.4868e+08</cell><cell>7.5756e+06</cell><cell>6.3818e+06</cell></row><row><cell>5.5243e+08</cell><cell>-4.9760e+08</cell><cell>-4.5914e+08</cell><cell>9.8456e+08</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>The paper is supported by National Research Foundation of Ukraine (grants № 2021.01/0136 and №2023.04/0094), Volkswagen Foundation grant № 97775, the project of research works of young scientists №07-02/03-2023, the NASU grant for research laboratories/groups of young scientists №02/01-2024 <ref type="bibr" target="#b4">(5)</ref>, and the DTT TS KNU NASU project № 0124U002162.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An Introduction to Statistical Learning: with Applications in Python</title>
		<author>
			<persName><forename type="first">G</forename><surname>James</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Witten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hastie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tibshirani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Taylor</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-38747-0</idno>
	</analytic>
	<monogr>
		<title level="s">Springer Texts in Statistics</title>
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>Springer Cham</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Mathematics for Machine Learning: textbook</title>
		<author>
			<persName><forename type="first">M</forename><surname>Deisenroth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Faisal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">Soon</forename><surname>Ong</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<pubPlace>Cambridge</pubPlace>
		</imprint>
	</monogr>
	<note>1st Edition</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Huber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Ronchetti</surname></persName>
		</author>
		<title level="m">Robust Statistics</title>
				<imprint>
			<publisher>John Wiley &amp; Sons</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>2nd Edition</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Optimization and Nonsmooth Analysis</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">H</forename><surname>Clarke</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1990">1990</date>
			<publisher>SIAM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Using the Ellipsoid Method to Study Relationships in Medical Data</title>
		<author>
			<persName><forename type="first">P</forename><surname>Stetsyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Budnyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sen'ko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stovba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Chaikovsky</surname></persName>
		</author>
		<idno type="DOI">10.34229/2707-451X.23.3.3</idno>
	</analytic>
	<monogr>
		<title level="j">Cybernetics and Computer Technologies</title>
		<imprint>
			<biblScope unit="page" from="23" to="43" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">On curve estimation by minimizing mean absolute deviation and its implications</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Annals of Statistics</title>
		<imprint>
			<biblScope unit="page" from="867" to="885" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Cutting-off Method with Space Dilation for Solving Convex Programming Problems</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">Z</forename><surname>Shor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cybernetics</title>
		<imprint>
			<biblScope unit="page" from="94" to="95" />
			<date type="published" when="1977">1977</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Nondifferentiable Optimization and Polynomial Problems</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">Z</forename><surname>Shor</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>Kluwer</publisher>
			<pubPlace>Amsterdam</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Minimization Methods for Non-Differentiable Functions</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">Z</forename><surname>Shor</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1985">1985</date>
			<publisher>Springer-Verlag</publisher>
			<pubPlace>Berlin</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The Generalized Ellipsoid Method and Its Implementation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Stetsyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Khomyak</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-38603-0_26</idno>
	</analytic>
	<monogr>
		<title level="j">Communications in Computer and Information Science</title>
		<imprint>
			<biblScope unit="page" from="355" to="370" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Development and implementation into medical practice new information technologies and metrics for analysis of small changes in electromagnetic field of human heart</title>
		<author>
			<persName><forename type="first">I</forename><surname>Chaikovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Primin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kazmirchuk</surname></persName>
		</author>
		<idno type="DOI">10.15407/visn2021.02.033</idno>
	</analytic>
	<monogr>
		<title level="j">Visnyk of the National Academy of Sciences of Ukraine</title>
		<imprint>
			<biblScope unit="page" from="33" to="43" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Persson</surname></persName>
		</author>
		<ptr target="https://lup.lub.lu.se/luur/download?func=downloadFile&amp;recordOId=9066332&amp;fileOId=9067075" />
		<title level="m">Weight of evidence transformation in credit scoring models: How does it affect the discriminatory power?</title>
				<meeting><address><addrLine>Lund, Sweden</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
		<respStmt>
			<orgName>Lund university</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Feature Selection: A Data Perspective</title>
		<author>
			<persName><forename type="first">L</forename><surname>Jundong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Morstatter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Trevino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1145/3136625</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="page" from="1" to="45" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Python 3 Reference Manual</title>
		<author>
			<persName><forename type="first">G</forename><surname>Van Rossum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">L</forename><surname>Drake</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>CreateSpace</publisher>
			<pubPlace>Scotts Valley, CA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v12/pedregosa11a.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Data structures for statistical computing in python</title>
		<author>
			<persName><forename type="first">W</forename><surname>Mckinney</surname></persName>
		</author>
		<idno type="DOI">10.25080/Majora-92bf1922-00a</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th Python in Science Conference</title>
				<meeting>the 9th Python in Science Conference<address><addrLine>Austin</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-07-03">28 June-3 July 2010</date>
			<biblScope unit="page" from="56" to="61" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
