Choosing the Optimal Quantity of Factors for Predicting the Severity of Bronchial Asthma in Children Using Linear Regression Models

Oleh Pihnastyi 1, Olga Kozhyna 2 and Tetiana Kulik 2

1 National Technical University "Kharkiv Polytechnic Institute", 2 Kyrpychova, Kharkiv, 61002, Ukraine
2 Kharkiv National Medical University, 4 Nauky Avenue, Kharkiv, 61022, Ukraine

Abstract
The severity of the course of bronchial asthma depends on many factors. Clinical and laboratory studies were carried out on 90 children aged 6 to 18: 70 children with bronchial asthma of various degrees of severity (the main group) and 20 healthy school-aged children. Of the 142 predictors studied, 11 factors were selected in accordance with the selection method described below. Multivariate linear regression models were developed and analyzed to predict the severity of bronchial asthma. The dependence of the forecast quality of the observed value on the number of model regressors is analyzed, with the MSE value used as the measure of forecast quality. An estimate of the number of regressors required for a significant increase in forecast quality is given. The distribution law of the error in predicting the severity of bronchial asthma in a multifactorial linear regression model is substantiated. The multivariate models are visualized using residual plots.

Keywords
Bronchial asthma, child, severe asthma, prediction, MSE, regression model, residual plot

1. Introduction
Bronchial asthma is a severe heterogeneous chronic lung disease. Numerous studies have revealed that the prevalence of bronchial asthma does not depend on the level of wealth of a country and is 4-10% among the adult population [1, 2]. The severity of the course of bronchial asthma depends on a sufficiently large number of factors that, presumably, exert effects of comparable magnitude on the clinical manifestations of the disease.
The first clinical symptoms may appear already in early childhood and are often similar to the symptoms of other childhood diseases [3, 4]. The relationship between an early age at which the first manifestations of the disease appear and the severity of the course in the patient's adult life has been proven [5]. Despite similar symptoms of bronchial asthma among patients, the results of treatment and the further prognosis of the disease differ widely. Investigations have confirmed the presence of various phenotypes of the disease and the influence of a large number of factors on the occurrence of bronchial asthma and the peculiarities of its course [6, 7]. Currently, tactics of treatment and observation of patients based on stepwise therapy have been developed to increase the level of disease control [8]. However, there is a fairly large category of patients characterized by an uncontrolled or severe course of the disease [9], which confirms the presence of different pathogenetic mechanisms of the occurrence of bronchial asthma [10, 11].

ITTAP'2021: 1st International Workshop on Information Technologies: Theoretical and Applied Problems, November 16-18, 2021, Ternopil, Ukraine
pihnastyi@gmail.com (A. 1); olga.kozhyna.s@gmail.com (A. 2); tv.kulik@knmu.edu.ua (A. 3)
ORCID: 0000-0002-5424-9843 (A. 1); 0000-0002-4549-6105 (A. 2); 0000-0002-8842-892X (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Studying not only the factors themselves but also their relationships with each other is an important step in understanding the course of the disease in each individual case. Analysis of the multifactorial nature of bronchial asthma underlies the prediction of the disease and its course [12]. Numerous studies have examined various categories of factors.
Commonly used factors include the age and gender of the child, wheezing, allergic sensitization, and IgE [13, 14]. To assess the prognosis of a severe course of bronchial asthma, both linear and nonlinear multifactorial models are used, containing different numbers of regressors (Table 1), aimed at determining predictors that are unambiguously significant in determining the severity of the course of the disease [15]. The values of the regressors are given by both quantitative and qualitative values. Table 1 presents some common types of models for predicting the severity of bronchial asthma and the number of predictors in these models.

Table 1
The number of regressors in models for predicting the severity of bronchial asthma

Number of regressors   Linear   Logistic       Machine learning
2-3                    [24]     [22]           -
4-7                    -        [17, 21]       -
8-10                   -        [16, 19, 20]   -
>10                    -        -              [18, 23, 25]

It should be noted that the models presented in Table 1 with the same number of regressors are used to analyze the prediction of the severity of bronchial asthma by different initial factors.

2. Formulation of the problem
The presence of a large number of models with different numbers of regressors makes the choice of both the type of model and the number of regressors in the model a relevant issue. In this research, we analyze the dependence of the forecast quality on the number of model regressors. The process of building a linear regression model with a large number of regressors is quite laborious: the computational complexity of the algorithm for constructing a regression model grows in proportion to the square of the number of regressors. Therefore, when analyzing the severity of bronchial asthma, linear models with no more than 5-7 regressors are usually used (Table 1). Linear models with a small number of regressors can be considered a tool for preliminary analysis of a set of experimental data.
A deeper analysis requires an increase in the number of regressors in the model. Because the results of predicting the severity of the course of bronchial asthma depend on a sufficiently large number of weakly dependent factors with approximately the same scale of contribution to the explained value, an increase in the number of regressors leads to only a slight increase in the quality of the prediction of the observed value. On the other hand, the presence of a large number of such weakly dependent factors makes linear regression models a good tool for predicting the severity of the progression of the disease, since the distribution of the prediction error for a number of regressors K ≥ 10 satisfies the normal distribution law. However, the questions arise of how many regressors the model should contain and by how much the prediction accuracy can be expected to increase with an increase in the number of regressors. This work is devoted to the analysis of this problem.

The regression model that determines the severity of the course of bronchial asthma can be written in the general form:

Y_i = F(X_1, X_2, ..., X_k) + ε_i,   (1)

where X_m is the value of the m-th regressor; Y_i is the numerical value characterizing the severity of the course of bronchial asthma; ε_i is the error in predicting the numerical value for the i-th test.

To analyze the influence of the number of regressors on the quality of predicting the severity of the disease, we use the data set formed during the examination of 90 children with a diagnosis of bronchial asthma aged 6 to 18 years. The investigation contains data from the anamnesis of life and diseases of the patients as well as laboratory and diagnostic indicators of the examination.
The study was conducted with respect for human rights and in accordance with international ethical requirements; it does not violate any scientific ethical standards or standards of biomedical research. To analyze the dependence of the prediction quality on the parameters, 142 factors were selected and encoded. For each examined patient, the values of the 142 factors were recorded, on which, it is assumed, the severity of the course of the disease may depend. As a result of preliminary analysis, invalid data were excluded from this set. The resulting dataset in the form of a 90x142 matrix [26] was used to build the regression models. As a result of phased elimination, 11 factors were identified that match the criterion

r_{y x_m} → max,  r_{x_m x_v} → min,   (2)

and these factors are used in this work. Based on criterion (2), out of the total of 142 factors, those were selected for which the correlation r_{y x_m} between the regressor and the observed value is the highest and the correlation r_{x_m x_v} between the regressors is the lowest. In other words, out of 142 factors, 11 factors were selected that have the largest correlation r_{y x_m} with the observed value; it is assumed that these factors have the most important influence on the severity of the disease. The selected factors are then tested against the condition r_{x_m x_v} → min in order to exclude factors that are highly correlated with each other; such factors are replaced by the next factors satisfying the condition r_{y x_m} → max. After several iterations, the final factors were determined; their numerical characteristics are presented in Table 2. Each factor is characterized by its mathematical expectation m_x, standard deviation σ_x, and correlation coefficient with the observed value r_yx.
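The stepwise selection under criterion (2) can be sketched as a greedy procedure: rank factors by their correlation with the observed value and skip any factor that is too strongly correlated with an already selected one. The data, the intercorrelation threshold, and the function below are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of the selection criterion (2): r_{y x_m} -> max, r_{x_m x_v} -> min.
# The synthetic 90x142 matrix stands in for the real dataset; the threshold
# max_intercorr is an assumed value, not taken from the paper.
import numpy as np

def select_factors(X, y, n_keep=11, max_intercorr=0.5):
    """Greedy selection: rank factors by |corr(x_m, y)|, then skip any
    factor whose correlation with an already chosen factor is too high."""
    n_factors = X.shape[1]
    corr_with_y = np.array([abs(np.corrcoef(X[:, m], y)[0, 1])
                            for m in range(n_factors)])
    order = np.argsort(-corr_with_y)          # best candidates first
    chosen = []
    for m in order:
        if all(abs(np.corrcoef(X[:, m], X[:, v])[0, 1]) < max_intercorr
               for v in chosen):
            chosen.append(int(m))
        if len(chosen) == n_keep:
            break
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 142))                # 90 patients x 142 factors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=90)
print(select_factors(X, y, n_keep=2))
```

In this toy data the observed value is driven by the first two columns, so the procedure recovers them; on the real dataset several iterations of this kind yield the 11 factors of Table 2.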
Table 2
Numerical characteristics of the factors selected to build models that determine the severity of the course of bronchial asthma disease

Code  Regressor name                                      m_x     σ_x     r_yx
X1    Allergic rhinitis                                   0.4494  0.4974  0.3223
X2    Atopic dermatitis                                   0.0562  0.2303  0.3767
X3    Number of years from the first symptoms             5.5281  4.4396  0.3023
X4    Bronchial asthma in father                          0.0864  0.281   0.0309
X5    Bronchial asthma in relatives of second generation  0.0658  0.2479  0.4157
X6    Eosinophils %                                       3.913   3.4462  0.2646
X7    Domestic dust                                       2.2319  1.1312  0.3116
X8    Pillow feather                                      0.7536  0.8059  0.3681
X9    Rabbit hair                                         0.5652  0.8925  0.2236
X10   Sheep wool                                          0.5217  0.6507  0.3373
X11   CD25 10*3 cells                                     0.6937  0.3087  0.2198

The selected factors are used in this work to construct linear regression models for predicting the severity of bronchial asthma. The type of model is determined by criteria (2). The criterion r_{x_m x_v} → min indicates that the factors presented in Table 2 are weakly dependent on each other. Since a large number of weakly dependent factors with approximately the same resulting contribution to the predicted observed value, proportional to r_yx, were selected to assess the severity of the course of the disease, the choice of a linear model for prediction suggests that the error ε has a normal distribution with the characteristics:

E(ε_i) = 0,  σ²(ε_i) = σ²,  cov(ε_i, ε_j) = 0, j ≠ i.   (3)

This feature, characteristic of models for predicting the severity of bronchial asthma, will be used to compare the prediction accuracy of linear models with different numbers of regressors. It should also be added that the spread of the error values ε_i for each range of predictor values X_m obeys a probability distribution with mathematical expectation E(ε_i) = 0 and standard deviation σ(ε_i) = σ.
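The per-factor characteristics in Table 2 (mathematical expectation m_x, standard deviation σ_x, and correlation r_yx with the observed value) can be computed as shown below. The toy severity and factor vectors are assumed stand-ins; the real 90-patient data are not reproduced here.

```python
# Sketch of computing the Table 2 characteristics for one factor.
# The binary "severity" and "allergic_rhinitis" vectors are synthetic
# illustrations, not the authors' data.
import numpy as np

rng = np.random.default_rng(1)
severity = rng.integers(0, 2, size=90).astype(float)      # observed value y
# a qualitative factor loosely associated with severity (assumption)
allergic_rhinitis = (0.6 * severity + rng.random(90) > 0.7).astype(float)

m_x = allergic_rhinitis.mean()                 # mathematical expectation m_x
sigma_x = allergic_rhinitis.std(ddof=1)        # standard deviation sigma_x
r_yx = float(np.corrcoef(severity, allergic_rhinitis)[0, 1])
print(f"m_x={m_x:.4f} sigma_x={sigma_x:.4f} r_yx={r_yx:.4f}")
```

Applying the same three statistics to each encoded column of the dataset reproduces the layout of Table 2.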
3. Research methodology
The first step of the study after the choice of factors (Table 2) is to build a set of linear regression models for predicting the severity of bronchial asthma and to compare the quality of the prediction of the observed value for different numbers of model regressors. As a criterion for comparing models, we use the MSE value

MSE = (1/n) Σ_{i=1..n} ε_i².   (4)

For comparison, we consider 1-, 2-, 3-, 5-, 7- and 10-factor linear regression models built from the factors in Table 2.

3.1. Construction and analysis of 10-factor linear regression models
For the eleven factors presented in Table 2, we construct 11 models, each containing ten of the eleven factors (Table 3).

Table 3
Coefficients for the ten-factor linear regression models

№ model  Number of examined  MSE    A      X1    X2    X3    X4    X5    X6    X7    X8    X9    X10   X11
1        56                  0.071  -0.21  0.12  0.45  0.00  0.06  0.39  0.02  0.01  0.08  0.03  0.08  -
2        56                  0.076  ...
3        56                  0.072  ...
4        56                  0.075  ...
5        56                  0.071  ...
6        56                  0.074  ...
7        61                  0.079  ...
8        56                  0.071  ...
9        56                  0.073  ...
10       56                  0.088  -0.27  0.1   -     0.01  0.04  0.32  0.02  0.02  0.08  0.04  0.1   0.01
11       56                  0.072  ...

The columns labeled «Xm» show the values of the coefficients of the regressor with code Xm, which can be identified using Table 2; a dash means the regressor is not included in the model. The free term of each equation is presented in column "A".
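Building one such ten-factor model and scoring it with the MSE criterion (4) can be sketched as follows. numpy's least-squares solver is used in place of whatever software the authors applied, and the data are synthetic stand-ins for the 56 examined patients.

```python
# Minimal sketch: ordinary least squares fit of a ten-factor linear model
# and the MSE criterion (4).  Data and noise level are assumptions.
import numpy as np

def fit_linear(X, y):
    """OLS: returns the free term A and the regressor coefficients."""
    design = np.column_stack([np.ones(len(y)), X])   # prepend column of 1s
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

def mse(X, y, intercept, coefs):
    residuals = y - (intercept + X @ coefs)          # e_i = y_i - yhat_i
    return float(np.mean(residuals ** 2))            # criterion (4)

rng = np.random.default_rng(2)
X = rng.random((56, 10))                             # 56 examined, 10 factors
y = X @ rng.random(10) - 0.2 + rng.normal(scale=0.1, size=56)
A, coefs = fit_linear(X, y)
print(round(mse(X, y, A, coefs), 4))
```

Repeating this fit for each ten-factor subset of the eleven selected factors fills the rows of Table 3.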
Linear regression models №1 and №10, designated Y1 and Y10, have the form:

Y1 = -0.21 + 0.12X1 + 0.45X2 + 0.00X3 + 0.06X4 + 0.39X5 + 0.02X6 + 0.01X7 + 0.08X8 + 0.03X9 + 0.08X10,   (5)

Y10 = -0.27 + 0.1X1 + 0.01X3 + 0.04X4 + 0.32X5 + 0.02X6 + 0.02X7 + 0.08X8 + 0.04X9 + 0.1X10 + 0.01X11.   (6)

The residual plots presented in Figure 1 and Figure 2 correspond to the linear regression models Y1, Y10, Y2, Y7. By virtue of the assumption justified above that the error ε has a normal distribution, the points characterizing the residuals e_i for the prediction errors ε_i should lie in a small neighborhood of one straight line. Anomalous values are circled on the graphs. The outliers were probably due to errors in the operation of the equipment used to measure the values of quantitative factors or to carelessness of the personnel in the preparation of the raw survey data. Abnormal values can also be associated with incorrect answers of patients in the personal survey sheet submitted for the study.

Figure 1: Residual plot for the 10-factor linear regression model: a) model Y1; b) model Y10
Figure 2: Residual plot for the 10-factor linear regression model: a) model Y7; b) model Y2

The presence of outliers can lead to a significant distortion of the form of the regression model and, accordingly, to an increase in the error. In this regard, the anomalous values of the regressors in the prepared dataset should be corrected or excluded from the set used to build a linear regression model. Table 3 shows the MSE value for each of the models Y1-Y11. Models Y1 and Y10 correspond to the lowest and highest MSE values. Each of the models in Table 3 has a significant number of anomalous values, and the MSE value is approximately the same for each of the models (Table 3).
To improve the prediction accuracy of the regression models, outliers are excluded from the dataset and the models Y1-Y11 are rebuilt on the changed data. There are a number of methods for correcting anomalous values present in a dataset; here we exclude the rows of the dataset that correspond to patients with abnormal values of one or more regressors. After excluding six rows from the dataset, each of which corresponds to an outlier in Figure 1, the coefficients of the linear regression models were recalculated. The linear regression models Y1, Y10 (5), (6) after recalculation of the coefficients have the form:

Y1 = -0.19 + 0.11X1 + 0.31X2 + 0.01X3 + 0.02X4 + 0.31X5 + 0.02X6 + 0.01X7 + 0.08X8 + 0.03X9 + 0.08X10,   (7)

Y10 = -0.23 + 0.1X1 + 0.002X3 + 0.03X4 + 0.29X5 + 0.02X6 + 0.02X7 + 0.03X8 + 0.05X9 + 0.09X10 + 0.01X11.   (8)

The residual plots of the rebuilt models Y1, Y10 are presented in Figure 3.

Figure 3: Residual plot for the 10-factor linear regression model after excluding outliers: a) model Y1; b) model Y10

The final results of the analysis of the models after excluding outliers are presented in Table 4. For each model, the MSE value was determined before and after excluding anomalous values; the MSE decreases severalfold. For the model Y1 the MSE value decreased from 0.071 to 0.027, and for the model Y10 from 0.088 to 0.037. As before the exclusion of outliers, the model Y1 has the best indicator according to the MSE criterion (4), and the model Y10 the worst. The characteristics of the models after excluding the anomalous values are presented in Table 4. As expected, the trend of decreasing MSE for each of the models corresponds to the trend of decreasing MSE for the models Y1, Y10.
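The outlier-handling step above can be sketched as dropping the rows whose residual is anomalously large and refitting, which lowers the MSE as in the transition from (5)-(6) to (7)-(8). The residual threshold and the synthetic data are illustrative assumptions, not the authors' exact rule.

```python
# Sketch of residual-based outlier exclusion and refitting.  Six rows of
# the synthetic data are deliberately contaminated; the 2-sigma cut-off
# is an assumed threshold.
import numpy as np

def fit(X, y):
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta

def residuals(X, y, beta):
    D = np.column_stack([np.ones(len(y)), X])
    return y - D @ beta                        # e_i for each patient row

rng = np.random.default_rng(3)
X = rng.random((62, 10))
y = X @ rng.random(10) + rng.normal(scale=0.1, size=62)
y[:6] += 2.0                                   # six anomalous observations

beta = fit(X, y)
e = residuals(X, y, beta)
keep = np.abs(e) < 2 * e.std()                 # drop rows with large residuals
beta_clean = fit(X[keep], y[keep])
mse_before = float(np.mean(e ** 2))
mse_after = float(np.mean(residuals(X[keep], y[keep], beta_clean) ** 2))
print(mse_before > mse_after)
```

On this toy data the refit after exclusion reduces the MSE severalfold, mirroring the drop from 0.071 to 0.027 reported for model Y1.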
Additionally, for each of the models presented in Table 4, a residual plot was built, and outliers were determined that are less significant than the initial ones and can be used for a more in-depth analysis of the dataset. The headings of the table columns indicate the coded patient numbers to which the detected outliers correspond. A "+" symbol in a column indicates the presence of an outlier; the symbol "±" corresponds to a situation where the outlier is negligible. Analysis of the results presented in the table shows that the models Y1, Y5, Y8 contain almost the same outliers, with coded patient numbers 66, 39, 49, 35, 20, 26. Models Y2, Y4 also contain almost the same outliers, with coded patient numbers 66, 39, 35, 20, 26 and 66, 49, 35, 20, 26. Models Y6, Y9, Y11 contain outliers with coded patient numbers 66, 49, 39, 35, 20, 26.

Table 4
Characteristics of the 10-factor linear regression models after eliminating outliers

№   № model  MSE before  19  20  25  26  35  36  39  48  49  66  69  MSE after
1   1        0.071       -   +   -   +   +   -   +   -   +   +   -   0.027
2   5        0.071       -   +   -   +   +   -   +   -   +   +   -   0.028
3   8        0.071       -   +   -   +   +   -   +   -   +   +   -   0.028
4   3        0.072       -   +   ±   +   +   -   ±   -   ±   +   ±   0.03
5   11       0.072       -   +   -   +   +   -   +   -   +   +   -   0.027
6   9        0.073       -   +   -   +   +   -   +   -   +   +   -   0.026
7   6        0.074       -   +   -   +   +   -   +   -   +   +   -   0.033
8   4        0.075       -   +   -   +   +   -   -   -   +   +   -   0.028
9   2        0.076       -   +   -   +   +   -   +   -   -   +   -   0.031
10  7        0.079       -   +   +   +   -   +   +   +   +   -   +   0.034
11  10       0.088       +   +   +   +   -   -   +   -   +   +   +   0.037

Models Y7, Y10 also contain almost the same outliers, with coded patient numbers 69, 48, 49, 36, 20, 39, 25, 26 and 66, 39, 19, 69, 25, 20, 49, 26, respectively. The sequences are given in ascending order of the residual error value. The analysis of Table 4 allows us to form the set of rows to which the outliers correspond and which are candidates for exclusion from the dataset.

3.2. Construction and analysis of 7-factor linear regression models
The next step of the study is the construction of 7-factor linear regression models built from the factors in Table 2.
From the eleven factors, 330 seven-factor models were built and analyzed:

C_11^7 = 11! / (7!(11 - 7)!) = 330,   (9)

coefficients for some of them are presented in Table 5. From Table 5, the models with the smallest and largest MSE values are selected, Y78 (MSE = 0.025) and Y182 (MSE = 0.061):

Y78 = -0.18 + 0.09X1 + 0.27X2 + 0.02X4 + 0.35X5 + 0.03X6 + 0.04X9 + 0.1X10,   (10)

Y182 = -0.15 + 0.13X1 + 0.01X3 + 0.04X7 + 0.04X8 + 0.07X9 + 0.08X10 + 0.03X11.   (11)

The values of these indicators will be used to analyze the quality of prediction for models with different numbers of regressors. The residual plots shown in Figure 4 correspond to the linear regression models Y78, Y182. The residual plot for the model Y182 (Figure 4) contains a sufficiently large number of anomalous values of the residuals e_i, which leads to a significant increase in the MSE value of the model Y182.

Table 5
Coefficients for the seven-factor linear regression models

№    № model  Number of examined  MSE     A      X1    X2    X3    X4    X5    X6    X7    X8    X9    X10   X11
1    78       50                  0.0245  -0.18  0.09  0.27  -     0.02  0.35  0.03  -     -     0.04  0.1   -
2    109      50                  0.0246  ...
3    112      50                  0.0246  ...
...
328  178      63                  0.0576  ...
329  179      63                  0.0582  ...
330  182      63                  0.0608  -0.15  0.13  -     0.01  -     -     -     0.04  0.04  0.07  0.08  0.03

Figure 4: Residual plot for the 7-factor linear regression model: a) model Y78; b) model Y182

Note that for the seven-factor linear regression models the anomalous values of the residuals e_i, which correspond to particular outliers of the ten-factor models, are shown but not excluded. The models were deliberately not refined, so that the prediction accuracy of models built on the same dataset can be compared.
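The exhaustive enumeration of every seven-factor subset counted by (9) can be sketched with itertools.combinations; the models are then ranked by MSE. The data are synthetic stand-ins, and the ranking logic, not the numbers, is the point.

```python
# Sketch: enumerate all C(11,7) = 330 seven-factor subsets of the eleven
# selected factors and rank the resulting OLS models by MSE.
from itertools import combinations
from math import comb
import numpy as np

def model_mse(X, y):
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    e = y - D @ beta
    return float(np.mean(e ** 2))              # criterion (4)

rng = np.random.default_rng(4)
X = rng.random((63, 11))                       # 63 examined, 11 factors
y = X @ rng.random(11) + rng.normal(scale=0.1, size=63)

subsets = list(combinations(range(11), 7))
scores = sorted((model_mse(X[:, list(s)], y), s) for s in subsets)
print(len(subsets), comb(11, 7))               # both equal 330, as in (9)
print("best subset:", scores[0][1], "worst subset:", scores[-1][1])
```

The same loop with subset sizes 5, 3, 2 and 1 reproduces the 462, 165, 55 and 11 model families of the following sections.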
3.3. Construction and analysis of 5-factor linear regression models
When constructing five-factor models, we use the same approach as for the seven-factor models and the factors from Table 2. From the eleven factors, 462 five-factor models were built and analyzed:

C_11^5 = 11! / (5!(11 - 5)!) = 462,   (12)

coefficients for some of them are presented in Table 6. To graphically represent the analysis results, residual plots were selected for the model Y309 with the lowest MSE value and for the model Y133 with the highest MSE value (Figure 5).

Figure 5: Residual plot for the 5-factor linear regression model: a) model Y309; b) model Y133

Table 6
Coefficients for the five-factor linear regression models

№    № model  Number of examined  MSE     A      X1    X2    X3     X4    X5    X6    X7    X8    X9    X10   X11
1    309      50                  0.0245  -0.15  -     0.29  -      -     0.36  0.03  -     -     0.05  0.09  -
2    34       70                  0.0254  ...
3    1        70                  0.0264  ...
...
460  99       55                  0.0665  ...
461  404      63                  0.0666  ...
462  133      63                  0.0673  -0.11  0.1   -     0.001  -     -     -     0.04  0.08  -     -     0.04

The model Y309 with the lowest MSE and the model Y133 with the highest MSE have the analytic form:

Y309 = -0.15 + 0.29X2 + 0.36X5 + 0.03X6 + 0.05X9 + 0.09X10,   (13)

Y133 = -0.11 + 0.1X1 + 0.001X3 + 0.04X7 + 0.08X8 + 0.04X11.   (14)

The model Y133 with the highest MSE contains a considerable number of anomalous values of the residuals e_i, which, as in the case of the seven-factor models, explains the high MSE value. It should be noted that the points characterizing the residual values e_i for the models Y133, Y309, with the exception of a few outliers, lie practically on one straight line. This allows us to assume that for the considered five-factor models the error ε is distributed according to the normal law.
3.4. Construction and analysis of 3-factor, 2-factor, and paired linear regression models
Three-factor models are not widely used in the analysis of the severity of bronchial asthma; they serve as an indicator for a superficial determination of the severity. However, we believe that three-factor models are worth considering for a general understanding of how much the forecast accuracy increases when moving from a three-factor linear regression model to a five-factor or seven-factor one. To build the three-factor models, the factors from Table 2 were used. From the eleven factors, 165 three-factor models were constructed and analyzed:

C_11^3 = 11! / (3!(11 - 3)!) = 165,   (15)

coefficients for some of them are presented in Table 7. To demonstrate the analysis results, residual plots were selected for the models Y3 and Y103 (Figure 6).

Table 7
Coefficients for the three-factor linear regression models

№    № model  Number of examined  MSE     A      X1    X2    X3     X4   X5    X6   X7    X8   X9   X10   X11
1    3        70                  0.0248  -0.11  0.12  0.36  -      -    0.48  -    -     -    -    -     -
2    54       70                  0.0254  ...
3    47       70                  0.0258  ...
...
163  108      63                  0.0695  ...
164  106      63                  0.0709  ...
165  103      63                  0.0723  -0.02  -     -     0.004  -    -     -    0.06  -    -    -     0.09

As in the previous multivariate model analyses, the first model, Y3, corresponds to the lowest MSE, and the second, Y103, to the highest MSE among the analyzed three-factor models. The analytical representation of the models has the form:

Y3 = -0.11 + 0.12X1 + 0.36X2 + 0.48X5,  Y103 = -0.02 + 0.004X3 + 0.06X7 + 0.09X11.   (16)

The minimum and maximum MSE for three-factor models do not differ much from the minimum and maximum MSE for five-factor models.
The jump-like dependence of the residual values e_i on the predicted value is explained by the fact that regressors represented by qualitative values (the presence or absence of a feature) are used for the forecast.

Figure 6: Residual plot for the 3-factor linear regression model: a) model Y3; b) model Y103

Indeed, [Allergic rhinitis], [Atopic dermatitis] and [Bronchial asthma in relatives of second generation] were chosen as regressors of the model corresponding to the best result in terms of the quality-of-fit criterion (4) for predicting the severity of bronchial asthma. These factors are decisive in the superficial diagnosis of the observed value. In contrast to the model Y3 (Figure 6.a), for the model Y103 the points characterizing the values of the residuals lie on one straight line, with the exception of the several outliers e_i contained in each of the models considered above. Model Y103 is represented by the regressors [Number of years from the first symptoms], [Domestic dust], [CD25 10*3 cells], among which the values of two regressors are given by quantitative continuous variables. Thus, in three-factor models, the use of such regressors to predict the severity of the course of the disease is not appropriate.

Table 8
Coefficients for the two-factor linear regression models

№   № model  Number of examined  MSE    A      X1   X2    X3    X4    X5   X6   X7    X8   X9   X10   X11
1   13       70                  0.026  0.05   -    0.38  -     -     0.5  -    -     -    -    -     -
2   4        70                  0.034  -0.01  ...
3   21       70                  0.035  0.01   -    -     -     0.01  0.5  -    -     -    -    -     -
...
53  24       63                  0.07   ...
54  49       63                  0.071  ...
55  23       63                  0.072  -0.1   -    -     0.01  -     -    -    0.06  -    -    -     -

Table 9
Coefficients for the paired regression models

№   № model  Number of examined  MSE    A      X1   X2    X3   X4   X5    X6   X7    X8   X9   X10   X11
1   5        70                  0.036  0.05   -    -     -    -    0.47  -    -     -    -    -     -
2   2        83                  0.05   0.04   -    0.36  -    -    -     -    -     -    -    -     -
3   4        75                  0.051  0.05   -    -     -    0.1  -     -    -     -    -    -     -
...
9   10       63                  0.068  -0.01  -    -     -    -    -     -    -     -    -    0.13  -
10  8        63                  0.07   ...
11  7        63                  0.071  0.07   -    -     -    -    -     -    0.06  -    -    -     -

In conclusion, we consider the two-factor (Table 8) and paired (Table 9) linear regression models. As a result of the analysis, 55 two-factor models and 11 paired regression models were considered:

C_11^2 = 11! / (2!(11 - 2)!) = 55,  C_11^1 = 11! / (1!(11 - 1)!) = 11,   (17)

coefficients for some of them are presented in Tables 8 and 9. The analytical representation of the two-factor models Y21, Y23 is:

Y21 = 0.01 + 0.01X4 + 0.5X5,  Y23 = -0.1 + 0.01X3 + 0.06X7.   (18)

The model Y21 corresponds to the lowest MSE and the model Y23 to the highest MSE. Note that the paired regression models contain the same factors that are present in these two-factor models:

Y5 = 0.05 + 0.47X5,  Y7 = 0.07 + 0.06X7.   (19)

Thus, a two-factor linear regression model is a refinement of a paired regression model. An important observation is that the paired regression model with the minimum MSE contains the regressor [Bronchial asthma in relatives of second generation], which in the two-factor model is supplemented by the factor [Bronchial asthma in father], and in the three-factor model by [Allergic rhinitis] and [Atopic dermatitis]. The prediction accuracy of the two-factor regression model is quite close to that of the three-factor regression model.

4. Analysis of results
In the previous section, a detailed analysis of multivariate linear regression models consisting of ten, seven, five, three, and two factors, as well as of the paired regression model, was carried out. For each category of models, the models with the lowest and highest MSE values were found. The obtained MSE values are used to compare the quality of predicting the observed value, presented in Figure 7. The dotted line in the graph shows the average MSE value, equal to half the sum of the smallest and largest values.
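The refinement relationship between the paired and two-factor models noted above rests on a general property of nested OLS models: adding a regressor can only lower (or leave unchanged) the training MSE. A small synthetic check of that property, with assumed variable names echoing factors X4 and X5:

```python
# Check that a two-factor OLS model (a refinement of a paired model)
# never has a higher training MSE than the nested paired model.
# The binary factors and coefficients below are illustrative assumptions.
import numpy as np

def ols_mse(X, y):
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.mean((y - D @ beta) ** 2))

rng = np.random.default_rng(5)
x5 = rng.integers(0, 2, 70).astype(float)      # e.g. asthma in relatives
x4 = rng.integers(0, 2, 70).astype(float)      # e.g. asthma in father
y = 0.05 + 0.47 * x5 + rng.normal(scale=0.15, size=70)

mse_paired = ols_mse(x5[:, None], y)           # paired model on x5 only
mse_two = ols_mse(np.column_stack([x4, x5]), y)
print(mse_two <= mse_paired + 1e-12)
```

This also explains why the in-sample MSE curves of Section 4 can only flatten, never rise, as factors are added to a nested family of models.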
Figure 7: Criterion for the quality of predicting the severity of the course of bronchial asthma

The results obtained clearly show that for linear regression models that depend on two to five factors, the MSE has almost the same value. Improving the forecasting quality is achieved by increasing the number of regressors. With an increase in the number of regressors, the range of variation MSE ∈ [MSE_min; MSE_max] narrows significantly, while the value MSE_min changes slowly. When moving from a five-factor linear regression model to a ten-factor one, the average MSE value (the dotted line) decreased by an amount not exceeding 20%. The approximately constant value of MSE_min is explained by the fact that when the number of factors decreases, outliers are excluded. Indeed, on the one hand, a decrease in the number of factors should lead to an increase in the error. On the other hand, a model with fewer factors contains only the outliers that correspond to the model factors, which accordingly improves the model's accuracy. In this regard, an important conclusion should be drawn about the need for preliminary data processing: the presence of outliers can lead to a decrease in accuracy with an increase in the number of regressors. As shown in this work, for models with ten or more factors the error ε has a normal distribution with characteristics (3), and therefore the linear regression model is well suited for predicting the severity of bronchial asthma. However, the construction of regression models for predicting the observed value with a number of factors significantly greater than ten is associated with significant computational difficulties.
The slow decrease in the value MSE_min with an increase in the number of model regressors makes it practically impossible to significantly increase the accuracy of the forecasting model by increasing the number of factors. The performed numerical experiments showed that the computational time required to calculate the coefficients of the linear regression model depends quadratically on the number of regressors in the model.

5. Conclusion
In this work, we performed a comparative analysis of the quality of predicting the severity of the course of bronchial asthma depending on the number of regressors in the model. For the comparative analysis, multivariate linear regression models were used, and the distribution law of the forecasting error ε was substantiated. The comparative analysis of the MSE values for multivariate linear regression models on the considered dataset shows that the use of models with fewer than six factors is inappropriate. The results obtained indicate that linear regression models with a small number of factors have approximately the same MSE value. An important conclusion of this study is that the MSE value decreases slowly with an increase in the number of model regressors. This raises the relevance of the search for new methods for predicting the severity of bronchial asthma, including the use of machine learning. A prospect for further research is to analyze the quality of fit of the observed value depending on the number of regressors for different types of nonlinear regression models.

6. References
[1] Global Initiative for Asthma: Global Strategy for Asthma Management and Prevention. 2020. https://ginasthma.org/wpcontent/uploads/2019/01/2014-GINA.pdf.
[2] Wang Xin, Tapani Ahonen, and Jari Nurmi. Applying CDMA technique to network-on-chip. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 15.10 (2007): 1091-1100.
[3] P. S. Abril, R.
Plant, The patent holder’s dilemma: Buy, sell, or troll?, Communications of the ACM 50 (2007) 36–44. doi:10.1145/1188913.1188915. [4] S. Cohen, W. Nutt, Y. Sagic, Deciding equivalances among conjunctive aggregate queries, J. ACM 54 (2007). doi:10.1145/1219092.1219093. [5] Fuchs O, Bahmer T, Rabe KF, von Mutius E. Asthma transition from childhood into adulthood. Lancet Respir Med 2017; 5:224–234. [6] de Vries R, Dagelet YWF, Spoor P, Snoey E, Jak PMC, Brinkman P, et al. . Clinical and inflammatory phenotyping by breathomics in chronic airway diseases irrespective of the diagnostic label. Eur Respir J 2018; 51:1701817. [7] Konradsen JR, Skantz E, Nordlund B, Lidegran M, James A, Ono J, et al. . Predicting asthma morbidity in children using proposed markers of Th2-type inflammation. Pediatr Allergy Immunol 2015; 26:772–779. [8] Fitzpatrick AM, Jackson DJ, Mauger DT, Boehmer SJ, Phipatanakul W, Sheehan WJ, et al. . Individualized therapy for persistent asthma in young children. J Allergy Clin Immunol 2016; 138:1608–1618.e12. [9] Fitzpatrick AM, Moore WC. Severe asthma phenotypes—how should they guide evaluation and treatment? J Allergy Clin Immunol Pract 2017; 5:901–908. [10] Carr TF, Bleecker E. Asthma heterogeneity and severity. World Allergy Organ J. 2016; 9(1): 41 [11] Bush A, Fleming L, Saglani S. Severe asthma in children. Respirology. 2017; 22(5): 886- 897. [12] Smit HA, Pinart M, Antó JM, et al. Childhood asthma prediction models: a systematic review. Lancet Respir Med. 2015; 3(12): 973-984. [13] . Colicino S, Munblit D, Minelli C, Custovic A, Cullinan P. Validation of childhood asthma predictive tools: a systematic review. Clin Exp Allergy. 2019; 49(4): 410- 418. [14] Amin P, Levin L, Epstein T, et al. Optimum predictors of childhood asthma: persistent wheeze or the Asthma Predictive Index? J Allergy Clin Immunol Pract. 2014; 2(6): 709- 715. [15] Luo G, Nkoy FL, Stone BL, Schmick D, Johnson MD. 
A systematic review of predictive models for asthma development in children. BMC Med Inform Decis Mak. 2015; 15(99). [16] Grabenhenrich LB, Reich A, Fischer F, et al. The novel 10-item asthma prediction tool: external validation in the German MAS birth cohort. PLoS ONE. 2014; 9(12):e115852. [17] van der Mark LB, van Wonderen KE, Mohrs J, van Aalderen WM, ter Riet G, Bindels PJ. Predicting asthma in preschool children at high risk presenting in primary care: development of a clinical asthma prediction score. Prim Care Respir J. 2014;23(1):52–9. [18] Chatzimichail E, Paraskakis E, Sitzimi M, Rigas A. An intelligent system approach for asthma prediction in symptomatic preschool children. Comput Math Methods Med. 2013;2013:240182. [19] Caudri D, Wijga A, Schipper CM A, Hoekstra M, Postma DS, Koppelman GH, et al. Predicting the long-term prognosis of children with symptoms suggestive of asthma at preschool age. J Allergy Clin Immunol. 2009;124(5):903–10. [20] Pescatore AM, Dogaru CM, Duembgen L, Silverman M, Gaillard EA, Spycher BD, et al. A simple asthma prediction tool for preschool children with wheeze or cough. J Allergy Clin Immunol. 2014;133(1):111–8. [21] Mikalsen IB, Halvorsen T, Eide GE, Øymar K. Severe bronchiolitis in infancy: can asthma in adolescence be predicted? Pediatr Pulmonol. 2013;48(6):538–44. [22] Vial Dupuy A, Amat F, Pereira B, Labbe A, Just J. A simple tool to identify infants at high risk of mild to severe childhood asthma: the persistent asthma predictive score. J Asthma. 2011;48(10):1015–21. [23] Smolinska A, Klaassen EM, Dallinga JW, van de Kant KD, Jobsis Q, Moonen EJ, et al. Profiling of volatile organic compounds in exhaled breath as a strategy to find early predictive signatures of asthma in children. PLoS One. 2014;9(4), e95668. [24] Marenholz I, Kerscher T, Bauerfeind A, Esparza-Gordillo J, Nickel R, Keil T, et al. An interaction between filaggrin mutations and early food sensitization improves the prediction of childhood asthma. 
J Allergy Clin Immunol. 2009;123(4):911–6. [25] Bose S, Kenyon CC, Masino AJ Personalized prediction of early childhood asthma persistence: A machine learning approach. (2021) Personalized prediction of early childhood asthma persistence: A machine learning approach. PLOS ONE 16(3): e0247784. https://doi.org/10.1371/journal.pone.0247784. [26] O. Kozhyna, O. Pihnastyi, Covariance coefficients factors from a clinical study of the severity of bronchial asthma in children of the Kh beforearkov region, 2017, Mendeley Data, 1, 2019.