=Paper=
{{Paper
|id=Vol-2753/paper22
|storemode=property
|title=Mathematical Model Reduction Principle Development
|pdfUrl=https://ceur-ws.org/Vol-2753/paper12.pdf
|volume=Vol-2753
|authors=Yaroslav Matviychuk,Nataliia Shakhovska
|dblpUrl=https://dblp.org/rec/conf/iddm/MatviychukS20
}}
==Mathematical Model Reduction Principle Development==
Yaroslav Matviychuk, Nataliia Shakhovska

Lviv Polytechnic National University, Lviv, 79013, Ukraine

Abstract
A new principle for removing elements of a mathematical model, based on its parametric identification, is proposed in this paper. It reduces the computational and time complexity of applications built on the model, which is important for online data processing; medical decision support systems are one example of such systems, and IoT-based solutions can also use the proposed approach. In addition, reducing complexity can increase the accuracy of mathematical models. Reducing the number of parameters is therefore an important step of data preprocessing, which is used in almost all modern systems. Known dimensionality reduction methods depend on the problem domain, which makes it impossible to use them in ensemble models. The proposed reduction method is applied to models in the form of ordinary differential equations and to a neural network model.

Keywords
Mathematical model, reduction, identification procedure, incorrectness, neural network, ordinary differential equation (ODE)

IDDM'2020: 3rd International Conference on Informatics & Data-Driven Medicine, November 19-21, 2020, Växjö, Sweden
EMAIL: matv@ua.fm (A.1), natalya233@gmail.com (A.2); ORCID: 0000-0002-5570-182X (A.1), 0000-0002-6875-8534 (A.2)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Identification of a mathematical model is an inverse problem and therefore ill-posed. Identification errors can arise from redundant model elements, and this property is used here to detect unnecessary elements. It is especially important for big data processing in real time (medicine, smart city, etc.) [1].

A dynamic object can be represented in the form of differential equations; decreasing the complexity then amounts to simplifying the model. Most existing methods do not address the problem of accounting for the total error when assessing the quality indicators of such systems.

Other examples of complex systems are neural networks, particularly deep neural networks, autoencoders and predictive models. For these, dimensionality reduction is widely used.

A novel reduction principle for mathematical models is presented in this paper. In a simplified form, this principle has long been used to reduce mathematical models given as ordinary differential equations. The novelty of the proposed principle is the following: simpler identification of redundant parameters based on relative deviations, and removal of the elements with the largest relative deviations within the model. This increases the accuracy of identification of the mathematical model. The algorithm is developed for models in the form of neural networks and of differential equations.

The paper consists of the following sections. Section 2, Related works, presents methods of model complexity reduction. Section 3 describes the reduction method. Section 4 presents implementations of the reduction method for different domains. The last section concludes the paper.
2. Related works

Analysis of methods for simplifying mathematical models of dynamic systems shows two main approaches: constructing a simplified model based on the criterion of proximity of the quality indicators of the original and simplified models, either in the image space or in the state space [2].

The paper [3] presents uniform manifold approximation and projection (UMAP), a reduction method based on Riemannian geometry and algebraic topology; its main disadvantage is time complexity. In [4] the locality preserving projections algorithm is proposed; however, it can be used only for linear dimensionality reduction.

The idea of removing redundant elements in machine learning algorithms is not new at all. For example, PCA (Principal Component Analysis) and k-means are used for dimensionality reduction in [5]. This classical approach also allows finding outliers; its disadvantage is the ability to process only discrete data. Zuo et al. [6] proposed a multi-agent approach identified using a Lyapunov-Krasovskii functional in the time domain; as the experimental results show, the quality of the method depends on the domain. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders is applied in [7]; the method simplifies the model by projecting it onto nonlinear manifolds. Similar results are obtained in [8].

Reduction is widely used in neural networks too. The basic idea is to find a subset of useless weights in the network and set them to zero. Without an exhaustive search it is hard to tell which weights are involved in the prediction, so this type of reduction depends on the model's complexity. For the removal of unnecessary elements, the Arnold-Kolmogorov-Hecht-Nielsen theorem is used [9, 13]. In this sense, the reduction of connections can be compared with the method of shutting down random neurons (dropout) during network training. In addition, if there are many zeros in the network, it takes up less space in an archive and can be read faster on some architectures.

Group LASSO regularization is most often used to keep useless weights close to zero; unlike regular regularization, layer weights or activations are not regularized directly [10]. However, there are difficulties in applying channel reduction to "branched" architectures and residual networks (ResNet): after trimming extra neurons, the dimensions may not match when the branches are merged. Using variational optimization [11], the discrete distribution of zeros and ones in the masking layer can be approximated by a continuous one whose parameters are optimized with the usual backpropagation algorithm; the main disadvantage is the dependence on empirical hyperparameters.

It is much more effective to remove not individual weights but whole neurons from fully connected layers or whole channels from convolutions; in this case the effect of compressing the network and speeding up prediction is much more pronounced. For this purpose, a different neural network structure can be used. The papers [12, 13] present neural-like structures based on geometric data transformations. The main advantages of that method are a non-iterative training process and high training performance, which creates conditions for solving large-dimension tasks.
This approach reduces time complexity, but the number of model parameters stays the same. The paper [14] proposed a GMDH-neuro-fuzzy system with a small number of tuning hyperparameters but with huge time complexity. The pruning technique can be used for reducing neural networks too: a variational Bayesian scheme for pruning convolutional neural networks at the channel level is proposed in [15], and channel pruning without a re-training stage reduces the computational complexity.

Thus, most reduction methods depend on the domain. That is why the proposed method takes the error rate into account and can be used for different models.

The reduction principle can be used for software reliability modeling too. Accurate software estimation models can greatly help software project managers: they enable informed decisions on resource management, administration and project planning, so that the project can be completed on time and within the planned budget, which remains a problem today. The essence of the proposed method is to find a functional subset with less variability of results and higher accuracy than the initial functional set of the model [16]. In this case, the functional set includes the parameters of the model that allow it to be calibrated.

3. Materials and methods

3.1. The reduction principle

For a simulated object, assume there is an exact mathematical model with known parameters (p1, ..., pn), which are the numerical values of the model elements. The model parameters are calculated by an identification procedure from some data. The mathematical model and the identification procedure must satisfy the following two conditions:

1. A parameter equal to 0 means the absence of the corresponding model element.
2. The model parameters depend continuously on the identification data within some neighborhood of these data.

The parameter set (p1, ..., pn) is now extended with additional parameters (p1, ..., pn, pn+1, ..., pm). For the redundant parameters (pn+1, ..., pm), the identification procedure will then calculate values close to zero (by condition 1, within the accuracy of the calculations):

pi ≈ 0; i = n+1, ..., m. (1)

However, this property alone cannot detect unnecessary parameters, because some necessary parameters may also be close to zero. We therefore introduce perturbations, denoted disturb, into the identification data, staying within the neighborhood of continuity (condition 2). The identification procedure then calculates parameters (p'1, ..., p'n, p'n+1, ..., p'm) different from (p1, ..., pn, pn+1, ..., pm). For each parameter we compute the modulus of the relative deviation:

δi = |(p'i − pi) / p'i|; i = 1, ..., m. (2)

The absolute deviations (p'i − pi) tend to zero as the perturbations tend to zero, owing to the continuous dependence of the parameters on the data. The same holds for the relative deviations of the necessary parameters:

δi → 0; i = 1, ..., n; if disturb → 0. (3)

On the contrary, for the unnecessary parameters the relative deviations (2) are close to one, due to (1):

δi ≈ 1; i = n+1, ..., m; if disturb ≠ 0. (4)

Criteria (3) and (4) are derived for an exact model, but they can be extended to arbitrary mathematical models: in the general case, parameters with little influence on the simulation result have much larger relative deviations (2) than the required parameters. Consistent elimination of the elements of the mathematical model with a high relative deviation improves the accuracy and stability of the identification problem. Numerous examples confirm this conclusion [17, 18], including those given in this paper.
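As an illustration of formula (2) and criteria (3)-(4), the following Python sketch computes the relative deviation moduli from two identification runs. It is a minimal sketch, not the authors' implementation: the `identify` callable and the multiplicative perturbation model are assumptions made for the example.

```python
import numpy as np

def relative_deviations(identify, data, disturb=1e-5, seed=0):
    """Moduli of the relative deviations (2) of the model parameters.

    `identify` is a hypothetical stand-in for the model-specific
    identification procedure: it maps identification data to the
    parameter vector (p_1, ..., p_m).
    """
    rng = np.random.default_rng(seed)
    p = identify(data)                      # identification on exact data
    noisy = data * (1.0 + disturb * rng.standard_normal(data.shape))
    p_prime = identify(noisy)               # identification on perturbed data
    # small for necessary parameters (3), close to one for redundant ones (4)
    return np.abs((p_prime - p) / p_prime)
```

Parameters whose deviation stays close to one under a small perturbation are then candidates for removal.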
3.2. The reduction algorithm

Algorithm 1 summarizes the main stages of the parameter reduction principle; a code sketch of this loop is given at the end of this subsection. A mechanical illustration of the principle is a bridge truss in a loaded state: perturbing the truss causes the beams to oscillate, and loaded (necessary) beams oscillate with smaller amplitudes than unloaded (unnecessary) ones.

Algorithm 1. Parameter reduction
Input: list of model parameters, the structure of the model
Output: reduced list of model parameters
1. Identify the model, obtaining parameters pi.
2. Identify the weakly perturbed model, obtaining parameters p'i.
3. Calculate the moduli of the relative deviations of the parameters by formula (2).
4. If no relative deviations are significantly larger than the mean, stop the reduction.
5. Remove the element with the largest δi and go to step 1.

Student's paired t-test for dependent samples is used to compare the mean values of the samples; it is also used to test for a significant difference between the mean accuracies of a functional subset in this approach. This statistical test determines the differences between average values at a certain significance level, assuming that the dependent variable follows a normal distribution. It is used in this model to determine the best functional subset.
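Under the same assumptions as the sketch above (a hypothetical `identify(data, active)` routine that fits only the parameters selected by the boolean mask `active`), Algorithm 1 might be sketched as the following loop; the `spread` threshold for "significantly larger than the mean" in step 4 is an assumed heuristic, not a value given in the paper.

```python
import numpy as np

def reduce_model(identify, data, n_params, disturb=1e-5, spread=3.0, seed=0):
    """A sketch of Algorithm 1 (iterative element reduction)."""
    rng = np.random.default_rng(seed)
    active = np.ones(n_params, dtype=bool)         # all elements present
    while active.sum() > 1:
        p = identify(data, active)                 # step 1: exact data
        noisy = data * (1.0 + disturb * rng.standard_normal(data.shape))
        p_prime = identify(noisy, active)          # step 2: perturbed data
        delta = np.abs((p_prime - p) / p_prime)    # step 3: formula (2)
        if delta.max() <= spread * delta.mean():   # step 4: compact set, stop
            break
        worst = np.flatnonzero(active)[np.argmax(delta)]
        active[worst] = False                      # step 5: drop largest delta_i
    return active
```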
4. Results

4.1. Lorenz attractor test recovery

The classical equations of the Lorenz attractor (5) are convenient for testing the reduction principle, since they admit an analytical transformation into the equivalent form (6), which is convenient for our identification [17]:

\[
\frac{dx_1}{dt} = 10x_2 - 10x_1; \qquad
\frac{dx_2}{dt} = 40x_1 - x_2 - x_1x_3; \qquad
\frac{dx_3}{dt} = -\tfrac{8}{3}x_3 + x_1x_2; \tag{5}
\]
\[
x_1(0) = 5; \quad x_2(0) = 4.5; \quad x_3(0) = 39.7;
\]

\[
\frac{dy_1}{dt} = y_2; \qquad
\frac{dy_2}{dt} = y_3; \qquad
\frac{dy_3}{dt} = 1040\,y_1 - \tfrac{88}{3}\,y_2 - \tfrac{41}{3}\,y_3 + \frac{11y_2^2 + y_2y_3}{y_1} - y_1^2y_2 - 10y_1^3; \tag{6}
\]
\[
y_1(0) = 5; \quad y_2(0) = -5; \quad y_3(0) = 20.
\]

So the problem of reconstructing the exact model (6) is as follows: given a discrete signal y1 = x1, calculate three derivatives of this signal, y1' = y2, y2' = y3 and y3', and solve the identification problem (7):

\[
\min_{a,\,b} \sum_{m=1}^{M} \Bigl( y'_{3m} - \sum_{i,j,k=0}^{4} a_{ijk}\, y_{1m}^{i} y_{2m}^{j} y_{3m}^{k} - \sum_{i,j,k=0}^{4} b_{ijk}\, y_{1m}^{i} y_{2m}^{j} y_{3m}^{k}\, y_{1m}^{-1} \Bigr)^{2}. \tag{7}
\]

The total number of polynomial coefficients in problem (7) is 50, but the exact model (6) requires only 7 of them; the remaining coefficients are unnecessary.

The discrete signal y1 = x1 is calculated by numerical integration of equations (6) by the Runge-Kutta method with a step of 0.02 s, from 0 s up to 34 s. On the obtained set of points an interpolation spline of fifth degree is constructed, and its three derivatives are calculated analytically. The numerical values of the spline and its derivatives form the data arrays y1m, y2m, y3m, y'3m (m = 1, ..., 1701) for the identification (7).

Next, element-wise reduction of the coefficient arrays aijk, bijk was applied according to the reduction principle. Perturbations with a relative value of 10^-5 were added to the values y'3m, the relative deviations δi (2) were calculated, and the element with the largest δi was deleted. The criterion for completing the reduction is a compact set of residual relative deviations; an indication of this is an equal number of relative deviations above and below the average over the remaining set.

During the reduction, the spread of the relative deviations, max(δi) − min(δi), and the mean relative error of the model coefficients are calculated. After 43 reduction steps, the 7 coefficients of the exact model (6) remain, with a mean relative error of 0.0016. Figure 1 shows the change of the relative error with the number of the reduction step.

Figure 1: Change of the relative error of reproducing the coefficients of model (6) during the reduction.

The dependence of the size of the region of relative deviations on the reduction step is shown in Figure 2.

Figure 2: Change of the size of the region of relative deviations during the reduction.

The same figure on an enlarged scale is shown in Figure 3; the formation of a compact region is clearly visible.

Figure 3: Figure 2 on an enlarged scale.

On the Lorenz model, the inverse procedure of growing the model (induction) was also tested. Relative deviations (2) were calculated for all 50 coefficients. Then, starting with the three coefficients with the smallest relative deviations, the coefficients with the smallest relative deviations among the remaining ones were successively added to the model. The induction termination criterion, the formation of a compact region of relative deviations, was reached in 4 steps.

The advantages of induction over reduction are easy to see. First, the relative deviations do not need to be recalculated at each step; it suffices to calculate them once at the beginning of the induction. Second, the number of steps may be smaller.

So, on the test example of the Lorenz attractor reconstruction, the validity of the basic statements of the reduction principle was checked.
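One plausible reading of this induction procedure, under the same assumed compactness heuristic as before, is sketched below: coefficients are added in order of increasing relative deviation, computed once at the start, until the next addition would break the compact group.

```python
import numpy as np

def induce_model(delta, start=3, spread=3.0):
    """A sketch of the induction (model growth) variant of Section 4.1.

    `delta` holds the relative deviations (2) computed once for the
    full coefficient set; `spread` is an assumed compactness threshold.
    """
    order = np.argsort(delta)                  # smallest deviations first
    kept = min(start, delta.size)
    for k in range(kept + 1, delta.size + 1):
        d = delta[order[:k]]
        if d.max() > spread * d.mean():        # next coefficient is redundant
            break
        kept = k
    return order[:kept]                        # indices of retained coefficients
```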
4.2. Reduction case for a neural network

A lot of research [9-13] shows the possibility of simplifying the structure of neural networks (NN); those methods are based on the first and second derivatives of the objective function. In contrast, our proposed solution does not depend on the training method or on the type of neural network. We used the reduction principle for the neural network given in [2]. As a result of reducing the structure of the neural network model, the number of neurons in the input and hidden layers typically decreases to the optimal number, at which the classifying ability of the model is maximal and at the same time not lower than the initial one (before reduction). Backpropagation is used in the training stage. The network reduction is implemented as sequential removal of connections.

The graph of the mean square error of the approximation of one output variable versus the iteration number is shown in Figure 4.

Figure 4: The approximation error during the reduction of the neural network.

Figures 5 and 6 show the shrinking of the range of the relative error values δi. The relative deviation values in the reduced network gather in a compact group near zero; this is the sign of the end of the reduction process.

Figure 5: The shrinking of the range of the relative error values δi during the reduction of the neural network.

Figure 6: Figure 5 on an enlarged scale.

We found that the smallest error value, 0.09, is reached at the 99th iteration, with 35% of the connections removed. The reduction decreased the network error 16-fold. The next step is estimating the number of neurons in the hidden layers using well-known techniques.

To determine the number of neurons in the hidden layer of a neural network, it is customary to use a consequence of the Arnold-Kolmogorov-Hecht-Nielsen theorem, according to which the maximum number of neurons in the hidden layer of a perceptron is limited by the right-hand side of the expression [13]:

Nh < 2Nin + 1, (8)

where Nh is the number of neurons in the hidden layer and Nin is the number of neurons in the input layer. This expression thus determines the upper limit of the number of neurons in the hidden layer of a perceptron neural network model; such a number of neurons can lead to redundancy in the structure of the model and, as a consequence, to the ineffectiveness of its practical use. In our case the input architecture of the NN satisfies the Arnold-Kolmogorov-Hecht-Nielsen theorem, so the standard estimate, which involves no error analysis, cannot optimize the NN.

Let us compare with results obtained by the Dropout technique. The main idea behind Dropout is to train an ensemble of several NNs instead of one NN and then average the results. The standard approach, implemented in Python 3, shows that the average number of disabled neurons is proportional to np, where n is the number of neurons in the initial NN and p is the Dropout coefficient (Figure 7).

Figure 7: The Dropout results for probability p from 0 to 1.

The main advantage of the proposed method is the ability to evaluate not only the number of removed connections but also the accuracy of the model. The Dropout technique is based on Rademacher complexity [19], so the complexity of the proposed method is much lower.

4.3. A model of the Sun's influence on the Earth's seismic activity

A crucial task in the study of geo-heliogenic interactions is to model the influence of the Sun on the Earth's earthquakes and on the intensity of near-surface infrasound. We construct a dynamic model whose variables are the intensity of the solar wind s(k) during the k-th day, the average daily seismic activity of the Earth g(k), and the average daily intensity of near-surface infrasound z(k), recorded during m days:

s(k); g(k); z(k); k = 1, ..., m. (9)

The variable s(k) is the input signal of the model, the variables g(k) and z(k) are the output signals, and z(k) depends on g(k). The model is chosen as a system of ordinary differential equations:

\[
s_i' = s_{i+1},\ (i = 0,\dots,4); \qquad s_5' = P_s(s_0, s_1, s_2, s_3, s_4, s_5);
\]
\[
g_i' = g_{i+1},\ (i = 0,\dots,4); \qquad g_5' = P_g(s_4, s_5, g_0, g_1, g_2, g_3, g_4, g_5); \tag{10}
\]
\[
z_i' = z_{i+1},\ (i = 0,\dots,4); \qquad z_5' = P_z(s_4, s_5, g_4, g_5, z_0, z_1, z_2, z_3, z_4, z_5);
\]

where the arguments of the power polynomials Ps(·), Pg(·) and Pz(·) are selected from a priori considerations. The general order of the model is 18. Identification of the coefficients of the polynomials is the solution of three problems using the reduction of the polynomials:

\[
\min_{c_s} \sum_{k=1}^{m} \bigl( s_5'(k) - P_s(s_0(k), \dots, s_5(k)) \bigr)^2;
\]
\[
\min_{c_g} \sum_{k=1}^{m} \bigl( g_5'(k) - P_g(s_4(k), s_5(k), g_0(k), \dots, g_5(k)) \bigr)^2; \tag{11}
\]
\[
\min_{c_z} \sum_{k=1}^{m} \bigl( z_5'(k) - P_z(s_4(k), s_5(k), g_4(k), g_5(k), z_0(k), \dots, z_5(k)) \bigr)^2;
\]

where cs, cg, cz are the coefficients of the multidimensional power polynomials Ps(·), Pg(·), Pz(·).

The graphs of the solutions s0(t), g0(t), z0(t) of the model (10), identified from the experimental measurements s(k), g(k), z(k) for 1999, are shown in Figure 8. The obtained model reproduces the experimental values with high accuracy only thanks to the reduction: the relative mean square error of the approximation of s(k) by the model (10), constructed for days 119-197 of 1999, is 2.11·10^-4. This provides a practical tool for researching and forecasting the activity of the simulated geo-heliogenic variables.

Figure 8: Time dependence of the solar activity, seismic activity and infrasound. Experimental dependencies are depicted by lighter lines, the solutions of the model (10) by darker lines.
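Each minimization in (11) is linear in the polynomial coefficients, so it reduces to a single linear least-squares solve. The sketch below is illustrative rather than the authors' exact procedure: the layout of the argument array, the bound on the total degree, and the monomial basis are assumptions.

```python
import numpy as np
from itertools import product

def identify_polynomial(lhs, args, degree=2):
    """Least-squares identification of one problem from (11), as a sketch.

    `lhs` holds the sampled derivative (e.g. s_5'(k)); `args` is an
    (m, d) array whose columns are the arguments of the polynomial;
    `degree` is an assumed bound on the total degree of the monomials.
    """
    m, d = args.shape
    # all exponent tuples (e_1, ..., e_d) with total degree <= `degree`
    exps = [e for e in product(range(degree + 1), repeat=d) if sum(e) <= degree]
    # design matrix: one monomial per column
    A = np.column_stack([np.prod(args ** np.array(e), axis=1) for e in exps])
    coeffs, *_ = np.linalg.lstsq(A, lhs, rcond=None)
    return coeffs, exps
```

The coefficient vector returned here is what the reduction of Section 3 would then prune, re-solving on perturbed values of `lhs` to obtain the relative deviations (2).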
5. Conclusions

A novel principle of mathematical model reduction is proposed in this paper. The principle works for mathematical models of any nature, as explored in the examples of this paper. We analyzed its application to neural network and differential equation models. The given examples demonstrate the efficient normalizing properties of the reduction principle for mathematical models in the form of neural networks. The efficiency of the proposed approach in comparison with existing methods is verified by the different examples explored in the paper. The Dropout technique for neural network reduction is based on Rademacher complexity, so the complexity of the proposed method is much lower. The results of the developed method can be used in the modeling and analysis of complex systems, particularly in economic modeling [20] and energy systems [21].

6. References

[1] Melnykova, N., Shakhovska, N., Gregus, M., Melnykov, V., Zakharchuk, M., & Vovk, O. (2020). Data-driven analytics for personalized medical decision making. Mathematics, 8(8), 1211.
[2] Matviychuk, Y., & Matviychuk, O. (2018). Reduction principle of the mathematical model and its applications. In 2018 XIV-th International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), 174-177.
[3] Becht, E., et al. (2019). Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 37(1), 38.
[4] Wang, R., et al. (2017). Fast and orthogonal locality preserving projections for dimensionality reduction. IEEE Transactions on Image Processing, 26(10), 5019-5030.
[5] Jamal, A., et al. (2018). Dimensionality reduction using PCA and K-means clustering for breast cancer prediction. LONTAR KOMPUTER: Jurnal Ilmiah Teknologi Informasi, 192-201.
[6] Zuo, Z., Wang, C., & Ding, Z. (2017). Robust consensus control of uncertain multi-agent systems with input delay: a model reduction method. International Journal of Robust and Nonlinear Control, 27(11), 1874-1894.
[7] Lee, K., & Carlberg, K. T. (2020). Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics, 404, 108973.
[8] Swischuk, R., Mainini, L., Peherstorfer, B., & Willcox, K. (2019). Projection-based model reduction: Formulations for physics-based machine learning. Computers & Fluids, 179, 704-717.
[9] He, Y., Zhang, X., & Sun, J. (2017). Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, 1389-1397.
[10] Louizos, C., Welling, M., & Kingma, D. P. (2017). Learning sparse neural networks through L0 regularization. arXiv preprint arXiv:1712.01312.
[11] Anderson, D., Ermentrout, B., & Thomas, P. (2015). Stochastic representations of ion channel kinetics and exact stochastic simulation of neuronal dynamics. Journal of Computational Neuroscience, 38(1), 67-82.
[12] Tkachenko, R., & Izonin, I. (2019). Model and principles for the implementation of neural-like structures based on geometric data transformations. In: Hu, Z.B., Petoukhov, S. (eds) Advances in Computer Science for Engineering and Education. ICCSEEA 2018. Advances in Intelligent Systems and Computing, vol 754,
Springer, Cham, 578-587.
[13] Izonin, I., Tkachenko, R., Kryvinska, N., Tkachenko, P., & Greguš ml., M. (2019). Multiple linear regression based on coefficients identification using non-iterative SGTM neural-like structure. In: Rojas, I., Joya, G., & Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11506, 467-479.
[14] Bodyanskiy, Y., Boiko, O., Zaychenko, Y., & Hamidov, G. (2017). Evolving GMDH-neuro-fuzzy system with small number of tuning parameters. In 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 1321-1326.
[15] Zhao, C., Ni, B., Zhang, J., Zhao, Q., Zhang, W., & Tian, Q. (2019). Variational convolutional neural network pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2780-2789.
[16] Postma, B. N., Poirier-Quinot, D., Meyer, J., & Katz, B. F. (2016). Virtual reality performance auralization in a calibrated model of Notre-Dame Cathedral. Euroregio, 6, 1-10.
[17] Malachivskyy, P. S., Matviychuk, Y. N., Pizyur, Y. V., & Malachivskyi, R. P. (2017). Uniform approximation of functions of two variables. Cybernetics and Systems Analysis, 53(3), 426-431.
[18] Matviychuk, Y., Malachivskyy, P., & Hasko, R. (2012). Peculiarities of approximations application to the identification of mathematical macromodels. In Proceedings of the International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science, 47-48. IEEE.
[19] Yin, D., Kannan, R., & Bartlett, P. (2019). Rademacher complexity for adversarially robust generalization. In International Conference on Machine Learning, 7085-7094.
[20] Yakovyna, V., Berchenko, N., Kurbanov, K., Virt, I., Kurilo, I., & Nikiforov, Y. (2003). Analysis of the possibility to control complex semiconductors properties by shock wave treatment. Physica Status Solidi (c), (3), 1019-1023.
[21] Kaminskyi, R., Kunanets, N., Rzheuskyi, A., & Khudyi, A. (2018). Methods of statistical research for information managers. In 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), 2, 127-131.