58 Modified Method of Subtractive Clustering for Modeling of Distribution of Harmful Vehicles Emission Concentrations Mykola Dyvak1, Yurii Maslyiak2, Iryna Voytyuk3, Bogdan Maslyiak4 Faculty of Computer Information Technologies, Ternopil National Economic University, UKRAINE, Ternopil, 8 A. Chekhova str., email: mdy@tneu.edu.ua1, yuramasua@gmail.com2, i.voytyuk@tneu.edu.ua3, bm@tneu.edu.ua4 Abstract: Mathematical modeling of distribution of task is solved by methods of interpolation [4,5]. The first one harmful vehicle emissions concentrations is considered in and second one are the subjects for research of this work. the paper. The modified subtractive clustering method for modeling is proposed. This method is characterized by its II. STATEMENT OF THE PROBLEM implementation simplicity due to the fact that it does not To solve the environmental monitoring tasks, it is require a large sample of experimental data and does not necessary to build models of stationary and non-stationary require to set a predetermined number of clusters. An fields of concentrations of harmful vehicle emissions [6]. The example of clustering method application for data theoretical basis for solving this type of tasks are the preparation for modeling of distribution of harmful mathematical models of objects with distributed parameters vehicle emissions concentrations is given. in the form of partial differential equations. Concentration of Keywords: mathematical modeling, clustering analysis, attention on the physical properties of environment requires interval analysis, interval difference operator, harmful to significantly complicate the mathematical model. Even vehicle emissions, vehicular traffic. despite the fact that, in practice, it is impossible to verify the results of modeling with real data obtained under conditions I. INTRODUCTION that meet the conditions of modeling. First of all, this is One of the biggest problems of large and medium cities is related to the complexity of the measurement experiment. For the pollution of the surface atmospheric layer and soils by example, if a mathematical model in the form of a differential harmful emissions from vehicles. Large amount of harmful equation accurately enough describes the process of substances is concentrated in the vehicles exhaust gases. transferring of chemical substances in the atmosphere in case Among them, with high concentrations: Nitrogen oxides, of wind gusts or other turbulent phenomena in the Carbon oxides and Sulfur oxides. Motor transport is a source atmosphere, then an integrated value of the chemical of atmospheric and soils pollution, which should be substance concentration per volume unit is established in the considered as distributed one. For reflecting and predicting of process of measurement. In addition, the accuracy of such concentrations of harmful vehicle emissions, it is expedient to measurements is low, the relative measurement error may use mathematical models. They can be built based on the reach 50%. Consequently, it is enough to build a results of selective observations of their dynamics with mathematical model with an accuracy that is equiualent to the known boundary errors of measurements. This approach was accuracy of the measurement experiment. At this, it is considered in [1]. expedient to represent the experimental data in the form of The process of harmful emissions distribution and its intervals of possible values of the modeled characteristic: dynamics is considered as a mass transfer process. For its [ zi−, j ,h,k ; zi+, j ,h,k ], description, the difference operators (schemes) are used. (1) Their identification is carried out using the data of = = i 1,..., I , j 1,..., = J , h 1,..., = H , k 1,..., K measurements of harmful emissions concentrations with − + known boundary errors. Such data are called interval data where zi , j , h, k , zi , j , h, k are the lower and upper bounds of the [2,3]. As is known, the methods of difference operators interval of possible values of measured concentration of identification based on the interval data analysis require a harmful substances in the grid nodes with discretely given uniform measurement grid that is impossible for the real city spatial coordinates i = 1,..., I , j = 1,..., J , h = 1,..., H at the conditions. Mostly, the measurements of harmful emission dicrete time value k = 1,..., K , respectively. concentrations is carried out in places with intensive traffic It is worth to note that, in the measurement experiment, we and accumulation of vehicles. This means that measurement can set the lower and upper bounds based on the relative error grid is not uniform. Thus, for building of mentioned models, it is necessary to solve three tasks related to data preparation: of the measuring device: zi−, j , h, k = zi , j , h, k − zi , j , h, k ⋅ ε and execute cluster analysis for defining of homogeneous zi−, j , h, k = zi , j , h, k + zi , j , h, k ⋅ ε , where zi , j , h, k is the measured vehicular traffic intensity areas; identify the discrete values of the grid step; calculate estimates of harmful vehicle value of the harmful substance concentration; ε is relative emissions concentrations in the nodes of the grid. The third measurement error. ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic 59 Under these conditions, macromodeling is the only way to vi , j ,h,k ⊂ [ zi−, j ,h,k ; zi+, j ,h,k ], reflect the distribution of harmful emissions concentrations. (3) The building of such macromodels is convenient to carry out = ∀i d ,..., I=, ∀j 1,..., J = , ∀h d ,..., H = , ∀k d ,..., K based on the obtained interval data in the form (1). Based on the results of conducted analysis, we can state In the papers of O.G. Ivakhnenko [7], the inductive that for ensuring of conditions of the given accuracy (3) of approach is described for choosing of acceptable way of the macromodel in the form of linear DO (2) during solving mathematical description of these processes. Its essence the task of its parametric identification, the application of consists in defining of some difference scheme in the way of interval data analysis methods [9] is substantiated. its adjustment in accordance with the experimental data. The  difference scheme itself, which actually converts the values Let’s assume that the vector of parameters estimates g in of input variables into output values, is called a difference the DO (2) is obtained based on the interval data analysis.  operator. The process of adjustment of this scheme is called Substituting the vector of parameters estimates g of DO structural identification [8,12].  instead of their true values g in expression (2) together with In general case, the expression of linear in parameters given initial interval values of each element in the set difference operator (DO) has the following form [2]:      [v0, 0, 0, 0 ],..., [v0, 0, h −1, 0 ], [vi −1, 0, 0, 0 ],..., [v0, j −1, 0, 0 ],..., vi , j ,h,k = f T (v0,0,0,0 ,..., v0,0,h−1,0 , vi −1,0,0,0 ,..., v0, j −1,0,0 ,...,     [vi −1, j −1, h −1, k −1 ] and given vectors of the input variables vi −1, j −1,h−1,k −1 , ui , j ,h,0 ,..., ui , j ,h,k ) ⋅ g , (2)   ui , j , h , 0 ,..., ui , j , h , k , we obtain an interval estimate of the =i d= = ,..., I, j d ,..., = J , h d ,..., H , k d ,..., K  T harmful substance concentration [vi , j , h, k ] in the nodes with where f (•) is the vector of basis functions (nonlinear, in discretely given spatial coordinates i = 1,..., I , general case) by which, the transformation of the modeled j = 1,..., J , h = 1,..., H at discrete moments of time k = 1,..., K : characteristic values, as well as the input variables in the  − +    spatial grid nodes for the certain discrete moments of time = is [vi , j ,h,k ] [v=i , j ,h ,k ; vi , j ,h ,k ] f T ([v0,0,0,0 ],...,[v0,0,h−1,0 ], carried out; vi , j , h, k modeled concentration of harmful    [vi −1,0,0,0 ],...,[v0, j −1,0,0 ],...,[vi −1, j −1,h−1,k −1 ], emissions in grid nodes with discretely-given spatial    (4) coordinates i = d ,..., I , j = d ,..., J , h = d ,..., H at the ui , j ,h,0 ,..., ui , j ,h,k ) ⋅ g ,   moments of time k = d ,..., K ; ui , j , h,0 ,..., ui , j , h, k are the =i 1,..., = = I , j 1,..., = J , h 1,..., H , k 1,..., K  vectors of input variables (controls); d is the DO order; g is Thus, the mathematical model of stationary and non- the vector of unknown parameters of DO. stationary fields of harmful emissions concentrations for the As a result of executing of structural identification task of environmental control will be described by a DO in procedure, we establish the difference computational scheme, general form (4). Taking into account that all calculations in  in particular: the basis functions vector f T (•) ; sets and equation (4) are carried out using interval arithmetic rules [2], the difference operator (4) is called an interval difference dimensionality of input variables (controls) vectors   operator (IDO). ui , j , h,0 ,..., ui , j , h, k ; order of the difference scheme d, which, as The conditions of consistency of experimental data, is known, is equivalent to the order of the differential represented in interval form (1), with the data obtained based equation (the analogue of the difference scheme). To realize on macromodel in the form of IDO (4) are formulated as the difference scheme, it is also necessary to set the initial follows:   conditions, that is, the value of each discrete element from [vi−, j ,h,k vi+, j ,h,k ] ⊂ [ zi−, j ,h,k ; zi+, j ,h,k ], the set v0,0,0,0 ,..., v0,0, h −1,0 , vi −1,0,0,0 ,..., v0, j −1,0,0 ,..., (5)   vi −1, j −1, h −1, k −1 , ui , j ,h,0 ,..., ui , j ,h,k (as a rule, the initial ones) and = ∀i 1,..., I= , ∀j 1,..., J ,= ∀h 1,..., H ,= ∀k 1,..., K  Let’s substitute in expressions (5), instead of interval to establish the values of the parameters vector g estimates of the harmful substance concentrations components. If the structure of DO is known then, it remains   [vi−, j , h, k ; vi+, j , h, k ] , its interval values calculated using IDO (4), the actual task of adjusting the parameters of DO (2) in such a way as to ensure the maximal consistency between the together with taking into account the given initial interval modeled characteristic and experimentally obtained values of values of each element from the set   this characteristic. Such a task is called the parametric [v0,0,0,0 ] ⊆ [ z0,0,0,0 ],...,[v0,0,h −1,0 ] ⊆ [ z0,0,h −1,0 ], identification task [9,13].   Based on the requirements of ensuring the mathematical [vi −1,0,0,0 ] ⊆ [ zi −1,0,0,0 ],...,[v0, j −1,0,0 ] ⊆ [ z0, j −1,0,0 ],..., (6) model accuracy within the bounds of the measuring  [vi −1, j −1,h −1,k −1 ] ⊆ [ zi −1, j −1,h −1,k −1 ] experiment accuracy, the conditions of consistency between   experimental data, represented in the interval form (1), and and given vectors of input variables ui , j , h,0 ,..., ui , j , h, k . We data obtained based on the mathematical model in the form of obtain such interval system of non-linear algebraic equations DO (2), can be formulated in such form: (ISNAE) [3]: ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic 60 ( )  − + − + I J  [v0,0,0,0 ; v0,0,0,0 ] ⊆ [ z0,0,0,0 ; z0,0,0,0 ],..., Ph ( xh = , yh , k ) ∑ ∑ exp −α ⋅ u xh , yh ,k − u xi , y j ,k , (9) [vi−−d , j −d ,h−d ,k −d ; vi+−d , j −d ,h−d ,k −d ] ⊆ [ zi−−d , j −d ,h−d ,k −d ; zi+−d , j −d ,h−d ,k −d ]; =i 1 =j 1       where Ph ( xh , yh , k ) is potential of a point (center of cluster  [vi −1, j −1,h−1,k −1 ] = f T ([v0,0,0,0 ],...,[v0,0,h−1,0 ],[vi −1,0,0,0 ],...,       (7) with coordinates xh , yh at moment of time k); u xh , yh ,k ,  [v0, j −1,0,0 ],...,[vi −d , j −d ,h−d ,k −d ], u0 ,..., uk −1 ) ⋅ g ;  u xi , y j ,k are the numbers of motor vehicles in a point of   z − ≤ f T ([v    0,0,0,0 ],...,[v0,0,h −1,0 ],[vi −1,0,0,0 ],...,[v0, j −1,0,0 ],..., potential cluster center xh , yh , k and in points xi , y j , k with  i , j ,h,k      [vi −d , j −d ,h−d ,k −d ], u0 ,..., uk ) ⋅ g ≤ zi+, j ,h,k ; defined traffic intensity and measured concentrations,  respectively.  =i d= ,..., I , d = 2,..., J , h d= ,..., H , k d ,..., K . The illustration of the potentials distribution is represented So, the ISNAE (7) is obtained by substituting the interval as a surface in the form of a mountainous relief (Fig. 1), estimates of the output characteristic (given in the form of whose peaks have the highest potential values and are initial conditions and predicted using expression (4) in the pretenders to be the centers of the formed clusters. remaining nodes of the grid) into conditions (5). Therefore, the task of parametric identification of IDO (4) under conditions (5) is the task of solving ISNAE in the form (7). Methods for estimation of solutions of the obtained ISNAE are described in [10]. The analysis of the proposed scheme for building of mathematical model of harmful vehicle emissions distribution showed that before its implementing, it is necessary to obtain a uniform grid of measured concentrations (3) in its nodes and vectors of influences on   them ui , j ,h,0 ,..., ui , j ,h,k . The main among them, is the vehicular traffic intensity. This task is solved using modified method of subtractive clustering of data [11] on the traffic intensity. III. MODIFIED SUBTRACTIVE CLUSTERING METHOD As the basis for method of clustering of vehicular traffic distribution, it is expedient to use the “mountain” clustering method with subtractive algorithm of its implementation. This method does not require a large sample of experimental data and does not require to set a predetermined number of clusters that significantly reduces the time for its implementation. It is also worth to note that the number of clusters based on this method is regulated by the only Fig. 1. The illustration of potentials distribution based on the parameter which is the cluster radius [11]. mountain clustering method. According to the clustering method, in the beginning, we As we can see in the Fig. 1, one “mountain peak” is form the potential cluster centers from the rows of data surrounded by other peaks that causes the problem of matrix for the clustering of input variables and calculate the building of very similar data clusters with the corresponding potentials of identified cluster centers using the expression: centers. This does not provide the high quality clustering   ( ch ) ∑ exp ( −α ⋅ ch − xk ) , K results. Ph = (8) As centers of the clusters, we choose the coordinates of k =1  “mountain peaks”, that is, the center of the cluster is the point where ch = (c1h , c2 h ,..., c Kh ) is a potential center of h-th on the city map with the highest value of potential:   cluster; а is a positive constant; ch − xk is a distance ( xh , yh , k ) = arg max Ph ( xh , yh , k ) . (10)  h =1,..., H between potential center of h-th cluster ch and input  In order to avoid the building of similar clusters, we must experimental data xk , k=1,…,K, h = 1,..., H ; H is a number of recalculate the potential values for the remaining possible possible clusters. cluster centers: In our case, if the only property of a cluster which is the number of vehicles u xi, yj ,k in the point with coordinates ( xh+1, yh+1, k ) Ph+1 ( xh , yh , k ) − Ph ( xh , yh , k ) ⋅ Ph+1 = (11) xi , y j at a discrete moment of time k is taken into account, exp(− β ⋅ u xh +1 , yh +1 ,k − u xh , yh ,k ), h = 1, …, Н , the expression for estimation of potentials of given cluster where Ph ( xh , yh , k ) is a potential center of h-th cluster on h- centers, has such form: ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic 61 th iteration; Ph +1 ( xh , yh , k ) is a potential of center of h-th discretization step for building a DO. In our case, the cluster is the set of points of a certain area of the city with similar cluster on h+1 iteration; β is a positive constant, values of current vehicles number. u xh +1 , yh +1 ,k − u xh , yh ,k is a distance between potential center of h+1 cluster and center of found h-th cluster. The procedure of cluster centers calculation is carried out until all the rows of the input variable matrix X, which is represented by the set (3), are excluded. The above procedure is based on the subtractive clustering algorithm, which is based on the following steps. Step 1. Forming of potential cluster centers. They are all points of measured harmful emission concentrations and corresponding intensities of vehicular traffic. Step 2. Calculation the potential of possible cluster centers based on (9). Step 3. Selecting the data point with the maximal potential for representation of the cluster center based on (10). Step 4. Excluding the influence of the found cluster center in the way of recalculating the potentials for other possible cluster centers by (11). Step 5. Identifying the next cluster and the coordinates of its center. If the maximal value of the cluster center potential exceeds some predetermined threshold which is the cluster radius, that is Ph ( xh , yh , k ) , then proceed to step 4, otherwise, the algorithm is completed. The iterative procedure for identification of cluster centers Fig. 2. Points of measurements of vehicular traffic intensity on and the recalculation of potentials is repeated until all points the example of Ternopil city. in the space of input experimental data are located within the neighborhoods of the radius of sought cluster centers. The result of the proposed method of cluster analysis As a result of the clustering algorithm implementation, we application is schematically shown in Fig. 3. As we can see, obtain h clusters, h = 1,…,Н, with the corresponding centers. during the clustering process, H clusters with different The next step is the identification of a uniform grid nodes for vehicular traffic intensity and radius r were defined. So, the homogeneous parts of a cluster. The discrete values of the discrete values of grid nodes coordinates are equal to the grid nodes coordinates are equal to the cluster diameter, and cluster diameter. the value u xh , yh ,k is the number of vehicles in the point of a cluster center xh , yh , k . To assign the number of vehicles at k-th moment to the grid nodes, it is enough to analyze, what cluster the node is in. If the node belongs to the h-th cluster, then, the number of vehicles in the node is u xh , yh ,k . IV. EXAMPLE OF CLUSTERING METHOD APPLICATION Let’s consider the application of the developed clustering method for obtaining of uniform grid of nodes on an example of Ternopil city. The fragment of map of central part of Ternopil city with the marked points of the measured vehicular traffic intensity for one discrete time (one hour) is shown in the Fig. 2. As we can see, the traffic is distributed not uniformly over the territory. Therefore, it is advisable to measure its intensity at some selected points, where this intensity is the highest, for example, as it is shown on the map of Ternopil city. The points of measurement of traffic intensity are colored red on the map. The application of cluster analysis for determination of areas with specific vehicular traffic intensities under condition of identification of cluster centers that are located Fig. 3. The result of clustering of vehicles quantity distribution on on a certain uniform grid gives a possibility to define the the Ternopil city map. ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic 62 Obtained grid for building of distribution model of harmful Technical University Series "Information, cybernetics vehicle emission concentrations in the form of IDO (4) is and computer science.", 2011, Vol. 14 (188), pp. 8-17. schematically shown in Fig. 4. [3] M. Dуvak “Mathematical Modeling Tasks of Static Systems with Interval Data”, Ternopil: TNEU Publishing House "Economic Thought", 2011, 216 p. [4] M. Dyvak, I. Voytyuk, T. Dyvak, A. Pukas “Application of the interval difference operator for approximation of fields of harmful emissions concentration from vehicles”, Measuring and Computing Devices in Technological Processes, 2011, No. 1 (37), pp. 44-52. [5] Kvyetnyy R. N., Dementiev V. Yu., Mashnitsky M. O., Judin O. O. “Difference methods and splines in multidimensional interpolation problems”, Vinnitsa: UNIVERSUM, 2009, 87 p. [6] N. Ocheretnyuk, I. Voytyuk, M. Dyvak and Y. Martsenyuk, "Features of structure identification the macromodels for nonstationary fields of air pollutions from vehicles," Proceedings of International Conference on Modern Problem of Radio Engineering, Telecommunications and Computer Science, Lviv- Slavske, 2012, pp. 444-444. [7] A.G. Ivakhnenko “Inductive method of self-organizing of models of complex systems”, Kyiv: Scientific thought, 1981, 296 p. [8] M. Dyvak, I. Voytyuk, T. Dyvak, A. Pukas “Application Fig. 4. The uniform grid with known spatially distributed of the interval difference operator for approximation of vehicular traffic intensities. fields of harmful emissions concentration from vehicles”, Measuring and Computing Devices in For building of mentioned model in the form of IDO (4), it Technological Processes, 2011, No. 34 (110), pp. 86-94. is enough to execute interpolation and identify the pollution [9] T. Dyvak “Parametric identification of interval difference concentrations in the grid nodes. operator on the example of macromodel for distribution V. CONCLUSIONS of humidity in the drywall sheets in the process of drying”, Information Technologies and Computer The modified method of subtractive clustering and interval Engineering: international Scientific Journal, 2012, analysis for modeling of distribution of harmful vehicle Vol. 3, pp. 79-85. emissions concentrations and vehicular traffic intensity under [10] M. Dyvak, N. Porplytsya, Y. Maslyak, M. Shynkaryk conditions of non-uniform measurement grid were proposed “Method of Parametric Identification for Interval and substantiated. Discrete Dynamic Models and the Computational ACKNOWLEDGMENT Scheme of Its Implementation,” Advances in Intelligent Systems and Computing II: Selected Papers from the This research has been supported by National Grants of International Conference on Computer Science and Ministry of Education and Science of Ukraine “Mathematical Information Technologies, CSIT 2017, рр.101- 112, tools and software for control the air pollution from vehicles” 2018. (0116U005507) and “Mathematical tools and software for [11] Shtovba S. Introduction to the theory of fuzzy sets and classification of tissues in surgical wound during surgery on fuzzy logic. Аccess mode: the neck organs” (0117U000410). http://matlab.exponenta.ru/fuzzylogic/book1/index.php REFERENCES [12] Porplytsya, N., Dyvak, M., Dyvak, T., Voytyuk, I. “Structure identification of interval difference operator [1] A. Veremchuk, A. Pukas, I. Voytyuk and I. Spivak, for control the production process of drywall.” "Mathematical and software tools for modeling objects Proceedings of 12th International Conference on the with distributed parameters," 2016 13th International Experience of Designing and Application of CAD Conference on Modern Problems of Radio Engineering, Systems in Microelectronics, CADSM’2013, pp. 262- Telecommunications and Computer Science (TCSET), 264 (2013). Lviv, 2016, pp. 149-152. [13] Fliess, M., Sira-Ramirez, H. “Closed-loop parametric [2] I. Voytyuk, M. Dyvak, V. Nemish “Method and genetic identification for continuous-time linear systems via new algorithm for structure identification of interval algebraic techniques.” H. Garnier & L. Wang. Identification of differentice operators in the tasks of environmental Continuous-time Models from sampled Data, Springer, pp. monitoring”, Collected Works of Donetsk National 362–391, 2008. ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic