=Paper=
{{Paper
|id=Vol-2300/Paper15
|storemode=property
|title=Modified Method of Subtractive Clustering for Modeling of Distribution of Harmful Vehicles Emission Concentrations
|pdfUrl=https://ceur-ws.org/Vol-2300/Paper15.pdf
|volume=Vol-2300
|authors=Mykola Dyvak,Yurii Maslyiak,Iryna Voytyuk,Bogdan Maslyiak
|dblpUrl=https://dblp.org/rec/conf/acit4/DyvakMVM18
}}
==Modified Method of Subtractive Clustering for Modeling of Distribution of Harmful Vehicles Emission Concentrations==
58
Modified Method of Subtractive Clustering for Modeling
of Distribution of Harmful Vehicles Emission
Concentrations
Mykola Dyvak1, Yurii Maslyiak2, Iryna Voytyuk3, Bogdan Maslyiak4
Faculty of Computer Information Technologies, Ternopil National Economic University, UKRAINE, Ternopil, 8 A. Chekhova str., email:
mdy@tneu.edu.ua1, yuramasua@gmail.com2, i.voytyuk@tneu.edu.ua3, bm@tneu.edu.ua4
Abstract: Mathematical modeling of distribution of task is solved by methods of interpolation [4,5]. The first one
harmful vehicle emissions concentrations is considered in and second one are the subjects for research of this work.
the paper. The modified subtractive clustering method for
modeling is proposed. This method is characterized by its
II. STATEMENT OF THE PROBLEM
implementation simplicity due to the fact that it does not To solve the environmental monitoring tasks, it is
require a large sample of experimental data and does not necessary to build models of stationary and non-stationary
require to set a predetermined number of clusters. An fields of concentrations of harmful vehicle emissions [6]. The
example of clustering method application for data theoretical basis for solving this type of tasks are the
preparation for modeling of distribution of harmful mathematical models of objects with distributed parameters
vehicle emissions concentrations is given. in the form of partial differential equations. Concentration of
Keywords: mathematical modeling, clustering analysis, attention on the physical properties of environment requires
interval analysis, interval difference operator, harmful to significantly complicate the mathematical model. Even
vehicle emissions, vehicular traffic. despite the fact that, in practice, it is impossible to verify the
results of modeling with real data obtained under conditions
I. INTRODUCTION that meet the conditions of modeling. First of all, this is
One of the biggest problems of large and medium cities is related to the complexity of the measurement experiment. For
the pollution of the surface atmospheric layer and soils by example, if a mathematical model in the form of a differential
harmful emissions from vehicles. Large amount of harmful equation accurately enough describes the process of
substances is concentrated in the vehicles exhaust gases. transferring of chemical substances in the atmosphere in case
Among them, with high concentrations: Nitrogen oxides, of wind gusts or other turbulent phenomena in the
Carbon oxides and Sulfur oxides. Motor transport is a source atmosphere, then an integrated value of the chemical
of atmospheric and soils pollution, which should be substance concentration per volume unit is established in the
considered as distributed one. For reflecting and predicting of process of measurement. In addition, the accuracy of such
concentrations of harmful vehicle emissions, it is expedient to measurements is low, the relative measurement error may
use mathematical models. They can be built based on the reach 50%. Consequently, it is enough to build a
results of selective observations of their dynamics with mathematical model with an accuracy that is equiualent to the
known boundary errors of measurements. This approach was accuracy of the measurement experiment. At this, it is
considered in [1]. expedient to represent the experimental data in the form of
The process of harmful emissions distribution and its intervals of possible values of the modeled characteristic:
dynamics is considered as a mass transfer process. For its
[ zi−, j ,h,k ; zi+, j ,h,k ],
description, the difference operators (schemes) are used. (1)
Their identification is carried out using the data of = =
i 1,..., I , j 1,..., = J , h 1,..., = H , k 1,..., K
measurements of harmful emissions concentrations with − +
known boundary errors. Such data are called interval data where zi , j , h, k , zi , j , h, k are the lower and upper bounds of the
[2,3]. As is known, the methods of difference operators interval of possible values of measured concentration of
identification based on the interval data analysis require a harmful substances in the grid nodes with discretely given
uniform measurement grid that is impossible for the real city spatial coordinates i = 1,..., I , j = 1,..., J , h = 1,..., H at the
conditions. Mostly, the measurements of harmful emission dicrete time value k = 1,..., K , respectively.
concentrations is carried out in places with intensive traffic
It is worth to note that, in the measurement experiment, we
and accumulation of vehicles. This means that measurement
can set the lower and upper bounds based on the relative error
grid is not uniform. Thus, for building of mentioned models,
it is necessary to solve three tasks related to data preparation: of the measuring device: zi−, j , h, k = zi , j , h, k − zi , j , h, k ⋅ ε and
execute cluster analysis for defining of homogeneous
zi−, j , h, k = zi , j , h, k + zi , j , h, k ⋅ ε , where zi , j , h, k is the measured
vehicular traffic intensity areas; identify the discrete values of
the grid step; calculate estimates of harmful vehicle value of the harmful substance concentration; ε is relative
emissions concentrations in the nodes of the grid. The third measurement error.
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
59
Under these conditions, macromodeling is the only way to
vi , j ,h,k ⊂ [ zi−, j ,h,k ; zi+, j ,h,k ],
reflect the distribution of harmful emissions concentrations. (3)
The building of such macromodels is convenient to carry out = ∀i d ,..., I=, ∀j 1,..., J = , ∀h d ,..., H = , ∀k d ,..., K
based on the obtained interval data in the form (1).
Based on the results of conducted analysis, we can state
In the papers of O.G. Ivakhnenko [7], the inductive
that for ensuring of conditions of the given accuracy (3) of
approach is described for choosing of acceptable way of
the macromodel in the form of linear DO (2) during solving
mathematical description of these processes. Its essence
the task of its parametric identification, the application of
consists in defining of some difference scheme in the way of
interval data analysis methods [9] is substantiated.
its adjustment in accordance with the experimental data. The
difference scheme itself, which actually converts the values Let’s assume that the vector of parameters estimates g in
of input variables into output values, is called a difference the DO (2) is obtained based on the interval data analysis.
operator. The process of adjustment of this scheme is called Substituting the vector of parameters estimates g of DO
structural identification [8,12].
instead of their true values g in expression (2) together with
In general case, the expression of linear in parameters
given initial interval values of each element in the set
difference operator (DO) has the following form [2]:
[v0, 0, 0, 0 ],..., [v0, 0, h −1, 0 ], [vi −1, 0, 0, 0 ],..., [v0, j −1, 0, 0 ],...,
vi , j ,h,k = f T (v0,0,0,0 ,..., v0,0,h−1,0 , vi −1,0,0,0 ,..., v0, j −1,0,0 ,...,
[vi −1, j −1, h −1, k −1 ] and given vectors of the input variables
vi −1, j −1,h−1,k −1 , ui , j ,h,0 ,..., ui , j ,h,k ) ⋅ g , (2)
ui , j , h , 0 ,..., ui , j , h , k , we obtain an interval estimate of the
=i d= =
,..., I, j d ,..., =
J , h d ,..., H , k d ,..., K
T harmful substance concentration [vi , j , h, k ] in the nodes with
where f (•) is the vector of basis functions (nonlinear, in
discretely given spatial coordinates i = 1,..., I ,
general case) by which, the transformation of the modeled
j = 1,..., J , h = 1,..., H at discrete moments of time k = 1,..., K :
characteristic values, as well as the input variables in the
− +
spatial grid nodes for the certain discrete moments of time =
is [vi , j ,h,k ] [v=i , j ,h ,k ; vi , j ,h ,k ] f T ([v0,0,0,0 ],...,[v0,0,h−1,0 ],
carried out; vi , j , h, k modeled concentration of harmful
[vi −1,0,0,0 ],...,[v0, j −1,0,0 ],...,[vi −1, j −1,h−1,k −1 ],
emissions in grid nodes with discretely-given spatial
(4)
coordinates i = d ,..., I , j = d ,..., J , h = d ,..., H at the ui , j ,h,0 ,..., ui , j ,h,k ) ⋅ g ,
moments of time k = d ,..., K ; ui , j , h,0 ,..., ui , j , h, k are the =i 1,...,
= =
I , j 1,..., =
J , h 1,..., H , k 1,..., K
vectors of input variables (controls); d is the DO order; g is Thus, the mathematical model of stationary and non-
the vector of unknown parameters of DO. stationary fields of harmful emissions concentrations for the
As a result of executing of structural identification task of environmental control will be described by a DO in
procedure, we establish the difference computational scheme, general form (4). Taking into account that all calculations in
in particular: the basis functions vector f T (•) ; sets and equation (4) are carried out using interval arithmetic rules [2],
the difference operator (4) is called an interval difference
dimensionality of input variables (controls) vectors
operator (IDO).
ui , j , h,0 ,..., ui , j , h, k ; order of the difference scheme d, which, as The conditions of consistency of experimental data,
is known, is equivalent to the order of the differential represented in interval form (1), with the data obtained based
equation (the analogue of the difference scheme). To realize on macromodel in the form of IDO (4) are formulated as
the difference scheme, it is also necessary to set the initial follows:
conditions, that is, the value of each discrete element from
[vi−, j ,h,k vi+, j ,h,k ] ⊂ [ zi−, j ,h,k ; zi+, j ,h,k ],
the set v0,0,0,0 ,..., v0,0, h −1,0 , vi −1,0,0,0 ,..., v0, j −1,0,0 ,..., (5)
vi −1, j −1, h −1, k −1 , ui , j ,h,0 ,..., ui , j ,h,k (as a rule, the initial ones) and =
∀i 1,..., I=
, ∀j 1,..., J ,=
∀h 1,..., H ,=
∀k 1,..., K
Let’s substitute in expressions (5), instead of interval
to establish the values of the parameters vector g estimates of the harmful substance concentrations
components. If the structure of DO is known then, it remains
[vi−, j , h, k ; vi+, j , h, k ] , its interval values calculated using IDO (4),
the actual task of adjusting the parameters of DO (2) in such
a way as to ensure the maximal consistency between the together with taking into account the given initial interval
modeled characteristic and experimentally obtained values of values of each element from the set
this characteristic. Such a task is called the parametric [v0,0,0,0 ] ⊆ [ z0,0,0,0 ],...,[v0,0,h −1,0 ] ⊆ [ z0,0,h −1,0 ],
identification task [9,13].
Based on the requirements of ensuring the mathematical [vi −1,0,0,0 ] ⊆ [ zi −1,0,0,0 ],...,[v0, j −1,0,0 ] ⊆ [ z0, j −1,0,0 ],..., (6)
model accuracy within the bounds of the measuring
[vi −1, j −1,h −1,k −1 ] ⊆ [ zi −1, j −1,h −1,k −1 ]
experiment accuracy, the conditions of consistency between
experimental data, represented in the interval form (1), and and given vectors of input variables ui , j , h,0 ,..., ui , j , h, k . We
data obtained based on the mathematical model in the form of obtain such interval system of non-linear algebraic equations
DO (2), can be formulated in such form: (ISNAE) [3]:
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
60
( )
− + − + I J
[v0,0,0,0 ; v0,0,0,0 ] ⊆ [ z0,0,0,0 ; z0,0,0,0 ],..., Ph ( xh =
, yh , k ) ∑ ∑ exp −α ⋅ u xh , yh ,k − u xi , y j ,k , (9)
[vi−−d , j −d ,h−d ,k −d ; vi+−d , j −d ,h−d ,k −d ] ⊆ [ zi−−d , j −d ,h−d ,k −d ; zi+−d , j −d ,h−d ,k −d ]; =i 1 =j 1
where Ph ( xh , yh , k ) is potential of a point (center of cluster
[vi −1, j −1,h−1,k −1 ] = f T ([v0,0,0,0 ],...,[v0,0,h−1,0 ],[vi −1,0,0,0 ],...,
(7) with coordinates xh , yh at moment of time k); u xh , yh ,k ,
[v0, j −1,0,0 ],...,[vi −d , j −d ,h−d ,k −d ], u0 ,..., uk −1 ) ⋅ g ;
u xi , y j ,k are the numbers of motor vehicles in a point of
z − ≤ f T ([v
0,0,0,0 ],...,[v0,0,h −1,0 ],[vi −1,0,0,0 ],...,[v0, j −1,0,0 ],..., potential cluster center xh , yh , k and in points xi , y j , k with
i , j ,h,k
[vi −d , j −d ,h−d ,k −d ], u0 ,..., uk ) ⋅ g ≤ zi+, j ,h,k ; defined traffic intensity and measured concentrations,
respectively.
=i d= ,..., I , d = 2,..., J , h d= ,..., H , k d ,..., K . The illustration of the potentials distribution is represented
So, the ISNAE (7) is obtained by substituting the interval as a surface in the form of a mountainous relief (Fig. 1),
estimates of the output characteristic (given in the form of whose peaks have the highest potential values and are
initial conditions and predicted using expression (4) in the pretenders to be the centers of the formed clusters.
remaining nodes of the grid) into conditions (5). Therefore,
the task of parametric identification of IDO (4) under
conditions (5) is the task of solving ISNAE in the form (7).
Methods for estimation of solutions of the obtained ISNAE
are described in [10].
The analysis of the proposed scheme for building of
mathematical model of harmful vehicle emissions
distribution showed that before its implementing, it is
necessary to obtain a uniform grid of measured
concentrations (3) in its nodes and vectors of influences on
them ui , j ,h,0 ,..., ui , j ,h,k . The main among them, is the
vehicular traffic intensity. This task is solved using modified
method of subtractive clustering of data [11] on the traffic
intensity.
III. MODIFIED SUBTRACTIVE CLUSTERING METHOD
As the basis for method of clustering of vehicular traffic
distribution, it is expedient to use the “mountain” clustering
method with subtractive algorithm of its implementation.
This method does not require a large sample of experimental
data and does not require to set a predetermined number of
clusters that significantly reduces the time for its
implementation. It is also worth to note that the number of
clusters based on this method is regulated by the only Fig. 1. The illustration of potentials distribution based on the
parameter which is the cluster radius [11]. mountain clustering method.
According to the clustering method, in the beginning, we As we can see in the Fig. 1, one “mountain peak” is
form the potential cluster centers from the rows of data surrounded by other peaks that causes the problem of
matrix for the clustering of input variables and calculate the building of very similar data clusters with the corresponding
potentials of identified cluster centers using the expression: centers. This does not provide the high quality clustering
( ch ) ∑ exp ( −α ⋅ ch − xk ) ,
K results.
Ph = (8) As centers of the clusters, we choose the coordinates of
k =1
“mountain peaks”, that is, the center of the cluster is the point
where ch = (c1h , c2 h ,..., c Kh ) is a potential center of h-th on the city map with the highest value of potential:
cluster; а is a positive constant; ch − xk is a distance ( xh , yh , k ) = arg max Ph ( xh , yh , k ) . (10)
h =1,..., H
between potential center of h-th cluster ch and input
In order to avoid the building of similar clusters, we must
experimental data xk , k=1,…,K, h = 1,..., H ; H is a number of recalculate the potential values for the remaining possible
possible clusters. cluster centers:
In our case, if the only property of a cluster which is the
number of vehicles u xi, yj ,k in the point with coordinates
( xh+1, yh+1, k ) Ph+1 ( xh , yh , k ) − Ph ( xh , yh , k ) ⋅
Ph+1 =
(11)
xi , y j at a discrete moment of time k is taken into account, exp(− β ⋅ u xh +1 , yh +1 ,k − u xh , yh ,k ), h = 1, …, Н ,
the expression for estimation of potentials of given cluster where Ph ( xh , yh , k ) is a potential center of h-th cluster on h-
centers, has such form:
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
61
th iteration; Ph +1 ( xh , yh , k ) is a potential of center of h-th discretization step for building a DO. In our case, the cluster
is the set of points of a certain area of the city with similar
cluster on h+1 iteration; β is a positive constant,
values of current vehicles number.
u xh +1 , yh +1 ,k − u xh , yh ,k is a distance between potential center
of h+1 cluster and center of found h-th cluster.
The procedure of cluster centers calculation is carried out
until all the rows of the input variable matrix X, which is
represented by the set (3), are excluded.
The above procedure is based on the subtractive clustering
algorithm, which is based on the following steps.
Step 1. Forming of potential cluster centers. They are all
points of measured harmful emission concentrations and
corresponding intensities of vehicular traffic.
Step 2. Calculation the potential of possible cluster centers
based on (9).
Step 3. Selecting the data point with the maximal potential
for representation of the cluster center based on (10).
Step 4. Excluding the influence of the found cluster center
in the way of recalculating the potentials for other possible
cluster centers by (11).
Step 5. Identifying the next cluster and the coordinates of
its center. If the maximal value of the cluster center potential
exceeds some predetermined threshold which is the cluster
radius, that is Ph ( xh , yh , k ) , then proceed to step 4,
otherwise, the algorithm is completed.
The iterative procedure for identification of cluster centers Fig. 2. Points of measurements of vehicular traffic intensity on
and the recalculation of potentials is repeated until all points the example of Ternopil city.
in the space of input experimental data are located within the
neighborhoods of the radius of sought cluster centers. The result of the proposed method of cluster analysis
As a result of the clustering algorithm implementation, we application is schematically shown in Fig. 3. As we can see,
obtain h clusters, h = 1,…,Н, with the corresponding centers. during the clustering process, H clusters with different
The next step is the identification of a uniform grid nodes for vehicular traffic intensity and radius r were defined. So, the
homogeneous parts of a cluster. The discrete values of the discrete values of grid nodes coordinates are equal to the
grid nodes coordinates are equal to the cluster diameter, and cluster diameter.
the value u xh , yh ,k is the number of vehicles in the point of a
cluster center xh , yh , k . To assign the number of vehicles at
k-th moment to the grid nodes, it is enough to analyze, what
cluster the node is in. If the node belongs to the h-th cluster,
then, the number of vehicles in the node is u xh , yh ,k .
IV. EXAMPLE OF CLUSTERING METHOD
APPLICATION
Let’s consider the application of the developed clustering
method for obtaining of uniform grid of nodes on an example
of Ternopil city.
The fragment of map of central part of Ternopil city with
the marked points of the measured vehicular traffic intensity
for one discrete time (one hour) is shown in the Fig. 2. As we
can see, the traffic is distributed not uniformly over the
territory. Therefore, it is advisable to measure its intensity at
some selected points, where this intensity is the highest, for
example, as it is shown on the map of Ternopil city.
The points of measurement of traffic intensity are colored
red on the map.
The application of cluster analysis for determination of
areas with specific vehicular traffic intensities under
condition of identification of cluster centers that are located Fig. 3. The result of clustering of vehicles quantity distribution on
on a certain uniform grid gives a possibility to define the the Ternopil city map.
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
62
Obtained grid for building of distribution model of harmful Technical University Series "Information, cybernetics
vehicle emission concentrations in the form of IDO (4) is and computer science.", 2011, Vol. 14 (188), pp. 8-17.
schematically shown in Fig. 4. [3] M. Dуvak “Mathematical Modeling Tasks of Static
Systems with Interval Data”, Ternopil: TNEU
Publishing House "Economic Thought", 2011, 216 p.
[4] M. Dyvak, I. Voytyuk, T. Dyvak, A. Pukas “Application
of the interval difference operator for approximation of
fields of harmful emissions concentration from
vehicles”, Measuring and Computing Devices in
Technological Processes, 2011, No. 1 (37), pp. 44-52.
[5] Kvyetnyy R. N., Dementiev V. Yu., Mashnitsky M. O.,
Judin O. O. “Difference methods and splines in
multidimensional interpolation problems”, Vinnitsa:
UNIVERSUM, 2009, 87 p.
[6] N. Ocheretnyuk, I. Voytyuk, M. Dyvak and Y.
Martsenyuk, "Features of structure identification the
macromodels for nonstationary fields of air pollutions
from vehicles," Proceedings of International
Conference on Modern Problem of Radio Engineering,
Telecommunications and Computer Science, Lviv-
Slavske, 2012, pp. 444-444.
[7] A.G. Ivakhnenko “Inductive method of self-organizing
of models of complex systems”, Kyiv: Scientific
thought, 1981, 296 p.
[8] M. Dyvak, I. Voytyuk, T. Dyvak, A. Pukas “Application
Fig. 4. The uniform grid with known spatially distributed of the interval difference operator for approximation of
vehicular traffic intensities. fields of harmful emissions concentration from
vehicles”, Measuring and Computing Devices in
For building of mentioned model in the form of IDO (4), it Technological Processes, 2011, No. 34 (110), pp. 86-94.
is enough to execute interpolation and identify the pollution [9] T. Dyvak “Parametric identification of interval difference
concentrations in the grid nodes. operator on the example of macromodel for distribution
V. CONCLUSIONS of humidity in the drywall sheets in the process of
drying”, Information Technologies and Computer
The modified method of subtractive clustering and interval Engineering: international Scientific Journal, 2012,
analysis for modeling of distribution of harmful vehicle Vol. 3, pp. 79-85.
emissions concentrations and vehicular traffic intensity under [10] M. Dyvak, N. Porplytsya, Y. Maslyak, M. Shynkaryk
conditions of non-uniform measurement grid were proposed
“Method of Parametric Identification for Interval
and substantiated.
Discrete Dynamic Models and the Computational
ACKNOWLEDGMENT Scheme of Its Implementation,” Advances in Intelligent
Systems and Computing II: Selected Papers from the
This research has been supported by National Grants of
International Conference on Computer Science and
Ministry of Education and Science of Ukraine “Mathematical
Information Technologies, CSIT 2017, рр.101- 112,
tools and software for control the air pollution from vehicles”
2018.
(0116U005507) and “Mathematical tools and software for
[11] Shtovba S. Introduction to the theory of fuzzy sets and
classification of tissues in surgical wound during surgery on
fuzzy logic. Аccess mode:
the neck organs” (0117U000410).
http://matlab.exponenta.ru/fuzzylogic/book1/index.php
REFERENCES [12] Porplytsya, N., Dyvak, M., Dyvak, T., Voytyuk, I.
“Structure identification of interval difference operator
[1] A. Veremchuk, A. Pukas, I. Voytyuk and I. Spivak,
for control the production process of drywall.”
"Mathematical and software tools for modeling objects Proceedings of 12th International Conference on the
with distributed parameters," 2016 13th International Experience of Designing and Application of CAD
Conference on Modern Problems of Radio Engineering, Systems in Microelectronics, CADSM’2013, pp. 262-
Telecommunications and Computer Science (TCSET), 264 (2013).
Lviv, 2016, pp. 149-152. [13] Fliess, M., Sira-Ramirez, H. “Closed-loop parametric
[2] I. Voytyuk, M. Dyvak, V. Nemish “Method and genetic identification for continuous-time linear systems via new
algorithm for structure identification of interval algebraic techniques.” H. Garnier & L. Wang. Identification of
differentice operators in the tasks of environmental Continuous-time Models from sampled Data, Springer, pp.
monitoring”, Collected Works of Donetsk National 362–391, 2008.
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic