<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>M. Afonin. Establishing patterns of change in the efficiency of
regulated intersection operation considering the permitted movement directions. Eastern-
European Journal of Enterprise Technologies 4(3(118) (2022) 17-26. doi: 10.15587/1729</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.trip.2024.101318</article-id>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yurii Matseliukh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Lytvyn</string-name>
          <email>vasyl.v.lytvyn@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Myroslava Bublyk</string-name>
          <email>my.bublyk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera Street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>2870</volume>
      <issue>29</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>An analysis of a heterogeneous data set on the duration of electric transport races in an average-sized city was conducted. The possibilities of using the K-means clustering method in organizing passenger transportation in a smart city were studied. These include analysing passenger flows by passenger type, identifying transport hotspots, identifying inefficient routes or their sections, and building dynamic models for predicting changes in flows. The features of the method's application for optimizing the operation of the transport system were also determined. Data analysis revealed route sections with different traffic flow intensity depending on their location in urban areas, seasonality, city events, and changes in traffic caused by detours, repair work, etc. An algorithm for selecting a clustering method was proposed based on clustering quality assessment metrics, including the elbow method, the silhouette method and the Calinski-Harabasz index. It is recommended to use clustering to create routes with reduced waiting times, fewer transfers and better alignment with passenger needs.</p>
      </abstract>
      <kwd-group>
        <kwd>passenger transportation</kwd>
        <kwd>smart city</kwd>
        <kwd>clustering analysis</kwd>
        <kwd>K-means method</kwd>
        <kwd>systems analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The problem of organizing passenger transportation in a smart city is closely related to the search
for effective methods and tools that ensure optimal dynamic interaction between passengers and vehicles.
When organizing passenger transportation, it is important to consider various data, such as traffic
routes, waiting times, schedule execution times, traffic changes, vehicle load, weather conditions,
environmental efficiency, individual passenger needs, etc. The large data sets collected in this way
require proper storage, appropriate analysis and effective grouping, so that they can be used to
optimize routes and transport waiting times and to monitor environmental indicators.</p>
      <p>The problem has broad practical significance, which grows every year with the introduction of
modern systems of dynamic interaction between passengers and vehicles in smart cities. Among its
aspects, we highlight the following: reducing carbon emissions, improving the quality of life, saving
resources and forming a smart urban infrastructure. The collected data sets require proper,
high-quality analysis, which cannot be carried out without effective clustering algorithms. Such
algorithms are needed when optimizing routes, when introducing environmentally friendly vehicles on
routes, and when organizing transportation so as to reduce waiting times and vehicle load, improve
passenger comfort, reduce greenhouse gas emissions and transportation costs, and make better use of
transport infrastructure and energy-efficient technologies. In general, solving this problem makes a
significant contribution to the development of smart cities that use modern technologies to improve
the quality of life of residents. Therefore, finding ways to apply clustering methods, including the
K-means method, in the organization of passenger transportation in a smart city is an important
component of the general problem of developing methods and means of dynamic interaction between
passengers and vehicles. It is also of significant practical importance for the development of
low-carbon public passenger transport in large, medium and small cities. The object of research is the
process of clustering data sets on the organization of passenger transportation in a smart city. The
subject of research is the principles of optimizing passenger transportation by public transport and
improving the execution of its transportation schedules.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Well-known studies of clustering methods in organizing passenger transportation in a smart city</title>
      <p>
        Having conducted a detailed review of cluster analysis methods and tools [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] used for modelling
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], route optimization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], big data analysis [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and their clustering in order to develop adaptive
algorithms for organizing a transport network in a smart city [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], we see that special attention is
paid to decision-making models for optimizing passenger flows, taking into account modern
approaches based on both bottom-up (agglomerative) [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] and top-down (divisive) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] clustering
methods, as well as distributive [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], fuzzy clustering [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], DBSCAN [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and self-organizing maps
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Modern research in the field of systems analysis [
        <xref ref-type="bibr" rid="ref15">15-17</xref>
        ] confirms that the integration of clustering
methods into the transport management system contributes to the creation of adaptive, efficient and
low-carbon transportation models [18, 19]. The theoretical foundations of clustering methods in the
organization of passenger transportation and the development of modern information and
communication systems for public passenger transport have been studied by well-known scientists
both in Ukraine and abroad. Among the researchers whose work contributed to the theoretical
foundations and practical experience of applying cluster data analysis to the organization of
passenger transportation based on the smart city concept are Bezdek J. [20], Bublyk M. [21, 22],
Esther M. [23], Jane A. [24], Kohonen T. [25], Koshtura D. [26], Lytvyn V. [27], Lov A. [28],
Nat N. [29], Sun L. [30] and Tibshirani R. [31].
      </p>
      <p>
        Among the main clustering methods used for big data analysis in the field of organizing passenger
transportation in a smart city [32-36], it is necessary to consider hierarchical, partitioning,
density-based and grid-based clustering, as well as artificial neural networks. These methods not only
help to understand the individual needs of passengers and their patterns of service consumption, but
also contribute to optimizing resources to meet these needs. To identify which clustering methods are
used for organizing passenger flows, the main clustering methods in the field of organizing passenger
transportation in a smart city were compared [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref12 ref13 ref14 ref15 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">1-36</xref>
        ].
      </p>
      <p>In hierarchical clustering, agglomerative (bottom-up) methods start by treating each element of the
data set as a separate cluster. The two closest clusters are then repeatedly merged into one according
to certain rules (a specified distance metric) until only one cluster remains. Bottom-up methods are
used to organize passenger flows when it is necessary to determine the structure of routes, with
merging performed by similar route sections or territories. Divisive (top-down) methods, on the
contrary, start with one cluster that includes all the data and split it into smaller clusters. They are
used to organize passenger flows when it is necessary to single out separate routes or their sections
(races, zones) with different passenger flow intensity in order to optimize services in specific
areas.</p>
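      <p>The bottom-up merging procedure described above can be illustrated with a minimal pure-Python sketch. It assumes single linkage (the distance between two clusters is the smallest distance between their members) and one-dimensional items, for example mean section durations in seconds; the values are invented for illustration. Instead of merging until a single cluster remains, the sketch stops at a requested number of clusters, which corresponds to cutting the hierarchy at that level.</p>
      <preformat preformat-type="code">
```python
def single_linkage(a, b):
    """Distance between two clusters: the smallest pairwise distance."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(values, n_clusters):
    """Start with one cluster per value and repeatedly merge the two closest."""
    clusters = [[v] for v in values]          # every element is its own cluster
    while len(clusters) > n_clusters:
        best = None                           # (distance, i, j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or best[0] > d:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
        del clusters[j]
    return clusters


durations = [262, 255, 170, 165, 160, 104, 110]   # invented mean durations, seconds
print(agglomerate(durations, 3))  # [[262, 255], [170, 165, 160], [104, 110]]
```
      </preformat>
      <p>A top-down (divisive) method would run in the opposite direction, starting from one cluster holding all values and splitting it.</p>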
      <p>Among partitioning (distributive) clustering methods, the K-means method divides the data into
clusters by computing the centroid (mean point) of each cluster and reassigning points to the nearest
centroid, repeating this process until a stable distribution is achieved. The method always produces a
hard partition, in which each object belongs to exactly one cluster. However, it is poorly suited to
data with complex cluster shapes or with qualitative (categorical) characteristics. The K-means method
is useful for zoning the transport network, for example, when optimizing routes by demand zones.</p>
      <p>The K-medoid method uses actual data objects (medoids) as cluster centres, choosing those that
minimize the total dissimilarity between the centre and the other members of its cluster. A distinctive
feature of the K-medoid method is its greater resistance to noise and outliers. It is used to optimize
the operating schedule of vehicles and to determine the optimal location of stops on routes, for
example, metro stations, that can cover the greatest number of passenger needs.</p>
      <p>The fuzzy C-means clustering method helps to create more flexible and adaptive passenger flow
management strategies, which is important in modern urban environments with dynamic demand for
transport services. It differs from the traditional K-means method in that it allows elements to belong
to several clusters simultaneously with different degrees of membership: each element has a
membership degree between 0 and 1 for each of the k clusters, and these degrees sum to 1. The
membership function determines the degree to which an element belongs to each possible cluster, and
the optimal distribution minimizes an objective function that accounts for both the distances to the
cluster centres and the degrees of membership. The method tends to give better results in complex
systems with high ambiguity and overlap between data. It is used to create more flexible and adaptive
urban transport zones, where passengers belong to several zones at the same time, taking into account
the unpredictability of demand, for example, changes in passenger flow during the day or under
different weather conditions. It can reveal patterns that hard-partitioning methods miss, which can be
important when developing optimization strategies. The use of fuzzy C-means provides a more accurate
picture of transport user segmentation and reduces the risk of excess or insufficient service volumes
on certain sections of the transport network. It is also used to predict mixed needs and their impact
on the distribution of clusters, thereby improving strategic decision-making in transport
organization.</p>
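      <p>The membership idea can be made concrete with a short sketch of the standard fuzzy C-means iteration (Bezdek's formulation). Memberships are u<sub>ij</sub> = 1 / Σ<sub>k</sub> (d<sub>ij</sub> / d<sub>kj</sub>)<sup>2/(m−1)</sup>, and each centre is the membership-weighted mean of the data. One-dimensional data, the fuzzifier m = 2, a fixed number of iterations and the initial centres are illustrative assumptions, not parameters from this study.</p>
      <preformat preformat-type="code">
```python
def memberships(x, centres, m=2.0):
    """Degrees of membership of value x in each cluster; they sum to 1."""
    dists = [abs(x - c) for c in centres]
    zeros = dists.count(0.0)
    if zeros:  # x coincides with one or more centres: share full membership
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]


def fuzzy_c_means(values, centres, m=2.0, n_iter=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    centres = list(centres)
    for _ in range(n_iter):
        u = [memberships(x, centres, m) for x in values]
        # Each centre becomes the mean of all values weighted by u_ij ** m
        centres = [
            sum(u[i][j] ** m * values[i] for i in range(len(values)))
            / sum(u[i][j] ** m for i in range(len(values)))
            for j in range(len(centres))
        ]
    u = [memberships(x, centres, m) for x in values]  # memberships for the final centres
    return centres, u


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # illustrative, not study data
centres, u = fuzzy_c_means(values, [0.0, 5.0])
print([round(c, 2) for c in centres])
print([round(x, 3) for x in u[2]])  # value 3.0 belongs mostly, not entirely, to cluster 0
```
      </preformat>
      <p>Unlike K-means, every value keeps a non-zero degree of membership in both clusters, which is what allows overlapping transport zones.</p>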
      <p>Among density-based methods, the best known is DBSCAN, which is most often used to analyse
passenger flows with different densities on different routes and to identify clusters while taking time
dynamics into account. Among grid-based methods, the STING method divides the data space into a
hierarchy of rectangular cells and stores summary statistics for each cell, which makes it possible to
build models of time intervals and densities. Among the artificial neural networks used for clustering,
the most common is the self-organizing map, which allows creating dynamic maps of passenger flows
that take time intervals and densities on different routes into account. However, it requires a
significant amount of data for training, pre-normalization of the data and division into smaller groups
for analysis.</p>
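      <p>A compact pure-Python version of the DBSCAN procedure is sketched below. A point with at least min_pts neighbours (itself included) within radius eps is a core point, and clusters grow outwards from core points. Points that no core point can reach are labelled noise (−1). The two-dimensional points and the eps and min_pts values are illustrative assumptions; scikit-learn's sklearn.cluster.DBSCAN is the usual production choice.</p>
      <preformat preformat-type="code">
```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if eps >= math.dist(points[i], q)]


def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    labels = [None] * len(points)          # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if min_pts > len(neighbours):      # not a core point
            labels[i] = -1                 # noise for now; may become a border point
            continue
        cluster += 1                       # start a new cluster from core point i
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while len(seeds) > k:
            j = seeds[k]
            k += 1
            if labels[j] == -1:            # noise reachable from a core point
                labels[j] = cluster        # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:   # j is also a core point: expand
                seeds.extend(j_neighbours)
    return labels


# Illustrative 2-D points: two dense groups and one isolated observation
pts = [(1, 1), (1.2, 1.1), (0.9, 1.0), (8, 8), (8.1, 8.2), (7.9, 8.0), (20, 1)]
print(dbscan(pts, eps=0.5, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```
      </preformat>
      <p>The number of clusters is not given in advance; it follows from the density parameters, which is why DBSCAN suits flows whose density differs between routes.</p>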
      <p>Therefore, each clustering method has its own advantages and can be applied in different
scenarios for the effective organization and optimization of passenger transportation in a smart city.
The choice of method depends on the specifics of the task, data characteristics and goals set when
analyzing transport and passenger flows. Bottom-up clustering methods are used when it is
necessary to determine the structure of routes, combine similar sections of routes or territories
adjacent to the route. Top-down methods are used when it is necessary to identify individual routes
or their individual sections (races, zones) with different intensity of passenger flows in specific areas.
The K-means distributive clustering method is useful for zoning the transport network when
optimizing routes by the duration of the races, by demand zones, etc. The K-medoid distributive
clustering method, due to its greater resistance to noise, is used to optimize the operating schedule
of vehicles and to determine the optimal location of stops on routes that meet passenger needs. The
fuzzy C-means clustering method is used when developing more flexible and adaptive passenger flow
management strategies in modern urban environments with dynamic demand for transport services.
The DBSCAN method is most often used to analyze passenger flows with different densities on
different routes and for clustering that takes time dynamics into account. The self-organizing map
method is used for creating dynamic maps of passenger flows that take into account time intervals and
densities on different routes.</p>
      <p>A smart city generates a huge amount of data from various sources, such as GPS systems, mobile
applications, sensors, social networks and video surveillance. Processing such large volumes of data
is critically important for making informed decisions in the field of passenger transportation.
Passenger flows are constantly changing depending on the time of day, day of the week, weather,
social events, etc. Clustering methods allow you to identify key groups or patterns in such flows to
better understand their nature. Clustering methods are the basis of many modern artificial
intelligence algorithms. The use of intelligent transport systems (ITS) [37-41] requires complex
analysis and modeling algorithms to identify optimal routes and manage the transport network.
Effective data clustering allows you to optimize routes, reduce downtime and total emissions, which
is important in the context of combating climate change. Such grouping is extremely important for
the development of adaptive route optimization algorithms, as it allows you to effectively allocate
transport resources, reduce waiting times and minimize carbon emissions. As noted
by Bublyk M. [42], the concept of smart specialization for the transformation of the Ukrainian
economy includes not only the optimization of the economic activities of transport companies, but
also the transition to a green economy, in which reducing CO<sub>2</sub> emissions through the
introduction of innovative solutions in the transport industry plays a significant role. The basis of
innovative models for reducing atmospheric emissions is the concept of the technosoliton developed
by Bublyk M. [43, 44], which assesses the damage and losses in highly polluting sectors of the
economy, among which transport has remained for many years. This concept is of particular
importance when developing strategies for organizing passenger transportation, since route
optimization using the K-means method can not only improve the quality of service but also
contribute to reducing atmospheric emissions, which is crucial for achieving sustainable development
goals [45-48].</p>
      <p>Summarizing the above analysis of recent studies on applying clustering methods, including the
K-means method, to the organization of passenger transportation in a smart city, the previously
unsolved part of the general problem is the development of methods for determining patterns of
passenger flows, optimizing transport routes and increasing network efficiency in real time. Such
methods are needed to improve passenger comfort, reduce greenhouse gas emissions and transportation
costs, and optimize the use of transport infrastructure and energy-efficient technologies. Insufficient
attention has also been paid to finding effective ways of applying clustering algorithms to route
optimization, to the introduction of environmentally friendly routes, and to the organization of
transportation aimed at reducing passenger waiting time in general and vehicle congestion in
particular. This indicates the need for scientific research in this direction, namely, to study the
possibilities of using the K-means clustering method in organizing passenger transportation in a
smart city and to determine the features of its application for optimizing the organization of public
passenger transportation, which is the purpose of this work.</p>
      <p>The article solves the following tasks: studying the features of clustering methods and their
metrics in organizing passenger transportation in a smart city; analyzing a large-scale heterogeneous
data set on the duration of electric transport trips within an average-sized city; and developing a
simple and effective algorithm for choosing a clustering method based on metrics that assess the
quality of clustering of data collected from the public passenger transportation infrastructure of smart
cities.</p>
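      <p><disp-formula><label>(1)</label><tex-math>d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},</tex-math></disp-formula>
where x = (x<sub>1</sub>, x<sub>2</sub>, …, x<sub>n</sub>) is the characteristic vector of point x, which
contains n components; y = (y<sub>1</sub>, y<sub>2</sub>, …, y<sub>n</sub>) is the characteristic vector of
point y, which contains n components; i is the index of each attribute (attribute number).</p>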
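      <p>Formula (1) translates directly into code. The sketch below is a plain-Python illustration for two characteristic vectors of equal length n. The standard library's math.dist (Python 3.8 or later) computes the same quantity.</p>
      <preformat preformat-type="code">
```python
import math


def euclidean(x, y):
    """Formula (1): square root of the sum of squared attribute differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of components n")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean((0, 0), (3, 4)))   # 5.0
print(math.dist((0, 0), (3, 4)))   # standard-library equivalent: 5.0
```
      </preformat>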
      <p>The choice of the number of clusters directly affects the quality of clustering, so it is important
to choose the optimal number of clusters for a given data set. One of the methods used in our case
was the elbow method, which analyzes how the sum of squared distances SSE(k) between points and
the centres of their clusters depends on k. The sum of squared distances is calculated by formula
(2).</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>Among the methods used for data analysis, comparison and grouping was cluster analysis, namely
the K-means method. A key aspect of applying data clustering methods is the choice of the distance
metric, which should be selected from the many available metrics according to its relevance to the
specific problem. In our case, the data set describes low-carbon vehicle traffic on a single route in an
average-sized city, so the Euclidean distance given by formula (1) was used. The sum of squared
distances SSE(k) used by the elbow method is calculated by formula (2):</p>
      <p><disp-formula><label>(2)</label><tex-math>\mathrm{SSE}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,</tex-math></disp-formula>
where SSE(k) is the sum of squared distances; k is the number of clusters; x<sub>i</sub> is a point of the
data set belonging to cluster C<sub>j</sub> (x<sub>i</sub> ∈ C<sub>j</sub>); μ<sub>j</sub> is the centre of the
j-th cluster.</p>
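      <p>As a minimal sketch of formula (2), the functions below compute cluster centres as means and then SSE for a given assignment of points. The toy one-dimensional data are invented. They show the behaviour the elbow method relies on: SSE drops sharply from k = 1 (85.0) to k = 2 (4.0). The elbow is the value of k after which further decreases become small.</p>
      <preformat preformat-type="code">
```python
def centroids(points, labels, k):
    """Mean point of each cluster; assumes every label 0..k-1 is used."""
    dim = len(points[0])
    result = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        result.append(tuple(sum(p[d] for p in members) / len(members) for d in range(dim)))
    return result


def sse(points, labels, centres):
    """Formula (2): sum of squared Euclidean distances of points to their cluster centre."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centres[j]))
        for p, j in zip(points, labels)
    )


pts = [(1.0,), (3.0,), (10.0,), (12.0,)]     # illustrative 1-D data
one = [0, 0, 0, 0]                            # k = 1: everything in one cluster
two = [0, 0, 1, 1]                            # k = 2: the two natural groups
print(sse(pts, one, centroids(pts, one, 1)))  # 85.0
print(sse(pts, two, centroids(pts, two, 2)))  # 4.0
```
      </preformat>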
      <p>The K-means algorithm is one of the most common clustering methods used to partition a data
set into k clusters. It works iteratively, minimizing the sum of the squared distances of points to the
cluster centres. The K-means algorithm consists of four stages: initialization, assignment of points to
clusters, updating of the cluster centres, and checking of the stopping criterion.</p>
      <p>Initialization consists of selecting k initial cluster centres μ<sub>1</sub>, μ<sub>2</sub>, …,
μ<sub>k</sub>, either randomly or using special strategies. Points are then assigned to clusters: each
data element x<sub>i</sub> is assigned to the cluster with the nearest centre according to the criterion
of the smallest distance, calculated by formula (3):</p>
      <p><disp-formula><label>(3)</label><tex-math>c_i = \arg\min_{k} \lVert x_i - \mu_k \rVert^2,</tex-math></disp-formula>
where c<sub>i</sub> is the cluster to which point x<sub>i</sub> is assigned; ‖x<sub>i</sub> − μ<sub>k</sub>‖<sup>2</sup>
is the square of the Euclidean distance.</p>
      <p>Points are considered sequentially, and each point is allocated to the cluster whose centre is
closest to it according to the criterion of the smallest distance. After all points have been assigned to
clusters, the new centre μ<sub>k</sub> of each cluster is calculated as the mean of all points belonging
to it, formula (4):</p>
      <p><disp-formula><label>(4)</label><tex-math>\mu_k = \frac{1}{|S_k|} \sum_{x_i \in S_k} x_i,</tex-math></disp-formula>
where S<sub>k</sub> is the set of points belonging to the k-th cluster.</p>
      <p>The centroids are recalculated each time the division of points into clusters changes (in the
sequential variant of the algorithm, each time a new point is added to a cluster), so their coordinates
move to new positions. To make sure that each point has been assigned to the correct cluster, the
distance from each point to the centre of its own cluster is compared with the distances to the centres
of the other clusters according to formula (5):</p>
      <p><disp-formula><label>(5)</label><tex-math>c_i = \arg\min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j \rVert.</tex-math></disp-formula></p>
      <p>Points continue to be transferred iteratively between clusters until the division no longer
changes; this last division is taken as the result of clustering.</p>
      <p>The last stage checks the stopping criterion: the algorithm terminates if the cluster centres stop
changing or the changes become insignificant; otherwise, it returns to step 2. The K-means algorithm
may fail to reach a stable solution within a reasonable number of iterations, so it is advisable to stop
it once it reaches a preselected maximum number of iterations. Thus, the K-means algorithm iteratively
improves the distribution of points between clusters by reducing the value of the loss function.</p>
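      <p>The four stages can be combined into a compact sketch of the batch (Lloyd) variant, in which centres are recomputed after all points have been assigned. Several details are illustrative assumptions: initial centres drawn at random from the data, the tolerance tol, the iteration cap max_iter and the toy points. Library implementations such as scikit-learn's KMeans add refinements, for example k-means++ initialization and several restarts.</p>
      <preformat preformat-type="code">
```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Batch K-means on a list of equal-length tuples; returns (labels, centres)."""
    rng = random.Random(seed)
    # Stage 1: initialization, k data points sampled without replacement as centres
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Stage 2: assign every point to its nearest centre, formula (3)
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Stage 3: move each centre to the mean of its points, formula (4)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centres.append(centres[j])   # empty cluster keeps its old centre
        # Stage 4: stop when no centre moved by more than tol
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if tol >= shift:
            break
    return labels, centres


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]   # illustrative
labels, centres = kmeans(data, k=2)
print(labels, centres)
```
      </preformat>
      <p>Because the result depends on the initial centres, it is common to run the algorithm with several seeds and keep the partition with the lowest SSE.</p>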
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. An analysis of a heterogeneous data set on the duration of electric transport races in an average-sized city</title>
        <p>In our case, a dataset on the duration of low-carbon public transport trips within an average-sized
city was used for the research. It records, in real time during the study period, the duration of each
trip made by each vehicle on the route. The data structure has the following form: Record number;
Geozone; Planned arrival time; Actual arrival time; Month; Day; Time; Date; Week; Hour; Day of the
week; Working / non-working. The total volume is 890,999 records. After records with empty cells were
removed, and incomplete, redundant and erroneous records as well as records falling outside the general
time frame of vehicle operation on the route were separated out, 716,960 records remained in the
dataset; the first 21 records are shown in Fig. 1.</p>
        <p>As a result of the analysis of the collected data, the duration of each leg of the journey was
aggregated within each operating hour, by vehicle, within each day of each week of the study period.
Because the journey duration within one hour for all vehicles on the route, taken week by week over
the year, also has a complex, heterogeneous and large-scale structure, it requires appropriate
processing before cluster analysis can begin. At the last stage, after cleaning and grouping the data, a
matrix of passenger transport schedules for each of the 10 legs was obtained, containing the average
leg duration for each week of the study period. As an example, Fig. 2 shows the average daily leg
duration for each week of the study period (authors' calculation based on collected data).</p>
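      <p>The cleaning and weekly aggregation steps can be sketched in plain Python as follows. The sketch is hypothetical: it assumes each record has already been reduced to a dictionary with the keys "Week", "Section" and "Duration" (seconds). These are illustrative names, not the dataset's actual columns. It also treats a missing value or a non-positive duration as an invalid record.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from statistics import mean


def weekly_section_means(records):
    """Mean race duration per week and section, skipping invalid records.

    Hypothetical record layout: {"Week": int, "Section": str, "Duration": seconds}.
    """
    groups = defaultdict(list)
    for rec in records:
        week, section, duration = rec.get("Week"), rec.get("Section"), rec.get("Duration")
        if week is None or section is None or duration is None:
            continue                      # incomplete record (empty cell)
        if duration > 0:                  # drop erroneous non-positive durations
            groups[(week, section)].append(duration)
    matrix = defaultdict(dict)            # {week: {section: mean duration}}
    for (week, section), values in groups.items():
        matrix[week][section] = mean(values)
    return dict(matrix)


records = [
    {"Week": 1, "Section": "Sec1", "Duration": 250.0},
    {"Week": 1, "Section": "Sec1", "Duration": 270.0},
    {"Week": 1, "Section": "Sec2", "Duration": None},   # empty cell, skipped
    {"Week": 2, "Section": "Sec1", "Duration": -5.0},   # erroneous, skipped
    {"Week": 2, "Section": "Sec1", "Duration": 300.0},
]
print(weekly_section_means(records))   # {1: {'Sec1': 260.0}, 2: {'Sec1': 300.0}}
```
      </preformat>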
        <p>Using pyplot from the matplotlib library, we visualize the average daily duration of the Sec1 race
by vehicles for each week, constructing the graph shown in Fig. 3:
from matplotlib import pyplot as plt
# df: pandas DataFrame with the weekly average race durations (columns 'Sec1' to 'Sec10')
df['Sec1'].plot(kind='line', figsize=(8, 4), title='Sec1')
plt.gca().spines[['top', 'right']].set_visible(False)  # hide the top and right frame lines
plt.show()</p>
        <p>Using the same tools (pyplot from the matplotlib library), we visualize the average daily duration
of the race by vehicles for each week for the remaining 9 races; Fig. 4 shows the graph for race
Sec2.</p>
        <p>from matplotlib import pyplot as plt
df['Sec2'].plot(kind='line', figsize=(8, 4), title='Sec2')
plt.gca().spines[['top', 'right']].set_visible(False)
plt.show()</p>
        <p>Using Python tools, we also analysed the frequency characteristics (histograms) of the dataset for
each of the races; Fig. 5 shows the result for race Sec1.</p>
        <p>from matplotlib import pyplot as plt
df['Sec1'].plot(kind='hist', bins=20, title='Sec1')  # 20-bin histogram of weekly Sec1 durations
plt.gca().spines[['top', 'right']].set_visible(False)
plt.show()</p>
        <p>Using matplotlib, a graph was generated with weeks on the x-axis and race time in seconds on
the y-axis for each of the races Sec1 to Sec10 (Fig. 6). The data were read from the CSV file using
pandas, and values written with a decimal comma were converted to floating-point numbers.</p>
        <p>Fig. 6 shows a graph of the average daily duration of each race by vehicles for each week during
the study period, obtained using Python tools:
import pandas as pd
import matplotlib.pyplot as plt

# df: DataFrame read from the study's CSV file (e.g. with pd.read_csv);
# columns 1 to 10 are 'Sec1' to 'Sec10' and hold decimal-comma strings such as '262,5'
for col in df.columns[1:11]:
    df[col] = df[col].str.replace(',', '.').astype(float)

plt.figure(figsize=(12, 6))
for col in df.columns[1:11]:
    plt.plot(df['Week'], df[col], marker='o', label=col)  # one line per section
plt.xlabel('Week')
plt.ylabel('Race Time (seconds)')
plt.title('Race Time vs. Week for All Sections')
plt.legend(loc='upper right')
plt.grid(True)
plt.tight_layout()
plt.show()</p>
        <p>The graph (Fig. 6) shows the dependence of the race time in seconds on the week number for each
of the sections (Sec1–Sec10). Each section is represented by a line of a different color with markers.
The x-axis is the week number, and the y-axis is the race time in seconds. The plot has a grid for
better readability, and a legend in the upper right corner identifies each section. Sec1 has the highest
overall race time. Sec9 and Sec10 have the lowest and most stable race times. This Line Plot of All
Sections shows us the trend of the race time for each section over all the weeks studied (Fig. 6).</p>
        <p>Averaged over the year, the average daily race duration is highest for the Sec1 race (00:04:22)
and lowest for the Sec9 race (00:01:44). This indicates that passenger transportation in an
average-sized city depends on traffic and on the type of race: Sec1 runs through the city centre, where
congestion is likely, whereas Sec9 runs on an isolated line dedicated to this public transport. The
weekly average race duration for the entire route suggests two hypotheses: hypothesis 1, that the
amount of traffic on the roads depends on the season, and hypothesis 2, that weather changes
influence the duration of the races.</p>
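      <p>The hh:mm:ss values above are annual means expressed in seconds (00:04:22 = 262 s, 00:01:44 = 104 s). The small sketch below finds the sections with the longest and shortest mean race time and formats seconds as hh:mm:ss. Only the Sec1 and Sec9 entries correspond to the reported figures; the Sec5 value is invented for illustration.</p>
      <preformat preformat-type="code">
```python
def to_hms(seconds):
    """Format a duration in seconds as hh:mm:ss (rounded to whole seconds)."""
    s = int(round(seconds))
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def extreme_sections(section_means):
    """Return (section with the longest mean, section with the shortest mean)."""
    longest = max(section_means, key=section_means.get)
    shortest = min(section_means, key=section_means.get)
    return longest, shortest


# Sec1 and Sec9 match the reported annual means; Sec5 is a made-up value
annual_means = {"Sec1": 262.0, "Sec5": 180.0, "Sec9": 104.0}
longest, shortest = extreme_sections(annual_means)
print(longest, to_hms(annual_means[longest]))    # Sec1 00:04:22
print(shortest, to_hms(annual_means[shortest]))  # Sec9 00:01:44
```
      </preformat>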
        <p>The research also used a Box Plot of All Sections, which shows us a statistical summary of the
distribution of race times for each section of the route, highlighting the median, quartiles, and
outliers (Fig. 7).
plt.xlabel('Race Time (seconds)')
plt.ylabel('Frequency')
plt.title('Histogram of Race Times')
plt.tight_layout()
plt.show()</p>
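      <p>The statistics a box plot displays can also be computed directly. The sketch below returns the quartiles and the values lying beyond the usual 1.5 × IQR fences. It uses statistics.quantiles with the "inclusive" method; as far as we know, this matches the linear interpolation that matplotlib's boxplot uses by default. The sample durations are invented.</p>
      <preformat preformat-type="code">
```python
from statistics import quantiles


def box_stats(values, whis=1.5):
    """Quartiles and outliers as summarised by a standard box plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                                   # interquartile range
    low, high = q1 - whis * iqr, q3 + whis * iqr    # outlier fences
    outliers = [v for v in values if v > high or low > v]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


durations = [100, 102, 104, 106, 108, 200]   # invented race times, seconds
print(box_stats(durations))  # {'q1': 102.5, 'median': 105.0, 'q3': 107.5, 'outliers': [200]}
```
      </preformat>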
        <p>The Correlation Heatmap shows the pairwise correlation between race times on the different
sections of the route (Fig. 10).</p>
        <p>Fig. 10 shows that sections Sec3 and Sec4 are strongly correlated, which may indicate an
unresolved problem at a transport node with high traffic intensity between these sections that causes
delays on the route.</p>
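      <p>Each cell of a correlation heatmap of this kind is a pairwise correlation coefficient. Assuming the Pearson coefficient (the default of pandas DataFrame.corr), a single entry can be computed as below. The weekly durations for the two sections are invented for illustration.</p>
      <preformat preformat-type="code">
```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)   # undefined (ZeroDivisionError) for a constant series


# Invented weekly mean race times (seconds) for two adjacent sections
sec3 = [180, 185, 190, 200, 210]
sec4 = [150, 152, 158, 165, 172]
print(round(pearson(sec3, sec4), 3))
```
      </preformat>
      <p>A coefficient close to 1 shows that the two sections slow down and speed up together. It does not by itself identify the cause, which is why the shared transport node remains a hypothesis to be checked.</p>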
        <p>Summarizing this analysis of passenger transportation in an average-sized city, it was found that
the average daily duration of each leg for each week increases with the beginning of the autumn-winter
period and reaches its maximum in the 52nd week of the year (00:03:21). Below-average values of the
average daily leg duration are observed in spring, with the minimum (00:02:22) falling on the 18th week
of the year (end of April to beginning of May).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. A cluster analysis of the data set on the duration of electric transport races in an average-sized city</title>
        <p>Cluster analysis of this large-scale heterogeneous data set on the duration of electric transport
trips within a medium-sized city with a developed public transport network was carried out using the
K-means clustering method, because it produces a hard partition in which each observation is assigned
to exactly one cluster.</p>
        <p>It should be noted that there are several options for selecting the optimal number of clusters k,
among which the elbow method, the silhouette method and the Calinski-Harabasz index are used most
often. The elbow method relies on visual, and therefore somewhat subjective, inspection of a graph. The
graph shows how the scatter of points W<sub>total</sub> changes from its largest value
(W<sub>total</sub> → max), when all points are in one cluster, to its smallest value
(W<sub>total</sub> → 0) as the number of groups k increases (k → n).</p>
        <p>
          The silhouette method measures how similar the points in one cluster are to each other compared
with the points of other clusters. The value of the silhouette index lies in the range [−1, 1], where
larger values indicate better clustering quality, i.e. the points lie well inside their own clusters
relative to the other clusters. The Calinski-Harabasz index, also known as the variance ratio criterion,
is the ratio of the between-cluster dispersion to the within-cluster dispersion, each normalized by its
number of degrees of freedom. The highest value of the Calinski-Harabasz index indicates the most
clearly defined clusters. Although this metric is well suited for choosing the number of clusters, it
shares a drawback with the silhouette coefficient: it tends to give higher scores to convex clusters and
lower scores to clusters of complex shape. In order to find the optimal number of clusters k for the
data set of the average daily durations of each race during the week on the route in an average-sized
city (Fig. 12), the elbow, silhouette and Calinski-Harabasz methods were used. The results of
estimating the total within-cluster sum of squared distances of points to their cluster centres (SSE) by
the elbow method are shown in Fig. 11. The optimal number of clusters is k=5, with SSE=70896.042
(Fig. 11). The results of estimating the silhouette coefficient S<sub>i</sub> by the silhouette method are
shown in Fig. 12. In our case, the maximum value of the silhouette coefficient, S<sub>i</sub> = 0.507,
occurs at k=2, which is therefore considered the optimal number of clusters by this criterion (Fig. 12).
Fig. 13 shows the results of estimating the Calinski-Harabasz index for the corresponding numbers of
clusters. In our case, the maximum value of the Calinski-Harabasz index, 56.186, occurs at k=3
(Fig. 13), which indicates the optimal number of clusters for data clustering by this criterion.
        </p>
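        <p>Both internal metrics can be computed directly from their definitions, as in the following standard-library Python sketch. The silhouette of a point is (b − a)/max(a, b). Here a is the point's mean distance to the other members of its own cluster, and b is its smallest mean distance to the members of any other cluster. Singleton clusters are given a silhouette of 0, a convention borrowed from common implementations. The Calinski-Harabasz index is the between-cluster dispersion divided by k − 1, over the within-cluster dispersion divided by n − k. The function names and the toy data are assumptions for illustration, not the code used in this study, and the sketch requires at least two clusters.</p>
        <preformat>
```python
import math


def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(pts) for c in zip(*pts)]


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in pts) / len(pts)
            for other, pts in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def calinski_harabasz(points, labels):
    """Between-cluster dispersion / (k - 1) over within-cluster dispersion / (n - k)."""
    n, k = len(points), len(set(labels))
    overall = _centroid(points)
    between = within = 0.0
    for lab in set(labels):
        members = [p for p, lb in zip(points, labels) if lb == lab]
        c = _centroid(members)
        between += len(members) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    pts = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
    good = [0, 0, 0, 1, 1, 1]  # the two visible groups
    poor = [0, 1, 0, 1, 0, 1]  # the groups mixed together
    print(round(silhouette(pts, good), 3), round(calinski_harabasz(pts, good), 1))
    print(round(silhouette(pts, poor), 3), round(calinski_harabasz(pts, poor), 1))
```
        </preformat>
        <p>Both metrics score the partition that follows the visible groups higher than the mixed one, which is the behavior the selection of k relies on.</p>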
        <p>When clustering the average daily duration of the races for each week with the K-means method,
all three numbers of clusters obtained by the elbow, silhouette and Calinski-Harabasz methods, k=5, k=2
and k=3 respectively (Fig. 11 – Fig. 13), were considered. Fig. 14 shows the distribution of the data
(average daily values of the duration of each race for each week during the year) into clusters for
k=2 (a), k=3 (b) and k=5 (c).</p>
        <p>Figure 14: Results of clustering the average daily values of passenger transportation schedule
execution for each week of the year on each section (leg): (a) clustering of the data set for k=2;
(b) clustering of the data set for k=3; (c) clustering of the data set for k=5.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Let us analyze the distribution of the data into clusters in detail. When the data are divided into
two clusters (k=2, obtained by the silhouette method), we obtain clusters 0 and 1 (Fig. 14 (a)).
Cluster 0 contains the data on the execution of passenger transportation schedules on each leg for
weeks 9-20 and 22-43, with average daily values close to or below the average (Fig. 1, Fig. 2,
Table 1). Cluster 1 contains the data for weeks 21 and 44-52, with average daily values significantly
above the average on at least two legs (Fig. 1, Fig. 2, Table 1). This cluster also includes weeks (21,
45, 46, 50-52) in which the average daily values of schedule execution exceed the average by a factor
of 1.5-2 on three or more legs. Most of these large exceedances occur in the autumn-winter period of
the year and are due to difficult weather conditions.</p>
      <p>Cluster labels obtained for k=5, in week order: 0 0 0 0 0 0 0 2 0 0 0 0 4 4 1 1 1 1 1 1 1 1 1 2 2 2 1
2 2 2 2 3 3 3 3 3 3 3 3 3</p>
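      <p>The cluster descriptions above rely on counting, for each week, the legs whose average daily duration exceeds the annual average of that leg by a given factor (1.5-2 times), and on whether this happens on three or more legs. A minimal Python sketch of this interpretive rule follows; it is applied after clustering and is not part of K-means. The function names, the four-leg layout and all durations are hypothetical. The defaults factor=1.5 and min_legs=3 merely mirror the thresholds quoted in the text.</p>
      <preformat>
```python
def legs_exceeding(week_means, annual_means, factor=1.5):
    """Count legs whose weekly average daily duration is at least
    `factor` times the annual average of the same leg."""
    return sum(1 for w, a in zip(week_means, annual_means) if w >= factor * a)


def flag_weeks(weekly, annual_means, factor=1.5, min_legs=3):
    """Return the weeks in which at least `min_legs` legs show such an excess."""
    return [week for week, means in weekly.items()
            if legs_exceeding(means, annual_means, factor) >= min_legs]


if __name__ == "__main__":
    # Hypothetical annual average durations (minutes) of four legs.
    annual = [10.0, 12.0, 8.0, 15.0]
    # Hypothetical weekly average daily durations of the same four legs.
    weekly = {
        21: [16.0, 19.0, 13.0, 16.0],
        30: [10.5, 12.5, 8.1, 15.2],
        45: [20.0, 18.0, 12.0, 30.0],
    }
    print(flag_weeks(weekly, annual))  # prints [21, 45]
```
      </preformat>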
      <p>When the data are divided into three clusters (k=3, obtained by the Calinski-Harabasz method), we
obtain clusters 0, 1 and 2, shown in Fig. 14 (b). Cluster 0 contains the schedule execution data for
weeks 21 and 44-52, with average daily values significantly above the annual average on each leg
(Fig. 1, Fig. 2, Table 1), mainly on three or more legs. This cluster also includes weeks (21, 44-46,
48, 50-52) with a significant excess of average daily values on three or more legs, which is due to
difficult weather conditions in the autumn-winter period. This indicates that leg durations depend on
seasonality. Cluster 1 contains the schedule execution data only for weeks 9-20 and 22, with average
daily values below the annual average on each leg (Fig. 1, Fig. 2, Table 1). Cluster 2 contains the
data for weeks 27-43, with average daily values close to the annual average on each leg; minor
excesses over the annual average occur on no more than two legs per week (Fig. 1, Fig. 2, Table 1).</p>
      <p>When the data are divided into five clusters (k=5, obtained by the elbow method, Fig. 11), we
obtain clusters 0, 1, 2, 3 and 4, shown in Fig. 14 (c). Cluster 0 contains the schedule execution data
on each leg for weeks 9-15 and 17-20, with average daily values below the annual average (Fig. 1,
Fig. 2, Table 1). Cluster 1 contains the data for weeks 27-35 and 39, with average daily values above
the annual average on no more than two legs (Fig. 1, Fig. 2, Table 1). Cluster 2 contains the data for
weeks 16, 36-38 and 40-43, with average daily values close to the annual average on almost every leg
(Fig. 1, Fig. 2, Table 1) and an excess over the annual average on no more than one leg. These excesses
occur only on the city center leg, which indicates that leg duration depends on the leg's location in
specific urban areas. Cluster 3 contains the data for weeks 44-52, with average daily values
significantly above the annual average, mainly on three or more legs (Fig. 1, Fig. 2, Table 1), which
reflects the seasonality of leg duration. This cluster also includes weeks 45 and 52, with a significant
excess of average daily values on five legs, which may indicate a combined effect of heavy traffic and
seasonality. Cluster 4 contains the data only for weeks 21 and 22 in the summer period, with average
daily values significantly above the annual average on three or more legs (Fig. 1, Fig. 2, Table 1).
This cluster reflects only the strong effect of traffic on schedule execution on the legs in the city
center. Notably, no excesses over the annual average values of schedule execution were observed for
legs 6-10, which run on an isolated line reserved for this type of electric transport. This suggests
that a dedicated line is the optimal way to solve passenger transportation problems in public
transport. However, it is difficult to achieve, because it requires significant investment in the city's
infrastructure and a long implementation period.</p>
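      <p>The week ranges quoted for k=5 can be recovered from the sequence of k=5 cluster labels given in the text. The Python sketch below groups consecutive weeks per cluster. The helper week_ranges is hypothetical. The mapping of the 40 labels to weeks 9-22 and 27-52 (i.e. weeks 23-26 absent from the data) is our assumption and is not stated in the source, although it reproduces the ranges given above.</p>
      <preformat>
```python
def week_ranges(weeks, labels):
    """Group the weeks assigned to each cluster into runs of consecutive weeks."""
    ranges = {}
    for week, lab in zip(weeks, labels):
        runs = ranges.setdefault(lab, [])
        if runs and runs[-1][1] == week - 1:
            runs[-1][1] = week          # extend the current run of weeks
        else:
            runs.append([week, week])   # start a new run
    return {lab: ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)
            for lab, runs in sorted(ranges.items())}


# k=5 cluster labels as listed in the text.
labels_k5 = [int(x) for x in
             "0 0 0 0 0 0 0 2 0 0 0 0 4 4 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 2 2 3 3 3 3 3 3 3 3 3".split()]
# ASSUMPTION (not in the source): the labels cover weeks 9-22 and 27-52.
weeks = list(range(9, 23)) + list(range(27, 53))

if __name__ == "__main__":
    for lab, text in week_ranges(weeks, labels_k5).items():
        print(f"cluster {lab}: weeks {text}")
```
      </preformat>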
      <p>Thus, the cluster analysis of a large-scale heterogeneous data set on the duration of electric
transport trips within an average-sized city revealed individual sections (trips, zones) of the route
with different intensities, which are strongly influenced by traffic, by their location in specific
urban areas (city center, residential area, etc.) and by seasonality. Despite the subjectivity of
choosing the number of clusters with the elbow method, dividing the average daily trip durations for
each week into clusters gave the best results for k=5, where the within-cluster sum of squared
distances of points to their cluster centers is SSE=70896.042 (Fig. 11). At k=5, the silhouette
coefficient Si=0.378 and the Calinski-Harabasz index S=42.5086 are only moderately lower, by about a
quarter, than their maximum values Si=0.507 (Fig. 12) and S=56.186 (Fig. 13), respectively.</p>
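      <p>The size of the compromise made by choosing k=5 can be checked with simple arithmetic on the scores reported above, namely the share of its best value that each metric retains at k=5. A short Python sketch (the variable and function names are ours):</p>
      <preformat>
```python
# Scores reported in the text: maximum values and values at k=5.
SILHOUETTE_MAX, SILHOUETTE_K5 = 0.507, 0.378   # maximum occurs at k=2
CH_MAX, CH_K5 = 56.186, 42.5086                 # maximum occurs at k=3


def retained_share(at_k, best):
    """Fraction of the best score that is retained at the chosen k."""
    return at_k / best


sil_share = retained_share(SILHOUETTE_K5, SILHOUETTE_MAX)
ch_share = retained_share(CH_K5, CH_MAX)
print(f"silhouette: {sil_share:.1%} of max, Calinski-Harabasz: {ch_share:.1%} of max")
# prints "silhouette: 74.6% of max, Calinski-Harabasz: 75.7% of max"
```
      </preformat>
      <p>Both metrics therefore retain about three quarters of their maximum values at k=5.</p>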
      <p>Thus, the proposed algorithm for selecting a clustering method is simple and effective. It is based
on internal metrics for assessing the quality of clustering data collected from the public passenger
transport infrastructure of a medium-sized city. The metrics used were the elbow method, the silhouette
method and the Calinski-Harabasz index, which allow the optimal number of clusters to be selected
quickly and easily while taking the characteristics of the data into account. The elbow method accounts
for the within-cluster total variation of points relative to the cluster center. The silhouette method
measures how similar the points of one cluster are compared with other clusters. The highest value of
the Calinski-Harabasz index indicates the most clearly defined clusters.</p>
      <p>Thus, K-means clustering revealed races whose average daily durations greatly exceed the annual
averages. Such excesses also indicate longer passenger waiting times at stops, which affects the number
of passengers transported and the quality of the services provided. This points to the need for
informed decisions on optimizing passenger transportation by public transport in the city.</p>
      <p>We recommend using this K-means clustering method when analyzing the average daily duration
of each trip by vehicle for each week during the studied period to make informed decisions in the
field of passenger transportation by public transport in a smart city, namely for optimizing routes,
adapting the transport network itself, forecasting and planning demand for transport services,
implementing personalized services, as well as integrating different types of transport to create a
single effective multimodal transport system.</p>
      <p>Thus, when passenger flows are analyzed with the K-means clustering method, the identified areas
of high demand will make it possible to design transport routes that meet the real needs of passengers
at a specific point in time, reducing waiting times and the number of transfers to the minimum
possible. The method is also useful for analyzing changes in passenger needs and will facilitate the
adaptation of public transport routes to changes in demand, for example, by adding new stops or
changing vehicle schedules. It will also allow city authorities to better plan infrastructure projects
and investments in modernizing the transport system, in order to integrate different modes of transport
(electric transport, regular buses and, where available, the metro) into a single efficient transport
system. Personalization of services is also important in a smart city, and mobile public transport
applications play a key role here. When recommending the optimal route or travel time to passengers,
such applications can use the results of clustering race durations in real time. The main problems to
be solved with big data clustering are the identification of passenger clusters by type (workers,
students, tourists, etc.), identification of hot spots (areas with the highest demand for transport at
a certain time), identification of inefficient routes or lightly loaded sections of the transport
network, analysis of the dependence of passenger flows on external factors (weather, events in the
city, social trends), and the construction of dynamic models for predicting changes in flows.</p>
      <p>Therefore, the obtained results of cluster analysis of the average daily duration of each journey
by vehicles for each week during the studied period have practical value in optimizing routes,
adapting the transport network itself, forecasting and planning demand for transport services,
implementing personalized services, as well as integrating different types of transport to create a
single effective multimodal transport system. It was recommended to use clustering to optimize
routes, namely to create optimal transport routes that have reduced waiting times and fewer
transfers, and also meet the real needs of passengers at the time they specify.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and prospects for further development</title>
      <p>To study the possibilities of applying clustering methods to organizing passenger transportation
in a smart city, we examined the features of their application for improving the organization of
passenger transportation by public transport. This made it possible to establish that the choice of a
clustering method depends on the specifics of the task, the characteristics of the data and the goals
of the analysis of transport and passenger flows. Bottom-up (agglomerative) clustering methods are used
when it is necessary to determine the structure of routes or to combine similar sections of routes or
territories adjacent to the route. Top-down (divisive) methods are used when it is necessary to identify
individual routes or their individual sections (races, zones) with different passenger flow intensity
for further optimization of services in specific zones. The K-means partitional clustering method is
useful for zoning the transport network, for example, when optimizing routes by race duration, by
demand zones, etc. The K-medoids partitional clustering method is more robust to noise, so it is used to
optimize vehicle operating schedules and to determine the optimal location of stops on routes to best
meet passenger needs. The fuzzy C-means clustering method is used to develop more flexible and adaptive
passenger flow management strategies, which is important in modern urban environments with dynamic
demand for transport services. The density-based DBSCAN method is most often used to analyze passenger
flows with different densities on different routes and for clustering that takes time dynamics into
account. The self-organizing map method is used to create dynamic maps of passenger flows that take
into account time intervals and densities on different routes.</p>
      <p>The cluster analysis of passenger transportation in an average-sized city with a developed public
transport network showed that the collected data on the duration of each vehicle journey within each
day for each week of the studied period have a complex, heterogeneous and large-scale structure.
Therefore, they require appropriate preprocessing before analysis. The cluster analysis of this
large-scale heterogeneous data set on the duration of electric transport journeys within an
average-sized city was carried out with the K-means clustering method. By minimizing its loss function
(the within-cluster sum of squared distances), this method produces a hard partition of the data in
which each object belongs to exactly one cluster. A simple and effective algorithm for choosing a
clustering method is proposed, based on internal metrics for assessing the quality of clustering of
data collected from the public passenger transport infrastructure of an average-sized city. The metrics
used were the elbow method, the silhouette method and the Calinski-Harabasz index, which allow the
optimal number of clusters to be selected quickly and simply. The elbow method establishes the
within-cluster total variation of points relative to the cluster center. The silhouette method measures
how similar the points of one cluster are compared with other clusters. The highest value of the
Calinski-Harabasz index indicates the most clearly defined clusters.</p>
      <p>The obtained results of the cluster analysis of the average daily duration of each trip by vehicles
for each week during the studied period have practical value in optimizing routes, adapting the
transport network itself, forecasting and planning the demand for transport services, implementing
personalized services, as well as integrating different modes of transport to create a single effective
multimodal transport system. It was recommended to use clustering for route optimization, namely
for creating optimal transport routes that have reduced waiting times and fewer transfers, and also
meet the real needs of passengers at the time they specify.</p>
      <p>Therefore, the K-means clustering method when analyzing the average daily duration of each trip
by vehicles for each week during the studied period is appropriate to use for optimizing the
organization of passenger transportation by public transport in a smart city. The prospect of further
research is the use of big data clustering to identify clusters of passengers by type (workers, students,
tourists, etc.), identify hot spots (areas with the highest demand for transport at a certain time),
identify inefficient routes or low load on individual sections of the transport network, analyze the
dependence of passenger flows on external factors (weather, events in the city, social trends), as well
as build dynamic models for predicting changes in flows.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Saxena</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasad</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bharill</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            <given-names>O. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tiwari</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Er</surname>
            <given-names>M. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>C.</given-names>
          </string-name>
          <article-title>A review of clustering techniques and developments</article-title>
          .
          <source>Neurocomputing</source>
          .
          <year>2017</year>
          . No.
          <volume>267</volume>
          . P.
          <fpage>664</fpage>
          -
          <lpage>681</lpage>
          . DOI: 10.1016/j.neucom.2017.06.053
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Isoli</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaczykowski</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Net energy analysis and net carbon benefits of CO2 capture and transport infrastructure for energy applications and industrial clusters</article-title>
          .
          <source>Applied Energy</source>
          .
          <year>2025</year>
          . No.
          <volume>382</volume>
          , 125227. DOI: 10.1016/j.apenergy.2024.125227
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kowalska-Styczeń</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <article-title>Green innovative economy remodeling based on economic complexity</article-title>
          ,
          <source>Journal of Open Innovation: Technology, Market, and Complexity</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ) (
          <year>2023</year>
          )
          100091. doi: 10.1016/j.joitmc.2023.100091
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Podlesna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Grybyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Burov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kravets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lozynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Karpov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Peleshchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peleshchak</surname>
          </string-name>
          ,
          <article-title>Optimization model of the buses number on the route based on queuing theory in a Smart City</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2631</volume>
          (
          <year>2020</year>
          )
          <fpage>502</fpage>
          -
          <lpage>515</lpage>
          . URL: https://ceur-ws.org/Vol-2631/paper37.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Bianchini</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Antonellis</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garda</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>A big data exploration approach to exploit in-vehicle data for smart road maintenance</article-title>
          .
          <source>Future Generation Computer Systems</source>
          .
          <year>2023</year>
          . No.
          <volume>149</volume>
          . P.
          <fpage>701</fpage>
          -
          <lpage>716</lpage>
          . DOI: 10.1016/j.future.2023.08.004
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Katrenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Krislata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Veres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Oborska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Basyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vasyliuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Rishnyak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Demyanovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Meh</surname>
          </string-name>
          ,
          <article-title>Development of traffic flows and smart parking system for smart city</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>2604</volume>
          (
          <year>2020</year>
          )
          <fpage>730</fpage>
          -
          <lpage>745</lpage>
          . URL: https://ceur-ws.org/Vol-2604/paper50.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bublyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naychuk-Khrushch</surname>
          </string-name>
          ,
          <article-title>The role of public transport network optimization in reducing carbon emissions</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>3723</volume>
          (
          <year>2024</year>
          )
          <fpage>340</fpage>
          -
          <lpage>364</lpage>
          . URL: https://ceur-ws.org/Vol-3723/paper19.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Visan</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Negrea</surname>
            <given-names>S. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mone</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>Towards intelligent public transport systems in Smart Cities; Collaborative decisions to be made</article-title>
          .
          <source>Procedia Computer Science</source>
          .
          <year>2021</year>
          . No.
          <volume>199</volume>
          . P.
          <fpage>1221</fpage>
          -
          <lpage>1228</lpage>
          . DOI: 10.1016/j.procs.2022.01.155
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Ezugwu</surname>
            <given-names>A. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ikotun</surname>
            <given-names>A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oyelade</surname>
            <given-names>O. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abualigah</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agushaka</surname>
            <given-names>J. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eke</surname>
            <given-names>C. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akinyelu</surname>
            <given-names>A. A.</given-names>
          </string-name>
          <article-title>A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects</article-title>
          .
          <source>Engineering Applications of Artificial Intelligence</source>
          .
          <year>2022</year>
          . No.
          <volume>110</volume>
          , 104743. DOI: 10.1016/j.engappai.2022.104743
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Chavent</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lechevallier</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Briant</surname>
            <given-names>O.</given-names>
          </string-name>
          <article-title>DIVCLUS-T: A monothetic divisive hierarchical clustering method</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          .
          <year>2007</year>
          . No.
          <volume>52</volume>
          (
          <issue>2</issue>
          ). P.
          <fpage>687</fpage>
          -
          <lpage>701</lpage>
          . DOI: 10.1016/j.csda.2007.03.013
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Celebi</surname>
            <given-names>M. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kingravi</surname>
            <given-names>H. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vela</surname>
            <given-names>P. A.</given-names>
          </string-name>
          <article-title>A comparative study of efficient initialization methods for the k-means clustering algorithm</article-title>
          .
          <source>Expert Systems With Applications</source>
          .
          <year>2012</year>
          . No.
          <volume>40</volume>
          (
          <issue>1</issue>
          ). P.
          <fpage>200</fpage>
          -
          <lpage>210</lpage>
          . DOI: 10.1016/j.eswa.2012.07.021
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakurova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bilyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Didenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tereschenko</surname>
          </string-name>
          ,
          <article-title>Analytics module for the system for recording destruction due to russian aggression</article-title>
          ,
          in
          <source>Monitoring of Geological Processes and Ecological Condition of the Environment 2023</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi: 10.3997/2214-4609.2023520232
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Singh</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>A comprehensive review of clustering techniques in artificial intelligence for knowledge discovery: Taxonomy, challenges, applications and future prospects</article-title>
          .
          <source>Advanced Engineering Informatics</source>
          .
          <year>2024</year>
          . Vol.
          <volume>62</volume>
          , 102799. DOI: 10.1016/j.aei.2024.102799
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Yan</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tseng</surname>
            <given-names>F.</given-names>
          </string-name>
          <article-title>An evaluation system based on the self-organizing system framework of smart cities: A case study of smart transportation systems in China</article-title>
          .
          <source>Technological Forecasting and Social Change</source>
          .
          <year>2020</year>
          . Vol.
          <volume>153</volume>
          , 119371. DOI: 10.1016/j.techfore.2018.07.009
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gvozd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ohinok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Ivaniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Protsak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chernobay</surname>
          </string-name>
          ,
          <article-title>Independent factors simulation of the influence on the level of sustainable development in intellectual systems of management</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          Vol-
          <volume>3426</volume>
          (
          <year>2023</year>
          )
          <fpage>246</fpage>
          -
          <lpage>258</lpage>
          . URL: https://ceur-ws.org/Vol2870/paper118.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>