<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">DBSCAN parallelization by spatial dataset division and hyperparameter adjustment</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vadim</forename><surname>Romanuke</surname></persName>
							<email>romanukevadimv@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Vinnytsia Institute of Trade and Economics</orgName>
								<orgName type="institution">State University of Trade and Economics</orgName>
								<address>
									<addrLine>Soborna str., 87</addrLine>
									<postCode>21050</postCode>
									<settlement>Vinnytsia</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sergii</forename><surname>Pavlov</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Vinnytsia National Technical University</orgName>
								<address>
									<addrLine>Khmelnytske sh., 95</addrLine>
									<postCode>21021</postCode>
									<settlement>Vinnytsia</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Svitlana</forename><surname>Yaremko</surname></persName>
							<email>svitlana_yaremko@ukr.net</email>
							<affiliation key="aff0">
								<orgName type="department">Vinnytsia Institute of Trade and Economics</orgName>
								<orgName type="institution">State University of Trade and Economics</orgName>
								<address>
									<addrLine>Soborna str., 87</addrLine>
									<postCode>21050</postCode>
									<settlement>Vinnytsia</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Artem</forename><surname>Guralnyk</surname></persName>
							<email>artem.guralnyk@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Vinnytsia Institute of Trade and Economics</orgName>
								<orgName type="institution">State University of Trade and Economics</orgName>
								<address>
									<addrLine>Soborna str., 87</addrLine>
									<postCode>21050</postCode>
									<settlement>Vinnytsia</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Olena</forename><surname>Smetaniuk</surname></persName>
							<email>smetaniuk@vntu.edu.ua</email>
							<affiliation key="aff1">
								<orgName type="institution">Vinnytsia National Technical University</orgName>
								<address>
									<addrLine>Khmelnytske sh., 95</addrLine>
									<postCode>21021</postCode>
									<settlement>Vinnytsia</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">DBSCAN parallelization by spatial dataset division and hyperparameter adjustment</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">30E266F5546F4D97319705C0CD188B07</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T20:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
<term>arbitrary-shape clustering, DBSCAN, parallelization, hyperparameter adjustment, neighborhood radius, spatial dataset division</term>
					<term>0000-0001-9638-9572 (V. Romanuke)</term>
					<term>0000-0002-0051-5560 (S. Pavlov)</term>
					<term>0000-0002-0605-9324 (S. Yaremko)</term>
					<term>0000-0003-1977-6338 (A. Guralnyk)</term>
					<term>0000-0001-5207-6451 (O. Smetaniuk)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>An approach to parallelizing the DBSCAN algorithm without loss of accuracy is presented to further improve the quality of arbitrary-shape clustering. The dataset is divided into a number of subsets, one of which should be labeled to adjust the DBSCAN hyperparameters: the neighborhood radius and the minimum number of neighbors. The neighborhood radius, set to a minimum, is adjusted first by applying the original DBSCAN to a subset of points with known ground truth labels. As long as the clustering accuracy does not improve, the neighborhood radius is increased by an increment step. The original DBSCAN algorithm is then applied with the best neighborhood radius to the remaining subsets. The minimum number of neighbors, initially set to 3, is adjusted second, using the best neighborhood radius. Compared to the original DBSCAN algorithm, the parallelized DBSCAN performs more accurately while being extremely fast. As the dataset size and the number of clusters increase, separately or together, the advantage of the parallelized DBSCAN over the original DBSCAN grows. However, the DBSCAN parallelization by spatial dataset division and hyperparameter adjustment is less efficient on datasets with a more complex structure. Nevertheless, the average speedup exceeds 80 times for planar datasets and 300 times for three-dimensional datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>DBSCAN is one of the most commonly used clustering algorithms, whose purpose is to reveal clusters of arbitrary shape with minimal domain knowledge <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. DBSCAN groups points based on the notion of the ε-neighborhood of a point and a specified minimum number of neighboring points minPts, which are further supplemented with the notions of core points, directly reachable points, non-directly reachable points, and outliers <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. Besides, to rectify the non-symmetric relation of reachability <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, a symmetric notion of density-connectedness is added <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. The density is defined by the neighborhood radius ε and the number minPts, where a cluster contains at least minPts points. The ε-neighborhood is defined as the set of points falling within radius ε by a distance function <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>. 
Apart from the DBSCAN hyperparameters ε and minPts, the distance function is seen as an additional parameter of the DBSCAN algorithm <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>.</p><p>For n points in a dataset, the DBSCAN average runtime complexity is O(n log n), although the worst case of O(n²) is not excluded <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b6">7]</ref>, especially when there is probable incompleteness in the data <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b12">13]</ref> or the neighborhood radius can vary over a wide range <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b13">14]</ref>. While the overall performance of DBSCAN is quite satisfactory, and it successfully reveals outliers and identifies nonglobular shapes (contrary to other commonly used clustering algorithms like, e. g., k-means <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>), it fails to handle datasets whose clusters are of widely varying densities <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref>. The second drawback is the neighborhood radius selection <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b13">14]</ref>. While datasets of planar points or points in three-dimensional space can be visualized, so that the neighborhood radius is well understood and assessed, in higher dimensions it is problematic to "see" a meaningful distance <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>. Moreover, feature quality may differ considerably and may change across the domain being clustered <ref type="bibr" target="#b7">[8]</ref>. 
This hinders obtaining accurately partitioned datasets by coarsely applying DBSCAN <ref type="bibr" target="#b16">[17]</ref>.</p><p>There are domains whose factual clusters have multiple meaningful neighborhood radii <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b19">20]</ref>.</p><p>Both weaknesses of DBSCAN are addressed by OPTICS <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref>, another algorithm whose purpose is to detect meaningful clusters in data of varying density <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24]</ref>. OPTICS also requires the two hyperparameters ε and minPts, although the neighborhood radius is not strictly necessary. The radius can be set to the maximum possible value, but this results in runtime close to O(n²) <ref type="bibr" target="#b24">[25]</ref>. Besides, OPTICS is mainly intended for hierarchical clustering, and it is reported to have an actual constant slowdown factor of 1.6 compared to DBSCAN <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b25">26]</ref>.</p><p>Another DBSCAN drawback, reported less frequently, is its poorer performance on large datasets <ref type="bibr" target="#b26">[27]</ref>. This especially concerns high-dimensional data, where it is difficult to find an appropriate value for ε because adjusting this hyperparameter takes significant amounts of time and computational resources. Overall, tunability of DBSCAN hyperparameters (as in other clustering algorithms) worsens as the dataset grows <ref type="bibr" target="#b27">[28,</ref><ref type="bibr" target="#b28">29]</ref>.</p><p>Nevertheless, when a large dataset is assumed to have tightly connected or not sparsely scattered clusters, it is possible to divide the dataset into multiple subdatasets in order to apply the DBSCAN algorithm to every subdataset separately. 
This resembles the divide-and-conquer approach successfully applied to solving large combinatorial optimization problems like the traveling salesman problem with thousands of nodes and more <ref type="bibr" target="#b29">[30]</ref>. In particular, the traveling salesman problem nodes are grouped into a definite number of clusters, each of which corresponds to an open-tour traveling salesman subproblem subsequently solved independently of the other subproblems <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b31">32]</ref>.</p></div>
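The divide-and-conquer idea of clustering subdatasets separately can be sketched in Python as follows: the dataset is cut into contiguous slabs along one coordinate and each slab is clustered concurrently. This is a minimal illustrative sketch, not the paper's implementation; `cluster_stub` is a hypothetical stand-in for a per-subset DBSCAN run, and threads are used only to keep the sketch self-contained (a real speedup would use processes or a parallel runtime).

```python
from concurrent.futures import ThreadPoolExecutor

def split_by_x(points, k):
    # Spatial division: sort by the first coordinate and cut into k
    # contiguous slabs, mirroring the interval-based division in the text.
    pts = sorted(points)
    size = -(-len(pts) // k)  # ceiling division
    return [pts[i:i + size] for i in range(0, len(pts), size)]

def cluster_stub(subset):
    # Hypothetical stand-in: a real run would apply DBSCAN with the
    # adjusted hyperparameters to this subset.
    return [0] * len(subset)

def parallel_cluster(points, k):
    subsets = split_by_x(points, k)
    with ThreadPoolExecutor(max_workers=k) as ex:
        results = list(ex.map(cluster_stub, subsets))
    # Offset labels so clusters found in different subsets stay distinct.
    labels, offset = [], 0
    for res in results:
        labels.extend(lab + offset for lab in res)
        offset += max(res, default=-1) + 1
    return labels
```

Merging per-subset labels with offsets keeps cluster ids globally unique; stitching clusters that straddle a subset boundary would require an extra pass, which the sketch omits.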
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Problem statement</head><p>In cluster analysis, spatial dataset division is effective when possible clusters are not sparsely scattered. Otherwise, the problem of clustering tends to be almost trivial: the distinctly distanced groups of objects can be more efficiently separated by the support vector machine (linearly inseparable data are projected onto a higher-dimensional space to become linearly separable <ref type="bibr" target="#b32">[33,</ref><ref type="bibr" target="#b33">34]</ref>) or other similar approaches, including k-means with evaluating the optimal number of clusters using the silhouette clustering evaluation criterion <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b34">35,</ref><ref type="bibr" target="#b35">36]</ref>. On the contrary, potential clusters closely packed together are totally indistinguishable without considering spatial densities if, for instance, the clusters have an at least partially hierarchical structure <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b36">37,</ref><ref type="bibr" target="#b37">38]</ref>. A marginal example of such a structure is a dataset consisting of nonoverlapping enclosed hyperspheres or hyperellipsoids (for planar objects, nonoverlapping enclosed circles or ellipses), or objects of similar form being irregularly distorted or nonconvex <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b38">39]</ref>. Another major example frequently occurring in practice is a dataset consisting of cluster bands, where serpentine-like clusters have a resembling structure <ref type="bibr" target="#b39">[40]</ref>.</p><p>Naturally, the DBSCAN algorithm could easily have been sped up if it were generally parallelizable <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b6">7]</ref>. 
Unfortunately, the algorithm consists of two loops, and the nested loop cannot be made independent of the outer loop <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b3">4]</ref>. The outer loop, working with each point in a given dataset, is based on the range query, which can be implemented using database indexing for better performance rather than slow linear scanning. Although some efforts have been made in this direction to parallelize the outer loop, the gain does not seem to have a significant impact <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b18">19]</ref>.</p><p>A follow-up DBSCAN algorithm named HDBSCAN <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b16">17]</ref> was an attempt to rectify the DBSCAN flat-result drawbacks rather than its poor parallelizability. Nor did GDBSCAN advance parallelizability <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b6">7]</ref>; it only moved the original DBSCAN algorithm parameters into predicates. Given the unsatisfactory runtime complexity of the DBSCAN algorithm for large datasets, the goal is to suggest an approach to DBSCAN parallelization without losing much accuracy. The approach should be algorithmized by automatically adjusting the neighborhood radius to the most efficient value. A series of computational experiments is to be carried out in order to assess the performance of the parallelized DBSCAN algorithm with the hyperparameter adjustment compared to that of the original DBSCAN algorithm. The comparative results are then to be discussed, underlining limitations and drawbacks of the suggested DBSCAN parallelization.</p></div>
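The indexed range query mentioned above can be illustrated by a simple grid index (an illustrative sketch, not the indexing scheme of any cited work): planar points are binned into square cells of side ε, so a query inspects only the 3×3 block of cells around the query point instead of scanning all points linearly.

```python
from collections import defaultdict

def build_grid(points, eps):
    # Bin planar points into square cells of side eps.
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // eps), int(y // eps))].append(idx)
    return grid

def range_query(points, grid, eps, q):
    # Return indices of all points within distance eps of point q,
    # scanning only the 3x3 block of cells around q's cell.
    qx, qy = points[q]
    cx, cy = int(qx // eps), int(qy // eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for idx in grid[(cx + dx, cy + dy)]:
                x, y = points[idx]
                if (x - qx) ** 2 + (y - qy) ** 2 <= eps ** 2:
                    found.append(idx)
    return found
```

With roughly uniform density, each query touches a constant number of candidate points, which is what makes an indexed DBSCAN run in near-linearithmic time on average.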
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">DBSCAN parallelization</head><p>To adjust the DBSCAN hyperparameters automatically, consider repeatedly trying to improve the accuracy until improvement ceases. As there are more possible values of the neighborhood radius than of the minimum number of neighboring points, the neighborhood radius is adjusted first. For this, a maximum number of accuracy improvement fails is fixed for the radius adjustment; while the minimum number of neighboring points is adjusted, its own maximum number of accuracy improvement fails is used. These hyperparameters are adjusted as follows:</p><p>1. Divide the dataset into subsets.</p><p>2. Set the neighborhood radius to a minimum and apply the DBSCAN algorithm to a subset of points with known ground truth labels.</p><p>3. While the current number of accuracy improvement fails is fewer than the maximum, increase the radius by the increment step. Once the number of outliers and redundant labels becomes fewer, set the current number of accuracy improvement fails to 0 and store the current radius as the best one; otherwise, increase the current number of accuracy improvement fails by 1.</p><p>4. Fix the best neighborhood radius. 5. Apply the DBSCAN algorithm to the remaining subsets with the best neighborhood radius. 6. Set the minimum number of neighboring points to 3 and apply the DBSCAN algorithm to the labeled subset with the best neighborhood radius.</p><p>7. While the current number of accuracy improvement fails is fewer than the maximum, increase the minimum number of neighboring points by 1. Once the number of outliers and redundant labels becomes fewer, set the current number of accuracy improvement fails to 0 and store the current value as the best one; otherwise, increase the current number of accuracy improvement fails by 1.</p><p>Obviously, running the DBSCAN algorithm on the remaining subsets can be executed in parallel. 
Moreover, as the hyperparameters are adjusted on a labeled subset, the size of the labeled subset can be set to a few different values, and the adjustment can be executed in parallel on these differently sized subsets <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b36">37,</ref><ref type="bibr" target="#b37">38]</ref>. Differently sized subsets are obtained by changing the number of subsets.</p><p>First of all, the DBSCAN parallelization is expected to significantly reduce the computation time. Secondly, the clustering accuracy is believed to remain comparable to that of the DBSCAN algorithm itself, even if the adjusted hyperparameters are used in it. The correctness of these hypotheses is to be confirmed or falsified by computational experiments. A specificity of such experiments is that the dataset can be visualized, with a better or worse extent of cluster appearance and distinguishability.</p></div>
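The radius adjustment of steps 2-4 can be sketched as follows. This is a simplified illustration under stated assumptions: a brute-force quadratic-time DBSCAN stands in for an indexed implementation, and the "outliers plus redundant labels" score is modeled as the count of noise points plus points placed into surplus clusters beyond the known cluster number.

```python
def dbscan(points, eps, min_pts):
    # Brute-force DBSCAN: -1 marks outliers, clusters are numbered from 0.
    n = len(points)
    labels = [None] * n
    def neighbors(i):
        return [j for j in range(n)
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                <= eps ** 2]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nb)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                seeds.extend(nb_j)   # j is a core point: keep expanding
    return labels

def adjust_eps(subset, true_k, eps0, d_eps, max_fails, min_pts=3):
    # Steps 2-4: grow eps from a minimum until the score (outliers plus
    # points put into surplus clusters beyond true_k) stops improving
    # max_fails times in a row; return the best radius found.
    best_eps, best_score, fails, eps = eps0, float("inf"), 0, eps0
    while fails < max_fails:
        labels = dbscan(subset, eps, min_pts)
        score = labels.count(-1) + sum(1 for lab in labels if lab >= true_k)
        if score < best_score:
            best_score, best_eps, fails = score, eps, 0
        else:
            fails += 1
        eps += d_eps
    return best_eps
```

Step 7 adjusts the minimum number of neighboring points the same way, holding the best radius fixed and incrementing the point count by 1 instead of the radius by the step.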
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Computational experiments</head><p>The performance of the parallelized DBSCAN algorithm with the hyperparameter adjustment is estimated by running the above-stated algorithm on planar datasets whose potential clusters are closely packed together. There are two types of such datasets: developing and enveloping. Serpentine-like clusters form a developing dataset (Figure <ref type="figure" target="#fig_0">1</ref>). Non-overlapping enclosed circles or ellipses modulated by sinusoidal functions form an enveloping dataset (Figure <ref type="figure" target="#fig_2">2</ref>). For brevity, these clusters (datasets) will be called serpentine and circular clusters (datasets), respectively. </p><p>That is, the dataset size equals the number of clusters multiplied by the number of points in each cluster.</p><p>Besides, values of two additional normally distributed random variables with zero mean and unit variance are used for this dataset. Thus, the horizontal component is given by (<ref type="formula">5</ref>), where two parameters are independent, each chosen randomly between 3 and 4 with a step of 0.001; two further parameters are independent, each chosen randomly between 0.01 and 0.1; two coefficients are chosen randomly between 0.0005 and 0.005, whereupon each may be set negative; the noise magnitude is varied between 0.1 and 0.5 with a step of 0.1. The serpentine dataset by (<ref type="formula">2</ref>), (<ref type="formula">3</ref>), (<ref type="formula" target="#formula_0">1</ref>), (<ref type="formula" target="#formula_1">4</ref>), (<ref type="formula">5</ref>) is generated with three to 18 clusters, for five versions of noise magnitude, where each cluster contains 1000 to 10000 points with a step of 1000 points.</p><p>Clusters contain the same number of points. 
The dataset is divided into subsets so that each cluster contains the same number of points within every subset <ref type="bibr" target="#b5">(6)</ref>, and the subset to be labeled is selected depending on whether the number of subsets is even or odd. Therefore, for one generation of the serpentine dataset, there are 10 versions of the dataset size, 16 versions of the dataset cluster number, five versions of noise magnitude, and five versions of the dataset division. Applied to a dataset, the parallelized DBSCAN algorithm returns the best neighborhood radius, the best minimum number of neighboring points, the number of outliers, the number of points incorrectly labeled as non-existing clusters (their labels exceed the true number of clusters, as if there were one or a few other clusters), and the number of points incorrectly labeled as existing clusters (their labels lie within the true range but are factually incorrect or confused). The total number of missed (incorrectly labeled) points is given by <ref type="bibr" target="#b6">(7)</ref>, whose percentage of the dataset size is</p><formula xml:id="formula_2">. (<label>8</label></formula><formula xml:id="formula_3">)</formula><p>The serpentine dataset generation is repeated 100 times, whereupon the results are averaged over the 100 repetitions. The averages are presented as an array whose entry is a set of the two adjusted hyperparameters, the three numbers of incorrectly labeled points, the total number of missed points, and its percentage.</p><p>Another array presents averages of the computation time. To compare results <ref type="bibr" target="#b8">(9)</ref> with those produced by the DBSCAN algorithm itself, the parallelized DBSCAN algorithm with the hyperparameter adjustment is also used, but configured so that it degenerates into the original DBSCAN. The original DBSCAN algorithm results are stored as an array of the same structure as <ref type="bibr" target="#b8">(9)</ref>. Another array presents averages of the DBSCAN computation time. 
Comparative computation time statistics for the planar serpentine dataset are presented in Table <ref type="table" target="#tab_0">1</ref>, where every entry is an averaged ratio of the original DBSCAN algorithm computation time to the parallelized DBSCAN algorithm computation time. The ratio is a computation time function of the dataset size and the number of dataset clusters: one matrix is preliminarily averaged over noise, another over the dataset size, and a third over the number of dataset clusters. The advantage of the parallelized DBSCAN algorithm is quite obvious: it is much faster, while, prudently speaking, the speedup minimum is above 10 times and the speedup maximum exceeds 100 times. On overall average, the parallelized DBSCAN is 97.21 times faster. Comparative performance statistics for the planar serpentine dataset are presented in Table <ref type="table" target="#tab_1">2</ref>, where rows with</p><formula xml:id="formula_5">, ,<label>(10)</label></formula><p>contain percentages, "Five intervals" stands for the parallelized DBSCAN with five versions of the dataset division, and "Best interval" stands for the parallelized DBSCAN with the division at which percentage (8) of missed (incorrectly labeled) points is minimal. Herein and below, the DBSCAN accuracy is decomposed into rates for <ref type="bibr" target="#b9">(10)</ref> and <ref type="bibr" target="#b7">(8)</ref>. The parallelized DBSCAN misses (loses) points at a rate of about 7 %, whereas the rate is above 26 % for the original DBSCAN.</p><p>Points misidentified as outliers are lost at almost equal rates, although the parallelized DBSCAN has a slight advantage, with a rate below 1.95 % (while the original DBSCAN loses above 2.1 % of points misidentified as outliers). The original DBSCAN incorrectly assigns above 3.1 % of points to non-existing clusters, but this rate is about 2.5 times lower for the parallelized DBSCAN. 
The original DBSCAN confuses clusters even more, exceeding 21 % of points incorrectly assigned to existing clusters, while this rate for the parallelized DBSCAN is slightly above 3.8 %. When the best interval is selected, the mentioned rates for <ref type="bibr" target="#b9">(10)</ref> and (<ref type="formula" target="#formula_2">8</ref>) drop below 0.21 %, 0.26 %, 0.73 %, 1.19 %, respectively. However, the worst cases of the rates for <ref type="bibr" target="#b9">(10)</ref> and (<ref type="formula" target="#formula_2">8</ref>) are not excluded even by the parallelized DBSCAN with the best interval (Figure <ref type="figure" target="#fig_3">3</ref>). For the circular dataset, an angular parameter is chosen randomly, and the noise magnitude is varied between 0.2 and 1 with a step of 0.2. The planar circular dataset by (<ref type="formula" target="#formula_6">11</ref>), (<ref type="formula">12</ref>), (<ref type="formula" target="#formula_0">1</ref>), (<ref type="formula">13</ref>) is generated with three to eight clusters, for five versions of noise magnitude, where each cluster contains 1000 to 10000 points with a step of 1000 points.</p><p>The remaining parameters of the circular dataset and its processing, including (<ref type="formula">6</ref>) - (<ref type="formula" target="#formula_4">9</ref>), are the same as for the serpentine dataset, except for the array size, whose second dimension now has 6 entries, which influences the subsequent arrays upon averaging, minimizing, and maximizing.</p><p>Presented in Table <ref type="table" target="#tab_2">3</ref>, the comparative computation time statistics for the planar circular dataset resemble those in Table <ref type="table" target="#tab_0">1</ref> for the planar serpentine dataset. Thus, the parallelized DBSCAN speedup minimum is, prudently speaking, above 15 times and the speedup maximum exceeds 125 times, with roughly the same standard deviations. 
On overall average, the parallelized DBSCAN is 99.67 times faster (partly because the maximum number of clusters here is smaller than for the planar serpentine dataset). Presented in Table <ref type="table" target="#tab_3">4</ref>, the comparative performance statistics for the planar circular dataset confirm the advantage of the parallelized DBSCAN algorithm. The parallelized DBSCAN loses points at a rate of about 7.12 % (slightly higher than for the serpentine dataset), whereas the rate is 11.85 % for the original DBSCAN, 2.25 times lower than for the serpentine dataset. Points misidentified as outliers are lost at rates of 1 % and 1.42 % by the original and parallelized DBSCAN algorithms, respectively. So, the parallelized DBSCAN may lose more circularly located points than the original DBSCAN. The original DBSCAN incorrectly assigns above 5.59 % of points to non-existing clusters, but this rate is about 4.26 times lower for the parallelized DBSCAN. The cluster confusion occurs at similar rates: 5.25 % and 4.39 % by the original and parallelized DBSCAN algorithms, respectively. When the best interval is selected, the mentioned rates for (<ref type="formula" target="#formula_5">10</ref>) and (<ref type="formula" target="#formula_2">8</ref>) drop below 0.7 %, 0.3 %, 1.97 %, 2.96 %, respectively. These rates are clearly higher than those for the planar serpentine dataset. Nevertheless, the planar circular dataset worst case (Figure <ref type="figure" target="#fig_4">4</ref>) is not a complete failure like the planar serpentine dataset worst case is. It is worth noting that the dataset worst case in Figure <ref type="figure" target="#fig_4">4</ref> has far more variety in its structure than the worst case in Figure <ref type="figure" target="#fig_3">3</ref>. Despite this, the best-interval parallelized DBSCAN performs considerably better than the original DBSCAN. 
The latter misses 73.7 % of points versus 44.9 % of points missed by the parallelized DBSCAN (Figure <ref type="figure" target="#fig_5">5</ref>). The percentage of outliers, however, is 13.1 % by the parallelized DBSCAN, whereas the original DBSCAN leaves 7.225 % of points as outliers here. Meanwhile, the original DBSCAN assigns 59.925 % of points to non-existing clusters, whereas this rate is just 8.3 % for the parallelized DBSCAN (the black-square marked points are clearly seen in Figure <ref type="figure" target="#fig_5">5</ref>). Due to this, the cluster confusion is lower for the original DBSCAN: 6.55 % versus 23.5 % for the parallelized DBSCAN. In this particular example, the parallelized DBSCAN performance significantly worsens as the dataset is divided into fewer subsets. Thus, the rates for (10) are 6.7625 %, 13.1625 %, 28.15 % at one division, but 5.9875 %, 26.9125 %, 30.7625 % at a coarser one (Figure <ref type="figure" target="#fig_7">6</ref>). Furthermore, the rates are 52.9625 %, 1.1875 %, 3.2625 %, 48.5125 % and 69.25 %, 0.7875 %, 1.0125 %, 67.45 % at two still coarser divisions, respectively (Figure <ref type="figure" target="#fig_8">7</ref>), i. e., the rate of cluster confusion grows while the rate of outliers drops. The rate of points assigned to non-existing clusters becomes lower as the subsets are made larger (or, in other words, as the number of subsets is decreased).</p><p>For creating three-dimensional serpentine-like clusters, a circular dataset point by (<ref type="formula">13</ref>) is used together with</p><formula xml:id="formula_8">, ,<label>(17)</label></formula><p>and values of three additional normally distributed random variables with zero mean and unit variance, wherein one parameter is chosen randomly between 0.1 and 3; another is chosen randomly between 2 and 18 with a step of 1; an angular parameter is chosen randomly; the noise magnitude is varied between 0.1 and 0.5 with a step of 0.1. 
Three-dimensional serpentine-like clusters form a developing dataset whose visualization is far more complicated than that of the planar circular dataset (Figure <ref type="figure" target="#fig_9">8</ref>). The three-dimensional serpentine dataset by (<ref type="formula" target="#formula_7">14</ref>) - (<ref type="formula">16</ref>), (1), (<ref type="formula">13</ref>), (<ref type="formula" target="#formula_8">17</ref>), (<ref type="formula" target="#formula_9">18</ref>) is generated with three to eight clusters, for five versions of noise magnitude, where each cluster contains 1000 to 10000 points with a step of 1000 points.</p><p>The remaining parameters of the three-dimensional serpentine dataset and its processing, including (<ref type="formula">6</ref>) - (<ref type="formula" target="#formula_4">9</ref>), are the same as for the planar circular dataset. Presented in Table <ref type="table" target="#tab_4">5</ref>, the comparative computation time statistics for the three-dimensional serpentine dataset do not resemble those in Tables <ref type="table" target="#tab_1">1 and 2</ref> for the planar datasets. Here, the parallelized DBSCAN speedup minimum is, prudently speaking, above 40 times and the speedup maximum exceeds 600 times, although the standard deviations are higher than those for the planar datasets. On overall average, the parallelized DBSCAN over three-dimensional serpentine datasets is 342 times faster. Eventually, presented in Table <ref type="table" target="#tab_5">6</ref>, the comparative performance statistics for the three-dimensional serpentine dataset again confirm the advantage of the parallelized DBSCAN algorithm, but only if the best subset size (best interval) is selected. The parallelized DBSCAN loses points at a higher rate (roughly 12.5 %) than the original DBSCAN does (roughly 8.6 %), but the best-interval parallelized DBSCAN misses points at a rate below 4.8 %. 
The original DBSCAN rarely "sees" outliers, with their average rate below 0.1 % and maximum below 12.5 %, but it "sees" non-existing clusters at a rate of 2.97 % and confuses clusters at a rate of 5.57 %. The respective rates of non-existing clusters and cluster confusion for the parallelized DBSCAN are 5.37 % and 4.39 %, whereas they drop to 2.086 % and 2.065 % with the best-interval parallelized DBSCAN.</p><p>The worst three-dimensional case is in Figure <ref type="figure" target="#fig_9">8</ref>, where 76.725 % of points are missed by the best-interval parallelized DBSCAN with the dataset division into 100 subsets. Meanwhile, the original DBSCAN misses points at a rate of 24.25 % on this dataset. It is worth noting that in this case the noise magnitude is minimal. It appears paradoxical, but at greater noise magnitudes the best-interval parallelized DBSCAN performs even worse on some instances of the three-dimensional dataset with the maximum number of clusters. The particular dataset in Figure <ref type="figure" target="#fig_9">8</ref> turns out to be the worst case because the original DBSCAN performs far better on it (the result could even be acceptable in some practical situations), and thus the difference between the original and best-interval parallelized DBSCAN algorithms is paradoxically huge. </p></div>
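The per-point error decomposition used throughout the comparison, per (7) and (8), can be sketched as below. The label conventions are assumptions for illustration: -1 marks an outlier, ground-truth clusters are numbered 0..k-1, and predicted labels are taken as already matched to the true cluster ids (the paper does not detail its matching procedure).

```python
def miss_stats(true_labels, pred_labels, k):
    # Decompose missed points per (7): outliers, points assigned to
    # surplus (non-existing) clusters, and points confused among the k
    # existing clusters; return the total and its percentage per (8).
    n_out = n_non = n_conf = 0
    for t, p in zip(true_labels, pred_labels):
        if p == -1:
            n_out += 1       # misidentified as an outlier
        elif p >= k:
            n_non += 1       # assigned to a non-existing cluster
        elif p != t:
            n_conf += 1      # confused with another existing cluster
    n_miss = n_out + n_non + n_conf
    return n_miss, 100.0 * n_miss / len(true_labels)
```

For example, on true labels [0, 0, 1, 1] and predicted labels [0, -1, 2, 1] with k = 2, this returns 2 missed points, i. e. 50 %.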
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and conclusion</head><p>Compared to the original DBSCAN algorithm, the parallelized DBSCAN performs more accurately while being extremely fast. The latter is straightforwardly inferred from Tables 1, 3, 5, where the average speedup exceeds 80 times for planar datasets and 300 times for three-dimensional datasets. The gain in accuracy is not as unambiguous. While the parallelized DBSCAN outperforms the original DBSCAN on planar datasets (Tables <ref type="table" target="#tab_3">2 and 4</ref>), even without selecting the best size of the subset (called the best interval due to planarity), it underperforms on three-dimensional datasets without selecting the best-interval result. Only the best-interval parallelized DBSCAN outperforms the original DBSCAN on three-dimensional datasets. This is likely explained by the higher structural complexity of such datasets (see Figure <ref type="figure" target="#fig_9">8</ref>) compared to planar datasets (see Figures <ref type="figure" target="#fig_4">1 -4</ref>). In other words, the DBSCAN parallelization by spatial dataset division and hyperparameter adjustment is less efficient on datasets with a more complex structure. However, the series of computational experiments has also exposed a possibility of parallelized DBSCAN underperformance on simple-structured datasets (see Figure <ref type="figure" target="#fig_3">3</ref> with the worst case of the planar serpentine dataset, although its structure is far simpler than the structure of the planar circular dataset in Figure <ref type="figure" target="#fig_4">4</ref>). The growing number of clusters may additionally affect the parallelized DBSCAN performance. Nevertheless, as the dataset size and the number of clusters increase, separately or together, the parallelized DBSCAN outperforms the original DBSCAN by a wider margin. 
The gain in accuracy ranges from 4.9072 to 276.8096 times for the planar serpentine dataset, when the noise-averaged ratio of the original DBSCAN accuracy to the parallelized DBSCAN accuracy is considered as a function of the dataset size and the number of clusters. The accuracy gain drops drastically for the planar circular dataset, ranging from 1.4636 to 5.5163 times. For the three-dimensional serpentine dataset, the accuracy gain ranges from 0.0004 to 10.9631, i. e. it is below 1 for three and four clusters (just 0.1007 and 0.3179, respectively). For five clusters and more, along with the dataset size increasing, the parallelized DBSCAN outperforms the original DBSCAN. The poorer performance on the fewest numbers of clusters is an obvious limitation of the parallelized DBSCAN.</p><p>Another limitation is the poorer performance on datasets with circularity. Division into intervals is unnatural for circular datasets, whose natural division would be by circular sectors, especially for three-dimensional datasets. Moreover, it is hard to divide such a dataset so that each cluster contains (approximately) the same number of points within every subset. In addition, the DBSCAN parallelization efficiency is impossible to estimate without known ground truth labels.</p><p>The DBSCAN parallelization efficiency expectedly deteriorates when the density is changeable. The dataset in Figure <ref type="figure" target="#fig_3">3</ref> is seemingly such a case, where each serpentine has significantly thinner and thicker intervals observable via zoom-in. This is the most common drawback of density-based clustering methods, although they all, and the DBSCAN algorithm in particular, are intended to handle varying densities of points. However, the datasets used for the computational experiments have density-modulated regions (i. e., regions of varying point density), which have served as a close-to-the-worst-case scenario in order to reveal weaknesses of the suggested DBSCAN parallelization.</p><p>Overall, this is a promising approach to speeding up arbitrary-shape clustering without losing accuracy. Posed as local optimization versus global optimization, the parallelized DBSCAN performs well on serpentine datasets, but is not limited only to them. The hyperparameters, the neighborhood radius and the minimum number of neighbors, are gradually adjusted with some steps, which are tunable as well (primarily, the neighborhood radius increment step). Apart from the subsets of the divided dataset, the suggested modification of the DBSCAN algorithm can be run in parallel for multiple labeled subset sizes, as well as for a few versions of the neighborhood radius increment step.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example of a dataset composed of 10 serpentine-like clusters (each cluster contains 1000 points).</figDesc></figure>
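The division-and-adjustment approach discussed above can be illustrated with a minimal sketch. The DBSCAN core below is a naive O(n²) implementation; the per-subset hyperparameter adjustment is reduced to fixed `eps` and `min_samples` values, and subset processing is written as a serial loop, although each iteration is independent and could be dispatched to a worker process. All names and the equal-interval division rule are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Naive O(n^2) DBSCAN: -1 marks noise, clusters are numbered from 0."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue  # already assigned, or not a core point
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:  # expand the cluster from the core point i
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_samples:
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

def parallelized_dbscan(X, n_subsets, eps, min_samples):
    """Divide the dataset into equal intervals along the first coordinate,
    cluster every subset independently, and keep cluster ids globally unique."""
    edges = np.linspace(X[:, 0].min(), X[:, 0].max(), n_subsets + 1)
    subset_of = np.digitize(X[:, 0], edges[1:-1])  # interval index per point
    labels = np.full(len(X), -1)
    offset = 0
    for s in range(n_subsets):  # each iteration is independent -> parallelizable
        mask = subset_of == s
        if not mask.any():
            continue
        sub = dbscan(X[mask], eps, min_samples)
        sub[sub >= 0] += offset  # shift ids so subsets do not collide
        if (sub >= 0).any():
            offset = sub.max() + 1
        labels[mask] = sub
    return labels
```

A merge step for clusters spanning adjacent intervals, and the gradual increase of the neighborhood radius per subset, are omitted here for brevity.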
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: An example of a dataset composed of six enclosed circular clusters (each cluster contains 1000 points).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The worst case of the planar serpentine dataset of 11000 points and 11 clusters, where the best result is produced by the best-interval parallelized DBSCAN.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: The worst case of the planar circular dataset of 8000 points and 8 clusters, where the best result is produced by the best-interval parallelized DBSCAN.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: The original DBSCAN performance (above) versus the best-interval parallelized DBSCAN performance (below) in the worst case of the planar circular dataset in Figure 4 (black squares mark points assigned to non-existing clusters, and red squares mark outliers).</figDesc><graphic coords="14,128.95,352.00,328.40,316.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>normally distributed random variables with zero mean and unit variance and values , , of uniformly distributed random variables on interval for point belonging to cluster by (1), where ,</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: The parallelized DBSCAN performance in the planar circular dataset worst case (Figure 4) with the dataset division into 50 (above) and 25 (below) intervals (subsets).</figDesc><graphic coords="16,129.60,353.00,327.20,316.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: The parallelized DBSCAN performance in the planar circular dataset worst case (Figure 4) with the dataset division into 20 (above) and 10 (below) intervals (subsets).</figDesc><graphic coords="17,129.60,354.90,327.20,318.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: An example of the three-dimensional dataset composed of 8 serpentine-like clusters (each cluster contains 1000 points); the coordinate-plane views shown below exemplify that such datasets bear distinct features of circular datasets.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparative computation time statistics for the planar serpentine dataset</figDesc><table><row><cell>Statistics</cell><cell>Minimum</cell><cell>Average</cell><cell>Maximum</cell><cell>Standard deviation</cell></row><row><cell></cell><cell>13.78549</cell><cell>82.3331</cell><cell>157.3137</cell><cell>35.78136</cell></row><row><cell></cell><cell>52.49892</cell><cell>91.56534</cell><cell>117.2529</cell><cell>16.80617</cell></row><row><cell></cell><cell>22.08231</cell><cell>87.4891</cell><cell>139.2775</cell><cell>34.42707</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Comparative performance statistics for the planar serpentine dataset</figDesc><table><row><cell>Statistics</cell><cell>Minimum</cell><cell>Average</cell><cell>Maximum</cell><cell>Standard deviation</cell></row><row><cell>DBSCAN</cell><cell>0.04</cell><cell>0.93991</cell><cell>2.87</cell><cell>0.3686</cell></row><row><cell>Five intervals</cell><cell>0.06</cell><cell>1.1708</cell><cell>3.53</cell><cell>0.43709</cell></row><row><cell>Best interval</cell><cell>0.31</cell><cell>1.34179</cell><cell>3.53</cell><cell>0.46328</cell></row><row><cell>DBSCAN</cell><cell>3</cell><cell>4.03313</cell><cell>27</cell><cell>2.39181</cell></row><row><cell>Five intervals</cell><cell>3</cell><cell>3.0225</cell><cell>17</cell><cell>0.34669</cell></row><row><cell>Best interval</cell><cell>3</cell><cell>3.00113</cell><cell>9</cell><cell>0.06324</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>2.16369</cell><cell>99.90769</cell><cell>13.575</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>1.93919</cell><cell>99.55</cell><cell>10.98534</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>0.20774</cell><cell>66.50909</cell><cell>0.8341</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>3.13743</cell><cell>82.32462</cell><cell>8.84055</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>1.22586</cell><cell>79.81176</cell><cell>3.53786</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>0.25967</cell><cell>31.83636</cell><cell>1.03566</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>21.45342</cell><cell>94.44389</cell><cell>28.24494</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>3.80889</cell><cell>80.86471</cell><cell>8.94931</cell></row><row><cell>Best 
interval</cell><cell>0</cell><cell>0.72118</cell><cell>77.975</cell><cell>2.92216</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>26.75454</cell><cell>99.97692</cell><cell>33.20733</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>6.97394</cell><cell>99.9</cell><cell>16.30767</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>1.1886</cell><cell>87.35455</cell><cell>4.06603</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Comparative computation time statistics for the planar circular dataset</figDesc><table><row><cell>Statistics</cell><cell>Minimum</cell><cell>Average</cell><cell>Maximum</cell><cell>Standard deviation</cell></row><row><cell></cell><cell>17.38579</cell><cell>87.96091</cell><cell>151.3004</cell><cell>38.82666</cell></row><row><cell></cell><cell>71.587</cell><cell>99.31722</cell><cell>137.7623</cell><cell>16.79144</cell></row><row><cell></cell><cell>18.54298</cell><cell>88.55638</cell><cell>169.5846</cell><cell>40.77947</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Comparative performance statistics for the planar circular dataset</figDesc><table><row><cell>Statistics</cell><cell>Minimum</cell><cell>Average</cell><cell>Maximum</cell><cell>Standard deviation</cell></row><row><cell>DBSCAN</cell><cell>0.34</cell><cell>0.99729</cell><cell>2.46</cell><cell>0.32409</cell></row><row><cell>Five intervals</cell><cell>0.01</cell><cell>1.20794</cell><cell>3.51</cell><cell>0.53009</cell></row><row><cell>Best interval</cell><cell>0.31</cell><cell>1.33544</cell><cell>3.37</cell><cell>0.55512</cell></row><row><cell>DBSCAN</cell><cell>3</cell><cell>4.68783</cell><cell>30</cell><cell>3.56671</cell></row><row><cell>Five intervals</cell><cell>3</cell><cell>3.10963</cell><cell>28</cell><cell>0.95433</cell></row><row><cell>Best interval</cell><cell>3</cell><cell>3.16067</cell><cell>20</cell><cell>1.22944</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>1.01034</cell><cell>35.72857</cell><cell>2.6169</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>1.41552</cell><cell>100</cell><cell>4.8618</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>0.69079</cell><cell>20.01563</cell><cell>1.21654</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>5.59503</cell><cell>67.76286</cell><cell>12.54719</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>1.3124</cell><cell>46.65625</cell><cell>3.22294</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>0.29938</cell><cell>11.75</cell><cell>0.8674</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>5.24657</cell><cell>74.58056</cell><cell>10.69118</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>4.38885</cell><cell>67.45</cell><cell>7.23762</cell></row><row><cell>Best 
interval</cell><cell>0</cell><cell>1.96513</cell><cell>29.9125</cell><cell>3.54033</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>11.85194</cell><cell>80.95</cell><cell>19.0927</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>7.11677</cell><cell>100</cell><cell>10.6957</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>2.9553</cell><cell>44.9</cell><cell>4.55293</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Comparative computation time statistics for the three-dimensional serpentine dataset</figDesc><table><row><cell>Statistics</cell><cell>Minimum</cell><cell>Average</cell><cell>Maximum</cell><cell>Standard deviation</cell></row><row><cell></cell><cell>43.01525</cell><cell>304.0213</cell><cell>1136.548</cell><cell>218.268</cell></row><row><cell></cell><cell>218.7431</cell><cell>338.8232</cell><cell>679.8032</cell><cell>116.2364</cell></row><row><cell></cell><cell>44.25815</cell><cell>309.3223</cell><cell>953.4824</cell><cell>211.2102</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6</head><label>6</label><figDesc>Comparative performance statistics for the three-dimensional serpentine dataset</figDesc><table><row><cell>Statistics</cell><cell>Minimum</cell><cell>Average</cell><cell>Maximum</cell><cell>Standard deviation</cell></row><row><cell>DBSCAN</cell><cell>0.31</cell><cell>2.35159</cell><cell>4.86</cell><cell>0.72883</cell></row><row><cell>Five intervals</cell><cell>0.01</cell><cell>1.51307</cell><cell>5.4</cell><cell>0.7421</cell></row><row><cell>Best interval</cell><cell>0.29</cell><cell>1.73727</cell><cell>5.26</cell><cell>0.80428</cell></row><row><cell>DBSCAN</cell><cell>3</cell><cell>3.51867</cell><cell>23</cell><cell>1.98233</cell></row><row><cell>Five intervals</cell><cell>3</cell><cell>3.0972</cell><cell>14</cell><cell>0.65372</cell></row><row><cell>Best interval</cell><cell>3</cell><cell>3.03367</cell><cell>9</cell><cell>0.29922</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>0.05302</cell><cell>12.17857</cell><cell>0.29464</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>2.75551</cell><cell>100</cell><cell>10.64053</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>0.62252</cell><cell>56.16667</cell><cell>2.80441</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>2.97478</cell><cell>71.8</cell><cell>8.54771</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>5.37361</cell><cell>67.63</cell><cell>8.39236</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>2.08576</cell><cell>51.025</cell><cell>4.12471</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>5.56964</cell><cell>58.4</cell><cell>10.04399</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>4.38729</cell><cell>74.45714</cell><cell>6.41309</cell></row><row><cell>Best 
interval</cell><cell>0</cell><cell>2.06541</cell><cell>54.625</cell><cell>4.71045</cell></row><row><cell>DBSCAN</cell><cell>0</cell><cell>8.59744</cell><cell>75.0875</cell><cell>14.35587</cell></row><row><cell>Five intervals</cell><cell>0</cell><cell>12.51642</cell><cell>100</cell><cell>18.76093</cell></row><row><cell>Best interval</cell><cell>0</cell><cell>4.77369</cell><cell>76.725</cell><cell>9.18129</cell></row></table></figure>
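The performance rates reported in Tables 2, 4 and 6 (missed points, outliers, non-existing clusters, cluster confusion) can only be computed against known ground truth labels, as the discussion notes. The sketch below illustrates one plausible way of computing such rates by majority-overlap matching of predicted clusters to true ones; the paper's exact definitions are not reproduced, so the function and its matching rule are assumptions.

```python
import numpy as np

def clustering_error_rates(true_labels, pred_labels):
    """Illustrative error rates (in percent) for a clustering against ground
    truth. True labels are assumed to be 0..K-1; -1 in predictions marks noise.
    Each predicted cluster is matched to the true cluster it overlaps most."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    n = len(true_labels)
    outlier_rate = 100.0 * np.mean(pred_labels == -1)
    confused = 0
    matched_true = set()
    for c in np.unique(pred_labels[pred_labels >= 0]):
        members = pred_labels == c
        best_true = np.bincount(true_labels[members]).argmax()
        confused += np.sum(true_labels[members] != best_true)
        matched_true.add(best_true)
    confusion_rate = 100.0 * confused / n
    # points of true clusters that no predicted cluster matched count as missed
    missed_rate = 100.0 * np.mean(~np.isin(true_labels, list(matched_true)))
    return outlier_rate, confusion_rate, missed_rate
```

A perfect clustering yields all-zero rates, while a clustering that marks a whole true cluster as noise raises both the outlier and missed-points rates.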
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A density-based algorithm for discovering clusters in large spatial databases with noise</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)</title>
				<meeting>the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)</meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="226" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Density-based clustering based on hierarchical density estimates</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J G B</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moulavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-37456-2_14</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Knowledge Discovery and Data Mining</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><forename type="middle">S</forename><surname>Tseng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Cao</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Motoda</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">7819</biblScope>
			<biblScope unit="page" from="160" to="172" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A fast DBSCAN algorithm using a bi-directional HNSW index structure for big data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gou</surname></persName>
		</author>
		<idno type="DOI">10.1007/s13042-024-02104-8</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Machine Learning and Cybernetics</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="3471" to="3494" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">DBSCAN Revisited, revisited: Why and how you should (still) use DBSCAN</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.1145/3068335</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Database Systems</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1" to="21" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Generalized Density-Based Clustering for Spatial Data Mining</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>Herbert Utz Verlag</publisher>
			<pubPlace>München</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Load spectra extrapolation by bandwidth-optimized kernel density estimation based on DBSCAN algorithm</title>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wan</surname></persName>
		</author>
		<idno type="DOI">10.1007/s42417-023-00919-3</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Vibration Engineering &amp; Technologies</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="1445" to="1456" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1009745219419</idno>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="169" to="194" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">An improved DBSCAN algorithm for hazard recognition of obstacles in unmanned scenes</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00500-023-09319-x</idno>
	</analytic>
	<monogr>
		<title level="j">Soft Computing</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="18585" to="18604" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Speedup of the k-means algorithm for partitioning large datasets of flat points by a preliminary partition and selecting initial centroids</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
		<idno type="DOI">10.2478/acss-2023-0001</idno>
	</analytic>
	<monogr>
		<title level="j">Applied Computer Systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="12" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Optimized centroid-based clustering of dense nearly-square point clouds by the hexagonal pattern</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Merinova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Yehoshyna</surname></persName>
		</author>
		<idno type="DOI">10.2478/ecce-2023-0005</idno>
	</analytic>
	<monogr>
		<title level="j">Electrical, Control and Communication Engineering</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="29" to="39" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Bus station location selection method based on DBSCAN-DPC clustering algorithm</title>
		<author>
			<persName><forename type="first">S</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qi</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-97-1103-1_13</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 3rd 2023 International Conference on Autonomous Unmanned Systems (3rd ICAUS 2023)</title>
		<title level="s">Lecture Notes in Electrical Engineering</title>
		<editor>
			<persName><forename type="first">Y</forename><surname>Qu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Gu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Niu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Fu</surname></persName>
		</editor>
		<meeting>3rd 2023 International Conference on Autonomous Unmanned Systems (3rd ICAUS 2023)<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1177</biblScope>
		</imprint>
	</monogr>
	<note>ICAUS 2023</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Assessment of landslide susceptibility using DBSCAN-AHD and LD-EV methods</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Mwakapesa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11629-020-6491-7</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Mountain Science</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="184" to="197" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Density-Based Clustering for Incomplete Data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dong</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-99-7657-7_5</idno>
	</analytic>
	<monogr>
		<title level="m">Dirty Data Processing for Machine Learning</title>
				<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A Novel Approach to Determining the Radius of the Neighborhood Required for the DBSCAN Algorithm</title>
		<author>
			<persName><forename type="first">A</forename><surname>Starczewski</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-87986-0_32</idno>
	</analytic>
	<monogr>
		<title level="m">Artificial Intelligence and Soft Computing, ICAISC 2021</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Rutkowski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Scherer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Korytkowski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Pedrycz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Tadeusiewicz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Zurada</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">12854</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Algorithm AS 136: A k-means clustering algorithm</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Hartigan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Wong</surname></persName>
		</author>
		<idno type="DOI">10.2307/2346830</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Royal Statistical Society, Ser. C</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="100" to="108" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A comparative study of efficient initialization methods for the k-means clustering algorithm</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Celebi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Kingravi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Vela</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2012.07.021</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="200" to="210" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Hierarchical density estimates for data clustering, visualization, and outlier detection</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J G B</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moulavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<idno type="DOI">10.1145/2733381</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Knowledge Discovery from Data</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="51" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Density-based clustering</title>
		<author>
			<persName><forename type="first">H.-P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kröger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimek</surname></persName>
		</author>
		<idno type="DOI">10.1002/widm.30</idno>
	</analytic>
	<monogr>
		<title level="j">Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="231" to="240" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Identification of convective and stratiform clouds based on the improved DBSCAN clustering algorithm</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yuan</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00376-021-1223-7</idno>
	</analytic>
	<monogr>
		<title level="j">Advances in Atmospheric Sciences</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="2203" to="2212" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Convolutional neural networks (CNN) and DBSCAN clustering for SARs-CoV challenges: complete deep learning solution</title>
		<author>
			<persName><forename type="first">G</forename><surname>Habib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Qureshi</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-19-2535-1_35</idno>
	</analytic>
	<monogr>
		<title level="m">International Conference on Innovative Computing and Communications</title>
		<title level="s">Lecture Notes in Networks and Systems</title>
		<editor>
			<persName><forename type="first">D</forename><surname>Gupta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Khanna</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bhattacharyya</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Hassanien</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Anand</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename></persName>
		</editor>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">471</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">OPTICS: Ordering points to identify the clustering structure</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ankerst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Breunig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-P</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGMOD International Conference on Management of Data</title>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="49" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J G B</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moulavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10618-013-0311-4</idno>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">344</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Feature reduction of unbalanced data classification based on density clustering</title>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">Y</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">Y</forename><surname>Cao</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00607-023-01206-5</idno>
	</analytic>
	<monogr>
		<title level="j">Computing</title>
		<imprint>
			<biblScope unit="volume">106</biblScope>
			<biblScope unit="page" from="29" to="55" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Density-Based Clustering for Incomplete Data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dong</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-99-7657-7_5</idno>
	</analytic>
	<monogr>
		<title level="m">Dirty Data Processing for Machine Learning</title>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">An outlier detection algorithm for electric power data based on DBSCAN and LOF</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Guo</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-15-3753-0_110</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Conference on Computer Engineering and Networks</title>
		<title level="s">Advances in Intelligent Systems and Computing</title>
		<editor>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Zhou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Zhao</surname></persName>
		</editor>
		<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">1143</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Density-based clustering of static and dynamic functional MRI connectivity features obtained from subjects with cognitive impairment</title>
		<author>
			<persName><forename type="first">D</forename><surname>Rangaprakash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Odemuyiwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Narayana</forename><surname>Dutt</surname></persName>
		</author>
		<idno type="DOI">10.1186/s40708-020-00120-2</idno>
	</analytic>
	<monogr>
		<title level="j">Brain Informatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">19</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Clustering-based visualizations for diagnosing diseases on metagenomic data</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Phan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">T T</forename><surname>Pham</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11760-024-03264-4</idno>
	</analytic>
	<monogr>
		<title level="j">Signal, Image and Video Processing</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="5685" to="5699" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Density-Based Clustering</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sarang</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-02363-7_12</idno>
	</analytic>
	<monogr>
		<title level="m">Thinking Data Science</title>
		<title level="s">The Springer Series in Applied Machine Learning</title>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Power-line extraction and modelling from 3D point clouds data based on K-D tree DBSCAN algorithm</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">R</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">H</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Long</surname></persName>
		</author>
		<idno type="DOI">10.1007/s42835-023-01641-6</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Electrical Engineering &amp; Technology</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="3587" to="3597" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Evolutionary divide and conquer (I): A novel genetic approach to the TSP</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Valenzuela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
		<idno type="DOI">10.1162/evco.1993.1.4.313</idno>
	</analytic>
	<monogr>
		<title level="j">Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="313" to="333" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Traveling salesman problem parallelization by solving clustered subproblems</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
		<idno type="DOI">10.2478/fcds-2023-0020</idno>
	</analytic>
	<monogr>
		<title level="j">Foundations of Computing and Decision Sciences</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="453" to="481" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Deep clustering of the traveling salesman problem to parallelize its solution</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.cor.2024.106548</idno>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; Operations Research</title>
		<imprint>
			<biblScope unit="volume">165</biblScope>
			<biblScope unit="page">106548</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Support-vector networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vapnik</surname></persName>
		</author>
		<idno type="DOI">10.1007/BF00994018</idno>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="273" to="297" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Support vector clustering</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ben-Hur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Siegelmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="125" to="137" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Optimization of a dataset for a machine learning task by clustering and selecting closest-to-the-centroid objects</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Herald of Khmelnytskyi national university</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="263" to="265" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Technical sciences</note>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Optimal partitioning of an initial dataset into subdatasets to be clustered for getting rid off the dataset superfluities for a machine learning task</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Herald of Khmelnytskyi national university. Technical sciences</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="213" to="215" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Random centroid initialization for improving centroid-based clustering</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
		<idno type="DOI">10.31181/dmame622023742</idno>
	</analytic>
	<monogr>
		<title level="j">Decision Making: Applications in Management and Engineering</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="734" to="746" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Parallelization of the traveling salesman problem by clustering its nodes and finding the best route passing through the centroids</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Romanuke</surname></persName>
		</author>
		<idno type="DOI">10.2478/acss-2023-0019</idno>
	</analytic>
	<monogr>
		<title level="j">Applied Computer Systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="189" to="202" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Sharp curve detection of autonomous vehicles using DBSCAN and augmented sliding window techniques</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Bhupathi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ferdowsi</surname></persName>
		</author>
		<idno type="DOI">10.1007/s13177-022-00317-1</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Intelligent Transportation Systems Research</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="651" to="671" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Cluster analysis for multidimensional objects in fuzzy data conditions</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zack</surname></persName>
		</author>
		<idno type="DOI">10.20535/SRIT.2308-8893.2021.2.02</idno>
	</analytic>
	<monogr>
		<title level="j">System Research and Information Technologies</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="18" to="34" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
