<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering Classical Data with Quantum k-Means⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Poggiali</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Berti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Bernasconi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianna M. Del Corso</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Guidotti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Piemonte Orientale</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>7</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>In the last years, we have witnessed an unstoppable growth of data created, captured, copied, and consumed globally by more and more devices. The demand for such an increasing amount of information to be processed led to research towards higher computational power systems and specialized algorithms. Among them, quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain kinds of tasks, and machine learning is one of them, so the subfield of Quantum Machine Learning is one of the most promising. In this work, we design a hybrid quantum algorithm for k-Means. The main idea of our algorithm is to compute in a quantum way the distance between pairs of records in the input dataset. We show that our quantum algorithm could be, in principle, more eficient than the classical k-Means, yet obtain comparable clustering results.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Quantum Machine Learning</kwd>
        <kwd>Clustering</kwd>
        <kwd>Data Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Quantum Machine Learning (QML) is the branch of Quantum Computing (QC) that attempts to
adapt classical data mining and machine learning algorithms, or their expensive subroutines,
to run on a potential quantum computer. Indeed, the expectation is that such machines will
be commonly available for applications in the near future. Many QML algorithms have been
recently studied [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, the translation of classical algorithms into their quantum
counterpart named “quantization” is not trivial and hides many dificulties. In this work, we
focus on the quantization of -Means [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which is one of the most famous algorithms used
for clustering. The -Means clustering algorithm is an unsupervised learning algorithm, and
its goal is to find natural groups of elements in a data set. In particular, the elements inside a
group are more similar to the central element of the group than to the central elements of other
groups, according to a specific distance measure. Building a quantum version of this algorithm
means creating a quantum circuit that takes classical data as input and exploits quantum gates
to perform the computation, satisfying all the quantum mechanical constraints. Nowadays
quantum computers are in the noisy intermediate-scale quantum (NISQ) era[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This means that
today, quantum computers are prone to noises that generate errors in computation. This is
known as the decoherence problem. Typically, the deeper the circuit, the more prone the output
is to errors that make it unreliable.
      </p>
      <p>For this reason, this work presents a hybrid -Means that exploits a quantum subroutine
to boost the distance computation between each record and each centroid while the overall
algorithm remains classical. This solution permits the mitigation of the decoherence problem in
this NISQ era. In our analysis, we use the classical  --Means algorithm to provide a measure
of error for our hybrid -Means that we call --Means algorithm. The experiments show that
our quantum algorithm could be, in principle, more eficient than its classical counterpart yet
obtain comparable clustering results.</p>
      <p>The rest of the paper organizes as follows. Section 2 provides the related works, Section 3
sets up the stage by defining the notations, Section 4 describes the Quantum k-Means, Section 5
reports the experimental results, and eventually, Section 6 summarizes contributions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        In general, the problem of quantum clustering can be addressed in several ways. Some studies
were inspired by quantum theory. For instance, the classical clustering method proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
is based on physical intuition derived from quantum mechanics. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors perform
clustering by exploiting a well-known reduction from clustering to the Maximum-Cut problem
that is then solved using a quantum algorithm for approximate combinatorial optimization.
In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the authors present an unsupervised quantum learning algorithm for -Means clustering
based on adiabatic quantum computing [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>The general idea for quantizing classical clustering algorithms is to substitute the most
expensive parts in the algorithm with more eficient quantum subroutines. For instance, in [ 8]
the fidelity distance measure is used for distance computation between each pair of states in the
dataset. The fidelity is eficiently estimated with a quantum circuit containing only a few gates
(two Hadamard gates and a Control-Swap). In this way, the algorithm can perform clustering
directly on quantum states. However, the paper lacks a discussion on how the algorithm could
deal with classical data.</p>
      <p>As alternative approaches, full quantum routines for clustering have been proposed. In [9] two
subroutines based on Grover’s search algorithm [10] are used to accelerate classical clustering
methods. For the -Means quantization, the authors use the subroutines in this way. First, the
total distance of each state to all other states of one cluster is computed with the help of an
oracle that calculates the distance between two quantum states. Then, another routine finds the
smallest value of this distance function in order to select a quantum state as the new centroid
for the cluster. Unfortunately, this approach cannot be used in practice because the oracle is not
described in detail.</p>
      <p>Finally, in [11] it is proposed a quantum version of -Means (called -Means) that provides an
exponential speedup in the number of records of the dataset compared to the classical version.
Moreover, -Means returns explicit classical descriptions of the final centroids. Although
Means looks promising from a practical point of view, the paper discussion is strictly theoretical.
The experiments are performed using a classical algorithm ( --Means) simulating -Means,
instead of a real quantum version.</p>
      <p>Diferent from the literature, our work concentrates on practical problems arising when
implementing a quantum clustering algorithm, with particular attention to the encoding of
classical data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Setting the stage</title>
      <p>We keep our paper self-contained by summarizing the key concepts necessary to comprehend
our work. Given a dataset  = {⃗1, . . . , ⃗ } of  records where every record ⃗ is a 
dimensional vector of numerical value, the goal of clustering is to assign each record to one out
of  diferent clusters {1, . . . , }, represented by centroids {⃗1, . . . , ⃗} respectively, so that
similar records share the same assignment. The -Means algorithm is one of the most famous
clusterings algorithms [12]. After randomly1 choosing  initial centroids, the algorithm consists
of two repeated steps until a certain convergence condition is met. The first step is the cluster
assignment step: every element in the dataset has to be assigned to its closest centroid according
to a specific distance measure. As distance, we consider the Eucledian distance that is defined as
 (⃗, ⃗) = √︁∑︀=1 ( − )2, where  and  are two  -values vectors. The second step is the
centroids update step, where for every cluster we recompute, the new cluster center to be used
as a centroid for the next iteration. In this work, we concentrate on the cluster assignment step
whose time complexity is (  ) where  is the number of centroids,  is the number of
records, and  their dimension.</p>
      <p>As already mentioned in Section 2, a quantum version of -Means, called -Means, is proposed
in [11]. The authors evaluate -Means by means of  --Means, a “quantum-approximation” of
-Means that simulates the quantum calculus that -Means is supposed to do. More precisely,
 --Means simulates the classical -Means algorithm as performed in a quantum environment.
1We adopt the clever initialization proposed in [13].</p>
      <p>|0⟩</p>
      <p>Since a quantum algorithm can introduce errors due to decoherence and noise present in quantum
machines,  --Means simulates such errors by introducing some noise in both steps of -Means.
However, as discussed in Section 4, in the quantum version of -Means we propose in this paper,
we keep the second step (i.e., centroids update) classical. For this reason, we will consider in the
experiments a slightly diferent version of  --Means where we introduce the noise  only in
the cluster assignment step.</p>
      <p>In Algorithm 1, we show the pseudocode of the updated version of the considered  --Means.
Let ⃗ be the centroid closest to the point ⃗. Then,  --Means defines the set of possible labels
 (⃗) for ⃗ as follows:
 (⃗) = {⃗ : ⃒⃒⃒ 2(⃗, ⃗) − 2(⃗, ⃗)⃒⃒ ≤  }.</p>
      <p>⃒
When  = 0,  --Means is equivalent to the standard -Means since no uncertainty is included
and  (⃗) contains only the closest centroid. On the other hand, a high value of  allows 
-Means to include in  (⃗) not very close centroids, bringing more noise in the entire procedure.
Indeed, the assignment rule selects randomly a cluster label from the set  (⃗) (see line 5 in
Algorithm 1). In [11] it is proven that if the data are “well-cluserable” (see [11] for the detailed
definition) and the centroids are well separated, the  --Means algorithm succeeds assigning
the right cluster to most of the points for a suitable value of  depending on the separation of
the centroids.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Quantum k-Means</title>
      <p>Classical information can be encoded in diferent ways into a quantum state. In [ 14], the
authors revisit several data encoding strategies and quantum distance algorithms. The process
of encoding input numerical features into the amplitude of a quantum system is called amplitude
encoding [15]. Amplitude encoding allows the design of quantum circuits that compute distances
between quantum states.</p>
      <p>One distance measure commonly used in ML is the Euclidean distance [12]. We show here
a general circuit for computing a quantum Euclidean distance (,  ) between two general
quantum states | ⟩, |⟩ encoded in a register  via amplitude encoding [16]. To compute this
distance, we use an additional ancilla qubit  entangled with the two states | ⟩ and |⟩. This
can be accomplished by first applying an Hadamard gate on the ancilla , and then by loading
in the register  the two states | ⟩ and | ⟩ conditioned on the ancilla. Then, the initial state
|0⟩ |00 · · · 0⟩ evolves in √12 (|0⟩ | ⟩ + |1⟩ |⟩). Eventually, we apply a Hadamard gate
|0⟩
|0⟩⊗ ^
|0⟩</p>
      <p>ampl.

| ⟩
0
1
on ancilla . The corresponding state becomes: 12 (︀ |0⟩ (| ⟩ + |⟩) + |1⟩ (| ⟩ − | ⟩)︀) .
The probability of measuring the ancilla in the state |0⟩ is given by  = 14 ‖ + ‖22 which
corresponds to  = 1 − 41 ‖ − ‖22 = 1 − 14 (,  )2, since  and  are unit vectors. The overall
circuit computing the Euclidean distance between two states  and  is illustrated in Figure 1.</p>
      <p>In the rest of this section, we propose our quantum version of -Means. In particular, what we
want to “make quantum” is the computation of the distance between two records. This is similar
to what some previous works have proposed [8], but here we give a practical implementation
of the entire algorithm.</p>
      <p>In order to eficiently load classical data in a suitable quantum state, we employ the
FFQRAM algorithm [17]. In particular, we deal with  -feature vectors, and we use the FF-QRAM
algorithm to amplitude encode each vector.</p>
      <p>Once data have been loaded, we build a quantum circuit that computes the distance between
a single pair of vectors. In the -Means context, we compute distances between every record
and every centroid in order to assign a cluster label to every record in the dataset. This can
be accomplished by using the quantum circuit shown in Figure 1 that computes the Euclidean
distance between two quantum states generated by the amplitude encoding technique.</p>
      <p>Figure 2 illustrates the whole quantum circuit (QC). QC can compute the Euclidean distance
simultaneously between the features of two vectors. The ^-qubit register |⟩ is the index register
for the  features of each vector. It consists of ^ = ⌈log2  ⌉ qubits which control the rotation
of a qubit |⟩.</p>
      <p>To get a proper estimate of the Euclidean distance, we need to repeat the execution of this
circuit a certain number of times . In this way, we can estimate the Euclidean distance as
√︂
(,  ) = 4 − 4 ︁( #|0′⟩ ︁) , where # |0⟩ is the number of times the ancilla qubit is measured
in the state |0⟩. Actually, also the FF-QRAM encoding procedure requires a post-selection on
the qubit |⟩ (see [17, 14] for more details). Thus, # |0⟩ is the number of times the outcome
for ancilla is 0 after having post-selected the result where qubit |⟩ was in 1. Notice that, here,
′ &lt;  is not the total number of repetitions, but the number of times the post-selection on |⟩ is
successful.</p>
      <p>Algorithm 2 describes the pseudocode of the proposed procedure, --Means, for cluster
computation. We adopt -Means++ [13] as centroid initialization strategy (line 1). Then, we
Algorithm 2: q--Means</p>
      <p>Input:  - input data,  - number of clusters,  - number of quantum shots</p>
      <p>Output:  - records to clusters assignment, C - centroids
7
compute the quantum Euclidean distance between each record ⃗ and each centroid ⃗. In
particular, the quantum distance computation is repeated for each of the   pairs of vectors,
where  is the number of records in the dataset and  is the number of centroids (lines 3-7).
Then, the algorithm assigns ⃗ to the cluster  such that the distance between ⃗ and the centroid
⃗ of the cluster  is the minimum (lines 8-10). Eventually, each centroid ⃗ is updated (lines
11-12). The output of --Means consists of a final cluster assignment  and the corresponding
centroids .</p>
      <p>Our hybrid approach improves the cluster assignment step with respect to the classical
Means by a factor  if we do not consider the QRAM preparation. In particular, the overall
complexity of the cluster assignment step is ( ), where  is the number of records and  is
the number of centroids, plus the cost of the QRAM preparation.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In this section, we assess the efectiveness of the --Means algorithm2. We intend to evaluate
the capability of --Means in terms of clustering quality by simulating it on a number of
datasets. Since quantum computers currently available are not large enough to test --Means,
we exploit the qasm_simulator provided by qiskit3.
2Python code at: https://github.com/AlessandroPoggiali/Qkmeans-ICTCS
3Qiskit library: https://qiskit.org/</p>
      <p>For an evaluation of the algorithm as much complete as possible, the parameters that we are
going to test follow:
• shots (): how many times we repeat the execution of a quantum circuit.
• sc_thresh: it rules the stopping condition of the algorithm. It is defined as the relative
tolerance with regards to the Frobenius norm of the diference in the cluster centers of
two consecutive iterations to declare convergence.
• max_iterations (max_ite): the maximum number of iterations the algorithm can
perform.</p>
      <p>If not diferently specified, we tested --Means with the following parameters: max_ite: 10,
sc_thresh: 0.0001, and shots: 8192 on all four synthetic datasets.</p>
      <p>In order to evaluate the algorithm performance and the cluster quality, our analysis considers
the following measures.</p>
      <p>• n_ite (ite): the actual number of iterations --Means performs to converge.
• avg_similarity (sim): the concept of similarity is defined in terms of how accurately
--Means assigns the right centroids to records with respect to the classical assignments.
The avg_similarity measure is basically the average similarity among all iterations of
the algorithm.
• silhouette (sil): the Silhouette Coeficient measures how an element is similar to its
cluster with respect to the other clusters.
• SSE [18]: it corresponds to the sum of the squared diferences between each record and
its centroid.
• v_measure (vm) [19]: it measures the correctness of the clustering assignments with
respect to ground truth.</p>
      <p>Dataset. For the --Means assessment, we consider two groups of datasets: the first
group of four synthetic datasets and two real datasets. The synthetic datasets come from
the clustering guide of scikit-learn4. These datasets show the characteristics of diferent
clustering algorithms on datasets that are “interesting” but still in two dimensions. For these
datasets, we also have the ground truth (i.e., the actual clustering we want to obtain) so that we
can have an objective evaluation of the algorithm. They include: aniso, blobs, blobs2, and
moon (also known as noisymoon). While instead, the real datasets we take into consideration
are: iris, and diabetes. Regarding the real datasets, we have no ground truth. Thus, we have to
choose a suitable number of centroids . As synthetic datasets, the real datasets are available
on scikit-learn.</p>
      <p>Data Preprocessing. The essential data preprocessing required by --Means consists of
two steps: standardization and normalization. The standardization of the dataset has the efect
of having zero mean and unit variance among all samples. This is common practice in ML
to compensate for scaling efects and to ensure that the data does not only populate a small
subspace of the input space. In fact, input spaces in higher dimensions lead to indistinguishably
small distances between data points. The normalization applies row by row and allows us to
deal with unit length vectors. While the dataset standardization is always applied, we have
two diferent preprocessing techniques available: the normalization to unit length ( 1-norm) and
the inf-norm preprocessing, which is equivalent to dividing the entries of the vector by the
component of maximum modulus.</p>
      <p>Finally, vector values are converted to suitable angles in order to encode them in the
FFQRAM. Note that the ancilla post-selection probability for the quantum Euclidean distance is
always around 0.5 [15]. However, we can enhance the FF-QRAM post-selection probability
applying the inf-norm normalization [20].</p>
      <p>Looking at the algorithm assessments, the execution with the inf-norm gives comparable
results with respect to the 1-norm (see Table 1). In this test, we executed --Means on blobs2
dataset using 1024 shots. The Silhouette score tells us that in the inf-norm case, we obtain
a better clustering, and also the v_measure tells us that the final assignment is closer to the
ground truth. Instead, the SSE in the inf-norm case is worse than in the 1-norm case. This
depends on the range of input data, and hence it does make sense only when comparing results
whereby input data have their range of values comparable. We report in Figure 3 the output in
the original space of the final clustering assignment. We observe that we get a more accurate
clustering in the second case, even though bad-clustered points are still present. Henceforth,
we adopt the inf-norm as the default preprocessing.</p>
      <p>Results on Synthetic Datasets. Table 2 reports the evaluation measures observed for
-Means on synthetic datasets. The values show good performance on aniso, blobs, and
blobs2. In fact, the measures sil and vm highlight a good clustering output. While, in the moon
dataset, in spite of having a high similarity value, the vm returned is low. This probably happens
because the moon dataset, as it is generated, is not a well-clusterable dataset. In fact, it has no
spherical clusters hence algorithms like -Means will not be able to identify the right shapes of
its clusters. Table 2 reports also the results obtained with  --Means and -Means. In particular,
--Means produces a clustering with error  when the  --Means similarity is comparable
to the --Means similarity. A final consideration concerns the confusion matrices in Table 3,
where we report the percentages of True Positive (TP), False Positive (FP), False Negative (FN),
and True Negative (TN). We observe that -Means and --Means typically output the same
results on the datasets blobs, blobs2, and moon. In fact, the value   +   in these datasets
is quite small. This does not hold for the aniso dataset where, instead,   +   = 8.71%.</p>
      <p>Results on Real Datasets. With real datasets, we have no prior knowledge about the
clustering result we should achieve. In other words, the number of clusters  has to be selected
properly in order to get good clusterization. A very common heuristic used in clustering to
determine the number of clusters in a dataset is the elbow method [12]. Our first test aims at
performing the elbow method with the -Means and --Means algorithm separately on the
iris dataset. We repeat the execution of both algorithms by varying  from 2 to 8. We obtain
the curves in Figure 4. A suitable value of  for both algorithms is 3 because this is the point
where the SSE stops decreasing sharply. The comparison between the classical and the quantum
algorithm with this configuration is reported in Table 4. We repeated the same experiment on
the diabetes dataset. Again, we select the right  using the elbow method with -Means, and
then we compare the result that --Means gives us for the same . Figure 4 shows that a good
value for  is 8 for this dataset. Hence, by executing the --Means algorithm with  = 8, we
obtain the result in Table 4. The result shows that --Means performs not well compared to
 --Means and -Means according to SIL measure. A possible explanation can be related to the
use of PCA to reduce the number of features from 10 to 4, which is necessary for executing the
algorithm in a reasonable amount of time, due to the technological limits. This makes records
belonging to diferent clusters too similar, and the number of shots was insuficient to estimate
a suficiently precise Euclidean distance, which therefore afected the quality of the clustering.
Eventually, we observe that the SIL measures of -Means and --Means on the iris dataset are
similar, while in the diabetes dataset, -Means performs better than --Means.</p>
      <p>--Means on Real Quantum Hardware. All tests reported up to now were carried out using
the qasm simulator, a simulator provided by qiskit which simulates quantum computation by
using classical hardware. Here, we show the best we can do with currently available quantum
computers. IBM ofers cloud access 5 to some of their quantum computers, so it is possible
to delegate the execution of a quantum circuit to a real quantum machine. We had access to
quantum computers with no more than five qubits. For this reason, we must consider simple
instances of our --Means where no more than five qubits are necessary.</p>
      <p>The first test considers --Means where we use three qubits: one qubit for the ancilla |⟩,
one for the register |⟩, and one for addressing  = 2 features (i.e., |⟩). Instead of simulating
each of the   quantum circuits locally, we first check the least busy quantum computer
available and send to it every circuit to be executed. This requires several steps, like waiting
on the queue of a quantum computer and receiving the result, so it introduces an overhead.
Furthermore, this communication overhead will be paid per pair of vectors, so it could highly
afect the overall performance of the algorithm, especially for big datasets. For this reason,
we took into account for this experiment a small dataset (blobs3) consisting of  = 16 two
dimensional vectors, which form two well-distinguishable spherical clusters (Fig. 5). Notice
that the number of clusters is not involved in the quantum circuit preparation, but we chose
 = 2 to simplify the overall execution.</p>
      <p>We compare the output of --Means executed using real quantum hardware with the output
of the algorithm executed by the simulator. In Table 5 we report this comparison, while in
Figure 6 we show the clustering obtained.</p>
      <p>From the table, we can see that both clusterization were successful with respect to the ground
truth with the same number of iterations. However, to conclude, until a large-scale noise-free
quantum computer is available, testing complex quantum circuits on real quantum hardware
will be an unfeasible task.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We have proposed --Means, a hybrid approach for clustering classical data. The
algorithm implements a quantum subroutine to boost the cluster assignment step of the classical
-Means. In particular, this quantum subroutine computes the Euclidean distance between
two  -dimensional vectors, i.e., a record and a cluster centroid. The complexity of this step
is ( ), where  and  are the number of records and the number of centroids,
respectively. The experiments show that --Means could be in principle more eficient than classical
-Means, yet obtaining comparable clustering results. In this work, we exploited quantum
parallelism only to compute distances. Our future work is to design and analyze two variants
of --Means that leverage quantum parallelism to compute (i) the distances between a single
record and  centroids simultaneously, and (ii) the distances between  records and  centroids
simultaneously.</p>
      <p>Acknowledgment. This work is supported by Università di Pisa under the “PRA – Progetti
di Ricerca di Ateneo” (Institutional Research Grants) - Pr. no. PRA 2020-2021 92, “Quantum
Computing, Technologies and Applications”, and the work of Gianna Del Corso is also
supported by GNCS-INdAM. This work has been partially supported by the European Community
Horizon 2020 programme under the funding schemes: H2020-INFRAIA-2019-1: Research
Infrastructure G.A. 871042 SoBigData++.
[8] E. Aïmeur, G. Brassard, S. Gambs, Machine learning in a quantum world, in: Conference
of the Canadian Society for Computational Studies of Intelligence, Springer, 2006, pp.
431–442.
[9] E. Aïmeur, G. Brassard, S. Gambs, Quantum clustering algorithms, in: Proceedings of the
24th international conference on machine learning, 2007, pp. 1–8.
[10] L. K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of
the twenty-eighth annual ACM symposium on Theory of computing, 1996, pp. 212–219.
[11] I. Kerenidis, J. Landman, A. Luongo, A. Prakash, q-means: A quantum algorithm for
unsupervised machine learning, Advances in Neural Information Processing Systems 32
(2019).
[12] P. Tan, et al., Introduction to Data Mining, Addison-Wesley, 2005.
[13] S. Vassilvitskii, D. Arthur, k-means++: The advantages of careful seeding, in: Proceedings
of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 2006, pp. 1027–
1035.
[14] A. Berti, A. Bernasconi, G. M. Del Corso, R. Guidotti, Efect of Diferent Encodings
and Distance Functions on Quantum Instance-based Classifiers, in: Proceedings 26th
Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD, 2022.
[15] M. Schuld, M. Fingerhuth, F. Petruccione, Implementing a distance-based classifier with a
quantum interference circuit, EPL (Europhysics Letters) 119 (2017) 60002.
[16] M. Schuld, et al., Supervised learning with quantum computers, Springer, 2018.
[17] D. K. Park, F. Petruccione, J.-K. K. Rhee, Circuit-based quantum random access memory
for classical data, Scientific reports 9 (2019) 1–8.
[18] P. J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster
analysis, Journal of computational and applied mathematics 20 (1987) 53–65.
[19] A. Rosenberg, J. Hirschberg, V-measure: A conditional entropy-based external cluster
evaluation measure, in: Proceedings of the 2007 joint conference on empirical methods
in natural language processing and computational natural language learning
(EMNLPCoNLL), 2007, pp. 410–420.
[20] T. M. de Veras, I. C. De Araujo, D. K. Park, A. J. da Silva, Circuit-based quantum random
access memory for classical data with continuous amplitudes, IEEE Transactions on
Computers 70 (2020) 2125–2135.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sinayskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petruccione</surname>
          </string-name>
          ,
          <article-title>An introduction to quantum machine learning</article-title>
          ,
          <source>Contemporary Physics</source>
          <volume>56</volume>
          (
          <year>2015</year>
          )
          <fpage>172</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>MacQueen</surname>
          </string-name>
          , et al.,
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          ,
          <source>in: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</source>
          , volume
          <volume>1</volume>
          , Oakland, CA, USA,
          <year>1967</year>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Preskill</surname>
          </string-name>
          ,
          <article-title>Quantum computing in the NISQ era and beyond</article-title>
          ,
          <source>Quantum</source>
          <volume>2</volume>
          (
          <year>2018</year>
          )
          <fpage>79</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gottlieb</surname>
          </string-name>
          ,
          <article-title>Algorithm for data clustering in pattern recognition problems based on quantum mechanics</article-title>
          ,
          <source>Physical review letters 88</source>
          (
          <year>2001</year>
          )
          <fpage>018702</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Otterbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manenti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Alidoust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bestwick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Block</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Caldwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Didier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Fried</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          , et al.,
          <article-title>Unsupervised machine learning on a hybrid quantum computer</article-title>
          ,
          <source>arXiv preprint arXiv:1712.05771</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lloyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohseni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rebentrost</surname>
          </string-name>
          ,
          <article-title>Quantum algorithms for supervised and unsupervised machine learning</article-title>
          ,
          <source>arXiv preprint arXiv:1307.0411</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Farhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gutmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sipser</surname>
          </string-name>
          ,
          <article-title>Quantum computation by adiabatic evolution</article-title>
          ,
          <source>arXiv preprint quant-ph/0001106</source>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>