<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A QUBO Formulation of the k-Medoids Problem</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>C. Bauckhage</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N. Piatkowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Sifa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Hecker</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Wrobel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI Group, TU Dortmund</institution>
          ,
          <addr-line>Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>B-IT, University of Bonn</institution>
          ,
          <addr-line>Bonn</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Fraunhofer Center for Machine Learning</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Fraunhofer IAIS</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We are concerned with k-medoids clustering and propose a quadratic unconstrained binary optimization (QUBO) formulation of the problem of identifying k medoids among n data points without having to cluster the data. Given our QUBO formulation of this NP-hard problem, it should be possible to solve it on adiabatic quantum computers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Quadratic unconstrained binary optimization problems (QUBOs) are concerned
with finding an n-dimensional binary vector that minimizes a quadratic objective
$$z^* = \operatorname*{argmin}_{z \in \{0,1\}^n} \; z^\top Q\, z + z^\top q. \tag{1}$$
They thus pose combinatorial optimization problems which generally prove to
be NP-hard. Indeed, difficult problems such as capital budgeting or task
allocation in operations research, constraint satisfaction in AI, or maximum diversity,
maximum clique, or graph partitioning in data mining and machine learning are
but a few examples of where QUBOs occur in practice [1].
      </p>
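      <p>For illustration, a minimal NumPy sketch of (1): it evaluates the objective for a given binary vector z and, for small n, finds the minimizer by exhaustive enumeration over all 2^n candidates; the instance Q, q below is an arbitrary toy example, not one from this paper.</p>
      <preformat>
import itertools
import numpy as np

def qubo_objective(z, Q, q):
    # the objective of equation (1): z^T Q z + z^T q
    return z @ Q @ z + z @ q

def solve_qubo_brute_force(Q, q):
    # enumerate all 2^n binary vectors; only feasible for small n
    n = Q.shape[0]
    best_z, best_val = None, np.inf
    for bits in itertools.product((0, 1), repeat=n):
        z = np.asarray(bits, dtype=float)
        val = qubo_objective(z, Q, q)
        if val &lt; best_val:
            best_z, best_val = z, val
    return best_z, best_val

# arbitrary toy instance with n = 3 binary variables
Q = np.array([[0., 2., -1.],
              [2., 0., -1.],
              [-1., -1., 0.]])
q = np.array([1., -2., 0.])
print(solve_qubo_brute_force(Q, q))
      </preformat>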
      <p>Owing to their practical importance, there exists a venerable literature on
QUBOs and efficient solution strategies (for an extensive survey, see [2] and the
references therein). More recently, however, interest in QUBOs has also arisen
in a different context. Since adiabatic quantum computers such as produced by
D-Wave [3,4] are designed to solve them, research on QUBO reformulations of
various problems has noticeably intensified. Indeed, Boolean satisfiability [5],
graph cuts [6,7,8], graph isomorphisms [9], binary clustering [6,10], or classifier
training [11,12] have lately been solved via adiabatic quantum computing.</p>
      <p>Following this line of research, we propose a QUBO formulation of the
problem of identifying k medoids among n data points which, to our knowledge,
has neither been attempted nor reported before. While we do not consider a
quantum computing implementation itself, our result indicates that prototype
extraction can be accomplished on quantum computers.</p>
      <p>[Fig. 1: mean and medoid for (a) a Gaussian blob, (b) a ring, and (c) a spiral]</p>
      <p>Next, we briefly summarize the notion of a medoid and the ideas behind
k-medoids clustering. We then present our QUBO formulation of the k-medoids
problem and discuss simple examples that illustrate the practical performance of
our approach. Finally, we review related work and summarize our contributions.</p>
    </sec>
    <sec id="sec-1b">
      <title>k-Medoids Clustering</title>
      <p>
        Consider a sample X = {x_1, ..., x_n} of n Euclidean data points x_i ∈ R^m. The
well established sample mean
$$\mu \;=\; \operatorname*{argmin}_{x \in \mathbb{R}^m} \sum_{x_i \in X} \lVert x_i - x \rVert^2 \;=\; \frac{1}{n} \sum_{x_i \in X} x_i \tag{2}$$
is a frequently used summary statistic for such data. The so called sample medoid
$$m \;=\; \operatorname*{argmin}_{x_j \in X} \sum_{x_i \in X} \lVert x_i - x_j \rVert^2 \tag{3}$$
on the other hand, is arguably less widely known but has interesting applications
as well.
      </p>
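      <p>For illustration, a small NumPy sketch of (2) and (3): the mean is a point in R^m that need not belong to the sample, whereas the medoid is always one of the given data points (the function names are ours).</p>
      <preformat>
import numpy as np

def sample_mean(X):
    # equation (2): minimizer over all of R^m of the sum of squared distances
    return X.mean(axis=0)

def sample_medoid(X):
    # equation (3): the sample point minimizing the sum of squared distances
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # pairwise squared distances
    return X[D.sum(axis=0).argmin()]

X = np.random.default_rng(0).normal(size=(50, 2))   # 50 points in R^2
print(sample_mean(X))     # generally not one of the rows of X
print(sample_medoid(X))   # always one of the rows of X
      </preformat>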
      <p>
        Figure 1 illustrates characteristics of means and medoids. Both minimize
sums of distances to available sample points. However, while the mean is not
necessarily contained in the sample, the medoid always coincides with an element
of the sample and, for squared Euclidean distances as in (3), it is easy to prove
that this element is the sample point closest to the mean [13].
      </p>
      <p>
        An interesting feature of medoids is that they result from evaluating squared
distances between given data only. (Note once again that x in equation (2) may not
be contained in X whereas x_i and x_j in equation (3) always are.) As these can be
precomputed and stored in an n × n distance matrix D where D_ij = d²(x_i, x_j),
medoids can be computed from relational data. Contrary to means, medoids may
thus also be estimated on sets of strings or graphs or other non-numeric data as
long as there is an appropriate distance measure d(·, ·) [13].
      </p>
      <p>
        Algorithm 1: k-medoids clustering via Lloyd's algorithm
Require: index set I = {1, ..., n}, distance matrix D ∈ R^{n×n}, and parameter k ∈ N
  initialize the set M = {j_1, j_2, ..., j_k} ⊂ I of k cluster medoid indices
  repeat
    for l = 1, ..., k do
      determine cluster
      $$C_l = \bigl\{\, i \in I \;\big|\; D_{i j_l} \le D_{i j_q} \;\forall\, q \,\bigr\} \tag{4}$$
    for l = 1, ..., k do
      update medoid index
      $$j_l = \operatorname*{argmin}_{j \in C_l} \sum_{i \in C_l} D_{ij} \tag{5}$$
  until clusters stabilize
      </p>
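      <p>A compact NumPy rendering of Algorithm 1 (an illustrative sketch; as in the pseudocode above, only the distance matrix D and index sets are used):</p>
      <preformat>
import numpy as np

def k_medoids(D, k, seed=None, max_iter=100):
    # Lloyd-style k-medoids clustering on a precomputed distance matrix D
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)      # initialize M = {j_1, ..., j_k}
    for _ in range(max_iter):
        # equation (4): assign every point to its closest medoid
        labels = D[:, medoids].argmin(axis=1)
        # equation (5): each medoid becomes the point minimizing within-cluster distances
        new_medoids = np.array([
            np.flatnonzero(labels == l)[
                D[np.ix_(labels == l, labels == l)].sum(axis=0).argmin()
            ]
            for l in range(k)
        ])
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break                                        # clusters have stabilized
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
      </preformat>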
      <p>Moreover, since medoids coincide with actual data points, analysts often
find them easy to interpret. This is why k-medoids clustering is an increasingly
popular tool whenever explainable or physically plausible prototypes need to be
determined [14,15,16,17,18].</p>
        <p>Conceptually, k-medoids clustering is almost identical to k-means clustering.
To accomplish it, we may, for instance, simply adapt Lloyd's algorithm [19]. In
other words, given randomly initialized medoids, we may determine clusters by
assigning data points to their closest medoid, update the medoid of each cluster,
and repeat these steps until clusters stabilize.</p>
        <p>
          Note, however, that (local) medoids, too, are relational objects that can solely
be determined from analyzing a distance matrix. Given such a distance matrix
for a data set X, k-medoids clustering can thus be performed on index sets only.
We demonstrate this in Algorithm 1 where I = {1, 2, ..., n} indexes the data
in X. Having initialized a set M = {j_1, j_2, ..., j_k} ⊂ I of medoid indices, each
cluster C_l is an index set containing the indices of those data points x_i that are
closest to medoid m_{j_l} (equation (4)). Given the clusters, each medoid index j_l is
then set to the index of the medoid of the points indexed by C_l (equation (5)).
        </p>
        <p>Similar to k-means clustering, k-medoids clustering subdivides the input
space into convex cells each of which is defined w.r.t. a prototype. Just as in
k-means clustering via MacQueen's algorithm [20], k-medoids clustering could
therefore in principle be performed in two steps where medoids are determined
first and data are assigned to them second. However, contrary to k-means
clustering, the combinatorial nature of k-medoids clustering, which selects prototypes
from a discrete set of data points, prevents the use of continuous MacQueen-type
updates of prototypes.</p>
        <p>Indeed, we are not aware of prior work on how to find local medoids without
having to compute clusters. The quadratic unconstrained binary optimization
formulation of the problem which we present in the next section, however, can
accomplish this.</p>
    </sec>
    <sec id="sec-2">
      <title>A QUBO Formulation for k-Medoids Estimation</title>
      <p>Our idea for how to select k local medoids among n data points is based on an
observation regarding classical k-means clustering.</p>
      <p>Recall that, if a set X of n data points is partitioned into k clusters C_l of n_l
elements whose mean is μ_l, adding the within cluster scatter
$$S_W = \sum_{l=1}^{k} \sum_{x \in C_l} \lVert x - \mu_l \rVert^2 \tag{6}$$
and the between cluster scatter
$$S_B = \frac{1}{2n} \sum_{i=1}^{k} \sum_{j=1}^{k} n_i n_j \,\lVert \mu_i - \mu_j \rVert^2 \tag{7}$$
yields a constant; that is, we have S_W + S_B = c.</p>
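      <p>This identity is easy to verify numerically; the following sketch computes S_W and S_B as in (6) and (7) for several random partitions of one and the same sample and prints the (identical) sums.</p>
      <preformat>
import numpy as np

def scatter_sums(X, labels, k):
    # within-cluster scatter S_W of (6) and between-cluster scatter S_B of (7)
    mus = np.array([X[labels == l].mean(axis=0) for l in range(k)])
    ns = np.array([(labels == l).sum() for l in range(k)])
    s_w = sum(((X[labels == l] - mus[l]) ** 2).sum() for l in range(k))
    pairwise = ((mus[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)
    s_b = (ns[:, None] * ns[None, :] * pairwise).sum() / (2 * len(X))
    return s_w, s_b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))            # n = 200 points in R^3
for _ in range(3):                       # three different random partitions into k = 4 clusters
    labels = rng.integers(0, 4, size=len(X))
    s_w, s_b = scatter_sums(X, labels, 4)
    print(round(s_w + s_b, 6))           # the same constant every time
      </preformat>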
      <p>This follows from Fisher's analysis of variance [21,22] and is to say that the
two scatter values are complementary: either S_W is large and S_B is small, or
S_W is small and S_B is large. Since a small within cluster scatter is usually
taken to be the objective of k-means clustering, we therefore observe that "good"
cluster means are close to the data points within their cluster and, at the same
time, far apart from each other.</p>
      <p>Consequently, we posit that the same should hold true for the medoids that
are supposed to result from k-medoids clustering.</p>
      <p>
        Note, however, that (6) involves variances about local centroids. Since medoids
are not necessarily central, we henceforth work with a more robust measure of
similarity. In particular, we consider Welsch's M-estimator
$$\Delta_{ij} \;=\; 1 - \exp\bigl(-\tfrac{1}{2}\, D_{ij}\bigr) \tag{8}$$
which is known from robust regression [23] and also referred to as the correntropy
loss [24,25].
      </p>
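      <p>In code, the matrix of values (8) is readily obtained from the matrix of pairwise squared distances; a short sketch (using Delta for the matrix with entries as in (8)):</p>
      <preformat>
import numpy as np

def squared_distance_matrix(X):
    # D[i, j] = ||x_i - x_j||^2 for the rows of X
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def welsch_matrix(D):
    # equation (8): entries 1 - exp(-D_ij / 2), bounded in [0, 1]
    return 1.0 - np.exp(-0.5 * D)
      </preformat>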
      <p>A QUBO to identify far apart data points</p>
      <p>
        The problem of selecting k mutually far apart objects among a total of n objects
is known as the max-sum dispersion problem [26] and can be formalized as
follows: given an n × n similarity matrix Δ over n objects which are indexed by
I = {1, 2, ..., n}, determine a subset M ⊂ I, |M| = k, such that
$$M^* = \operatorname*{argmax}_{M \subseteq I} \;\sum_{i \in M} \sum_{j \in M} \Delta_{ij} \tag{9}$$
$$\text{s.t.}\quad |M| = k. \tag{10}$$
      </p>
      <p>
        Looking at this problem, we note that, upon introducing binary indicator
vectors z ∈ {0,1}^n whose entries are given by z_i = 1 if i ∈ M and z_i = 0
otherwise, it can also be written in terms of a constrained binary optimization
problem
$$z^* = \operatorname*{argmax}_{z \in \{0,1\}^n} \;\tfrac{1}{2}\, z^\top \Delta\, z \tag{11}$$
$$\text{s.t.}\quad z^\top \mathbf{1} = k \tag{12}$$
where 1 ∈ R^n denotes the vector of all ones.
      </p>
      <p>
        If we further note that
$$z^\top \mathbf{1} = k \;\Longleftrightarrow\; (z^\top \mathbf{1} - k)^2 = 0, \tag{13}$$
we find that (11) can equivalently be expressed as a quadratic unconstrained
binary optimization problem, namely
$$z^* = \operatorname*{argmax}_{z \in \{0,1\}^n} \;\tfrac{1}{2}\, z^\top \Delta\, z \;-\; \lambda\, (z^\top \mathbf{1} - k)^2 \tag{14}$$
where λ ∈ R is a Lagrange multiplier. Treating this multiplier as a constant and
expanding the expression on the right hand side, we obtain
$$z^* = \operatorname*{argmin}_{z \in \{0,1\}^n} \; z^\top \bigl(\lambda\, \mathbf{1}\mathbf{1}^\top - \tfrac{1}{2}\, \Delta\bigr)\, z \;-\; 2 \lambda k\, z^\top \mathbf{1}. \tag{15}$$
      </p>
      <p>
        Using the above notation, the problem of selecting k most central objects among
a total of n objects can be formalized as follows
$$M^* = \operatorname*{argmin}_{M \subseteq I} \;\sum_{i \in M} \sum_{j \in I} \Delta_{ij} \tag{16}$$
$$\text{s.t.}\quad |M| = k. \tag{17}$$
      </p>
      <p>
        Introducing binary indicator vectors and a Lagrange multiplier, reasoning as
above then leads to the following quadratic unconstrained binary optimization
problem
$$z^* = \operatorname*{argmin}_{z \in \{0,1\}^n} \; \lambda\, z^\top \mathbf{1}\mathbf{1}^\top z \;+\; z^\top \bigl(\Delta \mathbf{1} - 2 \lambda k\, \mathbf{1}\bigr). \tag{18}$$
      </p>
      <p>
        In order to combine the QUBO that identifies far apart points (15) and the
QUBO that identifies central points (18) into a single model that identifies rather
central points that are rather far apart, we adhere to common practice [27].
That is, we introduce two additional tradeoff parameters α, β ∈ R to weigh the
contributions of either model. This way, we obtain
$$z^* = \operatorname*{argmin}_{z \in \{0,1\}^n} \; z^\top \bigl(\lambda\, \mathbf{1}\mathbf{1}^\top - \tfrac{\alpha}{2}\, \Delta\bigr)\, z \;+\; z^\top \bigl(\beta\, \Delta \mathbf{1} - 2 \lambda k\, \mathbf{1}\bigr). \tag{19}$$
Working with Welsch's function in (8) has the additional benefit that it maps
squared distances such as ||x_i − x_j||² into the interval [0, 1].
      </p>
      <p>
        For the weighted, model specific contributions that occur in (19), we therefore
have the following two upper bounds
$$\tfrac{1}{2}\, z^\top \Delta\, z \;\leq\; \tfrac{1}{2}\, k^2 \tag{20}$$
$$z^\top \Delta\, \mathbf{1} \;\leq\; n\, k. \tag{21}$$
This suggests to set the weighting parameters to α = 1/k and β = 1/n so as
to normalize the two contributions to about the same range. For the Lagrange
multiplier λ which enforces solutions z to have k entries equal to 1, we choose
λ = 2 so as to prioritize this constraint.
      </p>
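      <p>Putting the pieces together, the following sketch assembles the matrix Q and vector q of the combined QUBO (19) with the parameter choices just discussed, so that the problem can be handed to any QUBO solver, classical or quantum (the function name and its defaults are ours).</p>
      <preformat>
import numpy as np

def k_medoids_qubo(Delta, k, lam=2.0, alpha=None, beta=None):
    # assemble Q and q of (19); minimizing z^T Q z + z^T q over binary z
    # yields an indicator vector of k medoid candidates
    n = Delta.shape[0]
    alpha = 1.0 / k if alpha is None else alpha   # weight of the dispersion term, cf. (20)
    beta = 1.0 / n if beta is None else beta      # weight of the centrality term, cf. (21)
    ones = np.ones(n)
    Q = lam * np.outer(ones, ones) - 0.5 * alpha * Delta
    q = beta * Delta @ ones - 2.0 * lam * k * ones
    return Q, q
      </preformat>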
    </sec>
    <sec id="sec-3">
      <title>Practical Examples</title>
      <p>
        Next, we present two admittedly simple experiments that illustrate the behavior
of the QUBOs in (15), (18), and (19). Note again that our primary goal in this
paper is to investigate whether it is possible to identify k medoids among n
data points without having to cluster the data; efficiency and scalability are not
our main concerns. Correspondingly, we consider two samples of 2D data points
that are small enough to allow for brute force estimation of the minimizers of
the above QUBOs. In other words, in both experiments, we compute similarity
matrices over n ∈ {12, 16} data points, choose α, β, and λ as above, and then
evaluate each of the 2^n possible solutions to the QUBOs in (15), (18), and (19)
in order to determine the respective best one.
      </p>
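      <p>In this small-n regime, the experimental protocol amounts to the following sketch: sample data, build the similarity matrix (8), assemble the QUBO (19), and enumerate all 2^n binary vectors (the cluster centers and sample sizes below are arbitrary placeholders, not the data used in the paper).</p>
      <preformat>
import itertools
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(m, 2))               # n = 16 points drawn
               for c, m in zip([(0, 0), (2, 0), (1, 2)], [6, 5, 5])])  # from 3 Gaussian blobs
n, k = len(X), 3

sq = (X ** 2).sum(axis=1)
D = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)   # squared distances
Delta = 1.0 - np.exp(-0.5 * D)                                 # equation (8)

lam, alpha, beta = 2.0, 1.0 / k, 1.0 / n
Q = lam * np.ones((n, n)) - 0.5 * alpha * Delta                # quadratic part of (19)
q = beta * Delta.sum(axis=1) - 2.0 * lam * k * np.ones(n)      # linear part of (19)

best, best_val = None, np.inf
for bits in itertools.product((0, 1), repeat=n):               # all 2^n candidate vectors
    z = np.asarray(bits, dtype=float)
    val = z @ Q @ z + z @ q
    if val &lt; best_val:
        best, best_val = z, val

print("selected medoid indices:", np.flatnonzero(best))
      </preformat>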
      <p>
        The data in our first experiment were deliberately chosen such that extremal,
central, as well as locally medoidal data points can be easily identified by visual
inspection. They can be seen in Fig. 2a which shows n = 12 two-dimensional data
points which form 4 apparent clusters. Setting k = 4 and solving the QUBOs
in (15), (18), and (19) produces indicator vectors that identify the data points
highlighted in Fig. 2b-2d. Looking at these results suggests that they are well
in line with what human analysts would deem reasonable.
      </p>
      <p>[Fig. 2: (a) 2D data, (b) solution to (15), (c) solution to (18), (d) solution to (19)]</p>
      <p>
        In our second experiment, we generated 100 sets of n = 16 data points
sampled from 3 bivariate Gaussian distributions; Fig. 3a shows an example.
Setting k = 3, we then solved the QUBOs in (15), (18), and (19). Exemplary
results can be seen in Fig. 3b-3d. The three extremal data points in Fig. 3b make
intuitive sense. The fact that the three rather central data points in Fig. 3c are
all situated in the cluster to the bottom left is due to the fact that this is the
largest of the three clusters; here, our use of Welsch's M-estimator has the effect
that points which are central w.r.t. several nearby points are being favored. The
local medoids highlighted in Fig. 3d are again intuitive.
      </p>
      <p>
        For each of the 100 data sets considered in this experiment, we also compared
the k = 3 medoids obtained from solving (19) to those produced by Algorithm 1.
In each of our 100 tests, we found them to be identical. While such a perfect
agreement between both methods is likely an artifact of the small sample sizes
we worked with, it nevertheless corroborates the utility of our QUBO based
approach to the k-medoids problem.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>As of this writing, there are three major algorithmic approaches to k-means
clustering, namely those due to Lloyd [19], MacQueen [20], and Hartigan and
Wong [28]. Lloyd- and Hartigan-type approaches alternatingly update means
and clusters and have been applied to the combinatorial problem of k-medoids
clustering, too. Examples for the former can be found in [13] and [29]; examples
of the latter include the algorithms PAM and CLARA as well as variations
thereof [30,31,32].</p>
      <p>
        However, to the best of our knowledge, MacQueen type procedures, which
determine local means without having to compute clusters, have not yet been
reported for the k-medoids setting. The QUBO presented in equation (19) fills
this gap. While it is not an iterative method such as MacQueen's procedure, it
nevertheless suggests that medoids, too, can be estimated without clustering.
      </p>
      <p>Our derivation of a QUBO for the k-medoids problem was motivated by the
quest for quantum computing solutions for relational or combinatorial clustering.
Here, most prior work we are aware of has focused on solutions for k = 2 [6,7,8].
Work on solutions for k &gt; 2 can be found in [33].</p>
      <p>However, the methods proposed there are similar in spirit to kernel k-means
clustering in that they operate on binary cluster membership indicator matrices
Z ∈ {0,1}^{k×n}. This is to say that they perform quantum clustering in a manner
that produces clusters but does not identify cluster prototypes. Our approach
in this paper, however, only requires binary indicator vectors z ∈ {0,1}^n and is,
once again, able to identify 1 ≤ k ≤ n prototypes without having to compute
clusters.</p>
    </sec>
    <sec id="sec-5">
      <title>Summary</title>
      <p>In this paper, we have proposed a quadratic unconstrained binary optimization
(QUBO) formulation of the problem of identifying k local medoids within a
sample of n data points. The basic idea is to trade off measures of central and
extremal tendencies of individual data points. Just as conventional approaches
to k-medoids clustering, our solution works with relational data and therefore
applies to a wide range of practical settings. However, to the best of our
knowledge, our solution is the first that is capable of extracting k local medoids without
having to compute clusters.</p>
      <p>This capability comes at a price. While conventional k-medoids clustering as
well as our QUBO formulation constitute NP-hard combinatorial problems, the
former can be solved (approximately) by means of greedy heuristics. Yet, for the
latter, such heuristics are hard to conceive.</p>
      <p>However, an aspect of our model that may soon turn into a viable advantage
is that it can be easily solved on quantum computers. While this paper did not
consider corresponding quantum computing implementations, they are the topic
of ongoing work and results will be reported once available.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kochenberger</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glover</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Unified Framework for Modeling and Solving Combinatorial Optimization Problems: A Tutorial</article-title>
          . In Hager, W.,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardalos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prokopyev</surname>
          </string-name>
          , O., eds.:
          <source>Multiscale Optimization Methods and Applications</source>
          . Volume
          <volume>82</volume>
          <source>of NOIA</source>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kochenberger</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glover</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>The Unconstrained Binary Quadratic Programming Problem: A Survey</article-title>
          .
          <source>J. of Combinatorial Optimization</source>
          <volume>28</volume>
          (
          <issue>1</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Quantum Annealing with Manufactured Spins</article-title>
          .
          <source>Nature</source>
          <volume>473</volume>
          (
          <issue>7346</issue>
          ) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <article-title>D-Wave press release: D-Wave announces D-Wave 2000Q quantum computer and first system order</article-title>
          (
          <year>Jan 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Farhi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldstone</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sipser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Quantum Computation by Adiabatic Evolution</article-title>
          .
          <source>arXiv:quant-ph/0001106</source>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brito</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cvejoski</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ojeda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wrobel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Ising Models for Binary Clustering via Adiabatic Quantum Computing</article-title>
          .
          <source>In: Proc. EMMCVPR</source>
          . Volume
          <volume>10746</volume>
          of LNCS., Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ushijima-Mwesigwa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Negre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mniszewski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Graph Partitioning Using Quantum Annealing on the D-Wave System</article-title>
          .
          <source>In: Proc. Int. Workshop on Post Moores Era Supercomputing</source>
          , ACM (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Junger,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Lobe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Mutzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Reinelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Rendl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Rinaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Stollenwerk</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>Performance of a Quantum Annealer for Ising Ground State Computations on Chimera Graphs</article-title>
          . arXiv:
          <year>1904</year>
          .
          <article-title>11965 [cs</article-title>
          .DS] (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Calude</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinneen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hua</surname>
          </string-name>
          , R.:
          <article-title>QUBO Formulations for the Graph Isomorphism Problem and Related Problems</article-title>
          .
          <source>Theoretical Computer Science</source>
          <volume>701</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ojeda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wrobel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Adiabatic Quantum Computing for Kernel k=2 Means Clustering</article-title>
          .
          <source>In: Proc. KDML-LWDA</source>
          . (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pudenz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lidar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Quantum Adiabatic Machine Learning</article-title>
          .
          <source>Quantum Information Processing</source>
          <volume>12</volume>
          (
          <issue>5</issue>
          ) (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Adachi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Application of Quantum Annealing to Training of Deep Neural Networks</article-title>
          .
          <source>arXiv:1510.06356 [quant-ph]</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>NumPy / SciPy Recipes for Data Science: k-Medoids Clustering</article-title>
          .
          <source>researchgate.net</source>
          (Feb
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Drachen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sifa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thurau</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The Name in the Game: Patterns in Character Names and Game Tags</article-title>
          .
          <source>Entertainment Computing</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Bauckhage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drachen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sifa</surname>
          </string-name>
          , R.:
          <article-title>Clustering Game Behavior Data</article-title>
          .
          <source>IEEE Trans. on Computational Intelligence and AI in Games</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ) (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Caro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aarva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deringer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Csanyi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurila</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Reactivity of Amorphous Carbon Surfaces: Rationalizing the Role of Structural Motifs in Functionalization Using Machine Learning</article-title>
          .
          <source>Chemistry of Materials</source>
          <volume>30</volume>
          (
          <issue>21</issue>
          ) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Molina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vergari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Mauro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Natarajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esposito</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kersting</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains</article-title>
          .
          <source>In: Proc. AAAI</source>
          . (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering</article-title>
          .
          <source>Systematic Biology</source>
          (Dec
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Least Squares Quantization in PCM</article-title>
          .
          <source>IEEE Trans. Information Theory</source>
          <volume>28</volume>
          (
          <issue>2</issue>
          ) (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Some Methods for Classification and Analysis of Multivariate Observations</article-title>
          .
          <source>In: Proc. Berkeley Symp. on Mathematical Statistics and Probability</source>
          . (
          <year>1967</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21. Fisher, R.:
          <article-title>On the Probable Error of a Coefficient of Correlation Deduced from a Small Sample</article-title>
          .
          <source>Metron</source>
          <volume>1</volume>
          (
          <year>1921</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Bauckhage</surname>
          </string-name>
          , C.:
          <article-title>k-Means and Fisher's Analysis of Variance</article-title>
          .
          <source>researchgate.net</source>
          (May
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Dennis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welsch</surname>
          </string-name>
          , R.:
          <article-title>Techniques for Nonlinear Least Squares and Robust Regression</article-title>
          .
          <source>Communications in Statistics { Simulation and Computation</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ) (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pokharel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Principe</surname>
          </string-name>
          , J.:
          <article-title>Correntropy: Properties and Applications in Non-Gaussian Signal Processing</article-title>
          .
          <source>IEEE Trans. on Signal Processing</source>
          <volume>55</volume>
          (
          <issue>11</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suykens</surname>
          </string-name>
          , J.:
          <article-title>Learning with the Maximum Correntropy Criterion Induced Losses for Regression</article-title>
          .
          <source>J. of Machine Learning Research</source>
          <volume>16</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Ravi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenkrantz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tayi</surname>
          </string-name>
          , G.:
          <article-title>Heuristic and Special Case Algorithms for Dispersion Problems</article-title>
          .
          <source>Operations Research</source>
          <volume>42</volume>
          (
          <issue>2</issue>
          ) (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Lucas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Ising Formulations of Many NP Problems</article-title>
          .
          <source>Frontiers in Physics 2</source>
          (
          <issue>5</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Hartigan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Algorithm AS 136: A k-Means Clustering Algorithm</article-title>
          .
          <source>J. of the Royal Statistical Society C</source>
          <volume>28</volume>
          (
          <issue>1</issue>
          ) (
          <year>1979</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jun</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          :
          <article-title>A Simple and Fast Algorithm for K-Medoids Clustering</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>36</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Kaufman</surname>
          </string-name>
          , L.,
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Partitioning Around Medoids (Program PAM)</article-title>
          .
          <article-title>In: Finding Groups in Data: An Introduction to Cluster Analysis</article-title>
          . John Wiley &amp; Sons (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Kaufman</surname>
          </string-name>
          , L.,
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Clustering Large Applications (Program CLARA)</article-title>
          .
          <article-title>In: Finding Groups in Data: An Introduction to Cluster Analysis</article-title>
          . John Wiley &amp; Sons (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Schubert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms</article-title>
          . arXiv:1810.05691 [cs.LG] (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bass</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomlin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dulny</surname>
            <given-names>III</given-names>
          </string-name>
          , J.:
          <article-title>Quantum Annealing for Combinatorial Clustering</article-title>
          .
          <source>Quantum Information Processing</source>
          <volume>17</volume>
          (
          <issue>2</issue>
          ) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>