Introduction

0 B-IT, University of Bonn , Bonn , Germany 1 Fraunhofer IAIS , Sankt Augustin , Germany

133 144

We explore the idea of clustering according to extremal rather than to central data points. To this end, we introduce the notion of the maxoid of a data set and present an algorithm for k-maxoids clustering which can be understood as a variant of classical k-means clustering. Exemplary results demonstrate that extremal cluster prototypes are more distinctive and hence more interpretable than central ones.

Introduction

In this paper, we introduce a novel, prototype-based clustering algorithm. Since numerous such algorithms exist already [ 1, 12 ], our main goal is to fathom the potential of a paradigm that di ers from existing prototype-based methods.

Whereas most prototype-based clustering algorithms produce prototypes that represent modes of a distribution of data (notable examples include the k-means procedure, the mean-shift algorithm, self organizing maps, or DBSCAN [ 7, 9, 11, 15 ]), our algorithm determines cluster prototypes that are extreme rather than central. They reside on the convex hull of their corresponding clusters and, in addition, are as far apart as possible.

The idea for this approach was motivated by research on e cient archetypal analysis, a matrix factorization technique that expresses a data set in terms of convex combinations of points on the data convex hull [ 5, 8, 13, 17 ]. The resulting representations are easily interpretable by human analysts [ 8, 19, 21 ], allow for clustering, and can facilitate classi cation. However, as their computation involves demanding optimization problems, the quest for more e cient methods and heuristics has become an active area of research [ 5, 6, 16, 18 ].

In the following, we rst de ne the notion of the maxoid of a data set, prove that it will be furthest from the sample mean and necessarily coincides with a vertex of the data convex hull. We then introduce a simple and e cient clustering algorithm based on maxoids. It can be understood as a variant of the popular k-means procedure, however, whereas k-means determines cluster prototypes based on local information, our approach assumes a global view and selects the mean medoid maxoid mean medoid maxoid (a) Gaussian blob (b) ring mean medoid maxoid (c) spiral prototype of a cluster with respect to those of other clusters. In experiments with synthetic and real world data, we illustrate the behavior of this algorithm and observe that it yields prototypes which are more distinct and hence more amenable to human interpretation than those produced by k-means. 2

Means, Medoids, and Maxoids In this section, we brie y recall the concepts of the sample mean and sample medoid, introduce the idea of the sample maxoid, and review its characteristics.

Consider a nite set X = fxigin=1 Rm of data points. The sample mean is arguably the most popular summary statistic of such data. The closely related concept of the sample medoid, however, seems less well known. It is given by n = 1 X xi: n i=1 n m = argmin 1 X xj n i=1 xj xi 2 and coincides with the data point xj whose average distance to all other points is smallest which is to say that it is the data point closest to the mean [ 3, 14 ].

Yet, our focus in this paper is not on central tendencies but on extremal characteristics of a set of data. To make this notion precise, we introduce the idea of the sample maxoid and de ne De nition 1. The maxoid of a set X = fxigin=1

Rm is given by n m = argmax 1 X xj n i=1 xj

xi 2:

Apparently, this de nition reverses that of the sample medoid in that it replaces minimization by maximization. It is thus straightforward to prove Lemma 1. Given a set X = fxigin=1 of real valued data vectors, let be the sample mean and k k be the Euclidean norm. Then = n1 Pi xi implies that 1 X n i xj

xi xj 2 2 1 X n i xk xk xi

2 2: That is, the maxoid m, i.e. the point xj 2 X with the largest average distance to all other points in X , is farthest from the sample mean .

Proof. Note that the left hand side of (4) can be written as xj xi 2 = n1 Xi (xj ) (xi ) 2: Expanding the squared Euclidean distances in this sum, we have ) 2 (4) (5) tu 1 X n i = = = xj 1 X n i

xj xj xj xj 2 +

xi 2 + 1 X

n i 2 + 1 X

n i 2 + 1 X n i xi 2 Since these arguments also apply to the right hand side of (4), the inequality in (4) can be cast as which is to say that xj

Given this result, it is easy to understand the behavior of the means, medoids, and maxoids in Fig. 1. In particular, we note a caveat for analysts working with centroid methods: the sample mean is always is located in the center of the data, yet, in cases where there is no clear mode, it is rather far from most data.The medoid may or may not be close to the mean but always coincides with a data point. The maxoid, too, always coincides with a data point but its behavior seems not to depend on whether or not there is a mode. In fact, we can prove the following 2 2(xj )T (xi

) 2(xj 2(xj )T 1 X(xi

n i )T (

) 2 + 1 X n i

xi xi xi xi

2 xk 2: xk 2. (a) k-means clusters (b) k-medoids clusters (c) k-maxoids clusters

Lemma 2. The maxoid of a nite set X = fxigin=1 of real valued data vectors coincides with a vertex of the convex hull of X .

Proof. The maxoid of X is the maximizer of the convex function f (x) = 1 X x n xi2X xi 2: The domain of f is given by the discrete set X which de nes a polytope, that is, a convex set of nitely many vertices. By Jensen's inequality, the maximum of a convex function over a convex set is attained at a vertex. tu 3

From k-Means Clustering to k-Maxoids Clustering Having familiarized ourselves with means, medoids, and maxoids, we ever so brie y revisit k-means clustering and then present our idea for how to extend it towards k-maxoid clustering.

In the simplest setting, k-means clustering considers a set of n data points X = fxigin=1 Rm and attempts to determine a set C = fC gk=1 of k clusters where C X such that data points within a cluster are similar. In order to assess similarity, the algorithm represents each cluster by its mean and assigns data point xi to cluster C if is the closest mean. This idea reduces clustering to the problem of nding appropriate means which can be formalized as solving k argmin X X 1;:::; k i=1 xj2C xj 2: Since this may prove surprisingly di cult [ 2 ], k-means clustering is typically realized using the following greedy optimization procedure: (6) (7)

1. initialize cluster means 2. repeat until convergence (a) determine all clusters

1; 2; : : : ; k C = n

xi (b) update all cluster means xi =

1 jC j xi2C

X xi 2 xi 2o (8) (9)

Looking at this procedure, its adaptation towards k-medoids clustering is obvious: we simply have to replace the computation of means by that of medoids and use cluster medoids instead of means in (8). The extension towards meaningful k-maxoids clustering is straightforward, too, but not quite as obvious.

Assuming that k data points have been randomly selected as initial maxoids, we may of course cluster the data with respect to their distance to the maxoids. This is again in direct analogy to (8). However, updating the maxoids only w.r.t. the data points in their corresponding clusters may fail to produce reasonable partitions of the data since initially selected maxoids may be close to each other so that one (or several) of them may dominate the others in the subsequent cluster assignment. Our idea is thus to update maxoids not only w.r.t. the data in their cluster but also w.r.t. to the maxoids. That is, for the update step, we propose to select the new maxoid of cluster C as the data point in C that is farthest from the maxoids in the other clusters. Formally, this idea amounts to solving the following constrained minimizing problem

k argmin X X m1;:::;mk i=1 xj2C s.t.

m xj

m = argmax X xj2C 6=

2 xj m 2: (10) which is easy to recognize as a variant of the problem in (7). The corresponding greedy optimization procedure is shown in algorithm 1.

Figure 2 shows how k-means, k-medoids, and k-maxoids clustering perform on a data set consisting of three blob-like components. Setting k = 3, all three methods reliably identify the latent structures in these data. Observable di erences are miniscule and arguably negligible in practice.

However, an important question is how k-maxoids clustering will deal with situations where not all of the clusters contained in a data set are close to the data convex hull. To illustrate this problem and answer the question, Fig. 3 shows a set of 2D data consisting of ve clusters where one of them is situated in between the others and does not contain any point on the the data convex hull. The gure illustrates how the updates in algorithm 1 cause ve randomly selected maxoids to quickly move away from each other; in fact, in this example, the algorithm converged to a stable clustering within only four iterations.

Algorithm 1 k-maxoids clustering

Require: discrete set X = fx1; : : : ; xng Rm and parameter k 2 N initialize iteration counter t 0 and cluster maxoids m(10); m(20); : : : ; m(k0) while not converged do determine all clusters

C (t) = nxi xi m(t) 2 xi

m(t) 2o update all cluster maxoids m(t) = argmax X xi2C 6= xi

m(t) 2 increase iteration counter t

t + 1 Although we observed this kind of e ciency in other experiments as well, the important point conveyed by this example is that the idea of clustering according to extremes works well even if there are substructures who cannot possibly be represented by prototypes on the data convex hull. This is, again, due to the fact that algorithm 1 inherently causes selected maxoids to be as far apart as possible.

In order to illustrate that extremal cluster prototypes may be more easily interpretable to human analysts than central ones, we conducted an experiment with the CBCL data set of face images1 which contains 2429 portraits of people each of a resolution of 19 19 pixels. We turned each image into a 361 dimensional vector and applied k-means, k-medoids, and k-maxoids clustering where k = 9. The resulting prototypes in Fig. 4 clearly highlight the di erences between the three approaches.

Figure 4(b) shows the prototypes returned by k-means clustering. They represent the average face of each cluster and, since each cluster contains several hundred images, are blurred to an extent that makes it di cult to assign distinctive characteristics to these prototypes. A similiar observation applies to the results produced by k-medoids clustering shown in Fig. 4(b). Here, the prototypes correspond to actual data points yet still appear rather similar. The prototypes in Fig. 4(c), on the other hand, resulted from k-maxoids clustering and show clearly distinguishable visual characteristics. Again each corresponding cluster contains several hundred images, yet their prototypes coincide with actual data points far from one another. It is rather easy to identify these faces as prototypes of pale or dark skinned people, of people wearing glasses, sporting mustaches, or having been photographed under varying illumination conditions.

In the next section, we will present and discuss an example of a real world application which further highlights this favorable property of clustering with extremes, namely the property of producing interpretable results. 1 CBCL Face Database #1, MIT Center for Biological and Computation Learning, http://www.ai.mit.edu/projects/cbcl (a) initialization (b) 1st maxoid update (c) 1st cluster update (d) 2nd maxoid update (e) 2nd cluster update

(f) 3rd maxoid update : : : (g) 3rd cluster update (h) nal result

A Practical Application: Player Preference Pro ling in the Online Game Battle eld 3 With the rise of mobile, console, and PC based games that operate on a so called freemium model, the problem of understanding how players interact with games has become a major aspect of the game development cycle [ 4, 10, 19, 20 ]. In this context, analytics provides actionable insights as to player behaviors and allows developers and publishers to quickly adjust their content with respect to the (a) exemplary faces (b) k-means (c) k-medoids (d) k-maxoids Fig. 4: Clustering with k = 9 prototypes on the CBCL data base of face images. (a) examples of 64 face images in this data collection which illustrate the range of appearances. (b) k-means clustering produces cluster prototypes with are the means of the corresponding clusters. (c) k-medoids clustering determines prototypes that are actual data points closest to the local mean. (d) k-maxoids clustering yields cluster prototypes that are extremal data points and therefore appear more distinguishable to human observers than means or medoids. outcomes they receive and thus to increase sales and monetization rates. In this section, we apply the k-maxoids algorithm to a game analytics task, namely the problem of deriving interpretable player pro les from analyzing vehicle usage data of Battle eld 3.

Battle eld 3 is a rst person shooter military simulation game published by Electronic Arts in the Fall of 2011 as the eleventh installment in the Battle eld Series which, as of this writing, has a history of over 15 years. The game o ers a single- and multi-player game-play experience where the former is composed of a storyline that allows the player to control variety of military characters in di erent real world locations and the latter puts the player in a imaginary war between the United States of America (USA) and the Russian Federation (RF). The combination of rich storyline, realistic graphics, exibility through numerous manageable in-game components (such as vehicles and character customization), and the ability of supporting matches with large number of players has made the game one of the most played titles in its genre. Compared to its competitors, one of the most distinguishable features of the Battle eld series is the unique vehicle experience which allows the players to control air-, land-, water-based, and stationary vehicles.

The data we use in this study is a collection of vehicle usage logs of a random sample of 22,000 Battle eld 3 players which we obtained using a Web-based API for the Player Stats Network (https://p-stats.com/). In order to extract vehicle usage pro les from this data that can reveal how players interact with vehicle, we used accumulated activity statistics as to time-spent, number of character kills, and vehicle destroys made with the available 43 vehicles in the game.

Running the k-maxoids algorithm on our data set, we obtained interpretable player pro les that are semantically distinguishable from each other. In Fig. 5, we an example of k-maxoids cluster prototypes indicating di erent player preferences for vehicles in Battle eld 3. For each maxoid, we also indicate the percentage of players it represents.

Upon a closer look at the maxoids, we observe entirely distinct player pro les each representing di erent preferences for vehicles in the game. The rst maxoid represents a pilot player behavior, that is, a behavior where players spend most of their vehicle time ying multirole ghter jets (F-18 and Su-35) and attack jets (A-10 Thunderbolt and Su 25) where the same vehicle ordering applies for both kills and destroys. Speci cally for this particular maxoid the total ying time is 982 hours which is actually comparable to the average yearly ight time of experienced pilots in real life. It is important to note that the players in this cluster particularly chose to y with the equivalent (counterpart) planes for American and Russian teams, which, during gameplay, creates a balance between two teams. In other words, the prototype indicates a habit of choosing a particular type of vehicle during a game. Indeed, behavioral patterns like this are also observed for the pro les represented by the other prototypes.

The second and the fourth most populated pro les represent a preference for land oriented vehicles. Again, counterpart-vehicle mastering is also observed where the players of the second and fourth pro les prefer to mostly use the counterpart heavy and light tanks the American M1 Abrahams and the Russian T-90 and the infantry ghting vehicles BMB and LAV respectively.

A more distinct tower defense behavior is observed for the third maxoid where players in the corresponding cluster spend 85% of their time on two counterpart stationary anti-aircraft vehicles. Similar to the second pro le we observe the use of two counterpart heavy tanks M1 Abrahams and T-90 for this pro le as well.

The fth pro le, on the other hand, shows a helicopter pilot pro le where the maxoid player in this cluster spends 68% of his time ying light ghter helicopters AH 6 and Z 11.

Finally, the least populated pro le, represented by the right most maxoid in the gure, indicates that some players spend most of their time on the heavy counterpart tanks and the light ghting helicopters.

For comparison, we present results obtained from k-means clustering in Fig. 6. Similar to the face clustering example discussed in the previous section, we nd k-means pro les to indicate general averages or mixed preferences for (counterpart) heavy tanks, jets, and light ghting helicopters where each of the vehicles in a prototype ranks high among the overall most frequently played vehicles in our data set. Hence, while k-means results represent average behavioral pro les (as already hinted at in [ 10 ]), the maxoids found by k-maxoids clustering represent more extreme or archetypal behavior that can help game developers to develop a deeper understanding of truly di erent types of user preferences and pro les that cannot be captured k-means clustering but are important w.r.t. balancing the game mechanics. 5

Conclusion

In this paper, we investigated the idea of clustering according to extremal rather average properties of data points. In particular, we de ned the notion of the maxoid of a data set and presented an algorithm for k-maxoids clustering. This algorithm can be understood as a modi cation of the classical k-means procedure, where, in contrast to the classical approach, we determine cluster prototypes not only w.r.t. the data points in a cluster but w.r.t. the prototypes of other clusters. In a couple of didactic examples, we illustrated the behavior of this algorithm and then applied it to a practical problem in the area of game analytics.

In our didactic examples, as well as in our real world application, we observed our algorithm to produce cluster prototypes that are well distinguishable from one another and are thus more easily interpretable for human analysts. This property of clustering with extremes is particularly interesting for practitioners in game analytics for it allows them to quickly identify potentially imbalanced game mechanics.

In addition to these kinds of practical applications of k-maxoids clustering, we are currently investigating more theoretical aspects of its use. In particular, we examine its use as a mechanism to preselect archetypes for e cient archetypal analysis and hope to be able to report corresponding results soon.

1. Aggarwal , C. , Reddy , C. (eds.): Data Clustering: Algorithms and Applications . Chapman & Hall/CRC ( 2013 )

2. Aloise , D. , Deshapande , A. , Hansen , P. , Popat , P. : NP-Hardness of Euclidean Sumof-Squares Clustering . Machine Learning 75(2) ( 2009 )

3. Bauckhage , C. : NumPy / SciPy Recipes for Data Science: k-Medoids Clustering . researchgate. net (Feb 2015 ), https://dx.doi.org/10.13140/2.1.4453.2009

4. Bauckhage , C. , Kersting , K. , Sifa , R. , Thurau , C. , Drachen , A. , Canossa , A. : How Players Lose Interest in Playing a Game: An Empirical Study Based on Distributions of Total Playing Times . In: Proc. IEEE CIG ( 2012 )

5. Bauckhage , C. , Thurau , C. : Making Archetypal Analysis Practical . In: Denzler, J. , Notni , G . (eds.) Pattern Recogntion . LNCS , vol. 5748 . Springer ( 2009 )

6 . Chang , C.I. , Wu , C.C. , Liu , W.M. , Ouyang , Y.C. : A New Growing Method for Simplex-Based Endmember Extraction Algorithm . IEEE Trans. on Geoscience and Remote Sensing 44 ( 10 ) ( 2006 )

7. Cheng, Y.: Mean Shift, Mode Seeking, and Clustering . IEEE Trans. on Pattern Analysis and Machine Intelligence 17 ( 8 ) ( 1995 )

8. Cutler , A. , Breiman , L. : Archetypal Analysis . Technometrics 36 ( 4 ) ( 1994 )

9. Drachen , A. , Canossa , A. , Yannakakis , G.: Player Modeling using Self-Organization in Tomb Raider: Underworld . In: Proc. IEEE CIG ( 2009 )

10. Drachen , A. , Sifa , R. , Bauckhage , C. , Thurau , C. : Guns , Swords and Data: Clustering of Player Behavior in Computer Games in the Wild . In: Proc. IEEE CIG ( 2012 )

11. Ester , M. ,

H.-P.

Kriegel , Sander, J. , Xu , X. : A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise . In: Proc. ACM KDD ( 1996 )

12. Estivill-Castro , V. : Why So Many Clustering Algorithms: A Position Paper . ACM SIGKDD Explorations Newsletter 4 ( 1 ) ( 2002 )

13. Eugster , M. , Leisch , F. : Weighted and Robust Archetypal Analysis . Computational Statistics & Data Analysis 55 ( 3 ) ( 2011 )

14. Kaufman , L., Rousseeuv , P. : Clustering by Means of Medoids . In: Dodge, Y . (ed.) Statistical Data Analysis Based on the L1-Norm and Related Methods . Elsevier ( 1987 )

15. MacQueen , J.: Some Methods for Classi cation and Analysis of Multivariate Observations . In: Proc. Berkeley Symp. on Mathematical Statistics and Probability ( 1967 )

16. Miao , L. , Qi , H.: Endmember Extraction From Highly Mixed Data Using Minimum Volume Constrained Nonnegative Matrix Factorization . IEEE Trans. on Geoscience and Remote Sensing 45 ( 3 ) ( 2007 )

17. Morup , M. , Hansen , L. : Archetypal Analysis for Machine Learning and Data Mining. Neurocomputing 80 ( 2012 )

18. Ostrouchov , G. , Samatova , N.: On FastMap and the convex hull of multivariate data: toward fast and robust dimension reduction . IEEE Trans. on Pattern Analysis and Machine Intelligence 27 ( 8 ) ( 2005 )

19. Sifa , R. , Bauckhage , C. , Drachen , A. : Archetypal game recommender systems . In: Proc. LWA KDML ( 2014 )

20. Sifa , R. , Ojeda , C. , Bauckhage , C. : User Churn Migration Analysis with DEDICOM . In: Proc. ACM RecSys ( 2015 )

21. Thurau , C. , Kersting , K. , Wahabzada , M. , Bauckhage , C. : Descriptive Matrix Factorization for Sustainability: Adopting the Principle of Opposites . Data Mining and Knowledge Discovery 24 ( 2 ) ( 2012 )