                           k-Maxoids Clustering

                    Christian Bauckhage¹,² and Rafet Sifa²
                   ¹ B-IT, University of Bonn, Bonn, Germany
                   ² Fraunhofer IAIS, Sankt Augustin, Germany
              http://mmprec.iais.fraunhofer.de/bauckhage.html



        Abstract. We explore the idea of clustering according to extremal rather
        than to central data points. To this end, we introduce the notion of the
        maxoid of a data set and present an algorithm for k-maxoids clustering
        which can be understood as a variant of classical k-means clustering.
        Exemplary results demonstrate that extremal cluster prototypes are more
        distinctive and hence more interpretable than central ones.


1     Introduction

In this paper, we introduce a novel, prototype-based clustering algorithm. Since
numerous such algorithms exist already [1, 12], our main goal is to fathom the
potential of a paradigm that differs from existing prototype-based methods.
    Whereas most prototype-based clustering algorithms produce prototypes that
represent modes of a distribution of data (notable examples include the k-means
procedure, the mean-shift algorithm, self-organizing maps, or DBSCAN [7, 9, 11,
15]), our algorithm determines cluster prototypes that are extreme rather than
central. They reside on the convex hull of their corresponding clusters and, in
addition, are as far apart as possible.
    The idea for this approach was motivated by research on efficient archetypal
analysis, a matrix factorization technique that expresses a data set in terms of
convex combinations of points on the data convex hull [5, 8, 13, 17]. The resulting
representations are easily interpretable by human analysts [8, 19, 21], allow
for clustering, and can facilitate classification. However, as their computation
involves demanding optimization problems, the quest for more efficient methods
and heuristics has become an active area of research [5, 6, 16, 18].
    In the following, we first define the notion of the maxoid of a data set and
prove that it is farthest from the sample mean and necessarily coincides with a
vertex of the data convex hull. We then introduce a simple and efficient clustering
algorithm based on maxoids. It can be understood as a variant of the popular
k-means procedure; however, whereas k-means determines cluster prototypes
based on local information, our approach takes a global view and selects the
prototype of a cluster with respect to those of the other clusters. In experiments
with synthetic and real-world data, we illustrate the behavior of this algorithm
and observe that it yields prototypes which are more distinct and hence more
amenable to human interpretation than those produced by k-means.

    Copyright © 2015 by the paper's authors. Copying permitted only for private and
    academic purposes. In: R. Bergmann, S. Görg, G. Müller (Eds.): Proceedings of
    the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany, 7.-9.
    October 2015, published at http://ceur-ws.org




Fig. 1: Three data sets and their means, medoids, and maxoids: (a) Gaussian
blob, (b) ring, (c) spiral.


2   Means, Medoids, and Maxoids

In this section, we briefly recall the concepts of the sample mean and sample
medoid, introduce the idea of the sample maxoid, and review its characteristics.
   Consider a finite set $\mathcal{X} = \{x_i\}_{i=1}^n \subset \mathbb{R}^m$ of data points. The sample mean
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{1}$$

is arguably the most popular summary statistic of such data. The closely related
concept of the sample medoid, however, seems less well known. It is given by
$$m = \underset{x_j}{\operatorname{argmin}} \; \frac{1}{n} \sum_{i=1}^{n} \| x_j - x_i \|^2 \tag{2}$$

and coincides with the data point $x_j$ whose average squared distance to all other
points is smallest, which is to say that it is the data point closest to the mean [3, 14].
    Yet, our focus in this paper is not on central tendencies but on extremal
characteristics of a set of data. To make this notion precise, we introduce the
idea of the sample maxoid and define
Definition 1. The maxoid of a set $\mathcal{X} = \{x_i\}_{i=1}^n \subset \mathbb{R}^m$ is given by
$$m = \underset{x_j}{\operatorname{argmax}} \; \frac{1}{n} \sum_{i=1}^{n} \| x_j - x_i \|^2. \tag{3}$$
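
For concreteness, these three statistics can be computed in a few lines of NumPy; the following is a sketch of our own (the function name mean_medoid_maxoid is hypothetical), directly realizing (1), (2), and (3):

```python
import numpy as np

def mean_medoid_maxoid(X):
    """Return the sample mean, medoid (Eq. 2), and maxoid (Eq. 3) of the rows of X."""
    mu = X.mean(axis=0)
    # matrix of squared Euclidean distances between all pairs of data points
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    avg = D.mean(axis=1)   # average squared distance of each point to all others
    return mu, X[np.argmin(avg)], X[np.argmax(avg)]
```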




    Evidently, this definition reverses that of the sample medoid in that it
replaces minimization by maximization. It is thus straightforward to prove

Lemma 1. Given a set $\mathcal{X} = \{x_i\}_{i=1}^n$ of real-valued data vectors, let $\mu = \frac{1}{n} \sum_i x_i$
be the sample mean and $\|\cdot\|$ be the Euclidean norm. Then
$$\frac{1}{n} \sum_i \| x_j - x_i \|^2 \;\ge\; \frac{1}{n} \sum_i \| x_k - x_i \|^2 \tag{4}$$
implies that
$$\| x_j - \mu \|^2 \;\ge\; \| x_k - \mu \|^2. \tag{5}$$
That is, the maxoid $m$, i.e. the point $x_j \in \mathcal{X}$ with the largest average distance
to all other points in $\mathcal{X}$, is farthest from the sample mean $\mu$.

Proof. Note that the left-hand side of (4) can be written as
$$\frac{1}{n} \sum_i \| x_j - x_i \|^2 = \frac{1}{n} \sum_i \| (x_j - \mu) - (x_i - \mu) \|^2.$$
Expanding the squared Euclidean distances in this sum, we have
$$\begin{aligned}
\frac{1}{n} \sum_i &\Bigl[\, \| x_j - \mu \|^2 + \| x_i - \mu \|^2 - 2\,(x_j - \mu)^T (x_i - \mu) \,\Bigr] \\
&= \| x_j - \mu \|^2 + \frac{1}{n} \sum_i \| x_i - \mu \|^2 - 2\,(x_j - \mu)^T \frac{1}{n} \sum_i (x_i - \mu) \\
&= \| x_j - \mu \|^2 + \frac{1}{n} \sum_i \| x_i - \mu \|^2 - 2\,(x_j - \mu)^T (\mu - \mu) \\
&= \| x_j - \mu \|^2 + \frac{1}{n} \sum_i \| x_i - \mu \|^2.
\end{aligned}$$
Since these arguments also apply to the right-hand side of (4), the inequality in
(4) can be cast as
$$\| x_j - \mu \|^2 + \frac{1}{n} \sum_i \| x_i - \mu \|^2 \;\ge\; \| x_k - \mu \|^2 + \frac{1}{n} \sum_i \| x_i - \mu \|^2,$$
which is to say that $\| x_j - \mu \|^2 \ge \| x_k - \mu \|^2$. □

    Given this result, it is easy to understand the behavior of the means, medoids,
and maxoids in Fig. 1. In particular, we note a caveat for analysts working with
centroid methods: the sample mean is always located in the center of the data,
yet, in cases where there is no clear mode, it is rather far from most data points.
The medoid may or may not be close to the mean but always coincides with a data
point. The maxoid, too, always coincides with a data point, but its behavior does
not seem to depend on whether or not there is a mode. In fact, we can prove
the following




Fig. 2: A simple data set consisting of three Gaussian blobs and results obtained
from (a) k-means, (b) k-medoids, and (c) k-maxoids clustering.


Lemma 2. The maxoid of a finite set X = {xi }ni=1 of real valued data vectors
coincides with a vertex of the convex hull of X .

Proof. The maxoid of $\mathcal{X}$ is the maximizer of the convex function
$$f(x) = \frac{1}{n} \sum_{x_i \in \mathcal{X}} \| x - x_i \|^2. \tag{6}$$
The domain of $f$ is the discrete set $\mathcal{X}$, whose convex hull is a polytope, that
is, a convex set with finitely many vertices. By Jensen's inequality, the maximum
of a convex function over such a polytope is attained at one of its vertices. □
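
Lemma 2 lends itself to a quick numerical sanity check: for a random point
cloud, the index of the maxoid should appear among the hull vertices reported
by SciPy. The following snippet is our own illustration, not part of the paper:

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                    # toy 2D point cloud

D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
maxoid_idx = int(np.argmax(D.mean(axis=1)))      # Eq. (3)

hull = ConvexHull(X)
assert maxoid_idx in hull.vertices               # Lemma 2 in action
```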



3    From k-Means Clustering to k-Maxoids Clustering
Having familiarized ourselves with means, medoids, and maxoids, we ever so
briefly revisit k-means clustering and then present our idea for how to extend it
towards k-maxoids clustering.
    In the simplest setting, k-means clustering considers a set of n data points
$\mathcal{X} = \{x_i\}_{i=1}^n \subset \mathbb{R}^m$ and attempts to determine a set $\mathcal{C} = \{C_\kappa\}_{\kappa=1}^k$ of k clusters
with $C_\kappa \subset \mathcal{X}$ such that data points within a cluster are similar. In order to assess
similarity, the algorithm represents each cluster by its mean $\mu_\kappa$ and assigns data
point $x_i$ to cluster $C_\kappa$ if $\mu_\kappa$ is the closest mean. This idea reduces clustering to
the problem of finding appropriate means, which can be formalized as solving
$$\underset{\mu_1, \ldots, \mu_k}{\operatorname{argmin}} \; \sum_{\kappa=1}^{k} \sum_{x_j \in C_\kappa} \| x_j - \mu_\kappa \|^2. \tag{7}$$


Since this may prove surprisingly difficult [2], k-means clustering is typically
realized using the following greedy optimization procedure:




1. initialize cluster means $\mu_1, \mu_2, \ldots, \mu_k$
2. repeat until convergence
   (a) determine all clusters
       $$C_\kappa = \bigl\{\, x_i \;\big|\; \| x_i - \mu_\kappa \|^2 \le \| x_i - \mu_\lambda \|^2 \;\; \forall \lambda \,\bigr\} \tag{8}$$
   (b) update all cluster means
       $$\mu_\kappa = \frac{1}{|C_\kappa|} \sum_{x_i \in C_\kappa} x_i \tag{9}$$
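
For reference, here is a minimal NumPy sketch of this procedure; the function
name k_means and the initialization by sampling data points are our own choices,
and empty clusters are not handled:

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=None):
    """Greedy (Lloyd-style) optimization of Eq. (7)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # step 1: initialize means
    for _ in range(max_iter):
        # step 2(a), Eq. (8): assign each point to its closest mean
        D = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = D.argmin(axis=1)
        # step 2(b), Eq. (9): recompute each mean over its cluster
        new_mu = np.array([X[labels == kappa].mean(axis=0) for kappa in range(k)])
        if np.allclose(new_mu, mu):                     # converged
            break
        mu = new_mu
    return mu, labels
```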

    Looking at this procedure, its adaptation towards k-medoids clustering is
obvious: we simply have to replace the computation of means by that of medoids
and use cluster medoids instead of means in (8). The extension towards meaningful
k-maxoids clustering is straightforward, too, but not quite as obvious.
    Assuming that k data points have been randomly selected as initial maxoids,
we may of course cluster the data with respect to their distance to the maxoids.
This is again in direct analogy to (8). However, updating the maxoids only w.r.t.
the data points in their corresponding clusters may fail to produce reasonable
partitions of the data, since initially selected maxoids may be close to each other,
so that one (or several) of them may dominate the others in the subsequent
cluster assignment. Our idea is thus to update maxoids not only w.r.t. the data
in their cluster but also w.r.t. the maxoids of the other clusters. That is, for the
update step, we propose to select the new maxoid of cluster $C_\kappa$ as the data point
in $C_\kappa$ that is farthest from the maxoids in the other clusters. Formally, this idea
amounts to solving the following constrained minimization problem
$$\underset{m_1, \ldots, m_k}{\operatorname{argmin}} \; \sum_{\kappa=1}^{k} \sum_{x_j \in C_\kappa} \| x_j - m_\kappa \|^2
\quad \text{s.t.} \quad m_\kappa = \underset{x_j \in C_\kappa}{\operatorname{argmax}} \; \sum_{\lambda \ne \kappa} \| x_j - m_\lambda \|^2 \tag{10}$$

which is easy to recognize as a variant of the problem in (7). The corresponding
greedy optimization procedure is shown in algorithm 1.
    Figure 2 shows how k-means, k-medoids, and k-maxoids clustering perform
on a data set consisting of three blob-like components. Setting k = 3, all three
methods reliably identify the latent structures in these data. Observable differences
are minuscule and arguably negligible in practice.
    However, an important question is how k-maxoids clustering will deal with
situations where not all of the clusters contained in a data set are close to the
data convex hull. To illustrate this problem and answer the question, Fig. 3
shows a set of 2D data consisting of five clusters where one of them is situated
in between the others and does not contain any point on the the data convex
hull. The figure illustrates how the updates in algorithm 1 cause five randomly
selected maxoids to quickly move away from each other; in fact, in this exam-
ple, the algorithm converged to a stable clustering within only four iterations.




Algorithm 1 k-maxoids clustering
Require: discrete set $\mathcal{X} = \{x_1, \ldots, x_n\} \subset \mathbb{R}^m$ and parameter $k \in \mathbb{N}$
  initialize iteration counter $t \leftarrow 0$ and cluster maxoids $m_1^{(0)}, m_2^{(0)}, \ldots, m_k^{(0)}$
  while not converged do
      determine all clusters
      $$C_\kappa^{(t)} = \bigl\{\, x_i \;\big|\; \| x_i - m_\kappa^{(t)} \|^2 \le \| x_i - m_\lambda^{(t)} \|^2 \;\; \forall \lambda \,\bigr\}$$
      update all cluster maxoids
      $$m_\kappa^{(t+1)} = \underset{x_i \in C_\kappa^{(t)}}{\operatorname{argmax}} \; \sum_{\lambda \ne \kappa} \| x_i - m_\lambda^{(t)} \|^2$$
      increase iteration counter $t \leftarrow t + 1$
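
A compact NumPy rendering of Algorithm 1 might look as follows. This is our
own sketch: the function name, the random initialization, and the convergence
test are assumptions, not details prescribed by the paper.

```python
import numpy as np

def k_maxoids(X, k, max_iter=100, seed=None):
    """Sketch of Algorithm 1: alternate cluster assignment and maxoid updates."""
    rng = np.random.default_rng(seed)
    m_idx = rng.choice(len(X), size=k, replace=False)   # random initial maxoids
    for _ in range(max_iter):
        M = X[m_idx]
        # assignment step, cf. Eq. (8): join the cluster of the closest maxoid
        D = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        labels = D.argmin(axis=1)
        # update step: the new maxoid of cluster kappa is the member that is
        # farthest (in summed squared distance) from the *other* maxoids
        new_idx = m_idx.copy()
        for kappa in range(k):
            members = np.flatnonzero(labels == kappa)
            if members.size == 0:
                continue                                # keep old maxoid
            others = np.delete(np.arange(k), kappa)
            new_idx[kappa] = members[D[members][:, others].sum(axis=1).argmax()]
        if np.array_equal(new_idx, m_idx):              # converged
            break
        m_idx = new_idx
    return X[m_idx], labels
```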



Although we observed this kind of efficiency in other experiments as well, the
important point conveyed by this example is that the idea of clustering according
to extremes works well even if there are substructures that cannot possibly
be represented by prototypes on the data convex hull. This is, again, due to the
fact that algorithm 1 inherently causes selected maxoids to be as far apart as
possible.
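
To observe this behavior, one might run the sketch of Algorithm 1 given above
on synthetic data resembling Fig. 3 (toy data of our own, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
centers = [(0, 0), (6, 0), (0, 6), (6, 6), (3, 3)]   # fifth blob sits in the middle
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(80, 2)) for c in centers])

maxoids, labels = k_maxoids(X, k=5, seed=0)          # sketch from above
# the central blob is recovered although none of its points lies
# on the convex hull of the data
print(np.bincount(labels))
```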
    In order to illustrate that extremal cluster prototypes may be more easily
interpretable to human analysts than central ones, we conducted an experiment
with the CBCL data set of face images¹ which contains 2429 portraits of people,
each with a resolution of 19×19 pixels. We turned each image into a 361-dimensional
vector and applied k-means, k-medoids, and k-maxoids clustering with k = 9.
The resulting prototypes in Fig. 4 clearly highlight the differences between the
three approaches.
    Figure 4(b) shows the prototypes returned by k-means clustering. They
represent the average face of each cluster and, since each cluster contains several
hundred images, are blurred to an extent that makes it difficult to assign
distinctive characteristics to these prototypes. A similar observation applies to the
results produced by k-medoids clustering shown in Fig. 4(c). Here, the prototypes
correspond to actual data points yet still appear rather similar. The
prototypes in Fig. 4(d), on the other hand, resulted from k-maxoids clustering
and show clearly distinguishable visual characteristics. Again, each corresponding
cluster contains several hundred images, yet their prototypes coincide with
actual data points far from one another. It is rather easy to identify these faces
as prototypes of pale- or dark-skinned people, of people wearing glasses, sporting
mustaches, or having been photographed under varying illumination conditions.
    In the next section, we will present and discuss an example of a real world
application which further highlights this favorable property of clustering with
extremes, namely the property of producing interpretable results.
¹ CBCL Face Database #1, MIT Center for Biological and Computational Learning,
  http://www.ai.mit.edu/projects/cbcl




Fig. 3: Convergence behavior of k-maxoids clustering applied to a 2D data set
containing five blob-like clusters: (a) initialization, (b) 1st maxoid update,
(c) 1st cluster update, (d) 2nd maxoid update, (e) 2nd cluster update, (f) 3rd
maxoid update, (g) 3rd cluster update, ..., (h) final result. Started with a random
initialization of maxoids, the algorithm quickly moves them apart and converges
within four iterations.


4      A Practical Application: Player Preference Profiling in
       the Online Game Battlefield 3

With the rise of mobile, console, and PC based games that operate on a so-called
freemium model, the problem of understanding how players interact with games
has become a major aspect of the game development cycle [4, 10, 19, 20]. In this
context, analytics provides actionable insights into player behavior and allows
developers and publishers to quickly adjust their content with respect to the



Fig. 4: Clustering with k = 9 prototypes on the CBCL database of face images.
(a) Examples of 64 face images in this data collection which illustrate the range
of appearances. (b) k-means clustering produces cluster prototypes which are
the means of the corresponding clusters. (c) k-medoids clustering determines
prototypes that are actual data points closest to the local mean. (d) k-maxoids
clustering yields cluster prototypes that are extremal data points and therefore
appear more distinguishable to human observers than means or medoids.




outcomes they receive and thus to increase sales and monetization rates. In this
section, we apply the k-maxoids algorithm to a game analytics task, namely the
problem of deriving interpretable player profiles from analyzing vehicle usage
data of Battlefield 3.




    Battlefield 3 is a first-person shooter military simulation game published by
Electronic Arts in the fall of 2011 as the eleventh installment in the Battlefield
series which, as of this writing, has a history of over 15 years. The game offers
a single- and multi-player game-play experience, where the former is composed
of a storyline that allows the player to control a variety of military characters in
different real-world locations and the latter puts the player in an imaginary war
between the United States of America (USA) and the Russian Federation (RF).
The combination of a rich storyline, realistic graphics, flexibility through numerous
manageable in-game components (such as vehicles and character customization),
and the ability to support matches with large numbers of players has made
the game one of the most played titles in its genre. Compared to its competitors,
one of the most distinguishable features of the Battlefield series is the unique
vehicle experience which allows the players to control air-, land-, water-based,
and stationary vehicles.
    The data we use in this study is a collection of vehicle usage logs of a random
sample of 22,000 Battlefield 3 players which we obtained using a Web-based API
for the Player Stats Network (https://p-stats.com/). In order to extract vehicle
usage profiles from these data that can reveal how players interact with vehicles,
we used accumulated activity statistics on time spent, numbers of character
kills, and vehicle destroys made with the 43 vehicles available in the game.
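
To make this data preparation concrete, the following is a heavily hedged sketch
of how such per-player profiles could be assembled; the schema, column names,
vehicle labels, and normalization are our assumptions, not the authors' actual
pipeline:

```python
import numpy as np
import pandas as pd

# logs: one row per (player, vehicle) with accumulated statistics;
# the column names and example values here are purely illustrative
logs = pd.DataFrame({
    "player":   ["p1", "p1", "p2"],
    "vehicle":  ["F-18", "M1 Abrams", "AH-6"],
    "time":     [120.0, 30.0, 200.0],
    "kills":    [300, 40, 150],
    "destroys": [25, 10, 5],
})

# pivot to a players x (statistic, vehicle) matrix, missing entries = 0
profile = logs.pivot_table(index="player", columns="vehicle",
                           values=["time", "kills", "destroys"],
                           aggfunc="sum", fill_value=0)

# normalize each player's row so that clustering compares relative
# vehicle preferences rather than total playtime
X = profile.to_numpy(dtype=float)
X /= np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
```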
    Running the k-maxoids algorithm on our data set, we obtained interpretable
player profiles that are semantically distinguishable from each other. In Fig. 5,
we show an example of k-maxoids cluster prototypes indicating different player
preferences for vehicles in Battlefield 3. For each maxoid, we also indicate the
percentage of players it represents.
    Upon a closer look at the maxoids, we observe entirely distinct player profiles,
each representing different preferences for vehicles in the game. The first maxoid
represents a pilot player behavior, that is, a behavior where players spend most
of their vehicle time flying multirole fighter jets (F-18 and Su-35) and attack
jets (A-10 Thunderbolt and Su-25), where the same vehicle ordering applies to
both kills and destroys. Specifically, for this particular maxoid, the total flying
time is 982 hours, which is actually comparable to the average yearly flight time
of experienced pilots in real life. It is important to note that the players in
this cluster particularly chose to fly the equivalent (counterpart) planes of the
American and Russian teams, which, during gameplay, creates a balance
between the two teams. In other words, the prototype indicates a habit of choosing
a particular type of vehicle during a game. Indeed, behavioral patterns like this
are also observed for the profiles represented by the other prototypes.
    The second and the fourth most populated profiles represent a preference
for land-oriented vehicles. Again, counterpart-vehicle mastering is observed: the
players of the second and fourth profiles mostly prefer to use the counterpart
heavy tanks (the American M1 Abrams and the Russian T-90) and the infantry
fighting vehicles (BMP and LAV), respectively.
    A more distinct tower defense behavior is observed for the third maxoid,
where players in the corresponding cluster spend 85% of their time on two
counterpart stationary anti-aircraft vehicles. Similar to the second profile, we
observe the use of the two counterpart heavy tanks, M1 Abrams and T-90, for
this profile as well.

Fig. 5: Seven player vehicle usage profiles obtained from k-maxoids clustering;
cluster sizes: C-1 89.745%, C-2 3.950%, C-3 3.268%, C-4 1.891%, C-5 1.109%,
C-6 0.023%, C-7 0.014%. Each column visualizes a cluster prototype $m_\kappa$ which
indicates the most popular vehicles in the corresponding cluster of players. Note
that prototypes are sorted according to the percentage of players they represent
and that we show the top 5 elements of each prototype.

Fig. 6: Seven player vehicle usage profiles resulting from k-means clustering;
cluster sizes: C-1 61.432%, C-2 25.541%, C-3 5.650%, C-4 4.777%, C-5 1.459%,
C-6 0.659%, C-7 0.482%.
    The fifth profile, on the other hand, shows a helicopter pilot profile where
the maxoid player in this cluster spends 68% of his time flying the light fighter
helicopters AH-6 and Z-11.
    Finally, the least populated profile, represented by the rightmost maxoid in
the figure, indicates that some players spend most of their time on the heavy
counterpart tanks and the light fighter helicopters.




    For comparison, we present results obtained from k-means clustering in Fig. 6.
Similar to the face clustering example discussed in the previous section, we find
the k-means profiles to indicate general averages or mixed preferences for (counterpart)
heavy tanks, jets, and light fighter helicopters, where each of the vehicles in
a prototype ranks high among the overall most frequently played vehicles in our
data set. Hence, while the k-means results represent average behavioral profiles (as
already hinted at in [10]), the maxoids found by k-maxoids clustering represent
more extreme or archetypal behaviors that can help game developers develop
a deeper understanding of truly different types of user preferences and profiles
which cannot be captured by k-means clustering but are important w.r.t. balancing
the game mechanics.


5   Conclusion

In this paper, we investigated the idea of clustering according to extremal rather
than average properties of data points. In particular, we defined the notion of the
maxoid of a data set and presented an algorithm for k-maxoids clustering. This
algorithm can be understood as a modification of the classical k-means procedure
where, in contrast to the classical approach, we determine cluster prototypes
not only w.r.t. the data points in a cluster but also w.r.t. the prototypes of
other clusters. In a couple of didactic examples, we illustrated the behavior of
this algorithm and then applied it to a practical problem in the area of game
analytics.
    In our didactic examples, as well as in our real world application, we observed
our algorithm to produce cluster prototypes that are well distinguishable from
one another and are thus more easily interpretable for human analysts. This
property of clustering with extremes is particularly interesting for practitioners
in game analytics for it allows them to quickly identify potentially imbalanced
game mechanics.
    In addition to these kinds of practical applications of k-maxoids clustering,
we are currently investigating more theoretical aspects of its use. In particular,
we examine its use as a mechanism to preselect archetypes for efficient archetypal
analysis and hope to be able to report corresponding results soon.


References
 1. Aggarwal, C., Reddy, C. (eds.): Data Clustering: Algorithms and Applications.
    Chapman & Hall/CRC (2013)
 2. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-Hardness of Euclidean Sum-
    of-Squares Clustering. Machine Learning 75(2) (2009)
 3. Bauckhage, C.: NumPy / SciPy Recipes for Data Science: k-Medoids Clustering.
    researchgate.net (Feb 2015), https://dx.doi.org/10.13140/2.1.4453.2009
 4. Bauckhage, C., Kersting, K., Sifa, R., Thurau, C., Drachen, A., Canossa, A.: How
    Players Lose Interest in Playing a Game: An Empirical Study Based on Distribu-
    tions of Total Playing Times. In: Proc. IEEE CIG (2012)




 5. Bauckhage, C., Thurau, C.: Making Archetypal Analysis Practical. In: Denzler, J.,
    Notni, G. (eds.) Pattern Recognition. LNCS, vol. 5748. Springer (2009)
 6. Chang, C.I., Wu, C.C., Liu, W.M., Ouyang, Y.C.: A New Growing Method for
    Simplex-Based Endmember Extraction Algorithm. IEEE Trans. on Geoscience and
    Remote Sensing 44(10) (2006)
 7. Cheng, Y.: Mean Shift, Mode Seeking, and Clustering. IEEE Trans. on Pattern
    Analysis and Machine Intelligence 17(8) (1995)
 8. Cutler, A., Breiman, L.: Archetypal Analysis. Technometrics 36(4) (1994)
 9. Drachen, A., Canossa, A., Yannakakis, G.: Player Modeling using Self-Organization
    in Tomb Raider: Underworld. In: Proc. IEEE CIG (2009)
10. Drachen, A., Sifa, R., Bauckhage, C., Thurau, C.: Guns, Swords and Data: Clus-
    tering of Player Behavior in Computer Games in the Wild. In: Proc. IEEE CIG
    (2012)
11. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-Based Algorithm for Dis-
    covering Clusters in Large Spatial Databases with Noise. In: Proc. ACM KDD
    (1996)
12. Estivill-Castro, V.: Why So Many Clustering Algorithms: A Position Paper. ACM
    SIGKDD Explorations Newsletter 4(1) (2002)
13. Eugster, M., Leisch, F.: Weighted and Robust Archetypal Analysis. Computational
    Statistics & Data Analysis 55(3) (2011)
14. Kaufman, L., Rousseeuw, P.: Clustering by Means of Medoids. In: Dodge, Y. (ed.)
    Statistical Data Analysis Based on the L1-Norm and Related Methods. Elsevier
    (1987)
15. MacQueen, J.: Some Methods for Classification and Analysis of Multivariate Ob-
    servations. In: Proc. Berkeley Symp. on Mathematical Statistics and Probability
    (1967)
16. Miao, L., Qi, H.: Endmember Extraction From Highly Mixed Data Using Mini-
    mum Volume Constrained Nonnegative Matrix Factorization. IEEE Trans. on Geo-
    science and Remote Sensing 45(3) (2007)
17. Mørup, M., Hansen, L.: Archetypal Analysis for Machine Learning and Data Min-
    ing. Neurocomputing 80 (2012)
18. Ostrouchov, G., Samatova, N.: On FastMap and the Convex Hull of Multivariate
    Data: Toward Fast and Robust Dimension Reduction. IEEE Trans. on Pattern Analysis
    and Machine Intelligence 27(8) (2005)
19. Sifa, R., Bauckhage, C., Drachen, A.: Archetypal Game Recommender Systems. In:
    Proc. LWA KDML (2014)
20. Sifa, R., Ojeda, C., Bauckhage, C.: User Churn Migration Analysis with DEDI-
    COM. In: Proc. ACM RecSys (2015)
21. Thurau, C., Kersting, K., Wahabzada, M., Bauckhage, C.: Descriptive Matrix Fac-
    torization for Sustainability: Adopting the Principle of Opposites. Data Mining and
    Knowledge Discovery 24(2) (2012)



