=Paper= {{Paper |id=Vol-1231/long2 |storemode=property |title=Size-constrained 2-clustering in the plane with Manhattan distance |pdfUrl=https://ceur-ws.org/Vol-1231/long2.pdf |volume=Vol-1231 |dblpUrl=https://dblp.org/rec/conf/ictcs/BertoniGLP14 }} ==Size-constrained 2-clustering in the plane with Manhattan distance== https://ceur-ws.org/Vol-1231/long2.pdf
    Size-constrained 2-clustering in the plane with
                  Manhattan distance

       Alberto Bertoni, Massimiliano Goldwurm, Jianyi Lin, Linda Pini

           Dipartimento di Informatica, Università degli Studi di Milano
           Via Comelico 39/41, 20135 Milano – Italy



      Abstract. We present an algorithm for the 2-clustering problem with
      cluster size constraints in the plane assuming the ℓ1-norm, which works
      in O(n³ log n) time and O(n) space. The procedure also solves a full
      version of the problem, computing the optimal solutions for all possible
      constraints on cluster sizes. The algorithm is based on a separation
      result concerning the clusters of any optimal solution of the problem,
      and on an extended version of red-black trees used to maintain a
      bipartition of a set of points in the plane.


Keywords: algorithms and data structures, clustering, cluster size constraints,
Manhattan distance.


1    Introduction
Clustering is one of the most widely used techniques in statistical data analysis
and machine learning [5], with a wide range of applications in areas like pattern
recognition, bioinformatics and image processing. Clustering consists in partitioning
data into groups, called clusters, that are required to be homogeneous
and well-separated according to a data similarity measure [9]. A natural representation
of the data elements is by means of points in a d-dimensional space
equipped with a suitable distance function that determines the similarity measure.
We study this common setting, called distance clustering.
    A central problem in distance clustering is the well-known Minimum Sum-of-Squares
Clustering (MSSC), also called Variance-based Clustering [2, 10]. The
MSSC problem requires finding a k-partition {A₁, ..., A_k} of a given set X =
{x₁, ..., x_n} ⊂ R^d of n points that minimizes the weight

$$ W(A_1, \dots, A_k) = \sum_{i=1}^{k} \sum_{x \in A_i} \|x - C_{A_i}\|_2^2 $$

where ‖·‖₂ denotes the Euclidean norm and C_{A_i} is the mean point of the cluster
A_i, defined by

$$ C_{A_i} = \operatorname*{argmin}_{\mu \in \mathbb{R}^d} \sum_{x \in A_i} \|\mu - x\|_2^2 = \frac{1}{|A_i|} \sum_{x \in A_i} x. $$
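As an illustration, the MSSC weight of a given partition can be evaluated directly from this definition. The following sketch (plain Python; the helper name `mssc_weight` is ours) computes the weight of a 2-partition in the plane:

```python
def mssc_weight(clusters):
    # Sum over clusters of squared Euclidean distances to the cluster mean,
    # which is the minimizer of the sum of squared distances.
    total = 0.0
    for pts in clusters:
        d = len(pts[0])
        centroid = [sum(p[i] for p in pts) / len(pts) for i in range(d)]
        total += sum((p[i] - centroid[i]) ** 2 for p in pts for i in range(d))
    return total

# Two clusters in the plane (d = 2, k = 2): centroids (1, 0) and (5, 6)
A = [(0.0, 0.0), (2.0, 0.0)]
B = [(5.0, 5.0), (5.0, 7.0)]
print(mssc_weight([A, B]))  # 4.0
```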




  MSSC is difficult in general: for an arbitrary dimension d (given in the
  instance) the problem is NP-hard even if the number of clusters is fixed to k = 2
  [1, 8]; the same occurs if k is arbitrary and the dimension is fixed to d = 2 [14].
  However, there is a well-known heuristic for finding an approximate solution of
  MSSC, called k-Means [13], which is usually very fast in practice but can
  require exponential time in the worst case [17].
     Moreover, a known variant of MSSC, called the k-Medians Problem, is obtained by
  substituting the squared Euclidean norm ‖·‖₂² with the ℓ1-norm ‖·‖₁ [15].
      In many applications one has some background information on the clusters [3]:
  including this information in traditional clustering algorithms can improve the
  clustering performance. Problems that include such background information are
  called constrained clustering problems and split into two classes. On the one
  hand, clustering problems with point-level constraints typically comprise a set
  of must-link constraints or cannot-link constraints [18], defining pairs of points
  that, respectively, must be or cannot be grouped in the same cluster. On the
  other hand, clustering problems with cluster-level constraints [6, 16] impose
  constraints concerning the size of the resulting clusters. For example, in [19]
  cluster size constraints are used to improve clustering accuracy; this approach,
  for instance, allows one to avoid the extremely small or large clusters yielded by
  classical clustering techniques.
      In this work we study the 2-clustering problem in the plane, with cluster size
  constraints, assuming the ℓ1-norm. The relevance of 2-clustering problems is also
  due to the widespread use of hierarchical clustering techniques [11, Sec. 14.3.12],
  which repeatedly apply 2-clustering as the key step. This leads to natural
  size-constrained versions of so-called divisive hierarchical clustering.
      We recall that the 2-clustering problem with cluster size constraints has been
  studied in [12, 4], where it is shown that in dimension 1 the problem is solvable
  in polynomial time for every norm ℓp with integer p ≥ 1, while there is some
  evidence that the same result does not hold for non-integer p. Moreover, it is
  also known that for arbitrary dimension d the same problem is NP-hard even
  assuming equal sizes of the two clusters.
      Here we prove that, assuming dimension 2 and the ℓ1-norm, the 2-clustering
  problem with cluster size constraints can be solved in O(n³ log n) time and O(n)
  space. Our procedure actually solves a full version of the problem, computing the
  optimal solutions for all possible cluster sizes. Clearly this also yields the solution
  to the general unconstrained 2-clustering problem in the plane with the ℓ1-norm.
      The algorithm is based on a separation result stating that, under our hypotheses,
  the clusters of any optimal solution are separated by curves of some special forms,
  which can be analysed in a suitable linear order. The algorithm also makes use
  of a particular extension of red-black trees to maintain a bipartition of a set of
  points in the plane.
      We remark that in this work we propose an efficient method for obtaining a
  solution that is globally optimal, instead of a locally optimal solution as yielded
  by various heuristics [3, 6, 19].



  2    Problem definition

  In this section we introduce some basic notions of distance-based clustering problems
  on the plane assuming the Manhattan norm (also called the ℓ1-norm), which
  is defined by ‖x‖₁ = |x₁| + |x₂| for any x = (x₁, x₂) ∈ R².
      Given a set X ⊂ R² of n points in the plane, a k-partition of X is a family
  {A₁, A₂, ..., A_k} of k nonempty subsets of X such that ∪_{i=1}^k A_i = X and
  A_i ∩ A_j = ∅ for i ≠ j. Each A_i is called a cluster. The centroid C_A of a
  cluster A ⊂ X is

$$ C_A = \operatorname*{argmin}_{\mu \in \mathbb{R}^2} \sum_{x \in A} \|x - \mu\|_1 = \operatorname*{argmin}_{\mu \in \mathbb{R}^2} \sum_{x \in A} \big( |x_1 - \mu_1| + |x_2 - \mu_2| \big) $$

  Since the objective function in the minimization is not strictly convex, the centroid
  C_A is not necessarily unique. Indeed, it is well known that the centroid C_A
  is the component-wise median, i.e. the abscissa (ordinate) of the centroid is the
  median of the abscissae (ordinates) of the points in A. The weight W(A) of a
  cluster A is

$$ W(A) = \sum_{x \in A} \|x - C_A\|_1 $$

  while the weight of a k-partition {A₁, A₂, ..., A_k} (also called a k-clustering) is

$$ W(A_1, A_2, \dots, A_k) = \sum_{i=1}^{k} W(A_i). $$
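Under the ℓ1-norm these definitions are easy to evaluate directly: the centroid is the component-wise median, and the weight is the sum of the ℓ1 distances to it. A minimal sketch (the helper names `l1_centroid` and `l1_weight` are ours):

```python
from statistics import median

def l1_centroid(points):
    # Component-wise median: a minimizer of the sum of l1 distances.
    xs, ys = zip(*points)
    return (median(xs), median(ys))

def l1_weight(points):
    cx, cy = l1_centroid(points)
    return sum(abs(x - cx) + abs(y - cy) for x, y in points)

A = [(0, 0), (1, 2), (4, 1)]
print(l1_centroid(A))  # (1, 1)
print(l1_weight(A))    # 2 + 1 + 3 = 6
```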

  The classical Clustering Problem in the plane assuming the Manhattan norm is
  stated as follows.

  Definition 1 (Clustering Problem in R² under ℓ1-norm). Given a point
  set X ⊂ R² of cardinality n and an integer k, 1 < k < n, find a k-clustering
  {A₁, A₂, ..., A_k} that minimizes the weight W(A₁, A₂, ..., A_k) = Σ_{i=1}^k W(A_i).

  Note that here k is included in the instance of the problem. If k is fixed, the problem
  is called k-Clustering in R². Our main contribution in this paper concerns
  2-Clustering in R² under the ℓ1-norm.
     We are interested in a version of the clustering problem where the cluster sizes
  are constrained. Formally, the problem can be stated as follows:

  Definition 2 (Size Constrained Clustering Problem in R² under ℓ1-norm).
  Given a point set X ⊂ R² of cardinality n, an integer k > 1 and
  k positive integers m₁, m₂, ..., m_k such that Σ_{i=1}^k m_i = n, find a k-clustering
  {A₁, A₂, ..., A_k} with |A_i| = m_i for i = 1, ..., k, that minimizes the weight
  W(A₁, A₂, ..., A_k) = Σ_{i=1}^k W(A_i).

  We denote this problem by SCC-2 (under the ℓ1-norm). We stress that in the SCC-2
  problem the integers n, k, m₁, ..., m_k are part of the instance. On the contrary,
  if k is fixed and does not belong to the instance, the problem is denoted by
  k-SCC-2.



      In the following sections we present an algorithm for the 2-Clustering problem
  in R² under the ℓ1-norm, which at the same time solves the full version of the
  2-SCC-2 problem, i.e. it yields the solutions to all 2-SCC-2 problems for every
  value of m₁ ∈ {1, 2, ..., ⌊n/2⌋} and the same input point set X = {x₁, x₂, ..., x_n} ⊂ R².
  The procedure works in O(n³ log n) time and is based on a separation
  result concerning the optimal solutions of 2-Clustering in the plane, presented
  in Section 3, and on an augmented version of red-black trees used to maintain
  bipartitions of a set of real points, described in Section 4.


  3      Separation results
  In this section we present a separation result concerning the optimal solutions of
  the 2-SCC-2 problem under the ℓ1-norm. It turns out that the clusters of the optimal
  solutions of this problem are separated by plane curves of 8 different types, which
  we will call C_i and S_i, i = 1, 2, 3, 4, respectively. Similar results are obtained in [4]
  for higher dimensional spaces and ℓp-norms, with p > 1.
      First, we show that in an optimal 2-clustering, swapping two elements
  in different clusters does not decrease the value of the solution.
  Lemma 1 (Swapping Lemma). Let {A, B} be an optimal solution of the 2-SCC-2
  problem. Then, for any a ∈ A and b ∈ B, it holds that

$$ \|a - C_A\|_1 + \|b - C_B\|_1 \le \|a - C_B\|_1 + \|b - C_A\|_1. $$

  Proof. For the sake of simplicity, we omit the subscript 1 when denoting the norm.
  By contradiction, assume that for some a ∈ A and b ∈ B

$$ \|a - C_A\| + \|b - C_B\| > \|a - C_B\| + \|b - C_A\|. $$

  Then, the weight of the optimal solution satisfies

$$
\begin{aligned}
W(A, B) &= \sum_{x \in A} \|x - C_A\| + \sum_{x \in B} \|x - C_B\| \\
&> \sum_{x \in A \setminus \{a\}} \|x - C_A\| + \|b - C_A\| + \sum_{x \in B \setminus \{b\}} \|x - C_B\| + \|a - C_B\| \\
&= \sum_{x \in (A \setminus \{a\}) \cup \{b\}} \|x - C_A\| + \sum_{x \in (B \setminus \{b\}) \cup \{a\}} \|x - C_B\|.
\end{aligned}
$$

  Now, the 2-clustering {A′, B′}, with A′ = (A ∖ {a}) ∪ {b} and B′ = (B ∖ {b}) ∪ {a},
  is a feasible solution, since |A′| = |A| and |B′| = |B|.
  Furthermore, by the definition of centroid it holds that

$$ \sum_{x \in A'} \|x - C_A\| + \sum_{x \in B'} \|x - C_B\| \;\ge\; \sum_{x \in A'} \|x - C_{A'}\| + \sum_{x \in B'} \|x - C_{B'}\| = W(A', B') $$

  and hence W(A, B) > W(A′, B′). This contradicts the optimality of {A, B}. ⊓⊔
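The Swapping Lemma can be checked numerically: find an optimal size-constrained bipartition of a small point set by brute force (using the component-wise median as centroid) and verify the inequality for every pair a ∈ A, b ∈ B. A sketch under these assumptions (all names ours):

```python
from itertools import combinations
from statistics import median

def d1(p, q):
    # Manhattan distance in the plane
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def centroid(pts):
    xs, ys = zip(*pts)
    return (median(xs), median(ys))

def weight(pts):
    c = centroid(pts)
    return sum(d1(p, c) for p in pts)

X = [(0, 0), (1, 3), (2, 1), (7, 6), (8, 8), (9, 5)]
m = 3  # size constraint |A| = m
A = min((set(S) for S in combinations(X, m)),
        key=lambda S: weight(S) + weight(set(X) - S))
B = set(X) - A
cA, cB = centroid(A), centroid(B)
# Swapping Lemma: exchanging any a in A with any b in B cannot help.
assert all(d1(a, cA) + d1(b, cB) <= d1(a, cB) + d1(b, cA)
           for a in A for b in B)
```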



  We are now able to obtain the general form of the equation of the curves
  separating the clusters in an optimal 2-clustering.

  Theorem 1 (Separation Result). In an optimal solution {A, B} of the 2-SCC-2
  problem, the clusters A and B are separated by a curve of equation

$$ \|z - \alpha\|_1 - \|z - \beta\|_1 = g \qquad (1) $$

  where the variable z ranges over R², while α, β ∈ R² and g ∈ R are suitable
  constants.

  Proof. Let C_A and C_B be the centroids of A and B, respectively. Then, by Lemma 1
  the inequality

$$ \|a - C_A\|_1 - \|a - C_B\|_1 \le \|b - C_A\|_1 - \|b - C_B\|_1 $$

  is satisfied for all a ∈ A and b ∈ B, and hence it also holds taking the
  maximum over a on the left and the minimum over b on the right:

$$ c_A := \max_{a \in A} \{ \|a - C_A\|_1 - \|a - C_B\|_1 \} \;\le\; \min_{b \in B} \{ \|b - C_A\|_1 - \|b - C_B\|_1 \} =: c_B. $$

  Therefore, choosing g ∈ [c_A, c_B], all the points of A and B are contained in the
  regions defined, respectively, by ‖z − C_A‖₁ − ‖z − C_B‖₁ ≤ g and ‖z − C_A‖₁ −
  ‖z − C_B‖₁ ≥ g. The common boundary of the two regions has equation
  ‖z − C_A‖₁ − ‖z − C_B‖₁ = g, and thus the proof is concluded by setting α = C_A
  and β = C_B. ⊓⊔

  Remark 1. It is worth observing that both the Swapping Lemma and the
  Separation Result extend to the case of the ℓp-norm with any p ≥ 1 and
  dimension d ≥ 2, provided the norm is raised to the p-th power [12].

  Setting z = (x, y), α = (c, d), β = (e, f), equation (1) becomes

$$ |x - c| - |x - e| + |y - d| - |y - f| = g. $$

  This equation always represents a connected curve in the plane, which may
  have some sharp corner points, i.e. points where one or both partial derivatives
  do not exist. A rather standard study of this equation, conducted by considering
  all possible values of c, d, e, f, g, yields 8 types of separating curves, which we
  denote by the symbols C₁, C₂, C₃, C₄ and S₁, S₂, S₃, S₄ according to their shape.
  The proof is omitted here for lack of space. The 8 types of curves are depicted
  in Figures 1–4.
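To make the role of equation (1) concrete, the following sketch (names ours) classifies a point by the sign of ‖z − α‖₁ − ‖z − β‖₁ − g: a negative sign places z in α's region, a positive sign in β's region, and the separating curve itself is the zero set:

```python
def side(z, alpha, beta, g):
    # Sign of ||z - alpha||_1 - ||z - beta||_1 - g
    f = (abs(z[0] - alpha[0]) + abs(z[1] - alpha[1])
         - abs(z[0] - beta[0]) - abs(z[1] - beta[1]) - g)
    return (f > 0) - (f < 0)

# With alpha = (0, 0), beta = (4, 0), g = 0 the curve is the bisector x = 2.
print(side((1, 0), (0, 0), (4, 0), 0))   # -1: alpha's side
print(side((3, 5), (0, 0), (4, 0), 0))   # +1: beta's side
print(side((2, 7), (0, 0), (4, 0), 0))   #  0: on the curve
```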


  4      Bipartition red-black trees
  In this section we describe a natural data structure for maintaining a bipartition of
  a set of real numbers, with possible repeated values, which is able to test membership,
  to execute insertions and deletions, and to perform two further operations: selecting
  the i-th smallest element in either of the two clusters of the bipartition, for any
  integer i, and moving an element from one cluster to the other.
     Let X be a finite multiset of real numbers, i.e. a finite set of reals in which
  any value may occur several times, represented as a finite ordered sequence

$$ X = (x_1, x_2, \dots, x_n), \quad x_i \in \mathbb{R} \text{ for every } i, \quad x_1 \le x_2 \le \dots \le x_n. $$

  Moreover, consider a bipartition {A, B} of X, i.e. two non-empty subsets A ⊆ X,
  B ⊆ X such that A ∩ B = ∅ and A ∪ B = X. In the following, A and B are
  called clusters. The operations Member, Insert and Delete can be defined so as
  to specify the cluster the elements belong to. More precisely, for every z ∈ R,
  Member(z) = (j₁, j₂), where j₁ (resp. j₂) is the number of x_i's in A (resp. in B)
  such that x_i = z; Insert(z, A) adds a new element z to A, and Insert(z, B) does the
  same for B; Delete(z, A) and Delete(z, B) execute the corresponding delete operations.

  [Figure 1 and Figure 2: separating curves of types C1, S1, C2 and C1, S2, C2,
  drawn in the (x, y)-plane with abscissae c, e, ordinates d, f, and the points α, β.]

  [Figure 3 and Figure 4: separating curves of types C3, S3, C4 and C3, S4, C4,
  drawn with the same conventions.]





      Moreover, for every i ∈ {1, 2, ..., n}, we define

$$ \mathrm{Select}_A(i) = \begin{cases} a & \text{if } i \le |A| \text{ and } a \text{ is the } i\text{-th smallest element of } A \\ \bot & \text{if } i > |A| \end{cases} $$

$$ \mathrm{Select}_B(i) = \begin{cases} b & \text{if } i \le |B| \text{ and } b \text{ is the } i\text{-th smallest element of } B \\ \bot & \text{if } i > |B|. \end{cases} $$

  Finally, for any z ∈ R, if A contains an element x_i = z then Move_{AB}(z) deletes
  a value z from A and adds it to B; otherwise Move_{AB}(z) = ⊥. The operation
  Move_{BA}(z) is defined symmetrically.
      A natural data structure, able to execute the above operations in time
  logarithmic in n = |X|, is an augmented version of the well-known
  red-black trees [7]. Formally, we define a Bipartition Red-Black Tree for a bipartition
  {A, B} of X as a binary tree T such that: (i) every internal node has both
  a left and a right son, (ii) every node is red or black, (iii) the root and the
  leaves are black, (iv) if a node is red then its sons are black, and (v) for every node v,
  all simple paths from v to any leaf contain the same number of black nodes. Further,
  the distinct elements of X (called keys) are assigned to the internal nodes in
  a bijective correspondence, respecting the standard rule of binary search trees,
  and the leaves are empty (usually represented by "nil"). Moreover, T satisfies
  the following two conditions:
   1. every internal node v contains a triple (z, m_A(v), m_B(v)), where z is the key
      assigned to v, and m_A(v) (respectively, m_B(v)) is the number of elements x_i
      belonging to A (resp. B) such that x_i = z;
   2. every internal node v contains the pair of integers (c_A(v), c_B(v)), where c_A(v)
      (resp. c_B(v)) is the number of elements of X belonging to A (resp. B) whose
      value is assigned to a vertex in the subtree rooted at v.

      For each internal node v, we denote by key(v), left(v) and right(v) the key
  contained in v, the left son of v and the right son of v, respectively. Clearly, v can be
  implemented as a record including the values m_A(v), m_B(v), c_A(v), c_B(v).
      Procedures Member, Insert and Delete can be defined as in standard red-black
  trees [7] with obvious changes. Clearly, Insert and Delete have to update
  the values m_A(v), m_B(v), c_A(v), c_B(v) for all nodes v on the path from the
  root to the vertex resulting from the binary search.
      Operations Select and Move can be executed as follows. To compute Select_A(i)
  one checks whether i ≤ c_A(r), where r is the root of T: in the affirmative case
  a standard searching procedure (whose definition we omit for lack of space) is called,
  otherwise the value ⊥ is returned. Select_B(i) is defined in a similar way.
      To compute Move_{AB}(z) one first determines the node v containing z by a
  rather standard search procedure, and then executes the assignments

$$ m_A(v) := m_A(v) - 1 \quad \text{and} \quad m_B(v) := m_B(v) + 1. $$

  Procedure Move_{BA}(z) is defined symmetrically.
      Clearly, all the previous procedures can be executed in O(log n) time.
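The operations above can be prototyped with plain sorted lists before committing to the augmented red-black tree; the sketch below (class name ours) has the same semantics, but Insert, Delete and Move cost O(n) per operation instead of the O(log n) the tree achieves:

```python
import bisect

class Bipartition:
    # Sorted-list stand-in for the Bipartition Red-Black Tree: same
    # operations, but updates cost O(n) rather than O(log n).
    def __init__(self):
        self.A, self.B = [], []  # each kept sorted, duplicates allowed

    def member(self, z):
        # Multiplicities of z in A and in B
        cnt = lambda s: bisect.bisect_right(s, z) - bisect.bisect_left(s, z)
        return (cnt(self.A), cnt(self.B))

    def insert(self, z, side):
        bisect.insort(self.A if side == 'A' else self.B, z)

    def delete(self, z, side):
        (self.A if side == 'A' else self.B).remove(z)

    def select(self, i, side):
        # i-th smallest element (1-based); None plays the role of ⊥
        s = self.A if side == 'A' else self.B
        return s[i - 1] if 1 <= i <= len(s) else None

    def move_AB(self, z):
        # Move one occurrence of z from A to B (None if z is not in A)
        if z not in self.A:
            return None
        self.A.remove(z)
        bisect.insort(self.B, z)
        return z

bp = Bipartition()
for v in (3, 1, 3):
    bp.insert(v, 'A')
bp.insert(2, 'B')
print(bp.member(3))       # (2, 0)
print(bp.select(2, 'A'))  # 3
bp.move_AB(3)
print(bp.member(3))       # (1, 1)
```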



  5    Updating the weight of clusters
  Our main result, presented in the next section, is based on a procedure that
  updates a current bipartition by moving one point from one cluster to the other.
  Here we show how to update the centroids and the weights of the clusters
  of the current bipartition after executing such a move operation.
      To this end, we first deal with the problem of updating the centroid and the weight
  of a set of real numbers subject to insert and delete operations. The results are
  easily extended to a set of points in R².
      Let A be a cluster in R, i.e. a finite ordered sequence of real numbers A =
  {a₁, a₂, ..., a_k}, where a₁ ≤ a₂ ≤ ... ≤ a_k. Assuming the ℓ1-norm, it is well known
  that the centroid of A is the median of the set, that is the value

$$ M(A) = \begin{cases} a_{\frac{k+1}{2}} & \text{if } k \text{ is odd} \\[3pt] \frac{1}{2}\big( a_{\frac{k}{2}} + a_{\frac{k}{2}+1} \big) & \text{if } k \text{ is even.} \end{cases} $$

  Thus, if A is a cluster of a bipartition (of a finite multiset X ⊂ R of n elements)
  maintained by a Bipartition Red-Black Tree, then M(A) can be computed in
  O(log n) time by calling procedure Select_A(j) for suitable values of j.
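Both the median M(A) and the weight W(A) = Σᵢ |a_i − M(A)| can be read off a sorted sequence; since the two halves around the median are balanced, the weight reduces to the difference between the upper and the lower half sums. A sketch of this computation (function name ours):

```python
def median_and_weight(a):
    # a: sorted list of reals. Returns (M(A), sum of |a_i - M(A)|).
    # The halves around the median are balanced, so the weight equals
    # (sum of the upper half) - (sum of the lower half).
    k = len(a)
    M = a[k // 2] if k % 2 else (a[k // 2 - 1] + a[k // 2]) / 2
    L = sum(a[: k // 2])       # lower half (median excluded when k is odd)
    R = sum(a[k - k // 2:])    # upper half
    return M, R - L

a = [1, 2, 3, 4, 7]
M, W = median_and_weight(a)
print(M, W)  # 3 8
assert W == sum(abs(x - M) for x in a)
```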
      Now, let us recall the definition of the left and right parts of the weight of A:
  given m = (k+1)/2, we have

$$ L(A) := \sum_{1 \le i < m} a_i, \qquad R(A) := \sum_{m < i \le k} a_i $$

  […] x_i > 0 and y_i > 0 for all i.
      We define a procedure that searches for an optimal bipartition of X whose
  clusters are separated by a curve of type C1. The procedure first sorts X according
  to three parameters: the two coordinates and their sum, i.e. the values x_i, y_i and
  x_i + y_i, respectively. Then the elements of X are processed by three nested for-loops,
  corresponding to the three parameters considered above.
  The algorithm evaluates all possible bipartitions of X whose clusters are
  separated by curves of type C1 and, for every admissible cluster size, only the
  bipartition with minimum weight is maintained. These bipartitions are considered
  in a linear order and, at each step, a new bipartition of X is obtained from
  the current one {A, B} by moving a point from cluster B to cluster A.
      The current bipartition {A, B} is maintained by two Bipartition Red-Black
  Trees: one for the 2-clustering {A_x, B_x} of the set {x_i | p_i = (x_i, y_i) ∈ X},
  where A_x = {x_i | p_i ∈ A} and B_x = {x_i | p_i ∈ B}, and the other for the
  2-clustering {A_y, B_y} of {y_i | p_i = (x_i, y_i) ∈ X}, where A_y = {y_i | p_i ∈ A} and
  B_y = {y_i | p_i ∈ B}.



      An Update subroutine can be defined which, on input (A, B, p_i), where
  {A, B} is the current bipartition of X and p_i ∈ B, computes the new bipartition
  obtained by moving p_i from B to A and returns the weights of its clusters. The
  subroutine applies the procedures described in Sections 4 and 5 and is defined by the
  following scheme:

  Procedure Update(A, B, p_i)
  begin
       Move_{B_x A_x}(x_i)
       Move_{B_y A_y}(y_i)
       w(A_x) := UpdateW-ins(A_x, x_i)
       w(B_x) := UpdateW-del(B_x, x_i)
       w(A_y) := UpdateW-ins(A_y, y_i)
       w(B_y) := UpdateW-del(B_y, y_i)
       return (w(A_x) + w(A_y), w(B_x) + w(B_y))
  end

      By the discussion in Sections 4 and 5, this subroutine also requires
  only O(log n) time.
      Thus, the overall algorithm is described in detail by the procedure given below,
  where three bipartitions are actually maintained, denoted by (A, B), (A′, B′)
  and (A″, B″), respectively. Each of them needs two Bipartition Red-Black Trees,
  one for the abscissae and the other for the ordinates.
      Moreover, the instruction Sort_x (resp., Sort_y and Sort_{x+y}) yields an ordered list
  of the indices of the points in X, sorted in non-decreasing order with respect to
  the values x_i (resp., y_i and x_i + y_i).
      Taking into account the previous discussion, it is easy to see that the algorithm
  works in O(n³ log n) time; moreover, it has space complexity O(n), since
  each Bipartition Red-Black Tree only requires linear space.

  Procedure Full-biclustering(p₁, p₂, ..., p_n)
  begin
       (i₁, i₂, ..., i_n) := Sort_x(1, 2, ..., n)
       (j₁, j₂, ..., j_n) := Sort_y(1, 2, ..., n)
       (k₁, k₂, ..., k_n) := Sort_{x+y}(1, 2, ..., n)
       A := ∅ ; w(A) := 0
       B := {p₁, p₂, ..., p_n}
       compute the weight w(B) of cluster B
       i₀ := 0 ; j₀ := 0 ; x₀ := 0 ; y₀ := 0
       for r = 0, 1, ..., n do
            A′ := A ; B′ := B ; w(A′) := w(A) ; w(B′) := w(B)
            for s = 0, 1, ..., n do
                 A″ := A′ ; B″ := B′ ; w(A″) := w(A′) ; w(B″) := w(B′)
                 for k = k₁, k₂, ..., k_n do
                      if x_k ≥ x_{i_r} ∧ y_k ≥ y_{j_s} then
                           (w(A″), w(B″)) := Update(A″, B″, p_k)
                           h := min(|A″|, |B″|)
                           z := w(A″) + w(B″)
                           if W[h] > z then
                                W[h] := z ; Π[h] := (i_r, j_s, k)
                 if s ≠ n ∧ x_{j_{s+1}} ≥ x_{i_r} then
                      (w(A′), w(B′)) := Update(A′, B′, p_{j_{s+1}})
                      h := min(|A′|, |B′|)
                      z := w(A′) + w(B′)
                      if W[h] > z then
                           W[h] := z ; Π[h] := (i_r, j_{s+1}, 0)
            if r ≠ n then
                 (w(A), w(B)) := Update(A, B, p_{i_{r+1}})
                 h := min(|A|, |B|)
                 z := w(A) + w(B)
                 if W[h] > z then
                      W[h] := z ; Π[h] := (i_{r+1}, 0, 0)
       return (W, Π)
  end
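For cross-checking an implementation of the procedure above on small inputs, the table W can also be computed by exhaustive search over all bipartitions. The following brute-force sketch (exponential in n; the function names are ours) returns the optimal weight for each admissible size h = |A|:

```python
from itertools import combinations
from statistics import median

def w(pts):
    # l1 weight of a cluster: distances to the component-wise median
    if not pts:
        return 0
    mx = median(p[0] for p in pts)
    my = median(p[1] for p in pts)
    return sum(abs(x - mx) + abs(y - my) for x, y in pts)

def full_2clustering_bruteforce(X):
    # Best constrained 2-clustering weight for every size h = |A| <= n/2
    n = len(X)
    return {h: min(w(A) + w([p for p in X if p not in A])
                   for A in combinations(X, h))
            for h in range(1, n // 2 + 1)}

X = [(0, 0), (1, 1), (5, 5), (6, 4)]
print(full_2clustering_bruteforce(X))  # {1: 9, 2: 4.0}
```

On tiny inputs this serves only as an oracle: the separation-based procedure should produce the same array W, while running in polynomial time.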


  7     Conclusions

  In this paper we have presented an O(n³ log n)-time algorithm for the size-constrained
  2-Clustering problem in the plane with the ℓ1-norm. Our results
  strictly depend on these conditions, and it is not evident (at least at present)
  whether they hold under different hypotheses or for more general problems.
  Moreover, since our algorithm is exact and works in polynomial time, we do not
  think it necessary to present here, under our hypotheses (2-clustering in R² with the
  ℓ1-norm), an experimental comparison with traditional heuristics like k-Means,
  which instead yields an approximate solution, tackles a more general problem
  with a larger number of clusters, and works in exponential time in the worst case.
       However, we think the present contribution could be the starting point
  of a further analysis, still based on the separation result introduced in [4], of
  more general clustering problems, with dimension greater than 2, a larger number
  of clusters, different norms and other constraints. Such a research
  direction should also include an experimental activity comparing with classical
  heuristics.
       We also remark that we have recently obtained some results under other
  hypotheses. In particular, in a companion paper we present more efficient algorithms,
  based on rather different techniques of computational geometry, for solving exactly
  the same problem with the Euclidean norm. More precisely, it turns
  out that the 2-Clustering problem in R² under the Euclidean norm, with size
  constraint k, can be solved by an exact algorithm in O(n ∛k log² n) time, while the
  full version of the same problem (i.e. for all k = 1, 2, ..., ⌊n/2⌋) is solvable in
  O(n² log n) time.



  References
   1. D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean
      sum-of-squares clustering. Machine Learning, 75:245–249, 2009.
   2. D. Aloise and P. Hansen. On the complexity of minimum sum-of-squares clustering.
      Technical report, Les Cahiers du GERAD, 2007.
   3. S. Basu, I. Davidson, and K. Wagstaff. Constrained Clustering: Advances in Algo-
      rithms, Theory, and Applications. Chapman and Hall/CRC, 2008.
   4. A. Bertoni, M. Goldwurm, J. Lin, and F. Saccà. Size Constrained Distance Clus-
      tering: Separation Properties and Some Complexity Results. Fundamenta Infor-
      maticae, 115(1):125–139, 2012.
   5. C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
   6. P. S. Bradley, K. P. Bennett, and A. Demiriz. Constrained K-Means Clustering.
      Technical Report MSR-TR-2000-65, Microsoft Research, May 2000.
   7. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algo-
      rithms. MIT Press, 2nd edition, 2001.
   8. S. Dasgupta. The hardness of k-means clustering. Technical Report CS2007-0890,
      Dept. Computer Sc. and Eng., Univ. of California, San Diego, 2007.
   9. W. D. Fisher. On grouping for maximum homogeneity. Journal of the American
      Statistical Association, 53(284):789–798, 1958.
  10. S. Hasegawa, H. Imai, M. Inaba, and N. Katoh. Efficient algorithms for variance-
      based k-clustering. In Proceedings of Pacific Graphics ’93, pages 75–89, 1993.
  11. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning:
      Data Mining, Inference, and Prediction. Springer-Verlag, 2nd edition, 2009.
  12. J. Lin. Exact algorithms for size constrained clustering. PhD Thesis, Università
      degli Studi di Milano. Ledizioni Publishing, Milano, 2013.
  13. J. B. MacQueen. Some methods for classification and analysis of multivariate
      observations. In Proc. 5th Berkeley Symposium on Mathematical Statistics and
      Probability, pages 281–297, 1967.
  14. M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem
      is NP-hard. Theoretical Computer Science, 442:13–21, 2012.
  15. K. Sabo. Center-based l1-clustering method. International Journal of Applied
      Mathematics and Computer Science, 24(1):7–225, 2014.
  16. A. Tung, J. Han, L. Lakshmanan, and R. Ng. Constraint-Based Clustering in Large
      Databases. In J. Van den Bussche and V. Vianu, editors, Database Theory ICDT
      2001, volume 1973 of LNCS, pages 405–419. Springer Berlin/Heidelberg, 2001.
  17. A. Vattani. K-means requires exponentially many iterations even in the plane. In
      Proceedings of the 25th Symposium on Computational Geometry (SoCG), 2009.
  18. K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In Proc. of
      the 17th Intl. Conf. on Machine Learning, pages 1103–1110, 2000.
  19. S. Zhu, D. Wang, and T. Li. Data clustering with size constraints. Knowledge-
      Based Systems, 23(8):883–889, 2010.



