=Paper=
{{Paper
|id=Vol-1231/long2
|storemode=property
|title=Size-constrained 2-clustering in the plane with Manhattan distance
|pdfUrl=https://ceur-ws.org/Vol-1231/long2.pdf
|volume=Vol-1231
|dblpUrl=https://dblp.org/rec/conf/ictcs/BertoniGLP14
}}
==Size-constrained 2-clustering in the plane with Manhattan distance==
Alberto Bertoni, Massimiliano Goldwurm, Jianyi Lin, Linda Pini
Dipartimento di Informatica, Università degli Studi di Milano
Via Comelico 39/41, 20135 Milano, Italy
Abstract. We present an algorithm for the 2-clustering problem with cluster size constraints in the plane assuming ℓ1-norm, that works in O(n³ log n) time and O(n) space. Such a procedure also solves a full version of the problem, computing the optimal solutions for all possible constraints on cluster sizes. The algorithm is based on a separation result concerning the clusters of any optimal solution of the problem and on an extended version of red-black trees to maintain a bipartition of a set of points in the plane.
Keywords: algorithms and data structures, clustering, cluster size constraints,
Manhattan distance.
1 Introduction
Clustering is one of the most used techniques in statistical data analysis and machine learning [5], with a wide range of applications in areas like pattern recognition, bioinformatics and image processing. Clustering consists in partitioning data into groups, called clusters, that are required to be homogeneous and well-separated according to a data similarity measure [9]. A natural representation of the data elements is by means of points in a d-dimensional space equipped with a suitable distance function that determines the similarity measure. We study this common setting, called distance clustering.
A central problem in distance clustering is the well-known Minimum Sum-of-Squares Clustering (MSSC), also called Variance-based Clustering [2, 10]. The MSSC problem requires to find a k-partition {A1, ..., Ak} of a given set X = {x1, ..., xn} ⊂ R^d of n points that minimizes the weight

    W(A1, ..., Ak) = Σ_{i=1}^{k} Σ_{x ∈ Ai} ||x − C_{Ai}||2²

where ||·||2 denotes the Euclidean norm and C_{Ai} is the mean point of the cluster Ai, defined by

    C_{Ai} = argmin_{μ ∈ R^d} Σ_{x ∈ Ai} ||μ − x||2² = (1/|Ai|) Σ_{x ∈ Ai} x.
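As a concrete illustration (not part of the original paper), the MSSC objective can be evaluated directly from this definition; the sketch below computes each centroid as the coordinate-wise mean and sums the squared Euclidean distances:

```python
def mssc_weight(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for A in clusters:
        d = len(A[0])
        # centroid C_A: coordinate-wise mean of the points in A
        c = [sum(p[j] for p in A) / len(A) for j in range(d)]
        total += sum(sum((p[j] - c[j]) ** 2 for j in range(d)) for p in A)
    return total
```

For example, the 2-partition {{(0,0), (0,2)}, {(5,0)}} has centroids (0,1) and (5,0), giving weight 1 + 1 + 0 = 2.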
MSSC is difficult in general; indeed for an arbitrary dimension d (given in the
instance) the problem is NP-hard even if the number of clusters is fixed to k = 2
[1, 8]; the same occurs if k is arbitrary and the dimension is fixed to d = 2 [14].
However, there is a well-known heuristic for finding an approximate solution of MSSC, called k-Means [13], which is usually very fast in practice but can require exponential time in the worst case [17].
Moreover, a known variant of MSSC, called the k-Medians Problem, is obtained by substituting the squared Euclidean norm ||·||2² with the ℓ1-norm ||·||1 [15].
In many applications one often has some background information on the clusters [3]: including this information in traditional clustering algorithms can improve the clustering performance. Problems that include such background information are called constrained clustering problems and fall into two classes. On the one hand, clustering problems with point-level constraints typically comprise a set of must-link or cannot-link constraints [18], defining pairs of points that, respectively, must be or cannot be grouped in the same cluster. On the other hand, clustering problems with cluster-level constraints [6, 16] provide constraints concerning the size of the resulting clusters. For example, in [19] cluster size constraints are used for improving clustering accuracy; this approach, for instance, allows one to avoid the extremely small or large clusters yielded by classical clustering techniques.
In this work we study the 2-clustering problem in the plane, with cluster size constraints, assuming ℓ1-norm. The relevance of 2-clustering problems is also due to the wide use of hierarchical clustering techniques [11, Sec. 14.3.12], which repeatedly apply 2-clustering as the key step. This leads to natural size-constrained versions of so-called divisive hierarchical clustering.
We recall that the 2-clustering problem with cluster size constraints has been studied in [12, 4], where it is shown that in dimension 1 the problem is solvable in polynomial time for every norm ℓp with integer p ≥ 1, while there is some evidence that the same result does not hold for non-integer p. Moreover, it is also known that for arbitrary dimension d the same problem is NP-hard even assuming equal sizes of the two clusters.
Here we prove that, assuming dimension 2 and ℓ1-norm, the 2-clustering problem with cluster size constraints can be solved in O(n³ log n) time and O(n) space. Our procedure actually solves a full version of the problem, computing the optimal solutions for all possible cluster sizes. Clearly, this also yields the solution to the general unconstrained 2-clustering problem in the plane with ℓ1-norm. The algorithm is based on a separation result stating that, under our hypotheses, the clusters of any optimal solution are separated by curves of some special forms, which can be analysed in a suitable linear order. The algorithm also makes use of a particular extension of red-black trees to maintain a bipartition of a set of points in the plane.
We remark that in this work we propose an efficient method for obtaining a
solution that is globally optimal, instead of a locally optimal solution as yielded
by various heuristics [3, 6, 19].
2 Problem definition
In this section we introduce some basic notions of distance-based clustering problems in the plane assuming the Manhattan norm (also called ℓ1-norm), which is defined by ||x||1 = |x1| + |x2| for any x = (x1, x2) ∈ R².
Given a set X ⊂ R² of n points in the plane, a k-partition of X is a family {A1, A2, ..., Ak} of k nonempty subsets of X such that ∪_{i=1}^{k} Ai = X and Ai ∩ Aj = ∅ for i ≠ j. Each Ai is called a cluster. The centroid C_A of a cluster A ⊂ X is

    C_A = argmin_{μ ∈ R²} Σ_{x ∈ A} ||x − μ||1 = argmin_{μ ∈ R²} Σ_{x ∈ A} (|x1 − μ1| + |x2 − μ2|)
Since the objective function in the minimization is not strictly convex, the cen-
troid CA is not necessarily unique. Indeed it is well-known that the centroid CA
is the component-wise median, i.e. the abscissa (ordinate) of the centroid is the
median of the abscissae (ordinates) of the points in A. The weight W(A) of a cluster A is

    W(A) = Σ_{x ∈ A} ||x − C_A||1

while the weight of a k-partition {A1, A2, ..., Ak} (also called a k-clustering) is

    W(A1, A2, ..., Ak) = Σ_{i=1}^{k} W(Ai).
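For concreteness, the component-wise median centroid and the cluster weight under ℓ1 can be computed as follows (an illustrative sketch, not part of the original text; ties for even cardinality are resolved by averaging the two central values, one of the admissible centroids):

```python
def l1_centroid(A):
    """Component-wise median: a minimizer of the sum of l1 distances over A."""
    n = len(A)
    def med(v):
        s = sorted(v)
        return s[n // 2] if n % 2 == 1 else (s[n // 2 - 1] + s[n // 2]) / 2
    return tuple(med(col) for col in zip(*A))

def l1_weight(A):
    """Weight W(A): sum of l1 distances from the points of A to the centroid."""
    c = l1_centroid(A)
    return sum(sum(abs(x - ci) for x, ci in zip(p, c)) for p in A)
```

For instance, A = {(0,0), (2,0), (10,0)} has centroid (2,0) and weight 2 + 0 + 8 = 10.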
The classical Clustering Problem in the plane assuming the Manhattan norm is
stated as follows.
Definition 1 (Clustering Problem in R² under ℓ1-norm). Given a point set X ⊂ R² of cardinality n and an integer k, 1 < k < n, find a k-clustering {A1, A2, ..., Ak} that minimizes the weight W(A1, A2, ..., Ak) = Σ_{i=1}^{k} W(Ai).
Note that here k is included in the instance of the problem. If k is fixed, the problem is called k-Clustering in R². Our main contribution in this paper concerns 2-Clustering in R² under ℓ1-norm.
We are interested in a version of clustering problem where the cluster sizes
are constrained. Formally, the problem can be stated as follows:
Definition 2 (Size Constrained Clustering Problem in R² under ℓ1-norm). Given a point set X ⊂ R² of cardinality n, an integer k > 1 and k positive integers m1, m2, ..., mk such that Σ_{i=1}^{k} mi = n, find a k-clustering {A1, A2, ..., Ak} with |Ai| = mi for i = 1, ..., k, that minimizes the weight W(A1, A2, ..., Ak) = Σ_{i=1}^{k} W(Ai).
We denote this problem by SCC-2 (under ℓ1-norm). We stress that in the SCC-2 problem the integers n, k, m1, ..., mk are part of the instance. On the contrary, if k is fixed and does not belong to the instance, the problem is denoted by k-SCC-2.
In the following sections we present an algorithm for the 2-Clustering problem in R² under ℓ1-norm, which at the same time solves the full version of the 2-SCC-2 problem, i.e. it yields the solutions to all 2-SCC-2 problems for every value of m1 ∈ {1, 2, ..., ⌊n/2⌋} on the same input point set X = {x1, x2, ..., xn} ⊂ R². The procedure works in O(n³ log n) time and is based on a separation result concerning the optimal solutions of 2-Clustering in the plane, presented in Section 3, and on an augmented version of red-black trees used to maintain bipartitions of a set of real points, described in Section 4.
3 Separation results
In this section we present a separation result concerning the optimal solutions of the 2-SCC-2 problem under ℓ1-norm. It turns out that the clusters of the optimal solutions of this problem are separated by plane curves of 8 different types, which we will call Ci and Si, i = 1, 2, 3, 4, respectively. Similar results are obtained in [4] for higher-dimensional spaces and ℓp-norms, with p > 1.
First, we show that in an optimal 2-clustering the swapping of two elements in different clusters does not decrease the solution value.
Lemma 1 (Swapping Lemma). Let {A, B} be an optimal solution of the 2-
SCC-2 problem. Then, for any a ∈ A and b ∈ B, it holds
||a − CA ||1 + ||b − CB ||1 ≤ ||a − CB ||1 + ||b − CA ||1 .
Proof. For the sake of simplicity, we omit the subscript 1 when denoting the norm. By contradiction, let us assume that for some a ∈ A and b ∈ B

    ||a − CA|| + ||b − CB|| > ||a − CB|| + ||b − CA|| .
Then, the weight of the optimal solution satisfies

    W(A, B) = Σ_{x ∈ A} ||x − CA|| + Σ_{x ∈ B} ||x − CB||
            > Σ_{x ∈ A∖{a}} ||x − CA|| + ||b − CA|| + Σ_{x ∈ B∖{b}} ||x − CB|| + ||a − CB||
            = Σ_{x ∈ (A∖{a})∪{b}} ||x − CA|| + Σ_{x ∈ (B∖{b})∪{a}} ||x − CB|| .

Now, it is clear that the 2-clustering {A′, B′}, with A′ = (A ∖ {a}) ∪ {b} and B′ = (B ∖ {b}) ∪ {a}, is a feasible solution since |A′| = |A| and |B′| = |B|. Furthermore, by definition of centroid it holds:

    Σ_{x ∈ A′} ||x − CA|| + Σ_{x ∈ B′} ||x − CB|| ≥ Σ_{x ∈ A′} ||x − CA′|| + Σ_{x ∈ B′} ||x − CB′|| = W(A′, B′)

and hence W(A, B) > W(A′, B′). However, this contradicts the optimality of {A, B}. ⊓⊔
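The Swapping Lemma can be checked experimentally on small inputs. The brute-force sketch below (illustrative, not from the paper; it assumes distinct points and resolves even-size medians by averaging) finds an optimal size-constrained bipartition exhaustively and verifies the inequality for every pair a ∈ A, b ∈ B:

```python
from itertools import combinations

def centroid(A):
    # component-wise median of a cluster of points
    n = len(A)
    def med(v):
        s = sorted(v)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return tuple(med(col) for col in zip(*A))

def d1(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def weight(A):
    c = centroid(A)
    return sum(d1(p, c) for p in A)

def best_2clustering(X, m):
    """Optimal {A, B} with |A| = m, by exhaustive search (distinct points)."""
    return min((A for A in combinations(X, m)),
               key=lambda A: weight(A) + weight([p for p in X if p not in A]))

def swapping_holds(X, m):
    A = list(best_2clustering(X, m))
    B = [p for p in X if p not in A]
    cA, cB = centroid(A), centroid(B)
    return all(d1(a, cA) + d1(b, cB) <= d1(a, cB) + d1(b, cA) + 1e-9
               for a in A for b in B)
```

On two well-separated triples of points the optimal bipartition matches the obvious grouping and the inequality holds for every pair.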
We are now able to obtain the general form of the equation for the curves
separating the clusters in an optimal 2-clustering.
Theorem 1 (Separation Result). In an optimal solution {A, B} of the 2-
SCC-2 problem, the clusters A and B are separated by a curve of equation
||z − α||1 − ||z − β||1 = g (1)
where the variable z ranges over R2 , while α, β ∈ R2 and g ∈ R are suitable
constants.
Proof. Let CA and CB be the centroids of A and B respectively. Then, by Lemma
1 the inequality
    ||a − CA||1 − ||a − CB||1 ≤ ||b − CA||1 − ||b − CB||1

is satisfied for all a ∈ A and b ∈ B, and hence it also holds by taking the maximum over the a's on the left and the minimum over the b's on the right:

    cA := max_{a ∈ A} { ||a − CA||1 − ||a − CB||1 } ≤ min_{b ∈ B} { ||b − CA||1 − ||b − CB||1 } =: cB .

Therefore, choosing g ∈ [cA, cB], all the points of A and B are contained in the regions defined respectively by ||z − CA||1 − ||z − CB||1 ≤ g and ||z − CA||1 − ||z − CB||1 ≥ g. The common boundary of the two regions has equation ||z − CA||1 − ||z − CB||1 = g, and thus the proof is concluded by setting α = CA and β = CB. ⊓⊔
Remark 1. It is noteworthy that both the Swapping Lemma and the Separation Result extend to the case of ℓp-norm with any p ≥ 1 and dimension d ≥ 2, provided the norm is raised to the p-th power [12].
Setting z = (x, y), α = (c, d), β = (e, f), equation (1) becomes

    |x − c| − |x − e| + |y − d| − |y − f| = g

It is clear that this always represents a connected curve in the plane, which may have some sharp corner points, i.e. points where one or both partial derivatives do not exist. A rather standard study of this equation, conducted by considering all possible values of c, d, e, f, g, leads to 8 types of separating curves, which we denote by the symbols C1, C2, C3, C4 and S1, S2, S3, S4, according to their shape. The proof is omitted here for lack of space. These 8 types of curves are depicted in Figures 1–4.
4 Bipartition red-black trees
In this section we describe a natural data structure that maintains a bipartition of a set of real numbers, possibly with repeated values, and that supports membership testing, insertion and deletion, together with two further operations: selecting
the i-th smallest element in either of the two clusters of the bipartition, for any
integer i, and moving an element from one cluster to the other.
Let X be a finite multiset of real numbers, i.e. a finite set X of reals where
any value may occur several times, represented as a finite ordered sequence
X = (x1 , x2 , . . . , xn ), xi ∈ R for every i, and x1 ≤ x2 ≤ · · · ≤ xn .
Moreover, consider a bipartition {A, B} of X, i.e. two non-empty subsets A ⊆ X,
B ⊆ X such that A ∩ B = ∅ and A ∪ B = X. In the following A and B are
called clusters. Operations Member, Insert and Delete can be defined in order
to specify the cluster the elements belong to. More precisely, for every z ∈ R,
Member(z) = (j1 , j2 ), where j1 (resp. j2 ) is the number of xi ’s in A (resp. in B)
such that xi = z; Insert(z, A) adds a new element z to A, Insert(z, B) does the same for B, while Delete(z, A) and Delete(z, B) execute the corresponding delete operations.
[Figures 1–4: the four C-type curves C1, C2, C3, C4 and the four S-type curves S1, S2, S3, S4 separating the centroids α = (c, d) and β = (e, f).]
Moreover, for every i ∈ {1, 2, ..., n}, we define

    SelectA(i) = a  if i ≤ |A| and a is the i-th smallest element of A,
    SelectA(i) = ⊥  if i > |A|;

    SelectB(i) = b  if i ≤ |B| and b is the i-th smallest element of B,
    SelectB(i) = ⊥  if i > |B|.
Finally, for any z ∈ R, if A contains an element xi = z then MoveAB (z) deletes
a value z from A and adds it to B, otherwise MoveAB (z) = ⊥. The operation
MoveBA (z) is defined symmetrically.
A natural data structure, able to execute the above operations in time logarithmic in n = |X|, is an augmented version of the well-known red-black trees [7]. Formally, we define a Bipartition Red-Black Tree for a bipartition {A, B} of X as a binary tree T such that: i) all internal nodes have both a left and a right son, ii) every node is red or black, iii) the root and the leaves are black, iv) if a node is red its sons are black, and v) for every node v all simple paths from v to any leaf have the same number of black nodes. Further,
the distinct elements of X (called keys) are assigned to the internal nodes in
a bijective correspondence, respecting the standard rule of binary search trees,
and the leaves are empty (usually represented by “nil”). Moreover, T satisfies
the following two conditions:
1. every internal node v contains a triple (z, mA (v), mB (v)), where z is the key
assigned to v, and mA (v) (respectively, mB (v)) is the number of elements xi
belonging to A (resp. B) such that xi = z;
2. every internal node v contains the pair of integers (cA (v), cB (v)), where cA (v)
(resp. cB (v)) is the number of elements of X belonging to A (resp. B) whose
value is assigned to a vertex in the subtree rooted at v.
For each internal node v, we denote by key(v), left(v) and right(v) the key contained in v, the left son and the right son of v, respectively. Clearly, v can be implemented as a record including the values mA(v), mB(v), cA(v), cB(v).
Procedures Member, Insert and Delete can be defined as in standard red-black trees [7] with obvious changes. Clearly, Insert and Delete have to update the values mA(v), mB(v), cA(v), cB(v) for all nodes v on the path from the root to the vertex resulting from the binary search.
Operations Select and Move can be executed as follows. To compute SelectA(i) one checks whether i ≤ cA(r), where r is the root of T: in the affirmative case a standard searching procedure (whose definition we omit for lack of space) is called, otherwise the value ⊥ is returned. SelectB(i) is defined similarly.
To compute MoveAB(z) one first determines the node v containing z by a standard search procedure, then the commands

    mA(v) := mA(v) − 1  and  mB(v) := mB(v) + 1

are executed, updating also the counters cA(u) and cB(u) for all nodes u on the path from the root to v. Procedure MoveBA(z) is defined symmetrically.
It is clear that all previous procedures can be executed in O(log n) time.
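The interface above can be prototyped without the tree machinery. The sketch below (an illustrative stand-in, not the authors' structure) keeps each cluster as a sorted Python list, so Insert, Delete, Select and Move cost O(n) rather than the O(log n) achieved by the augmented red-black tree, but the semantics of the operations match the definitions of this section:

```python
import bisect

class Bipartition:
    """Sorted-list prototype of the Bipartition Red-Black Tree interface."""
    def __init__(self):
        self.A, self.B = [], []          # each kept in non-decreasing order

    def member(self, z):
        # (j1, j2): multiplicities of z in A and in B
        count = lambda L: bisect.bisect_right(L, z) - bisect.bisect_left(L, z)
        return count(self.A), count(self.B)

    def insert(self, z, cluster):        # cluster is 'A' or 'B'
        L = self.A if cluster == 'A' else self.B
        bisect.insort(L, z)

    def delete(self, z, cluster):
        L = self.A if cluster == 'A' else self.B
        i = bisect.bisect_left(L, z)
        if i < len(L) and L[i] == z:
            L.pop(i)

    def select(self, i, cluster):
        # i-th smallest (1-based) element of the cluster, or None for "bottom"
        L = self.A if cluster == 'A' else self.B
        return L[i - 1] if 1 <= i <= len(L) else None

    def move_AB(self, z):
        self.delete(z, 'A')
        self.insert(z, 'B')
```

Replacing the lists with a balanced search tree augmented with the counters mA, mB, cA, cB recovers the O(log n) bounds stated above.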
5 Updating the weight of clusters
Our main result, presented in the next section, is based on a procedure that
updates a current bipartition by moving one point from a cluster to the other.
Here we want to show how to update the centroids and the weights of the clusters
of a current bipartition after executing such a moving operation.
To this end we first deal with the problem of updating centroid and weight
of a set of real numbers subject to insert and delete operations. The results will
be easily extended to a set of points in R2 .
Let A be a cluster in R, i.e. a finite ordered sequence of real numbers A = (a1, a2, ..., ak), where a1 ≤ a2 ≤ ... ≤ ak. Assuming ℓ1-norm, it is well known that the centroid of A is the median of the set, that is the value

    M(A) = a_{(k+1)/2}                 if k is odd,
    M(A) = (a_{k/2} + a_{k/2+1}) / 2   if k is even.

Thus, if A is a cluster of a bipartition (of a finite multiset X ⊂ R of n elements) maintained by a Bipartition Red-Black Tree, then M(A) can be computed in O(log n) time by calling the procedure SelectA(j) for suitable j's.
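The case analysis above translates directly into code; the following sketch (illustrative only) computes M(A) with the 1-based indexing used in the formula:

```python
def median_M(A):
    """M(A) for a multiset A of reals, following the odd/even case analysis."""
    a = sorted(A)
    k = len(a)
    if k % 2 == 1:
        return a[(k + 1) // 2 - 1]            # a_{(k+1)/2}
    return (a[k // 2 - 1] + a[k // 2]) / 2    # (a_{k/2} + a_{k/2+1}) / 2
```

In the tree-based setting, the two indexed accesses are exactly the calls SelectA((k+1)/2), or SelectA(k/2) and SelectA(k/2 + 1), mentioned in the text.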
Now, let us recall the definition of the left and right parts of the weight of A: given m = (k+1)/2, we have

    L(A) := Σ_{1 ≤ i < m} ai ,    R(A) := Σ_{m < i ≤ k} ai .

6 The algorithm

Let X = {p1, p2, ..., pn} ⊂ R² be the input set of points, where pi = (xi, yi); we may assume xi > 0 and yi > 0 for all i.
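The left and right parts determine the cluster weight directly: for odd k, with m = (k+1)/2, one has W(A) = R(A) − L(A), since the contributions of the median cancel. A quick sketch of this identity (illustrative; the exact index ranges of L and R are reconstructed here, as the original display is partly garbled):

```python
def weight_from_parts(A):
    """W(A) = R(A) - L(A) for a multiset A of odd size (1-D, l1-norm)."""
    a = sorted(A)
    k = len(a)
    assert k % 2 == 1, "identity stated here for odd k"
    m = (k + 1) // 2                  # 1-based index of the median
    L = sum(a[:m - 1])                # L(A): elements strictly left of the median
    R = sum(a[m:])                    # R(A): elements strictly right of the median
    return R - L

def weight_direct(A):
    # reference computation: sum of distances to the median
    a = sorted(A)
    med = a[(len(a) - 1) // 2]
    return sum(abs(x - med) for x in a)
```

This is what makes O(log n) weight updates possible: L(A) and R(A) can be maintained incrementally under insertions and deletions, without rescanning the cluster.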
We define a procedure that searches for an optimal bipartition of X whose clusters are separated by a curve of type C1. The procedure first sorts X according to three parameters: the two coordinates and their sum, i.e. the values xi, yi and xi + yi, respectively. Then the elements of X are processed by three nested for-loops, corresponding to the three parameters considered above. The algorithm evaluates all possible bipartitions of X whose clusters are separated by curves of type C1 and, for every admissible cluster size, only the bipartition with minimum weight is maintained. These bipartitions are considered in a linear order and, at each step, a new bipartition of X is obtained from the current one {A, B} by moving a point from cluster B to cluster A.
Such a current bipartition {A, B} is maintained by two bipartition red-black
trees: one for the 2-clustering {Ax , Bx } of the set {xi | pi = (xi , yi ) ∈ X},
where Ax = {xi | pi ∈ A} and Bx = {xi | pi ∈ B}, and the other for the
2-clustering (Ay , By ) of {yi | pi = (xi , yi ) ∈ X}, where Ay = {yi | pi ∈ A} and
By = {yi | pi ∈ B}.
An Update subroutine can be defined which, on input (A, B, pi), where {A, B} is the current bipartition of X and pi ∈ B, computes the new bipartition obtained by moving pi from B to A and returns the weights of its clusters. The subroutine applies the procedures described in Sections 4 and 5 and is defined by the following scheme:
Procedure Update(A, B, pi )
begin
MoveBx Ax (xi )
MoveBy Ay (yi )
w(Ax ) := UpdateW-ins(Ax , xi )
w(Bx ) := UpdateW-del(Bx , xi )
w(Ay ) := UpdateW-ins(Ay , yi )
w(By ) := UpdateW-del(By , yi )
return (w(Ax ) + w(Ay ), w(Bx ) + w(By ))
end
By the discussion presented in Sections 4 and 5, this subroutine also requires only O(log n) time.
Thus, the overall algorithm is described in detail by the procedure given below, where three bipartitions are actually maintained, denoted by (A, B), (A′, B′) and (A′′, B′′), respectively. Each of them needs two Bipartition Red-Black trees, one for the abscissae and the other for the ordinates.
Moreover, instruction Sortx (resp., Sorty and Sortx+y) yields an ordered list of the indices of the points in X, sorted in non-decreasing order with respect to the values xi (resp., yi and xi + yi).
Taking into account the previous discussion, it is easy to see that the algorithm works in O(n³ log n) time; moreover, it has space complexity O(n), since each Bipartition Red-Black tree only requires linear space.
Procedure Full-biclustering(p1 , p2 , . . . , pn )
begin
(i1 , i2 , . . . , in ) := Sortx (1, 2, . . . , n)
(j1 , j2 , . . . , jn ) := Sorty (1, 2, . . . , n)
(k1 , k2 , . . . , kn ) := Sortx+y (1, 2, . . . , n)
A := ∅ ; w(A) := 0
B := {p1 , p2 , . . . , pn }
compute the weight w(B) of cluster B
W [h] := +∞ for every h ∈ {1, . . . , ⌊n/2⌋}
i0 := 0 ; j0 := 0 ; x0 := 0 ; y0 := 0
for r = 0, 1, . . . , n do
    A′ := A ; B′ := B ; w(A′) := w(A) ; w(B′) := w(B)
    for s = 0, 1, . . . , n do
        A′′ := A′ ; B′′ := B′ ; w(A′′) := w(A′) ; w(B′′) := w(B′)
        for k = k1 , k2 , . . . , kn do
            if xk ≥ xir ∧ yk ≥ yjs then
                (w(A′′), w(B′′)) := Update(A′′, B′′, pk )
                h := min(|A′′|, |B′′|)
                z := w(A′′) + w(B′′)
                if W [h] > z then
                    W [h] := z ; Π[h] := (ir , js , k)
        if s ≠ n ∧ xjs+1 ≥ xir then
            (w(A′), w(B′)) := Update(A′, B′, pjs+1 )
            h := min(|A′|, |B′|)
            z := w(A′) + w(B′)
            if W [h] > z then
                W [h] := z ; Π[h] := (ir , js+1 , 0)
    if r ≠ n then
        (w(A), w(B)) := Update(A, B, pir+1 )
        h := min(|A|, |B|)
        z := w(A) + w(B)
        if W [h] > z then
            W [h] := z ; Π[h] := (ir+1 , 0, 0)
return (W, Π)
end
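As a correctness yardstick on small inputs (not part of the paper), the full problem can also be solved by exhaustive search; the sketch below returns, for every m = 1, ..., ⌊n/2⌋, the minimum ℓ1 weight of a bipartition with cluster sizes (m, n − m). It runs in exponential time and is useful only for testing:

```python
from itertools import combinations

def l1_cluster_weight(pts):
    """Sum of l1 distances to the component-wise median of pts."""
    def med(v):
        s = sorted(v)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    c = [med(col) for col in zip(*pts)]
    return sum(sum(abs(x - ci) for x, ci in zip(p, c)) for p in pts)

def full_biclustering_bruteforce(points):
    """W[m] = optimal 2-SCC-2 weight with cluster sizes (m, n - m)."""
    n = len(points)
    idx = range(n)
    W = {}
    for m in range(1, n // 2 + 1):
        best = float('inf')
        for S in combinations(idx, m):
            A = [points[i] for i in S]
            B = [points[i] for i in idx if i not in S]
            best = min(best, l1_cluster_weight(A) + l1_cluster_weight(B))
        W[m] = best
    return W
```

On the four corner points of a 4 × 1 rectangle, the optimal split of sizes (2, 2) pairs the two short sides, with total weight 2.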
7 Conclusions
In this paper we have presented an O(n³ log n) time algorithm for the size-constrained 2-Clustering problem in the plane with ℓ1-norm. It is clear that our results strictly depend on those conditions, and it is not evident (at least at present) whether they hold under different hypotheses or for more general problems. Moreover, since our algorithm is exact and works in polynomial time, we do not think it necessary to produce here, under our hypotheses (2-clustering in R² with ℓ1-norm), an experimental comparison with traditional heuristics like k-Means, which instead yields an approximate solution, tackles a more general problem with a larger number of clusters, and works in exponential time in the worst case.
However, we think that the present contribution could be the starting point
for a further analysis, still based on the separation result introduced in [4], of
more general clustering problems, with dimension greater than 2, larger number
of clusters, different norms and other constraints. It is clear that such a research
direction should also include an experimental activity that plans a comparison
with classical heuristics.
We also remark that we have recently obtained some results under other hypotheses. In particular, in a companion paper we present more efficient algorithms, based on rather different techniques of computational geometry, for solving exactly the same problem under the Euclidean norm. More precisely, it turns out that the 2-Clustering problem in R² under Euclidean norm, with size constraint k, can be solved by an exact algorithm in O(n ∛k log² n) time, while the full version of the same problem (i.e. for all k = 1, 2, ..., ⌊n/2⌋) is solvable in O(n² log n) time.
References
1. D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean
sum-of-squares clustering. Machine Learning, 75:245–249, 2009.
2. D. Aloise and P. Hansen. On the complexity of minimum sum-of-squares clustering.
Technical report, Les Cahiers du GERAD, 2007.
3. S. Basu, I. Davidson, and K. Wagstaff. Constrained Clustering: Advances in Algo-
rithms, Theory, and Applications. Chapman and Hall/CRC, 2008.
4. A. Bertoni, M. Goldwurm, J. Lin, and F. Saccà. Size Constrained Distance Clus-
tering: Separation Properties and Some Complexity Results. Fundamenta Infor-
maticae, 115(1):125–139, 2012.
5. C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
6. P. S. Bradley, K. P. Bennett, and A. Demiriz. Constrained K-Means Clustering.
Technical Report MSR-TR-2000-65, Microsoft Research, May 2000.
7. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algo-
rithms. MIT Press, 2nd edition, 2001.
8. S. Dasgupta. The hardness of k-means clustering. Technical Report CS2007-0890,
Dept. Computer Sc. and Eng., Univ. of California, San Diego, 2007.
9. W. D. Fisher. On grouping for maximum homogeneity. Journal of the American
Statistical Association, 53(284):789–798, 1958.
10. S. Hasegawa, H. Imai, M. Inaba, and N. Katoh. Efficient algorithms for variance-
based k-clustering. In Proceedings of Pacific Graphics ’93, pages 75–89, 1993.
11. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning:
Data Mining, Inference, and Prediction. Springer-Verlag, 2nd edition, 2009.
12. J. Lin. Exact algorithms for size constrained clustering. PhD Thesis, Università
degli Studi di Milano. Ledizioni Publishing, Milano, 2013.
13. J. B. MacQueen. Some methods for classification and analysis of multivariate
observations. In Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, pages 281–297, 1967.
14. M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem
is NP-hard. Theoretical Computer Science, 442:13–21, 2012.
15. K. Sabo. Center-based l1-clustering method. International Journal of Applied
Mathematics and Computer Science, 24(1):7–225, 2014.
16. A. Tung, J. Han, L. Lakshmanan, and R. Ng. Constraint-Based Clustering in Large
Databases. In J. Van den Bussche and V. Vianu, editors, Database Theory ICDT
2001, volume 1973 of LNCS, pages 405–419. Springer Berlin/Heidelberg, 2001.
17. A. Vattani. K-means requires exponentially many iterations even in the plane. In
Proceedings of the 25th Symposium on Computational Geometry (SoCG), 2009.
18. K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In Proc. of
the 17th Intl. Conf. on Machine Learning, pages 1103–1110, 2000.
19. S. Zhu, D. Wang, and T. Li. Data clustering with size constraints. Knowledge-
Based Systems, 23(8):883–889, 2010.