<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Exact Pseudopolynomial Algorithm for a Problem of Finding a Family of Disjoint Subsets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandr Galashov</string-name>
          <email>galashov.alexandr@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Kel'manov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Novosibirsk State University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sobolev Institute of Mathematics</institution>
          ,
          <addr-line>4 Koptyug Ave., 630090 Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>501</fpage>
      <lpage>509</lpage>
      <abstract>
<p>We consider the strongly NP-hard problem of finding a family of disjoint subsets with given cardinalities in a finite set of points from Euclidean space. The minimum of the sum, over all subsets of the required family, of the sums of the squared distances from the elements of these subsets to their centers is used as the search criterion. The centers of the subsets are optimizable variables defined as the mean values of the elements of these subsets. Under the additional restriction that the coordinates of the input points are integer, an exact algorithm is proposed. This algorithm is pseudopolynomial in the case of a fixed space dimension and a fixed number of required subsets.</p>
      </abstract>
      <kwd-group>
        <kwd>Euclidean space</kwd>
        <kwd>subsets search</kwd>
        <kwd>clustering</kwd>
        <kwd>NP-hard problem</kwd>
        <kwd>exact pseudopolynomial-time algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>The subject of our study is a strongly NP-hard problem of finding a family of disjoint
subsets in a finite set of points from Euclidean space. Our aim is the justification of an
exact pseudopolynomial algorithm for a special case of this problem.</p>
<p>The investigation is motivated by the fact that the problem is poorly studied and by its importance in
applications, in particular, for mathematical problems of data analysis, approximation
theory and mathematical statistics. The problem models a situation where it is required
to classify noisy data obtained from experimental observations of the states of some
material objects (see [1-4] and the works cited there).</p>
<p>The paper is organized as follows. In the next section the formal definition of the
problem under study is given; an example of an application (the origin) of the problem being
investigated is also presented. In Section 3, we provide a review of the known results and
announce the obtained algorithmic result. Basic definitions and statements that give
elements to prove the properties of the proposed algorithm are presented in Section 4.
Finally, in Section 5 we construct an exact algorithm for solving the problem under
study, justify its properties and show its pseudopolynomiality for a special case of the
problem.</p>
      <p>Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: A. Kononov et al. (eds.): DOOR 2016, Vladivostok, Russia, published at http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>Problem formulation and its origin</title>
<p>Everywhere below, $\mathbb{R}$ denotes the set of real numbers, $\|\cdot\|$ denotes the Euclidean norm,
and $\mathbb{Z}$ denotes the set of integers.</p>
      <p>We consider the following problem.</p>
<p>Problem 1. Given a set $\mathcal{Y}=\{y_1,\dots,y_N\}$ of points from $\mathbb{R}^q$ and some positive integers
$M_1,\dots,M_J$. Find a family $\{\mathcal{C}_1,\dots,\mathcal{C}_J\}$ of disjoint subsets of $\mathcal{Y}$ such that
$$F(\mathcal{C}_1,\dots,\mathcal{C}_J)=\sum_{j=1}^{J}\sum_{y\in\mathcal{C}_j}\|y-\overline{y}(\mathcal{C}_j)\|^2\to\min, \tag{1}$$
where $\overline{y}(\mathcal{C}_j)=\frac{1}{|\mathcal{C}_j|}\sum_{y\in\mathcal{C}_j}y$ is the centroid (geometric center) of the subset $\mathcal{C}_j$, under
the constraints $|\mathcal{C}_j|=M_j$, $j=1,\dots,J$, and
$$\sum_{j=1}^{J}M_j\le N \tag{2}$$
on the cardinalities of the required subsets.</p>
<p>Problem 1 has the following origin [5]. Given a table $\mathcal{Y}=\{y_1,\dots,y_N\}$
containing the results of multiple measurements of the q-dimensional tuple $y$ of numerical
characteristics corresponding to $J$ unique objects. The numbers $M_j$, $j=1,\dots,J$, of
measurements for the $j$-th object are known. Moreover, the table includes $(N-\sum_{j=1}^{J}M_j)$
results of single measurements of the states of some random objects. Each measurement
result presented in the table has an error. Furthermore, the correspondence
between the measurements and the objects is unknown. It is required to find a family
$\{\mathcal{C}_1,\dots,\mathcal{C}_J\}$ of disjoint subsets of the set $\mathcal{Y}$, using the minimum of squared distances
criterion, where each subset contains the elements corresponding to the appropriate
unique object, and to estimate the values $\overline{y}(\mathcal{C}_1),\dots,\overline{y}(\mathcal{C}_J)$ corresponding to the numerical
characteristics of the unique objects (assuming that the data has a measurement
error).</p>
<p>Problem 1 regularly appears as a mathematical problem, in particular, in Data
Analysis and Pattern Recognition, and is induced [5] by the following approximation
model.</p>
<p>
Given a set $\mathcal{Y}=\{y_1,\dots,y_N\}$ of points from $\mathbb{R}^q$ and some positive integers $M_1,\dots,M_J$.
Find a family $\{\mathcal{M}_1,\dots,\mathcal{M}_J\}$ of disjoint subsets of the set $\mathcal{N}=\{1,\dots,N\}$ such that
$$\sum_{n\in\mathcal{N}}\|y_n-x_n\|^2\to\min, \tag{3}$$
where
$$x_n=\begin{cases} w_1\in\mathbb{R}^q, &amp; n\in\mathcal{M}_1,\\ \quad\vdots &amp; \\ w_J\in\mathbb{R}^q, &amp; n\in\mathcal{M}_J,\\ v_n\in\mathbb{R}^q, &amp; n\in\mathcal{N}\setminus\big(\bigcup_{j=1}^{J}\mathcal{M}_j\big), \end{cases} \tag{4}$$
under the same constraints on the required subsets as in Problem 1.
</p>
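<p>To make the model concrete, here is a small one-dimensional numeric illustration (the values are assumed for illustration, not taken from the paper): the sequence $x_n$ is built according to (4) and the criterion (3) is evaluated.</p>

```python
import numpy as np

# Illustrative instance of the model (3)-(4): N = 6 one-dimensional
# measurements, J = 2 unique objects with characteristics w_1, w_2.
y = np.array([1.1, 4.9, 0.9, 5.2, 9.0, 1.0])
M1, M2 = {0, 2, 5}, {1, 3}          # index sets M_1, M_2 (0-based)
w = np.array([1.0, 5.0])            # w_1, w_2
v = {4: 9.0}                        # v_n for n outside the union

# Build x_n according to (4).
x = np.empty_like(y)
for n in range(len(y)):
    if n in M1:
        x[n] = w[0]
    elif n in M2:
        x[n] = w[1]
    else:
        x[n] = v[n]

objective = ((y - x) ** 2).sum()    # criterion (3)
```

With these assumed values the criterion sums the squared measurement errors of the assigned points only.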
<p>
This problem consists in the approximation of the sequence $y_n$ by the sequence $x_n$ using
the minimum of squared distances criterion, under the condition that the structure of the
sequence $x_n$ is described by formula (4). In this formula, the points $w_1,\dots,w_J$ are
interpreted as the numerical characteristics describing the unique objects. The points
from the set $\{v_i,\ i\in\mathcal{N}\setminus(\bigcup_{j=1}^{J}\mathcal{M}_j)\}$ are considered as the collection of characteristics
describing the random objects.
</p>
<p>
Expanding the sum (3) using (4) and grouping the terms, one can check
by differentiation that, for each fixed collection $\{\mathcal{M}_1,\dots,\mathcal{M}_J\}$ of sets, the values $w_j=\overline{y}(\mathcal{M}_j)$, $j=1,\dots,J$, provide the points where the objective function (3) attains its
minimum. If we substitute these values into the formulae (3), (4) and let $\mathcal{C}_j=\{y_i\in\mathcal{Y}\mid i\in\mathcal{M}_j\}$,
then we can easily verify (see also [5]) that the model (3), (4) induces Problem 1.
</p>
<p>A problem similar to Problem 1 is the well-known strongly NP-hard [6] clustering problem MSSC
(Minimum Sum-of-Squares Clustering), also known as k-means [7],
[8], [9]. In that problem the objective function is the same as in Problem 1, but the
cardinalities of the required clusters are optimizable variables and, instead of searching for
a family of subsets whose union need not cover the whole input set, we have to find a partition
of this set.</p>
<p>
Notice that the numbers from the set $\mathcal{N}\setminus(\bigcup_{j=1}^{J}\mathcal{M}_j)$ in the model (3), (4) correspond
to the elements of the set $\mathcal{Y}\setminus(\bigcup_{j=1}^{J}\mathcal{C}_j)$ in Problem 1. Elements of these sets do not
appear in the formulae for the estimation of the values $w_j$, $j=1,\dots,J$, and the
corresponding centroids $\overline{y}(\mathcal{C}_j)$, $j=1,\dots,J$, in Problem 1. Therefore, Problem 1 can
be considered as a problem of Data Censoring [10]. In this applied problem, a part
of the data (generally, an unknown part) of the obtained experimental results in a situation
of random failure of a measurement instrument has to be excluded from the estimation
procedure, because this data distorts the final estimates.
</p>
    </sec>
    <sec id="sec-3">
      <title>Known and obtained results</title>
<p>The strong NP-hardness of Problem 1 follows from the results obtained in [11],
since in the cited work it was proved that the special case of Problem 1 with $J=1$ is
NP-hard in the strong sense.</p>
<p>Problem 1 belongs to the algorithmically poorly studied problems of
discrete optimization. In [5], a 2-approximation algorithm was proposed for this problem
whose time complexity is $O(N^2(N^{J+1}+q))$. For the case of Problem 1 when
the number $J$ of required subsets is fixed (i.e., is not part of the input of the problem), this algorithm
is polynomial. Currently, there are no other algorithmic results for Problem 1, and the
known results [12-15] were obtained only for its special case when $J=1$. These results
are described below.</p>
<p>For this special case of Problem 1, a 2-approximation polynomial-time algorithm of complexity
$O(qN^2)$ was constructed in [12].</p>
<p>For the variation of Problem 1 with the additional restriction that the coordinates
of the input points are integer, and for the case of a fixed space dimension, an
exact pseudopolynomial algorithm was presented in [13]; its time complexity is
$O(N(MB)^q)$, where $B$ is the maximum absolute value of the coordinates of the input
points.</p>
<p>Furthermore, for the case of a fixed space dimension, an FPTAS was
proposed in [14]. For a given relative error $\varepsilon$, this scheme finds a $(1+\varepsilon)$-approximate solution in
$O(N^2(M/\varepsilon)^q)$ time, which is polynomial in the size of the input and in $1/\varepsilon$.</p>
<p>A PTAS of complexity $O(qN^{2/\varepsilon+1}(9/\varepsilon)^{3/\varepsilon})$, where $\varepsilon$ is a guaranteed relative error,
was found in [15].</p>
<p>In the current work, for the variation of Problem 1 with the additional restriction
that the coordinates of the input points are integer, an algorithm is constructed which finds an exact
solution in $O(N(N^2+qJ)(2MB+1)^{qJ}+J^2\log_2 N)$ time, where $B$ is
the maximum absolute value of the coordinates of the input points and $M$ is the least
common multiple of the numbers $M_1,\dots,M_J$. In the case of a fixed dimension $q$
of the space and a fixed number $J$ of required subsets, the proposed algorithm is
pseudopolynomial and its time complexity is bounded by $O(N^3(MB)^{qJ})$.</p>
    </sec>
    <sec id="sec-4">
      <title>Fundamentals of algorithm</title>
<p>In order to justify the properties of the algorithm, we need a few basic statements, an
auxiliary problem, and an exact polynomial-time algorithm finding its solution.</p>
      <p>The following statements form the geometrical basis of the algorithm.</p>
<p>Lemma 1. Let all the conditions of Problem 1 be satisfied and $\mathcal{Y}\subset\mathbb{Z}^q$. Moreover, let
$$B=\max_{y\in\mathcal{Y}}\max_{i\in\{1,\dots,q\}}|(y)_i| \tag{5}$$
be the maximum absolute value of the coordinates of the input points, where $(y)_i$ is the
$i$-th coordinate of the point $y$. Then $\overline{y}(\mathcal{C}_j)\in\mathcal{D}$, $j=1,\dots,J$, where
$$\mathcal{D}=\Big\{z\in\mathbb{R}^q\ \Big|\ (z)_k=\frac{1}{M}(v)_k,\ (v)_k\in\mathbb{Z},\ |(v)_k|\le MB,\ k=1,\dots,q\Big\}, \tag{6}$$
and $M$ is the least common multiple of the numbers $M_1,\dots,M_J$.</p>
      <p>Proof. According to the definition of the centroid (as the mean over the subset), we have
$$(\overline{y}(\mathcal{C}_j))_k=\frac{1}{|\mathcal{C}_j|}\sum_{y\in\mathcal{C}_j}(y)_k=\frac{p}{M_j},\quad j=1,\dots,J,\ k=1,\dots,q,$$
where $p$ is an integer such that
$$|p|=\Big|\sum_{y\in\mathcal{C}_j}(y)_k\Big|\le M_jB\le MB.$$
By the definition of the least common multiple $M$, for each $j=1,\dots,J$ there
exists an integer $s(j)$ such that $0&lt;s(j)\le M$ and $M=s(j)M_j$. Therefore,
$$\frac{p}{M_j}=\frac{s(j)p}{s(j)M_j}=\frac{s(j)p}{M},\quad j=1,\dots,J.$$
To conclude, notice that $s(j)p\in\mathbb{Z}$ and $|s(j)p|\le s(j)M_jB=MB$. ⊓⊔</p>
      <p>The set $\mathcal{D}$ is a multidimensional grid with a rational step, centered at the origin. For the cardinality of this grid the following obvious equality holds:
$$|\mathcal{D}|=(2MB+1)^q. \tag{7}$$</p>
      <p>Lemma 2. For each non-empty finite set $\mathcal{Z}$ of points from $\mathbb{R}^q$, the minimum over $x$
of $\sum_{z\in\mathcal{Z}}\|z-x\|^2$ is attained at $x=\overline{z}=\frac{1}{|\mathcal{Z}|}\sum_{z\in\mathcal{Z}}z$. This is easily checked by differentiation.</p>
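<p>Lemma 2 is also easy to confirm numerically; a minimal sketch (illustrative code using numpy, with assumed data):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.integers(-5, 6, size=(8, 3)).astype(float)   # a finite set in R^3
centroid = Z.mean(axis=0)                            # the candidate minimizer

def ssd(x):
    # Sum over z in Z of ||z - x||^2, the objective of Lemma 2.
    return ((Z - x) ** 2).sum()

# The centroid never does worse than randomly chosen candidate points x.
for _ in range(100):
    x = rng.normal(size=3) * 10
    assert ssd(centroid) <= ssd(x) + 1e-9
```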
<p>The calculation basis of the algorithm is given by an exact polynomial-time
algorithm finding the solution of the following auxiliary problem.</p>
      <p>Problem 2. Given a set $\mathcal{Y}=\{y_1,\dots,y_N\}$, a tuple $b=(b_1,\dots,b_J)$ of points from $\mathbb{R}^q$,
and some positive integers $M_1,\dots,M_J$. Find a family $\{\mathcal{B}_1,\dots,\mathcal{B}_J\}$ of disjoint subsets
of the set $\mathcal{Y}$ such that
$$G_b(\mathcal{B}_1,\dots,\mathcal{B}_J)=\sum_{j=1}^{J}\sum_{y\in\mathcal{B}_j}\|y-b_j\|^2\to\min \tag{8}$$
under the constraints $|\mathcal{B}_j|=M_j$, $j=1,\dots,J$, and (2) on the cardinalities of the required
subsets.</p>
      <p>Problem 2 can be reduced [5] to the well-known assignment problem (see, for example,
[16]) and its exact solution can be found [5] in $O(N(N^2+qJ))$ time. In the following,
the algorithm solving Problem 2 is denoted by A1.</p>
<p>The following technical lemma underlies the algorithm and is given
for completeness of the description.</p>
      <p>Lemma 3. The least common multiple of the numbers $M_1,\dots,M_J$ can be found in
$O(J\sum_{j=1}^{J}\log_2 M_j)$ time.</p>
<p>Proof. Let $M=\mathrm{lcm}(M_1,\dots,M_J)$ be the least common multiple of the numbers
$M_1,\dots,M_J$. Using well-known properties of the least common multiple, we have the
following equalities:
$$\mathrm{lcm}(M_i,M_j)=\frac{M_iM_j}{\gcd(M_i,M_j)}, \tag{9}$$
where $\gcd(M_i,M_j)$ is the greatest common divisor of the numbers $M_i,M_j$, and
$$\mathrm{lcm}(M_1,\dots,M_J)=\mathrm{lcm}(\mathrm{lcm}(M_1,\dots,M_{J-1}),M_J). \tag{10}$$</p>
      <p>It is known that the greatest common divisor of the numbers $a,b$ can be found by
the Euclidean algorithm in $O(\log_2(\max(a,b)))$ time.</p>
      <p>
From (9) and (10), an obvious recurrent algorithm for computing the least common
multiple of the numbers $M_1,\dots,M_J$ follows. For the complexity of computing the
least common multiple we have
$$\sum_{i=1}^{J-1}\log_2\big(\max(M_{J-i+1},\ \mathrm{lcm}(M_1,\dots,M_{J-i}))\big)\le\sum_{i=1}^{J-1}\log_2\Big(\prod_{k=1}^{J-i+1}M_k\Big)=O\Big(J\sum_{k=1}^{J}\log_2 M_k\Big),$$
where we used the following obvious upper bound on the least common multiple:
$$\mathrm{lcm}(M_1,\dots,M_{J-i})\le\prod_{k=1}^{J-i}M_k.$$
⊓⊔
      </p>
    </sec>
    <sec id="sec-5">
      <title>Exact pseudopolynomial algorithm</title>
<p>The idea of the proposed algorithm is as follows. In the region of the input space
defined by the maximum absolute value of the coordinates of the input points, a
multidimensional grid is constructed which is uniform over each coordinate and has a
rational step. By construction, all centroids, including the optimal ones, belong to the grid.</p>
<p>For each tuple of $J$ points from the constructed grid, the auxiliary Problem 2 is
solved. Its solution, a family of disjoint subsets, is included in the collection of
candidates for the solution of Problem 1.</p>
      <p>The family of subsets with the minimum value of the objective function of Problem
2 is taken as the solution of Problem 1.</p>
<p>Let us formulate an algorithm which implements the described approach.</p>
      <p>A l g o r i t h m  A.</p>
      <p>Input: the set $\mathcal{Y}$ and positive integers $M_1,\dots,M_J$.</p>
      <p>Step 1. Find the least common multiple $M$ of the numbers $M_1,\dots,M_J$ using (9)
and (10), and the value $B$ using formula (5). Construct the grid $\mathcal{D}$ using formula (6).</p>
      <p>Step 2. For every tuple $d=(d_1,\dots,d_J)\in\mathcal{D}^J$, using algorithm A1, find and
memorize the exact solution $\{\mathcal{B}_1(d),\dots,\mathcal{B}_J(d)\}$ of the auxiliary Problem 2 and the
value of the objective function $G_d$, putting $b=d$ in formula (8).</p>
      <p>Step 3. In the collection $\{\mathcal{B}_1(d),\dots,\mathcal{B}_J(d)\}$, $d\in\mathcal{D}^J$, of the solutions found on
step 2, find the tuple $d_A=(d_1^A,\dots,d_J^A)$ at which the objective function $G_d$ attains its
minimum. As the solution $\{\mathcal{C}_1^A,\dots,\mathcal{C}_J^A\}$ of Problem 1, we take the subsets
$\mathcal{B}_1(d_A),\dots,\mathcal{B}_J(d_A)$ constructed on step 2 and corresponding to the tuple $d_A$.</p>
      <p>Output: the collection $\{\mathcal{C}_1^A,\dots,\mathcal{C}_J^A\}$.</p>
<p>The next statement specifies the properties of the proposed algorithm.</p>
<p>Theorem 1. Let, in the conditions of Problem 1, the coordinates of all points from the
set $\mathcal{Y}$ be integers in the interval $[-B,B]$. Then algorithm A finds an optimal
solution of this problem in $O(N(N^2+qJ)(2MB+1)^{qJ}+J^2\log_2 N)$ time.</p>
      <p>Proof. Let the subsets $\mathcal{C}_1^*,\dots,\mathcal{C}_J^*$ be an optimal solution of Problem 1. According to
the definition of step 3, algorithm A finds the solution of Problem 1 in the form
$$\{\mathcal{C}_1^A,\dots,\mathcal{C}_J^A\}=\{\mathcal{B}_1(d_A),\dots,\mathcal{B}_J(d_A)\}, \tag{11}$$
where
$$d_A=\arg\min_{d\in\mathcal{D}^J}G_d(\mathcal{B}_1(d),\dots,\mathcal{B}_J(d)). \tag{12}$$</p>
      <p>Thus, noticing that, by Lemma 1, the elements of the optimal tuple
$\overline{y}=(\overline{y}(\mathcal{C}_1^*),\dots,\overline{y}(\mathcal{C}_J^*))$ belong to the set $\mathcal{D}$, from the definitions (8), (12) and (1) it follows that
$$G_{d_A}\le\sum_{j=1}^{J}\sum_{y\in\mathcal{C}_j^*}\|y-\overline{y}(\mathcal{C}_j^*)\|^2=F(\mathcal{C}_1^*,\dots,\mathcal{C}_J^*). \tag{13}$$</p>
      <p>Then, using Lemma 2 and the definitions (1), (8), (11), we obtain
$$F(\mathcal{C}_1^A,\dots,\mathcal{C}_J^A)=\sum_{j=1}^{J}\sum_{y\in\mathcal{C}_j^A}\|y-\overline{y}(\mathcal{C}_j^A)\|^2\le\sum_{j=1}^{J}\sum_{y\in\mathcal{C}_j^A}\|y-d_j^A\|^2=\sum_{j=1}^{J}\sum_{y\in\mathcal{B}_j(d_A)}\|y-d_j^A\|^2=G_{d_A}. \tag{14}$$</p>
      <p>Combining (13) and (14), we find the estimate
$$F(\mathcal{C}_1^A,\dots,\mathcal{C}_J^A)\le F(\mathcal{C}_1^*,\dots,\mathcal{C}_J^*).$$
On the other hand, the sets $\mathcal{C}_1^A,\dots,\mathcal{C}_J^A$ are a feasible solution of Problem 1. Thus,
we have the estimate
$$F(\mathcal{C}_1^*,\dots,\mathcal{C}_J^*)\le F(\mathcal{C}_1^A,\dots,\mathcal{C}_J^A).$$
From the obtained estimates, we have the equality
$$F(\mathcal{C}_1^A,\dots,\mathcal{C}_J^A)=F(\mathcal{C}_1^*,\dots,\mathcal{C}_J^*).$$</p>
      <p>Let us estimate the time complexity of the algorithm using its stepwise notation.</p>
<p>
By Lemma 3, the computational cost of computing the least common multiple of
the numbers $M_1,\dots,M_J$ on step 1 is $O(J\sum_{k=1}^{J}\log_2 M_k)$, which is not greater than
$O(J^2\log_2 N)$. On this step, the computational costs of calculating the value $B$ and the grid $\mathcal{D}$
are bounded by $O(qN)$ and $O(q(2MB+1)^q)$, respectively.
</p>
<p>
The time complexity required to find an exact solution of the auxiliary Problem 2
(see Section 4) is $O(N(N^2+qJ))$. On step 2, this problem is solved $O(|\mathcal{D}|^J)$
times. Therefore, according to (7), the complexity of step 2 is $O(N(N^2+qJ)(2MB+1)^{qJ})$.
</p>
<p>Summing up the costs of all the steps, we find that the time complexity of the
algorithm is bounded by $O(N(N^2+qJ)(2MB+1)^{qJ}+J^2\log_2 N)$. ⊓⊔</p>
      <p>Remark 1. If the dimension $q$ of the space and the number $J$ of desired subsets are
fixed, the coordinates of the input points from $\mathcal{Y}$ are integer and belong to an interval
$[-B,B]$, then algorithm A is an exact pseudopolynomial algorithm solving Problem 1
in $O(N^3(MB)^{qJ})$ time.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
<p>In this paper, we have considered the strongly NP-hard problem of finding a family of
disjoint subsets in a finite set of points from Euclidean space.</p>
<p>For the special case of the problem in which the coordinates of the input points are integer,
we have constructed an exact algorithm. This algorithm is pseudopolynomial
when the input space dimension and the number of required subsets are fixed.</p>
<p>Of considerable interest is the justification of faster polynomial-time approximation
algorithms with guaranteed accuracy for the general problem.</p>
<p>Acknowledgments. This work was supported by the Russian Foundation for Basic
Research (projects nos. 15-01-00462 and 16-07-00168) and by a grant of the Presidium
of RAS (program 5, project 227).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>1. Hastie T., Tibshirani R., Friedman J.: <source>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</source>. Springer-Verlag, New York (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>2. James G., Witten D., Hastie T., Tibshirani R.: <source>An Introduction to Statistical Learning</source>. Springer Science+Business Media, LLC, New York (<year>2013</year>)</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>3. Bishop C.M.: <source>Pattern Recognition and Machine Learning</source>. Springer Science+Business Media, LLC, New York (<year>2006</year>)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Flach</surname>
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Machine Learning: The Art and Science of Algorithms that Make Sense of Data</source>
          . Cambridge University Press, New York (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>5. Galashov A.E., Kel'manov A.V.: <article-title>A 2-Approximate Algorithm to Solve One Problem of the Family of Disjoint Vector Subsets</article-title>. <source>J. Automation and Remote Control</source>. <volume>75</volume>(<issue>4</issue>), <fpage>595</fpage>-<lpage>606</lpage> (<year>2014</year>)</mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>6. Aloise D., Deshpande A., Hansen P., Popat P.: <article-title>NP-hardness of Euclidean sum-of-squares clustering</article-title>. <source>J. Machine Learning</source>. <volume>75</volume>(<issue>2</issue>), <fpage>245</fpage>-<lpage>248</lpage> (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref7">
<mixed-citation>7. Jain A.K.: <article-title>Data Clustering: 50 Years Beyond k-Means</article-title>. <source>J. Pattern Recognition Lett</source>. <volume>31</volume>, <fpage>651</fpage>-<lpage>666</lpage> (<year>2010</year>)</mixed-citation>
      </ref>
      <ref id="ref8">
<mixed-citation>8. Edwards A.W.F., Cavalli-Sforza L.L.: <article-title>A Method for Cluster Analysis</article-title>. <source>J. Biometrics</source>. <volume>21</volume>, <fpage>362</fpage>-<lpage>375</lpage> (<year>1965</year>)</mixed-citation>
      </ref>
      <ref id="ref9">
<mixed-citation>9. MacQueen J.B.: <article-title>Some Methods for Classification and Analysis of Multivariate Observations</article-title>. <source>In: Fifth Berkeley Symposium on Mathematical Statistics and Probability</source>, vol. <volume>1</volume>, pp. <fpage>281</fpage>-<lpage>297</lpage>. University of California Press, Berkeley, California (<year>1967</year>)</mixed-citation>
      </ref>
      <ref id="ref10">
<mixed-citation>10. Bagdonavicius V., Kruopis J., Nikulin M.S.: <source>Non-parametric Tests for Censored Data</source>. ISTE/WILEY, London (<year>2011</year>)</mixed-citation>
      </ref>
      <ref id="ref11">
<mixed-citation>11. Kel'manov A.V., Pyatkin A.V.: <article-title>NP-Completeness of Some Problems of Choosing a Vector Subset</article-title>. <source>J. Applied and Industrial Mathematics</source>. <volume>5</volume>(<issue>3</issue>), <fpage>352</fpage>-<lpage>357</lpage> (<year>2011</year>)</mixed-citation>
      </ref>
      <ref id="ref12">
<mixed-citation>12. Kel'manov A.V., Romanchenko S.M.: <article-title>An Approximation Algorithm for Solving a Problem of Search for a Vector Subset</article-title>. <source>J. Applied and Industrial Mathematics</source>. <volume>6</volume>(<issue>1</issue>), <fpage>90</fpage>-<lpage>96</lpage> (<year>2012</year>)</mixed-citation>
      </ref>
      <ref id="ref13">
<mixed-citation>13. Kel'manov A.V., Romanchenko S.M.: <article-title>Pseudopolynomial Algorithms for Certain Computationally Hard Vector Subset and Cluster Analysis Problems</article-title>. <source>J. Automation and Remote Control</source>. <volume>73</volume>(<issue>2</issue>), <fpage>349</fpage>-<lpage>354</lpage> (<year>2012</year>)</mixed-citation>
      </ref>
      <ref id="ref14">
<mixed-citation>14. Kel'manov A.V., Romanchenko S.M.: <article-title>An FPTAS for a Vector Subset Search Problem</article-title>. <source>J. Applied and Industrial Mathematics</source>. <volume>8</volume>(<issue>3</issue>), <fpage>329</fpage>-<lpage>336</lpage> (<year>2014</year>)</mixed-citation>
      </ref>
      <ref id="ref15">
<mixed-citation>15. Schenmaier V.V.: <article-title>An approximation scheme for a problem of search for a vector subset</article-title>. <source>J. Applied and Industrial Mathematics</source>. <volume>6</volume>(<issue>3</issue>), <fpage>381</fpage>-<lpage>386</lpage> (<year>2012</year>)</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Papadimitriou</surname>
            <given-names>C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiglitz</surname>
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Combinatorial Optimization: Algorithms and Complexity</article-title>
          . Prentice-Hall, Englewood Cliffs, New Jersey (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>