<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bee Colony Optimization for Clustering Incomplete Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tatjana Davidovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natasa Glisovic</string-name>
          <email>natasaglisovic@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miodrag Raskovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mathematical Institute of the Serbian Academy of Sciences and Arts</institution>
          ,
          <addr-line>Belgrade</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State University of Novi Pazar</institution>
          ,
          <addr-line>Novi Pazar</addr-line>
          ,
          <country country="RS">Serbia</country>
        </aff>
      </contrib-group>
      <fpage>94</fpage>
      <lpage>108</lpage>
      <abstract>
        <p>Many decision making processes include situations in which not all relevant data are available. The main issues one has to deal with when clustering incomplete data are the mechanism for filling in the missing values, the definition of a proper distance function, and/or the selection of the most appropriate clustering method. It is very hard to find a method that can adequately estimate missing values in all required situations. Therefore, in the recent literature a new distance function, based on propositional logic, that does not require determining the values of missing data has been proposed. Exploiting this distance, we developed a Bee Colony Optimization (BCO) approach for clustering incomplete data based on the p-median classification model. BCO is a population-based meta-heuristic inspired by the foraging habits of honey bees. It belongs to the class of Swarm Intelligence (SI) methods. The improvement variant of BCO is implemented, the one that transforms complete solutions in order to improve their quality. The efficiency of the proposed approach is demonstrated by comparison with some recent clustering methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Databases</kwd>
        <kwd>Missing values</kwd>
        <kwd>Nature-inspired methods</kwd>
        <kwd>Swarm intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Classification of objects is one of the most common tasks in the everyday decision making process: the clustering of large amounts of data [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. It consists of grouping a set of m objects into K (a given number of) groups called clusters. The grouping has to be performed in such a way that objects belonging to the same cluster are similar to each other and, at the same time, different from the objects grouped in other clusters. Each object is described by a set of attributes defining different properties of that object. In fact, an object oj, j = 1, 2, ..., m is represented by an array oj = (a1, a2, ..., an) in the n-dimensional space. Coordinates ai, i = 1, 2, ..., n represent attributes used to characterize a given object. Attributes can be of different types (numerical or categorical); therefore, usually some coding is performed to properly represent their values and make it easier to work with them.
        Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org
      </p>
      <p>
        Some of the important decisions that need to be made during clustering are to find the most suitable measure of similarity (the one that optimizes a given objective function), to classify objects into a certain number of clusters, sometimes even to determine the number of clusters based on properties of the given data, and, in many cases, to resolve how to treat the data when they are incomplete. The clustering problem is known to be NP-hard [
        <xref ref-type="bibr" rid="ref2 ref21 ref9">2, 9, 21</xref>
        ] and, therefore, heuristic approaches represent the most natural choice.
      </p>
      <p>
        Distance-based methods are often used due to their simplicity and applicability in different scenarios. They are very popular in the literature because they can be used for any type of data, as long as the corresponding distance function is suitable for that type of data. Therefore, the problem of grouping data can be reduced to the problem of finding a distance function for the given type of data. Consequently, finding the appropriate distance function has become an important area of research in data processing [
        <xref ref-type="bibr" rid="ref1 ref25">1, 25</xref>
        ]. In certain cases the distances should be adapted to a specific domain of variables, such as categorical data or time series ([
        <xref ref-type="bibr" rid="ref12 ref7">7, 12</xref>
        ]).
      </p>
      <p>
        The number of clusters K may be given in advance; however, in some applications it may not even be known. In the cases when the most suitable number of clusters has to be discovered by the clustering algorithm we are dealing with automatic clustering. A recent survey of nature-inspired automatic clustering methods is given in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. From the recent literature one can conclude that nature-inspired methods dominate the clustering field. Among the scarce local search exploring methods are primal-dual variable neighborhood search [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and tabu search [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>
        A special class of clustering problems deals with incomplete data. Most of the existing clustering algorithms assume that the objects to be classified are completely described. Therefore, prior to their application the data should be preprocessed in order to determine the values of missing data. Recently, some classification algorithms that do not require resolving the missing data problem were proposed. They include Rough Set Models [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], Bayesian Models [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and Classifier Ensemble Models [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
      <p>
        Our study is devoted to resolving the problem of missing data by using a new distance function [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that is defined on incomplete data. This distance is based on propositional logic and does not presume that the missing attributes should be assigned any value. Once the problem of missing data is overcome, one can apply any clustering method to classify the given objects. Instead of using the existing (state-of-the-art) algorithms, we developed a new approach based on the Bee Colony Optimization (BCO) meta-heuristic. BCO is a population-based nature-inspired algorithm that mimics the behavior of honey bees during the nectar collection process. It was proposed by Lucic and Teodorovic in 2001 ([
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]) and evolved during the past years into a simple and effective meta-heuristic method. Our implementation of BCO involves the distance function from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and a transformation of solutions that randomly changes the cluster representatives. The preliminary experimental evaluation is performed on 9 data sets from the UCI machine learning repository [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Our BCO approach is compared against six existing classification algorithms from the recent literature [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. The considered classification algorithms involve training preprocessing, and therefore, they are in a superior position with respect to our meta-heuristic approach. However, the obtained comparison results show that BCO is a promising approach to this hard yet important problem in automated decision making processes.
      </p>
      <p>The paper is divided into the following sections. The methodology of clustering and the problem of missing data are described in the next section. The BCO method overview and its application to the clustering problem are presented in Section 3, while the experimental evaluation is described in Section 4. The advantages and disadvantages of the applied technique and some future research directions are given in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Clustering of Incomplete Data</title>
      <p>We consider the clustering problem based on the p-median classification model that can be formulated as follows. Let us assume that we are given a set O of m objects oj, j = 1, 2, ..., m. In addition, a distance function D : O × O → R+ is defined over the pairs of objects with a role to measure the similarity between the objects.</p>
      <p>In order to provide the Integer Linear Programming (ILP) formulation of the considered clustering problem, we define the following binary variables:
xjl = 1 if the object oj is assigned to the cluster represented by object ol, and 0 otherwise;
yl = 1 if object ol represents a cluster, and 0 otherwise.</p>
      <p>The ILP formulation of the clustering problem is now described as follows:
min Σj=1..m Σl=1..m xjl D(j, l)   (1)
s.t.
Σl=1..m xjl = 1,   1 ≤ j ≤ m,   (3)
xjl ≤ yl,   1 ≤ j ≤ m, 1 ≤ l ≤ m,   (4)
Σl=1..m yl = K,   (5)
xjl, yl ∈ {0, 1},   1 ≤ j ≤ m, 1 ≤ l ≤ m.   (6)</p>
      <p>The objective function (1) that should be minimized represents the sum of distances from each object to its nearest cluster representative. Every object oj should be assigned to exactly one cluster (represented by an object ol), as described by constraints (3). Constraints (4) assure that an object oj can be assigned to a cluster only if the object ol is selected to be a cluster representative. The total number of cluster representatives is set to K by constraint (5). The binary nature of the decision variables is described by constraints (6).</p>
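      <p>To make the model concrete, the following sketch (illustrative Python, not the authors' C# implementation) solves the p-median model exactly for a tiny instance by enumerating every K-subset of candidate representatives; once the representatives are fixed, assigning each object to its nearest representative is optimal.</p>

```python
from itertools import combinations

def p_median(D, K):
    """Exhaustive p-median: D[j][l] is the distance between objects j and l.

    Every K-subset of objects is tried as the set of representatives
    (y_l = 1); each object is then assigned to its nearest representative
    (x_jl = 1), which minimizes the objective for that subset.
    """
    m = len(D)
    best = (float("inf"), None)
    for medians in combinations(range(m), K):
        cost = sum(min(D[j][l] for l in medians) for j in range(m))
        best = min(best, (cost, medians))
    return best

# four points on a line at positions 0, 1, 2 and 10
pos = [0, 1, 2, 10]
D = [[abs(a - b) for b in pos] for a in pos]
cost, medians = p_median(D, 2)  # -> cost 2 with medians (1, 3)
```

      <p>Enumeration is exponential in K, which is exactly why the paper turns to a meta-heuristic for realistic instance sizes.</p>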
      <p>
        It is a very common case in real applications that some values of the attributes describing the objects to be clustered are missing. Values may be missing for various reasons [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: data may not be available (this often happens in medicine, e.g., some very expensive or dangerous analyses are performed only in critical cases), errors may occur during the entering process, some data may not be considered important at the moment of entering, data acquisition from experiments may be imperfect, some values may be inconsistent with other data and are therefore erased, responses to a questionnaire may be incomplete, etc.
      </p>
      <p>
        There are two main techniques to deal with the problem of incomplete data. The simplest approach is to discard samples with missing values. However, this may be applied only in cases when the number of missing values is very small. Otherwise, the result of discarding incompletely defined objects will be insufficient data for drawing any useful conclusion. Imputation is another common solution to an incomplete data problem: it consists of replacing missing values with one selected value of the considered attribute. The replacing value can be determined in various ways [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]: it can be set to zero, to a random number with some given distribution, to the average, minimum or maximum of the existing values, to the expected value calculated from the existing ones, to a value obtained by the application of linear regression or k-nearest neighbors to the existing ones, etc. Regardless of the bulk of replacement techniques, their accuracy still remains an open question. Both of the above mentioned methods require a significant amount of computational effort and have to be performed in the preprocessing phase of the clustering process.
      </p>
      <p>
        To overcome the missing data problem (i.e., to avoid data imputation), a new distance function (as the measure of similarity between two objects) was proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The most important shortcoming of the known distances (Euclidean, Manhattan, Minkowski, etc.) is that they are only applicable when we know the values of all attributes describing the objects. Therefore, the authors of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a metric that can compare objects for which the values of some attributes are not known. It is based on the Hamming distance and propositional logic formulae. For two objects o and p from the set of n-dimensional objects, the Hamming distance d(o, p) represents the number of elements on which these two vectors have different values.
      </p>
      <p>
        The distance proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] exploits the propositional logic formulae in the following way. Each object from the data can be represented as a formula that is a conjunction of the attribute values. More precisely, for o = (o1, o2, ..., on) the corresponding formula is φ = o1 ∧ o2 ∧ ... ∧ on. If the value of an attribute oi is not known, it is replaced by the disjunction of all possible values, i.e., oi = oi1 ∨ oi2 ∨ ... ∨ ois, where s is the number of possible values for oi. Therefore, the corresponding object o can be represented by a set A of formulae obtained when the value of the missing attribute oi is substituted by each particular possible value. The set A consists of the following formulae:
φ1 = o1 ∧ o2 ∧ ... ∧ oi1 ∧ ... ∧ on,
φ2 = o1 ∧ o2 ∧ ... ∧ oi2 ∧ ... ∧ on,
...
φs = o1 ∧ o2 ∧ ... ∧ ois ∧ ... ∧ on.
If an object o contains more attributes with unknown values, the set A of formulae contains combinations of all possible values for all missing attributes.</p>
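      <p>The set A can be generated mechanically. The sketch below is illustrative Python; objects are encoded as tuples with None marking a missing attribute, an encoding of our choosing rather than one prescribed by [11].</p>

```python
from itertools import product

def completions(obj, domains):
    """Build the set A for an object: each missing attribute (None) is
    substituted by every value of its domain, one formula per combination."""
    choices = [domains[i] if v is None else [v] for i, v in enumerate(obj)]
    return [tuple(c) for c in product(*choices)]

# toy object with the second of three attributes missing
domains = [["a", "b"], ["x", "y", "z"], [0, 1]]
A = completions(("a", None, 1), domains)
# -> [("a", "x", 1), ("a", "y", 1), ("a", "z", 1)]
```

      <p>With several missing attributes the Cartesian product yields all their value combinations, exactly as stated above.</p>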
      <p>
        Let the objects o and p be represented by the sets of propositional formulae
A and B. The proposed distance D(o; p) between these two sets of formulae is
de ned by [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]:
D(o, p) = D(A, B) = ( max_{α∈A} min_{β∈B} d(α, β) + max_{β∈B} min_{α∈A} d(α, β) ) / 2,   (7)
where d is the Hamming distance. More precisely, d counts the differing corresponding literals in the two formulae.</p>
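      <p>Under the same tuple encoding, the distance (7) can be computed directly from the two sets of completions. This is an illustrative Python sketch, not the authors' implementation.</p>

```python
def hamming(o, p):
    # number of attribute positions on which two fully specified objects differ
    return sum(a != b for a, b in zip(o, p))

def distance(A, B):
    """Distance (7): average of the two directed max-min Hamming
    distances between the sets of completions A and B."""
    ab = max(min(hamming(x, y) for y in B) for x in A)
    ba = max(min(hamming(x, y) for x in A) for y in B)
    return (ab + ba) / 2

# with no missing values both sets are singletons and (7) is plain Hamming
d0 = distance([(1, 2, 3)], [(1, 0, 3)])  # -> 1.0
```

      <p>For genuinely incomplete objects the two directed terms can differ, which is why (7) symmetrizes them by averaging.</p>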
      <p>
        In the case when the values of all attributes are given (no missing values appear), the proposed distance obviously reduces to the Hamming distance. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] it has been proved that D(o, p) satisfies the conditions to be considered a metric. In order to evaluate its efficiency, the proposed distance function has been used within the clustering algorithm described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The experimental evaluation conducted in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] revealed that the proposed distance outperforms the Euclidean distance combined with two data imputation techniques: using the average value and linear regression. The distance D(o, p) can be incorporated into any distance-based clustering method, and we use it within the BCO meta-heuristic described in the next section.
      </p>
      <p>Bee Colony Optimization Meta-heuristic Method. In this section we explain our implementation of the BCO meta-heuristic for clustering incomplete data. First, an overview of the BCO algorithm is presented, and then the implementation details are provided.</p>
      <sec id="sec-2-1">
        <title>Overview of the BCO method</title>
        <p>
          The main feature of the BCO method is that a population of artificial bees searches for the optimal solution of a given optimization problem [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Each artificial bee is responsible for one solution of the considered problem. Its role is to make that solution as good as possible depending on the current state of the search. The algorithm runs in iterations until a stopping condition is met. At the end of the BCO execution, the best found solution (the so-called global best) is reported as the final one.
        </p>
        <p>Each iteration contains several execution steps consisting of two alternating phases: forward pass and backward pass (Fig. 1). During each forward pass, all bees explore the search space. Each bee applies a predefined number of moves, which yields a new solution. This part of the algorithm is problem dependent and should be resolved in each particular implementation of the BCO algorithm. Having obtained new solutions, the artificial bees start executing the second phase, the so-called backward pass, where all bees share information about the quality of their solutions. The quality of each solution is defined by the current value of the objective function. When all solutions are evaluated, each bee decides with a certain probability whether it will stay loyal to its solution and, as a recruiter, advertise it to other bees. If a bee is not loyal, it becomes an uncommitted follower and has to select one of the advertised solutions.</p>
        <p>It is obvious that bees with better solutions should have more chances to keep and advertise their solutions. Once its solution is abandoned, the corresponding bee becomes an uncommitted follower and has to select one of the advertised solutions. This selection is made with a probability such that better advertised solutions have greater opportunities to be chosen for further exploration. Contrary to the bees in nature, artificial bees that are loyal to their solutions are also the recruiters, i.e., their solutions are advertised and will be considered by uncommitted bees.</p>
        <p>In the basic variant of BCO there are only two parameters:
B – the number of bees involved in the search, and
NC – the number of forward/backward passes in a single BCO iteration.</p>
        <p>Algorithm 1 Pseudo-code of the BCO algorithm
procedure BCO
    Initialization(Problem input data, B, NC, STOP)
    while stopping criterion is not met do
        for b = 1 to B do                  ▷ Initializing population
            Sol(b) ← SelectSolution()
        end for
        for u = 1 to NC do
            for b = 1 to B do              ▷ Forward pass
                EvaluateMove(Sol(b))
                SelectMove(Sol(b))
            end for
            EvaluateSolutions()            ▷ Backward pass
            for b = 1 to B do
                Loyalty(Sol(b))
            end for
            for b = 1 to B do
                if b is not loyal then
                    Recruitment(Sol(b))
                end if
            end for
        end for
        Update(xbest, f(xbest))
    end while
    Return(xbest, f(xbest))
end procedure</p>
        <p>The pseudo-code of the BCO algorithm is given by Algorithm 1. The steps in the forward pass (EvaluateMove, SelectMove, and EvaluateSolutions) are problem specific and, obviously, they differ from implementation to implementation. Therefore, there are no general directions on how to perform them. On the other hand, this gives the opportunity to maximally exploit a priori knowledge about the considered problem and obtain a very powerful solution method.</p>
        <p>The loyalty decision of each bee depends on the quality of its own solution relative to the solutions held by other bees. The simplest way to calculate the probability that a bee stays loyal to its current solution is to set
pb = Ob,   b = 1, 2, ..., B,   (8)
where Ob denotes the normalized value of the objective function (or any other fitness value) of the solution created by the b-th bee. The normalization is performed in such a way that Ob ∈ [0, 1] and that larger values correspond to better solutions. More precisely, in the case of a minimization problem</p>
        <p>Ob = (fmax − fb) / (fmax − fmin).
Here, fb represents the value of the objective function found by bee b, while fmax and fmin correspond to the worst and the best values of the objective function held by the bees, respectively.</p>
        <p>
          Equation (8) and a random number generator are used for each artificial bee to decide whether it will stay loyal (and continue exploring its own solution) or become an uncommitted follower (and select one among the advertised solutions for further exploration). If the generated random number from the [0, 1] interval is smaller than the calculated probability pb, then the bee stays loyal to its own solution. Otherwise, the bee becomes uncommitted. Some other probability functions are evaluated in [
          <xref ref-type="bibr" rid="ref16 ref22">16, 22</xref>
          ].
        </p>
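        <p>A minimal sketch of this loyalty decision in illustrative Python, with the normalization for a minimization problem as in equation (8):</p>

```python
import random

def loyalty(values, rng):
    """Normalize objective values of a minimization problem so that
    O_b is in [0, 1] with larger = better, then let each bee stay
    loyal with probability p_b = O_b (eq. (8))."""
    fmax, fmin = max(values), min(values)
    span = fmax - fmin
    # if all bees hold equally good solutions, everyone stays loyal
    O = [1.0 if span == 0 else (fmax - f) / span for f in values]
    return [rng.random() < p for p in O]

mask = loyalty([5.0, 1.0, 3.0], random.Random(0))
# the worst bee (O_b = 0) is never loyal, the best (O_b = 1) always is
```
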
        <p>For each uncommitted follower it is decided which recruiter it will follow, taking into account the quality of all advertised solutions. The probability that the solution held by recruiter r is chosen by an uncommitted bee equals
pr = Or / (O1 + O2 + ... + OR),   r = 1, 2, ..., R,   (9)
where Ok corresponds to the k-th advertised solution and R denotes the number of recruiters. Using equation (9) and a random number generator, each uncommitted follower joins one recruiter through a roulette wheel. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] three other selection schemes are considered (tournament, rank and disruptive selection); however, they are not used in this work.</p>
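        <p>Recruitment by equation (9) is an ordinary roulette wheel over the normalized values of the advertised solutions; a sketch in illustrative Python (not the authors' code):</p>

```python
import random

def recruit(O, rng):
    """Pick recruiter r with probability O_r / sum(O) (eq. (9))."""
    pick = rng.random() * sum(O)
    acc = 0.0
    for r, o in enumerate(O):
        acc += o
        if pick < acc:
            return r
    return len(O) - 1  # guard against floating-point rounding

rng = random.Random(1)
choice = recruit([0.9, 0.1, 0.5], rng)  # index 0 is chosen most often
```
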
        <p>
          BCO has been applied to the clustering problem in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The authors implemented a constructive version of the BCO algorithm to cluster completely defined objects. Our implementation uses the improvement variant of BCO with a focus on clustering incomplete data.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Implementation Details</title>
        <p>
          We implemented the improvement variant of the BCO algorithm, denoted BCOi in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. This means that each bee is assigned a complete feasible solution of the clustering problem and performs some transformations that should improve its quality.
        </p>
        <p>In order to improve the efficiency of the proposed BCO, we introduced a pre-processing phase that starts with calculating the distances between objects using formula (7). In such a way the distance matrix is obtained with determined values of all entries. The next step of the pre-processing phase is to sort the distance matrix (as well as the corresponding objects' indices) in non-decreasing order. This means that in the rows of the sorted matrix the positions of objects correspond to their distance from the object marked by the row index. More precisely, in row s object i appears earlier than object j if D(s, i) &lt; D(s, j). The sorted matrix is used in the solution transformation process and reduces its complexity to O(1), i.e., it enables the transformation to be performed in a constant number of steps.</p>
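        <p>The pre-processing step can be sketched as follows (illustrative Python; dist stands for the distance (7) or any other metric):</p>

```python
def preprocess(objects, dist):
    """Distance matrix plus, per object s, all object indices sorted by
    non-decreasing distance from s, so the k-th closest object of s is
    simply order[s][k] -- a constant-time lookup during transformation."""
    m = len(objects)
    D = [[dist(objects[s], objects[i]) for i in range(m)] for s in range(m)]
    order = [sorted(range(m), key=lambda i, s=s: D[s][i]) for s in range(m)]
    return D, order

D, order = preprocess([0, 1, 5, 6], lambda a, b: abs(a - b))
# order[0] -> [0, 1, 2, 3]: object 0 itself first, then by distance
```

      <p>Note that order[s][0] is always s itself (distance zero), which is convenient when the move "keep the current centroid" is encoded as index 0.</p>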
        <p>Each iteration of BCO starts with the initialization of a population consisting of B bees. In our implementation, for the first iteration the population is initialized with randomly selected solutions. This means that K random objects are selected to be the initial cluster representatives, called centroids. In order to obtain the corresponding value of the objective function, each object is assigned to the cluster represented by the nearest centroid and the sum of distances between the objects and the associated centroids is calculated. The best solution in the population is identified and declared the global best solution. Starting from the second iteration, B/2 population members are assigned the global best solution obtained from the previous iterations, while the remaining solutions are selected randomly.</p>
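        <p>The two evaluation steps above amount to a few lines; the following is an illustrative Python sketch under the same encoding (centroids as object indices, D the precomputed distance matrix):</p>

```python
import random

def random_solution(m, K, rng):
    # K distinct objects drawn as initial cluster representatives (centroids)
    return rng.sample(range(m), K)

def objective(D, centroids):
    """Value of the objective for fixed centroids: each object joins the
    cluster of its nearest centroid and the distances are summed."""
    return sum(min(D[j][c] for c in centroids) for j in range(len(D)))

D = [[0, 1, 9], [1, 0, 9], [9, 9, 0]]
val = objective(D, [0, 2])  # -> 1  (object 1 joins centroid 0)
```
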
        <p>An iteration of BCO is composed of NC forward-backward passes that are implemented as follows. During the forward pass the transformation of the current solution held by each bee is performed. This transformation consists of replacing the current centroids with some other objects. Namely, for each of the centroids a new object is selected randomly in the following way. Let c be the forward pass counter, i.e., c = 1, 2, ..., NC. If c &lt; NC/2, a random object is selected from the closest m/2 ones. When c ≥ NC/2, all m objects are considered as candidates to replace the current centroid. It is possible to select 0 as the index of the new centroid, with the meaning that this centroid will not be replaced in the transformed solution. The procedure to select a new object consists of the following steps. First, a random number k = rand(1, t) between 1 and t (where t = m/2 or t = m, depending on the value of c) is selected. Next, we select k1 = rand(0, k). If k1 = 0, the corresponding centroid will not be changed; otherwise, the k1-th closest object is used to replace the current centroid. This procedure is repeated for all K centroids and then the transformation of the current solution is completed. The value of the objective function is again determined in the usual way (each object is assigned to the cluster represented by the nearest centroid and the sum of distances between the objects and the associated centroids is calculated).</p>
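        <p>The centroid-replacement move described above can be sketched as follows (illustrative Python; order[s] is the row of the sorted index matrix for object s, with order[s][0] = s itself, and the random index is capped at t − 1 to stay within the row, an assumption on our part):</p>

```python
import random

def transform(centroids, order, c, NC, rng):
    """Forward-pass move: each centroid may be replaced by a random nearby
    object.  While c < NC/2, candidates come from the m/2 closest objects,
    afterwards from all m; drawing k1 = 0 keeps the centroid unchanged."""
    m = len(order)
    t = m // 2 if c < NC / 2 else m
    new = []
    for s in centroids:
        k = rng.randint(1, t - 1)   # position within the sorted row
        k1 = rng.randint(0, k)
        new.append(s if k1 == 0 else order[s][k1])
    return new
```

      <p>Because only a row lookup is needed per centroid, each replacement is performed in constant time, as claimed for the sorted matrix.</p>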
        <p>
          After transforming all B solutions, BCO performs the backward pass, starting with the identification of the best solution held by the bees, which is used to update the global best solution. The remaining steps of the backward pass are implemented in the standard way [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Experimental Evaluation of BCO for Clustering. The proposed BCO approach is implemented in C# under the Microsoft Visual Studio 2010 platform and run on an AMD A4-6210 APU x64-based processor with 4GB RAM.</p>
        <p>
          In order to evaluate the proposed BCO method for clustering, we tested it on 9 databases for classification from the UCI Repository of Machine Learning Databases [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] and compared the obtained results against the existing ones from the recent literature [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. The relevant information about the used databases is presented in Table 1. These 9 databases were selected for two reasons: they contain incomplete data and they are also used in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], enabling the comparison. As can be seen from Table 1, the percentage of missing data in all databases is very small and this does not reflect the usual real life situations. However, the results used for comparison are obtained for the original databases, and we leave the performance evaluation with respect to the amount of missing data for future work.
        </p>
        <p>
          In order to determine the best combination of values for the BCO parameters, we tested the combinations of the following values: B ∈ {10, 15, 20, 25, 30, 35, 40, 45, 50}; NC ∈ {10, 15, 20, 25, 30, 35, 40, 45, 50}. The stopping criterion is defined as the maximum number of iterations and is set to 200. The number of repetitions is set to 100. It is important to note that BCO showed excellent stability: the same results were obtained in all 100 executions. The obtained results are presented in Figs. 2-5. The two graphics related to the same database represent the dependence of the objective function value and the classification accuracy on the values of the BCO parameters B and NC. For some databases the values of the parameters do not have any influence (CVR, Heart-C, Hepatitis, L.cancer), and therefore, the corresponding graphs are omitted. For some others the success rate does not depend on the parameters' values; however, the objective function value varies when the parameters' values are changing (Dermatology and H.colic). Therefore, for these two databases we present only the graphs related to the objective function values. The results for the remaining databases are sensitive to the change of the values of the BCO parameters regarding both criteria: objective function value and accuracy. Based on these results and having in mind that the main criterion for the quality of a clustering method is the minimal value of the objective function, we selected B = 40 and NC = 35 as the best combination of parameter values. The results obtained for this combination of parameter values are compared with the ones reported in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. It should be noted that this combination does not guarantee the best possible results for each of the databases; it is selected as the one that provides the least degradation in the majority of the databases. For illustration, we also report the best results, obtained with the combinations of parameters customized for each database.
        </p>
        <p>
          The methods evaluated in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] represent classification algorithms that work in two stages. The first one involves training on a set of already classified data, while the second stage is devoted to the evaluation of the resulting trained classification method. Therefore, those methods already have some knowledge about the data to be processed. Our clustering method is based on a meta-heuristic approach that does not involve any training or learning, and it could be considered to be in an inferior position with respect to the other methods used for comparison.
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] six methods were compared: Selective Neural Network Ensemble (SNNE), the Multi-Granulation Ensemble Method (MGNE) proposed in [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], the Neural Network Ensemble method without any assumption about distribution (NNNE) from [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], the method (Bag1) obtained when the mean value of the corresponding attribute is assigned to the missing entries and the bagging algorithm [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is then conducted, the method conducting the bagging algorithm after the removal of the samples with missing values (Bag2), and the NN method that conducts a single classifier on the data remaining after removing the samples with missing values. We compared our BCO approach with all these methods with respect to the classification accuracy results reported in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. The comparison results are presented in Table 2.
        </p>
        <p>
          The first column of Table 2 contains the database name, and the next 6 columns present the accuracy of the methods evaluated in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. The remaining two columns contain the results related to our BCO. The accuracy obtained for the selected combination of values for the BCO parameters (B = 40, NC = 35) is reported in the eighth column, while the last column shows the best results (for the customized combination of parameter values). The improved values are underlined. We cannot estimate whether the comparison is fair enough, since no time measurement units are provided in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Our running times were in milliseconds and, therefore, they are not reported here. As can be seen from this table, our BCO shows slightly better performance on five (out of nine) databases and on average.
        </p>
        <p>The presented results are preliminary; there is still room for improvement
of the BCO-generated results, and we plan to realize it as future work. For
example, the solution modification scheme could be changed so that
learning from previously visited solutions is included, and the parameter values
may be tuned with finer granularity.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>The clustering problem in large databases containing incomplete data is
considered in this paper. Instead of deleting objects with missing attributes or imputing
the missing values, we used the recently proposed metric function able to
determine the distance between objects with missing attributes. This distance function is
based on Hamming distance and propositional logic and can be incorporated into
any clustering method. We used it within the Bee Colony Optimization framework
and implemented a new population-based approach to the clustering problem.
The proposed implementation was tested on large databases available on the
Internet and compared with some recent clustering methods developed for classifying
incomplete data. Preliminary experimental evaluation shows the potential of our
approach.</p>
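      <p>
        For illustration only, the idea of measuring distance without imputation can be sketched as a Hamming-style count in which a pair of observed values contributes 0 or 1 and any comparison involving a missing value contributes a fixed intermediate penalty. The penalty value below is our own assumption; this sketch is not the exact propositional-logic metric used in the paper.
      </p>
      <preformat>
```python
def missing_aware_hamming(x, y, penalty=0.5):
    # Illustrative Hamming-style distance over attribute vectors.
    # Missing values are encoded as None; a comparison whose truth
    # value is unknown contributes a fixed penalty instead of
    # requiring the missing entry to be filled in.
    total = 0.0
    for a, b in zip(x, y):
        if a is None or b is None:
            total += penalty      # unknown comparison
        elif a != b:
            total += 1.0          # definite mismatch
    return total

print(missing_aware_hamming(["red", 1, None], ["red", 2, 3]))  # → 1.5
```
      </preformat>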
      <p>The directions of future work may include some modi cation of the
proposed approach. For example, theoretical aspect of meta-heuristic methods can
be taken into account, new transformation rules could be examined. The second
direction should include performance evaluation of the proposed approach with
respect to the amount of missing data. In addition, more extensive experimental
evaluation should be performed and the proposed approach need to be compared
with variety of the existing meta-heuristic clustering methods.
Acknowledgement. This work has been supported by the Serbian Ministry of
Science, Grant nos. OI174033 and III044006.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
          <article-title>Towards systematic design of distance functions for data mining applications</article-title>
          .
          <source>In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <volume>9</volume>
          -
          <fpage>18</fpage>
          .
          ACM
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aloise</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deshpande</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popat</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>NP-hardness of Euclidean sum-of-squares clustering</article-title>
          .
          <source>Machine learning 75(2)</source>
          ,
          <volume>245</volume>
          -
          <fpage>248</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alzaqebah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Abdullah</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Hybrid bee colony optimization for examination timetabling problems</article-title>
          .
          <source>Computers &amp; Operations Research</source>
          <volume>54</volume>
          ,
          <volume>142</volume>
          -
          <fpage>154</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Bagging predictors</article-title>
          .
          <source>Machine learning 24(2)</source>
          ,
          <volume>123</volume>
          -
          <fpage>140</fpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>E.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ching</surname>
            ,
            <given-names>W.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          :
          <article-title>An optimization algorithm for clustering using weighted dissimilarity measures</article-title>
          .
          <source>Pattern recognition 37(5)</source>
          ,
          <volume>943</volume>
          -
          <fpage>952</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Classification of incomplete data using classifier ensembles</article-title>
          .
          <source>In: International Conference on Systems and Informatics (ICSAI)</source>
          . pp.
          <volume>2229</volume>
          -
          <fpage>2232</fpage>
          .
          IEEE
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Context-based similarity measures for categorical databases</article-title>
          .
          <source>In: European Conference on Principles of Data Mining and Knowledge Discovery</source>
          . pp.
          <volume>201</volume>
          -
          <fpage>210</fpage>
          . Springer (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Davidovic</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teodorovic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Bee colony optimization Part I: The algorithm overview</article-title>
          .
          <source>Yugoslav Journal of Operational Research</source>
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <volume>33</volume>
          -
          <fpage>56</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Falkenauer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Genetic Algorithms and Grouping Problems</article-title>
          . Wiley, New York (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gautam</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Evolving clustering based data imputation</article-title>
          .
          <source>In: 2014 International Conference on Circuit, Power and Computing Technologies (ICCPCT)</source>
          . pp.
          <volume>1763</volume>
          -
          <fpage>1769</fpage>
          .
          IEEE
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Glisovic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raskovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimization for classifying the patients using the logic measures for missing data</article-title>
          .
          <source>Scientific Publications of the State University of Novi Pazar Series A: Applied Mathematics, Informatics and Mechanics 9</source>
          (
          <issue>1</issue>
          ),
          <volume>91</volume>
          -
          <fpage>101</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gunopulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Time series similarity measures and time series indexing</article-title>
          .
          <source>In: ACM SIGMOD Record</source>
          . vol.
          <volume>30</volume>
          ,
          p.
          <fpage>624</fpage>
          .
          ACM
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Haghighat</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forsati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Data clustering using bee colony optimization</article-title>
          .
          <source>In: The Seventh International Multi-Conference on Computing in the Global Information Technology</source>
          . pp.
          <volume>189</volume>
          -
          <issue>194</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brimberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urosevic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mladenovic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Solving large p-median clustering problems by primal-dual variable neighborhood search</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <volume>351</volume>
          -
          <fpage>375</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>Data clustering: 50 years beyond K-means</article-title>
          .
          <source>Pattern recognition letters 31(8)</source>
          ,
          <volume>651</volume>
          -
          <fpage>666</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Jaksic Kruger, T.:
          <article-title>Development, implementation, and theoretical analysis of the Bee Colony Optimization (BCO) meta-heuristic method</article-title>
          .
          <source>Ph.D. thesis</source>
          , Faculty of Technical Sciences, University of Novi Sad (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jose-Garcia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Flores</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Automatic clustering using nature-inspired metaheuristics: A survey</article-title>
          .
          <source>Applied Soft Computing</source>
          <volume>41</volume>
          ,
          <issue>192</issue>
          -
          <fpage>213</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.-C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>C.-T.</given-names>
          </string-name>
          :
          <article-title>A selective Bayes classifier with meta-heuristics for incomplete data</article-title>
          .
          <source>Neurocomputing</source>
          <volume>106</volume>
          ,
          <issue>95</issue>
          -
          <fpage>102</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Little</surname>
            ,
            <given-names>R.J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Statistical Analysis with Missing Data</article-title>
          . Wiley Series in Probability and Statistics. John Wiley &amp; Sons (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lucic</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teodorovic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Bee system: modeling combinatorial optimization transportation engineering problems by swarm intelligence</article-title>
          .
          <source>In: Preprints of the TRISTAN IV Triennial Symposium on Transportation Analysis</source>
          . pp.
          <volume>441</volume>
          -
          <issue>445</issue>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Mahajan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nimbhorkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varadarajan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The planar k-means problem is N P -hard</article-title>
          . In: International Workshop on Algorithms and Computation. pp.
          <volume>274</volume>
          -
          <fpage>285</fpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Maksimovic</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidovic</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Parameter calibration in the bee colony optimization algorithm</article-title>
          .
          <source>In: XI Balcan Conference on Operational Research (BALCOR</source>
          <year>2013</year>
          ). pp.
          <volume>263</volume>
          -
          <issue>272</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>H.W.</given-names>
          </string-name>
          :
          <article-title>A tabu-search-based heuristic for clustering</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>33</volume>
          (
          <issue>5</issue>
          ),
          <volume>849</volume>
          -
          <fpage>858</fpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <article-title>UCI Repository of machine learning databases for classification</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Survey on distance metric learning and dimensionality reduction in data mining</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>29</volume>
          (
          <issue>2</issue>
          ),
          <volume>534</volume>
          -
          <fpage>564</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>Y-T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y-P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y-W.</given-names>
          </string-name>
          :
          <article-title>Multi-granulation ensemble classification for incomplete data</article-title>
          .
          <source>In: International Conference on Rough Sets and Knowledge Technology</source>
          . pp.
          <volume>343</volume>
          -
          <fpage>351</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>Y-T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y-P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y-W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>X-Q.</given-names>
          </string-name>
          :
          <article-title>A selective neural network ensemble classification for incomplete data</article-title>
          .
          <source>International Journal of Machine Learning and Cybernetics</source>
          <volume>8</volume>
          (
          <issue>5</issue>
          ),
          <volume>1513</volume>
          -
          <fpage>1524</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A survey on rough set theory and its applications</article-title>
          .
          <source>CAAI Transactions on Intelligence Technology</source>
          <volume>1</volume>
          (
          <issue>4</issue>
          ),
          <volume>323</volume>
          -
          <fpage>333</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>