=Paper=
{{Paper
|id=Vol-2098/paper8
|storemode=property
|title=Bee Colony Optimization for Clustering Incomplete Data
|pdfUrl=https://ceur-ws.org/Vol-2098/paper8.pdf
|volume=Vol-2098
|authors=Tatjana Davidović,Nataša Glišović,Miodrag Rašković
}}
==Bee Colony Optimization for Clustering Incomplete Data==
Bee Colony Optimization for Clustering Incomplete Data

Tatjana Davidović¹, Nataša Glišović², and Miodrag Rašković¹

¹ Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia
² State University of Novi Pazar, Novi Pazar, Serbia
tanjad@mi.sanu.ac.rs; natasaglisovic@gmail.com; goca@mi.sanu.ac.rs

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org

Abstract. Many decision making processes include situations in which not all relevant data are available. The main issues one has to deal with when clustering incomplete data are the mechanism for filling in the missing values, the definition of a proper distance function, and/or the selection of the most appropriate clustering method. It is very hard to find a method that can adequately estimate missing values in all required situations. Therefore, a new distance function, based on propositional logic, that does not require determining the values of missing data has recently been proposed in the literature. Exploiting this distance, we developed a Bee Colony Optimization (BCO) approach for clustering incomplete data based on the p-median classification model. BCO is a population-based meta-heuristic inspired by the foraging habits of honey bees. It belongs to the class of Swarm Intelligence (SI) methods. The improvement variant of BCO is implemented, the one that transforms complete solutions in order to improve their quality. The efficiency of the proposed approach is demonstrated by comparison with some recent clustering methods.

Keywords: Data bases · Missing values · Classification of objects · Nature-inspired methods · Swarm intelligence

1 Introduction

One of the most common tasks in the everyday decision making process is clustering of large amounts of data [15]. It consists of grouping a set of m objects into K (a given number of) groups called clusters. The grouping has to be performed in such a way that objects belonging to the same cluster are similar to each other and at the same time different from the objects grouped in other clusters. Each object is described by a set of attributes defining different properties of that object. In fact, an object o_j, j = 1, 2, ..., m is represented by an array o_j = (a_1, a_2, ..., a_n) in the n-dimensional space. The coordinates a_i, i = 1, 2, ..., n represent attributes used to characterize a given object. Attributes can be of different types (numerical or categorical); therefore, some coding is usually performed to properly represent their values and make it easier to work with them.

Some of the important decisions that need to be made during clustering are to find the most suitable measure of similarity (the one that optimizes a given objective function), to classify objects into a certain number of clusters, sometimes even to determine the number of clusters based on the properties of the given data, and, in many cases, to resolve how to treat the data when they are incomplete. The clustering problem is known to be NP-hard [2, 9, 21] and, therefore, heuristic approaches represent the most natural choice. Distance-based methods are often used due to their simplicity and applicability in different scenarios. They are very popular in the literature because they can be used for any type of data, as long as the corresponding distance function is suitable for that type of data.
Therefore, the problem of grouping data can be reduced to the problem of finding a distance function for the given type of data. Consequently, finding an appropriate distance function has become an important area of research in data processing [1, 25]. In certain cases the distances should be adapted to a specific domain of variables, such as categorical data or time series ([7, 12]).

The number of clusters K may be given in advance; however, in some applications it may not even be known. In cases when the most suitable number of clusters has to be discovered by the clustering algorithm, we are dealing with automatic clustering. A recent survey of nature-inspired automatic clustering methods is given in [17]. From the recent literature one can conclude that nature-inspired methods dominate the clustering field. Among the scarce local-search-based methods are primal-dual variable neighborhood search [14] and tabu search [23].

A special class of clustering problems involves dealing with incomplete data. Most of the existing clustering algorithms assume that the objects to be classified are completely described. Therefore, prior to their application the data should be preprocessed in order to determine the values of missing data. Recently, some classification algorithms that do not require resolving the missing data problem have been proposed. They include Rough Set Models [28], Bayesian Models [18], and Classifier Ensemble Models [27].

Our study is devoted to resolving the problem of missing data by using a new distance function [11] that is defined on incomplete data. This distance is based on propositional logic and does not presume that the missing attributes should be assigned any value. Once the problem of missing data is overcome, one can apply any clustering method to classify the given objects. Instead of using the existing (state-of-the-art) algorithms, we developed a new approach based on the Bee Colony Optimization (BCO) meta-heuristic. BCO is a population-based nature-inspired algorithm that mimics the behavior of honey bees during the nectar collection process. It was proposed by Lučić and Teodorović in 2001 ([20]) and has evolved over the past years into a simple and effective meta-heuristic method. Our implementation of BCO involves the distance function from [11] and a transformation of solutions that randomly changes the cluster representatives.

The preliminary experimental evaluation is performed on 9 data sets from the UCI machine learning repository [24]. Our BCO approach is compared against six existing classification algorithms from the recent literature [27]. The considered classification algorithms involve training as preprocessing and, therefore, they are in a superior position with respect to our meta-heuristic approach. However, the obtained comparison results show that BCO is a promising approach to this hard yet important problem in automated decision making processes.

The paper is divided into the following sections. The methodology of clustering and the problem of missing data are described in the next section. The BCO method overview and its application to the clustering problem are presented in Section 3, while the experimental evaluation is described in Section 4. The advantages and disadvantages of the applied technique and some directions for future research are given in Section 5.

2 Clustering of Incomplete Data

We consider the clustering problem based on the p-median classification model, which can be formulated as follows.
Let us assume that we are given a set O of m objects o_j, j = 1, 2, ..., m. In addition, a distance function D : O × O → R+ is defined over the pairs of objects, with the role of measuring the similarity between the objects. In order to provide the Integer Linear Programming (ILP) formulation of the considered clustering problem, we define the following binary variables:

x_{jl} = 1 if the object o_j is assigned to the cluster represented by object o_l, and x_{jl} = 0 otherwise;
y_l = 1 if object o_l represents a cluster, and y_l = 0 otherwise.

The ILP formulation of the clustering problem is now described as follows:

min Σ_{j=1}^{m} Σ_{l=1}^{m} x_{jl} D(j, l)    (1)

s.t.    (2)

Σ_{l=1}^{m} x_{jl} = 1,   1 ≤ j ≤ m,    (3)

x_{jl} ≤ y_l,   1 ≤ j ≤ m, 1 ≤ l ≤ m,    (4)

Σ_{l=1}^{m} y_l = K,    (5)

x_{jl}, y_l ∈ {0, 1},   1 ≤ j ≤ m, 1 ≤ l ≤ m.    (6)

The objective function (1) that should be minimized represents the sum of distances from each object to its nearest cluster representative. Every object o_j should be assigned to exactly one cluster (represented by the object o_l), as described by constraints (3). Constraints (4) ensure that an object o_j can be assigned to a cluster only if the object o_l is selected to be a cluster representative. The total number of cluster representatives is set to K by constraint (5). The binary nature of the decision variables is described by constraints (6).
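To make the model concrete, the following minimal C# sketch (our illustration, not code from the paper; the names PMedianObjective and Evaluate are hypothetical) computes the objective value (1) for a fixed choice of K representatives, assigning each object to its nearest representative as constraints (3)-(4) require.

```csharp
using System;
using System.Linq;

// Illustrative sketch: evaluates the p-median objective (1)
// for a fixed set of cluster representatives.
class PMedianObjective
{
    // D[j, l] is the distance between objects j and l; medians holds
    // the indices of the K selected representatives (those with y_l = 1).
    static double Evaluate(double[,] D, int[] medians)
    {
        int m = D.GetLength(0);
        double total = 0.0;
        for (int j = 0; j < m; j++)
        {
            // Assigning j to its nearest representative realizes the
            // optimal choice of the x_{jl} variables under (3)-(4).
            total += medians.Min(l => D[j, l]);
        }
        return total;
    }

    static void Main()
    {
        double[,] D = { { 0, 2, 4 }, { 2, 0, 1 }, { 4, 1, 0 } };
        Console.WriteLine(Evaluate(D, new[] { 0, 2 }));  // K = 2; prints 1
    }
}
```

Note that once the set of representatives is fixed, the optimal assignment variables follow immediately; this is why meta-heuristics that search only over the set of representatives are a natural fit for the model.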
It is very common in real applications that some values of the attributes describing the objects to be clustered are missing. Values may be missing for various reasons [10]: data may not be available (this often happens in medicine, e.g., some very expensive or dangerous analyses are performed only in critical cases), errors may occur during data entry, some data may not be considered important at the moment of entry, data acquisition from experiments may be faulty, some values may be inconsistent with other data and are therefore erased, responses to a questionnaire may be incomplete, etc.

There are two main techniques to deal with the problem of incomplete data. The simplest approach is to discard samples with missing values. However, this may be applied only in cases when the number of missing values is very small. Otherwise, the result of discarding incompletely defined objects will be insufficient data for drawing any useful conclusion. Imputation is another common solution to the incomplete data problem: it consists of replacing missing values with one selected value of the considered attribute. The replacing value can be determined in various ways [19]: it can be set to zero, to a random number with some given distribution, to the average, minimum, or maximum of the existing values, to the expected value calculated from the existing ones, to a value obtained by applying linear regression or k-nearest neighbors to the existing ones, etc. Regardless of the bulk of replacement techniques, their accuracy still remains an open question. Both of the above mentioned methods require a significant amount of computational effort and have to be performed in the preprocessing phase of the clustering process.

To overcome the missing data problem (i.e., to avoid data imputation), a new distance function (as a measure of similarity between two objects) was proposed in [11]. The most important shortcoming of the known distances (Euclidean, Manhattan, Minkowski, etc.) is that they are applicable only when we know the values of all attributes describing the objects. Therefore, the authors of [11] proposed a metric that can compare objects for which the values of some attributes are not known. It is based on the Hamming distance and propositional logic formulae.

For two objects o and p from the set of n-dimensional objects, the Hamming distance d(o, p) represents the number of elements on which these two vectors have different values. The distance proposed in [11] exploits propositional logic formulae in the following way. Each object from the data can be represented as a formula α that is a conjunction of the attribute values. More precisely, for o = (o_1, o_2, ..., o_n) the corresponding formula is α = o_1 ∧ o_2 ∧ ... ∧ o_n. If the value of an attribute o_i is not known, it is replaced by the disjunction of all its possible values, i.e., o_i = o_i^1 ∨ o_i^2 ∨ ... ∨ o_i^s, where s is the number of possible values for o_i. Therefore, the corresponding object o can be represented by a set A of formulae obtained when the value of the missing attribute o_i is substituted by each particular possible value. The set A consists of the following formulae:

α_1 = o_1 ∧ o_2 ∧ ... ∧ o_i^1 ∧ ... ∧ o_n;
α_2 = o_1 ∧ o_2 ∧ ... ∧ o_i^2 ∧ ... ∧ o_n;
...
α_s = o_1 ∧ o_2 ∧ ... ∧ o_i^s ∧ ... ∧ o_n.

If an object o contains more than one attribute with unknown values, the set A of formulae contains combinations of all possible values for all missing attributes. Let the objects o and p be represented by the sets of propositional formulae A and B. The proposed distance D(o, p) between these two sets of formulae is defined in [11] by:

D(o, p) = D(A, B) = ( max_{α∈A} min_{β∈B} d(α, β) + max_{β∈B} min_{α∈A} d(α, β) ) / 2    (7)

where d is the Hamming distance; more precisely, it counts the differing corresponding literals in two formulae.

In the case when the values of all attributes are given (no missing values appear), the proposed distance obviously reduces to the Hamming distance. In [11] it has been proved that D(o, p) satisfies the conditions to be considered a metric. In order to evaluate its efficiency, the proposed distance function was used within the clustering algorithm described in [5]. The experimental evaluation conducted in [11] revealed that the proposed distance outperforms the Euclidean distance combined with two data imputation techniques: using the average value and using linear regression. The distance D(o, p) can be incorporated into any distance-based clustering method, and we use it within the BCO meta-heuristic described in the next section.
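As an illustration of how (7) can be computed, here is a minimal C# sketch (hypothetical names, not the authors' code). It assumes the sets A and B have already been expanded into complete attribute vectors (one vector per combination of possible values for the missing attributes; the expansion step itself is omitted) and that categorical values are encoded as integers.

```csharp
using System;
using System.Linq;

// Illustrative sketch of the logic-based distance (7) from [11].
class LogicDistance
{
    // Hamming distance d: the number of positions with different values.
    static int Hamming(int[] a, int[] b) =>
        a.Zip(b, (x, y) => x != y ? 1 : 0).Sum();

    // D(A, B) = (max_{a in A} min_{b in B} d(a, b)
    //          + max_{b in B} min_{a in A} d(a, b)) / 2
    static double Distance(int[][] A, int[][] B)
    {
        double left  = A.Max(a => B.Min(b => Hamming(a, b)));
        double right = B.Max(b => A.Min(a => Hamming(a, b)));
        return (left + right) / 2.0;
    }

    static void Main()
    {
        // Object o = (1, ?, 0) with ? in {0, 1} expands to two formulae.
        int[][] A = { new[] { 1, 0, 0 }, new[] { 1, 1, 0 } };
        // Object p = (1, 1, 1) is complete, so B is a singleton.
        int[][] B = { new[] { 1, 1, 1 } };
        Console.WriteLine(Distance(A, B));  // (2 + 1) / 2 = 1.5
    }
}
```

When A and B are singletons (no missing values), both terms in (7) equal the plain Hamming distance, matching the reduction noted above.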
3 Bee Colony Optimization Meta-heuristic Method

In this section we explain our implementation of the BCO meta-heuristic for clustering incomplete data. At the beginning, an overview of the BCO algorithm is presented, and then the implementation details are provided.

3.1 Overview of the BCO Method

The main feature of the BCO method is that a population of artificial bees searches for the optimal solution of a given optimization problem [8]. Each artificial bee is responsible for one solution of the considered problem. Its role is to make that solution as good as possible, depending on the current state of the search. The algorithm runs in iterations until a stopping condition is met. At the end of the BCO execution, the best found solution (the so-called global best) is reported as the final one.

Fig. 1. Main steps of the BCO algorithm

Each iteration contains several execution steps consisting of two alternating phases: forward pass and backward pass (Fig. 1). During each forward pass, all bees explore the search space. Each bee applies a predefined number of moves, which yields a new solution. This part of the algorithm is problem dependent and should be resolved in each particular implementation of the BCO algorithm. Having obtained new solutions, the artificial bees start executing the second phase, the so-called backward pass, where all bees share information about the quality of their solutions. The quality of each solution is defined by the current value of the objective function. When all solutions are evaluated, each bee decides with a certain probability whether it will stay loyal to its solution and become a recruiter, advertising its solution to other bees. It is obvious that bees with better solutions should have more chances to keep and advertise their solutions. Once a solution is abandoned, the corresponding bee becomes an uncommitted follower and has to select one of the advertised solutions. This selection is made with a probability such that better advertised solutions have greater opportunities to be chosen for further exploration. Contrary to the bees in nature, artificial bees that are loyal to their solutions are also recruiters, i.e., their solutions are advertised and may be considered by uncommitted bees. In the basic variant of BCO there are only two parameters: B, the number of bees involved in the search, and NC, the number of forward/backward passes in a single BCO iteration.

Algorithm 1 Pseudo-code of the BCO algorithm

```
procedure BCO
    Initialization(Problem input data, B, NC, STOP)
    while stopping criterion is not met do
        for b ← 1, B do                      ▷ Initializing population
            Sol(b) ← SelectSolution()
        end for
        for u ← 1, NC do
            for b ← 1, B do                  ▷ Forward pass
                EvaluateMove(Sol(b))
                SelectMove(Sol(b))
            end for
            EvaluateSolutions()
            for b ← 1, B do                  ▷ Backward pass
                Loyalty(Sol(b))
            end for
            for b ← 1, B do
                if b is not loyal then
                    Recruitment(Sol(b))
                end if
            end for
        end for
        Update(x_best, f(x_best))
    end while
    Return(x_best, f(x_best))
end procedure
```

The pseudo-code of the BCO algorithm is given as Algorithm 1. The steps in the forward pass (EvaluateMove, SelectMove, and EvaluateSolutions) are problem specific and, obviously, differ from implementation to implementation. Therefore, there are no general directions on how to perform them. On the other hand, this gives the opportunity to maximally exploit a priori knowledge about the considered problem and obtain a very powerful solution method.

The loyalty decision for each bee depends on the quality of its own solution relative to the solutions held by the other bees. The simplest way to calculate the probability that a bee stays loyal to its current solution is to set

p_b = O_b,   b = 1, 2, ..., B,    (8)

where O_b denotes the normalized value of the objective function (or any other fitness value) of the solution created by the b-th bee. The normalization is performed in such a way that O_b ∈ [0, 1] and larger values correspond to better solutions. More precisely, in the case of a minimization problem,

O_b = (f_max − f_b) / (f_max − f_min),   b = 1, 2, ..., B.

Here, f_b represents the value of the objective function found by bee b, while f_max and f_min correspond to the worst and the best objective function values held by the bees, respectively. Equation (8) and a random number generator are used by each artificial bee to decide whether it will stay loyal (and continue exploring its own solution) or become an uncommitted follower (and select one of the advertised solutions for further exploration). If the generated random number from the interval [0, 1] is smaller than the calculated probability p_b, the bee stays loyal to its own solution. Otherwise, the bee becomes uncommitted. Some other probability functions are evaluated in [16, 22].
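The following minimal C# sketch (illustrative only; the names LoyaltyDecision and DecideLoyalty are ours) implements this loyalty decision for a minimization problem. The guard for the degenerate case f_max = f_min, where all bees hold equally good solutions, is our own assumption.

```csharp
using System;
using System.Linq;

// Illustrative sketch of the loyalty decision based on (8).
class LoyaltyDecision
{
    static readonly Random Rng = new Random();

    // f holds the objective values of the B bees; returns true for
    // each bee that stays loyal to its current solution.
    static bool[] DecideLoyalty(double[] f)
    {
        double fMin = f.Min(), fMax = f.Max();
        return f.Select(fb =>
        {
            // O_b in [0, 1]: the best solution gets O_b = 1, the worst 0.
            // Assumption: if all solutions are equal, every bee stays loyal.
            double Ob = fMax > fMin ? (fMax - fb) / (fMax - fMin) : 1.0;
            return Rng.NextDouble() < Ob;  // loyal with probability p_b = O_b
        }).ToArray();
    }

    static void Main()
    {
        double[] f = { 12.0, 15.0, 9.0, 20.0 };  // objective values of B = 4 bees
        Console.WriteLine(string.Join(", ", DecideLoyalty(f)));
    }
}
```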
For each uncommitted follower, it is decided which recruiter it will follow, taking into account the quality of all advertised solutions. The probability that the solution held by recruiter r is chosen by any uncommitted bee equals

p_r = O_r / Σ_{k=1}^{R} O_k,   r = 1, 2, ..., R,    (9)

where O_k corresponds to the k-th advertised solution and R denotes the number of recruiters. Using equation (9) and a random number generator, each uncommitted follower joins one recruiter through a roulette wheel. In [3] three other selection heuristics are considered (tournament, rank, and disruptive selection); however, they are not used in this work.
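A roulette-wheel selection according to (9) can be sketched in C# as follows (our illustration, not the authors' code): the wheel is spun once per uncommitted bee, and each recruiter occupies a slice proportional to its normalized value O_r.

```csharp
using System;
using System.Linq;

// Illustrative sketch of recruitment by roulette wheel according to (9).
class Recruitment
{
    static readonly Random Rng = new Random();

    // O holds the normalized objective values of the R recruiters;
    // returns the index of the chosen recruiter.
    static int SelectRecruiter(double[] O)
    {
        double total = O.Sum();
        double spin = Rng.NextDouble() * total;  // a point on the wheel
        double cumulative = 0.0;
        for (int r = 0; r < O.Length; r++)
        {
            cumulative += O[r];
            if (spin < cumulative) return r;
        }
        return O.Length - 1;  // numerical safety: fall back to the last recruiter
    }

    static void Main()
    {
        double[] O = { 0.9, 0.5, 0.1 };  // better solutions get larger slices
        Console.WriteLine(SelectRecruiter(O));
    }
}
```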
BCO has been applied to the clustering problem in [13]. The authors implemented a constructive version of the BCO algorithm to cluster completely defined objects. Our implementation uses the improvement variant of BCO with a focus on clustering incomplete data.

3.2 Implementation Details

We implemented the improvement variant of the BCO algorithm, denoted BCOi in [8]. This means that each bee is assigned a complete feasible solution of the clustering problem and performs transformations that should improve its quality.

In order to improve the efficiency of the proposed BCO, we introduced a preprocessing phase that starts with calculating the distances between objects using formula (7). In this way, a distance matrix with all entries determined is obtained. The next step of the preprocessing phase is to sort the rows of the distance matrix (together with the corresponding objects' indices) in non-decreasing order. This means that in the rows of the sorted matrix the positions of objects correspond to their distance from the object marked by the row index. More precisely, in row s object i appears earlier than object j if D(s, i) < D(s, j). The sorted matrix is used in the solution transformation process and reduces its complexity to O(1), i.e., it enables performing the transformation in a constant number of steps.

Each iteration of BCO starts with the initialization of a population consisting of B bees. In our implementation, for the first iteration the population is initialized with randomly selected solutions. This means that K random objects are selected to be the initial cluster representatives, called centroids. In order to obtain the corresponding value of the objective function, each object is assigned to the cluster represented by the nearest centroid, and the sum of distances between the objects and the associated centroids is calculated. The best solution in the population is identified and declared the global best solution. Starting from the second iteration, B/2 population members are assigned the global best solution obtained from the previous iterations, while the remaining solutions are selected randomly.

An iteration of BCO is composed of NC forward-backward passes implemented as follows. During the forward pass, a transformation of the current solution held by each bee is performed. This transformation consists of replacing the current centroids with some other objects. Namely, for each of the centroids a new object is selected randomly in the following way. Let c be the forward pass counter, i.e., c = 1, 2, ..., NC. If c < NC/2, a random object is selected from the closest m/2 objects. When c ≥ NC/2, all m objects are considered as candidates to replace the current centroid. It is possible to select 0 as the index of the new centroid, meaning that this centroid will not be replaced in the transformed solution. The procedure to select a new object consists of the following steps. First, a random number k = rand(1, t) between 1 and t (where t = m/2 or t = m, depending on the value of c) is selected. Next, we select k1 = rand(0, k). If k1 = 0, the corresponding centroid is not changed; otherwise, the k1-th closest object is used to replace the current centroid. This procedure is repeated for all K centroids, and then the transformation of the current solution is complete. The value of the objective function is again determined in the usual way (each object is assigned to the cluster represented by the nearest centroid, and the sum of distances between the objects and the associated centroids is calculated). A sketch of this transformation step is given at the end of this section.

After transforming all B solutions, BCO performs the backward pass, starting with the identification of the best solution held by the bees, which is used to update the global best solution. The remaining steps of the backward pass are implemented in the standard way [8].
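Under our reading of this procedure, the transformation step could be sketched in C# as follows. This is an illustration under stated assumptions, not the authors' code: sortedNeighbors[s][r] is assumed to hold the index of the (r+1)-th closest object to object s (a row of the sorted matrix from the preprocessing phase, excluding s itself), so the "all m objects" pool in late passes is taken as the m − 1 candidates other than the centroid, with keeping the centroid covered by the k1 = 0 outcome.

```csharp
using System;

// Illustrative sketch of the centroid replacement performed during
// forward pass c of NC, using the row-sorted distance matrix.
class CentroidTransformation
{
    static readonly Random Rng = new Random();

    // centroids: indices of the K current cluster representatives.
    // sortedNeighbors[s][r]: index of the (r+1)-th closest object to s.
    static void Transform(int[] centroids, int[][] sortedNeighbors,
                          int c, int NC, int m)
    {
        // Early passes search locally among the closest m/2 objects;
        // later passes widen the candidate pool (assumption: m - 1
        // candidates, since the neighbor lists exclude the object itself).
        int t = (c < NC / 2) ? m / 2 : m - 1;
        for (int i = 0; i < centroids.Length; i++)
        {
            int k = Rng.Next(1, t + 1);   // k = rand(1, t)
            int k1 = Rng.Next(0, k + 1);  // k1 = rand(0, k)
            if (k1 == 0) continue;        // keep the current centroid
            // Replace the centroid with its k1-th closest object.
            centroids[i] = sortedNeighbors[centroids[i]][k1 - 1];
        }
    }

    static void Main()
    {
        // Tiny synthetic instance: m = 4 objects, neighbor lists by closeness.
        int[][] sorted = {
            new[] { 1, 2, 3 }, new[] { 0, 2, 3 },
            new[] { 1, 3, 0 }, new[] { 2, 1, 0 }
        };
        int[] centroids = { 0, 3 };  // K = 2
        Transform(centroids, sorted, 6, 10, 4);  // a late pass (c >= NC/2)
        Console.WriteLine(string.Join(", ", centroids));
    }
}
```

Because the sorted rows are precomputed, each centroid replacement is just two random draws and one array lookup, which is the O(1) per-centroid cost mentioned above.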
4 Experimental Evaluation of BCO for Clustering

The proposed BCO approach is implemented in C# under the Microsoft Visual Studio 2010 platform and run on an AMD A4-6210 APU x64-based processor with 4 GB RAM.

In order to evaluate the proposed BCO method for clustering, we tested it on 9 databases for classification from the UCI Repository of Machine Learning Databases [24] and compared the obtained results against the existing ones from the recent literature [27]. The relevant information about the used databases is presented in Table 1. These 9 databases were selected for two reasons: they contain incomplete data and they are also used in [27], enabling the comparison. As can be seen from Table 1, the percentage of missing data in all databases is very small, which does not reflect the usual real-life situations. However, the results used for comparison are obtained for the original databases, and we leave the performance evaluation with respect to the amount of missing data for future work.

Table 1. Description of the databases: number of objects, attributes and classes, and percentage of missing data

Name         # objects  # attributes  attribute type   # classes  % missing
B.cancer     699        10            cat.             2          0.03
CVR          435        16            cat.             2          1.05
Dermatology  366        34            cat.+int.        6          0.02
Heart-h      294        13            cat.+int.+real   2          0.34
Heart-c      303        13            cat.+int.+real   5          0.08
Hepatitis    155        19            cat.+int.+real   2          0.71
H.colic      368        27            cat.+int.+real   2          1.78
L.cancer     32         56            int.             2          0.17
MM           961        5             int.             2          0.21

In order to determine the best combination of values for the BCO parameters, we tested the combinations of the following values: B ∈ {10, 15, 20, 25, 30, 35, 40, 45, 50} and NC ∈ {10, 15, 20, 25, 30, 35, 40, 45, 50}. The stopping criterion is defined as the maximum number of iterations and is set to 200. The number of repetitions is set to 100. It is important to note that BCO showed excellent stability: the same results were obtained in all 100 executions.

The obtained results are presented in Figs. 2-5.

Fig. 2. Parameter tuning of BCO parameters on the B.cancer database
Fig. 3. Parameter tuning of BCO parameters on the Heart-h database
Fig. 4. Parameter tuning of BCO parameters on the Dermatology and H.colic databases
Fig. 5. Parameter tuning of BCO parameters on the MM database

The two graphics related to the same database represent the dependence of the objective function value and the classification accuracy on the values of the BCO parameters B and NC. For some databases the values of the parameters do not have any influence (CVR, Heart-c, Hepatitis, L.cancer), and therefore the corresponding graphs are omitted. For some others the success rate does not depend on the parameter values, while the objective function value varies as the parameter values change (Dermatology and H.colic); therefore, for these two databases we present only the graphs related to the objective function values. The results for the remaining databases are sensitive to changes in the values of the BCO parameters with respect to both criteria: objective function value and accuracy. Based on these results, and having in mind that the main criterion for the quality of a clustering method is the minimal value of the objective function, we selected B = 40 and NC = 35 as the best combination of parameter values. The results obtained for this combination of parameter values are compared with the ones reported in [27]. It should be noted that this combination does not guarantee the best possible results for each of the databases; it is selected as the one that provides the least degradation on the majority of the databases. For illustration, we also report the best results, obtained with combinations of parameter values customized for each database.

The methods evaluated in [27] are classification algorithms that work in two stages. The first stage involves training on a set of already classified data, while the second stage is devoted to the evaluation of the resulting trained classification method. Therefore, those methods already have some knowledge about the data to be processed. Our clustering method is based on a meta-heuristic approach that does not involve any training or learning, and it could be considered to be in an inferior position with respect to the other methods used for comparison.

In [27] six methods were compared: the Selective Neural Network Ensemble (SNNE), the Multi-Granulation Ensemble Method (MGNE) proposed in [26], the Neural Network Ensemble method without any assumption about the distribution (NNNE) from [6], the method (Bag1) in which the mean value of the corresponding attribute is assigned to the missing entries before the bagging algorithm [4] is conducted, the method conducting the bagging algorithm after the removal of the samples with missing values (Bag2), and the NN method that runs a single classifier on the data remaining after the removal of the samples with missing values. We compared our BCO approach against all these methods with respect to the classification accuracy results reported in [27].

The comparison results are presented in Table 2. The first column of Table 2 contains the database name, and the next 6 columns present the accuracy of the methods evaluated in [27]. The remaining two columns contain the results related to our BCO. The accuracy obtained for the selected combination of values for the BCO parameters (B = 40, NC = 35) is reported in the eighth column, while the last column shows the best results (for the customized combination of parameter values).
Table 2. Results of the comparison: classification accuracy of the tested methods

Databases    SNNE   MGNE   NNNE   Bag1   Bag2   NN     BCO    best BCO
B.cancer     0.935  0.938  0.936  0.938  0.939  0.65   0.956  0.957
CVR          0.942  0.945  0.942  0.965  0.964  0.513  0.875  0.875
Dermatology  0.886  0.879  0.861  0.849  0.844  0.277  0.874  0.874
Heart-h      0.816  0.806  0.806  0.812  0.76   0.656  0.548  0.558
Heart-c      0.526  0.519  0.516  0.52   0.524  0.437  0.671  0.671
Hepatitis    0.676  0.676  0.682  0.663  0.681  0.551  0.895  0.895
H.colic      0.735  0.723  0.701  0.778  0.633  0.583  0.923  0.923
L.cancer     0.524  0.522  0.498  0.503  0.517  0.347  0.719  0.719
MM           0.834  0.836  0.801  0.829  0.84   0.504  0.536  0.672
Average      0.764  0.760  0.749  0.762  0.745  0.502  0.777  0.794

The improved values are underlined. We cannot estimate whether the comparison is completely fair, since no time measurements are provided in [27]. Our running times were in milliseconds and, therefore, they are not reported here. As can be seen from the table, our BCO shows slightly better performance on five (out of nine) databases and on average. The presented results are preliminary; there is still room for improvement of the BCO-generated results, and we plan to realize it in future work. For example, the solution modification scheme could be changed so that learning from previously visited solutions is included, and the parameter values may be tuned with finer granulation.

5 Conclusion

The clustering problem in large databases containing incomplete data is considered in this paper. Instead of deleting objects with missing attributes or imputing missing values, we used a recently proposed metric able to determine the distance between objects with missing attributes. This distance function is based on the Hamming distance and propositional logic and can be incorporated into any clustering method. We used it within the Bee Colony Optimization framework and implemented a new population-based approach to the clustering problem. The proposed implementation is tested on large publicly available databases and compared with some recent methods developed for classifying incomplete data. The preliminary experimental evaluation shows the potential of our approach.

The directions of future work may include some modifications of the proposed approach. For example, theoretical aspects of meta-heuristic methods can be taken into account, and new transformation rules could be examined. A second direction should include the performance evaluation of the proposed approach with respect to the amount of missing data. In addition, a more extensive experimental evaluation should be performed, and the proposed approach needs to be compared with a variety of existing meta-heuristic clustering methods.

Acknowledgement. This work has been supported by the Serbian Ministry of Science, Grant nos. OI174033 and III044006.

References

1. Aggarwal, C.C.: Towards systematic design of distance functions for data mining applications. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 9–18. ACM (2003)
2. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (2009)
3. Alzaqebah, M., Abdullah, S.: Hybrid bee colony optimization for examination timetabling problems. Computers & Operations Research 54, 142–154 (2015)
4. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
5. Chan, E.Y., Ching, W.K., Ng, M.K., Huang, J.Z.: An optimization algorithm for clustering using weighted dissimilarity measures. Pattern Recognition 37(5), 943–952 (2004)
6. Chen, H., Du, Y., Jiang, K.: Classification of incomplete data using classifier ensembles. In: International Conference on Systems and Informatics (ICSAI), pp. 2229–2232. IEEE (2012)
7. Das, G., Mannila, H.: Context-based similarity measures for categorical databases. In: European Conference on Principles of Data Mining and Knowledge Discovery, pp. 201–210. Springer (2000)
8. Davidović, T., Teodorović, D., Šelmić, M.: Bee colony optimization Part I: The algorithm overview. Yugoslav Journal of Operational Research 25(1), 33–56 (2015)
9. Falkenauer, E.: Genetic Algorithms and Grouping Problems. Wiley, New York (1998)
10. Gautam, C., Ravi, V.: Evolving clustering based data imputation. In: 2014 International Conference on Circuit, Power and Computing Technologies (ICCPCT), pp. 1763–1769. IEEE (2014)
11. Glišović, N., Rašković, M.: Optimization for classifying the patients using the logic measures for missing data. Scientific Publications of the State University of Novi Pazar, Series A: Applied Mathematics, Informatics and Mechanics 9(1), 91–101 (2017)
12. Gunopulos, D., Das, G.: Time series similarity measures and time series indexing. In: ACM SIGMOD Record, vol. 30, p. 624. ACM (2001)
13. Haghighat, A.T., Forsati, R.: Data clustering using bee colony optimization. In: The Seventh International Multi-Conference on Computing in the Global Information Technology, pp. 189–194 (2012)
14. Hansen, P., Brimberg, J., Urošević, D., Mladenović, N.: Solving large p-median clustering problems by primal-dual variable neighborhood search. Data Mining and Knowledge Discovery 19(3), 351–375 (2009)
15. Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31(8), 651–666 (2010)
16. Jakšić Krüger, T.: Development, implementation, and theoretical analysis of the Bee Colony Optimization (BCO) meta-heuristic method. Ph.D. thesis, Faculty of Technical Sciences, University of Novi Sad (2017)
17. Jose-Garcia, A., Gómez-Flores, W.: Automatic clustering using nature-inspired metaheuristics: A survey. Applied Soft Computing 41, 192–213 (2016)
18. Lin, H.-C., Su, C.-T.: A selective Bayes classifier with meta-heuristics for incomplete data. Neurocomputing 106, 95–102 (2013)
19. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley Series in Probability and Statistics. John Wiley & Sons (2002)
20. Lučić, P., Teodorović, D.: Bee system: modeling combinatorial optimization transportation engineering problems by swarm intelligence. In: Preprints of the TRISTAN IV Triennial Symposium on Transportation Analysis, pp. 441–445 (2001)
21. Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. In: International Workshop on Algorithms and Computation, pp. 274–285. Springer (2009)
22. Maksimović, P., Davidović, T.: Parameter calibration in the bee colony optimization algorithm. In: XI Balkan Conference on Operational Research (BALCOR 2013), pp. 263–272 (2013)
23. Sung, C.S., Jin, H.W.: A tabu-search-based heuristic for clustering. Pattern Recognition 33(5), 849–858 (2000)
24. UCI Repository of Machine Learning Databases for classification
25. Wang, F., Sun, J.: Survey on distance metric learning and dimensionality reduction in data mining. Data Mining and Knowledge Discovery 29(2), 534–564 (2015)
26. Yan, Y.-T., Zhang, Y.-P., Zhang, Y.-W.: Multi-granulation ensemble classification for incomplete data. In: International Conference on Rough Sets and Knowledge Technology, pp. 343–351. Springer (2014)
27. Yan, Y.-T., Zhang, Y.-P., Zhang, Y.-W., Du, X.-Q.: A selective neural network ensemble classification for incomplete data. International Journal of Machine Learning and Cybernetics 8(5), 1513–1524 (2017)
28. Zhang, Q., Xie, Q., Wang, G.: A survey on rough set theory and its applications. CAAI Transactions on Intelligence Technology 1(4), 323–333 (2016)