Social Spider Algorithm Approach for Clustering Harley Vera-Olivera* José Luis Soncco-Álvarez* Lauro Enciso-Rodas* harleyve@gmail.com, jose.soncco.alvarez@gmail.com, lauro.enciso@unsaac.edu.pe National University of San Antonio Abad del Cusco* Abstract of these problems there is little prior information, such as statistical models. It is under these restric- Clustering is a popular data analysis tech- tions that clustering is particularly appropriate for nique to identify homogeneous groups of the exploration of interrelationships between data objects based on the values of their at- to make a preliminary evaluation of its structure tributes, used in many disciplines and (Jain et al., 1999). Thus new conditions imposed applications. This extended abstract of by Big Data presented new challenges at different our undergraduate thesis for obtaining the levels including clustering. engineer degree in informatics and sys- The term clustering is used in several commu- tems, presents an approach based on the nities to describe methods for grouping of un- Social Spider Optimization (SSO) algo- labeled data (Jain et al., 1999). Clustering is rithm for optimizing clusters of data, tak- the task of discovering groups and data structures ing as metric the sum of euclidean dis- that are in some way or another ”similar”, with- tances. Other important algorithms of the out using known structures (vijayalakshmi and literature were implemented in order to Renuka Devi, 2012). Intuitively, patterns within a make comparisons: K-means algorithm, group are more similar compared to those patterns and a Genetic Algorithm (GA) for Clus- belonging to a different group. Here, the goal is tering. Experiments were performed us- to develop an automatic algorithm that can accu- ing 5 datasets taken from the UCI Ma- rately classify an unlabeled dataset in groups. chine Learning Repository, each algorithm Recent literature classifies clustering algorithms was executed many times and then the fol- in hierarchical, partitioning, and overlapping (Xu lowing measures were calculated: mean, and Wunsch, 2009). The partitional algorithm di- median, minimum, and maximum values vides a dataset into a finite number based on cer- of the results. These experiments showed tain criteria known as a measure of fitness. The fit- that the SSO algorithm outperforms the ness measure affects directly the natural formation K- means algorithm, and it has results of the groups, once a measure is selected the task equally competitive as the GA. All these of the partition becomes an optimization problem. results were confirmed by statistical tests K-means algorithm is the most fundamental performed over the outputs of the algo- concept of partitional grouping, was published in rithm. 1957 by Lloyd (Lloyd, 1982). In this case the 1 Introduction minimization of the Euclidean distance between its elements and the center of a cluster was con- Clustering is useful in several analysis such as ex- sidered as a criterion of optimization. Inspired by ploration of patterns, machine learning including K-means many algorithms were developed such data mining, documents retrieval, image segmen- as: Bisecting K-means (Steinbach et al., 2000), tation and pattern classification. However in many sort-means (Phillips, 2002), X-means (Pelleg and This work was started when first and second authors Moore, 1999), among others. were undergraduate students at the National University of Recent studies reveal a new trend, which was San Antonio Abad del Cusco and was finished when the au- thors were graduate students at the University of Brasilia, named as stochastic algorithms with randomized both receiving a CAPES scholarship. and local search meta-heuristic. The random pro- 114 cess generates arbitrary solutions that explore the for the clustering problem are presented, also the search space and are responsible for achieving pseudo-code of the algorithms is presented; in global solution (Nanda and Panda, 2014). The Section V the experiments and results are showed, first meta-heuristic inspired by nature was the ge- a discussion of this results is presented, and also a netic algorithm developed by Holland and his col- statistical analysis is performed; finally in Section leagues in 1975 (Holland, 1975). This algorithm VI are presented the conclusions and future work. is classified as evolutionary algorithm. On the other hand, new bio-inspired optimization algo- 2 The Clustering Problem rithms are being introduced, such is the case of the According to Mirkin (Mirkin, 2012), clustering is algorithm inspired by the social behavior of spi- a discipline dedicated to reveal and describe the ders (Cuevas et al., 2013) classified as swarm in- structures of groups in a dataset and may define telligence algorithm proposed in 2013, which had four important involved concepts: data, structure not been applied to the clustering problem until groups, reveal a group structure, and describe a our proposal. group structure. The following definitions were This extended abstract of our undergraduate taken from (Maulik and Bandyopadhyay, 2000; thesis for obtaining the engineer degree in in- De Falco et al., 2007; Karaboga and Ozturk, 2011; formatics and systems (Vera-Olivera and Soncco- Senthilnath et al., 2011). Álvarez, 2016), presents an approach based on the Suppose S = {x1 , x2 , . . . xn } is a set of N - SSO algorithm for the clustering problem. The dimensional n points and C = {c1 , c2 , . . . , ck } is contribution of this work is to show that the SSO a set of N -dimensional k elements. The clustering algorithm can produce competitive results regard- problem in a N -dimensional space RN consists in ing classic approaches such as: (a) the k-means al- partitioning the set S in a number k of clusters gorithm, which was implemented as presented in based on a similarity metric, where each cluster (Maulik and Bandyopadhyay, 2000); and (b) a ge- has as center an element ci from C. netic algorithm approach for the clustering prob- Suppose that Gi , i = 1, . . . , k, represents a lem, which was proposed by Maulik and Bandy- cluster, then the following properties hold: opadhyay (2000). The metric used for the com- parisons is the sum of euclidean distances of the • Gi 6= , to i = 1, . . . , k; elements of the clusters to their respective cen- ter, this metric is the output of the algorithms. • Gi \ G j = , to i, j = 1, . . . , k, such that For the experiments were used 5 datasets from i 6= j; the UCI Machine Learning Repository, for each of S k these datasets the algorithms were executed sev- • Gi = S eral times, and then the following measures were i=1 calculated: mean, median, minimum, and max- The clustering metric that has been adopted in imum values. This experiment showed that the this work is the sum of the Euclidean distances of SSO algorithm has better results compared to the the points of a group to their respective center. The ones obtained by the K-means algorithm, also the definition of this clustering metric M for k clus- SSO algorithm has equally competitive results as ters G1 , G2 , . . . , Gk , is given by the following ex- the GA. Additionally, a statistical analysis was pression: performed, since we are working with stochastic algorithms, using the Kolmogorov-Smirnov test and the Wilcoxon rank sum test as discussed in k X X (Demšar, 2006), (Durillo et al., 2009), (Muñoz et M(G1 , G2 , . . . , Gk ) = kxj ci k al., 2011). The results of these statistical test con- i=1 xj 2Gi firmed the results of the experiments. 3 Algorithm Based on the Social This paper is organized as follow: in Section Behavior of Spiders II, are given some definitions related to the clus- tering problem; in Section III is given the original Cuevas et al. (2013) proposed a new optimiza- proposal of the SSO algorithm; in Section IV, de- tion algorithm, called Social Spider Optimiza- tails of our approach based on the SSO algorithm tion(SSO), the development of this new algorithm 115 was guided by the operational principle of the so- 3.2 Modeling of Vibrations Through the cial behavior of spiders. The SSO algorithm as- Community Network sumes that the solution space is a community net- The community network is used as a mechanism work (spider web), where spiders interact to each for transmitting information between the members others. The main features of this approach are: of the colony. This information is coded as small vibrations that are critical for collective coordina- • Each solution within a space of solutions rep- tion of all individuals. The vibrations are based on resents the position of a spider in the commu- the weight and the distance of the spider that gen- nity network. erated it. The vibrations that are perceived by an • Each spider receives a weight according to individual i as a result of information transmitted the value of fitness solution that represents. by an individual j are modeled by the following expression: • The algorithm modeled two types of search 2 agents (spiders): male and female. Depend- V ib i,j = wj ⇤ e di,j ing on the genre each individual performs dif- Where di,j is the euclidean distance between ferent types of operations that simulate their spiders i and j. There are three special types of vi- social behavior within the colony. brations that are considered in the SSO algorithm: An important feature of the colonies of social • V ib i,c vibrations, where c is the closest spiders is that they have a high number of female member to i that has a higher weight com- agents. This fact is simulated by defining the num- pared to i (wc > wi ). ber of females Nf randomly within the range of 65 to 90% of N , which is the number of elements of • V ib i,b vibrations, where b is the individual the total population. The number of males Nm is who has the best weight (best fitness value) calculated as the complement of Nf regarding N . of the whole population S. The total population S is divided into two sub- groups F and M . The group F is the set of female • V ib i,f vibrations, where f is the female in- spiders, and the group M is the set of male spiders. dividual closest to i. F = {f1 , f2 , . . . , fNf } 3.3 Initialization of Population The SSO algorithm starts by initializing the set S, M = {m1 , m2 , . . . , mNm } which contains N spiders positions. Each posi- where S = F [ M = {s1 , s2 , . . . , sN } tion fi or mi , is an n-dimensional vector contain- ing the values to be optimized. These values are 3.1 Calculation of Fitness distributed uniformly between the values, plow and Each individual (spider) i of the population S re- phigh , which are previously specified. ceives a wi weight, that represents the quality of its 3.4 Cooperative Operators solution. This weight can be calculated as follows: 3.4.1 Cooperative Operator for female J(si ) worsts spiders wi = bests worsts To emulate the cooperative behavior of the female spiders, a new operator is defined. The operator where J(si ) is the fitness value calculated by eval- considers the change in position of a female spider uating the position of a spider si regarding the i at each iteration, this change can be attractive function J. The values worsts and bests consid- or repulsive and is calculated by combining three ering a maximization problem, are defined as fol- elements: lows: • The first element considers the change re- bests = max(J(sk )), k 2 {1, 2, . . . , N } garding the nearest member to i that has the highest weight and produces vibration worsts = min(J(sk )), k 2 {1, 2, . . . , N } V ib i,c ; 116 • The second element considers the change re- defines a probability of influence on the new off- garding the best individual of the population spring. The probability of influence Psi is as- S that produces vibration V ib i,b ; signed using the roulette-wheel selection, which is defined as follows: • The third incorporates a random movement. wi Ps i = P The last three elements can be considered as one j2T g wj movement, we use the ”+” symbol for attraction and the ”-” symbol for repulsion. The change in where si 2 T g . position can be calculated as follows: A spider is a solution within the solution space, so a new spider is formed by choosing values for fik+1 = fik ± movement each variable, this variable is chosen within the values defined by the method of roulette. For ex- where k represents the iteration number. ample let snew = {v1 , v2 , . . . , vn } be the new 3.4.2 Cooperative Operator for male spiders spider, each variable vi is determined using the To emulate the cooperative behavior of the male method of roulette-wheel selection. spiders, these are divided into two groups: dom- Once a new spider snew was formed is com- inant D and non-dominant N D. This division pared with the worst spider sworst from the colony is made according to its position respect to the according to their weights, where wworst = median of all male individuals. Individuals who minl2{1,2,...,N } (wl ). If the new spider snew is bet- have a weight that is above the median are con- ter than the worst spider sworst , then sworst is re- sidered dominants, otherwise they are considered placed by snew . Otherwise, the new spider is dis- non-dominant. carded and the colony does not suffer alterations. For dominant males are defined two move- If a replacement occur, the new spider takes the ments: (a) a movement of attraction to the near- genre and index from the spider replaced. est female f that produces a vibration V ib i,f , and 4 Optimization Algorithm Based on (b) a random movement. The last two movements Social Behaviour Spiders for can be considered as one, and then the change in a Clustering Problems dominant male can be calculated as follows: As proposal we present an SSO (Cuevas et al., mk+1 i = mki + D movement 2013) approach to solve the clustering problem. This optimization algorithm based on the social where k represents the iteration number. behaviour of spiders is a meta-heuristic algorithm For non-dominant males is defined just one of general purpose, so it is necessary to modify movement of attraction to the weighted average of many elements of the algorithms such as the rep- male spiders. Then the change in a non-dominant resentation of the individuals, calculation of the can be calculated as follows: fitness function, etc. Below are presented the el- mk+1 = mki + ND movement ements on which it was necessary to make mod- i ifications to the original algorithm proposed in where k represents the iteration number. (Cuevas et al., 2013). 3.5 Mating operator 4.1 Representations of Spiders (Individuals) Mating a colony of spiders is made between fe- The first consideration to take into account is the males and dominant males. So when a dominant representation of each spider. Each spider (male male mg finds a set of female spiders E g within or female) represents a set of k clusters centers, a range of mating r, it mates, forming a new off- which is a feasible solution to the problem of clus- spring Snew . This new offspring is generated from tering. the set T g , which is formed by the union of E g and For instance, let x = mg . When the set T g is empty, mating operation {(10.5; 20.4), (15.2; 25.0)} be a spider is canceled. that contains k = 2 cluster centers that are The weight of each spider that is involved in {(10.5; 20.4) and (15.2; 25.0)}, in this particular the mating process, i.e. spiders from the set T g , case each center has dimension n = 2. 117 Each spider of the initial population was gen- Algorithm 1: Algorithm for calculating the erated taking k random points of a given dataset, fitness of a spider where k is the number of cluster to be found. Input: An array of cluster centers C (spider 4.2 Distance between Two Spiders C); a set D of n-dimensional m points; an integer k > 0 that It is necessary to define the distance between two represents the number of clusters spiders, since a spider is formed by a set of cluster Output: Metric M of spider C centers (each center formed by several points) and 1 Create the set of empty clusters not by a set of points. So we define the distance G = {G1 , G2 , . . . , Gk } between two spiders as the sum of the euclidean 2 foreach point x of the set D do distances between their centers of clusters. 3 Assign the point x to the cluster Gi whose For instance, let a = {(ax1 ; ay1 ), (ax2 ; ay2 )} center Ci is the nearest to x; and b = {(bx1 ; by1 ), (bx2 ; by2 )} be two spiders that have k = 2 clusters centers, with each center 4 foreach cluster Gi do having dimension 2. Then the distance between 5 calculate a new center Ci⇤ ; these two spiders will be: 6 Calculate the metric M for the set of clusters G as defined in Section 2; da,b = d((ax1 ; ay1 ), (bx1 ; by1 )) + d((ax2 ; ay2 ), (bx2 ; by2 )) set T . In order to define the spider from which the where d((ax1 ; ay1 ), (bx1 ; by1 )) is the Euclidean new spider will inherit a cluster center, it is used distance between the centers (ax1 ; ay1 ) and the roulette-wheel selection. (bx1 ; by1 ). 4.5 Substitution of Spiders 4.3 Fitness and Weight of a Spider In order to decide which spiders will be replaced The fitness of each spider, which is an indicator by the new spiders produced in the mating stage, of how good is the solution that this spider repre- also is used the roulette wheel selection method, sents, is calculated using the metric M. The aim where spiders of the population with less weight of the SSO algorithm is to minimize the fitness of (greater fitness) have more probability to be re- the population. Thus, a spider that has the mini- placed. It is important to note that the weight of mum fitness is the best within the population. a spider have a negative correlation with respect The pseudocode for calculating the fitness of a to its fitness value, since we are working with a spider is presented in Algorithm 1. The weight of a minimization problem and not with a maximiza- spider i was re-defined, because fitness and weight tion problem as originally proposed by (Cuevas et have negative correlation, and it is calculated in al., 2013). the following way: The pseudocode of our proposal is showed in worsts J(si ) Algorithm 2. wi = worsts bests 5 Experiments and Results bests = min( J(sk ) ), k 2 {1, 2, . . . , N } To compare the algorithms were taken five dataset worsts = max( J(sk ) ), k 2 {1, 2, . . . , N } from UCI (UCI Machine Learning Repository) repository: Balance, Cancer-Int, Dermatology, Di- where J(si ) is the fitness value of the spider i abetes, Iris. that was calculated using Algorithm 1. The Balance dataset was generated to model psychological experiments, each example is clas- 4.4 Mating of Spiders sified as having the balance scale tip to the right, In the mating stage was defined a mating set T tip to the left, or be balanced. The attributes are the which is formed by a dominant male spider and the left weight, the left distance, the right weight, and female spiders that are within its range of mating. the right distance. The correct way to find the class From this set T are created new spiders, a new spi- is the greater of (left-distance * left-weight) and der represents a set of cluster centers, where each (right-distance * right-weight). If they are equal, cluster center is inherited from a spider within the it is balanced. 118 Algorithm 2: Social Spider Optimization al- Table 1: Properties of datasets gorithm for the clustering problem Dataset Size Attributes Classes Input: A dataset D of n-dimensional m Balance 625 4 3 points; an integer k > 0 that Cancer-Int 699 9 2 represents the number of clusters Dermatology 366 34 6 Output: Metric M of the clusters found Diabetes 768 8 2 1 foreach spider C of population P do Iris 150 4 3 2 Choose randomly k points from dataset D and create the array C (spider C) of cluster centers; GA, SSO) were executed 50 times . Each execu- 3 Calculate fitness of population P ; tion of an algorithm returns the metric M of the 4 Calculate weight of population P ; best solution found. Then, the following measures 5 for i = 2 to numberGenerations do were calculated: average, median, minimum and 6 Cooperative operator for female spiders; maximum value of the results. 7 Cooperative operator for male spiders; The results of the experiments for each dataset 8 Mating operator; are shown in tables 2, 3, 4, 5, and 6 where the best 9 Replacement of spiders in P ; results are highlighted in bold. 10 Calculate fitness of population P ; 11 Calculate weight of population P ; Table 2: Results of the experiments for the dataset 12 Return fitness (metric M) of the best solution Balance K-means Genetic. Alg. SSO Alg. found; Mean 1426,544 1423,860 1423,851 Median 1425,804 1423,851 1423,851 Minimum 1423,851 1423,851 1423,851 Maximum 1442,669 1424,071 1423,851 The Cancer-int dataset is one of three domains provided by the Oncology Institute that has repeat- edly appeared in the machine learning literature. Table 3: Results of the experiments for the dataset This data set includes 201 instances of one class Cancer-Int and 85 instances of another class. K-means Genetic Alg. SSO Alg. Mean 2824,135 2820,319 2820,302 In the Dermatology dataset is shown diagnoses Median 2824,136 2820,302 2820,302 of erythemato-squamos diseases. Minimum 2824,136 2820,302 2820,302 Maximum 2824,136 2821,138 2820,302 Diabetes patient records were obtained from two sources: an automatic electronic recording de- vice and paper records. The automatic device had 5.1 Discussion an internal clock to timestamp events, whereas the In the experiments for the dataset Balance, see Ta- paper records only provided ”logical time” slots ble 2, we can see that SSO algorithm has the best (breakfast, lunch, dinner, bedtime). results respect to all measures. Furthermore, re- Finally Iris contains 3 classes of 50 instances spect to the median and minimum values the SSO each, where each class refers to a type of iris plant. algorithm has the same values as the GA. One class is linearly separable from the other 2; From the results for the dataset Cancer-int, the latter are NOT linearly separable from each shown in Table 3, we can see that SSO algorithm other. has the best results respect to all measures. Fur- More features about the datasets are shown in thermore, respect to the median and minimum val- the Table 1. ues the SSO algorithm has the same results as the For the experiments the number of generations GA. was fixed at 100 for the three algorithms (K- In the case of Dermatology dataset, shown in means, GA, SSO). For the case of GA and SSO Table 4, GA algorithm has the best results respect algorithms the number of elements of their respec- to all measures. Furthermore, respect to the me- tive population was fixed at 100. dian and minimum values the SSO algorithm has The experiments were performed as follows: the same results as the GA this result is similar as for each dataset, the three algorithms (K-means, the two previous experiments. 119 Table 4: Results of the experiments for the dataset Table 6: Results of the experiments for the dataset Dermatology Iris K-means Genetic Alg. SSO Alg. K-means Genetic Alg. SSO Alg. Mean 1127,390 1092,353 1092,355 Mean 102.495 97.223 97.222 Median 1121,087 1092,341 1092,356 Median 97.326 97.222 97.222 Minimum 1092,644 1092,341 1092,341 Minimum 97.326 97.222 97.222 Maximum 1415,274 1092,373 1092,373 Maximum 124.182 97.232 97.222 Table 5: Results of the experiments for the dataset and the null hypothesis is assumed, this results is Diabetes shown using the symbol ’s-’. K-means Genetic Alg. SSO Alg. In the table 7 are shown the results of the sta- Mean 52072,244 49160,016 49159,956 Median 52072,243 49160,214 49159,939 tistical test of Wilcoxon between the SSO and K- Minimum 52072,244 49157,441 49157,441 Maximum 52072,244 49161,999 49165,111 means clustering algorithms. In this table we can see that there is statistical difference between SSO and K-means algorithms for all cases. So we can In the Table 5, we can see results of Diabetes conclude that the SSO algorithm presents results dataset, the results shown that SSO algorithm has significantly better than K-means algorithm. the best results respect to all measures. And, re- spect to the minimum values the SSO algorithm Table 7: Results of the Wilcoxon rank sum test has the same results as the GA. between the SSO algorithm and the K-means al- Finally in the results for the Iris dataset, shown gorithm in Table 6, SSO algorithm has the best results re- Dataset SSO Alg. K-means spect to all measures too. Furthermore, respect (median) (median) to the median and minimum values the SSO al- Balance 1423,851 1425,804 s+ gorithm has the same results as the GA. Cancer-Int 2820,302 2824,136 s+ Dermatology 1092,356 1121,086 s+ Diabetes 49159,939 52072,244 s+ 5.2 Statistical Analysis Iris 97,222 97,326 s+ An additional statistical analysis was performed for comparing the algorithms, since we are work- In the table 8 is shown the result of the statis- ing with stochastic algorithms. tical test of Wilcoxon between the SSO and the The following methodology was used: first the GA. In this table we can see that for most cases Kolmogorov-Smirnov test was applied to deter- the SSO and GA algorithms have similar results. mine whether results (of 50 executions) of each Only in the case of Iris dataset exists statistical algorithm have a normal distribution. After deter- significance, but we can not conclude that an al- mining that the algorithms do not have normal dis- gorithm is better than another since they have the tribution the non parametric Wilcoxon rank sum same median, we can only say that the algorithms test was applied to compare the medians of two have different behavior. algorithms. This methodology was discussed and applied in others works (Demšar, 2006), (Durillo Table 8: Results of the Wilcoxon rank sum test et al., 2009), (Muñoz et al., 2011). between the SSO algorithm and the genetic algo- The Wilcoxon rank sum test is used to test the rithm null hypothesis (H0 ) that the samples (of 50 ex- Dataset SSO Alg. GA (median) (median) ecutions) of two algorithms come from distribu- Balance 1423,851 1423,851 s- tions with same medians. If the null hypothesis Cancer-Int 2820,302 2820,302 s- is rejected the alternative hypothesis is assumed Dermatology 1092,356 1092,341 s- (HA ) that the samples come from distributions Diabetes 49159,939 49160,214 s- Iris 97,222 97,222 s+ with different medians. A significance level of 5% (p value less or equal than 0.05) was used for the Wilcoxon rank 6 Conclusions and Future Work sum test. If the test is successful then the null hypothesis is rejected and the alternative hypothe- In this work, a SSO approach for the cluster- sis is assumed, this result is shown using the ’s+’ ing problem has been proposed. For evaluat- symbol. Otherwise p value is greater than 0.05 ing its performance experiments were performed 120 over 5 datasets from the UCI repository (Balance, S. Lloyd. 1982. Least squares quantization in Cancer-Int, Dermatology, Diabetes, and Iris), also pcm. Information Theory, IEEE Transactions on, 28(2):129–137, Mar. comparisons were performed with two classic ap- proaches for the clustering problem: the k-means Ujjwal Maulik and Sanghamitra Bandyopadhyay. algorithm and a genetic algorithm for clustering. 2000. Genetic algorithm-based clustering tech- The experiments showed that the SSO algo- nique. Pattern Recognition, 33(9):1455 – 1465. rithm has better results regarding the algorithm k- Boris Mirkin. 2012. Clustering: a data recovery ap- means, and regarding the genetic algorithm, the proach. CRC Press. SSO algorithm has equally competitive results. Daniel M Muñoz, Carlos H Llanos, LDS Coelho, and All these results were validated statistically using Mauricio Ayala-Rincón. 2011. Opposition-based the non-parametric Wilcoxon rank sum test. Thus, shuffled pso with passive congregation applied to fm the main contribution of this work was to show matching synthesis. In Evolutionary Computation that the SSO algorithm can produce competitive (CEC), 2011 IEEE Congress on, pages 2775–2781. IEEE. results when compared with classic algorithms. As future works, we will include comparisons Satyasai Jagannath Nanda and Ganapati Panda. 2014. with newer algorithms of the literature. Also, it A survey on nature inspired metaheuristic algo- rithms for partitional clustering. Swarm and Evo- is interesting to include in the experiments bigger lutionary Computation, 16:1–18. datasets. Finally, additional experiments will be performed using other metrics such as the Clas- Dan Pelleg and Andrew Moore. 1999. Accelerating sification Error Percentage used in others works: exact k-means algorithms with geometric reasoning. In Proceedings of the fifth ACM SIGKDD interna- (De Falco et al., 2007), (Karaboga and Ozturk, tional conference on Knowledge discovery and data 2011) and (Senthilnath et al., 2011). mining, pages 277–281. ACM. Steven J Phillips. 2002. Acceleration of k-means and References related clustering algorithms. In Algorithm Engi- neering and Experiments, pages 166–177. Springer. Erik Cuevas, Miguel Cienfuegos, Daniel Zaldvar, and Marco Prez-Cisneros. 2013. A swarm opti- J Senthilnath, SN Omkar, and V Mani. 2011. Clus- mization algorithm inspired in the behavior of the tering using firefly algorithm: Performance study. social-spider. Expert Systems with Applications, Swarm and Evolutionary Computation, 1(3):164– 40(16):6374 – 6384. 171. Ivanoe De Falco, Antonio Della Cioppa, and Ernesto Michael Steinbach, George Karypis, Vipin Kumar, Tarantino. 2007. Facing classification problems et al. 2000. A comparison of document cluster- with particle swarm optimization. Applied Soft ing techniques. In KDD workshop on text mining, Computing, 7(3):652–658. volume 400, pages 525–526. Boston. Janez Demšar. 2006. Statistical comparisons of classi- Harley Vera-Olivera and José Luis Soncco-Álvarez. fiers over multiple data sets. The Journal of Machine 2016. Algoritmo de optimización basado en el com- Learning Research, 7:1–30. portamiento social de arañas para clustering. Uni- versidad Nacional de San Antonio Abad del Cusco. Juan J Durillo, José Garcı́a-Nieto, Antonio J Nebro, Undergraduate Thesis for Obtaining the Engineer Carlos A Coello Coello, Francisco Luna, and En- Degree in Informatics and Systems. rique Alba. 2009. Multi-objective particle swarm optimizers: An experimental comparison. In Evo- M vijayalakshmi and M Renuka Devi. 2012. A survey lutionary Multi-Criterion Optimization, pages 495– of different issue of different clustering algorithms 509. Springer. used in large data sets. In International Journal of Advanced Research in Computer Science and Soft- John H Holland. 1975. Adaptation in natural and ar- ware Engineering, pages 305 – 307. tificial systems: An introductory analysis with ap- plications to biology, control, and artificial intelli- R. Xu and D.C Wunsch. 2009. Clustering. Oxford gence. U Michigan Press. Wiley. Anil K Jain, M Narasimha Murty, and Patrick J Flynn. 1999. Data clustering: a review. ACM computing surveys (CSUR), 31(3):264–323. Dervis Karaboga and Celal Ozturk. 2011. A novel clustering approach: Artificial bee colony (abc) al- gorithm. Applied Soft Computing, 11(1):652–657. 121