Hybrid genetic algorithm for the maximum clique problem combining sharing and migration

Roger Ouch, Kristopher W. Reese, Roman V. Yampolskiy
Computer Engineering and Computer Science, University of Louisville
Louisville, Kentucky 40292

Copyright retained by the authors.

Abstract

The Maximum Clique Problem (MCP) has been studied for decades and is well known in graph theory as a difficult problem, since it is NP-complete. The MCP has a vast domain of applications, such as finance, biochemistry, bioinformatics, and many more. Many niching methods have been successfully applied in Genetic Algorithms (GA) to diversify the population and avoid getting trapped in local optima. In this paper, we propose an approach using the sharing method and a Hybrid Genetic Algorithm (HGA) for the maximum clique problem. We also propose a non-evolutionary approach using a migration mechanism to boost the current HGA.

I. Introduction

A. The Maximum Clique Problem

The maximum clique problem is a classically studied combinatorial optimization problem which comes from graph theory and has many domains of application. It can be briefly summarized using the following definition.

Let G = (V, E) be an undirected graph on N vertices, where V is the vertex set and E is the edge set. A clique in G is a subgraph of G in which there is an edge between any two vertices. The size of a clique is the number of vertices in the clique, and the maximum clique problem is to find the largest clique in a given graph G.
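To make this definition concrete, the following minimal sketch (not part of the original paper) checks whether a vertex subset forms a clique in a graph stored as an adjacency-set dictionary; the graph representation and function name are illustrative assumptions.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """Return True if every pair of vertices in `vertices` is connected in `adj`.

    `adj` is an assumed representation: a dict mapping each vertex to the set
    of its neighbours in the undirected graph G = (V, E).
    """
    return all(v in adj[u] for u, v in combinations(vertices, 2))

# Small usage example: a triangle plus a pendant vertex.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(is_clique(adj, {1, 2, 3}))  # True  -> {1, 2, 3} is a clique of size 3
print(is_clique(adj, {1, 3, 4}))  # False -> vertices 1 and 4 are not adjacent
```

Under this definition, the maximum clique problem asks for the largest vertex set for which such a check succeeds.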
B. Need for Improved Techniques for Huge Networks

Over the last few years, the size of networks has increased rapidly, and the amount of information in massive data sets has become a highly researched computational challenge. Some examples of fast-growing networks exist in social networks and bioinformatics. The use of traditional methods for analyzing and treating such large amounts of information has become problematic. The nature of the MCP (NP-complete) explains why it is more complicated to find the maximum clique when the number of vertices N increases dramatically. Traditional heuristic approaches for finding the maximum clique have become insufficient for this new challenge.

Cheng et al. [1] investigated finding the maximal clique within massive networks. They introduced the concept of an H* graph, where the maximum clique is computed on a small part of a large graph recursively, one piece at a time, using external memory. Their experiments were done on large graphs with 10 million vertices and 80 million edges.

C. Description of Genetic Algorithms

A well-known metaheuristic in evolutionary algorithms is the genetic algorithm. The earliest work related to GAs dates back to 1954, when Nils Aall Barricelli published a paper on computer simulation of evolution [2]. Genetic algorithms were formalized in later years, dating back to 1975, when J. H. Holland published his papers on the subject [3].

A GA is a combinatorial optimization technique that mimics the process of natural evolution. The algorithm does not rely on building the final solution through local search, which distinguishes it from traditional approaches. A subset of feasible solutions of a problem is encoded in chromosomes that represent individuals in a population. The evolutionary process is then applied through population selection, crossover, and mutation operations. Each individual is evaluated using a fitness function, and the best individuals are transmitted into the next generation. The following procedure shows a basic genetic algorithm:

    GA_procedure() {
        Initialize_population();
        Evaluate_population();
        While not (End_Condition) {
            Select_parents();
            Crossover();
            Mutation();
            Evaluate_population();
        }
    }

Figure 1. Basic genetic algorithm.

Since the 1990s, significant research has been conducted on genetic algorithms, especially for finding the maximum clique.

II. Related Work

Early works on the maximum clique problem focused on greedy approaches. One of the first significant improvements on the maximum clique was the Bron-Kerbosch algorithm [4], which uses a recursive backtracking procedure that augments the clique one vertex at a time. Tomita, Tanaka, and Takahashi [5] proved a worst-case time complexity of $O(3^{n/3})$, and the Bron-Kerbosch algorithm was reported as one of the fastest algorithms for listing all maximal cliques. In 1986, Robson improved this algorithm by adding backtracking techniques in combination with more complicated case analyses and dynamic programming; this, however, increased the space complexity of the MCP [6].

A. Genetic Algorithm Methods for Max Clique

Genetic algorithms have been successfully applied to many NP-hard problems in various domains [7, 8]. GAs have also been successfully used on graph problems, particularly on the graph-coloring problem [9]. On the MCP, the first approaches using GAs had poor performance compared to other local search techniques [10, 11].

The idea of combining a genetic algorithm and a heuristic local search has been used in earlier applications. Marchiori [12] proposed a simple genetic algorithm based on a combination of heuristic algorithms for the MCP. In this HGA, several important genetic mechanisms were implemented, such as keeping the two best parents, elitism, roulette-wheel population selection, uniform crossover, and a fitness function based on the size of the maximal clique. Moreover, a heuristic method is used to perform local transformations on a sub-graph, transforming all chromosomes after crossover and mutation into a maximal clique. These processes have become known as the "relax", "repair", and "extend" phases. Her experiments on the DIMACS data sets showed that result quality and computational time improved dramatically compared to other heuristic approaches. Marchiori [13] continued developing the hybrid genetic algorithm and compared it with two other competitive variations of genetic algorithms: the iterated local search algorithm and the multistart local search algorithm.

B. Fitness Sharing Method

The fitness sharing method has been used to avoid GA convergence to local maxima. This technique was proposed by Holland [3] and expanded by Goldberg and Richardson [22]. In a GA, the formation of the population around a local optimum (also called a niche) makes convergence toward the global optimal solution difficult. The fitness sharing method helps to diversify the population by reducing the fitness score of individuals in populated landscapes. This technique is known as one of the best niching techniques for escaping local optima.

In this paper, we implemented the sharing method, as we are attempting to diversify and explore the elements of the graph more thoroughly. The sharing function recalculates the fitness function based on how densely an area of the graph is used. Therefore, as with any fitness function, our shared fitness becomes that shown in equation (1):

$f'(i) = \dfrac{f(i)}{\sum_{j=1}^{n} sh(d(i,j))}$    (1)

where sh is the sharing function based on the distance d between i and j. The sharing function is given in equation (2):

$sh(d) = \begin{cases} 1 - \left( \dfrac{d}{\sigma_{share}} \right)^{\alpha} & \text{if } d < \sigma_{share} \\ 0 & \text{otherwise} \end{cases}$    (2)
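As an illustration, the following sketch (an assumption-laden reimplementation, not the authors' code) computes the shared fitness of equations (1) and (2); the Hamming distance between bit-vector chromosomes is used as the distance d, and the parameter values are illustrative only.

```python
def sharing(d, sigma_share, alpha=1.0):
    """Sharing function of equation (2): contributes only within a niche radius."""
    return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

def hamming(a, b):
    """Illustrative distance d(i, j): number of differing bits between two chromosomes."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, raw_fitness, sigma_share, alpha=1.0):
    """Shared fitness of equation (1): raw fitness divided by the niche count."""
    shared = []
    for i, chrom_i in enumerate(population):
        niche_count = sum(
            sharing(hamming(chrom_i, chrom_j), sigma_share, alpha)
            for chrom_j in population
        )
        shared.append(raw_fitness[i] / niche_count)  # niche_count >= 1 because d(i, i) = 0
    return shared

# Usage example with three 6-bit chromosomes and clique-size fitness values.
pop = [[1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 1]]
print(shared_fitness(pop, raw_fitness=[3, 2, 3], sigma_share=3))
```

Dividing by the niche count lowers the fitness of chromosomes in crowded regions of the search space, so less-explored regions get a better chance of being selected.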
C. Migration in GA

Yet another technique, inspired by sociological and ecological movements between sub-populations, is the concept of migration. Migration in genetic algorithms has been well studied and applied in many parallel genetic algorithms.

There are important considerations before implementing migration in genetic algorithms. Migration policies define the replacement rule to apply between two sub-populations; choosing one is equivalent to answering the question: "What kind of individual in sub-population A replaces what kind of individual in sub-population B?" Topology is another important aspect of multi-population genetic algorithms: the topology defines the structure of communication between the sub-populations [14]. Finally, the migration rate defines the proportion of a sub-population that is replaced with immigrant individuals.

Migration rates and migration policies have been shown to improve the convergence of genetic algorithms considerably. Cantu-Paz has conducted experiments in which the best-replacing-worst population policy demonstrated the best results among migration policies [15].

III. Proposed Solutions

A. Hybrid Genetic Algorithm

The heuristic local search algorithm is a local search based on a greedy approach. Given a set of vertices, this algorithm finds a maximal clique using a "Relax", a "Repair", and an "Extend" step. Marchiori described this algorithm in detail [12]. The relax step adds random vertices to a subgraph. The repair step then extracts a small maximal clique from the current sub-graph. Finally, the extend step generates a bigger maximal clique by adding random nodes to the clique obtained in the repair step.

The HGA is then obtained by combining the heuristic local search with the genetic algorithm given in Figure 1. The heuristic function is added after the crossover operation to find a maximal clique.
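The three phases can be sketched as follows. This is a loose reconstruction for illustration only, since the paper does not specify how candidate vertices are sampled; it reuses the adjacency-dictionary representation from the earlier sketch.

```python
import random

def repair(adj, vertices):
    """Repair phase: greedily keep a subset of `vertices` that forms a clique."""
    clique = set()
    for v in vertices:
        if all(u in adj[v] for u in clique):
            clique.add(v)
    return clique

def extend(adj, clique):
    """Extend phase: add random eligible vertices until the clique is maximal."""
    candidates = [v for v in adj if v not in clique]
    random.shuffle(candidates)
    for v in candidates:
        if all(u in adj[v] for u in clique):
            clique.add(v)
    return clique

def relax_repair_extend(adj, vertices, n_relax=2):
    """One pass of the relax/repair/extend local search on a chromosome's vertex set."""
    relaxed = set(vertices) | set(random.sample(sorted(adj), n_relax))  # relax: add random vertices
    return extend(adj, repair(adj, relaxed))

# Usage (reusing the `adj` dictionary from the earlier sketch):
# print(relax_repair_extend(adj, {1, 4}))
```

In the HGA, this local search is applied to every chromosome after crossover and mutation, so each individual always encodes a maximal clique.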
B. Introduction of a Migration Mechanism in the Genetic Algorithm

In our solution we propose a migration mechanism based on two sub-populations: a main sub-population and a secondary sub-population. The main sub-population plays the role of a standard HGA. The secondary sub-population serves as a pool that generates new chromosomes for migration into the main sub-population. This mechanism should help to accelerate the convergence of the current hybrid genetic algorithm.

1) Role of the Two Sub-Populations

The idea of combining a heuristic local search and a genetic algorithm is not new, and it has proven to be competitive in solving the maximum clique problem. One issue with this approach is convergence into local optima. The diversification strategy relies on the genetic operators, crossover and mutation, and the exploration of the set of feasible solutions (or neighborhood) in a hybrid genetic algorithm is an area that can be improved.

In our approach we define two sub-populations. The main sub-population uses the generic HGA defined previously; this sub-population specializes in the exploitation of the data. The secondary sub-population is dedicated to exploration of the data: in the secondary population, an effort is made to diversify the population. The secondary sub-population then transfers the best of its individuals to the main sub-population through a migration mechanism. In this approach, we want to maximize both the exploration and the exploitation functions of the genetic algorithm. Figure 2 shows the role of each sub-population.

Figure 2. Role of the sub-populations.

2) Migration Mechanism

In our algorithm, we decided to use a best-replace-worst migration policy at each generation, where the best individuals of the secondary sub-population replace a percentage of the worst individuals in the main sub-population. We also used the ladder migration topology defined in [15], which allows migration between the two sub-populations to occur at each generation of the system. Figure 3 shows the implementation of our hybrid genetic algorithm with the migration mechanism.

Figure 3. Hybrid GA with migration mechanism.

The main sub-population is the standard genetic heuristic algorithm discussed by Marchiori [12]. The secondary sub-population, however, is of more interest. This sub-population uses a non-evolutionary structure: it has no genetic operators, and its role is to give random "seeds" to the main population at every generation.

To initialize the secondary population, we randomly choose vertices for each chromosome in the new population. We then apply the heuristic function, which extracts a maximal clique from each chromosome. We then sort the population by fitness to determine which of its elements should migrate to the main sub-population.
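A minimal sketch of this best-replace-worst migration step is shown below; it assumes chromosomes are scored by a `fitness` function (for example, the size of the encoded clique) and is not the authors' implementation.

```python
def migrate(main_pop, secondary_pop, fitness, migration_rate=0.10):
    """Best-replace-worst migration: the best secondary individuals replace
    the worst fraction (`migration_rate`) of the main sub-population."""
    n_migrants = max(1, int(migration_rate * len(main_pop)))
    migrants = sorted(secondary_pop, key=fitness, reverse=True)[:n_migrants]
    survivors = sorted(main_pop, key=fitness, reverse=True)[:len(main_pop) - n_migrants]
    return survivors + migrants

# Usage example: chromosomes as vertex sets scored by clique size.
main_pop = [{1, 2}, {3}, {4, 5, 6}]
secondary_pop = [{7, 8, 9, 10}, {11}]
print(migrate(main_pop, secondary_pop, fitness=len))
# -> the worst main individual ({3}) is replaced by the best seed ({7, 8, 9, 10})
```

Because the secondary pool is regenerated from random seeds at every generation, the migrants keep injecting unexplored regions of the graph into the evolving main sub-population.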
IV. Experimental Results

A. Benchmark Datasets Used

The data used in our experiments are from the DIMACS (Center for Discrete Mathematics and Theoretical Computer Science) benchmark graphs. The benchmark is composed of a collection of nine different classes of graphs for evaluating and comparing algorithms for solving the maximum clique problem. This collection consists of random graphs with known maximum clique size as well as graphs obtained from various areas of application. The C-FAT graphs are based on the fault diagnosis problem [16]. The Hamming and Johnson graphs come from coding theory problems [17, 18]. The Keller graphs arise from the Keller conjecture on tiling using hypercubes [19]. The SAN, SANR, BROCK, and P-HAT families are composed of various types of random graphs with known maximum clique sizes. The BROCK graphs are random graphs constructed so that they have hidden cliques that are larger than what might be expected in a random graph. The P-HAT graphs are random graphs with large variance in the vertex degree distribution and a larger clique than usual random graphs [20]. Finally, the MANN graphs come from the vertex-covering problem, which is closely related to the maximum clique problem [21].
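These instances are commonly distributed in the ASCII DIMACS clique format ('c' comment lines, a 'p edge <vertices> <edges>' header, and one 'e <u> <v>' line per edge). A minimal loader into the adjacency-dictionary form used in the earlier sketches might look like the following; the file handling is an illustrative assumption, not part of the paper.

```python
def load_dimacs(path):
    """Parse a DIMACS clique-format file into an adjacency-set dictionary."""
    adj = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0] == "c":        # skip blanks and comments
                continue
            if parts[0] == "p":                      # "p edge <n_vertices> <n_edges>"
                n_vertices = int(parts[2])
                adj = {v: set() for v in range(1, n_vertices + 1)}
            elif parts[0] == "e":                    # "e <u> <v>", vertices are 1-based
                u, v = int(parts[1]), int(parts[2])
                adj[u].add(v)
                adj[v].add(u)
    return adj

# Usage example (path is hypothetical):
# adj = load_dimacs("keller4.clq")
# print(len(adj))  # number of vertices in the instance
```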
B. Results Obtained

A series of experiments was conducted using a small subset of the DIMACS data sets. The results show that the performance of our algorithm is close to the best results obtained on the same data sets in the DIMACS challenge (Table 1).

Graph          Best obtained   Best DIMACS
keller4        11              11
keller5        27              27
keller6        49              59
hamming8-4     16              16
hamming10-4    38              40
MANN_a27       126             126
MANN_a81       1096            1098

Table 1. Comparison of our algorithm with the best DIMACS results.

C. Effect of Mutation Rate Variation on Results

In these experiments, we evaluate the effect of mutation while holding the other parameters static. We analyzed the effect of mutation on the keller5 graph with mutation rates of 0.5%, 1%, 5%, 10%, and 20%. Table 2 shows the parameters chosen for this set of experiments.

Parameters          Value
Population size     100
Maximum iteration   50
Mutation            0.5%, 1%, 5%, 10%, 20%
Crossover rate      100% (uniform crossover)

Table 2. Parameters for testing the mutation rate.

At each generation, mutation breaks the current maximal clique in every chromosome by replacing current nodes with other random nodes. On one hand, a low mutation rate keeps nodes that are good candidates for the maximum clique, and the heuristic algorithm finds good local solutions, but less graph exploration is possible. On the other hand, a high mutation rate gives a good chance of graph exploration, but the local search becomes less efficient, as too many nodes are exchanged in the chromosomes. Choosing a good mutation rate is therefore a trade-off between finding good local solutions and good graph exploration. Figure 4 shows the minimum, average, and maximum fitness scores of the chromosomes for a 1% mutation rate.

Figure 4. Mutation rate = 1%.

We can observe two phases when applying the GA to the maximum clique problem. In the first phase, a high variation of the fitness score can be observed from generation 0 to the 15th generation. During this phase, the HGA explores various combinations of nodes in the graph, so the fitness score changes rapidly. Then a stabilization phase is observed, in which the maximal cliques are kept in the population and improvement occurs progressively. During the stabilization phase, the chances of discovering bigger cliques are low because the HGA has reached a local maximum.

The results obtained in our experiments show that a 1% mutation rate achieves the best performance, with the fastest convergence, and diversifies the population the most. The HGA found a 24-clique after only eight generations.
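As described above, the mutation operator swaps some vertices of the current clique for random vertices of the graph. The sketch below is one way to realize this on a set-based chromosome; the sampling details are assumptions, not the paper's exact operator.

```python
import random

def mutate(adj, chromosome, mutation_rate=0.01):
    """Replace each clique vertex, with probability `mutation_rate`, by a random
    vertex of the graph; the relax/repair/extend heuristic later restores a
    maximal clique from the perturbed vertex set."""
    vertices = list(adj)
    mutated = set()
    for v in chromosome:
        if random.random() < mutation_rate:
            mutated.add(random.choice(vertices))  # break the clique: random replacement
        else:
            mutated.add(v)
    return mutated
```

At a 1% rate the perturbation is mild, so the repair and extend phases can usually rebuild a clique from the surviving vertices, which is consistent with the fast convergence the paper reports for this setting.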
D. Effect of Migration Rate Variation on Results

In this set of experiments, we fix the mutation rate at the value found greedily in the previous experiment. We then vary the migration rate over the values 1%, 5%, 10%, 20%, 30%, 50%, and 90%. Table 3 shows the parameters chosen, applied to the same graph (keller5).

Parameters          Value
Population size     100
Maximum iteration   50
Mutation            1%
Crossover rate      100% (uniform crossover)
Migration rates     1%, 5%, 10%, 20%, 30%, 50%, 90%

Table 3. Parameters for testing the migration rate.

The effect of a high migration rate is the replacement of good solutions obtained by the HGA, whereas a low migration rate does not inject enough new seeds into the population. The experiments show that the optimum migration rate is 10%, with the maximum clique found after 28 generations (ground-truth maximum clique = 27). Figure 5 shows the minimum, average, and maximum fitness scores of the chromosomes for a 10% migration rate.

Figure 5. Migration rate = 10%.

In this graph, we can observe that the maximum fitness increases in levels, which means that new maximal cliques are found. After one or two generations the stabilization phase is reached and the GA continues graph exploration. The results show that the migration process helped the genetic algorithm to escape from local maxima and find the maximum clique by adding more graph exploration capability.

E. Effect of Migration with Variation of Population Size and Number of Generations

In these experiments, we test the effect of migration while varying the population size and the maximum number of generations. We compare the effect of migration in different parameter setups. Table 4 shows the parameters for this set of experiments.

Parameters                                          Value
(Population size / Maximum iteration / migration)   (10/1000, y), (10/1000, n), (50/200, y), (50/200, n), (100/100, y), (100/100, n), (200/50, y), (200/50, n)
Mutation                                             1%
Crossover rate                                       100% (uniform crossover)
Migration rate                                       10%

Table 4. Parameters for testing the effects of migration.

The results indicate that, for each configuration, applying our migration mechanism increases the size of the maximum clique found and also accelerates the convergence toward the best solution.

F. Effect of Fitness Sharing on Results

In these experiments we test the effect of fitness sharing on the same graph. The parameter σshare from equation (2) represents the number of niches. We vary the number of niches over the values 3, 7, 10, and 20. Table 5 shows the parameters used for this set of experiments.

Parameters                  Value
Population size             10
Maximum iteration           1000
Mutation                    1%
Crossover rate              100% (uniform crossover)
Number of niches (σshare)   3, 7, 10, 20

Table 5. Parameters for testing fitness sharing.

A high number of niches (σshare) allows diversifying the population by reducing the fitness value of individuals from the same niche. This way, the chance of promoting individuals from other niches is higher. We obtained the best results in this series of experiments with σshare = 20; the maximum clique is found after 170 generations. Figure 6 shows the minimum, average, and maximum fitness scores of the chromosomes for σshare = 20.

Figure 6. σshare = 20.

The fitness sharing method also plays a role in graph exploration by reducing the fitness values of individuals in the same niche, so that individuals from other niches are promoted in the population. Finally, we have shown that the fitness sharing method is also well suited for graph exploration and helps to find the maximum clique.

V. Conclusion

For the maximum clique problem, we have proposed a solution that implements a migration mechanism composed of two sub-populations: the main sub-population is specialized in data exploitation and the secondary sub-population is specialized in data exploration. Our results have shown that both sharing and migration improve the convergence and the results of the genetic algorithm. We have shown that migration proves to be a valuable tool for converging on the maximum, or on high maxima, relatively quickly. We have also discussed the potential usefulness of using sharing to find multiple maximum solutions in the same problem. We also found that, in order to maximize diversity and prevent fast convergence of the minimum fitness value, a 1% mutation rate appears to give the best results. The migration rate was also empirically derived to be best at 10%.

These experiments lay the groundwork for potential future research. We used a non-evolutionary approach by adding a second sub-population, which serves the main population by generating random new seeds. Further improvements can be made by developing a more specific heuristic local search algorithm for the secondary sub-population dedicated to graph exploration. There are more improvement possibilities in developing a co-evolutionary method in which, instead of generating new seeds at each iteration, the second sub-population evolves at the same time as the main sub-population, with migration maintained at every generation for faster convergence. Finally, a cultural co-evolutionary approach would give feedback from the main sub-population to the secondary sub-population, to help build better candidates for exploration.

References

[1] Cheng, James, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. "Finding maximal cliques in massive networks by H*-graph." Proceedings of the 2010 International Conference on Management of Data, pp. 447-458. ACM, 2010.
[2] Barricelli, Nils Aall. "Esempi numerici di processi di evoluzione." Methodos (1954): 45-68.
[3] Holland, J. H. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, MI, 1975.
[4] Bron, Coen, and Joep Kerbosch. "Algorithm 457: finding all cliques of an undirected graph." Communications of the ACM 16.9 (1973): 575-577.
[5] Tomita, Etsuji, Akira Tanaka, and Haruhisa Takahashi. "The worst-case time complexity for generating all maximal cliques and computational experiments." Theoretical Computer Science 363, no. 1 (2006): 28-42.
[6] Robson, John Michael. "Algorithms for maximum independent sets." Journal of Algorithms 7.3 (1986): 425-440.
[7] Yampolskiy, R., P. Anderson, J. Arney, V. Misic, and T. Clarke. "Printer model integrating genetic algorithm for improvement of halftone patterns." Western New York Image Processing Workshop (WNYIPW), 2004.
[8] Ashby, Leif H., and Roman V. Yampolskiy. "Genetic algorithm and Wisdom of Artificial Crowds algorithm applied to Light up." 16th International Conference on Computer Games (CGAMES), pp. 27-32. IEEE, 2011.
[9] Hindi, Musa M., and Roman V. Yampolskiy. "Genetic Algorithm Applied to the Graph Coloring Problem." Midwest Artificial Intelligence and Cognitive Science Conference, p. 60. 2012.
[10] Carter, B., and K. Park. "How good are genetic algorithms at finding large cliques: an experimental study." Technical report, Boston University, Computer Science Department, MA, October 1993.
[11] Park, K., and B. Carter. "On the effectiveness of genetic search in combinatorial optimization." Proceedings of the 10th ACM Symposium on Applied Computing. ACM Press, 1995.
[12] Marchiori, Elena. "A simple heuristic based genetic algorithm for the maximum clique problem." Proceedings of the 1998 ACM Symposium on Applied Computing, pp. 366-373. 1998.
[13] Marchiori, E. "Genetic, iterated and multistart local search for the maximum clique problem." Applications of Evolutionary Computing, LNCS 2279, pp. 112-121. Springer-Verlag, Berlin, Germany, 2002.
[14] Cantu-Paz, E. "Migration policies and takeover times in parallel genetic algorithms." IlliGAL Technical Report No. 99008, University of Illinois at Urbana-Champaign, 1999.
[15] Cantu-Paz, E. "A survey of parallel genetic algorithms." IlliGAL Technical Report No. 97003, University of Illinois at Urbana-Champaign, May 1997.
[16] Berman, Piotr, and A. Pelc. "Distributed probabilistic fault diagnosis for multiprocessor systems." Fault-Tolerant Computing, FTCS-20, Digest of Papers, 20th International Symposium. IEEE, 1990.
[17] MacWilliams, F. J., and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland, Amsterdam, 1979.
[18] Sloane, N. J. A. "Unsolved problems in graph theory arising from the study of codes." Graph Theory Notes of New York 18 (1989): 11-20.
[19] Lagarias, J. C., and P. W. Shor. "Keller's cube-tiling conjecture is false in high dimensions." Bulletin of the AMS 27(2), pp. 279-283, 1992.
[20] Brockington, M., and J. Culberson. "Camouflaging independent sets in quasi-random graphs." Working paper, Second DIMACS Implementation Challenge, 1993.
[21] Bui, T. N., and P. H. Eppley. "A hybrid genetic algorithm for the maximum clique problem." Proceedings of the 6th International Conference on Genetic Algorithms (ICGA), Pittsburgh, PA, pp. 478-484. Morgan Kaufmann, 1995.
[22] Goldberg, David E., and Jon Richardson. "Genetic algorithms with sharing for multimodal function optimization." Proceedings of the Second International Conference on Genetic Algorithms and Their Application, pp. 41-49. 1987.