Hybrid genetic algorithm for the maximum clique problem combining sharing and migration

Roger Ouch, Kristopher W. Reese, Roman V. Yampolskiy
Computer Engineering and Computer Science, University of Louisville
Louisville, Kentucky 40292

Copyright retained by the authors.

Abstract

The Maximum Clique Problem (MCP) has been studied for decades and is well known in graph theory as a difficult problem, since it is NP-complete. The MCP has a vast domain of applications, such as finance, biochemistry, bioinformatics, and many more. Many niching methods have been successfully applied in Genetic Algorithms (GA) to diversify the population and avoid getting trapped in local optima. In this paper, we propose an approach using the sharing method and a Hybrid Genetic Algorithm (HGA) for the maximum clique problem. We also propose a non-evolutionary approach using a migration mechanism to boost the current HGA.

I. Introduction

A. The Maximum Clique Problem

The maximum clique problem is a classically studied combinatorial optimization problem which comes from graph theory and has many domains of application. It can be briefly summarized using the following definition.

Let G = (V, E) be an undirected graph on N vertices, where V is the vertex set and E is the edge set. A clique in G is a subgraph of G in which there is an edge between any two vertices. The size of a clique is the number of vertices in the clique, and the maximum clique problem is to find the largest clique in a given graph G.
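To make this definition concrete, the following minimal sketch (not part of the original paper) checks whether a vertex subset forms a clique in a graph stored as an adjacency-set dictionary; the graph representation and function name are illustrative assumptions.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """Return True if every pair of vertices in `vertices` is connected in `adj`.

    `adj` is an assumed representation: a dict mapping each vertex to the set
    of its neighbours in the undirected graph G = (V, E).
    """
    return all(v in adj[u] for u, v in combinations(vertices, 2))

# Small usage example: a triangle plus a pendant vertex.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(is_clique(adj, {1, 2, 3}))  # True  -> {1, 2, 3} is a clique of size 3
print(is_clique(adj, {1, 3, 4}))  # False -> vertices 1 and 4 are not adjacent
```

Under this definition, the maximum clique problem asks for the largest vertex set for which such a check succeeds.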
B. Need for Improved Techniques for Huge Networks

Over the last few years, the size of networks has increased rapidly, and the amount of information in massive data sets has become a highly researched computational challenge. Some examples of fast-growing networks exist in social networks and bioinformatics. The use of traditional methods for analyzing and treating such large amounts of information has become problematic. The nature of the MCP (NP-complete) explains why it is more complicated to find the maximum clique when the number of vertices N increases dramatically. Traditional heuristic approaches for finding the maximum clique have become insufficient for this new challenge.

Cheng et al. [1] investigated finding the maximal clique within massive networks. They introduced the concept of an H* graph, where the maximum clique is computed on a small part of a large graph recursively, one piece at a time, using external memory. Their experiments were done on large graphs with 10 million vertices and 80 million edges.

C. Description of Genetic Algorithms

A well-known metaheuristic in evolutionary algorithms is the genetic algorithm. The earliest work related to GAs dates back to 1954, when Nils Aall Barricelli published a paper on computer simulation of evolution [2]. Genetic algorithms were formalized in later years, dating back to 1975, when J. H. Holland published his papers on the subject [3].

A GA is a combinatorial optimization technique that mimics the process of natural evolution. The algorithm does not rely on building the final solution through local search, which distinguishes it from traditional approaches. A subset of feasible solutions of a problem is encoded in chromosomes that represent individuals in a population. The evolutionary process is then applied through population selection, crossover, and mutation operations. Each individual is evaluated using a fitness function, and the best individuals are transmitted into the next generation. The following procedure shows a basic genetic algorithm:

    GA_procedure() {
        Initialize_population();
        Evaluate_population();
        While not (End_Condition) {
            Select_parents();
            Crossover();
            Mutation();
            Evaluate_population();
        }
    }

Figure 1. Basic genetic algorithm.

Since the 1990s, significant research has been conducted on genetic algorithms, especially for finding the maximum clique.

II. Related Work

Early works on the maximum clique problem focused on greedy approaches. One of the first significant improvements on the maximum clique was the Bron-Kerbosch algorithm [4], which uses a recursive backtracking procedure that augments the clique one vertex at a time. Tomita, Tanaka, and Takahashi [5] proved a worst-case time complexity of $O(3^{n/3})$, and the Bron-Kerbosch algorithm was reported as one of the fastest algorithms for listing all maximal cliques. In 1986, Robson improved this algorithm by adding backtracking techniques in combination with more complicated case analyses and dynamic programming; this, however, increased the space complexity of the MCP [6].

A. Genetic Algorithm Methods for Max Clique

Genetic algorithms have been successfully applied to many NP-hard problems in various domains [7, 8]. GAs have also been successfully used on graph problems, particularly on the graph-coloring problem [9]. On the MCP, the first approaches using GAs had poor performance compared to other local search techniques [10, 11].

The idea of combining a genetic algorithm and a heuristic local search has been used in earlier applications. Marchiori [12] proposed a simple genetic algorithm based on a combination of heuristic algorithms for the MCP. In this HGA, several important genetic mechanisms were implemented, such as keeping the two best parents, elitism, roulette-wheel population selection, uniform crossover, and a fitness function based on the size of the maximal clique. Moreover, a heuristic method is used to perform local transformations on a sub-graph, transforming all chromosomes after crossover and mutation into a maximal clique. These processes have become known as the "relax", "repair", and "extend" phases. Her experiments on the DIMACS data sets showed that result quality and computational time improved dramatically compared to other heuristic approaches. Marchiori [13] continued developing the hybrid genetic algorithm and compared it with two other competitive variations of genetic algorithms: the iterated local search algorithm and the multistart local search algorithm.

B. Fitness Sharing Method

The fitness sharing method has been used to avoid GA convergence to local maxima. This technique was proposed by Holland [3] and expanded by Goldberg and Richardson [22]. In a GA, the formation of the population around a local optimum (also called a niche) makes convergence toward the global optimal solution difficult. The fitness sharing method helps to diversify the population by reducing the fitness score of individuals in populated landscapes. This technique is known as one of the best niching techniques for escaping local optima.

In this paper, we implemented the sharing method, as we are attempting to diversify and explore the elements of the graph more thoroughly. The sharing function recalculates the fitness function based on how densely an area of the graph is used. Therefore, as with any fitness function, our shared fitness becomes that shown in equation (1):

$f'(i) = \dfrac{f(i)}{\sum_{j=1}^{n} sh(d(i,j))}$    (1)

where sh is the sharing function based on the distance d between i and j. The sharing function is given in equation (2):

$sh(d) = \begin{cases} 1 - \left( \dfrac{d}{\sigma_{share}} \right)^{\alpha} & \text{if } d < \sigma_{share} \\ 0 & \text{otherwise} \end{cases}$    (2)
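As an illustration, the following sketch (an assumption-laden reimplementation, not the authors' code) computes the shared fitness of equations (1) and (2); the Hamming distance between bit-vector chromosomes is used as the distance d, and the parameter values are illustrative only.

```python
def sharing(d, sigma_share, alpha=1.0):
    """Sharing function of equation (2): contributes only within a niche radius."""
    return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

def hamming(a, b):
    """Illustrative distance d(i, j): number of differing bits between two chromosomes."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, raw_fitness, sigma_share, alpha=1.0):
    """Shared fitness of equation (1): raw fitness divided by the niche count."""
    shared = []
    for i, chrom_i in enumerate(population):
        niche_count = sum(
            sharing(hamming(chrom_i, chrom_j), sigma_share, alpha)
            for chrom_j in population
        )
        shared.append(raw_fitness[i] / niche_count)  # niche_count >= 1 because d(i, i) = 0
    return shared

# Usage example with three 6-bit chromosomes and clique-size fitness values.
pop = [[1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 1]]
print(shared_fitness(pop, raw_fitness=[3, 2, 3], sigma_share=3))
```

Dividing by the niche count lowers the fitness of chromosomes in crowded regions of the search space, so less-explored regions get a better chance of being selected.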
C. Migration in GA

Yet another technique, inspired by sociological and ecological movements between sub-populations, is the concept of migration. Migration in genetic algorithms has been well studied and applied in many parallel genetic algorithms.

There are important considerations before implementing migration in genetic algorithms. Migration policies define the replacement rule to apply between two sub-populations; choosing one is equivalent to answering the question: "What kind of individual in sub-population A replaces what kind of individual in sub-population B?" Topology is another important aspect of multi-population genetic algorithms: the topology defines the structure of communication between the sub-populations [14]. Finally, the migration rate defines the proportion of a sub-population that is replaced with immigrant individuals.

Migration rates and migration policies have been shown to improve the convergence of genetic algorithms considerably. Cantu-Paz has conducted experiments in which the best-replacing-worst population policy demonstrated the best results among migration policies [15].

III. Proposed Solutions

A. Hybrid Genetic Algorithm

The heuristic local search algorithm is a local search based on a greedy approach. Given a set of vertices, this algorithm finds a maximal clique using a "Relax", a "Repair", and an "Extend" step. Marchiori described this algorithm in detail [12]. The relax step adds random vertices to a subgraph. The repair step then extracts a small maximal clique from the current sub-graph. Finally, the extend step generates a bigger maximal clique by adding random nodes to the clique obtained in the repair step.

The HGA is then obtained by combining the heuristic local search with the genetic algorithm given in Figure 1. The heuristic function is added after the crossover operation to find a maximal clique.
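The three phases can be sketched as follows. This is a loose reconstruction for illustration only, since the paper does not specify how candidate vertices are sampled; it reuses the adjacency-dictionary representation from the earlier sketch.

```python
import random

def repair(adj, vertices):
    """Repair phase: greedily keep a subset of `vertices` that forms a clique."""
    clique = set()
    for v in vertices:
        if all(u in adj[v] for u in clique):
            clique.add(v)
    return clique

def extend(adj, clique):
    """Extend phase: add random eligible vertices until the clique is maximal."""
    candidates = [v for v in adj if v not in clique]
    random.shuffle(candidates)
    for v in candidates:
        if all(u in adj[v] for u in clique):
            clique.add(v)
    return clique

def relax_repair_extend(adj, vertices, n_relax=2):
    """One pass of the relax/repair/extend local search on a chromosome's vertex set."""
    relaxed = set(vertices) | set(random.sample(sorted(adj), n_relax))  # relax: add random vertices
    return extend(adj, repair(adj, relaxed))

# Usage (reusing the `adj` dictionary from the earlier sketch):
# print(relax_repair_extend(adj, {1, 4}))
```

In the HGA, this local search is applied to every chromosome after crossover and mutation, so each individual always encodes a maximal clique.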
B. Introduction of a Migration Mechanism in the Genetic Algorithm

In our solution we propose a migration mechanism based on two sub-populations: a main sub-population and a secondary sub-population. The main sub-population plays the role of a standard HGA. The secondary sub-population serves as a pool that generates new chromosomes for migration into the main sub-population. This mechanism should help to accelerate the convergence of the current hybrid genetic algorithm.

1) Role of the Two Sub-Populations

The idea of combining a heuristic local search and a genetic algorithm is not new, and it has proven to be competitive in solving the maximum clique problem. One issue with this approach is convergence into local optima. The diversification strategy relies on the genetic operators, crossover and mutation, and the exploration of the set of feasible solutions (or neighborhood) in a hybrid genetic algorithm is an area that can be improved.

In our approach we define two sub-populations. The main sub-population uses the generic HGA defined previously; this sub-population specializes in the exploitation of the data. The secondary sub-population is dedicated to exploration of the data: in the secondary population, an effort is made to diversify the population. The secondary sub-population then transfers the best of its individuals to the main sub-population through a migration mechanism. In this approach, we want to maximize both the exploration and the exploitation functions of the genetic algorithm. Figure 2 shows the role of each sub-population.

Figure 2. Role of the sub-populations.

2) Migration Mechanism

In our algorithm, we decided to use a best-replace-worst migration policy at each generation, where the best individuals of the secondary sub-population replace a percentage of the worst individuals in the main sub-population. We also used the ladder migration topology defined in [15], which allows migration between the two sub-populations to occur at each generation of the system. Figure 3 shows the implementation of our hybrid genetic algorithm with the migration mechanism.

Figure 3. Hybrid GA with migration mechanism.

The main sub-population is the standard genetic heuristic algorithm discussed by Marchiori [12]. The secondary sub-population, however, is of more interest. This sub-population uses a non-evolutionary structure: it has no genetic operators, and its role is to give random "seeds" to the main population at every generation.

To initialize the secondary population, we randomly choose vertices for each chromosome in the new population. We then apply the heuristic function, which extracts a maximal clique from each chromosome. We then sort the population by fitness to determine which of its elements should migrate to the main sub-population.
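A minimal sketch of this best-replace-worst migration step is shown below; it assumes chromosomes are scored by a `fitness` function (for example, the size of the encoded clique) and is not the authors' implementation.

```python
def migrate(main_pop, secondary_pop, fitness, migration_rate=0.10):
    """Best-replace-worst migration: the best secondary individuals replace
    the worst fraction (`migration_rate`) of the main sub-population."""
    n_migrants = max(1, int(migration_rate * len(main_pop)))
    migrants = sorted(secondary_pop, key=fitness, reverse=True)[:n_migrants]
    survivors = sorted(main_pop, key=fitness, reverse=True)[:len(main_pop) - n_migrants]
    return survivors + migrants

# Usage example: chromosomes as vertex sets scored by clique size.
main_pop = [{1, 2}, {3}, {4, 5, 6}]
secondary_pop = [{7, 8, 9, 10}, {11}]
print(migrate(main_pop, secondary_pop, fitness=len))
# -> the worst main individual ({3}) is replaced by the best seed ({7, 8, 9, 10})
```

Because the secondary pool is regenerated from random seeds at every generation, the migrants keep injecting unexplored regions of the graph into the evolving main sub-population.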
IV. Experimental Results

A. Benchmark Datasets Used

The data used in our experiments are from the DIMACS (Center for Discrete Mathematics and Theoretical Computer Science) benchmark graphs. The benchmark is composed of a collection of nine different classes of graphs for evaluating and comparing algorithms for solving the maximum clique problem. This collection consists of random graphs with known maximum clique size as well as graphs obtained from various areas of application. The C-FAT graphs are based on the fault diagnosis problem [16]. The Hamming and Johnson graphs come from coding theory problems [17, 18]. The Keller graphs arise from the Keller conjecture on tiling using hypercubes [19]. The SAN, SANR, BROCK, and P-HAT families are composed of various types of random graphs with known maximum clique sizes. The BROCK graphs are random graphs constructed so that they have hidden cliques that are larger than what might be expected in a random graph. The P-HAT graphs are random graphs with large variance in the vertex degree distribution and a larger clique than usual random graphs [20]. Finally, the MANN graphs come from the vertex-covering problem, which is closely related to the maximum clique problem [21].
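These instances are commonly distributed in the ASCII DIMACS clique format ('c' comment lines, a 'p edge <vertices> <edges>' header, and one 'e <u> <v>' line per edge). A minimal loader into the adjacency-dictionary form used in the earlier sketches might look like the following; the file handling is an illustrative assumption, not part of the paper.

```python
def load_dimacs(path):
    """Parse a DIMACS clique-format file into an adjacency-set dictionary."""
    adj = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0] == "c":        # skip blanks and comments
                continue
            if parts[0] == "p":                      # "p edge <n_vertices> <n_edges>"
                n_vertices = int(parts[2])
                adj = {v: set() for v in range(1, n_vertices + 1)}
            elif parts[0] == "e":                    # "e <u> <v>", vertices are 1-based
                u, v = int(parts[1]), int(parts[2])
                adj[u].add(v)
                adj[v].add(u)
    return adj

# Usage example (path is hypothetical):
# adj = load_dimacs("keller4.clq")
# print(len(adj))  # number of vertices in the instance
```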
B. Results Obtained

A series of experiments was conducted using a small subset of the DIMACS data sets. The results show that the performance of our algorithm is close to the best results obtained on the same data sets in the DIMACS challenge (Table 1).

Graph          Best obtained   Best DIMACS
keller4        11              11
keller5        27              27
keller6        49              59
hamming8-4     16              16
hamming10-4    38              40
MANN_a27       126             126
MANN_a81       1096            1098

Table 1. Comparison of our algorithm with the best DIMACS results.

C. Effect of Mutation Rate Variation on Results

In these experiments, we evaluate the effect of mutation while holding the other parameters static. We analyzed the effect of mutation on the keller5 graph with mutation rates of 0.5%, 1%, 5%, 10%, and 20%. Table 2 shows the parameters chosen for this set of experiments.

Parameters          Value
Population size     100
Maximum iteration   50
Mutation            0.5%, 1%, 5%, 10%, 20%
Crossover rate      100% (uniform crossover)

Table 2. Parameters for testing the mutation rate.

At each generation, mutation breaks the current maximal clique in every chromosome by replacing current nodes with other random nodes. On one hand, a low mutation rate keeps nodes that are good candidates for the maximum clique, and the heuristic algorithm finds good local solutions, but less graph exploration is possible. On the other hand, a high mutation rate gives a good chance of graph exploration, but the local search becomes less efficient, as too many nodes are exchanged in the chromosomes. Choosing a good mutation rate is therefore a trade-off between finding good local solutions and good graph exploration. Figure 4 shows the minimum, average, and maximum fitness scores of the chromosomes for a 1% mutation rate.

Figure 4. Mutation rate = 1%.

We can observe two phases when applying the GA to the maximum clique problem. In the first phase, a high variation of the fitness score can be observed from generation 0 to the 15th generation. During this phase, the HGA explores various combinations of nodes in the graph, so the fitness score changes rapidly. Then a stabilization phase is observed, in which the maximal cliques are kept in the population and improvement occurs progressively. During the stabilization phase, the chances of discovering bigger cliques are low because the HGA has reached a local maximum.

The results obtained in our experiments show that a 1% mutation rate achieves the best performance, with the fastest convergence, and diversifies the population the most. The HGA found a 24-clique after only eight generations.
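As described above, the mutation operator swaps some vertices of the current clique for random vertices of the graph. The sketch below is one way to realize this on a set-based chromosome; the sampling details are assumptions, not the paper's exact operator.

```python
import random

def mutate(adj, chromosome, mutation_rate=0.01):
    """Replace each clique vertex, with probability `mutation_rate`, by a random
    vertex of the graph; the relax/repair/extend heuristic later restores a
    maximal clique from the perturbed vertex set."""
    vertices = list(adj)
    mutated = set()
    for v in chromosome:
        if random.random() < mutation_rate:
            mutated.add(random.choice(vertices))  # break the clique: random replacement
        else:
            mutated.add(v)
    return mutated
```

At a 1% rate the perturbation is mild, so the repair and extend phases can usually rebuild a clique from the surviving vertices, which is consistent with the fast convergence the paper reports for this setting.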
D. Effect of Migration Rate Variation on Results

In this set of experiments, we fix the mutation rate at the value found greedily in the previous experiment. We then vary the migration rate over the values 1%, 5%, 10%, 20%, 30%, 50%, and 90%. Table 3 shows the parameters chosen, applied to the same graph (keller5).

Parameters          Value
Population size     100
Maximum iteration   50
Mutation            1%
Crossover rate      100% (uniform crossover)
Migration rates     1%, 5%, 10%, 20%, 30%, 50%, 90%

Table 3. Parameters for testing the migration rate.

The effect of a high migration rate is the replacement of good solutions obtained by the HGA, whereas a low migration rate does not inject enough new seeds into the population. The experiments show that the optimum migration rate is 10%, with the maximum clique found after 28 generations (ground-truth maximum clique = 27). Figure 5 shows the minimum, average, and maximum fitness scores of the chromosomes for a 10% migration rate.

Figure 5. Migration rate = 10%.

In this graph, we can observe that the maximum fitness increases in levels, which means that new maximal cliques are found. After one or two generations the stabilization phase is reached and the GA continues graph exploration. The results show that the migration process helped the genetic algorithm to escape from local maxima and find the maximum clique by adding more graph exploration capability.

E. Effect of Migration with Variation of Population Size and Number of Generations

In these experiments, we test the effect of migration while varying the population size and the maximum number of generations. We compare the effect of migration in different parameter setups. Table 4 shows the parameters for this set of experiments.

Parameters                                          Value
(Population size / Maximum iteration / migration)   (10/1000, y), (10/1000, n), (50/200, y), (50/200, n), (100/100, y), (100/100, n), (200/50, y), (200/50, n)
Mutation                                             1%
Crossover rate                                       100% (uniform crossover)
Migration rate                                       10%

Table 4. Parameters for testing the effects of migration.

The results indicate that, for each configuration, applying our migration mechanism increases the size of the maximum clique found and also accelerates the convergence toward the best solution.

F. Effect of Fitness Sharing on Results

In these experiments we test the effect of fitness sharing on the same graph. The parameter σshare from equation (2) represents the number of niches. We vary the number of niches over the values 3, 7, 10, and 20. Table 5 shows the parameters used for this set of experiments.

Parameters                  Value
Population size             10
Maximum iteration           1000
Mutation                    1%
Crossover rate              100% (uniform crossover)
Number of niches (σshare)   3, 7, 10, 20

Table 5. Parameters for testing fitness sharing.

A high number of niches (σshare) allows diversifying the population by reducing the fitness value of individuals from the same niche. This way, the chance of promoting individuals from other niches is higher. We obtained the best results in this series of experiments with σshare = 20; the maximum clique is found after 170 generations. Figure 6 shows the minimum, average, and maximum fitness scores of the chromosomes for σshare = 20.

Figure 6. σshare = 20.

The fitness sharing method also plays a role in graph exploration by reducing the fitness values of individuals in the same niche, so that individuals from other niches are promoted in the population. Finally, we have shown that the fitness sharing method is also well suited for graph exploration and helps to find the maximum clique.

V. Conclusion

For the maximum clique problem, we have proposed a solution that implements a migration mechanism composed of two sub-populations: the main sub-population is specialized in data exploitation and the secondary sub-population is specialized in data exploration. Our results have shown that both sharing and migration improve the convergence and the results of the genetic algorithm. We have shown that migration proves to be a valuable tool for converging on the maximum, or on high maxima, relatively quickly. We have also discussed the potential usefulness of using sharing to find multiple maximum solutions in the same problem. We also found that, in order to maximize diversity and prevent fast convergence of the minimum fitness value, a 1% mutation rate appears to give the best results. The migration rate was also empirically derived to be best at 10%.

These experiments lay the groundwork for potential future research. We used a non-evolutionary approach by adding a second sub-population, which serves the main population by generating random new seeds. Further improvements can be made by developing a more specific heuristic local search algorithm for the secondary sub-population dedicated to graph exploration. There are more improvement possibilities in developing a co-evolutionary method in which, instead of generating new seeds at each iteration, the second sub-population evolves at the same time as the main sub-population, with migration maintained at every generation for faster convergence. Finally, a cultural co-evolutionary approach would give feedback from the main sub-population to the secondary sub-population, to help build better candidates for exploration.

References

[1] Cheng, James, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. "Finding maximal cliques in massive networks by H*-graph." Proceedings of the 2010 International Conference on Management of Data, pp. 447-458. ACM, 2010.
[2] Barricelli, Nils Aall. "Esempi numerici di processi di evoluzione." Methodos (1954): 45-68.
[3] Holland, J. H. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, MI, 1975.
[4] Bron, Coen, and Joep Kerbosch. "Algorithm 457: finding all cliques of an undirected graph." Communications of the ACM 16.9 (1973): 575-577.
[5] Tomita, Etsuji, Akira Tanaka, and Haruhisa Takahashi. "The worst-case time complexity for generating all maximal cliques and computational experiments." Theoretical Computer Science 363, no. 1 (2006): 28-42.
[6] Robson, John Michael. "Algorithms for maximum independent sets." Journal of Algorithms 7.3 (1986): 425-440.
[7] Yampolskiy, R., P. Anderson, J. Arney, V. Misic, and T. Clarke. "Printer model integrating genetic algorithm for improvement of halftone patterns." Western New York Image Processing Workshop (WNYIPW), 2004.
[8] Ashby, Leif H., and Roman V. Yampolskiy. "Genetic algorithm and Wisdom of Artificial Crowds algorithm applied to Light up." 16th International Conference on Computer Games (CGAMES), pp. 27-32. IEEE, 2011.
[9] Hindi, Musa M., and Roman V. Yampolskiy. "Genetic Algorithm Applied to the Graph Coloring Problem." Midwest Artificial Intelligence and Cognitive Science Conference, p. 60. 2012.
[10] Carter, B., and K. Park. "How good are genetic algorithms at finding large cliques: an experimental study." Technical report, Boston University, Computer Science Department, MA, October 1993.
[11] Park, K., and B. Carter. "On the effectiveness of genetic search in combinatorial optimization." Proceedings of the 10th ACM Symposium on Applied Computing. ACM Press, 1995.
[12] Marchiori, Elena. "A simple heuristic based genetic algorithm for the maximum clique problem." Proceedings of the 1998 ACM Symposium on Applied Computing, pp. 366-373. 1998.
[13] Marchiori, E. "Genetic, iterated and multistart local search for the maximum clique problem." Applications of Evolutionary Computing, LNCS 2279, pp. 112-121. Springer-Verlag, Berlin, Germany, 2002.
[14] Cantu-Paz, E. "Migration policies and takeover times in parallel genetic algorithms." IlliGAL Technical Report No. 99008, University of Illinois at Urbana-Champaign, 1999.
[15] Cantu-Paz, E. "A survey of parallel genetic algorithms." IlliGAL Technical Report No. 97003, University of Illinois at Urbana-Champaign, May 1997.
[16] Berman, Piotr, and A. Pelc. "Distributed probabilistic fault diagnosis for multiprocessor systems." Fault-Tolerant Computing, FTCS-20, Digest of Papers, 20th International Symposium. IEEE, 1990.
[17] MacWilliams, F. J., and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland, Amsterdam, 1979.
[18] Sloane, N. J. A. "Unsolved problems in graph theory arising from the study of codes." Graph Theory Notes of New York 18 (1989): 11-20.
[19] Lagarias, J. C., and P. W. Shor. "Keller's cube-tiling conjecture is false in high dimensions." Bulletin of the AMS 27(2), pp. 279-283, 1992.
[20] Brockington, M., and J. Culberson. "Camouflaging independent sets in quasi-random graphs." Working paper, Second DIMACS Implementation Challenge, 1993.
[21] Bui, T. N., and P. H. Eppley. "A hybrid genetic algorithm for the maximum clique problem." Proceedings of the 6th International Conference on Genetic Algorithms (ICGA), Pittsburgh, PA, pp. 478-484. Morgan Kaufmann, 1995.
[22] Goldberg, David E., and Jon Richardson. "Genetic algorithms with sharing for multimodal function optimization." Proceedings of the Second International Conference on Genetic Algorithms and Their Application, pp. 41-49. 1987.