=Paper= {{Paper |id=Vol-1348/maics2013_paper_1 |storemode=property |title=Hybrid Genetic Algorithm for the Maximum Clique Problem Combining Sharing and Migration |pdfUrl=https://ceur-ws.org/Vol-1348/maics2013_paper_1a.pdf |volume=Vol-1348 |dblpUrl=https://dblp.org/rec/conf/maics/OuchRY13 }} ==Hybrid Genetic Algorithm for the Maximum Clique Problem Combining Sharing and Migration== https://ceur-ws.org/Vol-1348/maics2013_paper_1a.pdf

Hybrid genetic algorithm for the maximum clique problem
combining sharing and migration

Roger Ouch, Kristopher W. Reese, Roman V. Yampolskiy
Computer Engineering and Computer Science,
University of Louisville
Louisville, Kentucky 40292

Abstract increases dramatically. Traditional heuristic approaches for
The Maximum Clique Problem (MCP) has been studied for finding the maximum clique problem have become
decades and is well known in graph theory as a problem that is insufficient for this new challenge.
difficult as it as it is known to be NP-complete. The MCP has a Cheng et al. [1] investigated finding the maximal clique
vast domain of application such as finance, biochemistry, within massive networks. They introduced the concept of
bioinformatics, and many more. Many niching methods have an H* graph where the maximum clique is computed in a
been successfully applied in Genetic Algorithms (GA) to small part of a large graph recursively one at a time using
diversify the population and avoid getting trapped within local external memory. Their experiments were done on large
optima. In this paper, we propose an approach using the Sharing graph with 10 million vertices and 80 millions edges.
method and a Hybrid Genetic Algorithm (HGA) for the
maximum clique problem. We also propose a non-evolutionary
C. Description of Genetic Algorithms
approach using a migration mechanism to boost the current HGA.
A well-known metaheuristic in evolutionary algorithms is
the genetic algorithm. The earliest work related to GAs
I. Introduction dates back to 1954 when Nils and Barricelli published a
paper on computer simulation of evolution [2]. Genetic
A. The Maximum Clique Problem algorithms have been formalized in later years and date
The maximum clique problem is a classically studied back as early as 1975 when J.H Holland published his
combinatorial optimization problem, which comes from papers on the subject [3].
graph theory and has many domains of application. The A GA is a combinatorial optimization technique that
maximum clique problem can be briefly summarized using mimics the process of natural evolution. The algorithm
the following definition. does not rely on building the final solution based on local
Let’s G= (V,E) be an undirected graph on N vertices, search, which distinguishes itself from traditional
where V is the vertex set and E is the edge set. A clique in approaches. A subset of feasible solutions of a problem is
G is a subgraph of G in which there is an edge between encoded in chromosomes that represent individuals in a
any two vertices. The size of a clique is the number of population. The evolutionary process is then applied
vertices in the clique, and the maximum clique problem is through population selection, crossover, and mutation
to find the largest clique in a given graph G. operations. Each individual is then evaluated using a fitness
function and the best of the individuals are transmitted into
B. Need For Improved Technique For Huge the next generation. The following procedure shows a basic
genetic algorithm:
Networks
Over the last few years, the size of networks has increased GA procedure(){
rapidly and the amount of information from massive data Initialize population();
sets has become a highly researched computational Evaluate population();
challenge. While not (End_Condition){
Some examples of fast growing networks exist in social Select parent();
networks and bioinformatics. The use of traditional Crossover();
methods for analyzing and treating the large amounts of Mutation();
information has become problematic. The nature of the Evaluate population();
MCP (NP-complete) explains why is it more complicated }
to find the maximum clique when the number of vertice N }
Figure 1 Genetic algorithm

Copyright retained by the authors.
Since the 1990s, significant research has been conducted score of individual in populated landscapes. This technique
on genetic algorithm, especially for finding the maximum is known as one of the best niching technique to escape
clique. from local optima.
In this paper, we implemented the sharing method, as we
II. Related Work are attempting to diversify and explore the elements of the
graph more thoroughly. The sharing function recalculates
Early works on the maximum clique problem were focused the fitness function based on how dense an area of the
on greedy approaches. One of the first significant graph is used. Therefore, as with any fitness function, our
improvements on the maximum clique was done by Bron- shared fitness becomes that shown in equation 1.
Kerbausch[4]. This algorithm uses a recursive f (i)
backtracking procedure that augments the clique by one f '(i) = n (1)
vertex at a time. Tomita, Tanaka and Takahashi [5] proved
that with a worst-time complexity O(3n/3), the Bron- ∑ sh(d(i, j))
j =1
Kerbaush algorithm was reported as one of the fastest
algorithms for listing all possible maximum cliques. where sh is the sharing based on the distance d between
In 1986, Robson improved this algorithm by adding vertex i and j. The sharing function is found in equation
backtracking techniques in combination with more (2).
complicated case analyses and dynamic programming. €
This however increased the space complexity of the MCP ⎧ d α
⎪1 − ( ) , if d < σshare
[6]. sh(d) = ⎨ σshare (2)
⎪⎩ 0, otherwise
A. Genetic Algorithm Methods For Max Clique
Genetic algorithms have been successfully applied to many
NP-hard problems in various domains [7-8]. GA has also C. Migration in GA
been successfully used on graph problems, particularly on Yet another technique that inspired sociological and
the graph-coloring problem [9]. On the MCP, the €first ecological movements between sub-populations is the
approaches using GA had poor performance compared to concept of migration. Migration in genetic algorithms has
other local search techniques [10, 11]. been well studied and applied in many parallel genetic
The idea of combining a genetic algorithm and a algorithms.
heuristic local search has been used in earlier applications. There are important considerations before implementing
Marchiori [12] proposed a simple genetic algorithm based migration in genetic algorithms. Migration policies define
on a combination of heuristic algorithms for the MCP. In the replacement rule to apply between two sub-populations.
this HGA several important genetic mechanisms were It is equivalent to answering the question: “What kind of
implemented such as the keep-two-best parents, elitism, a individual in a sub-population A is replacing what kind of
roulette-wheel population selection, uniform crossover, individual in sub-population B”. Topology is another
and a fitness function based on the size of the maximal important aspect in multi-population genetic algorithm.
clique. Moreover, the heuristic method is used to do local The topology defines the structure of communication
transformation on a sub-graph transforming all between several sub-populations [14]. Finally, the
chromosomes after crossover and mutation into a maximal migration rate must be used to define the proportion of a
clique. These processes have become known as the sub-population that is replaced with the immigrant
“relax”, “repair”, and “extend” phases. Her experiments on individuals.
the DIMACS data sets showed that results quality and Migration rates and migration policies have been shown
computational time have improved dramatically compared to improve the convergence of genetic algorithm
to other heuristic approaches. Marchiori [13] continued considerably. Cantu-Paz has conducted experiments in
developing the hybrid genetic algorithm and compared this which the best-replacing-worst population policy has
algorithm with two other competitive variations of genetic demonstrated the best results for migration policies [15].
algorithm: the iterated local search algorithm and the
multistart local search algorithm. III. Proposed Solutions
B. Fitness Sharing Method A. Hybrid Genetic Algorithm
Fitness sharing method has been used to avoid GA The heuristic local search algorithm is a local search based
converging to local maxima. This technique was proposed on greedy approach. Given a set of vertices, this algorithm
by Holland [3] and expanded by Goldberg and Richardson finds the maximal clique using a “Relax”, a “Repair” and
[22]. In a GA, the formation of population in local optimal an “Extend” steps. Marchiori described this algorithm in
(also called niche) makes difficult the convergence toward details [12]. The relax step adds random vertices to a
the global optimal solution. The fitness sharing method subgraph. The repair step then extracts a small maximal
helps to diversify the population by reducing the fitness clique from the current sub-graph. Finally the extend step
generates a bigger maximal clique by adding random populations to occur over each generation of the system.
nodes to the clique obtained in the repair step. Figure 3 shows the implementation of our hybrid genetic
The HGA is then obtained by combining the heuristic algorithm with a migration mechanism.
local search with a genetic algorithm given in figure 1. The
heuristic function is added after the crossover operation to
find maximal clique.

B. Introduction of a Migration Mechanism in
Genetic Algorithm
In our solution we propose a migration mechanism based
on two sub-populations, a main sub-population and a
secondary sub-population. The main population will have
the role of a standard HGA. The secondary sub-population
will serve as a pool that generates the new chromosomes
for migrating into the main subpopulation. This
mechanism should help to accelerate the convergence of
the current hybrid genetic algorithm. Figure 3 Hybrid GA with migration mechanism
The main sub-population is the standard genetic heuristic
1) Role of the Two Sub-Populations algorithm discussed by Marchiori [11]. The secondary sub-
The idea of combining a heuristic local search and a population, however, is of more interest. This
genetic algorithm is not new. It has proven to be subpopulation uses a non-evolutionary structure. It has no
competitive in solving the maximum clique problem. One genetic operators and its role is to give random “seed” to
issue with this approach is the convergence into local the main population at every generation.
optima. The diversification strategy relies on the genetic To initialize the population, we randomly choose
operators: crossover and mutation. The exploration of the vertices for each chromosome in the new population. We
set of feasible solutions (or neighborhood) in hybrid then apply the heuristic function, which extracts the
genetic algorithm is an area that can be improved. maximal clique from each chromosome. We then sort the
In our approach we define two sub-populations. The population by fitness, to determine which of the elements
Main sub-population uses a generic HGA defined should migrate to the main sub-population.
previously. This sub-population specializes in the
exploitation of the data. The secondary sub-population will IV. Experimental Results
be dedicated to exploration of the data. In the secondary
population, an effort is made to diversify the population.
A. Benchmark Datasets Used
The sub-population will then transfer the best of its
population to the main sub-population through a migration The data used for our experiments are from the DIMACS
mechanism. In this approach, we want to maximize the (Center for Discrete Mathematics and Theoretical
functions of exploration and exploitation of the genetic Computer Science) benchmark graphs. It is composed of a
algorithm. Figure 2 shows the role of each sub-population. collection of 9 different classes of graphs for evaluating
and comparing different algorithms for solving the
maximum clique problem. This collection consists of
random graphs with known maximum clique size as well as
graphs obtained from various areas of applications. The C-
FAT graphs are based on the fault diagnosis problem [16].
The Hamming and Johnson graphs are from the coding
theory problems [17] [18]. The Keller graphs arise from the
Keller conjecture on tiling using hypercube [19]. The SAN,
SANR, BROCK and P-HAT are composed of various types
of random graphs with known maximum clique sizes. The
Figure 2 Role of sub-populations BROCK graphs contain random graphs constructed so that
they have hidden cliques that are larger than what might be
2) Migration Mechanism expected in a random graph. The P-HAT graphs are
In our algorithm, it was decided to use a best-replace-worst random graphs with large variance in the vertex degree
migration policy at each generation, where the best distribution and a larger clique than usual random graphs
individuals in the sub-population will replace a percentage [20]. Finally, the MANN graphs are from the vertex-
of the worst individuals in the main sub-population. We covering problem, which is closely related to the maximum
also used a ladder migration topology defined in [15], clique problem [21].
which allows the migration between the two sub-
B. Results Obtained of discovering bigger cliques are low because the HGA
reach local maxima.
A series of experiments were conducted using a small
subset of the Dimacs data sets. The results obtained show
that the performance of our algorithm is close to the best
results obtained on the same data sets from the DIMACS
challenge (table 1).
Graph Best obtained Best DIMACS
keller4
11
11
keller5
27
27
keller6
49
59
hamming8-‐4
16
16
hamming10-‐4
38
40
MANN_a27
126
126
MANN_a81
1096
1098
Table 1 Comparison of our algorithm with Best DIMACS

C. Effect of Mutation Rate Variation on Results Figure 4 mutation= 1%
In these experiments, we evaluate the effect of the
mutation by holding other parameters static. We analyzed The results obtained in our experiments shows that 1%
the effect of mutation on the Keller5 graph with the mutation rate achieves the best performance with the fastest
mutation rate parameters: 0.5%, 1%, 5%, 10% and 20%. convergence and helps diversifying the population to its
Table 2 shows the parameters chosen for this set of maximums. The HGA algorithm found a 24-clique after
only eight generations.
experiments.
Parameters Value
Population size 100
D. Effect of Migration Rate Variation on Results
Maximum iteration 50 In this set of experiments, we fix the mutation rate with the
Mutation 0.5%, 1%, 5%, 10%, 20% value found greedily in the previous experiment. We then
vary the migration rate with the parameters: 1%, 5%, 10%,
Crossover rate 100% (uniform crossover)
20%, 30%, 50% and 90%. Table 3 shows the parameters
Table 2 Parameters for testing mutation rate chosen applied on the same graph (Keller5).
At each generation, the mutation breaks current
Parameters Value
maximal clique in every chromosome, by replacing current
nodes by other random nodes. Population size 100
On one hand, a low mutation rate keeps nodes that are Maximum iteration 50
good candidates for the maximum clique and the heuristic Mutation 1%
algorithm find good local solutions. But less graph Crossover rate 100% (uniform crossover)
exploration are possible using a low mutation rate. On the Migration rates 1%, 5%, 10%, 20%, 30%,
other hand, a high mutation rate gives a good chance of Table 3 Parameters for50%, 90%
testing migration rate
graph exploration, but the local search become less
efficient, as to many nodes are exchanged in The effect of a high migration rate is the replacement of
chromosomes. Choosing a good mutation rate is a trade-off good solutions obtained by the HGA, whereas a low
between finding a good local solution and good graph migration rate injects not enough new seeds in new
exploration. Figure 4 shows the minimum, the average and population. The experiments show that the optimum
maximum fitness score for each chromosome for 1% migration rate is reached at 10% and the maximum clique
mutation rate. is found after 28 generations (ground truth maximum
We can observe two phases applying the GA to the clique=27). Figure 5 shows the minimum, the average and
maximum clique problem. In the first phase, a high maximum fitness score for each chromosome for 10%
variation of the fitness score phase can be observed from 0 migration rate.
to the 15th generation. During this phase, the HGA In this graph, we can observe that the maximum fitness
explores various combinations of nodes in the graph, so increase by levels, which means that new maximal cliques
the fitness score change rapidly. Then a phase of are found. After one or two generations the stabilization
stabilization is observed where the maximal cliques are phase is reached and the GA continued graph exploration.
kept in the population and improvement occurs The results show that the migration process helped the
progressively. During the stabilization phase, the chances genetic algorithm to escape from local maxima and find the
maximum clique by adding more graph exploration Table 5 parameters for testing fitness sharing
capability.
A high number of niches (σshare) allow diversifying the
population by reducing the fitness value of individual from
the same niche. This way, the chance to promote other
individuals from other niches is higher.
We obtained the best results in this series of experiments
with the parameter σshare=20. the maximum clique is found
after 170 generations. Figure 6 shows the minimum, the
average and maximum fitness score for each chromosome
for σshare=20.

Figure 5 Migration rate 10%

E. Effect of Migration with Variation of
Population Size and Number of Generations
In these experiment we test the effect of migration on a
variation of the population size and maximum number of
generation. We compare the effect of migration on
different parameter setup. Table 4 shows the parameters
for this set of experiments. Figure 6 σshare =20

Parameters Value The fitness sharing method also plays a function of
(Population size / (10/1000,y), (10/1000,n), graph exploration by reducing the fitness values of
Maximum iteration / (50/200,y), (50/200,n), individual in the same niche. Then other individual from
migration) (100/100, y), (100/100,n), other niches are promoted in the population. Finally we
(200/50,y), (200/50,n) have shown that fitness sharing method is also well suited
for graph exploration and helps to find the maximum
Mutation 1% clique.
Crossover rate 100% (uniform crossover)
Migration rates 10% V. CONCLUSION
Table 4 Parameters for testing effects of migration
For the maximum clique problem, we have proposed a
The results indicate that for each configuration, applying solution that implemented a migration mechanism
our migration mechanism increase the size of maximum composed of two subpopulations. The main sub-population
clique found and also accelerates the convergence toward is specialized on data exploitation and the secondary
the best solution. subpopulation is specialized on data exploration.
Our results have shown that both sharing and migration
F. Effect of fitness Sharing on Results improve the convergence and results of the genetic
In these experiments we test the effect of fitness sharing on algorithm. We have shown that migration proves to be a
same graph. The parameter σshare from equation (2) valuable tool in converging on the maximum, or high
represents the number of niches. We vary the number of maxima relatively quickly. We have also discussed the
niches with the values: 3, 7, 10, 20. Table 5 shows the potential usefulness in using sharing to find multiple
parameters used for this set of experiments. maximum solutions in the same problem. We also found
that in order to maximize the difference and prevent fast
Parameters Value convergence of the minimum fitness value, a 1% mutation
Population size 10 rate appears to give the best results. The migration rate was
Maximum iteration 1000 also empirically derived to be best at 10% migration.
Mutation 1% These experiments lay groundwork for potential research
Crossover rate 100% (uniform crossover) in the future. We used a non-evolutionary approach by
3,7,10, 20 adding a second sub-population, which serves the main
Number of Niches (σshare)
population by generating random new seeds. Other the 10th ACM Symposium on Applied Computing.
improvements can be made by developing a more specific ACM Press, 1995.
heuristic local search algorithm for the secondary sub- [12] Marchiori, Elena. "A simple heuristic based genetic
population dedicated to graph exploration. algorithm for the maximum clique problem."
There are more improvement possibilities by developing Symposium on Applied Computing: Proceedings of
a co-evolutionary method where instead of generating new the 1998 ACM symposium on Applied Computing,
seed at each iteration, the second sub-population evolves at vol. 27, pp. 366-373. 1998.
the same time with the main sub-population. Migration is [13] E. Marchiori. Genetic, iterated and multistart local
maintained at every generation for faster convergence. search for the maximum clique problem”. Applications
Finally a cultural co-evolutionary approach would give
of Evolutionary Computing. Berlin, Germany:
feedback from the main sub-population to the secondary
Springer-Verlag, 2002, LNCS 2279, pp. 112–121.
sub-population, to help building better candidate for
exploration. [14] Cantu-Paz, E. (1999) Migration policies and
takeovertimes in parallel genetic algorithms. IlliGAL
REFERENCES Technical Report No. 99008. University of Illinois at
Urbana-Champaign.
[1] Cheng, James, Yiping Ke, Ada Wai-Chee Fu, Jeffrey [15] Cantu-Paz, E. A survey of parallel genetic algorithms.
Xu Yu, and Linhong Zhu. "Finding maximal cliques Technical report 97003. University of Illinois at
in massive networks by h*-graph." Proceedings of the Urbana-Champaign, May 2007.
2010 international conference on Management of data, [16] Berman, Pioa, and A. Pelc. "Distributed probabilistic
pp. 447-458. ACM, 2010. fault diagnosis for multiprocessor systems." Fault-
[2] Barricelli, Nils Aall (1954). "Esempi numerici di Tolerant Computing. FTCS-20. Digest of Papers. 20th
processi di evoluzione". Methodos: 45–68. International Symposium. IEEE, 1990.
[3] Holland J. H. Adaptation in natural and artificial [17] J. MacWilliams and N. J. A. Sloane, The Theory of
systems. The University of Michigan Press, Ann Error Correcting Codes," North-Holland, Amsterdam,
Arbor, MI, 1975. 1979.
[4] Bron, Coen, and Joep Kerbosch. "Algorithm 457: [18] Sloane, N. J. A. "Unsolved problems in graph theory
finding all cliques of an undirected graph." arising from the study of codes." Graph Theory Notes
Communications of the ACM 16.9 (1973): 575-577. of New York 18 (1989): 11-20.
[5] Tomita, Etsuji, Akira Tanaka, and Haruhisa [19] J. C. Lagarias and P. W. Shor, Keller's Cube-Tiling
Takahashi. "The worst-case time complexity for Conjecture is False in High Dimensions," Bulletin
generating all maximal cliques and computational AMS, 27(2), pp. 279-283.
experiments." Theoretical Computer Science 363, no. [20] M. Brockington and J. Culberson, Camouflaging
1 (2006): 28-42. Independent Sets in Quasi-Random Graphs." Working
[6] Robson, John Michael. "Algorithms for maximum Paper, Second DIMACS Implementation Challenge,
independent sets." Journal of Algorithms 7.3 (1986): 1993.
425-440. [21] T. N. Bui and P. H. Eppley, A Hybrid Genetic
[7] Yampolskiy, R., P. Anderson, J. Arney, V. Misic, and Algorithm for the Maximum Clique Problem."
T. Clarke. "Printer model integrating genetic Proceedings of the 6th International Conference on
algorithm for improvement of halftone patterns." Genetic Algorithms (ICGA), Pittsburgh, PA, Morgan
Western New York Image Processing Workshop Kaufmann, 1995, pp. 478-484.
(WNYIPW). 2004. [22] Goldberg, David E., and Jon Richardson. "Genetic
[8] Ashby, Leif H., and Roman V. Yampolskiy. "Genetic algorithms with sharing for multimodal function
algorithm and Wisdom of Artificial Crowds algorithm optimization." Proceedings of the Second International
applied to Light up." 16th International Conference on Conference on Genetic Algorithms and their
Computer Games (CGAMES), pp. 27-32. IEEE, 2011. application, pp. 41-49. 1987.
[9] Hindi, Musa M., and Roman V. Yampolskiy. "Genetic
Algorithm Applied to the Graph Coloring Problem."
Midwest Artificial Intelligence and Cognitive Science
Conference, p. 60. 2012.
[10] B. Carter and K. Park. How good are genetic
algorithms at finding large cliques: an experimental
study. Technical report, Boston University, Computer
Science Department, MA, October 1993.
[11] K. Park and B. Carter. On the effectiveness of genetic
search in combinatorial optimization. Proceedings of