=Paper= {{Paper |id=None |storemode=property |title=Genetic Algorithm Applied to the Graph Coloring Problem |pdfUrl=https://ceur-ws.org/Vol-841/submission_10.pdf |volume=Vol-841 |dblpUrl=https://dblp.org/rec/conf/maics/HindiY12 }} ==Genetic Algorithm Applied to the Graph Coloring Problem== https://ceur-ws.org/Vol-841/submission_10.pdf
             Genetic Algorithm Applied to the Graph Coloring Problem
                                        Musa M. Hindi and Roman V. Yampolskiy


                                             Computer Engineering and Computer Science
                                                 J.B. Speed School of Engineering
                                                       Louisville, Kentucky

                                                                    complete problems that also belong to Constraint
                            Abstract                                Satisfaction Problems (CSPs). The practical applications of
                                                                    Graph Coloring Problems include but are not limited to:
  In this paper we present a hybrid technique that applies a
  genetic algorithm followed by wisdom of artificial crowds
  approach to solving the graph-coloring problem. The                   •    Map coloring (B. H. Gwee, M. H. Lim and J. S.
  genetic algorithm described here utilizes more than one                    Ho 1993)
  parent selection and mutation methods depending on the                •    Scheduling (Daniel Marx and D Aniel Marx 2004)
  state of fitness of its best solution. This results in shifting       •    Radio Frequency Assignment (W. K. Hale 1980;
  the solution to the global optimum more quickly than using                 S. Singha, T. Bhattacharya and S. R. B. Chaudhuri
  a single parent selection or mutation method. The algorithm
                                                                             2008)
  is tested against the standard DIMACS benchmark tests
  while limiting the number of usable colors to the known               •    Register allocation (Wu Shengning and Li Sikun
  chromatic numbers. The proposed algorithm succeeded at                     2007)
  solving the sample data set and even outperformed a recent            •    Pattern Matching
  approach in terms of the minimum number of colors                     •    Sudoku
  needed to color some of the graphs.
                                                                    In this paper we demonstrate the use of genetic algorithms
The Graph Coloring Problem (GCP) is a well-known NP-                in solving the graph-coloring problem while strictly
complete problem. Graph coloring includes both vertex               adhering to the usage of no more than the number of colors
coloring and edge coloring. However, the term graph                 equal to the chromatic index to color the various test
coloring usually refers to vertex coloring rather than edge         graphs.
coloring.

Given a number of vertices, which form a connected
                                                                                          Prior Work
graph, the objective is to color each vertex such that if two       A great deal of research has been done to tackle the
vertices are connected in the graph (i.e. adjacent) they will       theoretical aspects of the Graph Coloring Problem in terms
be colored with different colors. Moreover, the number of           of its generalization as a Constraint Satisfaction Problem
different colors that can be used to color the vertices is          (Isabel Méndez Díaz and Paula Zabala 1999). The
limited and a secondary objective is to find the minimum            problem’s various applications and solutions have been
number of different colors needed to color a certain graph          discussed in detail in Porumbel’s paper (Daniel Cosmin
without violating the adjacency constraint. That number             Porumbel 2009). Evolutionary computation and parameter
for a given graph (G) is known as the Chromatic Number              control has been detailed in a number of papers including
(χ(G)) (Isabel Méndez Díaz and Paula Zabala 1999).                  ones by Back, Hammel, and Schwefel (T. Back, U.
                                                                    Hammel and H. P. Schwefel 1997) as well as work by
If k = {1, 2, 3...} and P(G, k) is the number of possible           Eiben, Hinterding and Michalewicz (A. E. Eiben, R.
solutions for coloring the graph G with k colors, then              Hinterding and Z. Michalewicz 1999). Srinivas and Patnaik
                                                                    examined crossover and mutation probabilities for
                                                                    optimizing genetic algorithm performance (M. Srinivas and
                χ(G) = min(k: P(G, k) > 0)                (1)       L. M. Patnaik 1994). Genetic algorithms and evolutionary
                                                                    approaches have been used extensively in solutions for the
                                                                    Graph Coloring Problem and its applications (F. F. Ali, et
                                                                    al. 1999; K. Tagawa, et al. 1999; Cornelius Croitoru, et al.
                                                                    2002; C. A. Glass and A. Prugel-Bennett 2003; Justine W.
Graph coloring problems are very interesting from the
                                                                    Shen 2003; Greg Durrett, Muriel Médard and Una-May
theoretical standpoint since they are a class of NP-
                                                                    O’Reilly 2010; Lixia Han and Zhanli Han 2010). Most
                                                                    recent work utilized a parallel genetic algorithm on a
similar dataset to the one used in this paper (Reza              Figure 1 displays the chromosome representation of a
Abbasian and Malek Mouhoub 2011).                                graph of 7 vertices using 4 colors:

The concept of utilizing a crowd of individuals for solving      The ultimate goal when solving GCPs is to reach a solution
NP complete problems has also been the topic of various          where no two adjacent vertices have the same color.
papers. Most notably the Wisdom of Crowds concept has            Therefore, the GA process continues until it either finds a
been used in solving the Traveling Salesman Problem              solution (i.e. 0 conflicts) or the algorithm has been run for
(Sheng Kung Michael Yi, et al. 2010b) as well as the             the predefined number of generations. In addition to the
Minimum Spanning Tree Problem (Sheng Kung Michael                GA, if a run fails to reach a solution a Wisdom of Crowds
Yi, et al. 2010a). In this paper we attempt to supplement        approach will be applied to the top performers in an
the solution produced by the genetic algorithm utilizing an      attempt to produce a better solution.
artificial crowd (Leif H. Ashby and Roman V. Yampolskiy
2011; Roman V. Yampolskiy and Ahmed El-Barkouky
2011).

                   Proposed Approach
Genetic algorithms share an overall structure and
workflow yet they vary in the specific details according to
the particular problem. The algorithm consists of a parent
selection method, a crossover method and a mutation
method.

The general algorithm is:

Algorithm1: General Genetic Algorithm
define: population, parents, child

population = randomly generated chromosomes;

while (terminating condition is not reached) {
  gaRun();
}

// a single run of a genetic algorithm                           Figure 1: Chromosome representation of a colored connected
function gaRun() {                                               graph
    parents = getParents();
    child = crossover(parents);                                  The overall genetic algorithm in this approach is a
    child = mutate(child);                                       generational genetic algorithm. The population size is kept
population.add(child);                                           constant at all times and with each generation the bottom
}                                                                performing half of the population is removed and new
                                                                 randomly generated chromosomes are added. The
The goal of the previous algorithm is to improve the             population size is set to 50 chromosomes. The value was
fitness of the population by mating its fittest individuals to   chosen after testing a number of different population sizes.
produce superior offspring that offer a better solution to       The value 50 was the least value that produced the desired
the problem. This process continues until a terminating          results.
condition is reached which could be simply that the total
number of generations has been run or any other parameter        The GA uses two different parent selection methods, a
like non-improvement of fitness over a certain number of         single crossover method and two different mutation
generations or that a solution for the problem has been          methods. Which of the parent selection and mutation
found.                                                           methods ends up selected depends on the state of the
                                                                 population and how close it is to finding a solution. The
The chromosome representation of the GCP is simply an            parent selection, crossover and mutation methods are
array with the length set to the number of vertices in the       outlined as follows:
graph. Each cell in the array is assigned a value from 0 to
the number of colors – 1. The adjacencies between the
vertices are represented by an adjacency matrix of
dimensions n×n where n: the number of vertices.
Algorithm2: parentSelection1:                                   A bad edge is defined as an edge connecting two vertices
define: parent1, parent2, tempParents;                          that have the same color. The number of bad edges is the
                                                                fitness score for any chromosome. As mentioned above, the
tempParents = two randomly selected chromosomes from the        alteration between the two different parent selection and
population;                                                     mutation methods depends on the best fitness. If the best
parent1 = the fitter of tempParents;                            fitness is greater than 4 then parentSelection1 and
                                                                mutation1 are used. If the best fitness is 4 or less then
tempParents = two randomly selected chromosomes from the        parentSelection2 and mutation2 are used. This alteration is
population;                                                     the result of experimenting with the different data sets. It
parent2 = the fitter of tempParents;                            was observed that when the best fitness score is low (i.e.
                                                                approaching an optimum) the usage of parent selection 2
return parent1, parent2;                                        (which copies the best chromosome as the new child) along
                                                                with mutation2 (which randomly selects a color for the
Algorithm3: parentSelection2:                                   violating vertex) results in a solution more often and more
define: parent1, parent2                                        quickly than using the other two respective methods.

parent1 = the top performing chromosome;                        Finally, the algorithm is run for 20,000 generations or until
parent2 = the top performing chromosome;                        a solution with 0 bad edges is found. If a solution is not
                                                                found         after      20,000        generations         the
return parent1, parent2;                                        wisdomOfArtificialCrowds algorithm is run. The algorithm
                                                                used is a localized wisdom of crowds algorithm that only
Algorithm4: crossover                                           builds a consensus out of the violating edges in the best
define: crosspoint, parent1, parent2, child                     solution. Moreover, it uses the best half of the final
                                                                population to produce an aggregate solution. Only a
crosspoint = random point along a chromosome;                   localized consensus is generated so as not to produce a
child = colors up to and including crosspoint from parent 1 +   result that alters the correctly colored vertices. Also, it
colors after crosspoint to the end of the chromosome from       takes the best half because they share the most similarity
parent2;                                                        and thus will most likely be different at the level of the bad
                                                                edges rather than the good ones.
return child;
                                                                Algorithm7: wisdomOfArtificialCrowds
Algorithm5: mutation1:                                          define: aggregateChromosome, newColor, expertChromosomes;
define: chromosome, allColors, adjacentColors, validColors,
newColor;                                                       expertChromosomes = best half of the final population;
                                                                aggregateChromosome = best performing chromosome;
for each(vertex in chromosome) {
if (vertex has the same color as an adjacent vertex) {          for        each (vertex in graph) {
           adjacentColors = all adjacent colors;                   if (vertex is part of a bad edge) {
           validColors = allColors – adjacentColors;                   newColor = most used color for vertex             among
                                                                       expertChromosomes;
          newColor = random color from validColors;                    aggregateChromosome.setColor(vertex, newColor)
          chromosome.setColor(vertex, newColor)                    }
   }                                                            }
}
return chromosome;                                                                          Data
Algorithm6: mutation2:                                          Data used to test our approach are derived from the
define: chromosome, allColors                                   DIMACS benchmarking graph collection. DIMACS is the
                                                                Center for Discrete Mathematics and Theoretical Computer
for each(vertex in chromosome) {                                Science. It is part of Rutgers University (The State
   if (vertex has the same color as an adjacent vertex) {       University of New Jersey Rutgers, et al. 2011). The data
           newColor = random color from allColors;              are frequently used in challenges involving constraint
           chromosome.setColor(vertex, newColor)                satisfaction problems. The files used have a .col extension.
   }                                                            Each file contains a header with the number of vertices (p)
}                                                               and the number of edges (e):
return chromosome;
                                                                p edge 496 11654
A number of lines follow the header with each line            File             |V|   |E|     Expected   kmin   HPGAGCP   Time
denoting the connected edges and their vertex indices:                                       χ(G)              Result    (s)
                                                              myciel3.col      11    20      4          4      4         0.003
                                                              myciel4.col      23    71      5          5      5         0.006
e 1 100                                                       queen5_5.col     23    160     5          5      5         0.031
                                                              queen6_6.col     25    290     7          7      8         6.100
                                                              myciel5.col      36    236     6          6      6         0.014
16 files were chosen from the DIMACS collection. The          queen7_7.col     49    476     7          7      8         6.270
graphs the files represent vary in vertex count, edge count   queen8_8.col     64    728     9          9      10        47.482
and overall complexity. The vertex count ranges between       huck.col         74    301     11         11     11        0.015
                                                              jean.col         80    254     10         10     10        0.015
11 and 561 and the edge count ranges from 20 to 11654.        david.col        87    406     11         11     11        0.019
The rationale behind the selection of these graphs other      games120.col     120   638     9          9      9         0.027
                                                              miles250.col     128   387     8          8      8         0.076
than the wide range of variation is that there is a known     miles1000.col    128   3216    42         42     42        48.559
chromatic number for each of them, or at least a good         anna.col         138   493     11         11     11        0.058
approximation.                                                fpsol2.i.1.col   496   11654   65         65     65        22.656
                                                              homer.col        561   1629    13         13     13        0.760
                                                              Table 1: Results of running the proposed algorithm on 16 .col
The following files were used in this approach:
                                                              files from the DIMACS collection
(myciel3.col, myciel4.col, myciel5.col, queen5_5.col,
queen6_6.col, queen7_ 7.col, queen8_8.col, huck.col,
jean.col,   david.col.     games120.col,      miles250.col,   The following graphs plot the relationship between the
miles1000.col, anna.col, fpsol2.i.1.col, homer.col).          fitness and the generation for a sample set of the files used:

                          Results
Table 1 displays the following statistics for each file:

    •    The number of vertices |V|
    •    The number of edges |E|
    •    The expected chromatic number χ(G)
    •    The minimum number of colors used by this
         algorithm kmin
    •    The minimum number of colors used by a
         comparative publication using a Hybrid Parallel
         Genetic Algorithm (HPGAGCP)
    •    Average time it took to find a solution

The genetic algorithm was developed in Java utilizing JDK     Figure 2: Fitness score over the number of generations for
1.6 and JUNG (Java Universal Network/Graph) framework         myciel3.col
for graph visualization (Joshua O'Madadhain, Danyel
Fisher and Tom Nelson 2011). Performance plots were
generated using MATLAB R2010a.

The tests were run on a desktop PC with the following
specifications:

CPU: Intel Core i7 860 @2.8Ghz
RAM: 8 GB DDR3 @1333MHz




                                                              Figure 3: Fitness score over the number of generations for
                                                              queen5_5.col
Figures 2 and 3 are not very interesting since the solutions
are found after a few generations. The next plot, however,
is of particular interest since it clearly represents the erratic
behavior of the fitness score between the initial rapid drop
until a solution is ultimately found. In standard genetic
algorithms the fitness score continues to increase or
decrease (depending of the definition of better fitness)
until the end of the run. This is not the case here. This
factor plays a huge role in obtaining the global optimum
with a higher probability than without it as will be
discussed later.




                                                                    Figure 6: GCP solution for games120.col




Figure 4: Fitness score over the number of generations for
queen6.col

Sample solutions: (graphs were visualized using the JUNG
framework (Joshua O'Madadhain, Danyel Fisher and Tom
Nelson 2011)):




                                                                    Figure 7: GCP solution for fpsol2.i.col

                                                                                            Discussion
                                                                    During the design of this approach, the issue of designing
                                                                    good operators was a constant concern. The goal of any
                                                                    good operator is to bring the chromosomes of a population
                                                                    closer to the desired solution. However, during the process,
                                                                    a chromosome’s fitness often improves but eventually ends
                                                                    up in a local optimum. The overall design of this approach
Figure 5: GCP solution for myciel5.col
                                                                    aimed to improve fitness towards the global optimum while
                                                                    avoiding the pitfalls of local optima.
To achieve that, a number of factors need to be considered.     postprocessor was applied the best solution had 5
Initially, the crossover function is applied to parents that    conflicting edges. After application of the Wisdom of
result from the first parent selection method. This method      Artificial Crowds postprocessor the graph was colored
selects parents by conducting a small tournament between        slightly differently but still had 5 conflicting edges. It is
random pairs of chromosomes. Two pairs are chosen               worth noting that the postprocessor added an average of
randomly and the fitter of each pair becomes a parent.          250 ms to the overall process.
These fit parents are then used as input to this crossover
method. The crossover conducts a simple one-point               The test cases used were very useful in both testing and
crossover with the cross point being chosen at random.          tuning the algorithm. The algorithm was able to scale
The result of this crossover is then subjected to the first     across the different graphs and produce optimum solutions
mutation method. The mutation is carried out at a high rate     in each case. A recent 2011 publication presented a parallel
of 0.7. This mutation examines each vertex and if a vertex      genetic algorithm for the GCP (Reza Abbasian and Malek
violates the coloring constraint a valid color is chosen at     Mouhoub 2011). This paper used most of the same test
random.                                                         files that were used in Abbasian’s approach. That
                                                                approach’s proposed algorithm failed to solve three of the
This process is very effective in reducing the number of        graphs and only produced solutions when the color count
conflicts rapidly which can be seen in all the plots through    was increased by 1. The files representing these graphs are
an almost perpendicular drop in fitness. However, in spite      queen6_6.col, queen7_7.col and queen8_8.col. The
of this method’s effectiveness at increasing fitness rapidly,   algorithm used in our approach successfully solved these
it has the side effect of keeping the solution at a local       three files using the specified known chromatic number as
optimum.                                                        the number of colors used. Our approach is also generally
                                                                faster at finding a solution. It is faster in all cases except
To fix that, another parent selection and mutation method       four. Three of those cases are the three aforementioned
is introduced. The two methods are applied when the             files, which the comparative method did not succeed at
overall fitness of the best solution drops below 5 conflicts.   finding a solution using the known chromatic index. The
After that point crossover is no longer applied. The top        fourth case is miles1000.col.
performer is selected and is subjected to the second
mutation method. This method finds the conflicting                                     Conclusion
vertices and replaces their conflicting colors with random
colors; which could be invalid as well. This has the            The overarching algorithm used in this approach is a
potential to either find a globally optimum solution (i.e. 0    genetic algorithm with a subsequent Wisdom of Crowds
conflicts) or produce a solution that is worse! This can be     post-processor. Within the genetic algorithm itself is a set
observed by the erratic pattern in some of the graphs after     of operators that utilize methods from the genetic algorithm
the sharp descent and before the actual resolution of the       domain as well as applying various heuristics in a
problem.                                                        stochastic manner. The end result is an quick progressive
                                                                climb to the peak of the globally optimum solution.
This seemingly worsening fitness is not bad however. In
fact, it is partly due to this worsening of the fitness that    The algorithms described here can also be applied to the
some solutions are found at all! When the solution              various subsets of the general GCP. In particular, Sudoku
becomes worse the fitness score increases. This will force      can benefit from these algorithms where it can be
the algorithm back to using the first parent selection and      represented as a graph with 81 vertices that must be colored
mutation methods. But, the population now contains a            using no more than 9 different colors (i.e. different
solution that hadn’t been there before, which increases the     numbers).
possibility of reaching the global optimum. The continuous
back and forth between parent selection and mutation                                    References
methods plays a crucial role in shifting the solution from a
local optimum to a global optimum.                              Abbasian, Reza, and Mouhoub, Malek. 2011. An efficient
                                                                hierarchical parallel genetic algorithm for graph coloring
Finally, if a solution is not found after 20,000 generations,   problem. In Proceedings of The 13th annual conference on
a Wisdom of Artificial Crowds algorithm is applied to the       Genetic and evolutionary computation, 521-528. Dublin, Ireland:
resultant population to produce a better solution. The          ACM
genetic algorithm had been producing optimal results and
thus, per the algorithmic workflow, the Wisdom of
Artificial Crowds postprocessor wouldn’t be applied.            Ali, F. F., Nakao, Z., Tan, R. B., and Yen-Wei, Chen. 1999. An
However, in order to test its effectiveness, a test was         evolutionary approach for graph coloring. In Proceedings of The
conducted by decreasing the generation count to 10,000 to       International Conference on Systems, Man, and Cybernetics, 527-
intentionally produce a suboptimal solution. The test was       532. IEEE
carried out on the miles1000.col file. Before the
Ashby, Leif H., and Yampolskiy, Roman V. 2011, Genetic             Porumbel, Daniel Cosmin. 2009. Heuristic Algorithms and
Algorithm and Wisdom of Artificial Crowds Algorithm Applied        Learning Techniques: Applications to the Graph Coloring
to Light Up, The 16th International Conference on Computer         Problem, Thesis Draft, Département Informatique, Université
Games: AI, Animation, Mobile, Interactive Multimedia,              d'Angers
Educational & Serious Games, Louisville, KY, USA: July 27 –
30, 2011                                                           Rutgers, The State University of New Jersey, Research, AT&T
                                                                   Labs, Jersey, Cancer Institute of New, University, Princeton,
Back, T., Hammel, U., and Schwefel, H. P. 1997. Evolutionary       Labs, Alcatel-Lucent Bell, America, NEC Laboratories, and
computation: comments on the history and current state. IEEE       Sciences, Applied Communication. 2011. Center for Discrete
Transactions on Evolutionary Computation 1: 3-17                   Mathematics and Theoretical Computer Science (DIMACS). Nov.
                                                                   1, 2011. http://dimacs.rutgers.edu/
Croitoru, Cornelius, Luchian, Henri, Gheorghies, Ovidiu, and
Apetrei, Adriana. 2002. A New Genetic Graph Coloring               Shen, Justine W. 2003. Solving the Graph Coloring Problem
Heuristic. In Proceedings of The Computational Symposium on        using Genetic Programming, in Genetic Algorithms and Genetic
Graph Coloring and its Generalizations, 63-74. Ithaca, New         Programming at Stanford 2003: 187-196, Stanford Bookstore
York, USA
                                                                   Shengning, Wu, and Sikun, Li. 2007. Extending Traditional
Díaz, Isabel Méndez, and Zabala, Paula. 1999, A Generalization     Graph-Coloring Register Allocation Exploiting Meta-heuristics
of the Graph Coloring Problem, Departamento de Computacion,        for Embedded Systems. In Proceedings of The Third International
Universidad de Buenes Aires.                                       Conference on Natural Computation. ICNC, 324-329. Haikou,
                                                                   Hainan, China
Durrett, Greg, Médard, Muriel, and O’Reilly, Una-May. 2010. A
Genetic Algorithm to Minimize Chromatic Entropy: 59-70, eds.       Singha, S., Bhattacharya, T., and Chaudhuri, S. R. B. 2008. An
P. Cowling and P. Merz, Springer Berlin / Heidelberg               Approach for Reducing Crosstalk in Restricted Channel Routing
                                                                   Using Graph Coloring Problem and Genetic Algorithm. In
Eiben, A. E., Hinterding, R., and Michalewicz, Z. 1999.            Proceedings of The International Conference on Computer and
Parameter control in evolutionary algorithms. IEEE Transactions    Electrical Engineering, 807-811. Phuket Island, Thailand
on Evolutionary Computation 2: 124-141
                                                                   Srinivas, M., and Patnaik, L. M. 1994. Adaptive probabilities of
Glass, C. A., and Prugel-Bennett, A. 2003. Genetic algorithm for   crossover and mutation in genetic algorithms. IEEE Transactions
graph coloring: Exploration of Galinier and Hao's algorithm.       on Systems, Man and Cybernetics 4: 656-667
Journal of Combinatorial Optimization 3: 229-236
                                                                   Tagawa, K., Kanesige, K., Inoue, K., and Haneda, H. 1999.
Gwee, B. H., Lim, M. H., and Ho, J. S. 1993. Solving four-         Distance based hybrid genetic algorithm: an application for the
colouring map problem using genetic algorithm. In Proceedings      graph coloring problem. In Proceedings of Congress on
of First New Zealand International Two-Stream Conference on        Evolutionary Computation, 2332. Washington DC
Artificial Neural Networks and Expert Systems, 332-333. New
Zealand                                                            Yampolskiy, Roman V., and El-Barkouky, Ahmed. 2011.
                                                                   Wisdom of Artificial Crowds Algorithm for Solving NP-Hard
Hale, W. K. 1980. Frequency assignment: Theory and                 Problems. International Journal of Bio-Inspired Computation
applications. Proceedings of the IEEE 12: 1497-1514                (IJBIC) 6: 358-369

Han, Lixia, and Han, Zhanli. 2010. A Novel Bi-objective Genetic    Yi, Sheng Kung Michael, Steyvers, Mark, Lee, Michael D., and
Algorithm for the Graph Coloring Problem. In Proceedings of        Dry, Matthew. 2010a. Wisdom of the Crowds in Minimum
The 2010 Second International Conference on Computer               Spanning Tree Problems. In Proceedings of The 32nd Annual
Modeling and Simulation, 3-6. Sanya, China: IEEE Computer          Conference of the Cognitive Science Society. Portland, Oregon
Society
                                                                   Yi, Sheng Kung Michael, Steyvers, Mark, Lee, Michael D., and
Marx, Daniel, and Marx, D Aniel. 2004, Graph Coloring              Dry, Matthew J. 2010b. Wisdom of the Crowds in Traveling
Problems and Their Applications in Scheduling, John von            Salesman                                          Problems.
Neumann PhD Students Conference, Budapest, Hungary                 http://www.socsci.uci.edu/~mdlee/YiEtAl2010.pdf
O'Madadhain, Joshua, Fisher, Danyel, and Nelson, Tom. 2011,
JUNG: Java Universal Network/Graph framework, Oct. 1 2011,
http://jung.sourceforge.net/.