ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 179–186, http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 O. Trunda, R. Brunetto

Fitness landscape analysis of hyper-heuristic transforms for the vertex cover problem

Otakar Trunda and Robert Brunetto
Charles University in Prague, Faculty of Mathematics and Physics
Malostranské náměstí 25, Praha, Czech Republic
otakar.trunda@mff.cuni.cz, robert@brunetto.cz

Abstract: Hyper-heuristics have recently proved efficient in several areas of combinatorial search and optimization, especially scheduling. The basic idea of hyper-heuristics is to search for a search strategy: instead of traversing the solution space, the hyper-heuristic traverses a space of algorithms to find or construct an algorithm best suited for the given problem instance. The observed efficiency of hyper-heuristics is not yet fully explained on the theoretical level. The leading hypothesis suggests that the fitness landscape of the algorithm space is more favorable to local search techniques than that of the original space.

In this paper, we analyse properties of fitness landscapes of the minimal vertex cover problem. We focus on properties that are related to the efficiency of metaheuristics, such as locality and fitness-distance correlation. We compare properties of the original space and the algorithm space, trying to verify the hypothesis explaining hyper-heuristic performance. Our analysis shows that the hyper-heuristic space really has some more favorable properties than the original space.

1 Introduction

Hyper-heuristics are becoming more and more popular in the field of combinatorial search and optimization. They transfer the search process from the space of candidate solutions to a space of algorithms that create candidate solutions. Such an approach combines metaheuristic techniques with algorithm selection and has proved to be efficient on many domains [2]. Several theoretical results about hyper-heuristics have been established. For example: given a set of low-level algorithms, the hyper-heuristic combination of them can outperform each of those individuals on all domains. "The hyper-heuristic lunch is free [8]." However, the efficiency and inefficiency of hyper-heuristics on some domains is not yet fully understood.

In this paper, we explore the hypothesis that the efficiency of hyper-heuristics is caused by the fact that the space of algorithms is easier to explore for local search techniques than the original space. We use fitness landscape analysis to compare properties of the hyper-heuristic space with the original space and to determine which one is more suitable for local search. We work with the vertex cover problem as an example of a well-established and hard optimization problem.

In the next section, we provide some basic background on the vertex cover problem, fitness landscape analysis techniques and hyper-heuristics. The third section presents our design of a hyper-heuristic for the vertex cover. In the fourth section, we then define the spaces we analyse, and the results of the analyses are given in the fifth section. The paper is concluded by a discussion of the results and possible future work.

2 Background

2.1 Vertex cover problem

An undirected graph G is a tuple G = (V, E), where V = {v_1, v_2, ..., v_n}, E = {e_1, e_2, ..., e_m}, such that ∀e_i ∈ E, e_i = {a_i, b_i}, a_i, b_i ∈ V, a_i ≠ b_i (no loops), and ∀e_i, e_j ∈ E, i ≠ j ⟹ e_i ≠ e_j (no multiple edges). A set S ⊆ V is a vertex cover in G iff ∀e_i ∈ E, ∃s_j ∈ S such that s_j ∈ e_i.

An undirected vertex-labelled graph G is an undirected graph together with a cost function c : V → R+ that assigns positive costs to vertices. Given an undirected vertex-labelled graph G = (V, E), the minimal vertex cover problem deals with finding a set S ⊆ V minimizing ∑_{s_i ∈ S} c(s_i) under the condition that S is a vertex cover in G. From now on, by graph we will always mean an undirected vertex-labelled graph.

We use several other standard graph notions: for v_i ∈ V, by Γ(v_i) we denote the neighborhood of v_i, i.e. the set of all vertices that are connected to v_i by an edge: Γ(v_i) = {v_j ∈ V | ∃e ∈ E, e = {v_i, v_j}}. By deg(v_i) we denote the degree of the vertex, deg(v_i) = |Γ(v_i)|. A vertex v is called a leaf if its degree is 1. A graph is called regular if all its vertices have the same degree.
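To make the definitions concrete, here is a minimal Python sketch (our illustration, not code from the paper; the function names are ours) of the cover test and the objective to be minimized:

```python
# A minimal sketch of the definitions above: the cover property and the
# cost of a cover (names and representation are our own assumptions).

def is_vertex_cover(edges, S):
    """S covers G iff every edge has at least one endpoint in S."""
    return all(a in S or b in S for (a, b) in edges)

def cover_cost(cost, S):
    """Total weight of the cover: the sum of c(v) over v in S."""
    return sum(cost[v] for v in S)

# Tiny example: a path 1-2-3 with unit costs; {2} is a minimum cover.
edges = [(1, 2), (2, 3)]
cost = {1: 1, 2: 1, 3: 1}
assert is_vertex_cover(edges, {2})
assert not is_vertex_cover(edges, {1})
assert cover_cost(cost, {2}) == 1
```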
We forbid loops in the graph as they don't present any new challenge to the problem. Vertices with loops always have to be in the vertex cover, so by a linear preprocessing (removing all loopy vertices), we can reduce a problem with loops to an equivalent problem without loops.

We forbid multiple edges between the same pair of vertices as well. We could simply leave multiple edges in the graph, as they have no effect on the set of vertex covers nor on the optimality of the solutions, but they would change the degrees of vertices, which might lead our search algorithms astray.

Many real-life optimization problems from transportation, scheduling and operations research can be directly reduced to vertex cover. As for the complexity of the problem, vertex cover is a well-known NP-complete problem, a 2-approximation algorithm exists, and, using parametrized complexity, the best known optimal algorithm for the unweighted version runs in time O(kn + 1.274^k), where n is the number of vertices and k is the size of an optimal vertex cover [3]. In practical applications, heuristics are typically used [1].

2.2 Fitness landscape analysis

In the theory of local search algorithms, like evolutionary algorithms, hill-climbing, tabu-search and so on, fitness landscape analysis is used to assess the efficiency of an algorithm on a given problem. The theory of fitness landscapes can also be helpful when designing a new metaheuristic algorithm for a specific problem [9, 10].

A fitness landscape of an optimization problem is a set M together with a function f : M → R and a distance measure d : M × M → R+. M denotes the set of all candidate solutions, f is the fitness function which tells us how good a candidate solution is, and d measures the distance between candidate solutions. We work with combinatorial problems, so the set M will be discrete and finite.

In Figure 1, there is an example of a fitness landscape of a small instance of the Travelling Salesman Problem [11]. The tree in the upper part enumerates the set of all permutations, which constitutes the set M. Beneath it, there is a graph of the fitness function for each point of M (the length of the tour corresponding to each permutation). The points of M are arranged linearly by their lexicographical order, which induces the distance measure and structure. The marked point corresponds to the solution shown in Figure 2, which is a local optimum.

Figure 1: Example of a fitness landscape of a small TSP

Figure 2: One of the solutions of the TSP example

The measure d is typically not given explicitly; instead, it is derived from the search operators that the algorithm uses (for example, a mutation operator used by genetic algorithms). Formally: a unary search operator p is a map p : M → 2^M, where p(m) is the set of all possible modifications of m, i.e. the set of possible results of the operator. When applying the operator p to a point m ∈ M, the algorithm replaces m by one of the points from p(m). The point can be selected randomly, or by some criterion, e.g. the minimum in the case of greedy operators. The algorithm repeatedly applies operators to move from one candidate solution to another.
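As a concrete illustration of a unary search operator on the cover space (ours; the operator and all names are hypothetical), the following sketch returns all covers obtained from a cover S by removing one redundant vertex:

```python
# Sketch of a unary search operator p : M -> 2^M on the space of vertex
# covers (our illustration). p(S) is the set of all covers obtained from
# S by deleting a single vertex whose removal keeps the cover property.

def remove_vertex_operator(edges, S):
    results = []
    for v in S:
        candidate = S - {v}
        if all(a in candidate or b in candidate for (a, b) in edges):
            results.append(candidate)
    return results

edges = [(1, 2), (2, 3)]
S = {1, 2, 3}                      # the trivial cover: all vertices
neighbours = remove_vertex_operator(edges, S)
# Each returned cover lies within distance 1 of S under the induced d_P.
```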
The role of the distance measure is to introduce a structure on the set M. For the purposes of analysis, it is especially important to define neighborhoods of points. For m_i ∈ M, we denote the neighborhood of m_i as N(m_i). It is the set of all points from M that are close to m_i.

If d is given explicitly, then N can be defined as N(m_i) = {m_j ∈ M | d(m_i, m_j) ≤ ε}. Typically, ε is taken to be 1. In the much more common case when we are given a set of search operators P instead of d, the neighborhood is defined as N_P(m_i) = {m_j ∈ M | ∃p ∈ P, m_j ∈ p(m_i)}, i.e. the set of all solutions reachable from m_i by an application of a single operator. The operators also induce a distance measure: d_P(m_i, m_j) is the minimum number of applications of operators from P needed to move from m_i to m_j. Formally:

1. d_P(m_i, m_i) = 0
2. d_P(m_i, m_j) = (min_{m_k ∈ N_P(m_i)} d_P(m_k, m_j)) + 1
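The recursion above is exactly the shortest-path distance in the graph whose arcs are given by N_P, so d_P can be computed by breadth-first search. A minimal sketch (ours), assuming M is small enough to enumerate and that `neighbours(m)` yields N_P(m):

```python
from collections import deque

# d_P(start, target): shortest-path distance under the operator-induced
# neighbourhood N_P. `neighbours` is a user-supplied function mapping a
# point m to an iterable over N_P(m). A sketch for small, enumerable M.

def operator_distance(start, target, neighbours):
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        m, dist = queue.popleft()
        for nxt in neighbours(m):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # target not reachable by the operators
```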
In the following text, we will use a few simple notions from function analysis and the theory of search. Given a function f : M → R+ that we want to minimize, we say:

• x* ∈ M is an optimal solution (or global optimum) iff ∀m ∈ M : f(x*) ≤ f(m)

• x ∈ M is a local optimum iff ∀m ∈ N(x) : f(x) ≤ f(m)

• if x is not an optimal solution, then the escape distance of x is ed(x) = min_{m ∈ M, f(m) < f(x)} d(x, m), i.e. the distance to the nearest point with a strictly lower fitness value

Fitness landscape analysis studies properties of fitness landscapes that are related to the performance of local search algorithms. For example, a smooth and globally convex function with a single local optimum is much easier to deal with than a highly rugged function with many local optima. Several methods for analyzing fitness landscapes have been developed (a computational sketch of the two correlation metrics follows this list):

• Size of M: smaller |M| leads to higher performance and vice versa.

• Number of local optima: many local-search algorithms are attracted to local optima. A higher number of local optima therefore slows down the convergence to the global optimum.

• Average size of neighborhood: a large neighborhood decreases the number of local optima, but dramatically increases the time that the algorithm needs to traverse the space, and operators with a large range of values are close to random search. It is therefore important to keep the neighborhoods small. For example, if all points of M are neighbors, then there are no local optima except the global ones. On the other hand, if neighborhoods are empty, i.e., no pair of points are neighbors, then every point is a local optimum. There is therefore a trade-off between the size of neighborhoods and the number of local optima.

• Fitness-distance correlation (FDC): even with only one local optimum, the algorithm might not be able to find it if it is surrounded by points with high fitness values. It is therefore important that the optimal solutions are surrounded by low-fitness points and are far from high-valued points. FDC measures how the distance between points corresponds to the difference between their fitness values. Ideally, there should be a strong positive correlation, i.e. the further from the nearest global optimum a point is, the higher its fitness value should be. Formally, the FDC is computed as FDC = c_fd / (ρ(f) · ρ(d_opt)), where c_fd is the covariance of f and d_opt, c_fd = (1/|M|) ∑_{i=1}^{|M|} (f(m_i) − f̄)(d_{m_i,opt} − d̄_opt), f̄ is the average fitness value over the whole M, d_{m_i,opt} is the distance between m_i and the nearest optimal solution, and d̄_opt is the average of d_{m_i,opt} over the whole M. ρ(f) and ρ(d_opt) are the standard deviations of f and d_opt, respectively. FDC lies in the range [−1, 1]. In the ideal case, where f(m_i) = d_{m_i,opt}, FDC = 1; for a random function, FDC is close to 0; and FDC = −1 means that the optimal solution is "hidden" among high-valued points.

• Ruggedness: a rugged function is the opposite of a smooth one - it is erratic, with large differences in fitness values between nearby points. Ruggedness is computed as R = (⟨f(x)f(y)⟩_{d(x,y)=1} − f̄²) / (⟨f²⟩ − f̄²), where ⟨·⟩ denotes the average over all values (in the numerator, over all pairs of neighbouring points x, y with d(x, y) = 1) [7]. R is always in [−1, 1]: a value of 1 indicates a constant function, 0 means that the values of neighbouring points are independent, and −1 indicates that neighbouring points have opposite values (i.e. every point is a local extreme). Note that a higher ruggedness coefficient actually denotes a smoother function, which is favorable for local search.

• Average escape distance: a low escape distance allows the algorithm to quickly find a point with a better fitness value. Ideally, ed(m) should be constant for each m ∈ M. We only compute the average of ed(m) over local optima, since for points which aren't locally optimal, the escape distance is always 1.
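The sketch below (ours) computes the two correlation metrics on a fully enumerated landscape; `points`, `f`, `d_opt` and `neighbour_pairs` are assumed inputs describing M, the fitness, the distance to the nearest global optimum, and all pairs at distance 1:

```python
import math

# FDC and ruggedness R on an enumerated landscape, following the
# formulas above (a sketch; input representation is our assumption).

def fdc(points, f, d_opt):
    n = len(points)
    f_bar = sum(f(m) for m in points) / n
    d_bar = sum(d_opt(m) for m in points) / n
    cov = sum((f(m) - f_bar) * (d_opt(m) - d_bar) for m in points) / n
    sd_f = math.sqrt(sum((f(m) - f_bar) ** 2 for m in points) / n)
    sd_d = math.sqrt(sum((d_opt(m) - d_bar) ** 2 for m in points) / n)
    return cov / (sd_f * sd_d)

def ruggedness(points, f, neighbour_pairs):
    n = len(points)
    f_bar = sum(f(m) for m in points) / n
    f2_bar = sum(f(m) ** 2 for m in points) / n
    ff_bar = sum(f(x) * f(y) for x, y in neighbour_pairs) / len(neighbour_pairs)
    return (ff_bar - f_bar ** 2) / (f2_bar - f_bar ** 2)
```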
The landscape-analysis techniques have to traverse the whole search space of the problem multiple times. As such search spaces are typically very large, the analysis can only be done on small instances. There are techniques that allow estimating some properties even in large spaces by sampling, but we won't be using those here.

Note that our main purpose here is not to actually solve large instances of vertex cover, but rather to verify the hypothesis that hyper-heuristic spaces might have very different properties and might be much more favorable for local search techniques than the original space.

2.3 Hyper-heuristics

Hyper-heuristics represent a new approach to search and optimization which combines metaheuristics, automated parameter tuning, algorithm selection and genetic programming. Nowadays, algorithm selection approaches are becoming more and more popular in combinatorial optimization, as it is clear that no single algorithm can outperform every other on all domains, and therefore the most suitable algorithm has to be selected for the problem at hand [4, 5].

Hyper-heuristics try to build an algorithm suited for the given task by combining so-called low-level algorithms during the search. A pool of simple algorithms is given, and the hyper-heuristic combines them into more complex units. The combination procedure is often based on some kind of evolutionary computation, and the quality of the resulting units is measured by how well they can solve the original task. The approach was especially successful in the area of scheduling, and it is now applied to many other kinds of problems [2].

Instead of searching the solution space directly, hyper-heuristics search the algorithm space. The approach is based on the assumption that similar algorithms will find solutions of similar quality (not necessarily close to each other), which implies that the algorithm space has some favorable properties: high locality, i.e., elements close to each other have similar evaluation, and a low number of local extrema. See Figure 3. On spaces with those properties, optimization metaheuristics can find good solutions quickly. Of course, such improvements come at a price: evaluating an element of the algorithm space takes much more time, because it involves searching for a solution in the solution space.

Figure 3: The hypothesized schema of a hyper-heuristic transformation

Consider the following example: we want to color a graph with the fewest colors possible. Using a genetic algorithm, we could come up with a metaheuristic that works on the set of all possible assignments of colors to vertices, where the fitness function penalizes violations (connected vertices having the same color) and a high number of colors used. Such an algorithm might converge to the globally optimal solution, but it would take a long time.

We could also use a greedy algorithm which works as follows: it picks a vertex from the graph according to some criterion and colors it with the lowest color possible, according to the colors of its neighbours. Then it picks another vertex, and so on until the graph is colored. The greedy algorithm is very fast, but in most cases it won't converge to the global optimum.

There are several vertex-picking criteria, for example:

• Largest degree (L) - picks a vertex with the largest number of neighbours

• Saturation degree (S) - picks a vertex whose neighbours are colored with the largest number of different colors

• Incidence degree (I) - picks a vertex with the largest number of colored neighbours

Based on these three low-level greedy algorithms, a hyper-heuristic would search the space of all combinations of them (the algorithm space). On a graph with 5 vertices, the algorithm space would consist of all sequences of length 4 containing the symbols L, S and I.
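A minimal sketch (ours) of this greedy colouring scheme with the three criteria; the implementation details (adjacency dictionary, tie-breaking via Python's max) are our assumptions, not the paper's:

```python
# Greedy graph colouring with pluggable vertex-picking criteria.
# adj maps each vertex to the set of its neighbours.

def lowest_free_colour(adj, colours, v):
    used = {colours[w] for w in adj[v] if w in colours}
    c = 0
    while c in used:
        c += 1
    return c

def greedy_colouring(adj, pick):
    colours = {}
    while len(colours) < len(adj):
        uncoloured = [v for v in adj if v not in colours]
        v = pick(adj, colours, uncoloured)
        colours[v] = lowest_free_colour(adj, colours, v)
    return colours

# The three criteria of the example (L, S, I):
def largest_degree(adj, colours, unc):        # L
    return max(unc, key=lambda v: len(adj[v]))

def saturation_degree(adj, colours, unc):     # S
    return max(unc, key=lambda v: len({colours[w] for w in adj[v] if w in colours}))

def incidence_degree(adj, colours, unc):      # I
    return max(unc, key=lambda v: sum(1 for w in adj[v] if w in colours))

# e.g. greedy_colouring({0: {1, 2}, 1: {0}, 2: {0}}, largest_degree)
```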
To evaluate a point from the algorithm space, the hyper-heuristic uses the corresponding algorithm to find a solution, then evaluates the solution by the original fitness function and uses that value to evaluate the algorithm. For example, a point ILS (from the algorithm space) is evaluated as follows:

1. Select a vertex based on the I criterion and color it with the lowest possible color

2. Select a vertex based on the L criterion and color it

3. Select a vertex based on the S criterion and color it

4. Color the last vertex

5. Compute the total number of colors used and use it as the fitness value of the point (ILS)

The hyper-heuristic might also be based on an evolutionary algorithm, which would in this case work on the algorithm space. This is popularly summarized as: heuristics select moves, hyper-heuristics select heuristics. The hyper-heuristic might be able to find an optimal solution, and more importantly, it might be able to find high-quality solutions much faster than the former approach.

3 Hyper-heuristic for minimal vertex cover

We have designed a hyper-heuristic for the vertex cover in a similar manner as in the example mentioned earlier. We use a pool of simple greedy algorithms, and the hyper-heuristic combines them in order to solve the problem. The low-level algorithms work sequentially as described in Algorithm 1.

Algorithm 1 Greedy vertex cover
1: S ← ∅
2: while G contains some edge do
3:   v ← select vertex according to some criterion
4:   S ← S ∪ {v}
5:   remove v from the graph (with incident edges)
6: return S

As the vertex-selection criteria, we use the following:

• Lightest (W): selects the vertex with minimal weight (cost)

• Deg (G): selects the vertex with maximal degree

• Sub (S): selects the vertex v with maximal [(sum of costs of neighbors) − cost of v], i.e. selected = argmax_{v ∈ V} (∑_{w ∈ Γ(v)} c(w) − c(v))

• WDeg (G): selects the vertex with minimal weight/degree: selected = argmin_{v ∈ V} (c(v)/deg(v))

• Div (D): selects the vertex v with minimal ratio of the cost of v to the sum of costs of its neighbors. Formally, selected = argmin_{v ∈ V} (c(v) / ∑_{w ∈ Γ(v)} c(w))

• Leaf (L): selects the vertex v with minimal difference between the cost of v and the sum of costs of the neighbors of v that are leaves, i.e. selected = argmin_{v ∈ V} (c(v) − ∑_{w ∈ Γ(v), w is a leaf} c(w))

• Neig (N): selects the vertex with maximal sum of costs of neighbors

• Next (X): selects the vertex which is next in order after the vertex selected by (W) (we used this as an example of a non-greedy, chaotic algorithm)

We break ties simply by the ordinal numbers of vertices, i.e., if more than one vertex achieves the optimal value of the criterion, we select among them the one with the lowest number.

The space of algorithms consists of all sequences of letters W, G, S, D, L (corresponding to selection criteria) of a fixed length. The i-th symbol in the sequence determines the criterion by which the i-th vertex is selected. If a valid vertex cover is found before all symbols are used, the rest of the sequence is ignored. In the much more frequent case when we have already traversed the whole sequence but there are still edges in the graph, we use two strategies: (i) start over and read symbols from the beginning of the sequence (repeat strategy), and (ii) use the last symbol in the sequence for as long as there are edges in the graph (last strategy).
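The following sketch (ours) renders Algorithm 1 in Python with a pluggable selection criterion and evaluates a sequence point under the last strategy; only the W and G criteria are shown, and all names are our own, not the paper's:

```python
import itertools

# Algorithm 1 with a pluggable criterion, plus sequence evaluation.

def greedy_cover(edges, cost, criteria):
    """criteria: (possibly infinite) iterator of selection functions."""
    edges = set(edges)
    S = set()
    for pick in criteria:
        if not edges:
            break
        live = {v for e in edges for v in e}          # vertices with edges
        v = pick(live, edges, cost)
        S.add(v)                                      # S <- S u {v}
        edges = {e for e in edges if v not in e}      # drop incident edges
    return S

def lightest(live, edges, cost):                      # W: minimal weight;
    return min(live, key=lambda v: (cost[v], v))      # ties -> lowest number

def max_degree(live, edges, cost):                    # G: maximal degree
    deg = {v: sum(1 for e in edges if v in e) for v in live}
    return max(live, key=lambda v: (deg[v], -v))      # ties -> lowest number

SYMBOLS = {"W": lightest, "G": max_degree}

def evaluate(seq, edges, cost):
    """Fitness of a sequence point under the `last` strategy: after the
    sequence is exhausted, its last symbol is applied until G is covered.
    The `repeat` strategy would use itertools.cycle instead."""
    crits = itertools.chain((SYMBOLS[s] for s in seq),
                            itertools.repeat(SYMBOLS[seq[-1]]))
    S = greedy_cover(edges, cost, crits)
    return sum(cost[v] for v in S)
```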
4 Analysis of the fitness landscapes

The hyper-heuristic can be viewed as a transformation of the search space. We will now describe properties of the original and transformed spaces. We use the notation of fitness landscapes as defined earlier.

4.1 Original space

• M: the set of all vertex covers (not necessarily minimal with respect to inclusion)

• f: the total weight of the vertex cover - the sum of costs of all vertices in the cover

• d: the distance between vertex covers is measured as the number of elements on which the two covers differ. Formally, for S_1, S_2 ⊆ V: d(S_1, S_2) = |(S_1 \ S_2) ∪ (S_2 \ S_1)|

The distance measure is related to the search operators add vertex and remove vertex. One application of such an operator creates a solution within distance 1 of the original point.

4.2 Hyper-heuristic space

• M: the set of all fixed-length sequences of symbols corresponding to low-level heuristics

• f: the total weight of the vertex cover found by the application of the sequence

• d: the distance between sequences is measured by the Levenshtein distance, which is the minimal number of operations add symbol, remove symbol and replace symbol needed to transform the first sequence into the second.

This distance measure is intuitively related to the search operators add symbol, remove symbol and replace symbol.

We distinguish the hyper-spaces by the set of low-level algorithms that are used. We do not always use all the algorithms, partially because of the high computational cost and partially because on some kinds of graphs the criteria degenerate. For example, on a uniformly-weighted graph, the algorithm Neig works exactly the same as Deg, so it's not worth using both of them. Furthermore, we would like to be able to assess the performance of single algorithms, pairs, 3-tuples, 4-tuples and so on.

We work with these types of spaces: the original space (denoted Cover) and hyper-spaces with a specific set of low-level algorithms L and a fixed length of sequences k (denoted H_{L,k}). We distinguish the two alternatives mentioned earlier: (i) a space where, during the evaluation, the sequence is applied repeatedly from the beginning until a valid cover is found (denoted H_{L,k}^{repeat}), and (ii) a space where, during the evaluation, after reaching the end of the sequence, the last symbol is applied repeatedly until a valid cover is found (denoted H_{L,k}^{last}).

5 Experiments

We generated a set of graphs, constructed the corresponding spaces and computed fitness landscape metrics for those spaces. We used graphs of various sizes, densities and types. For each type of graph, we generated about 30–40 instances and averaged the results.

We used graphs of sizes n = 16–45 vertices (not all values from the range were used). The densities were 0.1, 0.2, 0.3, 0.4, 0.5 and 0.7. A density of c means that there were (c/2) · n · (n − 1) edges in the graph. We used two policies to add edges - uniformly randomly (select a random pair of vertices and add an edge, until the desired number of edges is reached) and regularly (add edges preferentially to vertices with low degrees, to create a near-regular graph). Weights of vertices are integers taken uniformly from the ranges 50–50, 50–60, 50–100 and 50–1000.
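A sketch (ours) of the uniform edge-generation policy just described; the regular policy would differ only in how the endpoint pair is drawn:

```python
import random

# Uniform random graph with density c: (c/2) * n * (n-1) edges and
# integer weights drawn uniformly from [wmin, wmax] (a sketch; names
# and the fixed seed are our assumptions).

def random_graph(n, density, wmin, wmax, seed=0):
    rng = random.Random(seed)
    target = round(density / 2 * n * (n - 1))
    edges = set()
    while len(edges) < target:
        a, b = rng.sample(range(n), 2)       # distinct endpoints: no loops
        edges.add((min(a, b), max(a, b)))    # canonical form: no multi-edges
    cost = {v: rng.randint(wmin, wmax) for v in range(n)}
    return edges, cost

edges, cost = random_graph(n=16, density=0.3, wmin=50, wmax=100)
```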
In total, we generated over 24 000 graphs; roughly half of them were near-regular and half of them used the uniform edge distribution. For each graph, we constructed the vertex-cover space and 20 hyper-spaces (for various combinations of parameters - H_{L,k}^{last} and H_{L,k}^{repeat} with different sets L). The experiments ran on 11 computers with 8 cores each, for 20 hours on each.

We present graphs comparing various metrics between spaces. On the x-axis, there is the number of vertices in the graph; the y-axis shows values of the particular metric. We plot several color-distinguished series in the same graph; each series represents one space. The space is described in the legend by an enumeration of the low-level algorithms used. An exclamation mark at the end denotes the last strategy; no mark means that the repeat strategy is used. The word cover denotes the original space of all vertex covers.

We use a normalization so that different columns are of a similar magnitude and can be depicted in the same graph: instead of plotting the criterion directly, we plot the criterion divided by the number of vertices. With this normalization, values related to different numbers of vertices are not directly comparable, but values related to the same number of vertices are, which is good enough for our purposes. In effect, we plot the weight of the cover divided among the vertices.

We plot averages over all generated graphs, grouped into categories by the number of vertices. We tried to distinguish the results further by edge-generation policy (regular graphs vs. random graphs), edge density and weight distribution. In most cases, such further distinction didn't provide any new information - the results were very similar in each of the smaller categories.

Last vs. repeat strategy. First, we compared the two ways of evaluating the sequences - repeating the whole sequence from the beginning and repeating only the last symbol, i.e. the spaces H_{L,k}^{repeat} and H_{L,k}^{last}. We believe that different algorithms should be used at the beginning of the search than at the end, so the space H_{L,k}^{last} should contain better solutions. We measure the value of the global optimum in each space. The graph is shown in Figure 4. H_{L,k}^{last} is slightly better than H_{L,k}^{repeat} for all L.

Figure 4: Best solutions in H_{L,k}^{repeat} and H_{L,k}^{last}

Solution quality in spaces. Hyper-spaces don't contain all the vertex covers; they only contain some of them. For example, the set of all vertices is a valid vertex cover, but it is never generated by any algorithm sequence. On the other hand, some vertex covers might be generated by many different sequences of algorithms. To ease the search, the hyper-space should contain an algorithm that can generate an optimal solution; there should be a large number of different sequences that generate high-quality solutions, and very few or no sequences that generate low-quality solutions.

To assess the quality of solutions generated by points in the hyper-space, we created a histogram of the quality of all solutions from the original space and a hyper-space. It is depicted in Figure 5. Red columns come from the original space and yellow ones come from one of the hyper-spaces. The hyper-space is smaller in size, so the total volume of yellow is smaller. Solutions in the original space seem to follow a normal distribution, while in the hyper-space, most of the points generate near-optimal or near-average solutions. The behavior can be explained as follows: high-quality solutions are generated due to the greediness of the selection criteria, and average-quality solutions are generated since their number in the original space is very high.

Figure 5: Histogram of distribution of solutions based on their quality in the two spaces.

We also monitor the average value of solutions in each space. Results are shown in Figure 6. All three hyper-spaces show a significantly lower average value than the original space. (I.e., even a blind random search should provide better solutions in the hyper-space than in the original space.)

Figure 6: Average quality of solutions in spaces

Combination of algorithms. We test the hypothesis that a combination of low-level algorithms can outperform each of the individual algorithms. Figure 7 shows the quality of the optimal solution in several hyper-spaces. Min denotes the best solution that can be found by repeated application of just one low-level algorithm. Other colors correspond to combinations.
For all graph sizes, the quality of the optimal solution is better for most combinations of low-level algorithms. This result supports the hypothesis and suggests that the hyper-heuristic might be worth using even with its overhead (evaluating points in the hyper-space is costly). In most cases, however, the Cover space is better than the hyper-heuristic spaces, so the transformation leads to a loss of the optimal solution.

Figure 7: Best solutions of single algorithms and combinations

Fitness-distance correlation. Figure 8 shows the FDC for several spaces. In the original space, the FDC decreases with the number of vertices, while in the hyper-spaces it stays high regardless of the size of the graph. (Note that a higher FDC is better for local search algorithms.) This graph is shown without normalization, as the FDC is naturally in [−1, 1]. The remaining graphs are also shown without normalization, since it is not needed.

Figure 8: FDC of spaces

Ruggedness. Figure 9 shows the ruggedness of our spaces. Most of the hyper-spaces have higher ruggedness than the original space, which is again better for local search algorithms. The ruggedness seems to be independent of the size of the graph.

Figure 9: Ruggedness of spaces

Escape distance. Figure 10 shows the average escape distance (ED) of some of our spaces. The ED is systematically lower in all hyper-spaces. The difference might not seem large, but note that the amount of work needed to escape a local minimum is of the order (number of neighbors)^ED, so any savings in the exponent are significant.

Figure 10: Escape distance from local optima

Number of local optima. Figure 11 shows the frequency of local minima in our spaces (i.e. the number of local minima relative to the number of all nodes). The result is much better in the original space than in the hyper-spaces. This is partially caused by the fact that hyper-spaces contain many points that have the same fitness value, and we define local minima as non-strict - i.e. points on plateaus are all considered local minima.

Figure 11: Frequency of local optima
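For completeness, a sketch (ours) of the non-strict local-minimum count used for this metric; `points`, `f` and `neighbours` are assumed inputs as in the earlier sketches:

```python
# Frequency of non-strict local minima: every point of a plateau counts,
# matching the definition above (a sketch over an enumerated space).

def local_minima_frequency(points, f, neighbours):
    """Fraction of points m with f(m) <= f(m') for all neighbours m'."""
    count = 0
    for m in points:
        if all(f(m) <= f(nxt) for nxt in neighbours(m)):
            count += 1
    return count / len(points)
```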
6 Conclusions and future work

We presented a hyper-heuristic transformation for the vertex cover problem and constructed several variants of hyper-heuristic spaces using a set of greedy algorithms. We measured several important features of the original space and the hyper-spaces, and by comparing them we have experimentally verified the hyper-heuristic hypothesis that tries to explain the efficiency of hyper-heuristics in practice by properties of the hyper-space. Our results can be summarized as follows:

1. A combination of greedy algorithms is in most cases much better than using a single algorithm all the time.

2. Only a small part of the original space can be generated from the hyper-space. Most high-quality solutions can be generated, but sometimes we lose the optimal solution by the transformation.

3. The transformation improves the fitness-distance correlation, the escape distance and the average solution quality. It has some positive effect on ruggedness as well, but it increases the frequency of local optima in the space.

4. There were no significant differences between various types of graphs, e.g. regular vs. random. We believe that this is caused by the small size of our test graphs.

As future work, we would like to continue the analysis using more landscape analysis techniques, such as neutrality [6], decomposability etc. Also, we would like to add more low-level algorithms into the pool, especially less greedy ones. The greediness of the algorithms causes only a very small portion of vertex covers to be reachable from the hyper-space. In some cases, more than 50% of all algorithms in the space generated the same vertex cover. Such a small range of unique fitness values then creates large plateaus, which contribute to an illusion of high locality and fine ruggedness. It is also responsible for the large number of local extrema that we observed in the hyper-spaces.

It would also be beneficial to actually run some search algorithm on the hyper-space (for some large instances) and show that the hyper-heuristic really works in practice and is at least comparable to metaheuristics working directly in the solution space.

Acknowledgement

The research is supported by the Grant Agency of Charles University under contract no. 390214 and it is also supported by SVV project number 260 104. We would like to thank the anonymous reviewers for their useful comments and suggestions.

References

[1] Eric Angel, Romain Campigotto, and Christian Laforest. Implementation and comparison of heuristics for the vertex cover problem on huge graphs. In 11th International Symposium, SEA 2012, volume 7276 of Lecture Notes in Computer Science, pages 39–50, Bordeaux, France, June 2012.

[2] E. K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Ozcan, and R. Qu. Hyper-heuristics: a survey of the state of the art. Journal of the Operational Research Society, 64(12):1695–1724, Dec 2013.

[3] Jianer Chen, Iyad A. Kanj, and Ge Xia. Improved parameterized upper bounds for vertex cover, pages 238–249. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[4] Youssef Hamadi, Eric Monfroy, and Frédéric Saubion. Autonomous search. Springer-Verlag, 2012.

[5] R. Kumar, S. K. Singh, and V. Kumar. A heuristic approach for search engine selection in meta-search engine. In Computing, Communication & Automation (ICCCA), 2015 International Conference on, pages 865–869, May 2015.

[6] Marie-Eléonore Marmion, Clarisse Dhaenens, Laetitia Jourdan, Arnaud Liefooghe, and Sébastien Vérel. On the neutrality of flowshop scheduling fitness landscapes. CoRR, abs/1207.4629, 2012.

[7] Peter Merz and Bernd Freisleben. Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning. Evolutionary Computation, 8(1):61–91, March 2000.

[8] Riccardo Poli and Mario Graff. There is a free lunch for hyper-heuristics, genetic programming and computer scientists. In Genetic Programming: 12th European Conference, EuroGP 2009, Tübingen, Germany, April 15–17, 2009, Proceedings, pages 195–207. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.

[9] Franz Rothlauf. Representations for genetic and evolutionary algorithms (2nd ed.). Springer, 2006.

[10] Franz Rothlauf. Design of modern heuristics. Natural Computing Series. Springer, 2011.

[11] Mohammad-H. Tayarani-N. and Adam Prügel-Bennett. An analysis of the fitness landscape of travelling salesman problem. Evolutionary Computation, 24(2):347–384, Jun 2015.