ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 179–186, http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 O. Trunda, R. Brunetto

Fitness landscape analysis of hyper-heuristic transforms for the vertex cover problem

Otakar Trunda and Robert Brunetto
Charles University in Prague, Faculty of Mathematics and Physics
Malostranské náměstí 25, Praha, Czech Republic
otakar.trunda@mff.cuni.cz, robert@brunetto.cz

Abstract: Hyper-heuristics have recently proved efficient in several areas of combinatorial search and optimization, especially scheduling. The basic idea of hyper-heuristics is to search for a search strategy: instead of traversing the solution space, the hyper-heuristic traverses a space of algorithms to find or construct an algorithm best suited for the given problem instance. The observed efficiency of hyper-heuristics is not yet fully explained on the theoretical level. The leading hypothesis suggests that the fitness landscape of the algorithm space is more favorable to local search techniques than that of the original space.

In this paper, we analyse properties of fitness landscapes of the minimal vertex cover problem. We focus on properties that are related to the efficiency of metaheuristics, such as locality and fitness-distance correlation. We compare properties of the original space and the algorithm space, trying to verify the hypothesis explaining hyper-heuristic performance. Our analysis shows that the hyper-heuristic space really has some more favorable properties than the original space.

1 Introduction

Hyper-heuristics are becoming more and more popular in the field of combinatorial search and optimization. They transfer the search process from the space of candidate solutions to a space of algorithms that create candidate solutions. Such an approach combines metaheuristic techniques with algorithm selection and has proved to be efficient on many domains [2]. Several theoretical results about hyper-heuristics have been established. For example: given a set of low-level algorithms, the hyper-heuristic combination of them can outperform each of those individuals on all domains. "The hyper-heuristic lunch is free [8]." However, the efficiency and inefficiency of hyper-heuristics on some domains is not yet fully understood.

In this paper, we explore the hypothesis that the efficiency of hyper-heuristics is caused by the fact that the space of algorithms is easier to explore for local search techniques than the original space. We use fitness landscape analysis to compare properties of the hyper-heuristic space with the original space and to determine which one is more suitable for local search. We work with the vertex cover problem as an example of a well-established and hard optimization problem.

In the next section, we provide some basic background on the vertex cover problem, fitness landscape analysis techniques and hyper-heuristics. The third section presents our design of a hyper-heuristic for the vertex cover. In the fourth section, we then define the spaces we analyse, and the results of the analyses are given in the fifth section. The paper is concluded by a discussion of the results and possible future work.

2 Background

2.1 Vertex cover problem

An undirected graph G is a tuple G = (V, E), where V = {v_1, v_2, ..., v_n}, E = {e_1, e_2, ..., e_m}, such that ∀e_i ∈ E, e_i = {a_i, b_i}, a_i, b_i ∈ V, a_i ≠ b_i (no loops), and ∀e_i, e_j ∈ E, i ≠ j ⟹ e_i ≠ e_j (no multiple edges). A set S ⊆ V is a vertex cover in G iff ∀e_i ∈ E, ∃s_j ∈ S such that s_j ∈ e_i.

An undirected vertex-labelled graph G is an undirected graph together with a cost function c : V → R+ that assigns positive costs to vertices. Given an undirected vertex-labelled graph G = (V, E), the minimal vertex cover problem deals with finding a set S ⊆ V minimizing ∑_{s_i ∈ S} c(s_i) under the condition that S is a vertex cover in G. From now on, by graph we will always mean an undirected vertex-labelled graph.

We use several other standard graph notions: for v_i ∈ V, by Γ(v_i) we denote the neighborhood of v_i, i.e. the set of all vertices that are connected to v_i by an edge: Γ(v_i) = {v_j ∈ V | ∃e ∈ E, e = {v_i, v_j}}. By deg(v_i) we denote the degree of the vertex, deg(v_i) = |Γ(v_i)|. A vertex v is called a leaf if its degree is 1. A graph is called regular if all its vertices have the same degree.
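To make the definitions concrete, here is a minimal Python sketch (our illustration, not code from the paper; the function names are ours) of the cover test and the objective to be minimized:

```python
# A minimal sketch of the definitions above: the cover property and the
# cost of a cover (names and representation are our own assumptions).

def is_vertex_cover(edges, S):
    """S covers G iff every edge has at least one endpoint in S."""
    return all(a in S or b in S for (a, b) in edges)

def cover_cost(cost, S):
    """Total weight of the cover: the sum of c(v) over v in S."""
    return sum(cost[v] for v in S)

# Tiny example: a path 1-2-3 with unit costs; {2} is a minimum cover.
edges = [(1, 2), (2, 3)]
cost = {1: 1, 2: 1, 3: 1}
assert is_vertex_cover(edges, {2})
assert not is_vertex_cover(edges, {1})
assert cover_cost(cost, {2}) == 1
```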
We forbid loops in the graph as they don't present any new challenge to the problem. Vertices with loops always have to be in the vertex cover, so by a linear preprocessing (removing all loopy vertices), we can reduce a problem with loops to an equivalent problem without loops.

We forbid multiple edges between the same pair of vertices as well. We could simply leave multiple edges in the graph, as they have no effect on the set of vertex covers nor on the optimality of the solutions, but they would change the degrees of vertices, which might lead our search algorithms astray.

Many real-life optimization problems from transportation, scheduling and operations research can be directly reduced to vertex cover. As for the complexity of the problem, vertex cover is a well-known NP-complete problem, a 2-approximation algorithm exists, and, using parametrized complexity, the best known optimal algorithm for the unweighted version runs in time O(kn + 1.274^k), where n is the number of vertices and k is the size of an optimal vertex cover [3]. In practical applications, heuristics are typically used [1].

2.2 Fitness landscape analysis

In the theory of local search algorithms, like evolutionary algorithms, hill-climbing, tabu-search and so on, fitness landscape analysis is used to assess the efficiency of an algorithm on a given problem. The theory of fitness landscapes can also be helpful when designing a new metaheuristic algorithm for a specific problem [9, 10].

A fitness landscape of an optimization problem is a set M together with a function f : M → R and a distance measure d : M × M → R+. M denotes the set of all candidate solutions, f is the fitness function which tells us how good a candidate solution is, and d measures the distance between candidate solutions. We work with combinatorial problems, so the set M will be discrete and finite.

In Figure 1, there is an example of a fitness landscape of a small instance of the Travelling Salesman Problem [11]. The tree in the upper part enumerates the set of all permutations, which constitutes the set M. Beneath it, there is a graph of the fitness function for each point of M (the length of the tour corresponding to each permutation). The points of M are arranged linearly by their lexicographical order, which induces the distance measure and structure. The marked point corresponds to the solution shown in Figure 2, which is a local optimum.

Figure 1: Example of a fitness landscape of a small TSP

Figure 2: One of the solutions of the TSP example

The measure d is typically not given explicitly; instead, it is derived from the search operators that the algorithm uses (for example, a mutation operator used by genetic algorithms). Formally: a unary search operator p is a map p : M → 2^M, where p(m) is the set of all possible modifications of m, i.e. the set of possible results of the operator. When applying the operator p to a point m ∈ M, the algorithm replaces m by one of the points from p(m). The point can be selected randomly, or by some criterion, e.g. the minimum in the case of greedy operators. The algorithm repeatedly applies operators to move from one candidate solution to another.
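As a concrete illustration of a unary search operator on the cover space (ours; the operator and all names are hypothetical), the following sketch returns all covers obtained from a cover S by removing one redundant vertex:

```python
# Sketch of a unary search operator p : M -> 2^M on the space of vertex
# covers (our illustration). p(S) is the set of all covers obtained from
# S by deleting a single vertex whose removal keeps the cover property.

def remove_vertex_operator(edges, S):
    results = []
    for v in S:
        candidate = S - {v}
        if all(a in candidate or b in candidate for (a, b) in edges):
            results.append(candidate)
    return results

edges = [(1, 2), (2, 3)]
S = {1, 2, 3}                      # the trivial cover: all vertices
neighbours = remove_vertex_operator(edges, S)
# Each returned cover lies within distance 1 of S under the induced d_P.
```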
The role of the distance measure is to introduce a structure on the set M. For the purposes of analysis, it is especially important to define neighborhoods of points. For m_i ∈ M, we denote the neighborhood of m_i as N(m_i). It is the set of all points from M that are close to m_i.

If d is given explicitly, then N can be defined as N(m_i) = {m_j ∈ M | d(m_i, m_j) ≤ ε}. Typically, ε is taken to be 1. In the much more common case when we are given a set of search operators P instead of d, the neighborhood is defined as N_P(m_i) = {m_j ∈ M | ∃p ∈ P, m_j ∈ p(m_i)}, i.e. the set of all solutions reachable from m_i by an application of a single operator. The operators also induce a distance measure: d_P(m_i, m_j) is the minimum number of applications of operators from P needed to move from m_i to m_j. Formally:

1. d_P(m_i, m_i) = 0
2. d_P(m_i, m_j) = (min_{m_k ∈ N_P(m_i)} d_P(m_k, m_j)) + 1
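The recursion above is exactly the shortest-path distance in the graph whose arcs are given by N_P, so d_P can be computed by breadth-first search. A minimal sketch (ours), assuming M is small enough to enumerate and that `neighbours(m)` yields N_P(m):

```python
from collections import deque

# d_P(start, target): shortest-path distance under the operator-induced
# neighbourhood N_P. `neighbours` is a user-supplied function mapping a
# point m to an iterable over N_P(m). A sketch for small, enumerable M.

def operator_distance(start, target, neighbours):
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        m, dist = queue.popleft()
        for nxt in neighbours(m):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # target not reachable by the operators
```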
In the following text, we will use a few simple notions from function analysis and the theory of search. Given a function f : M → R+ that we want to minimize, we say:

• x* ∈ M is an optimal solution (or global optimum) iff ∀m ∈ M : f(x*) ≤ f(m)

• x ∈ M is a local optimum iff ∀m ∈ N(x) : f(x) ≤ f(m)

• if x is not an optimal solution, then the escape distance of x is ed(x) = min_{m ∈ M, f(m) < f(x)} d(x, m), i.e. the distance to the nearest point with a strictly lower fitness value

Fitness landscape analysis studies properties of fitness landscapes that are related to the performance of local search algorithms. For example, a smooth and globally convex function with a single local optimum is much easier to deal with than a highly rugged function with many local optima. Several methods for analyzing fitness landscapes have been developed (a computational sketch of the two correlation metrics follows this list):

• Size of M: smaller |M| leads to higher performance and vice versa.

• Number of local optima: many local-search algorithms are attracted to local optima. A higher number of local optima therefore slows down the convergence to the global optimum.

• Average size of neighborhood: a large neighborhood decreases the number of local optima, but dramatically increases the time that the algorithm needs to traverse the space, and operators with a large range of values are close to random search. It is therefore important to keep the neighborhoods small. For example, if all points of M are neighbors, then there are no local optima except the global ones. On the other hand, if neighborhoods are empty, i.e., no pair of points are neighbors, then every point is a local optimum. There is therefore a trade-off between the size of neighborhoods and the number of local optima.

• Fitness-distance correlation (FDC): even with only one local optimum, the algorithm might not be able to find it if it is surrounded by points with high fitness values. It is therefore important that the optimal solutions are surrounded by low-fitness points and are far from high-valued points. FDC measures how the distance between points corresponds to the difference between their fitness values. Ideally, there should be a strong positive correlation, i.e. the further from the nearest global optimum a point is, the higher its fitness value should be. Formally, the FDC is computed as FDC = c_fd / (ρ(f) · ρ(d_opt)), where c_fd is the covariance of f and d_opt, c_fd = (1/|M|) ∑_{i=1}^{|M|} (f(m_i) − f̄)(d_{m_i,opt} − d̄_opt), f̄ is the average fitness value over the whole M, d_{m_i,opt} is the distance between m_i and the nearest optimal solution, and d̄_opt is the average of d_{m_i,opt} over the whole M. ρ(f) and ρ(d_opt) are the standard deviations of f and d_opt, respectively. FDC lies in the range [−1, 1]. In the ideal case, where f(m_i) = d_{m_i,opt}, FDC = 1; for a random function, FDC is close to 0; and FDC = −1 means that the optimal solution is "hidden" among high-valued points.

• Ruggedness: a rugged function is the opposite of a smooth one - it is erratic, with large differences in fitness values between nearby points. Ruggedness is computed as R = (⟨f(x)f(y)⟩_{d(x,y)=1} − f̄²) / (⟨f²⟩ − f̄²), where ⟨·⟩ denotes the average over all values (in the numerator, over all pairs of neighbouring points x, y with d(x, y) = 1) [7]. R is always in [−1, 1]: a value of 1 indicates a constant function, 0 means that the values of neighbouring points are independent, and −1 indicates that neighbouring points have opposite values (i.e. every point is a local extreme). Note that a higher ruggedness coefficient actually denotes a smoother function, which is favorable for local search.

• Average escape distance: a low escape distance allows the algorithm to quickly find a point with a better fitness value. Ideally, ed(m) should be constant for each m ∈ M. We only compute the average of ed(m) over local optima, since for points which aren't locally optimal, the escape distance is always 1.
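The sketch below (ours) computes the two correlation metrics on a fully enumerated landscape; `points`, `f`, `d_opt` and `neighbour_pairs` are assumed inputs describing M, the fitness, the distance to the nearest global optimum, and all pairs at distance 1:

```python
import math

# FDC and ruggedness R on an enumerated landscape, following the
# formulas above (a sketch; input representation is our assumption).

def fdc(points, f, d_opt):
    n = len(points)
    f_bar = sum(f(m) for m in points) / n
    d_bar = sum(d_opt(m) for m in points) / n
    cov = sum((f(m) - f_bar) * (d_opt(m) - d_bar) for m in points) / n
    sd_f = math.sqrt(sum((f(m) - f_bar) ** 2 for m in points) / n)
    sd_d = math.sqrt(sum((d_opt(m) - d_bar) ** 2 for m in points) / n)
    return cov / (sd_f * sd_d)

def ruggedness(points, f, neighbour_pairs):
    n = len(points)
    f_bar = sum(f(m) for m in points) / n
    f2_bar = sum(f(m) ** 2 for m in points) / n
    ff_bar = sum(f(x) * f(y) for x, y in neighbour_pairs) / len(neighbour_pairs)
    return (ff_bar - f_bar ** 2) / (f2_bar - f_bar ** 2)
```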
The landscape-analysis techniques have to traverse the whole search space of the problem multiple times. As such search spaces are typically very large, the analysis can only be done on small instances. There are techniques that allow estimating some properties even in large spaces by sampling, but we won't be using those here.

Note that our main purpose here is not to actually solve large instances of vertex cover, but rather to verify the hypothesis that hyper-heuristic spaces might have very different properties and might be much more favorable for local search techniques than the original space.

2.3 Hyper-heuristics

Hyper-heuristics represent a new approach to search and optimization which combines metaheuristics, automated parameter tuning, algorithm selection and genetic programming. Nowadays, algorithm selection approaches are becoming more and more popular in combinatorial optimization, as it is clear that no single algorithm can outperform every other on all domains, and therefore the most suitable algorithm has to be selected for the problem at hand [4, 5].

Hyper-heuristics try to build an algorithm suited for the given task by combining so-called low-level algorithms during the search. A pool of simple algorithms is given, and the hyper-heuristic combines them into more complex units. The combination procedure is often based on some kind of evolutionary computation, and the quality of the resulting units is measured by how well they can solve the original task. The approach was especially successful in the area of scheduling, and it is now applied to many other kinds of problems [2].

Instead of searching the solution space directly, hyper-heuristics search the algorithm space. The approach is based on the assumption that similar algorithms will find solutions of similar quality (not necessarily close to each other), which implies that the algorithm space has some favorable properties: high locality, i.e., elements close to each other have similar evaluation, and a low number of local extrema. See Figure 3. On spaces with those properties, optimization metaheuristics can find good solutions quickly. Of course, such improvements come at a price: evaluating an element of the algorithm space takes much more time, because it involves searching for a solution in the solution space.

Figure 3: The hypothesized schema of a hyper-heuristic transformation

Consider the following example: we want to color a graph with the fewest colors possible. Using a genetic algorithm, we could come up with a metaheuristic that works on the set of all possible assignments of colors to vertices, where the fitness function penalizes violations (connected vertices having the same color) and a high number of colors used. Such an algorithm might converge to the globally optimal solution, but it would take a long time.

We could also use a greedy algorithm which works as follows: it picks a vertex from the graph according to some criterion and colors it with the lowest color possible, according to the colors of its neighbours. Then it picks another vertex, and so on until the graph is colored. The greedy algorithm is very fast, but in most cases it won't converge to the global optimum.

There are several vertex-picking criteria, for example:

• Largest degree (L) - picks a vertex with the largest number of neighbours

• Saturation degree (S) - picks a vertex whose neighbours are colored with the largest number of different colors

• Incidence degree (I) - picks a vertex with the largest number of colored neighbours

Based on these three low-level greedy algorithms, a hyper-heuristic would search the space of all combinations of them (the algorithm space). On a graph with 5 vertices, the algorithm space would consist of all sequences of length 4 containing the symbols L, S and I.
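A minimal sketch (ours) of this greedy colouring scheme with the three criteria; the implementation details (adjacency dictionary, tie-breaking via Python's max) are our assumptions, not the paper's:

```python
# Greedy graph colouring with pluggable vertex-picking criteria.
# adj maps each vertex to the set of its neighbours.

def lowest_free_colour(adj, colours, v):
    used = {colours[w] for w in adj[v] if w in colours}
    c = 0
    while c in used:
        c += 1
    return c

def greedy_colouring(adj, pick):
    colours = {}
    while len(colours) < len(adj):
        uncoloured = [v for v in adj if v not in colours]
        v = pick(adj, colours, uncoloured)
        colours[v] = lowest_free_colour(adj, colours, v)
    return colours

# The three criteria of the example (L, S, I):
def largest_degree(adj, colours, unc):        # L
    return max(unc, key=lambda v: len(adj[v]))

def saturation_degree(adj, colours, unc):     # S
    return max(unc, key=lambda v: len({colours[w] for w in adj[v] if w in colours}))

def incidence_degree(adj, colours, unc):      # I
    return max(unc, key=lambda v: sum(1 for w in adj[v] if w in colours))

# e.g. greedy_colouring({0: {1, 2}, 1: {0}, 2: {0}}, largest_degree)
```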
To evaluate a point from the algorithm space, the hyper-heuristic uses the corresponding algorithm to find a solution, then evaluates the solution by the original fitness function and uses that value to evaluate the algorithm. For example, a point ILS (from the algorithm space) is evaluated as follows:

1. Select a vertex based on the I criterion and color it with the lowest possible color

2. Select a vertex based on the L criterion and color it

3. Select a vertex based on the S criterion and color it

4. Color the last vertex

5. Compute the total number of colors used and use it as the fitness value of the point (ILS)

The hyper-heuristic might also be based on an evolutionary algorithm, which would in this case work on the algorithm space. This is popularly summarized as: heuristics select moves, hyper-heuristics select heuristics. The hyper-heuristic might be able to find an optimal solution, and more importantly, it might be able to find high-quality solutions much faster than the former approach.

3 Hyper-heuristic for minimal vertex cover

We have designed a hyper-heuristic for the vertex cover in a similar manner as in the example mentioned earlier. We use a pool of simple greedy algorithms, and the hyper-heuristic combines them in order to solve the problem. The low-level algorithms work sequentially as described in Algorithm 1.

Algorithm 1 Greedy vertex cover
1: S ← ∅
2: while G contains some edge do
3:   v ← select vertex according to some criterion
4:   S ← S ∪ {v}
5:   remove v from the graph (with incident edges)
6: return S

As the vertex-selection criteria, we use the following:

• Lightest (W): selects the vertex with minimal weight (cost)

• Deg (G): selects the vertex with maximal degree

• Sub (S): selects the vertex v with maximal [(sum of costs of neighbors) − cost of v], i.e. selected = argmax_{v ∈ V} (∑_{w ∈ Γ(v)} c(w) − c(v))

• WDeg (G): selects the vertex with minimal weight/degree: selected = argmin_{v ∈ V} (c(v)/deg(v))

• Div (D): selects the vertex v with minimal ratio of the cost of v to the sum of costs of its neighbors. Formally, selected = argmin_{v ∈ V} (c(v) / ∑_{w ∈ Γ(v)} c(w))

• Leaf (L): selects the vertex v with minimal difference between the cost of v and the sum of costs of the neighbors of v that are leaves, i.e. selected = argmin_{v ∈ V} (c(v) − ∑_{w ∈ Γ(v), w is a leaf} c(w))

• Neig (N): selects the vertex with maximal sum of costs of neighbors

• Next (X): selects the vertex which is next in order after the vertex selected by (W) (we used this as an example of a non-greedy, chaotic algorithm)

We break ties simply by the ordinal numbers of vertices, i.e., if more than one vertex achieves the optimal value of the criterion, we select among them the one with the lowest number.

The space of algorithms consists of all sequences of letters W, G, S, D, L (corresponding to selection criteria) of a fixed length. The i-th symbol in the sequence determines the criterion by which the i-th vertex is selected. If a valid vertex cover is found before all symbols are used, the rest of the sequence is ignored. In the much more frequent case when we have already traversed the whole sequence but there are still edges in the graph, we use two strategies: (i) start over and read symbols from the beginning of the sequence (repeat strategy), and (ii) use the last symbol in the sequence for as long as there are edges in the graph (last strategy).
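The following sketch (ours) renders Algorithm 1 in Python with a pluggable selection criterion and evaluates a sequence point under the last strategy; only the W and G criteria are shown, and all names are our own, not the paper's:

```python
import itertools

# Algorithm 1 with a pluggable criterion, plus sequence evaluation.

def greedy_cover(edges, cost, criteria):
    """criteria: (possibly infinite) iterator of selection functions."""
    edges = set(edges)
    S = set()
    for pick in criteria:
        if not edges:
            break
        live = {v for e in edges for v in e}          # vertices with edges
        v = pick(live, edges, cost)
        S.add(v)                                      # S <- S u {v}
        edges = {e for e in edges if v not in e}      # drop incident edges
    return S

def lightest(live, edges, cost):                      # W: minimal weight;
    return min(live, key=lambda v: (cost[v], v))      # ties -> lowest number

def max_degree(live, edges, cost):                    # G: maximal degree
    deg = {v: sum(1 for e in edges if v in e) for v in live}
    return max(live, key=lambda v: (deg[v], -v))      # ties -> lowest number

SYMBOLS = {"W": lightest, "G": max_degree}

def evaluate(seq, edges, cost):
    """Fitness of a sequence point under the `last` strategy: after the
    sequence is exhausted, its last symbol is applied until G is covered.
    The `repeat` strategy would use itertools.cycle instead."""
    crits = itertools.chain((SYMBOLS[s] for s in seq),
                            itertools.repeat(SYMBOLS[seq[-1]]))
    S = greedy_cover(edges, cost, crits)
    return sum(cost[v] for v in S)
```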
4 Analysis of the fitness landscapes

The hyper-heuristic can be viewed as a transformation of the search space. We will now describe properties of the original and transformed spaces. We use the notation of fitness landscapes as defined earlier.

4.1 Original space

• M: the set of all vertex covers (not necessarily minimal with respect to inclusion)

• f: the total weight of the vertex cover - the sum of costs of all vertices in the cover

• d: the distance between vertex covers is measured as the number of elements on which the two covers differ. Formally, for S_1, S_2 ⊆ V: d(S_1, S_2) = |(S_1 \ S_2) ∪ (S_2 \ S_1)|

The distance measure is related to the search operators add vertex and remove vertex. One application of such an operator creates a solution within distance 1 of the original point.

4.2 Hyper-heuristic space

• M: the set of all fixed-length sequences of symbols corresponding to low-level heuristics

• f: the total weight of the vertex cover found by the application of the sequence

• d: the distance between sequences is measured by the Levenshtein distance, which is the minimal number of operations add symbol, remove symbol and replace symbol needed to transform the first sequence into the second.

This distance measure is intuitively related to the search operators add symbol, remove symbol and replace symbol.

We distinguish the hyper-spaces by the set of low-level algorithms that are used. We do not always use all the algorithms, partially because of the high computational cost and partially because on some kinds of graphs the criteria degenerate. For example, on a uniformly-weighted graph, the algorithm Neig works exactly the same as Deg, so it's not worth using both of them. Furthermore, we would like to be able to assess the performance of single algorithms, pairs, 3-tuples, 4-tuples and so on.

We work with these types of spaces: the original space (denoted Cover) and hyper-spaces with a specific set of low-level algorithms L and a fixed length of sequences k (denoted H_{L,k}). We distinguish the two alternatives mentioned earlier: (i) a space where, during the evaluation, the sequence is applied repeatedly from the beginning until a valid cover is found (denoted H_{L,k}^{repeat}), and (ii) a space where, during the evaluation, after reaching the end of the sequence, the last symbol is applied repeatedly until a valid cover is found (denoted H_{L,k}^{last}).

5 Experiments

We generated a set of graphs, constructed the corresponding spaces and computed fitness landscape metrics for those spaces. We used graphs of various sizes, densities and types. For each type of graph, we generated about 30–40 instances and averaged the results.

We used graphs of sizes n = 16–45 vertices (not all values from the range were used). The densities were 0.1, 0.2, 0.3, 0.4, 0.5 and 0.7. A density of c means that there were (c/2) · n · (n − 1) edges in the graph. We used two policies to add edges - uniformly randomly (select a random pair of vertices and add an edge, until the desired number of edges is reached) and regularly (add edges preferentially to vertices with low degrees, to create a near-regular graph). Weights of vertices are integers taken uniformly from the ranges 50–50, 50–60, 50–100 and 50–1000.
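A sketch (ours) of the uniform edge-generation policy just described; the regular policy would differ only in how the endpoint pair is drawn:

```python
import random

# Uniform random graph with density c: (c/2) * n * (n-1) edges and
# integer weights drawn uniformly from [wmin, wmax] (a sketch; names
# and the fixed seed are our assumptions).

def random_graph(n, density, wmin, wmax, seed=0):
    rng = random.Random(seed)
    target = round(density / 2 * n * (n - 1))
    edges = set()
    while len(edges) < target:
        a, b = rng.sample(range(n), 2)       # distinct endpoints: no loops
        edges.add((min(a, b), max(a, b)))    # canonical form: no multi-edges
    cost = {v: rng.randint(wmin, wmax) for v in range(n)}
    return edges, cost

edges, cost = random_graph(n=16, density=0.3, wmin=50, wmax=100)
```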
In total, we generated over 24 000 graphs; roughly half of them were near-regular and half of them used the uniform edge distribution. For each graph, we constructed the vertex-cover space and 20 hyper-spaces (for various combinations of parameters - H_{L,k}^{last} and H_{L,k}^{repeat} with different sets L). The experiments ran on 11 computers with 8 cores each, for 20 hours on each.

We present graphs comparing various metrics between spaces. On the x-axis, there is the number of vertices in the graph; the y-axis shows values of the particular metric. We plot several color-distinguished series in the same graph; each series represents one space. The space is described in the legend by an enumeration of the low-level algorithms used. An exclamation mark at the end denotes the last strategy; no mark means that the repeat strategy is used. The word cover denotes the original space of all vertex covers.

We use a normalization so that different columns are of a similar magnitude and can be depicted in the same graph: instead of plotting the criterion directly, we plot the criterion divided by the number of vertices. With this normalization, values related to different numbers of vertices are not directly comparable, but values related to the same number of vertices are, which is good enough for our purposes. In effect, we plot the weight of the cover divided among the vertices.

We plot averages over all generated graphs, grouped into categories by the number of vertices. We tried to distinguish the results further by edge-generation policy (regular graphs vs. random graphs), edge density and weight distribution. In most cases, such further distinction didn't provide any new information - the results were very similar in each of the smaller categories.

Last vs. repeat strategy. First, we compared the two ways of evaluating the sequences - repeating the whole sequence from the beginning and repeating only the last symbol, i.e. the spaces H_{L,k}^{repeat} and H_{L,k}^{last}. We believe that different algorithms should be used at the beginning of the search than at the end, so the space H_{L,k}^{last} should contain better solutions. We measure the value of the global optimum in each space. The graph is shown in Figure 4. H_{L,k}^{last} is slightly better than H_{L,k}^{repeat} for all L.

Figure 4: Best solutions in H_{L,k}^{repeat} and H_{L,k}^{last}

Solution quality in spaces. Hyper-spaces don't contain all the vertex covers; they only contain some of them. For example, the set of all vertices is a valid vertex cover, but it is never generated by any algorithm sequence. On the other hand, some vertex covers might be generated by many different sequences of algorithms. To ease the search, the hyper-space should contain an algorithm that can generate an optimal solution; there should be a large number of different sequences that generate high-quality solutions, and very few or no sequences that generate low-quality solutions.

To assess the quality of solutions generated by points in the hyper-space, we created a histogram of the quality of all solutions from the original space and a hyper-space. It is depicted in Figure 5. Red columns come from the original space and yellow ones come from one of the hyper-spaces. The hyper-space is smaller in size, so the total volume of yellow is smaller. Solutions in the original space seem to follow a normal distribution, while in the hyper-space, most of the points generate near-optimal or near-average solutions. The behavior can be explained as follows: high-quality solutions are generated due to the greediness of the selection criteria, and average-quality solutions are generated since their number in the original space is very high.

Figure 5: Histogram of distribution of solutions based on their quality in the two spaces.

We also monitor the average value of solutions in each space. Results are shown in Figure 6. All three hyper-spaces show a significantly lower average value than the original space. (I.e., even a blind random search should provide better solutions in the hyper-space than in the original space.)

Figure 6: Average quality of solutions in spaces

Combination of algorithms. We test the hypothesis that a combination of low-level algorithms can outperform each of the individual algorithms. Figure 7 shows the quality of the optimal solution in several hyper-spaces. Min denotes the best solution that can be found by repeated application of just one low-level algorithm. Other colors correspond to combinations.
For all graph sizes, the quality of the optimal solution is better for most combinations of low-level algorithms. This result supports the hypothesis and suggests that the hyper-heuristic might be worth using even with its overhead (evaluating points in the hyper-space is costly). In most cases, however, the Cover space is better than the hyper-heuristic spaces, so the transformation leads to a loss of the optimal solution.

Figure 7: Best solutions of single algorithms and combinations

Fitness-distance correlation. Figure 8 shows the FDC for several spaces. In the original space, the FDC decreases with the number of vertices, while in the hyper-spaces it stays high regardless of the size of the graph. (Note that a higher FDC is better for local search algorithms.) This graph is shown without normalization, as the FDC is naturally in [−1, 1]. The remaining graphs are also shown without normalization, since it is not needed.

Figure 8: FDC of spaces

Ruggedness. Figure 9 shows the ruggedness of our spaces. Most of the hyper-spaces have higher ruggedness than the original space, which is again better for local search algorithms. The ruggedness seems to be independent of the size of the graph.

Figure 9: Ruggedness of spaces

Escape distance. Figure 10 shows the average escape distance (ED) of some of our spaces. The ED is systematically lower in all hyper-spaces. The difference might not seem large, but note that the amount of work needed to escape a local minimum is of the order (number of neighbors)^ED, so any savings in the exponent are significant.

Figure 10: Escape distance from local optima

Number of local optima. Figure 11 shows the frequency of local minima in our spaces (i.e. the number of local minima relative to the number of all nodes). The result is much better in the original space than in the hyper-spaces. This is partially caused by the fact that hyper-spaces contain many points that have the same fitness value, and we define local minima as non-strict - i.e. points on plateaus are all considered local minima.

Figure 11: Frequency of local optima
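For completeness, a sketch (ours) of the non-strict local-minimum count used for this metric; `points`, `f` and `neighbours` are assumed inputs as in the earlier sketches:

```python
# Frequency of non-strict local minima: every point of a plateau counts,
# matching the definition above (a sketch over an enumerated space).

def local_minima_frequency(points, f, neighbours):
    """Fraction of points m with f(m) <= f(m') for all neighbours m'."""
    count = 0
    for m in points:
        if all(f(m) <= f(nxt) for nxt in neighbours(m)):
            count += 1
    return count / len(points)
```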
6 Conclusions and future work

We presented a hyper-heuristic transformation for the vertex cover problem and constructed several variants of hyper-heuristic spaces using a set of greedy algorithms. We measured several important features of the original space and the hyper-spaces, and by comparing them we have experimentally verified the hyper-heuristic hypothesis that tries to explain the efficiency of hyper-heuristics in practice by properties of the hyper-space. Our results can be summarized as follows:

1. A combination of greedy algorithms is in most cases much better than using a single algorithm all the time.

2. Only a small part of the original space can be generated from the hyper-space. Most high-quality solutions can be generated, but sometimes we lose the optimal solution by the transformation.

3. The transformation improves the fitness-distance correlation, the escape distance and the average solution quality. It has some positive effect on ruggedness as well, but it increases the frequency of local optima in the space.

4. There were no significant differences between various types of graphs, e.g. regular vs. random. We believe that this is caused by the small size of our test graphs.

As future work, we would like to continue the analysis using more landscape analysis techniques, such as neutrality [6], decomposability etc. Also, we would like to add more low-level algorithms into the pool, especially less greedy ones. The greediness of the algorithms causes only a very small portion of vertex covers to be reachable from the hyper-space. In some cases, more than 50% of all algorithms in the space generated the same vertex cover. Such a small range of unique fitness values then creates large plateaus, which contribute to an illusion of high locality and fine ruggedness. It is also responsible for the large number of local extrema that we observed in the hyper-spaces.

It would also be beneficial to actually run some search algorithm on the hyper-space (for some large instances) and show that the hyper-heuristic really works in practice and is at least comparable to metaheuristics working directly in the solution space.

Acknowledgement

The research is supported by the Grant Agency of Charles University under contract no. 390214 and it is also supported by SVV project number 260 104. We would like to thank the anonymous reviewers for their useful comments and suggestions.

References

[1] Eric Angel, Romain Campigotto, and Christian Laforest. Implementation and comparison of heuristics for the vertex cover problem on huge graphs. In 11th International Symposium, SEA 2012, volume 7276 of Lecture Notes in Computer Science, pages 39–50, Bordeaux, France, June 2012.

[2] E. K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Ozcan, and R. Qu. Hyper-heuristics: a survey of the state of the art. Journal of the Operational Research Society, 64(12):1695–1724, Dec 2013.

[3] Jianer Chen, Iyad A. Kanj, and Ge Xia. Improved parameterized upper bounds for vertex cover, pages 238–249. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[4] Youssef Hamadi, Eric Monfroy, and Frédéric Saubion. Autonomous search. Springer-Verlag, 2012.

[5] R. Kumar, S. K. Singh, and V. Kumar. A heuristic approach for search engine selection in meta-search engine. In Computing, Communication & Automation (ICCCA), 2015 International Conference on, pages 865–869, May 2015.

[6] Marie-Eléonore Marmion, Clarisse Dhaenens, Laetitia Jourdan, Arnaud Liefooghe, and Sébastien Vérel. On the neutrality of flowshop scheduling fitness landscapes. CoRR, abs/1207.4629, 2012.

[7] Peter Merz and Bernd Freisleben. Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning. Evolutionary Computation, 8(1):61–91, March 2000.

[8] Riccardo Poli and Mario Graff. There is a free lunch for hyper-heuristics, genetic programming and computer scientists. In Genetic Programming: 12th European Conference, EuroGP 2009, Tübingen, Germany, April 15–17, 2009, Proceedings, pages 195–207. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.

[9] Franz Rothlauf. Representations for genetic and evolutionary algorithms (2nd ed.). Springer, 2006.

[10] Franz Rothlauf. Design of modern heuristics. Natural Computing Series. Springer, 2011.

[11] Mohammad-H. Tayarani-N. and Adam Prügel-Bennett. An analysis of the fitness landscape of travelling salesman problem. Evolutionary Computation, 24(2):347–384, Jun 2015.