         Towards Landscape Analysis in Adaptive
             Learning of Surrogate Models

                        Zbyněk Pitra1,2 and Martin Holeňa1
     1
         Institute of Computer Science, Academy of Sciences of the Czech Republic
                Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic
                                {pitra, martin}@cs.cas.cz
         2
           Faculty of Nuclear Sciences and Physical Engineering, CTU in Prague
                        Břehová 7, 115 19 Prague 1, Czech Republic


Keywords: Adaptive learning · Optimization strategy · Black-box optimization
· Landscape analysis · Surrogate model


1     Introduction

A context in which we expect adaptive learning to be promising is the choice
of a suitable optimization strategy in black-box optimization. The reason why
strategy adaptation is needed in such a situation is that knowledge of the black-
box objective function is obtained only gradually during the optimization. That
knowledge covers two aspects:

1. the landscape of the black-box objective, revealed through its evaluation in
   previous iterations;
2. success or failure of the optimization strategies applied to that black-box
   objective in previous iterations.

To extract knowledge of the first kind, landscape analysis has been developed
during the last decade [7,10,11]. To cover the second aspect as well, we complement
the features obtained through landscape analysis with features describing the
optimization strategy employed in previous iterations.
    Our interest is in expensive black-box optimization, where the number of
evaluations of the expensive objective is usually decreased using a suitable surro-
gate model. Therefore, the research reported in this extended abstract addresses
adaptive learning of surrogate models, more precisely their learning in surrogate-
assisted versions of the state-of-the-art black-box optimization method, Covari-
ance Matrix Adaptation Evolution Strategy (CMA-ES) [3].
    Considering the results in [2,14], which suggest that the properties of landscape
features in connection with the surrogate model selection problem should be analysed
in more detail, this work contributes a first essential step towards a better
understanding by analysing the robustness of feature computation. So far, such an
analysis of a large set of landscape features has been presented only in connection
with the selection of the most convenient optimization algorithm for problems in a
fixed dimension [15].

    © 2020 for this paper by its authors. Use permitted under CC BY 4.0.

    This extended abstract focuses on the surrogate model selection task in multiple
dimensions and discusses the robustness of several classes of features against
different samples of points from the same distribution.


2    Landscape Analysis for Surrogate Model Selection

Landscape analysis aims at measuring characteristics of the objective function
using functions that assign a set of real numbers to each dataset [10]. Let us
consider a dataset of N pairs of observations {(x_i, y_i) ∈ R^D × (R ∪ {◦}) | i = 1, . . . , N},
where ◦ denotes a missing y_i value (e. g., x_i has not been evaluated yet). Then the
dataset can be utilized to describe landscape properties using a feature
ϕ : ⋃_{N∈N} R^{N,D} × (R ∪ {◦})^{N,1} → R ∪ {±∞, •},
where • denotes the impossibility of feature computation.
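    To make this definition concrete, the following is a minimal sketch (ours, not
part of the cited formalism) of one simple feature from the y-Distribution class,
the skewness of the observed objective values; it handles missing values ◦ (encoded
as NaN) and returns NaN in place of • whenever the computation is impossible.

    import numpy as np
    from scipy.stats import skew

    def feature_y_skewness(X, y):
        """Sketch of a y-Distribution feature: skewness of the observed
        objective values. X is an (N, D) array of points and is part of the
        general feature signature, although this particular feature ignores it.
        y is an (N,) array in which missing objective values (the symbol
        'circle') are encoded as NaN. NaN is returned in place of the symbol
        'bullet' whenever the feature cannot be computed."""
        y = np.asarray(y, dtype=float)
        y_obs = y[~np.isnan(y)]              # drop observations not yet evaluated
        if y_obs.size < 2 or np.all(y_obs == y_obs[0]):
            return np.nan                    # too few or constant values: undefined
        return float(skew(y_obs))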
    Feature classes convenient for the continuous black-box optimization field are
mostly described in [7]. From the available feature classes, we mention only those
suitable for problems with a high computational complexity (unlike, e. g., the cell-
mapping approach [8]) and at the same time not requiring additional evaluations
of the expensive function. These feature classes measure the dissimilarity
among points of a subset of the sample (Dispersion) [9], express various informa-
tion content of the landscape (Information Content) [11], measure the relative
position of each value with respect to quantiles (Levelset) [10], extract infor-
mation from linear or quadratic regression models (Meta-Model) [10] or from the
nearest or nearest-better observation neighbours (Nearest Better Clustering) [6], and
describe the distribution of the objective values (y-Distribution) [10]. Moreover,
in [13] we have proposed a set of features based on the CMA-ES state variables
(CMA features).
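    As an illustration of the Dispersion class, the sketch below compares the median
pairwise distance among the best quarter of the evaluated points with the median
pairwise distance among all evaluated points; it is only our rough approximation of
features such as ϕ_med used later in Figure 2, the exact definitions follow [9].

    import numpy as np
    from scipy.spatial.distance import pdist

    def dispersion_median_ratio(X, y, quantile=0.25):
        """Sketch of a Dispersion feature: ratio of the median pairwise
        distance among the best `quantile` fraction of evaluated points to
        the median pairwise distance among all evaluated points. Values well
        below 1 suggest that good points are clustered (funnel-like landscape);
        NaN stands for an impossible computation."""
        y = np.asarray(y, dtype=float)
        evaluated = ~np.isnan(y)
        X_obs, y_obs = np.asarray(X)[evaluated], y[evaluated]
        if y_obs.size < 4:
            return np.nan                          # too few points for a meaningful ratio
        n_best = max(2, int(np.ceil(quantile * y_obs.size)))
        best = np.argsort(y_obs)[:n_best]          # minimization: lowest y values are best
        d_all = np.median(pdist(X_obs))
        d_best = np.median(pdist(X_obs[best]))
        return float(d_best / d_all) if d_all > 0 else np.nan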
    The surrogate model selection problem tackles the following situation: in an
iteration i of a surrogate-assisted algorithm A, a set of surrogate models M is
trained using a training set T selected out of an archive A (T ⊂ A) of all points
evaluated so far using the objective function f: A = {(x_i, f(x_i)) | i = 1, . . . , N}.
Afterwards, a new set of points P = {x_k | k = 1, . . . , α} is evaluated using a
surrogate model M ∈ M, where α ∈ N depends on the strategy defining the
usage of the surrogate model in algorithm A. The research question is: How to select
the most convenient M from M according to A, T, and P?
    To tackle the research question connected with the surrogate model selec-
tion problem, we have proposed (see [14]) the following metalearning approach,
visualised in Figure 1.
    In the first phase, each model M ∈ M is trained on each T^(l) from the set
of datasets D = {(A^(l), T^(l), P^(l))}_{l=1,...,L}, L ∈ N, and its error ε is measured on P^(l).
Simultaneously, a set of features Φ is computed on each dataset from D. From these
errors and features, a mapping S_M : Φ → M from the space of landscape features to M is trained.
In the second phase, the trained mapping S_M is utilized in each iteration i of the
algorithm A to select the model M ∈ M according to the features Φ calculated
on A^(i), T^(i), and P^(i). The selected M is utilized to fit T^(i) and afterwards to
predict objective function values of the points from P^(i).
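    A minimal sketch of the two phases on synthetic stand-in data is given below;
the choice of a random-forest classifier, the way the label is derived from the
errors, and all sizes are our own illustrative assumptions, not the setup of [14].

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # --- Phase 1: offline training of the selection mapping -------------------
    # Suppose that for L datasets (A^(l), T^(l), P^(l)) we have already computed
    # a feature vector Phi[l] and the error eps[l, m] of each surrogate model m.
    rng = np.random.default_rng(0)
    L, n_features, n_models = 200, 15, 8        # illustrative sizes only
    Phi = rng.normal(size=(L, n_features))      # stand-in for real landscape/CMA features
    eps = rng.random(size=(L, n_models))        # stand-in for measured prediction errors

    best_model = eps.argmin(axis=1)             # label: model with the lowest error
    S_M = RandomForestClassifier(n_estimators=200, random_state=0)
    S_M.fit(Phi, best_model)                    # the mapping S_M: features -> model index

    # --- Phase 2: online use inside the optimizer -----------------------------
    # In iteration i, features are computed on A^(i), T^(i), P^(i); the selected
    # model would then be trained on T^(i) and used to predict values on P^(i).
    phi_i = rng.normal(size=(1, n_features))    # features of the current iteration
    selected = int(S_M.predict(phi_i)[0])
    print(f"selected surrogate model index: {selected}")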

[Figure 1 diagram: datasets (A, T, P) from the data space undergo feature extraction
into the landscape feature space Φ(A, T, P); each surrogate model M ∈ M is trained
on T and its performance is assessed on P, giving the prediction error ε(M) ∈ E;
the selection mapping S : Φ → M is trained to minimize ε.]

Figure 1: Scheme of the metalearning approach to the surrogate model selection
system [14].


3      Feature Robustness

To investigate the robustness of feature computation against different samples of
points (in the sense of low variance of the feature values), several independent
archive realisations from the same distributions need to be available. To obtain such
realisations, we have created a new set of artificial distributions by smoothing the
distributions observed in real runs of the surrogate-assisted algorithm on a set of
benchmarks.
    First, we have generated a set of datasets D using independent runs of the
8 model settings from [13] for the DTS-CMA-ES algorithm [1,12] on the 24
noiseless single-objective benchmark functions from the COCO framework [4,5].
All runs were performed in dimensions 2, 3, 5, 10, and 20 on instances 11–15.
To gain 100 comparable archives using those runs, we have generated points
for new archives using the weighted sum of the original archive distributions from
D, where the weight vector w^(i) = (1/9)(0, . . . , 0, 1, 2, 3, 2, 1, 0, . . . , 0)^T, with
the nonzero entries 1, 2, 3, 2, 1 placed at positions i−2, i−1, i, i+1, i+2,
provides distribution smoothing across the available iterations¹. Second, for all
A^(i), T^(i), and P^(i) from D we have computed all features from the following
feature classes: Dispersion, Information Content, Levelset, Meta-Model, Nearest
Better Clustering, y-Distribution, and CMA features.
    Once the features are computed, the numbers of ±∞ and • values over the different
samples from one iteration are summarized, and the remaining feature values are
normalized to the [0, 1] range using feature minima and maxima over the whole D.
We then compare feature means and variances for individual iterations.
¹ The weighted sum of the original archive distributions satisfies
  Σ_{n=0}^{i_max} w_n^(i) N(m^(n), C^(n)) ∼ N(Σ_{n=0}^{i_max} w_n^(i) m^(n), Σ_{n=0}^{i_max} (w_n^(i))^2 C^(n)),
  where i_max is the maximal iteration reached by the particular original archive
  and m^(n) and C^(n) are the mean and covariance matrix in iteration n.
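    A minimal sketch of this smoothing step is shown below, assuming the per-iteration
means m^(n) and covariance matrices C^(n) have already been estimated from an original
archive; the truncation and renormalization of the weights near the first and last
iterations are our own assumption.

    import numpy as np

    def smoothed_archive_sample(means, covs, i, n_points, rng):
        """Sketch: sample points for iteration i of a new artificial archive from
        the weighted sum of the original per-iteration distributions N(m^(n), C^(n)),
        using the weights (1, 2, 3, 2, 1)/9 at iterations i-2 .. i+2 (truncated at
        the boundaries and renormalized -- an assumption of this sketch)."""
        i_max = len(means) - 1
        offsets = np.arange(-2, 3)
        base_w = np.array([1, 2, 3, 2, 1], dtype=float)
        valid = (i + offsets >= 0) & (i + offsets <= i_max)
        idx = (i + offsets)[valid]
        w = base_w[valid] / base_w[valid].sum()    # (1,2,3,2,1)/9 away from boundaries
        # Weighted sum of independent Gaussians:
        #   sum_n w_n N(m^(n), C^(n)) ~ N(sum_n w_n m^(n), sum_n w_n^2 C^(n))
        mean = sum(wn * means[n] for wn, n in zip(w, idx))
        cov = sum(wn ** 2 * covs[n] for wn, n in zip(w, idx))
        return rng.multivariate_normal(mean, cov, size=n_points)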


[Figure 2: four panels showing, for two example features, the 0.05, 0.5, and 0.95
quantiles of feature variance and the percentage of samples with ±∞ or • values,
plotted against the number of observations (first column) and against the data
density (second column).]

Figure 2: The dependencies of the 0.05, 0.5, and 0.95 quantiles of feature variance
and of the median number of ±∞ or • feature values on the number of observations
N for two features are shown in the plots in the first column. The dependencies of
the same statistics on the data density N^(1/D) are presented in the second column.
Plots in the first row represent statistics for the feature ϕ_med(A) – median distance
of the 'best' vs. 'all' objectives in A (from the Dispersion feature class) – and the
second row contains statistics for ϕ_εs(T ∪ P) – settling sensitivity of the information
content in T ∪ P (Information Content).


    Figure 2 shows the dependencies of the 0.05, 0.5, and 0.95 quantiles of feature
variance and of the number of ±∞ or • values on the number of observations N in the
considered set (A, T, or P) and on the data density N^(1/D) for two example features.
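    The per-iteration statistics behind Figure 2 can be obtained roughly as in the
following sketch (our simplification), which summarizes the raw values of one feature
over the archive samples of a single iteration; quantiles of the resulting variances
over the iterations then yield the plotted curves.

    import numpy as np

    def robustness_summary(values, f_min, f_max):
        """Sketch: summarize one feature over the archive samples of a single
        iteration. `values` is a 1-D array of raw feature values, possibly
        containing +-inf (and NaN standing for the symbol 'bullet'); f_min and
        f_max are the feature's minimum and maximum over the whole D, used for
        normalization to [0, 1]."""
        values = np.asarray(values, dtype=float)
        pct_inf = 100.0 * np.mean(np.isinf(values))
        pct_undef = 100.0 * np.mean(np.isnan(values))
        finite = values[np.isfinite(values)]
        if finite.size < 2 or f_max <= f_min:
            return {"pct_inf": pct_inf, "pct_undef": pct_undef, "variance": np.nan}
        normalized = (finite - f_min) / (f_max - f_min)   # rescale to [0, 1]
        return {"pct_inf": pct_inf, "pct_undef": pct_undef,
                "variance": float(np.var(normalized))}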
    The results show that most of the features are robust in the sense of having a
low variance, especially for higher numbers of observations. For lower values of N,
robustness is frequently not high, or the feature even cannot be calculated at all
(e. g., some of the Dispersion features). CMA features provided the most robust
results, probably due to the fact that most of them are sample independent. The
lowest variance values, but also high numbers of cases where the feature was
impossible to calculate, were observed for the Dispersion features.


4                         Conclusion

This extended abstract addressed adaptive learning of a suitable optimization
setting in black-box optimization, more precisely, adaptive learning of a surrogate
model in a surrogate-assisted version of the CMA-ES. Its main message is the
relationship of this kind of adaptive learning to landscape analysis. A formal
framework for the learning of a surrogate model based on landscape analysis is
given, and the considered kinds of landscape features are discussed. In the results
obtained so far, attention is paid in particular to feature robustness.
    This work in progress is part of a thorough investigation of the possibilities of
landscape analysis in the context of surrogate modelling for black-box optimiza-
tion. That investigation has already brought first results in the past [2,13,14],
but much still remains for further research.


Acknowledgements The reported research was supported by the Czech Sci-
ence Foundation grant No. 18-18080S and by the Grant Agency of the Czech
Technical University in Prague with its grant No. SGS20/183/OHK4/3T/14.
Further, access to computing and storage facilities owned by parties and projects
contributing to the National Grid Infrastructure MetaCentrum, provided under
the programme "Projects of Large Research, Development, and Innovations In-
frastructures" (CESNET LM2015042), is greatly appreciated.


References

 1. Bajer, L., Pitra, Z., Repický, J., Holeňa, M.: Gaussian process surrogate models for
    the CMA Evolution Strategy. Evolutionary Computation 27(4), 665–697 (2019)
 2. Dvořák, M.: Optimization of surrogate model settings using landscape analysis.
    Master’s thesis, Czech Technical University in Prague, Faculty of Information Tech-
    nology, Prague, Czech Republic (2020)
 3. Hansen, N.: The CMA evolution strategy: A comparing review. In: Towards a New
    Evolutionary Computation, pp. 75–102. No. 192 in Studies in Fuzziness and Soft
    Computing, Springer Berlin Heidelberg (Jan 2006)
 4. Hansen, N., Auger, A., Finck, S., Ros, R.: Real-parameter black-box optimization
    benchmarking 2012: Experimental setup. Tech. rep., INRIA (2012)
 5. Hansen, N., Finck, S., Ros, R., Auger, A.: Real-parameter black-box optimization
    benchmarking 2009: Noiseless functions definitions. Tech. Rep. RR-6829, INRIA
    (2009), updated February 2010
 6. Kerschke, P., Preuss, M., Wessing, S., Trautmann, H.: Detecting funnel struc-
    tures by means of exploratory landscape analysis. pp. 265–272. GECCO ’15, ACM
    (2015)
 7. Kerschke, P., Hoos, H.H., Neumann, F., Trautmann, H.: Automated algorithm
    selection: Survey and perspectives. Evolutionary computation 27(1), 3–45 (2019)
 8. Kerschke, P., Preuss, M., Hernández, C., Schütze, O., Sun, J.Q., Grimme, C.,
    Rudolph, G., Bischl, B., Trautmann, H.: Cell mapping techniques for exploratory
    landscape analysis. In: EVOLVE - A Bridge between Probability, Set Oriented
    Numerics, and Evolutionary Computation V. pp. 115–131. Springer International
    Publishing (2014)
 9. Lunacek, M., Whitley, D.: The dispersion metric and the cma evolution strategy.
    pp. 477–484. GECCO ’06, ACM (2006)
10. Mersmann, O., Bischl, B., Trautmann, H., Preuss, M., Weihs, C., Rudolph, G.:
    Exploratory landscape analysis. pp. 829–836. GECCO ’11, ACM (2011)
11. Muñoz, M.A., Kirley, M., Halgamuge, S.K.: Exploratory landscape analysis of con-
    tinuous space optimization problems using information content. IEEE Transactions
    on Evolutionary Computation 19(1), 74–87 (2015)
12. Pitra, Z., Bajer, L., Holeňa, M.: Doubly trained evolution control for the Surrogate
    CMA-ES. In: Proceedings of the PPSN XIV: 14th International Conference, Edin-
    burgh, UK, September 17-21. pp. 59–68. Springer International Publishing, Cham
    (2016)

13. Pitra, Z., Repický, J., Holeňa, M.: Landscape analysis of Gaussian process sur-
    rogates for the covariance matrix adaptation evolution strategy. pp. 691–699.
    GECCO ’19, ACM (2019)
14. Pitra, Z., Bajer, L., Holeňa, M.: Knowledge-based selection of Gaussian process
    surrogates. In: Kottke, D., Lemaire, V., Calma, A., Krempl, G., Holzinger, A.
    (eds.) ECML PKDD 2019: Workshop & Tutorial on Interactive Adaptive Learning.
    Proceedings. pp. 48–63. ECML PKDD 2019, Würzburg, Germany (Sep 2019)
15. Renau, Q., Dreo, J., Doerr, C., Doerr, B.: Expressiveness and robustness of land-
    scape features. In: Proceedings of the Genetic and Evolutionary Computation Con-
    ference Companion. pp. 2048–2051. GECCO '19, Association for Computing Ma-
    chinery, New York, NY, USA (2019)