CEUR Workshop Proceedings, Vol-1257, paper 4. https://ceur-ws.org/Vol-1257/paper4.pdf
    Towards an FCA-based Recommender System
            for Black-Box Optimization

           Josefine Asmus1,3 , Daniel Borchmann2 , Ivo F. Sbalzarini1,3 , and
                                   Dirk Walther2,3
     1
         MOSAIC Group, Center for Systems Biology Dresden (CSBD), Max Planck
           Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
             2
               TU Dresden, Theoretical Computer Science, Dresden, Germany.
         3
           Center for Advancing Electronics Dresden (cfAED), Dresden, Germany.
                                {asmus,ivos}@mpi-cbg.de
                    {Daniel.Borchmann,Dirk.Walther}@tu-dresden.de



          Abstract. Black-box optimization problems are of practical importance
          throughout science and engineering. Hundreds of algorithms and heuris-
          tics have been developed to solve them. However, none of them outper-
          forms any other on all problems. The success of a particular heuristic
          is always relative to a class of problems. So far, these problem classes
          are elusive and it is not known what algorithm to use on a given prob-
          lem. Here we describe the use of Formal Concept Analysis (FCA) to
          extract implications about problem classes and algorithm performance
          from databases of empirical benchmarks. We explain the idea in a small
          example and show that FCA produces meaningful implications. We fur-
          ther outline the use of attribute exploration to identify problem features
          that predict algorithm performance.



1        Introduction
Optimization problems are ubiquitous in science and engineering, including op-
timizing the parameters of a model [11], finding an optimal statistical estimator,
or finding the best operating conditions for an electronic circuit or a biochemical
network. An optimization problem is defined by a parameter space spanned by
the variables to be optimized, and an objective function defined over that space.
The goal is to find points in the parameter space where the objective function
has an extremum. In a black-box problem, the function is not known, but it
can be evaluated point-wise. The objective function is hence given as an oracle,
which, given a point in parameter space as an input, returns the objective func-
tion value at that point. Most practical applications in science and engineering
are black-box problems, where the objective function may comprise running a
numerical computer simulation or performing a laboratory measurement in order
to, e.g., determine how well an electronic circuit works.
    Partially funded by the German Research Foundation (DFG) via the Cluster of
    Excellence ‘Center for Advancing Electronics Dresden (cfAED)’. Supported by DFG
    Research Training Group 1763 (QuantLA).
    Owing to their practical importance, hundreds of algorithms and heuristics
have been developed to (approximately) solve black-box optimization problems.
The “No Free Lunch Theorem” [19] states that no algorithm performs better (i.e.,
finds the extremum in fewer iterations or finds a better extremum) than any other
algorithm on all problems. Algorithm performance is hence relative to a problem
or a class of problems. It is, however, unknown what these problem classes are,
and how one should choose a “good” algorithm given a certain problem instance.
    We aim at developing a recommender system for proposing efficient algo-
rithms for a given black-box optimization problem. Formal Concept Analysis
(FCA) [6] has previously been shown useful in recommender systems for rating
systems and advertisement [4, 8]. Here we present the idea of using attribute ex-
ploration in FCA to find discriminative attributes of empirical benchmark data
of different algorithms tested on different problems. We show that FCA produces
meaningful implications about problem features and algorithms that imply good
performance. Some of the implications found confirm hypotheses that are com-
monly known in the optimization community, others are novel. We also outline
the use of attribute exploration to discover sets of problem features that are
particularly predictive for algorithm performance. This is work in progress and
we are not presenting a final solution. We do, however, believe that FCA will
provide a powerful tool on the way towards a generic recommender system and
problem classification for black-box optimization.

2   Black-box Optimization Problems
We consider real-valued scalar black-box problems, modeled as an oracle function
f : Rd → R mapping a d-tuple x of real numbers (problem parameters) to a
scalar real number f (x) (objective function value). The function f is not assumed
to be explicitly known. A black-box optimization problem entails finding extremal
points of f using only point-wise function evaluations. The iterative sequence of
evaluations used to progress toward an extremum is called the search strategy.
    The Black-Box Optimization Benchmark (BBOB) test suite is a standard
collection of test functions used to empirically benchmark different optimization
heuristics [7]. The suite contains 24 benchmark functions with known ground
truth for parameter spaces of different dimensions. Figure 1 plots four example
functions in two dimensions.




     (a) Sphere       (b) Rastrigin sep.   (c) Weierstrass      (d) Katsuura

Fig. 1. Four black-box optimization benchmark (BBOB) functions in two dimensions.

   A problem instance is a triple (f, d, ε), where f is an instance of a
black-box function, d > 0 is the parameter space dimension, and ε ∈ R≥0 a
tolerance. Solving a problem instance (f, d, ε) means finding a d-tuple x ∈ Rd
such that |fmin − f(x)| < ε, where fmin is the global minimum of f. Note that
the ground-truth global minimum fmin is known for the benchmark functions,
but not in a practical problem instance.
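The setting above can be sketched in a few lines of code. This is an illustrative sketch only: the names `sphere` and `solved` are hypothetical and not part of the BBOB code base.

```python
def sphere(x):
    """Example black-box oracle: the Sphere benchmark (global minimum 0 at the origin).
    The optimizer may only call this function point-wise; its form stays hidden."""
    return sum(xi * xi for xi in x)

def solved(f, x, f_min, eps):
    """A d-tuple x solves the instance (f, d, eps) iff |f_min - f(x)| < eps."""
    return abs(f_min - f(x)) < eps
```

In a benchmark, `f_min` is the known ground-truth minimum; in a practical problem it is unavailable, which is why performance can only be measured on test suites of this kind.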

3     An FCA-based Recommender System
We propose a system for recommending suitable algorithms for a given black-
box optimization problem based on FCA. Similar recommender systems, albeit
not based on FCA, have been suggested [3, 9, 15], and FCA has been applied
to collaborative recommender systems for high-dimensional ratings [4] and in
contextual advertising [8]. To the best of our knowledge, the use of FCA in a
recommender system for black-box optimization has not been attempted so far.

3.1 Performance Measures
A run of an algorithm solving a problem instance is called successful. The per-
formance of an algorithm a on a problem instance p is measured by the expected
running time (ERT) [1, 3]:
    ERT(a, p) = N+(a, p) + ((1 − π+(a, p)) / π+(a, p)) · N−(a, p),

where N+(a, p) and N−(a, p) are the average numbers of function evaluations
for successful and unsuccessful runs of a on p, respectively, and π+(a, p) is the
ratio of the number of successful runs to the number of all runs of a on p.
    The problem instances (fj, d, ε), where j ∈ {1, ..., 24}, d ∈ {2, 3, 5, 10, 20, 40},
and ε ∈ E = {10^3, 10^1, 10^−1, 10^−3, 10^−5, 10^−8}, have been used to test over
112 algorithms; the dataset is available online.¹ Note that the algorithms treat
the benchmark functions as black boxes, i.e., they only evaluate the functions,
whereas the actual function remains hidden to the algorithm. The known global
minimum is only used to evaluate the performance of the algorithms.
    Often, an algorithm cannot solve a problem instance to every tolerance value
within the allotted number of function evaluations; see Table 1. To obtain a
performance measure also in these cases, we introduce a penalty of 10^6 for every
tolerance level that the algorithm could not solve. This penalized ERT (pERT)
of an algorithm a on an instance f of a benchmark function in d dimensions for
a set E of tolerances is here defined as:

    pERT(a, f, d, E) = ERT(a, (f, d, εbest)) / d + 10^6 · |{ε ∈ E | ERT(a, (f, d, ε)) = ∞}|,

where εbest is the smallest value ε ∈ E such that ERT(a, (f, d, ε)) ≠ ∞.
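The two measures can be sketched directly from these definitions. The function names and the dictionary encoding of "ERT value per tolerance" are our own illustrative choices, not BBOB conventions.

```python
import math

def ert(n_succ, n_fail, p_succ):
    """ERT = N+ + ((1 - pi+)/pi+) * N-; infinite if no run succeeded (pi+ = 0)."""
    if p_succ == 0:
        return math.inf
    return n_succ + (1 - p_succ) / p_succ * n_fail

def pert(ert_by_tol, d, penalty=10**6):
    """Penalized ERT: ERT at the smallest solved tolerance, divided by d,
    plus a penalty for every tolerance level with infinite ERT."""
    finite = {eps: v for eps, v in ert_by_tol.items() if math.isfinite(v)}
    n_failed = len(ert_by_tol) - len(finite)
    eps_best = min(finite)  # smallest tolerance that was actually solved
    return finite[eps_best] / d + penalty * n_failed
```

With the values of Table 1 (d = 10), `pert` reproduces the pERT column, e.g. 872/10 = 87.2 for (1+1)-CMA-ES on Sphere, and 1 456 464/10 + 4 · 10^6 = 4 145 646.4 on separable Rastrigin, where four tolerance levels remained unsolved.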

3.2 Benchmark Data Used
Table 1 shows the performance values of four algorithms on the benchmark func-
tions Sphere and Rastrigin separable in 10 dimensions. The algorithm (1+1)-
CMA-ES implements an evolutionary strategy with covariance matrix adapta-
tion [2]. We also include two variants of Particle Swarm Optimization (PSO):
¹ http://coco.gforge.inria.fr
                              ERT                                              pERT
Tolerance       10^3      10^1     10^−1     10^−3     10^−5     10^−8
Sphere
(1+1)-CMA-ES       1        90       267       440       611       872         87.2
PSO                1       221     2 319     4 911     7 659    11 865      1 186.5
PSO-BFGS           1        68        74        74        74        75          7.5
Simplex            1     1 596     3 450     7 283    18 418    80 428      8 042.8
Rastrigin separable
(1+1)-CMA-ES       4 1 456 464         ∞         ∞         ∞         ∞  4 145 646.4
PSO                4   259 992 6 532 067 6 533 720 6 537 337 6 542 104    654 210.4
PSO-BFGS           2   120 255         ∞         ∞         ∞         ∞  4 012 025.5
Simplex            1         ∞         ∞         ∞         ∞         ∞  5 000 000.1

Table 1. ERT and pERT performance values of four algorithms on the functions Sphere
and Rastrigin separable in 10 dimensions.


standard PSO [5] and PSO-BFGS, a hybrid of PSO and the Broyden-Fletcher-
Goldfarb-Shanno (BFGS) algorithm [18]. The fourth algorithm is the classical
Simplex algorithm [16, 17]. Every algorithm solves the problem instance for the
Sphere function to all tolerance values, whereas on the separable Rastrigin
function only PSO does so, albeit with much higher ERT. The last column
shows the pERT values that account for any failed tolerance level with a penalty.
   Functions can be characterized by features. Several high-level features have
been hypothesized to predict black-box optimization algorithm performance,
such as multi-modality, global structure, and separability [10]. Table 2 lists
the extent to which different BBOB test functions exhibit these features.


          Function              multi-modality  global structure  separability
          Sphere                none            none              high
          Rastrigin separable   high            strong            high
          Weierstrass           high            medium            none
          Katsuura              high            none              none

         Table 2. Three high-level features of four BBOB benchmark functions.


Multi-modality refers to the number of local minima of a function.
    Global structure refers to the structure that emerges when considering only
minima (in a minimization problem), i.e., the point set left after deleting all
non-optimal points. Separability specifies whether a function is a Cartesian
product of one-dimensional functions. For instance, the Sphere function
(Fig. 1(a)) is unimodal and, therefore, has no global structure (only one point
remains when restricting to minima), and it is highly separable. The separable
Rastrigin function (cf. Fig. 1(b)) is separable, highly multimodal, and has a
strong global structure (the set of local minima forms a convex funnel).
3.3    Results

Different numerical measures have been suggested to estimate the features of a
black-box function from sample evaluations. These include fitness distance cor-
relations [13] and ICoFis, a method based on information content [14]. Further-
more, in [9] it is shown that in several cases it is possible to predict the high-level
features of Table 2 from low-level features that can be computed from samples
only. Based on these observations, we develop an FCA-based recommender sys-
tem that suggests well-performing algorithms for a given black-box optimization
problem by extracting implications from the BBOB database. The recommender
system consists of three steps; cf. Algorithm 3.1. In Step 1, features of the black-
box function that is given as an input are estimated by drawing samples from
the function. In Step 2, these features are used to determine the features of al-
gorithms that perform well on such functions. In Step 3, a specific algorithm is
chosen that has these algorithm features. Determining good algorithm features

Algorithm 3.1 Recommender System
Input: unknown black-box function from which samples can be drawn
Output: algorithm which is suitable for optimizing input function
   Step 1: estimate features of input function
   Step 2: infer features of algorithms that perform well on functions with such features
   Step 3: pick an algorithm that has these features


(Step 2) is done using implications that form the background knowledge of the
recommender system. The premise of an implication contains algorithm features
together with features of benchmark functions, whereas the conclusion consists
of attributes corresponding to the performances of the algorithms when solving
the functions. The implications are obtained from the BBOB database by FCA.
More precisely, we build a formal context K = (G, M, I), where G is the set of
objects consisting of all pairs of benchmark functions and algorithms, M is the
set of attributes consisting of the features of algorithms and benchmark functions
together with the attributes assigned to performance values, and I is the inci-
dence matrix between the objects in G and the attributes in M . We obtain the
implications from the context K by computing the canonical base (or another
implicational base) of K.
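The validity check underlying such an implicational base can be sketched as follows. The encoding is our own assumption: the context K is a dictionary mapping objects (benchmark function, algorithm) to attribute sets, with attribute names as in the implications below; only a small fragment of the full context is shown.

```python
# Fragment of the formal context K = (G, M, I): one row per (function, algorithm)
# pair, encoded as the set of attributes incident with that object.
K = {
    ("Sphere", "PSO"):      {"PSO", "sep high", "multimod none", "globstruct none", "P0"},
    ("Rastrigin", "PSO"):   {"PSO", "sep high", "multimod high", "globstruct strong", "P0"},
    ("Weierstrass", "PSO"): {"PSO", "sep none", "multimod high", "globstruct med", "P6"},
    ("Katsuura", "PSO"):    {"PSO", "sep none", "multimod high", "globstruct none", "P6"},
}

def implication_holds(ctx, premise, conclusion):
    """An implication A -> B is valid in a formal context iff every object
    whose attribute set contains A also contains B."""
    return all(conclusion <= attrs for attrs in ctx.values() if premise <= attrs)
```

For example, `implication_holds(K, {"PSO", "sep high"}, {"P0"})` is true on this fragment, while `implication_holds(K, {"PSO"}, {"P0"})` is not; the canonical base consists of exactly the valid implications generated from the pseudo-intents of K.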
    We illustrate our approach in Table 3. The objects are pairs of benchmark
functions and algorithms, as introduced in Sec. 3.2. The attributes are features
of algorithms, performance attributes Pi, for i ∈ {0, 5, 6, 7},² and the function
features (Table 2). As algorithm features we use the respective names of the
algorithms together with the features deterministic and hybrid (the latter stating
that the algorithm combines different search strategies). We consider Table 3 as
a formal context and compute its canonical base to obtain the implications:
² To obtain a binary table, we partition the range of pERT values into 10 intervals
  and introduce attribute names Pi for the intervals.
    { (1+1)-CMA-ES, multimod high } → { P6 }
    { PSO, sep high } → { P0 }
    { PSO, sep none, multimod high } → { P6 }
    { PSO-BFGS, sep high, globstruct strong, multimod high } → { P6 }
    { sep high, globstruct none } → { P0 }
    { sep none, multimod high, globstruct none, Alg hybrid } → { P5 }
    { globstruct med } → { P6 }
Based on this background knowledge, the recommender system would, e.g., suggest
using the algorithm PSO for functions with high separability. If additionally
the function does not exhibit a global structure (e.g., the function is unimodal),
then any algorithm would do well. For functions with high multi-modality, no
separability, and no global structure, the system would recommend a hybrid
algorithm over (1+1)-CMA-ES, confirming previous observations [12].
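Steps 2 and 3 of Algorithm 3.1 can be sketched on top of such a background knowledge base. This is a hypothetical sketch: the implication list is the subset of the implications above that we need here, and `closure` and `recommend` are illustrative names, not part of an existing FCA library.

```python
# Background knowledge: implications as (premise, conclusion) pairs.
IMPLICATIONS = [
    ({"PSO", "sep high"}, {"P0"}),
    ({"sep high", "globstruct none"}, {"P0"}),
    ({"sep none", "multimod high", "globstruct none", "Alg hybrid"}, {"P5"}),
]

def closure(implications, attrs):
    """Smallest attribute set containing attrs and closed under the implications."""
    attrs, changed = set(attrs), True
    while changed:
        changed = False
        for prem, concl in implications:
            if prem <= attrs and not concl <= attrs:
                attrs |= concl
                changed = True
    return attrs

def recommend(implications, function_features, candidates, good):
    """Return the candidate algorithm attribute sets whose closure together
    with the estimated function features contains a good performance attribute."""
    return [alg for alg in candidates
            if good & closure(implications, function_features | alg)]
```

For a highly separable function, `recommend(IMPLICATIONS, {"sep high"}, [{"PSO"}, {"Simplex"}], {"P0"})` singles out PSO, matching the recommendation derived in the text.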
   We note that the quality of the recommendations made by the system de-
pends on the “quality” of the implications constituting the background knowl-
edge. To obtain better recommendations (Step 3), suitable features of the input
function need to be efficiently computable (Step 1) and the implications need to
form a basis of a sufficient benchmark set (Step 2).

4    Discovering Function Features
It is not clear that the function features currently used in the optimization
community define meaningful problem classes with respect to expected algorithm
performance. It is an open and non-trivial problem to discover new features for
this purpose. We sketch a method that could assist an expert in finding new
features using attribute exploration from FCA [6]. This goal-oriented approach
may be more practical than an undirected synthesis of new features.
    In attribute exploration, we start from an initial formal context K and a set
S of valid implications of K. Then, we compute new implications A → B such
that A → B is valid in K, but does not follow from S, where A and B are sets
of attributes. In case such an implication can be found, a (human) expert has to
confirm or reject A → B. In the former case, A → B is added to S. If the expert
rejects A → B, s/he has to provide a counterexample that is then added to K.
If no new implications can be computed any more, the search terminates.
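One step of this loop can be sketched as follows. This is a naive stand-in for attribute exploration, not Ganter's next-closure algorithm: it brute-forces over all premises and is feasible only for tiny contexts; all function names are our own.

```python
from itertools import chain, combinations

def intent(ctx, attrs):
    """A'': all attributes shared by every object that has all of attrs."""
    rows = [a for a in ctx.values() if attrs <= a]
    return set.intersection(*rows) if rows else set().union(*ctx.values())

def closure(implications, attrs):
    """Close an attribute set under implications (premise, conclusion)."""
    attrs, changed = set(attrs), True
    while changed:
        changed = False
        for prem, concl in implications:
            if prem <= attrs and not concl <= attrs:
                attrs |= concl
                changed = True
    return attrs

def next_question(ctx, accepted):
    """Some implication A -> (A'' \\ A) valid in ctx but not entailed by the
    accepted implications; None when the exploration has terminated."""
    M = sorted(set().union(*ctx.values()))
    for A in chain.from_iterable(combinations(M, r) for r in range(len(M) + 1)):
        A = set(A)
        if not intent(ctx, A) <= closure(accepted, A):
            return A, intent(ctx, A) - A
    return None
```

Each question returned by `next_question` would be posed to the expert, who either accepts it (extending the accepted set) or supplies a counterexample (extending the context), exactly as described above.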
    Consider the context Kfeatures = (G, M, I), where G is the set of benchmark
functions, M the set of function features, and the incidence relation I represents
the fact that a function has a certain feature. Applying attribute exploration
directly to Kfeatures would not work, as we are seeking new attributes (features),
and not new objects (benchmark functions). Since objects and attributes be-
have symmetrically in every formal context, however, we can apply attribute
exploration to the dual context Kfeatures^−1 = (M, G, I^−1).
    Implications from Kfeatures^−1 are of the form { fi1, ..., fij } → { fij+1 },
where fik ∈ G.
Presenting such implications to an expert means asking whether all features that
Testfunction   | Algorithm    | CMA PSO BFG Smp | det hyb | sHi sNo | mNo mHi | gNo gMd gSt | P0  P5  P6  P7
Sphere         | (1+1)-CMA-ES |  ×              |         |  ×      |  ×      |  ×          |  ×
               | PSO          |      ×          |         |  ×      |  ×      |  ×          |  ×
               | PSO-BFGS     |          ×      |      ×  |  ×      |  ×      |  ×          |  ×
               | Simplex      |              ×  |  ×      |  ×      |  ×      |  ×          |  ×
Rastrigin sep. | (1+1)-CMA-ES |  ×              |         |  ×      |      ×  |          ×  |          ×
               | PSO          |      ×          |         |  ×      |      ×  |          ×  |  ×
               | PSO-BFGS     |          ×      |      ×  |  ×      |      ×  |          ×  |          ×
               | Simplex      |              ×  |  ×      |  ×      |      ×  |          ×  |              ×
Weierstrass    | (1+1)-CMA-ES |  ×              |         |      ×  |      ×  |      ×      |          ×
               | PSO          |      ×          |         |      ×  |      ×  |      ×      |          ×
               | PSO-BFGS     |          ×      |      ×  |      ×  |      ×  |      ×      |          ×
               | Simplex      |              ×  |  ×      |      ×  |      ×  |      ×      |          ×
Katsuura       | (1+1)-CMA-ES |  ×              |         |      ×  |      ×  |  ×          |          ×
               | PSO          |      ×          |         |      ×  |      ×  |  ×          |          ×
               | PSO-BFGS     |          ×      |      ×  |      ×  |      ×  |  ×          |      ×
               | Simplex      |              ×  |  ×      |      ×  |      ×  |  ×          |      ×

Table 3. Example context (× = incidence). Abbreviations: CMA = (1+1)-CMA-ES, BFG =
PSO-BFGS, Smp = Simplex; det/hyb = deterministic/hybrid algorithm; sHi/sNo = high/no
separability; mNo/mHi = no/high multi-modality; gNo/gMd/gSt = no/medium/strong global
structure; P0, P5, P6, P7 = performance attributes.


fi1, ..., fij have in common are also features of the function fij+1. A counterexample
would then correspond to a new feature that fi1, ..., fij have in common, but
that is not shared by fij+1. These are the features we are looking for.
     To illustrate the method, suppose that the context Kfeatures contains the
same functions as in Table 3, but multi-modality as the only attribute. Then,
using our approach, the expert would need to judge the implication
{ Rastrigin } → { Weierstrass }, and s/he could reject it by providing a “new”
feature, global structure strong, which the separable Rastrigin function has but
the Weierstrass function does not (counterexample).

5    Conclusions
We have outlined the use of FCA in building a recommender system for black-
box optimization. The implications resulting from the small, illustrative example
presented here are meaningful in that they both confirm known facts and suggest
new ones. Attribute exploration could moreover be a powerful tool for discovering
new features that are more predictive of the expected performance.
    The ideas presented here are work in progress. Ongoing and future work
will address the computability/decidability of features on black-box functions,
and evaluate the proposed FCA-based recommender system using the BBOB
database. We will investigate novel function and algorithm features, and compare
the results obtained by FCA with results from clustering and regression analysis.
References
 1. A. Auger and N. Hansen. Performance evaluation of an advanced local search
    evolutionary algorithm. In Proc. of CEC-2005, pp. 1777–1784, 2005.
 2. A. Auger and N. Hansen. Benchmarking the (1+1)-CMA-ES on the BBOB-2009
    function testbed. In Proc. of GECCO-2009, pp. 2459–2466. ACM, 2009.
 3. B. Bischl, O. Mersmann, H. Trautmann, and M. Preuss. Algorithm selection based
    on exploratory landscape analysis and cost-sensitive learning. In Proc. of GECCO-
    12, pp. 313–320. ACM, 2012.
 4. P. du Boucher-Ryan and D. Bridge. Collaborative recommending using formal
    concept analysis. Knowledge-Based Systems, 19(5):309–315, 2006.
 5. M. El-Abd and M. S. Kamel. Black-box optimization benchmarking for noiseless
    function testbed using particle swarm optimization. In Proc. of GECCO-2009, pp.
    2269–2274. ACM, 2009.
 6. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations.
    Springer, 1999.
 7. N. Hansen, S. Finck, R. Ros, A. Auger, et al. Real-parameter black-box optimiza-
    tion benchmarking 2009: Noiseless functions definitions. 2009.
 8. D. I. Ignatov, S. O. Kuznetsov, and J. Poelmans. Concept-based biclustering for
    internet advertisement. In ICDM Workshop, 2012, pp. 123–130, 2012.
 9. O. Mersmann, B. Bischl, H. Trautmann, M. Preuss, C. Weihs, and G. Rudolph.
    Exploratory landscape analysis. In Proc. of GECCO-11, pp. 829–836. ACM, 2011.
10. O. Mersmann, M. Preuss, and H. Trautmann. Benchmarking evolutionary algo-
    rithms: Towards exploratory landscape analysis. Springer, 2010.
11. C. L. Müller, R. Ramaswamy, and I. F. Sbalzarini. Global parameter identification
    of stochastic reaction networks from single trajectories. In Advances in Systems
    Biology, volume 736, chapter 28, pp. 477–498. Springer, 2012.
12. C. L. Müller and I. F. Sbalzarini. A tunable real-world multi-funnel benchmark
    problem for evolutionary optimization (and why parallel island models might rem-
    edy the failure of CMA-ES on it). In Proc. of IJCCI-2009, pp. 248–253, 2009.
13. C. L. Müller and I. F. Sbalzarini. Global characterization of the CEC 2005 fitness
    landscapes using fitness-distance analysis. In Proc. EvoStar, volume 6624 of Lecture
    Notes in Computer Science, pp. 294–303. Springer, 2011.
14. M. Munoz, M. Kirley, and S. Halgamuge. Exploratory landscape analysis of con-
    tinuous space optimization problems using information content. IEEE Trans. Evol.
    Comput., 2014.
15. M. A. Muñoz, M. Kirley, and S. K. Halgamuge. A meta-learning prediction model
    of algorithm performance for continuous optimization problems. In Proc. of PPSN-
    2012, pp. 226–235. Springer, 2012.
16. J. A. Nelder and R. Mead. A simplex method for function minimization. The
    Computer Journal, 7(4):308–313, 1965.
17. L. Pál. Comparison of multistart global optimization algorithms on the BBOB
    noiseless testbed. In Proc. of GECCO-2013, pp. 1153–1160, 2013.
18. C. Voglis, G. S. Piperagkas, K. E. Parsopoulos, D. G. Papageorgiou, and I. E.
    Lagaris. MEMPSODE: comparing particle swarm optimization and differential
    evolution within a hybrid memetic global optimization framework. In Proc. of
    GECCO-2012, pp. 253–260. ACM, 2012.
19. D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization.
    IEEE Trans. Evol. Comput., 1(1):67–82, 1997.