=Paper=
{{Paper
|id=None
|storemode=property
|title=Combining Meta-Learning and Optimization Algorithms for Parameter Selection
|pdfUrl=https://ceur-ws.org/Vol-950/planlearn2012_submission_1.pdf
|volume=Vol-950
}}
==Combining Meta-Learning and Optimization Algorithms for Parameter Selection==
<pdf width="1500px">https://ceur-ws.org/Vol-950/planlearn2012_submission_1.pdf</pdf>
<pre>
Combining Meta-Learning and Optimization Algorithms
              for Parameter Selection
                  T. Gomes and P. Miranda and R. Prudêncio1 and C. Soares2 and A. Carvalho3


Abstract. In this article we investigate the combination of meta-             recommendation of parameters is made. It must be observed that
learning and optimization algorithms for parameter selection. We              meta-learning however is very dependent on the quality of its meta-
discuss our general proposal as well as present the recent develop-           examples. It is usually difficult obtaining good results since meta-
ments and experiments performed using Support Vector Machines                 features are in general very noisy and the number of problems avail-
(SVMs). Meta-learning was combined to single and multi-objective              able for meta-example generation is commonly limited.
optimization techniques to select SVM parameters. The hybrid meth-               As discussed in [4], good solutions to a particular search problem
ods derived from the proposal presented better results on predictive          can be used to indicate promising regions of the search space for
accuracy than the use of traditional optimization techniques.                 similar problems. Related ideas have been applied to improve opti-
                                                                              mization tasks but in very different contexts (e.g. job shop scheduling
                                                                              [4]). The positive results in these contexts motivated us to apply sim-
1     Introduction                                                            ilar ideas for optimizing learning parameters. Here, we present the
                                                                              combination of optimization techniques and meta-learning for the
The induction of a machine learning model with a good predictive
                                                                              problem of parameter selection. Meta-learning is used to suggest an
accuracy to solve a learning problem is influenced by a variety of
                                                                              initial set of solutions, which are then refined by a search technique.
aspects, such as data pre-preprocessing, algorithm selection, param-
                                                                              In previous work, the search process starts by evaluating random so-
eter optimization and training procedure. The study presented in this
                                                                              lutions from the parameter space. In the proposed hybrid approach,
paper focuses on a specific and relevant step of modeling: parameter
                                                                              the search process starts with successful solutions from previous sim-
selection. Once a learning algorithm is chosen, the user has to de-
                                                                              ilar problems. Hence, we expect that meta-learning guides the search
fine its parameter values. Learning performance is usually affected
                                                                              directly to promising regions of the search space, thus speeding up
by a poor selection of these values. For instance, the performance of
                                                                              the convergence to good solutions.
SVMs depends on the adequate choice of its kernel function, kernel
parameters, regularization constant, among other aspects [2].                                         Initial
                                                                                                      Candidates
   Parameter selection is treated by many authors as an optimization          Input   -      ML                    -       Search        - Best
problem in which a search technique is employed to find the configu-          Problem                                                       Parameters
ration of parameters which maximizes the learning performance esti-                            6                                    6
mated on the problem at hand. There is an extensive literature apply-               Meta-                     Candidate              Estimated
ing optimization algorithms for parameter selection, especially for                 Examples                  Parameters             Performance
Artificial Neural Networks. Although it represents a systematic ap-                                                        ?
proach to parameter selection, this approach can be very expensive,                         MDB                            SVM
since a large number of candidate parameter configurations must be
evaluated to ensure that an optimal, or at least reasonably good, set
of parameters is found [6].                                                                       Figure 1.   General Architecture
   Meta-learning, originally proposed for algorithm selection, has
also been adapted to parameter selection (e.g., for SVM [6, 1]). In
this approach, the choice of parameter values for a given task is
based on parameter values sucessfully adopted in similar problems.
Each meta-example in this solution includes: (1) a set of characteris-
tics (called meta-features) describing a learning problem; and (2) the        2 Developed Research
best configuration of parameters (among a set of candidates) tested           Figure 1 shows the general architecture of the proposed solution. Ini-
on that problem. A meta-learner is used to acquire knowledge from a           tially, the Meta-Learner (ML) module retrieves a predefined number
set of such meta-examples in order to recommend (predict) adequate            of past meta-examples stored in a Meta-Database (MDB), selected
parameters for new problems based on their meta-features.                     on the basis of their similarity to the input problem. Next, the Search
   Compared to the optimization approach, meta-learning tends to              module adopts as initial search points the configurations of success-
be computationally more efficient, at least at the moment when the            ful parameter values on the retrieved meta-examples. In the Search
1                                                                             module, a search process iteratively generates new candidate values
       Universidade   Federal    de      Pernambuco,      Brazil,    email:
    {tafg,pbcm,rbcp}@cin.ufpe.br                                              for the SVM parameters. The final solution which is recommended
2 Universidade do Porto, Portugal, email: csoares@fep.up.pt                   by the system is the best one generated by the Search module up to
3 Universidade de São Paulo, São Carlos, Brazil, email: andre@icmc.usp.br
                                                                              its convergence or other stopping criteria.
   In [3], we performed experiments that evaluated the proposed hy-                                     meta-learning. It converges earlier to solutions with similar NMSE
brid method using Particle Swarm Optimization (PSO) in the Search                                       values compared to the best configurations observed in the 40 prob-
module. The system was empirically tested on the selection of two                                       lems. There is an additional cost in recommending the configurations
parameters for SVMs on regression problems: the γ parameter of                                          by the hybrid approach which is the cost of the meta-learning initial-
the RBF kernel and the regularization constant C, which may have                                        ization (specially the cost of computing the meta-features). However,
a strong influence in SVM performance. A database of 40 meta-                                           we deployed meta-features with a low computational cost.
examples was produced from the evaluation of a set of 399 config-                                          In [5], we extended the previous work to perform Multi-Objective
urations of (γ, C) on 40 different regression problems. Each meta-                                      Optimization (MOO) of SVM parameters. The Multi-Objective PSO
example refers to a single regression problem, which was described                                      (MOPSO) algorithm was used to optimize the parameters (γ, C) re-
in our work by 17 meta-features (see [3] for details).                                                  garding two conflicting objectives: complexity (number of support
                                                                                                        vectors) and success rate. We evaluated the MOPSO in two differ-
                                                                                                        ent versions: (1) MOPSO with initial population suggested by ML
                0.6
                                                                                   Hybrid−PSO
                                                                                                        (Hybrid MOPSO) and (2) MOPSO with random initial population.
                                                                                   Meta−Learning        In the meta-learning module, for each similar problem retrieved, we
                                                                                   PSO
               0.55
                                                                                                        generated a Pareto Front (a set of non-dominated solutions) by ap-
                                                                                                        plying the dominance evaluation to the 399 configurations of SVM
                0.5
                                                                                 Default LibSVM         parameters considered. In order to suggest an initial population, we
                                                                                                        select one random solution of each produced Pareto Front.
Minimum NMSE


               0.45
                                                                                                           In our experiments, the final Pareto Fronts optimized by the
                                                                                                        MOPSO and the Hybrid MOPSO were evaluated using three metrics
                0.4
                                                                                                        for MOO problems: Spacing, Hypervolume and Spread. The pro-
                                                                                                        posed hybrid approach was able to generate better comparative re-
               0.35
                                                                                                        sults, considering the Spacing and Hypervolume metrics. Regarding
                                                                                                        the maximum Spread, our approach lost in first generations, but was
                0.3
                          Best Configuration                                                            similar to MOPSO in the last generations.
               0.25
                      0      2       4      6       8     10     12      14       16      18       20
                                          Number of recommended configurations                          3 Conclusion
                                                                                                        The combination of meta-learning and optimization techniques
                                                                                                        showed promising results for SVM parameter values selection. The
               Figure 2.         NMSE result obtained at each recommended configuration                 proposed approach can be easily adapted to other learning algorithms
                                                                                                        (e.g., Artificial Neural Networks). A number of aspects need to be in-
                                                                                                        vestigated in our proposed solution such as alternative strategies to
                                                                                                        integrate meta-learning in the optimization process. For instance, not
                                                                                                        only the best solutions to similar problems can be considered, but
   Figure 2 compares the minimum NMSE (averaged over the 40
                                                                                                        also diverse solutions in the search space. Additionally, the limita-
problems) obtained by SVM using the parameters suggested by com-
                                                                                                        tions of the individual components (as usual in hybrid systems) need
bining meta-learning and PSO, referred to as Hybrid-PSO (using 5
                                                                                                        to be dealt with. For instance, new strategies to augment the number
initial solutions recommended by meta-learning), and the two meth-
                                                                                                        of datasets for meta-learning can improve the learning performance
ods individually, PSO (with random initialization, population size =
                                                                                                        in our context.
5) and meta-learning (which recommends the best configuration of
each retrieved meta-example). We also present in Figure 2 the aver-
age NMSE achieved by the default heuristic adopted by the LibSVM                                        Acknowledgments: The authors would like to thank CNPq, CAPES,
tool (γ = inverse of the number of attributes and C=1). Finally, Fig-                                   FAPESP and FACEPE (Brazilian Agencies) and FCT (Portuguese
ure 2 shows the average NMSE that would be achieved if the best                                         Agency) for their financial support.
parameter configuration had been chosen on each problem.
   By comparing PSO and meta-learning, we identified a trade-off in                                     REFERENCES
their relative performances. Meta-learning is better than PSO for a                                     [1] S. Ali and K. Smith-Miles, ‘A meta-learning approach to automatic ker-
small number of recommended parameter configurations. It is also                                            nel selection for support vector machines’, Neurocomputing, 70(1-3),
better than the default LibSVM parameters. Hence, meta-learning                                             173–186, (2006).
alone would be indicated in situations in which the SVM user had                                        [2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector
strong resources constraints. In these situations, meta-learning could                                      Machines and Other Kernel-Based Learning Methods, Cambridge Uni-
                                                                                                            versity Press, 2000.
recommend a lower number of configurations with intermediate per-                                       [3] T. Gomes, R. B. C. Prudêncio, C. Soares, A. Rossi, , and A. Carvalho,
formance levels. PSO in turn is able to find better configurations                                          ‘Combining meta-learning and search techniques to select parameters for
along its search and then it is more adequate if a higher number of                                         support vector machines’, Neurocomputing, 75, 3–13, (2012).
configurations can be tested.                                                                           [4] S. Louis and J. McDonnell, ‘Learning with case-injected genetic algo-
                                                                                                            rithms’, IEEE Trans. on Evolutionary Computation, 8, 316–328, (2004).
   The Hybrid-PSO was able to combine the advantages of its com-                                        [5] P. Miranda, R. B. C. Prudêncio, C. Soares, and A. Carvalho, ‘Multi-
ponents. The performance of the Hybrid-PSO in the initial five rec-                                         objective optimization and meta-learning for svm parameter selection’,
ommended configurations is of course the same as the performance                                            in IJCNN 2012 (to appear), (2012).
of meta-learning (since the initial configurations are recommended                                      [6] C. Soares, P. Brazdil, and P. Kuba, ‘A meta-learning approach to select
by meta-learning). From that point of the curve, the Hybrid-PSO con-                                        the kernel width in support vector regression’, Machine Learning, 54(3),
                                                                                                            195–209, (2004).
sistently achieves better results compared to both the PSO and the

</pre>