=Paper=
{{Paper
|id=None
|storemode=property
|title=Combining Meta-Learning and Optimization Algorithms for Parameter Selection
|pdfUrl=https://ceur-ws.org/Vol-950/planlearn2012_submission_1.pdf
|volume=Vol-950
}}
==Combining Meta-Learning and Optimization Algorithms for Parameter Selection==
T. Gomes, P. Miranda and R. Prudêncio (Universidade Federal de Pernambuco, Brazil, email: {tafg,pbcm,rbcp}@cin.ufpe.br), C. Soares (Universidade do Porto, Portugal, email: csoares@fep.up.pt) and A. Carvalho (Universidade de São Paulo, São Carlos, Brazil, email: andre@icmc.usp.br)

Abstract. In this article we investigate the combination of meta-learning and optimization algorithms for parameter selection. We discuss our general proposal and present recent developments and experiments performed with Support Vector Machines (SVMs). Meta-learning was combined with single- and multi-objective optimization techniques to select SVM parameters. The hybrid methods derived from the proposal achieved better predictive accuracy than traditional optimization techniques used alone.

1 Introduction

The induction of a machine learning model with good predictive accuracy for a learning problem is influenced by a variety of aspects, such as data pre-processing, algorithm selection, parameter optimization and the training procedure. The study presented in this paper focuses on a specific and relevant step of modeling: parameter selection. Once a learning algorithm is chosen, the user has to define its parameter values, and learning performance usually suffers from a poor selection of these values. For instance, the performance of SVMs depends on an adequate choice of the kernel function, the kernel parameters and the regularization constant, among other aspects [2].

Parameter selection is treated by many authors as an optimization problem, in which a search technique is employed to find the configuration of parameters that maximizes the learning performance estimated on the problem at hand. There is an extensive literature applying optimization algorithms to parameter selection, especially for Artificial Neural Networks. Although this is a systematic approach to parameter selection, it can be very expensive, since a large number of candidate parameter configurations must be evaluated to ensure that an optimal, or at least reasonably good, set of parameters is found [6].

Meta-learning, originally proposed for algorithm selection, has also been adapted to parameter selection (e.g., for SVMs [6, 1]). In this approach, the choice of parameter values for a given task is based on parameter values successfully adopted in similar problems. Each meta-example in this solution includes: (1) a set of characteristics (called meta-features) describing a learning problem; and (2) the best configuration of parameters (among a set of candidates) tested on that problem. A meta-learner is used to acquire knowledge from a set of such meta-examples in order to recommend (predict) adequate parameters for new problems based on their meta-features.

Compared to the optimization approach, meta-learning tends to be computationally more efficient, at least at the moment when the recommendation of parameters is made. It must be observed, however, that meta-learning is very dependent on the quality of its meta-examples. It is usually difficult to obtain good results, since meta-features are in general very noisy and the number of problems available for meta-example generation is commonly limited.

As discussed in [4], good solutions to a particular search problem can be used to indicate promising regions of the search space for similar problems. Related ideas have been applied to improve optimization tasks, although in very different contexts (e.g. job shop scheduling [4]). The positive results in these contexts motivated us to apply similar ideas to the optimization of learning parameters. Here, we present the combination of optimization techniques and meta-learning for the problem of parameter selection. Meta-learning is used to suggest an initial set of solutions, which are then refined by a search technique. In previous work, the search process started by evaluating random solutions from the parameter space. In the proposed hybrid approach, the search process starts with successful solutions from previous similar problems. Hence, we expect meta-learning to guide the search directly to promising regions of the search space, thus speeding up the convergence to good solutions.

2 Developed Research

[Figure 1. General Architecture: the diagram links the Input Problem to the ML module (backed by the MDB), which passes initial candidate parameters to the Search module; the Search module exchanges candidate parameters and estimated performance with the SVM and outputs the best parameters.]

Figure 1 shows the general architecture of the proposed solution. Initially, the Meta-Learner (ML) module retrieves a predefined number of past meta-examples stored in a Meta-Database (MDB), selected on the basis of their similarity to the input problem. Next, the Search module adopts as initial search points the configurations of parameter values that were successful on the retrieved meta-examples. In the Search module, a search process iteratively generates new candidate values for the SVM parameters. The final solution recommended by the system is the best one generated by the Search module up to its convergence or another stopping criterion.

In [3], we performed experiments that evaluated the proposed hybrid method using Particle Swarm Optimization (PSO) in the Search module. The system was empirically tested on the selection of two parameters for SVMs on regression problems: the γ parameter of the RBF kernel and the regularization constant C, both of which may have a strong influence on SVM performance.
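The retrieval-then-refinement loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the meta-database layout, the function names, the Euclidean similarity measure and the PSO constants are all assumptions, and the `fitness` callable stands in for the estimated SVM error on the input problem.

```python
import math
import random

def retrieve_similar(meta_db, query_features, k):
    """ML module: rank stored meta-examples by Euclidean distance between
    meta-feature vectors and return the best (gamma, C) configurations
    of the k most similar problems."""
    ranked = sorted(
        meta_db,
        key=lambda ex: math.dist(ex["meta_features"], query_features),
    )
    return [ex["best_config"] for ex in ranked[:k]]

def pso_refine(seeds, fitness, iters=50, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Search module: global-best PSO over (gamma, C), initialized with the
    configurations retrieved by meta-learning instead of random points
    (the hybrid initialization). `fitness` is minimized."""
    rng = rng or random.Random(0)
    pos = [list(s) for s in seeds]
    vel = [[0.0, 0.0] for _ in seeds]
    pbest = [list(p) for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(len(pos)), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    for _ in range(iters):
        for i, p in enumerate(pos):
            for d in range(2):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - p[d])
                             + c2 * rng.random() * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            val = fitness(p)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(p), val
                if val < gbest_val:
                    gbest, gbest_val = list(p), val
    return gbest, gbest_val
```

In the paper each problem is described by 17 meta-features and fitness is the estimated SVM performance; here, two-dimensional toy vectors are enough to exercise the flow.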
A database of 40 meta-examples was produced from the evaluation of a set of 399 configurations of (γ, C) on 40 different regression problems. Each meta-example refers to a single regression problem, which was described in our work by 17 meta-features (see [3] for details).

[Figure 2. NMSE result obtained at each recommended configuration: the plot shows the minimum NMSE (roughly 0.25 to 0.6) against the number of recommended configurations (0 to 20) for Hybrid-PSO, Meta-Learning, PSO, the default LibSVM heuristic and the best configuration per problem.]

Figure 2 compares the minimum NMSE (averaged over the 40 problems) obtained by SVMs using the parameters suggested by the combination of meta-learning and PSO, referred to as Hybrid-PSO (using 5 initial solutions recommended by meta-learning), with the two methods applied individually: PSO (with random initialization and population size 5) and meta-learning (which recommends the best configuration of each retrieved meta-example). We also present in Figure 2 the average NMSE achieved by the default heuristic adopted by the LibSVM tool (γ set to the inverse of the number of attributes and C = 1). Finally, Figure 2 shows the average NMSE that would be achieved if the best parameter configuration had been chosen on each problem.

By comparing PSO and meta-learning, we identified a trade-off in their relative performances. Meta-learning is better than PSO for a small number of recommended parameter configurations, and it is also better than the default LibSVM parameters. Hence, meta-learning alone would be indicated in situations in which the SVM user has strong resource constraints: it can recommend a small number of configurations with intermediate performance levels. PSO, in turn, is able to find better configurations along its search and is therefore more adequate when a larger number of configurations can be tested.

The Hybrid-PSO was able to combine the advantages of its components. Its performance on the initial five recommended configurations is of course the same as that of meta-learning (since those configurations are recommended by meta-learning). From that point of the curve on, the Hybrid-PSO consistently achieves better results than both PSO and meta-learning, and it converges earlier to solutions with NMSE values similar to those of the best configurations observed in the 40 problems.

There is an additional cost in the recommendation of configurations by the hybrid approach: the cost of the meta-learning initialization (especially the cost of computing the meta-features). However, we deployed meta-features with a low computational cost.
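The two fixed baselines in this comparison can be written out directly. A small sketch, assuming NMSE means mean squared error normalized by the variance of the targets (the paper does not spell out its exact normalization, so `nmse` is illustrative):

```python
def default_libsvm_params(n_attributes):
    """Default heuristic cited above: gamma = 1 / number of attributes,
    C = 1 (no tuning at all)."""
    return 1.0 / n_attributes, 1.0

def nmse(y_true, y_pred):
    """Normalized mean squared error: MSE divided by target variance, so
    predicting the mean of y_true scores 1.0 and lower is better."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    return mse / var
```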
In [5], we extended the previous work to perform Multi-Objective Optimization (MOO) of SVM parameters. The Multi-Objective PSO (MOPSO) algorithm was used to optimize the parameters (γ, C) with regard to two conflicting objectives: complexity (number of support vectors) and success rate. We evaluated the MOPSO in two versions: (1) MOPSO with the initial population suggested by ML (Hybrid MOPSO); and (2) MOPSO with a random initial population. In the meta-learning module, for each similar problem retrieved, we generated a Pareto Front (a set of non-dominated solutions) by applying the dominance evaluation to the 399 configurations of SVM parameters considered. To suggest an initial population, we selected one random solution from each produced Pareto Front.

In our experiments, the final Pareto Fronts optimized by the MOPSO and the Hybrid MOPSO were evaluated using three metrics for MOO problems: Spacing, Hypervolume and Spread. The proposed hybrid approach obtained better comparative results on the Spacing and Hypervolume metrics. Regarding the maximum Spread, our approach lost in the first generations, but was similar to MOPSO in the last generations.

3 Conclusion

The combination of meta-learning and optimization techniques showed promising results for the selection of SVM parameter values. The proposed approach can easily be adapted to other learning algorithms (e.g., Artificial Neural Networks). A number of aspects of the proposed solution still need to be investigated, such as alternative strategies to integrate meta-learning into the optimization process. For instance, not only the best solutions to similar problems can be considered, but also diverse solutions in the search space. Additionally, the limitations of the individual components (as usual in hybrid systems) need to be dealt with. For instance, new strategies to augment the number of datasets available for meta-learning can improve the learning performance in our context.

Acknowledgments: The authors would like to thank CNPq, CAPES, FAPESP and FACEPE (Brazilian agencies) and FCT (Portuguese agency) for their financial support.

REFERENCES

[1] S. Ali and K. Smith-Miles, 'A meta-learning approach to automatic kernel selection for support vector machines', Neurocomputing, 70(1-3), 173–186, (2006).
[2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[3] T. Gomes, R. B. C. Prudêncio, C. Soares, A. Rossi and A. Carvalho, 'Combining meta-learning and search techniques to select parameters for support vector machines', Neurocomputing, 75, 3–13, (2012).
[4] S. Louis and J. McDonnell, 'Learning with case-injected genetic algorithms', IEEE Trans. on Evolutionary Computation, 8, 316–328, (2004).
[5] P. Miranda, R. B. C. Prudêncio, C. Soares and A. Carvalho, 'Multi-objective optimization and meta-learning for SVM parameter selection', in IJCNN 2012 (to appear), (2012).
[6] C. Soares, P. Brazdil and P. Kuba, 'A meta-learning approach to select the kernel width in support vector regression', Machine Learning, 54(3), 195–209, (2004).