Evolution of Algorithm Portfolio for Solving Strategies

Alice Tarzariol
Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università degli Studi di Udine, Italy

Abstract. The last decades have witnessed tremendous growth in the performance of solvers dealing with NP-hard problem instances. Commonly, a powerful solver outperforms the other candidates in the majority of cases but obtains poor results on particular sub-domains. For this reason, Algorithm Portfolio approaches have been developed, which exploit a set of solvers with complementary strengths instead of running a single best one on all instances. Recent successful implementations combine these approaches with automatic Algorithm Configuration methods, which consider a single parameterized solver and identify the putative best values of its hyper-parameters for the analyzed instance distribution. This paper illustrates the development of the principal approaches that leverage machine learning to accelerate the solution search of modern solvers. Furthermore, it presents the basic notions necessary to understand the process that enabled these techniques to achieve their remarkable results, together with some of the many well-performing implementations for SAT, Answer Set Programming, and Constraint Programming.

Keywords: Algorithm Portfolio, Algorithm Configuration, Machine Learning, Solving Strategies.

1 Introduction

Many “difficult” problems that we encounter nowadays, especially in the logistic areas of industrial domains, can be modeled and solved using powerful ad-hoc tools. For example, we can encode the considered problems as SAT formulae [39], or as programs expressed in Answer Set Programming [12] or Constraint Programming [43], and then leave the search for solutions to the corresponding solvers. Until the 2000s, the principal approach to solve a problem instance drawn from a collection of heterogeneous elements consisted of running a single “best” solver that outperforms the other candidates in the majority of cases. In [26], this is called the winner-takes-all approach, to emphasize that the algorithm that obtains the highest average performance on the whole representative instance set will always be preferred to solve a new problem drawn from the considered distribution. Since the performance of algorithms usually varies greatly from instance to instance [48], even within the same domain, systems based on Algorithm Portfolio methods have subsequently been developed to leverage a group of solvers instead of a single one. They obtained outstanding results in industrial applications and solving competitions, exploiting solvers that have high running times on the majority of the instances but excellent performance on particular problem sub-domains. Modern solvers rely on different techniques, ranging from incomplete search methods, e.g., local search, to complete ones such as CDCL [35], which has been particularly useful to accelerate SAT and ASP search engines [10, 34]. Besides the solving core, the values of the hyper-parameters further guide the behavior of the algorithms; their tuning can be performed manually by domain experts, at the cost of considerable time and human effort. The period in which algorithm portfolios appeared in solving competitions coincides with the emergence of Algorithm Configuration, i.e., techniques that efficiently explore a solver's hyper-parameter configurations and automatically detect the best setting for the considered domain.
This work presents a picture of the evolution of some of the principal Algorithm Portfolio and Algorithm Configuration techniques implemented in today's solvers, providing the essential machine learning and statistical concepts necessary to address this topic. Since many different implementations are available in the literature, we report only a limited selection of meaningful examples, useful to understand the main concepts and practical issues addressed by these systems. For a more comprehensive review we refer to [4, 24, 25, 46]. The structure of this paper is the following: Section 2 provides the basic machine learning concepts and related considerations, then Section 3 contains the definition and characteristics of Algorithm Portfolio methods, together with some practical examples drawn from different fields, such as SAT, ASP and CP. These are just some of the main programming paradigms covered; in the literature we can find examples applied in further contexts, e.g., Planning, Mixed Integer Programming, etc. In Section 4, we present the principles and main examples of Algorithm Configuration and, lastly, Section 5 describes several modern systems that combine the previous ideas to speed up the search.

2 Machine Learning Background

Although Machine Learning presents a myriad of application domains, this section defines only the techniques and notions that concern our topic. Several criteria can be considered to classify learning techniques [45], such as the prior knowledge available, which distinguishes deductive from inductive learning. The former deduces a logical conclusion from a set of “certain” general rules: CDCL implements this approach since, given an input formula as prior knowledge, it derives an equisatisfiable formula by adding clauses that possibly accelerate the search. The learned clauses are certain; however, the scope of the deduced components is limited to solving the single problem instance considered. In the second approach, on the other hand, the prior knowledge provides support to draw conclusions, so the truth of the inferred components holds only with a certain probability. These methods are leveraged by algorithm portfolio and automatic configuration systems, using as prior knowledge a set of previously solved problem instances, ideally independent and identically distributed according to some fixed probability distribution. An additional stage called feature extraction pre-processes the original instances by mapping them into a space defined by a set of features, i.e., measurable properties or characteristics. The features used in the works considered here include static features derived from syntactic analysis, e.g., the number of clauses in a SAT instance and statistics such as the clauses-to-variables ratio, and may also involve probing features. The latter are more informative but also more expensive to compute, since they are obtained by probing parts of the search space for a brief time interval. Finally, dynamic features are computed directly while solving is taking place: we do not consider this kind of features, although several portfolio methods rely on them, as for instance [7].
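As a concrete illustration, the following minimal Python sketch computes a few static syntactic features of a SAT instance in DIMACS CNF format. The choice of features and their names are illustrative assumptions, not taken from any specific portfolio system, and the parser assumes one clause per line terminated by 0.

    # Minimal sketch: static syntactic features of a DIMACS CNF instance.
    # Illustrative feature set; assumes one clause per line, terminated by 0.

    def static_sat_features(path):
        num_vars = num_clauses = 0
        clause_lens = []
        pos_lits = neg_lits = 0
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("c"):
                    continue                      # skip comments and blank lines
                if line.startswith("p cnf"):
                    _, _, v, c = line.split()
                    num_vars, num_clauses = int(v), int(c)
                    continue
                lits = [int(tok) for tok in line.split() if tok != "0"]
                clause_lens.append(len(lits))
                pos_lits += sum(1 for l in lits if l > 0)
                neg_lits += sum(1 for l in lits if l < 0)
        total_lits = pos_lits + neg_lits
        return {
            "num_vars": num_vars,
            "num_clauses": num_clauses,
            "clause_var_ratio": num_clauses / max(num_vars, 1),
            "mean_clause_len": sum(clause_lens) / max(len(clause_lens), 1),
            "pos_lit_fraction": pos_lits / max(total_lits, 1),
        }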
A further classification criterion regards the component learned; for instance, we could aim to find a direct mapping f from an input domain I into an output space O. In the case of regression, f should describe the relationship between a continuous dependent variable o ∈ O and a set of independent variables i ∈ I, e.g., a function that predicts the performance of an algorithm on a specific problem instance, given its features. If, instead, the output domain O is discrete, the task is called classification; e.g., the predicted value could directly indicate the most suitable solver to run for the considered instance. Concerning when the prior knowledge is processed, the eager learning approach builds a general model from the data offline, i.e., before observing a new instance. Alternatively, the lazy learning approach processes the representative input instances online, namely when the system encounters a new element to solve [36]. Moreover, considering the type of feedback provided to learn a model that entails f, we can distinguish between supervised and unsupervised learning methods. The former learn a model from a set of instances whose label (i.e., the correct output) is specified, trying to minimize the value of a selected loss function. In contrast, the latter find patterns in the input collection, since the output values are missing. Lastly, a third approach called reinforcement learning iteratively learns a series of actions to take in a specific environment in order to maximize a cumulative reward, trading off exploration and exploitation. Although the literature includes different approaches, this paper analyzes only supervised-learning systems. Learning can be performed by means of a parametric model, namely an explicit model with a fixed set of parameters that range over a defined domain, or with a nonparametric model, whose structure cannot be characterized by a bounded set of parameters, as its complexity grows with the number of input data. Instance-based learning is a nonparametric lazy learning method, since its knowledge consists solely of the indexed input instances and the reasoning over them is delayed until a new example must be classified. Examples of supervised learning techniques applied for regression are linear regression, regression forests and support vector regression, while for classification we can rely, for instance, on logistic regression, support vector machines and decision forests. Case-based reasoning and k-nearest neighbors are examples of instance-based learning that can be employed for both tasks.

2.1 Model Evaluation and Selection

In Machine Learning, the No Free Lunch theorems [47] led to a famous implication for the choice of the learning algorithm, as they show that no learner can be the most accurate one on every domain. Consequently, in real application scenarios, the usual approach consists of considering several learning algorithms, evaluating them, and then selecting the one that leads to the best result on a separate instance collection [41]. In order to evaluate the performance of a learning system, the data set used for training, called the training set Ntrain, should contain instances drawn from the same distribution as the evaluation collection, i.e., the test set Ntest. However, to obtain unbiased results, the two sets should contain different instances. The learned component should trade off between correctly predicting the output value for instances in Ntrain and in Ntest, reducing as much as possible both over- and under-fitting phenomena.
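To make the two supervised formulations above concrete, the following sketch contrasts them on synthetic placeholder data: a feature matrix X and a runtime matrix R with one column per portfolio solver. It trains one runtime regressor per solver (regression) and, alternatively, a single classifier that predicts the best solver directly (classification), comparing both on a held-out test set. It is a toy illustration under these assumptions, not the pipeline of any specific system.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Toy data: X holds instance features, R[i, a] the runtime of solver a on instance i.
    rng = np.random.default_rng(0)
    X = rng.random((300, 10))
    R = np.exp(rng.normal(size=(300, 3)))          # placeholder runtimes for 3 solvers

    X_train, X_test, R_train, R_test = train_test_split(X, R, test_size=0.3, random_state=0)

    # Regression: one model per solver predicts log-runtime; pick the predicted fastest solver.
    regressors = [RandomForestRegressor(random_state=0).fit(X_train, np.log(R_train[:, a]))
                  for a in range(R.shape[1])]
    pred_log_rt = np.column_stack([m.predict(X_test) for m in regressors])
    chosen_by_regression = pred_log_rt.argmin(axis=1)

    # Classification: a single model predicts the index of the best solver directly.
    best_train = R_train.argmin(axis=1)
    classifier = RandomForestClassifier(random_state=0).fit(X_train, best_train)
    chosen_by_classification = classifier.predict(X_test)

    # Compare the runtime actually incurred by each strategy on the test set.
    rows = np.arange(len(R_test))
    print("regression-based selection:", R_test[rows, chosen_by_regression].sum())
    print("classification-based selection:", R_test[rows, chosen_by_classification].sum())
    print("oracle (per-instance best):", R_test.min(axis=1).sum())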
Once an evaluation measure is chosen, model evaluation is performed: the most common approach is the holdout method, which partitions the original collection into two sets according to a certain split ratio. Alternatively, k-fold cross-validation is a more costly method that leverages the whole collection for training: it partitions the original data into k equal-sized subsets and performs training and testing k times, each time using the i-th subset as test set, and then averages the results. Besides the parameters learned during training, the learning algorithm can have further hyper-parameters whose values are defined beforehand. Each configuration leads to a different learning model; therefore, a ranking procedure called model selection orders the resulting models with respect to their estimated performance, selecting the one that scores highest. For instance, the holdout method can be generalized to split the collection into three sets: several learning algorithms are trained on Ntrain, the best-performing learned model is selected according to the validation set Nval, and it is finally evaluated on Ntest. An extension of k-fold cross-validation is a valid alternative to conduct model selection, as it leads to good estimates for both model parameters and hyper-parameters despite using a relatively small data collection. A final remark concerns the feature extraction phase: the choice of significant but cheap features that adequately characterize the considered instances heavily influences the model's performance. Examining a large set of features is usually counterproductive, since they could carry the same information or even be misleading for the prediction. Two techniques that can reduce the number and the impact of weak features are subset selection, like stepwise regression, and regularization methods, like ridge regression [14]. The former applies statistical techniques, like forward-stepwise regression, which starts from an empty set of features and iteratively includes the one that brings the best statistical improvement to the prediction model, returning at the end a subset of features that should capture the characteristics that best model the input instances. The latter approach, instead, performs regression by introducing a regularization term into the loss function of the model, which allows keeping all features but assigns them different weights.
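A minimal sketch of model selection via k-fold cross-validation, here used to pick the regularization strength of a ridge regressor for runtime prediction. The synthetic data, the candidate values, and the use of ridge regression are illustrative assumptions, not prescriptions from the works discussed; forward-stepwise selection could be sketched analogously by greedily adding the feature that most improves the cross-validated score.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.random((200, 15))                                    # instance features
    y = X @ rng.normal(size=15) + 0.1 * rng.normal(size=200)     # synthetic log-runtimes

    cv = KFold(n_splits=5, shuffle=True, random_state=1)
    candidates = [0.01, 0.1, 1.0, 10.0]                          # candidate regularization strengths
    scores = {alpha: cross_val_score(Ridge(alpha=alpha), X, y,
                                     scoring="neg_mean_squared_error", cv=cv).mean()
              for alpha in candidates}

    best_alpha = max(scores, key=scores.get)                     # model selection: highest CV score
    final_model = Ridge(alpha=best_alpha).fit(X, y)              # refit on the whole collection
    print(best_alpha, scores[best_alpha])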
3 Algorithm Portfolio

Algorithm Portfolio methods allow considering a set of algorithms simultaneously, exploiting their complementary strengths across an instance domain. Their first designs are provided in [17] and [13]. Two classification criteria contained in [25] distinguish these methods according to the moment of the choice and the object chosen from the portfolio. With regard to “when to perform the choice”, a static portfolio approach operates the selection offline, which avoids further overhead during solving but precludes adjusting the learned model in case of poor performance. On the other hand, a dynamic portfolio approach allows switching the initially chosen algorithm online, increasing flexibility as well as the computational cost during solving. Considering the “object selected” to solve a new instance, the algorithm scheduling approach defines time slices and a running order for the portfolio algorithms and, if a multicore architecture is available, the parallel portfolio method extends this idea by running the schedule in parallel. Lastly, algorithm selection chooses the algorithm with the highest predicted performance to solve the considered problem instance, as depicted in Figure 1. In [26] this is called per-instance algorithm selection, to differentiate it from the winner-takes-all approach, also called per-distribution algorithm selection. Algorithm scheduling is a more robust approach than algorithm selection, as the latter chooses a unique solver and thus poor results cannot be mitigated. On the other hand, a schedule is only useful if numerous problem instances can be solved within short time intervals by different solvers.

Fig. 1. Diagram of the algorithm selection problem, derived from [42], with an additional feature extraction step (dashed elements) that maps a problem instance x ∈ P into its feature vector f(x) ∈ F = R^m, which ranges over a simpler and more restricted domain than P; the selection mapping S(f(x)) chooses an algorithm A from the portfolio, whose performance p(A, x) ∈ R^n is then measured via the norm ||p(A, x)||.

3.1 Algorithm Selection Implementations

SATzilla One of the most famous algorithm selection implementations is the pioneering system SATzilla. Since its first appearance at the 2004 SAT competition [37], numerous successive versions have been developed [51, 52] (to cite just the most popular), improving its performance so much that it won many gold medals in several solving competitions. Its success proved that algorithm portfolio implementations manage to outperform the single best solver, i.e., the standalone solver with the best performance across all instances, despite the overhead of feature extraction and selection, and despite possibly picking a sub-optimal solver whose performance is merely close to the optimal one. In [37], statistical regression techniques allowed per-instance selection by learning offline an empirical hardness model for each solver A of the portfolio, namely a function that predicts the amount of time that A will take to solve a problem instance i, according to its features xi. In order to learn the empirical hardness model MA for algorithm A from a training set I, the following steps are performed:

1. For each instance i ∈ I, compute its features together with its approximate runtime for A (as the considered SAT solvers are randomized, the running time needed to solve the same instance is not deterministic, but ranges over a certain interval), which is the output value that the final model should predict.
2. Next, apply, in order: forward-stepwise selection to eliminate highly correlated features, a quadratic basis-function expansion of the resulting feature set, and a second forward-stepwise selection on the expanded features.
3. Lastly, use ridge regression to learn the model MA, which predicts the logarithm of A's runtime on a new problem instance j, given its features xj.
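The following sketch mirrors these steps in spirit, assuming a feature matrix X and per-solver runtimes as synthetic placeholders; it uses scikit-learn's quadratic polynomial expansion and ridge regression, and omits the forward-stepwise selection passes for brevity, so it is an approximation of the pipeline above rather than a faithful reimplementation of SATzilla.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(2)
    X = rng.random((400, 8))                        # base features of training instances
    R = np.exp(rng.normal(size=(400, 4)))           # runtimes of 4 portfolio solvers

    # One empirical hardness model per solver: quadratic expansion + ridge on log-runtime.
    models = []
    for a in range(R.shape[1]):
        m = make_pipeline(StandardScaler(),
                          PolynomialFeatures(degree=2, include_bias=False),
                          Ridge(alpha=1.0))
        m.fit(X, np.log(R[:, a]))
        models.append(m)

    def select_solver(x_new):
        """Predict the log-runtime of every solver and return the index of the fastest one."""
        preds = [m.predict(x_new.reshape(1, -1))[0] for m in models]
        return int(np.argmin(preds))

    print(select_solver(rng.random(8)))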
SATzilla-07 [51] extends the previous implementation by adding further statistical techniques to the empirical hardness model [50] and by introducing two simple but very effective components that have been exploited in successive algorithm portfolio systems: pre-solvers and a backup solver. The former is a single algorithm, or a schedule of algorithms, with reasonable general performance that runs for a small amount of time before feature computation. It should prevent the penalization caused by feature extraction on very easy instances, so that algorithm selection focuses exclusively on harder cases. The latter is the solver that achieves the best average runtime on a validation data set containing instances that are not solved by the pre-solvers; it is run if the online feature computation fails for some reason (e.g., error or timeout). Lastly, SATzilla-07 applies solver subset selection, choosing the best subset of solvers to use in the final portfolio by keeping the algorithms whose empirical models predict the lowest total runtime on a validation set. SATzilla-11 [52] follows the main structure outlined in its previous versions, but applies decision forests to learn classification models for algorithm selection, instead of regression. This approach uses an explicit cost-sensitive loss function to weigh misclassifications in direct proportion to their impact on portfolio performance: the confidence that one solver could perform well on a particular problem instance guides the choice of the algorithm (returning a discrete value, since classification is performed with respect to the portfolio algorithms), while the cost-sensitive weights account for how far its predicted runtime deviates from the performance of the other solvers. Thanks to its new classification approach, SATzilla-11 considerably outperforms its previous versions.

Claspfolio For Answer Set Programming, claspfolio [9] follows an approach similar to SATzilla-07, performing algorithm selection among a set of twelve configurations of the clasp solver [10] with complementary strengths. It learns offline a performance model for each configuration using support vector regression. The training set is obtained by a series of steps: initially, a collection of problem instances is grounded by gringo [11], and then claspre, a light-weight version of clasp, is run to extract their features. Lastly, after computing the running time needed by every clasp configuration to solve each grounded instance, their scores are calculated and used as labels of the training set for each configuration's model. When a new instance j is met online, claspre is run to extract its features xj, but it also works as a pre-solver: if its execution leads directly to a solution, selection is not needed. Otherwise, the learned models use xj to predict the score of each configuration, and the best one is chosen.

ME-ASP [31] is a modular multi-engine approach for Answer Set Programming that implements algorithm selection, following an approach similar to a previous successful system for Quantified Boolean Formula (QBF) solvers [40]. Its training set uses only static features and takes the solving time required by every portfolio solver as output values. Instead of applying an eager and parametric approach, as in the previous works, it uses a 1-nearest-neighbor classification technique to retrieve the most promising solver from the portfolio for a new problem instance. A successive work [32] extends this approach by considering more classification techniques.

3.2 Algorithm Scheduling Implementations

The idea of using algorithm scheduling arises from the heavy-tailed nature of solving: often a solver either solves a problem in a short time, or it does not solve it at all within a given amount of CPU time. This behavior emerged especially in the results of several solving competitions. The first two solutions described below concern Constraint Programming solvers and manage to define a schedule online, as the size of their portfolio S is rather small.
Moreover, they rely on an instance-based learning approach, using as knowledge base the features of problem instances and the running times of the S solvers, taken from past CSP competitions.

CPhydra [38] defines an algorithm schedule, assigning a different time slice to each solver in S according to the considered problem instance i (the order of the solvers is actually static, since the authors aim to maximize the probability of solving new instances within the time limit). After efficiently indexing its knowledge base, CPhydra uses the case-based reasoning approach to define the schedule time slices; here we consider only the first two core reasoning phases of the case-based reasoning cycle [1]. The retrieval phase computes the features xi and then uses a 10-nearest-neighbor technique to retrieve from the knowledge base the set C of the ten cases most similar to i. For each c ∈ C, it considers its solvers' times and its similarity distance d(c) to i. The reuse step computes the solver schedule, i.e., a function f : S → R that maps each solver s ∈ S to a time interval f(s), such that the total allotted time does not exceed a certain threshold T. f should define a time schedule such that, within T, the number of solved instances in C is maximized, taking into account also the similarity of each case to i. We can formulate this goal as an optimization problem:

   maximize   Σ_{c ∈ ⋃_{s∈S} N(s, f(s))}  1 / (d(c) + 1)
   subject to   Σ_{s∈S} f(s) ≤ T,

where N(s, f(s)) contains the instances of C that solver s is able to solve when given at least f(s) time. If every solver reaches its maximal number of solved instances within a time t̂ such that Σ_{s∈S} t̂ < T, then a schedule that allocates to each solver a time interval equal to t̂ trivially optimizes the objective function. In this situation, in order to exploit the whole available time, CPhydra discards the solvers that are dominated by others in the portfolio and then computes the optimal time assignment for the remaining ones, using the entire available time.

SUNNY [2] is an algorithm scheduling approach for Constraint Programming whose name derives from its characteristics: it detects a SUbset of portfolio solvers by using the k-Nearest Neighbor technique to define a lazY learning model. When a new instance i must be solved, SUNNY computes its features xi and uses them to retrieve NN(k, i), i.e., the set of the k most similar instances from its knowledge base. Then, it finds the minimal subset of algorithms M ⊂ S that solves the highest number of instances in NN(k, i) within time T (choosing the subset with the lowest average solving time in case of ties). In order to define the schedule, SUNNY divides the time window [0, T] into σ equal-sized slots of length T/σ, where σ is the sum of the numbers of instances that each algorithm in M solves within T, plus the number of instances solved by none of them. Then, each solver s ∈ M is assigned a number of time slots corresponding to the number of instances in NN(k, i) that s is able to solve within T. The additional time slots (corresponding to the unsolved instances) are assigned to a previously defined backup solver. Finally, the solvers are sorted by decreasing allocated time. A successive parallel portfolio approach [3] extends SUNNY by defining a multicore solver schedule. It also deals with constraint optimization while, concerning CSP, it allows obtaining a dynamic schedule that is run in parallel.
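A compact sketch of SUNNY's sequential schedule construction described above, under simplifying assumptions: the neighborhood NN(k, i) is given as a runtime dictionary, the portfolio is small enough for brute-force subset enumeration, and ties are broken only by average runtime. This is illustrative code, not the SUNNY implementation.

    from itertools import combinations

    def sunny_schedule(runtimes, solvers, T, backup):
        """runtimes[j][s] = time solver s needs on neighbor instance j, or None if unknown."""
        insts = list(runtimes)

        def solved(subset):
            return {j for j in insts
                    if any(runtimes[j][s] is not None and runtimes[j][s] <= T for s in subset)}

        # Minimal subset M solving the most neighbors (ties: smaller M, then lower average time).
        best, best_key = None, None
        for r in range(1, len(solvers) + 1):
            for M in combinations(solvers, r):
                cov = solved(M)
                times = [min(runtimes[j][s] for s in M if runtimes[j][s] is not None) for j in cov]
                avg = sum(times) / len(times) if times else float("inf")
                key = (len(cov), -r, -avg)          # more coverage, then smaller M, then faster
                if best_key is None or key > best_key:
                    best, best_key = M, key
        M = best

        # Slot allocation: one slot per neighbor solved by each s in M; unsolved ones go to backup.
        per_solver = {s: sum(1 for j in insts
                             if runtimes[j][s] is not None and runtimes[j][s] <= T) for s in M}
        unsolved = len(insts) - len(solved(M))
        sigma = sum(per_solver.values()) + unsolved
        slot = T / sigma
        schedule = {s: per_solver[s] * slot for s in M}
        schedule[backup] = schedule.get(backup, 0.0) + unsolved * slot
        # Run the solvers in decreasing order of allocated time.
        return sorted(schedule.items(), key=lambda kv: -kv[1])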
Aspeed [15] can be used for SAT, CSP or ASP solvers and relies on clasp [10] to determine a “per-distribution” schedule (since it relies on a per-distribution approach, online feature extraction is not required) that optimizes performance on a set of problem instances. It was inspired by ppfolio [44], a parallel portfolio that won the 2011 SAT competition by simply using a handmade solver schedule based only on the number of available cores. Aspeed extends this idea by applying more sophisticated reasoning to define a sequential or parallel schedule. The sequential one consists of two components: a function σ that maps each solver in S to a time slice, and a function π that takes as input a position p ∈ {1, ..., |S|} and returns the p-th solver of the defined schedule. Considering a set of instances I and a |S| × |I| matrix R that contains the runtime of each solver on every instance in I, we can define both σ and π as multi-objective optimization problems. The first step consists in finding the σ that optimizes

   maximize   | ⋃_{s∈S} { i ∈ I : R[s, i] ≤ σ(s) } |
   subject to   Σ_{s∈S} σ(s) ≤ T.

Namely, σ should allow solving as many instances of I as possible, while the time slices assigned to the solvers must sum up to at most the time threshold T. As this single optimization problem admits numerous solutions, a second objective function is considered, which favors a σ that assigns similar time slices to the solvers, i.e., that optimizes

   minimize   Σ_{s∈S} σ(s)².

Once the time assignment σ is defined, the solvers must be ordered so as to reduce the expected time for solving an element of I. Thus we define a function π : {1, ..., |S|} → S that optimizes

   minimize   Σ_{i∈I} τσ,π(i),

where τσ,π(i) denotes the time that the schedule defined by σ and π requires to solve i. The schedule follows the order defined by π: each solver s tries to solve i within its timeout σ(s); as long as i remains unsolved the solving times are accumulated, reaching T if i is never solved. The schedule can be extended to a multicore architecture by defining a function ν that, given a number of cores c, finds a partition P of S of size c. Then, for each solver subset Sk ∈ P, the previous sequential scheduling approach is used. The new objective function finds ν, σ and πu for each u ∈ {1, ..., c} such that the expected time for solving an element of I is globally minimized.
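Aspeed solves these optimization problems with an ASP encoding executed by clasp; as a purely illustrative aid under that caveat, the following Python sketch only evaluates a candidate sequential schedule (σ, π) against a runtime matrix R, computing the two quantities the objectives above refer to: the number of solved instances and Σ_{i∈I} τσ,π(i).

    def evaluate_schedule(sigma, pi, R, T):
        """sigma: dict solver -> time slice; pi: list of solvers in execution order;
        R: dict (solver, instance) -> runtime; returns (#solved, total schedule time)."""
        instances = {i for (_, i) in R}
        solved, total_time = 0, 0.0
        for i in instances:
            elapsed = 0.0
            for s in pi:
                if R[(s, i)] <= sigma[s]:           # s solves i within its slice
                    elapsed += R[(s, i)]
                    solved += 1
                    break
                elapsed += sigma[s]                 # slice exhausted, move to the next solver
            else:
                elapsed = T                         # i remains unsolved: count the full budget
            total_time += elapsed
        return solved, total_time

    # Example: two solvers with equal slices within a budget T = 100.
    R = {("s1", "a"): 10, ("s1", "b"): 200, ("s2", "a"): 90, ("s2", "b"): 30}
    print(evaluate_schedule({"s1": 50, "s2": 50}, ["s1", "s2"], R, T=100))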
4 Automatic Algorithm Configuration

The efficacy of an Algorithm Portfolio depends strongly on the elements chosen to compose the portfolio. Besides the various techniques that underlie the solvers, there are free parameters, or hyper-parameters, designed to customize their behavior by further guiding the search. Let us consider the problem of finding the optimal hyper-parameter configuration of a certain parameterized solver A with respect to a set of representative problem instances I. Since the hyper-parameters range from continuous to categorical domains (furthermore, a parameter can be conditional on another one, such that the former is activated only for a specific value of the latter), their combination leads to a high-dimensional and structured configuration space. The exponential number of possibilities usually renders manual parameter tuning inefficient and without guarantees on the quality of the results, even when carried out by domain experts. We denote by Θ the configuration space of the hyper-parameters of solver A. The Algorithm Configuration approach applies statistical techniques to automatically find the putative configuration that optimally solves problem instances following a certain distribution. It returns the configuration with the minimal expected cost required to solve the elements of I with respect to a performance metric m : Θ × I → R.

Fig. 2. Diagram of the Algorithm Configuration problem, as contained in [27]: given a set of instances I and an algorithm A with its configuration space Θ, the configuration task repeatedly selects a configuration θ ∈ Θ, assesses A(θ) on some Iθ ⊆ I, and finally returns the best found configuration θ∗.

As we can see in Figure 2, Algorithm Configuration iterates two actions until a configuration that overcomes the others is detected with sufficient confidence. The first action consists in deciding which configuration to try at each step: the high dimension of Θ, even after discretizing the continuous domains, makes exploring the whole space unfeasible. Therefore, efficient search strategies have been defined that entail a trade-off between diversification and intensification. ParamILS [21], for instance, uses Iterated Local Search to perform a biased random walk over a chain of local optima, while the Gender-Based Genetic Algorithm (GGA) [5] relies on Genetic Algorithms to compare populations of configurations. On the other hand, Sequential Model-based Algorithm Configuration (SMAC) [19] and Iterated F-race [6] apply Sequential Model-Based Optimization techniques, which learn a model to guide the selection of promising configurations and exploit the gathered data to iteratively bias the model. The second action regards how to estimate the performance of the configurations and compare them: for each configuration θ we consider an approximation of its cost over a finite set of instances Iθ ⊆ I, denoted cIθ(θ). Its definition appears in the objective function of Algorithm Configuration:

   θ∗ ∈ arg min_{θ ∈ Θ} cIθ(θ),   where   cIθ(θ) = (1/|Iθ|) Σ_{i ∈ Iθ} m(θ, i).

BasicILS (one of the two evaluation strategies of ParamILS [21], the other being FocusedILS) applies the simplest solution to define each Iθ, i.e., using the same set of N instances, IN, for every configuration. Unfortunately, two problems arise in the general case, depending on the chosen size: when N is too small, the estimated cost of the best putative configuration, cIN(θ∗), may underestimate its real cost, so the truly optimal solution could be discarded in favor of θ∗. In contrast, if N is too large, the huge number of runs slows down the evaluation of every single configuration. Similarly, in GGA the number of instances used to evaluate each element increases linearly at each iteration. On the other hand, FocusedILS, SMAC and Iterated F-race assess the performance over a dynamic number of evaluations, in order to quickly discard bad configurations and save the evaluation budget for the most promising solutions, leveraging the Racing Algorithm concept [33]. Moreover, adaptive capping techniques have been introduced in ParamILS [20] and SMAC [18] to terminate runs of poor configurations earlier.
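The following sketch is a deliberately simplified, hypothetical configurator: it samples configurations at random and evaluates each one on growing blocks of instances, discarding a candidate as soon as its mean cost on the instances seen so far exceeds the incumbent's mean cost on the same prefix. This is a crude imitation of the racing and early-termination ideas above, not the actual ParamILS, GGA, SMAC or Iterated F-race procedures; the function names and the numeric configuration space are assumptions for illustration.

    import random

    def random_search_configurator(run_cost, sample_config, instances, n_samples, block=5):
        """run_cost(theta, i) returns the cost m(theta, i); sample_config() draws a random
        configuration; n_samples bounds the number of configurations tried."""
        incumbent, incumbent_costs = None, []
        for _ in range(n_samples):
            theta = sample_config()
            costs, rejected = [], False
            for start in range(0, len(instances), block):
                costs += [run_cost(theta, i) for i in instances[start:start + block]]
                if incumbent is not None:
                    k = len(costs)
                    if sum(costs) / k > sum(incumbent_costs[:k]) / k:
                        rejected = True             # already worse than the incumbent's prefix
                        break
            if not rejected and (incumbent is None or
                                 sum(costs) / len(costs) <
                                 sum(incumbent_costs) / len(incumbent_costs)):
                incumbent, incumbent_costs = theta, costs
        return incumbent

    # Usage sketch with a hypothetical one-dimensional configuration space.
    best = random_search_configurator(
        run_cost=lambda theta, i: abs(theta - 3.0) + 0.01 * i,   # stand-in for m(theta, i)
        sample_config=lambda: random.uniform(0.0, 10.0),
        instances=list(range(20)),
        n_samples=50)
    print(best)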
5 Combination of Previous Methods

This section describes some systems that combine the previous techniques: claspfolio 2 leverages both algorithm portfolio approaches, while AutoFolio and Hydra extend the algorithm portfolio idea by exploiting algorithm configuration techniques. In the literature we can find further well-performing systems that combine portfolio and configuration methods, managing to win several solving competitions. For instance, 3S [22] combines algorithm selection with a static SAT solver schedule, and a successive work augments this idea by defining a dynamic parallel portfolio [29]. Two other famous successful systems are ISAC [23] and CSHC [30].

Claspfolio 2 [16] extends its previous version, increasing the robustness of selection by defining a static pre-solving schedule that may intervene if the learned selection model performs poorly. Given as input a set of ASP problem instances I and a portfolio of solvers P, first of all a training collection is built as in claspfolio, with a pre-processing phase on both features and performances. The training phase consists of two parts: one computes a model M that maps the features of an instance and a solver in P to a scoring value used to predict the best-performing algorithm, while the other defines a pre-schedule of solvers, Pre, as in aspeed. According to the user's choice, M can be evaluated using cross-validation techniques: if the model performs particularly well, the time slice allocated to Pre will be short or even null; otherwise, the schedule will be defined considering the whole available time and M's prediction gets limited consideration. When a new problem instance i must be solved, its features xi are extracted and used to select the best-predicted algorithm A using M. If this process fails, a backup solver is run; otherwise, the pre-solving schedule Pre is executed. If A is contained in Pre, it is removed from the schedule and its time slice is distributed among the remaining solvers. If i has not been solved by the pre-schedule, then A runs for the remaining time.

AutoFolio A key aspect of claspfolio 2 is that it implements several machine learning techniques and components that can be chosen to learn the portfolio model. AutoFolio [27] performs Algorithm Configuration over the claspfolio 2 framework, choosing the optimal learning technique together with its optimal parameter configuration. More precisely, four top-level parameters are considered: pre-solving, performance pre-processing, algorithm selection approach and feature pre-processing, where each decision leads to further parameter settings. One crucial point is to evaluate the performance of the models correctly; therefore, the system applies cross-validation techniques for model selection.

Hydra [49] does not require a pre-defined portfolio: it takes as input a parameterized algorithm A, a training set of problem instances I, and a performance metric m. It then iteratively combines an automatic algorithm configurator AC and an algorithm selection system AS to build a dynamic portfolio of A's configurations. Let Pk be the portfolio obtained at step k: at k = 0 it can be the empty set or, without loss of generality, we can already instantiate it with a set of algorithms. At each step, if Pk ≠ ∅, Hydra uses AS to learn a model that selects the most promising algorithm from Pk for each instance i ∈ I with respect to m. We indicate with m(Pk, i) the performance obtained by the selected algorithm of Pk on instance i. Then Hydra uses AC to find the configuration of A that best improves the performance of the current portfolio Pk. Given a candidate parameter configuration θ, a dynamic measure mk considers, for each instance i ∈ I, the performance of the portfolio built so far:

   mk(θ, i) = max{m(θ, i), m(Pk, i)}.

Leveraging mk, AC finds the parameter configuration θk that maximizes the portfolio performance and adds it to the set Pk+1 = Pk ∪ {θk}. Hydra iterates until a fixed number of steps or a time limit is reached.
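A schematic sketch of this loop, treating the configurator and the selector as black boxes supplied by the caller. The names hydra_loop, configurator, selector_trainer and perf, as well as the list-based portfolio, are illustrative assumptions and not Hydra's actual interfaces.

    def hydra_loop(configurator, selector_trainer, perf, instances, steps):
        """configurator(metric) -> a new configuration theta maximizing the given metric;
        selector_trainer(portfolio) -> function mapping an instance to a member of portfolio;
        perf(theta, i) -> performance m(theta, i), higher is better."""
        portfolio = []
        for _ in range(steps):
            if portfolio:
                select = selector_trainer(portfolio)
                # m(P_k, i): performance of the algorithm chosen by the selector for i.
                portfolio_perf = {i: perf(select(i), i) for i in instances}
            else:
                portfolio_perf = {i: float("-inf") for i in instances}

            # Dynamic metric m_k(theta, i) = max{m(theta, i), m(P_k, i)}.
            def m_k(theta, i):
                return max(perf(theta, i), portfolio_perf[i])

            theta_k = configurator(m_k)             # configuration complementing the portfolio
            portfolio.append(theta_k)
        return portfolio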
Fig. 3. Diagram of the Hydra framework (algorithm configurator, algorithm portfolio, algorithm selection model, algorithm selector).

6 Conclusions

In recent years, Algorithm Portfolio systems have attracted widespread interest due to their outstanding performance. However, their exceptional results depend on the chosen portfolio solvers, whose strengths must be complementary with respect to the considered instance distribution. Algorithm Configuration can be leveraged to extend portfolio systems in several different ways. Since many modern systems are inspired by previous ones, we analyzed some of the most influential works, showing how their ideas and approaches evolved. In order to reproduce and thus compare the results of these systems, [8] introduces a standardized format and a repository called the Algorithm Selection Library (ASlib). Moreover, the AS competitions 2015-2017 [28] contain comparisons between different approaches.

Acknowledgments. This is a summary of my Master Double Degree thesis, supervised by professor Agostino Dovier from the University of Udine and professor Martin Gebser from the Alpen-Adria Universität Klagenfurt. This work is partially supported by projects GNCS 2017-2019 and PRID UNIUD ENCASE.

References

1. A. Aamodt and E. Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, 1994.
2. R. Amadini, M. Gabbrielli, and J. Mauro. Sunny: a lazy portfolio approach for constraint solving. Theory and Practice of Logic Programming, 14(4-5):509–524, 2014.
3. R. Amadini, M. Gabbrielli, and J. Mauro. A multicore tool for constraint solving. In IJCAI, pages 232–238, 2015.
4. R. Amadini, M. Gabbrielli, and J. Mauro. Why CP portfolio solvers are (under)utilized? Issues and challenges. In International Symposium on Logic-Based Program Synthesis and Transformation, pages 349–364. Springer, 2015.
5. C. Ansótegui, M. Sellmann, and K. Tierney. A gender-based genetic algorithm for the automatic configuration of algorithms. In International Conference on Principles and Practice of Constraint Programming, pages 142–157. Springer, 2009.
6. P. Balaprakash, M. Birattari, and T. Stützle. Improvement strategies for the F-race algorithm: Sampling design and iterative refinement. In International Workshop on Hybrid Metaheuristics, pages 108–122. Springer, 2007.
7. M. Balduccini. Learning and using domain-specific heuristics in ASP solvers. AI Communications, 24(2):147–164, 2011.
8. B. Bischl, P. Kerschke, L. Kotthoff, M. Lindauer, Y. Malitsky, A. Fréchette, H. Hoos, F. Hutter, K. Leyton-Brown, K. Tierney, et al. ASlib: A benchmark library for algorithm selection. Artificial Intelligence, 237:41–58, 2016.
9. M. Gebser, R. Kaminski, B. Kaufmann, T. Schaub, M. T. Schneider, and S. Ziller. A portfolio solver for answer set programming: Preliminary report. In LPNMR, volume 6645 of Lecture Notes in Computer Science, pages 352–357. Springer, 2011.
10. M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub. Conflict-driven answer set solving. In IJCAI, volume 7, pages 386–392, 2007.
11. M. Gebser, T. Schaub, and S. Thiele. Gringo: A new grounder for answer set programming. In International Conference on Logic Programming and Nonmonotonic Reasoning, pages 266–271. Springer, 2007.
12. M. Gelfond and Y. Kahl. Knowledge representation, reasoning, and the design of intelligent agents: The answer-set programming approach. Cambridge University Press, 2014.
13. C. P. Gomes and B. Selman. Algorithm portfolios. Artificial Intelligence, 126(1-2):43–62, 2001.
14. T. Hastie, R. Tibshirani, and J. H. Friedman. The elements of statistical learning: data mining, inference, and prediction, 2nd Edition. Springer series in statistics. Springer, 2009.
15. H. Hoos, R. Kaminski, M. Lindauer, and T. Schaub. aspeed: Solver scheduling via answer set programming. Theory and Practice of Logic Programming, 15(1):117–142, 2015.
16. H. Hoos, M. Lindauer, and T. Schaub. claspfolio 2: Advances in algorithm selection for answer set programming. Theory and Practice of Logic Programming, 14(4-5):569–585, 2014.
17. B. A. Huberman, R. M. Lukose, and T. Hogg. An economics approach to hard computational problems. Science, 275(5296):51–54, 1997.
18. F. Hutter, H. Hoos, and K. Leyton-Brown. Bayesian optimization with censored response data. CoRR, 2013.
19. F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization, pages 507–523. Springer, 2011.
20. F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle. ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36:267–306, 2009.
21. F. Hutter, H. H. Hoos, and T. Stützle. Automatic algorithm configuration based on local search. In AAAI, volume 7, pages 1152–1157, 2007.
22. S. Kadioglu, Y. Malitsky, A. Sabharwal, H. Samulowitz, and M. Sellmann. Algorithm selection and scheduling. In International Conference on Principles and Practice of Constraint Programming, pages 454–469. Springer, 2011.
23. S. Kadioglu, Y. Malitsky, M. Sellmann, and K. Tierney. ISAC - instance-specific algorithm configuration. In ECAI, volume 215, pages 751–756, 2010.
24. P. Kerschke, H. H. Hoos, F. Neumann, and H. Trautmann. Automated algorithm selection: Survey and perspectives. CoRR, abs/1811.11597, 2018.
25. L. Kotthoff. Algorithm selection for combinatorial search problems: A survey. In Data Mining and Constraint Programming, pages 149–190. Springer, 2016.
26. K. Leyton-Brown, E. Nudelman, G. Andrew, J. McFadden, and Y. Shoham. A portfolio approach to algorithm selection. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI'03, pages 1542–1543, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc.
27. M. Lindauer, H. H. Hoos, F. Hutter, and T. Schaub. AutoFolio: An automatically configured algorithm selector. Journal of Artificial Intelligence Research, 53:745–778, 2015.
28. M. Lindauer, J. N. van Rijn, and L. Kotthoff. The algorithm selection competition series 2015-17. arXiv preprint arXiv:1805.01214, 2018.
29. Y. Malitsky, A. Sabharwal, H. Samulowitz, and M. Sellmann. Parallel SAT solver selection and scheduling. In International Conference on Principles and Practice of Constraint Programming, pages 512–526. Springer, 2012.
30. Y. Malitsky, A. Sabharwal, H. Samulowitz, and M. Sellmann. Algorithm portfolios based on cost-sensitive hierarchical clustering. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
31. M. Maratea, L. Pulina, and F. Ricca. Applying machine learning techniques to ASP solving. In LIPIcs-Leibniz International Proceedings in Informatics, volume 17. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2012.
32. M. Maratea, L. Pulina, and F. Ricca. A multi-engine approach to answer-set programming. Theory and Practice of Logic Programming, 14(6):841–868, 2014.
33. O. Maron and A. W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In Advances in Neural Information Processing Systems, pages 59–66, 1994.
34. J. Marques-Silva, I. Lynce, and S. Malik. Conflict-driven clause learning SAT solvers. In Handbook of Satisfiability, 2009.
35. J. P. Marques Silva and K. A. Sakallah. GRASP - a new search algorithm for satisfiability. In Proceedings of International Conference on Computer Aided Design, pages 220–227, 1996.
36. T. M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1st edition, 1997.
37. E. Nudelman, K. Leyton-Brown, A. Devkar, Y. Shoham, and H. Hoos. SATzilla: An algorithm portfolio for SAT. Solver description, SAT competition 2004, 2004.
38. E. O'Mahony, E. Hebrard, A. Holland, C. Nugent, and B. O'Sullivan. Using case-based reasoning in an algorithm portfolio for constraint solving. In Irish Conference on Artificial Intelligence and Cognitive Science, pages 210–216, 2008.
39. C. H. Papadimitriou. Computational complexity. John Wiley and Sons Ltd., 2003.
40. L. Pulina and A. Tacchella. A multi-engine solver for quantified boolean formulas. In International Conference on Principles and Practice of Constraint Programming, pages 574–589. Springer, 2007.
41. S. Raschka. Model evaluation, model selection, and algorithm selection in machine learning. CoRR, abs/1811.12808, 2018.
42. J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976.
43. F. Rossi, P. Van Beek, and T. Walsh. Handbook of constraint programming. Elsevier, 2006.
44. O. Roussel. Description of ppfolio (2011). Proc. SAT Challenge, page 46, 2012.
45. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009.
46. K. A. Smith-Miles. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Computing Surveys (CSUR), 41(1):6, 2009.
47. D. H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996.
48. D. H. Wolpert, W. G. Macready, et al. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.
49. L. Xu, H. Hoos, and K. Leyton-Brown. Hydra: Automatically configuring algorithms for portfolio-based selection. In AAAI, volume 10, pages 210–216, 2010.
50. L. Xu, H. H. Hoos, and K. Leyton-Brown. Hierarchical hardness models for SAT. In International Conference on Principles and Practice of Constraint Programming, pages 696–711. Springer, 2007.
51. L. Xu, F. Hutter, H. H. Hoos, and K. Leyton-Brown. SATzilla-07: the design and analysis of an algorithm portfolio for SAT. In International Conference on Principles and Practice of Constraint Programming, pages 712–727. Springer, 2007.
52. L. Xu, F. Hutter, J. Shen, H. H. Hoos, and K. Leyton-Brown. SATzilla2012: Improved algorithm selection based on cost-sensitive classification models. Proceedings of SAT Challenge, pages 57–58, 2012.