Measures for Combining Accuracy and Time for Meta- learning Salisu Mamman Abdulrahman1 and Pavel Brazdil1,2 significant wins (SW). This multicriteria evaluation measure Abstract.1 The vast majority of studies in meta-learning uses only combines the information about the accuracy and total few performance measures when characterizing different machine training/execution time of learning algorithms and is defined as: learning algorithms. The measure Adjusted Ratios of Ratio (ARR) addresses the problem of how to evaluate the quality of a model based on the accuracy and training time. Unfortunately, this measure suffers from a shortcoming that is described in this paper. A new (1) solution is proposed and it is shown that the proposed function satisfies the criterion of monotonicity, unlike ARR. where and represent the success rate and time of algorithm 1 INTRODUCTION on dataset , respectively. The term is the ratio of The major reason why data mining has attracted a great deal of success rates which can be seen as a measure of the advantage of attention in the information industry and in society as a whole in algorithm over algorithm (i.e., a benefit). The equivalent ratio recent years is due to the wide availability of huge amounts of data for time, , can be seen as a measure of the disadvantage of and the imminent need for turning such data into useful information and knowledge. The information and knowledge gained can be used algorithm over algorithm (i.e., a cost). Thus, the authors have for applications ranging from market analysis, fraud detection, taken the ratio of the benefit and the cost, obtaining thus a measure customer retention to production control and science exploration. of the overall quality of algorithm . Data mining tools such as Weka, Knime, and RapidMiner contain However, we note that time ratios have, in general, a much wider hundreds of operators covering a wide range of data analysis tasks, range of possible values than success rate ratios. If a simple time but unfortunately provide only limited advice on how to select the ratio were used it would dominate the ratio of ratios. This effect can right method according to the nature of the problem under analysis. be controlled by re-scaling using which provide a To alleviate these problems, different systems have been developed measure of the order of magnitude of the ratio. The relative that “intelligently” help users to analyze their data. The goal of Meta- importance between accuracy and time is taken into account by learning systems is to help the user by providing some guidance [1, multiplying this expression by the AccD parameter. This parameter is 2, 3]. This is done by suggesting a particular algorithm or provided by the user and represents the amount of accuracy he/she is operation(s) (e.g. application of particular preprocessing operation or willing to trade for a 10 times speedup or slowdown. For example, classification algorithm) to the user that would lead to good AccD = 10% means that the user is willing to trade 10% of accuracy performance. for 10 times speedup/slowdown. Finally, the value of 1 is added to The vast majority of studies in meta-learning uses only few performance measures when characterizing different machine to yield values that vary around 1, as happens with the learning algorithms. Regards classification, for instance, one success rate ratio. common measure is predictive accuracy. Other researchers have The ARR should ideally be monotonically increasing. Higher used also AUC, area under the ROC curve, or else precision, recall success rate ratios should lead to higher values of ARR. Higher time and F1. What is common to all these measures is the higher the ratios should lead to lower values of ARR. The overall effect of value, the better. Costs of operations, and in particular training time, combining the two should again be monotonic. are different though, as the lower the value, the better. We have decided to verify whether this property can be verified on An aggregate metric that combine both accuracy and time as metric data. We have fixed the value of SRR to 1 and varied the time ratio was presented in [4], ARR, the adjusted ratio of ratios, which allows from very small values (2-20) to very high values (220) and calculated the user to add more emphasis either on the predictive accuracy or on the ARR for three different values of AccD (0.2, 0.3 and 0.7). The the training time. This measure suffers however, from a shortcoming, result can be seen in the plot in Fig. 1. The horizontal axis shows the which is described in the next section. log of the time ratio (logRT). The vertical axis shows the ARR value. 2 RANKING BASED ON ACCURACY AND As can be seen, the resulting ARR function is not monotonic and even approaching infinity at some point. Obviously, this can lead to TIME incorrect rankings provided by the meta-learner. However, what is The Adjusted Ratio of Ratios (ARR) measure aggregates information even more worrying is that this can affect the evaluation results. In concerning accuracy and time. It can be seen as an extension of the the next section, we propose a solution to this problem. success rate ratios (SRR) method. This method was presented in [4] together with two other basic measures, average ranks (AR) and 3 OUR PROPOSED SOLUTION 1 When devising a new solution we did not wish to change the LIAAD Inesc Tec, Porto, sma@inescporto.pt, pbrazdil@inescporto.pt overall philosophy underlying ARR. We believe that it is indeed a 2 good idea to work with ratios, as absolute numbers do not carry FEP, University of Porto. much meaning. Figure 1. ARR with three different values for AccD (0.2, 0.3 and 0.7) Figure 3. Iso-curves with three different values of A3R (0.9, 1.0 and 1.1). Here n=8 was used to calculate the root. The accuracy of 90% can be considered good in one situation, but very bad in another. After some reflection, we have realized that the 4 FUTURE PLANS problem lies in the way how the time ratio has been re-scaled. So, we considered another way of re-scaling, which does not use log, but n- We intend to improve the methods presented in [5] which rely on th root instead, where n is a parameter. The proposed function is relatively pairwise comparison involving two algorithms. We plan to referred to as A3R and is defined as follows: upgrade this work by considering the information concerning both accuracy (or AUC) ratios and time ratios. Hence, the new function proposed will be very useful. Besides, another challenge is that the new set-up would use many more algorithms (in the order of 100’s) than in previous studies. We will exploit the OpenML [6] database in this process and collaboration is underway with U.Leiden on running some of the experiments and re-using the results. Considering that the number of As Fig. 2 shows, this function is monotonic. The higher the A3R, the algorithms is high, we need to re-think the method based on pairwise better. comparisons. Furthermore, we plan to use the method based on sampling landmarks, as in [5]. To simplify the whole procedure, we will probably use a fixed set of samples, rather than using some dynamic sampling strategy, as proposed in [5]. Still, we need to evaluate what the best number of samples is from the benefit-cost perspective. 1 2 5 CONCLUSION We have presented a new measure A3R for evaluating the performance of algorithms that considers both accuracy and time ratios suitably re-scaled. We have shown that this measure satisfies the criterion of monotonicity, unlike the previous version ARR. We have discussed the usage of A3R in further experiments on meta- learning. Figure 2. A3R for three different settings for the n-th root (4, 8, and 16) This work is funded (or part-funded) by the ERDF – European Regional Development Fund through the COMPETE Programme Taking n-th root in the denominator of eq.(2) enables to rescale the (operational programme for competitiveness) and by National Funds ratio of times. The higher the value of n, the greater the rescaling. So, through the FCT – Fundação para a Ciência e a Tecnologia for instance, if one algorithm is 10 slower than another, the ratio is (Portuguese Foundation for Science and Technology) within project 10. Taking for 8-th (2nd) root of this will decrease it to 1.33 (3.16). If «FCOMP - 01-0124-FEDER-022701» the ratio were 0.1 this would result in 0.74 (0.31). All numbers get closer to 1 after rescaling. REFERENCES The change from ARR to A3R is important, as we wish to [1] P. Brazdil, C. Giraud-Carrier, C.Soares, and R. Vilalta, Metalearning: recalculate many meta-learning experiments and consider both Applications to Data Mining, Springer, 2009. accuracy ratios (and possibly AUC ratios) together with time ratios, [2] Kalousis, A. (2002). Algorithm selection via meta-learning. PhD Thesis. suitably rescaled. University of Geneva. To understand the relationship between the success rate ratios [3] Gama, J. and P. Brazdil (1995). Characterization of classification (SRR) and time ratios (RT), we have constructed iso-A3R curves algorithms. Lecture Notes in Computer Science 990, 189–200. (Fig.3). The horizontal axis plots logRT in an increasing order of [4] Brazdil, P. B., C. Soares, and Joaquin Pinto Da Costa. "Ranking learning time rate ratios. Thus negative values on the left characterize fast algorithms: Using IBL and meta-learning on accuracy and time results." Machine Learning 50.3 (2003): 251-277. algorithms, while the positive values on the right characterize slow [5] Leite, R., and P. Brazdil. "Active Testing Strategy to Predict the Best ones. The vertical axis shows the success rate ratios (SRR). Each Classification Algorithm via Sampling and Metalearning." ECAI. 2010. curve shows the values of A3R where the values are constant. The [6] Vanschoren, J.. "The experiment database for machine learning." 5th blue (red, green) curve represents situations where A3R is 0.9 (1.0, Planning to Learn Workshop, WS28 at ECAI-2012. 2012. 1.1). As the ratio of times decreases (i.e. the algorithm is faster), it is sufficient to have lower values of the success rate ratio (SRR) to obtain the same value of A3R.