Measures for Combining Accuracy and Time for Meta-
                                                            learning
                                               Salisu Mamman Abdulrahman1 and Pavel Brazdil1,2
                                                                           significant wins (SW). This multicriteria evaluation measure
Abstract.1 The vast majority of studies in meta-learning uses only         combines the information about the accuracy and total
few performance measures when characterizing different machine             training/execution time of learning algorithms and is defined as:
learning algorithms. The measure Adjusted Ratios of Ratio (ARR)
addresses the problem of how to evaluate the quality of a model
based on the accuracy and training time. Unfortunately, this measure
suffers from a shortcoming that is described in this paper. A new                                                               (1)
solution is proposed and it is shown that the proposed function
satisfies the criterion of monotonicity, unlike ARR.

                                                                           where        and      represent the success rate and time of algorithm
1         INTRODUCTION
                                                                               on dataset , respectively. The term                    is the ratio of
The major reason why data mining has attracted a great deal of             success rates which can be seen as a measure of the advantage of
attention in the information industry and in society as a whole in         algorithm      over algorithm      (i.e., a benefit). The equivalent ratio
recent years is due to the wide availability of huge amounts of data
                                                                           for time,           , can be seen as a measure of the disadvantage of
and the imminent need for turning such data into useful information
and knowledge. The information and knowledge gained can be used            algorithm       over algorithm     (i.e., a cost). Thus, the authors have
for applications ranging from market analysis, fraud detection,            taken the ratio of the benefit and the cost, obtaining thus a measure
customer retention to production control and science exploration.          of the overall quality of algorithm       .
  Data mining tools such as Weka, Knime, and RapidMiner contain              However, we note that time ratios have, in general, a much wider
hundreds of operators covering a wide range of data analysis tasks,        range of possible values than success rate ratios. If a simple time
but unfortunately provide only limited advice on how to select the         ratio were used it would dominate the ratio of ratios. This effect can
right method according to the nature of the problem under analysis.        be controlled by re-scaling using                       which provide a
  To alleviate these problems, different systems have been developed
                                                                           measure of the order of magnitude of the ratio. The relative
that “intelligently” help users to analyze their data. The goal of Meta-
                                                                           importance between accuracy and time is taken into account by
learning systems is to help the user by providing some guidance [1,
                                                                           multiplying this expression by the AccD parameter. This parameter is
2, 3]. This is done by suggesting a particular algorithm or
                                                                           provided by the user and represents the amount of accuracy he/she is
operation(s) (e.g. application of particular preprocessing operation or
                                                                           willing to trade for a 10 times speedup or slowdown. For example,
classification algorithm) to the user that would lead to good
                                                                           AccD = 10% means that the user is willing to trade 10% of accuracy
performance.                                                               for 10 times speedup/slowdown. Finally, the value of 1 is added to
  The vast majority of studies in meta-learning uses only few
performance measures when characterizing different machine                                 to yield values that vary around 1, as happens with the
learning algorithms. Regards classification, for instance, one             success rate ratio.
common measure is predictive accuracy. Other researchers have                The ARR should ideally be monotonically increasing. Higher
used also AUC, area under the ROC curve, or else precision, recall         success rate ratios should lead to higher values of ARR. Higher time
and F1. What is common to all these measures is the higher the             ratios should lead to lower values of ARR. The overall effect of
value, the better. Costs of operations, and in particular training time,   combining the two should again be monotonic.
are different though, as the lower the value, the better.                     We have decided to verify whether this property can be verified on
  An aggregate metric that combine both accuracy and time as metric        data. We have fixed the value of SRR to 1 and varied the time ratio
was presented in [4], ARR, the adjusted ratio of ratios, which allows      from very small values (2-20) to very high values (220) and calculated
the user to add more emphasis either on the predictive accuracy or on      the ARR for three different values of AccD (0.2, 0.3 and 0.7). The
the training time. This measure suffers however, from a shortcoming,
                                                                           result can be seen in the plot in Fig. 1. The horizontal axis shows the
which is described in the next section.
                                                                           log of the time ratio (logRT). The vertical axis shows the ARR value.
2         RANKING BASED ON ACCURACY AND                                      As can be seen, the resulting ARR function is not monotonic and
                                                                           even approaching infinity at some point. Obviously, this can lead to
          TIME                                                             incorrect rankings provided by the meta-learner. However, what is
The Adjusted Ratio of Ratios (ARR) measure aggregates information          even more worrying is that this can affect the evaluation results. In
concerning accuracy and time. It can be seen as an extension of the
                                                                           the next section, we propose a solution to this problem.
success rate ratios (SRR) method. This method was presented in [4]
together with two other basic measures, average ranks (AR) and
                                                                           3         OUR PROPOSED SOLUTION

1                                                                            When devising a new solution we did not wish to change the
  LIAAD Inesc Tec, Porto, sma@inescporto.pt,
pbrazdil@inescporto.pt
                                                                           overall philosophy underlying ARR. We believe that it is indeed a
2                                                                          good idea to work with ratios, as absolute numbers do not carry
  FEP, University of Porto.
                                                                           much meaning.
Figure 1. ARR with three different values for AccD (0.2, 0.3 and 0.7)         Figure 3. Iso-curves with three different values of A3R (0.9, 1.0 and
                                                                              1.1). Here n=8 was used to calculate the root.
The accuracy of 90% can be considered good in one situation, but
very bad in another. After some reflection, we have realized that the         4          FUTURE PLANS
problem lies in the way how the time ratio has been re-scaled. So, we
considered another way of re-scaling, which does not use log, but n-          We intend to improve the methods presented in [5] which rely on
th root instead, where n is a parameter. The proposed function is             relatively pairwise comparison involving two algorithms. We plan to
referred to as A3R and is defined as follows:                                 upgrade this work by considering the information concerning both
                                                                              accuracy (or AUC) ratios and time ratios. Hence, the new function
                                                                              proposed will be very useful.
                                                                                 Besides, another challenge is that the new set-up would use many
                                                                              more algorithms (in the order of 100’s) than in previous studies. We
                                                                              will exploit the OpenML [6] database in this process and
                                                                              collaboration is underway with U.Leiden on running some of the
                                                                              experiments and re-using the results. Considering that the number of
As Fig. 2 shows, this function is monotonic. The higher the A3R, the          algorithms is high, we need to re-think the method based on pairwise
better.                                                                       comparisons.
                                                                                 Furthermore, we plan to use the method based on sampling
                                                                              landmarks, as in [5]. To simplify the whole procedure, we will
                                                                              probably use a fixed set of samples, rather than using some dynamic
                                                                              sampling strategy, as proposed in [5]. Still, we need to evaluate what
                                                                              the best number of samples is from the benefit-cost perspective.
                                                                      1
                                                                      2       5          CONCLUSION
                                                                              We have presented a new measure A3R for evaluating the
                                                                              performance of algorithms that considers both accuracy and time
                                                                              ratios suitably re-scaled. We have shown that this measure satisfies
                                                                              the criterion of monotonicity, unlike the previous version ARR. We
                                                                              have discussed the usage of A3R in further experiments on meta-
                                                                              learning.
  Figure 2. A3R for three different settings for the n-th root (4, 8,
and 16)                                                                       This work is funded (or part-funded) by the ERDF – European
                                                                              Regional Development Fund through the COMPETE Programme
Taking n-th root in the denominator of eq.(2) enables to rescale the          (operational programme for competitiveness) and by National Funds
ratio of times. The higher the value of n, the greater the rescaling. So,     through the FCT – Fundação para a Ciência e a Tecnologia
for instance, if one algorithm is 10 slower than another, the ratio is        (Portuguese Foundation for Science and Technology) within project
10. Taking for 8-th (2nd) root of this will decrease it to 1.33 (3.16). If    «FCOMP - 01-0124-FEDER-022701»
the ratio were 0.1 this would result in 0.74 (0.31). All numbers get
closer to 1 after rescaling.
                                                                              REFERENCES
   The change from ARR to A3R is important, as we wish to                     [1]    P. Brazdil, C. Giraud-Carrier, C.Soares, and R. Vilalta, Metalearning:
recalculate many meta-learning experiments and consider both                         Applications to Data Mining, Springer, 2009.
accuracy ratios (and possibly AUC ratios) together with time ratios,          [2]   Kalousis, A. (2002). Algorithm selection via meta-learning. PhD Thesis.
suitably rescaled.                                                                   University of Geneva.
   To understand the relationship between the success rate ratios             [3]    Gama, J. and P. Brazdil (1995). Characterization of classification
(SRR) and time ratios (RT), we have constructed iso-A3R curves                       algorithms. Lecture Notes in Computer Science 990, 189–200.
(Fig.3). The horizontal axis plots logRT in an increasing order of            [4]   Brazdil, P. B., C. Soares, and Joaquin Pinto Da Costa. "Ranking learning
time rate ratios. Thus negative values on the left characterize fast                 algorithms: Using IBL and meta-learning on accuracy and time
                                                                                     results." Machine Learning 50.3 (2003): 251-277.
algorithms, while the positive values on the right characterize slow
                                                                              [5]    Leite, R., and P. Brazdil. "Active Testing Strategy to Predict the Best
ones. The vertical axis shows the success rate ratios (SRR). Each
                                                                                     Classification Algorithm via Sampling and Metalearning." ECAI. 2010.
curve shows the values of A3R where the values are constant. The
                                                                              [6]   Vanschoren, J.. "The experiment database for machine learning." 5th
blue (red, green) curve represents situations where A3R is 0.9 (1.0,                 Planning to Learn Workshop, WS28 at ECAI-2012. 2012.
1.1). As the ratio of times decreases (i.e. the algorithm is faster), it is
sufficient to have lower values of the success rate ratio (SRR) to
obtain the same value of A3R.