Improve Ranking Efficiency
               by Optimizing Tree Ensembles


      Claudio Lucchese1,3, Franco Maria Nardini1,3, Salvatore Orlando2,
         Raffaele Perego1,3, Fabrizio Silvestri4, and Salvatore Trani1,5

         1 ISTI-CNR, Pisa, 2 University Ca’ Foscari of Venice,
         3 Istella Srl, 4 Yahoo London, 5 University of Pisa.

      Abstract. Learning to Rank (LtR) is the machine learning method of
      choice for producing highly effective ranking functions. However,
      efficiency and effectiveness are two competing forces, and trading off
      effectiveness to meet the efficiency constraints typical of production
      systems is one of the most urgent issues. This extended abstract briefly
      summarizes the work in [4], which proposes CLEaVER, a new framework
      for optimizing LtR models based on ensembles of regression trees. We
      summarize the results of a comprehensive evaluation showing that
      CLEaVER is able to prune up to 80% of the trees and provides a
      speed-up of up to 2.6x without affecting the effectiveness of the model.

    Modern search engines are expected to return highly relevant results in a
fraction of a second to satisfy efficiency constraints. Learning-to-Rank (LtR) [1]
methodologies are nowadays pervasively used as effective solutions to ranking
problems. However, efficiency and effectiveness are intertwined concepts that
often counteract each other. In this extended abstract we briefly summarize
the work in [4], where we introduce CLEaVER, a framework developed on top
of QuickRank [5], for the optimization of LtR models based on ensembles of
regression trees after the learning phase has completed. Since the cost of scoring
a document with a tree ensemble is linear in the size of the ensemble, CLEaVER
first removes a subset of the trees, and then fine-tunes the weights of the
remaining ones according to a given quality measure. Results of a comprehensive
evaluation using QuickScorer [2, 3], a state-of-the-art algorithm for efficient
scoring, show that CLEaVER is able to speed up a given ranking ensemble by a
factor of up to 2.6x without affecting the effectiveness of the model.
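
The two-step idea described above (prune trees, then re-weight the survivors)
can be illustrated with a minimal sketch. Everything in it is an assumption made
for illustration and is not taken from [4]: the toy `stump` trees, the
largest-weight pruning rule in `prune`, the pairwise-accuracy quality measure,
and the greedy coordinate-wise search in `fine_tune`. It is not the CLEaVER or
QuickRank implementation.

```python
from typing import Callable, List, Sequence, Tuple

Tree = Callable[[Sequence[float]], float]          # feature vector -> partial score
Quality = Callable[[List[Tree], List[float]], float]


def stump(feature: int, threshold: float, left: float, right: float) -> Tree:
    """Toy one-split regression tree (hypothetical stand-in for a learned tree)."""
    return lambda x: left if x[feature] <= threshold else right


def score(trees: List[Tree], weights: List[float], x: Sequence[float]) -> float:
    """Additive ensemble score: one evaluation per tree, so cost is linear in len(trees)."""
    return sum(w * t(x) for t, w in zip(trees, weights))


def prune(trees: List[Tree], weights: List[float], keep: int) -> Tuple[List[Tree], List[float]]:
    """Step 1: keep the `keep` trees with the largest absolute weight (assumed strategy)."""
    by_weight = sorted(range(len(trees)), key=lambda i: abs(weights[i]), reverse=True)
    kept = sorted(by_weight[:keep])                # preserve the original tree order
    return [trees[i] for i in kept], [weights[i] for i in kept]


def pairwise_accuracy(docs, labels) -> Quality:
    """Assumed quality measure: fraction of document pairs ranked in label order."""
    pairs = [(i, j) for i in range(len(docs)) for j in range(len(docs))
             if labels[i] > labels[j]]

    def quality(trees: List[Tree], weights: List[float]) -> float:
        s = [score(trees, weights, x) for x in docs]
        return sum(s[i] > s[j] for i, j in pairs) / len(pairs) if pairs else 0.0

    return quality


def fine_tune(trees, weights, quality: Quality, factors=(0.5, 1.5, 2.0), rounds=2):
    """Step 2: greedy coordinate-wise search that only accepts quality improvements."""
    weights = list(weights)
    best = quality(trees, weights)
    for _ in range(rounds):
        for i in range(len(weights)):
            original, best_w = weights[i], weights[i]
            for f in factors:
                weights[i] = original * f          # try a rescaled weight
                q = quality(trees, weights)
                if q > best:
                    best, best_w = q, weights[i]
            weights[i] = best_w                    # keep the best value found
    return weights, best


# Tiny made-up validation set: four documents with graded relevance labels.
trees = [stump(0, 0.5, 0.0, 1.0), stump(1, 0.5, 0.0, 1.0),
         stump(0, 0.2, -1.0, 0.0), stump(1, 0.8, 0.0, 0.5)]
weights = [1.0, 0.6, 0.1, 0.05]
docs = [(0.9, 0.9), (0.9, 0.1), (0.1, 0.9), (0.1, 0.1)]
labels = [3, 2, 1, 0]

quality = pairwise_accuracy(docs, labels)
small_trees, small_weights = prune(trees, weights, keep=2)
tuned_weights, tuned_quality = fine_tune(small_trees, small_weights, quality)
```

Because `fine_tune` only keeps a new weight when the quality measure strictly
improves, the re-weighted ensemble is never worse than the pruned one on the
data used for tuning. Any other pruning rule or quality measure could be
substituted for the ones assumed here.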

References
 1. T.Y. Liu. Learning to rank for information retrieval. Foundations and Trends in
    IR. 2009.
 2. C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and
    R. Venturini. QuickScorer: A fast algorithm to rank documents with additive
    ensembles of regression trees. In ACM SIGIR. 2015.
 3. C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and
    R. Venturini. Exploiting CPU SIMD Extensions to Speed-up Document Scoring
    with Tree Ensembles. In ACM SIGIR. 2016.
 4. C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, F. Silvestri, and S. Trani.
    Post-Learning Optimization of Tree Ensembles for Efficient Ranking. In ACM
    SIGIR. 2016.
 5. G. Capannini, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, and
    N. Tonellotto. Quality versus efficiency in document scoring with
    learning-to-rank models. In Information Processing & Management. 2016.