Speeding-up Document Scoring with Tree Ensembles using CPU SIMD Extensions

Claudio Lucchese (1,3), Franco Maria Nardini (1,3), Salvatore Orlando (2), Raffaele Perego (1,3), Nicola Tonellotto (1,3), and Rossano Venturini (4,3)

(1) ISTI-CNR, Pisa; (2) University Ca' Foscari of Venice; (3) Istella Srl; (4) University of Pisa.

Abstract. Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently considered one of the most effective ways to rank the query results returned by large-scale Information Retrieval systems. This extended abstract briefly summarizes the work in [4], which proposes V-QuickScorer (vQS), an algorithm that exploits the SIMD vector extensions of modern CPUs to traverse the ensemble in parallel by evaluating multiple documents simultaneously. We summarize the results of a comprehensive evaluation of vQS against state-of-the-art scoring algorithms, showing that vQS outperforms its competitors with speed-ups of up to 2.4x.

Additive ensembles of regression trees, such as GBRT [2] and λ-MART [5], are nowadays considered among the most advanced LtR models for ranking documents in IR systems, although they require very efficient scoring algorithms to process queries within strict time budgets [1]. The state-of-the-art algorithm for efficient scoring with additive ensembles of regression trees is QuickScorer (QS) [3]. In this extended abstract we briefly summarize the work in [4], where we introduce vQS, a parallelized version of QS that exploits the SIMD capabilities of mainstream CPUs.

Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) are instruction sets that exploit wide registers of 128 and 256 bits, allowing parallel operations to be performed on simple data types, e.g., a 128-bit register holding four single-precision or two double-precision floats. Using the SIMD capabilities of mainstream CPUs, namely SSE 4.2 and AVX 2, vQS can process up to 8 documents in parallel.
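The register arithmetic above can be made concrete with a small sketch. The Python code below is only an emulation for illustration, not the vQS implementation: it computes how many values fit in a register of a given width and mimics a lane-wise comparison of one feature value per document against a single threshold. The function names, the feature values, the threshold, and the direction of the test (value > threshold) are all hypothetical; real code would use SSE or AVX instructions rather than a Python loop.

```python
import struct

FLOAT32_BITS = 32  # single precision
FLOAT64_BITS = 64  # double precision


def lanes(register_bits, element_bits):
    """Number of elements of the given width that fit in one register."""
    return register_bits // element_bits


def pack_float32(values):
    """Pack values as consecutive little-endian float32 lanes (register-like layout)."""
    return struct.pack(f"<{len(values)}f", *values)


def compare_gt_lanes(feature_values, threshold):
    """Emulate a lane-wise comparison: bit i is set when lane i exceeds the threshold.

    A SIMD unit would evaluate all lanes with one instruction; here a plain
    loop stands in for it.
    """
    mask = 0
    for lane, value in enumerate(feature_values):
        if value > threshold:
            mask |= 1 << lane
    return mask


# Hypothetical input: the same feature taken from 8 different documents.
doc_feature = [0.1, 0.7, 0.3, 0.9, 0.5, 0.2, 0.8, 0.4]
threshold = 0.45  # hypothetical node threshold

register = pack_float32(doc_feature)
mask = compare_gt_lanes(doc_feature, threshold)

print(lanes(128, FLOAT32_BITS), lanes(128, FLOAT64_BITS), lanes(256, FLOAT32_BITS))
print(len(register) * 8, bin(mask))
```

Eight single-precision values occupy exactly 256 bits, which matches the figure of up to 8 documents in parallel if features are stored as 32-bit floats (an assumption here). The resulting bitmask records, for each document, whether the hypothetical test holds, so one comparison covers every document in the group.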
The results of a comprehensive evaluation of vQS on public datasets against state-of-the-art scoring algorithms show that vQS outperforms its competitors with speed-ups of up to 2.4x.

References

1. G. Capannini, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, and N. Tonellotto. Quality versus efficiency in document scoring with learning-to-rank models. Information Processing & Management, 2016.
2. J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
3. C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. QuickScorer: A fast algorithm to rank documents with additive ensembles of regression trees. In Proc. ACM SIGIR, pages 73–82. ACM, 2015.
4. C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Exploiting CPU SIMD extensions to speed-up document scoring with tree ensembles. In Proc. ACM SIGIR 2016. ACM, 2016.
5. Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Information Retrieval, 2010.