Speeding-up Document Scoring with Tree Ensembles using CPU SIMD Extensions

Claudio Lucchese

Franco Maria Nardini

Salvatore Orlando

Raffaele Perego

Nicola Tonellotto

Rossano Venturini

ISTI-CNR, Pisa; University Ca' Foscari of Venice; University of Pisa

Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently deemed one of the best solutions to effectively rank the query results returned by large-scale Information Retrieval systems. This extended abstract briefly summarizes the work in [4], which proposes V-QuickScorer (vQS), an algorithm that exploits the SIMD vector extensions of modern CPUs to traverse the ensemble in parallel by evaluating multiple documents simultaneously. We summarize the results of a comprehensive evaluation of vQS against state-of-the-art scoring algorithms, showing that vQS outperforms its competitors with speed-ups of up to 2.4x.
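The idea of evaluating several documents per traversal step can be illustrated with a minimal sketch. It assumes a QuickScorer-style bitvector encoding of each tree in the spirit of [3]; the abstract itself does not spell out the data layout, so the node format, the names `score_block`, `TREES`, `DOCS` and `LANES`, and the toy ensemble are all hypothetical. The SIMD lanes are only emulated with Python lists: a real implementation would replace the inner per-lane loop with single vector compare and AND instructions.

```python
# Sketch only: emulates SIMD lanes with lists; not the authors' implementation.
# Assumed tree encoding (hypothetical):
#   tree = (nodes, leaf_values)
#   node = (feature_id, threshold, mask)
# Leaves are numbered left to right and bit i of a bitvector stands for leaf i.
# `mask` clears the leaves of the node's left subtree, which become
# unreachable when the test  x[feature_id] <= threshold  is FALSE.

LANES = 4  # documents scored together, standing in for one SIMD register


def score_block(trees, docs):
    """Score up to LANES documents at once; each doc is a list of features."""
    assert len(docs) <= LANES
    scores = [0.0] * len(docs)
    for nodes, leaf_values in trees:
        all_ones = (1 << len(leaf_values)) - 1
        leafidx = [all_ones] * len(docs)  # one candidate-leaf bitvector per lane
        for feature_id, threshold, mask in nodes:
            # One node test applied to every lane. With SIMD extensions this
            # loop would be one vector comparison plus one masked AND.
            for lane, doc in enumerate(docs):
                if doc[feature_id] > threshold:  # the node test is false
                    leafidx[lane] &= mask
        for lane in range(len(docs)):
            # The exit leaf is the lowest-numbered leaf still marked reachable.
            lowest_bit = leafidx[lane] & -leafidx[lane]
            exit_leaf = lowest_bit.bit_length() - 1
            scores[lane] += leaf_values[exit_leaf]
    return scores


# Toy ensemble of two trees (made-up thresholds and leaf values).
TREES = [
    # root: x[0] <= 0.5 ? leaf0 : (x[1] <= 2.0 ? leaf1 : leaf2)
    ([(0, 0.5, 0b110), (1, 2.0, 0b101)], [1.0, 2.0, 3.0]),
    # root: x[1] <= 1.5 ? leaf0 : leaf1
    ([(1, 1.5, 0b10)], [0.5, -0.5]),
]
DOCS = [[0.2, 5.0], [0.9, 1.0], [0.9, 3.0]]

print(score_block(TREES, DOCS))
```

For the three toy documents the sketch prints `[0.5, 2.5, 2.5]`, which matches a conventional root-to-leaf traversal of each tree. Note that the nodes of a tree can be visited in any order, since every false test only removes unreachable leaves; this order independence is what makes it possible to apply the same node test to all lanes in lockstep.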

[1] G. Capannini, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, and N. Tonellotto. Quality versus efficiency in document scoring with learning-to-rank models. Information Processing & Management, 2016.

[2] J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.

[3] C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. QuickScorer: A fast algorithm to rank documents with additive ensembles of regression trees. In Proc. ACM SIGIR, pages 73–82. ACM, 2015.

[4] C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Exploiting CPU SIMD extensions to speed-up document scoring with tree ensembles. In Proc. ACM SIGIR. ACM, 2016.

[5] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Information Retrieval, 2010.