Balancing Efficiency and Effectiveness Trade-offs in Large Scale Multi-Stage Search Engines

J. Shane Culpepper
RMIT University, Melbourne, Australia


In this talk, we will discuss recent work on managing trade-offs between efficiency and effectiveness in modern multi-stage ranking architectures, which consist of a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list is often sensitive to the quality of the initial candidate pool. We will briefly review a few recent related papers from my group and then outline future directions. First, we will explore dynamic cutoff prediction in early-stage retrieval using pre-retrieval query difficulty features. We will then turn our attention to efficiency and effectiveness trade-offs in later-stage cascaded learning-to-rank algorithms. Specifically, we re-examine the importance of tightly integrating feature costs into multi-stage learning-to-rank (LTR) IR systems, and we present a novel approach to optimizing cascaded ranking models that can directly leverage a variety of state-of-the-art LTR rankers such as LambdaMART and Gradient Boosted Decision Trees. Finally, we discuss interesting future research directions in multi-stage retrieval systems as modern retrieval tasks continue to evolve towards more complex interactive search systems.
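To make the pipeline concrete, here is a minimal sketch of a two-stage cascade with a query-dependent candidate cutoff. It is not the method from the talk. The toy collection, the `predict_cutoff` threshold rule (a hand-set stand-in for a learned predictor), the bigram-bonus "reranker" (a placeholder for a model such as LambdaMART) and `STAGE2_COST_PER_DOC` are all hypothetical. Average IDF is used as the pre-retrieval difficulty feature because it can be computed from collection statistics before any document is scored. The sketch only tallies late-stage cost; it does not optimize it.

```python
import math
from collections import Counter

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": "efficient query processing for web search engines",
    "d2": "learning to rank with gradient boosted trees",
    "d3": "cascade ranking models trade effectiveness for efficiency",
    "d4": "cooking recipes for a quick dinner",
    "d5": "multi stage retrieval with candidate generation and reranking",
}
TOKENS = {doc_id: text.split() for doc_id, text in DOCS.items()}
N_DOCS = len(DOCS)
DF = Counter(term for toks in TOKENS.values() for term in set(toks))
STAGE2_COST_PER_DOC = 10  # assumed relative cost of late-stage feature extraction


def idf(term):
    # Smoothed inverse document frequency; unseen terms get the largest value.
    return math.log((N_DOCS + 1) / (DF.get(term, 0) + 0.5))


def avg_idf(query):
    # Pre-retrieval difficulty feature: uses collection statistics only.
    terms = query.split()
    return sum(idf(t) for t in terms) / len(terms)


def predict_cutoff(query, k_min=2, k_max=5, threshold=1.2):
    # Hypothetical rule standing in for a learned predictor: queries built
    # from common (low-IDF) terms get a deeper candidate pool.
    return k_max if avg_idf(query) < threshold else k_min


def stage1_score(query, doc_id):
    # Cheap candidate generation: sum of IDF over matching query terms.
    toks = set(TOKENS[doc_id])
    return sum(idf(t) for t in query.split() if t in toks)


def stage2_score(query, doc_id):
    # Placeholder for an expensive reranker: stage-1 score plus a bonus
    # for every query bigram that appears verbatim in the document.
    q, toks = query.split(), TOKENS[doc_id]
    doc_bigrams = set(zip(toks, toks[1:]))
    bonus = sum(0.5 for pair in zip(q, q[1:]) if pair in doc_bigrams)
    return stage1_score(query, doc_id) + bonus


def two_stage_search(query):
    k = predict_cutoff(query)
    scored = [(stage1_score(query, d), d) for d in DOCS]
    # Keep matching documents only, best first (ties broken by doc id), cut at k.
    candidates = sorted(
        ((s, d) for s, d in scored if s > 0), key=lambda x: (-x[0], x[1])
    )[:k]
    # Only the k candidates pay the (assumed) late-stage cost.
    reranked = sorted(
        (d for _, d in candidates), key=lambda d: (-stage2_score(query, d), d)
    )
    return reranked, len(candidates) * STAGE2_COST_PER_DOC


ranked, cost = two_stage_search("learning for ranking")
print(ranked, cost)  # ['d3', 'd2', 'd1', 'd4'] 40
```

The query "learning for ranking" contains the common term "for", so its average IDF falls below the threshold and it receives the deeper pool. A more specific query such as "cascade ranking" gets `k_min`, which caps how many documents pay the late-stage cost. In a real system the cutoff would come from a trained model over many such features, and the expensive stage would be a learned ranker.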

Biography. Associate Professor Shane Culpepper completed his PhD in Computer Science at The University of Melbourne in 2008. He is currently a Vice-Chancellor's Principal Research Fellow and Director of the Centre for Information Discovery and Data Analytics at RMIT University in Melbourne, Australia. His current research focuses on building search systems that effectively and efficiently search web-scale data collections, and on understanding how to measure the quality of the answers found. His research interests include efficient and scalable algorithm design, machine learning in information retrieval, and system evaluation. For more information about his research, visit his website at https://www.culpepper.io.

Acknowledgements. This work was supported by the Australian Research Council's Discovery Projects Scheme (DP170102231) and a grant from the Mozilla Foundation.
