Developing the Quantum Probability Ranking Principle Guido Zuccon Leif Azzopardi Department of Computing Science Department of Computing Science University of Glasgow University of Glasgow Scotland (UK) Scotland (UK) guido@dcs.gla.ac.uk leif@dcs.gla.ac.uk ABSTRACT the relevance of a document may depend on the previous In this work, we summarise the development of a rank- documents already assessed, for example in the novelty and ing principle based on quantum probability theory, called diversity tracks. In sub-topic retrieval, the IR system has the Quantum Probability Ranking Principle (QPRP), and to provide a document ranking which covers all the pos- we also provide an overview of the initial experiments per- sible facets (subtopics) relevant to the user’s information formed employing the QPRP. The main difference between need as soon as possible in the ranking. Consequently, fol- the QPRP and the classic Probability Ranking Principle, lowing the traditional Probability Ranking Principle, where is that the QPRP implicitly captures the dependencies be- document dependence is ignored, leads to sub-optimal per- tween documents by means of “quantum interference”. Sub- formance [10]2 . In [12], we perform a series of experiments sequently, the optimal ranking of documents is not based that also indicate this is the case, and further show that the solely on documents’ probability of relevance but also on QPRP leads to better empirical performance. This is be- the interference with the previously ranked documents. Our cause within the QPRP the interdependence between doc- research shows that the application of quantum theory to uments is naturally accounted for through the quantum in- problems within information retrieval can lead to consis- terference, and the QPRP suggests that documents ranked tently better retrieval effectiveness, while still being simple, until position n − 1 interfere with the degree of relevance of elegant and tractable. the document ranked at position n. Intuitively, documents expressing diverse information have higher degree of interfer- ence than documents that are similar. For the same reason, 1. INTRODUCTION documents containing novel information might strongly in- The idea of using quantum theory in information retrieval terfere with documents ranked in previous positions. Even (IR) was formally put forward by van Rijsbergen [9] in 20041 . contrary information might be captured by the interference In [9], the main thesis of this seminal book is to use quan- term: documents containing content contrary to the one pre- tum theory as a bridge between the three mainstream IR sented at the previous rank positions might trigger a revision approaches; i.e. vector space models, logic models and prob- of user’s beliefs about the topic. ability models. While this direction has been largely unex- The remainder of this paper is as follows: the next sec- plored, recently there has been a spate of work which aims tion briefly outlines the QPRP. Section 3 presents the main to develop quantum inspired or quantum based information results from the study recently performed on sub-topic re- retrieval models [1, 2, 3, 4, 5, 6, 8, 7, 13, 11]. trieval. Finally, we conclude in Section 4 by outlining the In this work, we report on the the development the Quan- directions for further work using the quantum based ranking tum Probability Ranking Principle [14, 12]. The ranking principle. principle is derived by developing an analogy between the famous double-slit experiment and document ranking. The 2. THE QPRP double slit experiment was conducted to demonstrate that In [14], the Quantum Probability Ranking Principle is pro- kolmogorovian probability fails to adequately describe the posed and it’s derivation is based on an analogy with the outcome of physical phenomena, and this motivated the famous double slit experiment. The resultant of this work development of quantum probability theory which incorpo- was the following formulation: when ranking documents, the rates the quantum interference between events. IR system has to maximise the total satisfaction of the user In [14], it is hypothesized that this quantum interference given the document ranking, achievable by maximising the can be used to account for the interdependence between doc- total probability of the ranking. Using the quantum law of uments and their associated judgements. In certain tasks, total probability, the resultant ranking strategy impose to select at each rank position a document d such that 1 Prior to this, van Rijsbergen gave talks as early as 1996 on X the topic. ` ´ d = arg max P (di ) + Idx ,di (1) dx ∈RA where RA is the list of documents already ranked and Idx ,di is the interference between documents dx and di . Note that Appears in the Proceedings of the 1st Italian Information Retrieval 2 This has led to arguments for the development of a new Workshop (IIR’10), January 27–28, 2010, Padova, Italy. ranking principle. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors. the traditional PRP is equivalent to the QPRP when the the interdependence between documents through quantum interference is null, i.e. Idx ,di = 0, ∀dx , di ∈ C, the doc- interference. The new ranking strategy has been empiri- uments corpus. In physics, the interference indicates the cally investigated, showing that the QPRP consistently out- amount and kind of interaction between waves. If two waves performs both the PRP and state-of-the-art approaches, i.e. strongly interact with each other, then the absolute value of MMR and PT, without requiring parameter tuning. This their interference is high, and vice versa low. The interac- suggests that the use of Quantum Theory to model pro- tion can generate two different outcomes: either increase the cesses within information retrieval can lead to substantial effect generated by the sum of the two waves (constructive improvements in retrieval effectiveness. interference, I > 0) or decrease it (destructive interference, Future work examining the utility and applications of the I < 0). In IR, the interference Idx ,di could be negative or Quantum Probability Ranking Principle will be directed to- positive, and thus demote or promote a document in the wards: ranking depending on the context. For instance in sub-topic • impact of the Pearson’s correlation coefficient as a mean retrieval it would be sensible if documents related to the to approximate interference; same subtopics negatively interfere, lowering the chances to rank both of them at high positions. This scenario is dis- • alternative estimations of the interference; cussed in the next section. • how to derive a complex amplitude distribution from the document corpus; 3. THE QPRP IN SUBTOPIC RETRIEVAL • the relationships between interference in the quantum In [12], the QPRP is empirically tested and validated on probability framework and conditional probabilities in the subtopic retrieval task. The ranking under the QPRP Kolmogorovian probability theory; and, was compared with the rankings of models which upholds • how to apply the QPRP paradigm to other retrieval the PRP, and also against state-of-the-art strategies for subtopic tasks, e.g. ad-hoc retrieval. retrieval, i.e. MMR and Portfolio Theory (PT) [10]. The main point of this experimentation was to deter- Acknowledgements: We would like to thank Keith van Rijs- mine whether the inherent document interdependence could bergen for his collaboration, support and mentoring, and Massimo be accounted for by the interference component within the Melucci for his comments and suggestions. This work has been QPRP. Intuitively, the interference component depends upon funded by the EPSRC Renaissance project (EP/F014384/1) and the Royal Society International Joint Project JP080734. both the inter-document dependencies and the document’s relevance probabilities. Since it is not possible to estimate the interference component directly from the text statistics, 5. REFERENCES [1] S. Arafat. Foundations research in information retrieval for the experiments reported in [12], we have used the Pear- inspired by quantum theory. PhD thesis, University of son’s correlation between interfering documents to compute Glasgow, December 2007. the interference. We performed the empirical investigation [2] S. Arafat and C. J. van Rijsbergen. Quantum theory and over the TREC subtopic retrieval track, which includes doc- the nature of search. In QI ’07, pages 114–121, 2007. uments from the Financial Times of London contained in [3] C. Flender, K. Kitto, and P. Bruza. Beyond ontology in TREC 6,7 and 8 collections and 20 ad-hoc retrieval topics, information systems. In QI 2009, pages 276–288, 2009. composed of subtopics, from the TREC interactive tracks. [4] Y. Hou and D. Song. Characterizing pure high-order entanglements in lexical semantic spaces via information We retrieved documents and generated the initial proba- geometry. In QI 2009, pages 237–250, 2009. bility distribution using Okapi BM25: this represented the [5] A. F. Huertas-Rosero, L. A. Azzopardi, and C. J. van PRP ranking. Afterwards we re-ranked the documents ac- Rijsbergen. Eraser lattices and semantic contents. In QI cording to three different strategies: our QPRP method 2009, pages 266–275, 2009. and two state-of-the-art techniques for subtopic retrieval, [6] M. Melucci. A basis for information retrieval in context. i.e. MMR and PT, which required parameters tuning. The ACM TOIS, 26(3):1–41, June 2008. experiments were repeated varying the level of retrieval cut- [7] B. Piwowarski and M. Lalmas. A quantum-based model for off and the length of the queries. interactive information retrieval. In ICTIR ’09, pages 224–231, 2009. From the experimental results3 , we found that (1) the [8] B. Piwowarski and M. Lalmas. Structured information QPRP improves upon PRP baselines for all levels of S- retrieval and quantum theory. In QI ’09, pages 289–298, precision and S-recall, (2) the QPRP outperforms MMR and March 2009. PT across most levels, (3) the QPRP consistently outper- [9] C. J. van Rijsbergen. The Geometry of Information forms other strategies across all topics when considering S- Retrieval. Cambridge University Press, 2004. MRR@100%, meaning that on each topic the QPRP returns [10] J. Wang and J. Zhu. Portfolio theory of information complete coverage of all subtopics at a rank lower than all retrieval. In SIGIR ’09, pages 115–122, 2009. the other strategies. And, unlike MMR and PT, no tuning [11] G. Zuccon, L. Azzopardi, and C. J. van Rijsbergen. or training is required! Revisiting logical imaging for information retrieval. In SIGIR ’09, pages 766–767, 2009. [12] G. Zuccon and L. A. Azzopardi. Using the Quantum 4. CONCLUSIONS Probability Ranking Principle to Rank Interdependent In this paper we have reported about the recent introduc- Documents. In ECIR 2010, 2010. to appear. [13] G. Zuccon, L. A. Azzopardi, and C. J. van Rijsbergen. A tion of a novel ranking strategy, the QPRP, based on quan- formalization of logical imaging for information retrieval tum probability and inspired by an analogy with the double using quantum theory. In DEXA ’08, pages 3–8, 2008. slits experiment in physics. The QPRP naturally encodes [14] G. Zuccon, L. A. Azzopardi, and K. van Rijsbergen. The 3 quantum probability ranking principle for information Experimental results are available online at http://www. retrieval. In ICTIR ’09, pages 232–240, 2009. dcs.gla.ac.uk/~guido/qprpresults.html