Cognitive Temporal Document Priors (Abstract)*

Maria-Hendrike Peetz and Maarten de Rijke
ISLA, University of Amsterdam
{M.H.Peetz, derijke}@uva.nl

* The full version of this paper appeared in ECIR 2013 [5].

1. INTRODUCTION

Every moment of our life we retrieve information from our brain: we remember. We remember items to a certain degree: for a mentally healthy human being, retrieving very recent memories is virtually effortless, while retrieving non-traumatic memories from the past is more difficult [4]. Early research in psychology was interested in the rate at which people forget single items, such as numbers. Psychology researchers have also studied how people retrieve events. Chessa and Murre [1] record events and hits of web pages related to an event and fit models of how people remember, the so-called retention function. Modeling the retention of memory has a long history in psychology, resulting in a range of proposed retention functions.

In information retrieval (IR), the relevance of a document depends on many factors. If we request recent documents, then how much we remember is bound to have an influence on the relevance of documents. Can we use the psychologists' models of the retention of memory as (temporal) document priors? Previous work in temporal IR has incorporated priors based on the exponential function into the ranking function [2, 3]; this happens to be one of the earliest functions used to model the retention of memory. Psychologists have considered many other functions to model the retention of memory. What about the potential of those other retention functions as temporal document priors?

Inspired by the cognitive psychology literature on human memory and on retention functions in particular, we consider seven temporal document priors. We propose a framework for assessing them, building on four key notions: performance, parameter sensitivity, efficiency, and cognitive plausibility, and then use this framework to assess those seven document priors. We show that on several data sets (newspaper and microblog), with different retrieval models, the exponential function as a document prior should not be the first choice. Overall, other functions, like the Weibull function, score better within our proposed framework.

2. METHODS

We introduce basic notation and then describe several retention functions serving as temporal document priors.

We say that document $D$ in document collection $\mathcal{D}$ has time $time(D)$ and text $text(D)$. A query $q$ has time $time(q)$ and text $text(q)$. We write $\delta_g(q, D)$ for the time difference between $time(q)$ and $time(D)$ with granularity $g$.

We introduce a series of retention functions. The memory chain models ((1) and (2)) build on the assumption that there are different memories. The Weibull functions ((3) and (4)) are of interest to psychologists because they fit human retention behavior well. In contrast, the linear and hyperbolic retention functions ((6) and (7)) have little cognitive background.

Memory Chain Model. The memory chain model [1] assumes a multi-store system of different levels of memory. With $\mu$ the probability of storing an item in one memory,

$$f_{\text{MCM-1}}(D, q, g) = \mu e^{-a \delta_g(q, D)}. \tag{1}$$

The parameter $a$ indicates how items are being forgotten. The function $f_{\text{MCM-1}}(D, q, g)$ is equivalent to the exponential decay in [2] when the two parameters ($\mu$ and $a$) are equal. In the two-store system, an item is first remembered in short term memory with a strong memory decay, and later copied to long term memory. Each memory has a different decay parameter, so the item decays in both memories, at different rates. The overall retention function is

$$f_{\text{MCM-2}}(D, q, g) = 1 - e^{-\mu_1 \left( e^{-a_1 \delta_g(q, D)} + \frac{\mu_2}{a_2 - a_1} \left( e^{-a_2 \delta_g(q, D)} - e^{-a_1 \delta_g(q, D)} \right) \right)}, \tag{2}$$

where an overall exponential memory decay is assumed. The parameters $\mu_1$ and $\mu_2$ are the likelihoods that the items are initially saved in short and long term memory, whereas $a_1$ and $a_2$ indicate the forgetting of the items. Again, $t$ is the time bin.

One can also consider the Weibull function

$$f_{\text{BW}}(D, q, g) = e^{-\left( \frac{a \delta_g(q, D)^d}{d} \right)}, \tag{3}$$

and its extension

$$f_{\text{EW}}(D, q, g) = b + (1 - b) \mu e^{-\left( \frac{a \delta_g(q, D)^d}{d} \right)}. \tag{4}$$

Here, $a$ and $d$ indicate how long the item is being remembered: $a$ indicates the overall volume of what can potentially be remembered, $d$ determines the steepness of the forgetting function; $\mu$ determines the likelihood of initially storing an item, and $b$ denotes an asymptote parameter.

The power function is ill-behaved between 0 and 1, and the usual approximations start at 1. The amended power function is

$$f_{\text{AP}}(D, q, g) = b + (1 - b) \mu (\delta_g(q, D) + 1)^a, \tag{5}$$

where $a$, $b$, and $\mu$ are the decay, an asymptote, and the initial learning performance.

A very intuitive baseline is given by the linear function,

$$f_{\text{L}}(D, q, g) = \frac{-(a \cdot \delta_g(q, D) + b)}{b}, \tag{6}$$

where $a$ is the gradient and $b$ is $\delta_g(q, \operatorname{argmax}_{D' \in \mathcal{D}} \delta_g(q, D'))$. Its range is between 0 and 1 for all documents in $\mathcal{D}$.

The hyperbolic discounting function has been used to model how humans value rewards: the later the reward, the less they consider the reward worth. Here,

$$f_{\text{HD}}(D, q, g) = \frac{1}{-(1 + k * \delta_g(q, D))}, \tag{7}$$

where $k$ is the discounting factor.

3. EXPERIMENTS

We propose a set of four criteria for assessing temporal document priors and we determine whether the priors meet the criteria.

A framework for assessing temporal document priors.

Performance. A document prior should improve the performance on a set of test queries for a collection of time-aware documents. A well-performing document prior improves on the standard evaluation measures across different collections and across different query sets. We use the number of improved queries as well as the stability of effectiveness with respect to different evaluation measures as an assessment for performance, where stability refers to improved or non-decreasing performance over several test collections.

Sensitivity of parameters. A well-performing document prior is not overly sensitive with respect to parameter selection: the best parameter values for a prior are in a region of the parameter space and not a single value.

Efficiency. Query runtime efficiency is of little importance when it comes to distinguishing between document priors: if the parameters are known, all document priors boil down to simple look-ups. We use the number of parameters as a way of assessing the efficiency of a prior.

Cognitive plausibility. We define the cognitive plausibility of a document prior (derived from a retention function) by the goodness of fit in large scale human experiments [4]. This conveys an experimental, but objective, view on cognitive plausibility. We also use a more subjective definition of plausibility in terms of neurobiological background and how far the retention function has a biological explanation.

Discussion. To ensure comparability with previous work, we use different models for different datasets: TREC-2 and TREC-{6,7,8} for news and Tweets2011 for social media. On the news data set, we analyse the effect of different temporal priors on the performance of the baseline, query likelihood with Dirichlet smoothing [2]. We optimize parameters for different priors on TREC-6 using grid search. On the Tweets2011 data set, we analyse the effect of different temporal priors incorporated in the query modeling [3].

Table 1 gives an overview of the assessment of different document priors. We find that BW, AP, and L are stable in the parameter optimisation. Of those functions, BW and L have only few parameters, and BW performs best.

Table 1: Assessing temporal document priors; # improved queries is w.r.t. MCM-1.

| Condition | MCM-1 | MCM-2 | BW | EW | AP | L | HD |
|---|---|---|---|---|---|---|---|
| # impr. queries (temp.) | n/a | 14 (58%) | 5 (20%) | 16 (67%) | 5 (20%) | 2 (8%) | 6 (25%) |
| # impr. queries (non-temp.) | n/a | 27 (35%) | 35 (46%) | 26 (34%) | 38 (50%) | 36 (47%) | 33 (43%) |
| # impr. queries (Tweets2011) | n/a | 16 (32%) | 17 (34%) | 22 (44%) | 0 (0%) | 17 (34%) | 21 (42%) |
| MAP | + | – | + | 0 | 0 | – | 0 |
| P10 | – | – | 0 | – | 0 | 0 | 0 |
| Rprec | 0 | ± | + | ± | 0 | 0 | 0 |
| MRR | 0 | 0 | + | 0 | + | + | + |
| Sensitivity of parameters | – | – | + | – | + | + | + |
| Efficiency: # parameters | 2 | 4 | 2 | 4 | 3 | 2 | 1 |
| Plausibility: fits human behav. | + | ++ | + | ++ | + | n/a | n/a |
| Plausibility: neurobiol. expl. | + | + | – | + | – | – | – |

4. CONCLUSION

We have proposed a new perspective on the functions used as temporal document priors for retrieving recent documents. We showed how functions with a cognitive motivation yield similar, if not significantly better results than others on news and microblog datasets. In particular, the Weibull function is stable, easy to optimize, and motivated by psychological experiments.

Acknowledgments. We thank Jessika Reissland for her inspiration. This research was partially supported by the European Union's ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme, CIP ICT-PSP under grant agreement nr 250430, the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreements nr 258191 (PROMISE Network of Excellence) and 288024 (LiMoSINe project), the Netherlands Organisation for Scientific Research (NWO) under project nrs 612.061.814, 612.061.815, 640.004.802, 727.011.005, 612.001.116, HOR-11-10, the Center for Creation, Content and Technology (CCCT), the BILAND project funded by the CLARIN-nl program, the Dutch national program COMMIT, by the ESF Research Network Program ELIAS, and the Elite Network Shifts project funded by the Royal Dutch Academy of Sciences.

References

[1] A. G. Chessa and J. M. Murre. A memory model for internet hits after media exposure. Physica A Statistical Mechanics and its Applications, 2004.
[2] X. Li and W. B. Croft. Time-Based Language Models. In CIKM '03, 2003.
[3] K. Massoudi, E. Tsagkias, M. de Rijke, and W. Weerkamp. Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts. In ECIR 2011, 2011.
[4] M. Meeter, J. M. J. Murre, and S. M. J. Janssen. Remembering the news: modeling retention data from a study with 14,000 participants. Memory & Cognition, 33:793–810, 2005.
[5] M.-H. Peetz and M. de Rijke. Cognitive temporal document priors. In 34th European Conference on Information Retrieval (ECIR'13), 2013.
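
Illustration. The retention functions in equations (1) to (5) of Section 2 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, `delta` stands for the time difference $\delta_g(q, D)$ already expressed in units of the granularity $g$, the Weibull exponent is transcribed as $a \delta^d / d$, and the parameter values in the usage lines are arbitrary rather than fitted.

```python
import math


def mcm1(delta, mu, a):
    """Eq. (1): one-store memory chain model, mu * exp(-a * delta)."""
    return mu * math.exp(-a * delta)


def mcm2(delta, mu1, a1, mu2, a2):
    """Eq. (2): two-store memory chain model (assumes a1 != a2)."""
    short_term = math.exp(-a1 * delta)
    long_term = mu2 / (a2 - a1) * (math.exp(-a2 * delta) - math.exp(-a1 * delta))
    return 1.0 - math.exp(-mu1 * (short_term + long_term))


def basic_weibull(delta, a, d):
    """Eq. (3): basic Weibull function, exponent transcribed as a * delta**d / d."""
    return math.exp(-(a * delta ** d) / d)


def extended_weibull(delta, a, d, mu, b):
    """Eq. (4): Weibull with initial storage likelihood mu and asymptote b."""
    return b + (1.0 - b) * mu * basic_weibull(delta, a, d)


def amended_power(delta, a, b, mu):
    """Eq. (5): amended power function; a < 0 is needed for the value to decay."""
    return b + (1.0 - b) * mu * (delta + 1.0) ** a


# Illustrative (not fitted) parameters: prior values for documents of
# increasing age relative to the query time.
if __name__ == "__main__":
    for delta in (0, 1, 10, 100):
        print(delta,
              round(mcm1(delta, mu=0.5, a=0.1), 4),
              round(basic_weibull(delta, a=0.1, d=0.5), 4),
              round(amended_power(delta, a=-0.5, b=0.1, mu=0.8), 4))
```

Once the parameters are fixed, each prior depends only on the time difference, so its values can be precomputed per time bin, which is why Section 3 treats query-time cost as a simple look-up.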