Cognitive Temporal Document Priors (Abstract)*


                                          Maria-Hendrike Peetz and Maarten de Rijke
                                                        ISLA, University of Amsterdam
                                                      {M.H.Peetz, derijke}@uva.nl


1. INTRODUCTION

Every moment of our life we retrieve information from our brain: we remember. We remember items to a certain degree: for a mentally healthy human being, retrieving very recent memories is virtually effortless, while retrieving non-traumatic memories from the past is more difficult [4]. Early research in psychology was interested in the rate at which people forget single items, such as numbers. Psychology researchers have also studied how people retrieve events. Chessa and Murre [1] record events and hits of web pages related to an event and fit models of how people remember, the so-called retention function. Modeling the retention of memory has a long history in psychology, resulting in a range of proposed retention functions. In information retrieval (IR), the relevance of a document depends on many factors. If we request recent documents, then how much we remember is bound to have an influence on the relevance of documents. Can we use the psychologists' models of the retention of memory as (temporal) document priors? Previous work in temporal IR has incorporated priors based on the exponential function into the ranking function [2, 3]; this happens to be one of the earliest functions used to model the retention of memory. Many other such functions have been considered by psychologists to model the retention of memory. What about the potential of other retention functions as temporal document priors?

Inspired by the cognitive psychology literature on human memory and on retention functions in particular, we consider seven temporal document priors. We propose a framework for assessing them, building on four key notions (performance, parameter sensitivity, efficiency, and cognitive plausibility), and then use this framework to assess those seven document priors. We show that on several data sets (newspaper and microblog), with different retrieval models, the exponential function as a document prior should not be the first choice. Overall, other functions, like the Weibull function, score better within our proposed framework.

2. METHODS

We introduce basic notation and then describe several retention functions serving as temporal document priors.

We say that document $D$ in document collection $\mathcal{D}$ has time $\mathrm{time}(D)$ and text $\mathrm{text}(D)$. A query $q$ has time $\mathrm{time}(q)$ and text $\mathrm{text}(q)$. We write $\delta_g(q, D)$ for the time difference between $\mathrm{time}(q)$ and $\mathrm{time}(D)$ with the granularity $g$.

We introduce a series of retention functions. The memory chain models ((1) and (2)) build on the assumption that there are different memories. The Weibull functions ((3) and (4)) are of interest to psychologists because they fit human retention behavior well. In contrast, the linear and hyperbolic retention functions ((6) and (7)) have little cognitive background.

Memory Chain Model. The memory chain model [1] assumes a multi-store system of different levels of memory. With $\mu$ the probability of storing an item in one memory, the retention function is

\[ f_{\text{MCM-1}}(D, q, g) = \mu \, e^{-a \, \delta_g(q, D)}. \tag{1} \]

The parameter $a$ indicates how items are being forgotten. The function $f_{\text{MCM-1}}(D, q, g)$ is equivalent to the exponential decay in [2] when the two parameters ($\mu$ and $a$) are equal. In the two-store system, an item is first remembered in short term memory with a strong memory decay, and later copied to long term memory. Each memory has a different decay parameter, so the item decays in both memories, at different rates. The overall retention function is

\[ f_{\text{MCM-2}}(D, q, g) = 1 - \exp\left( -\mu_1 \left( e^{-a_1 \delta_g(q, D)} + \frac{\mu_2}{a_2 - a_1} \left( e^{-a_2 \delta_g(q, D)} - e^{-a_1 \delta_g(q, D)} \right) \right) \right), \tag{2} \]

where an overall exponential memory decay is assumed. The parameters $\mu_1$ and $\mu_2$ are the likelihoods that the items are initially saved in short and long term memory, whereas $a_1$ and $a_2$ indicate the forgetting of the items. Again, $t$ is the time bin.

One can also consider the Weibull function

\[ f_{\text{BW}}(D, q, g) = \exp\left( -\frac{a \, \delta_g(q, D)^d}{d} \right), \tag{3} \]

and its extension

\[ f_{\text{EW}}(D, q, g) = b + (1 - b) \, \mu \, \exp\left( -\frac{a \, \delta_g(q, D)^d}{d} \right). \tag{4} \]

Here, $a$ and $d$ indicate how long the item is being remembered: $a$ indicates the overall volume of what can potentially be remembered, and $d$ determines the steepness of the forgetting function; $\mu$ determines the likelihood of initially storing an item, and $b$ denotes an asymptote parameter.

The power function is ill-behaved between 0 and 1, and usual approximations start at 1. The amended power function is

\[ f_{\text{AP}}(D, q, g) = b + (1 - b) \, \mu \, (\delta_g(q, D) + 1)^a, \tag{5} \]

where $a$, $b$, and $\mu$ are the decay, an asymptote, and the initial learning performance.

A very intuitive baseline is given by the linear function,

\[ f_{\text{L}}(D, q, g) = \frac{-(a \cdot \delta_g(q, D) + b)}{b}, \tag{6} \]

where $a$ is the gradient and $b$ is $\delta_g(q, \operatorname{argmax}_{D' \in \mathcal{D}} \delta_g(q, D'))$. Its range is between 0 and 1 for all documents in $\mathcal{D}$.
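As a rough illustration, the retention functions in Equations (1) to (5) can be sketched in Python as below. The function names, the binning helper `time_difference` (which assumes numeric timestamps and floor-based binning), and the parameter values in the usage example are assumptions made for this sketch and are not taken from the paper. Equation (3) is read here as exp(-(a * delta^d) / d).

```python
import math


def time_difference(query_time, doc_time, granularity):
    """delta_g(q, D): document age relative to the query, in whole bins.

    Assumption: timestamps and granularity share one unit (e.g. seconds),
    ages are floored to full bins, and documents newer than the query get 0.
    """
    return max(0, query_time - doc_time) // granularity


def mcm1(delta, mu, a):
    """Eq. (1): one-store memory chain model (exponential decay)."""
    return mu * math.exp(-a * delta)


def mcm2(delta, mu1, a1, mu2, a2):
    """Eq. (2): two-store memory chain model; requires a1 != a2."""
    first = math.exp(-a1 * delta)
    second = mu2 / (a2 - a1) * (math.exp(-a2 * delta) - math.exp(-a1 * delta))
    return 1.0 - math.exp(-mu1 * (first + second))


def basic_weibull(delta, a, d):
    """Eq. (3): basic Weibull function; requires d > 0."""
    return math.exp(-(a * delta ** d) / d)


def extended_weibull(delta, a, d, mu, b):
    """Eq. (4): Weibull with initial storage mu and asymptote b."""
    return b + (1.0 - b) * mu * math.exp(-(a * delta ** d) / d)


def amended_power(delta, a, b, mu):
    """Eq. (5): power function shifted by 1 so that it is defined at delta = 0."""
    return b + (1.0 - b) * mu * (delta + 1) ** a


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of each curve.
    day = 24 * 60 * 60
    query_time = 100 * day
    for age_in_days in (0, 1, 7, 30):
        delta = time_difference(query_time, query_time - age_in_days * day, day)
        print(
            delta,
            round(mcm1(delta, mu=0.9, a=0.1), 4),
            round(mcm2(delta, mu1=0.9, a1=0.5, mu2=0.2, a2=0.01), 4),
            round(basic_weibull(delta, a=0.3, d=0.5), 4),
            round(extended_weibull(delta, a=0.3, d=0.5, mu=0.9, b=0.05), 4),
            round(amended_power(delta, a=-0.5, b=0.05, mu=0.9), 4),
        )
```

Each function takes the precomputed difference delta rather than the query and document themselves, so the same code can be reused with any granularity g. Since the priors depend only on delta, their values can be precomputed per time bin once the parameters are fixed.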
The hyperbolic discounting function has been used to model how humans value rewards: the later the reward, the less they consider the reward worth. Here,

\[ f_{\text{HD}}(D, q, g) = \frac{1}{-(1 + k \cdot \delta_g(q, D))}, \tag{7} \]

where $k$ is the discounting factor.

3. EXPERIMENTS

We propose a set of four criteria for assessing temporal document priors and we determine whether the priors meet the criteria.

A framework for assessing temporal document priors.

Performance. A document prior should improve the performance on a set of test queries for a collection of time-aware documents. A well-performing document prior improves on the standard evaluation measures across different collections and across different query sets. We use the number of improved queries as well as the stability of effectiveness with respect to different evaluation measures as an assessment for performance, where stability refers to improved or non-decreasing performance over several test collections.

Sensitivity of parameters. A well-performing document prior is not overly sensitive with respect to parameter selection: the best parameter values for a prior are in a region of the parameter space and not a single value.

Efficiency. Query runtime efficiency is of little importance when it comes to distinguishing between document priors: if the parameters are known, all document priors boil down to simple look-ups. We use the number of parameters as a way of assessing the efficiency of a prior.

Cognitive plausibility. We define the cognitive plausibility of a document prior (derived from a retention function) in terms of the goodness of fit in large-scale human experiments [4]. This conveys an experimental, but objective, view on cognitive plausibility. We also use a more subjective definition of plausibility in terms of neurobiological background and how far the retention function has a biological explanation.

Discussion. To ensure comparability with previous work, we use different models for different datasets: TREC-2 and TREC-{6,7,8} for news and Tweets2011 for social media. On the news data set, we analyse the effect of different temporal priors on the performance of the baseline, query likelihood with Dirichlet smoothing [2]. We optimize parameters for different priors on TREC-6 using grid search. On the Tweets2011 data set, we analyse the effect of different temporal priors incorporated in the query modeling [3].

Table 1 gives an overview of the assessment of different document priors. We find that all but BW, AP, and L are stable in the parameter optimisation. Of those functions, BW and L have only few parameters, and BW performs best.

Table 1: Assessing temporal document priors; # improved queries is w.r.t. MCM-1.

Condition                         MCM-1   MCM-2      BW         EW         AP         L          HD
# impr. queries (temp.)           n/a     14 (58%)   5 (20%)    16 (67%)   5 (20%)    2 (8%)     6 (25%)
# impr. queries (non-temp.)       n/a     27 (35%)   35 (46%)   26 (34%)   38 (50%)   36 (47%)   33 (43%)
# impr. queries (Tweets2011)      n/a     16 (32%)   17 (34%)   22 (44%)   0 (0%)     17 (34%)   21 (42%)
MAP                               +       –          +          0          0          –          0
P10                               –       –          0          –          0          0          0
Rprec                             0       ±          +          ±          0          0          0
MRR                               0       0          +          0          +          +          +
Sensitivity of parameters         –       –          +          –          +          +          +
Efficiency: # parameters          2       4          2          4          3          2          1
Plausibility: fits human behav.   +       ++         +          ++         +          n/a        n/a
Plausibility: neurobiol. expl.    +       +          –          +          –          –          –

4. CONCLUSION

We have proposed a new perspective on functions used as temporal document priors for retrieving recent documents. We showed how functions with a cognitive motivation yield similar, if not significantly better, results than others on news and microblog datasets. In particular, the Weibull function is stable, easy to optimize, and motivated by psychological experiments.

Acknowledgments. We thank Jessika Reissland for her inspiration. This research was partially supported by the European Union's ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme, CIP ICT-PSP under grant agreement nr 250430, the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreements nr 258191 (PROMISE Network of Excellence) and 288024 (LiMoSINe project), the Netherlands Organisation for Scientific Research (NWO) under project nrs 612.061.814, 612.061.815, 640.004.802, 727.011.005, 612.001.116, HOR-11-10, the Center for Creation, Content and Technology (CCCT), the BILAND project funded by the CLARIN-nl program, the Dutch national program COMMIT, by the ESF Research Network Program ELIAS, and the Elite Network Shifts project funded by the Royal Dutch Academy of Sciences.

References

[1] A. G. Chessa and J. M. Murre. A memory model for internet hits after media exposure. Physica A: Statistical Mechanics and its Applications, 2004.
[2] X. Li and W. B. Croft. Time-Based Language Models. In CIKM '03, 2003.
[3] K. Massoudi, E. Tsagkias, M. de Rijke, and W. Weerkamp. Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts. In ECIR 2011, 2011.
[4] M. Meeter, J. M. J. Murre, and S. M. J. Janssen. Remembering the news: modeling retention data from a study with 14,000 participants. Memory & Cognition, 33:793–810, 2005.
[5] M.-H. Peetz and M. de Rijke. Cognitive temporal document priors. In 34th European Conference on Information Retrieval (ECIR'13), 2013.

* The full version of this paper appeared in ECIR 2013 [5].