<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Decaying Utility of News Recommendation Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Kille</string-name>
          <email>benjamin.kille@tu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sahin Albayrak</string-name>
          <email>sahin.albayrak@dai-labor.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Berlin</institution>
          ,
          <addr-line>Ernst-Reuter-Platz 7, 10587 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technische Universität Berlin</institution>
          ,
          <addr-line>Ernst-Reuter-Platz 7, 10587 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>For how long will a recommendation model provide adequate recommendations? The answer to this question depends on the kind of model, its underlying data, and the domain among other factors. We analyse four types of models in the news domain on how their predictive performances change. Our observations show that replacing or updating models is necessary to maintain high predictive performance. The evaluation suggests that an exponential decay model describes the changing predictive performance accurately.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Data stream mining; Recommender
systems;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Content providers compete to attract and retain information
consumers in what can be described as an “attention economy”. Therein,
consumers trade their attention in exchange for information and
entertainment. Brynjolfsson and Oh (2012) stress the difficulty of
quantifying the value of such exchanges. Their estimate puts the
collective annual value of such exchanges in the United States at
100 billion dollars. Ciampaglia et al. (2015) emphasise the limited
attention span for newly published content. Publishers employ
recommender systems to provide consumers with better information
access
        <xref ref-type="bibr" rid="ref2 ref6">(Billsus and Pazzani 2007)</xref>
        . Recommender systems reduce
vast collections of items to manageable subsets. In dynamic
environments, they seek to maximise the number of interactions thus
connecting users and items. The rate at which interactions occur
is directly linked to business success. The more users engage with
the collection of items the more advertisements they encounter.
The more they enjoy the service, the less likely they are to quit
using it. As a result, successful recommender systems represent a
competitive advantage.
      </p>
      <p>Research on recommender systems has produced a myriad of
methods. These methods take data related to users, items, or the
interactions between them. Subsequently, they learn regularities and
create a model capturing the essential information. The models
include global rankings, sets of rules, and latent factor representations
among others.</p>
      <p>Consequently, businesses continuously contemplate which model
to use to generate recommendations. Ideally, they would choose
the model maximising users’ attention. However, determining the
utility of recommendation models has proven a difficult task. Shani
and Gunawardana (2010) point to a variety of properties linked to
the performance of recommender systems. These include accuracy,
novelty, and diversity. In other words, recommender systems ought
to provide relevant, new, and different items.</p>
      <p>Frequently, the interaction data is split into disjoint partitions.
One partition, the training set, is used to learn a model
describing the relation between users and items. The remaining partitions
can be used to (a) optimise parameters, and (b) assess the utility.
Cross-validation, a procedure wherein random partitions are
used in turn for training or testing, helps to limit the risk of
randomly selecting an unrepresentative sample.</p>
      <p>
        Still, using the described methodology, we merely obtain
information about what the best model would have been at some point
in time. We frame the problem from a slightly different perspective.
Suppose we have a set of recommendation models available.
Suppose further that we measure utility by the models’ ability to predict
with which items users will interact in the future. We focus on how
the utility of a set of recommendation models changes over time.
In particular, we posit the hypothesis that the utility change can
be modelled in the form of an exponential decay function. We use part
of the data set released for CLEF NewsREEL 2017 to conduct our
evaluation
        <xref ref-type="bibr" rid="ref12">(Lommatzsch et al. 2017)</xref>
        . The data set comprises logs
of various news publishers. News represents a particularly suitable
domain for our analysis. Publishers publish news articles at high
rates. Simultaneously, readers favour novel news. Consequently,
we expect models’ utility to change rapidly.
      </p>
      <p>This work entails two contributions. First, we formalise the
concept of decaying utility of recommender models in the news domain.
Second, we conduct experiments for four selected models.</p>
      <p>The remainder of this paper commences with Section 2
introducing the notion of decaying utility. Section 3 describes the
experimental design used to analyse the changes in utility over time.
Section 4 presents our observations. Section 5 notes limitations
and discusses our findings. Section 6 relates our work to previously
published results and ideas. Section 7 summarises our findings and
points to directions for future work.</p>
    </sec>
    <sec id="sec-3">
      <title>DECAYING UTILITY</title>
      <p>Recommender systems provide lists of suggestions upon request.
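The selection follows a set of rules represented in the form of a model.
Models are derived from previously recorded data. We define the
utility of such a model as its ability to correctly predict future
interactions between users and items. Formally, let
<inline-formula><tex-math>U = \{u_m\}_{m=1}^{M}</tex-math></inline-formula> and
<inline-formula><tex-math>I = \{i_n\}_{n=1}^{N}</tex-math></inline-formula>
refer to the sets of users and items. The recommender
system monitors interactions between users and items,
<inline-formula><tex-math>r = (u_m, i_n)</tex-math></inline-formula>.
Thereby, the system collects a set of interactions
<inline-formula><tex-math>R_\tau = \{r_\alpha\}_{\alpha=1}^{A}</tex-math></inline-formula>,
where interactions occurred in a closed time interval
<inline-formula><tex-math>\tau = [t_0, T]</tex-math></inline-formula>,
and interactions are chronologically ordered,
<inline-formula><tex-math>t_\alpha &lt; t_{\alpha+1}</tex-math></inline-formula>. A
recommendation model
<inline-formula><tex-math>M_{R_\tau}</tex-math></inline-formula>
is a function that takes an interaction
<inline-formula><tex-math>r_\alpha</tex-math></inline-formula> and
returns a list of suggestions
<inline-formula><tex-math>(i_k, i_{k+1}, \ldots, i_K)</tex-math></inline-formula>.
Let <inline-formula><tex-math>t = [t_0, T]</tex-math></inline-formula> with
<inline-formula><tex-math>t_0 &gt; \tau</tex-math></inline-formula>,
that is, the interval <inline-formula><tex-math>t</tex-math></inline-formula>
begins after <inline-formula><tex-math>\tau</tex-math></inline-formula> has ended.
The utility of <inline-formula><tex-math>M_{R_\tau}</tex-math></inline-formula>
with respect to <inline-formula><tex-math>t</tex-math></inline-formula>
refers to the number of interactions
<inline-formula><tex-math>r_\alpha \in R_t</tex-math></inline-formula>
where <inline-formula><tex-math>u_m</tex-math></inline-formula>
has previously been suggested reading
<inline-formula><tex-math>i_n</tex-math></inline-formula>.
We normalise the utility by dividing by the
number of requests. A request refers to each interaction occurring in
<inline-formula><tex-math>t</tex-math></inline-formula>.
Thereby, we obtain a utility measure which we refer to as response
rate. In practice, the response rate can be monitored by keeping
track of which items have been recommended by the model. We
hypothesise that the utility, or more concretely the response rate,
follows an exponential decay. Similar to radioactive decay, readers
perceive an article as particularly interesting close to its
publication. As time progresses, the news has spread and the article attracts
fewer readers. Exponential decay is characterised by the function
<disp-formula><tex-math>f(t) = U \cdot e^{V t},</tex-math></disp-formula>
wherein <inline-formula><tex-math>U</tex-math></inline-formula> and
<inline-formula><tex-math>V</tex-math></inline-formula> are the parameters
(here <inline-formula><tex-math>U</tex-math></inline-formula> denotes the initial
value of the function, not the set of users). The function
describes a decay if <inline-formula><tex-math>V &lt; 0</tex-math></inline-formula>.
Alternatively, the half-life
<disp-formula><tex-math>t_{1/2} = \frac{\ln 2}{-V}</tex-math></disp-formula>
describes the time it takes to arrive at half the initial quantity.</p>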
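      <p>The response rate defined above can be illustrated with a minimal Python sketch. It is not the evaluation code used in our experiments: the function name response_rate, the (user, item) tuple layout, and the recommend callback standing in for the model are assumptions made purely for illustration.</p>
      <preformat>
```python
from collections import defaultdict


def response_rate(interactions, recommend):
    """Fraction of requests whose item was previously suggested to the user.

    interactions: chronologically ordered (user, item) pairs observed in
                  the evaluation interval t (hypothetical layout).
    recommend:    callable mapping one (user, item) interaction to a list
                  of suggested items; it stands in for the trained model.
    """
    suggested = defaultdict(set)  # user -> items suggested so far
    hits = 0
    requests = 0
    for user, item in interactions:
        requests += 1
        # A response: the user reads an item the model suggested earlier.
        if item in suggested[user]:
            hits += 1
        # Every interaction is also a request that yields new suggestions.
        suggested[user].update(recommend((user, item)))
    return hits / requests if requests else 0.0


def always_b_and_c(interaction):
    """Toy model that ignores its input and always suggests b and c."""
    return ["b", "c"]


toy_log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u1", "c")]
print(response_rate(toy_log, always_b_and_c))
```
      </preformat>
      <p>In this toy log, three of the five requests concern an item that the same user had been suggested before, so the sketch reports a response rate of three fifths.</p>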
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENT</title>
      <p>We conducted experiments to measure the change of utility in terms
of response rates for a selection of models. We consider the four
publishers whose characteristics are shown in Table 1. The data
correspond to one week of the NewsREEL 2017 data set. We notice
that sessions include few articles. Publisher B observes merely 3.3
articles per session on average. This impedes using models which
rely heavily on sufficiently expressive user profiles, such as
collaborative filtering. For each publisher, we consider the period
1–9 February 2016. We learn four types of models, each with the
data of 1 February 2016. First, the random model takes all articles
and suggests a random subset. Second, the freshness model
suggests the articles in reverse chronological order of publication.
Third, the popularity model suggests articles proportional to how
frequently they had been read. Fourth, the sequence model uses the
frequency of reading sequences. In other words, given an article
<inline-formula><tex-math>i_n</tex-math></inline-formula>,
the model suggests another item proportional to the frequency with
which it had been read after
<inline-formula><tex-math>i_n</tex-math></inline-formula>.
We apply the models to all requests
in the period 2–8 February 2016. We determine whether readers
subsequently read any of the suggested articles. With this information,
we compute the average response rate for each hour. In addition,
we monitor newly added articles and derive the coverage of models.
The coverage is defined as the proportion of known articles that a
model can suggest. The coverage naturally shrinks as the publishers
release more and more articles unknown to the models. We repeat
this procedure, shifting the period by one day at a time. Thereby,
we can compare the differences in response rates for the same day
given different models.</p>
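      <p>To make the popularity and sequence models more concrete, the following Python sketch learns both from one day of reading sessions. It is a simplified illustration under stated assumptions: sessions are taken to be lists of article identifiers in reading order, all function names are hypothetical, and the sketch ranks candidates by frequency instead of sampling them proportionally to their frequency as described above.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def train_popularity(sessions):
    """Count how often each article was read in the training data."""
    counts = Counter()
    for session in sessions:
        counts.update(session)
    return counts


def train_sequence(sessions):
    """Count how often one article was read directly after another."""
    followers = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            followers[current][following] += 1
    return followers


def suggest_popular(counts, k):
    """Return the k most frequently read articles."""
    return [article for article, _ in counts.most_common(k)]


def suggest_next(followers, current, k):
    """Return the k articles most frequently read after `current`."""
    if current not in followers:
        return []  # the model has never seen this article
    return [article for article, _ in followers[current].most_common(k)]


# Toy training day; each inner list is one session in reading order.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["b", "d"]]
popularity = train_popularity(sessions)
sequence = train_sequence(sessions)
print(suggest_popular(popularity, 1))
print(suggest_next(sequence, "b", 2))
```
      </preformat>
      <p>An article published after the training day never appears in either table, which is why such models cannot suggest it until they are retrained.</p>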
    </sec>
    <sec id="sec-5">
      <title>EVALUATION</title>
      <p>We consider the change in response rate as an appropriate proxy
for the utility of a recommender model over time. Figure 1 shows
the change in response rates over time for all combinations of
publishers and models. The response rates are plotted on a logarithmic
scale. For all models and publishers, we observe a decreasing trend
in response rates. The sequence model exhibits the highest response
rate for publishers A, B, and C. The popularity model exhibits the
highest response rate for publisher D. The random model performs
worst in the initial phase and mostly stagnates at this level. The
popularity model overtakes the sequence model over time. This
implies that businesses need to monitor performance carefully.</p>
      <p>Figure 2 shows the relation between response rates and coverage
for publisher A. We observe that as coverage decreases all
models lose predictive accuracy. The effect is most apparent for the
freshness model.</p>
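      <p>The shrinking coverage can be sketched in a few lines of Python. The sketch assumes that coverage is the share of all articles observed so far that were already part of the model's training data; the function names and the hourly batches of new articles are hypothetical.</p>
      <preformat>
```python
def coverage(model_articles, observed_articles):
    """Share of the articles observed so far that the model knows."""
    observed = set(observed_articles)
    if not observed:
        return 1.0  # assumption: nothing observed means nothing is missed
    return len(observed.intersection(model_articles)) / len(observed)


def coverage_over_time(model_articles, hourly_new_articles):
    """Coverage after each hour as the publisher adds articles."""
    known = set(model_articles)
    observed = set(known)
    series = []
    for new_articles in hourly_new_articles:
        observed.update(new_articles)
        series.append(coverage(known, observed))
    return series


# The model knows articles a and b; later hours bring unseen articles.
print(coverage_over_time({"a", "b"}, [[], ["c"], ["d", "e"], ["a"]]))
```
      </preformat>
      <p>Under this definition the coverage can only stay constant or decrease until the model is retrained.</p>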
      <p>We analyse how much we could gain by retraining the models
on a daily schedule. We focus on the sequence model. Figure 3
contrasts the response rate with the number of requests and coverage.
The top part of each subfigure shows the number of requests. At
times with fewer requests, response rates are based on a smaller
set of interactions. We observe this phenomenon particularly at
night time. The bottom part of each subfigure shows the
coverage. The retrained models are shown in varying colours. Similarly,
the centre part shows the response rates in corresponding colour
schemes. Initially, models have a relatively high predictive quality.
The predictive performance subsequently decreases and stabilises
on a noticeably lower level compared to the initial performance.
We observe a noticeable difference in predictive performance between
the retrained models and their predecessors. This effect appears
closely linked to the coverage, which shows a similar trend. The
observations are consistent across all four publishers and affirm the
expectation of an exponential decay phenomenon. Publisher B
attracts fewer visitors and exhibits higher variance compared to the
other publishers. Retraining models appears particularly beneficial
to publisher D, for which the decline in predictive performance
quickly renders models useless.</p>
      <p>Figure 4 illustrates the loss in predictive performance incurred
when using the initial model as opposed to learning a new model on
the second day. We observe that the loss is highest on the first day
for all publishers. The differences in utility level off over time. For
publisher B, we observe that the older model occasionally performs
better than the new model.</p>
      <p>We have fitted an exponential function to our results using the
least squares method. Table 2 reports the exponential fits to the
response rates for combinations of publishers and models. We observe
that the initial response rates (U) vary considerably. The random
model has particularly low initial response rates. Conversely, the
sequence model scores highest with respect to initial response rates.
All fits exhibit decay, V &lt; 0, with the exception of the random
model for publisher B. Recall that publisher B observed fewer
interactions than the other publishers. This could cause higher levels of
variance.</p>
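      <p>One way to obtain such a fit is sketched below in Python. The sketch makes an assumption that the text does not confirm: it performs ordinary least squares on the logarithm of the response rates, which turns the exponential function with parameters U and V into a straight line and requires strictly positive rates. The function names are hypothetical and the hourly rates are synthetic, generated only to show that the parameters are recovered.</p>
      <preformat>
```python
import math


def fit_exponential(times, rates):
    """Fit f(t) = U * exp(V * t) by least squares on ln(rate).

    Assumption: fitting happens in log space, so every rate must be
    strictly positive. Returns the pair (U, V).
    """
    logs = [math.log(rate) for rate in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    v = sxy / sxx                        # slope of the fitted line
    u = math.exp(mean_y - v * mean_t)    # intercept mapped back from logs
    return u, v


def half_life(v):
    """Time until the fitted curve halves; only defined for a decay."""
    if v >= 0:
        raise ValueError("half-life requires a negative V")
    return math.log(2) / -v


# Synthetic hourly response rates generated with U = 0.2 and V = -0.05.
hours = list(range(24))
rates = [0.2 * math.exp(-0.05 * hour) for hour in hours]
u, v = fit_exponential(hours, rates)
print(round(u, 4), round(v, 4), round(half_life(v), 2))
```
      </preformat>
      <p>A fit in log space weights relative rather than absolute deviations, so its parameters may differ from those of a non-linear least squares fit on the raw response rates.</p>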
    </sec>
    <sec id="sec-6">
      <title>DISCUSSION AND LIMITATIONS</title>
      <p>The evaluation indicates that exponential decay models represent
a suitable first attempt to mathematically describe how the utility of
recommendation models changes over time. The parameters vary
among publishers and models. Still, Figure 3 shows similar trends
for the sequence models across all publishers, regardless of which day
we picked. The coverage appears highly related to the decaying
response rates. Figure 2 and Figure 3 illustrate this relation. As time
passes, publishers add new articles to their collections. Unless we
update the models used to provide recommendations, they cover a
smaller proportion of articles. The distribution of requests over the
course of the day affects the response rates. Figure 3 illustrates the
differences in requests for all four publishers. We observe a periodic
pattern with more requests during the day and fewer requests at
night. In addition, we observe that as the coverage reaches 50 %
the response rates level off for the sequence models and all four
publishers. Figure 4 shows that switching to a retrained model is
most beneficial on the first day. This suggests that publishers should
replace or update their models at least once a day. Additional
experimentation is necessary to analyse how the choice of data used
to create the model affects its utility. We have kept the training
data set to the length of one day in our experiments. Using more
data and/or different types of models represents a direction for
further exploration. Our experiments used recorded data and inferred
the utility rather than observing actual interactions resulting from
recommendations generated by our models. Joachims et al. (2017)
discuss how counterfactual reasoning facilitates using logged
information more effectively. Unfortunately, we lack the required
information on internal parameters of the recommender systems
to apply their method. Our experiments are based on part of the
NewsREEL data set. In order to verify our findings, we have to
conduct experiments with the feedback of actual readers. This will
show whether the selection of publishers or the time period may
have biased the findings.</p>
      <p>[Figure panels: response rate (RR, logarithmic scale), number of requests (Req.), RRS, and coverage (Cov.).]</p>
    </sec>
    <sec id="sec-7">
      <title>RELATED WORK</title>
      <p>The decreasing predictive performance of models has been
discussed by Jambor et al. (2012) for the domain of movies. They
employed methods from Control Theory to devise an optimised
updating strategy. Movies exhibit different characteristics than news.
In particular, people tend to revisit movies much more frequently
than news, which impedes comparisons to our work. Koren (2009)
focused on collaborative filtering. He introduced a latent factor
model which captures the temporal development of preferences.
Thereby, he could more accurately predict how users rate movies.
Collaborative filtering requires expressive user profiles with
sufficiently clearly stated preferences. News consumption happens
anonymously, which prevents creating such profiles. As Table 1
illustrates, publishers generally get to know readers’ preferences for few
articles. News recommender systems have to work in conditions
in which little information is available about user preferences.
Baltrunas and Amatriain (2009) extended time-aware collaborative
filtering to implicit feedback. Implicit feedback can be derived from
log files such as the ones used in our experiment. Still, they apply
their method to movies, which again exhibit characteristics
different from news. Campos et al. (2014) discussed time-aware evaluation
protocols. They introduce a scheme to categorise evaluation
protocols focussing on rating prediction. Their scheme assigns our work
to the time-dependent cross-validation category. Much of the work
on time-aware evaluation of recommender systems has focused on
movies and rating prediction.</p>
      <p>[Figure: loss of predictive performance, <inline-formula><tex-math>c_{t-1} - c_t</tex-math></inline-formula>, over time in hours; one panel per publisher A, B, C, and D.]</p>
      <p>
        Das et al. (2007) present the news personalisation system used
for Google’s news aggregator. Their system employs covisitation
counts similar to our sequence model. In addition, they use
probabilistic latent semantic indexing and MinHash clustering to improve
their response rates. The news aggregator has access to much more
comprehensive user profiles for the subset of users reading news
while logged in with their Google accounts.
        <xref ref-type="bibr" rid="ref11">Li et al. (2010)</xref>
        represent
news recommendation as a contextual-bandit problem. Therein, the
system has a set of choices modelled figuratively as the arms of a bandit
machine found in casinos. The system learns how to choose depending on
the context. Garcin et al. (2013) introduce the notion of context
trees to news recommendation. Context trees capture the
particularities of a situation and use them to select a better set of articles to
recommend.
      </p>
      <p>
        We have introduced the notion of utility decay for news
recommender systems. The utility decay refers to a model’s
decreasing ability to correctly anticipate future interactions between users
and items. Experiments with data from four publishers have
confirmed that exponential decay functions can be used to describe the
changes of response rates over time. We observed a similar pattern
for the coverage, the proportion of articles a model can potentially
suggest. We conjecture that there is a strong relation between the
two quantities. The relation depends on factors including the
publisher and the type of model. Further evaluation is necessary to
improve the understanding of utility decay in news
recommendation. First, we will consider varying the time span used to learn a
model. This will show whether reducing or increasing the amount
of data describes the changes of response rates more accurately.
Second, we will consider additional types of models. With little
information concerning users, we plan to evaluate an item-based
latent factor model. We intend to participate in the next edition of
NewsREEL to verify our findings with the feedback of actual news
readers. Finally, we will evaluate additional time periods to verify
that the observed pattern is not due to choosing a particular time.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Linas</given-names>
            <surname>Baltrunas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Amatriain</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Towards Time-dependant Recommendation based on Implicit Feedback</article-title>
          .
          <source>Workshop on Context-aware Recommender Systems</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Billsus</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael J</given-names>
            <surname>Pazzani</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Adaptive News Access</article-title>
          .
          <source>The Adaptive Web</source>
          (
          <year>2007</year>
          ),
          <fpage>550</fpage>
          -
          <lpage>570</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Erik</given-names>
            <surname>Brynjolfsson</surname>
          </string-name>
          and
          <string-name>
            <given-names>JooHee</given-names>
            <surname>Oh</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>The Attention Economy - Measuring the Value of Free Digital Services on the Internet</article-title>
          .
          <source>ICIS</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Pedro G</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Díez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Iván</given-names>
            <surname>Cantador</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Time-aware Recommender Systems: a Comprehensive Survey and Analysis of Existing Evaluation Protocols</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>24</volume>
          ,
          <issue>1-2</issue>
          (
          <year>2014</year>
          ),
          <fpage>67</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni Luca</given-names>
            <surname>Ciampaglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Flammini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Filippo</given-names>
            <surname>Menczer</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The production of information in the attention economy</article-title>
          .
          <source>Scientific Reports</source>
          <volume>5</volume>
          , 1 (May
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Abhinandan</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mayur</given-names>
            <surname>Datar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ashutosh</given-names>
            <surname>Garg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shyamsundar</given-names>
            <surname>Rajaram</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Google news personalization - scalable online collaborative filtering</article-title>
          . In
          <source>WWW</source>
          . ACM, New York, New York, USA,
          <fpage>271</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Florent</given-names>
            <surname>Garcin</surname>
          </string-name>
          , Christos Dimitrakakis, and
          <string-name>
            <given-names>Boi</given-names>
            <surname>Faltings</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Personalized news recommendation with context trees</article-title>
          . In
          <source>RecSys</source>
          . ACM Press, New York, New York, USA,
          <fpage>105</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Tamas</given-names>
            <surname>Jambor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jun</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Neal</given-names>
            <surname>Lathia</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Using Control Theory for Stable and Efficient Recommender Systems</article-title>
          . In
          <source>WWW</source>
          . ACM, New York, New York, USA,
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          , Adith Swaminathan, and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Schnabel</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Unbiased Learning-to-Rank with Biased Feedback</article-title>
          .
          <source>In the Tenth ACM International Conference</source>
          . ACM Press, New York, New York, USA,
          <fpage>781</fpage>
          -
          <lpage>789</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Collaborative Filtering with Temporal Dynamics</article-title>
          .
          <source>KDD</source>
          (
          <year>2009</year>
          ),
          <fpage>447</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Lihong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>John</given-names>
            <surname>Langford</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Robert E</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A contextual-bandit approach to personalized news article recommendation</article-title>
          . In the 19th international conference. ACM Press, New York, New York, USA,
          <fpage>661</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          , Benjamin Kille, Frank Hopfgartner, Martha Larson, Torben Brodt, Jonas Seiler, and
          <string-name>
            <given-names>Özlem</given-names>
            <surname>Özgöbek</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>CLEF 2017 NewsREEL Overview: A Stream-based Evaluation Task for Evaluation and Education</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Guy</given-names>
            <surname>Shani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Asela</given-names>
            <surname>Gunawardana</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Evaluating Recommendation Systems</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>