<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF 2017 NewsREEL Overview: Offline and Online Evaluation of Stream-based News Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Kille</string-name>
          <email>benjamin.kille@dai-labor.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Lommatzsch</string-name>
          <email>andreas.lommatzsch@dai-labor.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Hopfgartner</string-name>
          <email>frank.hopfgartner@glasgow.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martha Larson</string-name>
          <email>m.a.larson@tudelft.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Torben Brodt</string-name>
          <email>torben.brodt@plista.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Plista GmbH</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Radboud University</institution>
          ,
          <addr-line>Nijmegen, and TU Delft, Delft</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Berlin</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Glasgow</institution>
          ,
          <addr-line>Glasgow</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The CLEF NewsREEL challenge allows researchers to evaluate news recommendation algorithms both online (NewsREEL Live) and offline (NewsREEL Replay). Compared with the previous year, NewsREEL challenged participants with a higher volume of messages and new news portals. In the 2017 edition of the CLEF NewsREEL challenge, participants implemented a wide variety of new approaches, ranging from the use of existing machine learning frameworks to ensemble methods and deep neural networks. This paper gives an overview of the implemented approaches and discusses the evaluation results. In addition, the main results of the Living Lab and the Replay task are explained.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>news</kwd>
        <kwd>multi-dimensional evaluation</kwd>
        <kwd>living lab</kwd>
        <kwd>stream-based recommender</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The development of recommender services based on stream data is a challenging task.
Systems optimized for handling streams must deliver highly precise recommendations
while taking into account the continuous changes in the stream as well as changes in
user preferences. Beyond the technical complexity of the algorithms, developers must
ensure the seamless integration of recommendations into existing applications
as well as the scalability of the system.</p>
      <p>
        Researchers in academia often focus on developing algorithms that are tested only
on static data sets because they lack access to live data. CLEF NewsREEL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
provides the opportunity to evaluate algorithms both on live data (NewsREEL
Live task) and on offline simulated streams (NewsREEL Replay task). The benchmarking
of the algorithms considers both recommendation precision (measured by the
click-through rate) and technical aspects (measured by reliability and response time). The
Replay task gives new participants and students easy access to the NewsREEL
challenge because the task can be run on standalone hardware without online
access and without the need to meet specific time constraints. In addition, the Replay task
simplifies the debugging and the simulation of streams. Algorithms shown to work
offline can then be evaluated in the NewsREEL Live task without any changes.
      </p>
      <p>In the 2017 edition of CLEF NewsREEL, participants implemented and
evaluated a wide spectrum of algorithms. Most teams participated in both the online and the
offline evaluation. In this paper, we give an overview of the implemented approaches
and discuss the evaluation results. The paper is structured as follows. In Section 2,
we briefly outline the recommendation scenario that is addressed by NewsREEL. In
Section 3, we provide an overview of the teams that registered to participate. Results are
presented and summarized in Sections 4 and 5 and discussed in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Scenario and Lab Setup</title>
      <p>In 2017, NewsREEL continued the quest to bridge the worlds of data-driven offline
evaluation and user-centric live experience. NewsREEL offered two tasks:
NewsREEL Live and NewsREEL Replay. We describe both tasks and conclude the section
with a discussion of meta-challenges for participants.</p>
      <sec id="sec-2-1">
        <title>NewsREEL Live: Benchmarking News Recommendations in a Living Lab</title>
        <p>
          NewsREEL continues to provide participants the unique opportunity to explore how
their ideas affect news readers. Participants deploy their recommendation algorithms
and connect them to the Open Recommendation Platform (ORP) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Subsequently, their
systems receive different types of messages originating from a selection of news publishers.
Messages of type item update inform about changes to the set of news articles. New
articles may be added and existing articles may be updated. Messages of type event
notification convey happenings on the publishers’ platforms caused by readers’ actions.
Readers may access news articles or click on recommendations. Error messages notify
participants about system malfunctions. These include delayed responses and invalid
items. Finally, messages of type recommendation request expect a list of news articles
in return. The recommendations are displayed to the reader if the participant is
randomly selected among all active systems.
        </p>
        <p>ORP keeps track of readers’ reactions to recommendations. For each participant, it
counts the clicks as well as the requests. Requests can be considered on two levels. On the
one hand, we may consider a request for recommendations as a single entity. On the
other hand, we may count a request for each individual item being recommended.
Readers clicking on a recommended article will typically trigger a page reload.
As a result, the former way to count requests yields a lower number than the latter.
We challenged participants to find the configuration that maximizes the click-through
rate (CTR). Herein, the CTR refers to the number of clicks divided by the number of
requests, counting lists instead of individual items.</p>
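        <p>The two ways of counting requests can be illustrated with a minimal sketch. The click count, the request count, and the list length of six items are hypothetical values chosen for illustration; they are not taken from the NewsREEL logs, and the function name is likewise an assumption.</p>
        <preformat><![CDATA[
```python
def click_through_rate(clicks: int, requests: int) -> float:
    """Return clicks divided by requests, or 0.0 if there were no requests."""
    return clicks / requests if requests else 0.0


# Hypothetical numbers, for illustration only.
clicks = 30
list_requests = 2000   # each request for a recommendation list counted once
items_per_list = 6     # assumed list length; the real length may differ

# CTR as used in the paper: requests counted per list.
ctr_per_list = click_through_rate(clicks, list_requests)

# Alternative: one request counted for every recommended item.
ctr_per_item = click_through_rate(clicks, list_requests * items_per_list)

print(f"per list: {ctr_per_list:.4f}, per item: {ctr_per_item:.4f}")
```
]]></preformat>
        <p>Because the per-item denominator is larger, the per-item CTR is lower than the per-list CTR for the same number of clicks.</p>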
        <p>Our partners at plista revised ORP for NewsREEL 2017. Changes concerned
both front-end and back-end. The former user interface has been deprecated in favor
of a replacement that is under ongoing development. Figure 1 and Figure 2 show
the current state of the new user interface. The new interface comes with a
flexible way to create dashboards. Participants can arrange their favorite information
as they please. Note that the new user interface was not yet available during NewsREEL
2017. Participants received a tutorial illustrating how to use ORP by means of API calls
until the new user interface becomes available. Plista migrated ORP to a new server and
extended its API. By calling the API, participants can control the communication with ORP
programmatically. The data format has been kept to reduce the effort required to update existing
implementations for NewsREEL’s participants.</p>
      </sec>
      <sec id="sec-2-2">
        <title>NewsREEL Replay: Benchmarking Stream-based News Recommendations Offline</title>
        <p>The NewsREEL Replay task allows participants to evaluate stream-based news
recommender algorithms offline. As described by Scriminaci et al. [18], offline evaluation ensures
the exact reproducibility of experiments as well as the fine-grained analysis and
optimization of algorithms. Participants can simulate different load scenarios and check
the reliability of new approaches. Teams can optimize parameters before deploying
algorithms to NewsREEL Live.</p>
        <p>
          For the NewsREEL Replay task, we provide a data set and software components for simulating
the data stream. Participants have access to a data set comprising a collection
of messages analogous to NewsREEL Live. The messages are chronologically ordered
and cover the period of four weeks starting on February 1, 2016. For further details
about the nature of the data set, we refer to [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Plista’s customer base changes over
time which is why some publishers are currently accessible through ORP but are not
included in the data set. Besides the data set, participants receive software to conduct
offline evaluations. The software simulates two systems. On the one hand, the software
emulates ORP’s functionality. This part sends recommendation requests, compares
recommendations to logs with later timestamps, and records the time taken until the
recommendations arrive. On the other hand, the software emulates the recommendation
engine. Participants include their own implementation in this part and modify it to
determine how changes affect the performance. The software produces estimated click
through rates and response time distributions. The estimated CTR relates to impressions
rather than clicks. For further details about this evaluation resource, we refer to [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
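        <p>The evaluation loop described above can be sketched as follows. This is not the evaluator distributed with NewsREEL; it is a simplified illustration. The function name, the tuple layout of the log records, and the ten-minute matching window are assumptions made for this example.</p>
        <preformat><![CDATA[
```python
import time


def replay(requests, later_reads, recommender, window=600):
    """Replay logged requests against a recommender (illustrative sketch).

    requests:    list of (timestamp, user_id, item_id) tuples
    later_reads: dict mapping user_id to a list of (timestamp, item_id) reads
    recommender: callable (user_id, item_id) -> list of recommended item ids
    window:      assumed matching window in seconds (hypothetical parameter)
    """
    hits = 0
    response_times = []
    for ts, user, item in requests:
        start = time.perf_counter()
        recommended = recommender(user, item)
        response_times.append(time.perf_counter() - start)
        # Items the same user read after the request, within the window.
        future = {i for t, i in later_reads.get(user, []) if ts < t <= ts + window}
        if future.intersection(recommended):
            hits += 1
    estimated_ctr = hits / len(requests) if requests else 0.0
    return estimated_ctr, response_times


# Hypothetical log excerpt: u1 reads "b" shortly after the request, u2 much later.
example_requests = [(0, "u1", "a"), (100, "u2", "a")]
example_reads = {"u1": [(50, "b")], "u2": [(900, "b")]}
estimated_ctr, times = replay(example_requests, example_reads, lambda user, item: ["b"])
```
]]></preformat>
        <p>In this sketch a recommendation counts as successful when the reader later accesses a recommended article, which is one way to read the statement that the estimated CTR relates to impressions rather than clicks.</p>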
        <p>Analogous to the online evaluation, participants ought to find the configuration
with the highest CTR. Simultaneously, the offline evaluation reveals how changing
recommendation algorithms affects the response time. Participants gain insights into
which configurations achieve a reasonable trade-off between prediction accuracy and
response time.</p>
        <p>In 2017, we released a new data set. The data set covers a four-week period and
adheres to the format used in previous editions of NewsREEL. The software used to
conduct offline experiments facilitates re-using existing implementations. Hence, participants
face minimal requirements to start their experiments.</p>
        <p>NewsREEL 2017 constitutes a major revision with changes to the main resources. Plista
revised and migrated ORP to achieve better stability, maintainability, and flexibility. We
released a new data set with more recent interactions. We moved from Idomaar to a new
evaluator. It took time to update support materials such as tutorials, descriptions, and
references. Because materials from previous editions remained available, some participants reported confusion.
The new user interface was not finished in time to engage participants. The scale
of the data set has been challenging for participants. NewsREEL has been used within
the scope of university lectures. This confirms the interest of academic institutions in
providing students with more realistic problems.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Participation</title>
      <p>
        In the 2017 edition of NewsREEL, 87 participants registered. Both tasks attracted
similar numbers of participants, with NewsREEL Replay slightly ahead at 79 registrations
compared to 64 for NewsREEL Live. Participants deployed 27
recommendation services in Task 1. We received a total of six working notes describing participants’
approaches. For a more detailed analysis of people’s motivation to participate, we refer
to [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
This section presents the results for both tasks of NewsREEL 2017. First, Section 4.1
introduces participants’ achievements in NewsREEL Live. Second, Section 4.2 illustrates
participants’ results in NewsREEL Replay. For results of the previous campaigns, we
refer to [
        <xref ref-type="bibr" rid="ref11 ref12 ref7">7, 12, 11</xref>
        ].
      </p>
      <p>
ORP underwent revisions until March 2017. Plista created accounts for all registered
participants on March 23, 2017. From then on, they could establish communication
with ORP to initiate evaluations. This allowed participants to explore the parameter space
to find the optimal configuration of their algorithms. Setting up their systems was
a challenging endeavor. They had to implement, deploy, and maintain their systems.
Nineteen systems were active in the evaluation period, which started on April 23 and
ended on May 7, 2017. An error in ORP’s internal logging occurred on April 28, 2017.
Unfortunately, no information is available for this day. Table 1 lists our observations for
all nineteen systems. Systems are presented in alphabetical order. The system “BL2Beat”
refers to the organizers’ baseline implementation. Participants registered from as few as
two up to as many as 1268 clicks in the fourteen-day period. The number of impressions
refers to how often the recommendations of a system were shown to readers.
We observe a considerable spread from 349 to 81 245 impressions. The spread
emerges because some participants had their systems connected for longer periods than
their competitors. Some participants were connected to ORP for 289 h, whereas other systems
remained disconnected for most of the time. Table 1 includes the average number of
clicks and impressions per hour. These values reveal whether participants experienced
similar conditions. Participants received 203.6 (mean) or 224.0 (median)
impressions per hour. Incidentally, the median value corresponds precisely to the baseline
implementation. Participants registered 2.6 (mean) or 2.8 (median) clicks
per hour.
      </p>
      <p>
        Figure 4 illustrates the performance in more detail. Each triangle corresponds to
an algorithm that served recommendations to ORP. The x-value refers to the total
number of impressions. The y-value refers to the total number of clicks. Consequently,
the triangles’ positions indicate the average CTR per day. Two colored areas highlight
ranges of the CTR. The blue area refers to a CTR below 1 %, whereas the brown area
refers to a CTR above 2 %. A majority of participants lie between both areas.
A few participants were active for a relatively short period. They received few
impressions and clicks and thus cluster close to the origin.
      </p>
      <p>
The Offline Evaluation task attracted several teams. The teams engaged in
NewsREEL Replay mainly focused on testing new recommender approaches (e.g. deep
neural networks [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]), the efficient optimization of parameter configurations (e.g.
finding similarity metrics for Collaborative Filtering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), and studying the technical
complexity of algorithms. NewsREEL Replay does not require a permanent internet
connection. This ensures a low barrier to participation in the NewsREEL challenge and
motivates participants to test new ideas and algorithms.
      </p>
      <p>Testing new approaches. Applying innovative ideas in a recommendation scenario
typically requires extended testing and debugging. Before setting up a stable
live system, algorithms are implemented as prototypes in order to prove that the new
approach is suitable for the scenario. The NewsREEL Replay task provides such a
testing environment. Participants can simulate the stream on local hardware and study
the strengths and weaknesses of new algorithms. In the offline tests, participants can control the load (by
defining the number of concurrent messages sent by the offline simulation environment)
and debug the functionality of the implemented solution. In the NewsREEL challenge
2017, most new teams first tested their algorithms offline before participating in the
NewsREEL Live task. New recommender approaches based on contextual bandits and
deep neural networks have been evaluated offline.</p>
      <p>
        Parameter optimization. In addition to the testing of new approaches, the optimization
of parameter configurations is an important task. Parameter optimization
requires a sufficiently large data stream in order to ensure significant results.
To speed up the process, the optimization should be parallelizable.
The NewsREEL Replay task addresses this need. The provided data set and
the offline stream simulation components allow participants to simulate the data stream
in parallel on different machines and with different hardware configurations. In addition,
the simulated stream can be replayed faster in order to accelerate the optimization
process. The offline stream simulation ensures reproducible evaluation results as well
as the comparability of the results obtained in different evaluation runs. This aspect
of the NewsREEL Replay task has been used extensively by several teams (e.g. by
Beck et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
      <p>Technical aspects. The tight time constraints in the NewsREEL Live task and the
continuous changes in the number of messages make it difficult to analyze the technical
aspects of implemented algorithms. In the NewsREEL Live task, several peaks in the
number of messages can be observed. Algorithms running in NewsREEL Live must
be able to handle such load peaks. The offline stream simulation component allows
participants to analyze load peaks by defining the number of messages concurrently
sent to the recommender. This helps participants to identify bottlenecks and to study the
handling of concurrent messages. Several teams analyzed the response time
by plotting histograms of the frequency of different response
times. This is of special interest in ensemble-based approaches integrating different
algorithms with varying technical complexities.</p>
      <p>Discussion. The NewsREEL Replay task enables the fine-grained analysis of new
algorithms and allows participants to optimize parameters efficiently. As NewsREEL
Replay can be run offline without considering response time constraints, it is a good
starting point for new participants to evaluate new ideas and algorithms. NewsREEL
Replay has been used by most participants for optimizing their algorithms with respect to
both recommendation precision and technical complexity.</p>
    </sec>
    <sec id="sec-4">
      <title>Working Notes Summary</title>
      <p>In NewsREEL 2017 the participants have evaluated a broad spectrum of recommender
approaches ranging from using existing frameworks and tools to ensemble methods to
the use of deep neural networks.</p>
      <p>
        Bons et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] developed a graph-based recommender algorithm. The graph consists
of nodes representing the items and directed edges weighted by how often the two
connected news items are read in that specific sequence. Recommendation requests are
answered by computing the strongest item sequence containing the itemID given in the
recommendation request. The graph is managed in a Neo4j graph database.
Recommendations are computed based on a database query. If the itemID in the recommendation
request does not exist in the graph or the node is not yet connected with the graph, the
most recently created news items are returned. The evaluation of the strategy shows that the
implemented graph-based recommender reaches a high CTR in the Living Lab scenario.
The implementation works efficiently, ensuring that the response time
constraints are reliably met.
      </p>
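      <p>The idea of a sequence graph with a recency fallback can be sketched in a few lines. This is not the implementation of Bons et al., which relies on Neo4j queries. The sketch uses an in-memory dictionary as a stand-in, reads "strongest sequence" as greedily following the most frequent outgoing edge, and all class and method names are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


class SequenceGraph:
    """In-memory stand-in for the graph database (illustrative assumption)."""

    def __init__(self):
        # edges[a][b] counts how often item b was read directly after item a.
        self.edges = defaultdict(lambda: defaultdict(int))
        self.recent = []  # item ids in creation order, newest last

    def add_item(self, item_id):
        if item_id not in self.recent:
            self.recent.append(item_id)

    def add_transition(self, src, dst):
        self.add_item(src)
        self.add_item(dst)
        self.edges[src][dst] += 1

    def recommend(self, item_id, k=3):
        # Greedy reading of "strongest sequence": repeatedly follow the
        # heaviest outgoing edge to an item that has not been visited yet.
        result, visited, current = [], {item_id}, item_id
        while len(result) != k:
            candidates = {d: w for d, w in self.edges.get(current, {}).items()
                          if d not in visited}
            if not candidates:
                break
            best = max(candidates, key=candidates.get)
            result.append(best)
            visited.add(best)
            current = best
        if not result:
            # Unknown or unconnected item: fall back to the newest items.
            result = [i for i in reversed(self.recent) if i != item_id][:k]
        return result


graph = SequenceGraph()
for src, dst in [("a", "b"), ("a", "b"), ("a", "b"), ("a", "c"), ("b", "d"), ("b", "d")]:
    graph.add_transition(src, dst)

from_known = graph.recommend("a", k=2)          # follows a -> b -> d
from_unknown = graph.recommend("unknown", k=2)  # recency fallback
```
]]></preformat>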
      <p>
        Golian and Kuchar [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] analyze click patterns in time series from NewsREEL 2016.
They show that a limited set of news items attracts a majority of clicks, and that these items
continue to dominate for longer than expected. The manuscript presents a series of
experiments in the context of online news recommender system evaluation. The authors
report that content-based methods achieve considerably lower click-through rates than
popularity-based methods.
      </p>
      <p>Ludmann [17] focuses on managing streams. His system relies on Odysseus, a data
stream management system. Therein, he defines a set of queries that take part of the
data stream and determine the most popular articles. The length of
the data stream segment is an essential parameter of this selection. The working notes present observations
in NewsREEL Live with a variety of parameter configurations. Results suggest that
considering successful recommendations improves the click-through rate.</p>
      <p>
        Beck et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] developed a hybrid recommender system combining item-based
Collaborative Filtering algorithms with a most popular recommender. The system is
implemented using the Apache Mahout framework. The message stream is
split into batches of equal size. Having collected the required number of messages for a
batch, the system builds a recommender model for this batch. When the model building
is completed, the new model replaces the old model. In order to ensure that
recommendations are provided for every request, a most popular item recommender runs as a backup
recommender. If the Collaborative Filtering-based recommender fails or does not provide
a sufficient number of results, the recommendation result is completed by the backup
recommender. The evaluation of the recommender shows that the implemented solution
provides highly precise results and fulfills the technical requirements with respect to
response time and scalability.
      </p>
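      <p>The backup mechanism can be illustrated with a short sketch. The system of Beck et al. is built on Apache Mahout; the Python below is only an illustration of how a primary result list might be topped up by a most popular recommender. The function names and the read counts are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def most_popular(read_counts, exclude, n):
    """Return up to n of the most frequently read items not in `exclude`."""
    ranked = [item for item, _ in read_counts.most_common() if item not in exclude]
    return ranked[:n]


def recommend(cf_results, read_counts, current_item, n):
    """Use the collaborative filtering results first, then fill with the backup."""
    results = []
    for item in cf_results:
        if item != current_item and item not in results:
            results.append(item)
    results = results[:n]
    missing = n - len(results)
    if missing > 0:
        # The primary recommender failed or returned too few items.
        results += most_popular(read_counts, set(results) | {current_item}, missing)
    return results


# Hypothetical read counts collected from the current batch.
counts = Counter({"a": 5, "b": 3, "c": 2, "d": 1})

partial = recommend(["c"], counts, current_item="a", n=3)  # one CF result, two backups
failed = recommend([], counts, current_item="a", n=2)      # CF returned nothing
```
]]></preformat>
      <p>Excluding the article currently being read and any items already chosen keeps the completed list free of duplicates.</p>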
      <p>
        Liang et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] discuss how contextual bandits can be used to compute
recommendations. The authors define a list of recommendation models considering recency,
categories, and reading sequences among other factors. Their contextual bandit approach
seeks to determine a strategy mapping models to contexts in order to maximize the
expected rewards. They apply their contextual bandit both in NewsREEL Live and
NewsREEL Replay. The working note reports that performances vary by the domain
under consideration.
      </p>
      <p>
        Kumar et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] present a hybrid recommender system for news. They combine
collaborative filtering with content-based filtering using a neural network architecture. Part of
the architecture models the relation between users and items. The other part of the
architecture maps articles’ text onto a common latent space. The authors conduct an offline
experiment that compares their proposed method to three baselines. The experiment
focuses on readers who had previously read ten to fifteen articles. Their results favor their
approach over the baselines in terms of hit rate and normalized discounted cumulative
gain.
      </p>
      <p>The variety of methods used to address NewsREEL’s tasks indicates a large number of
connected research questions for the future. Most approaches achieved results superior
to the baseline and still hold potential for further optimization.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>
        Similar to the past few iterations of CLEF NewsREEL [
        <xref ref-type="bibr" rid="ref13 ref6 ref9">6, 13, 9</xref>
        ], we were pleased to see
that participants trialled very diverse approaches to provide news recommendations. We
argue that this is due to the opportunity to evaluate recommendation algorithms in an
industry setting.
      </p>
      <p>Of both tasks, the evaluation in an online setting, referred to as NewsREEL Live
throughout the campaign, appears to be more attractive amongst participants. This is
also similar to previous years where we saw more teams evaluating their algorithms
using the Open Recommendation Platform run by plista. At the same time, this year,
an increasing number of participants also tested their algorithms in the offline setting,
referred to as NewsREEL Replay. One of the advantages of offline evaluation is that
it allows benchmarking algorithms that might not (yet) be suitable for operation in an
online setting.</p>
      <p>Although the performance of the algorithms presented is promising, we argue that
there is still room for improvement with the aim of increasing the overall click-through
rate. We therefore would like to encourage researchers to perform more studies using
the data and infrastructure that have been provided as part of CLEF NewsREEL.</p>
      <p>17. C. Ludmann. Recommending News Articles in the CLEF News Recommendation Evaluation
Lab with the Data Stream Management System Odysseus. In Working Notes of the 8th
International Conference of the CLEF Initiative, Dublin, Ireland. CEUR Workshop Proceedings,
2017.</p>
      <p>18. M. Scriminaci, A. Lommatzsch, B. Kille, F. Hopfgartner, M. Larson, D. Malagoli, A. Serény,
and T. Plumbaum. Idomaar: A framework for multi-dimensional benchmarking of
recommender algorithms. In Proceedings of the Poster Track of the 10th ACM Conference on
Recommender Systems (RecSys 2016), Boston, USA, September 17, 2016.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blaser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Michalke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          .
          <article-title>A System for Online News Recommendations in Real-Time with Apache Mahout</article-title>
          .
          <source>In Working Notes of the 8th International Conference of the CLEF Initiative</source>
          , Dublin, Ireland.
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kampstra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>van Kessel</surname>
          </string-name>
          .
          <article-title>A News Recommender Engine with a Killer Sequence</article-title>
          .
          <source>In Working Notes of the 8th International Conference of the CLEF Initiative</source>
          , Dublin, Ireland.
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          .
          <article-title>Shedding light on a living lab: the CLEF NewsREEL open recommendation platform</article-title>
          .
          <source>In Fifth Information Interaction in Context Symposium</source>
          , IIiX '14, Regensburg, Germany, August 26-29, 2014, pages
          <fpage>223</fpage>
          -
          <lpage>226</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Golian</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kuchar</surname>
          </string-name>
          .
          <article-title>News Recommender System based on Association Rules at CLEF NewsREEL 2017</article-title>
          .
          <source>In Working Notes of the 8th International Conference of the CLEF Initiative</source>
          , Dublin, Ireland.
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turrin</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Serény</surname>
          </string-name>
          .
          <article-title>Benchmarking news recommendations: The CLEF newsreel use case</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>49</volume>
          (
          <issue>2</issue>
          ):
          <fpage>129</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Plumbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Heintz</surname>
          </string-name>
          .
          <article-title>Benchmarking news recommendations in a living lab</article-title>
          .
          <source>In Information Access Evaluation. Multilinguality, Multimodality, and Interaction - 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings</source>
          , pages
          <fpage>250</fpage>
          -
          <lpage>267</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heintz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          .
          <article-title>NewsREEL 2014: Summary of the news recommendation evaluation lab</article-title>
          .
          <source>In Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18, 2014</source>
          , pages
          <fpage>790</fpage>
          -
          <lpage>801</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Heintz</surname>
          </string-name>
          .
          <article-title>The plista dataset</article-title>
          .
          <source>In NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems</source>
          , pages
          <fpage>14</fpage>
          -
          <lpage>21</lpage>
          . ACM, October
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Gebremeskel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Malagoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Serény</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <article-title>Overview of NewsREEL'16: Multidimensional evaluation of real-time stream-recommendation algorithms</article-title>
          .
          <source>In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 7th International Conference of the CLEF Association, CLEF 2016, Évora, Portugal, September 5-8, 2016, Proceedings</source>
          , pages
          <fpage>311</fpage>
          -
          <lpage>331</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <article-title>A stream-based resource for multi-dimensional evaluation of recommender algorithms</article-title>
          .
          <source>In The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)</source>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Malagoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Serény</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          .
          <article-title>CLEF NewsREEL 2016: Comparing multi-dimensional offline and online evaluation of news recommender systems</article-title>
          .
          <source>In Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum, Évora, Portugal, 5-8 September, 2016</source>
          , pages
          <fpage>593</fpage>
          -
          <lpage>605</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Serény</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          .
          <article-title>Overview of CLEF NewsREEL 2015: News recommendation evaluation lab</article-title>
          .
          <source>In Working Notes of CLEF 2015 - Conference and Labs of the Evaluation forum, Toulouse, France, September 8-11, 2015</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Serény</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          .
          <article-title>Stream-based recommendations: Online and offline evaluation as a service</article-title>
          .
          <source>In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 6th International Conference of the CLEF Association, CLEF 2015, Toulouse, France, September 8-11, 2015, Proceedings</source>
          , pages
          <fpage>497</fpage>
          -
          <lpage>517</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Khattar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Varma</surname>
          </string-name>
          .
          <article-title>Deep Neural Architecture for News Recommendation</article-title>
          .
          <source>In Working Notes of the 8th International Conference of the CLEF Initiative</source>
          , Dublin, Ireland.
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Loni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>CLEF NewsREEL 2017: Contextual Bandit News Recommendation</article-title>
          .
          <source>In Working Notes of the 8th International Conference of the CLEF Initiative</source>
          , Dublin, Ireland.
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Özgöbek</surname>
          </string-name>
          .
          <article-title>CLEF 2017 NewsREEL overview: A stream-based recommender task for evaluation and education</article-title>
          .
          <source>In 8th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2017)</source>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>