<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF NewsREEL 2016: Image-based Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Corsini</string-name>
          <email>corsinifrancesco0@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martha Larson</string-name>
          <email>m.a.larson@tudelft.nl</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Delft University of Technology</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Radboud University Nijmegen</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Our approach to the CLEF NewsREEL 2016 News Recommendation Evaluation Lab investigates the connection between images and users' clicking behavior. Our goal is to gain a better understanding of the contribution of the visual representations accompanying news items (thumbnails) to the success of news recommendation algorithms as measured by standard metrics. We experiment with visual information, namely face detection and saliency maps, extracted from the images that accompany news items, to see whether it can be used to choose news items that have a higher chance of being clicked by users. Initial results suggest a substantial CTR improvement in the Simulated Environment task, while some decrease in performance was found in the Living Lab task. The latter result must be further validated in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender System</kwd>
        <kwd>News</kwd>
        <kwd>Image Analysis</kwd>
        <kwd>Face Detection</kwd>
        <kwd>Saliency Map</kwd>
        <kwd>Evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        The CLEF NewsREEL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] News Recommendation Evaluation Lab challenges
participants to come up with an original and effective solution for providing
recommendations for users in the news environment. We participated in both
Task 1 (Living Lab Evaluation) and Task 2 (Evaluation in Simulated
Environment). An overview of this year's challenge results can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Typical online news content providers publish images along with their news
items. Our work is motivated by the conjecture that these images play a role
in the effect of the recommendation, especially in whether a user will click on the
item. Content providers are well aware of the importance of images and are
already taking advantage of them (e.g., both their informative potential and their
potential to act as clickbait). However, the effect of images on automatic
recommendations is currently understudied and not well understood. Our research
investigates the effect of such images, in order to determine whether they can play a crucial
role in the definition of a more refined recommendation. Our hypothesis is that
people tend to click on news articles when the accompanying image catches their
eye or makes them curious, and that some images depict the article's subject so
clearly that users can immediately see what it is about. Specifically, in this work, we
focus on the usefulness of information about faces appearing in images and about
image saliency. The Open Recommendation Platform (ORP) by plista provided a
unique framework to test and benchmark our approach. Given the constraints
of the online environment (100ms response timeout, unpredictable load on
the server), new architectures and algorithms were developed in
order to deal with the heavy computational load caused by the image analysis.
Our research also investigates whether features extracted from images can be
used in a real-time recommendation pipeline.</p>
      <p>The rest of the paper is organized as follows: in Section 2 we discuss
related work on how images trigger interest, plus the background
needed to understand our approach to image classification. Section 3 describes
our approach to the challenges presented in Tasks 1 and 2 and presents our
algorithm. The outcomes of our experiments and the results of the
evaluations are presented in Section 4. Section 5 follows with a discussion,
future work, and concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work and Background</title>
      <sec id="sec-2-1">
        <title>Grabbing Attention</title>
        <p>
          In this section, we discuss factors that trigger our eyes to land on an image. With
content-based image retrieval on the rise, there is increasing study of cues
that could help in ranking retrieved images. One sound measure for automatic
ranking is how interesting people find an image. Much research
has been devoted to the study of interestingness on the Internet, especially with
Flickr images, e.g., [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. However, this sort of interestingness is different from
what we investigate here. Specifically, it implies a sort of community and
social behavior that goes beyond the effect of images merely catching the eye.
This kind of behavior cannot be assumed to be present in the news
recommendation environment, where the images come from the news provider
rather than being contributed by community members. Flickr's interestingness
is based on social parameters linked to user behavior, i.e., the
uploader's reputation score and the ratio between views, favorites and comments. For
example, images with a positive connotation (smiles, brightness) tend to have
a higher level of interestingness in social media.
        </p>
        <p>
          Other related research comes from the area of advertising. Accurate
prediction of the probability that users click on ads is crucial for the online
advertisement business. Although the methods differ, our work and the ads
business share the same goal: predicting (and increasing) how many clicks an
image (or an ad) receives. State-of-the-art click-through rate prediction algorithms
rely heavily on historical information collected for advertisers, users and
publishers. However, recent work has integrated multimedia features
extracted from display ads into click prediction models [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The
features related to an increase in CTR are numerous. In particular, Cheng et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
present an extensive list of image features and their correlation with CTR. In
this study, we focus on key features from [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], chosen because of their promise
and the feasibility of deploying them in an online environment. From a study
of the literature, we identified the two most interesting and investigation-worthy
features: the presence of a person [13] [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] [12], especially a face
clearly visible and facing the camera, and the analysis of the saliency map to detect
aesthetics and simplicity [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] [11]. However, due to unexpected technical
issues during the implementation of these features, only the presence of a person
(face detection) was fully developed at the start of the Task 1 challenge. For this
reason, it was the only one used, for consistency, throughout the Task 1
evaluation window. However, both features have been tested together in the Task
2 part of the challenge.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Image Classification</title>
        <p>Our approach is based on a straightforward binary image classifier, which
classifies the image of the target item (thumbnail) as either "interesting" or "not
interesting". The motivation behind this choice of a binary classifier is limited
time resources and ease of managing the results; a better and more refined
approach to the classification (e.g., degrees of interestingness) is planned as
future work (see 5.3). The classification process can be summarized simply as follows.
According to our research, an image is interesting if it has either:
- The presence of a person: a single central person (portrait) is preferred over
multiple people all over the image
- A single cluster in the middle of the image with a flat background: a single
object is preferred over multiple objects
For example, Fig. 1a and 1b are considered "interesting", 1a for the
presence of a face and 1b for having a single object in the center, while 1c
satisfies neither of the two requirements.</p>
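        <p>The binary decision just described can be sketched as a simple predicate. The boolean inputs are our own illustrative names for the outputs of the face and saliency analyses, not identifiers from the actual system:</p>

```python
def is_interesting(has_frontal_face: bool,
                   num_clusters: int,
                   cluster_centered: bool,
                   flat_background: bool) -> bool:
    """Binary 'interesting' / 'not interesting' decision (sketch).

    An image qualifies if EITHER criterion holds:
      1. a clearly visible (frontal) face is present, or
      2. exactly one salient cluster sits in the middle of the
         image against a flat background.
    """
    single_central_object = (num_clusters == 1
                             and cluster_centered
                             and flat_background)
    return has_frontal_face or single_central_object
```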
        <p>
          [Fig. 1: example thumbnails: (a) Face, (b) Salient, (c) Not interesting]
Our approach was designed to validate our hypothesis that images impact user
clicks on recommendations, rather than to reach the maximum possible CTR.
The Living Lab Evaluation [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] (Task 1) was executed on the ORP, where part
of plista's traffic is redirected. The ORP makes it possible to deploy and test
algorithms in a real environment. The platform uses the HTTP protocol,
supporting the JSON format for data. Communication is handled by four types of
messages: recommendation requests, impressions, item updates, and error messages.
The timeout for the response is 100ms: if the system does not
answer within this timeframe, the request is counted as an "error".
The Evaluation in Simulated Environment [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] (Task 2) officially makes use
of a dataset provided by the NewsREEL organizers. The set includes item
updates and event notifications [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. However, this official dataset lacked a
crucial field required by our image-based algorithm: the img url.
Although the field itself is present, the official dataset was collected in June 2013
and most of the links have since broken, because the images are hosted
by the publishers themselves. Domains tend to remove items (especially
images) after some period of inactivity, cleaning their databases of dated
articles, as these take up space and do not generate any traffic. Our
participation in CLEF NewsREEL using the "official" dataset was, for this reason,
compromised. However, this fact did not prevent us from testing our algorithms
on another offline dataset. The data used are daily dumps from the plista ORP
platform, just like the original dataset but with a much more recent date (May 2016).
The algorithms developed and tested are the following:
- Task 1: Baseline1
- Task 1: Baseline1 + Faces
- Task 2: Baseline1
- Task 2: Baseline1 + Faces
- Task 2: Baseline1 + Faces + Salience
- Task 2: Baseline2
- Task 2: Baseline2 + Faces + Salience
        </p>
        <p>Baseline1 is popularity with a freshness window of 100 items, while
Baseline2 is random with the same freshness window. In the remainder of the
paper these two algorithms are called Pop100 and Rand100. By looking at
the difference between each image-enhanced algorithm and its baseline,
we can gauge the effectiveness of image-based recommendation in the news
environment.</p>
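        <p>A minimal sketch of what a Pop100-style baseline could look like. The class and method names are hypothetical, and the exact aggregation (one point per impression, an extra point per click) is our assumption, since the paper only says the score combines impressions and clicks:</p>

```python
from collections import OrderedDict

class Pop100:
    """Popularity baseline with a 100-item freshness window (sketch).

    Only the most recently updated `window_size` items are recommendable;
    an item's popularity aggregates its impressions and clicks.
    """
    def __init__(self, window_size=100):
        self.window_size = window_size
        self.items = OrderedDict()            # item_id -> score, oldest first

    def update(self, item_id):
        """An item update arrives: (re)insert it, evicting the oldest."""
        score = self.items.pop(item_id, 0)    # keep any existing score
        self.items[item_id] = score
        if len(self.items) > self.window_size:
            self.items.popitem(last=False)    # drop the oldest item

    def feedback(self, item_id, clicked=False):
        """Count an impression (and optionally a click) toward the score."""
        if item_id in self.items:
            self.items[item_id] += 2 if clicked else 1

    def recommend(self, n):
        """Top-n items in the window by popularity score."""
        return sorted(self.items, key=self.items.get, reverse=True)[:n]
```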
      </sec>
      <sec id="sec-2-3">
        <title>Algorithm</title>
        <p>
          Although the algorithms deployed in the Living Lab Evaluation (Task 1) differed
from the ones deployed in the Evaluation in Simulated Environment (Task 2), the
logic behind them is quite similar and can be summarized as follows.
A recency window is created for each combination of category/domain, each
window encompassing 100 items. Every time a new update comes in, it is
processed by taking the img url field and scraping the corresponding image from
the website. Features for the image are computed with our image processing
algorithms, namely Viola-Jones [14] for face detection and spectral residuals [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
for the saliency map. The saliency map involves the extraction of several
subfeatures (e.g., the number of objects and their positions, the background-to-foreground
ratio), which are then used to detect whether the image satisfies the requirement of
having a single cluster in the middle of the image. The newly processed item is
then added to the list of possible recommendations, while the oldest item in the
list is discarded (if the list is full).
        </p>
        <p>For the Pop100 algorithm, these items are sorted by a popularity score, which
aggregates how many impressions the item has received plus how many
clicks it received in previous recommendations. Whenever a recommendation
request arrives, the top N items are considered and picked only if they individually
satisfy the "visual requirements" (see 2.2). If not enough items have been
gathered by the time the top C elements have been considered, then standard popularity
is used instead, without the "visual requirements", in
order to fill the remaining spots. For the Rand100 algorithm, the logic is the
same, but the ranking step is replaced by random picking of items. For the
first C random draws an item is picked only if it satisfies the "visual
requirements"; after C draws this restriction is lifted.</p>
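        <p>The Pop100 selection step might look like the following sketch; the function names and the exact fallback order are our assumptions:</p>

```python
def select_recommendations(ranked_items, is_interesting, n, c):
    """Pick n recommendations from a popularity-ranked list (sketch).

    Within the first c candidates, only 'visually interesting' items are
    taken; if the quota is still unfilled after that, the remaining spots
    are filled by plain popularity, ignoring the visual requirement.
    """
    picked = [it for it in ranked_items[:c] if is_interesting(it)][:n]
    if len(picked) < n:                      # fallback: plain popularity
        for it in ranked_items:
            if it not in picked:
                picked.append(it)
                if len(picked) == n:
                    break
    return picked
```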
        <p>The constant C was determined by empirical testing, and it can be
interpreted as a tradeoff between "being interesting" and "following the baseline".
In the case of Pop100, the smaller C is, the more popular and less
"visually interesting" the selected items will be. As our intention here is to test
whether the visual component has an effect, C was intentionally exaggerated in
order to make the effect more notable.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Living Lab Evaluation</title>
        <p>The online results show the data obtained from the scoreboard in the ORP
during the evaluation window. Although the evaluation itself ran for around
40 days, not all days have been taken into consideration, due to issues which
resulted in the recommender receiving a low volume of requests. As a result,
only 24 days have been considered for the results. In order to answer our research
question, we benchmark our image-enhanced algorithm against its own
baseline without image information. For the online evaluation, Pop100 is the baseline.</p>
        <p>As can be seen from Fig. 2, although the image-enhanced recommender
had more overall clicks, the baseline performed better in CTR over a long
period of time. Pop100+Faces shows a 28% decrease in CTR relative to the
baseline Pop100. Our conclusion is that the lower result is actually due to a mixture
of technical problems that most likely undermined the performance of the
algorithm. A rundown of the problems can be found in the discussion in Section
5.1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Evaluation in Simulated Environment</title>
        <p>The Task 2 evaluation was done using the dataset from ORP daily dumps.
Three non-consecutive days were used as a test set. We consider three days
to be the minimum-sized dataset large enough to provide a reliable comparison:
each day has an average of 68,000 requests, and since the algorithm running in the
Task 1 environment accumulated a total of 175,000 requests over a month, we
needed three days to reach approximately the same number of requests and thus a
comparable dataset size. Further testing is planned on a larger dataset
in the future. The evaluation metric works as follows: a recommendation is
a successful hit if the user lands on the recommended page within 10 minutes
of navigating the website. In this evaluation we conducted tests with two different
baselines: Rand100 and Pop100. Rand100 was introduced in order to "weaken"
the baseline and thereby better show the effect of the
image features. The results can be seen in Table 1.</p>
        <p>Introducing image-based recommendation leads to a click increase of 51%
with respect to the baseline Rand100, while the increase is 36% with respect to
Pop100 when considering only faces, and 22% with both features.</p>
        <p>Table 1. Clicks per algorithm:
Rand100: 258;
Rand100+Face+Salience: 390;
Pop100: 630;
Pop100+Face: 857;
Pop100+Face+Salience: 771.</p>
        <p>The results from the Task 1 and Task 2 evaluations differ: we think that this may
be due to the inherent difference between the testing environments. We discuss
this in more detail in this section.
The results gathered during the month-long evaluation window suggest that the
baseline (Pop100) performs better than the image-based algorithm. This can be
partially attributed to the technical problems which the image-based algorithm
faced when running online.</p>
        <p>One of the problems encountered was making the algorithm fast enough to keep
up with the ORP rate of updates. While the requests sent by the platform
follow the performance of the algorithm (if the algorithm is struggling, fewer requests
are sent), this does not apply to the updates, which are sent
at any time. Updates are the computationally intensive part of our algorithm,
as each update usually comes with an image that needs to be downloaded and
analyzed. Updates tend to come in groups of 10 or more, making it necessary
to queue them. Even with various mitigation strategies, it
sometimes happened that the next batch of updates arrived before the queue was
fully processed, making the queue and the processing time even longer, thus
making the problem worse: repeated enough times, this would crash the server,
which then got rebooted and went through a new cold-start period. A longer queue and
longer processing times also meant longer delays in answering recommendation requests,
which then failed due to the timeout. The time resources available for
this research were necessarily limited and not all solutions to this problem have
been explored.</p>
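        <p>One possible mitigation, sketched here with illustrative names (not the fix actually deployed), is to bound the update queue and drop the oldest updates under load, so the backlog, and thus response latency, cannot grow without limit:</p>

```python
from collections import deque

class UpdateQueue:
    """Bounded queue for incoming item updates (a mitigation sketch).

    Updates arrive in bursts and each needs an image download plus
    analysis; a bounded deque silently drops the oldest pending update
    when full, trading some coverage for bounded latency.
    """
    def __init__(self, maxsize=50):
        self.pending = deque(maxlen=maxsize)   # drops from the left when full

    def submit(self, update):
        self.pending.append(update)            # oldest update dropped if full

    def drain(self, process):
        """Process everything queued so far (e.g. on a worker thread)."""
        handled = 0
        while self.pending:
            process(self.pending.popleft())
            handled += 1
        return handled
```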
      </sec>
      <sec id="sec-3-3">
        <title>Discussion: Evaluation in Simulated Environment</title>
        <p>The evaluation method used in this task yields a considerably lower CTR than
the one obtained in Task 1, as there is no actual user responding directly to
the recommendations shown. Therefore, no direct CTR comparison can be made.
However, the difference between a baseline and the same baseline with visual information
can be used to infer the effect of such features.</p>
        <p>For both baselines, Rand100 and Pop100, we can see a significant improvement in
CTR when we make use of the image information. As expected, the increase
is bigger for the "weaker" baseline, Rand100. However, the most striking result
is the improved performance over Pop100, especially when compared
with the results of the similar experiment conducted in Task 1. This strengthens
our idea that the Task 1 results were jeopardized by poor
technical performance rather than by the image-based recommendation model itself.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Future Work</title>
        <p>The algorithm and the approach developed during this challenge were intended
to be exploratory. Much work is still needed to prove the real effect of
images on recommendation.</p>
        <p>Both Task 1 and Task 2 testing need to be continued over all possible
combinations of the baselines and features used in this paper, in order to test both the
effect of each feature independently and their strength against different
baselines. This is especially needed in order to further investigate the difference
between Task 1 and Task 2, particularly in light of the results obtained in this
paper. A larger dataset (including images) needs to be used for testing in Task
2; this is our aim for the near future. Improvements in efficiency and
running times are needed in order to allow the algorithm to work properly in a
Living Lab environment. The current implementation has many flaws that likely
resulted in many delays and a worse CTR. A possible approach could be to not
process images until the item reaches a minimum level of popularity: this would filter
out many "socially uninteresting" images.</p>
        <p>Although this paper has focused its attention on the exploitation of high-level
visual cues (people, saliency map), a more in-depth analysis of other feature
classes may reveal useful insights. Notable global features include colorfulness,
brightness and saturation. Another interesting approach could be the inclusion
of visual information about how and where the recommendation is displayed (website-related
features). All of this on top of a more refined approach to the classification,
introducing different degrees of interestingness into the process.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Conclusion</title>
        <p>The Task 1 and Task 2 results seem to contradict each other at first look: Task
2 shows an increase in recommender performance while Task 1 shows a
decrease. We can partially explain the difference by the fact that the early Task 1
implementation ran into technical difficulties typical of the online environment,
which partially jeopardized the final outcome.</p>
        <p>Looking at the Task 2 results, we can clearly see an improvement in CTR
when introducing image-based recommendations. This initial result suggests
a substantial improvement even when combined with already strong baselines
(popularity/recency). More experiments with different baseline combinations
and settings are required in the future to definitively prove the effectiveness
of image-based recommendation in the news environment. We think that the
results shown in this paper provide a good initial confirmation of its potential.
11. Judith A. Redi and Isabel Povoa. The Role of Visual Attention in the Aesthetic
Appeal of Consumer Images: a Preliminary Study. In Visual Communications and
Image Processing (VCIP), 2013.
12. Paola Ricciardelli, Cristina Iani, Luisa Lugli, Antonello Pellicano, and Roberto
Nicoletti. Gaze direction and facial expressions exert combined but different effects
on attentional resources. Cognition and Emotion, 26(6):1134-1142, 2012.
13. Andreas E. Savakis, Stephen P. Etz, and Alexander C. P. Loui. Evaluation of
image appeal in consumer photography. Proc. SPIE 3959, pages 111-120, 2000.
14. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple
features. In Computer Vision and Pattern Recognition (CVPR), 1:I-511-I-518,
2001.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Haibin Cheng, Roelof Van Zwol,
          <string-name>
            <given-names>Javad</given-names>
            <surname>Azimi</surname>
          </string-name>
          , Eren Manavoglu, Ruofei Zhang,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Vidhya</given-names>
            <surname>Navalpakkam</surname>
          </string-name>
          .
          <article-title>Multimedia Features for Click Prediction of New Ads in Display Advertising</article-title>
          .
          <source>In 18th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <volume>777</volume>
          -
          <fpage>785</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Sagnik</given-names>
            <surname>Dhar</surname>
          </string-name>
          , Vicente Ordonez, and Tamara L. Berg.
          <article-title>High level describable attributes for predicting aesthetics and interestingness</article-title>
          .
          <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <volume>1657</volume>
          -
          <fpage>1664</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Xiaoli</given-names>
            <surname>Fern</surname>
          </string-name>
          .
          <article-title>The Impact of Visual Appearance on User Response in Online Display Advertising</article-title>
          .
          <source>Proceedings of the 21st international conference companion on World Wide Web</source>
          , pages
          <volume>457</volume>
          -
          <fpage>458</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M</given-names>
            <surname>Gygli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Grabner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Riemenschneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F</given-names>
            <surname>Nater</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L Van</given-names>
            <surname>Gool</surname>
          </string-name>
          .
          <article-title>The Interestingness of Images</article-title>
          .
          <source>Computer Vision (ICCV), IEEE International Conference on</source>
          , (iii):
          <volume>1633</volume>
          -
          <fpage>1640</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Frank</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          , Torben Brodt, Jonas Seiler, Benjamin Kille, Andreas Lommatzsch, Martha Larson, Roberto Turrin, and
          <string-name>
            <given-names>Andras</given-names>
            <surname>Sereny</surname>
          </string-name>
          .
          <article-title>Benchmarking news recommendations: The CLEF NewsREEL use case</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>49</volume>
          (
          <issue>2</issue>
          ):
          <volume>129</volume>
          -
          <fpage>136</fpage>
          ,
          <year>January 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Frank</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          , Benjamin Kille, Andreas Lommatzsch, Till Plumbaum, Torben Brodt, and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Heintz</surname>
          </string-name>
          .
          <source>Benchmarking News Recommendations in a Living Lab</source>
          , pages
          <volume>250</volume>
          -
          <fpage>267</fpage>
          . Springer International Publishing,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Xiaodi</given-names>
            <surname>Hou</surname>
          </string-name>
          and
          <string-name>
            <given-names>Liqing</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Saliency detection: A spectral residual approach</article-title>
          .
          <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , (
          <volume>800</volume>
          ):
          <volume>1</volume>
          -
          <issue>8</issue>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Kille</surname>
          </string-name>
          , Frank Hopfgartner, Torben Brodt, and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Heintz</surname>
          </string-name>
          .
          <article-title>The plista Dataset</article-title>
          .
          <source>In 2013 International News Recommender Systems Workshop and Challenge</source>
          , pages
          <volume>16</volume>
          -
          <fpage>23</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Kille</surname>
          </string-name>
          , Andreas Lommatzsch, Gebrekirstos Gebremeskel, Frank Hopfgartner, Martha Larson, Jonas Seiler, Davide Malagoli, Andras Sereny, Torben Brodt, and Arjen de Vries.
          <article-title>Overview of NewsREEL'16: Multi-dimensional Evaluation of Real-Time Stream-Recommendation Algorithms</article-title>
          . In Norbert Fuhr, Paulo Quaresma, Birger Larsen, Teresa Goncalves, Krisztian Balog, Craig Macdonald, Linda Cappellato, and Nicola Ferro, editors,
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction 7th International Conference of the CLEF Association, CLEF</source>
          <year>2016</year>
          , Evora, Portugal, September 5-
          <issue>8</issue>
          ,
          <year>2016</year>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Kille</surname>
          </string-name>
          , Andreas Lommatzsch, Roberto Turrin, Andras Sereny, Martha Larson, Torben Brodt, Jonas Seiler, and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          .
          <article-title>Stream-Based Recommendations: Online and Offline Evaluation as a Service</article-title>
          , pages
          <volume>497</volume>
          -
          <fpage>517</fpage>
          . Springer International Publishing,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>