<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting the Interest in News based On Image Annotations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Lommatzsch</string-name>
          <email>andreas@dai-lab.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Kille</string-name>
          <email>benjamin.kille@dai-labor.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandru Ciobanu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Berlin</institution>
          ,
          <addr-line>tu-berlin.de</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DAI-Labor, TU Berlin</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Multimedia</institution>
          ,
          <addr-line>News, Recommender Systems, Image Analysis</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>In recent years, the World Wide Web has changed from text-focused web pages to multimedia sources featuring photos, videos, and audio. The worldwide growth of broadband connections has facilitated this trend and supports the spread of user-generated content. Navigating and finding interesting content has become a difficult challenge. In this paper, we present approaches which use visual features to predict how interesting a news article will be. This task is part of the NewsREEL Multimedia challenge. The challenge provides a large-scale data set of news items, images, and interactions. We implement a recommender system which can distinguish interesting articles from irrelevant ones based on image features. We evaluate the system's throughput and predictions. We explain our insights and outline ideas to apply the gained knowledge in additional domains.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>The number of documents and news articles published on the World
Wide Web has increased dramatically. Users struggle to find
relevant items. Recommender systems support users by reducing
information overload. They analyze users’ behavior toward items and
derive patterns to determine the most relevant items. Collaborative
filtering and content-based filtering have become the most widely
used algorithms for recommender systems.</p>
      <p>Multimedia content (e.g., photos, videos, and audio) permeates
our everyday lives. More and more services emerge that enable us
to share photos and videos. Still, research on recommender systems
has yet to leverage multimedia content. This work contributes to
this effort by focusing on the use of image data for recommending
news. In particular, we use methods which automatically determine
fitting descriptors for images. The task asks us to estimate how
interesting freshly published news articles will become. The
evaluation setting equates interestingness with popularity due to the
lack of user profiles. Hence, we focus on non-personalized
recommender systems. We hypothesize that images play a decisive role
as they capture users’ attention. Thus, we use image annotations
to implement an estimator.</p>
      <p>The remainder of the paper is structured as follows: In Sec. 2
we recapitulate the scenario. Sec. 3 discusses related work. We
present the approach in Sec. 4. Subsequently, Sec. 5 illustrates
the evaluation results. Finally, Sec. 6 details our findings and gives
an outlook on future research.</p>
    </sec>
    <sec id="sec-2">
      <title>PROBLEM DESCRIPTION</title>
      <p>
        Several domains require estimating the relevance of items based on
images. In this work, we address the task defined by NewsREEL
Multimedia [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We determine the most relevant news items based
on the multimedia dataset provided by the task organizers. The data
include news articles, images displayed next to them, and
interactions with readers. We report the evaluation metrics precision at ten
(Prec@10) as well as precision at the top ten percent (Prec@10%).
We consider each news portal (“domain”) independently. More
details can be found in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
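      <p>As an illustration of the two reported metrics, the following minimal sketch computes the precision at a fixed cutoff and at the top ten percent of a ranked list. It assumes that the ground truth is available as a set of relevant (popular) item identifiers and that the percentage cutoff is rounded up. The function names and the toy data are hypothetical, and the official evaluation may differ in details such as tie handling.</p>
      <preformat><![CDATA[
```python
import math


def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


def precision_at_top_fraction(ranked_items, relevant_items, fraction=0.1):
    """Precision at a cutoff equal to a fraction of the list length.

    Rounding the cutoff up is an assumption of this sketch.
    """
    k = max(1, math.ceil(len(ranked_items) * fraction))
    return precision_at_k(ranked_items, relevant_items, k)


# Toy example with hypothetical item ids (20 ranked items, 4 relevant ones).
ranking = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
           "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"]
relevant = {"a", "c", "e", "t"}

prec_at_10 = precision_at_k(ranking, relevant, 10)
prec_at_10_percent = precision_at_top_fraction(ranking, relevant)
```
]]></preformat>
      <p>In the toy example, three of the first ten items are relevant, and the top ten percent of the list covers the first two items, one of which is relevant.</p>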
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Recommender systems support users in finding the most
interesting information. Traditionally, recommender systems analyze user
profiles and provide recommendations based on similarities in
user behavior (“Collaborative Filtering” [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]). On the
World Wide Web, users can access most websites anonymously because
the sites forgo login procedures. As a result, systems lack access to
comprehensive user profiles. They rely on session-based approaches or
content-based filtering instead.
      </p>
      <p>
        Item-based recommender algorithms correlate item features with
user feedback, which is taken to indicate interest in the items.
Item features can be defined based on the item content. Typically,
text-mining approaches or semantic algorithms (which describe the
item based on ontologies) are used to obtain the item features [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ].
      </p>
      <p>
        Reduced computational costs have facilitated deep learning
approaches for recognizing patterns in images. Models built with deep
learning frameworks such as TensorFlow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or Keras [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and trained on large image
collections try to automatically identify concepts in images and
to label images with meaningful terms. The quality of the image
annotations depends on the concrete scenario and the size of the
training dataset.
      </p>
      <p>
        The use of automatically computed image features for news
recommender systems is still a topic for future research. Several
case studies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] suggest that there is a potential for developing
useful recommender systems based on visual image features. This
motivates us to implement new recommendation algorithms with
image features. The subsequent sections explain our approach and
the implementation.
      </p>
    </sec>
    <sec id="sec-4">
      <title>APPROACH</title>
      <p>We consider only the images displayed next to the news items.
We ignore additional metadata and textual features such as text
snippets or headlines. In the first step, we annotate the images. We
use Google Vision, Google’s image annotation service, to obtain
reliable labels. We annotate all images provided by the dataset in
NewsREEL Multimedia. Google Vision outputs a list of labels and
their probabilities. We use the five most likely labels.</p>
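      <p>The label selection step can be sketched as follows. The sketch assumes that the annotations of one image are already available as (label, probability) pairs; this input format, the function name, and the example values are hypothetical and do not reproduce the response format of the annotation service.</p>
      <preformat><![CDATA[
```python
def top_labels(annotations, n=5):
    """Return the n labels with the highest probability, best first."""
    ranked = sorted(annotations, key=lambda pair: pair[1], reverse=True)
    return [label for label, _probability in ranked[:n]]


# Hypothetical annotations for a single image.
annotations = [
    ("car", 0.98),
    ("vehicle", 0.97),
    ("sedan", 0.80),
    ("wheel", 0.91),
    ("sky", 0.55),
    ("luxury vehicle", 0.93),
    ("road", 0.60),
]

labels = top_labels(annotations)
```
]]></preformat>
      <p>For the example image, the two least likely labels ("road" and "sky") are discarded.</p>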
      <p>Having inspected the image annotations, we recognized the
need to process the labels further. Many labels exhibited an overly
fine-grained level of detail. Consequently, we trimmed the labels
to the first word. For instance, “football equipment and supplies”
became “football”.</p>
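      <p>A minimal sketch of this trimming step is given below. The helper names are hypothetical. Removing duplicates that arise after trimming is an assumption of the sketch and is not stated in the description above.</p>
      <preformat><![CDATA[
```python
def trim_label(label):
    """Return the first whitespace-separated word of a label."""
    words = label.split()
    return words[0] if words else ""


def trim_labels(labels):
    """Trim every label and drop duplicates, keeping the original order.

    Dropping duplicates is an assumption of this sketch.
    """
    seen = set()
    trimmed = []
    for label in labels:
        word = trim_label(label)
        if word and word not in seen:
            seen.add(word)
            trimmed.append(word)
    return trimmed


example = trim_labels(
    ["football equipment and supplies", "football player", "stadium"]
)
```
]]></preformat>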
      <p>Our approach assumes that the labels represent the key
information to estimate how exciting news items are. We use the
impression information in the training dataset (available for weeks
one to three and six to eight) to train an estimator. In other words,
we calculate the number of impressions for each label. Some labels
appear in more articles than others, so the totals alone do not
reveal readers’ preferences. Thus, we normalize the labels’ weights
by computing the average number of impressions per label. As a result,
we get three figures for each label: the total number of impressions,
the average number of impressions, and the number of articles linked
to the label. We carry out the calculations for each news portal
separately. This accounts for variations in topics among publishers.
Furthermore, the publishers vary in their number of impressions, which
could otherwise bias our features.</p>
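      <p>The per-label statistics can be illustrated with the following sketch. It assumes that each training article is represented by its portal, its (trimmed) image labels, and its number of impressions; the dictionary keys, the function name, and the toy numbers are hypothetical. The average is computed as total impressions divided by the number of articles carrying the label.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def label_statistics(articles):
    """Aggregate impressions per portal and label.

    articles: iterable of dicts with the hypothetical keys
    'portal', 'labels' and 'impressions'.
    Returns {portal: {label: (total, average, article_count)}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(lambda: defaultdict(int))
    for article in articles:
        # Count each label once per article, even if it is listed twice.
        for label in set(article["labels"]):
            totals[article["portal"]][label] += article["impressions"]
            counts[article["portal"]][label] += 1

    stats = {}
    for portal, label_totals in totals.items():
        stats[portal] = {
            label: (total, total / counts[portal][label], counts[portal][label])
            for label, total in label_totals.items()
        }
    return stats


# Toy training data with made-up impression counts.
training = [
    {"portal": "13554", "labels": ["car", "wheel"], "impressions": 300},
    {"portal": "13554", "labels": ["car", "building"], "impressions": 100},
    {"portal": "17614", "labels": ["police", "car"], "impressions": 50},
]

stats = label_statistics(training)
```
]]></preformat>
      <p>Keeping one dictionary per portal mirrors the separate calculation for each publisher: the label "car" receives different statistics on the two portals in the toy data.</p>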
      <p>We estimate an item’s popularity based on the five labels
assigned to its accompanying image. Subsequently, we sort the items
according to their scores and submit the top items to the task
organizers.</p>
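      <p>The scoring and ranking step can be sketched as follows. The exact way in which the label statistics are combined into an item score is not specified above; the sketch assumes, purely for illustration, that an item's score is the mean of the average impressions of its labels, and that items without any label known from training are left unscored. All names and numbers are hypothetical.</p>
      <preformat><![CDATA[
```python
def score_item(labels, label_average):
    """Mean of the average impressions of the labels known from training.

    Returns None when none of the labels is known (assumed handling).
    """
    known = [label_average[label] for label in labels if label in label_average]
    if not known:
        return None
    return sum(known) / len(known)


def rank_items(items, label_average):
    """Sort item ids by descending score; unscorable items are dropped."""
    scored = []
    for item_id, labels in items.items():
        score = score_item(labels, label_average)
        if score is not None:
            scored.append((item_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _score in scored]


# Hypothetical per-label averages for one portal and three test items.
label_average = {"car": 200.0, "wheel": 300.0, "building": 100.0}
items = {
    "n1": ["car", "building"],  # (200 + 100) / 2 = 150
    "n2": ["car", "wheel"],     # (200 + 300) / 2 = 250
    "n3": ["portrait"],         # label unseen in training, not scored
}

ranking = rank_items(items, label_average)
```
]]></preformat>
      <p>Other aggregation functions, such as the maximum or a sum weighted by the label probabilities, would fit the same interface.</p>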
    </sec>
    <sec id="sec-5">
      <title>EVALUATION</title>
      <p>Our model uses 94 000 news articles and 704 000 labels. The
automatic annotation failed for about 1.4% of all articles. In some
cases, Google Vision failed to provide labels. In other cases, labels
exhibited a low probability. We successfully process 96% of all items
contained in the test set. For the remaining 4%, either no label was
found or the label did not exist in the training set. We explored a
variety of hyper-parameters to optimize our estimates. For instance,
we varied the number of labels and the weeks used to train the
estimator, and validated the different settings. We
observed the best performance for the configuration with five labels
and the entire training data. We have submitted these estimates to
the task organizers. We obtained results regarding precision at ten,
precision at the top ten percent, as well as average precision at the
top ten percent.</p>
      <p>Table 1 lists the results for publishers 13554, 17614, and 39234.
The results show that the prediction quality depends strongly on
the news portal. Our approach performs best for
publisher 13554, where it achieves 70% Precision@10 and 58%
Precision@10%. Analyzing the model in detail, we find that
photos of German car brands and items comparing different cars are
popular on this domain; articles without car exterior photos (e.g.
portraits, buildings and cockpit designs) get only a small number
of impressions.</p>
      <p>
        For the domain 17614 our approach outperforms the baseline [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
but reaches a lower precision than for portal 13554. Analyzing the
annotations most important for classifying the items on website 17614,
we find that images annotated with police and transportation are
popular in this domain.
      </p>
      <p>For the domain 39234 the approach performs similarly to the
baseline. The large variance in the observed prediction performance seems
to indicate that the computed annotations are only suitable for
predicting the popularity of items in certain domains. Moreover, the
importance of images for the popularity of items may differ across
the considered news portals.</p>
    </sec>
    <sec id="sec-6">
      <title>CONCLUSION</title>
      <p>In this paper, we have presented several approaches for estimating
the interest in news items based on visual features. Results show
that our approach outperforms the baseline. Still, textual features
seem to contain more information than visual features. We have
observed varying levels of performance depending on the publisher.
For some publishers (e.g. 13554), visual features perform far above
the baseline. For other publishers, on the other hand, differences
remain small.</p>
      <p>Our approach determines fitting descriptors for images. Thereby,
we optimize the recommendations indirectly. We suppose that
readers engage with concepts related to labels. Alternatively, we could
hypothesize that readers react more strongly to the image itself
than to the concept. If this hypothesis holds, we may be better off designing
low-level image features.</p>
      <p>We plan to extend this line of research. Currently, our system
considers labels separately. We will develop a model for label
categories which will allow us to improve the preprocessing. Besides, we
will further analyze the labels for each domain. We expect manual
inspection to provide valuable clues on how to improve performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Devin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Irving</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kudlur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Levenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Monga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Warden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          .
          <article-title>Tensorflow: A system for large-scale machine learning</article-title>
          .
          <source>In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation</source>
          , OSDI '16
          , pages
          <fpage>265</fpage>
          -
          <lpage>283</lpage>
          , Berkeley, CA, USA,
          <year>2016</year>
          . USENIX Association.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chollet</surname>
          </string-name>
          et al. Keras. https://keras.io,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Corsini</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>CLEF NewsREEL 2016: Image based Recommendation</article-title>
          .
          <source>In Working Notes of the 7th International Conference of the CLEF Initiative, Evora, Portugal. CEUR Workshop Proceedings</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borchers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>An algorithmic framework for performing collaborative filtering</article-title>
          .
          <source>In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR'99)</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bell</surname>
          </string-name>
          .
          <source>Advances in Collaborative Filtering</source>
          , pages
          <fpage>145</fpage>
          -
          <lpage>186</lpage>
          . Springer US, Boston, MA,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          .
          <article-title>Semantic Movie Recommendations</article-title>
          , chapter 5
          , pages
          <fpage>133</fpage>
          -
          <lpage>154</lpage>
          . Advances in Computer Vision and Pattern Recognition. Springer International Publishing,
          <source>Smart Information Systems edition</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Ramming</surname>
          </string-name>
          .
          <article-title>NewsREEL Multimedia at MediaEval 2018: News Recommendation with Image and Text Content</article-title>
          .
          <source>In Procs. of MediaEval</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Gemmis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <article-title>Content-based recommender systems: State of the art and trends</article-title>
          .
          <source>In Recommender Systems Handbook, chapter 3</source>
          , pages
          <fpage>73</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>