=Paper= {{Paper |id=Vol-2283/MediaEval_18_paper_12 |storemode=property |title=Predicting the Interest in News Based on Image Annotations |pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_12.pdf |volume=Vol-2283 |authors=Alexandru Ciobanu,Andreas Lommatzsch,Benjamin Kille |dblpUrl=https://dblp.org/rec/conf/mediaeval/CiobanuLK18 }} ==Predicting the Interest in News Based on Image Annotations== https://ceur-ws.org/Vol-2283/MediaEval_18_paper_12.pdf
     Predicting the Interest in News Based on Image Annotations
Alexandru Ciobanu, Technische Universität Berlin, alexandru.ciobanu@campus.tu-berlin.de
Andreas Lommatzsch, DAI-Labor, TU Berlin, andreas@dai-lab.de
Benjamin Kille, DAI-Labor, TU Berlin, benjamin.kille@dai-labor.de
ABSTRACT                                                                   2   PROBLEM DESCRIPTION
In recent years, the World Wide Web has changed from text-focused          Several domains require estimating items’ relevance based on
web pages to multi-media sources featuring photos, videos, and             images. In this work, we address the task defined by NewsREEL
audio. The worldwide growth of broadband connections has facili-           Multimedia [7]. We determine the most relevant news items based
tated this trend and supports the spread of user-generated content.        on the multimedia dataset provided by the task organizers. The data
Navigating and finding interesting content has become a difficult          include news articles, images displayed next to them, and interac-
challenge. In this paper, we present approaches which use visual           tions with readers. We report the evaluation metrics precision at ten
features to predict how interesting a news article will be. This           (Prec@10) as well as precision at the top ten percent (Prec@10%).
task is part of the NewsREEL Multimedia challenge. The challenge           We consider each news portal (“domain”) independently. More de-
provides a large-scale data set of news items, images, and interac-        tails can be found in [7].
tions. We implement a recommender system which can distinguish
interesting articles from irrelevant ones based on image features.         3   RELATED WORK
We evaluate the system’s throughput and predictions. We explain
                                                                           Recommender systems support users in finding the most interesting
our insights and outline ideas to apply the gained knowledge in
                                                                           information. Traditionally, recommender systems analyze user
additional domains.
                                                                           profiles and provide recommendations based on the similarity in
                                                                           user behavior (“Collaborative Filtering” [4, 5]). On the World
KEYWORDS                                                                   Wide Web, users can access most websites anonymously because
Multimedia, News, Recommender Systems, Image Analysis                      no login is required. As a result, systems lack access to
                                                                           comprehensive user profiles. They rely on session-based approaches or
1     INTRODUCTION                                                         content-based filtering instead.
The number of documents and news articles published on the World              Item-based recommender algorithms correlate item features and
Wide Web has increased dramatically. Users struggle to find rele-          user feedback which is taken to indicate the interest in the items.
vant items. Recommender systems support users by reducing infor-           Item features can be defined based on the item content. Typically,
mation overload. They analyze users’ behavior toward items and             text-mining approaches or semantic algorithms—describing the
derive patterns to determine the most relevant items. Collaborative        item based on ontologies—are used to obtain the item features [6, 8].
filtering and content-based filtering have become the most widely             Reduced computational costs have facilitated deep learning
used algorithms for recommender systems.                                   approaches for recognizing patterns in images. Models built with
    Multimedia content (e.g. photos, videos, and audio) permeates          deep learning frameworks such as TensorFlow [1] or Keras [2] and
our everyday lives. More and more services emerge that enable us           trained on large image collections try to identify concepts in images and
to share photos and videos. Still, research on recommender systems         to label images with meaningful terms. The quality of the image
has yet to leverage multimedia content. This work contributes to           annotations depends on the concrete scenario and the size of the
this effort by focusing on the use of image data for recommending          training dataset.
news. In particular, we use methods which automatically determine             The use of automatically computed image features for news
fitting descriptors for images. The task asks us to estimate how           recommender systems is still a topic for future research. Several
interesting freshly published news articles will become. The eval-         case studies [3] suggest that there is a potential for developing
uation setting equates interestingness with popularity due to the          useful recommender systems based on visual image features. This
lack of user profiles. Hence, we focus on non-personalized recom-          motivates us to implement new recommendation algorithms with
mender systems. We hypothesize that images play a decisive role            image features. The subsequent sections explain our approach and
as they capture users’ attention. Thus, we use image annotations           the implementation.
to implement an estimator.
    The remainder of the paper is structured as follows: In Sec. 2         4   APPROACH
we recapitulate the scenario. Sec. 3 discusses related work. We            We consider only the images displayed next to the news items.
present the approach in Sec. 4. Subsequently, Sec. 5 illustrates           We ignore additional meta-data or textual features such as text
the evaluation results. Finally, Sec. 6 details our findings and gives     snippets or headlines. In the first step, we annotate the images. We
an outlook to future research.                                             use Google Vision—Google’s Image Annotation Service—to ensure
                                                                           reliable labels. We annotate all images provided by the dataset in
MediaEval ’18, 29-31 October 2018, Sophia Antipolis, France
© 2018 Copyright held by the owner/author(s).
                                                                           NewsREEL Multimedia. Google Vision outputs a list of labels and
                                                                           their probabilities. We use the five most likely labels.
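
As a rough illustration of the label selection step, the following Python sketch keeps the five most likely labels of one image. The input format (a list of label and probability pairs) and the function name `top_labels` are assumptions made for this example; the sketch does not call the Google Vision service itself.

```python
from typing import List, Tuple


def top_labels(annotations: List[Tuple[str, float]], k: int = 5) -> List[str]:
    """Return the k labels with the highest probability.

    `annotations` is an assumed format: (label, probability) pairs
    as they might be collected from an image annotation service.
    """
    # sorted() is stable, so labels with equal probability keep their input order.
    ranked = sorted(annotations, key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical annotation output for a single image.
example = [
    ("car", 0.98),
    ("vehicle", 0.97),
    ("tree", 0.40),
    ("luxury vehicle", 0.91),
    ("sky", 0.55),
    ("automotive design", 0.89),
]
selected = top_labels(example)
```

Here `selected` holds the five labels with the highest probabilities; the label "tree" is dropped because it is the least likely one.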


Table 1: Evaluation results for our approach grouped by news portals and weeks.
Columns refer to precision at ten (P@10), precision at the top ten percent
(P@10%), and average precision at the top ten percent (AP@10%).

            domain 13554            domain 39234            domain 17614
Week    P@10   P@10%  AP@10%    P@10   P@10%  AP@10%    P@10   P@10%  AP@10%
04      0.70   0.63   0.58      0.10   0.14   0.11      0.10   0.25   0.21
10      0.70   0.56   0.51      0.00   0.14   0.10      0.00   0.28   0.19
11      0.70   0.58   0.51      0.00   0.15   0.10      0.00   0.27   0.19
12      0.70   0.54   0.52      0.10   0.17   0.11      0.00   0.27   0.19
avg.    0.70   0.58   0.53      0.05   0.15   0.11      0.03   0.27   0.20

   Having inspected the image annotations, we recognized the
need to process the labels further. Many labels exhibited an overly
fine-grained level of detail. Consequently, we have trimmed the labels
to the first word. For instance, “football equipment and supplies” has
become “football”.
   Our approach assumes that the labels represent the key information
to estimate how exciting news items are. We use the impression
information in the training dataset (available for weeks one to three
and six to eight) to train an estimator. In other words, we calculate
the number of impressions for each label. Some labels appear in more
articles than others. Still, readers’ preferences remain uncertain.
Thus, we normalize the labels’ weights obtaining
the average impression per label. As a result, we get three figures:
the total number of impressions, the average number of
impressions, and the number of articles linked to the label. We          to indicate that the computed annotations are only suitable for
carry out the calculations for each news portal separately. This         predicting the popularity of items in certain domains. Moreover, the
accounts for variations in topics among publishers. Furthermore,         importance of images for the popularity of items may differ across
the publishers vary in the number of impressions, which could            the considered news portals.
bias our features.
   We estimate an item’s popularity based on the five labels as-         6    CONCLUSION
signed to its accompanying image. Subsequently, we sort the items
                                                                         In this paper, we have presented several approaches for estimating
according to their scores and submit the top items to the task orga-
                                                                         the interest in news items based on visual features. Results show
nizers.
                                                                         that our approach outperforms the baseline. Still, textual features
                                                                         seem to contain more information than visual features. We have
                                                                         observed varying levels of performance depending on the publisher.
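
The label statistics and the scoring step of Section 4 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the submitted implementation: the data layout, all function names, the removal of duplicate trimmed labels within one article, and the choice to score an item by the mean of its labels' average impressions are assumptions, since the text does not specify how the five label weights are combined. The statistics would be computed once per news portal.

```python
from collections import defaultdict


def trim(label):
    """Reduce a label to its first word, e.g. 'football equipment' -> 'football'."""
    parts = label.split()
    return parts[0] if parts else ""


def trimmed(labels):
    """Trim all labels, drop empty ones, and remove duplicates (assumption)."""
    return list(dict.fromkeys(t for t in map(trim, labels) if t))


def train(articles):
    """Compute per-label statistics from (labels, impressions) pairs.

    Returns {label: (total impressions, average impressions, article count)}.
    """
    total = defaultdict(int)
    count = defaultdict(int)
    for labels, impressions in articles:
        for label in trimmed(labels):
            total[label] += impressions
            count[label] += 1
    return {l: (total[l], total[l] / count[l], count[l]) for l in total}


def score(labels, stats):
    """Mean of the average impressions of the known labels (assumption).

    Returns None if no label was seen during training.
    """
    known = [stats[l][1] for l in trimmed(labels) if l in stats]
    return sum(known) / len(known) if known else None


def rank(items, stats):
    """Sort (item_id, labels) pairs by descending score, skipping unscored items."""
    scored = [(item_id, score(labels, stats)) for item_id, labels in items]
    scored = [(i, s) for i, s in scored if s is not None]
    return [i for i, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


# Hypothetical training data for one news portal.
stats = train([
    (["car", "car door"], 100),
    (["car", "police officer"], 50),
])
ranking = rank(
    [("a", ["police car"]), ("b", ["car"]), ("c", ["tree"])],
    stats,
)
```

In this toy example, item "c" receives no score because its only label never occurs in the training data, which mirrors the case of test items that could not be processed.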
5   EVALUATION                                                           For some publishers—e.g. 13554—visual features perform far above
Our model uses 94 000 news articles and 704 000 labels. The au-          the baseline. For other publishers, on the other hand, differences
tomatic annotation failed for about 1.4% of all articles. In some        remain small.
cases, Google Vision failed to provide labels. In other cases, labels       Our approach determines fitting descriptors for images. Thereby,
exhibited a low probability. We successfully process 96% of all items    we optimize the recommendations indirectly. We suppose that read-
contained in the test set. For the remaining 4% either no label was      ers engage with concepts related to labels. Alternatively, we could
found or the label did not exist in the training set. We explored a      hypothesize that readers react more strongly to the image rather
variety of hyper-parameters to optimize our estimates. For instance,     than the concept. If this hypothesis holds, we may be better off designing
we varied the number of labels and the weeks used to train the           low-level image features.
estimator. We have validated different settings. Eventually, we ob-         We plan to extend this line of research. Currently, our system
served the best performance for the configuration with five labels       considers labels separately. We will develop a model for label cate-
and the entire training data. We have submitted these estimates to       gories which will allow us to improve the preprocessing. Besides, we
the task organizers. We obtained results regarding precision at ten,     will further analyze the labels for each domain. We expect manual
precision at the top ten percent, as well as average precision at the    inspection to provide valuable clues on how to improve performance.
top ten percent.
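
To make the three reported metrics concrete, the following Python sketch computes precision at a cutoff and average precision at a cutoff for a ranked list. The official definitions are given by the task organizers [7]; here the set of relevant items is taken as given, the top-ten-percent cutoff is assumed to be one tenth of the ranking length (rounded down, at least one), and average precision is normalized by the number of hits within the cutoff, which is one common variant. All names are illustrative.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the first k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranking[:k] if item in relevant) / k


def average_precision_at_k(ranking, relevant, k):
    """Mean of the precision values at each rank (up to k) holding a relevant item.

    Normalized by the number of hits within the cutoff (assumed variant).
    """
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranking[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def top_ten_percent(n):
    """Cutoff for the top ten percent of n items (assumed rounding)."""
    return max(1, n // 10)


# Hypothetical ranking of 20 items, three of which are relevant.
ranking = [f"item{i}" for i in range(20)]
relevant = {"item0", "item2", "item15"}

k10 = top_ten_percent(len(ranking))                # 2
p_at_10 = precision_at_k(ranking, relevant, 10)    # two hits among ten items
p_at_10pct = precision_at_k(ranking, relevant, k10)
ap_at_10pct = average_precision_at_k(ranking, relevant, k10)
```

With these definitions, a P@10 of 0.70 as reported for publisher 13554 corresponds to seven relevant items among the ten highest-ranked ones.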
   Table 1 lists the results for publishers 13554, 17614, and 39234.     REFERENCES
The results show that the prediction quality highly depends on           [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat,
the news portal. Our approach performs very successfully for pub-            G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray,
                                                                             B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng.
lisher 13554. Our method achieves 70% Precision@10, and 58%                  Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th
Precision@10%. Analyzing the model in detail, we find that pho-              USENIX Conference on Operating Systems Design and Implementation, OSDI’16,
                                                                             pages 265–283, Berkeley, CA, USA, 2016. USENIX Association.
tos of German car brands and items comparing different cars are          [2] F. Chollet et al. Keras. https://keras.io, 2015.
popular on this domain; articles without car exterior photos (e.g.       [3] F. Corsini and M. Larson. CLEF NewsREEL 2016: Image based Recommendation.
portraits, buildings and cockpit designs) get only a small number            In Working Notes of the 7th International Conference of the CLEF Initiative, Evora,
                                                                             Portugal. CEUR Workshop Proceedings, 2016.
of impressions.                                                          [4] J. Herlocker, J. Konstan, A. Borchers, and J. Riedl. An algorithmic framework
   For the domain 17614 our approach outperforms the baseline [7],           for performing collaborative filtering. In Proceedings of the 22nd International
but reaches a lower precision than for portal 13554. Analyzing the           Conference on Research and Development in Information Retrieval (SIGIR’99), 1999.
                                                                         [5] Y. Koren and R. Bell. Advances in Collaborative Filtering, pages 145–186. Springer
annotations most important for classifying the items on website 17614,       US, Boston, MA, 2011.
we find that images annotated with police and transportation are         [6] A. Lommatzsch. Semantic Movie Recommendations, chapter 5, pages 133–154.
                                                                             Advances in Computer Vision and Pattern Recognition. Springer International
popular in this domain.                                                      Publishing, Smart Information Systems edition, 2015.
   For the domain 39234 the approach performs similarly to the           [7] A. Lommatzsch, B. Kille, M. Larson, F. Hopfgartner, and L. Ramming. NewsREEL
baseline. The large variance in the observed prediction performance seems    Multimedia at MediaEval 2018: News Recommendation with Image and Text
                                                                             Content. In Procs. of MediaEval 2018.


[8] P. Lops, M. Gemmis, and G. Semeraro. Content-based recommender systems: State
    of the art and trends. In Recommender Systems Handbook, chapter 3, pages 73–105.
    Springer, 2011.