News Images in MediaEval 2020
Benjamin Kille¹, Andreas Lommatzsch¹, Özlem Özgöbek²
¹ Berlin Institute of Technology, Germany
² Norwegian University of Science and Technology, Norway

benjamin.kille@dai-labor.de, andreas.lommatzsch@dai-labor.de, ozlem.ozgobek@ntnu.no

Copyright 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, 14-15 December 2020, Online

ABSTRACT
Images play an important role in online news reading behavior. They attract
users’ attention and can determine whether users attend to some content
rather than other content. News Images in MediaEval 2020 aims to gain more
insight into the interplay of news images and news consumption. Within this
task, participants get access to a large set of articles and accompanying
images. The task consists of two separate subtasks. Participants can choose
to participate in both or one of them. In the first subtask, participants
have to predict which images and articles have been paired by the publisher.
In the second subtask, participants have to estimate the likelihood that
users will click recommendations consisting of pairs of articles and images.
This paper describes the task setting in detail and draws connections to
existing research. The overview illustrates the metrics and evaluation
procedures that are used.

1    INTRODUCTION
Online news articles are multimodal: the textual content of an article is
often accompanied by an image. The image illustrates the text’s content and
attracts readers’ attention. Research in both multimedia and recommender
systems domains generally assumes a simple relationship between images and
text occurring together. For instance, image captioning [4] often assumes
that the caption quite literally describes the image’s scenery. However,
other research shows that when images accompany news articles, the
relationship becomes more complicated [7]. The MediaEval 2020 News Images
Task investigates the real-world relationship of news text and images in
more depth, in order to understand its implications for journalism and news
recommender systems.
    The task branches into two subtasks, both of which participants can
address using text-based or image-based features. The first subtask focuses
on predicting which images and articles have been paired by the publisher,
whereas the second subtask focuses on estimating the likelihood that users
will click recommendations consisting of pairs of articles and images. Given
these two subtasks, the ultimate objective of this task is to gain
additional insight into i) the relationship of news text and the images
accompanying them, ii) the connection between the image and title shown by a
recommender system to users, and iii) the tendency of users to click on the
recommended article. In particular, the main focus of this task is research
that transcends conventional work in the area of image concept detection and
that includes aspects of images that go beyond their literally depicted
content (such as quality, style, and framing).

2    BACKGROUND AND RELATED WORK
The Multimedia Evaluation Benchmark (MediaEval) investigates the
intersection of multimedia and recommender systems for the third time in
2020. In 2018, the NewsREEL Multimedia¹ task provided data from multiple
publishers concerning the interaction of users with content. In 2019, a
subtask of the Multimedia RecSys² featured similar data.

¹ http://www.multimediaeval.org/mediaeval2018/newsreelmm/
² http://www.multimediaeval.org/mediaeval2019/mmrecsys/

    Publishers employ news recommender systems to personalize their services
[5]. The emergence of ‘fake news’ has fueled the interest in news
recommender systems [9]. Research has picked up on the demand and
established venues to discuss the relation between news recommendation and
misinformation [8]. Besides the recommended content, researchers devote more
and more attention to the presentation. The research is spread across
different areas. Research on image analysis produces tools and models to
extract better features from image data. Recommender system research strives
to better understand personalization and user behavior. The subfield
dedicated to news recommendation deals with the particularities of news. For
instance, users exhibit a session-based interest as opposed to long-term
interests with regard to music, literature, or television. The subfield of
multimedia recommendation delves deeper into how content can contribute to
generating recommendations for users.

3    TASK DESCRIPTION
The task seeks to explore the relation between images and articles. We
define two subtasks, either or both of which participants can choose to take
part in.

3.1    Task 1: Image-Text Re-Matching
In practice, publishers employ staff to search for images to accompany news
articles. In many cases, the employees have access to imagery from the
event. Sometimes, they select images from a database (e.g., stock images).
As a result, readers encounter pairs of articles and images. This subtask
has removed the link between images and articles. Thus, participants receive
the articles and the images as separate lists. Participants must develop
suitable models to reconstruct the link between articles and images. These
models can help us to understand what makes an image fit an article.

3.2    Task 2: News Click Prediction
Publishers continuously monitor users’ interactions with their online
services. Web servers record clicks to provide the basis for optimization.
The servers’ logs reveal that some articles attract more views than others.
We hypothesize that images play a role in users’ complex decision making.
The evaluation data has the
click statistics removed. Participants must develop suitable models to
estimate the likelihood of clicks. These models can reveal what makes an
image appealing to users.
    Both subtasks investigate news consumption behavior. We will assess
submissions both in terms of quantitative performance (i.e., as measured by
the evaluation metrics) and in terms of qualitative insight into the
interplay between images and news consumption.

4    DATASET
Server logs, covering a three-month period, constitute the building block
for this task’s data set. The logs have been obtained from a big German
publisher. They comprise information related to articles, images, and
interactions with users. The data set represents articles with a reference,
the link to the article, the title, and a text snippet of at most 256
characters. The data set presents images as the pair of a reference and the
link to the image. As the publisher maintains the copyright for the images,
participants need to download them individually. Interactions between users
and content occur in three ways: reading, being recommended articles, and
clicking these recommendations. Reading and clicking on recommendations are
interactions triggered by the user. Generating recommendations is triggered
by the system.
    The data set comes in three batches. The first and second batch
constitute the data designated to train the models. These batches include a
mapping between articles and images as well as the interaction statistics.
The third batch splits into separate files for the articles and images. In
addition, the batch omits the interaction statistics. Besides the images,
the data includes tags derived from the images using the ImageNet model [2].
Participants can use the tags as a textual representation of the images.

Table 1: Data Set Statistics. The number of cases refers to both articles
and images. Cases with articles using the same image have been removed. The
estimated download time has been measured at the Technische Universität
Berlin with a standard laptop.

Feature          Batch 1         Batch 2          Batch 3
Time Span        January 2019    February 2019    March 2019
Purpose          Training        Training         Evaluation
No. Cases        4688            4676             4114
Download Time    45 min          45 min           40 min

    Table 1 summarizes the data set. All batches contain between 4000 and
5000 pairs of articles and images. The cases have been assigned to the
batches based on the chronology of the log files. Participants ought to be
able to obtain the images in less than three hours with a standard internet
connection.

5    EVALUATION
The third batch of the data set lacks both the link between articles and
images and the interaction data. The two subtasks challenge participants to
reestablish them. Participants can submit up to five runs for each of the
two subtasks.

5.1    Task 1: Image-Text Re-Matching
The evaluation set contains 4114 images and 4114 articles. A valid
submission pairs each image with exactly one article and each article with
exactly one image. Specifically, the participants have to submit a file with
two columns. The first column must contain the image references (i.e., iid).
The second column must contain the article references (i.e., aid). To
compare submissions, the evaluation protocol computes the proportion of
correctly matched pairs. For instance, if participants accurately matched
1000 pairs, the score will be 1000 divided by 4114, or ≈ 24.3 %.

5.2    Task 2: News Click Prediction
The training data reveals how often the system has recommended each article
and how often these recommendations have resulted in clicks. This
information remains hidden for the evaluation data. Participants must
estimate the chance of an image being clicked. Hence, a valid submission
presents two columns. The first column contains the image reference (i.e.,
iid). The second column features a numerical value corresponding to the
likelihood of a click for that particular image. The evaluation protocol
follows a three-step procedure. First, the protocol eliminates all images
that have been displayed to users fewer than 100 times. This step is
necessary for robust calculation of the evaluation scores. As a result, the
evaluation set retains 2329 images. Without this step, an image which has
been displayed twice and clicked once would obtain a high score without
meaningfully reflecting the performance of the approach. Second, the
protocol sorts all images according to their estimated likelihoods. Third,
the protocol compares the obtained ranking to the actual ranking to compute
the precision. Precision quantifies the proportion of relevant items ranked
at the top of the list. In this task, we have identified the 85 images with
the highest likelihood of being clicked. Hence, we compute precision as the
proportion of those images ranked in the top 85 images in the submission.

5.3    Run Description
Participants report results in dedicated working notes. The results ought to
highlight their reasoning, qualitative insights, and critical reflections
about what can be deduced from the quantitative results. Participants can
submit up to five runs for each subtask.

6    CONCLUSION
Understanding the complicated relation of content and presentation remains a
tough challenge. Various external factors impede drawing conclusions from
data samples. This task strives to shed light on a subject that has become
increasingly relevant: images and their strong influence on the perception
and the authenticity of news. The presence of ‘fake news’ threatens social
cohesion. Insights into the effect of content presentation have the
potential to safeguard against the erosion of trust in media. Knowing what
features to consider when detecting fake news can help publishers to prevent
its spread.

ACKNOWLEDGMENTS
We would like to thank plista for kindly providing the real-world data.
Further, we thank Martha Larson for her support.
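
The scoring rules of Sections 5.1 and 5.2 can be illustrated with a minimal Python sketch. It is not the organizers' evaluation script. The function names, the argument layout, and the handling of duplicate rows are hypothetical, and the sketch assumes that the "actual ranking" orders images by observed clicks divided by impressions, which the text does not state explicitly.

```python
from typing import Dict, Iterable, Tuple


def matching_accuracy(submitted: Iterable[Tuple[str, str]],
                      truth: Dict[str, str]) -> float:
    """Proportion of correctly matched pairs (Section 5.1).

    `submitted` holds (iid, aid) rows; `truth` maps each iid to its aid.
    """
    seen_images, seen_articles = set(), set()
    correct = 0
    for iid, aid in submitted:
        # Assumption: a row that reuses an image or an article is skipped,
        # because a valid submission uses each of them exactly once.
        if iid in seen_images or aid in seen_articles:
            continue
        seen_images.add(iid)
        seen_articles.add(aid)
        if truth.get(iid) == aid:
            correct += 1
    return correct / len(truth)


def precision_at_k(scores: Dict[str, float],
                   impressions: Dict[str, int],
                   clicks: Dict[str, int],
                   k: int = 85,
                   min_impressions: int = 100) -> float:
    """Three-step protocol of Section 5.2 under the stated assumptions."""
    # Step 1: drop images displayed fewer than `min_impressions` times.
    kept = [iid for iid, n in impressions.items() if n >= min_impressions]
    top = min(k, len(kept))
    if top == 0:
        return 0.0
    # Step 2: sort by estimated likelihood; unscored images rank last.
    predicted = sorted(kept, key=lambda i: scores.get(i, float("-inf")),
                       reverse=True)
    # Step 3: compare against the images with the highest observed
    # click rate (assumed definition of the actual ranking).
    actual = sorted(kept, key=lambda i: clicks.get(i, 0) / impressions[i],
                    reverse=True)
    relevant = set(actual[:top])
    hits = sum(1 for iid in predicted[:top] if iid in relevant)
    return hits / top
```

With the figures from Section 5.1, 1000 correct rows out of 4114 ground-truth pairs give 1000 / 4114 ≈ 0.243. The defaults `k=85` and `min_impressions=100` mirror the values named in Section 5.2.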
REFERENCES
[1] Francesco Corsini and Martha Larson. 2016. CLEF NewsREEL 2016: Image
    based Recommendation. In Working Notes of the 7th International
    Conference of the CLEF Initiative, Evora, Portugal. CEUR Workshop
    Proceedings.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei.
    2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE
    Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[3] Mouzhi Ge and Fabio Persia. 2017. A Survey of Multimedia Recommender
    Systems: Challenges and Opportunities. International Journal of Semantic
    Computing 11, 03 (2017), 411–428.
    https://doi.org/10.1142/S1793351X17500039
[4] MD. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, and Hamid
    Laga. 2019. A Comprehensive Survey of Deep Learning for Image
    Captioning. ACM Comput. Surv. 51, 6, Article 118 (Feb. 2019).
    https://doi.org/10.1145/3295748
[5] Mozhgan Karimi, Dietmar Jannach, and Michael Jugovac. 2018. News
    recommender systems – Survey and roads ahead. Information Processing &
    Management 54, 6 (2018), 1203–1227.
[6] Andreas Lommatzsch, Benjamin Kille, Frank Hopfgartner, Martha Larson,
    Torben Brodt, Jonas Seiler, and Özlem Özgöbek. 2017. CLEF 2017 NewsREEL
    Overview: A Stream-based Recommender Task for Evaluation and Education.
    In 8th International Conference of the CLEF Association: Experimental IR
    Meets Multilinguality, Multimodality, and Interaction (CLEF 2017).
    Springer.
[7] Nelleke Oostdijk, Hans van Halteren, Erkan Başar, and Martha Larson.
    2020. The Connection between the Text and Images of News Articles: New
    Insights for Multimedia Analysis. In Proceedings of The 12th Language
    Resources and Evaluation Conference. 4343–4351.
[8] Özlem Özgöbek, Benjamin Kille, Jon Atle Gulla, and Andreas Lommatzsch.
    2019. The 7th international workshop on news recommendation and
    analytics (INRA 2019). In Proceedings of the 13th ACM Conference on
    Recommender Systems. 558–559.
[9] Xinyi Zhou and Reza Zafarani. 2020. A Survey of Fake News. Comput.
    Surveys 53, 5 (Sep 2020), 1–40. https://doi.org/10.1145/3395046