News Images in MediaEval 2020

Benjamin Kille¹, Andreas Lommatzsch¹, Özlem Özgöbek²
¹ Berlin Institute of Technology, Germany
² Norwegian University of Science and Technology, Norway
benjamin.kille@dai-labor.de, andreas.lommatzsch@dai-labor.de, ozlem.ozgobek@ntnu.no

ABSTRACT

Images play an important role in online news reading behavior. They attract users’ attention and can determine whether users pay attention to some content over other content. News Images in MediaEval 2020 aims to gain more insight into the interplay of news images and news consumption. Within this task, participants get access to a large set of articles and accompanying images. The task consists of two separate subtasks. Participants can choose to participate in one or both of them. In the first subtask, participants have to predict which images and articles have been paired by the publisher. In the second subtask, participants have to estimate the likelihood that users will click recommendations consisting of pairs of articles and images. This paper describes the task setting in detail and draws connections to existing research. The overview illustrates the metrics and evaluation procedures that are used.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, 14-15 December 2020, Online

1 INTRODUCTION

Online news articles are multimodal: the textual content of an article is often accompanied by an image. The image illustrates the text’s content and attracts readers’ attention. Research in both the multimedia and the recommender systems domains generally assumes a simple relationship between images and text occurring together. For instance, image captioning [4] often assumes that the caption quite literally describes the image’s scenery. However, other research shows that when images accompany news articles, the relationship becomes more complicated [7]. The MediaEval 2020 News Images Task investigates the real-world relationship of news text and images in more depth, in order to understand its implications for journalism and news recommender systems.

The task branches into two subtasks, both of which participants can address using text-based or image-based features. The first subtask focuses on predicting which images and articles have been paired by the publisher, whereas the second subtask focuses on estimating the likelihood that users will click recommendations consisting of pairs of articles and images. Given these two subtasks, the ultimate objective of this task is to gain additional insight into (i) the relationship between news text and the images accompanying it, (ii) the connection between the image and title shown by a recommender system to users, and (iii) the tendency of users to click on the recommended article. In particular, the main focus of this task is research that transcends conventional work in the area of image concept detection and that includes aspects of images that go beyond their literally depicted content (such as quality, style, and framing).

2 BACKGROUND AND RELATED WORK

The Multimedia Evaluation Benchmark (MediaEval) investigates the intersection of multimedia and recommender systems for the third time in 2020. In 2018, the NewsREEL Multimedia task¹ provided data from multiple publishers concerning the interaction of users with content. In 2019, a subtask of the Multimedia RecSys task² featured similar data.

¹ http://www.multimediaeval.org/mediaeval2018/newsreelmm/
² http://www.multimediaeval.org/mediaeval2019/mmrecsys/

Publishers employ news recommender systems to personalize their services [5]. The emergence of ‘fake news’ has fueled the interest in news recommender systems [9]. Research has picked up on the demand and established venues to discuss the relation of news recommendation and misinformation [8]. Besides the recommended content, researchers devote more and more attention to its presentation. This research spans several areas. Research on image analysis produces tools and models to extract better features from image data. Recommender system research strives to better understand personalization and user behavior. The subfield dedicated to news recommendation deals with the particularities of news. For instance, users exhibit session-based interests, as opposed to the long-term interests observed for music, literature, or television. The subfield of multimedia recommendation delves deeper into how content can contribute to generating recommendations for users.

3 TASK DESCRIPTION

The task seeks to explore the relation between images and articles. We define two subtasks, either or both of which participants can choose to take part in.

3.1 Task 1: Image-Text Re-Matching

In practice, publishers employ staff to search for images to accompany news articles. In many cases, the employees have access to imagery from the event. Sometimes, they select images from a database (e.g., stock images). As a result, readers encounter pairs of articles and images. For this subtask, the link between images and articles has been removed. Thus, participants receive a list of articles and a separate list of images. Participants must develop suitable models to reconstruct the link between articles and images. These models can help us understand what makes an image fit an article.

3.2 Task 2: News Click Prediction

Publishers continuously monitor users’ interactions with their online services. Web servers record clicks to provide the basis for optimization. The servers’ logs reveal that some articles attract more views than others. We hypothesize that images play a role in users’ complex decision making. The click statistics have been removed from the evaluation data. Participants must develop suitable models to estimate the likelihood of clicks. These models can reveal what makes an image appealing to users.

Both subtasks investigate news consumption behavior. We will assess submissions both in terms of quantitative performance (i.e., as measured by the evaluation metrics) and in terms of qualitative insight into the interplay between images and news consumption.

4 DATASET

Server logs covering a three-month period constitute the building block for this task’s data set. The logs have been obtained from a large German publisher. They comprise information related to articles, images, and interactions with users. The data set represents each article by a reference, the link to the article, the title, and a text snippet of at most 256 characters. The data set represents each image as a pair consisting of a reference and the link to the image. As the publisher retains the copyright for the images, participants need to download them individually. Interactions between users and content occur in three ways: reading, being recommended articles, and clicking these recommendations. Reading and clicking on recommendations are interactions triggered by the user. Generating recommendations is triggered by the system.

The data set comes in three batches. The first and second batches constitute the data designated to train the models. These batches include a mapping between articles and images as well as the interaction statistics. The third batch is split into separate files for the articles and the images, and it omits the interaction statistics. In addition to the images, the data includes tags derived from the images using the ImageNet model [2]. Participants can use the tags as a textual representation of the images.

Table 1: Data Set Statistics. The number of cases refers to both articles and images. Cases with articles using the same image have been removed. The estimated download time has been measured at the Technische Universität Berlin with a standard laptop.

Feature         Batch 1        Batch 2         Batch 3
Time Span       January 2019   February 2019   March 2019
Purpose         Training       Training        Evaluation
No. Cases       4688           4676            4114
Download Time   45 min         45 min          40 min

Table 1 summarizes the data set. All batches contain between 4000 and 5000 pairs of articles and images. The cases have been assigned to the batches based on the chronology of the log files. Participants ought to be able to obtain the images in less than three hours with a standard internet connection.

5 EVALUATION

The third batch of the data set lacks both the link between articles and images and the interaction data. The two subtasks challenge participants to reestablish them. Participants can submit up to five runs for each of the two subtasks.

5.1 Task 1: Image-Text Re-Matching

The evaluation set contains 4114 images and 4114 articles. A valid submission pairs exactly one image with exactly one article. Specifically, the participants have to submit a file with two columns. The first column must contain the image references (i.e., iid). The second column must contain the article references (i.e., aid). To compare submissions, the evaluation protocol computes the proportion of correctly matched pairs. For instance, if participants accurately matched 1000 pairs, the score will be 1000 divided by 4114, or ≈ 24.3 %.

5.2 Task 2: News Click Prediction

The training data reveals how often the system has recommended each article and how often these recommendations have resulted in clicks. This information remains hidden for the evaluation data. Participants must estimate the chance of an image being clicked. Hence, a valid submission presents two columns. The first column contains the image reference (i.e., iid). The second column features a numerical value corresponding to the likelihood of a click for that particular image.

The evaluation protocol follows a three-step procedure. First, the protocol eliminates all images that had not been displayed to users at least 100 times. This step is necessary for a robust calculation of the evaluation scores: without it, an image that has been displayed twice and clicked once would obtain a high score without meaningfully reflecting the performance of the approach. As a result, the evaluation set retains 2329 images. Second, the protocol sorts all images according to their estimated likelihoods. Third, the protocol compares the obtained ranking to the actual ranking to compute the precision. Precision quantifies the proportion of relevant items ranked at the top of the list. In this task, we have identified the 85 images with the highest likelihood of being clicked. Hence, we compute precision as the proportion of those 85 images that appear among the top 85 images of the submitted ranking.

5.3 Run Description

Participants report results in dedicated working notes. The working notes ought to highlight their reasoning, qualitative insights, and critical reflections about what can be deduced from the quantitative results. Participants can submit up to five runs for each subtask.

6 CONCLUSION

Understanding the complicated relation between content and presentation remains a tough challenge. Various external factors impede drawing conclusions from data samples. This task strives to shed light on a subject that has become increasingly relevant: images and their strong influence on the perception and the authenticity of news. The presence of ‘fake news’ threatens social cohesion. Insights into the effect of content presentation have the potential to safeguard against the erosion of trust in media. Knowing what features to consider when detecting fake news can help publishers to prevent its spread.

ACKNOWLEDGMENTS

We would like to thank plista for kindly providing the real-world data. Further, we thank Martha Larson for her support.
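
The two evaluation protocols described in Sections 5.1 and 5.2 can be sketched in a few lines of Python. This is an illustrative sketch only, not the organizers' evaluation script. The function names, the dictionary-based input format, the tie-breaking by input order, and the use of the observed click-through rate (clicks divided by impressions) as the "actual" likelihood of a click are all assumptions made for this example.

```python
from typing import Mapping


def matching_accuracy(submitted: Mapping[str, str],
                      truth: Mapping[str, str]) -> float:
    """Task 1 (sketch): proportion of images (iid) paired with the right article (aid).

    Both arguments map iid -> aid. The input format is an assumption.
    """
    if not truth:
        raise ValueError("ground truth must not be empty")
    # A valid submission uses each article at most once.
    if len(set(submitted.values())) != len(submitted):
        raise ValueError("each article may be paired with only one image")
    correct = sum(1 for iid, aid in truth.items() if submitted.get(iid) == aid)
    return correct / len(truth)


def precision_at_k(scores: Mapping[str, float],
                   impressions: Mapping[str, int],
                   clicks: Mapping[str, int],
                   k: int = 85,
                   min_impressions: int = 100) -> float:
    """Task 2 (sketch): precision over the top-k images of the submitted ranking.

    scores maps iid -> estimated click likelihood (the submission);
    impressions and clicks map iid -> observed counts (hidden ground truth).
    """
    # Step 1: drop images displayed fewer than min_impressions times.
    eligible = [iid for iid, n in impressions.items()
                if n >= min_impressions and n > 0]
    if not eligible:
        raise ValueError("no image meets the impression threshold")
    k = min(k, len(eligible))

    # Assumption: the "actual" ranking is by observed click-through rate.
    ctr = {iid: clicks.get(iid, 0) / impressions[iid] for iid in eligible}
    relevant = set(sorted(eligible, key=lambda i: ctr[i], reverse=True)[:k])

    # Step 2: sort by estimated likelihood; unscored images rank last.
    ranked = sorted(eligible,
                    key=lambda i: scores.get(i, float("-inf")),
                    reverse=True)

    # Step 3: share of the top-k submitted images that are truly relevant.
    hits = sum(1 for iid in ranked[:k] if iid in relevant)
    return hits / k
```

With the figures given in Section 5.2, `precision_at_k` would be called with the defaults `k=85` and `min_impressions=100` on the 2329 retained images. For Section 5.1, `matching_accuracy` returns 1000/4114 when exactly 1000 of the 4114 pairs are correct.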
REFERENCES

[1] Francesco Corsini and Martha Larson. 2016. CLEF NewsREEL 2016: Image based Recommendation. In Working Notes of the 7th International Conference of the CLEF Initiative, Evora, Portugal. CEUR Workshop Proceedings.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[3] Mouzhi Ge and Fabio Persia. 2017. A Survey of Multimedia Recommender Systems: Challenges and Opportunities. International Journal of Semantic Computing 11, 03 (2017), 411–428. https://doi.org/10.1142/S1793351X17500039
[4] MD. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, and Hamid Laga. 2019. A Comprehensive Survey of Deep Learning for Image Captioning. ACM Comput. Surv. 51, 6, Article 118 (Feb. 2019). https://doi.org/10.1145/3295748
[5] Mozhgan Karimi, Dietmar Jannach, and Michael Jugovac. 2018. News recommender systems–Survey and roads ahead. Information Processing & Management 54, 6 (2018), 1203–1227.
[6] Andreas Lommatzsch, Benjamin Kille, Frank Hopfgartner, Martha Larson, Torben Brodt, Jonas Seiler, and Özlem Özgöbek. 2017. CLEF 2017 NewsREEL Overview: A Stream-based Recommender Task for Evaluation and Education. In 8th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2017). Springer.
[7] Nelleke Oostdijk, Hans van Halteren, Erkan Başar, and Martha Larson. 2020. The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis. In Proceedings of The 12th Language Resources and Evaluation Conference. 4343–4351.
[8] Özlem Özgöbek, Benjamin Kille, Jon Atle Gulla, and Andreas Lommatzsch. 2019. The 7th international workshop on news recommendation and analytics (INRA 2019). In Proceedings of the 13th ACM Conference on Recommender Systems. 558–559.
[9] Xinyi Zhou and Reza Zafarani. 2020. A Survey of Fake News. ACM Comput. Surv. 53, 5 (Sep 2020), 1–40. https://doi.org/10.1145/3395046