<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Kille</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Lommatzsch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Özlem Özgöbek</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehdi Elahi</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Berlin Institute of Technology</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kristiania University College</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Norwegian University of Science and Technology</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bergen</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Images play a crucial role in online news perception. Images catch users' attention and strongly affect how they interpret the news. Images serve different roles, e.g., visualizing a scene discussed in the text, highlighting certain aspects with a stock photo, or showing archived footage of a relevant person or organization. News Images, as part of MediaEval 2022, aims to gain more insight into the interplay of images and texts in different news domains. Participants access a large set of articles and accompanying images collected from general online news portals and an RSS-based news stream. In contrast to NewsImages 2021, the data come from different news sources. Thus, this year's task facilitates comparing image usage on different portals and analyzing transfer learning strategies. This paper describes the NewsImages task, explains the dataset and evaluation metrics, and draws connections to existing research.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Publishers present news as a multimodal mix: images, video clips, and soundbites accompany
the textual content. Imagery attracts readers’ attention and illustrates aspects of the textual body.
Research both on multimedia and personalization has previously assumed a simple relationship
between the modalities. For instance, image captioning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] models the task as quite literally
describing the image’s scenery. In contrast, Oostdijk et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] find that the relationship is
much more complicated. The News Images task at MediaEval 2022 investigates the relationship
with real-world data seeking to better understand its implications for journalism and news
personalization.
      </p>
      <p>Participants have access to news from three different sources: publishers’ websites, RSS feeds,
and social media. For each kind of source, the data comprise a training portion and an evaluation
set for which the link between text and image has been removed. Participants have to re-match
the articles and images. Thereby, the task addresses questions such as: What makes an image
appealing as a depiction of news events? How do editors select images? What do readers find
most relevant in news images? News Images seeks to go beyond conventional research on image
concept detection.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>
        The Multimedia Evaluation Benchmark task NewsImages investigates aspects of
multimedia content in the news domain for the fifth time in 2022. From 2018 to 2020, the
NewsREEL Multimedia task [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4, 5</xref>
        ] focused on predicting the popularity of news items based on
multimedia content. In 2021, the focus shifted to understanding the connection between texts
and images [6].
      </p>
      <p>The annotation and understanding of images has been an active research topic in recent
years. Several datasets (e.g., Flickr30k [7], MS COCO [8]) provide textual annotations of
images. These datasets are designed for learning methods for automated image labeling and
image retrieval. They focus on describing the image content but consider neither the
role of the image in its context nor the relation between the image and the surrounding text. The
NewsImages task focuses on the relation between images and news articles. NewsImages
is also strongly related to the news recommendation problem. CLEF NewsREEL researched
methods for identifying trends in news streams and for providing news recommendations.
News recommender systems extract features from news articles and user behavior to
compute highly relevant recommendations. Most existing recommender systems only consider
the textual content, making use of traditional Information Retrieval methods and advanced
language-model-based approaches. Text-based recommender systems can efficiently provide
good recommendations. In addition, recent years have seen an upward trend in research
on multimodal recommender systems. For instance, Truong and Lauw [9] investigate how
to leverage multimodal user feedback, Salah et al. [10] provide a framework for multimodal
recommender systems, and Oramas et al. [11] examine the use of multimodal data for music
recommendation. For a comprehensive review, we refer to [12]. The News Images task supports
research toward multimodality.</p>
      <p>Strongly related to news recommendation is the detection of ‘fake news’ [13]. ‘Fake news’
often puts data or images in a misleading context in order to attract users or to achieve an
intended perception. Thus, a fine-grained analysis of images and text helps to better
understand this phenomenon. The NewsImages task allows researchers to study the use of images
in news articles with respect to different news domains.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>News Images explores the relation between news texts and images. Concretely, the task’s data
set considers three news channels: publishers’ news portals, RSS feeds, and social media. For
each channel, participants obtain a training set that includes the link between text and image.
In addition, they receive an evaluation set with the link removed. Participants’ task is to develop
and evaluate ways to re-match news articles and images. The set of images contains some
instances that could be related to more than one article. For instance, an editor may use a stock
photo that captures an event only conceptually. Thus, participants can submit an ordered list
of image candidates. The evaluation protocol checks the position of the actually linked image
and rewards submissions that rank it early.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>We provide a dataset comprising three different news sources: online news portals, Twitter,
and an RSS news feed. The heterogeneity of the data sources allows us to analyze the relation
of images and texts in three different yet similar news domains.</p>
      <p>Creating the dataset followed three steps: (1) We crawled news articles from the source. The
crawling stretched from March to August 2022. (2) We applied two filters to guarantee data
quality. We removed articles with more than 30% non-ASCII characters as well as articles with
fewer than 20 characters. (3) We applied a set of rules to determine the best image in cases with
multiple images. An expressive image has a reasonable size, sufficient entropy in the color
spectrum (which helps us to filter out logos), contains little text if any, and ideally is unique.
After the automatic filtering, an annotator checked the output and ensured that
inadequate images and logos had been excluded. The filtering succeeded for the analyzed RSS
feed. Still, the high variety of images on major news portals necessitated manual post-filtering.</p>
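      <p>As an illustration, the two text filters and the color-entropy criterion described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline actually used to build the dataset: all function names are hypothetical, the entropy is computed over a plain list of pixel values, and the entropy threshold is an invented placeholder. Only the 30% non-ASCII limit and the 20-character minimum come from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter

MAX_NON_ASCII_SHARE = 0.30  # from the text: more than 30% non-ASCII is removed
MIN_TEXT_LENGTH = 20        # from the text: fewer than 20 characters is removed
MIN_COLOR_ENTROPY = 2.0     # hypothetical threshold in bits, not from the paper


def non_ascii_share(text):
    """Return the fraction of characters outside the ASCII range."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)


def keep_article(text):
    """Apply the two text filters: minimum length and non-ASCII share."""
    if len(text) < MIN_TEXT_LENGTH:
        return False
    return non_ascii_share(text) <= MAX_NON_ASCII_SHARE


def color_entropy(pixels):
    """Shannon entropy (in bits) of the distribution of pixel values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_like_logo(pixels):
    """Flag images with very few distinct colors (assumed heuristic)."""
    return color_entropy(pixels) < MIN_COLOR_ENTROPY
```
]]></preformat>
      <p>In such a sketch, an image consisting of a single flat color has zero entropy and would be flagged, whereas a photograph with many distinct colors would pass. The remaining criteria (size, amount of text, uniqueness) and the manual check are not covered here.</p>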
      <p>The data contains information related to articles and images. Articles’ metadata include the
URL, title, and a text snippet. The image data consist of a URL and an image hashcode. We do
not provide image captions and ask participants not to make use of the image filename.</p>
      <p>The data set comprises three batches, each consisting of a training and a test set. Each test
set provides 1,500 elements to simplify the comparison of the results obtained for the three
batches. Table 1 illustrates the data set.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>We want to better understand the interplay between news texts and images. As a proxy, we
ask participants to re-match texts and images.</p>
      <p>Participants obtain a set of unlinked news articles and images. For each article, they have
to provide a list of images sorted by match likelihood. We cap the lists at 100 images to simplify
computation and to account for the expectation that editors will not spend time browsing overly long
lists of images. Each of the three evaluation sets contains 1,500 articles and images. Consequently,
participants provide a tab-separated file with one column for the article reference followed by
100 columns with image identifiers.</p>
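      <p>The submission layout described above (one article reference followed by 100 image identifiers, separated by tabs) could be produced along the following lines. This is only a sketch: the function name and the identifiers are made up for illustration, and the text does not specify further details such as a header row or file naming.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io

LIST_LENGTH = 100  # from the text: 100 columns with image identifiers


def write_run(predictions, handle):
    """Write one row per article: the article id, then its 100 ranked image ids."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for article_id, ranked_images in predictions.items():
        if len(ranked_images) != LIST_LENGTH:
            raise ValueError(f"{article_id}: expected {LIST_LENGTH} images")
        writer.writerow([article_id, *ranked_images])


# Toy example with made-up identifiers, written to an in-memory buffer.
predictions = {"article-1": [f"image-{i}" for i in range(LIST_LENGTH)]}
buffer = io.StringIO()
write_run(predictions, buffer)
```
]]></preformat>
      <p>Replacing the in-memory buffer with a file opened in text mode would yield a file with one line per article.</p>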
      <sec id="sec-5-1">
        <title>5.1. Evaluation Metric</title>
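        <p>We use Mean Reciprocal Rank (MRR) [14] as the main evaluation criterion. MRR is defined as
          <disp-formula><tex-math><![CDATA[\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}(i)}]]></tex-math></disp-formula>
where <italic>N</italic> denotes the number of articles in the evaluation set and rank(<italic>i</italic>) returns the rank at which the matching image for article <italic>i</italic> was listed, or a very large number
such as 10<sup>12</sup> if the list excludes the matching image. The earlier the matching image appears on
average, the higher the score. The MRR strongly favors the top of the list and penalizes finding
a match further down.</p>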
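        <p>For concreteness, MRR with the large fallback rank for missing matches could be computed as in the following sketch. The function names and the dictionary-based data layout are assumptions made for illustration; this is not the official evaluation script.</p>
        <preformat preformat-type="code"><![CDATA[
```python
MISSING_RANK = 10**12  # from the text: a very large rank if the match is absent


def rank_of_match(ranked_images, true_image):
    """Return the 1-based position of the linked image, or MISSING_RANK."""
    try:
        return ranked_images.index(true_image) + 1
    except ValueError:
        return MISSING_RANK


def mean_reciprocal_rank(predictions, ground_truth):
    """Average 1/rank over all articles listed in the ground truth."""
    total = 0.0
    for article_id, true_image in ground_truth.items():
        ranked = predictions.get(article_id, [])
        total += 1.0 / rank_of_match(ranked, true_image)
    return total / len(ground_truth)
```
]]></preformat>
        <p>With three articles whose linked images appear at rank 1, at rank 2, and not at all, the reciprocal ranks are 1, 0.5, and practically 0, giving an MRR of about 0.5.</p>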
        <p>In addition, we compute the Average Precision (AP) at ranks <italic>k</italic> for <italic>k</italic> ∈ {1, 5, 10, 20, 50, 100}.
AP lets us investigate whether the predictions are more accurate in some ranges.</p>
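        <p>The text does not spell out how the rank-based precision is computed. One plausible reading, assumed here and not confirmed by the source, is the share of articles whose linked image appears within the first k positions of the submitted list. Under that assumption, and with hypothetical function names, the computation might look like this:</p>
        <preformat preformat-type="code"><![CDATA[
```python
CUTOFFS = (1, 5, 10, 20, 50, 100)  # from the text


def hit_share_at_k(predictions, ground_truth, k):
    """Share of articles whose linked image is among the top-k candidates.

    Assumed interpretation of the rank-based precision; the paper does not
    give the exact formula.
    """
    hits = sum(
        1
        for article_id, true_image in ground_truth.items()
        if true_image in predictions.get(article_id, [])[:k]
    )
    return hits / len(ground_truth)


def hit_share_profile(predictions, ground_truth):
    """Evaluate all cutoffs at once and return a {k: score} mapping."""
    return {k: hit_share_at_k(predictions, ground_truth, k) for k in CUTOFFS}
```
]]></preformat>
        <p>Comparing the scores across cutoffs shows whether a method places matches mostly at the very top or only somewhere within the first 100 candidates.</p>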
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Run Description</title>
        <p>Participants’ working notes describe their ideas. We encourage participants to explore
different ideas to help us better understand the interplay between texts and images in news.
Consequently, participants can submit up to five runs for each of the three test sets. Each run
consists of predictions for each of the three test sets. We further encourage participants to compare
the results of different runs and analyze the findings with respect to quality, computational
complexity, and resources used. The discussion of the results should take into consideration
the dataset’s particularities and explain how findings translate to other scenarios. Finally,
participants should describe what they have learned and consider how their insights can
help to advance the research.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Understanding the relation between text content and images in news remains a tough challenge.
Images have different roles in news, such as attracting users, highlighting specific aspects of a
message, or providing additional context for a news article. Depending on the concrete news
event, the image may depict the news event, show an older, similar scene, or depict persons or
objects related to the news text. Understanding user preferences in consuming images and
publishers’ policies for selecting images helps to study the induced news perception and
intentions.</p>
      <p>Consequently, understanding the relation between news texts and images can give publishers
a competitive advantage, help to detect fake news and clickbait, and pave the way for
the personalization of news.</p>
      <p>Acknowledgements: We would like to thank Marc Gallofré Ocaña for kindly supporting us by
providing real-world data. Further, we thank Martha Larson for her support.</p>
      <p>[5] B. Kille, A. Lommatzsch, Ö. Özgöbek, Newsimages: The role of images in online news, in: Proceedings of the MediaEval Benchmarking Initiative for Multimedia Evaluation 2020, CEUR Workshop Proceedings, 2020. URL: http://ceur-ws.org/Vol-2882/.</p>
      <p>[6] B. Kille, A. Lommatzsch, Ö. Özgöbek, M. Elahi, D.-T. Dang-Nguyen, News images in MediaEval 2021, in: Proceedings of the MediaEval Benchmarking Initiative for Multimedia Evaluation 2021, CEUR Workshop Proceedings, 2021. URL: http://ceur-ws.org/Vol-3181/paper2.pdf.</p>
      <p>[7] P. Young, A. Lai, M. Hodosh, J. Hockenmaier, From Image Descriptions to Visual Denotations: New Similarity Metrics for Semantic Inference over Event Descriptions, Transactions of the Association for Computational Linguistics 2 (2014) 67–78. doi:10.1162/tacl_a_00166.</p>
      <p>[8] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft COCO: Common Objects in Context, in: European Conference on Computer Vision, Springer, 2014, pp. 740–755. doi:10.1007/978-3-319-10602-1_48.</p>
      <p>[9] Q.-T. Truong, H. Lauw, Multimodal review generation for recommender systems, in: The World Wide Web Conference, 2019, pp. 1864–1874.</p>
      <p>[10] A. Salah, Q.-T. Truong, H. W. Lauw, Cornac: A comparative framework for multimodal recommender systems, J. Mach. Learn. Res. 21 (2020) 95–1.</p>
      <p>[11] S. Oramas, O. Nieto, M. Sordo, X. Serra, A deep multimodal approach for cold-start music recommendation, in: Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems, 2017, pp. 32–37.</p>
      <p>[12] Y. Deldjoo, M. Schedl, P. Cremonesi, G. Pasi, Recommender systems leveraging multimedia content, ACM Computing Surveys (CSUR) 53 (2020) 1–38.</p>
      <p>[13] X. Zhou, R. Zafarani, A survey of fake news, ACM Computing Surveys 53 (2020) 1–40. URL: http://dx.doi.org/10.1145/3395046. doi:10.1145/3395046.</p>
      <p>[14] E. M. Voorhees, et al., The TREC-8 Question Answering Track Report, in: TREC, volume 99, 1999, pp. 77–82.</p>
      <p>[15] F. Corsini, M. Larson, CLEF NewsREEL 2016: Image based Recommendation, in: Working Notes of the 7th International Conference of the CLEF Initiative, Evora, Portugal, CEUR Workshop Proceedings, 2016.</p>
      <p>[16] A. Lommatzsch, B. Kille, F. Hopfgartner, M. Larson, T. Brodt, J. Seiler, Ö. Özgöbek, CLEF 2017 NewsREEL overview: A stream-based recommender task for evaluation and education, in: 8th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2017), Springer, 2017.</p>
      <p>[17] M. Ge, F. Persia, A survey of multimedia recommender systems: Challenges and opportunities, International Journal of Semantic Computing 11 (2017) 411–428. URL: https://doi.org/10.1142/S1793351X17500039. doi:10.1142/S1793351X17500039.</p>
      <p>[18] M. Karimi, D. Jannach, M. Jugovac, News recommender systems: Survey and roads ahead, Information Processing &amp; Management 54 (2018) 1203–1227.</p>
      <p>[19] Ö. Özgöbek, B. Kille, J. A. Gulla, A. Lommatzsch, The 7th international workshop on news recommendation and analytics (INRA 2019), in: Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 558–559.</p>
      <p>[20] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 248–255.</p>
      <p>[21] B. Kille, A. Lommatzsch, Ö. Özgöbek, M. Elahi, D.-T. Dang-Nguyen, News images in MediaEval 2021, in: Proceedings of the MediaEval Benchmarking Initiative for Multimedia Evaluation 2021, CEUR Workshop Proceedings, 2021. URL: http://ceur-ws.org/Vol-3181/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sohel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Shiratuddin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Laga</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of deep learning for image captioning</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>51</volume>
          (
          <year>2019</year>
          ). URL: https://doi.org/10.1145/3295748. doi:10.1145/3295748.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Oostdijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>van Halteren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Basar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <article-title>The connection between the text and images of news articles: New insights for multimedia analysis</article-title>
          ,
          <source>in: Proceedings of The 12th Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4343</fpage>
          -
          <lpage>4351</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ramming</surname>
          </string-name>
          ,
          <article-title>Mediaeval 2018 - overview on newsreel multimedia</article-title>
          ,
          <source>in: Proceedings of the MediaEval Benchmarking Initiative for Multimedia Evaluation</source>
          <year>2018</year>
          , CEUR Workshop Proceedings,
          <year>2018</year>
          . URL: http://ceur-ws.org/Vol-2283/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>The 2019 multimedia for recommender system task: Movierec and newsreel at mediaeval</article-title>
          ,
          <source>in: Proceedings of the MediaEval Benchmarking Initiative for Multimedia Evaluation</source>
          <year>2019</year>
          , CEUR Workshop Proceedings,
          <year>2019</year>
          . URL: http://ceur-ws.org/Vol-2670/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>