<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NewsImages Fusion: Bridging Textual Context and Visual Content in Media Representation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dr. R. Priyadharsini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arvind.V</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harish.J</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Vettri Chezian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>MohanaPriya E</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Chennai - 603110, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As the consumption of news content becomes increasingly visual, the evaluation of news images plays a pivotal role in media understanding and interpretation. This research addresses the challenges associated with the automated assessment of news images and their mapping to textual information using Convolutional Neural Networks (CNNs). The work leverages a comprehensive dataset of news images and proposes a CNN architecture tailored to the intricacies of media content. The research first delves into the existing landscape of news image evaluation, highlighting gaps and limitations in current methodologies. Motivated by the need for robust and efficient image assessment tools, our work focuses on the design and implementation of a CNN tailored for news media. Upon further investigation, the proposed system was found to have an accuracy of 14.11.</p>
      </abstract>
      <kwd-group>
        <kwd>NewsImages Fusion</kwd>
        <kwd>Text-Image Relationship</kwd>
        <kwd>image captioning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the contemporary landscape of digital media, news dissemination is increasingly characterized
by the integration of visual content, with news images serving as crucial elements in shaping
public perception. As society navigates an era inundated with information, the ability to
assess the credibility, relevance, and impact of news images becomes paramount. This research
addresses the imperative need for automated and efficient methodologies to evaluate news
images, a challenge exacerbated by the sheer volume and diversity of media content. Online news
articles are multimodal: the textual content of an article is often accompanied by a multimedia
item such as an image. The image is important for illustrating the content of the text, but
also for attracting readers’ attention. Research in multimedia and recommender systems generally
assumes a simple relationship between images and text occurring together. For example, in
image captioning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] the caption is often assumed to describe the literally depicted content of
the image. In contrast, when images accompany news articles, the relationship becomes less
clear [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since there are often no images available for the most recent news messages, stock
images, archived photos, or even generated photos are used. An additional challenge is the
wide spectrum of news domains, reaching from politics to economics to sports and to health
and entertainment. The goal of this task is to investigate these intricacies in more depth, in
order to understand the implications that it may have for the areas of journalism and news
personalization. The task takes a large set of news articles paired with their corresponding
images. The two entities have been paired but we do not know how. For instance, journalists
could have selected an appropriate picture manually, generated an illustration using generative
AI, or a machine could have selected an image from a stock photo database. The image can
have a semantic relation to the story but has not necessarily been taken directly at the reported
event, nor even exist (in the case of synthetic images). Automatic image captioning is insufficient
to map the images to articles.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The evolving landscape of multimedia content in news articles has spurred significant research
efforts to understand and enhance the interaction between text and images. This section provides
a comprehensive overview of the background and related work in this domain. Recent work
by Lommatzsch et al.[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has made substantial strides in bridging the "Depiction Gap" with the
introduction of NewsImages. This online news dataset focuses on text-image rematching, offering
valuable insights into the intricate relationship between news articles and their associated images.
The authors highlight the challenges in accurately pairing textual and visual content, setting the
stage for a deeper exploration. Garcin et al.[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] contribute to the discourse on recommendation
systems, emphasizing the limitations of offline evaluations in predicting the performance of
diverse recommendation techniques. Their study underscores the need for sophisticated models
that incorporate novelty into recommendations and questions the reliability of Click-Through
Rate (CTR) as a sole metric, especially for popular items. These findings resonate with the
challenges encountered in multimedia recommendation tasks. Ge and Persia [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provide a
comprehensive survey of multimedia recommender systems, shedding light on challenges and
opportunities in this domain. Their work spans research communities, delving into
the intersection of multimedia information systems and recommender systems. Categorizing
papers by recommender algorithm, multimedia object, and application domain, the
survey identifies key features that point to potential research opportunities. Continuous
evaluation in large-scale information access systems is explored by Hopfgartner et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. They
advocate for the adoption of living labs, presenting a case for ongoing evaluation. The relevance
of their approach extends to the evaluation of multimedia recommendation systems, providing
a framework for refining algorithms and adapting to evolving user preferences. Hossain et al.[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
contribute to the landscape of multimedia understanding with a comprehensive survey of deep
learning for image captioning. The survey encompasses the evolving techniques used to bridge
the semantic gap between textual descriptions and visual content, a challenge inherent in the
news domain explored by our work. The stream-based recommender task overview presented
by Lommatzsch et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] at CLEF 2017 is particularly relevant to our study. It emphasizes
the need for ongoing evaluation and education in the field of recommender systems, aligning
with our goal of refining algorithms based on insights gained from continuous assessments.
Oostdijk et al.[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] contribute insights into the connection between text and images in news
articles. Their work offers new perspectives for multimedia analysis, which resonates with our
exploration of the interaction between text and images in news articles. Lops et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] provide a
comprehensive survey of content-based recommender systems, addressing fundamental aspects
characterizing this category of systems. Their exploration of techniques for representing items
to be recommended aligns with the challenges posed by diverse multimedia content in news
articles. Li and Xie [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] leverage observational data to explore the impact of image content on
consumer engagement with social media posts about major U.S. airlines and compact SUV models. The study introduces pathways through which
image content influences engagement, aligning with our investigation into the interaction
between text and images in news articles. Finally, Liu, Qiao, and Chilton [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] present
a significant contribution to the field with their work on multimodal image generation for
news illustration. Their exploration of generating images for news articles aligns with the
overarching theme of our study, emphasizing the importance of understanding the relationship
between textual and visual content.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Objective</title>
      <p>The objectives of this work are to: develop a comprehensive dataset of news images representative of diverse media
contexts; design and implement a CNN architecture tailored to the specific characteristics of news
images; evaluate the performance of the proposed CNN against benchmark methods using
carefully selected metrics; and provide insights into the potential applications and limitations of CNNs
in the realm of news image evaluation. This task explores the relationship between text and
images in news articles. A dataset includes paired news articles and images, with undisclosed
pairing methods, whether manual selection, generative AI, or automatic machine choice. The
images may have semantic ties to the story but need not depict the reported event. Conventional
image captioning falls short in accurately mapping images to articles in this diverse context. The
dataset is curated from web news articles, providing crucial details for each article, including
URL, title, and initial news text. Paired with each article is a corresponding image, and the
dataset covers both English and German articles, with machine-translated versions for the
latter. With a 1:1 relationship, the dataset follows a structure akin to the NewsImages 2022 data
structure.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>The provided code defines a convolutional neural network (CNN) model for image
classification using PyTorch. The CNN architecture consists of two convolutional layers, each followed by
a max pooling operation, and two fully connected layers. The model is trained on a custom
dataset, NewsDataset, which combines textual and image data. It loads image data from a
specified folder and transforms it using resize and tensor conversion operations. The training
process involves iterating through the dataset, computing predictions, and optimizing the model
parameters, with MRR tracked as the guiding metric. Evaluation metrics such as Mean Reciprocal Rank (MRR),
Precision@K, and Recall@K are calculated during both the training and testing phases to assess the
model’s performance. Finally, the model is evaluated on a separate test dataset, and Precision@K
and Recall@K values are reported. Overall, the code represents a pipeline for training and
evaluating a CNN model for image classification tasks involving textual and image data.</p>
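      <p>The architecture described above can be sketched as follows. This is a minimal, illustrative PyTorch model: the channel counts, kernel sizes, input resolution (64x64 RGB), and number of classes are our assumptions, since the exact hyperparameters are not specified.</p>
```python
# Minimal sketch of the two-conv-layer CNN described above, in PyTorch.
# Layer sizes, kernel sizes, input resolution, and class count are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class NewsImageCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Two convolutional layers, each followed by max pooling
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x32 -> 16x16
        )
        # Two fully connected layers combine the learned features
        # into class predictions
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = NewsImageCNN(num_classes=10)
scores = model(torch.randn(4, 3, 64, 64))   # batch of 4 RGB 64x64 images
print(tuple(scores.shape))                  # (4, 10)
```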
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation Methodology</title>
      <p>The computation involves the Mean Reciprocal Rank (MRR) as the official metric and a series of
Precision@K and Recall@K scores, where K takes the values 1, 5, 10, 20, 50, and 100. The
primary metric for the task is the average MRR, providing insight into the average position at
which the linked image appears. Additionally, the average precision scores offer a comprehensive
evaluation of performance across the various ranks within the list.</p>
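      <p>The three metrics named above can be sketched in plain Python. The function names and the toy example are ours, not part of the task definition; in the toy data each article has exactly one correct image, mirroring the task's 1:1 pairing.</p>
```python
# Illustrative implementations of the ranking metrics described above.
def mrr(ranked_lists, relevant):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant item."""
    total = 0.0
    for qid, ranking in ranked_lists.items():
        for rank, item in enumerate(ranking, start=1):
            if item in relevant[qid]:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def precision_at_k(ranked_lists, relevant, k):
    """Fraction of the top-k items that are relevant, averaged over queries."""
    return sum(
        len(set(r[:k]).intersection(relevant[q])) / k
        for q, r in ranked_lists.items()
    ) / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k):
    """Fraction of relevant items found in the top-k, averaged over queries."""
    return sum(
        len(set(r[:k]).intersection(relevant[q])) / len(relevant[q])
        for q, r in ranked_lists.items()
    ) / len(ranked_lists)

# Toy example: each article has exactly one correct image (1:1 pairing).
ranked = {"a1": ["img3", "img1", "img2"], "a2": ["img5", "img4"]}
gold = {"a1": {"img1"}, "a2": {"img5"}}
print(round(mrr(ranked, gold), 2))      # a1 -> 1/2, a2 -> 1/1, so 0.75
print(precision_at_k(ranked, gold, 1))  # only a2's top-1 is correct: 0.5
print(recall_at_k(ranked, gold, 1))     # 0.5
```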
    </sec>
    <sec id="sec-6">
      <title>6. Results and Analysis</title>
      <p>A series of experiments was conducted, and the proposed system was evaluated using the MRR
metrics. The training accuracy was found to be 76.52 and the testing accuracy was found to
be 14.11.</p>
      <p>[Figure: Precision@K plotted against K values]</p>
      <p>
The insights gathered from the referenced works pave the way for a comprehensive discussion
on the intricate relationship between text and images in news articles. The diverse perspectives
ofered by researchers in multimedia recommender systems, continuous evaluation, image
captioning, and content-based recommendation systems provide a rich foundation for our
analysis. We have also observed that the architecture involves two convolutional layers for
feature extraction, followed by fully connected layers for further processing and classification.
The convolutional layers extract and learn features from the input image, while the fully
connected layers combine these features to make predictions about the input image’s class.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sohel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Shiratuddin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Laga</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of deep learning for image captioning</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 51</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Oostdijk</surname>
          </string-name>
          , H. van Halteren, E. Başar, M. Larson,
          <article-title>The connection between the text and images of news articles: New insights for multimedia analysis (</article-title>
          <year>2020</year>
          )
          <fpage>4343</fpage>
          -
          <lpage>4351</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tesic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bartolomeu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Semedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pivovarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <article-title>Newsimages: Addressing the depiction gap with an online news dataset for text-image rematching (</article-title>
          <year>2022</year>
          )
          <fpage>227</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Garcin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Faltings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Donatsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alazzawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bruttin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <article-title>Offline and online evaluation of news recommender systems at swissinfo.ch</article-title>
          (
          <year>2014</year>
          )
          <fpage>169</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Persia</surname>
          </string-name>
          ,
          <article-title>A survey of multimedia recommender systems: Challenges and opportunities</article-title>
          ,
          <source>International Journal of Semantic Computing</source>
          <volume>11</volume>
          (
          <year>2017</year>
          )
          <fpage>411</fpage>
          -
          <lpage>428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <article-title>Continuous evaluation of large-scale information access systems: a case for living labs (</article-title>
          <year>2019</year>
          )
          <fpage>511</fpage>
          -
          <lpage>543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          , J. Seiler, Ö. Özgöbek,
          <article-title>Clef 2017 newsreel overview: A stream-based recommender task for evaluation and education (</article-title>
          <year>2017</year>
          )
          <fpage>239</fpage>
          -
          <lpage>254</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>De Gemmis</surname>
          </string-name>
          , G. Semeraro,
          <article-title>Content-based recommender systems: State of the art and trends (</article-title>
          <year>2011</year>
          )
          <fpage>73</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Is a picture worth a thousand words? an empirical study of image content and social media engagement</article-title>
          ,
          <source>Journal of Marketing Research</source>
          <volume>57</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qiao</surname>
          </string-name>
          , L. Chilton,
          <article-title>Multimodal image generation for news illustration (</article-title>
          <year>2022</year>
          ). doi:10.1145/3526113.3545621.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>