<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">NewsImages Fusion: Bridging Textual Context and Visual Content in Media Representation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><roleName>Dr</roleName><forename type="first">R</forename><surname>Priyadharsini</surname></persName>
							<email>priyadharsinir@ssn.edu.in</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science and Engineering</orgName>
								<orgName type="institution">Sri Sivasubramaniya Nadar College of Engineering</orgName>
								<address>
									<postCode>603110</postCode>
									<settlement>Chennai, Tamil Nadu</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">P</forename><forename type="middle">Vettri</forename><surname>Chezian</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science and Engineering</orgName>
								<orgName type="institution">Sri Sivasubramaniya Nadar College of Engineering</orgName>
								<address>
									<postCode>603110</postCode>
									<settlement>Chennai, Tamil Nadu</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">NewsImages Fusion: Bridging Textual Context and Visual Content in Media Representation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">501ABD051C5EFC4A73BB5B762ED85641</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>NewsImages Fusion</term>
					<term>Text-Image Relationship</term>
					<term>image captioning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>As the consumption of news content becomes increasingly visual, the evaluation of news images plays a pivotal role in media understanding and interpretation. This research addresses the challenges of automatically assessing news images and mapping them to textual information using Convolutional Neural Networks (CNNs). The work leverages a comprehensive dataset of news images and proposes a CNN architecture tailored to the intricacies of media content. The research first surveys the existing landscape of news image evaluation, highlighting gaps and limitations in current methodologies. Motivated by the need for robust and efficient image assessment tools, our work focuses on the design and implementation of a CNN tailored for news media. Upon further investigation, the proposed system was found to achieve a testing accuracy of 14.11.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the contemporary landscape of digital media, news dissemination is increasingly characterized by the integration of visual content, with news images serving as crucial elements in shaping public perception. As society navigates an era inundated with information, the ability to assess the credibility, relevance, and impact of news images becomes paramount. This research addresses the need for automated and efficient methodologies to evaluate news images, a challenge exacerbated by the sheer volume and diversity of media content. Online news articles are multimodal: the textual content of an article is often accompanied by a multimedia item such as an image. The image is important for illustrating the content of the text, but also for attracting readers' attention. Research in multimedia and recommender systems generally assumes a simple relationship between images and text occurring together. For example, in image captioning <ref type="bibr" target="#b0">[1]</ref> the caption is often assumed to describe the literally depicted content of the image. In contrast, when images accompany news articles, the relationship becomes less clear <ref type="bibr" target="#b1">[2]</ref>. Since there are often no images available for the most recent news stories, stock images, archived photos, or even generated images are used. An additional challenge is the wide spectrum of news domains, ranging from politics and economics to sports, health, and entertainment. The goal of this task is to investigate these intricacies in more depth, in order to understand the implications they may have for journalism and news personalization. The task takes a large set of news articles paired with their corresponding images. The two entities have been paired, but we do not know how. For instance, journalists could have selected an appropriate picture manually, generated an illustration using generative AI, or a machine could have selected an image from a stock photo database. The image can have a semantic relation to the story but has not necessarily been taken directly at the reported event; the event need not even exist (in the case of synthetic images). Automatic image captioning alone is insufficient to map the images to articles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The evolving landscape of multimedia content in news articles has spurred significant research efforts to understand and enhance the interaction between text and images. This section provides a comprehensive overview of the background and related work in this domain. Recent work by Lommatzsch et al. <ref type="bibr" target="#b2">[3]</ref> has made substantial strides in bridging the "depiction gap" with the introduction of NewsImages. This online news dataset focuses on text-image rematching, offering valuable insights into the intricate relationship between news articles and their associated images. The authors highlight the challenges in accurately pairing textual and visual content, setting the stage for a deeper exploration. Garcin et al. <ref type="bibr" target="#b3">[4]</ref> contribute to the discourse on recommendation systems, emphasizing the limitations of offline evaluations in predicting the performance of diverse recommendation techniques. Their study underscores the need for sophisticated models that incorporate novelty into recommendations and questions the reliability of Click-Through Rate (CTR) as a sole metric, especially for popular items. These findings resonate with the challenges encountered in multimedia recommendation tasks. Ge and Persia <ref type="bibr" target="#b4">[5]</ref> provide a comprehensive survey of multimedia recommender systems, shedding light on challenges and opportunities in this domain. Their work spans research communities, delving into the intersection of multimedia information systems and recommender systems. Categorizing papers based on recommender algorithm, multimedia object, and application domain, the survey identifies key features that pave the way for potential research opportunities. Continuous evaluation in large-scale information access systems is explored by Hopfgartner et al. <ref type="bibr" target="#b5">[6]</ref>. They advocate for the adoption of living labs, presenting a case for ongoing evaluation. The relevance of their approach extends to the evaluation of multimedia recommendation systems, providing a framework for refining algorithms and adapting to evolving user preferences. Hossain et al. <ref type="bibr" target="#b0">[1]</ref> contribute to the landscape of multimedia understanding with a comprehensive survey of deep learning for image captioning. The survey encompasses the evolving techniques used to bridge the semantic gap between textual descriptions and visual content, a challenge inherent in the news domain explored by our work. The stream-based recommender task overview presented by Lommatzsch et al. <ref type="bibr" target="#b6">[7]</ref> at CLEF 2017 is particularly relevant to our study. It emphasizes the need for ongoing evaluation and education in the field of recommender systems, aligning with our goal of refining algorithms based on insights gained from continuous assessments. Oostdijk et al. <ref type="bibr" target="#b1">[2]</ref> contribute insights into the connection between text and images in news articles. Their work offers new perspectives for multimedia analysis, which directly informs our exploration of the relationship between news text and images. Lops et al. <ref type="bibr" target="#b7">[8]</ref> provide a comprehensive survey of content-based recommender systems, addressing fundamental aspects characterizing this category of systems. Their exploration of techniques for representing items to be recommended aligns with the challenges posed by diverse multimedia content in news articles. Li and Xie <ref type="bibr" target="#b8">[9]</ref> leverage observational data on social media posts related to major U.S. airlines and compact SUV models to explore the impact of image content on consumer engagement. The study introduces pathways through which image content influences engagement, aligning with our investigation into the interaction between text and images in the realm of news articles. Finally, Liu, Qiao, and Chilton <ref type="bibr" target="#b9">[10]</ref> present a significant contribution to the field with their work on multimodal image generation for news illustration. Their exploration of generating images for news articles aligns with the overarching theme of our study, emphasizing the importance of understanding the relationship between textual and visual content.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Objective</head><p>The objectives of this work are to: develop a comprehensive dataset of news images representative of diverse media contexts; design and implement a CNN architecture tailored to the specific characteristics of news images; evaluate the performance of the proposed CNN against benchmark methods using carefully selected metrics; and provide insights into the potential applications and limitations of CNNs in the realm of news image evaluation. This task explores the relationship between text and images in news articles. The dataset includes paired news articles and images, with undisclosed pairing methods, whether manual selection, generative AI, or automatic machine choice. The images may have semantic ties to the story but need not depict the reported event. Conventional image captioning falls short of accurately mapping images to articles in this diverse context. The dataset is curated from web news articles, providing crucial details for each article, including URL, title, and the initial news text. Paired with each article is a corresponding image, and the dataset covers both English and German articles, with machine-translated versions for the latter. With a 1:1 relationship, the dataset follows a structure akin to the NewsImages 2022 data structures.</p></div>
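The 1:1 article-image pairing described above can be sketched as a simple record type. This is an illustrative assumption of the shape of the data, not the dataset's actual schema; the class name `NewsArticle` and all field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record mirroring the fields the dataset provides per article:
# URL, title, initial news text, and the single paired image (1:1 relationship).
@dataclass
class NewsArticle:
    url: str
    title: str
    text: str          # initial news text (machine-translated for German articles)
    image_path: str    # the one image paired with this article

article = NewsArticle(
    url="https://example.com/news/1",
    title="Example headline",
    text="First sentences of the article...",
    image_path="images/1.jpg",
)
```

A loader would then iterate over such records, reading each `image_path` and tokenizing `text` as needed.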
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Approach</head><p>The approach defines a convolutional neural network (CNN) model for image classification using PyTorch. The CNN architecture consists of two convolutional layers, each followed by a max pooling operation, and two fully connected layers. The model is trained on a custom dataset, NewsDataset, which combines textual and image data. It loads image data from a specified folder and transforms it using resize and tensor conversion operations. The training process involves iterating through the dataset, computing predictions, and optimizing the model parameters, with the Mean Reciprocal Rank (MRR) metric used to monitor performance. Evaluation metrics such as MRR, Precision@K, and Recall@K are calculated during both the training and testing phases to assess the model's performance. Finally, the model is evaluated on a separate test dataset, and the Precision@K and Recall@K values are reported. Overall, this constitutes a pipeline for training and evaluating a CNN model on classification tasks involving textual and image data.</p></div>
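A minimal PyTorch sketch of the architecture described above (two convolutional layers, each followed by max pooling, then two fully connected layers) might look as follows. The channel counts, layer sizes, and the 128×128 input resolution are illustrative assumptions; the paper does not specify them.

```python
import torch
import torch.nn as nn

class NewsImageCNN(nn.Module):
    """Two conv + max-pool stages followed by two fully connected layers.
    All dimensions are illustrative, not the paper's exact configuration."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # conv layer 1
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 128 -> 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conv layer 2
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64 -> 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 64),  # fully connected layer 1
            nn.ReLU(),
            nn.Linear(64, num_classes),   # fully connected layer 2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = NewsImageCNN(num_classes=5)
scores = model(torch.randn(2, 3, 128, 128))  # batch of two 128x128 RGB images
```

Training would pair this with a standard classification loss and optimizer, while MRR, Precision@K, and Recall@K are computed from the ranked scores each epoch.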
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation Methodology</head><p>The evaluation uses the Mean Reciprocal Rank (MRR) as the official metric, together with a series of Precision@K and Recall@K scores, where K takes values in {1, 5, 10, 20, 50, 100}. The primary metric for the task is the average MRR, providing insight into the average position at which the linked image appears. Additionally, the average precision scores offer a comprehensive evaluation of performance across the various ranks within the list.</p></div>
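As a sketch of how these metrics work for a single article, assuming each article has exactly one linked image: the reciprocal rank is 1 over the position at which that image appears in the system's ranking, and with a single relevant item Precision@K is 1/K and Recall@K is 1 whenever the linked image is ranked within the top K. The identifiers below are made up for illustration.

```python
def reciprocal_rank(ranked_ids, correct_id):
    """1/rank of the linked image in the ranked candidate list, 0 if absent."""
    for rank, candidate in enumerate(ranked_ids, start=1):
        if candidate == correct_id:
            return 1.0 / rank
    return 0.0

def precision_recall_at_k(ranked_ids, correct_id, k):
    """With one relevant image per article, a top-K hit gives (1/k, 1.0)."""
    hit = correct_id in ranked_ids[:k]
    return ((1.0 / k) if hit else 0.0, 1.0 if hit else 0.0)

# Example: the linked image "img_7" is ranked third among five candidates.
ranking = ["img_2", "img_9", "img_7", "img_4", "img_1"]
mrr = reciprocal_rank(ranking, "img_7")              # 1/3
p5, r5 = precision_recall_at_k(ranking, "img_7", 5)  # (0.2, 1.0)
```

The task-level scores are then the averages of these per-article values over all test articles.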
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Results and Analysis</head><p>A series of experiments was conducted, and the proposed system was evaluated using the MRR metric. The training accuracy was found to be 76.52 and the testing accuracy was found to be 14.11.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>K-Values Precision</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Architecture Diagram</figDesc><graphic coords="4,130.96,101.03,333.37,201.62" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Discussion And Outlook</head><p>The insights gathered from the referenced works pave the way for a comprehensive discussion of the intricate relationship between text and images in news articles. The diverse perspectives offered by researchers in multimedia recommender systems, continuous evaluation, image captioning, and content-based recommendation systems provide a rich foundation for our analysis. We have also observed that the architecture involves two convolutional layers for feature extraction, followed by fully connected layers for further processing and classification. The convolutional layers extract and learn features from the input image, while the fully connected layers combine these features to make predictions about the input image's class.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A comprehensive survey of deep learning for image captioning</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Z</forename><surname>Hossain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sohel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Shiratuddin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Laga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="1" to="36" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Oostdijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Van Halteren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
		<title level="m">The connection between the text and images of news articles: New insights for multimedia analysis</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4343" to="4351" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Newsimages: Addressing the depiction gap with an online news dataset for text-image rematching</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lommatzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tesic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bartolomeu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Semedo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pivovarova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="227" to="233" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Offline and online evaluation of news recommender systems at swissinfo</title>
		<author>
			<persName><forename type="first">F</forename><surname>Garcin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Faltings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Donatsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Alazzawi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bruttin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Huber</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="169" to="176" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A survey of multimedia recommender systems: Challenges and opportunities</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Persia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Semantic Computing</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="411" to="428" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Continuous evaluation of large-scale information access systems: a case for living labs</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hopfgartner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lommatzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="511" to="543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Clef 2017 newsreel overview: A stream-based recommender task for evaluation and education</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lommatzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hopfgartner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brodt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Seiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ö</forename><surname>Özgöbek</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="239" to="254" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Content-based recommender systems: State of the art and trends</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="73" to="105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Is a picture worth a thousand words? an empirical study of image content and social media engagement</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Marketing Research</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="1" to="19" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Qiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chilton</surname></persName>
		</author>
		<idno type="DOI">10.1145/3526113.3545621</idno>
		<title level="m">Multimodal image generation for news illustration</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
