<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Large Language Models for News Values Analysis (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gullal S. Cheema</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massiollah Azimi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Ewerth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Müller-Budack</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Center, Leibniz University Hannover</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leibniz University Hannover</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TIB - Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study explores the use of large language models (LLMs) for the automated extraction of news values, marking the first effort to apply LLMs to this task. We evaluate open-source models, including LLaMA3, Yi-1.5, and Qwen2, using structured prompts to detect four news values: sentiment, geographical proximity, timeliness, and eliteness. Results show that few-shot prompting consistently outperforms zero-shot evaluation. Manual annotation revealed challenges due to the complexity and subjectivity of news values, with only moderate inter-annotator agreement (Cohen's kappa of 0.41), emphasizing the need for larger datasets and multiple annotators to ensure reliable ground truth labels. An analysis of fake news showed distinct patterns, including a greater emphasis on negativity and prominence compared to true news.</p>
      </abstract>
      <kwd-group>
        <kwd>News values detection</kwd>
        <kwd>news analysis</kwd>
        <kwd>generative AI</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        News values, as introduced by Galtung and Ruge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], capture the attributes that determine the
"newsworthiness" of actors, events, and issues. They guide how news media construct narratives and
newsworthiness [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], shaping public perception and discourse. Contemporary studies have expanded their
applicability to diverse platforms, from social media engagement [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to the analysis of fake news [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
underscoring their relevance in traditional and digital contexts. For instance, Tandoc et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
highlighted distinct patterns in fake news, identifying a prevalence of timeliness, negativity, and prominence,
alongside a lack of objectivity. These findings highlight the growing need for scalable, context-aware
computational methods, such as large language models (LLMs), to analyze and extract these values
from news articles effectively.
      </p>
      <p>
        So far, very few computational models on the detection of news values in news articles have been
introduced. Existing approaches have employed methods such as statistical features, linguistic analysis,
and machine learning approaches like Support Vector Machines (SVMs) and Convolutional Neural
Networks (CNNs) to extract and classify news values. Studies like Potts et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Bednarek et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
utilized these methods on specific corpora, while others, such as di Buono et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Piotrkowicz et
al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], incorporated word embeddings and sentiment analysis. While these efforts provided interesting
insights, they rely on rather outdated AI approaches and have not yet leveraged powerful generative
AI approaches [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to detect news values.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Classification of News Values using LLMs</title>
      <p>This work addresses the aforementioned limitations and makes three primary contributions toward
the use of LLMs for computational news values analysis.</p>
      <p>
        (1) Evaluation of Open-Source LLMs: We evaluate the performance of state-of-the-art open-source
LLMs, including LLaMA3 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Yi-1.5 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and Qwen2 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in detecting news values. These include
sentiment (positive or negative tone), geographical proximity (local, regional, national, or global relevance),
timeliness (old, recent, ongoing, or future events), and eliteness (presence of influential individuals
or organizations). Definitions and subcategories of news values are adapted from Cheema et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
To optimize model performance, we use structured prompt design with clear context, task definition,
and output constraints. We compare structured JSON outputs (Table 2) with unstructured formats to
assess consistency and evaluate the impact of zero-shot and few-shot prompting (in-context learning)
on annotation accuracy. Structured outputs consistently outperform unstructured ones; therefore,
we present comparison results and prompt types exclusively for structured outputs in Table 1. The
evaluation is conducted separately for each news value.
      </p>
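      <p>The structured prompt design described above (context, task definition, output constraint, optional in-context examples) can be sketched as follows. The wording, field names, and label sets here are illustrative assumptions, not the exact prompts or schema used in the study.</p>

```python
import json

# Assumed subcategories per news value (illustrative; adapted from the
# definitions in the text, not the study's exact label sets).
NEWS_VALUES = {
    "sentiment": ["positive", "negative"],
    "geographical_proximity": ["local", "regional", "national", "global"],
    "timeliness": ["old", "recent", "ongoing", "future"],
    "eliteness": ["present", "absent"],
}

def build_prompt(article_text, news_value, examples=()):
    """Assemble a structured prompt: context, task definition, output constraint."""
    labels = NEWS_VALUES[news_value]
    parts = [
        "You are an assistant for news values analysis.",
        f"Task: classify the article's {news_value.replace('_', ' ')} "
        f"as one of: {', '.join(labels)}.",
        'Respond only with JSON of the form {"label": "<one of the options>"}.',
    ]
    # Few-shot mode (in-context learning): prepend labelled examples.
    for text, label in examples:
        parts.append(f"Article: {text}\n{json.dumps({'label': label})}")
    parts.append(f"Article: {article_text}")
    return "\n\n".join(parts)

# Zero-shot prompt for timeliness; passing examples switches to few-shot.
print(build_prompt("Flood warnings were issued across the region today.", "timeliness"))
```

      <p>Since each news value is evaluated separately, one such prompt is issued per news value and article; the structured JSON constraint is what makes the model outputs machine-checkable.</p>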
      <p>
        (2) LLM vs Human Annotation: We compare LLM-generated annotations with human annotations to
evaluate strengths and limitations. Two annotators, each with a computer science background, label 20
news articles (10 true, 10 fake) from the ISOT Fake News Dataset [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which includes true news articles
from Reuters and fake news flagged by Politifact. Annotation guidelines are adapted and refined from
Cheema et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], ensuring a systematic and consistent evaluation of annotation quality.
      </p>
      <p>(3) True vs. Fake News Analysis: Leveraging the best-performing LLM from the previous step, we
analyze 100 news articles (50 true, 50 fake) to examine differences in detected news value patterns.
Through frequency and bigram analysis, we identify distinctive characteristics of true and fake news,
providing insights into how these patterns may contribute to automated misinformation detection.</p>
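      <p>The frequency and bigram analysis over detected news values can be sketched as follows; the per-sentence label sequences are hypothetical stand-ins for the LLM annotations.</p>

```python
from collections import Counter

def news_value_bigrams(labels):
    """Count consecutive pairs of news value labels within one article."""
    return Counter(zip(labels, labels[1:]))

# Hypothetical sentence-wise labels for one fake and one true article.
fake_article = ["eliteness", "negative", "negative", "eliteness", "eliteness", "negative"]
true_article = ["recent", "eliteness", "positive", "recent"]

fake_bigrams = news_value_bigrams(fake_article)
# Chains of negative sentiment or repeated mentions of influential entities
# surface as frequent bigrams such as ("negative", "negative") or
# ("eliteness", "eliteness").
print(fake_bigrams.most_common(3))
```

      <p>Aggregating such bigram counts over the 50 true and 50 fake articles is what surfaces the repeated associations discussed in the results.</p>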
      <p>These contributions lay the groundwork for future research in automating news values analysis and
applying it to diverse journalistic and computational contexts.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminary Results</title>
      <p>Inter-Annotator Agreement: To validate the reliability of the manual annotations, we compared the
labels assigned by the two human annotators using Cohen’s kappa. The average kappa score across the 20 news
articles was 0.41, indicating moderate agreement. Individual kappa values varied widely,
ranging from 0.13 to 0.76, reflecting the complexity and subjectivity inherent in annotating news values
despite consistent guidelines.</p>
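      <p>The agreement statistic used above can be computed as follows; the label sequences are hypothetical, since the annotation data itself is not reproduced here.</p>

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement assuming the two annotators label independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment labels for ten articles from two annotators.
ann1 = ["neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["neg", "pos", "pos", "neg", "pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(ann1, ann2), 2))  # 0.4, in the moderate range
```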
      <p>
        LLM performance: Qwen2 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] achieved the best results across the evaluated LLMs. Results for all LLMs
are provided in Table 1. Few-shot prompting with two examples consistently outperformed zero-shot
prompting, with notable improvements across all models, particularly for complex news values such as
timeliness and geographical proximity. Qwen2 delivered robust results across most news values but
struggled with sentiment in the zero-shot setting. Prompt complexity (Table 2) also influenced outcomes:
a simple output format (Type 1) for direct classification excelled in the zero-shot setting, while
sentence-wise analysis with justifications (Type 3) yielded better results in the few-shot setup.
      </p>
      <p>
        Comparison of News Values between True and Fake News: The analysis revealed distinct differences
in narrative patterns between true and fake news articles. Mentions of prominent individuals and
organizations were frequent in both categories, reflecting the political focus of the dataset. Events
tied closely to the timing of publication were common, while references to older or future events
appeared less frequently, aligning with the topical nature of political reporting. A notable finding was
the significantly higher emphasis on negative language in fake news—over three times more frequent
than in true news—suggesting a strategic emphasis on negativity to attract attention or shape
perceptions. Additionally, patterns in news value bigrams revealed that fake news often amplifies eliteness
and negative sentiment through repeated associations, such as consecutive mentions of influential
entities or chains of negative sentiment. These results suggest that fake news leverages emotional and
influential elements [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to serve political or ideological purposes, offering valuable insights for detecting
misinformation.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>This work presents the first study leveraging LLMs for the automated extraction of news values,
demonstrating promising results on a small dataset. Manual annotation revealed challenges arising from the
complexity and subjectivity of the task, as shown by moderate inter-annotator agreement (Cohen’s kappa
of 0.41), highlighting the need for larger datasets and more annotators to establish reliable ground truth
labels. Future work will expand to more news values, larger datasets, and diverse media contexts. Combining
manual and LLM-generated annotations into semi-supervised datasets and exploring efficient fine-tuning
methods, such as instruction tuning, could enhance performance.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was funded by the German Federal Ministry of Education and Research (BMBF, FakeNarratives
project, no. 16KIS1517).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o in order to: Paraphrase and reword,
Grammar and spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bednarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Caple</surname>
          </string-name>
          ,
          <source>The Discourse of News Values: How News Organizations Create Newsworthiness</source>
          , Oxford University Press,
          <year>2017</year>
          . URL: https://doi.org/10.1093/acprof:oso/9780190653934.001.0001. doi:10.1093/acprof:oso/9780190653934.001.0001.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>van der Meer</surname>
          </string-name>
          ,
          <article-title>News values on social media: Exploring what drives peaks in user activity about organizations on Twitter</article-title>
          ,
          <source>Journalism</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>633</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tandoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <article-title>What is (fake) news? analyzing news values (and more) in fake stories</article-title>
          ,
          <source>Media and Communication</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>110</fpage>
          -
          <lpage>119</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bednarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Caple</surname>
          </string-name>
          ,
          <article-title>How can computer-based methods help researchers to investigate news values in large datasets? a corpus linguistic study of the construction of newsworthiness in the reporting on hurricane katrina</article-title>
          ,
          <source>Discourse &amp; Communication</source>
          <volume>9</volume>
          (
          <year>2015</year>
          )
          <fpage>149</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bednarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Caple</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huan</surname>
          </string-name>
          ,
          <article-title>Computer-based analysis of news values: A case study on national day reporting</article-title>
          ,
          <source>Journalism Studies</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>702</fpage>
          -
          <lpage>722</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>di Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Snajder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Basic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Glavas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tutek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Milic-Frayling</surname>
          </string-name>
          ,
          <article-title>Predicting news values from headline text and emotions</article-title>
          ,
          <source>in: Proceedings of the 2017 Workshop: Natural Language Processing meets Journalism</source>
          ,
          <source>NLPmJ@EMNLP</source>
          , Copenhagen, Denmark, September 7,
          <year>2017</year>
          , Association for Computational Linguistics,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:10.18653/V1/W17-4201.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Piotrkowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dimitrova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Markert</surname>
          </string-name>
          ,
          <article-title>Automatic extraction of news values from headline text</article-title>
          ,
          <source>in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Student Research Workshop</source>
          , Valencia, Spain, April 3-7,
          <year>2017</year>
          , Association for Computational Linguistics, pp.
          <fpage>64</fpage>
          -
          <lpage>74</lpage>
          . doi:10.18653/V1/E17-4007.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          et al.,
          <article-title>Qwen2 technical report</article-title>
          ,
          <source>CoRR abs/2407.10671</source>
          (
          <year>2024</year>
          ). doi:10.48550/ARXIV.2407.10671. arXiv:2407.10671.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          et al.,
          <article-title>The Llama 3 herd of models</article-title>
          ,
          <source>CoRR abs/2407.21783</source>
          (
          <year>2024</year>
          ). doi:10.48550/ARXIV.2407.21783. arXiv:2407.21783.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Young</surname>
          </string-name>
          et al.,
          <article-title>Yi: Open foundation models by 01.AI</article-title>
          ,
          <source>CoRR abs/2403.04652</source>
          (
          <year>2024</year>
          ). doi:10.48550/ARXIV.2403.04652. arXiv:2403.04652.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Cheema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hakimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Müller-Budack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Bateman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ewerth</surname>
          </string-name>
          ,
          <article-title>Understanding image-text relations and news values for multimodal news analysis</article-title>
          ,
          <source>Frontiers Artif. Intell</source>
          .
          <volume>6</volume>
          (
          <year>2023</year>
          ). doi:10.3389/FRAI.2023.1125533.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Traore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saad</surname>
          </string-name>
          ,
          <article-title>Detecting opinion spams and fake news using text classification</article-title>
          ,
          <source>Security and Privacy</source>
          <volume>1</volume>
          (
          <year>2018</year>
          )
          <elocation-id>e9</elocation-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>