<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>NYTAC-CC: A Climate Change Subcorpus based on New York Times Articles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesca Grasso</string-name>
          <email>fr.grasso@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronny Patz</string-name>
          <email>ronny.patz@uni-potsdam.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manfred Stede</string-name>
          <email>stede@uni-potsdam.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Climate Change</institution>
          ,
          <addr-line>Corpora, Topic Modeling</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Potsdam</institution>
          ,
          <addr-line>Karl-Liebknecht-Str. 24-25, 14476, Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Turin</institution>
          ,
          <addr-line>Corso Svizzera 185, 10149, Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2043</year>
      </pub-date>
      <volume>3</volume>
      <issue>630</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Over the past decade, the analysis of discourses on climate change (CC) has gained increased interest within the social sciences and the NLP community. Textual resources are crucial for understanding how narratives about this phenomenon are crafted and delivered. However, there is still a scarcity of datasets that cover CC in news media. We present NYTAC-CC, a CC-specific subcorpus of the New York Times Annotated Corpus (NYTAC), marking the first CC analysis on this data. The subcorpus was created by combining different methods for text selection to ensure representativeness and reliability, which is validated using ClimateBERT. To provide initial insights into the CC subcorpus, we discuss the results of a topic modeling experiment (LDA). These show the diversity of contexts in which CC is discussed in news media over time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>We present NYTAC-CC, a topic-specific subcorpus with 3,630 articles addressing climate change (CC), derived from the New York Times Annotated Corpus (NYTAC). This subcorpus covers a 20-year period, drawing from NYTAC’s collection of 1.8 million articles published between 1987 and 2007, which is available through the Linguistic Data Consortium.</p>
      <p>
        The original corpus, and thus also the subcorpus, includes a variety of metadata such as the ‘desk’ (the newspaper branch) and both manually and automatically labeled content categories, with many articles also featuring hand-written summaries. The extensive use of NYTAC in NLP research over the last 15 years (e.g., [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]) benefits CC researchers, allowing for detailed historical analysis of CC discussions in news media. This includes exploring how CC debates were interwoven with topics like domestic and foreign policy, science reporting, and arts and culture coverage. Unlike other CC-focused resources that often contain shorter documents, the NYTAC-CC subcorpus offers a diverse array of articles with varying lengths and complex content, making it a unique resource for investigating the evolution of CC narratives over time.
      </p>
      <p>The contribution of this paper is threefold:</p>
      <p>(i) We present the NYTAC-CC subcorpus and its construction.</p>
      <p>(ii) We validate the subcorpus using ClimateBERT. As ClimateBERT classifies a number of true positives from our subcorpus as (false) negatives, we demonstrate that our approach achieves better results in ensuring recall of relevant CC articles from the NYTAC corpus.</p>
      <p>(iii) To gain initial insights into the CC subcorpus coverage, we use keyword analysis and topic modeling (specifically LDA) to track specifics of CC reporting over the 1987-2007 time span. The results show important trends over time, including key periods of reporting and a large variety of contexts in which CC is discussed.</p>
      <p>Thus, our goal is to provide a substantively new and relevant subcorpus, developed and validated in multiple [...] NYT’s coverage of climate change during the time period covered in our corpus. Although several studies have explored U.S. print media’s reporting on anthropogenic CC, we cover an important 20-year period in which much of today’s climate change discourse evolved.</p>
      <p>†These authors contributed equally.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work: CC in News</title>
      <p>
        Despite the growing interest in addressing climate change among various academic communities, as pointed out by Luo et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the topic has so far received limited attention within the ‘core’ NLP community. This is largely due to the NLP field’s focus on standardized datasets and shared tasks, where the topic of CC has been scarcely addressed.
      </p>
      <p>
        Efforts can be observed within the context of social media, with datasets made available for CC-related tasks [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. However, there remains a scarcity of work addressing CC at the news article level, which is essential for the NLP community investigating CC narratives in media or performing downstream tasks involving longer texts. In contrast, the analysis of CC discourse on both social media and traditional media has been extensively studied in various social science disciplines [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. In the following, we will focus on prominent work targeting traditional news media.
      </p>
      <p>
        A widely-cited early study by Trumbo [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] examined the framing techniques used by various “claim makers” in the online editions of five U.S. newspapers. After querying with different terms and manually filtering the results, the remaining articles were thoroughly investigated. Boykoff [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] later studied the “claims and frames” issue in a similar manner. Legagneux et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] conducted a comparative study of scientific literature and press articles to investigate coverage differences between CC and biodiversity. They analyzed materials from the USA, Canada, and the United Kingdom spanning 1991 to 2016, using representative keywords to query and retrieve relevant content. Similarly, [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] examined how journalistic norms affected CC reporting in U.S. TV and newspapers. Other studies examined the frequency of CC mentions, or the ‘attention cycle’. Brossard et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] compared CC reporting between the NYT and the French Le Monde. Grundmann and Krishnamurthy [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] analyzed newspapers from four countries, enhancing article counts with word frequency and collocation analyses using corpus-linguistic tools, where the outcomes are manually interpreted. The work of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] highlights one of the few instances where NLP technology is used to analyze CC in newspapers, where authors applied supervised classification to construct a corpus and identify frame categories within four U.S. papers. Continuing in the NLP domain, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] utilized a specialized corpus that includes CC-related news articles, though details on data retrieval are not available. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] compiled a dataset of 11k news articles from Science Daily through web scraping.
      </p>
      <p>In conclusion, there remains a scarcity of available corpora containing larger text units like entire articles, which are essential for the NLP community investigating CC narratives in traditional media or performing various downstream tasks involving news articles.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Building the NYTAC-CC</title>
      <sec id="sec-4-1">
        <title>3.1. Challenges in CC Text Selection</title>
        <p>The New York Times Annotated Corpus (LDC release) contains 1,855,658 articles (1987-2007), each formatted as a single XML file. Metadata include date, author, and newsroom desk. Articles are manually annotated with locations, people, organizations, and key topics. However, topic labels are generally not sufficient for our purpose, that is, finding all CC-related articles, because (i) not all articles are labeled; (ii) some labels of potentially CC-relevant text are overly broad, e.g., ‘weather,’ which also encompasses many non-CC topics; and (iii) some articles we consider CC-relevant are tagged with labels that do not relate to CC.</p>
        <p>
          Our goal is to design a retrieval method that not only ensures validity and reliability but also emphasizes representativeness, ensuring that the corpus adequately covers content related to the specific subject it aims to represent. Traditional approaches, such as the use of keywords or n-grams, can be inadequate if used alone and can lead to misclassifications due to both false positives and false negatives. Crucially, this holds even with advanced models, particularly when tasked with processing large linguistic units such as entire articles [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The changing use of language in time-spanning corpora can further challenge single-method approaches, since they must handle texts that, although consistent in topic, may cover the phenomenon in varied ways over time.
        </p>
        <p>Moreover, we aim for an approach that is reproducible, i.e., that can also be applied to other corpora that do not come with this type of metadata. We have therefore opted for a hybrid approach that combines the advantages of both keyword-based methods and automatic classification, while also aiming to overcome the weaknesses of both.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Our Hybrid Approach</title>
        <p>Our subcorpus construction is built on text retrieval methods previously used in studies on CC discourse (see, e.g., Section 2), but merges them into a hybrid approach to address their strengths and weaknesses. In the literature, we identified the following approaches:</p>
        <list list-type="order">
          <list-item>
            <p>
              Search with bigrams: typically, this involves terms like “climate change,” sometimes accompanied by one or two others, notably “global warming” and “greenhouse effect”; e.g., [
              <xref ref-type="bibr" rid="ref10 ref12">10, 12</xref>
              ];
            </p>
          </list-item>
          <list-item>
            <p>
              Search with a longer list of keywords, followed by manual filtering; e.g., [
              <xref ref-type="bibr" rid="ref18 ref19">19, 18</xref>
              ];
            </p>
          </list-item>
          <list-item>
            <p>
              Complex Boolean queries with keywords and operators (AND, OR, NOT); e.g., [
              <xref ref-type="bibr" rid="ref20">20</xref>
              ];
            </p>
          </list-item>
          <list-item>
            <p>
              Manual annotation of training data followed by supervised classification; e.g., [
              <xref ref-type="bibr" rid="ref16">16</xref>
              ].
            </p>
          </list-item>
        </list>
        <p>As a first exploratory step, we experimented with method (1), obtaining the expected unsatisfactory results.</p>
        <p>We subsequently refined our retrieval process from the NYTAC by extending methods (2) and (4). Texts that we consider relevant for the CC topic must not merely mention CC in passing, but should discuss aspects of anthropogenic CC, relate substantial information, or convey a stance on its existence or urgency.</p>
        <p>Bigram search. Initially, we experimented with a list of bigrams (see Appendix A) sourced from the BBC Climate Change Glossary (https://www.bbc.com/news/science-environment-11833685). This was done to cover terminologies used over the two decades spanned by the corpus. This method led to the retrieval of 10,707 articles. Upon manual inspection, we found that many were false positives, addressing general environmental issues but not specifically related to CC. Conversely, many articles we regarded as relevant did not contain the bigram “climate change” (searching for this bigram yielded only 2,080 texts). Consequently, this led us to seek a more elaborate approach.</p>
        <p>
          Keyword search. In response to the limited performance of the bigram search, we proceeded to extract CC-related articles using keywords that were employed by [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] to identify topic-relevant articles in Nature and Science (see Appendix B). To these, we added the keyword “Kyoto”, given the specific time period of our corpus where the Kyoto conference had a similar importance as later the “Paris agreement”. However, the resulting subcorpus still contained many false positives, primarily from long list-like articles combining various news items. To ensure homogeneity, we excluded these articles, resulting in an intermediate corpus of 12,883 articles.
        </p>
        <p>Text ranking and supervised classification. To overcome the presence of false positives, we implemented an additional, more elaborate filtering step on the intermediate corpus. Initially, we heuristically ranked the articles for topic relevance, using a score based on accumulated keyword weights. This score reflects both the frequency of the keywords and their position within the article, as content in the beginning is generally considered most important. Specifically, we multiply the number of keyword occurrences per sentence by a score representing sentence prominence (1 for the first sentence, 0.9 for the second, 0.8 for the third, and so on). After automatically ranking the articles, we selected 450 articles for manual tagging: the top 150, the last 150, and 150 from the middle. We manually assessed them to determine if they were at least partially about CC, using the labels ‘1’ (CC-related) or ‘0’ (not CC-related).</p>
        <p>
          We used the manually-annotated data to train and test an XGBoost classifier, configured to differentiate between CC-related and non-CC articles. The features used included keyword counts (those from [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], plus ‘Kyoto’), the 50 most frequent ‘topic’ labels from the article metadata, and several binary features: whether an article was published by (i) the ‘Dining’ or ‘Style’ desks or by (ii) other desks; whether it was published on the weekend; whether a keyword appeared in the title or the first paragraph; and whether the article was (i) an opinion piece or a letter versus (ii) another type of article. The classifier achieved a precision score of 1.0 and a recall score of 0.94 on our held-out evaluation set of 100 texts. Subsequently, we used the classifier to label the entire intermediate corpus, labeling 9,253 articles as not CC-related and 3,630 CC-related, thus forming what we now refer to as our final ‘NYTAC climate change subcorpus’ and make available as the list of document IDs (https://github.com/discourse-lab/NYTAC-CC). Figure 1 illustrates the features that had the greatest impact on the classification decisions.
        </p>
        <p>Figure 1: Key features in classifying “climate change” articles</p>
      </sec>
      <sec id="sec-4-4">
        <title>3.3. Evaluation with ClimateBERT</title>
        <p>
          We aim to demonstrate (i) the relevance of our 3,630-article subcorpus in genuinely consisting of climate change (CC)-related articles and, thereby, (ii) the validity of our combined method for retrieving topic-consistent texts from a larger, heterogeneous collection while minimizing false positives. To perform that validation, we employed ClimateBERT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], a BERT-based model trained on CC-related texts. In particular, we used distilroberta-base-climate-detector from the Hugging Face platform [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], a fine-tuned version with a classification head for detecting climate-related paragraphs. Given its specialization in CC-related texts, we deemed ClimateBERT a very suitable tool to confirm the accuracy of our dataset. In doing so, we are also indirectly assessing the model’s capability in detecting CC-related content within larger portions of texts. As the model’s context length is limited to 512 tokens, we addressed this limitation by adopting two different approaches described below.
        </p>
        <p>In the first approach, longer texts were truncated due
to the model’s limited context length. Of the 3,630
instances, the model recognized 3,468 articles as +climate.</p>
        <p>We manually inspected the remaining 162 texts classified
as -climate, i.e., as false negatives. We found that the
model clearly misclassified 75 texts, which included
relevant CC content appearing beyond the initial 512 tokens.</p>
        <p>More qualitative insights on these 162 texts are provided
in the subsection below.</p>
        <p>Figure 2: Monthly article count in CC subcorpus</p>
        <p>In addition, we attempted a second approach to overcome the context length constraint by using a sliding window technique. This involved creating chunks of longer texts (&gt; 512 tokens), classifying each chunk, and labeling the entire text as +climate if any of the chunks were labeled as such. This second approach led to significantly different results, as only 3 out of 3,630 instances were labeled -climate.</p>
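        <p>The difference between the truncation and sliding-window strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: toy_classifier is a hypothetical stand-in for the ClimateBERT detector, tokens are plain words rather than the model's subword tokens, and the non-overlapping stride is an assumption, since the text does not say whether the windows overlap.</p>
        <preformat>
```python
from typing import Callable, List

MAX_TOKENS = 512  # context limit stated in the text


def chunk_tokens(tokens: List[str], size: int = MAX_TOKENS,
                 stride: int = MAX_TOKENS) -> List[List[str]]:
    """Split a token list into windows of at most `size` tokens.

    stride == size yields non-overlapping chunks (an assumption);
    a smaller stride yields overlapping windows.
    """
    if not tokens:
        return [[]]
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


def classify_truncated(tokens: List[str],
                       classify_chunk: Callable[[List[str]], bool]) -> bool:
    """First approach: classify only the first MAX_TOKENS tokens."""
    return classify_chunk(tokens[:MAX_TOKENS])


def classify_sliding(tokens: List[str],
                     classify_chunk: Callable[[List[str]], bool],
                     stride: int = MAX_TOKENS) -> bool:
    """Second approach: +climate if any chunk is labeled +climate."""
    return any(classify_chunk(c)
               for c in chunk_tokens(tokens, MAX_TOKENS, stride))


def toy_classifier(chunk: List[str]) -> bool:
    """Hypothetical stand-in for the ClimateBERT detector."""
    return "climate" in chunk or "warming" in chunk


# An article whose only climate mention lies beyond the first 512 tokens.
article = ["filler"] * 600 + ["global", "warming"]
print(classify_truncated(article, toy_classifier))  # False
print(classify_sliding(article, toy_classifier))    # True
```
        </preformat>
        <p>In this toy case the truncated input misses the late mention while the sliding window recovers it, which mirrors the drop from 162 to 3 articles labeled -climate.</p>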
        <p>These results demonstrate both the representativeness of our corpus and the validity of our hybrid subcorpus selection method. In addition, we show how automatic classification models can be limiting when dealing with long text units, therefore reinforcing the need for a combined approach to build topic-relevant (sub)corpora.</p>
      </sec>
      <sec id="sec-4-5">
        <title>3.4. Analysis of the ClimateBERT misclassifications</title>
        <p>As discussed in Section 3.3, we manually inspected the 162 articles from our subcorpus that ClimateBERT initially classified as negatives. Of these, 75 were clearly related to CC. Specifically, 48 articles featured significant discussions on CC-related issues beyond the model’s 512-token limit. Additionally, 27 articles contained detailed CC narratives within the first 512 tokens, often intersecting with other topics like politics (e.g., conferences on CC) and population (e.g., CC impacts on specific regions). These misclassifications show that the model’s limitations extend beyond the input token limit, underscoring the challenges in handling topic intersections.</p>
        <p>Although not the primary focus, CC was still mentioned in the remaining articles. In particular, 51 articles included CC in contexts marginally related to their main narratives, integrating CC with other discussions. In another 36 articles, CC was a secondary topic, occasionally mentioned only in passing, such as references to the Kyoto Protocol or metaphorical uses of global warming.</p>
      </sec>
      <sec id="sec-4-5-1">
        <title>4. Overview of NYTAC-CC</title>
        <p>In this section, we provide an initial overview of the NYTAC-CC coverage, including the article distribution over time and a preliminary subtopics exploration.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.1. Temporal and Keyword highlights</title>
        <p>We examine the temporal distribution of articles and key lexical features in our corpus to illuminate trends and shifts in CC coverage over time (see Figure 2).</p>
        <p>The analysis reveals a peak in articles during 1990,
with up to 50 mentions per month, followed by a decline
to 20 articles per month in the mid-90s. After the Kyoto
Protocol in December 1997, the curve shows a steady rise
with intermittent bursts in coverage. In the figure, we
have marked important ’climate events’ corresponding
to the years they occurred.</p>
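        <p>A monthly distribution of this kind could be computed along the following lines. This is a minimal sketch: it assumes one publication date per article is available from the metadata, and the sample dates below are invented for illustration, not taken from the corpus.</p>
        <preformat>
```python
from collections import Counter
from datetime import date


def monthly_counts(dates):
    """Count articles per (year, month), sorted chronologically."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


def peak_month(counts):
    """Return the (year, month) key with the highest article count."""
    return max(counts, key=counts.get)


# Invented sample dates, for illustration only.
sample = [date.fromisoformat(s) for s in (
    "1990-04-22", "1990-04-23", "1990-04-30",
    "1995-06-01", "1997-12-11", "1997-12-12",
)]
counts = monthly_counts(sample)
print(counts)              # {(1990, 4): 3, (1995, 6): 1, (1997, 12): 2}
print(peak_month(counts))  # (1990, 4)
```
        </preformat>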
        <p>The frequency ratios of the top eight lexical features determined by the classifier (cf. Figure 1) over time in Figure 3 illustrate the dominance of ‘greenhouse’ in the late 1980s. ‘Warming’ remains the most frequent term throughout, but in the final years, ‘climate’ gains prominence, suggesting a shift of term preference from ‘global warming’ to ‘climate change’, a transition noted in various other studies as well. Also, the two ‘Kyoto’ events are clearly visible: the international accord was reached in 1997, and the Bush administration’s decision not to ratify it occurred in 2001.</p>
        <p>At the same time, we also find that many articles focused on weather or pollution primarily addressed these issues directly, mentioning climate change only tangentially. This reduces the co-occurrence of other prominent CC terms in these articles.</p>
      </sec>
      <sec id="sec-4-6-1">
        <title>4.2. Document Structuring with LDA</title>
        <p>Building on the basic statistics discussed in the previous subsection, we delved deeper into the range of subtopics within the CC corpus using topic modeling, specifically Latent Dirichlet Allocation (LDA). This approach helps to uncover underlying thematic structures in the data, which are not immediately apparent from simple keyword analysis.</p>
        <p>Preprocessing Steps. To prepare the texts for LDA, we performed several preprocessing steps on article titles and bodies, including removing punctuation, lemmatizing words, and converting all text to lowercase to ensure consistency. We also joined frequently co-occurring bigrams into single terms to preserve important phrases. For our topic modeling, we focused on nouns and proper nouns that ranked among the top 10,000 by frequency and had more than two letters. This refinement allowed us to emphasize key entities and their relationships, central to the content of the articles, and avoid the dilution of thematic significance by less informative parts of speech, enhancing consistency through the use of pseudowords.</p>
        <p>Model Selection. The best LDA model was chosen based on the coherence score, calculated using the Python Gensim library. This ensures an objective selection process, minimizing subjective interpretation. We prioritized coherence to ensure that the topics generated by the model are interpretable and meaningful. The optimal model identified 18 topics, with a coherence score of .56, indicating a reasonable level of interpretability. We chose the highest-ranked term as the ‘name’ of each topic and listed five additional representative terms as follows:</p>
        <list list-type="simple">
          <list-item><p>9. plant: coal, company, emission, power, utility</p></list-item>
          <list-item><p>10. water: area, land, river, population, fish</p></list-item>
          <list-item><p>11. state: pollution, air, ozone, epa, smog</p></list-item>
          <list-item><p>12. china: government, people, war, security, country</p></list-item>
          <list-item><p>13. car: vehicle, fuel, gasoline, hydrogen, auto</p></list-item>
          <list-item><p>14. ice: sea, arctic, ocean, glacier, bear</p></list-item>
          <list-item><p>15. forest: tree, plant, species, fire, crop</p></list-item>
          <list-item><p>16. weather: winter, temperature, snow, degree, heat</p></list-item>
          <list-item><p>17. storm: el_nino, drought, hurricane, wind, flood</p></list-item>
          <list-item><p>18. island: bird, beach, garden, long_island, sand</p></list-item>
        </list>
        <p>As is common with topic models, some overlap between topics can occasionally be observed when examining the complete top-30 term lists, for example, between topics company and plant. Additionally, we find some apparent ‘outlier’ terms in all the topics.</p>
        <p>As a preliminary approximation, we tagged each text in the subcorpus with the predominant topic identified by the model, allowing us to track the evolution of topic coverage over time (see Figure 4). This LDA-based analysis highlights how the context of CC-related coverage in the NYTAC corpus shifts over time, for example from a framing within science and pollution debates to a discourse context in which greenhouse gas emissions were central. Further, our findings complement the manual inspection discussed in Section 3.3, illustrating how climate change discussions, while sometimes secondary in broader articles on government policy (topic ‘administration’), are integral to discussions on foreign policy (‘China’) and cultural topics (‘people’).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this paper, we introduced the NYTAC-CC, a specialized subcorpus of 3,630 climate change articles from the New York Times Annotated Corpus spanning 1987 to 2007, marking the first CC analysis with this dataset. Addressing the lack of available news-based textual resources for NLP tasks, we employed a hybrid method combining keyword-based prefiltering and automatic classification to optimize the corpus construction. The representativeness of the subcorpus was confirmed using ClimateBERT, but additional manual inspection of ClimateBERT’s classification of a relevant amount of true positives as (false) negatives also showed the model’s limitations and the benefits of the hybrid approach chosen.</p>
      <p>Initial analyses of the subcorpus, including statistics, keyword searches, and topic modeling, highlight the corpus’s potential for detailed diachronic and subtopic exploration.</p>
      <p>Thus, the NYTAC-CC subcorpus can be a useful resource for examining the historical narrative of climate change in news media. As it builds on the NYTAC corpus, it adds to previous work on this data, providing valuable insights for social science research. It also serves as a beneficial dataset for developing NLP applications that require a deep understanding of climate-related discourse.</p>
      <p>While the size of the subcorpus may restrict certain quantitative analyses, its rich, concentrated content is ideal for qualitative studies. Furthermore, it offers the potential for expansion and further integration with additional sources to enhance its utility and relevance for ongoing climate change research. Future work will expand on these findings with advanced topic modeling techniques and integrate more recent articles to enrich the diachronic analysis.</p>
    </sec>
    <sec id="sec-6">
      <title>A. List of Bigrams</title>
      <p>climate change, global warming, greenhouse effect, acid rain, ozone layer, greenhouse gases, fossil fuels, greenhouse emissions, ice shelves, ice sheets, rising sea, sea levels, Kyoto Protocol, Montreal Protocol, carbon footprint, carbon dioxide, carbon neutral, emission trading, feedback loop, global dimming, renewable energy, Stern Review.</p>
    </sec>
    <sec id="sec-7">
      <title>B. List of Keywords</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Bhowmick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <article-title>Omnia mutantur, nihil interit: Connecting past with present by finding corresponding terms across time</article-title>
          ,
          <source>in: Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2015</year>
          . URL: https://api.semanticscholar.org/CorpusID:1121386.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Bedathur</surname>
          </string-name>
          , G. Weikum,
          <article-title>Time-based exploration of news archives</article-title>
          ,
          <year>2010</year>
          . URL: https://api.semanticscholar.org/CorpusID:2353972.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kantner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Overbeck</surname>
          </string-name>
          ,
          <article-title>Exploring soft concepts with hard corpus-analytic methods</article-title>
          , in: N.
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Pichler</surname>
          </string-name>
          , J. Kuhn (Eds.), Reflektierte algorithmische Textanalyse, De Gruyter, Berlin,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Webersinke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kraus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bingler</surname>
          </string-name>
          , M. Leippold,
          <article-title>ClimateBERT: A Pretrained Language Model for Climate-Related Text</article-title>
          ,
          <source>in: Proceedings of AAAI 2022 Fall Symposium: The Role of AI</source>
          in Responding to Climate Challenges,
          <year>2022</year>
          . doi:https://doi.org/10.48550/arXiv.2212.13631.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Card</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Detecting stance in media on global warming</article-title>
          , in:
          <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>3296</fpage>
          -
          <lpage>3315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Effrosynidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karasakalidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sylaios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arampatzis</surname>
          </string-name>
          ,
          <article-title>The climate change twitter dataset</article-title>
          ,
          <source>Expert Syst. Appl.</source>
          <volume>204</volume>
          (
          <year>2022</year>
          ) 117541. URL: https://api.semanticscholar.org/CorpusID:248807383.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Samantray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pin</surname>
          </string-name>
          ,
          <article-title>Data and code for: Credibility of climate change denial in social media</article-title>
          (
          <year>2019</year>
          ). URL: https://doi.org/10.7910/DVN/LNNPVD. doi:10.7910/DVN/LNNPVD.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Diehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. G.</given-names>
            <surname>de Zúñiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Social media and beliefs about climate change: A cross-national analysis of news use, political ideology, and trust in science</article-title>
          ,
          <source>International Journal of Public Opinion Research</source>
          (
          <year>2019</year>
          ). URL: https://api.semanticscholar.org/CorpusID:214067785.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shehata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Johansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Johansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <article-title>Climate change frame acceptance and resistance: Extreme weather, consonant news, and personal media orientations</article-title>
          ,
          <source>Mass Communication and Society</source>
          <volume>25</volume>
          (
          <year>2021</year>
          )
          <fpage>51</fpage>
          -
          <lpage>76</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:238720934.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Trumbo</surname>
          </string-name>
          ,
          <article-title>Constructing climate change: claims and frames in US news coverage of an environmental issue</article-title>
          ,
          <source>Publ. Underst. Science</source>
          <volume>5</volume>
          (
          <year>1996</year>
          )
          <fpage>269</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Boykoff</surname>
          </string-name>
          ,
          <article-title>The cultural politics of climate change discourse in UK tabloids</article-title>
          ,
          <source>Political Geography</source>
          <volume>27</volume>
          (
          <year>2008</year>
          )
          <fpage>549</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Legagneux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Casajus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cazelles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chevallier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chevrinais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guéry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jacquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaffré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Naud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Noisette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ropars</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vissault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Archambault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bêty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Berteaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gravel</surname>
          </string-name>
          ,
          <article-title>Our house is burning: Discrepancy in climate change vs. biodiversity coverage in the media as compared to scientific literature</article-title>
          ,
          <source>Frontiers in Ecology and Evolution</source>
          <volume>5</volume>
          (
          <year>2018</year>
          ). URL: https://api.semanticscholar.org/CorpusID:39805874.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Boykoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boykoff</surname>
          </string-name>
          ,
          <article-title>Climate Change and Journalistic Norms: A Case-Study of US Mass-Media Coverage</article-title>
          ,
          <source>Geoforum</source>
          <volume>38</volume>
          (
          <year>2007</year>
          )
          <fpage>1190</fpage>
          -
          <lpage>1204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brossard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shanahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McComas</surname>
          </string-name>
          ,
          <article-title>Are issue-cycles culturally constructed? A comparison of French and American coverage of global climate change</article-title>
          ,
          <source>Mass Communication and Society</source>
          <volume>7</volume>
          (
          <year>2004</year>
          )
          <fpage>359</fpage>
          -
          <lpage>377</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Grundmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <article-title>The Discourse of Climate Change: A Corpus-based Approach</article-title>
          ,
          <source>Critical Approaches to Discourse Analysis across Disciplines</source>
          <volume>4</volume>
          (
          <year>2010</year>
          )
          <fpage>113</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Stecula</surname>
          </string-name>
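          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Merkley</surname>
          </string-name>
          ,
          <article-title>Framing Climate Change: Economics, Ideology, and Uncertainty in American News Media Content From 1988 to 2014</article-title>
          ,
          <source>Frontiers in Communication</source>
          <volume>4</volume>
          (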
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>NeuralNERE: Neural named entity relationship extraction for end-to-end climate change knowledge graph construction</article-title>
          , in:
          <source>ICML 2021 Workshop on Tackling Climate Change with Machine Learning</source>
          ,
          <year>2021</year>
          . URL: https://www.climatechange.ai/papers/icml2021/76.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Leippold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Varini</surname>
          </string-name>
          ,
          <article-title>Climatext: A dataset for climate change topic detection</article-title>
          , in:
          <source>NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning</source>
          ,
          <year>2020</year>
          . URL: https://www.climatechange.ai/papers/neurips2020/69.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Obermeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Randalls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Borie</surname>
          </string-name>
          ,
          <article-title>Framing the challenge of climate change in Nature and Science editorials</article-title>
          ,
          <source>Nature Climate Change</source>
          <volume>8</volume>
          (
          <year>2018</year>
          )
          <fpage>515</fpage>
          -
          <lpage>521</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ivanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <article-title>Media Attention for Climate Change around the World: A Comparative Analysis of Newspaper Coverage in 27 Countries</article-title>
          ,
          <source>Global Environmental Change</source>
          <volume>23</volume>
          (
          <year>2013</year>
          )
          <fpage>1233</fpage>
          -
          <lpage>1248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulme</surname>
          </string-name>
          ,
          <article-title>Why we disagree about climate change: Understanding controversy, inaction and opportunity</article-title>
          , Cambridge UP, Cambridge,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bingler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kraus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leippold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Webersinke</surname>
          </string-name>
          ,
          <article-title>How Cheap Talk in Climate Disclosures Relates to Climate Initiatives, Corporate Emissions, and Reputation Risk</article-title>
          , Working paper,
          <source>Available at SSRN 3998435</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>