NYTAC-CC: A Climate Change Subcorpus based on New York Times Articles

Francesca Grasso¹,∗,†, Ronny Patz²,† and Manfred Stede²,†

¹ University of Turin, Corso Svizzera 185, 10149, Turin, Italy
² University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany

Abstract
Over the past decade, the analysis of discourses on climate change (CC) has gained increased interest within the social sciences and the NLP community. Textual resources are crucial for understanding how narratives about this phenomenon are crafted and delivered. However, there is still a scarcity of datasets that cover CC in news media in a representative way. This paper presents a CC-specific subcorpus of 3,630 articles extracted from the 1.8-million-article New York Times Annotated Corpus, marking the first CC analysis on this data. The subcorpus was created by combining different methods for text selection to ensure representativeness and reliability, which we validate using ClimateBERT. To provide initial insights into the CC subcorpus, we discuss the results of a topic modeling experiment (LDA), which show the diversity of contexts in which CC is discussed in news media over time.

Keywords
Climate Change, Corpora, Topic Modeling

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
† These authors contributed equally.
Email: fr.grasso@unito.it (F. Grasso); ronny.patz@uni-potsdam.de (R. Patz); stede@uni-potsdam.de (M. Stede)
ORCID: 0000-0001-8473-9491 (F. Grasso); 0000-0002-0761-086X (R. Patz); 0000-0001-6819-2043 (M. Stede)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
We present NYTAC-CC, a topic-specific subcorpus of 3,630 articles addressing climate change (CC), derived from the New York Times Annotated Corpus (NYTAC). The subcorpus covers a 20-year period, drawing from NYTAC's collection of 1.8 million articles published between 1987 and 2007, which is available through the Linguistic Data Consortium. The original corpus, and thus also the subcorpus, includes a variety of metadata such as the 'desk' (the newspaper branch) and both manually and automatically labeled content categories, and many articles also feature hand-written summaries. The extensive use of NYTAC in NLP research over the last 15 years (e.g., [1, 2]) benefits CC researchers, allowing for detailed historical analysis of CC discussions in news media. This includes exploring how CC debates were interwoven with topics like domestic and foreign policy, science reporting, and arts and culture coverage. Unlike other CC-focused resources, which often contain shorter documents, the NYTAC-CC subcorpus offers a diverse array of articles of varying length and complex content, making it a unique resource for investigating the evolution of CC narratives over time.

The contribution of this paper is threefold:
(i) We present the NYTAC-CC subcorpus and its construction, which blends dictionary-based and supervised methods in order to ensure representativeness as well as validity and reliability, which are key in social science research [3]. This hybrid approach addresses the challenges of refining a topic-specific subcorpus from a larger corpus, aiming to mitigate the limitations of traditional keyword-based sampling, which often yields false positives.
(ii) To demonstrate the validity of the subcorpus, and thus its reliability for further downstream tasks, we report the results of a classification experiment using ClimateBERT [4]. While this experiment further confirms that the articles in our NYTAC-CC subcorpus are indeed true positives, it also reveals limitations of ClimateBERT: because ClimateBERT classifies a number of true positives from our subcorpus as (false) negatives, we show that our approach achieves better recall of relevant CC articles from the NYTAC corpus.
(iii) To gain initial insights into the coverage of the CC subcorpus, we use keyword analysis and topic modeling (specifically LDA) to track specifics of CC reporting over the 1987-2007 time span. The results show important trends over time, including key periods of reporting and a large variety of contexts in which CC is discussed.

Thus, our goal is to provide a substantively new and relevant subcorpus, developed and validated in multiple iterations, and then to give a first overview of the NYT's coverage of climate change during the period covered by our corpus. Although several studies have explored U.S. print media's reporting on anthropogenic CC, we cover an important 20-year period in which much of today's climate change discourse evolved.

2. Related Work: CC in News
Despite the growing interest in addressing climate change among various academic communities, as pointed out by Luo et al. [5], the topic has so far received limited attention within the 'core' NLP community. This is largely due to the NLP field's focus on standardized datasets and shared tasks, where the topic of CC has been scarcely addressed.
Efforts can be observed in the context of social media, where datasets have been made available for CC-related tasks [6, 7]. However, there remains a scarcity of work addressing CC at the news article level, which is essential for NLP research investigating CC narratives in media or performing downstream tasks involving longer texts. In contrast, CC discourse on both social media and traditional media has been studied extensively in various social science disciplines [8, 9]. In the following, we focus on prominent work targeting traditional news media.
A widely cited early study by Trumbo [10] examined the framing techniques used by various "claim makers" in the online editions of five U.S. newspapers. After querying with different terms and manually filtering the results, the remaining articles were investigated in depth. Boykoff [11] later studied the "claims and frames" issue in a similar manner. Legagneux et al. [12] conducted a comparative study of scientific literature and press articles to investigate differences between the coverage of CC and of biodiversity. They analyzed materials from the USA, Canada, and the United Kingdom spanning 1991 to 2016, using representative keywords to query and retrieve relevant content. Similarly, [13] examined how journalistic norms affected CC reporting in U.S. TV and newspapers. Other studies examined the frequency of CC mentions, or the 'attention cycle'. Brossard et al. [14] compared CC reporting between the NYT and the French Le Monde. Grundmann and Krishnamurthy [15] analyzed newspapers from four countries, complementing article counts with word frequency and collocation analyses using corpus-linguistic tools, with the outcomes interpreted manually. The work of [16] is one of the few instances where NLP technology is used to analyze CC in newspapers: the authors applied supervised classification to construct a corpus and identify frame categories within four U.S. papers. Also in the NLP domain, [4] used a specialized corpus that includes CC-related news articles, though details on data retrieval are not available. [17] compiled a dataset of 11k news articles from Science Daily through web scraping.
In conclusion, there remains a scarcity of available corpora containing larger text units such as entire articles, which are essential for NLP research investigating CC narratives in traditional media or performing various downstream tasks involving news articles.

3. Building the NYTAC-CC

3.1. Challenges in CC Text Selection
The New York Times Annotated Corpus (LDC release)¹ contains 1,855,658 articles (1987-2007), each formatted as a single XML file. Metadata include date, author, and newsroom desk. Articles are manually annotated with locations, people, organizations, and key topics. However, the topic labels are generally not sufficient for our purpose, that is, finding all CC-related articles, because (i) not all articles are labeled; (ii) some labels of potentially CC-relevant texts are overly broad, e.g., 'weather', which also encompasses many non-CC topics; and (iii) some articles we consider CC-relevant are tagged with labels that do not relate to CC.
Our goal is to design a retrieval method that not only ensures validity and reliability but also emphasizes representativeness, ensuring that the corpus adequately covers content related to the specific subject it aims to represent. Traditional approaches, such as the use of keywords or n-grams, can be inadequate if used alone and can lead to misclassifications due to both false positives and false negatives. Crucially, this holds even for advanced models, particularly when they process large linguistic units such as entire articles [18]. The changing use of language in corpora spanning long periods can further challenge single-method approaches, since they must handle texts that, although consistent in topic, may cover the phenomenon in varied ways over time.
Moreover, we aim for an approach that is reproducible, i.e., one that can also be applied to other corpora that do not come with this type of metadata. We have therefore opted for a hybrid approach that combines the advantages of keyword-based methods and automatic classification while aiming to overcome the weaknesses of both.

¹ https://www.ldc.upenn.edu

3.2. Our Hybrid Approach
Our subcorpus construction builds on text retrieval methods previously used in studies on CC discourse (see Section 2), but merges them into a hybrid approach that combines their strengths and compensates for their weaknesses. In the literature, we identified the following approaches:
1. Search with bigrams: typically, this involves terms like "climate change", sometimes accompanied by one or two others, notably "global warming" and "greenhouse effect"; e.g., [10, 12];
2. Search with a longer list of keywords, followed by manual filtering; e.g., [19, 18];
3. Complex Boolean queries with keywords and operators (AND, OR, NOT); e.g., [20];
4. Manual annotation of training data followed by supervised classification; e.g., [16].
As a first exploratory step, we experimented with method (1), obtaining the expected unsatisfactory results. We subsequently refined our retrieval process from the NYTAC by extending methods (2) and (4). Texts that we consider relevant to the CC topic must not merely mention CC in passing; they should discuss aspects of anthropogenic CC, relate substantial information, or convey a stance on its existence or urgency.

Bigram search. Initially, we experimented with a list of bigrams (see Appendix A) sourced from the BBC Climate Change Glossary², in order to cover the terminology used over the two decades spanned by the corpus. This method retrieved 10,707 articles. Upon manual inspection, we found that many were false positives, addressing general environmental issues but not specifically CC. Conversely, many articles we regarded as relevant did not contain the bigram "climate change" (searching for this bigram alone yielded only 2,080 texts). Consequently, we sought a more elaborate approach.

Keyword search. In response to the limited performance of the bigram search, we extracted CC-related articles using the keywords that [19] employed to identify topic-relevant articles in Nature and Science (see Appendix B). To these, we added the keyword "Kyoto", since during the period of our corpus the Kyoto conference had an importance similar to that of the later "Paris agreement". However, the resulting subcorpus still contained many false positives, primarily long list-like articles combining various news items. To ensure homogeneity, we excluded these articles, resulting in an intermediate corpus of 12,883 articles.

Text ranking and supervised classification. To reduce the number of false positives, we implemented an additional, more elaborate filtering step on the intermediate corpus. First, we heuristically ranked the articles for topic relevance, using a score based on accumulated keyword weights. This score reflects both the frequency of the keywords and their position within the article, as content at the beginning is generally considered most important. Specifically, we multiply the number of keyword occurrences in each sentence by a score representing sentence prominence (1 for the first sentence, 0.9 for the second, 0.8 for the third, and so on) and accumulate these products over the article. After automatically ranking the articles, we selected 450 articles for manual tagging: the top 150, the bottom 150, and 150 from the middle. We manually assessed whether they were at least partially about CC, using the labels '1' (CC-related) or '0' (not CC-related).
We used the manually annotated data to train and test an XGBoost classifier configured to differentiate between CC-related and non-CC articles. The features included keyword counts (those from [21], plus 'Kyoto'), the 50 most frequent 'topic' labels from the article metadata, and several binary features: whether an article was published by (i) the 'Dining' or 'Style' desks or by (ii) other desks; whether it was published on the weekend; whether a keyword appeared in the title or the first paragraph; and whether the article was (i) an opinion piece or a letter or (ii) another type of article. The classifier achieved a precision of 1.0 and a recall of 0.94 on our held-out evaluation set of 100 texts. Subsequently, we used the classifier to label the entire intermediate corpus, labeling 9,253 articles as not CC-related and 3,630 as CC-related. The latter form what we now refer to as our final 'NYTAC climate change subcorpus', which we make available as a list of document IDs.³ Figure 1 illustrates the features that had the greatest impact on the classification decisions.

Figure 1: Key features in classifying "climate change" articles

² https://www.bbc.com/news/science-environment-11833685
³ https://github.com/discourse-lab/NYTAC-CC

3.3. Evaluation with ClimateBERT
We aim to demonstrate (i) that our 3,630-article subcorpus genuinely consists of CC-related articles and, thereby, (ii) the validity of our combined method for retrieving topic-consistent texts from a larger, heterogeneous collection while minimizing false positives. To perform this validation, we employed ClimateBERT, specifically ClimateBert_F [4], a BERT-based model trained on CC-related texts. In particular, we used distilroberta-base-climate-detector from the Hugging Face platform [22], a fine-tuned version with a classification head for detecting climate-related paragraphs. Given its specialization in CC-related texts, we deemed ClimateBERT a very suitable tool to confirm the accuracy of our dataset. In doing so, we also indirectly assess the model's ability to detect CC-related content within longer texts. As the model's context length is limited to 512 tokens, we addressed this limitation with the two approaches described below.
In the first approach, longer texts were truncated to the model's maximum context length. Of the 3,630 instances, the model recognized 3,468 articles as +climate. We manually inspected the remaining 162 texts classified as -climate, i.e., as false negatives.
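Truncation produces such false negatives when the relevant passage lies beyond the first 512 tokens. A chunk-and-aggregate rule, like the sliding-window approach described next, avoids this. The small sketch below illustrates the difference; it is not the evaluation code used here. The whitespace tokens and the keyword-based `toy_climate_classifier` are hypothetical stand-ins for the model's subword tokenizer and for distilroberta-base-climate-detector. The overlapping `stride` is also an assumption; setting `stride=max_len` gives non-overlapping chunks.

```python
from typing import Callable, List, Sequence

# Hypothetical chunk-level classifier: takes a list of tokens and returns
# True for "+climate". In the real setup this would wrap the ClimateBERT model.
ChunkClassifier = Callable[[Sequence[str]], bool]


def toy_climate_classifier(chunk: Sequence[str]) -> bool:
    # Stand-in for the neural model, so the sketch runs offline.
    text = " ".join(chunk).lower()
    return "climate change" in text or "global warming" in text


def classify_truncated(tokens: Sequence[str], clf: ChunkClassifier,
                       max_len: int = 512) -> bool:
    # First approach: only the first max_len tokens are seen by the model.
    return clf(tokens[:max_len])


def make_windows(tokens: Sequence[str], max_len: int = 512,
                 stride: int = 256) -> List[Sequence[str]]:
    # Split a token sequence into windows of at most max_len tokens,
    # starting a new window every `stride` tokens.
    if stride <= 0:
        raise ValueError("stride must be positive")
    if len(tokens) <= max_len:
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


def classify_sliding(tokens: Sequence[str], clf: ChunkClassifier,
                     max_len: int = 512, stride: int = 256) -> bool:
    # Second approach: the article is +climate if ANY window is +climate.
    return any(clf(w) for w in make_windows(tokens, max_len, stride))


if __name__ == "__main__":
    # 600 unrelated tokens followed by a CC mention beyond the 512-token limit.
    article = ["filler"] * 600 + ["climate", "change"]
    print(classify_truncated(article, toy_climate_classifier))
    print(classify_sliding(article, toy_climate_classifier))
```

In this toy article, the truncated call returns False because the mention sits at positions 600-601, while the second window (tokens 256-601) contains it, so the sliding-window call returns True.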
We found that the model clearly misclassified 75 of these texts; in 48 of them, the relevant CC content appeared beyond the initial 512 tokens. More qualitative insights on these 162 texts are provided in Section 3.4.
In addition, we attempted a second approach to overcome the context length constraint, using a sliding window technique. This involved splitting longer texts (> 512 tokens) into chunks, classifying each chunk, and labeling the entire text as +climate if any of its chunks was labeled as such. This second approach led to markedly different results: only 3 of the 3,630 instances were labeled -climate.
These results demonstrate both the representativeness of our corpus and the validity of our hybrid subcorpus selection method. In addition, they show that automatic classification models can be limited when dealing with long text units, which reinforces the need for a combined approach to building topic-relevant (sub)corpora.

3.4. Analysis of the ClimateBERT misclassifications
As discussed in Section 3.3, we manually inspected the 162 articles in our subcorpus that ClimateBERT classified as -climate in the truncation setting. Of these, 75 were clearly related to CC. Specifically, 48 articles featured substantial discussions of CC-related issues beyond the model's 512-token limit. Another 27 articles contained detailed CC narratives within the first 512 tokens, often intersecting with other topics such as politics (e.g., conferences on CC) and population (e.g., CC impacts on specific regions). These misclassifications point to limitations of the model that go beyond the input length constraint, underscoring the challenges of handling topic intersections.
In the remaining 87 articles, CC was not the primary focus but was still mentioned. In particular, 51 articles included CC in contexts marginally related to their main narratives, integrating CC with other discussions. In another 36 articles, CC was a secondary topic, sometimes mentioned only in passing, such as references to the Kyoto Protocol or metaphorical uses of global warming.

Figure 2: Monthly article count in CC subcorpus

4. Overview of NYTAC-CC
In this section, we provide an initial overview of the NYTAC-CC coverage, including the distribution of articles over time and a preliminary exploration of subtopics.

4.1. Temporal and Keyword highlights
We examine the temporal distribution of articles and key lexical features in our corpus to illuminate trends and shifts in CC coverage over time (see Figure 2).
The analysis reveals a peak in 1990, with up to 50 articles per month, followed by a decline to 20 articles per month in the mid-1990s. After the Kyoto Protocol in December 1997, the curve shows a steady rise with intermittent bursts of coverage. In the figure, we have marked important 'climate events' in the years they occurred.
Figure 3 plots the frequency ratios of the top eight lexical features identified by the classifier (cf. Figure 1) over time and illustrates the dominance of 'greenhouse' in the late 1980s. 'Warming' remains the most frequent term throughout, but in the final years 'climate' gains prominence. This suggests a shift in term preference from 'global warming' to 'climate change', a transition noted in various other studies as well. Also, the two 'Kyoto' events are clearly visible: the international accord was reached in 1997, and the Bush administration's decision not to ratify it came in 2001.
At the same time, we find that many articles focused on weather or pollution addressed these issues directly, mentioning climate change only tangentially. This reduces the co-occurrence of other prominent CC terms in these articles.

Figure 3: Keyword distributions over time

4.2. Document Structuring with LDA
Building on the basic statistics discussed in the previous subsection, we explored the range of subtopics within the CC corpus more deeply using topic modeling, specifically Latent Dirichlet Allocation (LDA). This approach helps to uncover underlying thematic structures in the data that are not immediately apparent from simple keyword analysis.
Preprocessing Steps. To prepare the texts for LDA, we performed several preprocessing steps on article titles and bodies, including removing punctuation, lemmatizing words, and converting all text to lowercase to ensure consistency. We also joined frequently co-occurring bigrams into single terms (pseudowords such as global_warming) to preserve important phrases. For topic modeling, we kept only nouns and proper nouns that ranked among the 10,000 most frequent terms and had more than two letters. This refinement allowed us to emphasize key entities and their relationships, which are central to the content of the articles, and to avoid diluting thematic significance with less informative parts of speech. The pseudowords further enhanced consistency.
Model Selection. The best LDA model was chosen based on the coherence score, calculated using the Python Gensim library. This provides a more objective selection criterion and reduces reliance on subjective interpretation. We prioritized coherence to ensure that the topics generated by the model are interpretable and meaningful. The optimal model identified 18 topics, with a coherence score of 0.56, indicating a reasonable level of interpretability. We chose the highest-ranked term as the 'name' of each topic and list five additional representative terms below:
1. emission: country, world, greenhouse_gas, carbon_dioxide, global_warming
2. administration: president, policy, white_house, bill, congress
3. people: time, life, book, world, earth
4. scientist: temperature, climate, study, research, university
5. energy: oil, fuel, gas, production, power
6. city: new_york, people, park, town, mayor, manhattan
7. company: business, project, program, group, director
8. global_warming: report, climate_change, scientist, panel, editor
9. plant: coal, company, emission, power, utility
10. water: area, land, river, population, fish
11. state: pollution, air, ozone, epa, smog
12. china: government, people, war, security, country
13. car: vehicle, fuel, gasoline, hydrogen, auto
14. ice: sea, arctic, ocean, glacier, bear
15. forest: tree, plant, species, fire, crop
16. weather: winter, temperature, snow, degree, heat
17. storm: el_nino, drought, hurricane, wind, flood
18. island: bird, beach, garden, long_island, sand
As is common with topic models, some overlap between topics can be observed when examining the complete top-30 term lists, for example between the topics company and plant. Additionally, we find some apparent 'outlier' terms in all topics.
As a preliminary approximation, we tagged each text in the subcorpus with the predominant topic identified by the model, which allows us to track the evolution of topic coverage over time (see Figure 4). This LDA-based analysis highlights how the context of CC-related coverage in the NYTAC corpus shifts over time, for example from a framing within science and pollution debates to a discourse context in which greenhouse gas emissions are central. Further, our findings complement the manual inspection discussed in Sections 3.3 and 3.4. They illustrate how climate change discussions, while sometimes secondary in broader articles on government policy (topic 'administration'), are integral to discussions of foreign policy ('china') and cultural topics ('people').

Figure 4: Topic coverage over the 20-year period

5. Conclusion and Future Work
In this paper, we introduced NYTAC-CC, a specialized subcorpus of 3,630 climate change articles from the New York Times Annotated Corpus spanning 1987 to 2007, marking the first CC analysis of this dataset. Addressing the lack of available news-based textual resources for NLP tasks, we employed a hybrid method combining keyword-based prefiltering and automatic classification to optimize corpus construction. The representativeness of the subcorpus was confirmed using ClimateBERT. Additional manual inspection of the considerable number of true positives that ClimateBERT classified as (false) negatives revealed the model's limitations and the benefits of our hybrid approach.
Initial analyses of the subcorpus, including statistics, keyword searches, and topic modeling, highlight its potential for detailed diachronic and subtopic exploration.
Thus, the NYTAC-CC subcorpus can be a useful resource for examining the historical narrative of climate change in news media. As it builds on the NYTAC corpus, it adds to previous work on this data, providing valuable insights for social science research. It also serves as a beneficial dataset for developing NLP applications that require a deep understanding of climate-related discourse. While the size of the subcorpus may restrict certain quantitative analyses, its rich, concentrated content is well suited to qualitative studies. Furthermore, it can be expanded and integrated with additional sources to enhance its utility and relevance for ongoing climate change research. Future work will expand on these findings with advanced topic modeling techniques and integrate more recent articles to enrich the diachronic analysis.

References
[1] Y. Zhang, A. Jatowt, S. S. Bhowmick, K. Tanaka, Omnia mutantur, nihil interit: Connecting past with present by finding corresponding terms across time, in: Annual Meeting of the Association for Computational Linguistics, 2015. URL: https://api.semanticscholar.org/CorpusID:1121386.
[2] O. Alonso, K. Berberich, S. J. Bedathur, G. Weikum, Time-based exploration of news archives, 2010. URL: https://api.semanticscholar.org/CorpusID:2353972.
[3] C. Kantner, M. Overbeck, Exploring soft concepts with hard corpus-analytic methods, in: N. Reiter, A. Pichler, J. Kuhn (Eds.), Reflektierte algorithmische Textanalyse, De Gruyter, Berlin, 2020.
[4] N. Webersinke, M. Kraus, J. Bingler, M. Leippold, ClimateBERT: A Pretrained Language Model for Climate-Related Text, in: Proceedings of AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate Challenges, 2022. doi:10.48550/arXiv.2212.13631.
[5] Y. Luo, D. Card, D. Jurafsky, Detecting stance in media on global warming, in: Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 2020, pp. 3296–3315.
[6] D. Effrosynidis, A. Karasakalidis, G. Sylaios, A. Arampatzis, The climate change twitter dataset, Expert Syst. Appl. 204 (2022) 117541. URL: https://api.semanticscholar.org/CorpusID:248807383.
[7] A. Samantray, P. Pin, Data and code for: Credibility of climate change denial in social media (2019). URL: https://doi.org/10.7910/DVN/LNNPVD. doi:10.7910/DVN/LNNPVD.
[8] T. Diehl, B. Huber, H. G. de Zúñiga, J. H. Liu, Social media and beliefs about climate change: A cross-national analysis of news use, political ideology, and trust in science, International Journal of Public Opinion Research (2019). URL: https://api.semanticscholar.org/CorpusID:214067785.
[9] A. Shehata, J. Johansson, B. Johansson, K. Andersen, Climate change frame acceptance and resistance: Extreme weather, consonant news, and personal media orientations, Mass Communication and Society 25 (2021) 51–76. URL: https://api.semanticscholar.org/CorpusID:238720934.
[10] C. Trumbo, Constructing climate change: claims and frames in US news coverage of an environmental issue, Publ. Underst. Science 5 (1996) 269–283.
[11] M. Boykoff, The cultural politics of climate change discourse in UK tabloids, Political Geography 27 (2008) 549–569.
[12] P. Legagneux, N. Casajus, K. Cazelles, C. Chevallier, M. Chevrinais, L. Guéry, C. Jacquet, M. Jaffré, M.-J. Naud, F. Noisette, P. Ropars, S. Vissault, P. Archambault, J. Bêty, D. Berteaux, D. Gravel, Our house is burning: Discrepancy in climate change vs. biodiversity coverage in the media as compared to scientific literature, Frontiers in Ecology and Evolution 5 (2018). URL: https://api.semanticscholar.org/CorpusID:39805874.
[13] M. Boykoff, J. Boykoff, Climate Change and Journalistic Norms: A Case-Study of US Mass-Media Coverage, Geoforum 38 (2007) 1190–1204.
[14] D. Brossard, J. Shanahan, K. McComas, Are issue-cycles culturally constructed? A comparison of French and American coverage of global climate change, Mass Communication and Society 7 (2004) 359–377.
[15] R. Grundmann, R. Krishnamurthy, The Discourse of Climate Change: A Corpus-based Approach, Critical Approaches to Discourse Analysis across Disciplines 4 (2010) 113–133.
[16] D. A. Stecula, E. Merkley, Framing Climate Change: Economics, Ideology, and Uncertainty in American News Media Content From 1988 to 2014, Frontiers in Communication 4 (2019).
[17] P. Mishra, R. Mittal, NeuralNERE: Neural named entity relationship extraction for end-to-end climate change knowledge graph construction, in: ICML 2021 Workshop on Tackling Climate Change with Machine Learning, 2021. URL: https://www.climatechange.ai/papers/icml2021/76.
[18] M. Leippold, F. S. Varini, ClimaText: A dataset for climate change topic detection, in: NeurIPS 2020 Workshop on Tackling Climate Change with Machine Learning, 2020. URL: https://www.climatechange.ai/papers/neurips2020/69.
[19] M. Hulme, N. Obermeister, S. Randalls, M. Borie, Framing the challenge of climate change in Nature and Science editorials, Nature Climate Change 8 (2018) 515–521.
[20] A. Schmidt, A. Ivanova, M. S. Schäfer, Media Attention for Climate Change around the World: A Comparative Analysis of Newspaper Coverage in 27 Countries, Global Environmental Change 23 (2013) 1233–1248.
[21] M. Hulme, Why we disagree about climate change: Understanding controversy, inaction and opportunity, Cambridge UP, Cambridge, 2009.
[22] J. Bingler, M. Kraus, M. Leippold, N. Webersinke, How Cheap Talk in Climate Disclosures Relates to Climate Initiatives, Corporate Emissions, and Reputation Risk, Working paper, Available at SSRN 3998435, 2023.

A. List of Bigrams
climate change, global warming, greenhouse effect, acid rain, ozone layer, greenhouse gases, fossil fuels, greenhouse emissions, ice shelves, ice sheets, rising sea, sea levels, Kyoto Protocol, Montreal Protocol, carbon footprint, carbon dioxide, carbon neutral, emission trading, feedback loop, global dimming, renewable energy, Stern Review.

B. List of Keywords
climate, atmosphere, weather, warming, carbon, greenhouse, pollution.
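The following Python sketch uses the keywords listed above to illustrate the keyword prefilter and the heuristic relevance ranking described in Section 3.2. It is an illustrative reconstruction, not the authors' released code. Several details are assumptions: the regex sentence splitter, exact lowercase token matching, the "any keyword" prefilter, clamping the prominence weight at zero from the eleventh sentence on, and taking the "middle 150" as a centred slice. The exclusion of list-like articles is not modeled.

```python
import re
from typing import Iterable, List, Sequence

# Appendix B keywords plus "kyoto" (added for the 1987-2007 period).
KEYWORDS = frozenset({"climate", "atmosphere", "weather", "warming",
                      "carbon", "greenhouse", "pollution", "kyoto"})


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def keyword_count(sentence: str, keywords: Iterable[str] = KEYWORDS) -> int:
    # Count lowercase alphabetic tokens that exactly match a keyword.
    kw = set(keywords)
    return sum(1 for tok in re.findall(r"[a-z]+", sentence.lower()) if tok in kw)


def prominence(index: int) -> float:
    # 1.0 for the first sentence (index 0), 0.9 for the second, and so on;
    # clamped at 0 once the weight would become negative (assumption).
    return max(0.0, 1.0 - 0.1 * index)


def relevance_score(text: str) -> float:
    # Accumulate keyword count times sentence prominence over all sentences.
    return sum(keyword_count(s) * prominence(i)
               for i, s in enumerate(split_sentences(text)))


def passes_prefilter(text: str) -> bool:
    # Hypothetical prefilter: keep an article if any keyword occurs at all.
    return keyword_count(text) > 0


def sample_for_annotation(ranked_ids: Sequence[str], k: int = 150) -> List[str]:
    # Top k, a centred middle slice of k, and bottom k of a ranked ID list.
    mid = (len(ranked_ids) - k) // 2
    return (list(ranked_ids[:k]) + list(ranked_ids[mid:mid + k])
            + list(ranked_ids[-k:]))
```

For example, "Global warming is real. Carbon emissions rise. Nothing here." scores 1 × 1.0 + 1 × 0.9 + 0 × 0.8 = 1.9. Ranking would then amount to sorting document IDs by `relevance_score` in descending order before passing them to `sample_for_annotation`.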