<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Dec</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>How Green is Sentiment Analysis? Environmental Topics in Corpora at the University of Turin</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Okky Ibrohim</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Indra Budi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CRISIS - Centro di Ricerca Interuniversitario sui cambiamenti Socio-ecologici e la transizione alla Sostenibilità</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica - Università degli Studi di Torino</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Computer Science - Universitas Indonesia</institution>
          ,
          <country country="ID">Indonesia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>langing.ai</institution>
          ,
          <country country="ID">Indonesia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>02</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Despite the unanimous recognition of the plight associated with environmental phenomena and the proliferation of the discourse about it, there is still little work on these issues in the field of NLP. This paper reports on the activities we are carrying out at the University of Turin in the application of Sentiment Analysis to environmental topics. In pursuit of the goal of developing resources and tools specifically designed for addressing the complexity of the ongoing environmental debate, we are currently focused on exploring the language used for green issues and defining annotation schemes that can describe them at different levels of granularity.</p>
      </abstract>
      <kwd-group>
        <kwd>environment</kwd>
        <kwd>corpora</kwd>
        <kwd>sentiment analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>It has become increasingly common to apply Sentiment Analysis (SA) and text classification to issues with social impact about which people debate. On the one hand, studying a socially impacting phenomenon from such a computational perspective means creating a precise conceptual and linguistic model, thereby achieving a greater understanding of its characteristics, its dynamics, and, not least, how people perceive it. On the other hand, it is a matter of creating tools that can help policymakers and citizens define strategies to address the problems associated with the phenomenon, bearing in mind that the impact of an intervention depends meaningfully on how it is proposed by governments and political parties and accepted by citizens.</p>
      <p>Among the issues that have a unique social importance today are certainly those related to the environment in which we live. As far as the environmental emergency is concerned, one cannot but notice at first sight that environmental issues involve great complexity. This is due to the mixing of natural and human entities and related interests, such as individuals and public and private organisations on the one side, and climate, animals and plants on the other. The language used to describe and discuss environmental topics also mirrors this complexity and is characterized by a certain degree of specialization.</p>
      <p>Modelling this reality can therefore be especially complex but also particularly useful, because it ultimately allows us to better understand the relationship between humans and the environment and to be more aware of the sensitivity towards the environment which is hidden in us.</p>
      <p>The characteristics of the discourse about the environment can make the classification of opinions expressed about it especially challenging. We hypothesize that an accurate annotation of data about environmental topics can help to achieve reliable results, e.g., in the detection of polarity or stance in these texts. According to this hypothesis, we are following two major directions: a) a preliminary analysis of the linguistic features of the discourse about the environment as it is carried out in different text genres, and b) the design of specific annotation schemes that take into account the specific features of these texts, and their application to selected corpora. The first direction allowed us to better understand the meaning of the widespread discussion about the language used in green communication. This was also useful in preparing the ground for the second direction of re[...] which environmental topics are addressed by applying SA and in which only fairly rough techniques were used.</p>
      <p>In this paper, we describe a variety of experiences carried out at the Department of Computer Science of the University of Turin in the development of corpora and tools for SA applied to environmental topics during the last few years.</p>
      <p>The paper is organized as follows. The next section briefly surveys previous work related to the application of SA to environmental topics. Section three focuses on the collection of data, while the fourth is about the annotation schemes we adopted. Finally, the last section provides some conclusions and hints about our future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>There is a huge amount of popularization and communication about environmental issues, related in particular to products and services. A 2020 EU Commission study found that more than half of the environmental claims examined in the EU were vague, misleading or unfounded, while 40% were completely unfounded<sup>1</sup>. In Section 3.3, we moreover show that it can be difficult for citizens to understand the exact meaning of texts discussing issues related to the environment, which makes it easier for readers to be misled about their content.</p>
      <p>To explore SA applied to environmental topics, researchers have conducted reviews and surveys providing different perspectives. In particular, in [<xref ref-type="bibr" rid="ref2">2</xref>], a review is conducted to explore the application of SA in the climate change debate. [<xref ref-type="bibr" rid="ref3">3</xref>] explore the use of SA for analyzing opinions on several smart city issues like climate change, urban policy, energy, and traffic. While [<xref ref-type="bibr" rid="ref2">2</xref>] explore papers that used various types of data sources (i.e. news articles, social media, etc.), [<xref ref-type="bibr" rid="ref3">3</xref>] explore only papers that analyze sentiment in social media. However, neither [<xref ref-type="bibr" rid="ref2">2</xref>] nor [<xref ref-type="bibr" rid="ref3">3</xref>] provides an in-depth exploration of the NLP techniques (from the creation of the dataset to the evaluation of SA models) that researchers used when applying SA to natural environment topics, since they cover only a few of the large variety of topics closely related to nature and the environment, like food or carbon issues.</p>
      <p><sup>1</sup> https://quifinanza.it/green/stop-al-greenwashing-in-etichetta-cosa-vuole-fare-lue/699054/</p>
    </sec>
    <sec id="sec-3">
      <title>3. Exploring Green Language</title>
      <p>The first step in our investigation consisted of a linguistic analysis of the discourse about the environment, which we applied to documents that public institutions or online journals publish to inform citizens about these topics. Applying a multilingual perspective, we collected texts from an institutional website in Italian and English, and from some Italian journals in which environmental topics are discussed. The first sample of data, described in Section 3.1, is the result of a random collection, while the second one, described in Section 3.2, was collected using keywords about a specific topic related to the environment, i.e. livestock.</p>
      <sec>
        <title>3.1. European Environment Agency</title>
        <p>The European Environment Agency<sup>2</sup> (EEA) is an agency of the European Union that delivers knowledge and data to support Europe’s environment and climate goals. Since 1994, the EEA and the European Environment Information and Observation Network<sup>3</sup> (Eionet) have provided data and information on Europe’s climate and environment to citizens, decision-makers and European politicians, publishing articles and more extensive reports that address topics such as the state of air quality, or sets of inter-connected or systemic issues, such as the mobility system.</p>
        <p>We collected Italian and English data from the EEA website and built two comparable corpora composed of 10 reports each. The Italian corpus (henceforth EEA-Ita) includes 14,612 tokens corresponding to 556 sentences, while the English corpus (henceforth EEA-Eng) is composed of 11,778 tokens corresponding to 562 sentences.</p>
        <p>A qualitative analysis based on the frequency lists obtained with SketchEngine shows that the most used terms in both corpora, Italian and English, refer to the theme of sustainable-environmental quality, but with a slight nuance that differentiates Italian from English. The most frequent terms in the Italian corpus concern especially the conservation of oceans and seas and the sustaining and conservation of the Earth’s ecosystem. In the English corpus, instead, we find a higher frequency of terms related to climate change. In both cases, these are not terms of high specialisation, that is, terms that are difficult to understand for the great majority of citizens, but technical terms relating to the field of reference, and therefore not easily traceable in other contexts. For example, in the Italian corpus, we can highlight words such as “siccità” (drought), “effetto serra” (greenhouse effect), “ecosistema” (ecosystem), “inquinamento” (pollution), “suolo” (soil), “microplastiche e nano plastiche” (microplastics and nanoplastics), while in the English one “pollution”, “climate change”, “adaptation”, “mitigation”, “habitat”.</p>
        <p><sup>2</sup> https://www.eea.europa.eu/en</p>
        <p><sup>3</sup> https://www.eionet.europa.eu/</p>
      </sec>
      <sec>
        <title>3.2. Livestock Issues</title>
        <p>The livestock sector is currently at the center of a heated debate that has focused mainly on intensive farming. Among the several publications in which these issues are presented and discussed, we selected a sample of texts from online journals, mostly from CREA Futuro but also from L’informatore agrario and agricultura.it. Our corpus is composed of 20,854 words (4,386 different lemmas) corresponding to 24,383 tokens, organized into 725 sentences and 21 documents.</p>
        <p>CREA Futuro is an initiative of CREA (Consiglio per la Ricerca in Agricoltura e l’analisi dell’Economia agraria)<sup>4</sup>, the leading Italian research organization dedicated to the agri-food supply chains, supervised by the Ministry of Agriculture, Food Sovereignty and Forests, and organized in 12 research centres. This online publication<sup>5</sup> is aimed at citizens and offers authoritative information based on scientific evidence. From the CREA Futuro website, we selected a sample composed of 11 documents. The other texts are from the freely accessible web versions of two journals, namely L’informatore agrario<sup>6</sup> (8 documents) and agricultura.it<sup>7</sup> (2 documents).</p>
        <p>As expected, the frequency lists collected using SketchEngine show that the words occurring more than 40 times are "produzione" (production), "animali" (animals), "carne" (meat), "acqua" (water), "latte" (milk), "allevamento" (farming), "zootecnia" (livestock), "benessere" (welfare) and "stress".</p>
      </sec>
      <sec id="sec-3-1">
        <title>3.3. How difficult is it to read green texts?</title>
        <p>All the texts we collected about green topics are intended for a general audience, but we want to understand how specialized they are, and thus how readable for a citizen. We calculated the readability scores for each of them. Different metrics are used for expressing the readability of different languages, and we selected two of the most used ones for the two observed languages.</p>
        <p>For Italian texts, we used the Gulpease index<sup>8</sup>, whose scales are reported in Figure 1. The Gulpease index has been calculated separately for the 10 reports of the EEA-Ita corpus, showing values that vary from 45 to 53, for the least and the most readable text respectively (see Table 1). This means that the reports are unreadable for readers with a primary school diploma, hard to read for readers with a secondary school diploma, and easy to read for the others. According to this index, our texts are on average readable and not particularly specialized, with the exception of some terms.</p>
        <p>The Gulpease index was also calculated for the 21 documents of the Livestock-Ita corpus, showing that they are less readable than the EEA’s reports. Considering that the hardest-to-read document has a Gulpease index of 28 and the easiest an index of 45, they also show a larger variation.</p>
        <p><sup>4</sup> https://www.crea.gov.it/en/home</p>
        <p><sup>5</sup> https://creafuturo.crea.gov.it/</p>
        <p><sup>6</sup> https://www.informatoreagrario.it/</p>
        <p><sup>7</sup> https://www.agricultura.it/</p>
        <p><sup>8</sup> The index can be calculated using the formula provided in [<xref ref-type="bibr" rid="ref4">4</xref>] and implemented in online calculators, such as https://www.webandmultimedia.it/site/index.php?area=5&amp;subarea=1&amp;formato=scheda&amp;id=36.</p>
        <p>Finally, we used the Flesch–Kincaid index<sup>9</sup> for evaluating the readability of English texts. The values of this index broadly correspond to those of the Gulpease index: values from 100 to 90 are associated with very easy to read texts, from 89 to 80 with easy, from 79 to 70 with fairly easy, and from 69 to 60 with standard readability. Values below 60 are instead associated with difficult-to-read texts: from 59 to 50 fairly difficult, from 49 to 30 difficult, and from 29 to 0 very difficult or almost unreadable without a higher level of schooling.</p>
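        <p>The bands above can be made concrete with a short sketch. The 0 to 100 scale described here matches the Flesch Reading Ease score, whose standard formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words); we assume this is the formula behind the reported values. Syllable counting is left to the caller, and the function names are ours.</p>
        <preformat><![CDATA[
```python
# Lower bound of each band, taken from the scale described in the text.
BANDS = [
    (90, "very easy"),
    (80, "easy"),
    (70, "fairly easy"),
    (60, "standard"),
    (50, "fairly difficult"),
    (30, "difficult"),
    (0, "very difficult"),
]


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula, computed from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def readability_band(score):
    """Map a score to the band names used in the text."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "very difficult"  # the formula can yield negative scores


# 100 words, 5 sentences and 150 syllables: 206.835 - 20.3 - 126.9 = 59.635.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3), readability_band(score))
```
]]></preformat>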
        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Corpus</th><th>Max G</th><th>Min G</th><th>Var G</th><th>Max F</th><th>Min F</th><th>Var F</th></tr>
            </thead>
            <tbody>
              <tr><td>EEA-Italian</td><td>53</td><td>45</td><td>8</td><td></td><td></td><td></td></tr>
              <tr><td>Livestock-Italian</td><td>45</td><td>28</td><td>17</td><td></td><td></td><td></td></tr>
              <tr><td>EEA-English</td><td></td><td></td><td></td><td>46.25</td><td>20.24</td><td>26.01</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For the English EEA reports, the Flesch–Kincaid index score varies from 20.24 to 46.25, for the least and the most readable text respectively. This means that the same typology of texts observed for Italian is characterized by a higher specialization and meaningfully lower readability. The hardest-to-read reports are suitable only for readers with a postgraduate education, and even the less difficult ones can be hard to read for undergraduates.</p>
        <p><sup>9</sup> This index is described in [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Developing corpora from social media about environmental topics</title>
      <p>The observations based on texts published by the EEA and in online journals gave us a clearer idea of how language is used for communicating with citizens and discussing environmental topics. Similar topics are also discussed on social media, and we collected data from Twitter in order to build datasets useful for advancing the application of classification tasks and SA to environmental topics.</p>
      <p>Italian data: We collected from Twitter, in a time slot spanning from February 2nd 2022 to March 4th 2022, a total of 8,756 tweets (including some duplicated messages in which more than one of the keywords occurs). They were filtered using the following set of keywords: "Transizione energetica" (energy transition), "Agenda 2030", "Crisi climatica" (climate crisis), "Combustibili fossili" (fossil fuels), "Deforestazione" (deforestation), "Greenwashing", "Riscaldamento globale" (global warming), "Impatto ambientale" (environmental impact), "Climate Change", "Green Deal", "Sviluppo sostenibile" (sustainable development), "COP26", "Energie rinnovabili" (renewable energy).</p>
      <p>English data: We collected from Twitter, within the date range from 12 September 2022 to 30 September 2022, a larger amount of data. In collecting this dataset, we used 120 queries from 10 environmental topics, namely "Environment", "Green", "Sustainability", "Food", "Organism", "Climate Change", "Carbon", "Energy", "Waste", and "Pollution". These 10 environmental topics are obtained from the systematic review conducted by [<xref ref-type="bibr" rid="ref1">1</xref>], while the queries are obtained from the surveyed papers. We obtained a total of 495,970 tweets, including several duplicated messages, since we used many keywords to collect the data.</p>
      <sec id="sec-4-2">
        <title>4.1. Annotation Schemes for Environmental Topics</title>
        <p>We applied three different forms of annotation to our data: one is based on the stance of the user against or in favour of the environmental topics and related politics, one is a fine-grained structured sentiment analysis annotation, and the last one is a sentiment term extraction annotation. The first and second schemes have been applied to the Italian data only, while the last scheme has been applied to the English corpus.</p>
        <p>As far as stance is concerned, we used the basic scheme based on 3 labels, i.e. Against, Favour, Neutral, also considering Off-topic for the annotation of unclear messages.</p>
        <p>In the fine-grained structured SA scheme, there are instead two label types that need to be annotated, i.e. Spans and Relations. While Span labeling means identifying a set of adjacent or closely connected words, Relation labeling means identifying a relation between two entities annotated as Spans.</p>
        <p>Each Span may represent a Holder, an Expression, a Target, or a Topic. A Holder can be a Citizen (an ordinary person/group not affiliated with any official community/organization), a Government (a central or sub-unit government or its stakeholders), a Political Party (a political party or its stakeholders), a Media (a mass media or its stakeholders), a Company (a company or its stakeholders), a Private Foundation (a private foundation or its stakeholders), or an NGO (Non-Governmental Organization). An Expression can be Positive or Negative. The same entities that can be annotated as Holders can also be annotated as Targets. Topics include the general label Environment, but also more specific labels, i.e., the 10 environmental topics we used to collect the English dataset, obtained from [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
        <p>Relations are used for labeling the relationship between the Expression and its Holder, Target, or Topic. This allows us to group the Expression and its proper Holder, Target, or Topic, also considering that one tweet can include more than one Expression and each Expression may be linked to a different Holder, Target and Topic. We also annotate Coreference as an additional relation label. For this fine-grained structured SA annotation, we used the annotation tool provided by Langing Annotate<sup>10</sup>. An example of annotation for this fine-grained scheme can be seen in Figure 2: the text contains two Expressions of negative sentiment. If we wrap each Expression and its Holder, Target, and Topic using a quintuple format (similar to the quadruple format used in [...] Notice that in this fine-grained scheme annotation, a Holder, Target, or Topic span should be connected to an Expression span. However, an Expression span can also occur without a Holder, Target, or Topic<sup>11</sup>.</p>
        <p>Lastly, the sentiment term extraction scheme is a subset of our fine-grained scheme. Instead of annotating the Expression span with its Holder, Target, and Topic, we only annotate the Expression span. Following the guidelines for crowdsourcing datasets provided by [<xref ref-type="bibr" rid="ref7">7</xref>], we limit the annotation of English data to Expressions only as a first step, in order to avoid overloading crowdsourcing contributors with a too complex task.</p>
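        <p>The relation between the fine-grained scheme and its sentiment term extraction subset can be sketched with a small data structure. This is only an illustrative model: the class and field names, the token-offset representation of spans, and the order of the elements in the quintuple (Holder, Target, Topic, Expression, polarity) are assumptions of the sketch, and the example sentence is invented rather than taken from the corpus.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

EXPRESSION_LABELS = {"Positive", "Negative"}


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token in the span
    end: int    # index one past the last token
    label: str  # e.g. "Negative", "Citizen", "Environment"


@dataclass(frozen=True)
class Opinion:
    """One Expression with its optional Holder, Target and Topic."""
    expression: Span
    holder: Optional[Span] = None
    target: Optional[Span] = None
    topic: Optional[Span] = None

    def __post_init__(self):
        if self.expression.label not in EXPRESSION_LABELS:
            raise ValueError("an Expression must be Positive or Negative")

    def as_quintuple(self, tokens: List[str]) -> Tuple:
        def text(span):
            if span is None:
                return None
            return " ".join(tokens[span.start:span.end])
        # Assumed order: (holder, target, topic, expression, polarity).
        return (text(self.holder), text(self.target), text(self.topic),
                text(self.expression), self.expression.label)


def expressions_only(opinions: List[Opinion]) -> List[Span]:
    """Reduce fine-grained annotations to the sentiment term extraction subset."""
    return [opinion.expression for opinion in opinions]


# Hypothetical example, not taken from the corpus.
tokens = "I hate plastic waste in our seas".split()
opinion = Opinion(expression=Span(1, 2, "Negative"),
                  holder=Span(0, 1, "Citizen"),
                  topic=Span(2, 4, "Waste"))
print(opinion.as_quintuple(tokens))
print(expressions_only([opinion]))
```
]]></preformat>
        <p>Making Holder, Target and Topic optional while requiring the Expression mirrors the constraint stated above: the other spans must attach to an Expression, but an Expression can stand alone.</p>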
      </sec>
      <sec id="sec-4-3">
        <title>4.2. Annotation of the Italian data</title>
        <p>A portion of the Italian data from Twitter, namely 3,254 tweets without duplicates (corresponding to 58,893 words and 1,990 sentences), has been manually annotated for stance, while its annotation with the fine-grained SA scheme is currently ongoing.</p>
      </sec>
      <sec id="sec-4-2-1">
        <title>4.2.1. Stance annotation</title>
        <p>The annotation for this scheme was done using Google Sheets, and some examples of annotation are provided in Table 2.</p>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Text</th><th>Label</th></tr>
            </thead>
            <tbody>
              <tr><td>18 gradi a febbraio e rompete i coglioni col riscaldamento globale.. Ne vorrei 30 fissi (18 degrees in February and bust your balls with global warming.. I’d like 30 fixed)</td><td>Against</td></tr>
              <tr><td>Bottigliette di plastica e collaborazione per ridurre l’impatto ambientale (Plastic bottles and collaboration to reduce environmental impact)</td><td>Favour</td></tr>
              <tr><td>#ClimateChange Nel 2021 la crisi climatica è costata 343 miliardi di dollari a livello globale (#ClimateChange In 2021, the climate crisis cost $343 billion globally)</td><td>Neutral</td></tr>
              <tr><td>Interisti state rosicando così tanto che contribuite alla deforestazione della foresta Amazzonica. #InterMilan (Interisti are so gnawed that you contribute to the deforestation of the Amazon rainforest. #InterMilan)</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The two annotators agree on around two-thirds of the data (2,233 out of 3,254 tweets) and disagree on the remaining 1,021. The highest percentage of disagreement concerns the label Against, as reported in Table 3. The disagreement has been considered strong when Annotator-1 annotated the message as Against and Annotator-2 as Favour, or vice versa, and weak in the other cases. The strong disagreement, occurring in 201 annotated tweets, was also examined by a third skilled annotator, who solved 168 cases by selecting the label used by either the first or the second annotator.</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Label</th><th>Annotator-1 tweets (%)</th><th>Annotator-2 tweets (%)</th></tr>
            </thead>
            <tbody>
              <tr><td>Against</td><td>121 (3.7%)</td><td>710 (21.8%)</td></tr>
              <tr><td>Favour</td><td>1032 (31.7%)</td><td>733 (22.5%)</td></tr>
              <tr><td>Neutral</td><td>1789 (54%)</td><td>1691 (52%)</td></tr>
              <tr><td>Off-topic</td><td>312 (9.6%)</td><td>119 (3.7%)</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-4-2-3">
        <title>4.2.2. Fine-grained structured sentiment analysis annotation</title>
        <p>For the annotation of the fine-grained structured SA, we used the same Italian dataset described in Section 4, from which we drew the corpus annotated for stance. In this case, we only selected the portion of the corpus composed of the tweets that contain the keyword "green" (whether as a word or as a subword, as in "greenwashing"). Using this filter term, we obtained 1,396 tweets and, after dropping the duplicate tweets, we randomly chose 500 tweets to be annotated by two other master’s degree students.</p>
        <p>For span-level analysis, we measure the annotation agreement by calculating the pairwise weighted F<sub>1</sub>-score<sup>12</sup> between annotators using the SeqEval library<sup>13</sup>. The F<sub>1</sub>-score is used to evaluate the span-level agreement because it evaluates not only the entity span agreement but also the Beginning, Inside, Outside (BIO) tagging structure. In this annotation, we obtain a weighted F<sub>1</sub>-score of 63.67%, indicating that the annotators have a moderate agreement and that the annotation can be used for experiments in future work.</p>
        <p><sup>11</sup> For more examples and details about this fine-grained structured SA annotation see the guidelines: https://github.com/okkyibrohim/environmental-topics-in-corpora/tree/main/annotator_guidelines</p>
        <p><sup>12</sup> We calculate a weighted average of the F<sub>1</sub>-score instead of the macro one since we only annotated 500 tweets for this scheme, so that many entities do not have enough tweets for the F<sub>1</sub>-score to be calculated.</p>
        <p><sup>13</sup> https://github.com/chakki-works/seqeval</p>
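        <p>To make the span-level measure concrete, the sketch below computes a support-weighted F1 between two BIO-tagged annotations in plain Python. It is a simplified stand-in for this kind of computation and not the SeqEval implementation: one annotator is treated as the reference, spans must match exactly in label and boundaries, and an I- tag without a preceding B- tag is leniently treated as opening a span. All names and the toy tag sequences are ours.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def extract_spans(tags):
    """Return the set of (label, start, end) spans encoded by BIO tags."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def weighted_span_f1(reference, predicted):
    """Per-label F1 on exact span matches, averaged by reference support."""
    ref, pred = set(), set()
    for doc_id, (ref_tags, pred_tags) in enumerate(zip(reference, predicted)):
        ref |= {(doc_id,) + span for span in extract_spans(ref_tags)}
        pred |= {(doc_id,) + span for span in extract_spans(pred_tags)}
    support = Counter(label for _, label, _, _ in ref)
    total = sum(support.values())
    if total == 0:
        return 0.0
    matched = ref & pred
    score = 0.0
    for label, n_ref in support.items():
        n_pred = sum(1 for span in pred if span[1] == label)
        tp = sum(1 for span in matched if span[1] == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_ref
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += n_ref * f1
    return score / total


# Toy example: two tweets tagged by two hypothetical annotators.
ann1 = [["B-Expression", "I-Expression", "O", "B-Topic"], ["O", "B-Expression"]]
ann2 = [["B-Expression", "I-Expression", "O", "O"], ["B-Expression", "I-Expression"]]
print(weighted_span_f1(ann1, ann2))
```
]]></preformat>
        <p>Because the weights come from the reference annotator's span counts, swapping the two annotators can change the result slightly, which is one reason to report the figure pairwise.</p>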
        <p>To see the sentiment distribution for each annotator, we convert the span-level labels into a document-level polarity label (Negative, Positive, or Neutral) via majority voting among the Expression labels. The distribution of document-level labels for each annotator can be seen in Table 4. The table shows that the document-level polarity distribution is quite balanced for Annotator-1, whereas for Annotator-2 the Positive polarity is considerably more frequent than the other two labels. For this document-level label, we evaluated the agreement using Cohen’s Kappa and obtained a score of 0.5718, indicating that the document-level label has a moderate agreement and can be used for experiments in future work.</p>
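        <p>The two steps just described, majority voting and Cohen’s Kappa, can be sketched as follows. How ties and tweets without any Expression are handled is not stated above; mapping both to Neutral is an assumption of this sketch, and the function names and toy labels are ours.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def document_polarity(expression_labels):
    """Majority vote over the Expression labels of one tweet."""
    counts = Counter(expression_labels)
    if counts["Positive"] > counts["Negative"]:
        return "Positive"
    if counts["Negative"] > counts["Positive"]:
        return "Negative"
    return "Neutral"  # assumption: ties and tweets with no Expression


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Toy example with four tweets.
spans_1 = [["Positive"], ["Positive", "Negative", "Positive"], ["Negative"], []]
spans_2 = [["Positive"], ["Negative"], ["Negative", "Negative"], []]
docs_1 = [document_polarity(labels) for labels in spans_1]
docs_2 = [document_polarity(labels) for labels in spans_2]
print(docs_1, docs_2, cohen_kappa(docs_1, docs_2))
```
]]></preformat>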
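        <table-wrap>
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>Label</th><th>Annotator-1 tweets (%)</th><th>Annotator-2 tweets (%)</th></tr>
            </thead>
            <tbody>
              <tr><td>Negative</td><td>164 (32.8%)</td><td>131 (26.2%)</td></tr>
              <tr><td>Positive</td><td>178 (35.6%)</td><td>220 (44.0%)</td></tr>
              <tr><td>Neutral</td><td>158 (31.6%)</td><td>149 (29.8%)</td></tr>
            </tbody>
          </table>
        </table-wrap>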
      </sec>
      <sec id="sec-4-4">
        <title>4.3. Annotation of the English data</title>
        <p>From the total of 495,970 collected tweets, we randomly selected 700 tweets for English sentiment term annotation. For this English annotation, we used crowdsourced annotators from Prolific<sup>14</sup>, who were required to have English as their first language and a 100% approval rate for their previous work on the Prolific platform. Annotators were paid £9/h to perform tasks of up to one hour. In this annotation scheme, each data chunk was annotated by 3 anonymous Prolific workers, for a total of 27 workers.</p>
        <p>The Fleiss’ Kappa score for this annotation, computed at the document level as for Italian, can be seen in Table 5<sup>15</sup>.</p>
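        <p>For readers unfamiliar with the measure, a minimal sketch of Fleiss’ Kappa for a fixed number of raters per item is given below. The input is a matrix with one row per tweet and one column per label, each cell counting the raters who chose that label. The interpretation bands follow the widely used Landis and Koch scale, which we assume corresponds to the scale cited as [<xref ref-type="bibr" rid="ref8">8</xref>]; function names and example values are ours.</p>
        <preformat><![CDATA[
```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    p_category = [sum(row[j] for row in counts) / (n_items * n_raters)
                  for j in range(n_categories)]
    # Agreement among the raters of each single item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        return 1.0  # all ratings fall in a single category
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Landis and Koch style bands (assumed to match the cited scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy example: 3 raters, columns are (Negative, Positive, Neutral).
ratings = [[3, 0, 0], [0, 3, 0], [1, 1, 1], [0, 2, 1]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 4), interpret_kappa(kappa))
```
]]></preformat>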
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>This paper presents a report on the activities we are carrying out at the University of Turin in the application of SA to environmental topics. Starting with a linguistic analysis of texts extracted from different genres, we are developing data sets for stance detection, fine-grained structured SA, and sentiment term extraction<sup>16</sup>. Notwithstanding the relevance of these topics, very few applications of text classification techniques and SA have been developed until now. With our activities, we want to start filling this gap for Italian and English. Nevertheless, this is only a starting point, and in future work we will address a more extended domain of texts, for example news and interviews, so as to provide a more reliable barometer of sentiments towards climate topics as found in a general audience.</p>
      <p>Table 5. Kappa and interpretation. Interpretation column: moderate, moderate, slight, moderate, fair, moderate, fair.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work of English annotation is funded by the PUTI Q1 research grant from Universitas Indonesia, number NKB-394/UN2.RST/HKP.05.00/2022.</p>
      <p>Muhammad Okky Ibrohim thanks FSE REACT-EU for funding PhD Research Projects dedicated to GREEN topics under Ministerial Decree 1061/21.</p>
      <p>We thank the master’s degree students Fabiola Summa, Marco Stella, Martina Gagliardi, Gaia Miele and Maria Comandè for their contribution.</p>
      <p><sup>14</sup> https://www.prolific.co/</p>
      <p><sup>15</sup> All agreement score interpretations used in this research are obtained from [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
      <p><sup>16</sup> The dataset and code for agreement evaluation are available on this GitHub page: https://github.com/okkyibrohim/environmental-topics-in-corpora</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Ibrohim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis for the natural environment: A systematic review</article-title>
          ,
          <source>ACM Comput. Surv</source>
          . (
          <year>2023</year>
          ). URL: https://doi.org/10.1145/3604605. doi:10.1145/3604605, Just Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Patz</surname>
          </string-name>
          ,
          <article-title>The climate change debate and natural language processing</article-title>
          ,
          <source>in: Proceedings of the 1st Workshop on NLP for Positive Impact</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>8</fpage>
          -
          <lpage>18</lpage>
          . URL: https://aclanthology.org/2021.nlp4posimpact-1.2. doi:10.18653/v1/2021.nlp4posimpact-1.2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kowalski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Varde</surname>
          </string-name>
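          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Public opinion matters: Mining social media text for environmental management</article-title>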
          ,
          <source>SIGWEB Newsl</source>
          . (
          <year>2019</year>
          ). URL: https://doi.org/10.1145/3352683.3352688. doi:10.1145/3352683.3352688.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lucisano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piemontese</surname>
          </string-name>
          ,
          <article-title>Gulpease: Una formula per la predizione della difficoltà dei testi in lingua italiana</article-title>
          ,
          <source>Scuola e città 3</source>
          (
          <year>1988</year>
          )
          <fpage>110</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kincaid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fishburne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Chissom</surname>
          </string-name>
          ,
          <article-title>Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease formula) for Navy enlisted personnel</article-title>
          ,
          <source>Research Branch Report 8-75</source>
          (
          <year>1975</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Oberlaender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Troiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buchmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Øvrelid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Velldal</surname>
          </string-name>
          ,
          <article-title>SemEval 2022 task 10: Structured sentiment analysis</article-title>
          ,
          <source>in: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)</source>
          , Association for Computational Linguistics, Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>1280</fpage>
          -
          <lpage>1295</lpage>
          . URL: https://aclanthology.org/2022.semeval-1.180. doi:10.18653/v1/2022.semeval-1.180.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scharl</surname>
          </string-name>
          ,
          <article-title>Corpus annotation through crowdsourcing: Towards best practice guidelines</article-title>
          ,
          <source>in: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          , European Language Resources Association (ELRA), Reykjavik, Iceland,
          <year>2014</year>
          , pp.
          <fpage>859</fpage>
          -
          <lpage>866</lpage>
          . URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/497_Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Landis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <article-title>The measurement of observer agreement for categorical data</article-title>
          ,
          <source>Biometrics</source>
          <volume>33</volume>
          (
          <year>1977</year>
          )
          <fpage>159</fpage>
          -
          <lpage>174</lpage>
          . URL: http://www.jstor.org/stable/2529310.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>