On the Definition of Prescriptive Annotation
Guidelines for Language-Agnostic Subjectivity
Detection
Federico Ruggeri1,∗ , Francesco Antici1 , Andrea Galassi1 , Katerina Korre2 ,
Arianna Muti2 and Alberto Barrón-Cedeño2
1 Department of Computer Science and Engineering (DISI), University of Bologna, Italy
2 Department of Interpreting and Translation (DIT), University of Bologna, Italy


Abstract
Defining subjectivity indicators without relying on domain-specific assumptions or incurring interpretation biases is a well-known challenge. To account for these limitations, recent work is shifting toward annotation procedures for subjectivity detection that are not limited to language-specific cues. Nonetheless, developing a rigorous methodology to address edge cases and annotators’ bias, while maintaining desired properties like language agnosticism, is still an open problem. In this work, we rely on the prescriptive annotation paradigm and propose a methodology based on three key aspects. We present a case study on subjectivity detection for fact-checking in English and Italian news to evaluate the efficacy of the proposed methodology and discuss the open challenges.

Keywords
Subjectivity Detection, Annotation Guidelines, Natural Language Processing, Fact-Checking




1. Introduction
Subjectivity is a feature of language: when making an utterance, the speaker simultaneously
expresses their position, attitude, and feelings towards the utterance, thus leaving their own
mark [1]. Subjectivity Detection (SD) is the task of distinguishing objective from subjective content. Previous SD approaches can be divided into syntactic and semantic [2]. The first
category relies on keyword spotting [3, 4] or lexicons [5, 6, 7] as standard practice. However,
these solutions are known to be language-specific unless some intermediate lossy translation
procedure is considered [8]. Likewise, lexicon-based approaches require an external knowledge
base which limits their applicability. In contrast, semantic approaches tackle SD via statisti-
cal [9, 10] or neural [11, 12, 13] methods for text representation by relying on labeled training
corpora. This requirement is either addressed by considering domain-specific assumptions [9]

In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M Litvak (eds.): Proceedings of the Text2Story’23 Workshop, Dublin
(Republic of Ireland), 2-April-2023
∗ Corresponding author.
federico.ruggeri6@unibo.it (F. Ruggeri); francesco.antici@unibo.it (F. Antici); a.galassi@unibo.it (A. Galassi);
aikaterini.korre2@unibo.it (K. Korre); arianna.muti2@unibo.it (A. Muti); a.barron@unibo.it (A. Barrón-Cedeño)
ORCID: 0000-0002-1697-8586 (F. Ruggeri); 0000-0002-1125-0588 (F. Antici); 0000-0001-9711-7042 (A. Galassi);
0000-0002-9349-9554 (K. Korre); 0000-0002-3387-6557 (A. Muti); 0000-0003-4719-3420 (A. Barrón-Cedeño)
                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org




or designing annotation guidelines [11, 14, 15, 16].
   Although they are independent of linguistic tools and allow cross-lingual applicability
with minor effort [16, 17, 18], semantic approaches face a crucial yet demanding issue: the
perception of subjectivity is itself subjective [2] and, thus, it is affected by interpretation bias [19],
annotation ambiguity, and edge cases. As a result, defining practical, non-language-specific,
and largely applicable annotation guidelines is a well-known challenge [15].
   In this work, we adopt a prescriptive approach [20] and frame SD for a specific task to
downplay annotation ambiguity [21], describing a method for the development of task-oriented
annotation guidelines based on three key aspects: schematic case-based guidelines, iterative
refinement, and reliable annotation. We also consider a preliminary case study on fact-checking
to empirically evaluate the proposed methodology and elaborate on the encountered open
challenges.


2. Methodology
We identify three key aspects for developing task-oriented SD annotation guidelines. We
follow the prescriptive paradigm [20] to impose a specific and consistent conceptualization of
subjectivity for annotation.

Schematic case-based guidelines. Given a task that partially relies on SD, it is necessary
to define subjectivity according to the task’s objectives. It is, therefore, necessary to define
annotation guidelines that are schematic and based on specific real cases. This formulation is
less sensitive to domain- or language-specific cues and eases the annotators’ training process.
Moreover, these properties could foster collecting large corpora for SD based on annotation
guidelines rather than relying on domain-dependent assumptions [22].

Iterative refinement. Agreeing on a set of validated annotation guidelines is a collaborative
refinement process. Such a process has the objective of discovering annotation edge cases,
i.e., instances that are not covered by the annotation guidelines, resulting in high inter-annotator
disagreement. Indeed, a preliminary version of annotation guidelines is unlikely to thoroughly
cover all possible cases. For this reason, guideline refinement is an iterative process consisting
of multiple annotation pilot studies since edge case discovery depends on the nature of sampled
annotation data [23]. The pilot studies are designed to instruct annotators and reach a common
set of validated annotation guidelines [24], and are iterated until a sufficient level of agreement
is reached [25]. This formulation is in line with the prescriptive paradigm [20], where annotator
disagreement is a call to action to refine annotation guidelines.
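As an illustration, this refinement loop can be sketched as follows. This is a minimal sketch, not the study's actual tooling: `run_pilot`, `refine`, and the agreement threshold are hypothetical placeholders, and a plain full-agreement proportion stands in for a chance-corrected coefficient such as Cohen's kappa.

```python
def proportion_full_agreement(annotations):
    """Fraction of instances on which all annotators assign the same label.
    `annotations` maps annotator -> list of labels, aligned by instance."""
    label_lists = list(annotations.values())
    n_instances = len(label_lists[0])
    unanimous = sum(
        1 for i in range(n_instances)
        if len({labels[i] for labels in label_lists}) == 1
    )
    return unanimous / n_instances


def iterate_pilots(run_pilot, refine, guidelines, threshold=0.6, max_rounds=5):
    """Run annotation pilot studies, refining the guidelines on the
    instances that triggered disagreement, until IAA reaches the
    threshold or the round budget is exhausted.

    run_pilot(guidelines) -> dict annotator -> aligned label lists
    refine(guidelines, edge_case_ids) -> updated guidelines
    """
    for round_no in range(1, max_rounds + 1):
        annotations = run_pilot(guidelines)
        iaa = proportion_full_agreement(annotations)
        if iaa >= threshold:
            break
        label_lists = list(annotations.values())
        # instances without a unanimous label are candidate edge cases
        edge_cases = [
            i for i in range(len(label_lists[0]))
            if len({labels[i] for labels in label_lists}) > 1
        ]
        guidelines = refine(guidelines, edge_cases)
    return guidelines, iaa, round_no
```

In practice, `refine` would be the human discussion phase that turns discovered edge cases into new or revised criteria; here it is only a callable slot.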

Reliable annotation. The last key aspect concerns the data annotation task. First, annotators
are provided with refined annotation guidelines to instruct them. Second, text instances are
assigned to multiple annotators to downplay the impact of noisy labels and annotators’ bias [19].
This process allows for discriminating edge cases from instances with a unanimous or almost
perfect agreement. Tracking individual annotations per instance is considered a measure of
quality assurance [20, 26]. Eventually, labels can be aggregated via voting strategies for training




machine learning models [27]. In case of disagreement, a discussion phase among annotators
takes place to agree on a solution. If no agreement is reached, an additional annotator is
recruited to label these instances. To address the problem of noisy labels, it is possible to discard
those assigned by annotators that strongly disagree with each other [28] and explicitly report
for which instances the discussion phase did not solve ambiguities [29, 30].
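The voting and discussion-queue step described above can be sketched as follows (illustrative only; the function name and label strings are our own, not part of the annotation tooling):

```python
from collections import Counter


def aggregate_labels(annotations):
    """Majority-vote aggregation with a discussion queue.

    annotations: dict annotator -> list of labels, aligned by instance.
    Returns (gold, to_discuss): gold holds the majority label, or None
    for instances where the vote is tied and a discussion phase among
    annotators is needed.
    """
    label_lists = list(annotations.values())
    gold, to_discuss = [], []
    for i in range(len(label_lists[0])):
        votes = Counter(labels[i] for labels in label_lists).most_common()
        top_label, top_count = votes[0]
        runner_up = votes[1][1] if len(votes) > 1 else 0
        if top_count > runner_up:
            gold.append(top_label)
        else:  # tie: defer the instance to the discussion phase
            gold.append(None)
            to_discuss.append(i)
    return gold, to_discuss
```

Tracking which instances end up in `to_discuss` also yields the per-instance annotation record used for quality assurance.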


3. Discussion and Open Challenges
We elaborate on the presented methodology by discussing a case study on fact-checking.
We consider a pipeline for fact-checking where SD is performed to discriminate between
objective sentences that can be directly verified and subjective sentences that must be processed
or rewritten to extract the objective claim or information. The detection and processing of
subjective content have the final purpose of creating an objective narrative upon which fact-
checking relies [31]. We consider the task of labeling sentences in English and Italian news
articles targeting ongoing controversial topics, such as political affairs, Covid-19, civil rights,
and economics (see Appendix A).
   We initially design a set of preliminary annotation criteria suitable for fact-checking purposes
(see Appendix B). These guidelines are mainly derived from existing work on SD on related
domains [11, 32]. We recruit six human annotators with native or near-native knowledge of
the English and Italian languages. After two annotation pilot studies, annotators agree on a
common set of annotation criteria. We keep track of inter-annotator agreement (IAA) over pilot
studies to validate their efficacy. In particular, the average Cohen’s kappa over annotator pairs
is 0.39 (fair agreement) and 0.53 (moderate agreement) for the first and second pilot studies,
respectively. We consider both Italian and English annotations when computing the IAA and
observe comparable results between languages. The observed gain of 0.14 in kappa between the
two studies denotes a substantial improvement in the annotation criteria.
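For reference, the reported IAA measure, average Cohen's kappa over annotator pairs, can be computed as in the following plain-Python sketch (not the scripts used in the study):

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over aligned label lists."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # chance agreement from the annotators' marginal label distributions
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:  # degenerate case: a single shared label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def average_pairwise_kappa(annotations):
    """Average Cohen's kappa over all annotator pairs, as tracked
    across pilot studies. `annotations` maps annotator -> labels."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```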
   During the pilot studies, we discuss the importance of contextual information (Section 3.1)
for annotation and address several edge cases (Section 3.2). These observations are consistent
in both languages, proving the efficacy of our methodology regardless of the language.

3.1. Annotating with Context
The lack of context may lead to ambiguous annotation cases, depending on the chosen input
granularity [33, 34]. In our setting, we consider sentence-level granularity as common prac-
tice [31]. This choice represents a suitable testing ground for evaluating context importance
given the limited scope of a sentence. For this purpose, in the second pilot study, we arrange
annotators into two groups. Half of them label input sentences in order of appearance, while
the remaining half labels sentences in random order, neglecting any contextual information as
done in the first pilot study. We observe a 0.38 and 0.53 average Cohen’s kappa over annotator
pairs for the context and non-context groups, respectively.
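The group split can be sketched as follows (a hypothetical helper, not the study's actual tooling): the context group receives sentences in order of appearance, while the non-context group receives a shuffled copy, which removes contextual cues.

```python
import random


def make_presentation_orders(sentences, seed=0):
    """Build the two presentation orders used in the second pilot:
    document order for the context group, shuffled order for the
    non-context group."""
    context_order = list(sentences)       # order of appearance
    non_context_order = list(sentences)
    random.Random(seed).shuffle(non_context_order)  # strip context
    return context_order, non_context_order
```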
   Our findings contrast with those of Ljubešić et al. [35], suggesting that context may be useful
only in certain tasks or specific scenarios. Moreover, we identify two additional reasons in
favor of a non-contextual annotation formulation. First, the use of context leads to an increased




Table 1
Example of edge cases encountered in our case study.
 (a) Emotions       He looked like he was on the verge of crying.
 (b) Quotes         “Crosbie is an extremely violent man who has no place in society, and we welcome the jury’s
                    verdict today.”
 (c) Intensifiers   Recognising that, last Friday the US announced a further $600m of military aid to Ukraine,
                    including more Himars rockets that have so damaged Moscow’s logistics and its ability to resist.
 (d) Speculations   Putin will hope to sow uncertainty in the eyes of policymakers’ meetings in New York.



annotator workload. Consequently, it negatively affects the applicability of annotation guidelines
to multiple scenarios. Second, contextual information may not be available in certain
domains and settings, as in tweets [2]. These observations and the higher IAA suggest that a
non-contextual annotation for SD is a preferred formulation.

3.2. Edge Cases
During our pilot studies, we identify four edge cases, as reported in Table 1.
   Emotions. Statements carrying emotions convey a subjective point of view [36, 37] but
they cannot be verified or refuted by a fact-checking system since they are based on the
author’s beliefs and sensations only. Since it is impossible to provide such information in a
more objective form, we label these statements as objective.
   Quotes. In news sources, authors frequently use quotes to support their thesis. Even if the
quoted content may be subjective, the task concerns detecting subjectivity only for the article’s
author. For this reason, we label quoted content as objective.
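As an illustration, pre-sorting fully quoted sentences under this rule could look as follows (a naive heuristic with hypothetical names; it ignores partially or nested quoted sentences, and actual labeling is performed by human annotators):

```python
QUOTE_CHARS = ('"', '\u201c', '\u201d')  # straight and curly quotation marks


def is_fully_quoted(sentence):
    """True when the sentence is entirely enclosed in quotation marks,
    i.e. third-party quoted content."""
    s = sentence.strip()
    return len(s) > 1 and s[0] in QUOTE_CHARS and s[-1] in QUOTE_CHARS


def presort_label(sentence):
    """Route fully quoted sentences straight to OBJ, per the guideline
    that quoted content reflects a third party rather than the article's
    author; everything else still requires human judgment (None)."""
    return "OBJ" if is_fully_quoted(sentence) else None
```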
   Intensifiers. We identify intensifiers as indicators of subjectivity since their presence could
be symptomatic of the author’s personal point of view. For example, in Table 1 (c) it is difficult
to say whether the expression “so damaged” conveys the author’s personal point of view or, rather, is
descriptive and can be re-formulated as “that have in this way damaged”.
   Speculations. Annotators often struggle to judge implicit statements without leveraging
their own interpretation bias [38]. We consider speculation as a subjectivity indicator, since
authors make use of it to allude to their own interpretation of events and consequences. The
expression “will hope to sow uncertainty” in Table 1 (d) is an example.


4. Conclusions
We have presented our ongoing work on developing annotation guidelines for task-oriented SD.
In particular, we introduced a methodology based on the prescriptive paradigm [20] to provide
a task-specific definition of subjectivity via schematic and language-independent annotation
criteria. These criteria are developed to cover annotation edge cases and downplay annotators’
interpretation biases. The application of our methodology to a preliminary case study on
fact-checking in two different languages allowed us to reduce the ambiguity of the annotation
by identifying edge cases and addressing them through the definition of specific guidelines. In
future work, we will extend our approach to further languages.




Acknowledgments
This work has been partly funded by the European Union’s Horizon 2020 Research and Innovation
programme under grant agreement 101017142 (“StairwAI: Stairway to AI”) and partly
funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 (“FAIR - Future
Artificial Intelligence Research” - Spoke 8 “Pervasive AI”), funded by the European Commission
under the NextGeneration EU programme. A. Muti’s research is carried out under the project
“DL4AMI–Deep Learning models for Automatic Misogyny Identification”, in the framework of
Progetti di formazione per la ricerca: Big Data per una regione europea più ecologica, digitale e
resiliente—Alma Mater Studiorum–Università di Bologna, Ref. 2021-15854. K. Korre’s research is
carried out under the project “RACHS: Rilevazione e Analisi Computazionale dell’Hate Speech
in rete”, in the framework of the PON programme FSE REACT-EU, Ref. DOT1303118.


References
 [1] L. Feng, On the subjectivity and intersubjectivity of language, in: Communication and
     Linguistics Studies, volume 6, 2020, pp. 1–5. doi:10.11648/j.cls.20200601.11.
 [2] I. Chaturvedi, E. Cambria, R. E. Welsch, F. Herrera, Distinguishing between facts and
     opinions for sentiment analysis: Survey and challenges, Inf. Fusion 44 (2018) 65–77. URL:
     https://doi.org/10.1016/j.inffus.2017.12.006. doi:10.1016/j.inffus.2017.12.006.
 [3] J. Wiebe, E. Riloff, Creating subjective and objective sentence classifiers from unannotated
     texts, in: A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing,
     Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 486–497.
 [4] E. Riloff, J. Wiebe, Learning extraction patterns for subjective expressions, in: Proceedings
     of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003, pp.
     105–112. URL: https://aclanthology.org/W03-1014.
 [5] N. Das, S. Sagnika, A subjectivity detection-based approach to sentiment analysis, in:
     D. Swain, P. K. Pattnaik, P. K. Gupta (Eds.), Machine Learning and Information Processing,
     Springer Singapore, Singapore, 2020, pp. 149–160.
 [6] H. Yu, V. Hatzivassiloglou, Towards answering opinion questions: Separating facts from
     opinions and identifying the polarity of opinion sentences, in: Proceedings of the 2003
     Conference on Empirical Methods in Natural Language Processing, EMNLP ’03, Association
     for Computational Linguistics, USA, 2003, p. 129–136. URL: https://doi.org/10.3115/1119355.
     1119372. doi:10.3115/1119355.1119372.
 [7] J. Villena-Román, J. García-Morera, M. Á. G. Cumbreras, E. Martínez-Cámara, M. T. Martín-
     Valdivia, L. A. U. López, Overview of TASS 2015, in: J. Villena-Román, J. García-Morera,
     M. Á. G. Cumbreras, E. Martínez-Cámara, M. T. Martín-Valdivia, L. A. U. López (Eds.),
     Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located with
     31st SEPLN Conference (SEPLN 2015), Alicante, Spain, September 15, 2015, volume 1397
     of CEUR Workshop Proceedings, CEUR-WS.org, 2015, pp. 13–21. URL: http://ceur-ws.org/
     Vol-1397/overview.pdf.
 [8] F. Benamara, B. Chardon, Y. Mathieu, V. Popescu, Towards context-based subjectivity
     analysis, in: Proceedings of 5th International Joint Conference on Natural Language




     Processing, Asian Federation of Natural Language Processing, Chiang Mai, Thailand, 2011,
     pp. 1180–1188. URL: https://aclanthology.org/I11-1132.
 [9] B. Pang, L. Lee, A sentimental education: Sentiment analysis using subjectivity summa-
     rization based on minimum cuts, in: Proceedings of the 42nd Annual Meeting of the
     Association for Computational Linguistics (ACL-04), Barcelona, Spain, 2004, pp. 271–278.
     URL: https://aclanthology.org/P04-1035. doi:10.3115/1218955.1218990.
[10] F. Sha, F. C. N. Pereira, Shallow parsing with conditional random fields, in: M. A. Hearst,
     M. Ostendorf (Eds.), Human Language Technology Conference of the North American
     Chapter of the Association for Computational Linguistics, HLT-NAACL 2003, Edmonton,
     Canada, May 27 - June 1, 2003, The Association for Computational Linguistics, 2003. URL:
     https://aclanthology.org/N03-1028/.
[11] F. Antici, L. Bolognini, M. A. Inajetovic, B. Ivasiuk, A. Galassi, F. Ruggeri, SubjectivITA:
     An Italian corpus for subjectivity detection in newspapers, in: CLEF, volume 12880
     of LNCS, Springer, 2021, pp. 40–52. URL: https://doi.org/10.1007/978-3-030-85251-1_4.
     doi:10.1007/978-3-030-85251-1_4.
[12] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for mod-
     elling sentences, in: Proceedings of the 52nd Annual Meeting of the Association for
     Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume
     1: Long Papers, The Association for Computer Linguistics, 2014, pp. 655–665. URL:
     https://doi.org/10.3115/v1/p14-1062. doi:10.3115/v1/p14-1062.
[13] I. Chaturvedi, Y. Ong, I. Tsang, R. Welsch, E. Cambria, Learning word dependencies in
     text by means of a deep recurrent belief network, Knowledge-Based Systems 108 (2016).
     doi:10.1016/j.knosys.2016.07.019.
[14] J. M. Wiebe, R. F. Bruce, T. P. O’Hara, Development and use of a gold-standard data
     set for subjectivity classifications, in: Proceedings of the 37th Annual Meeting of the
     Association for Computational Linguistics, Association for Computational Linguistics,
     College Park, Maryland, USA, 1999, pp. 246–253. URL: https://aclanthology.org/P99-1032.
     doi:10.3115/1034678.1034721.
[15] T. Wilson, J. Wiebe, Annotating opinions in the world press, in: Proceedings of the
     SIGDIAL 2003 Workshop, The 4th Annual Meeting of the Special Interest Group on
     Discourse and Dialogue, July 5-6, 2003, Sapporo, Japan, The Association for Computer
     Linguistics, 2003, pp. 13–22. URL: https://aclanthology.org/W03-2102/.
[16] M. Abdul-Mageed, M. Diab, Subjectivity and sentiment annotation of Modern Standard
     Arabic newswire, in: Proceedings of the 5th Linguistic Annotation Workshop, Association
     for Computational Linguistics, Portland, Oregon, USA, 2011, pp. 110–118. URL: https:
     //aclanthology.org/W11-0413.
[17] I. Amini, S. Karimi, A. Shakery, Cross-lingual subjectivity detection for resource lean
     languages, in: Proceedings of the Tenth Workshop on Computational Approaches to
     Subjectivity, Sentiment and Social Media Analysis, Association for Computational Lin-
     guistics, Minneapolis, USA, 2019, pp. 81–90. URL: https://aclanthology.org/W19-1310.
     doi:10.18653/v1/W19-1310.
[18] C. Banea, R. Mihalcea, J. Wiebe, Sense-level subjectivity in a multilingual setting, Computer
     Speech & Language 28 (2014) 7–19. URL: https://www.sciencedirect.com/science/article/
     pii/S0885230813000181. doi:https://doi.org/10.1016/j.csl.2013.03.002.




[19] M. Geva, Y. Goldberg, J. Berant, Are we modeling the task or the annotator? an investiga-
     tion of annotator bias in natural language understanding datasets, in: K. Inui, J. Jiang, V. Ng,
     X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Lan-
     guage Processing and the 9th International Joint Conference on Natural Language Process-
     ing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Association for Com-
     putational Linguistics, 2019, pp. 1161–1166. URL: https://doi.org/10.18653/v1/D19-1107.
     doi:10.18653/v1/D19-1107.
[20] P. Röttger, B. Vidgen, D. Hovy, J. B. Pierrehumbert, Two contrasting data annotation
     paradigms for subjective NLP tasks, in: M. Carpuat, M. de Marneffe, I. V. M. Ruíz (Eds.),
     Proceedings of the 2022 Conference of the North American Chapter of the Association
     for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle,
     WA, United States, July 10-15, 2022, Association for Computational Linguistics, 2022,
     pp. 175–190. URL: https://doi.org/10.18653/v1/2022.naacl-main.13. doi:10.18653/v1/2022.
     naacl-main.13.
[21] T. A. Wilson, Fine-grained subjectivity and sentiment analysis: recognizing the intensity,
     polarity, and attitudes of private states, University of Pittsburgh, 2008.
[22] R. Satapathy, S. Pardeshi, E. Cambria, Polarity and subjectivity detection with multitask
     learning and BERT embedding, Future Internet 14 (2022) 191. URL: https://doi.org/10.3390/
     fi14070191. doi:10.3390/fi14070191.
[23] V. K. Pradhan, M. Schaekermann, M. Lease, In search of ambiguity: A three-stage workflow
     design to clarify annotation guidelines for crowd workers, Frontiers Artif. Intell. 5 (2022)
     828187. URL: https://doi.org/10.3389/frai.2022.828187. doi:10.3389/frai.2022.828187.
[24] R. Artstein, Inter-annotator agreement, Handbook of linguistic annotation (2017) 297–313.
[25] E. Musi, D. Ghosh, S. Muresan, Towards feasible guidelines for the annotation of argument
     schemes, in: Proceedings of the Third Workshop on Argument Mining (ArgMining2016),
     Association for Computational Linguistics, Berlin, Germany, 2016, pp. 82–93. URL: https:
     //aclanthology.org/W16-2810. doi:10.18653/v1/W16-2810.
[26] M. Teruel, C. Cardellino, F. Cardellino, L. A. Alemany, S. Villata, Increasing argument
     annotation reproducibility by using inter-annotator agreement to improve guidelines, in:
     N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard,
     J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, T. Tokunaga (Eds.), Proceedings of
     the Eleventh International Conference on Language Resources and Evaluation, LREC 2018,
     Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA), 2018.
     URL: http://www.lrec-conf.org/proceedings/lrec2018/summaries/1048.html.
[27] A. T. Nguyen, B. Wallace, J. J. Li, A. Nenkova, M. Lease, Aggregating and predicting
     sequence labels from crowd annotations, in: Proceedings of the 55th Annual Meet-
     ing of the Association for Computational Linguistics (Volume 1: Long Papers), As-
     sociation for Computational Linguistics, Vancouver, Canada, 2017, pp. 299–309. URL:
     https://aclanthology.org/P17-1028. doi:10.18653/v1/P17-1028.
[28] J. Amidei, P. Piwek, A. Willis, Identifying annotator bias: A new IRT-based method for
     bias identification, in: Proceedings of the 28th International Conference on Computational
     Linguistics, International Committee on Computational Linguistics, Barcelona, Spain
     (Online), 2020, pp. 4787–4797. URL: https://aclanthology.org/2020.coling-main.421. doi:10.
     18653/v1/2020.coling-main.421.




[29] V. Basile, F. Cabitza, A. Campagner, M. Fell, Toward a perspectivist turn in ground truthing
     for predictive computing, CoRR abs/2109.04270 (2021).
[30] G. Abercrombie, V. Basile, S. Tonelli, V. Rieser, A. Uma (Eds.), Proceedings of the 1st
     Workshop on Perspectivist Approaches to NLP @LREC2022, European Language Resources
     Association, Marseille, France, 2022. URL: https://aclanthology.org/2022.nlperspectives-1.0.
[31] Z. Guo, M. Schlichtkrull, A. Vlachos, A survey on automated fact-checking, Transactions
     of the Association for Computational Linguistics 10 (2022) 178–206. URL:
     https://doi.org/10.1162/tacl_a_00454. doi:10.1162/tacl_a_00454.
[32] L. de Saussure, P. Schulz, Subjectivity out of irony, Semiotica 2009 (2009) 397–416. URL:
     https://doi.org/10.1515/SEMI.2009.018. doi:10.1515/SEMI.2009.018.
[33] J. Pavlopoulos, J. Sorensen, L. Dixon, N. Thain, I. Androutsopoulos, Toxicity detection:
     Does context really matter?, in: Proceedings of the 58th Annual Meeting of the Association
     for Computational Linguistics, Association for Computational Linguistics, Online, 2020,
     pp. 4296–4305. URL: https://aclanthology.org/2020.acl-main.396. doi:10.18653/v1/2020.
     acl-main.396.
[34] S. Menini, A. P. Aprosio, S. Tonelli, Abuse is contextual, what about nlp? the role of
     context in abusive language annotation and detection, CoRR abs/2103.14916 (2021). URL:
     https://arxiv.org/abs/2103.14916. arXiv:2103.14916.
[35] N. Ljubešić, I. Mozetič, P. K. Novak, Quantifying the impact of context on the quality of
     manual hate speech annotation, Natural Language Engineering (2022) 1–14.
[36] R. Mihalcea, C. Banea, J. Wiebe, Multilingual subjectivity and sentiment analysis, in:
     Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics:
     Tutorial Abstracts, Association for Computational Linguistics, Jeju Island, Korea, 2012,
     p. 4. URL: https://aclanthology.org/P12-4004.
[37] K. Veronika, Subjectivity and emotions as sources of insight in an ethnographic case study:
     A tale of the field, M@n@gement 9 (2006) 117–135. URL: https://management-aims.com/
     index.php/mgmt/article/view/4089.
[38] T. Caselli, V. Basile, J. Mitrović, I. Kartoziya, M. Granitzer, I feel offended, don’t be abusive!
     implicit/explicit messages in offensive and abusive language, in: Proceedings of the
     Twelfth Language Resources and Evaluation Conference, European Language Resources
     Association, Marseille, France, 2020, pp. 6193–6202. URL: https://aclanthology.org/2020.
     lrec-1.760.



Appendix

A. News Sources Considered
For our pilot studies, we consider the news sources reported in Table 2. For each study, we
randomly sample up to six articles (∼ 150 sentences on average). All the annotators label the
sampled articles at the sentence level.




Table 2
Sources considered for the pilot studies.
           English                                       Italian
           frontpagemag.com      shtfplan.com            fascinazione.it   ilfoglio.it
           telegraph.co.uk       theguardian.com         avantionline.it   liberoquotidiano.it
           vdare.com                                     avvenire.it


B. Initial Draft of Annotation Guidelines
The initial set of annotation criteria for subjectivity detection states that a sentence is subjective
if:

   (i) it explicitly reports the personal opinion of its author;

  (ii) it contains sarcastic or ironic expressions;

 (iii) it contains exhortations or personal auspices;

 (iv) it contains discriminating or downgrading expressions;

  (v) it contains rhetorical figures explicitly made by its author to convey their opinion;

 (vi) it contains a conclusion made by its author that is drawn despite insufficient factual infor-
      mation.

  After the first pilot study, annotators identify and discuss two major edge cases: emotions
and quotes. In particular, the following annotation criteria are added:

 (vii) a sentence is objective when it describes the personal feelings, emotions or moods of its author,
       without conveying opinions on other matters;

(viii) a sentence is objective if it expresses an opinion, claim, emotion, or a point of view that is
       explicitly attributable to a third-party (e.g., a person mentioned in the text). The presence
       of quotation marks (“ ”), when used to quote a third person (be it at the beginning of the
       sentence, at the end, or both), represents an explicit third-party opinion, even if it is not
       clearly stated in the sentence.

  Additionally, annotation criterion (i) is modified to explicitly address rhetorical questions:
rhetorical questions are considered as an expression of opinion.
  After the second pilot study, annotators identify and discuss two additional edge cases:
speculations and intensifiers. In particular, the following annotation criterion is added:

 (ix) a sentence is subjective if it contains intensifiers that can be attributed to its author to express
      their opinion.

  Moreover, annotation criterion (i) is modified to address speculations: speculations that draw
conclusions are considered opinions.
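For traceability across pilot rounds, the evolving criteria can be stored as structured, versioned records, e.g. as follows (an illustrative encoding with paraphrased criterion texts; field and function names are our own):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One schematic annotation criterion, with the label it assigns
    and the pilot round that introduced it (0 = initial draft)."""
    cid: str
    label: str
    text: str
    since_round: int


GUIDELINES = [
    Criterion("i", "SUBJ", "explicitly reports the personal opinion of its author", 0),
    Criterion("ii", "SUBJ", "contains sarcastic or ironic expressions", 0),
    Criterion("iii", "SUBJ", "contains exhortations or personal auspices", 0),
    Criterion("iv", "SUBJ", "contains discriminating or downgrading expressions", 0),
    Criterion("v", "SUBJ", "contains rhetorical figures made to convey an opinion", 0),
    Criterion("vi", "SUBJ", "draws a conclusion despite insufficient factual information", 0),
    Criterion("vii", "OBJ", "describes the author's feelings, emotions or moods only", 1),
    Criterion("viii", "OBJ", "expresses a view attributable to a third party (e.g. quotes)", 1),
    Criterion("ix", "SUBJ", "contains intensifiers attributable to its author", 2),
]


def added_in_round(round_no):
    """Criterion ids introduced by a given pilot round."""
    return [c.cid for c in GUIDELINES if c.since_round == round_no]
```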



