Community-based Stance Detection
Emanuele Brugnoli1,2,3,∗, Donald Ruggiero Lo Sardo1,2,3
1 Sony Computer Science Laboratories Rome, Joint Initiative CREF-SONY, Piazza del Viminale 1, 00184, Rome, Italy.
2 Centro Studi e Ricerche Enrico Fermi (CREF), Piazza del Viminale 1, 00184 Rome, Italy.
3 Dipartimento di Fisica - Sapienza Università di Roma, P.le A. Moro 2, 00185 Rome, Italy.
Abstract
Stance detection is a critical task in understanding the alignment or opposition of statements within social discourse. In
this study, we present a novel stance detection model that labels claim-perspective pairs as either aligned or opposed. The
primary innovation of our work lies in our training technique, which leverages social network data from X (formerly Twitter).
Our dataset comprises tweets from opinion leaders, political entities and news outlets, along with their followers’ interactions
through retweets and quotes. We reconstruct politically aligned communities from retweet interactions, treated as
endorsements, and check these communities against common knowledge representations of the political landscape. Our
training dataset consists of tweet/quote pairs where the tweet comes from a political entity and the quote either originates
from a follower who exclusively retweets that political entity (treated as aligned) or from a user who exclusively retweets a
political entity from an opposing ideological community (treated as opposed). This curated subset is used to train an Italian
language model based on the RoBERTa architecture, achieving an accuracy of approximately 85%. We then apply our model
to label all tweet/quote pairs in the dataset, analyzing its out-of-sample predictions. This work not only demonstrates the
efficacy of our stance detection model but also highlights the utility of social network structures in training robust NLP
models. Our approach offers a scalable and accurate method for understanding political discourse and the alignment of social
media statements.
Keywords
Stance Detection, Polarisation, Social Networks
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
∗ Corresponding author.
Email: emanuele.brugnoli@sony.com (E. Brugnoli); donaldruggiero.losardo@sony.com (D. R. Lo Sardo)
ORCID: 0000-0002-5342-3184 (E. Brugnoli); 0000-0003-3102-6505 (D. R. Lo Sardo)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1. Introduction
Stance detection is a critical task within the domain of natural language processing (NLP). It involves identifying the position or attitude expressed in a piece of text towards a specific topic, claim, or entity [1, 2]. Traditionally, stances are classified into three primary categories: favor, against, and neutral. This classification enables a detailed description of textual data, facilitating a deeper insight into public opinion and discourse dynamics.
In recent years, the proliferation of digital communication platforms such as social media, forums, and online news outlets has resulted in an unprecedented volume of user-generated content. This surge underscores the necessity for automated systems capable of efficiently analyzing and interpreting these vast text corpora. Stance detection addresses this need by providing tools that can systematically assess opinions and reactions embedded within texts, thus offering valuable applications across various fields including social media analysis [3, 4], search engines [5], and linguistics [6].
According to the latest report of the World Economic Forum [7], the increase in societal polarization features among the top three risks for democratic societies. While a macroscopic increase of polarization has been observed, an understanding of the microscopic pathways through which it develops is still an open field of research. Through stance detection it would be possible to reconstruct these pathways down to the individual text-comment pairs.
Stance detection has been explored across various fields with differing definitions and applications. Du Bois introduces the concept of the stance triangle, where stance-taking involves evaluating objects, positioning subjects, and aligning with others in dialogic interactions, emphasizing the sociocognitive aspects and intersubjectivity in discourse [6]. Sayah and Hashemi focus on academic writing, analyzing stance and engagement features like hedges, self-mention, and appeals to shared knowledge to understand communicative styles and interpersonal strategies [8]. Küçük and Can define stance detection as the classification of an author’s position towards a target (favor, against, or neutral), highlighting its importance in sentiment analysis, misinformation detection, and argument mining [9]. These diverse approaches underscore the multifaceted nature of stance detection and its applications in enhancing the understanding of social discourse, academic rhetoric, and online content analysis. For a review of the recent developments of the field we refer to Alturayeif et al. [2] and AlDayel et al. [3].
In this work, we propose a novel approach to training stance detection models by leveraging the interactions within highly polarized communities. Our method utilizes tweet/quote pairs from the Italian political debate to construct a robust training set. We operate under the assumption that users who predominantly retweet a particular political profile are likely in agreement with the statements made by that profile. We restricted our analysis to retweets since this form of communication primarily aligns with the endorsement hypothesis [10]. Namely, being a simple re-posting of a tweet, retweeting is commonly thought to express agreement with the claim of the tweet [11]. Further, though retweets might be used for other purposes such as those described by Marsili [12], the repeated nature of the interaction we observe in our networks reduces the probability that the activity falls outside of the endorsement behavior.
Conversely, while quoting a tweet works similarly to retweeting, the function allows users to add their own comments above the tweet. This makes this form of communication controversial regarding the endorsement hypothesis, as agreement or disagreement with the tweet depends on the stance of the added comment. On the other hand, the information social media users see, consume, and share through their news feed heavily depends on the political leaning of their early connections [13, 14]. In other words, while algorithms are highly influential in determining what people see and shaping their on-platform experiences [15], there is significant ideological segregation in political news exposure [16]. It is therefore reasonable to expect that users who almost exclusively retweet a political entity (party, leader, or both) use quote tweets to express agreement with statements posted by that entity and disagreement with statements posted by political entities ideologically distant from their preferred one. Additionally, the quote interaction perfectly encapsulates the stance triangle described by Du Bois [6].
In order to correctly assess political opposition we construct a retweet network and use the Louvain community detection algorithm [17] to characterize leaders and, through label propagation, the followers that align with their views.
Through these community labels we construct a dataset of claim-perspective couples by annotating tweet-quote pairs from profiles that clearly express political alignment as favor and annotating tweet-quote pairs in which the profiles come from different communities as against. Finally, we use a pretrained BERT model for Italian language and fine-tune it to the classification task.
This methodology aims to enhance the accuracy of stance detection models by incorporating real-world patterns of agreement and disagreement observed in polarized online environments. Further, it enables an unsupervised training paradigm that can be scaled to very large datasets.
In the following sections, we will outline the data gathering approach used for the dataset. Subsequently, we will describe the community detection methods employed to identify leaders and users within the Italian political discourse. We will then discuss the model architecture and its training process. In the results section, we will evaluate the model’s performance and present our findings. Finally, the conclusion will address potential future developments, the implications of our work, and its limitations.
2. Results
In this study, we focus on a comprehensive set of Italian opinion leaders active on Twitter/X, including the official profiles of major news media outlets as well as prominent politicians and political parties. The profiles of news media outlets are further classified according to assessments provided by NewsGuard, which categorize them as either questionable or reliable sources. This classification is crucial for evaluating the quality of the information these outlets disseminate, particularly regarding their reputation for spreading misinformation. For the selected leaders, we collected all tweets produced from January 2018 to December 2022. The general public (followers) is identified based on their RTs to the content produced by these leaders. See Materials and Methods for details on the data collection process. Using this node configuration, we construct a bipartite network with two layers: leaders and followers, where the links represent the number of RTs by the latter of tweets made by the former. If a group of followers retweets tweets from two different leaders, it indicates that these leaders are likely communicating similar messages or viewpoints. To analyze these relationships more deeply, we perform a monopartite projection onto the leader layer. This projection, detailed in Materials and Methods, simplifies the network by concentrating solely on the leaders and the connections between them that are inferred from their shared followers.
Panel (A) of Figure 1 shows the RT network of leaders aggregated in terms of communities identified through an optimized version of the Louvain algorithm [17]. The a posteriori analysis of the political leaders in each group reveals that the clustering algorithm effectively identified communities that align with the political affiliations of the leaders in each cluster [18, 19]. Specifically, the Left-leaning community includes political entities such as +Europa, Azione, Enrico Letta, and Nicola Fratoianni; the Right-leaning community features leaders from FdI, FI, and Lega; and the Five Star Movement (M5S) community includes key figures like Giuseppe Conte and Luigi Di Maio. An interesting observation from the network configuration is the clustering of questionable news sources. These profiles consistently group within the same community, suggesting a potential alignment or affinity with specific political leanings or ideologies.
[Figure 1 panels: (A) Retweet network; (B) Stance network. Legend: political profiles, questionable news sources, reliable news sources.]
Figure 1: Projection of the follower-leader bipartite network onto the layer of leaders. In both (A) and (B), the edges represent connections between leaders based on follower activity. (A) The edge weights are derived from the number of shared followers who retweeted content from both leaders. (B) The edge weights are based on the positive difference between favoring and against quote tweets made by shared followers on the content produced by the two leaders. In these visualizations, the node positions remain constant, providing a consistent framework for comparison. Node colors refer to communities as a result of running an optimized version of the Louvain algorithm. Node frame colors refer to the different types of leaders: political entities (azure), questionable news sources (dark red), and reliable news sources (dark blue).
Leveraging the political bias of followers in our Twitter network, we build a very large dataset of tweet-quote pairs, each annotated with the corresponding stance (favor or against), as better described in Materials and Methods. Since this method assigns the stance to each pair in an unsupervised manner, to ensure that our approach is performing correctly, we randomly selected 500 pairs (250 favor and 250 against) and manually annotated their stance. We then compared the results of the automatic annotation with the manual annotation. The results, shown in Appendix - Table 3, indicate a high level of accuracy in favor and against classifications, with a small number of neutral cases. The dataset serves as training set for fine-tuning UmBERTo [20], an Italian language model based on the RoBERTa architecture [21], to assign stance labels to claim-perspective pairs. The fine-tuning process is performed using 5-fold cross-validation. The optimal performance for each fold is assessed by measuring the accuracy, i.e., the ratio of correctly predicted instances (both true favor and true against) to the total number of instances. The best-trained models from each fold demonstrate nearly identical performance, as shown by the average accuracy and F1-scores reported in Table 1. The best model from fold 3 is identified as the highest performing and is therefore used in the following analyses. The corresponding confusion matrices for both the training and test sets are provided in Appendix - Table 5.
            Overall         Favor           Against
            Acc (SD)        F1 (SD)         F1 (SD)
Training    0.863 (10^-5)   0.863 (10^-5)   0.864 (10^-5)
Test        0.846 (10^-6)   0.846 (10^-6)   0.846 (10^-5)
Table 1
Average performance of the best models from each fold on the training set and the test set. The table reports the mean and standard deviation (SD) for each metric considered: Accuracy for the overall model, and F1-score for each individual class.
Given the imbalance in the label distribution of the claim-perspective dataset, we use 41,347 pairs (each annotated as favor and previously removed to create a balanced training set) as an additional test set to evaluate the model’s performance. The model achieves an accuracy of 83.6% when predicting the stance of these pairs.
The model is then applied to classify all the collected tweet-quote pairs based on their stance. Thus, following the same procedure used to construct the RT network of leaders, we develop the stance network and analyze its community structure. In this case, the weight of a link in the bipartite follower-leader network represents the positive difference between the number of favoring and against quotes from a follower on the leader’s tweets. Panel (B) of Figure 1 shows the stance network of leaders aggregated in terms of communities identified through the Louvain algorithm. The node positions in this representation are the same as those in the RT network, providing a consistent framework for comparison. More formally, to evaluate the differences in clustering assignments between nodes present in both the retweet network and the stance network, we perform a clustering comparison. Namely, we use the contingency table [22] associated with both the representations to compute community overlap. Figure 2 shows the comparison results broken down by source type: political entities and news outlets. While clusters C and D of the stance network primarily align with clusters 2 and 3 of the RT network, respectively, clusters A and B of the stance network mainly represent a refinement of cluster 1 from the RT network. This suggests that even in the stance network, the emerging communities align with the political affiliations of the leaders within each cluster.
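As a minimal sketch of the clustering comparison described above, the following Python snippet builds a contingency table between two community assignments, restricted to the nodes present in both. The function names, the toy labels, and the row-wise percentage normalization are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def contingency_table(rt_labels, stance_labels):
    """Cross-tabulate two clusterings over the nodes they share.

    Both arguments map a node (leader) to its community label.
    Returns (rows, cols, counts), where counts[r][c] is the number of shared
    nodes in RT community rows[r] and stance community cols[c].
    """
    shared = sorted(set(rt_labels) & set(stance_labels))
    pairs = Counter((rt_labels[n], stance_labels[n]) for n in shared)
    rows = sorted({rt_labels[n] for n in shared})
    cols = sorted({stance_labels[n] for n in shared})
    counts = [[pairs[(r, c)] for c in cols] for r in rows]
    return rows, cols, counts


def row_percentages(counts):
    """Normalize each row to percentages (an assumed way to express overlap)."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([100.0 * v / total if total else 0.0 for v in row])
    return result


# Hypothetical toy assignments, not taken from the paper's data.
rt = {"l1": 1, "l2": 1, "l3": 2, "l4": 3, "l5": 3}
stance = {"l1": "A", "l2": "B", "l3": "C", "l4": "D", "l5": "D", "l6": "A"}

rows, cols, counts = contingency_table(rt, stance)
percentages = row_percentages(counts)
```

In this toy case, RT community 1 splits evenly between stance communities A and B, which mirrors the kind of refinement discussed in the text; node "l6" is ignored because it appears in only one of the two networks.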
[Figure 2 panels: Political entities (base for training); News outlets (not used for training). Rows: RT network clusters 1-3. Columns: stance network clusters A-D. Color scale: Agreement, 0-100 (%).]
Figure 2: Contingency table associated with retweet network and stance network. Data is broken down by source type: political entities and news outlets.
Although the tweet-quote pairs used to train the model include only tweets from political entities, the result is significant. The training set does not include pairs where the quote comes from a follower who exclusively retweets political entities from the same ideological community as the tweet’s author. This demonstrates the model’s ability to reconstruct communities through precise classification of textual pairs.
The contingency table for news outlets, while displaying less pronounced patterns overall, still demonstrates clear coherence in classification between the retweet network and the stance network. This is particularly remarkable considering that these profiles were not included in the model’s training set. The recovery of the retweet network’s community structure within the stance network suggests that the model successfully generalizes across profiles with differing linguistic constraints, with only a minimal loss in accuracy, while still allowing for the reconstruction of group affiliations.
3. Discussion
Stance detection remains a vital yet challenging area in natural language processing (NLP), traditionally limited by the constraints of supervised learning. The availability of large language corpora, where interaction networks can be reconstructed, offers a novel approach that incorporates the social and dynamic aspects of stance, as outlined by Du Bois in his work on the stance triangle [6].
Our model addresses a more complex task compared to other state-of-the-art models. While existing models typically classify a user’s stance on specific topics, our model classifies claim-perspective pairs into favor and against categories. This requires a deeper analysis of the relational stance between multiple interacting users and their statements.
Despite this increased complexity, our model achieved results comparable to those of existing state-of-the-art models [23, 24]. This success supports the hypothesis that in-group/out-group determinants, well-documented in opinion dynamics, significantly explain the variation in behaviors [25].
Moreover, our model’s ability to reconstruct communities based on the accurate classification of textual pairs (as shown in Figure 2) underscores its potential for community reconstruction in scenarios where the interaction network is not provided.
Importantly, this approach also opens avenues for studying network dynamics based on the probability of agreement between account pairs. This has significant implications for understanding and potentially mitigating coordinated attacks, such as disinformation campaigns and political propaganda. By identifying patterns of agreement and disagreement, we can better detect and analyze the strategies behind these coordinated efforts, enhancing our ability to safeguard democratic processes and public discourse.
4. Materials and Methods
Data Collection. Our dataset comprises approximately 15 million tweets collected by monitoring the activity of 583 profiles that reflect Italian online social dialogue (e.g., La Repubblica, Il Corriere della Sera, Il Giornale). Profiles were selected based on the list of news sites monitored by NewsGuard, a news rating agency dedicated to assigning reliability scores. According to NewsGuard, this list covers approximately 95% of online engagement with news, providing near-comprehensive coverage of news-related dialogue [26].
Additionally, we included Italian political entities in the list of profiles. This inclusion encompasses all major political parties and their leaders (e.g., Giorgia Meloni and Fratelli d’Italia, Elly Schlein and PD, Giuseppe Conte and M5S). For a complete list of the monitored political profiles see Appendix - Table 4.
For each monitored profile, we collected all tweets from January 2018 to December 2022 using the Twitter/X API before the limitations introduced by the new management¹. We also gathered all retweets (RTs) and quotes (QTs) of this content within the same time frame, limited to those tweets that gained at least 20 RTs or 10 QTs. The following table provides a detailed breakdown of the data matching these criteria.
Category    Profiles    Tweets     RTs           QTs
News        329         279,793    16,365,178    3,587,830
Politics    38          101,017    15,385,363    2,388,621
TOTAL       367         380,810    31,750,541    5,976,451
Table 2
Breakdown of the dataset.
¹ https://twitter.com/XDevelopers/status/1621026986784337922
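The follower-leader bipartite network and its projection onto the leader layer, introduced in the Results section, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the toy counts are hypothetical, and summing the products of link weights over all shared followers is an assumption about how the weights of length-two paths are aggregated.

```python
from collections import defaultdict
from itertools import combinations


def project_on_leaders(retweet_counts):
    """Project a weighted follower-leader bipartite network onto leaders.

    retweet_counts maps (follower, leader) to the number of RTs from that
    follower to that leader. Every length-two path leader-follower-leader
    contributes the product of its two link weights; contributions from
    different followers are summed (an assumption made for this sketch).
    """
    by_follower = defaultdict(dict)
    for (follower, leader), weight in retweet_counts.items():
        by_follower[follower][leader] = weight

    projected = defaultdict(int)
    for leaders in by_follower.values():
        # Each unordered pair of leaders retweeted by the same follower
        # is connected through that follower.
        for i, j in combinations(sorted(leaders), 2):
            projected[(i, j)] += leaders[i] * leaders[j]
    return dict(projected)


# Hypothetical toy data: three followers (u1-u3) and three leaders (A-C).
toy_counts = {
    ("u1", "A"): 3, ("u1", "B"): 2,
    ("u2", "A"): 1, ("u2", "B"): 4, ("u2", "C"): 5,
    ("u3", "C"): 7,
}
leader_weights = project_on_leaders(toy_counts)
```

Follower "u3" retweets a single leader and therefore adds no edge to the projection. A community detection routine can then be run on the resulting weighted leader network.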
Community Detection. In order to reconstruct the discourse communities from the Twitter activity we built a retweet network. In the context of the data collection strategy previously described, most RTs are from a non-monitored user (a follower) to one of the users monitored (a leader), excluding a few RTs from one leader to another (45,299). We can therefore consider this network as a bipartite network, i.e. a network where all links are from one node type to another, with 367 leaders and 934,394 followers, connected through links with a weight $w_{xi}$ equal to the number of RTs from the follower $x$ to the leader $i$.
To identify communities among leaders we assume that leaders with the same readership are more likely to be in the same political community. We therefore constructed a monopartite network by projecting on the leader layer, i.e. we construct a network from the set of all length-two paths, assigning weights that are the product of the path’s links.
We used the Bipartite Weighted Configuration Model (BiWCM) to statistically validate our bipartite projection [27]. BiWCM accounts for weighted interactions and preserves the strength of nodes in both layers, ensuring that our observed co-occurrences are not due to random chance but represent genuine structural patterns in the data. In order to find political communities in the network, we applied the Louvain algorithm 1000 times and selected the solution that maximized modularity, i.e., the strength of division of the network into clusters, with higher values indicating a structure where more edges lie within communities than would be expected by chance [28].
The same procedure was followed to construct the stance network and study its community structure. In this case, the weight of a link in the bipartite follower-leader network indicates the fraction of favoring quotes from the follower to the leader’s tweets.
Claim-Perspective Pairs Selection. To construct a dataset of claim-perspective text pairs annotated with the corresponding stance (favor if the perspective supports the claim, against otherwise), we first identified users who clearly expressed an (almost) absolute preference for a single political entity through their retweet activity. Specifically, for each follower, we calculated the distribution of their RTs across the political entities defined in Table 4. Then, we filtered those who allocated at least 80% of their RTs to a single political entity. Some users, although meeting the previous requirement, may not have had a sufficient level of retweet activity during the analyzed period to be considered inclined towards a particular political entity. For example, a user who has only given one retweet to the set of political profiles would appear totally inclined towards a particular entity. To reduce the uncertainty arising from the indiscriminate inclusion of all profiles satisfying the high retweet activity requirement for a single political entity, we calculated for each follower $x$ the total number of retweets of content produced by the set of political entities $\mathcal{P}$ defined in Table 4 and excluded the bottom 80% of the resulting distribution (i.e., we imposed $|RT_x(\mathcal{P})| > 7$). For the remaining users, we then assigned the label favor to those quotes of tweets from their preferred political entity and the label against to those quotes of tweets from entities belonging to other political communities, as determined by the community detection analysis. This procedure resulted in the creation of a dataset containing 243,277 unique claim-perspective (tweet-quote) pairs, each annotated with the corresponding stance. Since the label distribution of the dataset was unbalanced towards favor (specifically, 142,312 favor and 100,965 against), we randomly removed 41,347 favor pairs to obtain a balanced training set for the stance model. The removed pairs were later used as an additional test set to evaluate the model’s accuracy.
Stance model. We initialized our model starting from UmBERTo [20], an Italian language model based on the RoBERTa architecture [21]. Specifically, we relied on the cased version trained using SentencePiece tokenizer and Whole Word Masking on a large corpus, encompassing around 70 GB of text. This makes it highly effective for various natural language processing tasks in Italian, as it leverages a vast and diverse dataset to understand the nuances of the language [29, 30]. The pretrained model was then fine-tuned on the constructed dataset of tweet-quote pairs to create a tool capable of inferring the stance of claim-perspective text pairs: favor if the perspective agrees with the claim, and against otherwise. To input the text pairs into the pretrained model, we utilized UmBERTo’s special tokens. Specifically, we concatenated the tweet and quote as
[start] + tweet + [separator] + quote + [end],
where [start], [separator], and [end] denote UmBERTo’s start, separation, and end tokens, respectively. Since we set max_seq_length = 256, which limits the total number of tokens that can be processed by the model, in cases where the concatenated strings exceeded this limit, the longer text between the tweet and the quote was truncated. This ensures that the input remains within the model’s processing capacity while preserving as much information as possible from both texts. Conversely, shorter concatenated strings were padded using the special padding token until they reached the 256-token limit.
Tweets and quotes were preprocessed before being concatenated by removing URLs, mentions, non-UTF-8 characters, line breaks, and tabs.
The pretrained UmBERTo model was imported into Python from the Hugging Face Transformers library [31] as a model for sequence classification. The fine-tuning procedure enabled the model to output the probability distribution over the stance labels by minimizing the cross-entropy loss between the predicted labels and the true labels, effectively learning to classify the stance of claim-perspective pairs. We chose to perform 5-fold cross-validation to ensure the reliability of the results [32]. Namely, the data was first partitioned into 5 equally (or nearly equally) sized segments or folds. Subsequently, 5 iterations of training and testing were performed such that within each iteration a different fold of the data is held out for testing while the remaining 4 folds are used for learning. Thus, for each training-test split, we fine-tuned the UmBERTo model for 4 epochs using a batch size of 64 (for both training and testing) and an improved version of the Adam optimizer [33] with a learning rate of 5e-5 and a weight decay of 0.01 for regularization. The chosen hyperparameters are among those recommended in the literature [34, 21].
5. Conclusion
This study introduces a novel stance detection model that significantly advances the understanding of alignment and opposition in social discourse. By leveraging social network data from X (formerly Twitter), we developed a robust training technique that utilizes interactions within politically aligned communities. Our approach involved curating a dataset of tweet/quote pairs, where the quotes are derived from users’ interactions with leaders and politicians. This dataset facilitated the training of a BERT model, which achieved a state-of-the-art accuracy of approximately 85%.
Our findings underscore the efficacy of using social network structures to train NLP models, demonstrating that retweet interactions can serve as reliable indicators of political alignment. This methodology not only enhances the scalability of stance detection but also offers a nuanced understanding of political discourse on social media platforms. By reconstructing and validating politically aligned communities through expert knowledge, our model provides a robust framework for analyzing the alignment of social media statements.
The implications of this work extend beyond stance detection, offering potential applications in monitoring political sentiment, identifying misinformation, and understanding public opinion dynamics. Future research could explore the integration of additional social network features, the capacity of the model to generalize to other domains and interaction types, and how stance propagates within networks. Additionally, investigating the role of specific linguistic markers like adverbs across different languages and cultures can reveal universal and language-specific determinants of stance.
While our model shows promising results, it also relies heavily on the assumption that retweets are mainly a form of endorsement, and that quotes within one’s own political community are all in agreement and that outside of one’s political community they are all in disagreement. While the high level of polarization observed in these networks supports the validity of these assumptions, it also restricts the applicability of the model to domains where polarization is evident and these assumptions are valid.
Acknowledgments
We extend our deepest gratitude to Vittorio Loreto, the director of the Sony Computer Science Laboratories (CSL) and Professor at La Sapienza University of Rome, for his invaluable support and sponsorship of this research. His guidance was pivotal for the successful completion of our study. We also thank the anonymous reviewers for their insightful suggestions, which have greatly contributed to enhancing the quality of this work.
References
[1] D. Küçük, F. Can, Stance detection: Concepts, approaches, resources, and outstanding issues, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 2673–2676.
[2] N. Alturayeif, H. Luqman, M. Ahmed, A systematic review of machine learning techniques for stance detection and its applications, Neural Computing and Applications 35 (2023) 5113–5144.
[3] A. Aldayel, W. Magdy, It is more than what you say!: Leveraging user online activity for improved stance detection, 2019. URL: https://2019.ic2s2.org/, 5th International Conference on Computational Social Science, IC2S2 2019; Conference date: 17-07-2019 Through 20-07-2019.
[4] A. Gupta, S. Mehta, Automatic stance detection for twitter data, in: 2022 1st International Conference on Informatics (ICI), IEEE, 2022, pp. 223–225.
[5] T. Draws, K. Natesan Ramamurthy, I. Baldini, A. Dhurandhar, I. Padhi, B. Timmermans, N. Tintarev, Explainable cross-topic stance detection for search results, in: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, 2023, pp. 221–235.
[6] J. W. Du Bois, The stance triangle, Stancetaking in discourse: Subjectivity, evaluation, interaction 164 (2007) 139–182.
[7] World Economic Forum, Global Risks Report 2024, Technical Report, World Economic Forum, 2024. URL: https://www.weforum.org/publications/global-risks-report-2024/.
[8] L. Sayah, M. R. Hashemi, Exploring stance and consensus, in: A. P. Rocha, L. Steels, H. J. van den
engagement features in discourse analysis papers., Herik (Eds.), Proceedings of the 16th International
Theory & Practice in Language Studies (TPLS) 4 Conference on Agents and Artificial Intelligence,
(2014). ICAART 2024, Volume 3, Rome, Italy, February 24-
[9] D. Küçük, F. Can, Stance detection: A survey, ACM 26, 2024, SCITEPRESS, 2024, pp. 1405–1412. doi:10.
Computing Surveys (CSUR) 53 (2020) 1–37. 5220/0012595000003636 .
[10] C. Becatti, G. Caldarelli, R. Lambiotte, F. Saracco, [19] M. Pratelli, F. Saracco, M. Petrocchi, Entropy-based
Extracting significant signal of news consumption detection of twitter echo chambers, PNAS Nexus 3
from social networks: the case of Twitter in Italian (2024) pgae177. doi:10.1093/pnasnexus/pgae177 .
political elections, Palgrave Communications 5 [20] L. Parisi, S. Francia, P. Magnani, Umberto: an
(2019). doi:10.1057/s41599- 019- 0300- 3 . italian language model trained with whole word
[11] D. Boyd, S. Golder, G. Lotan, Tweet, tweet, retweet: masking, https://github.com/musixmatchresearch/
Conversational aspects of retweeting on twitter, in: umberto, 2020.
2010 43rd Hawaii International Conference on Sys- [21] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen,
tem Sciences, 2010, pp. 1–10. doi:10.1109/HICSS. O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
2010.412 . Roberta: A robustly optimized bert pretraining ap-
[12] N. Marsili, Retweeting: Its linguistic and epistemic proach, arXiv (2019). doi:10.48550/arXiv.1907.
value, Synthese 198 (2021) 10457–10483. 11692 .
[13] W. Chen, D. Pacheco, K.-C. Yang, F. Menczer, [22] S. S. Brier, Analysis of contingency tables under
Neutral bots probe political bias on social media, cluster sampling, Biometrika 67 (1980) 591–596.
Nature Communications 12 (2021). doi:10.1038/ [23] A. Rashed, M. Kutlu, K. Darwish, T. Elsayed,
s41467- 021- 25738- 6 . C. Bayrak, Embeddings-based clustering for tar-
[14] B. Nyhan, J. Settle, E. Thorson, M. Wojcieszak, get specific stances: The case of a polarized turkey,
P. Barberá, A. Y. Chen, H. Allcott, T. Brown, in: Proceedings of the International AAAI Confer-
A. Crespo-Tenorio, D. Dimmery, D. Freelon, ence on web and social media, volume 15, 2021, pp.
M. Gentzkow, S. González-Bailón, A. M. Guess, 537–548.
E. Kennedy, Y. M. Kim, D. Lazer, N. Malhotra, [24] S. Shi, K. Qiao, J. Chen, S. Yang, J. Yang, B. Song,
D. Moehler, J. Pan, D. R. Thomas, R. Tromble, L. Wang, B. Yan, Mgtab: A multi-relational graph-
C. V. Rivera, A. Wilkins, B. Xiong, C. K. de Jonge, based twitter account detection benchmark, arXiv
A. Franco, W. Mason, N. J. Stroud, J. A. Tucker, preprint arXiv:2301.01123 (2023).
Like-minded sources on facebook are prevalent [25] S. Rathje, J. J. Van Bavel, S. Van Der Linden, Out-
but not polarizing, Nature 620 (2023) 137–144. group animosity drives engagement on social me-
doi:10.1038/s41586- 023- 06297- w . dia, Proceedings of the National Academy of Sci-
[15] P. Gravino, D. R. Lo Sardo, E. Brugnoli, Cross- ences 118 (2021) e2024292118.
platform impact of social media algorithmic adjust- [26] NewsguardTech.com, Social impact report
ments on public discourse, ArXiv (2024). doi:10. 2021, 2022. Available from https://www.
48550/arXiv.2405.00008 . newsguardtech.com/wp-content/uploads/2022/
[16] S. González-Bailón, D. Lazer, P. Barberá, M. Zhang, 01/NewsGuard-Social-Impact-Report-1.21.22.pdf
H. Allcott, T. Brown, A. Crespo-Tenorio, D. Freelon, (accessed Nov 27, 2023).
M. Gentzkow, A. M. Guess, S. Iyengar, Y. M. Kim, [27] M. Bruno, D. Mazzilli, A. Patelli, T. Squartini,
N. Malhotra, D. Moehler, B. Nyhan, J. Pan, C. V. F. Saracco, Inferring comparative advantage via
Rivera, J. Settle, E. Thorson, R. Tromble, A. Wilkins, entropy maximization, Journal of Physics: Com-
M. Wojcieszak, C. Kiewiet de Jonge, A. Franco, plexity 4 (2023) 045011. doi:10.1088/2632- 072X/
W. Mason, N. Jomini Stroud, J. A. Tucker, Asymmet- ad1411 .
ric ideological segregation in exposure to political [28] M. E. J. Newman, M. Girvan, Finding and evaluating
news on facebook, Science 381 (2023) 392–398. community structure in networks, Phys. Rev. E 69
doi:10.1126/science.ade7138 . (2004) 026113. doi:10.1103/PhysRevE.69.026113 .
[17] V. D. Blondel, J.-L. Guillame, R. Lambiotte, E. Lefeb- [29] F. Bianchi, D. Nozza, D. Hovy, FEEL-IT: Emotion
vre, Fast unfolding of communities in large net- and sentiment classification for the Italian language,
works, Journal of Statistical Mechanics: The- in: O. De Clercq, A. Balahur, J. Sedoc, V. Barriere,
ory and Experiment 10008 (2008). doi:10.1088/ S. Tafreshi, S. Buechel, V. Hoste (Eds.), Proceed-
1742- 5468/2008/10/P10008 . ings of the Eleventh Workshop on Computational
[18] E. Brugnoli, P. Gravino, D. R. Lo Sardo, V. Loreto, Approaches to Subjectivity, Sentiment and Social
G. Prevedello, Fine-grained clustering of social Media Analysis, Association for Computational Lin-
media: How moral triggers drive preferences and guistics, Online, 2021, pp. 76–83.
                      Automatic
                      Favor    Against    Σ
Manual    Favor         221          7    228
          Against        16        209    225
          Neutral        13         34     47
          Σ             250        250    500

Table 3
Comparison between manual and automatic annotation for 500 randomly selected tweet-quote pairs. The F1 score is 0.86 for both the Favor and the Against categories. These results indicate strong agreement between manual and automatic annotation, especially considering that the unsupervised stance classification method does not account for labels other than Favor and Against, while some contents were manually classified as Neutral.

Political entity          Twitter profiles
+Europa                   piu_europa, emmabonino
Articolo Uno              articolounodp, robersperanza
Azione                    azione_it, carlocalenda
Cambiamo!                 giovannitoti
Coraggio Italia           coraggio_italia, luigibrugnaro
Democrazia e Autonomia    movimentodema
Europa Verde              europaverde_it, angelobonelli1
FdI                       giorgiameloni, fratelliditalia
FI                        forza_italia, berlusconi
ItalExit                  gparagone
IV                        italiaviva, matteorenzi
Lega                      legasalvini, matteosalvinimi
M5S                       giuseppeconteit, mov5stelle, luigidimaio
ManifestA                 manifesta_it
NcI                       maurizio_lupi
PD                        pdnetwork, enricoletta, sbonaccini, ellyesse
Potere al Popolo          potere_alpopolo
Rifondazione comunista    direzioneprc
SI                        si_sinistra, nfratoianni
Unione di Centro          antoniodepoli
Unione Popolare           unione_popolare, demagistris

Table 4
List of Twitter profiles related to the main political entities active in Italy during the five-year period 2018-2022.

(a) training set
                      Predicted
                      Favor      Against    Σ
Actual    Favor       70,690     10,082     80,772
          Against     10,517     70,255     80,772
          Σ           81,207     80,337     161,544

(b) test set
                      Predicted
                      Favor      Against    Σ
Actual    Favor       16,929     3,264      20,193
          Against     2,740      17,453     20,193
          Σ           19,669     20,717     40,386

Table 5
Confusion matrices for both the (a) training and (b) test sets.
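As a reading aid, the summary metrics implied by Table 5 can be recomputed from the four cell counts of each confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the names `CONFUSION` and `summarize` are illustrative, and it assumes rows are actual labels and columns are predicted labels, both in the order (Favor, Against).

```python
# Cell counts copied from Table 5. Assumed layout: rows = actual label,
# columns = predicted label, both ordered (Favor, Against).
CONFUSION = {
    "training": [[70690, 10082], [10517, 70255]],
    "test": [[16929, 3264], [2740, 17453]],
}


def summarize(matrix):
    """Return accuracy and per-class (precision, recall, F1) for a 2x2 matrix."""
    (ff, fa), (af, aa) = matrix  # e.g. fa = actual Favor, predicted Against
    total = ff + fa + af + aa

    def prf(hits, n_predicted, n_actual):
        precision = hits / n_predicted
        recall = hits / n_actual
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    return {
        "accuracy": (ff + aa) / total,
        "Favor": prf(ff, ff + af, ff + fa),
        "Against": prf(aa, aa + fa, aa + af),
    }


for split, matrix in CONFUSION.items():
    result = summarize(matrix)
    print(split, f"accuracy={result['accuracy']:.3f}")
    for label in ("Favor", "Against"):
        p, r, f1 = result[label]
        print(f"  {label}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

On the test-set counts this gives an accuracy of about 0.85, in line with the figure reported in the abstract.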