<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Italian Conference on CyberSecurity, April</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Inferring Political Leaning on X (Twitter): A Zero-Shot Approach in an Italian Scenario</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Caterina Senette</string-name>
          <email>caterina.senette@iit.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Margherita Gambini</string-name>
          <email>margherita.gambini@iit.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiziano Fagni</string-name>
          <email>tiziano.fagni@iit.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Popa</string-name>
          <email>victoria.popa@iit.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Tesconi</string-name>
          <email>maurizio.tesconi@iit.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Informatics and Telematics (IIT) - CNR</institution>
          ,
          <addr-line>Via Giuseppe Moruzzi, 1 56124 Pisa -</addr-line>
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Pisa</institution>
          ,
          <addr-line>Dipartimento di Computer Science, Largo Bruno Pontecorvo, 3, 56127 Pisa</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>8</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>In recent years, there has been growing attention on predicting the political orientation of active social media users, which aids political forecasting, the modeling of opinion dynamics, and the understanding of user polarization. Existing methods, primarily for X (Twitter) users, use content-based analysis or a blend of content, network, and communication analysis. The latest research highlights that a user's political stance mainly hinges on their views on key political and social issues, prompting a shift towards detecting user stances through the content they share on social networks. This work investigates the use of an unsupervised stance-detection framework, Tweets2Stance (T2S), based on zero-shot classification (ZSC) models [1], to predict users' stances toward a set of social-political statements using content-based analysis of their X (Twitter) timelines in an Italian scenario. The ground-truth user stances are drawn from Voting Advice Applications (VAAs), tools that help citizens identify their political leanings by comparing their preferences with party stances. Leveraging the agreement levels of six parties on 20 statements from VAAs, the study aims to predict Party p's stance on each statement s using X (Twitter) Party account data. T2S, employing zero-shot learning, proves effective across various contexts beyond politics, achieving a minimum MAE of 1.13 despite a general maximum F1 value of 0.4, which represents significant progress given the task complexity.</p>
      </abstract>
      <kwd-group>
        <kwd>user stance detection</kwd>
        <kwd>Zero-shot learning</kwd>
        <kwd>unsupervised ML</kwd>
        <kwd>political leaning</kwd>
        <kwd>X (Twitter)</kwd>
        <kwd>VAA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        In recent years, there has been growing attention toward social media, both for what
is explicitly shared among users (content, thoughts, and behavior) and for what is
hidden and latent. Among this latent information, the user’s stance, i.e. the expression of a
user’s point of view and perception toward a given statement [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], is particularly interesting;
in fact, stance detection on social media is an emerging opinion-mining paradigm that applies
well to different social and political contexts, and for which many researchers are
proposing solutions drawing on natural language processing, web science, and social computing
[
        <xref ref-type="bibr" rid="ref3">3, 4, 5, 6, 7, 8, 9, 10</xref>
        ]. Some work [
        <xref ref-type="bibr" rid="ref3">3, 11</xref>
        ] dealt with stance-detection at the user-level; however,
to the best of our knowledge, a completely unsupervised technique exploiting user’s textual
content only has never been explored. Hence, the work herein described investigates the
use of an unsupervised stance-detection framework based on zero-shot learning models and
previously introduced by us [
        <xref ref-type="bibr" rid="ref1">12, 1</xref>
        ], named Tweets2Stance (T2S), to detect the stance of an X
(Twitter) account using its timeline in an Italian scenario. The idea for this framework stems
from observing how Voting Advice Applications (VAAs) work. Voting Advice Applications,
originally developed in the 1980s as paper-and-pencil civic education questionnaires [13], are
online tools that help citizens, mainly before elections, to identify their political leaning by
comparing their policy preferences with the political stances of parties or candidates running
for office. VAAs are widespread in many countries and play a crucial role in online election
campaigns worldwide. Basically, users mark their position on a range of policy statements. The
application compares the individual’s answers to the positions of each Party or candidate and
generates a rank-ordered list or a graph indicating which Party or candidate is located closest
to the user’s policy preferences. One of the crucial elements of a VAA is the questionnaire:
the selection of the statements, their balance among the political poles, and their phrasing
affect both the way in which users respond and the overall user engagement
with the poll itself. For these reasons, the statements issued by a VAA should cover the spectrum
of the most important topics of an election campaign and adequately show the crucial differences
among all the competitors in the political scenario for which the VAA is designed [14]. This
careful definition of the questionnaire, which takes into account the main topics under
discussion at a certain time, suggested to us the possibility of using the official position of Italian
parties on specific political statements (during a certain election period) as the
ground truth against which to evaluate the stance inferred from the timeline of the X (Twitter) Party accounts in a
completely unsupervised way<sup>1</sup>; note that only tweets written during the pre-election period
are considered.
      </p>
      <p>
        Objectives. Starting from the known agreement level of six parties on 20 different
statements (the VAA’s statements), the objective of the study is to predict the stance of a Party <italic>p</italic>
toward each statement <italic>s</italic> by exploiting what the Party account wrote on X (Twitter).
Unlike previous works in the literature [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], our classification model is built for different
topics, and we propose a fine-grained stance detector working along five classes
that could be generalized to various spheres, not just the political one.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Stance detection is an emerging opinion-mining paradigm that applies well to several social
and political scenarios. The state of the art, summarized in a highly valuable survey [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], highlights
the importance of categorization, since stance detection can be classified according to the target
(single, multi-related, or claim-based), according to the task type (detection or prediction), or
by distinguishing between stance at the user level and at the statement level. At the statement level [17, 18],
where the objective is to predict the stance expressed in a piece of text, previous research works
are mainly based on Natural Language Processing (NLP) methods and classification tasks with
three classes (support/against/none).
<fn id="fn1"><label>1</label><p>The Italian Parties’ official positions on 20 political statements were kindly provided by the Observatory on Political Parties and Representation [15], based on the VAA NavigatoreElettorale for the European Elections 2019 [16].</p></fn>
Instead, at the user level, the objective is to predict the
stance of a user toward a given topic, and prediction solutions generally incorporate different
user attributes along with the text of their posts. Our work falls under the category of stance
detection tasks at the user level, specifically focusing on target-specific stances, a common
approach in social media stance detection. This involves predicting stances on specific topics,
often using separate classification models for each topic. Notable approaches [19, 4, 5, 20]
utilize post text along with various user attributes, typically employing binary classification
(support and against). Lynn et al. [11] explored using user-level features alone versus
document-level features in predicting tweet stances without the tweets, highlighting the importance of
integrating user features into predictive systems. Other target-specific strategies in the literature
were conducted at the statement level [6, 7, 8, 21]. In [6], the approach relied on
unsupervised methods, and classification was made along three
positions (favour, against, neither). A stance-detection shared task was introduced in [7], where
teams inferred three-level tweet stances using natural language systems: for, against, or neutral
towards the given target. The task was divided into supervised (Task A) and unsupervised (Task B) sub-tasks,
which received 19 and 9 team submissions respectively. The highest F-score reached was 67.82
for Task A and 56.28 for Task B. As mentioned above, target-specific approaches can consider
single or multiple targets. Usually, the concept of multi-target classification has been used to
analyse the relation between two political candidates by using domain knowledge about these
targets. In that case, the same model can be applied to different targets on the hypothesis that
a piece of text that contains a stance in favour of one target also implicitly contains
a stance against the other [22, 9, 23]. Our method handles a broad multi-target classification
task, where each statement represents a specific target. Unlike previous methods, it operates
without the need for pre-selected texts or distinct models for each target.
      </p>
      <sec id="sec-3-1">
        <title>2.1. Machine Learning (ML) approaches</title>
        <p>Among ML features for stance detection, the literature distinguishes between linguistic features,
which reveal stance based on text-linguistic cues [24, 7], and users’ vocabulary, which is based on
their choice of words [10, 25]. Since textual cues can refer to textual features, sentiment,
or semantics, we limit our attention to textual features. In this context, the most used ML
approaches are based on supervised techniques [19, 5, 23, 18, 26]. Some works attempted to
enrich dataset entities by applying unconstrained supervised methods such as transfer learning,
weak supervision, and distant supervision for stance detection [6, 4]. Other innovative
approaches propose unsupervised learning strategies [10, 27, 28] that exploit
clustering techniques and embedding representations of users’ tweets [29]. The limitations
across these studies include: (i) time-intensive data collection and analysis, particularly with
network-based approaches; (ii) challenges in accessing or retrieving the necessary data due to
stringent social media data protection policies; (iii) models limited to two or three
stance classes at most; (iv) reliance on supervised or semi-supervised models, which require
large datasets and whose generalizability is tied to the training sets [30]. For all these reasons,
the recent challenge for user-level and target-specific stance detection is to move towards
unsupervised systems exploiting textual content only. To this aim, a zero-shot learning (ZSL) technique exploiting
advanced pre-trained Natural Language Inference (NLI) models [24, 31] can be a viable solution,
as our T2S framework proved.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Task Definition</title>
      <p>The task is to predict the stance of a Social Media User <italic>u</italic> with respect to a social-political
statement (or sentence) <italic>s</italic>, making use of the User’s textual content timeline on the considered
social media (e.g., the X (Twitter) timeline). The stance is a five-level categorical
label: completely agree (5), agree (4), neither disagree nor agree (3), disagree (2), completely disagree
(1). The integer mappings used by the Tweets2Stance framework are shown in parentheses.</p>
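      <p>The five-level label and its integer codes can be written down explicitly. The following is a minimal sketch, assuming Python; the names Stance and label_distance are illustrative and do not come from the T2S code base.</p>
<preformat>
```python
from enum import IntEnum


class Stance(IntEnum):
    """Five-level agreement label with the integer codes listed in the text."""

    COMPLETELY_DISAGREE = 1
    DISAGREE = 2
    NEITHER = 3
    AGREE = 4
    COMPLETELY_AGREE = 5


def label_distance(predicted: Stance, truth: Stance) -> int:
    """Absolute distance between two ordinal labels (hypothetical helper)."""
    return abs(int(predicted) - int(truth))


# Predicting "agree" when the ground truth is "completely disagree" is 3 steps off.
print(label_distance(Stance.AGREE, Stance.COMPLETELY_DISAGREE))
```
</preformat>
      <p>Treating the label as an ordered integer is what makes an error-distance metric such as MAE meaningful for this task.</p>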
      <p>The desired ground truth is the known agreement/disagreement level
of User <italic>u</italic> with regard to sentence <italic>s</italic>. Recall that the ground truth is only used to evaluate our
proposed T2S framework and to find its optimal parameters; no training step ever occurs.
In this work, we assume that users are the X (Twitter) accounts of six Italian Parties, as the
following section details.</p>
    </sec>
    <sec id="sec-4-data">
      <title>4. Data collection and Pre-processing</title>
      <p>The political scenario under analysis is that of the European and Municipal elections held in Italy on
26th May 2019, when Italian citizens were called to elect the Italian representatives
to the European Parliament. The number of Members of the European Parliament (751 deputies
in total) for each country is approximately proportional to its population. In 2019, Italy had to
elect 76 deputies. At the same time, Italian voters also took part in the municipal elections of
mayors and of municipal and district councillors (in about 3800 Italian municipalities), with a planned
run-off on 9th June 2019. In that context, we focused our attention on the six major parties in
Italy: three center-right parties, namely Forza Italia (FI), Fratelli d’Italia (FDI), and Lega; two
left-wing parties, namely Partito Democratico (PD) and +Europa (+Eu)<fn id="fn2"><label>2</label><p>+Europa was born in 2018 and is characterized by a pro-European and liberal orientation.</p></fn>; and the Movimento 5
Stelle (M5S), which represented a sort of third pole at that time. The Italian parliament included other
minor parties, especially on the left wing, each representing less than 5% of the Italian population.
We did not consider these parties in the current study. As previously said, we started
from the assumption that, knowing the parties’ answers to the VAA’s statements, it is possible
to predict the stance of a Party <italic>p</italic> with regard to each statement <italic>s</italic> by exploiting what the Party wrote
on X (Twitter). The definition of the 20 statements (Table 3 in Appendix A), which express the
political positions of the six referenced parties on selected themes under discussion in Italy
and in Europe in 2019, was entrusted to a group of political experts [15, 16], who provided us
with the ground truth for each Party <italic>p</italic> and statement <italic>s</italic> on which the current work is based.
At first, we collected the timelines of the official X (Twitter) account of each party using the official
X (Twitter) API<fn id="fn3"><label>3</label><p>https://developer.twitter.com/en/docs</p></fn>. Considering the speed with which political discussion nowadays takes place,
especially on social media, the observation period was chosen to maximize
the number of tweets while avoiding noise and off-topic content. Furthermore, to capture any
valuable information or discussion trends over time, we extended the analysis to
four temporal ranges and built the associated datasets<sup>4</sup>, as described in Table 1.</p>
      <p>As a preliminary step, since the text collected from tweets contains a lot of noise and irrelevant
information, we pre-processed the tweets to remove anything without
predictive value, such as: URLs, the "RT @user:" prefix of retweets, mentions at the
beginning of a reply tweet, tweets with one to three words and empty tweets, hashtags and emojis
(replaced with an empty string). Lastly, since we wanted to test our prediction approach on English
tweets as well, we translated the Italian tweets using the google_trans_new<sup>5</sup> Python
package.</p>
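      <p>The cleaning rules listed above can be sketched with regular expressions. This is an illustrative approximation, not the authors' implementation: the function names, the order of the rules, and the emoji character ranges are assumptions.</p>
<preformat>
```python
import re

RT_PREFIX_RE = re.compile(r"^RT @\w+:\s*")         # "RT @user:" prefix of retweets
LEADING_MENTIONS_RE = re.compile(r"^(?:@\w+\s+)+")  # mentions opening a reply tweet
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji ranges: an assumption, not an exhaustive Unicode emoji list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_tweet(text):
    """Strip retweet prefix, leading mentions, URLs, hashtags and emojis."""
    text = RT_PREFIX_RE.sub("", text)
    text = LEADING_MENTIONS_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    return " ".join(text.split())  # collapse the whitespace left behind


def preprocess(tweets, min_words=4):
    """Clean every tweet and drop empty tweets and tweets with one to three words."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [t for t in cleaned if len(t.split()) >= min_words]


print(preprocess([
    "RT @a: ciao a tutti",
    "@b @c vogliamo meno tasse per le famiglie #tasse",
]))
```
</preformat>
      <p>In this sketch the length filter runs after cleaning, so a tweet that consists mostly of hashtags or links is discarded as well; whether the original pipeline applies the filter before or after cleaning is not stated in the text.</p>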
    </sec>
    <sec id="sec-5">
      <title>5. Framework Design</title>
      <p>
        This section briefly describes the proposed Tweets2Stance (T2S) framework (Fig. 1), which detects
the stance of an X (Twitter) User <italic>u</italic> with regard to a sentence <italic>s</italic> by exploiting its X (Twitter) timeline
<italic>TL<sub>u</sub></italic>. More details of the framework are provided in a previous work, where we
introduced it extensively [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>A User might either not talk about a specific political argument (here expressed by sentence
<italic>s</italic>), or debate an issue not raised by our pre-defined set of statements. For these reasons, our
framework executes a preliminary Topic Filtering step, exploiting a Zero-Shot Classifier (ZSC)
to keep only those tweets that talk about the topic <italic>tp<sub>s</sub></italic> of the sentence <italic>s</italic>. A ZSC is a
language-model-based method that, given a text and a set of labels (e.g., topics), assigns a classification probability
score to each label [21]. The higher the score assigned to a label, the higher the likelihood that
the input text pertains to that specific label. A ZSC does not require further fine-tuning on the
target dataset. After obtaining the in-topic tweets through Topic Filtering, the Agreement
Detector module employs the same ZSC to detect the user’s agreement/disagreement level. In
Fig. 1 we use colour codes to identify the four parameters of the T2S framework that we vary
during our experiments, as explained in Section 6.</p>
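      <p>The Topic Filtering step can be sketched as a threshold over per-tweet topic scores. In the sketch below the classifier is passed in as a plain function, so the example runs without downloading a model; keyword_overlap_score is a toy stand-in for a real zero-shot classifier, and all names, as well as the choice of keeping scores greater than or equal to the threshold, are assumptions rather than details taken from the T2S implementation.</p>
<preformat>
```python
from typing import Callable, List, Tuple

# score_fn(text, label) returns a probability-like score in [0, 1].
ScoreFn = Callable[[str, str], float]


def topic_filter(timeline: List[str], topic: str, score_fn: ScoreFn,
                 threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Keep the tweets whose topic score reaches the threshold."""
    scored = ((tweet, score_fn(tweet, topic)) for tweet in timeline)
    return [(tweet, score) for tweet, score in scored if score >= threshold]


def keyword_overlap_score(text: str, label: str) -> float:
    """Toy stand-in for a ZSC: share of the label's words found in the text."""
    label_words = set(label.lower().split())
    text_words = set(text.lower().split())
    return len(label_words.intersection(text_words)) / len(label_words)


timeline = ["uk membership in eu was a mistake", "great weather today"]
print(topic_filter(timeline, "UK membership in EU", keyword_overlap_score))
```
</preformat>
      <p>With a real model, score_fn would wrap the classifier call and return the score of the topic label; raising the threshold trades the number of retained tweets for a stricter notion of being in-topic.</p>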
      <p>Topic Filtering. The Topic Filtering module extracts the in-topic tweets from the X (Twitter)
timeline <italic>TL<sub>u</sub></italic> of the Party, using the topic <italic>tp<sub>s</sub></italic> associated with sentence <italic>s</italic> (e.g., the topic for the
sentence "overall, membership in the EU has been a bad thing for the UK" can be "UK membership
in EU"). The topic definitions for all considered sentences can be found in the linked repository.
<fn id="fn4"><label>4</label><p>The four raw datasets can be found at https://github.com/marghe943/Tweets2Stance_dataset</p></fn>
<fn id="fn5"><label>5</label><p>https://pypi.org/project/google-trans-new/</p></fn></p>
      <p>The Topic Filtering module uses the ZSC <italic>C</italic> to retrieve the in-topic tweets
and their corresponding topic scores.</p>
      <p>Figure 1: The Tweets2Stance (T2S) framework, which detects the stance of a User <italic>u</italic> with
regard to sentence <italic>s</italic>. The inputs are the X (Twitter) timeline <italic>TL<sub>u</sub></italic> extracted from a certain time-period
dataset <italic>D</italic> (D3, D4, D5, D7), the sentence <italic>s</italic>, the topic <italic>tp<sub>s</sub></italic> associated with <italic>s</italic>, a language model <italic>LM</italic>,
a threshold <italic>th</italic> (0.5, 0.6, 0.7, 0.8, 0.9) and an
algorithm <italic>Alg</italic> (Alg1, Alg2, Alg3, Alg4). The highlighted components are the parameters that we vary during our experiments,
as explained in Section 6.</p>
      <p>Agreement Detector. The Agreement Detector module (Fig. 1 - Module 2) computes the final
five-valued label through an algorithm <italic>Alg</italic>, which takes as input
the <italic>C</italic> scores of the in-topic tweets with respect to sentence <italic>s</italic>, each one indicating the relevance and
agreement of a tweet with sentence <italic>s</italic>.</p>
      <p>
        Each employed algorithm <italic>Alg</italic> exploits one of two mapping functions. The first one
ranges from 1 to 5, corresponding to the five agreement/disagreement labels
defined in Section 3. Similarly, the second one ranges from 1 to 4, representing an intermediate
agreement/disagreement scale. Specifically, the values {1, 2} have the same meaning as in Section 3,
while 3 indicates agreement and 4 represents complete agreement. The
rationale behind this intermediate mapping is explained in Algorithm 4 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        We defined four algorithms with different levels of complexity; details of each one are provided
in Appendix B and in the already mentioned work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Experimental Setup</title>
      <sec id="sec-6-1">
        <title>6.1. Baselines</title>
        <p>
          It is good practice to compare a proposed method with a set of baselines. To the best of
our knowledge, no baseline method has yet been devised for our type of stance detection
task: unlike our approach, the state-of-the-art unsupervised user-stance detection method
proposed by Darwish et al. [10] cannot operate without context information from other users
and is not suitable for a multi-class ordinal classification such as ours. Therefore, the following
baselines were used to compute the stance of Party <italic>p</italic> for sentence <italic>s</italic>:
Random. The stance is set to a random integer drawn from a discrete uniform distribution over {1, ..., 5}.
        </p>
        <p>The numpy random method<sup>6</sup> was used with the random seed set to 42.</p>
        <p>Predict 3. The stance is set to 3 (neither disagree, nor agree).</p>
        <p>Sentence Bert. Recent Transformer-based language models like BERT can be used as
feature extractors [32], providing contextual word and sentence embeddings. The
Sentence-BERT architecture of the Sentence Transformers Python library<sup>7</sup> was used with the English
all-mpnet-base-v2 model on translated tweets, and with the multilingual model
distiluse-base-multilingual-cased-v1 on the Italian tweets.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Experiments in detail</title>
        <p>As already explained in Section 5, our T2S method has four parameters to tune: the language
model <italic>LM</italic> to be used for zero-shot classification, the dataset <italic>D</italic> from which to extract the X
(Twitter) timeline <italic>TL<sub>u</sub></italic>, the algorithm <italic>Alg</italic> for the Agreement Detection step, and the threshold value <italic>th</italic> for
the Topic Filtering step. Considering the values of those parameters in Fig. 1, we carried out
each experiment with the four research questions in mind, which are summarized in Table 2 and ordered
by specificity.</p>
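        <p>Because no training occurs, tuning amounts to evaluating every combination of the four parameters against the ground truth. The sketch below enumerates such a grid, assuming the parameter values shown in Fig. 1 and the model names used in Section 7; evaluate and toy_evaluate are hypothetical placeholders for a full T2S run that returns an MAE, and the toy values are not results.</p>
<preformat>
```python
from itertools import product

# Parameter values as read from Fig. 1 (an assumption of this sketch).
LANGUAGE_MODELS = ["XRoberta1", "XRoberta2", "BART"]
DATASETS = ["D3", "D4", "D5", "D7"]
ALGORITHMS = ["Alg1", "Alg2", "Alg3", "Alg4"]
THRESHOLDS = [0.5, 0.6, 0.7, 0.8, 0.9]


def best_setup(evaluate):
    """Return (mae, setup) with the lowest MAE over the full grid.

    evaluate(lm, d, alg, th) is a placeholder that should run the
    framework with the given setup and return its MAE.
    """
    grid = product(LANGUAGE_MODELS, DATASETS, ALGORITHMS, THRESHOLDS)
    return min((evaluate(lm, d, alg, th), (lm, d, alg, th))
               for lm, d, alg, th in grid)


def toy_evaluate(lm, d, alg, th):
    """Made-up scores for demonstration only; not measured results."""
    return 1.0 if (lm, alg) == ("BART", "Alg3") else 2.0


print(best_setup(toy_evaluate))
```
</preformat>
        <p>With the values above the grid contains 3 x 4 x 4 x 5 = 240 setups, which is small enough to evaluate exhaustively.</p>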
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Evaluation</title>
        <p>In evaluating a stance detection model, traditional metrics such as MSE, MAE, R2 score, and
residual plots are common. However, a bespoke metric would be needed to account for the varying
importance of errors across stance classes. For instance, predicting agree instead of completely
disagree should carry a different weight than predicting neither disagree, nor agree instead of agree. In the
absence of such a metric, MAE was chosen. Lastly, since the predicted value is an integer in
{1, 2, 3, 4, 5}, a classification metric was considered as well: the weighted F1 score was
picked, since it summarizes both Precision and Recall [33]. The sklearn.metrics Python package
was used to compute both MAE<sup>8</sup> and F1_weighted<sup>9</sup>.
<fn id="fn6"><label>6</label><p>https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html</p></fn>
<fn id="fn7"><label>7</label><p>https://www.sbert.net/</p></fn>
<fn id="fn8"><label>8</label><p>https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html</p></fn>
<fn id="fn9"><label>9</label><p>https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html</p></fn></p>
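        <p>To make the two metrics concrete, the following dependency-free sketch computes MAE and the support-weighted F1 score on integer labels. It is meant to illustrate the definitions, not to replace the sklearn.metrics functions used in the experiments; the example labels are invented.</p>
<preformat>
```python
from collections import Counter


def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between true and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)


# Invented example: five statements, two of them off by one level.
y_true = [5, 4, 3, 2, 1]
y_pred = [4, 4, 3, 3, 1]
print(mean_absolute_error(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```
</preformat>
        <p>In the example both errors are off by a single level, so MAE stays low while F1 counts them as plain misclassifications; this is the sense in which MAE tells how close a prediction is to the correct answer.</p>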
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Results and Discussion</title>
      <sec id="sec-7-1">
        <title>7.1. Best Language Model LM</title>
        <p>First, we explored which language model is best for ZSC on Italian tweets: a model
pre-trained on a mix of languages including Italian, or one fine-tuned on Italian text?
Furthermore, would the results
benefit from using an English language model on translated tweets instead? We answered these
questions by looking at Fig. 2: each cell (<italic>LM</italic>, <italic>D</italic>) indicates the minimum MAE (maximum F1)
obtained with our T2S method for a certain language model <italic>LM</italic> and dataset <italic>D</italic> by varying the
algorithm <italic>Alg</italic> and the threshold <italic>th</italic> according to Fig. 1.</p>
        <p>Figure 2: Minimum MAE and maximum F1 for each pair (<italic>LM</italic>, <italic>D</italic>) of language models and
datasets. Values per dataset (D3, D4, D5, D7). MAE: XRoberta1 1.28, 1.27, 1.27, 1.29; XRoberta2 1.32, 1.31, 1.28, 1.28;
BART 1.19, 1.13, 1.18, 1.25. F1: XRoberta1 0.24, 0.25, 0.25, 0.25; XRoberta2 0.29, 0.29, 0.27, 0.26;
BART 0.37, 0.40, 0.38, 0.36.</p>
        <p>Among the cross-lingual models XRoberta1 and XRoberta2, the best one seemed to be
XRoberta1: it had an overall better MAE, while its F1 results were close to XRoberta2’s; we
considered MAE as the first metric to judge performance, since it tells how close we are to
the correct answer. Apparently, fine-tuning on an Italian translation of a subset of the MNLI
dataset (XRoberta2) does not contribute much to text classification in our T2S framework. All
in all, the best choice is translating the pre-processed tweets into English and using an English
model like BART: it reached clearly better values on both metrics (lower MAE and higher F1). Presumably, using
a model pre-trained and fine-tuned on a single language gives better results for our prediction
task: learning on a single language allows the model to focus on more details and features of that
language.</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Best Dataset D</title>
        <p>The choice of the dataset’s time period (<italic>D</italic>) as one of the parameters to tune is motivated by the
use of T2S for stance detection during political elections, where the proximity to the elections
may affect the likelihood of users discussing socio-political topics. With the language model fixed to
<italic>LM</italic> = BART, the dataset D4 was immediately identified as the best one, since it had the best
MAE and F1 (Fig. 2). Presumably, the X (Twitter) political discussion in the four months before the
Italian elections was enough to grasp the Parties’ stances. We evaluated the mean MAE and
mean F1 for each cell (<italic>LM</italic>, <italic>D</italic>) of Fig. 2 as well, and the results confirmed BART and D4 as the
best language model and dataset.</p>
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Best Algorithm Alg</title>
        <p>Once the language model <italic>LM</italic> = BART and the dataset D4 were chosen, we tested our algorithms
<italic>Alg</italic> against the baselines Random, Predict 3, and Sentence Bert, looking for the best <italic>Alg</italic> across all
thresholds <italic>th</italic>. Fig. 3 shows how each algorithm performed across different thresholds.
These results include the performance of the three baselines as well. Altogether, the optimal
algorithm is Alg3: F1 seemed to contradict this and favour Alg4 instead, but
the gain in prediction error is far more important. This result suggests that assigning the
neutral label (neither disagree, nor agree) only when there is a minimum number of tweets
does not boost the performance of our T2S method. Also, we executed Alg4 with this minimum set to 2 and to 3,
finding that the results did not differ much from each other; therefore, we show Alg4 with the minimum set to 3 in
Fig. 3.</p>
      </sec>
      <sec id="sec-7-4">
        <title>7.4. Best Threshold th and Party Analysis</title>
        <p>With the language model <italic>LM</italic> = BART, the dataset D4 and the algorithm Alg3 fixed, the threshold <italic>th</italic> = 0.6
was immediately identified as the optimal one, since it had the best MAE and a good F1 (Fig.
3). Therefore, the best setup of our T2S framework was (<italic>LM</italic>, <italic>D</italic>, <italic>Alg</italic>, <italic>th</italic>) = (BART, D4,
Alg3, 0.6). To explore the specific performance of our T2S method on each Party, we used
the optimal setup while varying the threshold <italic>th</italic>. Fig. 4 shows the results. Each point
indicates the MAE (F1) on the 20 sentences’ agreement levels for a certain Party. Each Party
behaves differently, so it is likely that T2S highly depends on the Party’s timeline in terms of
how much it generally writes, how much it writes in-topic, and how much it writes using figures
of speech or hashtags and emojis (which we removed). Looking at both the MAE and F1, we
observed a regular trend for thresholds <italic>th</italic> = {0.8, 0.9} for five parties out of six: the outlier Party
Mov5Stelle was more predictable for those thresholds. That may happen because the user’s
timeline deals with a certain statement in a clearer way. For example, looking at the tweets of Mov5Stelle
and of another Party filtered for sentence 19 and <italic>th</italic> = 0.9, we saw that Mov5Stelle
wrote clearer and more explicit tweets supporting the argument (it completely agrees), while from
the other Party’s timeline it is not immediately clear that it disagrees: that Party tweeted about tax
reduction, fewer fees on families, and job creation, and in that case our T2S framework marked it
’completely agree’, since the Party did not explicitly disagree with income support for the poorest
as beneficial for the Italian economy.</p>
        <p>Figure 3: MAE and F1 of Alg1, Alg2, Alg3, Alg4 and the baselines (including sentence_bert) across
thresholds <italic>th</italic> from 0.50 to 0.90, with <italic>LM</italic> = BART and dataset D4.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions and Future Work</title>
      <p>
        In this work, we investigated the use of Tweets2Stance (T2S), an unsupervised stance-detection
framework based on zero-shot classification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], to predict users’ stances toward
a set of social-political statements using content-based analysis of their X (Twitter) timelines in
an Italian scenario. In particular, we dealt with the stance on 20 political statements for the six
major parties in Italy. Results showed that, although the general maximum F1 value was 0.4,
T2S predicted the stance with a general minimum MAE of 1.13, which is an encouraging
result considering that MAE tells how close we are to the correct answer, and that we
worked with a final five-valued label. Also, as we hypothesized, T2S’s performance highly
depends on how the X (Twitter) account of the Party (hence the social media user) writes, e.g.,
the employed figures of speech, the words used, and so on. As mentioned when introducing the
work, the approach is potentially generalizable to several topics. If applied to political discourse,
it could represent the first step of a pipeline whose output is the user’s political leaning. In
the near future, we will investigate how T2S’s agreement-level output can be used to derive
the political leaning of a social media user, for example by trying to emulate a VAA algorithm.
Besides, we hope to apply it to detect extremist accounts on social media; however, a domain
expert may be needed to define the precise social statements to use. Future research could address
T2S limitations by using advanced models like GPT-4 or conversational AI such as ChatGPT for
robust stance detection.</p>
      <p>[Figure: two panels with x-axis “threshold th” (ticks from 0.50 to 0.90); legend: pdnetwork, Piu_Europa, Mov5Stelle, forza_italia, LegaSalvini, FratellidItalia.]</p>
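      <p>The gap between a low F1 and a moderate MAE on a five-valued label can be illustrated with a toy case. The following is a minimal sketch, not the evaluation code used in this work: it assumes that the labels are encoded as the integers 1 to 5 and that F1 is macro-averaged (the averaging scheme is an assumption), and the gold and predicted labels are invented for illustration. The function names are hypothetical.</p>
      <preformat><![CDATA[
```python
from typing import Sequence

# Assumed integer encoding of the five-valued label:
# 1 = completely disagree ... 5 = completely agree.
LABELS = (1, 2, 3, 4, 5)


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average distance between gold and predicted labels on the 1-5 scale."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = LABELS) -> float:
    """Unweighted mean of per-label F1; a label never seen nor predicted scores 0."""
    per_label = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


# Invented example: four of five predictions miss by exactly one step.
gold = [5, 4, 1, 2, 3]
pred = [4, 5, 2, 1, 3]
print(mean_absolute_error(gold, pred))  # 0.8
print(macro_f1(gold, pred))             # 0.2
```
]]></preformat>
      <p>In this invented case every near miss counts as a full error for F1, while MAE only charges the one-step distance, which is why the two metrics can diverge on an ordinal label.</p>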
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>
        We thank Project SERICS (PE00000014), under the NRRP MUR program funded by the EU-NGEU, and
Project SoBigData-PlusPlus (Grant Agreement number: 871042, CUP B54I1900639000).
      </p>
      <p>[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] A. ALDayel, W. Magdy, Stance detection on social media: State of the art and trends,
Information Processing &amp; Management 58 (2021) 102597.
[4] M. Dias, K. Becker, Inf-ufrgs-opinion-mining at semeval-2016 task 6: Automatic generation
of a training corpus for unsupervised identification of stance in tweets, in: Proceedings
of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp.
378–383.
[5] Y. Igarashi, H. Komatsu, S. Kobayashi, N. Okazaki, K. Inui, Tohoku at semeval-2016 task
6: Feature-based model versus convolutional neural network for stance detection, in:
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016),
2016, pp. 401–407.
[6] I. Augenstein, T. Rocktäschel, A. Vlachos, K. Bontcheva, Stance detection with bidirectional
conditional encoding, arXiv preprint arXiv:1606.05464 (2016).
[7] S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, C. Cherry, Semeval-2016 task 6:
Detecting stance in tweets, in: Proceedings of the 10th international workshop on semantic
evaluation (SemEval-2016), 2016, pp. 31–41.
[8] S. Hamidian, M. T. Diab, Rumor detection and classification for twitter data, arXiv preprint
arXiv:1912.08926 (2019).
[9] K. Darwish, W. Magdy, T. Zanouda, Improved stance prediction in a user similarity feature
space, in: Proceedings of the 2017 IEEE/ACM international conference on advances in
social networks analysis and mining 2017, 2017, pp. 145–148.
[10] K. Darwish, P. Stefanov, M. Aupetit, P. Nakov, Unsupervised user stance detection on
twitter, in: Proceedings of the International AAAI Conference on Web and Social Media,
volume 14, 2020, pp. 141–152.
[11] V. Lynn, S. Giorgi, N. Balasubramanian, H. A. Schwartz, Tweet classification without the
tweet: An empirical examination of user versus document attributes, in: Proceedings of
the Third Workshop on Natural Language Processing and Computational Social Science,
2019, pp. 18–28.
[12] M. Gambini, T. Fagni, C. Senette, M. Tesconi, Tweets2stance: users stance detection
exploiting zero-shot learning algorithms on tweets, arXiv preprint arXiv:2204.10710
(2022).
[13] L. Cedroni, Voting Advice Applications in Europe: The state of the art, Scriptaweb, 2010.
[14] T. Louwerse, M. Rosema, The design effects of voting advice applications: Comparing
methods of calculating matches, Acta politica 49 (2014) 286–312.
[15] OPPR, Opi - observatory on political parties and representation, n.d. URL: http://opi.sp.unipi.it/opi-political-parties/.
[16] Observatory on Political Parties and Representation (OPPR), Navigatore elettorale europee 2019, 2019. URL: http://opi.sp.unipi.it/opi-political-parties/oppr-projects/.
[17] A. Murakami, R. Raymond, Support or oppose? classifying positions in online debates
from reply activities and opinion expressions, in: Coling 2010: Posters, 2010, pp. 869–875.
[18] M. A. Walker, P. Anand, R. Abbott, J. E. F. Tree, C. Martell, J. King, That is your evidence?:
Classifying stance in online political debate, Decision Support Systems 53 (2012) 719–729.
[19] S. Gottipati, M. Qiu, L. Yang, F. Zhu, J. Jiang, Predicting user’s political party using
ideological stances, in: International Conference on Social Informatics, Springer, 2013, pp.
177–191.
[20] A. Aldayel, W. Magdy, Your stance is exposed! analysing possible factors for stance
detection on social media, Proceedings of the ACM on Human-Computer Interaction 3
(2019) 1–20.
[21] W. Yin, J. Hay, D. Roth, Benchmarking zero-shot text classification: Datasets, evaluation
and entailment approach, in: Proceedings of the 2019 Conference on Empirical Methods
in Natural Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong
Kong, China, 2019, pp. 3914–3923. URL: https://aclanthology.org/D19-1404. doi:10.18653/v1/D19-1404.
[22] P. Sobhani, D. Inkpen, X. Zhu, A dataset for multi-target stance detection, in: Proceedings
of the 15th Conference of the European Chapter of the Association for Computational
Linguistics: Volume 2, Short Papers, 2017, pp. 551–557.
[23] M. Lai, V. Patti, G. Ruffo, P. Rosso, Stance evolution and twitter interactions in an italian
political debate, in: International Conference on Applications of Natural Language to
Information Systems, Springer, 2018, pp. 15–27.
[24] S. Ghosh, P. Singhania, S. Singh, K. Rudra, S. Ghosh, Stance detection in web and social
media: a comparative study, in: International Conference of the Cross-Language Evaluation
Forum for European Languages, Springer, 2019, pp. 75–87.
[25] L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y. Wang, J. Gao, M. Zhou, H.-W. Hon, Unified
language model pre-training for natural language understanding and generation, Advances
in Neural Information Processing Systems 32 (2019).
[26] B. Zhang, M. Yang, X. Li, Y. Ye, X. Xu, K. Dai, Enhancing cross-target stance detection with
transferable semantic-emotion knowledge, in: Proceedings of the 58th Annual Meeting of
the Association for Computational Linguistics, 2020, pp. 3188–3197.
[27] A. Joshi, P. Bhattacharyya, M. Carman, Political issue extraction model: A novel
hierarchical topic model that uses tweets by political and non-political authors, in: Proceedings of
the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social
Media Analysis, 2016, pp. 82–90.
[28] T. Fagni, S. Cresci, Fine-Grained Prediction of Political Leaning on Social Media with
Unsupervised Deep Learning, Journal of Artificial Intelligence Research 73 (2022) 633–672.
[29] A. Rashed, M. Kutlu, K. Darwish, T. Elsayed, C. Bayrak, Embeddings-based clustering for
target specific stances: The case of a polarized turkey, arXiv preprint arXiv:2005.09649
(2020).
[30] R. Cohen, D. Ruths, Classifying political orientation on twitter: It’s not easy!, in:
Proceedings of the International AAAI Conference on Web and Social Media, volume 7,
2013.
[31] W. Yin, J. Hay, D. Roth, Benchmarking zero-shot text classification: Datasets, evaluation
and entailment approach, arXiv preprint arXiv:1909.00161 (2019).
[32] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese
BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong,
China, 2019, pp. 3982–3992. URL: https://aclanthology.org/D19-1410. doi:10.18653/v1/D19-1410.
[33] F. Sebastiani, Machine learning in automated text categorization, ACM computing surveys
(CSUR) 34 (2002) 1–47.</p>
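      <p>A. Statements</p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>nr.</th>
              <th>Sentence</th>
              <th>Topic</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>16</td>
              <td>la Sanità dovrebbe essere più aperta agli operatori privati (healthcare should be more open to private operators)</td>
              <td>apertura della Sanità ad operatori privati (opening healthcare to private operators)</td>
            </tr>
            <tr>
              <td>17</td>
              <td>proteggere l’ambiente è più importante della crescita economica (protecting the environment is more important than economic growth)</td>
              <td>importanza della protezione dell’ambiente (importance of environmental protection)</td>
            </tr>
            <tr>
              <td>18</td>
              <td>tagliare la spesa pubblica è un buon modo per risolvere la crisi economica (cutting public spending is a good way to solve the economic crisis)</td>
              <td>tagli alla spesa pubblica come … (public spending cuts as …)</td>
            </tr>
            <tr>
              <td>19</td>
              <td>il sostegno al reddito alle fasce più povere della popolazione è positivo per l’economia italiana (income support for the poorest segments of the population is good for the Italian economy)</td>
              <td>migliorare l’economia aiutando le fasce a basso reddito (improving the economy by helping low-income groups)</td>
            </tr>
            <tr>
              <td>20</td>
              <td>l’introduzione di una aliquota unica sui redditi (”flat tax”) sarebbe di beneficio all’economia italiana (introducing a single income tax rate, the ”flat tax”, would benefit the Italian economy)</td>
              <td>conseguenze della flat tax per l’economia italiana (consequences of the flat tax for the Italian economy)</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>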
B. Algorithms ordered by complexity</p>
      <p>Algorithm 1 [Alg1] The label […] is computed as […], where […] ∈ […] and […] ∈ […].</p>
      <p>Algorithm 2 [Alg2] First, it maps each tweet […] ∈ […] into the label […] ∈ {1, 2, 3, 4, 5} using its
sentence score […] ∈ […]; then, […] is […]. The step of assigning […] to each tweet […] ∈ […] (Eq. 5)
hopefully returns a fairer […]. In fact, the tweet normalization may help in aggregating the contribution of each
tweet ([…]) using the standard mean, which means applying the macro aggregation. In a
multi-class classification setup, macro-metric aggregation is preferable if class imbalance is
suspected; in fact, the values […] are not balanced with respect
to the current sentence: likely, if a Party agrees with a sentence, there will be a lot of
tweets in agreement with it (many labels equal to 4 or 5) and a few (errors) or no tweets in
disagreement (few labels equal to 1, 2, or 3), and vice versa.</p>
      <p>Algorithm 3 [Alg3] Like Alg2, but slightly modifying how […] is computed (Eq. 6). Let’s
further define […] as the number of voters for the integer label […] ∈ {1, 2, 3, 4, 5},
where […] are the labels computed from Eq. 5. Let’s define […] = […]. The label is then given by the
majority voting (case 8a) and by the rounded standard mean ⌊ ∑ […] ⌉ otherwise (case 8b),
where ⌊...⌉ is the round function. The majority voting (case 8a) may have a bigger
contribution in assigning correct labels than the plain standard mean (case 8b, taken from
[…]), since it better accounts for class imbalance.</p>
      <p>Algorithm 4 [Alg4] The previous algorithms take into consideration the neutral label 3
(neither disagree, nor agree) also when ∣[…]∣ ≠ 0. However, we wondered how the results
would change if the neutral label was only considered when ∣[…]∣ = 0. The neutral label may also be
assigned in the presence of a low number of in-topic […]: in this particular situation, the
user may not have taken a position on the current sentence yet; also, choosing […]
by looking at just one tweet may not be significant. Therefore, Alg4 stems from Alg3,
having […] = […]2([…]) and […] = { […]; rounded standard mean (case 8b) },
where […] is the minimum number of tweets for which the majority voting algorithm or
the standard mean is executed. Since the {3, 4} labels in output from […]2([…])
represent the ‘agree’ and ‘completely agree’ final labels, they must be mapped again to the real final integer
labels 4 and 5, respectively (as coded in Table ??).</p>
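      <p>The Alg4 variant can be sketched in the same spirit. This is a minimal illustration, not the implementation used in this work, and several details are assumptions because the equations are not available here: the four-level mapping is assumed to use four equal-width bins over [0, 1]; the neutral label is assumed to be returned whenever fewer than a minimum number of tweets is available; the default minimum of 3 tweets is a hypothetical value; ties in the vote are assumed to fall back to the mean rounded half up. The remapping of levels 3 and 4 to the final labels 4 and 5 follows the text.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import Sequence

NEUTRAL = 3
# Levels 3 and 4 stand for "agree" and "completely agree": final labels 4 and 5.
FOUR_TO_FIVE = {1: 1, 2: 2, 3: 4, 4: 5}


def score_to_level(score: float) -> int:
    """Assumed four-level mapping: equal-width bins over [0, 1], no neutral level."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(int(score * 4) + 1, 4)


def alg4_style(scores: Sequence[float], min_tweets: int = 3) -> int:
    """Neutral when too few tweets; otherwise vote on four levels and remap to 1-5."""
    if len(scores) == 0 or len(scores) < min_tweets:
        return NEUTRAL
    levels = [score_to_level(s) for s in scores]
    votes = Counter(levels)
    top_count = max(votes.values())
    winners = [level for level, count in votes.items() if count == top_count]
    if len(winners) == 1:
        level = winners[0]  # majority voting
    else:
        level = int(sum(levels) / len(levels) + 0.5)  # rounded standard mean
    return FOUR_TO_FIVE[level]


print(alg4_style([0.90, 0.80, 0.95]))  # 5
print(alg4_style([0.90, 0.80]))        # 3
```
]]></preformat>
      <p>With enough tweets the neutral label can no longer be produced by the aggregation itself, since the four levels are remapped to 1, 2, 4, and 5.</p>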
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gambini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Senette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tesconi</surname>
          </string-name>
          ,
          <article-title>From tweets to stance: An unsupervised framework for user stance detection on twitter</article-title>
          ,
          <source>in: International Conference on Discovery Science</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Biber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Finegan</surname>
          </string-name>
          ,
          <article-title>Adverbial stance types in english</article-title>
          ,
          <source>Discourse processes 11</source>
          (
          <year>1988</year>
          )
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
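        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>ALDayel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Magdy</surname>
          </string-name>
          ,
          <article-title>Stance detection on social media: State of the art and trends</article-title>
          ,
          <source>Information Processing &amp; Management 58</source>
          (
          <year>2021</year>
          )
          <fpage>102597</fpage>
          .
        </mixed-citation>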
      </ref>
    </ref-list>
  </back>
</article>