<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Proceedings of the Eighth Evaluation Campaign of Natural
Language Processing and Speech Tools for Italian (EVALITA 2023)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Vitali at ACTI - Transformer-based Conspiracy Theory Identification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Vitali</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Scotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark James Carman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DEIB, Politecnico di Milano</institution>
          ,
          <addr-line>Via Ponzio 34/5, 20133, Milano (MI)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>7</fpage>
      <lpage>08</lpage>
      <abstract>
        <p>In this work, we participated in the Automatic Conspiracy Theory Identification (ACTI) competition, which involved two sub-tasks: identifying whether an input text is a conspiracy theory and recognising the specific conspiracy theory it discusses. Our approach involved fine-tuning two BERT models, one Italian and one multilingual, and combining them in an ensemble. The results were promising, and we achieved a position among the top participants in the challenge. This work contributes to the advancement of automatic conspiracy theory identification and highlights the effectiveness of fine-tuned BERT models in this domain.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformer Network</kwd>
        <kwd>BERT</kwd>
        <kwd>Ensemble</kwd>
        <kwd>Conspiracy Theory Identification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. Data</title>
      <p>
        In this section, we present the data sets that constitute
the two different sub-tasks of the ACTI task. The ACTI
task comprises two sub-tasks: sub-task A, a binary
classification task to determine if a given text piece is about a
conspiracy theory or not, and sub-task B, a multi-class
classification task to recognise specific conspiracy
theories. Both sub-tasks use separate data sets consisting
of Italian text samples. Table 1 provides an overview of
the main statistics for the text samples in each corpus;
we used the NLTK [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] tokeniser to compute the number of
tokens. Figure 1 illustrates the label distributions.
      </p>
      <p>Sub-task A involves binary classification to identify
whether a text sample relates to a conspiracy theory or not.
Samples contain noise like emoticons and spelling errors;
hence, we assumed they had not been pre-processed.
The label distribution is well balanced between the two
classes, as can be seen in the top part of Figure 1.</p>
      <p>Sub-task B extends sub-task A by introducing a
multi-class classification aspect. The goal is to identify whether
a text pertains to one of the following conspiracy theories:
COVID, QAnon, Flat-Earth, and Russia. As for sub-task
A, samples have not been pre-processed. Differently from
sub-task A, the label distribution is unbalanced: COVID
and QAnon are more frequent than Flat-Earth and Russia.</p>
      <sec id="sec-1-1">
        <title>2.2. Pre-processing</title>
        <p>Text samples in the data sets contain a lot of noise, like
emoticons, slang, or spelling errors; thus, we applied
some cleaning and pre-processing steps. Moreover, the
two data sets do not contain a large number of samples,
and sub-task B presents an imbalanced label distribution,
introducing the risks of overfitting and learning biased
models. To cope with these issues, we considered applying
data augmentation to the data sets. Initial results on the
training set showed that augmentation was relevant to
obtain good results on sub-task A, while we did not need
to apply it to sub-task B, despite the class imbalance.</p>
        <p>To clean the data sets, we employed basic text
transformations and regular expressions to:
• Convert all text to lowercase, to ensure consistency
and reduce the vocabulary size.
• Remove noise such as emoticons, slang, and
special characters, improving the quality of the
text samples.
• Remove specific patterns from the text, including
dates, text between brackets, links, emails, and
multiple spaces.</p>
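        <p>For illustration, the snippet below sketches such a cleaning pipeline in Python. The paper does not list the exact patterns, so the regular expressions (and the retained accented characters) are assumptions.</p>
        <preformat>
import re

def clean_text(text: str) -> str:
    """Illustrative cleaning pipeline: lowercase, then strip noisy patterns."""
    text = text.lower()                                  # consistency, smaller vocabulary
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = re.sub(r"\S+@\S+", " ", text)                 # e-mail addresses
    text = re.sub(r"\[[^\]]*\]", " ", text)              # text between brackets
    text = re.sub(r"\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}", " ", text)  # dates
    text = re.sub(r"[^\w\sàèéìòù.,;!?']", " ", text)     # emoticons and special characters
    text = re.sub(r"\s+", " ", text).strip()             # multiple spaces
    return text

print(clean_text("GUARDA QUI :) http://example.com [link] 01/02/2023"))
</preformat>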
        <p>Additionally, we applied data augmentation to sub-task
A, to increase the number of samples. The method of
choice was back-translation, which involves translating a
text sample from the source language to another language
and then translating it back. This process preserves the
original text’s semantics while potentially altering the
syntax, generating synthetic samples.</p>
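        <p>A minimal sketch of back-translation follows. The paper does not name the translation system, so the OPUS-MT checkpoints used here (Helsinki-NLP/opus-mt-it-en and Helsinki-NLP/opus-mt-en-it) are an assumption.</p>
        <preformat>
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

tok_fwd, mt_fwd = load("Helsinki-NLP/opus-mt-it-en")  # Italian to English
tok_bwd, mt_bwd = load("Helsinki-NLP/opus-mt-en-it")  # English back to Italian

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    return tok.batch_decode(model.generate(**batch), skip_special_tokens=True)

def back_translate(texts):
    # The round trip preserves the semantics while perturbing the syntax,
    # yielding synthetic training samples.
    return translate(translate(texts, tok_fwd, mt_fwd), tok_bwd, mt_bwd)

print(back_translate(["Il governo nasconde la verità."]))
</preformat>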
        <p>[Figure 1: Label distributions (counts, log scale) of the original and augmented data sets for each class in the two sub-tasks.]</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Model</title>
      <p>
        In this section, we describe the architecture of the
classifiers we built using Transformer neural networks
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the training process we followed to prepare our
models for evaluation.
      </p>
      <sec id="sec-2-1">
        <title>3.1. Architecture</title>
        <p>
          To solve both ACTI sub-tasks, we used the same
Transformer-based classification architecture, changing
only the target classes from one task to the other. We
explored different Transformer Encoder neural networks,
namely BERT [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], pre-trained on different data sets.
We also explored individual and combined applications
of the classifiers.
        </p>
        <p>We visualise the single-model and
ensemble-model pipelines in Figure 2.</p>
        <p>Each BERT-based classifier takes as input a sequence
of tokens, extracted from the pre-processed text. The
sequence starts with a classification token [CLS] and is
concluded by an end-of-sequence token [SEP], introduced
during the tokenisation process. To classify the input piece
of text, we retrieve the contextual embedding computed
by the Transformer hidden layers at the position of the
[CLS] input token and feed it to a linear classifier. The
final classification layer outputs the probability
distribution over the possible classes for the input piece
of text. The entire process is reported in Figure 2a.</p>
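        <p>The sketch below illustrates this classification head; the wiring (AutoModel plus a linear layer over the [CLS] embedding) follows the description above, while the model name and class count are placeholders.</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

class BertClassifier(torch.nn.Module):
    def __init__(self, model_name="bert-base-multilingual-uncased", num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = torch.nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = hidden.last_hidden_state[:, 0]  # embedding at the [CLS] position
        return torch.softmax(self.classifier(cls_embedding), dim=-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
batch = tokenizer(["la terra è piatta"], return_tensors="pt")
print(BertClassifier()(batch["input_ids"], batch["attention_mask"]))  # class probabilities
</preformat>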
        <p>To improve the classification results and take the best
from the trained models, we also considered creating an
ensemble [12, Chapter 16]. For each task, we aggregated
the predictions of the individual models. To aggregate
the predictions, we froze the fine-tuned classifiers and
learned a separate Logistic Regression classifier on top
of the Transformer models. The Logistic Regression
classifier takes as input the probability distributions
predicted by the individual models and computes a new
output probability combining them. The entire ensemble
pipeline is represented in Figure 2b.</p>
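        <p>A hedged sketch of the aggregation step: the frozen classifiers' probability distributions are concatenated into a feature vector on which a Logistic Regression is trained (the random probabilities below are stand-ins for real model outputs).</p>
        <preformat>
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p_italian = rng.dirichlet(np.ones(2), size=100)       # P(class) from Italian BERT
p_multilingual = rng.dirichlet(np.ones(2), size=100)  # P(class) from Multilingual BERT
labels = rng.integers(0, 2, size=100)

features = np.hstack([p_italian, p_multilingual])     # one feature per model/class pair
ensemble = LogisticRegression(class_weight="balanced").fit(features, labels)
print(ensemble.predict_proba(features[:3]))           # combined output probabilities
</preformat>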
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Training</title>
        <p>To effectively train our models, we adopted 5-fold
cross-validation to find the best hyperparameters for each
of the considered models and each task. We preferred
this approach to the usual train-validation split to
make the best of the available data. Given the best
hyperparameter combination, we retrained the model
on the entire training data set.</p>
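        <p>The following sketch outlines this model-selection loop; train_and_score is a hypothetical callback that fine-tunes on the training folds and returns the validation score.</p>
        <preformat>
from itertools import product
from sklearn.model_selection import KFold

learning_rates = [1e-5, 2e-5, 3e-5, 5e-5]
epoch_choices = [2, 3, 4]

def select_hyperparameters(texts, train_and_score):
    best, best_score = None, -1.0
    for lr, n_epochs in product(learning_rates, epoch_choices):
        folds = KFold(n_splits=5, shuffle=True, random_state=0).split(texts)
        scores = [train_and_score(tr, va, lr, n_epochs) for tr, va in folds]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = (lr, n_epochs), mean_score
    return best  # then retrain on the entire training set with these values
</preformat>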
        <p>We fine-tuned two variants of BERT base (110M
parameters):
• Italian BERT (uncased), pre-trained on Italian text.
• Multilingual BERT (uncased)3, pre-trained on text
in multiple languages.</p>
        <p>3 Model card: https://huggingface.co/bert-base-multilingual-uncased</p>
        <p>[Figure 2: (a) Single-model classifier; (b) ensemble classifier. Example input text: "La Terra è piatta!"]</p>
          <p>
            We used the implementations available in the
Transformers library from Hugging Face [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. Each configuration
was trained using the Adam optimiser [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], a linear
learning rate schedule and a batch size of 8.
          </p>
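        <p>A sketch of this optimisation setup is shown below; the zero warm-up is an assumption, as the paper only states that a linear schedule was used.</p>
        <preformat>
import torch
from transformers import get_linear_schedule_with_warmup

def make_optimizer(model, learning_rate, num_training_steps):
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
    )
    return optimizer, scheduler

# e.g., 4 epochs over 2000 samples with batch size 8:
# optimizer, scheduler = make_optimizer(model, 2e-5, num_training_steps=4 * 2000 // 8)
</preformat>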
        <p>For each of these models, during cross-validation,
we varied the learning rate and the number of epochs.
Additionally, we explored regularisation: we evaluated
models with and without dynamic masking. We varied the
learning rate in {1 × 10<sup>−5</sup>, 2 × 10<sup>−5</sup>, 3 × 10<sup>−5</sup>, 5 × 10<sup>−5</sup>}
and the number of epochs in {2, 3, 4}. Dynamic masking
applies the same kind of masking BERT uses during
pre-training, randomly corrupting the input sequence
by masking tokens.</p>
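        <p>A hedged sketch of dynamic masking as a regulariser follows; the 15% masking rate mirrors BERT pre-training, since the paper does not state the rate used.</p>
        <preformat>
import torch

def dynamic_mask(input_ids, tokenizer, mask_prob=0.15):
    ids = input_ids.clone()
    special = torch.isin(ids, torch.tensor(tokenizer.all_special_ids))
    rand = torch.rand_like(ids, dtype=torch.float)
    corrupt = rand.lt(mask_prob).logical_and(special.logical_not())
    ids[corrupt] = tokenizer.mask_token_id  # randomly corrupt the input sequence
    return ids
</preformat>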
        <p>
          We adopted 5-fold cross-validation with the ensemble
as well. Referring to the implementation of Logistic
Regression available in the Scikit-Learn library [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], we
explored values for the following hyperparameters:
regularisation strength (ℓ2 regularisation), number of
iterations, and solver. We varied the inverse of the
regularisation strength in {10<sup>−3</sup>, 10<sup>−2</sup>, 10<sup>−1</sup>, 1, 10, 10<sup>2</sup>, 10<sup>3</sup>}, the
maximum number of iterations in {20, 50, 100, 200, 500, 1000},
and we tried all solvers apart from the Newton-Cholesky
one. Additionally, we weighted the classes with weights
inversely proportional to the class frequencies, to obtain a
more balanced classifier.
        </p>
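        <p>With scikit-learn, this search can be sketched as follows; the grids follow the values listed above, and the macro-F1 scoring is our assumption for the selection criterion.</p>
        <preformat>
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000],   # inverse regularisation strength
    "max_iter": [20, 50, 100, 200, 500, 1000],
    "solver": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
}
search = GridSearchCV(
    LogisticRegression(penalty="l2", class_weight="balanced"),
    param_grid, cv=5, scoring="f1_macro",
)
# search.fit(features, labels)  # features: stacked model probabilities
</preformat>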
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results</title>
      <p>In this section, we explain how we evaluated the models
proposed for each sub-task, present the results obtained
on each task, and comment on these results. In
both cases, we focus on the results of the ensembles, since
they perform better than the individual models.</p>
      <sec id="sec-3-1">
        <title>4.1. Evaluation</title>
        <p>We evaluated the classification models using the F1
score. For the multi-class setting, we computed the
macro average of the per-class scores to account for
potential class distribution imbalances.</p>
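        <p>As a small worked example of the metric, the macro average weighs each class equally, so rare classes count as much as frequent ones:</p>
        <preformat>
from sklearn.metrics import f1_score

y_true = ["covid", "covid", "qanon", "flat-earth", "russia", "covid"]
y_pred = ["covid", "qanon", "qanon", "flat-earth", "covid", "covid"]
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1
</preformat>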
        <p>We report the F1 scores on the test set in
Table 2. In addition to the results of the submitted models,
we included some additional scores for comparison and
to provide further insight into the results. The F1 scores
are computed on 70% of the test data for sub-task A and 50%
of the test data for sub-task B via the Kaggle platform4
(which hosted the competition), as determined by the
authors of the ACTI task for the private leaderboard.</p>
        <sec id="sec-3-1-1">
          <title>3Model card:</title>
          <p>https://huggingface.co/bert-base-multilingual-uncased
4Website: https://www.kaggle.com
ure Italian BERT - P(Conspiracy)
atMultilingual BERT - P(Conspiracy)
e
F-l Multilingual BERT N-onPe(N-orBmiaals)
eod Italian BERT - P(Normal)
M</p>
          <p>(a) Sub-task A.</p>
          <p>Multilingual BERT - P(COVID)
erutea IIttaalliiaann BBEERRTTN-o-nPeP((C-QOAVBnIioDan)s)
F Italian BERT - P(Russia)
-lMultilingual BERT - P(Flat-Earth)
ed Italian BERT - P(Flat-Earth)
oM Multilingual BERT - P(Russia)</p>
          <p>Multilingual BERT - P(QAnon)</p>
          <p>Italian BERT - P(Flat-Earth)
eMultilingual BERT - P(Flat-Earth)
teuar Multilingual BERTN-onPe(Q-AnBoina)s
F-edl IIItttaaallliiiaaannnBBBEEERRRTTT---PPP(((RCQuOAsVnsIoiDna)))
Mo Multilingual BERT - P(COVID)</p>
          <p>Multilingual BERT - P(Russia)</p>
          <p>Multilingual BERT - P(QAnon)
ruteae IIttaalliiaann BBEERRTTN-o-nPeP((Q-CAOnBVoiInaD)s)
FMultilingual BERT - P(Flat-Earth)
le-d MultiIltianlgiuaanlBBEERRTT--PP((RCuOsVsIiDa))
oM Italian BERT - P(Flat-Earth)</p>
          <p>Multilingual BERT - P(Russia)</p>
          <p>Multilingual BERT - P(Russia)
urteae ItIatlailainanBEBRETRTN-o-nPe(PR(-uCsOBsViiIaaDs))
FMultilingual BERT - P(Flat-Earth)
edl- ItaliaIntaBlEiRaTn -BEPR(TFl-atP-(EQaArntohn))
oM Multilingual BERT - P(COVID)</p>
          <p>Multilingual BERT - P(QAnon)
Label: Conspiracy
1</p>
          <p>Concerning feature relevance analysis in the ensemble,
from Figure 3a, we can see that the ensemble gives a
higher weight to the prediction of the Italian BERT model,
rather than the multilingual one. This hints that for this
specific task of detecting whether the text concerns a
conspiracy theory or not, having a language-specific
model may be the better solution. However, the ensemble
improves over both single models, thus the multilingual
model is contributing to the correct classification as well.</p>
        </sec>
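        <p>The reading behind Figure 3 can be sketched as follows: the magnitude of each Logistic Regression coefficient measures how strongly the corresponding model/class probability contributes to the ensemble decision (the toy data below stands in for the real stacked probabilities).</p>
        <preformat>
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = [
    "Italian BERT - P(Conspiracy)", "Italian BERT - P(Normal)",
    "Multilingual BERT - P(Conspiracy)", "Multilingual BERT - P(Normal)",
]
rng = np.random.default_rng(0)
features = rng.random((200, 4))  # stand-in for stacked predicted probabilities
labels = (features[:, 0] + 0.5 * features[:, 2]).round().astype(int).clip(0, 1)

ensemble = LogisticRegression().fit(features, labels)
for weight, name in sorted(zip(ensemble.coef_[0], feature_names), reverse=True):
    print(f"{weight:+.3f}  {name}")
</preformat>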
      </sec>
      <sec id="sec-3-2">
        <title>4.3. Sub-task B</title>
        <p>For Sub-task B, the ensemble model achieved a test
accuracy score of 89.83% (see Table 2). This result
highlights again the efectiveness of the ensemble
approach in capturing task-specific patterns and making
accurate predictions.</p>
        <p>The best model configuration for sub-task B, involved
ifne-tuning with the following hyperparameters. Italian
BERT: learning rate of 3 × 10− 5 , number of epochs of
2, and dynamic masking enabled. Multilingual BERT:
learning rate of 3 × 10− 5 , number of epochs of 2,
and dynamic masking enabled. Logistic Regression
(ensemble): inverse of the regularisation strength of 10− 3,
maximum number of iterations of 20, Newton-CG solver.</p>
        <p>Comparing it with other configurations, we observe
that the ensemble model outperformed the
configuration without dynamic masking, which achieved an
accuracy score of 87.67%. This indicates that dynamic
masking played a crucial role in improving the model’s
performance. When comparing the ensemble model’s
accuracy score with the provided baseline accuracy score
of 68.37%, we observe a significant performance boost,
underscoring the efectiveness of our approach.</p>
        <p>Concerning feature relevance analysis in the ensemble,
from Figure 3b), we can see that, diferently from sub-task
A, here both models contribute equally to the prediction.
In fact, the values of the weights associated with the same
input probability and the same output class models are
close for the diferent models.</p>
        <p>Additionally, to get better insights on the behaviour
of the two Transformer-based classifiers on the sub-tasks,
we analysed the weights learned by the Logistic
Regression during the training of the ensemble. The higher the
weight, the stronger the contribution of the probability
predicted by a classifier to the prediction and, thus, the
stronger the relevance of that classifier in the ensemble.
To this end, we visualised all the weights of the two
Logistic Regression models in Figure 3.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.2. Sub-task A</title>
        <p>For Sub-task A, the ensemble model achieved a test 1
score of 82.30% (see Table 2). This result highlights
the efectiveness of combining the predictions from
individual models to improve overall performance.</p>
        <p>The best model configuration for sub-task A, involved
ifne-tuning with the following hyperparameters. Italian
BERT: learning rate of 2 × 10− 5 , number of epochs of
4, and dynamic masking enabled. Multilingual BERT: 5. Discussion
learning rate of 2 × 10− 5 , number of epochs of 4,
and dynamic masking enabled. Logistic Regression In this report, we described our approach to training
(ensemble): inverse of the regularisation strength of 10− 2, Transformer-based classification models for conspiracy
maximum number of iterations of 20, Newton-CG solver. theory identification. We trained and evaluated our
mod</p>
        <p>Comparing it with other configurations, we observe els on the two sub-tasks of the ACTI data set, an Italian
that the ensemble model outperformed other approaches, benchmark for conspiracy theory identification. The
such as using augmentation alone (77.04%) or not ifrst task involved binary text classification to determine
applying any augmentation nor masking (81.67%). whether a piece of text is about a conspiracy theory or not,
Furthermore, compared to the provided baseline 1 while the second task focused on multi-class classification
score of 51.07%, the proposed ensemble model shows to identify the specific conspiracy theory referenced in
a substantial improvement, highlighting the efectiveness a piece of text.
of our approach.</p>
      <p>Given the limited resources available, including the
data set size and computational power, we were unable to
explore all possible avenues. Additionally, the availability
of pre-trained Italian Language Models is limited.</p>
      <p>Most Italian models are part of multilingual models
rather than dedicated Italian models, and the available
Italian-only models are smaller compared to, for example,
English ones. However, this provided us with the
opportunity to train potentially multilingual models for
conspiracy theory identification, although we did not test
this approach on other languages in our current study.</p>
      <p>The results obtained from our models are promising.
For Sub-task A, our ensemble model achieved a test
F1 score of 82.30%, outperforming both the other
configurations we explored and the provided baseline
F1 score of 51.07%. Regarding Sub-task B, our ensemble
model achieved a test F1 score of 89.83%, surpassing the
other configurations we tested and the provided baseline
F1 score of 68.37%. This highlights the effectiveness
of combining the predictions from individual models to
improve overall performance on this task.</p>
      <p>Moving forward, we aim to explore the application
of end-to-end text generation models for conspiracy
theory identification. Current research suggests that
LLMs can be effectively employed for text classification
tasks by concatenating the text to classify with a question
asking for the class and triggering text generation. We
plan to leverage one of these multilingual LLMs with
a combination of prompting and in-context learning,
enabling zero-shot to few-shot learning.</p>
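      <p>A hedged sketch of this prompting idea: the text to classify is concatenated with a question asking for the class, and the generated answer is mapped back to a label (the model and prompt wording below are illustrative stand-ins for a multilingual LLM).</p>
      <preformat>
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for a multilingual LLM

def classify_by_prompting(text):
    prompt = (
        f"Text: {text}\n"
        "Question: is this text about a conspiracy theory? Answer yes or no.\n"
        "Answer:"
    )
    answer = generator(prompt, max_new_tokens=3)[0]["generated_text"]
    completion = answer[len(prompt):].strip().lower()
    return "conspiracy" if completion.startswith("yes") else "not conspiracy"

print(classify_by_prompting("La Terra è piatta!"))
</preformat>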
      <p>Overall, our study contributes to the understanding
of conspiracy theory identification using
Transformer-based models. The achieved results show the potential of
these models in accurately classifying conspiracy-related
texts, and future investigations can explore additional
approaches to further enhance performance.
The source code developed for the challenge
is available on GitHub at the following link:
https://github.com/MichaelVitali/Evalita2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stoehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          , ACTI at EVALITA 2023:
          <article-title>Overview of the conspiracy theory identification task</article-title>
          ,
          <source>arXiv preprint arXiv:2307.06954</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , G. Venturi, EVALITA 2023:
          <article-title>Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          , in:
          <source>Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)</source>
          , CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          , P. Liu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>A survey of large language models</article-title>
          ,
          <source>CoRR abs/2303.18223</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2303.18223. doi:10.48550/arXiv.2303.18223. arXiv:2303.18223.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Weidinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mellor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rauh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Griffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uesato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Huang</surname>
          </string-name>
          , M. Cheng, M. Glaese,
          <string-name>
            <given-names>B.</given-names>
            <surname>Balle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasirzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kenton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brown</surname>
          </string-name>
          , W. Hawkins,
          <string-name>
            <given-names>T.</given-names>
            <surname>Stepleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birhane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rimell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Hendricks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Legassick</surname>
          </string-name>
          , G. Irving, I. Gabriel,
          <article-title>Ethical and social risks of harm from language models</article-title>
          ,
          <source>CoRR abs/2112.04359</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2112.04359. arXiv:2112.04359.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Horta</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Casiraghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Verginer</surname>
          </string-name>
          ,
          <article-title>Understanding online migration decisions following the banning of radical communities</article-title>
          ,
          <source>in: Proceedings of the 15th ACM Web Science Conference</source>
          <year>2023</year>
          , WebSci '23, Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , p.
          <fpage>251</fpage>
          -
          <lpage>259</lpage>
          . URL: https://doi.org/10.1145/3578503.3583608. doi:10.1145/3578503.3583608.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Verginer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          , G. Casiraghi,
          <article-title>Spillover of antisocial behavior from fringe platforms: The unintended consequences of community banning</article-title>
          ,
          <source>in: Proceedings of the International AAAI Conference on Web and Social Media</source>
          , volume
          <volume>17</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>742</fpage>
          -
          <lpage>753</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gote</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brandenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schlosser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schweitzer</surname>
          </string-name>
          ,
          <article-title>Helping a friend or supporting a cause? Disentangling active and passive cosponsorship in the U.S. Congress</article-title>
          , in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
          ,
          <source>Association for Computational Linguistics</source>
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>2952</fpage>
          -
          <lpage>2969</lpage>
          . URL: https://aclanthology.org/2023.acl-long.166.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Loper</surname>
          </string-name>
          ,
          <source>Natural Language Processing with Python</source>
          , O'Reilly,
          <year>2009</year>
          . URL: http://www.oreilly.de/catalog/9780596516499/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          , in: I. Guyon, U. von Luxburg, S. Bengio,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V. N.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9</source>
          ,
          <year>2017</year>
          , Long Beach, CA, USA,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          . URL: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: J. Burstein, C. Doran, T. Solorio (Eds.),
          <source>Proceedings of the</source>
          <year>2019</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis</article-title>
          , MN, USA, June 2-7,
          <year>2019</year>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://doi.org/10.18653/v1/n19-1423. doi:10.18653/v1/n19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <source>The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition</source>
          , Springer Series in Statistics, Springer,
          <year>2009</year>
          . URL: https://doi.org/10.1007/978-0-387-84858-7. doi:10.1007/978-0-387-84858-7.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          , Transformers:
          <article-title>State-of-the-art natural language processing</article-title>
          , in: Q. Liu, D. Schlangen (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2020 - Demos, Online, November 16-20</source>
          ,
          <year>2020</year>
          , Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://doi.org/10.18653/v1/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          , in: Y. Bengio, Y. LeCun (Eds.),
          <source>3rd International Conference on Learning Representations, ICLR 2015</source>
          , San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings,
          <year>2015</year>
          . URL: http://arxiv.org/abs/1412.6980.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. VanderPlas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , E. Duchesnay,
          <article-title>Scikit-learn: Machine learning in python</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          . URL: https://dl.acm.org/doi/10.5555/1953048.2078195. doi:10.5555/1953048.2078195.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>