<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Impact of Hate Speech Synthetic Data on Model Fairness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Camilla Casula</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Tonelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Although attention has been devoted to the issue of online hate speech, some phenomena, such as ableism or ageism, are scarcely represented by existing datasets and case studies. This can lead to hate speech detection systems that do not perform well on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence approaches. We focus our evaluation on the performance of models on different identity groups, finding that performance can differ greatly for different targets and that "simpler" data augmentation approaches can improve classification better than state-of-the-art language models. Warning: this paper contains examples that may be offensive or upsetting.</p>
      </abstract>
      <kwd-group>
<kwd>hate speech detection</kwd>
        <kwd>synthetic data</kwd>
        <kwd>model fairness</kwd>
        <kwd>hate speech target</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Generic hate speech detection models can nowadays achieve high performance on benchmark datasets, especially for high-resource languages [<xref ref-type="bibr" rid="ref1">1</xref>]. However, these models can still present a number of issues and weaknesses. In particular, the creation and maintenance of corpora for this task can be problematic due to the relative scarcity of hateful data online [<xref ref-type="bibr" rid="ref2">2</xref>], the negative psychological impact on annotators [<xref ref-type="bibr" rid="ref3">3</xref>], dataset decay and therefore reproducibility of results [<xref ref-type="bibr" rid="ref4">4</xref>], and more.</p>
      <p>Hate speech detection models have also been found to often have a tendency to over-rely on specific identity terms, in particular minority group mentions and other identity-related terms [<xref ref-type="bibr" rid="ref5">5, 6, 7</xref>]. Another issue with existing datasets and systems for this task is related to the representation of identity groups that are targets of hate, which is rather unbalanced. For example, misogyny has been covered in several datasets [8, 9], while other phenomena have received much less attention, such as religious hate [10] or hate against LGBTQIA+ people [11, 12, 13]. Furthermore, phenomena such as ageism and ableism have only been marginally addressed, as shown in the survey by Yu et al. [14]. This disparity in turn affects system fairness, because offenses against less-represented targets will be classified with a lower accuracy, further impacting communities that are already marginalized [15]. By fairness, in this work we mean group fairness, which implies independence between model classification outputs and sensitive attributes [16].</p>
      <p>A potential solution that has been proposed for many of the issues with hate speech detection data is the creation of synthetic data [17]. Indeed, recent research has shown it to be a promising solution [18, 19, 20, 21], albeit with mixed results [22, 23]. However, no in-depth analysis of the effects of data augmentation (DA) for less represented hate speech targets has been carried out, while it could be beneficial not only to make systems more accurate and robust, but also fairer, with comparable performance on hate speech targeting different demographic groups [16]. Another aspect we investigate in this work is a comparison between recent generative language models and more traditional approaches to data augmentation with regards to hate speech detection, since increasing the amount of training data with synthetic examples has been successfully exploited well before the advent of generative large language models, and can lead to improvements although these methods have a much lower computational cost [24].</p>
      <p>In this work, we therefore address the following research questions: (Q1) What is the impact of data augmentation on model performance for specific target identities? (Q2) Can information about identity groups in the generation process help the creation of better and more representative synthetic examples? (Q3) Can certain data augmentation setups enhance the performance of models on underrepresented targets, therefore improving their fairness by reducing differences in performance across different identity groups?</p>
      <p>We aim at answering these questions through a set of experiments in which we focus on the performance of models by target identity. In addition, we introduce two novel elements compared to previous work on generative DA: (i) we experiment with setups in which we exploit target identity information during generation, attempting to increase the relative representation of scarcely represented targets, with the aim of positively impacting model fairness, and (ii) we experiment with instruction-finetuned large language models (LLMs), which have recently been shown to be able to improve downstream task performance [25]. We also further investigate potential fairness-related weaknesses of models using the HateCheck test suite [7] combined with a manual analysis of generated examples.</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24–26, 2025, Cagliari, Italy. Contact: ccasula@fbk.eu (C. Casula); satonelli@fbk.eu (S. Tonelli). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
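The group-fairness notion used in this work (independence between model predictions and sensitive attributes [16]) can be illustrated with a minimal sketch; the function name is ours, purely for illustration:

```python
def prediction_rates_by_group(predictions, groups):
    """Share of positive (hateful) predictions per identity group.

    Under strict independence (group fairness), these rates would be
    identical across groups; large gaps signal unequal treatment.
    """
    totals = {}
    positives = {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred

    return {g: positives[g] / totals[g] for g in totals}

# Toy run: binary predictions for posts about two identity groups.
rates = prediction_rates_by_group([1, 0, 1, 1], ["age", "age", "race", "race"])
```

In practice, the paper compares per-group classification quality (F1) rather than raw prediction rates, but the underlying idea is the same: performance statistics should not depend on the identity group a post is about.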
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
<p>The field of hateful content detection has gained a large amount of traction in recent years, with increased effort from the research community in establishing common guidelines and benchmarks (e.g. Basile et al. [26], Zampieri et al. [27]) across different languages and targets of hate [28, 29, 11, 30].</p>
      <p>A potential way that has been proposed to mitigate some of the issues with hate speech datasets, such as data scarcity [<xref ref-type="bibr" rid="ref2">2</xref>] and negative psychological impact on annotators [<xref ref-type="bibr" rid="ref3">3</xref>], is data augmentation, which could also benefit the performance of hate speech detection systems. Data augmentation refers to a family of approaches aimed at increasing the diversity of training data without collecting new samples [31]. While DA is widely used to make models more robust across many machine learning applications, it has not been as frequently adopted or researched in NLP [32, 33] until recently, with LLMs that are capable of generating realistic text [34, 35].</p>
      <p>DA for the detection of hate speech has recently been explored using generative LLMs: Juuti et al. [36] use GPT-2 [37] to augment toxic language data in extremely low-resource scenarios. Similarly, Wullach et al. [18] and D'Sa et al. [19] successfully augment toxic language datasets using GPT-2. Fanton et al. [38] combine GPT-2 and human validation to create counter-narratives that cover multiple hate targets. More recently, Ocampo et al. [39] have applied data augmentation to increase the number of instances for the minority class in implicit and subtle examples of hate speech. Casula and Tonelli [22] show that generative data augmentation for hate speech detection using GPT-2 is in some cases challenged by a simple oversampling baseline, while Casula et al. [23] analyse the qualitative differences between original and paraphrased hate speech data. Finally, Hartvigsen et al. [20] use manually curated (through a human-in-the-loop process) prompts to generate implicitly hateful sequences with GPT-3 [40].</p>
      <p>To our knowledge, no dedicated analyses have been carried out on the impact data augmentation can have on the performance of models for specific targets of hate, or into the exploitation of target identity information to potentially improve fully automated data augmentation processes.</p>
      <sec id="sec-2-1">
        <title>3. Data</title>
        <p>For our experiments, we use the Measuring Hate Speech (MHS) Corpus [41, 42], a dataset consisting of social media posts in English from three social media platforms (Reddit, Twitter, and YouTube). While the corpus is meant to capture different levels of hatefulness on a scale, it also includes binary hate speech labels for benchmarking purposes, which we use in our experiments.</p>
        <p>The MHS corpus features labels regarding the binary identification of pre-specified identity groups and subgroups in texts. Importantly, this annotation is present regardless of hatefulness, resulting in target annotations even for posts containing supportive or counter-speech. In the MHS dataset (available at https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) we find annotations for seven target identity groups: race, religion, origin, gender, sexuality, age, and disability. Their distribution in the data can be seen in Figure 1, which shows how the most widely studied targets of hate speech, race and gender, are also the most widely represented in the MHS corpus.</p>
        <p>Given that the MHS corpus uses disaggregated annotations, we aggregate them so that each example has a unique label and set of targets. First, we consider each example to be about or targeting all the identity groups identified by at least half of the annotators who annotated it. Since the hatespeech label in the dataset can assume three values (0: non hateful, 1: unclear, 2: hateful), we binarize these by averaging all the annotations for a given post, mapping it to hateful if the average score is higher than 1 and to non hateful if it is lower. (While we are aware this does not exploit the most novel and interesting features of the MHS dataset, the exploration of annotator (dis)agreement with regards to data augmentation is beyond the scope of this work, and is left for future research.) After this process, we are left with 35,243 annotated posts, of which 9,046 are annotated as containing hate speech.</p>
      </sec>
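The aggregation scheme described above can be sketched as follows; this is a minimal illustration assuming each post comes with a list of per-annotator records, and the helper names are ours, not part of the MHS release (an average of exactly 1 is left undecided here, since the text only specifies the higher/lower cases):

```python
from collections import Counter

def aggregate_targets(annotations):
    """Keep the identity groups marked by at least half of a post's annotators."""
    n = len(annotations)
    counts = Counter(t for ann in annotations for t in ann["targets"])
    return sorted(t for t, c in counts.items() if 2 * c >= n)

def binarize_label(annotations):
    """Average the 0/1/2 hatespeech scores: above 1 is hateful,
    below 1 is non hateful; exactly 1 is left undecided here."""
    avg = sum(ann["hatespeech"] for ann in annotations) / len(annotations)
    if avg > 1:
        return "hateful"
    if 1 > avg:
        return "non hateful"
    return None

# Toy post with three disaggregated annotations.
post = [
    {"hatespeech": 2, "targets": ["race", "gender"]},
    {"hatespeech": 2, "targets": ["race"]},
    {"hatespeech": 0, "targets": ["race"]},
]
```

For this toy post, only race reaches the majority threshold (3 of 3 annotators), and the average score of 4/3 maps the post to the hateful class.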
    </sec>
<sec id="sec-3">
      <title>4. Methodology</title>
      <p>For our experiments, we compare different generation strategies to train hate speech detection models of different sizes, aiming at assessing the impact of data augmentation based on language models on specific target identities. In order to do this, we evaluate both decoder-only and encoder-decoder models, experimenting also with their instruction-tuned counterparts. Additionally, we experiment with the inclusion of target identity information in the prompts, with the assumption that this information might lead to more varied and representative generated texts. We then use two different methods of exploiting existing information and data to generate new sequences: finetuning and few-shot prompting.</p>
      <sec id="sec-3-1">
        <title>4.1. Generative Models</title>
        <p>While most of the work on generation-based data augmentation for this task focuses on decoder-only Transformer models [22], other works have shown encoder-decoder Transformers to be potentially effective as well [43]. Since no work has been carried out on comparing decoder-only with encoder-decoder models for this type of data augmentation, we experiment with both. Then, based on work showing how instruction-tuning can improve generalization to unseen tasks [25, 44], we aim at experimenting also with instruction-finetuned models.</p>
        <p>To favor reproducibility, we choose to only use openly available models for our experiments. We employ Llama 3.1 8B in its base and Instruct versions [45], OPT in its base and IML (instruction-tuned) versions [46], and T5 in its base and FLAN (instruction-tuned) versions [47, 44]. We use the 1.3B parameter version of OPT and OPT-IML and the Large version of T5 and Flan-T5 (770M), aiming at capturing in our analyses the effects of this kind of methodology with different model sizes.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4.2. Target Identity Information</title>
      <p>In addition to performing DA with different types of models and techniques, we investigate for the first time the possibility of including target identity information both when finetuning models and when prompting them, with the hypothesis that the inclusion of this kind of information might help in generating more varied data with regards to identity group mentions for both hateful and non-hateful messages. By generating target-specific examples also for the non-hateful class, we ideally aim at implicitly contrasting identity term bias. In order to do this, we encode target identity information into the prompts given to the models in various ways.</p>
      <sec id="sec-4-1">
        <title>4.3. Finetuning vs Few-Shot Prompting</title>
        <p>A large number of works on data augmentation based on generative models rely on finetuning a model on a small set of gold data, and then generating new data with the finetuned model, encoding the label information within the text sequences in some form (e.g. Anaby-Tavor et al. [34], Kumar et al. [35]). Other works use few-shot demonstration-based prompting, in which the pretrained model is prompted with one or more sequences similar to what the model is expected to generate, with no finetuning (e.g. Hartvigsen et al. [20], Azam et al. [43], Ashida and Komachi [48]). We experiment with both strategies.</p>
        <p>Finetuning (FT). For finetuning, we follow an approach similar to that of Anaby-Tavor et al. [34], in which a generative LLM is finetuned on annotated sequences that are concatenated with labels. At generation time, the desired label information is fed into the model, and the model is expected to generate a sequence belonging to the specified class. We discuss the details of the formatting of the label information in Section 4.4. This method has the upside of theoretically being more likely to generate examples that are closer to the original distribution of the data to be augmented. However, this can also be a downside, if the desired effect is increasing the variety of the data. In addition, finetuning is more computationally expensive than few-shot prompting. For models finetuned with target identity information, given that each sequence can be associated with more than one target (in cases of intersectional hate speech, for instance), a different label-encoding sequence will be used to include all target identities represented in that post. An example of prompt to produce a post about gender that is hateful is Write a hateful social media post about gender.</p>
        <p>Few-shot prompting (FS). Following the large amount of works focusing on few-shot demonstration-based instructions, especially with instruction-finetuned models [49, 44], we also experiment with demonstration-based prompting, in which the models are shown 3 examples belonging to the desired label (and target identity, if available), and then asked to produce a new one. With models exploiting target identity information for few-shot prompting, we associate the desired label and target information with the demonstrations.</p>
        <p>We aim at using the same type of prompting layout across experiments. We choose to use prompting sequences in natural language, given that they have been found to lead to generally more realistic generated examples for this purpose [22]. In order to find prompts in natural language that could be leveraged by our models, we consulted the FLAN corpus [25], which is part of the finetuning data of both FLAN-T5 and OPT-IML. Among the instruction templates, we find one of the CommonGen templates [50] to fit with our aims: 'Write a sentence about the following things: [concepts], [target]'. We reformulate it to obtain a prompting sequence that reflects our application, and can be exploited by instruction-finetuned models: Write a [∅/hateful] social media post [∅/about t], where t is a target identity category.</p>
      </sec>
    </sec>
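The prompt layouts described above can be sketched as follows; this is a minimal illustration of the reformulated template Write a [∅/hateful] social media post [∅/about t], with helper names of our own (the exact formatting used in the experiments may differ):

```python
def build_prompt(label, target=None):
    """Instruction in the style of the reformulated CommonGen template."""
    parts = ["Write a"]
    if label == "hateful":
        parts.append("hateful")
    parts.append("social media post")
    if target is not None:
        parts.append("about " + target)
    return " ".join(parts)

def few_shot_prompt(label, demos, target=None):
    """Demonstration-based variant: show 3 examples of the desired label
    (and target, if available), then ask for a new one."""
    lines = [build_prompt(label, target) + ". Examples:"]
    lines.extend("- " + d for d in demos[:3])
    lines.append("Write another one:")
    return "\n".join(lines)
```

The same builder covers both setups: for finetuning the instruction is prepended to each gold sequence as the label-encoding prefix, while for few-shot prompting it heads a block of three demonstrations.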
<sec id="sec-5">
      <title>5. Experimental Setup</title>
      <sec id="sec-5-1">
        <title>Setup</title>
        <p>For all experiments, we simulate a setup in which we have a small amount of gold data available prior to augmentation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Baselines</title>
        <p>We implement three baselines using DeBERTa: (i) the classifier finetuned on the starting 1k gold examples; (ii) the same classifier finetuned on an oversampled version of the training data (repeating the initial 1k sequences until we get to 31k, the size of the augmented setups), which has been found effective even in cross-dataset scenarios [22]; and (iii) as a stronger baseline, we also compare all of our models with models trained on data augmented using Easy Data Augmentation (EDA) [52]. EDA consists of four operations: synonym replacement, random insertion, random swap, and random deletion of tokens. Similarly to our other setups, we produce 30k new sequences with EDA, of which 7,500 with each operation, on the initial 1,000 examples in each fold. We then also experiment with the mixture of EDA and generative DA, in which instead of augmenting the initial gold data with 30k synthetic sequences obtained with EDA or generative DA, we randomly select 15k examples of each and concatenate them.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results and Discussion</title>
      <p>In this section we report the results of our experiments, averaged across 5 data folds using different random seeds. The performance of our baselines and models trained on generation-augmented data, in terms of macro-averaged F1 score and hateful-class F1 (h-F1), both globally and by target identity group, is reported in Table 1. All models are tested on a held-out portion of the gold data from the MHS corpus.</p>
      <p>Considering simply the no augmentation baseline, it is clear that performance can vary greatly across target groups, with up to 27% hate-F1 differences between them. In particular, the model appears to struggle with posts about origin (Or), religion (Re), and age (Ag), while, although underrepresented compared to other target groups, posts about disability (Di) tend to be classified more accurately on average. This suggests that performance might also be influenced by factors other than the representation of targets in the dataset, such as how broad a target category is or how much variation there is within it. For instance, origin can include any type of discrimination based on geographical origin, potentially making it harder to generalize for, and religion as a category encompasses any type of religious discourse, in spite of each religion being targeted through specific offense types [10]. This makes classification challenging, especially for systems that rely primarily on lexical features.</p>
      <p>Most of the models trained on generation-augmented data outperform the no augmentation baseline across targets, with different improvements based on target identity group (origin, religion, and age in particular). Strikingly, however, EDA performs better than all generation-based DA configurations, regardless of prompting type or access to target information, for all targets but age. We hypothesize EDA is effective because small perturbations can make models more robust, especially with regard to the hateful class, while generative models do increase performance, but they are also more likely to inject noise.</p>
      <p>The impact of finetuning vs. few-shot prompting seems model-dependent, with differences across models also regarding the impact of target information. Interestingly, the amount of synthetic examples labeled as hateful that pass filtering does not appear to be linked with better performances of models trained on synthetic data.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Qualitative Analysis</title>
      <p>In this section, we look into the synthetically generated texts and the models trained on them from a qualitative point of view. First we carry out a manual annotation on the generated texts. Then, we turn to the HateCheck test suite [7], which includes examples aimed at exploring the weaknesses of hate speech models, especially their out-of-distribution generalization, again focusing on performance by target. HateCheck targets are in some cases more specific than those present in our dataset, thus providing a complementary view on our models' performance.</p>
      <sec id="sec-7-2">
        <title>7.1. Manual Annotation</title>
        <p>A total of 1,120 generated texts filtered with DeBERTa were annotated by two annotators with a background in linguistics and experience in hate speech research. For each combination of finetuning/prompting/target presence for each model, they annotated 70 examples, evenly distributed across labels and, where available, targets. The examples were annotated according to label correctness, target category correctness (where available), and realism. For the examples generated without access to target information, the target dimension was not annotated.</p>
        <p>Consider for example the following sentence, generated giving 'age' as target information: 'F*ckin white men are trashy like a muthaf*cker'. In this case, Label would be 'hateful', Realism would be 'Yes', but Target would be 'No', because the target identity category of the generated example is 'race' and not 'age'.</p>
        <p>Inter-annotator agreement was calculated using Krippendorff's alpha on 10% of the manually analyzed data (112 examples). The annotators showed moderate agreement with regards to label correctness (α = 0.76), while the scores were higher for category correctness (α = 0.83) and realism (α = 0.82).</p>
        <p>The results of the manual analysis are reported in Table 2. In most cases, the addition of target information results in more realistic texts and, in general, more accurate label assignment. However, this is not directly associated with improved model performance from augmented data. In addition, the rate of realistic texts and the accuracy of the identity categories are still somewhat low compared to the correctness of label assignment, showing that the generative models we tested might have difficulties dealing with more than one type of constraint/instruction. Indeed, while few-shot (FS) approaches sometimes lead to more realistic generated sequences, this often entails lower label or category correctness and vice versa.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Generated texts labeled as correct by human annotators in terms of labels, target categories, and realism. N/A refers to cases in which all of the generated texts were nonsensical (0% realistic), with impossible assignment of labels or categories.</p></caption>
          <table>
            <thead>
              <tr><th>Model</th><th></th><th>Tar</th><th>Label</th><th>Target</th><th>Realism</th></tr>
            </thead>
            <tbody>
              <tr><td>Llama 3.1 8B</td><td>FT</td><td>Y</td><td>89%</td><td>72%</td><td>88%</td></tr>
              <tr><td></td><td></td><td>N</td><td>78%</td><td>/</td><td>69%</td></tr>
              <tr><td></td><td>FS</td><td>Y</td><td>93%</td><td>53%</td><td>86%</td></tr>
              <tr><td></td><td></td><td>N</td><td>90%</td><td>/</td><td>84%</td></tr>
              <tr><td>Llama 3.1 8B Inst.</td><td>FT</td><td>Y</td><td>87%</td><td>66%</td><td>79%</td></tr>
              <tr><td></td><td></td><td>N</td><td>87%</td><td>/</td><td>73%</td></tr>
              <tr><td></td><td>FS</td><td>Y</td><td>89%</td><td>61%</td><td>81%</td></tr>
              <tr><td></td><td></td><td>N</td><td>83%</td><td>/</td><td>79%</td></tr>
              <tr><td>OPT</td><td>FT</td><td>Y</td><td>93%</td><td>63%</td><td>66%</td></tr>
              <tr><td></td><td></td><td>N</td><td>N/A</td><td>/</td><td>0%</td></tr>
              <tr><td></td><td>FS</td><td>Y</td><td>90%</td><td>39%</td><td>83%</td></tr>
              <tr><td></td><td></td><td>N</td><td>81%</td><td>/</td><td>70%</td></tr>
              <tr><td>OPT-IML</td><td>FT</td><td>Y</td><td>96%</td><td>53%</td><td>66%</td></tr>
              <tr><td></td><td></td><td>N</td><td>N/A</td><td>/</td><td>0%</td></tr>
              <tr><td></td><td>FS</td><td>Y</td><td>90%</td><td>57%</td><td>79%</td></tr>
              <tr><td></td><td></td><td>N</td><td>81%</td><td>/</td><td>73%</td></tr>
              <tr><td>T5</td><td>FT</td><td>Y</td><td>83%</td><td>59%</td><td>80%</td></tr>
              <tr><td></td><td></td><td>N</td><td>74%</td><td>/</td><td>30%</td></tr>
              <tr><td></td><td>FS</td><td>Y</td><td>N/A</td><td>N/A</td><td>0%</td></tr>
              <tr><td></td><td></td><td>N</td><td>N/A</td><td>/</td><td>0%</td></tr>
              <tr><td>Flan-T5</td><td>FT</td><td>Y</td><td></td><td></td><td></td></tr>
              <tr><td></td><td></td><td>N</td><td></td><td></td><td></td></tr>
              <tr><td></td><td>FS</td><td>Y</td><td></td><td></td><td></td></tr>
              <tr><td></td><td></td><td>N</td><td></td><td></td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-7-3">
        <title>7.2. HateCheck</title>
        <p>We perform a second qualitative analysis using the HateCheck test suite [7], a collection of functional testing examples that enable targeted diagnostic insights of hate speech detection models. All HateCheck test cases mention a specific target identity, to allow the exploration of unintended biases against different target groups. However, the target groups used in HateCheck do not fully overlap with the target identity groups in the MHS corpus (Figure 1).</p>
        <p>We report in Table 3 the results obtained by the models trained on augmented data on HateCheck in terms of hate-class F1 score, divided by target identity group according to the identity categories used in HateCheck. Overall, we can notice significant improvements over the no augmentation baseline across different setups and target identities, although these improvements are variable, and again overshadowed by Easy Data Augmentation, resulting in more evenly distributed scores across all targets.</p>
      </sec>
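The by-target evaluation underlying Table 1 and Table 3 can be sketched as follows; this is a plain-Python illustration of hateful-class F1 computed separately per identity group, with helper names of our own:

```python
def f1_hateful(gold, pred):
    """F1 score of the positive (hateful) class from parallel 0/1 lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_by_target(examples, predictions):
    """Hateful-class F1 per target identity group; an example annotated
    with several targets counts toward each of them."""
    groups = {}
    for ex, pred in zip(examples, predictions):
        for t in ex["targets"]:
            gold_list, pred_list = groups.setdefault(t, ([], []))
            gold_list.append(ex["label"])
            pred_list.append(pred)
    return {t: f1_hateful(g, p) for t, (g, p) in groups.items()}

# Toy evaluation: three posts, the last one misclassified as non-hateful.
examples = [
    {"label": 1, "targets": ["age"]},
    {"label": 0, "targets": ["age"]},
    {"label": 1, "targets": ["race", "age"]},
]
scores = f1_by_target(examples, [1, 0, 0])
```

Comparing the spread of these per-group scores (rather than only the global F1) is what reveals the fairness gaps discussed in Section 6.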
<sec id="sec-7-1">
        <p>The target identities in HateCheck are: women (which would fall under gender in MHS), trans people (gender in MHS), gay people (sexuality in MHS), black people (race in MHS), disabled people (disability in MHS), Muslims (religion in MHS), and immigrants (origin in MHS). The age category is present in the MHS corpus but entirely missing in HateCheck.</p>
        <p>Interestingly, however, even overlapping targets between the MHS dataset and HateCheck show different improvements across generative DA setups. For instance, the HateCheck target people with disabilities apparently benefits from the inclusion of target information more than the Disability category in the MHS corpus, suggesting that the inclusion of identity group information could affect performances differently in and out of domain.</p>
        <p>Furthermore, HateCheck test instances are manually created starting from templates that include specific mentions of identity-group terms, which could lead to sentences that are very different from 'real' examples that could be found on social media. For instance, one of the hateful templates used in HateCheck is 'I hate [IDENTITY]', in which [IDENTITY] is substituted with women, disabled people, and so on for all of the targets reported above. This can result in somewhat stereotypical examples, while hateful posts in the MHS corpus often contain slurs and offensive terms instead of more neutral names to refer to people belonging to a certain identity group.</p>
        <sec id="sec-8">
          <title>8. Conclusions</title>
          <p>We have investigated the impact of data augmentation with generative models on specific targets of hate, experimenting with instruction-finetuned models and the addition of target information when generating new sequences. Overall, it appears that DA methods have different types of impact on different targets, but they can improve performance even for scarcely represented identity categories (Q1). However, we observed that generative data augmentation alone is not as strong as simpler methods such as EDA.</p>
          <p>Through a qualitative analysis, we also emphasized the fact that including target information when generating synthetic examples can facilitate the creation of examples that are more realistic and exhibit more correct label assignments (Q2), although further work could investigate why these characteristics do not directly correlate with downstream task performance.</p>
          <p>Overall, our analysis shows that there is potential in data augmentation with regards to model group fairness (Q3), implying independence between model classification output and sensitive attributes [16]. However, although potentially useful, this type of DA can still lead to unpredictable results, and it is not guaranteed to always improve the performance of models across all identity groups with regards to hate speech. We plan to further explore this research direction in the future, considering also intersectionality and more specific targets (e.g. groups such as trans women rather than the gender category). In addition, we worked on English data because of the availability of the Measuring Hate Speech corpus, which was large enough to perform our DA experiments and presented the kind of fine-grained target annotation required in our study. However, we are aware that DA would benefit more classification with lower-resourced languages, so we plan to work on different languages in the future.</p>
          <p>In summary, we show that data augmentation with generative language models can be beneficial, even when using only openly available models. However, given their high computational costs, alternatives like EDA could be considered if limited resources are available, because they can still yield performance improvements compared to a low-resource setting. Again, there seems to be no one-fits-all solution or approach to generation or data augmentation in this kind of scenario.</p>
          <p>We acknowledge that data augmentation techniques may be used also for malicious purposes, for example to create thousands of hateful examples with the goal of hurting the same groups that we want to support. Because of this, we provide all the necessary details for the reproduction of our results, but we do not plan to openly release the code or to upload the generated data produced by our experiments, especially in order to avoid it being crawled and ending up in the training data of LLMs in the future. We are, however, open to sharing the data with other researchers who might be interested.</p>
        </sec>
        <sec id="sec-9">
          <title>Acknowledgments</title>
          <p>This work was funded by the European Union's CERV fund under grant agreement No. 101143249 (HATEDEMICS).</p>
        </sec>
        <sec id="sec-10">
          <title>References</title>
          <p>… in Text Classification, in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, ACM, New Orleans LA USA, 2018, pp. 67–73. URL: https://dl.acm.org/doi/10.1145/3278721.3278729. doi:10.1145/3278721.3278729.</p>
          <p>… Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
        </sec>
        <p>3278729. doi:10.1145/3278721.3278729. [14] Z. Yu, I. Sen, D. Assenmacher, M. Samory, L.
Fröh[6] B. Kennedy, X. Jin, A. Mostafazadeh Davani, M. De- ling, C. Dahn, D. Nozza, C. Wagner, The unseen
hghani, X. Ren, Contextualizing hate speech clas- targets of hate: A systematic review of hateful
sifiers with post-hoc explanation, in: Proceed- communication datasets, Social Science Computer
ings of the 58th Annual Meeting of the Associa- Review (2024) 08944393241258771. doi:10.1177/
tion for Computational Linguistics, Association for 08944393241258771.</p>
        <p>Computational Linguistics, Online, 2020, pp. 5435– [15] Z. Talat, J. Bingel, I. Augenstein, Disembodied
ma5442. URL: https://aclanthology.org/2020.acl-main. chine learning: On the illusion of objectivity in nlp,
483. doi:10.18653/v1/2020.acl-main.483. ArXiv abs/2101.11974 (2021).
[7] P. Röttger, B. Vidgen, D. Nguyen, Z. Waseem, [16] J. Anthis, K. Lum, M. Ekstrand, A. Feller,
H. Margetts, J. Pierrehumbert, HateCheck: Func- A. D’Amour, C. Tan, The Impossibility of Fair LLMs,
tional tests for hate speech detection models, in: 2024. URL: http://arxiv.org/abs/2406.03198. doi:10.
Proceedings of the 59th Annual Meeting of the As- 48550/arXiv.2406.03198, arXiv:2406.03198 [cs,
sociation for Computational Linguistics and the stat].
11th International Joint Conference on Natural Lan- [17] B. Vidgen, L. Derczynski, Directions in abusive
language Processing (Volume 1: Long Papers), Associa- guage training data, a systematic review: Garbage
tion for Computational Linguistics, Online, 2021, pp. in, garbage out, PLOS ONE 15 (2020) e0243300.
41–58. URL: https://aclanthology.org/2021.acl-long. doi:10.1371/journal.pone.0243300.
4. doi:10.18653/v1/2021.acl-long.4. [18] T. Wullach, A. Adler, E. Minkov, Fight fire
[8] S. Bhattacharya, S. Singh, R. Kumar, A. Bansal, with fire: Fine-tuning hate detectors using large
A. Bhagat, Y. Dawer, B. Lahiri, A. K. Ojha, De- samples of generated hate speech, in:
Findveloping a multilingual annotated corpus of misog- ings of the Association for Computational
Linyny and aggression, in: Proceedings of the Second guistics: EMNLP 2021, Association for
ComputaWorkshop on Trolling, Aggression and Cyberbul- tional Linguistics, Punta Cana, Dominican Republic,
lying, European Language Resources Association 2021, pp. 4699–4705. URL: https://aclanthology.org/
(ELRA), Marseille, France, 2020, pp. 158–168. URL: 2021.findings-emnlp.402. doi: 10.18653/v1/2021.
https://aclanthology.org/2020.trac-1.25. findings-emnlp.402.
[9] E. Guest, B. Vidgen, A. Mittos, N. Sastry, G. Tyson, [19] A. G. D’Sa, I. Illina, D. Fohr, D. Klakow,
H. Margetts, An Expert Annotated Dataset for the D. Ruiter, Exploring Conditional Language
Detection of Online Misogyny, in: Proceedings of Model Based Data Augmentation Approaches
the 16th Conference of the European Chapter of the for Hate Speech&amp;#xa0;Classification, in: Text,
Association for Computational Linguistics: Main Speech, and Dialogue: 24th International
ConVolume, Association for Computational Linguistics, ference, TSD 2021, Olomouc, Czech Republic,
Online, 2021, pp. 1336–1350. September 6–9, 2021, Proceedings, Springer-Verlag,
[10] A. Ramponi, B. Testa, S. Tonelli, E. Jezek, Address- Berlin, Heidelberg, 2021, pp. 135–146. doi:10.1007/
ing religious hate online: from taxonomy creation 978-3-030-83527-9_12.
to automated detection, PeerJ Computer Science 8 [20] T. Hartvigsen, S. Gabriel, H. Palangi, M. Sap,
(2022) e1128. D. Ray, E. Kamar, ToxiGen: A large-scale
[11] B. R. Chakravarthi, R. Priyadharshini, R. Pon- machine-generated dataset for adversarial and
imnusamy, P. K. Kumaresan, K. Sampath, D. Then- plicit hate speech detection, in: Proceedings of
mozhi, S. Thangasamy, R. Nallathambi, J. P. McCrae, the 60th Annual Meeting of the Association for
Dataset for identification of homophobia and tran- Computational Linguistics (Volume 1: Long
Pasophobia in multilingual youtube comments, 2021. pers), Association for Computational Linguistics,
arXiv:2109.00227. Dublin, Ireland, 2022, pp. 3309–3326. URL: https://
[12] D. Locatelli, G. Damo, D. Nozza, A cross-lingual aclanthology.org/2022.acl-long.234. doi:10.18653/
study of homotransphobia on twitter, in: Proceed- v1/2022.acl-long.234.
ings of the First Workshop on Cross-Cultural Con- [21] C. Casula, E. Leonardelli, S. Tonelli, Don’t
augsiderations in NLP (C3NLP), 2023, pp. 16–24. ment, rewrite? assessing abusive language
detec[13] D. Nozza, A. T. Cignarella, G. Damo, T. Caselli, tion with synthetic data, in: L.-W. Ku, A. Martins,
V. Patti, HODI at EVALITA 2023: Overview of V. Srikumar (Eds.), Findings of the Association for
the Homotransphobia Detection in Italian Task, in: Computational Linguistics: ACL 2024, Association
for Computational Linguistics, Bangkok, Thailand, science/article/pii/S0306457322002199. doi:https:
2024, pp. 11240–11247. URL: https://aclanthology. //doi.org/10.1016/j.ipm.2022.103118.
org/2024.findings-acl.669/. doi: 10.18653/v1/2024. [29] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova,
findings-acl.669. G. Karadzhov, H. Mubarak, L. Derczynski, Z.
Pite[22] C. Casula, S. Tonelli, Generation-based data aug- nis, Ç. Çöltekin, SemEval-2020 task 12:
Mulmentation for ofensive language detection: Is it tilingual ofensive language identification in
soworth it?, in: Proceedings of the 17th Conference cial media (OfensEval 2020), in: Proceedings of
of the European Chapter of the Association for the Fourteenth Workshop on Semantic Evaluation,
Computational Linguistics, Association for Com- International Committee for Computational
Linputational Linguistics, Dubrovnik, Croatia, 2023, guistics, Barcelona (online), 2020, pp. 1425–1447.
pp. 3359–3377. URL: https://aclanthology.org/2023. URL: https://aclanthology.org/2020.semeval-1.188.
eacl-main.244. doi:10.18653/v1/2020.semeval-1.188.
[23] C. Casula, S. Vecellio Salto, A. Ramponi, S. Tonelli, [30] E. Leonardelli, C. Casula, S. Vecellio Salto, J. E. Bak,
Delving into qualitative implications of synthetic E. Muratore, A. Kołos, T. Louf, S. Tonelli,
MuLTadata for hate speech detection, in: Y. Al- Telegram: A Fine-Grained Italian and Polish Dataset
Onaizan, M. Bansal, Y.-N. Chen (Eds.), Pro- for Hate Speech and Target Detection, in:
Proceedceedings of the 2024 Conference on Empirical ings of the Eleventh Italian Conference on
CompuMethods in Natural Language Processing, As- tational Linguistics (CLiC-it 2025), 2025.
sociation for Computational Linguistics, Miami, [31] S. Y. Feng, V. Gangal, J. Wei, S. Chandar, S. Vosoughi,
Florida, USA, 2024, pp. 19709–19726. URL: https: T. Mitamura, E. Hovy, A survey of data
aug//aclanthology.org/2024.emnlp-main.1099/. doi:10. mentation approaches for NLP, in: Findings
18653/v1/2024.emnlp-main.1099. of the Association for Computational
Linguis[24] J. Chen, D. Tam, C. Rafel, M. Bansal, D. Yang, An tics: ACL-IJCNLP 2021, Association for
CompuEmpirical Survey of Data Augmentation for Limited tational Linguistics, Online, 2021, pp. 968–988.
Data Learning in NLP, Transactions of the Associa- URL: https://aclanthology.org/2021.findings-acl.84.
tion for Computational Linguistics 11 (2023) 191– doi:10.18653/v1/2021.findings-acl.84.
211. URL: https://direct.mit.edu/tacl/article-pdf/doi/ [32] L. F. A. O. Pellicer, T. M. Ferreira, A. H. R. Costa,
10.1162/tacl_a_00542/2074871/tacl_a_00542.pdf. Data augmentation techniques in natural language
[25] J. Wei, M. Bosma, V. Zhao, K. Guu, A. W. Yu, processing, Applied Soft Computing 132 (2023)
B. Lester, N. Du, A. M. Dai, Q. V. Le, Finetuned 109803. doi:10.1016/j.asoc.2022.109803.
language models are zero-shot learners, in: In- [33] M. Bayer, M.-A. Kaufhold, C. Reuter, A Survey on
ternational Conference on Learning Representa- Data Augmentation for Text Classification, ACM
tions, 2022. URL: https://openreview.net/forum?id= Computing Surveys 55 (2022) 146:1–146:39. doi:10.
gEZrGCozdqR. 1145/3544558.
[26] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. [34] A. Anaby-Tavor, B. Carmeli, E. Goldbraich, A.
KanRangel Pardo, P. Rosso, M. Sanguinetti, SemEval- tor, G. Kour, S. Shlomov, N. Tepper, N. Zwerdling,
2019 Task 5: Multilingual Detection of Hate Speech Do Not Have Enough Data? Deep Learning to the
Against Immigrants and Women in Twitter, in: Rescue!, in: Proceedings of the AAAI Conference
Proceedings of the 13th International Workshop on Artificial Intelligence, volume 34, 2020, pp. 7383–
on Semantic Evaluation, Association for Compu- 7390. doi:10.1609/aaai.v34i05.6233.
tational Linguistics, Minneapolis, Minnesota, USA, [35] V. Kumar, A. Choudhary, E. Cho, Data
augmen2019, pp. 54–63. doi:10.18653/v1/S19-2007. tation using pre-trained transformer models, in:
[27] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, Proceedings of the 2nd Workshop on Life-long
N. Farra, R. Kumar, SemEval-2019 Task 6: Identi- Learning for Spoken Language Systems,
Associafying and Categorizing Ofensive Language in So- tion for Computational Linguistics, Suzhou, China,
cial Media (OfensEval), in: Proceedings of the 2020, pp. 18–26. URL: https://aclanthology.org/2020.
13th International Workshop on Semantic Evalu- lifelongnlp-1.3.
ation, Association for Computational Linguistics, [36] M. Juuti, T. Gröndahl, A. Flanagan, N. Asokan,
Minneapolis, Minnesota, USA, 2019, pp. 75–86. A little goes a long way: Improving toxic
landoi:10.18653/v1/S19-2010. guage classification despite data scarcity, in:
Find[28] C. Bosco, V. Patti, S. Frenda, A. T. Cignarella, M. Pa- ings of the Association for Computational
Linciello, F. D’Errico, Detecting racial stereotypes: An guistics: EMNLP 2020, Association for
CompuItalian social media corpus where psychology meets tational Linguistics, Online, 2020, pp. 2991–3009.
NLP, Information Processing and Management 60 doi:10.18653/v1/2020.findings-emnlp.269.
(2023) 103118. URL: https://www.sciencedirect.com/ [37] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei,
I. Sutskever, Language Models are Unsupervised son, D. Valter, S. Narang, G. Mishra, A. Yu, V. Zhao,
Multitask Learners, 2019. Y. Huang, A. Dai, H. Yu, S. Petrov, E. H. Chi, J. Dean,
[38] M. Fanton, H. Bonaldi, S. S. Tekiroğlu, M. Guerini, J. Devlin, A. Roberts, D. Zhou, Q. V. Le, J. Wei,
ScalHuman-in-the-Loop for Data Collection: A Multi- ing instruction-finetuned language models, 2022.
Target Counter Narrative Dataset to Fight Online arXiv:2210.11416.</p>
        <p>Hate Speech, in: Proceedings of the 59th Annual [45] M. A. Llama Team, The llama 3 herd of
modMeeting of the Association for Computational Lin- els, 2024. URL: https://arxiv.org/abs/2407.21783.
guistics and the 11th International Joint Conference arXiv:2407.21783.
on Natural Language Processing (Volume 1: Long [46] S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen,
Papers), Association for Computational Linguis- S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin, T.
Mitics, Online, 2021, pp. 3226–3240. doi:10.18653/ haylov, M. Ott, S. Shleifer, K. Shuster, D. Simig,
v1/2021.acl-long.250. P. S. Koura, A. Sridhar, T. Wang, L. Zettlemoyer,
[39] N. Ocampo, E. Sviridova, E. Cabrio, S. Villata, An Opt: Open pre-trained transformer language
modin-depth analysis of implicit and subtle hate speech els, 2022. arXiv:2205.01068.
messages, in: Proceedings of the 17th Conference [47] C. Rafel, N. Shazeer, A. Roberts, K. Lee, S. Narang,
of the European Chapter of the Association for M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the
Computational Linguistics, Association for Com- limits of transfer learning with a unified
text-toputational Linguistics, Dubrovnik, Croatia, 2023, text transformer, Journal of Machine Learning
Repp. 1997–2013. URL: https://aclanthology.org/2023. search 21 (2020) 1–67. URL: http://jmlr.org/papers/
eacl-main.147. v21/20-074.html.
[40] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, [48] M. Ashida, M. Komachi, Towards automatic
genJ. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, eration of messages countering online hate speech
G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, and microaggressions, in: Proceedings of the Sixth
G. Krueger, T. Henighan, R. Child, A. Ramesh, Workshop on Online Abuse and Harms (WOAH),
D. M. Ziegler, J. Wu, C. Winter, C. Hesse, Association for Computational Linguistics, Seattle,
M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, Washington (Hybrid), 2022, pp. 11–23. URL: https:
J. Clark, C. Berner, S. McCandlish, A. Radford, //aclanthology.org/2022.woah-1.2. doi:10.18653/
I. Sutskever, D. Amodei, Language Models are v1/2022.woah-1.2.</p>
        <p>Few-Shot Learners, arXiv:2005.14165 [cs] (2020). [49] S. Iyer, X. V. Lin, R. Pasunuru, T. Mihaylov, D. Simig,
arXiv:2005.14165. P. Yu, K. Shuster, T. Wang, Q. Liu, P. S. Koura, et al.,
[41] C. J. Kennedy, G. Bacon, A. Sahn, C. von Vacano, Opt-iml: Scaling language model instruction meta
Constructing interval variables via faceted Rasch learning through the lens of generalization, 2022.
measurement and multitask deep learning: a hate arXiv:2212.12017.
speech application, 2020. URL: http://arxiv.org/ [50] B. Y. Lin, W. Zhou, M. Shen, P. Zhou, C.
Bhagavatabs/2009.10277. doi:10.48550/arXiv.2009.10277, ula, Y. Choi, X. Ren, CommonGen: A constrained
arXiv:2009.10277 [cs]. text generation challenge for generative
common[42] P. Sachdeva, R. Barreto, G. Bacon, A. Sahn, C. von sense reasoning, in: Findings of the Association
Vacano, C. Kennedy, The measuring hate speech for Computational Linguistics: EMNLP 2020,
Ascorpus: Leveraging rasch measurement theory for sociation for Computational Linguistics, Online,
data perspectivism, in: Proceedings of the 1st 2020, pp. 1823–1840. URL: https://aclanthology.org/
Workshop on Perspectivist Approaches to NLP 2020.findings-emnlp.165. doi: 10.18653/v1/2020.
@LREC2022, European Language Resources Asso- findings-emnlp.165.
ciation, Marseille, France, 2022, pp. 83–94. URL: [51] P. He, J. Gao, W. Chen, Debertav3:
Improvhttps://aclanthology.org/2022.nlperspectives-1.11. ing deberta using electra-style pre-training with
[43] U. Azam, H. Rizwan, A. Karim, Exploring data gradient-disentangled embedding sharing, 2023.
augmentation strategies for hate speech detection arXiv:2111.09543.
in Roman Urdu, in: Proceedings of the Thir- [52] J. Wei, K. Zou, EDA: Easy data augmentation
techteenth Language Resources and Evaluation Con- niques for boosting performance on text
classifiference, European Language Resources Associa- cation tasks, in: Proceedings of the 2019
Confertion, Marseille, France, 2022, pp. 4523–4531. URL: ence on Empirical Methods in Natural Language
https://aclanthology.org/2022.lrec-1.481. Processing and the 9th International Joint
Con[44] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, ference on Natural Language Processing
(EMNLPW. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma, IJCNLP), Association for Computational
LinguisA. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, tics, Hong Kong, China, 2019, pp. 6382–6388. URL:
A. Chowdhery, A. Castro-Ros, M. Pellat, K. Robin- https://aclanthology.org/D19-1670. doi:10.18653/</p>
      </sec>
      <sec id="sec-7-2">
        <title>Below are examples of the sequences and prompts used for training and prompting our models.</title>
        <sec id="sec-7-2-1">
          <title>FT-no target Write a (hateful) social media post: {text}</title>
        </sec>
        <sec>
          <title>FT-target Write a (hateful) social media post about {target}: {text}</title>
        </sec>
        <sec id="sec-7-2-2">
          <title>FS-target Write a (hateful) social media post about</title>
          <p>{target}: {text} [...]</p>
        </sec>
        <sec id="sec-7-2-3">
          <title>Write a (hateful) social media post about {target}: {text}</title>
        </sec>
        <sec id="sec-7-2-4">
          <title>FS-no target Write a (hateful) social media post: {text}</title>
          <p>[...]</p>
        </sec>
        <sec id="sec-7-2-5">
          <title>Write a (hateful) social media post: {text}</title>
          <p>The values used for ‘target’ are the identity group
names in the MHS dataset, reported in Sec. 3.</p>
        </sec>
      </sec>
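<p>As an illustration, the templates above can be instantiated with a few lines of Python; the dictionary and function names below are ours, introduced only for this sketch, and are not code from our experiments:</p>

```python
# Illustrative sketch: instantiating the prompt templates above.
# TEMPLATES and build_prompt are hypothetical names for this example.
TEMPLATES = {
    "FT-target":    "Write a (hateful) social media post about {target}: ",
    "FT-no-target": "Write a (hateful) social media post: ",
}

def build_prompt(setting: str, target: str = "") -> str:
    """Fill the selected template with an MHS identity group name."""
    # str.format ignores the unused 'target' kwarg for no-target templates.
    return TEMPLATES[setting].format(target=target)
```

<p>For example, <monospace>build_prompt("FT-target", "gender")</monospace> yields the FT-target prompt filled with the <italic>gender</italic> identity group.</p>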
    </sec>
    <sec id="sec-8">
      <title>B. Hyperparameters and Reproducibility</title>
      <sec id="sec-9-1">
        <p>For all of our experiments, we employ the HuggingFace
Python library. All the hyperparameters we use that are
not specified in this section are the default ones from
its TrainingArguments class. The classifiers we use
as baselines and for filtering are trained for 5 epochs.</p>
        <p>We finetune all generative models with batch size 16
and a learning rate of 1e-3. For generation, we set top-p = 0.9
and the min and max lengths of generated sequences to 5 and 150
tokens, respectively. Finally, we block the repetition of 4-grams.</p>
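<p>As a sketch, these decoding settings correspond to the following arguments of the HuggingFace <monospace>generate</monospace> API (the dictionary name is ours, introduced only for this example):</p>

```python
# Decoding configuration for all generative models (sketch);
# keys follow the HuggingFace generate() API.
GENERATION_KWARGS = dict(
    do_sample=True,          # sampling, so that top-p takes effect
    top_p=0.9,               # nucleus sampling threshold
    min_length=5,            # minimum sequence length, in tokens
    max_length=150,          # maximum sequence length, in tokens
    no_repeat_ngram_size=4,  # never repeat a 4-gram
)
# Usage: model.generate(**inputs, **GENERATION_KWARGS)
```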
        <p>All the classifiers trained on augmented data are
trained for 3 epochs with batch size 16 and a learning rate
of 5e-6. In this case, at the end of training, we keep the
model from the epoch with the lowest evaluation cross-entropy
loss.</p>
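<p>The corresponding HuggingFace TrainingArguments would look roughly as follows (a sketch: the output path is a placeholder, and unlisted arguments keep their defaults, as stated above):</p>

```python
from transformers import TrainingArguments

# Sketch of the classifier training setup; any argument not listed
# here keeps its HuggingFace default.
args = TrainingArguments(
    output_dir="clf-augmented",         # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-6,
    eval_strategy="epoch",              # "evaluation_strategy" in older versions
    save_strategy="epoch",
    load_best_model_at_end=True,        # keep the epoch with the
    metric_for_best_model="eval_loss",  # lowest evaluation loss
    greater_is_better=False,
)
```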
        <p>The random seeds we used for shuffling, subsampling
the gold data, and initializing both generative and
classification models are 522, 97, 709, 16, and 42. These were
chosen randomly. Finetuning all classifiers and
generative models, including baselines and models trained on
augmented data, took 70 hours, of which 55 were on an NVIDIA
V100 GPU and 15 on an NVIDIA A40. Generating all of the
sequences (a total of 8 million generated texts) took
∼400 hours.</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) did not use any generative AI tools or services.</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>