<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Italian Counter Narrative Generation to Fight Online Hate Speech</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yi-Ling Chung</string-name>
          <email>ychung@fbk.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serra Sinem Tekiroğlu</string-name>
          <email>tekiroglu@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Guerini</string-name>
          <email>guerini@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento</institution>
          ,
          <addr-line>Fondazione Bruno Kessler</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. Counter Narratives are textual responses meant to withstand online hatred and prevent its spreading. The use of neural architectures for the generation of Counter Narratives (CNs) is beginning to be investigated by the NLP community. So far, however, these efforts have targeted English only. In this paper, we try to fill the gap for Italian, studying how to implement CN generation approaches effectively. We experiment with an existing dataset of CNs and a novel language model, recently released for Italian, under several configurations, including zero- and few-shot learning. Results show that even for under-resourced languages, data augmentation strategies paired with large unsupervised LMs can yield promising results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Italiano. Le Contro Narrative sono
risposte testuali volte a contrastare l’odio
online e a prevenirne la diffusione. La
comunità di NLP ha iniziato a studiare l’uso
di architetture neurali per la generazione
di CN. Tuttavia, gli sforzi sono stati rivolti
esclusivamente all’inglese. In questo
lavoro, cerchiamo di colmare la lacuna per
l’italiano, mostrando come implementare
efficacemente approcci di generazione di
CN. Sperimentiamo con un dataset
esistente di CN e un modello del linguaggio
per l’italiano recentemente rilasciato, in
diverse configurazioni, tra cui zero e few
shot learning. I risultati mostrano che
anche per lingue con poche risorse,
strategie di data augmentation abbinate a
potenti modelli del linguaggio possono
offrire risultati promettenti.</p>
      <p>Copyright ©2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        The rise of online Hate Speech (HS) brings along
the need for combating strategies as it can
trigger harmful psychological effects on the target
groups and more crimes against them. While
research studies have been widely focusing on hate
speech detection methodologies for social media
platforms
        <xref ref-type="bibr" rid="ref10 ref13 ref30 ref35 ref6">(Schmidt and Wiegand, 2017; Fortuna
and Nunes, 2018)</xref>
        , a recent line of research has
taken the problem a step further by addressing
the automatic generation of counter responses, aka
counter narratives
        <xref ref-type="bibr" rid="ref12 ref14 ref19 ref20 ref23 ref25 ref27 ref28 ref29 ref33 ref4">(Qian et al., 2019; Tekiroğlu et
al., 2020)</xref>
        , in order to assist non-governmental
organizations in their real-world online hatred
combating efforts. An example of HS along with a
possible CN is shown below:
HS: Gli arabi sono tutti terroristi e vogliono
conquistarci con la violenza e le bombe. Bisogna
rispondere con il napalm. [Arabs are all
terrorists and they want to conquer us with
violence and bombs. We must respond with
napalm.]
CN: Essere di origine araba non significa essere
terroristi, evitiamo generalizzazioni che
portano solo ad altro odio. [Being of Arab
descent does not mean being a terrorist, let’s
avoid generalizations that only lead to more
hatred.]
      </p>
      <p>
        Despite the encouraging results of the counter
narrative generation task, experiments have been
limited to English due to the scarcity of hate
speech / counter narrative data in other languages.
In this paper, we investigate counter narrative
generation for Italian as a case study where zero or
only a small amount of task specific in-language
data is available. We first explore the portability
of generation across languages, considering that
recent neural machine translation (NMT) systems
have shown outstanding performances. We
propose utilizing off-the-shelf NMT models to
synthesize silver data from other languages, and
fine-tuning GePpeTto
        <xref ref-type="bibr" rid="ref12 ref19 ref20 ref33">(Mattei et al., 2020)</xref>
        , a recently
developed GPT-2 based language model for
Italian, on the silver data. We then examine the effect
of combining silver with gold data on CN
generation by experimenting with various gold data sizes.
Our findings show that a proper combination of
silver and gold data while fine-tuning LMs can
drastically reduce the need for expert-annotator
effort on target languages.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2 Related Work</title>
      <p>
        In this section we briefly recap relevant works
for our counter narrative generation task,
including the problem of online hatred recognition,
effectiveness of approaches to hatred intervention,
methodologies for generating counter-arguments,
and text generation for low-resourced languages.
Hate problem. A wealth of work has
investigated online hateful content, aiming at creating
datasets for hate speech identification
        <xref ref-type="bibr" rid="ref1 ref21 ref29 ref3 ref31 ref32 ref36">(Warner and
Hirschberg, 2012; Burnap and Williams, 2015;
Silva et al., 2016)</xref>
        . For instance, there are datasets
collected from Facebook
        <xref ref-type="bibr" rid="ref15 ref16 ref26 ref28 ref34 ref5 ref9">(Kumar et al., 2018)</xref>
        ,
forums
        <xref ref-type="bibr" rid="ref1 ref15 ref16 ref21 ref26 ref28 ref29 ref31 ref32 ref34 ref5 ref6 ref9">(Silva et al., 2016; de Gibert et al., 2018)</xref>
        ,
and Twitter
        <xref ref-type="bibr" rid="ref1 ref21 ref29 ref31 ref32 ref37">(Silva et al., 2016; Waseem and Hovy,
2016)</xref>
        . Hate speech detection tasks are available
at IberEval
        <xref ref-type="bibr" rid="ref15 ref16 ref26 ref28 ref34 ref5 ref9">(Fersini et al., 2018)</xref>
        for Spanish and
EVALITA
        <xref ref-type="bibr" rid="ref15 ref16 ref26 ref28 ref34 ref38 ref5 ref7 ref8 ref9">(Del Vigna et al., 2017; Fersini et al.,
2018)</xref>
        for Italian.
      </p>
      <p>
        Hate countering. Counter narratives can be
used as an effective approach to moderate hateful
content on social media platforms such as
Twitter
        <xref ref-type="bibr" rid="ref22 ref38 ref7 ref8">(Munger, 2017; Wright et al., 2017)</xref>
        , Youtube
        <xref ref-type="bibr" rid="ref14 ref23 ref25 ref27 ref28 ref29 ref38 ref4 ref7 ref8">(Ernst et al., 2017; Mathew et al., 2019)</xref>
        and
Facebook
        <xref ref-type="bibr" rid="ref37">(Schieb and Preuss, 2016)</xref>
        . Previous studies
on hate countering cover several aspects of CNs.
For example: defining counter narratives
        <xref ref-type="bibr" rid="ref1 ref21 ref29 ref31 ref32">(Benesch
et al., 2016)</xref>
        , studying their effectiveness
        <xref ref-type="bibr" rid="ref1 ref21 ref22 ref29 ref31 ref32 ref37 ref38 ref38 ref7 ref7 ref8 ref8">(Schieb
and Preuss, 2016; Silverman et al., 2016; Ernst
et al., 2017; Munger, 2017; Wright et al., 2017)</xref>
        ,
linguistically characterizing online counter
narrative accounts
        <xref ref-type="bibr" rid="ref15 ref16 ref26 ref28 ref34 ref5 ref9">(Mathew et al., 2018)</xref>
        , creating real
or simulated CN datasets
        <xref ref-type="bibr" rid="ref12 ref14 ref14 ref14 ref19 ref20 ref23 ref23 ref23 ref25 ref25 ref25 ref27 ref27 ref27 ref28 ref28 ref28 ref29 ref29 ref29 ref33 ref4 ref4 ref4">(Mathew et al., 2019;
Chung et al., 2019; Qian et al., 2019; Tekiroğlu
et al., 2020)</xref>
        , and neural approaches to CN
generation
        <xref ref-type="bibr" rid="ref12 ref14 ref19 ref20 ref23 ref25 ref27 ref28 ref29 ref33 ref4">(Qian et al., 2019; Tekiroğlu et al., 2020)</xref>
        .
Counter-argument Generation. This task
shares the same abstract goal as CN generation,
i.e. to produce the opposite or an alternate stance to
a statement. Previous works adopted
sequence-to-sequence architectures to generate arguments
        <xref ref-type="bibr" rid="ref14 ref14 ref15 ref15 ref16 ref16 ref23 ref23 ref25 ref25 ref26 ref26 ref27 ref27 ref28 ref28 ref28 ref28 ref29 ref29 ref34 ref34 ref4 ref4 ref5 ref5 ref9 ref9">(Rakshit et al., 2019; Hua et al., 2019; Rach et al.,
2018; Le et al., 2018)</xref>
        targeting specific domains
in which massive discussion is available, such as
politics
        <xref ref-type="bibr" rid="ref10 ref13 ref14 ref15 ref16 ref23 ref25 ref26 ref27 ref28 ref28 ref29 ref34 ref35 ref4 ref5 ref6 ref9">(Hua et al., 2019; Hua and Wang, 2018;
Le et al., 2018)</xref>
        , and economy
        <xref ref-type="bibr" rid="ref15 ref15 ref16 ref16 ref26 ref26 ref28 ref28 ref34 ref34 ref5 ref5 ref9 ref9">(Le et al., 2018;
Wachsmuth et al., 2018)</xref>
        .
      </p>
      <p>
        NLG for under-resourced languages. In spite
of several studies addressing NLG, only a few
have investigated the generation for languages
other than English. For instance, there is the
porting of SimpleNLG API
        <xref ref-type="bibr" rid="ref11">(Gatt and Reiter, 2009)</xref>
        to Dutch
        <xref ref-type="bibr" rid="ref10 ref13 ref35 ref5 ref6">(de Jong and Theune, 2018)</xref>
        and Italian
        <xref ref-type="bibr" rid="ref1 ref21 ref29 ref31 ref32">(Mazzei et al., 2016)</xref>
        , or bilingual generation by
combining NMT and Generative Adversarial
Networks
        <xref ref-type="bibr" rid="ref14 ref23 ref25 ref27 ref28 ref29 ref4">(Rashid et al., 2019)</xref>
        .
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 Italian Counter Narrative Generation</title>
      <p>Our main goal is to determine a methodology
for Italian counter narrative generation
considering the scarcity of gold standard data for training.
Accordingly, we hypothesize that the availability
of a decent amount of silver data can provide a
kick-start for the generative models. Therefore,
we resort to data augmentation through
translation with the help of the existing datasets of hate
speech / counter narrative pairs in other languages.
For the translation setting, we use DeepL<sup>1</sup>, an
off-the-shelf and well-performing MT system, to translate
data from other languages to Italian. The
translated pairs are used for fine-tuning a large
Italian pre-trained generative model, i.e. GePpeTto,
along with the original Italian gold standard pairs.</p>
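      <p>As a rough illustration of the silver-data step described above, the following Python sketch translates each HS/CN pair into Italian while keeping the two sides of the pair aligned. It is only a sketch under stated assumptions: the <monospace>translate</monospace> callable, the dictionary keys (<monospace>hs</monospace>, <monospace>cn</monospace>, <monospace>lang</monospace>, <monospace>origin</monospace>) and the <monospace>toy_translate</monospace> placeholder are hypothetical names introduced here for illustration, and no real MT service is called.</p>
      <preformat>
```python
from typing import Callable, Dict, List

Pair = Dict[str, str]


def make_silver_pairs(
    pairs: List[Pair],
    translate: Callable[[str, str], str],
) -> List[Pair]:
    """Translate both sides of every HS/CN pair, keeping them aligned.

    `translate(text, source_lang)` is a hypothetical stand-in for any
    off-the-shelf MT system that returns Italian text.
    """
    silver = []
    for pair in pairs:
        silver.append(
            {
                "hs": translate(pair["hs"], pair["lang"]),
                "cn": translate(pair["cn"], pair["lang"]),
                "lang": "it",
                # Keep track of provenance so silver and gold data stay separable.
                "origin": "silver-from-" + pair["lang"],
            }
        )
    return silver


def toy_translate(text: str, source_lang: str) -> str:
    # Placeholder only: a real system would return an Italian translation.
    return "[it from " + source_lang + "] " + text


english_pairs = [
    {"hs": "Example hateful message.", "cn": "Example counter narrative.", "lang": "en"}
]
silver = make_silver_pairs(english_pairs, toy_translate)
```
      </preformat>
      <p>Recording the origin of each pair is a design choice of this sketch, not something stated in the paper; it simply makes it easy to fine-tune on silver and gold data in separate phases.</p>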
    </sec>
    <sec id="sec-5">
      <title>4 Dataset</title>
      <p>
        For our study, we use the CONAN dataset
        <xref ref-type="bibr" rid="ref14 ref23 ref25 ref27 ref28 ref29 ref4">(Chung
et al., 2019)</xref>
        , which is a niche-sourced
hate-countering dataset that consists of HS/CN pairs
focusing on Islamophobia. The dataset provides
pairs in English, French, and Italian, collected
with the help of operators from three European
NGOs specialized in online hate countering. Each
pair in CONAN can either be an original or one of
the 2 paraphrases of an original pair. In the
experiments, we used the following splits:
1. 2142 pairs (original IT pairs and 1 IT
paraphrase pair) as a training set made of gold
standard data.
2. 5996 pairs as a training set made of silver
data obtained by automatically translating FR
and EN pairs to IT.
3. 1071 pairs (the rest of the IT paraphrased
pairs) kept for testing purposes.
      </p>
      <p><sup>1</sup> https://www.deepl.com/translator</p>
    </sec>
    <sec id="sec-6">
      <title>5 Models</title>
      <p>In order to inspect how Italian CN generation can
be accomplished under different resource
conditions, we test the effect of using (i) silver data, (ii)
gold standard data, and (iii) their combination. In
particular we experiment with the following
configurations on which GePpeTto is fine-tuned:
GP-trans. GePpeTto is fine-tuned on the
silver data obtained by translating EN and FR pairs
to IT using DeepL. This configuration represents
the worst case scenario, where no HS/CN pair is
available in the target language, and corresponds
to a zero-shot learning setting.</p>
      <p>GP-ita. We fine-tune GePpeTto on all the
original IT pairs in CONAN. This represents our
practical best-case scenario, although more
pairs might provide better results.</p>
      <p>GP-hybrid. We conjecture that introducing
even a small amount of gold standard examples
can help LMs adapt to domain-specific
idiosyncrasies. Moreover, we inspect how
generation performance varies with the size of the gold
standard data provided. To this end, we conduct
a second phase of fine-tuning on top of the
GP-trans model using 100, 300, 500, 800, and the full set of IT
pairs of CONAN. These settings represent
various intermediate conditions of few-shot learning,
where few to several pairs for the target language
are available. We can thus assess how much
pretraining with the silver data helps to reduce the
amount of gold standard data needed to reach a
proper generation performance.</p>
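      <p>The three configurations can be summarised as a small schedule of fine-tuning phases. The sketch below is illustrative only: the function name, the dictionary layout and run labels such as <monospace>GP-hybrid-100</monospace> are hypothetical, and <monospace>None</monospace> is used here to mean "all available pairs of that kind".</p>
      <preformat>
```python
from typing import Dict, List, Optional

# Gold subset sizes used for the second fine-tuning phase; None = full gold set.
GOLD_SIZES: List[Optional[int]] = [100, 300, 500, 800, None]


def build_configurations() -> List[Dict[str, object]]:
    """List every run as an ordered sequence of (data kind, number of pairs) phases."""
    configs: List[Dict[str, object]] = [
        # Zero-shot setting: silver (translated) data only.
        {"name": "GP-trans", "phases": [("silver", None)]},
        # Practical best case: gold Italian data only.
        {"name": "GP-ita", "phases": [("gold", None)]},
    ]
    for size in GOLD_SIZES:
        label = "full" if size is None else str(size)
        configs.append(
            {
                "name": "GP-hybrid-" + label,
                # Silver first, then a second phase on a gold subset.
                "phases": [("silver", None), ("gold", size)],
            }
        )
    return configs


for config in build_configurations():
    print(config["name"], config["phases"])
```
      </preformat>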
      <sec id="sec-6-1">
        <title>5.1 Training Details</title>
        <p>For all the experiments, we used
GePpeTto as the pretrained Italian
language model, adopted from HuggingFace’s
Transformers library<sup>2</sup>, and fine-tuned our models
on a single K80 GPU using a batch size of
2048 tokens. The training pairs are represented
as [HS start token] HS [CN start token]
CN [CN end token]. The hyperparameter
tuning details are provided in the following.
At test time, we employed nucleus sampling
with a p value of 0.9 for the generation of
CNs. Conditioned on HSs, the generated
sequence of text tagged with [CN start token]
CN [CN end token] is selected as output.</p>
        <p>Training epochs. We empirically chose 5
epochs of training for all the configurations, tuned
from {2, 3, 5} on the test set. Preliminary
experiments show that while a lower number of epochs
grants higher novelty in the output, it also comes
at the cost of lower BLEU scores. A further
manual evaluation confirmed that generation with
5 epochs provides more suitable responses.</p>
        <p>Learning rate. Once the number of epochs was fixed, we
experimented with learning rates of
{1, 2, 5}e-5 and chose 5e-5 as the best performing
setting: preliminary experiments show that, while
producing less novel and slightly more repeated
text, the learning rate of 5e-5 consistently yields
better results in terms of BLEU and ROUGE scores.</p>
        <p>Fine-tuning steps. In cases where multiple
datasets (silver and gold standard) were used, we
followed a multi-step fine-tuning procedure,
first using the silver and then the gold standard
dataset. Gururangan et al. (2020) showed that
task-adaptive pretraining on curated data
with a distribution similar to that of the
end task provides significant improvements. Our
fine-tuning schema follows this finding by first
fine-tuning GePpeTto with the silver data, which serves as
task-adaptive pretraining on an augmented
dataset. Our preliminary experiments confirmed
that adapting fine-tuned models towards the
language characteristics of the target corpus is more
effective than mixing silver and gold data together
in a single fine-tuning procedure.</p>
        <p><sup>2</sup> https://github.com/huggingface/transformers</p>
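        <p>To make the pair layout and the decoding step more concrete, the sketch below formats an HS/CN pair with start and end markers and applies nucleus (top-p) filtering to a toy next-token distribution. It is a minimal pure-Python illustration, not the training or decoding code used in the experiments: the marker strings are placeholders (the paper names only their roles), and the function names and the toy distribution are assumptions made for this example.</p>
        <preformat>
```python
import random
from typing import Dict, Optional

# Placeholder marker strings; the actual special tokens are not given in the text.
HS_START = "[HS start token]"
CN_START = "[CN start token]"
CN_END = "[CN end token]"


def format_training_pair(hs: str, cn: str) -> str:
    """Lay out one training example as: HS marker, HS, CN marker, CN, end marker."""
    return " ".join([HS_START, hs, CN_START, cn, CN_END])


def format_prompt(hs: str) -> str:
    """At test time the model is conditioned on the HS and the CN start marker."""
    return " ".join([HS_START, hs, CN_START])


def nucleus(probs: Dict[str, float], p: float = 0.9) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose total mass reaches p,
    then renormalise the kept probabilities so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    mass = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        mass += prob
        if mass >= p:
            break
    return {token: prob / mass for token, prob in kept}


def sample_next(
    probs: Dict[str, float],
    p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw one token from the nucleus-filtered distribution."""
    rng = rng or random.Random()
    filtered = nucleus(probs, p)
    tokens = list(filtered)
    weights = [filtered[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_distribution = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(format_training_pair("hate speech text", "counter narrative text"))
print(sorted(nucleus(toy_distribution, p=0.9)))
```
        </preformat>
        <p>With p set to 0.9, the low-probability tail of the distribution (token "d" in the toy example) is never sampled, which is the intended effect of nucleus sampling.</p>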
      </sec>
      <sec id="sec-6-2">
        <title>5.2 Evaluation</title>
        <p>
          For our experiments we report word-overlap
metrics BLEU
          <xref ref-type="bibr" rid="ref24">(Papineni et al., 2002)</xref>
          and ROUGE
          <xref ref-type="bibr" rid="ref17">(Lin, 2004)</xref>
          to evaluate the CN generation on the
gold standard test set. As for the generation
quality, we compute Repetition Rate
          <xref ref-type="bibr" rid="ref2">(Bertoldi et al.,
2013)</xref>
          and Novelty
          <xref ref-type="bibr" rid="ref10 ref13 ref35 ref6">(Wang and Wan, 2018)</xref>
          to
assess how Diverse a response is with reference to
the given HS and how Novel the generation is
concerning the training data.
        </p>
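        <p>For readers unfamiliar with the two quality measures, the following sketch shows one possible way to compute them. It rests on assumptions that go beyond the text: Novelty is taken to be one minus the highest Jaccard similarity between a generated CN and any training CN, and Repetition Rate is simplified to the geometric mean, over n-gram orders 1 to 4, of the share of distinct n-grams that occur more than once, without the sliding window used in the original formulation. Whitespace tokenisation and all function names are likewise illustrative.</p>
        <preformat>
```python
from collections import Counter
from typing import List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between the token sets of two texts."""
    set_a, set_b = set(a), set(b)
    union = set_a.union(set_b)
    if not union:
        return 0.0
    return len(set_a.intersection(set_b)) / len(union)


def novelty(generated: Sequence[str], training: List[Sequence[str]]) -> float:
    """1 minus the highest similarity to any training text (assumed definition)."""
    if not training:
        return 1.0
    return 1.0 - max(jaccard(generated, reference) for reference in training)


def repetition_rate(tokens: Sequence[str], max_n: int = 4) -> float:
    """Simplified repetition rate: geometric mean over n = 1..max_n of the
    fraction of distinct n-grams that appear more than once."""
    product = 1.0
    for n in range(1, max_n + 1):
        counts = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
        if not counts:
            return 0.0
        repeated = sum(1 for count in counts.values() if count > 1)
        product *= repeated / len(counts)
    return product ** (1.0 / max_n)


training_cns = ["evitiamo generalizzazioni che portano odio".split()]
generated_cn = "evitiamo generalizzazioni sui musulmani".split()
print(round(novelty(generated_cn, training_cns), 3))
print(repetition_rate(generated_cn))
```
        </preformat>
        <p>In this simplified form a text with no repeated word has a repetition rate of 0, while a text made of a single repeated word scores 1.</p>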
        <p>We also conduct a human evaluation to compare
the generation quality of the configurations based
on 3 criteria: (i) Suitableness. How suitable the
given CN is as a response for the input HS. (ii)
Specificity. How specific the given CN is as a
response. This metric is used to discern suitable
responses that are nonetheless very generic. (iii)
Grammaticality. How grammatically correct the
given CN is. All scores are on a scale from 1 to
5.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Results and Discussion</title>
      <p>Model comparison. Results in Table 1 show
that using the silver data (GP-trans) provides a
viable step towards a proper model. When gold
standard data is also available (GP-hybrid), we obtain
better quantitative performance in terms of BLEU
and ROUGE scores in comparison to the best case
scenario (GP-ita). Furthermore, combining the silver
translation and the Italian gold standard data
(GP-hybrid) also yields better performance in terms of
output diversity (RR 11.7 vs 12.8). Conversely,
the most novel output is obtained by
GP-trans, which can be expected since EN and FR
pairs usually have a slightly different focus on the
topic of Islamophobia (topics and tropes can vary
across nations and cultures). In Table 2 we provide
a few examples of generated CNs.</p>
      <sec id="sec-7-1">
        <title>Learning Curve Discussion</title>
        <p>
          As can be seen in Figure 1, even 100 Italian pairs are enough
to dramatically improve the performance of
GePpeTto on the task of CN generation over the
baseline GP-trans. If we continue fine-tuning
GP-trans with more and more Italian pairs, we
are soon able to outperform GP-ita as well. The number
of examples required to obtain a new state of the
art in Italian CN generation falls between 200 and
300, which reduces the required amount of gold
standard data by around 80%. Therefore, it
becomes clear that a good NMT model can be of
fundamental help while porting the generation task to
new languages, especially if few or no gold
standard examples are available in the target language.
Considering the fact that the counter narrative data
collection is an expert-based task requiring costly
human effort
          <xref ref-type="bibr" rid="ref14 ref23 ref25 ref27 ref28 ref29 ref4">(Chung et al., 2019)</xref>
          , decreasing the
required amount of expert data can be of
remarkable importance for low-resource languages.
        </p>
        <p>
Human Evaluation. As annotators, we
employed 2 native Italian speakers who are experts
in counter narrative production. The
annotators were instructed to assess CN suitableness,
specificity, and grammaticality with respect to the
paired hate speech. During training, we explained
what a common and suitable counter narrative is,
and then asked them to evaluate the
generation intuitively, without overthinking. We further
presented 20 examples of HS/CN pairs to
demonstrate the appropriate evaluation. In order to avoid
comparison or primacy/recency effects, we have
presented 20 random pairs from each condition
to each annotator as a single randomized file and
asked them to evaluate each counter narrative with
respect to the 3 criteria. The results presented in
Table 3 show that all models reach very high
levels of grammaticality; most of the sentences were
completely grammatical, and the few ungrammatical
ones were due to dangling sentences. Moreover,
using silver data alone already provides a
performance lower than but close to the GP-ita case
for Suitableness and Specificity. Finally,
fine-tuning GP-trans further using gold standard data
(GP-hybrid) provides the most suitable and the
least generic responses among the 3 models, in line
with their performance ranking on the automatic
metrics.
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7 Conclusion and Future Work</title>
      <p>Counter narrative generation using neural
architectures is beginning to be studied for hatred
intervention. In this paper, we presented the first
attempt of CN generation for Italian,
investigating several variations of generation when gold data
is limited or not available. Our experiments
reveal that simple data augmentation strategies
paired with powerful LMs can bring promising
results for under-resourced languages. As future
work we plan to explore methods for improving
specificity via the use of diverse types of
knowledge (e.g., facts, events and named entities) from
external resources. This would also mitigate the
problem of the model possibly generating
hallucinated content (i.e. non-existent or wrong content),
see for example
        <xref ref-type="bibr" rid="ref12 ref14 ref19 ref20 ref23 ref25 ref27 ref28 ref29 ref33 ref4">(Maynez et al., 2020; Nie et al.,
2019)</xref>
        . Finally, we plan to apply this approach to
other hate phenomena such as antisemitism,
homophobia, and misogyny.</p>
      <p>
        Table 2: Examples of generated CNs, with the gold CN for reference.
GP-trans: La maggior parte dei pedofili nel
Regno Unito sono uomini bianchi, quindi
secondo la vostra logica il mondo sarebbe un
posto migliore senza uomini bianchi? [Most
pedophiles in the UK are white men, so by your
logic would the world be a better place without
white men?]
GP-ita: Non esiste nessuna invasione islamica
dell’occidente. La popolazione musulmana in
Italia non supera il 5%. [There is no Islamic
invasion of the West. The Muslim population in
Italy does not exceed 5%.]
GP-hybrid: Non si può ritenere una invasione
quando il numero di musulmani in Italia è
inferiore addirittura al 5%. [It cannot be
considered an invasion when the number of Muslims
in Italy is less than even 5%.]
Gold: Salve, in realtà il numero di musulmani
in Italia è pari a poco meno il 5% della
popolazione totale: ritiene davvero che si possa
parlare di sostituzione? [Hello, in reality the
number of Muslims in Italy is equal to a little less
than 5% of the total population: do you really
think that we can talk about substitution?]
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Benesch et al.2016]
          <string-name>
            <given-names>Susan</given-names>
            <surname>Benesch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Ruths</surname>
          </string-name>
          , KP Dillon, HM Saleem, and
          <string-name>
            <given-names>L</given-names>
            <surname>Wright</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Considerations for successful counterspeech</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bertoldi et al.2013]
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Bertoldi</surname>
          </string-name>
          , Mauro Cettolo, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Federico</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Cache-based online adaptation for machine translation enhanced computer assisted translation</article-title>
          .
          <source>In MT-Summit</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Burnap and Williams2015]
          <string-name>
            <given-names>Pete</given-names>
            <surname>Burnap</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matthew L</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making</article-title>
          .
          <source>Policy &amp; Internet</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <fpage>223</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Chung et al.2019]
          <string-name>
            <given-names>Yi-Ling</given-names>
            <surname>Chung</surname>
          </string-name>
          , Elizaveta Kuzmenko, Serra Sinem Tekiroğlu, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Guerini</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Conan-counter narratives through nichesourcing: a multilingual dataset of responses to fight online hate speech</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>2819</fpage>
          -
          <lpage>2829</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [de Gibert et al.2018] Ona de Gibert, Naiara Perez, Aitor García-Pablos, and
          <string-name>
            <given-names>Montse</given-names>
            <surname>Cuadros</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Hate speech dataset from a white supremacy forum</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)</source>
          , pages
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [de Jong and Theune2018] Ruud de Jong and Mariët Theune.
          <year>2018</year>
          .
          <article-title>Going dutch: Creating simplenlg-nl</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Natural Language Generation</source>
          , pages
          <fpage>73</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Del Vigna et al.2017] Fabio Del Vigna,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cimino</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Marinella</given-names>
            <surname>Petrocchi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hate me, hate me not: Hate speech detection on facebook</article-title>
          .
          <source>In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17)</source>
          , pages
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Ernst et al.2017]
          <string-name>
            <given-names>Julian</given-names>
            <surname>Ernst</surname>
          </string-name>
          , Josephine B Schmitt, Diana Rieger, Ann Kristin Beier,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Vorderer</surname>
          </string-name>
          , Gary Bente, and
          <string-name>
            <given-names>Hans-Joachim</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hate beneath the counter speech? a qualitative content analysis of user comments on youtube related to counter speech videos</article-title>
          .
          <source>Journal for Deradicalization</source>
          , (
          <volume>10</volume>
          ):
          <fpage>1</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Fersini et al.2018]
          <string-name>
            <given-names>Elisabetta</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Debora Nozza, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the evalita 2018 task on automatic misogyny identification (ami)</article-title>
          .
          <source>EVALITA Evaluation of NLP and Speech Tools for Italian</source>
          ,
          <volume>12</volume>
          :
          <fpage>59</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Fortuna and Nunes2018] Paula Fortuna and Sérgio Nunes
          .
          <year>2018</year>
          .
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>51</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Gatt and Reiter2009] Albert Gatt and Ehud Reiter
          .
          <year>2009</year>
          .
          <article-title>Simplenlg: A realisation engine for practical applications</article-title>
          .
          <source>In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)</source>
          , pages
          <fpage>90</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Gururangan et al.2020]
          <string-name>
            <given-names>Suchin</given-names>
            <surname>Gururangan</surname>
          </string-name>
          , Ana Marasović,
          <string-name>
            <given-names>Swabha</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          , Kyle Lo, Iz Beltagy, Doug Downey, and Noah A Smith
          .
          <year>2020</year>
          .
          <article-title>Don't stop pretraining: Adapt language models to domains and tasks</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
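          [Hua and Wang2018]
          <string-name>
            <given-names>Xinyu</given-names>
            <surname>Hua</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lu</given-names>
            <surname>Wang</surname>
          </string-name>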
          .
          <year>2018</year>
          .
          <article-title>Neural argument generation augmented with externally retrieved evidence</article-title>
          . arXiv preprint arXiv:1805.10254.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Hua et al.2019]
          <string-name>
            <given-names>Xinyu</given-names>
            <surname>Hua</surname>
          </string-name>
          , Zhe Hu, and
          <string-name>
            <given-names>Lu</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Argument generation with retrieval, planning, and realization</article-title>
          . arXiv preprint arXiv:1906.03717.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Kumar et al.2018]
          <string-name>
            <given-names>Ritesh</given-names>
            <surname>Kumar</surname>
          </string-name>
          , Atul Kr Ojha, Shervin Malmasi, and
          <string-name>
            <given-names>Marcos</given-names>
            <surname>Zampieri</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Benchmarking aggression identification in social media</article-title>
          .
          <source>In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC2018)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Le et al.2018]
          <string-name>
            <given-names>Dieu-Thu</given-names>
            <surname>Le</surname>
          </string-name>
          , Cam Tu Nguyen, and Kim Anh Nguyen.
          <year>2018</year>
          .
          <article-title>Dave the debater: a retrieval-based and generative argumentative dialogue agent</article-title>
          .
          <source>In Proceedings of the 5th Workshop on Argument Mining</source>
          , pages
          <fpage>121</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Lin2004]
          <string-name>
            <given-names>Chin-Yew</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Rouge: A package for automatic evaluation of summaries</article-title>
          .
          <source>In Text summarization branches out</source>
          , pages
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Mathew et al.2019]
          <string-name>
            <given-names>Binny</given-names>
            <surname>Mathew</surname>
          </string-name>
          , Punyajoy Saha, Hardik Tharad, Subham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, and
          <string-name>
            <given-names>Animesh</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Thou shalt not hate: Countering online hate speech</article-title>
          .
          <source>In Proceedings of the International AAAI Conference on Web and Social Media</source>
          , volume
          <volume>13</volume>
          , pages
          <fpage>369</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Mattei et al.2020]
          <string-name>
            <given-names>Lorenzo</given-names>
            <surname>De Mattei</surname>
          </string-name>
          , Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Guerini</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Geppetto carves italian into a language model</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Maynez et al.2020]
          <string-name>
            <given-names>Joshua</given-names>
            <surname>Maynez</surname>
          </string-name>
          , Shashi Narayan, Bernd Bohnet, and
          <string-name>
            <given-names>Ryan</given-names>
            <surname>McDonald</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>On faithfulness and factuality in abstractive summarization</article-title>
          . arXiv preprint arXiv:2005.00661.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Mazzei et al.2016]
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Mazzei</surname>
          </string-name>
          , Cristina Battaglino, and
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Simplenlg-it: adapting simplenlg to italian</article-title>
          .
          <source>In Proceedings of the 9th international natural language generation conference</source>
          , pages
          <fpage>184</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Munger2017]
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Munger</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Tweetment effects on the tweeted: Experimentally reducing racist harassment</article-title>
          .
          <source>Political Behavior</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>629</fpage>
          -
          <lpage>649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Nie et al.2019]
          <string-name>
            <given-names>Feng</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jin-Ge</given-names>
            <surname>Yao</surname>
          </string-name>
          , Jinpeng Wang,
          <string-name>
            <given-names>Rong</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chin-Yew</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A simple recipe towards reducing hallucination in neural surface realisation</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>2673</fpage>
          -
          <lpage>2679</lpage>
          , Florence, Italy, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Papineni et al.2002]
          <string-name>
            <given-names>Kishore</given-names>
            <surname>Papineni</surname>
          </string-name>
          , Salim Roukos, Todd Ward, and
          <string-name>
            <given-names>Wei-Jing</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Bleu: a method for automatic evaluation of machine translation</article-title>
          .
          <source>In Proceedings of the 40th annual meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Qian et al.2019]
          <string-name>
            <given-names>Jing</given-names>
            <surname>Qian</surname>
          </string-name>
          , Anna Bethke, Yinyin Liu, Elizabeth Belding, and William Yang Wang.
          <year>2019</year>
          .
          <article-title>A benchmark dataset for learning to intervene in online hate speech</article-title>
          . arXiv preprint arXiv:1909.04251.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Rach et al.2018]
          <string-name>
            <given-names>Niklas</given-names>
            <surname>Rach</surname>
          </string-name>
          , Saskia Langhammer, Wolfgang Minker, and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Ultes</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Utilizing argument mining techniques for argumentative dialogue systems</article-title>
          .
          <source>In Proceedings of the 9th International Workshop On Spoken Dialogue Systems (IWSDS).</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Rakshit et al.2019]
          <string-name>
            <given-names>Geetanjali</given-names>
            <surname>Rakshit</surname>
          </string-name>
          , Kevin K Bowden, Lena Reed, Amita Misra, and
          <string-name>
            <given-names>Marilyn</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Debbie, the debate bot of the future</article-title>
          .
          <source>In Advanced Social Interaction with Agents</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>52</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Mathew et al.2018]
          <string-name>
            <given-names>Binny</given-names>
            <surname>Mathew</surname>
          </string-name>
          , Navish Kumar, Pawan Goyal,
          <string-name>
            <given-names>Animesh</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Analyzing the hate and counter speech accounts on twitter</article-title>
          . arXiv preprint arXiv:1812.02712.
        </mixed-citation>
        <mixed-citation>
          [Rashid et al.2019]
          <string-name>
            <given-names>Ahmad</given-names>
            <surname>Rashid</surname>
          </string-name>
          , Alan Do-Omri, Md Haidar, Qun Liu,
          <string-name>
            <given-names>Mehdi</given-names>
            <surname>Rezagholizadeh</surname>
          </string-name>
          , et al.
          <year>2019</year>
          .
          <article-title>Bilingual-gan: A step towards parallel text generation</article-title>
          . arXiv preprint arXiv:1904.04742.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Schieb and Preuss2016]
          <string-name>
            <given-names>Carla</given-names>
            <surname>Schieb</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mike</given-names>
            <surname>Preuss</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Governing hate speech by means of counterspeech on facebook</article-title>
          .
          <source>In 66th ica annual conference, at Fukuoka, Japan</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
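          [Schmidt and Wiegand2017]
          <string-name>
            <given-names>Anna</given-names>
            <surname>Schmidt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wiegand</surname>
          </string-name>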
          .
          <year>2017</year>
          .
          <article-title>A survey on hate speech detection using natural language processing</article-title>
          .
          <source>In Proceedings of the Fifth International workshop on natural language processing for social media</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Silva et al.2016]
          <string-name>
            <given-names>Leandro</given-names>
            <surname>Silva</surname>
          </string-name>
          , Mainack Mondal, Denzil Correa, Fabrício Benevenuto, and
          <string-name>
            <given-names>Ingmar</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Analyzing the targets of hate in online social media</article-title>
          .
          <source>arXiv preprint arXiv:1603.07709</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Silverman et al.2016]
          <string-name>
            <given-names>Tanya</given-names>
            <surname>Silverman</surname>
          </string-name>
          , Christopher J Stewart,
          Jonathan Birdwell, and
          <string-name>
            <given-names>Zahed</given-names>
            <surname>Amanullah</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The impact of counter-narratives</article-title>
          .
          <source>Institute for Strategic Dialogue</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [Tekiroğlu et al.2020]
          <string-name>
            <given-names>Serra Sinem</given-names>
            <surname>Tekiroğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yi-Ling</given-names>
            <surname>Chung</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Guerini</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Generating counter narratives against online hate speech: Data and strategies</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [Wachsmuth et al.2018]
          <string-name>
            <given-names>Henning</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          , Shahbaz Syed, and
          <string-name>
            <given-names>Benno</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Retrieval of the best counterargument without prior topic knowledge</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>241</fpage>
          -
          <lpage>251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
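          [Wang and Wan2018]
          <string-name>
            <given-names>Ke</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wan</surname>
          </string-name>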
          .
          <year>2018</year>
          .
          <article-title>Sentigan: Generating sentimental texts via mixture adversarial networks</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <fpage>4446</fpage>
          -
          <lpage>4452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
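          [Warner and Hirschberg2012]
          <string-name>
            <given-names>William</given-names>
            <surname>Warner</surname>
          </string-name>
          and
          <string-name>
            <given-names>Julia</given-names>
            <surname>Hirschberg</surname>
          </string-name>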
          .
          <year>2012</year>
          .
          <article-title>Detecting hate speech on the world wide web</article-title>
          .
          <source>In Proceedings of the second workshop on language in social media</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
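          [Waseem and Hovy2016]
          <string-name>
            <given-names>Zeerak</given-names>
            <surname>Waseem</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dirk</given-names>
            <surname>Hovy</surname>
          </string-name>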
          .
          <year>2016</year>
          .
          <article-title>Hateful symbols or hateful people? predictive features for hate speech detection on twitter</article-title>
          .
          <source>In Proceedings of the NAACL student research workshop</source>
          , pages
          <fpage>88</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [Wright et al.2017]
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Wright</surname>
          </string-name>
          , Derek Ruths, Kelly P Dillon, Haji Mohammad Saleem, and
          <string-name>
            <given-names>Susan</given-names>
            <surname>Benesch</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Vectors for counterspeech on twitter</article-title>
          .
          <source>In Proceedings of the First Workshop on Abusive Language Online</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>