<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Comparative Survey of German Hate Speech Datasets: Background, Characteristics and Biases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Bertram</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Schäfer</string-name>
          <email>johannes.schaefer@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Mandl</string-name>
          <email>mandl@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Hildesheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The large amount of hate speech and other offensive and objectionable content online poses a vast challenge to societies. Offensive language such as insulting, hurtful, derogatory, or obscene content directed from one person to another and open to others undermines objective discussions. Hate speech detection quality depends on the datasets available for training. Potential bias needs to be identified in order to increase the generalization performance of the trained classifiers. This article gives an overview of nine German hate speech datasets. We apply a framework from the literature to gain insights into potential bias. Using different methods, our analysis shows that the different datasets cover various topics. The results are shown and compared for LSI, topic models, Mutual Information and Shapley values.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate speech</kwd>
        <kwd>datasets</kwd>
        <kwd>reliability</kwd>
        <kwd>PMI</kwd>
        <kwd>LSI</kwd>
        <kwd>Shapley values</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Hate speech and its detection have received significant attention in recent years [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Increasingly,
governments are trying to police social media platforms and require the implementation of
automatic detection methods of illegal content such as hate speech and disinformation. A
notable example of this is the Digital Services Act adopted by the EU parliament in July 2022,
with the aim of "protection of users’ rights online" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This shows that hate speech and its
detection raise important ethical questions with regard to free speech, protecting users and
groups, and promoting social good. Hate speech is intended to harm individuals and
groups, which motivates private and government actors to discourage it.
      </p>
      <p>
        Because of the vast amount of online communication, which makes a manual review of all
communication infeasible, there exists a need to automatically filter or detect potential hate
speech [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Hate speech detection methods try to automatically predict how likely it is that a given piece
of online communication contains hate speech, and can thereby assist humans in filtering. To
properly train these systems, reliable datasets need to be available. So far, there has been little
work on the analysis of dataset quality and the comparison of datasets in natural
language processing (NLP) in general.
      </p>
      <p>
        With this paper we address this issue and present a survey of nine German hate speech
datasets. In the following sections we review related work (see Section 2) and outline the
comparison framework by Wich et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in Section 3 as the basis for our analysis. In Section 4 we
discuss the datasets included in our survey and present the results of the framework analysis in
Section 5. Finally, we conclude in Section 6.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Research in Hate Speech Detection</title>
      <p>
        The question of the quality of datasets has been approached from several angles, and there is
concern that current datasets do not lead to classifiers with a good level of generalization [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>
        A recent study analyzed six different English-language hate speech datasets with different
but related labels like hate speech, offensive, aggression, toxicity etc. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The authors visualized
how similar and compatible the classes are within and across the datasets and measured how
each class affects the performance of hate speech classifiers. They grouped semantically similar
classes and calculated centroids using pre-trained word embeddings for each class, which are
then used to calculate distances between them.
      </p>
      <p>
        Several other works explored hate speech datasets with regard to their biases and
characteristics, as well as their generalizability. A study by Nejadgholi and Kiritchenko [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] explored two
different types of bias in hate speech datasets and their effect on cross-dataset generalization:
topic bias and task formulation bias. The former is a type of selection bias and was identified
using keyword search; the authors showed that some topics generalize better than others.
The latter bias describes differences in the definitions of classes between the datasets. The
effect of this bias was estimated by training classifiers on different tasks. HateCheck for
English hate speech detection is a collection of 29 tests, each designed to offer insight into specific
weaknesses of hate speech classifiers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For each test, a manually annotated test set was
created. The authors showed that in their setting, models tend to focus on specific terms and
do not take the context into account. Lastly, Yin and Zubiaga [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] summarized the cross-dataset
performance of English hate speech detection models and provided reasons why models
fail to generalize. They argued that models fail to generalize because of differences in grammar and
vocabulary across hate speech datasets, too little labeled data, sampling bias, representation
bias, i.e. the failure of models to take the language of minority groups into account, as well as
models failing to detect implicit hate speech.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Bias and Comparison Framework</title>
      <p>
        For the analysis of German hate speech collections, we applied the framework introduced by
Wich et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that can be used to show the biases and characteristics of such datasets. This
bias framework can visualize the differences in the probability distributions between and within
hate speech datasets. It has previously been applied to English and Arabic hate speech datasets.
      </p>
      <sec id="sec-3-1">
        <title>3.1. LSI-based Similarity</title>
        <p>
          The first approach implemented is based on Latent Semantic Indexing (LSI), presented by
Deerwester et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], and is a way to visualize the intra-dataset similarity between classes. LSI
is a method to find a transformation that embeds documents in a lower-dimensional
latent space based on their semantic similarities. Specifically, for each dataset, all unique words
are extracted and a document-term matrix is created. Then, a Singular Value Decomposition
truncated to k singular values is applied. After that, the datasets are filtered by class and each
document is transformed into a bag-of-words vector, which is then projected into the latent
space.
        </p>
        <p>Lastly, the average cosine similarity between the documents of the same class and of the other classes
is computed. The average cosine similarity measures how similar each class is to
itself and to the other classes and is therefore a measure of intra-dataset similarity.</p>
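        <p>As a minimal sketch (our illustration with scikit-learn; the function name, vectorizer choice and parameters are assumptions, not the authors' published code), the computation can look as follows:</p>
        <preformat>
# LSI-based intra-dataset similarity: embed documents with a truncated
# SVD of the document-term matrix, then average cosine similarities
# within and between classes.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def lsi_class_similarity(docs, labels, k=16):
    X = CountVectorizer().fit_transform(docs)          # document-term matrix
    Z = TruncatedSVD(n_components=k).fit_transform(X)  # k singular values
    labels = np.asarray(labels)
    classes = sorted(set(labels))
    return {(a, b): cosine_similarity(Z[labels == a], Z[labels == b]).mean()
            for a in classes for b in classes}

# e.g. lsi_class_similarity(texts, classes)[("abusive", "neutral")]
        </preformat>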
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Word-embedding-based Similarity</title>
        <p>
          This approach focuses on visualizing the intra- and inter-dataset similarity using pre-trained
word embeddings. In contrast to Wich et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we use word embeddings produced by the
pre-trained gBERT model. Because gBERT does not embed whole sentences directly, the embedding
of the first [CLS] token is used as the sentence representation of a document [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The embedding vectors of each dataset are averaged into a centroid which acts as a representation
of the entire dataset. Then, Principal Component Analysis (PCA) is performed to project each
centroid into a two-dimensional vector space in order to visualize the inter-dataset similarity. In
addition to the suggestion in the framework [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we also visualize inter- and intra-dataset
similarity on a class level: we separate the documents into classes before averaging, so that
a centroid of each class of each dataset is obtained before performing PCA.
        </p>
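        <p>A sketch of this step, assuming the publicly available deepset/gbert-base checkpoint as the gBERT model (the helper function and the datasets variable below are our own, hypothetical scaffolding):</p>
        <preformat>
# [CLS] sentence embeddings with gBERT, dataset centroids, and PCA.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA

tokenizer = AutoTokenizer.from_pretrained("deepset/gbert-base")
model = AutoModel.from_pretrained("deepset/gbert-base")

def cls_embeddings(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    return out.last_hidden_state[:, 0, :]  # first ([CLS]) token per document

# datasets: list of lists of documents; one centroid per dataset
centroids = torch.stack([cls_embeddings(d).mean(dim=0) for d in datasets])
points_2d = PCA(n_components=2).fit_transform(centroids.numpy())
        </preformat>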
      </sec>
      <sec id="sec-3-3">
        <title>3.3. MI-based Word Rankings</title>
        <p>The third method is a ranking of the 10 most relevant terms of the hate speech classes for each
dataset. The framework used pointwise mutual information (PMI) to rank the most relevant
terms for each class.</p>
        <p>However, experiments showed that the datasets contain a significant number of terms that
only occur in one class. The PMI would then take the same maximal value for all of these words,
regardless of how often they occur. Therefore, Mutual Information (MI) is used instead, which is the
PMI weighted by the expectation of the joint word-class distribution, i.e. the relative word-class
frequency. MI is more useful because a word that occurs more often is more relevant to a
class than a word that does not.</p>
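        <p>A short sketch of this ranking (our illustrative implementation of the weighting described above; all names are assumptions):</p>
        <preformat>
# MI-based term ranking: PMI weighted by the relative word-class frequency.
import math
from collections import Counter

def mi_ranking(docs, labels, target_class, top_n=10):
    word_class, word, cls, total = Counter(), Counter(), Counter(), 0
    for doc, c in zip(docs, labels):
        for w in set(doc.lower().split()):  # word presence per document
            word_class[(w, c)] += 1
            word[w] += 1
            cls[c] += 1
            total += 1
    def mi(w):
        p_wc = word_class[(w, target_class)] / total  # joint probability
        p_w, p_c = word[w] / total, cls[target_class] / total
        return p_wc * math.log(p_wc / (p_w * p_c)) if p_wc else 0.0
    return sorted(word, key=mi, reverse=True)[:top_n]
        </preformat>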
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Cross-Dataset Topic Model</title>
        <p>
          This approach visualizes the most relevant topics of the hate speech datasets using CluWords
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. First, a sample from all hate speech documents of all datasets is taken. Then, a vocabulary
V of all unique terms in the sample is constructed. For each term in the vocabulary, a word
embedding vector is computed. Since this transformation is context-free, in contrast to gBERT,
German fastText word embeddings are used [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
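        <p>A sketch of the embedding step (assuming the pre-trained German vectors distributed on fasttext.cc; the variable sample stands for the document sample described above):</p>
        <preformat>
# Context-free term embeddings from pre-trained German fastText vectors.
import fasttext

ft = fasttext.load_model("cc.de.300.bin")  # pre-trained German model

vocabulary = sorted({w for doc in sample for w in doc.lower().split()})
embeddings = {w: ft.get_word_vector(w) for w in vocabulary}
        </preformat>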
        <p>
          Lastly, the topics and CluWords are projected into a two-dimensional vector space using
t-SNE, introduced by van der Maaten and Hinton [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], in order to visualize them.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Inter-rater Reliability</title>
        <p>
          The next approach focuses on the inter-rater reliability of the dataset annotators. Since not
all datasets provide labeling information, this can only be calculated for those that do. The
inter-rater reliability is calculated using Krippendorff's alpha [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
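        <p>A minimal sketch of this computation with the krippendorff Python package (an assumed library choice; the ratings matrix is illustrative):</p>
        <preformat>
# Krippendorff's alpha over an annotator-by-item ratings matrix.
import numpy as np
import krippendorff

ratings = np.array([[0, 1, 1, 0, np.nan],   # annotator 1
                    [0, 1, 0, 0, 1],        # annotator 2
                    [0, 1, 1, np.nan, 1]])  # annotator 3 (nan = missing)
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
        </preformat>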
      </sec>
      <sec id="sec-3-5">
        <title>3.6. SHAP Feature Importance</title>
        <p>
          The last approach is based on SHAP (SHapley Additive exPlanations) [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. It is a way to explain
the importance of features for different hate speech classifiers. For each hate speech classifier f,
an explanation model g is approximated. g uses simplified binary-valued inputs x′ that map to
the original inputs x through a mapping function h, i.e. x ≈ h(x′). g then tries to approximate
g(x′) ≈ f(h(x′)).
        </p>
        <p>SHAP learns the values of the factors φᵢ for each explanation model. Since g approximates
our hate speech classifier f, the higher the value φᵢ for a feature x′ᵢ, the more important this
feature is to the classifier. Instead of displaying the feature importance plot for a single example,
we calculate the global feature importance for each classifier using SHAP bar plots.</p>
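        <p>A sketch of the global importance computation with the shap library (the model path, label name and evaluation sample are hypothetical placeholders, not the authors' setup):</p>
        <preformat>
# Global SHAP feature importance for a transformers text classifier.
import shap
from transformers import pipeline

clf = pipeline("text-classification", model="path/to/finetuned-gbert",
               top_k=None)                      # hypothetical fine-tuned model
explainer = shap.Explainer(clf)                 # shap supports text pipelines
shap_values = explainer(eval_texts)             # shared evaluation sample
shap.plots.bar(shap_values[:, :, "ABUSIVE"].mean(0))  # global bar plot
        </preformat>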
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Hate Speech Data Collections</title>
      <p>
        This section presents the datasets included in our analysis. Each dataset consists of recent
real-world examples of German-language hate speech in online communication. Our goal
was to select the largest and most recent datasets available for this purpose. An overview
is given in Table 1. The datasets are described in detail in the following sections.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Overview of the analyzed datasets. Sample counts, abusive percentages and agreement values are those reported in the following subsections.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Dataset name</th><th>Source</th><th>Platform</th><th># of labeled samples</th><th># of unlabeled samples</th><th>abusive % of labeled data</th><th>Inter-rater agreement</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Covid2021</td><td>[<xref ref-type="bibr" rid="ref19">19</xref>]</td><td>Twitter</td><td>4,960</td><td>–</td><td>22.3%</td><td>α = 91.5%</td></tr>
            <tr><td>De-reddit-corpus</td><td>Unpub.</td><td>Reddit</td><td>–</td><td>2,992,835</td><td>–</td><td>–</td></tr>
            <tr><td>Germeval2018</td><td>[<xref ref-type="bibr" rid="ref20">20</xref>]</td><td>Twitter</td><td>8,541</td><td>–</td><td>33.8%</td><td>α = 78%</td></tr>
            <tr><td>Germeval2019</td><td>[<xref ref-type="bibr" rid="ref21">21</xref>]</td><td>Twitter</td><td>9,862</td><td>–</td><td>51.7%</td><td>κ = 59%</td></tr>
            <tr><td>Hasoc2019</td><td>[<xref ref-type="bibr" rid="ref22">22</xref>]</td><td>Facebook, Twitter</td><td>4,699</td><td>–</td><td>11.6%</td><td>κ = 88%</td></tr>
            <tr><td>Hasoc2020</td><td>[<xref ref-type="bibr" rid="ref23">23</xref>]</td><td>Twitter</td><td>3,400</td><td>–</td><td>28.6%</td><td>κ = 83.3%</td></tr>
            <tr><td>iHS</td><td>[<xref ref-type="bibr" rid="ref24">24</xref>]</td><td>Twitter</td><td>1,249</td><td>275,022</td><td>40.2%</td><td>κ = 44–55%</td></tr>
            <tr><td>IWG Hate. pub.</td><td>[<xref ref-type="bibr" rid="ref25">25</xref>]</td><td>Twitter</td><td>469</td><td>–</td><td>23.5%</td><td>α = 38.29%</td></tr>
            <tr><td>Telegram</td><td>[<xref ref-type="bibr" rid="ref26">26</xref>]</td><td>Telegram</td><td>1,149</td><td>5,421,845</td><td>15.8%</td><td>α = 73.87%</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-4-1">
        <title>4.1. Covid2021</title>
        <p>
          The first dataset contains German tweets collected from Twitter with COVID-19 as the topic,
published in 2021 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The tweets were sampled from an annotation pool that is composed in
equal parts of three other pools: a replies pool, a community pool and a topic pool.
      </p>
      <p>
        The replies pool was sampled from replies to posts published between 01.01.2020 and
20.02.2021 by three Twitter seed accounts that were identified as influential spreaders of
COVID-19 misinformation. Only tweets that contained one of 65 COVID-19-related
keywords were considered for this purpose. The community pool was then fed from tweets
sampled from the timelines of the accounts that replied to the seed accounts. The topic pool
was sampled from tweets related to COVID-19 and hate speech. Lastly, tweets were sampled
from the annotation pool and labeled by three annotators using a binary labeling scheme. A
tweet was labeled ABUSIVE if it contained attacks, threats, insults, harassment, hate or
degradation, and NEUTRAL otherwise. In total, 4,960 tweets were labeled, of
which 1,105 were classified as abusive and 3,855 as neutral. The Krippendorff's alpha is 91.5%.
      </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Germeval2018</title>
        <p>
          The GermEval Shared Task on the Identification of Offensive Language, in short Germeval2018 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], consists of German tweets collected from Twitter. Specifically, tweets were sampled from the
timelines of around 100 different users, each of whom was selected because they had posted both
offensive and non-offensive tweets. In total, 8,541 tweets were sampled and then manually
annotated by one of three annotators using two different labeling schemata, coarse-grained and
fine-grained.
      </p>
        <p>Coarse-grained is a binary classification scheme that labels a tweet as OFFENSE if it includes
abusive language, insults or profanity, and NEUTRAL if not. Because the tweets were sampled
around the time of the so-called refugee crisis in Germany, the dataset creators noticed that
certain non-offensive words had a high frequency in the documents labeled with the hate speech
class but did not appear in the non-hate speech class. Therefore, in order to debias the
dataset, they added further non-hate speech tweets containing these words. Lastly, they split
the dataset into a training and a test set such that the tweets sampled from each user only appear in
one of the sets. In total, 2,890 tweets were labeled as abusive and 5,651 as neutral. The dataset has a
Krippendorff's alpha of 78%.</p>
      </sec>
      <sec id="sec-4-1">
        <title>4.3. De-Reddit-corpus</title>
        <p>
          The De-Reddit-corpus was built by the authors of this paper and contains posts from the German /r/de
subreddit on the reddit.com website. In total, the corpus comprises 2,992,835 comments from
272,661 submissions created in 2019 or earlier. The comments were pseudo-labeled using a
CNN model with word embeddings described by Schäfer and Burtenshaw [<xref ref-type="bibr" rid="ref27">27</xref>]. The model was
trained on Germeval2018, where it achieved an F1-score of 73.35%. Each comment was assigned
a binary pseudo-label as well as the predicted label probability. While this dataset provides a
significant number of examples of the phenomenon and can be useful for analyses, we do not
recommend using it as a training dataset due to the lack of manual annotation supervision.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Germeval2019</title>
        <p>
          The GermEval Shared Task 2 on the Identification of Offensive Language from 2019, in short
Germeval2019 [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], also consists of a training and a test set. The training set of Germeval2019
consists primarily of tweets from the training and test sets of Germeval2018, as well as some
newly sampled tweets. The test set consists entirely of newly sampled tweets collected using the same
method as in Germeval2018. In addition, this time a specific effort was made to include
tweets from users across the whole political spectrum.
        </p>
        <p>
          The tweets were then manually annotated using the same labeling scheme as in Germeval2018.
In total, Germeval2019 consists of 9,862 labeled tweets, 5,103 of which are labeled as abusive
and the other 4,759 as neutral. The Cohen's kappa inter-rater reliability is κ = 59%.
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Hasoc2019</title>
        <p>
          The fifth dataset is Hate Speech and Offensive Content Identification in Indo-European Languages,
in short Hasoc2019 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. We use the German subset of this dataset. The posts included in
Hasoc2019 were sampled from Facebook and Twitter and manually labeled using
a binary as well as a fine-grained labeling scheme. The binary labeling scheme annotates a
post as either hate speech/offensive (HOF) or as non-hate speech (NOT). Here, hate speech is
defined as posts containing hate, offensive words, aggression or profanity. In total, 4,699 posts
were annotated, 543 of which were labeled as abusive and 4,126 as neutral, with a Cohen's kappa
inter-rater agreement of κ = 88%.
        </p>
      </sec>
      <sec id="sec-4-6">
        <title>4.6. Hasoc2020</title>
        <p>
          The sixth dataset is the German Hate Speech and Offensive Content Identification in Indo-European
Languages dataset from 2020, Hasoc2020 [<xref ref-type="bibr" rid="ref28">28</xref>]. It consists of tweets sampled from a collection of
tweets created in May 2019. First, non-German tweets were filtered out using the language attribute
metadata provided by Twitter. Then, the tweets were sampled using a Support Vector Machine
(SVM) hate speech classifier trained on Germeval2018 and Hasoc2019. The classifier was trained
in such a way that it achieves an F1-score of around 0.5. All tweets that the classifier labeled as
hateful were included in the sample; in addition, 5% of the tweets that the classifier did not
label as hateful were also included.
        </p>
        <p>
          For the binary hate speech classification task, tweets were labeled as HOF when they contained
hateful, offensive or profane content, and NOT otherwise. All tweets in the dataset were
labeled independently by two different annotators. In cases where the two annotators disagreed
on a label, a third annotator who had not yet seen the tweet assigned the final label. 3,400 tweets were
labeled in total, with 973 assigned to abusive and 2,427 to neutral. The Cohen's kappa inter-rater
agreement is κ = 83.3%.
        </p>
      </sec>
      <sec id="sec-4-7">
        <title>4.7. iHS</title>
        <p>
          iHS is an unpublished dataset of potentially illegal hate speech [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The creation of this dataset
consisted of two steps. First, court cases were collected in which German social media posts
had been found to violate certain laws associated with hate speech. Then, using these
posts as examples, 102 tweets that were deemed to potentially violate German law were manually
extracted from Germeval2019 in order to create manual annotation guidelines [<xref ref-type="bibr" rid="ref29">29</xref>].
        </p>
        <p>Lastly, text posts from Twitter were annotated using these guidelines in several annotation
rounds. Tweets were assigned to one of six categories: public incitement to commit offences,
incitement of masses, malicious gossip and defamation, insults, offensive language and other. The
offensive language category contains 214 tweets that are not illegal but are still deemed hateful.
The remaining 747 tweets are labeled as other. The Fleiss' kappa inter-rater agreement ranged
between κ = 44% and κ = 55%.</p>
        <p>For the purpose of binary hate speech classification, the first five categories are considered
abusive and the other category neutral. In addition to the 1,249 labeled tweets, 275,022 unlabeled
tweets were also available.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.8. IWG Hatespeech public</title>
        <p>IWG Hatespeech public [<xref ref-type="bibr" rid="ref25">25</xref>] contains German hate speech in the context of the refugee crisis
in Europe. This dataset consists of tweets written in 2016 that were sampled from Twitter
using keyword search with 10 different hashtags that were considered likely to contain a
disproportionate amount of hate speech. After filtering, the tweets were manually annotated
by splitting the dataset into six parts, each of which was annotated by two of six annotators.
They used a binary labeling scheme that labels a tweet as hate speech if it violates the Twitter
definition of hateful conduct, and as neutral if not. In total, the dataset contains 469 tweets, 110
abusive and 359 neutral. The Krippendorff's alpha is 38.29%.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.9. Telegram</title>
        <p>The last dataset analyzed is referred to as Telegram [<xref ref-type="bibr" rid="ref26">26</xref>]. It contains messages from
German Telegram channels that were posted between 01.01.2019 and 15.03.2021.</p>
        <p>
          The authors used a snowball sampling strategy. Specifically, they first collected messages from
51 public seed channels known to spread hate. Using these as starting points, they then collected
all messages from channels that were mentioned by the seed channels or whose messages were
forwarded to them. This procedure was repeated once more for the newly acquired messages. To filter
out languages other than German, the message texts were fed into a classifier using multilingual
word vectors from fastText [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The messages were manually labeled as abusive or neutral
by five annotators using the same labeling scheme as the Covid2021 dataset [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. In total, 1,149
messages were labeled, of which 181 were abusive and 968 neutral. The Krippendorff's alpha
is 73.87%. In addition, the unlabeled dataset was also made available to us. It consists of 5,421,845
Telegram messages that were pseudo-labeled by a hate speech classifier [<xref ref-type="bibr" rid="ref26">26</xref>].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment Results</title>
      <p>
        We apply the framework by Wich et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] as described in Section 3 to the datasets discussed
in the previous section. In this section, we present and discuss the results of this analysis. We
provide the code used for this research on GitHub
(https://github.com/MarkusBertram/Cross-Dataset-Generalization-of-German-Hate-Speech-Datasets).
      </p>
      <sec id="sec-5-1">
        <title>5.1. LSI-based Similarity</title>
        <p>We now examine the results of the LSI-based intra-dataset similarity experiment. Table 2 shows
the average cosine similarity values between the binary hate speech classes (abusive → abusive,
abusive → neutral, neutral → abusive, neutral → neutral) within each of the nine hate speech
datasets using 16 LSI dimensions. The rows of the table correspond to the respective datasets, the
columns to the direction of the LSI similarity between the classes. The experiment was repeated
for different numbers of LSI dimensions with no significant changes.</p>
        <p>In general, the differences between the classes of each dataset seem rather small. In
Germeval2018, Hasoc2019, Hasoc2020, Covid2021 and Telegram, the neutral class is more similar
to itself than to the other class. In Germeval2019, De-reddit-corpus, iHS, and IWG Hatespeech public,
the hate speech class is most similar to itself.</p>
        <p>The low absolute and relative differences between the classes can be interpreted as indicating
high intra-dataset similarity, i.e. a small difference in the marginal distributions of the covariates
of each class within a dataset. A classifier is therefore less likely to simply memorize class-specific
phrases or words.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Word-embedding-based Similarity</title>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. MI-based Word Rankings</title>
        <p>The Mutual Information-based word rankings for both the abusive and the neutral class in each
dataset show which terms can be considered most relevant for each class. The rankings for the
hate speech class are displayed in Table 3 in descending order. Unsurprisingly, the majority of
datasets rank several different terms that indicate an insult or profanity highly, e.g. idiot, dumm,
scheiß, abschaum, schwein, ferkel, hure, hurensohn and nutte. The latter three terms, together
with the term frau, clearly show that misogyny is also a focus in these hate speech datasets. Terms
often used in a racist manner can also be found, like flüchtling or islam.</p>
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption>
            <p>MI-based word rankings for the hate speech class.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Dataset</th><th>Top-ranked terms</th></tr>
            </thead>
            <tbody>
              <tr><td>Covid2021</td><td>corona, dumm, merkel, mensch, virus, geben, glauben, anderer, idiot, einfach</td></tr>
              <tr><td>De-reddit-corpus</td><td>einfach, geben, halt, anderer, sehen, leute, sagen, mensch, finden, eigentlich</td></tr>
              <tr><td>Germeval2018</td><td>merkel, frau, deutsch, deutschland, dumm, geben, grüne, sehen, deutsche, land</td></tr>
              <tr><td>Germeval2019</td><td>merkel, frau, deutschland, deutsch, dumm, sehen, land, geben, spd, deutsche</td></tr>
              <tr><td>Hasoc2019</td><td>alias, loch, deutschland, papa, merkel, capitol, land, frau, sagen, sehen</td></tr>
              <tr><td>Hasoc2020</td><td>arsch, hurensohn, scheiß, porno, dumm, deutsch, gratis, frau, ficken, halt</td></tr>
              <tr><td>iHS</td><td>fuck, arsch, scheiße, ficken, nutte, dumm, idiot, abschaum, hure, einfach</td></tr>
              <tr><td>Telegram</td><td>flüchtling, kind, frau, absagen, vergewaltigen, finden, schwimmbad, menschenwürde, verstoß, sexuell</td></tr>
              <tr><td>IWG Hatespeech public</td><td>kind, geben, volk, mensch, deutsch, deutschland, anderer, bringen, krank, sehen</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We can also conclude that there is a clear temporal shift that one should consider when
generalizing across datasets. Terms like merkel will clearly be more popular in some time periods
than in others.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Cross-Dataset Topic Model</title>
        <p>20 topic clusters were calculated using CluWords. The topics and the two-dimensional projection
of each document can be seen in Figure 3.</p>
        <p>Most topics do not appear to be relevant to hate speech, with the exception of three topics
in the lower right half of the plot. Topic T4 (terroristen, faschisten, moslems, etc.), topic T6
(feministen, terrorgruppen) as well as topic T15 (inhaftierung, abschieberaten, etc.) can be
attributed to hate speech. In addition, there is no clear clustering of datasets to specific topics.
This indicates that the combined hate speech datasets cover several different topics and show no
obvious bias.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Feature Importance using Shapley Values</title>
        <p>This analysis investigates the most important features of the classifiers trained on each dataset,
in descending order. We show the results for Germeval2019 as an example in Figure 4. The y-axis
contains the most important sub-tokens while the x-axis displays the global importance of each
feature. For all classifiers, the feature importance was measured on the same dataset, which was
sampled from all hate speech datasets combined.</p>
        <p>The results show that several classifiers give high weights to the same features. For example,
the word Sklaven is important for the classifiers trained on Covid2021, Germeval2018, Hasoc2019
and Hasoc2020. Another example is the word Schweine, which is important in Hasoc2020, iHS
and Telegram. However, the majority of the features that the classifiers consider are ranked and
weighted differently. This shows that, given the same sub-token, each classifier may come to a
different conclusion regarding the predicted hate speech label.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        In this work, we present a survey of German hate speech datasets and apply the framework
suggested by Wich et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to compare their contents. This analysis sheds some light on the
datasets and on how they are similar in some regards but diverse in topics. Our analysis shows
that the contained topics are quite heterogeneous and that cross-dataset classification would
therefore be rather difficult. Although the analysis helps to better understand the datasets, it alone
cannot determine how good the datasets are or how they can lead to better generalizability. Further
experiments on the performance of classifiers across German datasets are necessary [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>Further new directions in hate speech detection include the creation of datasets for
low-resource languages (e.g. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]), the analysis of context [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], the generation of counter speech [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] and the design of interfaces for diverse user groups of such AI systems [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Steiger</surname>
          </string-name>
          , Digitale Hate Speech: Interdisziplinäre Perspektiven auf Erkennung,
          <source>Beschreibung und Regulation</source>
          , Springer Nature,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>662</fpage>
          -65964-9.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Di Fátima</surname>
          </string-name>
          , Hate Speech on Social Media:
          <string-name>
            <given-names>A Global</given-names>
            <surname>Approach„ LabCom - Comunicação e</surname>
          </string-name>
          Artes - Universidade
          <source>da Beira Interior</source>
          ,
          <year>2023</year>
          . URL: https://labcomca.ubi.pt/ hate
          <article-title>-speech-on-social-media-a-global-approach/.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>European</given-names>
            <surname>Commission</surname>
          </string-name>
          ,
          <source>Laws on digital services and markets: European Commission welcomes yes from the European Parliament</source>
          ,
          <year>2022</year>
          . URL: https://ec.europa.eu/commission/ presscorner/detail/en/ip_22_
          <fpage>4313</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Tracking hate in social media: Evaluation, challenges and approaches</article-title>
          ,
          <source>SN Computer Science</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi:
          <volume>10</volume>
          .1007/ s42979-020-0082-0.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kuwatly</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Groh, Bias and comparison framework for abusive language datasets</article-title>
          ,
          <source>AI and Ethics</source>
          <volume>2</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi:
          <volume>10</volume>
          .1007/s43681-021-00081-0.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vidgen</surname>
          </string-name>
          , L. Derczynski,
          <article-title>Directions in abusive language training data, a systematic review: Garbage in, garbage out</article-title>
          ,
          <source>Plos one 15</source>
          (
          <year>2020</year>
          )
          <article-title>e0243300</article-title>
          . doi:
          <volume>10</volume>
          .1371/journal. pone.
          <volume>0243300</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <article-title>KI-Verfahren für die Hate Speech Erkennung: Die Gestaltung von Ressourcen für das maschinelle Lernen und ihre Zuverlässigkeit</article-title>
          , in: Digitale Hate Speech, Springer Berlin Heidelberg,
          <year>2023</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>130</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>662</fpage>
          -65964-
          <issue>9</issue>
          _
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soler</surname>
          </string-name>
          , L. Wanner,
          <article-title>Toxic, hateful, ofensive or abusive? what are we really classifying? an empirical analysis of hate speech datasets</article-title>
          ,
          <source>in: Twelfth Language Resources and Evaluation Conference</source>
          , ELRA, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>6786</fpage>
          -
          <lpage>6794</lpage>
          . URL: https: //aclanthology.org/
          <year>2020</year>
          .lrec-
          <volume>1</volume>
          .
          <fpage>838</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Nejadgholi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          ,
          <article-title>On cross-dataset generalization in automatic detection of online abuse</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/
          <year>2010</year>
          .07414. doi:
          <volume>10</volume>
          .48550/ARXIV.
          <year>2010</year>
          .
          <volume>07414</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Röttger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vidgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Waseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Margetts</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Pierrehumbert,</surname>
          </string-name>
          <article-title>HateCheck: Functional Tests for Hate Speech Detection Models, in: 59th Annual Meeting of the Association for Computational Linguistics and the 11th</article-title>
          <source>International Joint Conference on Natural Language Processing</source>
          ,
          <string-name>
            <surname>ACL</surname>
          </string-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>58</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .
          <article-title>acl-long.4</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Towards generalisable hate speech detection: a review on obstacles and solutions, 2021</article-title>
          . URL: https://arxiv.org/abs/2102.08886. doi:
          <volume>10</volume>
          .48550/ARXIV.2102. 08886.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Harshman</surname>
          </string-name>
          ,
          <article-title>Indexing by latent semantic analysis</article-title>
          ,
          <source>Journal of the American Society for Information Science</source>
          <volume>41</volume>
          (
          <year>1990</year>
          )
          <fpage>391</fpage>
          -
          <lpage>407</lpage>
          . doi:https://doi.org/10.1002/(SICI)
          <fpage>1097</fpage>
          -
          <lpage>4571</lpage>
          (
          <issue>199009</issue>
          )41:
          <fpage>6</fpage>
          &lt;
          <fpage>391</fpage>
          :
          <article-title>: AID-ASI1&gt;3.0</article-title>
          .CO;
          <fpage>2</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/ N19-1423. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>N19</fpage>
          -1423.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Viegas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ribas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          , Cluwords:
          <article-title>Exploiting semantic word clustering representation for enhanced topic modeling</article-title>
          ,
          <source>in: Twelfth ACM International Conference on Web Search and Data Mining, WSDM '19</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>753</fpage>
          -
          <lpage>761</lpage>
          . doi:
          <volume>10</volume>
          .1145/3289600.3291032.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. van der</given-names>
            <surname>Maaten</surname>
          </string-name>
          , G. Hinton,
          <article-title>Visualizing Data using t-SNE</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          . URL: http://jmlr.org/papers/v9/vandermaaten08a.html.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorf</surname>
          </string-name>
          ,
          <source>Computing Krippendorf's Alpha-Reliability</source>
          ,
          <year>2011</year>
          . URL: https://repository. upenn.edu/asc_papers/43.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ). URL: https://proceedings.neurips.cc/ paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Räther</surname>
          </string-name>
          , G. Groh,
          <article-title>German abusive language dataset with focus on COVID19</article-title>
          ,
          <source>in: Proceedings of the 17th Conference on Natural Language Processing (KONVENS</source>
          <year>2021</year>
          ),
          <article-title>KONVENS 2021 Organizers</article-title>
          , Düsseldorf, Germany,
          <year>2021</year>
          , pp.
          <fpage>247</fpage>
          -
          <lpage>252</lpage>
          . URL: https: //aclanthology.org/
          <year>2021</year>
          .konvens-
          <volume>1</volume>
          .
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruppenhofer</surname>
          </string-name>
          ,
          <article-title>Overview of the GermEval 2018 shared task on the identification of ofensive language</article-title>
          ,
          <source>in: Proceedings of the GermEval 2018 Workshop, 14th Conference on Natural Language Processing KONVENS</source>
          <year>2018</year>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>J. M. Struß</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Klenner, Overview of GermEval Task 2, 2019 shared task on the identification of ofensive language</article-title>
          ,
          <source>in: 15th Conference on Natural Language Processing (KONVENS)</source>
          ,
          <year>Oct</year>
          .
          <volume>9</volume>
          -
          <fpage>11</fpage>
          ,
          <year>2019</year>
          , Erlangen-Nürnberg,
          <article-title>German Society for Computational Linguistics</article-title>
          &amp; Language
          <string-name>
            <surname>Technology</surname>
          </string-name>
          , München [u.a.],
          <year>2019</year>
          , pp.
          <fpage>352</fpage>
          -
          <lpage>363</lpage>
          . URL: https://nbn-resolving.org/urn:nbn:de:bsz:
          <fpage>mh39</fpage>
          -
          <lpage>93197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Ofensive Content Identification in Indo-European Languages), in: Working Notes of the Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE, CEUR-</article-title>
          <string-name>
            <surname>WS</surname>
          </string-name>
          ,
          <year>2019</year>
          . URL: http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2517</volume>
          /
          <fpage>T3</fpage>
          -1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Kumar</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Track at FIRE 2020: Hate Speech and Ofensive Language Identification in Tamil, Malayalam, Hindi, English and German, in: Forum for Information Retrieval Evaluation</article-title>
          ,
          <string-name>
            <surname>FIRE</surname>
          </string-name>
          <year>2020</year>
          , ACM, New York, NY, USA,
          <year>2020</year>
          , p.
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          . doi:
          <volume>10</volume>
          .1145/3441501.3441517.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Boguslu</surname>
          </string-name>
          ,
          <article-title>Towards annotating illegal hate speech: A computational linguistic</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>