<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Comparative Survey of German Hate Speech Datasets: Background, Characteristics and Biases</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Markus</forename><surname>Bertram</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Johannes</forename><surname>Schäfer</surname></persName>
							<email>johannes.schaefer@uni-hildesheim.de</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Mandl</surname></persName>
							<email>mandl@uni-hildesheim.de</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution" key="instit1">LWDA</orgName>
								<orgName type="institution" key="instit2">Lernen, Wissen</orgName>
								<address>
									<addrLine>Daten, Analysen. October 9-11</addrLine>
									<postCode>2023</postCode>
									<settlement>Marburg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Comparative Survey of German Hate Speech Datasets: Background, Characteristics and Biases</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">282D02E2CB9D15045FDCD8EFF1815744</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Hate speech</term>
					<term>datasets</term>
					<term>reliability</term>
					<term>PMI</term>
					<term>LSI</term>
					<term>Shapley values</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The large fraction of hate speech and other offensive and objectionable content online poses a vast challenge to societies. Offensive language such as insulting, hurtful, derogatory, or obscene content directed from one person to another and open to others undermines objective discussions. Hate speech detection quality depends on the datasets available for training. Potential bias needs to be identified in order to increase the generalization performance of the trained classifiers. This article gives an overview on nine German hate speech datasets. We apply a framework from the literature to gain insights into potential bias. Using different methods, our analysis shows that there are various topics in the different datasets. The results are shown and compared for LSI, Topic models, Mutual Information and Shapley values.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Hate speech and its detection have received significant attention in recent years <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Increasingly, governments are trying to police social media platforms and require the implementation of automatic methods for detecting illegal content such as hate speech and disinformation. A notable example of this is the Digital Services Act adopted by the EU parliament in July 2022, with the aim of "protection of users' rights online" <ref type="bibr" target="#b2">[3]</ref>. This shows that hate speech and its detection raise important ethical questions with regard to free speech, protecting users and groups, and promoting social good. Hate speech is intended to harm individuals and groups, which motivates private and government actors to discourage it.</p><p>Because of the vast amount of online communication, which makes manual review of all content infeasible, there is a need to automatically filter or detect potential hate speech <ref type="bibr" target="#b3">[4]</ref>. Hate speech detection methods try to automatically predict how likely it is that a given piece of online communication contains hate speech, and can thereby assist humans in filtering. To properly train these systems, reliable datasets need to be available. So far, there has been little work on analyzing the quality of datasets and comparing datasets in natural language processing (NLP) in general.</p><p>With this paper we address this issue and present a survey of nine German hate speech datasets. In the following sections we review related work (see Section 2) and outline the comparison framework by Wich et al. <ref type="bibr" target="#b4">[5]</ref> in Section 3 as the basis for our analysis. In Section 4 we discuss the datasets included in our survey and present the results of the framework analysis in Section 5. 
Finally, we conclude in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Research in Hate Speech Detection</head><p>The question of the quality of databases has been approached from several angles, and there is concern that current datasets do not lead to classifiers with a good level of generalization <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>.</p><p>A recent study analyzed six different English language hate speech datasets with different but related labels such as hate speech, offensive, aggression, and toxicity <ref type="bibr" target="#b7">[8]</ref>. The authors visualized how similar and compatible classes are within and across the datasets and measured how much each class affects the performance of hate speech classifiers. They grouped semantically similar classes and calculated centroids using pre-trained word embeddings for each class, which are then used to calculate distances between them.</p><p>Several other works explored hate speech datasets with regard to their biases and characteristics, as well as their generalizability. A study by Nejadgholi and Kiritchenko <ref type="bibr" target="#b8">[9]</ref> explored two different types of bias in hate speech datasets and their effect on cross-dataset generalization: topic bias and task formulation bias. The former is a type of selection bias and was identified using keyword search. The authors showed that some topics are more generalizable than others. The latter bias describes differences in the definitions of classes between the datasets. The effect of this bias was estimated by training classifiers on different tasks. HATECHECK is a collection of 29 tests for English hate speech classifiers, each designed to offer insight into a specific weakness of such classifiers <ref type="bibr" target="#b9">[10]</ref>. For each test, a manually annotated test set was created. The authors showed that in their setting, models tend to focus on specific terms and do not take the context into account. 
Lastly, Yin and Zubiaga <ref type="bibr" target="#b10">[11]</ref> summarized the cross-dataset performance of English hate speech detection models and provided reasons why models fail to generalize. They argued that models fail to generalize because of differing grammar and vocabulary across hate speech datasets, too little labeled data, sampling bias, representation biases, i.e. the failure of models to take the language of minority groups into account, as well as models failing to detect implicit hate speech.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Bias and Comparison Framework</head><p>For the analysis of German hate speech collections, we applied the framework introduced by Wich et al. <ref type="bibr" target="#b4">[5]</ref> that can be used to reveal the biases and characteristics of such datasets. This bias framework can visualize the differences in the probability distributions between and within hate speech datasets. It has previously been applied to English and Arabic hate speech datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">LSI-based Similarity</head><p>The first approach implemented is based on Latent Semantic Indexing (LSI), presented by Deerwester et al. <ref type="bibr" target="#b11">[12]</ref>, and is a way to visualize the intra-dataset similarity between classes. LSI is a method to find a transformation 𝒳 → 𝒵 that embeds documents in a lower-dimensional latent space based on their semantic similarities. Specifically, for each dataset, all unique words are extracted and a document-term matrix is created. Then, Singular-Value Decomposition with 𝑘 singular values is applied. The datasets are then filtered by class and each document is transformed into a bag-of-words vector, which is in turn mapped into the latent space.</p><p>Lastly, the average cosine-similarity between the documents of the same class and those of the other classes is computed. The average cosine-similarity measures how similar each class is to itself and to the other classes and is therefore a measure of intra-dataset similarity.</p></div>
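As a sketch, the LSI similarity computation described above can be condensed as follows. The toy corpus, the raw term counts, the small choice of 𝑘, and the inclusion of self-pairs in the within-class average are simplifications for illustration, not the exact setup of our experiments.

```python
import numpy as np

def lsi_class_similarities(docs, labels, k=2):
    # Build the document-term matrix over all unique words.
    vocab = sorted({w for d in docs for w in d})
    col = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d:
            X[r, col[w]] += 1
    # Truncated SVD: keep the k largest singular values.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:k].T  # documents in the k-dimensional latent space
    Z /= np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12
    # Average cosine similarity for every ordered pair of classes
    # (within-class averages include self-pairs; fine for a sketch).
    sims = {}
    for a in sorted(set(labels)):
        for b in sorted(set(labels)):
            A = Z[[i for i, l in enumerate(labels) if l == a]]
            B = Z[[i for i, l in enumerate(labels) if l == b]]
            sims[(a, b)] = float((A @ B.T).mean())
    return sims
```

On a toy corpus with two disjoint topics, documents of the same class end up close in the latent space, so the within-class similarity exceeds the cross-class similarity.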
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Word-embedding-based Similarity</head><p>This approach focuses on visualizing the intra- and inter-dataset similarity using pre-trained word embeddings. Unlike Wich et al. <ref type="bibr" target="#b4">[5]</ref>, we use word embeddings produced by the pre-trained gBERT model. Because gBERT does not embed whole sentences directly, the embedding of the first [CLS] token is used as the sentence representation of the document <ref type="bibr" target="#b12">[13]</ref>. The embedding vectors of each dataset are averaged into a centroid which acts as a representation of the entire dataset. Then, Principal Component Analysis (PCA) is performed to project each centroid into a two-dimensional vector space in order to visualize the inter-dataset similarity. Going beyond the suggestion in the framework <ref type="bibr" target="#b4">[5]</ref>, we also visualize inter- and intra-dataset similarity on a class level: we separated the documents into classes before averaging. This way, a centroid of each class for each dataset is obtained before performing PCA.</p></div>
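The centroid and projection steps can be sketched as follows. The vectors below are random stand-ins for the gBERT [CLS] embeddings, which would in practice come from a pre-trained German BERT model (e.g. via the Hugging Face transformers library); model loading is deliberately omitted.

```python
import numpy as np

def dataset_centroids(embeddings_by_dataset):
    """Average the per-document [CLS] embeddings of each dataset into a centroid."""
    names = sorted(embeddings_by_dataset)
    C = np.stack([embeddings_by_dataset[n].mean(axis=0) for n in names])
    return names, C

def pca_2d(C):
    """Project the centroids to two dimensions via PCA (SVD of the centered matrix)."""
    Cc = C - C.mean(axis=0)
    _, _, Vt = np.linalg.svd(Cc, full_matrices=False)
    return Cc @ Vt[:2].T  # coordinates along the top two principal components
```

With only three centroids the centered matrix has rank at most two, so this projection preserves pairwise distances exactly; with nine datasets it is a genuine dimensionality reduction.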
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">MI-based Word Rankings</head><p>The third method is a ranking of the 10 most relevant terms of the hate speech classes for each dataset. The framework uses pointwise mutual information (PMI) to rank the most relevant terms for each class.</p><p>However, experiments showed that the datasets contain a significant number of terms which occur in only one class. The PMI of all these terms would then take the same maximal value, regardless of how often each of them occurs. Therefore, Mutual Information is used instead, which is the PMI weighted by the expectation of the joint word-class distribution, i.e. the relative word-class frequency. Mutual Information is more useful because a word that occurs more often is more relevant to a class than a word that does not.</p></div>
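A minimal version of this weighted ranking might look as follows; counting each term at most once per document is a simplifying assumption of this sketch.

```python
import math
from collections import Counter

def mi_word_rankings(docs, labels, top_n=10):
    """Rank terms per class by PMI weighted with the joint word-class probability."""
    word_class, word, cls = Counter(), Counter(), Counter()
    total = 0
    for d, l in zip(docs, labels):
        for w in set(d):  # presence counts: one per document a term appears in
            word_class[(w, l)] += 1
            word[w] += 1
            cls[l] += 1
            total += 1
    rankings = {}
    for c in set(labels):
        scores = {}
        for (w, l), n in word_class.items():
            if l != c:
                continue
            p_wc = n / total
            pmi = math.log(p_wc / ((word[w] / total) * (cls[c] / total)))
            scores[w] = p_wc * pmi  # MI contribution, not the raw PMI
        rankings[c] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return rankings
```

On a toy corpus, a frequent class-specific term outranks a term that is equally exclusive to the class but rare, which is exactly the behavior that motivates the weighting.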
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Cross-Dataset Topic Model</head><p>This approach visualizes the most relevant topics of the hate speech datasets using CluWords <ref type="bibr" target="#b13">[14]</ref>. First, a sample from all hate speech documents of all datasets is taken. Then, a vocabulary 𝒱 of all unique terms in the sample is constructed. For each term 𝑡 in the vocabulary, a word embedding vector is computed. Since this transformation is context-free, in contrast to gBERT, German fastText word embeddings are used <ref type="bibr" target="#b14">[15]</ref>. Based on these vectors, each term is expanded into a CluWord, i.e. a cluster of semantically similar terms, from which the topics are derived.</p><p>Lastly, the topics and CluWords are projected into a two-dimensional vector space using t-SNE, introduced by van der Maaten and Hinton <ref type="bibr" target="#b15">[16]</ref>, in order to visualize them. </p></div>
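The CluWord construction step can be illustrated with toy vectors standing in for the fastText embeddings; the similarity threshold alpha is a hypothetical parameter chosen for this sketch.

```python
import numpy as np

def build_cluwords(vocab, E, alpha=0.4):
    """For each term, its CluWord: all vocabulary terms whose embedding
    cosine similarity reaches at least `alpha` (threshold is an assumption)."""
    En = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    S = En @ En.T  # pairwise cosine similarities
    return {vocab[i]: [vocab[j] for j in np.where(S[i] >= alpha)[0]]
            for i in range(len(vocab))}
```

Each term always belongs to its own CluWord (cosine similarity 1), while unrelated terms are excluded by the threshold.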
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Inter-rater Reliability</head><p>The next approach focuses on the inter-rater reliability of the dataset annotators. Since not all datasets provide the individual annotation decisions, this can only be calculated for those that do. The inter-rater reliability is calculated using Krippendorff's alpha <ref type="bibr" target="#b16">[17]</ref>.</p></div>
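For illustration, a compact implementation of Krippendorff's alpha for nominal labels (the variant relevant for binary hate speech annotations); in practice one would typically rely on an established implementation such as the krippendorff Python package.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` has one entry per annotated item: the list of labels
    assigned by the raters (None marks a missing rating).
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for ratings in units:
        vals = [v for v in ratings if v is not None]
        m = len(vals)
        if m < 2:
            continue  # unpairable unit, skip
        for a, b in permutations(range(m), 2):
            o[(vals[a], vals[b])] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    d_o = sum(v for (c, k), v in o.items() if c != k)  # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - d_o / d_e
```

Perfect agreement yields alpha = 1; a single disagreeing unit in four pulls the value down toward chance-corrected agreement.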
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">SHAP Feature Importance</head><p>The last approach is based on SHAP (SHapley Additive exPlanations) <ref type="bibr" target="#b17">[18]</ref>. It is a way to explain the importance of features for different hate speech classifiers. For each hate speech classifier 𝑓 , an explanation model 𝑔 is approximated. 𝑔 uses simplified binary valued inputs 𝑥 ′ that map to the original inputs through a mapping function ℎ, i.e. 𝑥 ≈ ℎ(𝑥 ′ ). 𝑔 then tries to approximate</p><formula xml:id="formula_0">𝑔(𝑥 ′ ) ≈ 𝑓 (ℎ(𝑥 ′ )).</formula><p>SHAP learns the values of the factors 𝜑 𝑖 for each explanation model. Since 𝑔 approximates our hate speech classifier 𝑓 , the higher the value 𝜑 𝑖 for a feature 𝑥 ′ 𝑖 , the more important this feature is to the classifier. Instead of displaying the feature importance plot for a single example, we calculate the global feature importance for each classifier using SHAP bar plots.</p></div>
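The definition of the values 𝜑 𝑖 can be made concrete by brute-force enumeration over feature coalitions. Replacing absent features with fixed baseline values is a simplification of the mapping function ℎ; the shap library approximates these values far more efficiently for real classifiers.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values phi_i for model f at input x.

    Features outside a coalition S are replaced by their baseline value,
    a simple stand-in for SHAP's mapping function h.
    """
    n = len(x)
    phi = [0.0] * n

    def value(S):
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                S = set(S)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value(S | {i}) - value(S))
    return phi
```

For a linear model the Shapley value of each feature equals its coefficient times the deviation from the baseline, and the values sum to 𝑓 (𝑥) − 𝑓 (baseline) (the efficiency property).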
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Hate Speech Data Collections</head><p>This section presents the datasets included in our analysis. Each dataset consists of recent real-world examples of German hate speech in online communication. Our goal was to select the largest and most recent datasets available for this purpose. An overview is given in Table <ref type="table" target="#tab_0">1</ref>. The datasets are explained in detail in the following sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Covid2021</head><p>The first dataset contains German tweets collected from Twitter with COVID-19 as the topic, published in 2021 <ref type="bibr" target="#b18">[19]</ref>. The tweets were sampled from an annotation pool that is composed in equal parts of three other pools: a replies pool, a community pool and a topic pool.</p><p>The replies pool was sampled from replies to posts published between 01.01.2020 and 20.02.2021 by three Twitter seed accounts that were identified as being influential and spreading COVID-19 misinformation. Only tweets that contained one of 65 COVID-19 related keywords were considered for this purpose. The community pool was then fed with tweets sampled from the timelines of the accounts that replied to the seed accounts. The topic pool was sampled from tweets related to COVID-19 and hate speech. Lastly, tweets were sampled from the annotation pool and labeled by three annotators using a binary labeling scheme. A tweet was labeled ABUSIVE if it contains attacks, threats, insults, harassment, hate or degradation, and NEUTRAL otherwise. In total, 4,960 tweets were labeled, of which 1,105 were classified as abusive and 3,855 as neutral. The Krippendorff's alpha is 91.5%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Germeval2018</head><p>The GermEval Shared Task on the Identification of Offensive Language, in short Germeval2018 <ref type="bibr" target="#b19">[20]</ref>, consists of German tweets collected from Twitter. Specifically, tweets were sampled from the timelines of around 100 different users, each of whom was selected because they posted both offensive and non-offensive tweets. In total, 8,541 tweets were sampled and then manually annotated by one of three annotators using two different labeling schemes, coarse-grained and fine-grained.</p><p>Coarse-grained is a binary classification scheme that labels a tweet as OFFENSE if it includes abusive language, insults or profanity, and NEUTRAL if not. Because the tweets were sampled around the time of the so-called refugee crisis in Germany, the dataset creators noticed that certain non-offensive words occurred with high frequency in the documents of the hate speech class but did not appear in the non-hate speech class. Therefore, in order to debias the dataset, they added further non-hate speech tweets containing these words. Lastly, they split the dataset into a training and a test set such that the tweets sampled from each user only appear in one of the sets. In total, 2,890 tweets were labeled as abusive and 5,651 as neutral. The dataset has a Krippendorff's alpha of 78%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">De-Reddit-corpus</head><p>De-Reddit-corpus was built by the authors of this paper and contains posts from the German /r/de subreddit on reddit.com. In total, the corpus comprises 2,992,835 comments from 272,661 submissions created in 2019 or earlier. The comments were pseudo-labeled using a CNN model with word embeddings described by Schäfer and Burtenshaw <ref type="bibr" target="#b26">[27]</ref>. The model was trained on Germeval2018, where it achieved an F1-score of 73.35%. Each comment was assigned a binary pseudo-label as well as the predicted label probability. While this dataset provides a significant number of examples of the phenomenon and can be useful for analyses, we do not recommend using it as a training dataset due to the lack of manual annotation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Germeval2019</head><p>The GermEval Shared Task 2 on the Identification of Offensive Language from 2019, in short Germeval2019 <ref type="bibr" target="#b20">[21]</ref>, also consists of a training and a test set. The training set of Germeval2019 consists primarily of tweets from the training and test sets of Germeval2018, as well as some newly sampled tweets. The test set consists entirely of newly sampled tweets, obtained using the same method described for Germeval2018. In addition, this time a specific effort was made to include tweets from users across the whole political spectrum.</p><p>The tweets were then manually annotated using the same labeling scheme as in Germeval2018. In total, Germeval2019 consists of 9,862 labeled tweets, 5,103 of which are labeled as abusive and the other 4,759 as neutral. The Cohen's kappa inter-rater reliability is 𝜅 = 59%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Hasoc2019</head><p>The fifth dataset is Hate Speech and Offensive Content Identification in Indo-European Languages, in short Hasoc2019 <ref type="bibr" target="#b21">[22]</ref>; we use the German subset of this dataset. The posts included in Hasoc2019 were sampled from Facebook and Twitter and manually labeled using a binary as well as a fine-grained labeling scheme. The binary labeling scheme annotates a post as either hate speech/offensive (HOF) or as non-hate speech (NOT). Here, hate speech is defined as posts containing hate, offensive words, aggression, or profanity. In total, 4,669 posts were annotated, 543 of which were labeled as abusive and 4,126 as neutral, with a Cohen's kappa inter-rater agreement of 𝜅 = 88%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Hasoc2020</head><p>The sixth dataset is the German Hate Speech and Offensive Content Identification in Indo-European Languages dataset from 2020, Hasoc2020 <ref type="bibr" target="#b27">[28]</ref>. It consists of tweets sampled from a collection of tweets created in May 2019. First, non-German tweets were filtered out using the language attribute provided in the Twitter metadata. Then, the tweets were sampled using a Support Vector Machine (SVM) hate speech classifier trained on Germeval2018 and Hasoc2019. The classifier was trained in such a way that it achieves an F1-score of around 0.5. All tweets that the classifier labeled as hateful were included in the sample, along with 5% of the tweets that the classifier did not label as hateful.</p><p>For the binary hate speech classification task, tweets were labeled as HOF when they contained hate, offensive or profane content, and NOT otherwise. All tweets in the dataset were manually labeled twice by two different annotators. In cases where the two annotators disagreed on a label, a third annotator who had not yet seen the tweet assigned the label. 3,400 tweets were labeled in total, with 973 assigned to abusive and 2,427 to neutral. The Cohen's kappa inter-rater agreement is 𝜅 = 83.3%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.7.">iHS</head><p>iHS is an unpublished dataset of potentially illegal hate speech <ref type="bibr" target="#b23">[24]</ref>. The creation of this dataset consisted of two steps. First, court cases were collected in which German social media posts were identified as violating certain laws associated with hate speech. Then, using these posts as examples, 102 tweets that were deemed to potentially violate German law were manually extracted from Germeval2019 in order to create annotation guidelines <ref type="bibr" target="#b28">[29]</ref>.</p><p>Lastly, text posts from Twitter were annotated using these guidelines in several annotation rounds. Tweets were assigned to one of six categories: public incitement to commit offences, incitement of the masses, malicious gossip and defamation, insults, offensive language, and other. The offensive language category contains 214 tweets that are not illegal but still deemed hateful. The remaining 747 tweets are labeled as other. The Fleiss' kappa inter-rater agreement ranged between 𝜅 = 44% and 𝜅 = 55%.</p><p>For the purpose of binary hate speech classification, the first five categories are considered as abusive and the other category as neutral. In addition to the 1,249 labeled tweets, 275,022 unlabeled tweets were also available.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.8.">IWG Hatespeech public</head><p>IWG Hatespeech public <ref type="bibr" target="#b24">[25]</ref> contains German hate speech in the context of the refugee crisis in Europe. This dataset consists of tweets from Twitter written in 2016. They were sampled using keyword search with 10 different hashtags that were considered likely to contain a disproportionate amount of hate speech. After filtering, the tweets were manually annotated by splitting the dataset into six parts, each of which was annotated by two of six annotators. A binary labeling scheme was used: a tweet was labeled as hate speech if it violates the Twitter definition of hateful conduct, and as neutral if not. In total, the dataset contains 469 tweets, 110 abusive and 359 neutral. The Krippendorff's alpha is 38.29%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.9.">Telegram</head><p>The last dataset that is analyzed is referred to as Telegram <ref type="bibr" target="#b25">[26]</ref>. It contains messages from German Telegram channels that were posted between 01.01.2019 and 15.03.2021.</p><p>The authors used a snowball sampling strategy. Specifically, they first collected messages from 51 public seed channels known to spread hate. Using these as starting points, they then collected all messages from channels that were mentioned by the seed channels or whose messages were forwarded to them. This procedure was repeated once more for the newly acquired messages. To filter out languages other than German, the message texts were fed into a classifier using multilingual word vectors from fastText <ref type="bibr" target="#b14">[15]</ref>. The messages were manually labeled as abusive or neutral by five annotators using the same labeling scheme as the Covid2021 dataset <ref type="bibr" target="#b18">[19]</ref>. In total, 1,149 messages were labeled, of which 181 were abusive and 968 neutral. The Krippendorff's alpha is 73.87%. In addition, the unlabeled dataset was also made available to us; it consists of 5,421,845 Telegram messages that were pseudo-labeled by a hate speech classifier <ref type="bibr" target="#b25">[26]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiment Results</head><p>We apply the framework by Wich et al. <ref type="bibr" target="#b4">[5]</ref> as described in Section 3 to the datasets discussed in the previous section. In this section, we present and discuss the results of this analysis. We provide the code used for this research on GitHub<ref type="foot" target="#foot_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">LSI-based Similarity</head><p>We now examine the results of the LSI-based intra-dataset similarity experiment. Table 2 shows the similarity values of the binary hate speech classes within each of the nine datasets using 16 LSI dimensions. The left column lists the respective datasets, the top row indicates the direction of the LSI similarity between the classes. The experiment was repeated for different numbers of LSI dimensions with no significant changes.</p><p>In general, the differences between the classes of each dataset seem rather small. In Germeval2018, Hasoc2019, Hasoc2020, Covid2021 and Telegram, the neutral class is more similar to itself than to the abusive class. In Germeval2019, De-reddit-corpus, iHS, and IWG Hatespeech public, the hate speech class is most similar to itself.</p><p>The low absolute and relative differences between the classes can be interpreted as indicating high intra-dataset similarity, i.e. a small difference in the marginal distributions of the covariates in each dataset for each class. A classifier is therefore less likely to simply memorize class-specific phrases or words.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Word-embedding-based Similarity</head><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the two-dimensional PCA projection of the word embedding centroids of each dataset. For this, all classes of each dataset were combined. Interestingly, Germeval2019 is closer to De-reddit-corpus than to Germeval2018. This is surprising because Germeval2019 is composed in large part of tweets from Germeval2018.</p><p>Figure <ref type="figure" target="#fig_1">2</ref> depicts the two-dimensional projection of each individual class. At least in this projection, there is no clear abusive or neutral cluster. This experiment is therefore an indication that the cluster assumption may not hold true. However, we have to keep in mind that we are projecting a 768-dimensional vector space into two dimensions, which limits the interpretability of the result.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">MI-based Word Rankings</head><p>The Mutual Information-based word rankings for both the abusive and the neutral class in each dataset show which terms can be considered most relevant for each class. They are displayed in Table <ref type="table">3</ref> in descending order. Unsurprisingly, the majority of datasets rank several terms that indicate an insult or profanity highly, e.g. idiot, dumm, scheiß, abschaum, schwein, ferkel, hure, hurensohn and nutte. The latter three terms, together with the term frau, clearly show that misogyny is also a focus in these hate speech datasets. Terms often used in a racist manner can also be found, such as flüchtling or islam.</p><p>We can also conclude that there is a clear temporal shift that one should consider when generalizing across datasets. Terms like Merkel will clearly be more popular in some time periods than in others.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Cross-Dataset Topic Model</head><p>Twenty topic clusters were calculated using CluWords. The topics and the two-dimensional projection of each document can be seen in Figure <ref type="figure" target="#fig_2">3</ref>.</p><p>Most topics do not appear to be relevant to hate speech, with the exception of three topics in the lower right half of the plot. Topic T4 (terroristen, faschisten, moslems, etc.), topic T6 (feministen, terrorgruppen) as well as topic T15 (inhaftierung, abschieberaten, etc.) can be attributed to hate speech. In addition, there is no clear clustering of datasets into specific topics. This indicates that the combined hate speech datasets cover several different topics and show no obvious bias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5.">Feature Importance using Shapley Values</head><p>This analysis investigates the most important features, in descending order of importance, of classifiers trained on each dataset. We show the results for Germeval2019 as an example in Figure 4. The y-axis contains the most important sub-tokens while the x-axis displays the global importance of each feature. For all classifiers, the feature importance was measured on the same dataset, which was sampled from all hate speech datasets combined.</p><p>The results show that several classifiers give high weights to the same features. For example, the word Sklaven is important for the classifiers trained on Covid2021, Germeval2018, Hasoc2019 and Hasoc2020. Another example is the word Schweine, which is important in Hasoc2020 and iHS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this work, we present a survey of German hate speech datasets and apply the framework suggested by Wich et al. <ref type="bibr" target="#b4">[5]</ref> to compare their contents. This analysis sheds some light on the datasets and shows how they are similar in some regards but diverse in topics. Our analysis shows that the contained topics are quite heterogeneous and that cross-dataset classification would therefore be rather difficult. Although the analysis helps to better understand the datasets, it alone cannot determine how good the datasets are or how they can lead to better generalizability. Further experiments on the performance of classifiers across German datasets are necessary <ref type="bibr" target="#b29">[30]</ref>.</p><p>Further new directions in hate speech detection include the creation of datasets for low-resource languages (e.g. <ref type="bibr" target="#b30">[31]</ref>), the analysis of context <ref type="bibr" target="#b31">[32]</ref>, the generation of counter speech <ref type="bibr" target="#b32">[33]</ref> and the design of interfaces for diverse user groups of such AI systems <ref type="bibr" target="#b33">[34]</ref>. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Word-embedding-based dataset similarity.</figDesc><graphic coords="9,131.83,84.19,374.16,355.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Word-embedding-based class similarity.</figDesc><graphic coords="10,128.71,84.19,377.28,355.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Topic model for the hate speech classes.</figDesc><graphic coords="12,89.29,84.19,455.95,339.69" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="13,121.98,84.19,384.00,240.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Overview of German hate speech datasets.</figDesc><table><row><cell>Dataset name</cell><cell>Reference</cell><cell>Source</cell><cell># of labeled samples</cell><cell># of unlabeled samples</cell><cell>abusive % of labeled data</cell><cell>Inter-rater agreement</cell></row><row><cell>Covid2021</cell><cell>[19]</cell><cell>Twitter</cell><cell>4,960</cell><cell>0</cell><cell>22.28%</cell><cell>𝛼 = 91.50%</cell></row><row><cell>De-reddit-corpus</cell><cell>Unpub.</cell><cell>Reddit</cell><cell>0</cell><cell>2,992,835</cell><cell>-</cell><cell>-</cell></row><row><cell>Germeval2018</cell><cell>[20]</cell><cell>Twitter</cell><cell>8,541</cell><cell>0</cell><cell>33.84%</cell><cell>𝛼 = 78%</cell></row><row><cell>Germeval2019</cell><cell>[21]</cell><cell>Twitter</cell><cell>9,862</cell><cell>0</cell><cell>51.74%</cell><cell>𝜅 = 59%</cell></row><row><cell>Hasoc2019</cell><cell>[22]</cell><cell>Facebook, Twitter</cell><cell>4,669</cell><cell>0</cell><cell>11.63%</cell><cell>𝜅 = 88%</cell></row><row><cell>Hasoc2020</cell><cell>[23]</cell><cell>Twitter</cell><cell>3,400</cell><cell>0</cell><cell>28.62%</cell><cell>𝜅 = 83.3%</cell></row><row><cell>iHS</cell><cell>[24]</cell><cell>Twitter</cell><cell>1,249</cell><cell>275,022</cell><cell>40.19%</cell><cell>𝜅 = 44%, 55%</cell></row><row><cell>IWG Hatespeech public</cell><cell>[25]</cell><cell>Twitter</cell><cell>469</cell><cell>0</cell><cell>23.45%</cell><cell>𝛼 = 38.29%</cell></row><row><cell>Telegram</cell><cell>[26]</cell><cell>Telegram</cell><cell>1,149</cell><cell>5,421,845</cell><cell>15.75%</cell><cell>𝛼 = 73.87%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>LSI-based intra-dataset class similarity with 16 dimensions. Table 2 shows the similarity values of the binary hate speech classes within each of the nine hate speech datasets.</figDesc><table><row><cell>Dataset</cell><cell>abusive → abusive</cell><cell>abusive → neutral, neutral → abusive</cell><cell>neutral → neutral</cell></row><row><cell>Covid2021</cell><cell>0.70</cell><cell>0.71</cell><cell>0.72</cell></row><row><cell>De-reddit-corpus</cell><cell>0.29</cell><cell>0.26</cell><cell>0.24</cell></row><row><cell>Germeval2018</cell><cell>0.39</cell><cell>0.41</cell><cell>0.44</cell></row><row><cell>Germeval2019</cell><cell>0.41</cell><cell>0.40</cell><cell>0.36</cell></row><row><cell>Hasoc2019</cell><cell>0.53</cell><cell>0.57</cell><cell>0.61</cell></row><row><cell>Hasoc2020</cell><cell>0.48</cell><cell>0.50</cell><cell>0.56</cell></row><row><cell>iHS</cell><cell>0.47</cell><cell>0.49</cell><cell>0.51</cell></row><row><cell>IWG Hatespeech public</cell><cell>0.28</cell><cell>0.17</cell><cell>0.21</cell></row><row><cell>Telegram</cell><cell>0.34</cell><cell>0.37</cell><cell>0.44</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/MarkusBertram/Cross-Dataset-Generalization-of-German-Hate-Speech-Datasets</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Digitale Hate Speech: Interdisziplinäre Perspektiven auf Erkennung, Beschreibung und Regulation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Steiger</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-662-65964-9</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>Springer Nature</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Di</forename><surname>Fátima</surname></persName>
		</author>
		<ptr target="https://labcomca.ubi.pt/hate-speech-on-social-media-a-global-approach/" />
		<title level="m">Hate Speech on Social Media: A Global Approach</title>
				<imprint>
			<publisher>LabCom - Comunicação e Artes - Universidade da Beira Interior</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="https://ec.europa.eu/commission/presscorner/detail/en/ip_22_4313" />
		<title level="m">Laws on digital services and markets: European Commission welcomes yes from the European Parliament</title>
				<imprint>
			<publisher>European Commission</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Tracking hate in social media: Evaluation, challenges and approaches</title>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Patel</surname></persName>
		</author>
		<idno type="DOI">10.1007/s42979-020-0082-0</idno>
	</analytic>
	<monogr>
		<title level="j">SN Computer Science</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Bias and comparison framework for abusive language datasets</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kuwatly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Groh</surname></persName>
		</author>
		<idno type="DOI">10.1007/s43681-021-00081-0</idno>
	</analytic>
	<monogr>
		<title level="j">AI and Ethics</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Directions in abusive language training data, a systematic review: Garbage in, garbage out</title>
		<author>
			<persName><forename type="first">B</forename><surname>Vidgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0243300</idno>
	</analytic>
	<monogr>
		<title level="j">Plos one</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">e0243300</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">KI-Verfahren für die Hate Speech Erkennung: Die Gestaltung von Ressourcen für das maschinelle Lernen und ihre Zuverlässigkeit</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-662-65964-9_6</idno>
	</analytic>
	<monogr>
		<title level="m">Digitale Hate Speech</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="111" to="130" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Toxic, hateful, offensive or abusive? what are we really classifying? an empirical analysis of hate speech datasets</title>
		<author>
			<persName><forename type="first">P</forename><surname>Fortuna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Soler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wanner</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.lrec-1.838" />
	</analytic>
	<monogr>
		<title level="m">Twelfth Language Resources and Evaluation Conference</title>
				<meeting><address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6786" to="6794" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">On cross-dataset generalization in automatic detection of online abuse</title>
		<author>
			<persName><forename type="first">I</forename><surname>Nejadgholi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kiritchenko</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2010.07414</idno>
		<ptr target="https://arxiv.org/abs/2010.07414" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">HateCheck: Functional Tests for Hate Speech Detection Models</title>
		<author>
			<persName><forename type="first">P</forename><surname>Röttger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vidgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Waseem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Margetts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pierrehumbert</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-long.4</idno>
	</analytic>
	<monogr>
		<title level="m">59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="41" to="58" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Towards generalisable hate speech detection: a review on obstacles and solutions</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zubiaga</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2102.08886</idno>
		<ptr target="https://arxiv.org/abs/2102.08886" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Indexing by latent semantic analysis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Deerwester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Dumais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">W</forename><surname>Furnas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Landauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Harshman</surname></persName>
		</author>
		<idno type="DOI">10.1002/(SICI)1097-4571(199009)41:6&lt;391::AID-ASI1&gt;3.0.CO;2-9</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="391" to="407" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://aclanthology.org/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Cluwords: Exploiting semantic word clustering representation for enhanced topic modeling</title>
		<author>
			<persName><forename type="first">F</forename><surname>Viegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Canuto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gomes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Luiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ribas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rocha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Gonçalves</surname></persName>
		</author>
		<idno type="DOI">10.1145/3289600.3291032</idno>
	</analytic>
	<monogr>
		<title level="m">Twelfth ACM International Conference on Web Search and Data Mining, WSDM &apos;19</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="753" to="761" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.04606</idno>
		<title level="m">Enriching word vectors with subword information</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Visualizing Data using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v9/vandermaaten08a.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="2579" to="2605" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Computing Krippendorff&apos;s Alpha-Reliability</title>
		<author>
			<persName><forename type="first">K</forename><surname>Krippendorff</surname></persName>
		</author>
		<ptr target="https://repository.upenn.edu/asc_papers/43" />
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A unified approach to interpreting model predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">German abusive language dataset with focus on COVID-19</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Räther</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Groh</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.konvens-1.26" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), KONVENS 2021 Organizers</title>
				<meeting>the 17th Conference on Natural Language Processing (KONVENS 2021), KONVENS 2021 Organizers<address><addrLine>Düsseldorf, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="247" to="252" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Overview of the GermEval 2018 shared task on the identification of offensive language</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Siegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ruppenhofer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the GermEval 2018 Workshop, 14th Conference on Natural Language Processing KONVENS 2018</title>
				<meeting>the GermEval 2018 Workshop, 14th Conference on Natural Language Processing KONVENS 2018</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Overview of GermEval Task 2, 2019 shared task on the identification of offensive language</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Siegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ruppenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Klenner</surname></persName>
		</author>
		<ptr target="https://nbn-resolving.org/urn:nbn:de:bsz:mh39-93197" />
	</analytic>
	<monogr>
		<title level="m">German Society for Computational Linguistics &amp; Language Technology</title>
				<meeting><address><addrLine>Erlangen-Nürnberg; München</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-09-11">October 9-11, 2019</date>
			<biblScope unit="page" from="352" to="363" />
		</imprint>
	</monogr>
	<note>15th Conference on Natural Language Processing (KONVENS)</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages)</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Patel</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2517/T3-1.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE, CEUR-WS</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<idno type="DOI">10.1145/3441501.3441517</idno>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation, FIRE 2020</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Towards annotating illegal hate speech: A computational linguistic approach</title>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Boguslu</surname></persName>
		</author>
		<idno type="ISSN">2736-6391</idno>
	</analytic>
	<monogr>
		<title level="m">Detect Then Act (DTCT)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Cabrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kurowsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wojatzki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Beißwenger</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Wojatzki</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Zesch</surname></persName>
		</editor>
		<meeting><address><addrLine>Bochum</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="6" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Introducing an Abusive Language Classification Framework for Telegram to Investigate the German Hater Community</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gorniak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bartmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Çakici</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Groh</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2109.07346</idno>
		<ptr target="https://arxiv.org/abs/2109.07346" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Offence in dialogues: A corpus-based study</title>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Burtenshaw</surname></persName>
		</author>
		<idno type="DOI">10.26615/978-954-452-056-4_125</idno>
		<ptr target="https://aclanthology.org/R19-1125" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)</title>
				<meeting>the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)<address><addrLine>Varna, Bulgaria</addrLine></address></meeting>
		<imprint>
			<publisher>INCOMA Ltd</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1085" to="1093" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identification in Indo-European Languages</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Jaiswal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nandini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<idno>CEUR-WS.org</idno>
		<ptr target="http://ceur-ws.org/Vol-2826/T2-1.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2020 -Forum for Information Retrieval Evaluation</title>
				<meeting><address><addrLine>Hyderabad, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">December 16-20, 2020</date>
			<biblScope unit="volume">2826</biblScope>
			<biblScope unit="page" from="87" to="111" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Boguslu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<ptr target="https://dtct.eu/wp-content/uploads/2021/09/Annotationsrichtlinien_iHS.pdf" />
		<title level="m">Annotationsrichtlinien für illegale Hassrede</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Generalizability of Abusive Language Detection Models on Homogeneous German Datasets</title>
		<author>
			<persName><forename type="first">N</forename><surname>Seemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Höllig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Geierhos</surname></persName>
		</author>
		<idno type="DOI">10.1007/s13222-023-00438-1</idno>
	</analytic>
	<monogr>
		<title level="j">Datenbank-Spektrum</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="15" to="25" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sakuntharaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Madasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Premjith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Navaneethakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3159/T3-1.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Gandhinagar, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">December 13-17, 2021</date>
			<biblScope unit="volume">3159</biblScope>
			<biblScope unit="page" from="589" to="602" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments</title>
		<author>
			<persName><forename type="first">H</forename><surname>Madhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satapara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">215</biblScope>
			<biblScope unit="page">119342</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Generating Counter Narratives against Online Hate Speech: Data and Strategies</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Tekiroglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guerini</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.110</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL, Online</title>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, ACL, Online</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020-07">July 5-10, 2020</date>
			<biblScope unit="page" from="1177" to="1190" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Enabling Informational Autonomy through Explanation of Content Moderation: UI Design for Hate Speech Detection</title>
		<author>
			<persName><forename type="first">L</forename><surname>Sontheimer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.18420/MUC2022-MCI-WS12-260</idno>
	</analytic>
	<monogr>
		<title level="m">Mensch und Computer 2022 - Workshopband</title>
		<imprint>
			<publisher>Gesellschaft für Informatik e.V.</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
