<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Prasenjit Majumder</string-name>
          <email>prasenjit.majumder@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daksh Patel</string-name>
          <email>dakshpatel68@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DA-IICT</institution>
          ,
          <addr-line>Gandhinagar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LDRP-ITR</institution>
          ,
          <addr-line>Gandhinagar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Hildesheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>2261</volume>
      <abstract>
        <p>The identification of Hate Speech in social media has recently received much attention in research. There is a particular demand for research on languages other than English. The first edition of the HASOC track creates resources for Hate Speech identification in Hindi, German, and English. Three datasets were developed from Twitter and Facebook and made available. HASOC intends to stimulate research and development on Hate Speech classification for different languages. The datasets allow the development and testing of supervised machine learning systems. Binary classification and more fine-grained sub-classes were offered in three sub-tasks. Across all sub-tasks, 321 experiments were submitted. For the classification task, models based on deep learning methods proved adequate. The approach used most often was Long Short-Term Memory (LSTM) networks with distributed word representations of the text. The performance of the best system for identification of Hate Speech for English, Hindi, and German was a Macro-F1 score of 0.78, 0.81, and 0.61, respectively. This overview provides detailed insights and analyzes the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech, Text Classification, Evaluation, Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The large fraction of Hate Speech and other offensive and objectionable content online poses a huge challenge to societies. Offensive language such as insulting, hurtful, derogatory, or obscene content directed from one person to another and open to others undermines objective discussion. There is a growing need for research on the classification of Hate Speech into different categories of offensive content on different social media platforms without human assistance.</p>
      <p>In October 2019, the European Court of Justice decided that platforms need
to take down content worldwide even after national decisions. In a particular
case, the EU court debated defamatory posts on Facebook. Even posts similar in
tone need to be addressed and the ruling explicitly mentions automatic systems.
This shows that automatic systems are of high social relevance. Recently, the founder of Facebook also proposed ideas for the regulation of the Internet. He demanded standards and baselines for the definition of harmful content. Such clear definitions have not been provided and are unlikely to be developed in the near future. This makes research and annotated corpora even more necessary.</p>
      <p>The identification of Hate Speech within a collection or a stream of tweets is a challenging task because systems can rely only on the text content. Text classification systems based on content have been successful. However, hateful text poses particular difficulties: hate often has no clear signal words, and word lists, as used in sentiment analysis, are expected to work less well.</p>
      <p>In order to contribute to this research, this overview paper presents the first edition of HASOC: Hate Speech and Offensive Content Identification in Indo-European Languages, namely German, English, and Hindi. The dataset for all three languages was created from Twitter and Facebook. HASOC consists of three tasks: a coarse-grained binary classification task and two fine-grained multi-class classifications. Of course, freedom of speech needs to be guaranteed in democratic societies for future development. Nevertheless, offensive text which hurts others' sentiments needs to be restricted. As the use of abuse on many internet platforms is increasing, technological support for the recognition of such posts is necessary. The use of supervised learning with annotated datasets is a key strategy for advancing such systems. There has been significant work in several languages, in particular for English. However, there is a lack of research on this recent and relevant topic for most other languages. This track intends to develop data and evaluation resources for several languages. The objectives are to stimulate research for these languages and to find out the quality of hate speech detection technology in other languages.</p>
      <p>The HASOC dataset provides several thousand labeled social media posts for each language. The entire dataset was annotated and checked by the organizers of the track. The annotation architecture is designed to create data for three different sub-tasks.
1. SUB-TASK A: classification of Hate Speech (HOF) and non-offensive content.
2. SUB-TASK B: if the post is HOF, sub-task B identifies the type of hate.
3. SUB-TASK C: decides the target of the post.</p>
      <p>
        Hate Speech detection is of great significance and attracts many researchers. Recent overview papers provide a good introduction to the scientific issues involved in Hate Speech identification [
        <xref ref-type="bibr" rid="ref12 ref36">12,36</xref>
        ].
      </p>
      <p>Footnotes: https://www.nytimes.com/2019/10/03/technology/facebook-europe.html
https://www.faz.net/aktuell/wirtschaft/diginomics/facebook-ceo-zuckerberg-ideasto-regulate-the-internet-16116032.html</p>
    </sec>
    <sec id="sec-2">
      <title>Related Forum and Dataset</title>
      <p>
        Collections are an important asset for any supervised classification method. For Hate Speech, several previous initiatives have created corpora that have been used for research. There has been significant work in several languages, in particular for English. However, for other languages such as Hindi, standard datasets are not available, and HASOC is an attempt to create labeled datasets for such low-resource languages. HASOC is primarily inspired by two previous evaluation forums, GermEval [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] and OffensEval [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ], and tries to leverage the
synergies of these initiatives.
      </p>
      <p>
        Data sampling is a paramount task for any data challenge competition. Some of the corpora focus specifically on certain targets, like immigrants and women (HateEval) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or racism (e.g. [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]). Others focus on Hate Speech in general (e.g.
HaSpeeDe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) or other unacceptable text types. A recent trend is to introduce a more fine-grained classification. Some data challenges require detailed analysis of the hateful comments, like detection of the target (HateEval and OffensEval) or the type of Hate Speech (GermEval). Others focus on the severity of the comment (Kaggle Toxic [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). A recent and very interesting collection is CONAN.
It offers Hate Speech and the reactions to it [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This could open opportunities for detecting Hate Speech by analyzing it jointly with the following posts. Table 1 summarizes the standard Hate Speech datasets available at various forums.
      </p>
      <p>There is a huge demand for many languages other than English. HASOC is the first shared task which develops resources for three languages together and which encourages multilingual research.</p>
    </sec>
    <sec id="sec-3">
      <title>Task Description</title>
      <p>HASOC and most other collections provide the text of a post and require systems to detect hateful content. No context or metadata, such as time-related features or the network of the actors, is given, which might make these tasks somewhat unrealistic. Platforms can obviously use all metadata of a post and a user. However, the distribution of such data poses legal issues. The following tasks have been proposed in HASOC 2019:
Sub-task A: Sub-task A focuses on Hate Speech and offensive language identification and is offered for English, German, and Hindi. Sub-task A is a coarse-grained binary classification in which participating systems are required to classify tweets into two classes, namely Hate and Offensive (HOF) and Non-Hate and Offensive (NOT).
1. (NOT) Non Hate-Offensive: the post does not contain any Hate Speech or offensive content.
2. (HOF) Hate and Offensive: the post contains hateful, offensive, or profane content.</p>
      <p>During our annotation, we labeled posts as HOF if they contained any form of non-acceptable language such as hate speech, aggression, or profanity; otherwise they were labeled as NOT.
Sub-task B: Sub-task B represents a fine-grained classification. Hate Speech and offensive posts from sub-task A are further classified into three categories.
1. (HATE) Hate Speech: posts contain Hate Speech content.
2. (OFFN) Offensive: posts contain offensive content.
3. (PRFN) Profane: posts contain profane words.
HATE SPEECH: ascribing negative attributes or deficiencies to groups of individuals because they are members of a group (e.g. "all poor people are stupid"). Hateful comments toward groups because of race, political opinion, sexual orientation, gender, social status, health condition, or similar.</p>
      <p>OFFENSIVE: posts which degrade, dehumanize, or insult an individual, or which threaten with violent acts, fall into this category.
PROFANITY: unacceptable language in the absence of insults and abuse. This typically concerns the usage of swearwords (Scheiße, fuck, etc.) and cursing (Hell! Verdammt! etc.). Such posts fall into this category. As expected, most posts are in the category NOT, some are HATE, and the other two categories are less frequent. Dubious cases which are difficult to decide even for humans were left out.</p>
      <sec id="sec-3-1">
        <title>Sub-task C (only for English and Hindi)</title>
        <p>Sub-task C considers the type of offense. Only posts labeled as HOF in sub-task A are included in sub-task C. The two categories in sub-task C are the following:
1. Targeted Insult (TIN): posts containing an insult/threat to an individual, group, or others.
2. Untargeted (UNT): posts containing non-targeted profanity and swearing.</p>
        <p>Posts with general profanity are not targeted, but they contain non-acceptable language.</p>
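The three sub-tasks form a cascade: sub-tasks B and C apply only to posts labeled HOF in sub-task A. A minimal sketch of this routing, using hypothetical placeholder classifiers (the `classify_*` functions are illustrations, not the systems submitted to the track):

```python
# Sketch of the HASOC label cascade. The classify_* functions are
# hypothetical placeholders for trained models.

def classify_a(post):
    # Placeholder binary classifier: HOF vs. NOT.
    return "HOF" if "idiot" in post.lower() else "NOT"

def classify_b(post):
    # Placeholder fine-grained classifier: HATE / OFFN / PRFN.
    return "PRFN"

def classify_c(post):
    # Placeholder target classifier: TIN (targeted) / UNT (untargeted).
    return "UNT"

def label_post(post):
    labels = {"task_a": classify_a(post)}
    if labels["task_a"] == "HOF":      # B and C only apply to HOF posts
        labels["task_b"] = classify_b(post)
        labels["task_c"] = classify_c(post)
    return labels

print(label_post("What an idiot!"))   # all three labels assigned
print(label_post("Nice weather."))    # only the task A label assigned
```

A NOT post simply receives no task B or task C label, mirroring how the annotation architecture creates the three datasets.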
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data Set and Collection</title>
      <p>
        The following sections explain how the dataset was created and enriched by annotations. First, the authors searched with heuristics for typical Hate Speech in online fora. They identified topics for which many hate posts can be expected. Different hashtags and keywords were used for all three languages. For some of the found posts, the id of the author was recorded, and for a number of such users the timeline was collected. Based on the tweets found, we crawled the last posts of the authors to increase variety. Systems are less likely to learn an individual's textual style when they have a rich set of posts from an author. This procedure was intended to decrease bias and was inspired by GermEval [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ].
      </p>
      <p>The HASOC dataset was subsequently sampled from Twitter and partially from Facebook for all three languages. The Twitter API delivers a large number of recent tweets, which resulted in an unbiased dataset. The tweets were acquired using hashtags and keywords that contained offensive content. The collection was provided to participants without metadata. We have developed Twitter and Facebook plugins to fetch the posts without using the API. The size of the data corpus is shown in tables 2 and 3.
Sample tweets from the classes:
NOT: "4 matches were can't play due to rain and many more will be not played r the same reason . Conclusion this world cup is no more world cup. #ShameOnICC #RainCup"
HATE: "Are Muslims, in general a nuisance to be tolerated by the rest of the world ? #SaveBengal #DoctorsFightBack #DoctorsStrike #MamtaBanerjee"
HATE: "#TerroristNationPakistan 90% Pakistanis wants war with India and 10% said war should not be. And Those 10% belongs to Pakistans Armed Forces #TerroristNationPakistan"
OFFN: "#Just a daily reminder to @realDonaldTrump that he is a National Disgrace. #TraitorTrump #TrumpIsADisgrace #TrumpIsATraitor"
PRFN: "@cizzacampbell Didn't realise you were an expert #dickhead"
PRFN/UNT: "Who voted for a no-deal? Tell me, who the fuck voted for a no deal? The way I see it, the referendum was a corrupt vote between remain and leave. Not remain, leave, deal, no deal. Nobody voted for no deal!!"
OFFN/TIN: "@realDonaldTrump Will it be worse than killing children? Worse than selling your country to the Russians? Worse than saying you love a ruthless dictator? Probably not. #TrumpIsATraitor"</p>
      <p>During the labeling process, several jurors for each language engaged with an online system to judge the tweets. The system can be seen in figures 1 and 2. The jurors were given short guidelines that contained the information mentioned in section 3.1. The process is highly subjective, and even after discussion of questionable cases, often no agreement could be reached. This lies in the nature of Hate Speech.</p>
      <p>
        As pointed out in the study by Ross et al. [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], providing written guidelines does not necessarily improve the agreement. Consequently, and so that annotators could see them on one page, we tried to keep the guidelines short. The guidelines for HASOC are listed in the annex. A study by Salminen et al. [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]
showed that dubious and questionable cases led to much more disagreement than clear cases with obvious Hate Speech characteristics. Jhaver et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] interviewed both the receivers and the senders of posts which were considered to be aggressive. They revealed that the senders often did not agree with the judgment of the readers. Among other arguments, they brought forward that some messages were regarded as hateful because people did not want to be confronted with the arguments. Again, this study shows that there is a great deal of subjectivity involved and that context also matters.
      </p>
      <p>
        The difficulties during assessment in HASOC were often related to the use of language registers like youth talk, irony, or indirectness, which might not be understood by all readers. A more detailed analysis of the issues encountered during the HASOC annotation for German has been carried out [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ].
      </p>
      <p>The overlap between annotators for task A for English, Hindi, and German, on a subset of tweets and posts annotated twice, was 89%, 91%, and 32%, respectively. Further statistical details of the annotation process can be seen in table 5. The effects of such disagreement need to be analyzed in the future. The values show that the labeling task is hard overall. The second sub-task can only be solved with lower quality. For sub-task C, the quality does not drop much, or is even higher than that of sub-task B.</p>
      <p>We also calculated the Kappa coefficient because of the high imbalance of the datasets. Using the scikit-learn package, the inter-annotator agreement between the first two annotators of a tweet was determined. Table 6 shows the Kappa values for sub-task A for all three languages.</p>
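For illustration, Cohen's Kappa compares the observed agreement of two annotators with the agreement expected by chance. The organizers used scikit-learn, but the computation itself is short; the sketch below uses made-up labels for four posts:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(a)
    # Observed agreement: fraction of posts with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: sum over labels of p_annotator1(l) * p_annotator2(l).
    p_e = sum(ca[l] / n * cb[l] / n for l in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Made-up annotations for four posts:
ann1 = ["HOF", "HOF", "NOT", "NOT"]
ann2 = ["HOF", "NOT", "NOT", "NOT"]
print(cohen_kappa(ann1, ann2))  # 0.5: 75% observed vs. 50% chance agreement
```

The same value can be obtained with `sklearn.metrics.cohen_kappa_score(ann1, ann2)`.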
      <p>
        The degree of disagreement might also result from the topics present in the
collection [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ]. The issues and the level of disagreement need to be analyzed in the future.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation Metrics</title>
      <p>The metrics for classification should combine both recall and precision. The F1-score has several variants, such as weighted F1, macro-F1, and micro-F1. For multi-class classification, the distribution of class labels is often unbalanced. The weighted F1-score calculates the F1 score for each class independently; when aggregating, it weights each class by its number of true labels and is therefore biased toward the majority class. The macro-F1 also calculates the F1 separately for each class but does not use weights for the aggregation. This results in a stronger penalization when a system does not perform well for the minority classes. The choice of F1 variant depends on the objective of the task and the distribution of labels in the dataset. Hate Speech classification problems suffer from class imbalance. Therefore, the macro-F1 is the natural choice for the evaluation.</p>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>Overall, 103 registrations were submitted for the track. 37 teams submitted runs and 25 teams submitted papers. In total, 321 runs were submitted by 37 teams across all sub-tasks.</p>
      <p>The following sections show the sub-tasks of HASOC. The approaches of all teams are briefly summarized in the annex of this paper. For details on the technical implementation, the reader is referred to the descriptions of the participating teams in this volume.</p>
      <sec id="sec-6-1">
        <title>English Dataset</title>
        <p>
          For the English language, a total of 174 runs were submitted across the three sub-tasks. The YNU wb team [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] used an LSTM approach with ordered neurons and applied an attention mechanism. The absolute differences between the top runs are rather small. Table 8 presents the results of the top 10 teams for the English sub-task A.
        </p>
        <p>The plot of the performance of all systems in Figure 3 shows that the Median
of the runs lies quite close to the top performance.</p>
        <p>Despite the similar performance of many teams, the recall-precision graph in figure 4 shows that there are considerable differences between the systems which the F1 measures do not reveal.</p>
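A single F1 number can mask such differences: two systems with opposite precision-recall trade-offs can score identically. A small illustration with made-up precision and recall values:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Two hypothetical systems with the same F1 but opposite trade-offs:
system_a = (0.9, 0.6)   # high precision, low recall (conservative)
system_b = (0.6, 0.9)   # low precision, high recall (aggressive)

print(round(f1(*system_a), 4))  # 0.72
print(round(f1(*system_b), 4))  # 0.72
```

Both systems report F1 = 0.72, yet they behave very differently in practice, which is exactly what the recall-precision graph makes visible.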
        <p>
          The overall F1 measures for sub-tasks B and C are much lower than for sub-task A. Tables 9 and 10 show the results of these tasks. The best performing team [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] for sub-task B and sub-task C used the relatively new BERT model for classification. The model performed well both for sub-task A, with more training samples, and for sub-task B, with much fewer training instances.
        </p>
        <p>
          The performance for task C shows that the weighted F1 values are very close together and that run number 10 even has a higher value than run number 1. The careful selection of metrics is crucial. The boxplots in figures 5 and 6 show that the median again lies close to the top performing run for sub-tasks B and C.
For the Hindi language, a total of 93 runs were submitted across the three sub-tasks. The QutNocturnal team [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] used a CNN-based approach with Word2vec embeddings. The absolute differences between the top runs are rather small. Table 11 presents the results of the top teams for the Hindi sub-task A. The absolute values for Hindi sub-task A are comparable to the English sub-task, and the top-performing systems are again close to each other.
For the German language, a total of 54 runs were submitted; only the first two sub-tasks were offered. The Macro-F1 score is lower than for the other two languages. For sub-task A, the best team used BERT sentence embeddings and the multilingual sentence embedding LASER. Tables 14 and 15 present the results of sub-tasks A and B.
        </p>
        <p>
          The LSV team [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] placed second and first for sub-task B. They applied the BERT model and used additional corpora from similar tasks. Boxplots of the performance of all participating teams are shown in figures 10 and 11.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Approaches</title>
      <p>
        The top performance for sub-task A for English and Hindi is delivered by systems based on deep neural models. Even new architectures for which little experience is available, such as BERT, have been applied with great success. This is even true for sub-task B for German, where only few training examples were available. The standings (by team) were:
1. LSV-UdS [<xref ref-type="bibr" rid="ref10">10</xref>]
2. LSV-UdS [<xref ref-type="bibr" rid="ref10">10</xref>]
3. HateMonitors [<xref ref-type="bibr" rid="ref34">34</xref>]
4. 3Idiots [<xref ref-type="bibr" rid="ref21">21</xref>]
5. Cs
6. 3Idiots [<xref ref-type="bibr" rid="ref21">21</xref>]
7. FalsePostive [<xref ref-type="bibr" rid="ref16">16</xref>]
8. FalsePostive [<xref ref-type="bibr" rid="ref16">16</xref>]
9. FalsePostive [<xref ref-type="bibr" rid="ref16">16</xref>]
10. LSV-UdS [<xref ref-type="bibr" rid="ref10">10</xref>]
It needs to be considered that most systems applied a Deep Learning approach (see annex B). However, for Hindi the top performance comes from a traditional machine learning system. Even for the other two languages, we can observe that some of the few non-Deep-Learning systems reach a performance quite close to the top. For example, Team A3-108 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] reaches a result close to the top performance for the Hindi sub-task B. Also, the run by IRLAB@IITBHU [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] achieves a higher weighted F1 value than the top run for sub-task B for English. It seems that the size of HASOC is small enough that traditional approaches can still prevail. There might not be enough data to train deep architectures with many parameters. Future improvements for such systems might lie in the intelligent use of external resources. Participants were allowed to use external resources and other datasets for this task. For German, this seems to have boosted the top-performing team LSV-UdS for sub-task B, for which only few training examples were available.
      </p>
      <p>Several teams have adopted an open-code policy and published their code in GitHub repositories. This policy allows repeatability and reproducibility of the experiments.</p>
    </sec>
    <sec id="sec-8">
      <title>Performance Analysis</title>
      <p>Some of the participants have conducted interesting analyses in order to explore the behavior of their systems. We tried to explore the performance of all systems on each tweet. We ranked the tweets for sub-task A in English based on the number of systems that classified them. The following figure shows the distribution of the values.</p>
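This per-tweet analysis can be sketched as counting, for each post, what fraction of submitted runs assigned it the HOF label, and then ranking the posts by that agreement score. The run outputs below are hypothetical:

```python
from collections import defaultdict

# Hypothetical predictions: run name -> {tweet id: predicted label}.
runs = {
    "run1": {"t1": "HOF", "t2": "NOT", "t3": "HOF"},
    "run2": {"t1": "HOF", "t2": "HOF", "t3": "NOT"},
    "run3": {"t1": "HOF", "t2": "NOT", "t3": "NOT"},
}

# Count HOF votes per tweet across all runs.
hof_votes = defaultdict(int)
for preds in runs.values():
    for tweet_id, label in preds.items():
        hof_votes[tweet_id] += label == "HOF"

# Fraction of systems calling each tweet HOF; tweets are ranked by it.
agreement = {t: v / len(runs) for t, v in hof_votes.items()}
ranked = sorted(agreement, key=agreement.get, reverse=True)
print(ranked)  # t1 comes first: all three runs agree it is HOF
```

A scattered distribution of these agreement scores, as observed in the track, means that simple majority-voting ensembles have few unanimous cases to build on.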
      <p>
        We can observe that only 30% of the systems agree that a post is offensive (class HOF), considering the median. On the other hand, 70% of the systems vote for NOT in the median for the class NOT. However, the distributions are quite scattered. This shows that for the systems there seem to be no clear and obvious cases. Considering the analysis of Salminen et al., in which humans agreed much on obvious Hate Speech tweets [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], there seems to be less agreement among systems. As a consequence, voting approaches might not work well. Another consequence could be that it is hard to explain and understand the decision of a classifier in this domain. This may lead to a lack of ability to explain decisions and a lack of transparency, which can result in a low degree of acceptance in society. More analysis of the results is necessary in the future.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Conclusion and Outlook</title>
      <p>
        The submissions to HASOC have shown that deep learning representations seem to be the state-of-the-art approach for Hate Speech classification. After analyzing the results, the best method for Hate Speech classification depends on the corpus language, the classification granularity, and the distribution of the class labels. In other words, an unbalanced training dataset might affect the performance of the classification system. In the long run, the HASOC track aims at supporting researchers in developing robust technology which can cope with multilingual data and in developing transfer learning approaches that can exploit training data across languages. For future editions, we envision the integration of further languages. The potential bias in the data collection needs to be analyzed and monitored [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ].
      </p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>We thank all participants for their submissions and the work involved. We thank all the jurors who labeled the tweets in a short period of time. We also thank the FIRE organizers for their support in organizing the track.</p>
      <sec id="sec-10-1">
        <title>Annotation Guidelines for HASOC 2019</title>
        <p>HATE SPEECH: Ascribing negative attributes or deficiencies to groups of individuals because they are members of a group (e.g. "all poor people are stupid"). Hateful comments toward groups because of race, political opinion, sexual orientation, gender, social status, health condition, or similar.</p>
        <p>OFFENSIVE: Degrading, dehumanizing, or insulting an individual. Threatening with violent acts.</p>
        <p>PROFANITY: Unacceptable language in the absence of insults and abuse. This typically concerns the usage of swearwords (Scheiße, fuck, etc.) and cursing (Zur Hölle! Verdammt! etc.).</p>
        <p>OTHER: Normal content, statements, or anything else. If the utterances are considered to be "normal" and not offensive to anyone, they should not be labeled. This could be part of youth language or other language registers.</p>
        <p>We expect most posts to be OTHER, some to be HATE and the other two
categories to be less frequent.</p>
        <p>Dubious cases which are difficult to decide even for humans should be left out.</p>
      </sec>
      <sec id="sec-10-2">
        <title>Systems and Approaches at HASOC 2019</title>
        <p>The following tables summarize the approaches used by the teams. The last column has an entry when the team compared several approaches and clearly identified a best one. The first table shows the approaches which used technology without Deep Learning or for which a traditional approach performed best. The second table shows the approaches which used Deep Learning.</p>
        <p>Team (affiliation): text representation and classifier; best run (when applicable):
Amrita [<xref ref-type="bibr" rid="ref37">37</xref>] (Amrita Vishwa Vidyapeetham): CNN, LSTM, fastText.
LGI2P [<xref ref-type="bibr" rid="ref13">13</xref>] (Univ. Montpellier): fastText.
CIT Kokrajhar [<xref ref-type="bibr" rid="ref33">33</xref>] (University of Edmonton and CIT Kokrajhar): LSTM.
(Yunnan Univ.): stacked CNN.
RALIGRAPH [<xref ref-type="bibr" rid="ref19">19</xref>] (Univ. of Montreal): BERT, Graph CNN pre-trained with external Founta corpus; best: VGCN-BERT.
3Idiots [<xref ref-type="bibr" rid="ref21">21</xref>] (Univ. of Illinois and IIT Kanpur): BERT, all three tasks in one; best: BERT cased.
BRUMS [<xref ref-type="bibr" rid="ref29">29</xref>] (Univ. of Wolverhampton, Rochester Institute of Technology, and Birmingham City Univ.): several deep learning architectures including LSTM, GRU, attention, 2D convolution; light pre-processing; best: BERT.
KMI-Panlingua [<xref ref-type="bibr" rid="ref31">31</xref>] (Bhimrao Ambedkar Univ., Panlingua, and Charles Univ. Prague): BERT; char and word n-grams + SVM; best: BERT.
LSV-UdS [<xref ref-type="bibr" rid="ref10">10</xref>] (Saarland University): BERT; SVM; external collections; best: BERT.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Kaggle (2017): Toxic comment classification challenge: Identify and classify toxic online comments. https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Saroj, A., Pal, R.K.M., S.: IRLab@IITBHU at HASOC 2019: Traditional machine learning for hate speech and offensive content identification. In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December 2019)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baruah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbhuiya</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dey</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>IIITG-ADBU at HASOC 2019: Automated Hate Speech and Offensive Content Detection in English and Code-Mixed Hindi Text</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bashar</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nayak</surname>
          </string-name>
          , R.: QutNocturnalHASOC'19:
          <article-title>CNN for Hate Speech and Offensive Content Identification in Hindi Language</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardo</surname>
            ,
            <given-names>F.M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanguinetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>54</fpage>
          –
          <lpage>63</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bin</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yunxia Ding</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          : YNU Wb at HASOC 2019:
          <article-title>Ordered Neurons LSTM with Attention for Identifying Hate Speech and Offensive Language</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Felice</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poletto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanguinetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurizio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Overview of the evalita 2018 hate speech detection task</article-title>
          .
          <source>In: EVALITA 2018-Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</source>
          . vol.
          <volume>2263</volume>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          . CEUR
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Casavantes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y Gomez</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>UACh-INAOE at HASOC 2019: Detecting Aggressive Tweets by Incorporating Authors' Traits as Descriptors</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>Y.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuzmenko</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tekiroglu</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>CONAN – counter narratives through nichesourcing: a multilingual dataset of responses to fight online hate speech</article-title>
          . arXiv preprint arXiv:1910.03270
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Dana</given-names>
            <surname>Ruiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.R.</given-names>
            ,
            <surname>Klakow</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          : LSV-UdS at HASOC 2019:
          <article-title>The Problem of Defining Hate?</article-title>
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warmsley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Automated Hate Speech Detection and the Problem of Offensive Language</article-title>
          .
          <source>In: Proceedings of ICWSM</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Fortuna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nunes</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A Survey on Automatic Detection of Hate Speech in Text</article-title>
          .
          <source>ACM Computing Surveys (CSUR) 51(4)</source>
          ,
          <volume>85</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Jean-Christophe</given-names>
            <surname>Mensonides</surname>
          </string-name>
          ,
          <string-name>
            <surname>Pierre-Antoine Jean</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harispe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : IMT Mines Ales at HASOC 2019:
          <article-title>Automatic Hate Speech detection</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jhaver</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghoshal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruckman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilbert</surname>
          </string-name>
          , E.:
          <article-title>Online harassment and content moderation: The case of blocklists</article-title>
          .
          <source>ACM Transactions on Computer-Human Interaction (TOCHI) 25(2)</source>
          ,
          <volume>12</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>QMUL-NLP at HASOC 2019: Offensive Content Detection and Classification in Social Media</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kaushik Amar Das</surname>
            ,
            <given-names>F.A.B.</given-names>
          </string-name>
          : FalsePostive at HASOC 2019:
          <article-title>Transfer-Learning for Detection and Classification of Hate Speech</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Kirti</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.P.S.</given-names>
            :
            <surname>AI ML NIT</surname>
          </string-name>
          <article-title>Patna at HASOC 2019: Deep Learning Approach for Identification of Abusive Content</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kwok</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Locate the hate: Detecting Tweets Against Blacks</article-title>
          .
          <source>In: Twenty-Seventh AAAI Conference on Artificial Intelligence</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
          </string-name>
          , J.Y.: RALIGRAPH at HASOC 2019:
          <article-title>VGCN-BERT: Augmenting BERT with Graph Embedding for Offensive Language Detection</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : IIT Varanasi at HASOC 2019 :
          <article-title>Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : 3Idiots at HASOC 2019:
          <article-title>Fine-tuning Transformer Neural Networks for Hate Speech Identification in Indo-European Languages</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Mubarak</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darwish</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magdy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Abusive language detection on arabic social media</article-title>
          .
          <source>In: Proceedings of the First Workshop on Abusive Language Online</source>
          . pp.
          <fpage>52</fpage>
          –
          <lpage>56</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Mujadia</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          : IIIT-Hyderabad at HASOC 2019:
          <article-title>Hate Speech Detection</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Nayel</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shashirekha</surname>
            ,
            <given-names>H.L.</given-names>
          </string-name>
          :
          <article-title>DEEP at HASOC2019: A Machine Learning Framework for Hate Speech and Offensive Language Detection</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Nina-Alcocer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          : Vito at HASOC 2019:
          <article-title>Detecting Hate Speech and Offensive Content through Ensembles</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desai</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bisht</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          : DA Master at HASOC 2019:
          <article-title>Identification of Hate Speech using Machine Learning and Deep Learning approaches for social media post</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Pedro</surname>
            <given-names>Alonso</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.S.</given-names>
            ,
            <surname>Kovacs</surname>
          </string-name>
          , G.: TheNorth at HASOC 2019:
          <article-title>Hate Speech Detection in Social Media Data</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>R.</given-names>
            <surname>Rajalakshmi</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.R.</surname>
          </string-name>
          :
          DLRG@HASOC
          <year>2019</year>
          :
          <article-title>An Enhanced Ensemble Classifier for Hate and Offensive Content Identification</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Ranasinghe</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hettiarachchi</surname>
          </string-name>
          , H.: BRUMS at HASOC 2019:
          <article-title>Deep Learning Models for Multilingual Hate Speech and Offensive Language Identification</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Ritesh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.A.</given-names>
            ,
            <surname>Akshit</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Maheshwari</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Aggression-annotated corpus of hindi-english code-mixed data</article-title>
          .
          <source>In: Proceedings of the 11th Language Resources and Evaluation Conference (LREC)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>11</lpage>
          . Miyazaki, Japan
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>Ritesh</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.K.O.:</surname>
          </string-name>
          KMI-Panlingua at HASOC
          <year>2019</year>
          :
          <article-title>SVM vs BERT for Hate Speech and Offensive Content Detection</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Ross</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Carbonell, G.,
          <string-name>
            <surname>Cabrera</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurowsky</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojatzki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis</article-title>
          .
          <source>In: Proceedings of the Workshop on Natural Language Processing for Computer-Mediated Communication (NLP4CMC)</source>
          . Bochum, Germany
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>B.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senapati</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>CIT Kokrajhar Team: LSTM based Deep RNN Architecture for Hate Speech and Offensive Content (HASOC) Identification in Indo-European Languages</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathew</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          : HateMonitors at HASOC 2019:
          <article-title>Language Agnostic Online Abuse Detection</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Salminen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almerekhi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamel</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jung</surname>
            ,
            <given-names>S.g.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jansen</surname>
            ,
            <given-names>B.J.:</given-names>
          </string-name>
          <article-title>Online hate ratings vary by extremes: A statistical analysis</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval</source>
          . pp.
          <fpage>213</fpage>
          –
          <lpage>217</lpage>
          . ACM
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A Survey on Hate Speech Detection Using Natural Language Processing</article-title>
          .
          <source>In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>10</lpage>
          . Valencia, Spain
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Sreelakshmi</surname>
          </string-name>
          .K, P., K.P, S.: AmritaCEN at HASOC 2019:
          <article-title>Hate Speech Detection in Roman and Devanagiri Scripted Text</article-title>
          . In:
          <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Struß</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klenner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of GermEval task 2, 2019 shared task on the identification of offensive language (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Tulkens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hilte</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lodewyckx</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>The automated detection of racist discourse in Dutch social media</article-title>
          .
          <source>Computational Linguistics in the Netherlands Journal</source>
          <volume>6</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Tulkens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hilte</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lodewyckx</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhoeven</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>A dictionary-based approach to racism detection in Dutch social media</article-title>
          .
          <source>arXiv preprint arXiv:1608.08738</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          , A.D.,
          <string-name>
            <surname>Bhattacharyya</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : IIT Bombay at HASOC 2019:
          <article-title>Supervised Hate Speech and Offensive Content Detection in Indo-European Languages</article-title>
          . In:
          <article-title>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation (December</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bumann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Challenges in annotating a corpus for automatic hate speech detection</article-title>
          . In: BOBCATSSS Paris, January (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinbauer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Detection of abusive language: the problem of biased datasets</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>602</fpage>
          -
          <lpage>608</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruppenhofer</surname>
          </string-name>
          , J.:
          <article-title>Overview of the GermEval 2018 shared task on the identification of offensive language</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          , R.:
          <article-title>Predicting the Type and Target of Offensive Posts in Social Media</article-title>
          .
          <source>In: Proceedings of NAACL</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          , R.:
          <article-title>Predicting the type and target of offensive posts in social media</article-title>
          .
          <source>arXiv preprint arXiv:1902.09666</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          , R.:
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>
          .
          <source>arXiv preprint arXiv:1903.08983</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>