<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploiting Contextualized Word Representations to Profile Haters on Twitter Notebook for PAN at CLEF 2021</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tanise</forename><surname>Ceron</surname></persName>
							<email>taniseceron@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Trento</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Camilla</forename><surname>Casula</surname></persName>
							<email>ccasula@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Trento</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploiting Contextualized Word Representations to Profile Haters on Twitter Notebook for PAN at CLEF 2021</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">85B6B55B40BDD50E64201BAA87A56E9D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>BERT</term>
					<term>word embeddings</term>
					<term>hate speech</term>
					<term>statistical feature extraction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we present our submission to the Profiling Haters on Twitter shared task at PAN@CLEF2021. The task aims at analyzing Twitter feeds of users in two languages, English and Spanish, in order to determine whether these users spread hate speech on social media. For English, we propose an approach which exploits contextualized word embeddings and a statistical feature extraction method, in order to find words which are used in different contexts by haters and non-haters, and we use these words as features to train a classifier. For Spanish, on the other hand, we take advantage of BERT sequence representations, using the average of the sequence representations of all tweets from a user as a feature to train a model for classifying users into haters and non-haters.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The rise of social media in the past decade has undoubtedly changed interaction among people and made the world more inter-connected. It has provided a way for people to stay in constant contact even when far apart geographically, united people who have not seen each other for years or who had never met before, helped numerous volunteer associations gather aid or recruit more volunteers, provided a place for entire communities with common interests to interact with one another and share content, resources and ideas, and the list of benefits goes on. However, on the flip side of the coin, the growing amount of user-generated content online is tied to an increased presence of hateful content on social media. Content moderation online is therefore important to identify and limit the spread of hate speech.</p><p>The Profiling Haters on Twitter task <ref type="bibr" target="#b0">[1]</ref> at PAN 2021 <ref type="bibr" target="#b1">[2]</ref> aims at determining whether a user spreads hate speech based on their Twitter feed. This shared task tackles the problem of identifying hate spreaders from a multilingual perspective, including Twitter feeds in English and Spanish.</p><p>In this paper, we present our submission to the Profiling Haters on Twitter shared task, which consists of two different approaches. First, we propose a novel approach to hate speech detection for the English data set, which derives from the assumption that certain words are used in different contexts by haters as opposed to non-haters. The idea is to exploit statistical feature selection techniques in order to find the words whose embedding vectors extracted from BERT differ the most between classes, and then use these words as features to train a classifier. 
The Spanish model, on the other hand, is inspired by text classification models, as it allows us to tackle the challenge of having a single representation for long sequences. Therefore, we build the features as though all tweets of a given user were a unique text, without losing information from any tweet. In order to do this, we use a Spanish pre-trained version of BERT for extracting a single vector representation of each tweet by a user. These representations are then averaged and fed into a classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The present work follows the definition of hate speech as described in the overview of this edition's shared task <ref type="bibr" target="#b0">[1]</ref> and formulated by Nockleby <ref type="bibr" target="#b2">[3]</ref>: "any communication that disparages a person or a group on the basis of some characteristic, such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or others".</p><p>Most studies carried out on hate speech within natural language processing (NLP) so far have focused on the detection of hate speech in single messages. The singularity of this shared task lies in the fact that, instead, it focuses on the quite novel problem of classifying users who disseminate hateful messages (haters) versus users who do not spread any type of hateful message (non-haters) on Twitter. To the best of our knowledge, a similar task has been proposed only once <ref type="bibr" target="#b3">[4]</ref>. However, that work is developed differently, given that the features of their model are based on the interaction among users and network metrics rather than on linguistic features, as proposed in the present work. User information has been used to boost the performance of hate speech detection in messages in other works as well <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>In the past years, many approaches have been proposed for the detection of hate speech in single messages extracted from various social media channels, such as Twitter <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>, Reddit <ref type="bibr" target="#b7">[8]</ref>, and YouTube <ref type="bibr" target="#b8">[9]</ref>. 
A number of shared tasks have been organized on the topic, both from a monolingual <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13</ref>] and a multilingual perspective <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>. They vary from more linear machine learning approaches with Naive Bayes <ref type="bibr" target="#b8">[9]</ref>, Logistic Regression <ref type="bibr" target="#b6">[7]</ref>, Support Vector Machine <ref type="bibr" target="#b15">[16]</ref> to non-linear approaches fed with features from non-contextualized word embeddings <ref type="bibr" target="#b16">[17]</ref> and the latest deep learning models consisting of contextualized word vector representations as features <ref type="bibr" target="#b17">[18]</ref>.</p><p>As in many other NLP and, more in general, supervised learning methods, feature selection is one of the most crucial parts of the task. In addition to this, the task of hate speech detection is particularly complex because messages can involve sarcasm, irony and neutral sentiments that are challenging for NLP systems to identify. In early models, as Schmidt and Wiegand <ref type="bibr" target="#b18">[19]</ref> put it, simpler surface-level features such as n-gram and character n-grams have been implemented <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. In addition to that, linguistic and lexical features have also been employed for this task, the former with the addition of part-of-speech or dependency information <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref> and the latter with terms that are related to hatred against a certain community or general profanities <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b19">20]</ref>. 
Yet other models have made use of features reliant on other common NLP tasks, such as sentiment analysis <ref type="bibr" target="#b22">[23]</ref>.</p><p>Nobata et al. <ref type="bibr" target="#b19">[20]</ref> experiment with features derived from static word embeddings on annotated Yahoo! comment data in three ways. Two of them consist of averaging the vector representations of all words in a comment, derived from two types of word embeddings: a pre-trained model and a word2vec model, both with 200 dimensions. Their third approach is based on paragraph embeddings <ref type="bibr" target="#b23">[24]</ref>, following the work of Djuric et al. <ref type="bibr" target="#b24">[25]</ref>, who use the same approach for abusive language detection. In this case, every word of the comment is mapped to a matrix representing words, and every comment is mapped to a vector in a matrix of comments. Finally, the word and comment vectors are concatenated, forming a single representation of the comment. Besides the distributional semantic features, character and token n-grams, linguistic features (such as length of the comment in tokens, average length of words, number of punctuation marks and so on), and syntactic features, namely part-of-speech and dependency parsing relations, are also included in the model. The combination of all these features yields better results than the use of the paragraph2vec technique alone <ref type="bibr" target="#b24">[25]</ref>.</p><p>The latest classifiers for hate speech detection take advantage of models such as BERT, RoBERTa and other large multilingual language models <ref type="bibr" target="#b14">[15]</ref>. They usually feed the sequence vector representation, the [CLS] token in BERT, into more recent deep learning architectures such as convolutional neural networks, recurrent neural networks and gated recurrent units, reaching very impressive results. 
In the last SemEval task on the detection of offensive language, the best team reached an F1 score of 0.9204, and the other teams mostly reached very similar performance in a tight competition.</p><p>Our model builds on this line of features because of its potential to capture meaning beyond a restricted list of words, and because of the great number of successful NLP applications based on non-contextualized vector representations of words, for instance GloVe <ref type="bibr" target="#b25">[26]</ref> and word2vec <ref type="bibr" target="#b26">[27]</ref>, and more recently on contextualized representations of text with Deep Bidirectional Transformers such as BERT <ref type="bibr" target="#b27">[28]</ref>. The development of language models based on transformer mechanisms is an important milestone in the advancement of NLP, given that it has improved the state of the art of many well-established NLP tasks. One of their greatest advantages is the capacity to encompass the representation of a text in a single vector. Secondly, the vector representation of each word is dynamic and contextualized, meaning that it has the potential to adapt the embedding of a word according to its context. Whereas our Spanish model benefits from the former advantage, the English model uses the latter in its favor.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head><p>Both the English and Spanish training sets are balanced, consisting of 100 haters and 100 non-haters. The dataset provided by the task organizers contains 200 tweets and a ground truth label for each user. In both datasets, user mentions, URLs, and hashtags have been replaced with placeholder tags such as #HASHTAG#. We remove all hash marks (#) while keeping the accompanying words (USER, URL, and HASHTAG).</p><p>We use different models to perform the task on English and on Spanish data. Both models exploit contextualized word representations and implement a support vector machine for the task of binary classification.</p><p>Both models were tested by the task organizers using the TIRA tool <ref type="bibr" target="#b28">[29]</ref>.</p></div>
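The hash-mark clean-up described above can be sketched as follows. This is a minimal illustration, assuming the placeholder tags take the forms #USER#, #URL#, and #HASHTAG#:

```python
import re

def preprocess(tweet: str) -> str:
    """Strip the hash marks from the organizers' placeholder tags
    while keeping the accompanying words (USER, URL, HASHTAG)."""
    return re.sub(r"#(USER|URL|HASHTAG)#", r"\1", tweet)

print(preprocess("Check this out #URL# cc #USER#"))  # → Check this out URL cc USER
```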
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">English</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Features exploration</head><p>The idea underlying our English model is that haters and non-haters might use certain words in different contexts. Example (1) below shows a tweet found in the hater class which mentions the word gun in an aggressive circumstance, whereas tweet (2) illustrates an instance from the non-hater class which mentions the same word in a more political context. Thus, the feature selection for the model involves the identification of words, and in this case of BERT embeddings, that significantly vary from one class to another.</p><p>1. This money can't fit in my pockets but I bet that gun fit.</p><p>2. New state laws for the new year: California limits gun rights, minimum wages increase #url. . . #url.</p><p>To verify whether there are significant differences in the vector representations of words between the two classes, we first carry out a coarse exploratory analysis with t-SNE <ref type="bibr" target="#b29">[30]</ref>, a dimensionality reduction technique that can reduce the space to two dimensions so that it can be plotted and interpreted.</p><p>We first make a list of the most frequent tokens by selecting the ones that occur at least 25 times in both classes. In total there are 788 of them. Note that they are BERT WordPiece tokens <ref type="bibr" target="#b30">[31]</ref> taken from BertTokenizer <ref type="bibr" target="#b31">[32]</ref> <ref type="foot" target="#foot_0">1</ref>, meaning that tokens not corresponding to complete words are also included in the list. However, even though these words are not complete, they should still have a rich contextualized vector representation, considering that BERT is able to distinguish the different contexts of split words as well <ref type="bibr" target="#b32">[33]</ref>. 
Throughout the whole experiment, we use the uncased base version of BERT <ref type="bibr" target="#b33">[34]</ref>.</p><p>For the t-SNE analysis, we feed each tweet of a given user_j of the hater class into the BERT model and retrieve the vector representation of a given token (t_i) present in the most-frequent list. Then, we average all the vectors of t_i of user_j. More formally, let t_i be a token that occurs in the tweets {tw_1, tw_2, ..., tw_N} of a given user_j. The vector representation E of t_i for user_j is:</p><formula xml:id="formula_0">E_user_j[t_i] = (1/N) · Σ_{n=1..N} tw_n[t_i] <label>(1)</label></formula><p>where N is the number of occurrences of t_i in all tweets by user_j, and tw_n[t_i] is the vector of t_i in tw_n. We then repeat the same procedure for the non-hater class and reduce the dimensions. For example, E_user[gun], which is a matrix of [N x 768], is reduced to a matrix with 2 components [N x 2]. Some of the results can be seen in Figure <ref type="figure" target="#fig_1">1</ref>, where each dot represents one E_user_j[t_i].</p><p>This coarse evaluation shows that some tokens form well-defined clusters between the two classes, such as happy (Figure <ref type="figure" target="#fig_1">1a</ref>) and world (Figure <ref type="figure" target="#fig_1">1b</ref>). In contrast, other words like amazing (Figure <ref type="figure" target="#fig_1">1c</ref>) and indeed even the word gun (Figure <ref type="figure" target="#fig_1">1d</ref>) are sprawling and occupy overlapping spaces in both classes, suggesting that they do not have distinguishing vector representations.</p><p>Given the results of this coarse analysis with t-SNE, and considering that the reduction of vectors from 768 to 2 dimensions may cause them to lose a large amount of relevant information, we turn to a more statistical approach to select the words for our model.  
Instead of using predefined term lists, we employ a filter approach to select the features (in our case, the words) whose word embeddings diverge the most between the two classes. This technique requires two steps. First, a statistical test measures the difference between the vector representations of each token in the two classes and returns a p-value for that difference. Then, a p-value threshold is chosen in order to pick the k most relevant features/tokens. In this study, we analysed the difference between vectors using the Kolmogorov-Smirnov (K-S) test. Biesiada and Duch <ref type="bibr" target="#b34">[35]</ref> suggest that the K-S test helps with feature selection in high-dimensional data and can significantly improve the performance of classifiers such as the one used here (SVM). The K-S test measures the maximum difference between the cumulative distributions of two random variables. Therefore, we assume that the more dissimilar the vectors are, as determined by a two-tailed K-S test, the easier it is for the classifier to distinguish between the classes.</p><p>To start with, we retrieve the same vector representation for each user presented in Equation <ref type="formula" target="#formula_0">1</ref>. After that, considering that in this case we want a single representation of a token t_i for each class, we average the E_user[t_i] of all users to get the final representation of t_i, as in:</p><formula xml:id="formula_2">E_ks[t_i] = (1/N) · Σ_{n=1..N} E_user_n[t_i] <label>(2)</label></formula><p>In this case, N is the number of users that have at least one occurrence of the given t_i, and we call the vector E_ks because it is used for the statistical test. At this point we have two dictionaries, one for each class, with t_i as key and its corresponding E_ks[t_i] as value, and we are ready to run the K-S test. 
For example, we take E_ks[gun] from the hater class as variable x and E_ks[gun] from the non-hater class as variable y, and run the two-tailed test K-S_two-tail(x, y).</p><p>For most tokens, the K-S test returns p-values very close to 1, since the two class representations are drawn from very similar distributions. However, this is not a problem in our case, because we do not want to know whether the differences are statistically significant according to a confidence interval. The goal is instead to find out which tokens have lower p-values than the others, meaning that their E_ks[t_i] are more dissimilar between classes.</p></div>
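The K-S filtering step can be sketched with SciPy's `ks_2samp`. In this minimal illustration, random vectors stand in for the real class-level embeddings of Equation 2 (the token names and the shift applied to "happy" are synthetic, chosen only to make one token more dissimilar than the other):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Stand-ins for the class-level token embeddings of Equation 2: one 768-d
# vector per token and class (real vectors would be averaged BERT outputs).
E_ks_hater = {"gun": rng.normal(0.0, 1.0, 768), "happy": rng.normal(0.0, 1.0, 768)}
E_ks_nonhater = {"gun": rng.normal(0.0, 1.0, 768), "happy": rng.normal(0.6, 1.0, 768)}

# Two-tailed K-S test per token: a lower p-value means the two class
# representations are more dissimilar.
p_values = {t: ks_2samp(E_ks_hater[t], E_ks_nonhater[t]).pvalue for t in E_ks_hater}

# Rank tokens by dissimilarity and keep those under a chosen threshold.
threshold = 0.998
selected = sorted(t for t, p in p_values.items() if p < threshold)
```

With these synthetic vectors, "happy" (whose class distributions differ) receives a far lower p-value than "gun" and is retained as a feature.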
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Model implementation</head><p>Now that we have a p-value for each token in our most-frequent list, we must decide on a threshold that selects the relevant features to be fed into the classifier. We do this by running the model several times with different thresholds. That is, we pick a p-value and compute a single vector representation per user, the average of all E_user vectors of the tokens t under the threshold of the K-S test, as in:</p><formula xml:id="formula_3">E_feat[user_j] = (1/N) · Σ_{n=1..N} E_user_j[t_n], for the N tokens t_n with p-value &lt; threshold <label>(3)</label></formula><p>The embedding is called feat because it is the feature representation of each user. E_feat[user_j] is fed to the classifier. The performance is evaluated in terms of accuracy with 5-fold cross-validation. We finally choose the set of features that results in the model's best performance. The threshold selected in our submission is 0.998, and the resulting set consists of 394 tokens/features for the model. As a matter of fact, some of them can be very relevant semantically in the context of hate speech detection, such as war, liberal, black, woman, violence, racism, bitch and so forth (all the tokens are presented in Appendix A - List of wordpieces used in the English model).</p><p>After selecting the features, we also try different layers of representations from BERT's outputs, given that it has been observed that each of the 12 layers captures different features of the input text <ref type="bibr" target="#b35">[36]</ref>. More precisely, we experiment with the last three layers, because they seem to be the ones that encapsulate the most context-specific representations <ref type="bibr" target="#b36">[37]</ref>. Hence, we run the classifier with feature vector representations from the 10th, 11th and 12th layers to see which performs best. 
The 12th layer produces slightly better results, even though the difference in performance between layers is not statistically significant.</p><p>As a final step, we add the averaged CLS tokens of each tweet to the features, because we notice that, even though the set of tokens is large, some users in the test set do not use any of those tokens. Again, we run tests to see which layer of the CLS token is most advantageous to the model, and choose the 12th layer. The experiments are conducted with two kernels of the support vector machine, the radial basis function (rbf) and the polynomial kernel. We use Bayesian optimization <ref type="bibr" target="#b37">[38]</ref> to find the best hyper-parameters, which spares time and computational power compared to the traditional grid search approach. The best performing model is the rbf one (C≈14.0749, gamma≈0.0095).</p></div>
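The classification step can be sketched with scikit-learn, using the rbf hyper-parameters reported above. The user features here are random stand-ins for E_feat (real vectors would average the selected token embeddings plus the mean CLS vector from BERT); the 0.3 class shift is artificial, added only so the two classes are separable:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for E_feat[user_j]: one 768-d vector per user.
X = rng.normal(size=(200, 768))
X[:100] += 0.3  # give the "hater" half a small artificial shift
y = np.array([1] * 100 + [0] * 100)

# rbf kernel with roughly the hyper-parameters found by Bayesian
# optimization (C ≈ 14.0749, gamma ≈ 0.0095), scored with the same
# 5-fold cross-validation used during feature selection.
clf = SVC(kernel="rbf", C=14.0749, gamma=0.0095)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```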
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Spanish</head><p>During the experimental phase, we tested the same approach used on the English data set on Spanish as well. However, in our final submission we opted for a simpler model, which in our experiments worked better on the Spanish data.</p><p>This model follows a more straightforward approach inspired by text classification <ref type="bibr" target="#b35">[36]</ref> with BERT representations: all tweets of each user are treated like one long text. Given that the 200 tweets together exceed BERT's 512-token limit, as is often the case in text classification, we obtain a single representation of the whole text by averaging the sequence representation of every tweet of the user, without losing information from any tweet. For this, we use the uncased, base pre-trained Spanish version of BERT called BETO<ref type="foot" target="#foot_1">2</ref> <ref type="bibr" target="#b38">[39]</ref> throughout the training and testing of the Spanish data set.</p><p>After pre-processing, every tweet {tw_1, tw_2, ..., tw_N} of a given user is fed into BETO. Then, we extract the vector representation of the CLS token, which encapsulates a single representation of the whole sequence. Lastly, we average these vectors to create the feature representation of each user, as in:</p><formula xml:id="formula_4">E_feat[user_j] = (1/N) · Σ_{n=1..N} tw_n[cls_token] <label>(4)</label></formula><p>where N is always 200, given that every user has this fixed number of tweets. E_feat is fed into the support vector machine.</p><p>We also experimented with summing the CLS tokens instead of averaging them and found that the results are very similar, which is expected: since N is fixed at 200, the sum is simply a rescaled average, so there is no confound with the frequency of tokens. 
As with the English model, we test the last three layers of the CLS token with the rbf and polynomial kernels using Bayesian optimization. The 11th layer trained with the polynomial kernel (C≈7.3588, gamma≈0.0285, degree≈1.2859) and the 10th layer trained with rbf give the best results in the 5-fold cross-validation, so we submitted both for the shared task; they returned the same labels for the test set.</p></div>
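The Spanish feature construction of Equation 4 can be sketched as follows, with random vectors standing in for BETO's [CLS] outputs; the block also verifies the sum-versus-average observation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for BETO's [CLS] vectors: one 768-d vector per tweet,
# with the fixed 200 tweets per user of this dataset.
cls_vectors = rng.normal(size=(200, 768))

# Equation 4: the user-level feature is the mean over all 200 CLS vectors.
E_feat = cls_vectors.mean(axis=0)

# Since N is fixed at 200, summing instead of averaging only rescales
# the feature vector, which is why both give very similar results.
assert np.allclose(cls_vectors.sum(axis=0), 200 * E_feat)
```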
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>The 5-fold cross-validation results for the English model. Table <ref type="table">1</ref> shows that the best accuracy on the English training set is reached by the model with fewer features (210 tokens) compared to the second-best model. However, a paired t-test on the results of the 5-fold CV showed that the first and second models are not statistically significantly different on the training set (p-value=0.1998). The results on the test set, on the other hand, differ considerably, with the 394-token model reaching an accuracy 4 points higher. One reason for the difference in classification of the test set is that a broader range of tokens included in the features can enhance the performance on unseen data.</p><p>As an alternative to averaging, we try summing the vectors, based on the idea that the frequency with which words occur in the tweets may help the classifier discriminate better between classes. However, despite indeed performing better on the training set overall (accuracy 7% higher than our final submission model), the classification of the test set was overwhelmingly imbalanced, with 97 out of 100 users labeled as haters, and the accuracy was also very uneven across the 5 folds of cross-validation, showing that it did not generalize well in all folds. It suggests, though, that the classifier learns from frequency, and that a similar number of occurrences would be needed for a reasonable performance.</p><p>As for the Spanish model, even though we chose the model that extracts the CLS token representation from the 11th layer, the three models actually perform similarly in terms of accuracy, as seen in Table <ref type="table" target="#tab_2">2</ref>. The first and second models even returned the same labels for the classification. 
Moreover, we attempted to apply the same approach we used for the English data set, but the results on the training set drop considerably, reaching lower performance than the CLS token approach. One hypothesis for the difference in results could be related to the corpora on which the pre-trained language models were trained. It might be that the language used in the Spanish tweets is more similar to the language of the corpora used to train BETO than the English data set is to BERT's training corpora, which would help encompass the meaning and context of the dataset more easily, therefore prompting better results. Nonetheless, it is difficult to know precisely why they perform so differently, because of the lack of interpretability of these large language models.</p><p>In terms of layer selection, we observed that both models perform quite similarly when trained on each of the last three layers, suggesting that these layers have very similar representations of tokens.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>We present two novel approaches to profiling haters on Twitter. The English approach relies on the idea that a set of words can be used in different contexts by hater and non-hater users. The state-of-the-art language model BERT is adopted to capture the contextualized embeddings of tokens. Then, the difference between the vector representations of the two classes is measured through the K-S statistical test. Finally, the relevant features are chosen by feeding sets of tokens selected at different thresholds of the test into the support vector machine.</p><p>In contrast, we have seen that the same approach does not work as well for the Spanish model. Therefore, inspired by text classification methods, we use the averaged vector representation of the CLS tokens of every tweet of each user as input for the support vector machine. Despite being a simpler model, it yields impressive results considering the amount of training data available.</p><p>As a future step, it would be interesting to test the same models with more training data to check whether it boosts their performance, and perhaps to replace the SVM with a deep learning model. In addition, in this shared task we only process textual information, but in a real scenario other features related to metadata could be included to obtain more informative and characteristic features, which may improve classification. Lastly, regarding the English model, other statistical tests could also be explored in order to select better features for the model. 
Alternatively, after the statistical test, the model could be trained iteratively with an ablation technique to select the best performing features among those already selected by the threshold.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: t-SNE applied on words as a coarse analysis to check the difference in the semantic space of the hater and non-hater classes. t-SNE has a perplexity of 15, 2 components and 3500 iterations.</figDesc><graphic coords="5,89.29,210.83,187.50,85.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Kernel Layer Feature (K-S test) Additional feature Acc. (train set) Acc. (test set)</head><label></label><figDesc>Our final submission model is the underlined model.</figDesc><table><row><cell>rbf</cell><cell>11th</cell><cell>210 tokens</cell><cell>CLS token</cell><cell>74%</cell><cell>69%</cell></row><row><cell>rbf</cell><cell>12th</cell><cell>394 tokens</cell><cell>CLS token</cell><cell>72%</cell><cell>73%</cell></row><row><cell>rbf</cell><cell>10th</cell><cell>210 tokens</cell><cell>CLS token</cell><cell>71%</cell><cell>-</cell></row><row><cell>rbf</cell><cell>10th</cell><cell>517 tokens</cell><cell>CLS token</cell><cell>70.5%</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>The 5-fold cross-validation results for the Spanish model. Our final submission model is the underlined model.</figDesc><table><row><cell cols="2">Kernel Layer</cell><cell>Feature</cell><cell cols="2">Acc. (training set) Acc. (test set)</cell></row><row><cell>poly</cell><cell>11th</cell><cell>CLS token</cell><cell>84%</cell><cell>80%</cell></row><row><cell>rbf</cell><cell>10th</cell><cell>CLS token</cell><cell>84%</cell><cell>80%</cell></row><row><cell>rbf</cell><cell>12th</cell><cell>CLS token</cell><cell>82%</cell><cell>-</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/transformers/main_classes/tokenizer.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/dccuchile/beto</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Appendix A -List of wordpieces used in the English model cls_token, act, actually, ad, age, ah, air, al, almost, also, always, americans, amp, anti, around, attack, automatically, away, b, behind, better, big, bitch, black, block, body, border, boy, br, break, bro, buy, ca, california, call, cannot, car, care, change, check, checked, children, christmas, city, class, close, cnn, co, con, congress, control, cr, crazy, crime, cu, cut, da, days, de, dem, democrat, democrats, deserve, di, die, different, donald, drop, dude, dumb, e, election, else, em, end, energy, even, evidence, ex, f, fact, facts, fake, family, far, fast, feeling, fight, find, fine, folks, following, food, forever, fox, fr, friend, friends, full, g, ga, gas, gave, george, get, getting, give, glad, gone, great, guy, guys, h, ha, hair, half, hands, happen, happy, hard, hash, hell, help, high, history, hit, ho, hold, hot, hu, human, id, idea, im, imagine, imp, important, ins, interesting, j, k, keep, kid, kids, kind, last, late, law, less, let, liberal, listen, live, lives, living, lo, longer, look, lord, lot, mad, made, make, makes, mark, mask, matter, may, men, military, mine, minutes, miss, mom, month, months, mother, move, mr, ms, mu, n, ne, never, ni, night, obama, ok, okay, old, om, ones, open, order, others, p, paid, pan, past, pe, per, place, plan, play, playing, please, po, point, poor, post, power, pp, pray, press, put, question, r, race, racism, racist, rape, rather, read, ready, reason, red, republican, rest, right, room, sad, safe, save, say, saying, sc, second, see, seems, seen, self, send, sense, share, shit, shot, show, shut, side, sign, single, sit, sm, smoke, social, someone, song, soon, speak, special, stand, start, state, stay, step, stop, story, straight, strong, stuff, suck, sure, system, take, taking, team, test, thanks, thing, things, three, ti, time, times, took, top, tried, trip, try, trying, twitter, type, un, understand, user, 
using, va, vaccine, via, vibe, video, violence, vote, voted, voting, vs, wall, want, wanted, war, water, wear, wearing, whatever, white, whole, win, wish, wit, woman, wonder, word, world, worse, worst, would, wrong, x, ye, yeah, year, yes, yesterday, yet, yo, youtube, ##aa, ##al, ##c, ##ce, ##ck, ##d, ##e, ##ea, ##er, ##es, ##f, ##fs, ##ful, ##gga, ##h, ##ha, ##i, ##ie, ##ies, ##in, ##k, ##llo, ##n, ##na, ##o, ##ot, ##p, ##r, ##rs, ##ss, ##t, ##tf, ##ting, ##v, ##w, ##wed, ##wee, ##x, ##y, ##z, 0, 000, 1, 19, 20, 2021, 3, 30, 4, 5, 50, 6, 7, 8, 9</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Profiling Hate Speech Spreaders on Twitter Task at PAN 2021</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L D L P</forename><surname>Sarracén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2021 Labs and Workshops</title>
		<title level="s">Notebook Papers</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L D L P</forename><surname>Sarracén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wolska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">12th International Conference of the CLEF Association (CLEF 2021)</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Hate Speech</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Nockleby</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronic Journal of Academic and Special Librarianship</title>
		<editor>Margaret Brown-Sica and Jeffrey Beall</editor>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2000">Summer 2008. 2000</date>
			<publisher>Macmillan Reference</publisher>
		</imprint>
	</monogr>
	<note>Entry in the Encyclopedia of the American Constitution (Macmillan Reference US); cited in &quot;Library 2.0 and the Problem of Hate Speech&quot;</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Characterizing and detecting hateful users on twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Calais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Almeida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Meira</surname><genName>Jr</genName></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Author profiling for abuse detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Del Tredici</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Shutova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th international conference on computational linguistics</title>
				<meeting>the 27th international conference on computational linguistics</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1088" to="1098" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hateful symbols or hateful people? predictive features for hate speech detection on twitter</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Waseem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the NAACL student research workshop</title>
				<meeting>the NAACL student research workshop</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="88" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Automated hate speech detection and the problem of offensive language</title>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Macy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">11</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The effect of extremist violence on hateful speech online</title>
		<author>
			<persName><forename type="first">A</forename><surname>Olteanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Castillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Boy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Varshney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making</title>
		<author>
			<persName><forename type="first">P</forename><surname>Burnap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Williams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Policy &amp; Internet</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="223" to="242" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Farra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Workshop on Semantic Evaluation</title>
				<meeting>the 13th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="75" to="86" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of the EVALITA 2018 hate speech detection task</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Poletto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tesconi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EVALITA@CLiC-it</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 hate speech detection task</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Comandini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Di Nuovo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Frenda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stranisci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Russo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Siegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ruppenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Klenner</surname></persName>
		</author>
		<title level="m">Overview of GermEval Task 2, 2019 shared task on the identification of offensive language</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</title>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Rangel Pardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/S19-2007</idno>
		<ptr target="https://www.aclweb.org/anthology/S19-2007" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics</title>
				<meeting>the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics<address><addrLine>Minneapolis, Minnesota, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="54" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Atanasova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Karadzhov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mubarak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pitenis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Çöltekin</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/2020.semeval-1.188" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics</title>
				<meeting>the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1425" to="1447" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Challenges in discriminating profanity from hate speech</title>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Experimental &amp; Theoretical Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="187" to="202" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Shutova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.00378</idno>
		<title level="m">Neural character-based composition models for abuse detection</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A BERT-based transfer learning approach for hate speech detection in online social media</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mozafari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Farahbakhsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Crespi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Complex Networks and Their Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="928" to="940" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A survey on hate speech detection using natural language processing</title>
		<author>
			<persName><forename type="first">A</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth international workshop on natural language processing for social media</title>
				<meeting>the fifth international workshop on natural language processing for social media</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Abusive language detection in online user content</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nobata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mehdad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th international conference on world wide web</title>
				<meeting>the 25th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="145" to="153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Learning from bullying traces in social media</title>
		<author>
			<persName><forename type="first">J.-M</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-S</forename><surname>Jun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bellmore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies</title>
				<meeting>the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="656" to="666" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Detecting offensive language in social media to protect adolescent online safety</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="71" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A lexicon-based approach for hate speech detection</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Gitari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zuping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Damien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Long</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Multimedia and Ubiquitous Engineering</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="215" to="230" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Hate speech detection with comment embeddings</title>
		<author>
			<persName><forename type="first">N</forename><surname>Djuric</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Grbovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Radosavljevic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bhamidipati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th international conference on world wide web</title>
				<meeting>the 24th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="29" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<title level="m">Efficient estimation of word representations in vector space</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">TIRA Integrated Research Architecture</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gollub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-22948-1_5</idno>
	</analytic>
	<monogr>
		<title level="m">Information Retrieval Evaluation in a Changing World, The Information Retrieval Series</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Peters</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Visualizing data using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Norouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krikun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Klingner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Łukasz</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gouws</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kazawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kurian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Patil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Riesa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rudnick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hughes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno>CoRR abs/1609.08144</idno>
		<ptr target="http://arxiv.org/abs/1609.08144" />
		<title level="m">Google&apos;s neural machine translation system: Bridging the gap between human and machine translation</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ethayarajh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2104.08465</idno>
		<title level="m">Frequency-based distortions in contextualized word embeddings</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Turc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.08962v2</idno>
		<title level="m">Well-read students learn better: On the importance of pre-training compact models</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Feature selection for high-dimensional data: A Kolmogorov-Smirnov correlation-based filter</title>
		<author>
			<persName><forename type="first">J</forename><surname>Biesiada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Duch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Recognition Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="95" to="103" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">How to fine-tune BERT for text classification?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">China National Conference on Chinese Computational Linguistics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="194" to="206" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Ethayarajh</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.00512</idno>
		<title level="m">How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Nogueira</surname></persName>
		</author>
		<ptr target="https://github.com/fmfn/BayesianOptimization" />
		<title level="m">Bayesian Optimization: Open source constrained global optimization tool for Python</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Spanish pre-trained BERT model and evaluation data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cañete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chaperon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fuentes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">PML4DC at ICLR</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
