<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploiting Contextualized Word Representations to Profile Haters on Twitter Notebook for PAN at CLEF 2021</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tanise</forename><surname>Ceron</surname></persName>
							<email>taniseceron@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Trento</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Camilla</forename><surname>Casula</surname></persName>
							<email>ccasula@fbk.eu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Trento</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fondazione Bruno Kessler</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploiting Contextualized Word Representations to Profile Haters on Twitter Notebook for PAN at CLEF 2021</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">85B6B55B40BDD50E64201BAA87A56E9D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>BERT</term>
					<term>word embeddings</term>
					<term>hate speech</term>
					<term>statistical feature extraction</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we present our submission to the Profiling Haters on Twitter shared task at PAN@CLEF2021. The task aims at analyzing Twitter feeds of users in two languages, English and Spanish, in order to determine whether these users spread hate speech on social media. For English, we propose an approach which exploits contextualized word embeddings and a statistical feature extraction method, in order to find words which are used in different contexts by haters and non-haters, and we use these words as features to train a classifier. For Spanish, on the other hand, we take advantage of BERT sequence representations, using the average of the sequence representations of all tweets from a user as a feature to train a model for classifying users into haters and non-haters.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The rise of social media in the past decade has undoubtedly changed interaction among people and made the world more inter-connected. It has provided a way for people to stay in constant contact even when far apart geographically, united people who have not seen each other for years or who had never met before, helped numerous volunteer associations gather aid or recruit more volunteers, provided a place for entire communities with common interests to interact with one another and share content, resources and ideas, and the list of benefits goes on. However, on the flip side of the coin, the growing amount of user-generated content online is tied to an increased presence of hateful content on social media. Content moderation online is therefore important to identify and limit the spread of hate speech.</p><p>The Profiling Haters on Twitter task <ref type="bibr" target="#b0">[1]</ref> at PAN 2021 <ref type="bibr" target="#b1">[2]</ref> aims at determining whether a user spreads hate speech based on their Twitter feed. This shared task tackles the problem of identifying hate spreaders from a multilingual perspective, including Twitter feeds in English and Spanish.</p><p>In this paper, we present our submission to the Profiling Haters on Twitter shared task, which consists of two different approaches. First, we propose a novel approach to hate speech detection for the English data set, which derives from the assumption that certain words are used in different contexts by haters as opposed to non-haters. The idea is to exploit statistical feature selection techniques in order to find the words whose embedding vectors extracted from BERT differ the most between classes, and then use these words as features to train a classifier. 
The Spanish model, on the other hand, is inspired by text classification models, as it allows us to tackle the challenge of having a single representation for long sequences. Therefore, we build the features as though all tweets of a given user were a unique text, without losing information from any tweet. In order to do this, we use a Spanish pre-trained version of BERT for extracting a single vector representation of each tweet by a user. These representations are then averaged and fed into a classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The present work follows the definition of hate speech as described in the overview of this edition's shared task <ref type="bibr" target="#b0">[1]</ref> and formulated by Nockleby <ref type="bibr" target="#b2">[3]</ref>: "any communication that disparages a person or a group on the basis of some characteristic, such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or others".</p><p>Most studies carried out on hate speech within natural language processing (NLP) so far have focused on the detection of hate speech in single messages. The singularity of this shared task lies in the fact that, instead, it focuses on the quite novel problem of classifying users who disseminate hateful messages (haters) versus users who do not spread any type of hateful message (non-haters) on Twitter. To the best of our knowledge, a similar task has been proposed only once <ref type="bibr" target="#b3">[4]</ref>. However, that work is developed differently, given that the features of their model are based on the interaction among users and network metrics rather than on linguistic features, as proposed in the present work. User information has been used to boost the performance of hate speech detection in messages in other works as well <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>In the past years, many approaches have been proposed for the detection of hate speech in single messages extracted from various social media channels, such as Twitter <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>, Reddit <ref type="bibr" target="#b7">[8]</ref>, and YouTube <ref type="bibr" target="#b8">[9]</ref>. 
A number of shared tasks have been organized on the topic, both from a monolingual <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13</ref>] and a multilingual perspective <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>. They vary from more linear machine learning approaches with Naive Bayes <ref type="bibr" target="#b8">[9]</ref>, Logistic Regression <ref type="bibr" target="#b6">[7]</ref>, Support Vector Machine <ref type="bibr" target="#b15">[16]</ref> to non-linear approaches fed with features from non-contextualized word embeddings <ref type="bibr" target="#b16">[17]</ref> and the latest deep learning models consisting of contextualized word vector representations as features <ref type="bibr" target="#b17">[18]</ref>.</p><p>As in many other NLP and, more in general, supervised learning methods, feature selection is one of the most crucial parts of the task. In addition to this, the task of hate speech detection is particularly complex because messages can involve sarcasm, irony and neutral sentiments that are challenging for NLP systems to identify. In early models, as Schmidt and Wiegand <ref type="bibr" target="#b18">[19]</ref> put it, simpler surface-level features such as n-gram and character n-grams have been implemented <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. In addition to that, linguistic and lexical features have also been employed for this task, the former with the addition of part-of-speech or dependency information <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref> and the latter with terms that are related to hatred against a certain community or general profanities <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b19">20]</ref>. 
Yet other models have made use of features reliant on other common NLP tasks, such as sentiment analysis <ref type="bibr" target="#b22">[23]</ref>.</p><p>Nobata et al. <ref type="bibr" target="#b19">[20]</ref> experiment with features derived from static word embeddings on annotated Yahoo! comment data in three ways. Two of them consist of averaging the vector representations of all words in a comment, derived from two types of word embeddings: a pre-trained model and a word2vec model, both with 200 dimensions. Their third approach is based on paragraph embeddings <ref type="bibr" target="#b23">[24]</ref>, following the work of Djuric et al. <ref type="bibr" target="#b24">[25]</ref>, who use the same approach for abusive language detection. In this case, every word of the comment is mapped to a matrix representing words, and every comment is mapped to a vector in a matrix of comments. Finally, the word and comment vectors are concatenated, forming a single representation of the comment. Besides the distributional semantic features, character and token n-grams, linguistic features (such as length of the comment in tokens, average length of words, number of punctuation marks and so on), and syntactic features, namely part-of-speech and dependency parsing relations, are also included in the model. The combination of all these features yields better results than the use of the paragraph2vec technique alone <ref type="bibr" target="#b24">[25]</ref>.</p><p>The latest classifiers for hate speech detection take advantage of models such as BERT, RoBERTa and other large multilingual language models <ref type="bibr" target="#b14">[15]</ref>. They usually feed the sequence vector representation, the [CLS] token in BERT, into more recent deep learning architectures such as convolutional neural networks, recurrent neural networks and gated recurrent units, reaching very impressive results. 
In the last SemEval task on the detection of offensive language, the best team reached an F1 score of 0.9204, and the other teams mostly reached very similar performance in a tight competition.</p><p>Our model builds on this line of features because of its potential to capture meaning beyond a restricted list of words, and because of the great number of successful NLP applications based on non-contextualized vector representations of words, for instance GloVe <ref type="bibr" target="#b25">[26]</ref> and word2vec <ref type="bibr" target="#b26">[27]</ref>, and more recently on contextualized representations of text with Deep Bidirectional Transformers such as BERT <ref type="bibr" target="#b27">[28]</ref>. The development of language models based on transformer mechanisms is an important milestone in the advancement of NLP, given that it has improved the state of the art of many well-established NLP tasks. One of their greatest advantages is the capacity to encompass the representation of a text in a single vector. Secondly, the vector representation of each word is dynamic and contextualized, meaning that it has the potential to adapt the embedding of a word according to its context. Whereas our Spanish model benefits from the former advantage, the English model uses the latter in its favor.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head><p>Both the English and Spanish training sets are balanced, consisting of 100 haters and 100 non-haters. The dataset provided by the task organizers contains 200 tweets and a ground truth label for each user. In both datasets, user mentions, URLs, and hashtags have been replaced with placeholder tags such as #HASHTAG#. We remove all hash marks (#) while keeping the accompanying words (USER, URL, and HASHTAG).</p><p>We use different models to perform the task on English and on Spanish data. Both models exploit contextualized word representations and implement a support vector machine for the task of binary classification.</p><p>Both models were tested by the task organizers using the TIRA tool <ref type="bibr" target="#b28">[29]</ref>.</p></div>
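The hash-mark clean-up described above can be sketched as follows. This is a minimal illustration, assuming the placeholder tags take the forms #USER#, #URL#, and #HASHTAG#:

```python
import re

def preprocess(tweet: str) -> str:
    """Strip the hash marks from the organizers' placeholder tags
    while keeping the accompanying words (USER, URL, HASHTAG)."""
    return re.sub(r"#(USER|URL|HASHTAG)#", r"\1", tweet)

print(preprocess("Check this out #URL# cc #USER#"))  # → Check this out URL cc USER
```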
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">English</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Features exploration</head><p>The idea underlying our English model is that haters and non-haters might use certain words in different contexts. Example (1) below shows a tweet found in the hater class which mentions the word gun in an aggressive circumstance, whereas tweet (2) illustrates an instance from the non-hater class which mentions the same word in a more political context. Thus, the feature selection for the model involves the identification of words, and in this case of BERT embeddings, that significantly vary from one class to another.</p><p>1. This money can't fit in my pockets but I bet that gun fit.</p><p>2. New state laws for the new year: California limits gun rights, minimum wages increase #url. . . #url.</p><p>To verify whether there are significant differences in the vector representations of words between the two classes, we first carry out a coarse exploratory analysis with t-SNE <ref type="bibr" target="#b29">[30]</ref>, a dimensionality reduction technique that can reduce the space to two dimensions so that it can be plotted and interpreted.</p><p>We first make a list of the most frequent tokens by selecting the ones that occur at least 25 times in both classes. In total there are 788 of them. Note that they are BERT WordPiece tokens <ref type="bibr" target="#b30">[31]</ref> taken from BertTokenizer <ref type="bibr" target="#b31">[32]</ref> <ref type="foot" target="#foot_0">1</ref>, meaning that tokens not corresponding to complete words are also included in the list. However, even though these words are not complete, they should still have a rich contextualized vector representation, considering that BERT is able to distinguish the different contexts of split words as well <ref type="bibr" target="#b32">[33]</ref>. 
Throughout the whole experiment, we use the uncased base version of BERT <ref type="bibr" target="#b33">[34]</ref>.</p><p>For the t-SNE analysis, we feed each tweet of a given user_j of the hater class into the BERT model and retrieve the vector representation of a given token (t_i) present in the most-frequent list. Then, we average all the vectors of t_i of user_j. More formally, let t_i be a token that occurs in the tweets {tw_1, tw_2, ..., tw_N} of a given user_j. The vector representation E of t_i for user_j is:</p><formula xml:id="formula_0">E_user_j[t_i] = (1/N) · Σ_{n=1..N} tw_n[t_i] <label>(1)</label></formula><p>where N is the number of occurrences of t_i in all tweets by user_j, and tw_n[t_i] is the vector of t_i in tw_n. We then repeat the same procedure for the non-hater class and reduce the dimensions. For example, E_user[gun], which is a matrix of [N x 768], is reduced to a matrix with 2 components [N x 2]. Some of the results can be seen in Figure <ref type="figure" target="#fig_1">1</ref>, where each dot represents one E_user_j[t_i].</p><p>This coarse evaluation shows that some tokens form well-defined clusters between the two classes, such as happy (Figure <ref type="figure" target="#fig_1">1a</ref>) and world (Figure <ref type="figure" target="#fig_1">1b</ref>). In contrast, other words like amazing (Figure <ref type="figure" target="#fig_1">1c</ref>) and indeed even the word gun (Figure <ref type="figure" target="#fig_1">1d</ref>) are sprawling and occupy overlapping spaces in both classes, suggesting that they do not have distinguishing vector representations.</p><p>Given the results of this coarse analysis with t-SNE, and considering that the reduction of vectors from 768 to 2 dimensions may cause them to lose a large amount of relevant information, we turn to a more statistical approach to select the words for our model.  
Instead of using predefined term lists, we employ a filter approach to select the features (in our case, the words) whose word embeddings diverge the most between the two classes. This technique requires two steps. First, a statistical test measures the difference between the vector representations of each token in the two classes and returns a p-value for that difference. Then, a p-value threshold is chosen in order to pick the k most relevant features/tokens. In this study, we analysed the difference between vectors using the Kolmogorov-Smirnov (K-S) test. Biesiada and Duch <ref type="bibr" target="#b34">[35]</ref> suggest that the K-S test helps with feature selection in high-dimensional data and can significantly improve the performance of classifiers such as the one used here (SVM). The K-S test measures the maximum difference between the cumulative distributions of two random variables. Therefore, we assume that the more dissimilar the vectors are, as determined by a two-tailed K-S test, the easier it is for the classifier to distinguish between the classes.</p><p>To start with, we retrieve the same vector representation for each user presented in Equation <ref type="formula" target="#formula_0">1</ref>. After that, considering that in this case we want a single representation of a token t_i for each class, we average the E_user[t_i] of all users to get the final representation of t_i, as in:</p><formula xml:id="formula_2">E_ks[t_i] = (1/N) · Σ_{n=1..N} E_user_n[t_i] <label>(2)</label></formula><p>In this case, N is the number of users that have at least one occurrence of the given t_i, and we call the vector E_ks because it is used for the statistical test. At this point we have two dictionaries, one for each class, with t_i as key and its corresponding E_ks[t_i] as value, and we are ready to run the K-S test. 
For example, we take E_ks[gun] from the hater class as variable x and E_ks[gun] from the non-hater class as variable y, and run the two-tailed test K-S_two-tail(x, y).</p><p>For most tokens, the K-S test returns p-values very close to 1, since the two class representations are drawn from very similar distributions. However, this is not a problem in our case, because we do not want to know whether the differences are statistically significant according to a confidence interval. The goal is instead to find out which tokens have lower p-values than the others, meaning that their E_ks[t_i] are more dissimilar between classes.</p></div>
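The K-S filtering step can be sketched with SciPy's `ks_2samp`. In this minimal illustration, random vectors stand in for the real class-level embeddings of Equation 2 (the token names and the shift applied to "happy" are synthetic, chosen only to make one token more dissimilar than the other):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Stand-ins for the class-level token embeddings of Equation 2: one 768-d
# vector per token and class (real vectors would be averaged BERT outputs).
E_ks_hater = {"gun": rng.normal(0.0, 1.0, 768), "happy": rng.normal(0.0, 1.0, 768)}
E_ks_nonhater = {"gun": rng.normal(0.0, 1.0, 768), "happy": rng.normal(0.6, 1.0, 768)}

# Two-tailed K-S test per token: a lower p-value means the two class
# representations are more dissimilar.
p_values = {t: ks_2samp(E_ks_hater[t], E_ks_nonhater[t]).pvalue for t in E_ks_hater}

# Rank tokens by dissimilarity and keep those under a chosen threshold.
threshold = 0.998
selected = sorted(t for t, p in p_values.items() if p < threshold)
```

With these synthetic vectors, "happy" (whose class distributions differ) receives a far lower p-value than "gun" and is retained as a feature.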
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Model implementation</head><p>Now that we have a p-value for each token in our most-frequent list, we must decide on a threshold that selects the relevant features to be fed into the classifier. We do this by running the model several times with different thresholds. That is, we pick a p-value and compute a single vector representation per user, the average of all E_user vectors of the tokens t under the threshold of the K-S test, as in:</p><formula xml:id="formula_3">E_feat[user_j] = (1/N) · Σ_{n=1..N} E_user_j[t_n], for the N tokens t_n with p-value &lt; threshold <label>(3)</label></formula><p>The embedding is called feat because it is the feature representation of each user. E_feat[user_j] is fed to the classifier. The performance is evaluated in terms of accuracy with 5-fold cross-validation. We finally choose the set of features that results in the model's best performance. The threshold selected in our submission is 0.998, and the resulting set consists of 394 tokens/features for the model. As a matter of fact, some of them can be very relevant semantically in the context of hate speech detection, such as war, liberal, black, woman, violence, racism, bitch and so forth (all the tokens are presented in Appendix A - List of wordpieces used in the English model).</p><p>After selecting the features, we also try different layers of representations from BERT's outputs, given that it has been observed that each of the 12 layers captures different features of the input text <ref type="bibr" target="#b35">[36]</ref>. More precisely, we experiment with the last three layers, because they seem to be the ones that encapsulate the most context-specific representations <ref type="bibr" target="#b36">[37]</ref>. Hence, we run the classifier with feature vector representations from the 10th, 11th and 12th layers to see which performs best. 
The 12th layer produces slightly better results, even though the difference in performance between layers is not statistically significant.</p><p>As a final step, we add the averaged CLS tokens of each tweet to the features, because we notice that, even though the set of tokens is large, some users in the test set do not use any of those tokens. Again, we run tests to see which layer of the CLS token is most advantageous to the model, and choose the 12th layer. The experiments are conducted with two kernels of the support vector machine, the radial basis function (rbf) and the polynomial kernel. We use Bayesian optimization <ref type="bibr" target="#b37">[38]</ref> to find the best hyper-parameters, which spares time and computational power compared to the traditional grid search approach. The best performing model is the rbf one (C≈14.0749, gamma≈0.0095).</p></div>
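The classification step can be sketched with scikit-learn, using the rbf hyper-parameters reported above. The user features here are random stand-ins for E_feat (real vectors would average the selected token embeddings plus the mean CLS vector from BERT); the 0.3 class shift is artificial, added only so the two classes are separable:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for E_feat[user_j]: one 768-d vector per user.
X = rng.normal(size=(200, 768))
X[:100] += 0.3  # give the "hater" half a small artificial shift
y = np.array([1] * 100 + [0] * 100)

# rbf kernel with roughly the hyper-parameters found by Bayesian
# optimization (C ≈ 14.0749, gamma ≈ 0.0095), scored with the same
# 5-fold cross-validation used during feature selection.
clf = SVC(kernel="rbf", C=14.0749, gamma=0.0095)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```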
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Spanish</head><p>During the experimental phase, we tested the same approach used on the English data set on Spanish as well. However, in our final submission we opted for a simpler model, which in our experiments worked better on the Spanish data.</p><p>This model follows a more straightforward approach inspired by text classification <ref type="bibr" target="#b35">[36]</ref> with BERT representations: all tweets of each user are treated like one long text. Given that the 200 tweets together exceed BERT's 512-token limit, as is often the case in text classification, we obtain a single representation of the whole text by averaging the sequence representation of every tweet of the user, without losing information from any tweet. For this, we use the uncased, base pre-trained Spanish version of BERT called BETO<ref type="foot" target="#foot_1">2</ref> <ref type="bibr" target="#b38">[39]</ref> throughout the training and testing of the Spanish data set.</p><p>After pre-processing, every tweet {tw_1, tw_2, ..., tw_N} of a given user is fed into BETO. Then, we extract the vector representation of the CLS token, which encapsulates a single representation of the whole sequence. Lastly, we average these vectors to create the feature representation of each user, as in:</p><formula xml:id="formula_4">E_feat[user_j] = (1/N) · Σ_{n=1..N} tw_n[cls_token] <label>(4)</label></formula><p>where N is always 200, given that every user has this fixed number of tweets. E_feat is fed into the support vector machine.</p><p>We also experimented with summing the CLS tokens instead of averaging them and found that the results are very similar, which is expected: since N is fixed at 200, the sum is simply a rescaled average, so there is no confound with the frequency of tokens. 
As with the English model, we test the last three layers of the CLS token with the rbf and polynomial kernels using Bayesian optimization. The 11th layer trained with the polynomial kernel (C≈7.3588, gamma≈0.0285, degree≈1.2859) and the 10th layer trained with rbf give the best results in the 5-fold cross-validation, so we submitted both for the shared task; they returned the same labels for the test set.</p></div>
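The Spanish feature construction of Equation 4 can be sketched as follows, with random vectors standing in for BETO's [CLS] outputs; the block also verifies the sum-versus-average observation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for BETO's [CLS] vectors: one 768-d vector per tweet,
# with the fixed 200 tweets per user of this dataset.
cls_vectors = rng.normal(size=(200, 768))

# Equation 4: the user-level feature is the mean over all 200 CLS vectors.
E_feat = cls_vectors.mean(axis=0)

# Since N is fixed at 200, summing instead of averaging only rescales
# the feature vector, which is why both give very similar results.
assert np.allclose(cls_vectors.sum(axis=0), 200 * E_feat)
```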
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>The 5-fold cross-validation results for the English model. Table <ref type="table">1</ref> shows that the best accuracy on the English training set is reached by the model with fewer features (210 tokens) compared to the second-best model. However, a paired t-test on the results of the 5-fold CV showed that the first and second models are not statistically significantly different on the training set (p-value=0.1998). The results on the test set, on the other hand, differ considerably, with the 394-token model reaching an accuracy 4 points higher. One reason for the difference in classification of the test set is that a broader range of tokens included in the features can enhance the performance on unseen data.</p><p>As an alternative to averaging, we try summing the vectors, based on the idea that the frequency with which words occur in the tweets may help the classifier discriminate better between classes. However, despite indeed performing better on the training set overall (accuracy 7% higher than our final submission model), the classification of the test set was overwhelmingly imbalanced, with 97 out of 100 users labeled as haters, and the accuracy was also very uneven across the 5 folds of cross-validation, showing that it did not generalize well in all folds. It suggests, though, that the classifier learns from frequency, and that a similar number of occurrences would be needed for a reasonable performance.</p><p>As for the Spanish model, even though we chose the model that extracts the CLS token representation from the 11th layer, the three models actually perform similarly in terms of accuracy, as seen in Table <ref type="table" target="#tab_2">2</ref>. The first and second models even returned the same labels for the classification. 
Moreover, we attempted to apply the same approach we used for the English data set, but the results on the training set drop considerably, reaching lower performance than the CLS token approach. One hypothesis for the difference in results could be related to the corpora on which the pre-trained language models were trained. It might be that the language used in the Spanish tweets is more similar to the language of the corpora used to train BETO than the English data set is to BERT's training corpora, which would help encompass the meaning and context of the dataset more easily, therefore prompting better results. Nonetheless, it is difficult to know precisely why they perform so differently, because of the lack of interpretability of these large language models.</p><p>In terms of layer selection, we observed that both models perform quite similarly when trained on each of the last three layers, suggesting that these layers have very similar representations of tokens.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>We present two novel approaches to profiling haters on Twitter. The English approach relies on the idea that a set of words can be used in different contexts by hater and non-hater users. The state-of-the-art language model BERT is adopted to capture the contextualized embeddings of tokens. Then, the difference between the vector representations of the two classes is measured through the K-S statistical test. Finally, the relevant features are chosen by feeding sets of tokens selected at different thresholds of the test into the support vector machine.</p><p>In contrast, we have seen that the same approach does not work as well for the Spanish model. Therefore, inspired by text classification methods, we use the averaged vector representation of the CLS tokens of every tweet of each user as input for the support vector machine. Despite being a simpler model, it yields impressive results considering the amount of training data available.</p><p>As a future step, it would be interesting to test the same models with more training data to check whether it boosts their performance, and perhaps to replace the SVM with a deep learning model. In addition, in this shared task we only process textual information, but in a real scenario other features related to metadata could be included to obtain more informative and characteristic features, which may improve classification. Lastly, regarding the English model, other statistical tests could also be explored in order to select better features for the model. 
Alternatively, after the statistical test, the model could be trained iteratively with an ablation technique to select the best performing features among those already selected by the threshold.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: t-SNE applied on words as a coarse analysis to check the difference in the semantic space of the hater and non-hater classes. t-SNE has a perplexity of 15, 2 components and 3500 iterations.</figDesc><graphic coords="5,89.29,210.83,187.50,85.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Kernel Layer Feature (K-S test) Additional feature Acc. (train set) Acc. (test set)</head><label></label><figDesc>Our final submission model is the underlined model.</figDesc><table><row><cell>rbf</cell><cell>11th</cell><cell>210 tokens</cell><cell>CLS token</cell><cell>74%</cell><cell>69%</cell></row><row><cell>rbf</cell><cell>12th</cell><cell>394 tokens</cell><cell>CLS token</cell><cell>72%</cell><cell>73%</cell></row><row><cell>rbf</cell><cell>10th</cell><cell>210 tokens</cell><cell>CLS token</cell><cell>71%</cell><cell>-</cell></row><row><cell>rbf</cell><cell>10th</cell><cell>517 tokens</cell><cell>CLS token</cell><cell>70.5%</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>The 5-fold cross-validation results for the Spanish model. Our final submission model is the underlined model.</figDesc><table><row><cell cols="2">Kernel Layer</cell><cell>Feature</cell><cell cols="2">Acc. (training set) Acc. (test set)</cell></row><row><cell>poly</cell><cell>11th</cell><cell>CLS token</cell><cell>84%</cell><cell>80%</cell></row><row><cell>rbf</cell><cell>10th</cell><cell>CLS token</cell><cell>84%</cell><cell>80%</cell></row><row><cell>rbf</cell><cell>12th</cell><cell>CLS token</cell><cell>82%</cell><cell>-</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/transformers/main_classes/tokenizer.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/dccuchile/beto</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Appendix A -List of wordpieces used in the English model cls_token, act, actually, ad, age, ah, air, al, almost, also, always, americans, amp, anti, around, attack, automatically, away, b, behind, better, big, bitch, black, block, body, border, boy, br, break, bro, buy, ca, california, call, cannot, car, care, change, check, checked, children, christmas, city, class, close, cnn, co, con, congress, control, cr, crazy, crime, cu, cut, da, days, de, dem, democrat, democrats, deserve, di, die, different, donald, drop, dude, dumb, e, election, else, em, end, energy, even, evidence, ex, f, fact, facts, fake, family, far, fast, feeling, fight, find, fine, folks, following, food, forever, fox, fr, friend, friends, full, g, ga, gas, gave, george, get, getting, give, glad, gone, great, guy, guys, h, ha, hair, half, hands, happen, happy, hard, hash, hell, help, high, history, hit, ho, hold, hot, hu, human, id, idea, im, imagine, imp, important, ins, interesting, j, k, keep, kid, kids, kind, last, late, law, less, let, liberal, listen, live, lives, living, lo, longer, look, lord, lot, mad, made, make, makes, mark, mask, matter, may, men, military, mine, minutes, miss, mom, month, months, mother, move, mr, ms, mu, n, ne, never, ni, night, obama, ok, okay, old, om, ones, open, order, others, p, paid, pan, past, pe, per, place, plan, play, playing, please, po, point, poor, post, power, pp, pray, press, put, question, r, race, racism, racist, rape, rather, read, ready, reason, red, republican, rest, right, room, sad, safe, save, say, saying, sc, second, see, seems, seen, self, send, sense, share, shit, shot, show, shut, side, sign, single, sit, sm, smoke, social, someone, song, soon, speak, special, stand, start, state, stay, step, stop, story, straight, strong, stuff, suck, sure, system, take, taking, team, test, thanks, thing, things, three, ti, time, times, took, top, tried, trip, try, trying, twitter, type, un, understand, user, 
using, va, vaccine, via, vibe, video, violence, vote, voted, voting, vs, wall, want, wanted, war, water, wear, wearing, whatever, white, whole, win, wish, wit, woman, wonder, word, world, worse, worst, would, wrong, x, ye, yeah, year, yes, yesterday, yet, yo, youtube, ##aa, ##al, ##c, ##ce, ##ck, ##d, ##e, ##ea, ##er, ##es, ##f, ##fs, ##ful, ##gga, ##h, ##ha, ##i, ##ie, ##ies, ##in, ##k, ##llo, ##n, ##na, ##o, ##ot, ##p, ##r, ##rs, ##ss, ##t, ##tf, ##ting, ##v, ##w, ##wed, ##wee, ##x, ##y, ##z, 0, 000, 1, 19, 20, 2021, 3, 30, 4, 5, 50, 6, 7, 8, 9</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Profiling Hate Speech Spreaders on Twitter Task at PAN 2021</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L D L P</forename><surname>Sarracén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">CLEF 2021 Labs and Workshops</title>
		<title level="s">Notebook Papers</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L D L P</forename><surname>Sarracén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wolska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">12th International Conference of the CLEF Association (CLEF 2021)</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Hate Speech</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Nockleby</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronic Journal of Academic and Special Librarianship</title>
		<editor>Margaret Brown-Sica and Jeffrey Beall</editor>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2000">Summer 2008. 2000</date>
			<publisher>Macmillan Reference</publisher>
		</imprint>
	</monogr>
	<note>Entry in the Encyclopedia of the American Constitution (Macmillan Reference US); cited in &quot;Library 2.0 and the Problem of Hate Speech&quot;</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Characterizing and detecting hateful users on twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Calais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Almeida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Meira</surname><genName>Jr</genName></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Author profiling for abuse detection</title>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Del Tredici</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Shutova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th international conference on computational linguistics</title>
				<meeting>the 27th international conference on computational linguistics</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1088" to="1098" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hateful symbols or hateful people? predictive features for hate speech detection on twitter</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Waseem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the NAACL student research workshop</title>
				<meeting>the NAACL student research workshop</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="88" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Automated hate speech detection and the problem of offensive language</title>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Macy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">11</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The effect of extremist violence on hateful speech online</title>
		<author>
			<persName><forename type="first">A</forename><surname>Olteanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Castillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Boy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Varshney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making</title>
		<author>
			<persName><forename type="first">P</forename><surname>Burnap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Williams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Policy &amp; Internet</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="223" to="242" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Farra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Workshop on Semantic Evaluation</title>
				<meeting>the 13th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="75" to="86" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of the EVALITA 2018 hate speech detection task</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Dell'orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Poletto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tesconi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EVALITA@CLiC-it</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 hate speech detection task</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Comandini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Di Nuovo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Frenda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stranisci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Russo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Siegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ruppenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Klenner</surname></persName>
		</author>
		<title level="m">Overview of GermEval Task 2, 2019 shared task on the identification of offensive language</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</title>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Rangel Pardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/S19-2007</idno>
		<ptr target="https://www.aclweb.org/anthology/S19-2007" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics</title>
				<meeting>the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics<address><addrLine>Minneapolis, Minnesota, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="54" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Atanasova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Karadzhov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mubarak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pitenis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Çöltekin</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/2020.semeval-1.188" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics</title>
				<meeting>the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1425" to="1447" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Challenges in discriminating profanity from hate speech</title>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Experimental &amp; Theoretical Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="187" to="202" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Shutova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.00378</idno>
		<title level="m">Neural character-based composition models for abuse detection</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A BERT-based transfer learning approach for hate speech detection in online social media</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mozafari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Farahbakhsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Crespi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Complex Networks and Their Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="928" to="940" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A survey on hate speech detection using natural language processing</title>
		<author>
			<persName><forename type="first">A</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth international workshop on natural language processing for social media</title>
				<meeting>the fifth international workshop on natural language processing for social media</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Abusive language detection in online user content</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nobata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mehdad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th international conference on world wide web</title>
				<meeting>the 25th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="145" to="153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Learning from bullying traces in social media</title>
		<author>
			<persName><forename type="first">J.-M</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-S</forename><surname>Jun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bellmore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies</title>
				<meeting>the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="656" to="666" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Detecting offensive language in social media to protect adolescent online safety</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="71" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A lexicon-based approach for hate speech detection</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Gitari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zuping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Damien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Long</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Multimedia and Ubiquitous Engineering</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="215" to="230" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Hate speech detection with comment embeddings</title>
		<author>
			<persName><forename type="first">N</forename><surname>Djuric</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Grbovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Radosavljevic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bhamidipati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th international conference on world wide web</title>
				<meeting>the 24th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="29" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<title level="m">Efficient estimation of word representations in vector space</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">TIRA Integrated Research Architecture</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gollub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-22948-1_5</idno>
	</analytic>
	<monogr>
		<title level="m">Information Retrieval Evaluation in a Changing World, The Information Retrieval Series</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Peters</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Visualizing data using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Norouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krikun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Klingner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Łukasz</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gouws</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kazawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kurian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Patil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Riesa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rudnick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hughes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno>CoRR abs/1609.08144</idno>
		<ptr target="http://arxiv.org/abs/1609.08144" />
		<title level="m">Google&apos;s neural machine translation system: Bridging the gap between human and machine translation</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ethayarajh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2104.08465</idno>
		<title level="m">Frequency-based distortions in contextualized word embeddings</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Turc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.08962v2</idno>
		<title level="m">Well-read students learn better: On the importance of pre-training compact models</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Feature selection for high-dimensional data: A Kolmogorov-Smirnov correlation-based filter</title>
		<author>
			<persName><forename type="first">J</forename><surname>Biesiada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Duch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Recognition Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="95" to="103" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">How to fine-tune BERT for text classification?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">China National Conference on Chinese Computational Linguistics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="194" to="206" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Ethayarajh</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.00512</idno>
		<title level="m">How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Nogueira</surname></persName>
		</author>
		<ptr target="https://github.com/fmfn/BayesianOptimization" />
		<title level="m">Bayesian Optimization: Open source constrained global optimization tool for Python</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Spanish pre-trained BERT model and evaluation data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cañete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chaperon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fuentes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">PML4DC at ICLR</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
