<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tolúlopé</forename><surname>Ògúnrèmí</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Stanford University</orgName>
								<address>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nazanin</forename><surname>Sabri</surname></persName>
							<email>nazanin.sabrii@gmail.com</email>
						</author>
						<author>
							<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
							<email>valerio.basile@unito.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Turin</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
							<email>t.caselli@rug.nl</email>
							<affiliation key="aff2">
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0D6ECF5C2816376CE57D00CB0DED8D04</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:44+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Microaggressions are subtle manifestations of bias <ref type="bibr" target="#b4">(Breitfeller et al., 2019)</ref>. These demonstrations of bias can often be classified as a subset of abusive language. However, their recognition has received comparatively little attention. As a result, limited data is available on the topic, and only in English. Being able to detect microaggressions without the need for labeled data would be advantageous, since it would enable content moderation also for languages lacking annotated data. In this study, we introduce an unsupervised method to detect microaggressions in natural language expressions. The algorithm relies on pre-trained word embeddings, leveraging the bias encoded in the model to detect microaggressions in unseen textual instances. We test the method on a dataset of racial and gender-based microaggressions, reporting promising results. We further run the algorithm on unseen out-of-domain data with the purpose of bootstrapping corpora of microaggressions "in the wild", and discuss the benefits and drawbacks of our proposed method.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The growth of Social Media platforms has been accompanied by an increased visibility of expressions of socially unacceptable language online. In a 2016 Eurobarometer survey, 75% of people who follow or participate in online discussions had witnessed or experienced abuse or hate speech. Under this umbrella term, different phenomena can be identified, ranging from offensive language to more complex and dangerous ones, such as hate speech or doxing. Recently, there has been a growing interest from the Natural Language Processing community in the development of language resources and systems to counteract socially unacceptable language online. Most previous work has focused on a few easy-to-model phenomena, ignoring more subtle and complex ones, such as microaggressions <ref type="bibr" target="#b9">(Jurgens et al., 2019)</ref>.</p><p>Microaggressions are brief, everyday exchanges that denigrate stigmatised and culturally marginalised groups <ref type="bibr" target="#b14">(Merriam-Webster, 2021)</ref>. They are not always perceived as hurtful by either party, and they can often be classified as positive statements by current hate-speech detection systems <ref type="bibr" target="#b4">(Breitfeller et al., 2019)</ref>. The occasionally unintentional hurt caused by such comments is a reflection of how certain stereotypes of others are baked into society. <ref type="bibr" target="#b18">Sue et al. (2007)</ref> define microaggressions in the racial context, particularly when directed toward people of color, as "brief and commonplace daily verbal, behavioral, or environmental indignities", such as: "you are a credit to your race." (intended message: it is unusual for someone of your race to be intelligent) or "do you think you're ready for college?" (intended message: it is unusual for people of color to succeed). The need for moderation of hateful content has previously been explored. 
For instance, <ref type="bibr" target="#b13">Mathew et al. (2019b)</ref> analyse the temporal effects of allowing hate speech on Gab, and find that the language of users tends to become more and more similar to that of hateful users over time. <ref type="bibr" target="#b12">Mathew et al. (2019a)</ref> further highlight that the spreading speed and reach of hateful content are much higher than those of non-hateful content. As a result, being able to remove instances of hateful language, such as microaggressions, is of great importance.</p><p>Previous computational work on microaggressions is quite recent. <ref type="bibr" target="#b4">Breitfeller et al. (2019)</ref> is one of the first works to address microaggressions in a systematic way, also introducing a first dataset, SelfMA. A further contribution specifically focused on racial microaggressions is <ref type="bibr" target="#b0">Ali et al. (2020)</ref>, where the authors focus on the development of machine learning systems.</p><p>In this study we introduce an unsupervised method for microaggression detection.</p><p>Our method utilizes the existing bias in word embeddings to detect words with biased connotations in a message. Although unsupervised approaches tend to be less competitive than their supervised counterparts, our method is language-independent and can thus be applied to any language for which embedding representations exist. Furthermore, the reliance of our method on specific lexical items and their context of occurrence makes the flagging of a message as an instance of a microaggression transparent. 
In addition to being useful for languages with no labeled data, the reliance of our model on the words in a sentence makes it interpretable, as it allows human moderators to understand what the system based its decision on.</p><p>Our contributions can be summarised as follows:</p><p>• we introduce a new unsupervised method for the detection of microaggressions which builds on top of pre-trained word embeddings;</p><p>• we compare the performance of our model using different pre-trained word embeddings (GloVe, fastText, and word2vec) and discuss the potential reasons behind the differences;</p><p>• we test the proposed algorithm on unseen data from a different domain (i.e., Twitter), in order to qualitatively evaluate its efficacy in discovering new instances of microaggressions.</p><p>The rest of this paper is structured as follows: we introduce our method in Section 2. The data and our results are reported in Section 3. We deploy our model and discuss its limitations in Section 4. Finally, we present the conclusion and future work in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Use the Bias Against the Bias</head><p>Embedded representations, either from pre-trained word embeddings or pre-trained language models, have been shown to contain and amplify the biases present in the data used to generate them <ref type="bibr" target="#b3">(Bolukbasi et al., 2016;</ref><ref type="bibr" target="#b10">Lauscher and Glavaš, 2019;</ref><ref type="bibr">Bhardwaj et al., 2020)</ref>. As such, they often exhibit gender and racial bias <ref type="bibr" target="#b19">(Swinger et al., 2019)</ref>. Many studies have attempted to reduce this bias <ref type="bibr" target="#b20">(Yang and Feng, 2020;</ref><ref type="bibr" target="#b21">Zhao et al., 2018;</ref><ref type="bibr" target="#b11">Manzini et al., 2019)</ref>. In this work, we take a different turn by using this bias to our advantage: rather than taming the hurtfulness of the representations <ref type="bibr" target="#b17">(Schick et al., 2021)</ref>, we actively use it to promote social good. In this first study, we employ word representations derived from generic textual corpora of English, in order to capture the background knowledge needed to disambiguate instances of microaggressions in text. Recently, however, there have been studies involving word representations created from tailored collections of social media content aimed at capturing abusive phenomena like verbal aggression <ref type="bibr" target="#b6">(Dynel, 2021)</ref> and hate speech <ref type="bibr" target="#b5">(Caselli et al., 2020)</ref>.</p><p>We devise a simple and effective method that exploits the existing bias in word embeddings and identifies words in a message that are related to particular and distant semantic areas in the embedding space. 
Messages are analysed in three steps: first, for each token t_i we compute its relatedness to a list of manually curated seed words s = s_1, ..., s_n denoting potential targets of microaggressions; second, we consider only the similarities of the pairs (t_i, s_j) above an empirical similarity threshold ST and compute their variance v_i; finally, we classify the token t_i as a microaggression trigger, and consequently the message as a microaggression, if v_i is above an empirically determined variance threshold VT.</p><p>The intuitive idea behind this algorithm is that some lexical elements in a verbal microaggression are often (yet sometimes subtly) hinting at specific features of the recipient of the message, in an otherwise neutral lexical context.</p><p>In this work, we choose to focus on microaggressions related to race and gender, therefore the seed words have to be chosen accordingly. The seed word lists for race and gender are, respectively, [white, black, asian, latino, hispanic, arab, african, caucasian] and [girl, boy, man, woman, male, female]. There are also practical reasons to focus on gender and race, namely the scarcity of data available for other categories of microaggressions and other idiosyncrasies of the available datasets - the religion class was specific to different religions, therefore hard to generalise; sexuality and gender presented a large overlap; and so on.</p><p>Figure <ref type="figure">1</ref>: Worked example of the unsupervised method for the word "chopsticks" in the message "Ford: Built With Tools, Not With Chopsticks".</p><p>An example of how the proposed method works is illustrated in Figure <ref type="figure">1</ref>. In the example, consider the word "chopsticks" in the message "Ford: Built With Tools, Not With Chopsticks" (from the SelfMA dataset, described in Section 3). The target word exhibits a much higher relatedness to the seed word asian (0.237) than to any other seed word. 
Even considering only the seed words with a similarity above a fixed threshold (white, asian, and african), the variance of their similarity scores with respect to chopsticks is still higher than the variance threshold, and therefore this target word, in this context, triggers a microaggression according to the algorithm. This process is repeated for all the words in the message in order to detect microaggressions. Some categories of words are bound to exhibit a high relatedness to all the seed words, e.g., "people" or "human". This is the reason for introducing the variance threshold in the final step of our algorithm: to filter out these cases when classifying a given message, and instead focus on words that are related to different races (or genders) unevenly, with a skewed distribution of similarity scores.</p><p>An important by-product of this algorithm is that the output is one or more trigger words, in addition to the microaggression label - in the example, the trigger word is indeed chopsticks - therefore enabling a more informative and interpretable decision process. </p></div>
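The three-step procedure above can be sketched in a few lines of code. This is a minimal illustration using toy three-dimensional vectors; the vectors, threshold values, and helper names below are hypothetical, not the pre-trained embeddings or exact settings used in the experiments:

```python
import numpy as np

# Toy embedding table; a real run would load pre-trained vectors
# (e.g., fastText or word2vec). All values here are illustrative only.
EMB = {
    "asian":      np.array([0.9, 0.1, 0.0]),
    "white":      np.array([0.1, 0.8, 0.1]),
    "african":    np.array([0.0, 0.2, 0.9]),
    "chopsticks": np.array([0.8, 0.2, 0.1]),
    "tools":      np.array([0.3, 0.3, 0.3]),
}

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_trigger(token, seeds, st, vt):
    """Step 1: relatedness of the token to each seed word.
    Step 2: keep only similarities above the similarity threshold ST.
    Step 3: flag the token if the variance of the kept similarities
    exceeds the variance threshold VT (we require at least two kept
    similarities so the variance is meaningful - a sketch assumption)."""
    if token not in EMB:
        return False
    sims = [cos(EMB[token], EMB[s]) for s in seeds if s in EMB]
    kept = [s for s in sims if s > st]
    return len(kept) > 1 and float(np.var(kept)) > vt

def detect(message, seeds, st=0.12, vt=0.014):
    """Classify a message and return its trigger words, if any."""
    triggers = [t for t in message.lower().split() if is_trigger(t, seeds, st, vt)]
    return bool(triggers), triggers
```

With these toy vectors, "chopsticks" is strongly related to one seed and weakly to the others, so its kept similarities have high variance and it is flagged as a trigger; "tools" is evenly (un)related to all seeds, so its variance stays below VT, mirroring the "people"/"human" filtering described above.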
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Source Number of posts</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>To test our method, we use two subsets of the SelfMA: microaggressions.com dataset <ref type="bibr" target="#b4">(Breitfeller et al., 2019)</ref>, comprised of 1,314 and 1,278 Tumblr posts respectively<ref type="foot" target="#foot_0">1</ref> . The posts in SelfMA are all instances of microaggressions, manually tagged with one or more of four categories: race, gender, sexuality and religion. Since posts can carry more than one tag, certain instances appear in both the race and gender subsets used for the purposes of this study. The dataset consists of first- and second-hand accounts of microaggressions, as well as direct quotes of phrases or sentences said to the person posting. In order to reduce the linguistic perturbation introduced by accounts of a situation, we only take the direct quotes found in the dataset as instances of microaggressions to detect with our unsupervised method. We pull out the direct quotes from the gender (561) and racial (519) subsets to test the algorithm. In order to balance the dataset, we scraped 2,021 random Tumblr posts, for a total of 4,612 instances. Table 1 summarises the composition of our dataset.</p><p>It is important to note that a microaggression can have multiple tags, so there is an overlap of instances. However, the seed words used to detect microaggression types in the method are different for each target phenomenon (e.g., race, gender).</p><p>We ran the algorithm on the SelfMA dataset, empirically optimising the two thresholds on the training split, for each word embedding type and each microaggression category, filtering by the seed words listed in Section 2. 
We test the algorithm with three pre-trained word embedding models for English, namely fastText <ref type="bibr" target="#b8">(Joulin et al., 2016)</ref> (trained on Wikipedia and Common Crawl), word2vec <ref type="bibr" target="#b15">(Mikolov et al., 2013)</ref> (trained on Google News), and GloVe (Pennington et al., 2014) (trained on Wikipedia, the GigaWord corpus, and Common Crawl). The optimization is performed by exhaustive grid search over the hyperparameter space.</p><p>The results, shown in Table <ref type="table">2</ref>, indicate that fastText achieves a better F1 score on racial microaggressions while word2vec performs better on gender microaggressions. The difference in performance between fastText and word2vec is not major, and we attribute it to the difference between the corpora on which the two models were trained (i.e., web crawl and Wikipedia for fastText vs. news data for word2vec). The GloVe pre-trained model, trained on a combination of newswire texts, encyclopedic entries and texts from the Web, underperforms in both experiments. In general, the absolute figures are encouraging, especially considering the simplicity of this unsupervised approach.</p></div>
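The exhaustive grid search over the two thresholds can be sketched as follows. The scoring callback and grid values are illustrative assumptions: in practice the callback would run the detector over the training split and count true positives, false positives, and false negatives for a candidate (ST, VT) pair:

```python
from itertools import product

def f1(tp, fp, fn):
    """F1 score from raw counts, guarding against empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def grid_search(score_fn, st_grid, vt_grid):
    """Exhaustive search over (ST, VT) pairs; score_fn returns
    (tp, fp, fn) for a candidate pair on the training split."""
    best = (None, -1.0)
    for st, vt in product(st_grid, vt_grid):
        s = f1(*score_fn(st, vt))
        if s > best[1]:
            best = ((st, vt), s)
    return best
```

The same loop is reused per embedding model and per microaggression category, only swapping the scoring callback.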
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discovering Microaggressions</head><p>To better understand the performance of our unsupervised model, we performed an additional experiment. Our goal is to understand the false positive results and the potential harm the model could cause. To do so, we use our unsupervised model to label unseen instances from a different domain (Twitter) than the SelfMA dataset (Tumblr), in order to see how well the model detects microaggressions.</p><p>We begin by performing keyword searches on Twitter (using Twitter's official API) and collect a new dataset of 3M tweets with seven keywords potentially containing race and gender expressions. Next, we set the threshold values ST and VT in our model in order to obtain the highest Precision scores, rather than the highest F1 value. This step is performed exactly like the optimization described in Section 2, with the only difference being the target metric. The aim of this step is to label tweets as microaggressions only with the highest possible degree of confidence. We set ST = 0.12 and VT = 0.014 for racial microaggressions, leading to a Precision of .931, and ST = 0.13 and VT = 0.019 for gender-based microaggressions, leading to a Precision of .912. Precision has been measured on the original SelfMA dataset used as a validation set.</p><p>We then run the unsupervised model on the new Twitter dataset, automatically labelling 256,843 tweets for gender and 373,631 tweets for race. After the data is labeled, we manually explore the positive instances in order to evaluate the performance of the model. The algorithm tuned for high precision found 6,306 gender-related and 13,004 race-related microaggression candidates in this dataset. We find that while the model does detect actual instances of microaggressions, there is a noticeable number of false positive instances. These tweets discuss race or gender in some manner. 
However, they do not necessarily contain microaggressions towards these groups. While the model does learn to detect discussions of these topics, it sometimes confuses such discussions with microaggressions towards the aforementioned groups. Some examples follow, paraphrased to avoid tracking the original messages.</p><p>Saying "Arrested Development isn't funny" in an office full of women just to feel something "Men have moustaches, women have oversized bracelets"</p><p>The humorous attempts in these tweets hinge on gender stereotypes, and therefore in some contexts they could be perceived as offensive by some recipients. The high relatedness in the word embedding space between some words (moustaches and bracelets) and gender-related seed words (men and women) triggers the detection algorithm.</p><p>The automatic detection of racial microaggressions "in the wild" is more challenging than that of gender-based ones, according to our manual exploration of this automatically labeled dataset. This may be due to the difficulty of crafting a list of seed words that is sufficiently race-related, but at the same time avoids generating too many false positives. We indeed found many false positives, mainly due to named entities and multi-word expressions such as "White House", or simply because of the polysemy of color words, e.g., "black" and "white". We did, however, still find instances of messages containing different extents of racial stereotyping.</p><p>Table <ref type="table">2</ref>: Results of the experiment on the Gender and Racial subsets of SelfMA, in terms of Precision (P), Recall (R), and F1-score (F1) on the positive class (MA), the negative class (not-MA), and their macro-average. Best scores per microaggression category are in bold.</p><p>"why are you being so dramatic? 
just say I'm not originally arab, you don't have to fight about it" "I will need to explain that to the chinese old lady who works at my school's administrative office"</p><p>In summary, running the unsupervised microaggression detection algorithm on unseen data seems to represent a promising intermediate step towards the semi-automatic creation of language resources for this phenomenon. While the accuracy is not ideal, and lists of seed words have to be handcrafted carefully in order to avoid false positives, these drawbacks are balanced by the fairly cheap computational cost and the ease of application in a multilingual scenario.</p></div>
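The precision-first tuning used for this bootstrapping step can be sketched as a small variant of the grid search over (ST, VT): rather than maximising F1, we pick the threshold pair with the highest precision on the validation data. The function name, the tie-break by recall, and the scoring callback are illustrative assumptions of this sketch:

```python
def precision_first(score_fn, st_grid, vt_grid):
    """Pick the (ST, VT) pair maximising precision; ties are broken by
    recall so the high-confidence setting still keeps as many true
    positives as possible. score_fn returns (tp, fp, fn) counts."""
    best_pair, best_key = None, (-1.0, -1.0)
    for st in st_grid:
        for vt in vt_grid:
            tp, fp, fn = score_fn(st, vt)
            p = tp / (tp + fp) if tp + fp else 0.0
            r = tp / (tp + fn) if tp + fn else 0.0
            if (p, r) > best_key:  # lexicographic: precision first, then recall
                best_key, best_pair = (p, r), (st, vt)
    return best_pair, best_key
```

Trading recall for precision in this way means the bootstrapped candidate set misses many true microaggressions, but the candidates it does surface are more likely to be worth a human annotator's time.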
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>In this paper we introduce a novel algorithm that exploits the existing bias in pre-trained word embeddings to detect subtly abusive language phenomena such as microaggressions. While supervised detection methods in the field of natural language processing are plentiful, these methods are only viable for languages and topics with available labeled datasets. That is, however, not the case for many languages. As a result, the unsupervised detection method introduced in this study could help address the need for the moderation of microaggressions in languages other than English. This is further helped by the availability of multilingual word embeddings, as they would allow the method to be used in any of the languages supported by the embeddings.</p><p>The method is unsupervised and only needs a small list of seed words. Considering its simplicity, the results obtained from an experiment on a dataset of manually annotated microaggressions are very promising. Further, the method is transparent, explicitly identifying the words triggering a microaggression, and thus paving the way for explainable microaggression detection.</p><p>Although the preliminary results are promising, an experiment on unseen data from a different domain shows that there is room for improvement. Given that we are looking at the explicit words used in each message, our method is not sensitive to implicit expressions like "you people" or "your kind", which often occur in microaggressions. We would have to add further steps to our algorithm to catch expressions like these.</p><p>Polysemy is another known issue, e.g., in words like "black" and "white", whose relatedness to certain identified trigger words is not necessarily due to race. While a careful composition of the seed word lists helps to minimize this issue, a systematic approach to polysemy would certainly be desirable. 
The seed word list may also be expanded, either manually or by exploiting existing lexicons such as HurtLex <ref type="bibr" target="#b1">(Bassignana et al., 2018)</ref> for offensive terms (including stereotypes for several categories of individuals) or specialized lists of identity-related terms<ref type="foot">2</ref>.</p><p>In future work, we plan on improving our model to account for lexical ambiguity, and for the complexity derived from the interference between pragmatic phenomena and aggression, e.g., in humorous and ironic messages, following the intuition in recent literature <ref type="bibr" target="#b7">(Frenda, 2018)</ref> about the interconnection between irony or sarcasm and abusive language online. Our current plan is to apply the algorithm presented in this paper to bootstrap the creation of a multilingual resource of online verbal microaggressions and release it to the research community.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Statistics of the two subsets of the SelfMA dataset used in this paper, and the extra data downloaded to balance the dataset.</figDesc><table><row><cell>SelfMA Gender</cell><cell>1,314</cell></row><row><cell>SelfMA Racial</cell><cell>1,278</cell></row><row><cell>Tumblr</cell><cell>2,021</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Tumblr is a popular American microblogging platform https://www.tumblr.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">See for instance this compendium of LGBTQIA+ terminology.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work of Valerio Basile is partially funded by the project "Be Positive!" (under the 2019 "Google.org Impact Challenge on Safety" call).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automated detection of racial microaggressions using machine learning</title>
		<author>
			<persName><forename type="first">Omar</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nancy</forename><surname>Scheidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexander</forename><surname>Gegov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ella</forename><surname>Haig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mo</forename><surname>Adda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benjamin</forename><surname>Aziz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium Series on Computational Intelligence (SSCI)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2020">2020. 2020</date>
			<biblScope unit="page" from="2477" to="2484" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Hurtlex: A multilingual lexicon of words to hurt</title>
		<author>
			<persName><forename type="first">Elisa</forename><surname>Bassignana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">5th Italian Conference on Computational Linguistics, CLiC-it 2018</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2253</biblScope>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
	<note>CEUR-WS</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">Rishabh</forename><surname>Bhardwaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Navonil</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Soujanya</forename><surname>Poria</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.05021</idno>
		<ptr target="genderbiasinbert" />
		<title level="m">Investigating gender bias in BERT</title>
				<imprint/>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Man is to computer programmer as woman is to homemaker? debiasing word embeddings</title>
		<author>
			<persName><forename type="first">Tolga</forename><surname>Bolukbasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Venkatesh</forename><surname>Saligrama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Kalai</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.06520</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts</title>
		<author>
			<persName><forename type="first">Luke</forename><surname>Breitfeller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Emily</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Jurgens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yulia</forename><surname>Tsvetkov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1664" to="1674" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">HateBERT: Retraining BERT for Abusive Language Detection in English</title>
		<author>
			<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jelena</forename><surname>Mitrović</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Granitzer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.12472</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Humour and (mock) aggression: Distinguishing cyberbullying from roasting</title>
		<author>
			<persName><forename type="first">Marta</forename><surname>Dynel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language &amp; Communication</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="page" from="17" to="36" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The role of sarcasm in hate speech. a multilingual perspective</title>
		<author>
			<persName><forename type="first">Simona</forename><surname>Frenda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Doctoral Symposium of the XXXIVInternational Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</title>
				<meeting><address><addrLine>Lloret; Moreno, I</addrLine></address></meeting>
		<imprint>
			<publisher>Martínez-Barco</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="13" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Bag of tricks for efficient text classification</title>
		<author>
			<persName><forename type="first">Armand</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edouard</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Piotr</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.01759</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A just and comprehensive strategy for using NLP to address online abuse</title>
		<author>
			<persName><forename type="first">David</forename><surname>Jurgens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Libby</forename><surname>Hemphill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eshwar</forename><surname>Chandrasekharan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019-07">July 2019</date>
			<biblScope unit="page" from="3658" to="3666" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Are we consistently biased? multidimensional analysis of biases in distributional word vectors</title>
		<author>
			<persName><forename type="first">Anne</forename><surname>Lauscher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Goran</forename><surname>Glavaš</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.11783</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Manzini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yao</forename><forename type="middle">Chong</forename><surname>Lim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yulia</forename><surname>Tsvetkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alan</forename><forename type="middle">W</forename><surname>Black</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.04047</idno>
		<title level="m">Black is to criminal as Caucasian is to police: Detecting and removing multiclass bias in word embeddings</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Spread of hate speech in online social media</title>
		<author>
			<persName><forename type="first">Binny</forename><surname>Mathew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ritam</forename><surname>Dutt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pawan</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Animesh</forename><surname>Mukherjee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM Conference on Web Science</title>
				<meeting>the 10th ACM Conference on Web Science</meeting>
		<imprint>
			<date type="published" when="2019">2019a</date>
			<biblScope unit="page" from="173" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Temporal effects of unmoderated hate speech in Gab</title>
		<author>
			<persName><forename type="first">Binny</forename><surname>Mathew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anurag</forename><surname>Illendula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Punyajoy</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Soumya</forename><surname>Sarkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pawan</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Animesh</forename><surname>Mukherjee</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.10966</idno>
		<imprint>
			<date type="published" when="2019">2019b</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><surname>Merriam-Webster</surname></persName>
		</author>
		<ptr target="https://www.merriam-webster.com/dictionary/microaggression" />
		<title level="m">Merriam-Webster&apos;s definition of microaggression</title>
				<imprint>
			<date type="published" when="2021-03-08">2021-03-08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">C</forename><forename type="middle">J C</forename><surname>Burges</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Ghahramani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">26</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP</title>
		<author>
			<persName><forename type="first">Timo</forename><surname>Schick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sahana</forename><surname>Udupa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hinrich</forename><surname>Schütze</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2103.00453</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Racial microaggressions in everyday life: Implications for clinical practice</title>
		<author>
			<persName><forename type="first">Derald</forename><forename type="middle">Wing</forename><surname>Sue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christina</forename><forename type="middle">M</forename><surname>Capodilupo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gina</forename><forename type="middle">C</forename><surname>Torino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jennifer</forename><forename type="middle">M</forename><surname>Bucceri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aisha</forename><forename type="middle">M B</forename><surname>Holder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><forename type="middle">L</forename><surname>Nadal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marta</forename><surname>Esquilin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The American psychologist</title>
		<imprint>
			<biblScope unit="volume">62</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="271" to="286" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">What are the biases in my word embedding?</title>
		<author>
			<persName><forename type="first">Nathaniel</forename><surname>Swinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maria</forename><surname>De-Arteaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Neil</forename><forename type="middle">Thomas</forename><surname>Heffernan</surname><genName>IV</genName></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><forename type="middle">D M</forename><surname>Leiserson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><forename type="middle">Tauman</forename><surname>Kalai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society</title>
				<meeting>the 2019 AAAI/ACM Conference on AI, Ethics, and Society</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="305" to="311" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A causal inference method for reducing gender bias in word embedding relations</title>
		<author>
			<persName><forename type="first">Zekun</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Juan</forename><surname>Feng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="9434" to="9441" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">Jieyu</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yichao</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zeyu</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai-Wei</forename><surname>Chang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.01496</idno>
		<title level="m">Learning gender-neutral word embeddings</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
