=Paper=
{{Paper
|id=Vol-3033/paper36
|storemode=property
|title=Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection
|pdfUrl=https://ceur-ws.org/Vol-3033/paper36.pdf
|volume=Vol-3033
|authors=Tolúlọpẹ́ Ògúnrẹ̀mí,Nazanin Sabri,Valerio Basile,Tommaso Caselli
|dblpUrl=https://dblp.org/rec/conf/clic-it/OgunremiSBC21
}}
==Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection==
Tolúlọpẹ́ Ògúnrẹ̀mí¹, Nazanin Sabri², Valerio Basile³, Tommaso Caselli⁴

¹ Stanford University, United States, tolulope@stanford.edu
² Independent Researcher, nazanin.sabrii@gmail.com
³ University of Turin, Italy, valerio.basile@unito.it
⁴ University of Groningen, Netherlands, t.caselli@rug.nl
Abstract

Microaggressions are subtle manifestations of bias (Breitfeller et al., 2019). These demonstrations of bias can often be classified as a subset of abusive language. However, not as much focus has been placed on the recognition of these instances. As a result, limited data is available on the topic, and only in English. Being able to detect microaggressions without the need for labeled data would be advantageous, since it would allow content moderation also for languages lacking annotated data. In this study, we introduce an unsupervised method to detect microaggressions in natural language expressions. The algorithm relies on pre-trained word embeddings, leveraging the bias encoded in the model in order to detect microaggressions in unseen textual instances. We test the method on a dataset of racial and gender-based microaggressions, reporting promising results. We further run the algorithm on out-of-domain unseen data with the purpose of bootstrapping corpora of microaggressions "in the wild", and discuss the benefits and drawbacks of our proposed method.

1 Introduction

The growth of Social Media platforms has been accompanied by an increased visibility of expressions of socially unacceptable language online. In a 2016 Eurobarometer survey, 75% of people who follow or participate in online discussions have witnessed or experienced abuse or hate speech. Under this umbrella term, different phenomena can be identified, ranging from offensive language to more complex and dangerous ones, such as hate speech or doxing. Recently, there has been growing interest from the Natural Language Processing community in the development of language resources and systems to counteract socially unacceptable language online. Most previous work has focused on a few easy-to-model phenomena, ignoring more subtle and complex ones, such as microaggressions (Jurgens et al., 2019).

Microaggressions are brief, everyday exchanges that denigrate stigmatised and culturally marginalised groups (Merriam-Webster, 2021). They are not always perceived as hurtful by either party, and they can often be classified as positive statements by current hate-speech detection systems (Breitfeller et al., 2019). The occasionally unintentional hurt caused by such comments is a reflection of how certain stereotypes of others are baked into society. Sue et al. (2007) define microaggressions in the racial context, particularly when directed toward people of color, as "brief and commonplace daily verbal, behavioral, or environmental indignities", such as "you are a credit to your race" (intended message: it is unusual for someone of your race to be intelligent) or "do you think you're ready for college?" (intended message: it is unusual for people of color to succeed). The need for moderation of hateful content has previously been explored. For instance, Mathew et al. (2019b) analyse the temporal effects of allowing hate speech on Gab, finding that the language of users tends to become more and more similar to that of hateful users over time. Mathew et al. (2019a) further highlight that the spreading speed and reach of hateful content are much higher than those of non-hateful content. As a result, being able to remove instances of hateful language, such as microaggressions, is of great importance.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Previous work on microaggressions with computational methods is quite recent. Breitfeller et al. (2019) is one of the first works to address microaggressions in a systematic way, also introducing a first dataset, SelfMA. A further contribution, specifically focused on racial microaggressions, is Ali et al. (2020), where the authors focus on the development of machine learning systems.

In this study we introduce an unsupervised method for microaggression detection. Our method utilizes the existing bias in word embeddings to detect words with biased connotations in a message. Although unsupervised approaches tend to be less competitive than their supervised counterparts, our method is language-independent and can thus be applied to any language for which embedding representations exist. Furthermore, the reliance of our method on specific lexical items and their context of occurrence makes the flagging of a message as an instance of a microaggression transparent. In addition to being useful for languages with no labeled data, the reliance of our model on the words in a sentence makes it interpretable, as it allows human moderators to understand what the system based its decision on.

Our contributions can be summarised as follows:

• we introduce a new unsupervised method for the detection of microaggressions which builds on top of pre-trained word embeddings;

• we compare the performance of our model using different pre-trained word embeddings (GloVe, fastText, and word2vec) and discuss the potential reasons behind the differences;

• we test the proposed algorithm on unseen data from a different domain (i.e., Twitter), in order to qualitatively evaluate its efficacy in discovering new instances of microaggressions.

The rest of this paper is structured as follows: we introduce our method in Section 2. The data and our results are reported in Section 3. We deploy our model and discuss its limitations in Section 4. Finally, we present conclusions and future work in Section 5.

2 Use the Bias Against the Bias

Embedded representations, either from pre-trained word embeddings or pre-trained language models, have been shown to contain and amplify the biases present in the data used to generate them (Bolukbasi et al., 2016; Lauscher and Glavaš, 2019; Bhardwaj et al., 2020). As such, they often exhibit gender and racial bias (Swinger et al., 2019). Many studies have attempted to reduce this bias (Yang and Feng, 2020; Zhao et al., 2018; Manzini et al., 2019). In this work, we take a different turn by using this bias to our advantage: rather than taming the hurtfulness of the representations (Schick et al., 2021), we actively use it to promote social good. In this first study, we employ word representations derived from generic textual corpora of English, in order to capture the background knowledge needed to disambiguate instances of microaggressions in text. Recently, however, there have been studies involving word representations created from tailored collections of social media content aimed at capturing abusive phenomena like verbal aggression (Dynel, 2021) and hate speech (Caselli et al., 2020).

We devise a simple and effective method that exploits the existing bias in word embeddings and identifies words in a message that are related to particular and distant semantic areas in the embedding space. Messages are analysed in three steps: first, for each token t_i we compute its relatedness to a list of manually curated seed words s = s_1, ..., s_n denoting potential targets of microaggressions; second, we consider only the similarities of the pairs (t_i, s_j) above an empirical similarity threshold ST and compute their variance v_i; finally, we classify the token t_i as a microaggression trigger, and consequently the message as a microaggression, if v_i is above an empirically determined variance threshold VT.

The intuitive idea behind this algorithm is that some lexical elements in a verbal microaggression often hint (sometimes subtly) at specific features of the recipient of the message, in an otherwise neutral lexical context.

In this work, we choose to focus on microaggressions related to race and gender, therefore the seed words have to be chosen accordingly. The seed word lists for race and gender are, respectively, [white, black, asian, latino, hispanic, arab, african, caucasian] and [girl, boy, man, woman, male, female]. There is also a practical reason to focus on gender and race, namely the scarcity of data available for other categories of microaggressions and other idiosyncrasies of the available datasets — the religion class was specific to different religions, therefore hard to generalise; sexuality and gender presented a large overlap; and so on.

Figure 1: Worked example of the unsupervised method for the word "chopsticks" in the message "Ford: Built With Tools, Not With Chopsticks".
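The three-step procedure can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' released code: the `word_vectors` lookup (a token-to-vector mapping such as gensim's `KeyedVectors`), the pre-tokenised input, and the function names are assumptions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_trigger(token_vec, seed_vecs, st, vt):
    """Steps 1-3: seed relatedness, ST filter, variance test against VT."""
    sims = [cosine(token_vec, s) for s in seed_vecs]   # step 1: relatedness to seeds
    above = [sim for sim in sims if sim > st]          # step 2: keep similarities > ST
    if not above:
        return False
    return float(np.var(above)) > vt                   # step 3: skewed distribution?

def detect_microaggression(tokens, word_vectors, seeds, st, vt):
    """Flag a tokenised message; returns (is_microaggression, trigger_words)."""
    seed_vecs = [word_vectors[s] for s in seeds if s in word_vectors]
    triggers = [t for t in tokens
                if t in word_vectors
                and is_trigger(word_vectors[t], seed_vecs, st, vt)]
    return bool(triggers), triggers
```

With the race seed list and suitable thresholds, a token such as chopsticks, strongly related to asian but only weakly to the remaining seeds, yields a skewed set of above-threshold similarities with high variance and is therefore flagged as a trigger, while a word uniformly related to all seeds (e.g., "people") is filtered out by the variance test.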
An example of how the proposed method works is illustrated in Figure 1. In the example, consider the word "chopsticks" in the message "Ford: Built With Tools, Not With Chopsticks" (from the SelfMA dataset, described in Section 3). The target word exhibits a much higher relatedness to the seed word asian (0.237) than to any other seed word. Even considering only the seed words with a similarity above the fixed threshold (white, asian, and african), the variance of their similarity scores with respect to chopsticks is still higher than the variance threshold, and therefore this target word, in this context, triggers a microaggression according to the algorithm. This process is repeated for all the words in the message in order to detect microaggressions. Some categories of words are bound to exhibit a high relatedness to all the seed words, e.g., "people" or "human". This is the reason to introduce the variance threshold in the final step of our algorithm: to filter out these cases when classifying a given message, and instead focus on words that are related to different races (or genders) unevenly, with a skewed distribution of similarity scores.

An important by-product of this algorithm is that the output is one or more trigger words, in addition to the microaggression label (in the example, the trigger word is indeed chopsticks), therefore enabling a more informative and interpretable decision process.

3 Experiments

To test our method, we use two subsets of the SelfMA: microaggressions.com dataset (Breitfeller et al., 2019), comprised of 1,314 and 1,278 Tumblr posts respectively¹. The posts in SelfMA are all instances of microaggressions, manually tagged with one of four categories: race, gender, sexuality and religion. Posts can be tagged with more than one form of microaggression, meaning certain instances can appear in both the race and gender subsets used for the purposes of this study. The dataset consists of first- and second-hand accounts of microaggressions, as well as direct quotes of phrases or sentences said to the person posting. In order to reduce the linguistic perturbation introduced by accounts of a situation, we only take direct quotes found in the dataset as instances of microaggressions that we can detect with our unsupervised method. For training, we pull out direct quotes from the gender (561) and racial (519) subsets to test the algorithm. In order to balance the dataset, we scraped 2,021 random Tumblr posts, for a total of 4,612 instances. Table 1 summarises the composition of our dataset.

Source          Number of posts
SelfMA Gender   1,314
SelfMA Racial   1,278
Tumblr          2,021

Table 1: Statistics of the two subsets of the SelfMA dataset used in this paper, and the extra data downloaded to balance the dataset.

It is important to note that a microaggression can have multiple tags, so there is an overlap of instances.

¹ Tumblr is a popular American microblogging platform: https://www.tumblr.com
However, the seed words used to detect microaggression types in the method are different for each target phenomenon (e.g., race, gender).

We ran the algorithm on the SelfMA dataset, empirically optimising the two thresholds on the training split for each word embedding type and each microaggression category, filtering by the seed words listed in Section 2. We test the algorithm with three pre-trained word embedding models for English, namely fastText (Joulin et al., 2016) (trained on Wikipedia and Common Crawl), word2vec (Mikolov et al., 2013) (trained on Google News), and GloVe (Pennington et al., 2014) (trained on Wikipedia, the GigaWord corpus, and Common Crawl). The optimisation is performed by exhaustive grid search over the hyperparameter space.

The results, shown in Table 2, indicate that fastText has a better F1 score on racial microaggressions, while word2vec performs better on gender microaggressions. The difference in performance between fastText and word2vec is not major, and we attribute it to the difference between the corpora on which the two models were trained (i.e., web crawl and Wikipedia for fastText vs. news data for word2vec). The GloVe pre-trained model, trained on a combination of newswire texts, encyclopedic entries and texts from the Web, underperforms in both experiments. In general, the absolute figures are encouraging, especially considering the simplicity of this unsupervised approach.

4 Discovering Microaggressions

To better understand the performance of our unsupervised model, we performed an additional experiment. Our goal is to understand the false positive results and the potential harm the model could cause. To do so, we use our unsupervised model to label unseen instances from another domain (Twitter) than the SelfMA dataset (Tumblr), in order to see how the model would perform in detecting microaggressions.

We begin by performing keyword searches on Twitter (using Twitter's official API) and collect a new dataset of 3M tweets with seven keywords potentially containing race and gender expressions. Next, we set the threshold values ST and VT in our model in order to obtain the highest Precision scores, rather than the highest F1 value. This step is performed exactly like the optimisation described in Section 2, with the only difference being the target metric. The aim of this step is to label tweets as microaggressions only with the highest possible degree of confidence. We set ST = 0.12 and VT = 0.014 for racial microaggressions, leading to a Precision of .931, and ST = 0.13 and VT = 0.019 for gender-based microaggressions, leading to a Precision of .912. Precision has been measured on the original SelfMA dataset used as a validation set.

We then run the unsupervised model on the new Twitter dataset, automatically labelling 256,843 tweets for gender and 373,631 tweets for race. After the data is labeled, we manually explore the positive instances in order to evaluate the performance of the model. The algorithm tuned for high precision found 6,306 gender-related microaggression candidates and 13,004 race-related microaggression candidates in this dataset.

We find that while the model does detect actual instances of microaggressions, there is a noticeable amount of false positive instances. These tweets discuss race or gender in some manner; however, they do not necessarily contain microaggressions towards these groups. While the model does learn to detect discussions of these topics, it seems to sometimes confuse such discussions with microaggressions towards the aforementioned groups. Some examples follow, paraphrased to avoid tracking the original messages.

    Saying "Arrested Development isn't funny" in an office full of women just to feel something

    "Men have moustaches, women have oversized bracelets"

The humorous attempts in these tweets hinge on gender stereotypes, and therefore in some contexts they could be perceived as offensive by some recipients. The high relatedness in the word embedding space between some words (moustaches and bracelets) and gender-related seed words (men and women) triggers the detection algorithm.

The automatic detection of racial microaggressions "in the wild" is more challenging than that of gender-based ones, according to our manual exploration of this automatically labeled dataset. This may be due to the difficulty of crafting a list of seed words that is sufficiently race-related, but at the same time avoids generating too many false positives. We indeed found many of them, mainly due to named entities and multi-word expressions such as "White House", or simply because of the polysemy of color words, e.g. "black" and "white".
Target  Model     Class       Precision  Recall  F1-Score
Gender  FastText  not-MA      .609       .746    .671
                  MA          .714       .570    .634
                  macro avg.                     .680
        GloVe     not-MA      .692       .380    .491
                  MA          .603       .848    .705
                  macro avg.                     .598
        word2vec  not-MA      .659       .789    .718
                  MA          .769       .634    .694
                  macro avg.                     .706
Race    FastText  not-MA      .659       .875    .654
                  MA          .814       .547    .752
                  macro avg.                     .702
        GloVe     not-MA      .765       .371    .500
                  MA          .611       .896    .726
                  macro avg.                     .613
        word2vec  not-MA      .640       .814    .747
                  MA          .776       .584    .667
                  macro avg.                     .692

Table 2: Results of the experiment on the Gender and Racial subsets of SelfMA, in terms of Precision (P), Recall (R), and F1-score (F1) on the positive class (MA), on the negative class (not-MA), and their macro-average. Best scores per microaggression category are in bold.
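The exhaustive grid search over the two thresholds can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the grids, the stand-in `classify` callable, and the macro-F1 selection criterion are assumptions; for the high-precision setting described above, the same loop would simply score Precision on the MA class instead.

```python
import itertools

def macro_f1(gold, pred):
    """Macro-averaged F1 over the binary MA / not-MA classes."""
    per_class = []
    for cls in (True, False):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(per_class) / len(per_class)

def grid_search(messages, gold, classify, st_grid, vt_grid):
    """Score every (ST, VT) pair exhaustively; return (best_score, best_st, best_vt)."""
    best = None
    for st, vt in itertools.product(st_grid, vt_grid):
        pred = [classify(msg, st, vt) for msg in messages]
        score = macro_f1(gold, pred)
        if best is None or score > best[0]:
            best = (score, st, vt)
    return best
```

Because the search space is only two-dimensional and the classifier requires no training, an exhaustive sweep like this is cheap enough to repeat per embedding model and per microaggression category.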
We, however, still found instances of messages containing different extents of racial stereotyping:

    "why are you being so dramatic? just say I'm not originally arab, you don't have to fight about it"

    "I will need to explain that to the chinese old lady who works at my school's administrative office"

In summary, running the unsupervised microaggression detection algorithm on unseen data seems to represent a promising intermediate step towards the semi-automatic creation of language resources for this phenomenon. While the accuracy is not ideal, and the lists of seed words have to be hand-crafted carefully in order to avoid false positives, these drawbacks are balanced by the fairly cheap computational cost and the ease of application in a multilingual scenario.

5 Conclusion and Future Work

In this paper we introduce a novel algorithm that exploits the existing bias in pre-trained word embeddings to detect subtly abusive language phenomena such as microaggressions. While supervised methods of detection are plentiful in the field of natural language processing, they are only viable for languages and topics with available labeled datasets, which is not the case for many languages. As a result, the unsupervised method of detection introduced in this study could help address the need for the moderation of microaggressions in languages other than English. This is further helped by the availability of multilingual word embeddings, as they would allow the method to be used in any of the languages supported by the embedding.

The method is unsupervised and only needs a small list of seed words. Considering its simplicity, the results obtained from an experiment on a dataset of manually annotated microaggressions are very promising. Further, the method is transparent, explicitly identifying the words triggering a microaggression, and thus paving the way for explainable microaggression detection.

Although the preliminary results are promising, an experiment on unseen data from a different domain shows that there is leeway for improvement. Given that we look at the explicit words used in each message, our method is not sensitive to implicit expressions like "you people" or "your kind", which often occur in microaggressions.
We would have to add further steps to our algorithm to catch expressions like these.

Polysemy is another known issue, e.g., in words like "black" and "white", whose relatedness to certain identified trigger words may not necessarily be due to race. While a careful composition of the seed word lists helps to minimise this issue, a systematic approach to polysemy would certainly be desirable. The seed word lists may also be expanded, either manually or by exploiting existing lexicons such as HurtLex (Bassignana et al., 2018) for offensive terms (including stereotypes for several categories of individuals) or specialised lists of identity-related terms².

In future work, we plan on improving our model to account for lexical ambiguity, and for the complexity derived from the interference between pragmatic phenomena and aggression, e.g., in humorous and ironic messages, following the intuition in recent literature (Frenda, 2018) about the interconnection between irony or sarcasm and abusive language online. Our current plan is to apply the algorithm presented in this paper to bootstrap the creation of a multilingual resource of online verbal microaggressions and release it to the research community.

² See for instance this compendium of LGBTQIA+ terminology: https://www.umass.edu/stonewall/sites/default/files/documents/allyship term handout.pdf

Acknowledgements

The work of Valerio Basile is partially funded by the project "Be Positive!" (under the 2019 "Google.org Impact Challenge on Safety" call).

References

Omar Ali, Nancy Scheidt, Alexander Gegov, Ella Haig, Mo Adda, and Benjamin Aziz. 2020. Automated detection of racial microaggressions using machine learning. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pages 2477–2484. IEEE.

Elisa Bassignana, Valerio Basile, and Viviana Patti. 2018. Hurtlex: A multilingual lexicon of words to hurt. In 5th Italian Conference on Computational Linguistics, CLiC-it 2018, volume 2253, pages 1–6. CEUR-WS.

Rishabh Bhardwaj, Navonil Majumder, and Soujanya Poria. 2020. Investigating gender bias in BERT. arXiv preprint arXiv:2009.05021.

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. arXiv preprint arXiv:1607.06520.

Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. 2019. Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1664–1674.

Tommaso Caselli, Valerio Basile, Jelena Mitrović, and Michael Granitzer. 2020. HateBERT: Retraining BERT for abusive language detection in English. arXiv preprint arXiv:2010.12472.

Marta Dynel. 2021. Humour and (mock) aggression: Distinguishing cyberbullying from roasting. Language & Communication, 81:17–36.

Simona Frenda. 2018. The role of sarcasm in hate speech: A multilingual perspective. In Proceedings of the Doctoral Symposium of the XXXIV International Conference of the Spanish Society for Natural Language Processing (SEPLN 2018), pages 13–17.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.

David Jurgens, Libby Hemphill, and Eshwar Chandrasekharan. 2019. A just and comprehensive strategy for using NLP to address online abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3658–3666, Florence, Italy. Association for Computational Linguistics.

Anne Lauscher and Goran Glavaš. 2019. Are we consistently biased? Multidimensional analysis of biases in distributional word vectors. arXiv preprint arXiv:1904.11783.

Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, and Alan W Black. 2019. Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. arXiv preprint arXiv:1904.04047.

Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019a. Spread of hate speech in online social media. In Proceedings of the 10th ACM Conference on Web Science, pages 173–182.

Binny Mathew, Anurag Illendula, Punyajoy Saha, Soumya Sarkar, Pawan Goyal, and Animesh Mukherjee. 2019b. Temporal effects of unmoderated hate speech in Gab. arXiv preprint arXiv:1909.10966.

Merriam-Webster. 2021. Merriam-Webster's definition of microaggression. https://www.merriam-webster.com/dictionary/microaggression. Accessed: 2021-03-08.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.

Timo Schick, Sahana Udupa, and Hinrich Schütze. 2021. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP. arXiv preprint arXiv:2103.00453.

Derald Sue, Christina Capodilupo, Gina Torino, Jennifer Bucceri, Aisha Holder, Kevin Nadal, and Marta Esquilin. 2007. Racial microaggressions in everyday life: Implications for clinical practice. American Psychologist, 62:271–286.

Nathaniel Swinger, Maria De-Arteaga, Neil Thomas Heffernan IV, Mark DM Leiserson, and Adam Tauman Kalai. 2019. What are the biases in my word embedding? In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 305–311.

Zekun Yang and Juan Feng. 2020. A causal inference method for reducing gender bias in word embedding relations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9434–9441.

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. 2018. Learning gender-neutral word embeddings. arXiv preprint arXiv:1809.01496.