
Bias Mitigation in Misogynous Meme Recognition: A Preliminary Study

Gianmaria Balducci (0, 2), Giulia Rizzi (1, 2), Elisabetta Fersini (2)

0 PMI Reboot S.r.l., Milan, Italy
1 Universitat Politècnica de València, Valencia, Spain
2 University of Milano-Bicocca, Milan, Italy

In this paper, we address the problem of automatic misogynous meme recognition by dealing with potentially biased elements that could lead to unfair models. In particular, a bias estimation technique is proposed to identify those textual and visual elements that unintentionally affect the model prediction, together with a naive bias mitigation strategy. The proposed approach achieves good recognition performance with promising generalization capabilities.

Keywords: Bias Mitigation, Bias Estimation, Misogyny Identification, Meme

CLiC-it 2023: 9th Italian Conference on Computational Linguistics, Nov 30 - Dec 02, 2023, Venice, Italy
g.balducci1@campus.unimib.it (G. Balducci); g.rizzi10@campus.unimib.it (G. Rizzi); elisabetta.fersini@unimib.it (E. Fersini)
ORCID: 0000-0002-0619-0760 (G. Rizzi); 0000-0002-8987-100X (E. Fersini)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

In the context of social media, memes have become popular as a means of expressing irony or opinions on various topics. However, these memes can also perpetuate discriminatory behaviours towards certain groups and minorities. Misogyny, in particular, has gained attention as a form of hateful language conveyed through memes in various ways, such as female stereotyping, shaming, objectification, and violence. While misogyny recognition mechanisms have been widely investigated focusing on textual sources (i.e., tweets) [1, 2, 3, 4], misogyny identification in multimodal settings, and in particular on memes, is still in its infancy. In [5], a few naive unimodal and multimodal approaches have been investigated to understand the contribution of textual and visual cues. Further investigations from the same authors [6] have introduced a multimodal approach that considers both visual (in the form of captioning) and textual information to distinguish between misogynous and non-misogynous memes. Recently, the performance of multiple pre-trained and trained-from-scratch models has been compared to verify if domain-specific pre-training could help to improve the recognition performance [7].

Independently of the textual, visual or multimodal sources, several authors highlighted how the classification models may be subject to bias that could affect the real performance of the models [8, 9] in a real setting. Most of the investigations propose a few bias estimation metrics and related mitigation policies that are based on a fixed set of seed words to quantify and minimize the bias at the dataset or model level. When dealing with misogynous meme recognition, metrics to estimate the bias and techniques to mitigate it are still missing.

To this purpose, we provide the following main contributions:

• a candidate biased elements identification in a multi-modal setting, focusing on both textual and visual constituents of a meme;
• a mitigation strategy at training time, named Masking Mitigation, that masks the candidate biased elements to reduce the distortion introduced by their presence.

The rest of the paper is organized as follows. In Section 2 a summary of the state of the art is reported. In Section 3 the candidate biased element identification strategy is detailed. In Section 4 the proposed mitigation strategy is presented. In Section 5 the experimental results are reported. In Section 6 conclusions are drawn.

2. Related work

The majority of works on hate content detection focus on tweets, while, only in recent years, they have started to address multimodal content such as memes. For instance, the approach proposed in [5] aims to counter the phenomenon of memes that can convey sexist messages ranging from stereotyping women to shaming, objectification, and violence, investigating both unimodal and multimodal approaches to understand the contribution of textual and visual cues. In [10], the authors indicate how the visual mode may be much more informative for detecting hate speech than the linguistic mode in memes. More recently, two benchmark datasets have been proposed to facilitate the investigation related to misogynous meme detection. The first benchmark presented in [11] is composed of 800 memes from the most popular social media platforms. The dataset has been labelled through a crowdsourcing platform, involving 60 subjects, in order to collect three evaluations for each instance. Each instance, labelled according to misogyny, aggressiveness and irony, has been labelled by three annotators from the crowd and three expert labellers. A more recent benchmark has been collected for the MAMI shared task at SemEval 2022 [12]. The dataset, composed of 10,000 memes for training and 1,000 memes for testing, made it possible to approach: (i) the identification of misogynistic memes, and (ii) the recognition of the type of misogyny among potentially overlapping categories. For the MAMI challenge, most of the participants [13, 14, 15, 16] exploited pre-trained models and ensemble strategies.

Regarding the potential bias that the models could inherit from the training dataset, most of the investigations focus only on a unimodal setting and more precisely on the textual component [17, 18, 19]. In particular, special attention has been devoted to identity terms, i.e. those terms frequently associated with hateful expressions in the dataset referred to a specific target (e.g., woman, wife, girlfriend, etc.). It has been demonstrated that such identity terms lead the models to biased implicit associations between such terms and a given class label, finally originating unfair predictions. In order to counteract the potential bias, several mitigation strategies have been proposed in the literature. One of the most widely used strategies is data augmentation [4, 20, 21], which consists in adding data containing examples of non-toxic comments that bring back those identity terms that have the most disproportionate distribution in the dataset. Alternative solutions are focused on mitigating directly the models by means of specific objective functions [22, 23] or optimization strategies [24, 25, 26]. Although the above-mentioned strategies represent a fundamental step towards bias mitigation, they are defined for unimodal settings. Bias estimation and mitigation from a multimodal perspective are still missing for misogynous meme identification.

3. Bias Estimation

In order to understand if a given misogyny identification model is biased, three main steps are performed: (i) Candidate Biased Elements Estimation, which allows us to identify specific textual or visual elements that could lead a model to unfair predictions, (ii) the creation of a Synthetic Dataset with specific characteristics that allow evaluating model behaviours on challenging examples, and (iii) the definition of a metric to quantify how a model could be biased by such elements. The proposed method has been evaluated on the MAMI Dataset [12], consisting of 10,000 memes for training and 1,000 memes for testing. The MAMI test set will later be referred to as raw.

3.1. Candidate Bias Elements Estimation

As highlighted in the literature, classification models may be affected by bias: the presence of specific elements can lead the model to an erroneous behaviour by predicting a specific label due to the presence of such elements. This distortion in the investigated data-derived models can in fact be caused by an imbalanced distribution, in relation to the prediction label, of specific terms or visual elements strongly associated with a given class label. Those candidate biased elements can be distinguished in candidate biased terms, which are related to the superimposed text of a meme, and candidate biased tags, which are concerned with the objects that describe the scene of a meme. We exploit a novel estimation for identifying candidate biased elements [26] that overcomes the limitations of the Polarized Weirdness Index (PWI) [27], which is unbounded and does not consider the context in which the elements appear, and extend the estimation process to address more than one modality.

Given a multimodal dataset $D$, $e$ is a visual or textual element belonging to the set $E$ that comprises all the terms and tags of $D$. A bias score $S(e)$ can be estimated for each element $e$ according to the following formula:

$$S(e) = \frac{1}{|\mathcal{M}_e|} \sum_{m=1}^{|\mathcal{M}_e|} \left[ P(c^{+} \mid TG_m) - P(c^{+} \mid \{TG_m - e\}) \right] \qquad (1)$$

where $\mathcal{M}_e$ is the set of memes containing $e$, $c^{+}$ represents the misogynous label and $TG_m$ denotes the set of terms and tags in a given meme $m$. $P(c^{+} \mid TG_m)$ represents the probability of a meme being associated with the misogynous label, given the terms and tags within the meme itself, and, analogously, $P(c^{+} \mid \{TG_m - e\})$ denotes the probability of a meme being associated with the misogynous label $c^{+}$, given the text (tags) present in the instance (meme), excluding the term (tag) under analysis. The proposed bias score ranges in the interval $[-1, +1]$. The higher the positive score, the more likely the element is to induce bias towards the positive class (misogynous). On the other hand, the lower the negative score, the more likely the element is to be associated with the negative class (not misogynous). Terms and tags with a score close to zero are considered neutral with respect to a given label.
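The bias score of Equation 1 can be illustrated with a minimal sketch. It assumes that each meme is represented as a set of terms and tags and that some model provides the probability of the misogynous label for such a set; the names `bias_score` and `prob_misogynous`, as well as the toy probability function in the usage lines, are hypothetical and are not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical signature: P(misogynous | set of terms and tags of a meme).
ProbFn = Callable[[FrozenSet[str]], float]


def bias_score(element: str,
               memes: Iterable[FrozenSet[str]],
               prob_misogynous: ProbFn) -> float:
    """Average change in P(misogynous) caused by removing `element`
    from every meme that contains it (Equation 1)."""
    containing = [m for m in memes if element in m]
    if not containing:
        # Assumption of this sketch: an element that never occurs is neutral.
        return 0.0
    deltas = [prob_misogynous(m) - prob_misogynous(m - {element})
              for m in containing]
    return sum(deltas) / len(deltas)


# Toy usage with a made-up probability function (illustration only).
toy_memes = [
    frozenset({"dishwasher", "kitchen"}),
    frozenset({"dishwasher", "new"}),
    frozenset({"cat"}),
]
toy_prob = lambda elements: 0.9 if "dishwasher" in elements else 0.2
print(bias_score("dishwasher", toy_memes, toy_prob))
print(bias_score("cat", toy_memes, toy_prob))
```

Since both probabilities lie in [0, 1], each difference and therefore the average stays within [-1, +1], in line with the range stated for the score.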

We report in Tables 1 and 2 the set of biased terms and biased tags identified on the MAMI training dataset. As we can see, the set of candidate biased terms with the highest score for the misogynous class is composed of words that are typically associated with some specific misogyny categories, like dishwasher and chick for stereotype and whore for objectification. The remaining tokens are websites that have been used to collect only misogynous memes. A few terms identified as conveying potential bias are related to the seed words used to collect the dataset (e.g. whore), confirming the ability of the proposed approach to capture the bias introduced in the dataset-creation phase (Selection Bias). On the other hand, the presence of other terms (e.g. chloroform) highlights the ability of the proposed approach to generalize with respect to the dataset creation process and include elements that may induce bias due to their unintended unbalanced distribution. Concerning the set of terms with the highest negative bias score for the not misogynous class, it is composed of words that are very general and commonly used in a variety of popular memes. An analogous consideration can be drawn for the candidate biased tags.

Table 1: Candidate Biased Terms

Misogynous                   Not Misogynous
Term            Score        Term   Score
demotivational  0.39
dishwasher      0.38
promotion       0.35
whore           0.35
chick           0.34
motivate        0.33
chloroform      0.30
blond           0.30
diy             0.30
belong          0.28

3.2. Synthetic Dataset

In order to measure the bias of the models when making predictions, a synthetic dataset has been created with specific characteristics that can effectively help to highlight the bias of the models given the presence of the candidate biased elements.

In particular, let $T^{+}$ and $G^{+}$ be respectively the sets of all the biased candidate terms and tags with a positive score, which qualifies elements that are expected to introduce the bias towards the misogynous class. Also, let $T^{-}$ and $G^{-}$ be respectively the sets of all the biased candidate terms and tags with a negative score, which qualifies elements that are expected to introduce the bias towards the not misogynous class. Given a specific element $t^{+} \in T^{+}$ and $g^{+} \in G^{+}$, we collected misogynous and not misogynous memes according to the following criteria:

• a not misogynous meme is part of the synthetic dataset if it contains $t^{+}$ (or $g^{+}$) and it does not contain any biased candidate terms (or tags) with a negative score. This is to evaluate the impact of $t^{+}$ (or $g^{+}$) in introducing a bias towards the misogynous class in not misogynous memes;
• a misogynous meme is part of the synthetic dataset if it contains $t^{+}$ (or $g^{+}$) and it does not contain any other element in $T^{+}$ (or $G^{+}$). This is to verify if the model, given the presence of $t^{+}$ (or $g^{+}$), is able to perform well on misogynous memes.

An analogous procedure has been adopted to create misogynous and not misogynous memes according to the candidate biased terms and tags with a negative score.

The synthetic test set will later be referred to as synt.

3.3. Multimodal Bias Estimation (MBE)

In order to measure if a given model is affected by bias we introduce the Multimodal Bias Estimation (MBE) metric, which combines the area under the curve ($AUC_{raw}$) estimated on a test set belonging to the original MAMI test set and the area under the curve estimated on the test set belonging to the synthetic dataset ($AUC_{synt}$):

$$MBE = \frac{1}{2} AUC_{raw} + \frac{1}{2} AUC_{synt} \qquad (2)$$

where $AUC_{synt}$ is computed as reported in Equation 3:

$$AUC_{synt} = \frac{1}{2}\,\frac{\sum_{t \in T} AUC_{Subgroup}(\mathcal{M}_t) + \sum_{t \in T} AUC_{BPSN}(\mathcal{M}_t) + \sum_{t \in T} AUC_{BNSP}(\mathcal{M}_t)}{|T|} + \frac{1}{2}\,\frac{\sum_{g \in G} AUC_{Subgroup}(\mathcal{M}_g) + \sum_{g \in G} AUC_{BPSN}(\mathcal{M}_g) + \sum_{g \in G} AUC_{BNSP}(\mathcal{M}_g)}{|G|} \qquad (3)$$

$\mathcal{M}_t$ represents the subgroup of memes identified by the presence of a biased term $t$, and $T$ is the subset of selected biased terms. $\mathcal{M}_g$ denotes the subgroup of memes identified by the presence of a biased tag $g$, and $G$ denotes the subset of selected biased tags.

$AUC_{synt}$ is a three per-element AUC-based measure, which considers both the biased terms and the biased tags, composed of the following estimations: $AUC_{Subgroup}$, $AUC_{BPSN}$ and $AUC_{BNSP}$.

The MBE metric, which ranges in the interval [0, 1], estimates the ability of the models to perform a good prediction on the raw test data while simultaneously achieving a significant performance on memes that, by construction, can lead to a biased prediction.

[Figure: object tags with their probabilities (man 0.87, desk 0.8, chair 0.43, car)]
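As a rough illustration of the quantities involved in MBE, the sketch below computes a plain ROC AUC, three per-element AUC values for the memes flagged by one biased element, and the combination of Equation 2. It rests on assumptions: the three per-element measures are taken to be the commonly used subgroup AUC, BPSN (background positive, subgroup negative) and BNSP (background negative, subgroup positive) definitions, and the function names are hypothetical. The aggregation over all terms and tags of Equation 3 is deliberately left out.

```python
from typing import Dict, Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly;
    ties count as 0.5. Returns None when one class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_element_aucs(labels: Sequence[int],
                     scores: Sequence[float],
                     in_subgroup: Sequence[bool]) -> Dict[str, Optional[float]]:
    """Subgroup, BPSN and BNSP AUC for the memes containing one biased element
    (assumed definitions, see the lead-in)."""
    rows = list(zip(labels, scores, in_subgroup))

    def pick(keep) -> Optional[float]:
        kept = [(y, s) for y, s, g in rows if keep(y, g)]
        return auc([y for y, _ in kept], [s for _, s in kept])

    return {
        # only memes that contain the element
        "subgroup": pick(lambda y, g: g),
        # negatives containing the element vs. positives without it
        "bpsn": pick(lambda y, g: (g and y == 0) or (not g and y == 1)),
        # positives containing the element vs. negatives without it
        "bnsp": pick(lambda y, g: (g and y == 1) or (not g and y == 0)),
    }


def mbe(auc_raw: float, auc_synt: float) -> float:
    """Equation 2: equal-weight combination of the two AUC values."""
    return 0.5 * auc_raw + 0.5 * auc_synt
```

A low BPSN value signals that not misogynous memes containing the element are scored above misogynous memes without it, which is the kind of behaviour the synthetic dataset is built to expose.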

4. Debiasing Strategy

Several baseline models have been initially considered for distinguishing between misogynous and not misogynous memes. We trained SVM, KNN, Naive Bayes, Decision Tree, and Multi-layer Perceptron independently on each unimodal representation of the memes. In particular, the following modalities have been considered as (separate) input spaces:

• textual component, that is the transcription of the text contained within the meme (obtained with OCR), embedded through the Universal Sentence Encoder (USE) [28];
• visual component, expressed by the objects identified within the meme (object tags) by the Scene Graph Generation method [29] and represented through an n-dimensional vector that denotes if a given meme contains one or more predefined objects with the corresponding probabilities.

The classifiers have been combined, according to each modality (e.g. visual or textual), through a Bayesian Model Averaging (BMA) [30] ensemble paradigm. BMA has also been employed for creating a multimodal ensemble that considers all the predictions provided by the above-mentioned models trained on each representation independently.

4.1. Mitigation Strategy

Bias mitigation is adopted in both unimodal and multi-modal contexts. In a unimodal setting, only the considered modality is mitigated. In a multi-modal scenario, all the models based on visual and textual components that compose the ensemble are mitigated.

In order to debias the model at training time (and inference time), a Masking Mitigation is proposed. In particular, for what concerns the textual component, each biased term is masked according to the class label that it affects most (see Table 1). Any given biased term, estimated using the strategy presented in Section 3, is masked in the training dataset according to the class towards which it induces bias. In particular, if a candidate biased term induces a bias towards the misogynous label, then it is replaced with a positive mask [POS-MASK] in misogynous memes. On the contrary, if a candidate biased term induces a bias towards the not misogynous label, then it is replaced with a negative mask [NEG-MASK] in not misogynous memes. An example is reported in the following.

Original Text: When you can’t afford a new dishwasher so you...

Masked Text: When you can’t afford a new [POS-MASK] so you...
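A minimal sketch of Masking Mitigation at training time is given here for both modalities: the textual rule described above and the visual rule described in the next paragraph (tag probability set to 0 plus an added indicator feature). Several details are assumptions of the sketch rather than statements from the paper: the simple word-level tokenisation, the use of a single 0/1 indicator feature, and the function names `mask_text` and `mask_tags`. How masking is applied at inference time, when the label is unknown, is not specified in the text and is not covered.

```python
import re
from typing import List, Set

POS_MASK, NEG_MASK = "[POS-MASK]", "[NEG-MASK]"


def mask_text(text: str, label: int,
              pos_terms: Set[str], neg_terms: Set[str]) -> str:
    """Replace biased terms with a mask token, depending on the meme label.

    label 1 (misogynous): terms biased towards misogyny become [POS-MASK].
    label 0 (not misogynous): terms biased towards the other class become [NEG-MASK].
    """
    targets, mask = (pos_terms, POS_MASK) if label == 1 else (neg_terms, NEG_MASK)

    def replace(match):
        word = match.group(0)
        return mask if word.lower() in targets else word

    # Assumption: words are runs of ASCII letters and apostrophes.
    return re.sub(r"[A-Za-z']+", replace, text)


def mask_tags(probs: List[float], tag_names: List[str],
              biased_tags: Set[str]) -> List[float]:
    """Zero the probability of every biased tag that is present and append
    one indicator feature (1.0 if anything was masked, else 0.0)."""
    masked = list(probs)
    flag = 0.0
    for i, name in enumerate(tag_names):
        if name in biased_tags and masked[i] > 0.0:
            masked[i] = 0.0
            flag = 1.0
    return masked + [flag]


# Usage on the example from the text (term sets are illustrative).
print(mask_text("When you can't afford a new dishwasher so you...",
                label=1, pos_terms={"dishwasher"}, neg_terms=set()))
```

Masking only in memes whose label matches the direction of the bias keeps the term visible in the opposite class, so the model can no longer rely on the term alone to predict the label.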

Regarding the visual component, when a candidate biased tag is present, the probability value of that tag is set equal to 0 and a new feature indicating the presence of the masking is added to the original n-dimensional vector. A toy example is reported in Figure 1.

5. Experimental Results

We report in this section the results of the proposed mitigation strategy, comparing the performance with several approaches. In particular, we report $AUC_{raw}$, $AUC_{synt}$ and MBE related to each model enclosed in the ensemble, i.e., Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naive Bayes (NB), Decision Tree (DT), and Multi-layer Perceptron (MLP), together with their Bayesian Model Averaging (BMA). We also show the performance of the proposed Masking Mitigation on BMA (BMA-MM). Finally, we report a baseline debiasing technique available in the state of the art. In particular, we used REPAIR [31] as a benchmark mitigation model. It computes a weight for each sample based on its proportional loss contribution with respect to a reference model and resamples the original training dataset according to several strategies. In particular, given a weight for each meme, it keeps the 50% of the examples with the largest weight from each class.

We show in Tables 3-5 the comparison between all the considered models, distinguished according to the modalities used to perform the training and the corresponding mitigation phase. A T-test has been performed to assess statistical significance with a pairwise analysis of the best-performing non-mitigated approach (BMA) against the compared mitigation strategies, i.e. BMA-MM and REPAIR.

A few considerations can be derived from Table 3, where the models have been trained using the textual component only: (1) training on the textual component only leads all the models to obtain good results on both raw and synt test sets, (2) BMA is able to achieve remarkable results compared with the baselines, (3) the proposed Masking Mitigation strategy (BMA-MM) significantly outperforms not only all the baseline models and the original BMA, but also the REPAIR strategy. BMA-MM is able to maintain good recognition performance on the raw test set, while significantly improving the generalization capabilities on the controversial memes available in the synt test set.

Textual Component Only
0.7202 0.7801 0.7173 0.7041 0.7010 0.7687 0.6301 0.7475 0.7257 0.7521 0.7326 0.7841 0.6775 0.6811 0.7325 0.8052

For what concerns Table 4, where the models have been trained using the visual component only, the considerations are a bit different. As demonstrated in other state-of-the-art studies [26], the visual component is less impactful on the recognition capabilities than the textual one. We hypothesize that the reduced contribution of the pictorial component is mainly due to the difficulty of relating a given object to an abstract concept (e.g. dishwasher). However, also in this case, BMA is able to achieve better results than the baselines and BMA-MM is still able to significantly outperform the original BMA and REPAIR.

Visual Component Only
Model     AUC_raw   AUC_synt  MBE
SVM       0.6808    0.5918    0.6363
KNN       0.6623    0.5942    0.6283
NB        0.6635    0.5773    0.6204
DT        0.6499    0.5888    0.6194
MLP       0.6912    0.6047    0.6480
BMA       0.6870    0.5990    0.6430
REPAIR    0.6651    0.5922    0.6286
BMA-MM    0.6655    0.6416    0.6535*

Table 4: Model performance using the visual component only. Bold denotes the best MBE, while (*) reflects that the mitigated model outperforms the best non-mitigated approach (BMA) and the improvement is statistically significant.

Regarding the performance of the multimodal settings reported in Table 5, we can assert that the proposed mitigation strategy not only significantly outperforms all the other configurations presented above, but is also able to achieve a very promising compromise between raw and synt samples, which facilitates the adoption of BMA-MM in a real setting.

6. Conclusions

This paper addressed the problem of mitigating bias in misogynous meme detection. In particular, a candidate biased element estimation and a corresponding mitigation strategy are proposed to perform fair prediction in a real setting. The proposed approach, validated on a benchmark dataset, achieved remarkable results both in terms of prediction and generalization capabilities, reducing the bias in a significant way.

Acknowledgments

The work of Elisabetta Fersini has been partially funded by the European Union – NextGenerationEU under the National Research Centre For HPC, Big Data and Quantum Computing - Spoke 9 - Digital Society and Smart Cities (PNRR-MUR), and by MUR under the grant “Dipartimenti di Eccellenza 2023-2027” of the Department of Informatics, Systems and Communication of the University of Milano-Bicocca, Italy.

References

[1] Anzovino, E. Fersini, P. Rosso, Automatic identification and classification of misogynistic language on twitter, in: International Conference on Applications of Natural Language to Information Systems, Springer, 2018, pp. 57–64.
[2] M. A. Bashar, Nayak, Suzor, Regularising lstm classifier by transfer learning for detecting misogynistic tweets with small training set, Knowledge and Information Systems 62 (2020) 4029–4054.
[3] H. T. Ta, A. B. S. Rahman, Najjar, Gelbukh, Transfer learning from multilingual deberta for sexism identification, in: CEUR Workshop Proceedings, volume 3202, CEUR-WS, 2022.
[4] Calderón-Suarez, R. M. Ortega-Mendoza, Montes-Y-Gómez, Toxqui-Quitl, M. A. Márquez-Vera, Enhancing the detection of misogynistic content in social media by transferring knowledge from song phrases, IEEE Access 11 (2023) 13179–13190.
[5] E. Fersini, F. Gasparini, Corchs, Detecting sexist MEME on the Web: A study on textual and visual cues, in: 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2019, pp. 226–231.
[6] E. Fersini, G. Rizzi, A. Saibene, F. Gasparini, Misogynous meme recognition: A preliminary study, in: International Conference of the Italian Association for Artificial Intelligence, Springer, 2021.
[7] Singh, Haridasan, R. Mooney, “female astronaut: Because sandwiches won't make themselves up there”: Towards multimodal misogyny detection in memes, in: The 7th Workshop on Online Abuse and Harms (WOAH), 2023, pp. 150–159.
[8] Song, Giunchiglia, Li, Shi, Xu, Measuring and mitigating language model biases in abusive language detection, Information Processing & Management 60 (2023) 103277.
[9] Shen, Li, M. R. Bouadjenek, Mai, Sanner, Towards understanding and mitigating unintended biases in language model-driven conversational recommendation, Information Processing & Management 60 (2023) 103139.
[10] B. O. Sabat, C. C. Ferrer, X. G. i Nieto, Hate speech in pixels: Detection of offensive memes towards automatic moderation, 2019. arXiv:1910.02334.
[11] F. Gasparini, G. Rizzi, A. Saibene, E. Fersini, Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content, Data in Brief 44 (2022) 108526.
[12] E. Fersini, F. Gasparini, G. Rizzi, A. Saibene, Chulvi, P. Rosso, Lees, J. Sorensen, SemEval-2022 task 5: Multimedia automatic misogyny identification, in: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), Association for Computational Linguistics, Seattle, United States, 2022, pp. 533–549.
[13] Zhou, Zhao, Dong, Ding, Liu, K. Zhang, DD-TIG at semeval-2022 task 5: Investigating the relationships between multimodal and unimodal information in misogynous memes detection and classification, in: The 16th International Workshop on Semantic Evaluation, 2022.
[14] L. Chen, H. W. Chou, RIT boston at semeval-2022 task 5: Multimedia misogyny detection by using coherent visual and language features from CLIP model and data-centric AI principle, in: The 16th International Workshop on Semantic Evaluation, 2022.
[15] S. Hakimov, G. S. Cheema, R. Ewerth, TIB-VA at semeval-2022 task 5: A multimodal architecture for the detection and classification of misogynous memes, in: The 16th International Workshop on Semantic Evaluation, 2022.
[16] J. M. ZHI, Z. Mengyuan, M. Yuan, D. Hu, X. Du, L. Jiang, Y. Mo, X. Shi, PAIC at semeval-2022 task 5: Multi-modal misogynous detection in MEMES with multi-task learning and multi-model fusion, in: The 16th International Workshop on Semantic Evaluation, 2022.
[17] D. Nozza, C. Volpetti, E. Fersini, Unintended bias in misogyny detection, in: IEEE/WIC/ACM International Conference on Web Intelligence, WI ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 149–155. URL: https://doi.org/10.1145/3350546.3352512. doi:10.1145/3350546.3352512.
[18] N. Zueva, M. Kabirova, P. Kalaidin, Reducing unintended identity bias in russian hate speech detection, in: Proceedings of the Fourth Workshop on Online Abuse and Harms, 2020, pp. 65–69.
[19] F. R. Nascimento, G. D. Cavalcanti, M. Da Costa-Abreu, Unintended bias evaluation: An analysis of hate speech detection and gender bias mitigation on social media using ensemble learning, Expert Systems with Applications 201 (2022) 117032.
[20] R. Zmigrod, S. J. Mielke, H. Wallach, R. Cotterell, Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1651–1661.
[21] K. Guo, R. Ma, S. Luo, Y. Wang, Coco at semeval-2023 task 10: Explainable detection of online sexism, in: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023, pp. 469–476.
[22] M. Xia, A. Field, Y. Tsvetkov, Demoting racial bias in hate speech detection, in: Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media, 2020, pp. 7–14.
[23] R. Sridhar, D. Yang, Explaining toxic text via knowledge enhanced text generation, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 811–826. URL: https://aclanthology.org/2022.naacl-main.59. doi:10.18653/v1/2022.naacl-main.59.
[24] V. Perrone, M. Donini, M. B. Zafar, R. Schmucker, K. Kenthapadi, C. Archambeau, Fair bayesian optimization, in: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 854–863.
[25] S. Sikdar, F. Lemmerich, M. Strohmaier, Getfair: Generalized fairness tuning of classification models, in: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 289–299.
[26] G. Rizzi, F. Gasparini, A. Saibene, P. Rosso, E. Fersini, Recognizing misogynous memes: Biased models and tricky archetypes, Information Processing & Management 60 (2023) 103474.
[27] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Language Resources and Evaluation 55 (2021) 477–523.
[28] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. St. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, B. Strope, R. Kurzweil, Universal Sentence Encoder for English, in: Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations, 2018, pp. 169–174.
[29] X. Han, J. Yang, H. Hu, L. Zhang, J. Gao, P. Zhang, Image scene graph generation (sgg) benchmark, 2021. arXiv:2107.12604.
[30] E. Fersini, E. Messina, F. A. Pozzi, Sentiment analysis: Bayesian Ensemble Learning, Decision Support Systems 68 (2014) 26–38.
[31] Y. Li, N. Vasconcelos, Repair: Removing representation bias by dataset resampling, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9572–9581.