<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Study on Sentimental Analysis, Homophobia-Transphobia Detection for Dravidian Languages</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Manoj</forename><forename type="middle">J</forename><surname>Balaji</surname></persName>
							<email>manojbalaji1@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Forum for Information Retrieval Evaluation</orgName>
								<address>
									<addrLine>December 9-13</addrLine>
									<postCode>2022</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chinmaya</forename><surname>Hs</surname></persName>
							<email>chinmayasbhat4@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Forum for Information Retrieval Evaluation</orgName>
								<address>
									<addrLine>December 9-13</addrLine>
									<postCode>2022</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Study on Sentimental Analysis, Homophobia-Transphobia Detection for Dravidian Languages</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">01102AE67D81CBFE08F2FDD08048BF70</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-06-19T14:43+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Homophobia Detection</term>
					<term>Transphobia Detection</term>
					<term>Sentiment Analysis</term>
					<term>Social Media</term>
					<term>Dravidian Language</term>
					<term>MP-Net</term>
					<term>Classification</term>
					<term>Transformers</term>
					<term>LightGBM</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the internet becoming highly accessible to the general population, social media usage has increased tremendously across the Indian peninsula. Although this is advantageous, anti-social activity in the social media space has also increased. In particular, there has been a rise in hate speech, especially speech that falls within the spectrum of homophobia and transphobia. With growing concern about preventing such posts on social media, multiple efforts are under way across the world. To address this issue, we study two methods: fastText+LightGBM based classification for sentiment analysis, and MP-Net for homophobia-transphobia detection. For this study, we use the dataset provided by the shared task on Sentiment Analysis and Homophobia detection of YouTube comments in Code-Mixed Dravidian Languages. The proposed methodology for sentiment analysis achieves macro F1 scores of 0.19, 0.3 and 0.2 for Tamil, Kannada and Malayalam respectively; for homophobia-transphobia detection, the macro F1 scores are 0.234, 0.493, 0.942 and 0.316 for Tamil, English, Malayalam and Tamil-English respectively. The proposed solution outperforms the baselines for homophobia-transphobia detection.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With recent advances in technology and social media, hatred towards the LGBTQ+ community is also growing. Homophobia/transphobia refers to actions resulting in threat, dread, dislike, discomfort or mistrust of lesbian, gay, transgender or bisexual people <ref type="bibr" target="#b0">[1]</ref>. Social media provides a medium for communication, allowing users to express their views, ideas and feelings on anything at any time. Social media platforms also give users the power to share resources and materials that support their views <ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b2">[3]</ref>. The abundance of data available online enables researchers to use natural language processing to interpret, quantify and monitor user behavior, the propagation of information across different communities, and the events influenced by this online information <ref type="bibr" target="#b3">[4]</ref>.</p><p>The internet is home to a wide variety of racist, sexist, homophobic, transphobic and other unpleasant content. The increase in the quantity of such content has become a problem for online communities <ref type="bibr" target="#b4">[5]</ref>. The wide variety of data available on social media platforms such as YouTube, Facebook and others is ever changing and influences the way people think, talk and connect with each other. Social media platforms also provide an avenue into the darker side of the internet, where violent, sexist and homophobic content is created, liked, shared and supported <ref type="bibr" target="#b5">[6]</ref>.</p><p>With increasing content available on the internet, computer scientists, linguists and researchers have an opportunity to build and use automated solutions that can mitigate or ban harmful anti-LGBTQ+ content, and to try to make the internet a place of equality, diversity and inclusion. 
While much work has gone into aggression identification <ref type="bibr" target="#b6">[7]</ref>, misogyny <ref type="bibr" target="#b7">[8]</ref> <ref type="bibr" target="#b4">[5]</ref>, and racism <ref type="bibr" target="#b8">[9]</ref>, homophobic or transphobic verbal abuse has been treated as far less important than racist or other prohibited content. Recent advancements in the attention mechanism used in transformers are becoming very popular for low-resource Dravidian languages such as Tamil, Malayalam and Kannada, among others. The lack of a language corpus makes it difficult to train models without pre-trained embeddings, and transformers act as a solution here. BERT <ref type="bibr" target="#b9">[10]</ref> and XLNet <ref type="bibr" target="#b10">[11]</ref> are two highly popular models for text classification; both have drawbacks that MPNet is designed to overcome.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The vast availability of data on the internet has attracted many researchers and computer scientists to develop and research possible solutions to tackle hatred towards LGBTQ+ communities. One of the early studies on offensive comment identification in Dravidian languages (Tamil) <ref type="bibr" target="#b11">[12]</ref> <ref type="bibr" target="#b12">[13]</ref>, followed by DravidianLangTech <ref type="bibr" target="#b13">[14]</ref>, shed light on the possibilities for bringing equality and diversity to LGBTQ+ people, who are also ill-treated in these parts. The HASOCDravidianCodeMix dataset consisted of 4000 comments collected from Twitter and other social media platforms. Similarly, DravidianLangTech comprised 30 thousand YouTube comments, which were annotated by multiple volunteers. Both of these are code-mixed datasets. Based on these two datasets k <ref type="bibr" target="#b13">[14]</ref>. Inspired by these works, our previous work on a Dravidian code-mixed dataset (troll-meta) <ref type="bibr" target="#b14">[15]</ref> created a hybrid deep learning model which classified given images into one of 2 classes. The work focused on classifying data into offensive speech and neutral speech.</p><p>Ljubešić et al. <ref type="bibr" target="#b15">[16]</ref> constructed lexicons for several languages, including Croatian, Dutch and Slovene, and used these lexicons to identify texts containing socially unacceptable words on the topics of migrants and LGBTQ+. Although this is valuable work, it falls short of the end goal: being at an early stage of research, it lacks confidence in the classification task.</p><p>DravidianCodeMix, a recent work, proposes a multilingual model which tries to establish a baseline for further research <ref type="bibr" target="#b0">[1]</ref>. 
The corpora included comments collected from YouTube, belonging to three major Dravidian languages: Kannada, Tamil and Malayalam. The dataset has Kannada-English, Tamil-English and Malayalam-English subsets, which are annotated by human volunteers.</p><p>Our work focuses on sentiment analysis and classification of homophobic and transphobic comments collected from YouTube.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dataset</head><p>The dataset is provided as part of the shared task on "Sentiment Analysis and Homophobia detection of YouTube comments in Code-Mixed Dravidian Languages" <ref type="bibr" target="#b16">[17]</ref>. The dataset consists of annotated data for sentiment analysis and offensive language identification, with a total of more than 60 thousand individual comments on YouTube videos.</p><p>The dataset counts for individual languages and tasks are tabulated in Tables <ref type="table" target="#tab_1">1, 2</ref>, 3 and 4. The dev dataset for homophobia/transphobia detection contained 526 non-anti-LGBTQ+ content, 103 homophobic and 37 transphobic comments for Tamil; 532 non-anti-LGBTQ+ content, 58 homophobic and 2 transphobic comments for English; and 692, 133 and 41 comments with the non-anti-LGBTQ+ content, homophobic and transphobic labels respectively for Malayalam. The Tamil-English dev data consisted of 862 non-anti-LGBTQ+ content, 66 homophobic and 38 transphobic comments.</p></div>
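<div xmlns="http://www.tei-c.org/ns/1.0"><p>The label counts are strongly skewed towards the non-anti-LGBT+ class. As a rough illustration, the following Python sketch computes the class shares for the Tamil homophobia/transphobia training counts reported in this paper. The variable names are illustrative only and are not taken from the authors' code.</p>
<p>
```python
# Tamil training counts for homophobia/transphobia detection, as reported in this paper.
tamil_train = {
    "Non-anti-LGBT+ content": 2022,
    "Homophobic": 485,
    "Transphobic": 155,
}

total = sum(tamil_train.values())

# Share of each label among all Tamil training comments.
shares = {label: count / total for label, count in tamil_train.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```
</p><p>With roughly three quarters of the comments in one class, a metric that averages over classes equally, such as macro F1, is sensitive to how well the two minority classes are handled.</p></div>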
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Approach</head><p>The approach started with cleaning the data: removing special characters and converting the Kannada-English, Tamil-English and Malayalam-English text to lowercase.</p><p>Emojis, as the name suggests, express emotions in the form of graphics, images, pictograms or ideograms embedded in text. We consider them one of the major drivers in detecting emotions such as sarcasm, sadness and happiness. They play a major role in classifying emotions and analysing the sentiment of the given comments. We used the Python emoji library (https://pypi.org/project/emoji/) to convert the emojis to text.</p><p>For example:</p><p>Original: Idha pathutu road la students kathi vaichi kitu sanda poduvanunga 🤦</p><p>Translated: Idha pathutu road la students kathi vaichi kitu sanda poduvanunga Face Palm</p><p>The study involves 2 tasks, Sentimental Analysis and Homophobia-Transphobia Detection, which are referred to as Task A and Task B henceforth. Task A involved 3 languages, Tamil, Kannada and Malayalam, whereas Task B involved 4 languages, i.e. Tamil, English, Malayalam and Tamil-English (a combination of both, colloquially called Tanglish). For both tasks, two methods were tried. 
The first method built a text classifier using MP-Net <ref type="bibr" target="#b18">[19]</ref>. The second method generated word embeddings using fastText <ref type="bibr" target="#b19">[20]</ref>, applied dimension-wise averaging, and finally classified the result using LightGBM <ref type="bibr" target="#b20">[21]</ref>.</p><p>For LightGBM, num_leaves of 15, min_child_weight of 1e-1, subsample of 0.8 and a random state of 42 gave better results for the task of homophobia/transphobia detection.</p><p>Similarly, MPNet was trained with a learning rate of 2e-5, weight decay of 0.01 and batch size of 8 for the task of sentiment analysis.</p><p>For Sentimental Analysis, the MP-Net method was used. For Homophobia-Transphobia Detection, the fastText+LightGBM method was used for Tamil and Malayalam, whereas MP-Net was used for English and Tamil-English.</p></div>
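<div xmlns="http://www.tei-c.org/ns/1.0"><p>The dimension-wise averaging step of the second method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy embedding table and the function name are hypothetical, real fastText vectors would replace the table, and skipping unknown tokens is a simplification (fastText itself can build vectors for unseen words from subwords). The parameter dictionary repeats the LightGBM values reported above; passing it to lightgbm.LGBMClassifier is an assumption about how such values would typically be used.</p>
<p>
```python
from typing import Dict, List

# Hypothetical toy embedding table; in practice these vectors would come
# from a pre-trained fastText model.
TOY_VECTORS: Dict[str, List[float]] = {
    "road": [1.0, 0.0, 2.0],
    "students": [3.0, 4.0, 0.0],
    "face": [0.0, 2.0, 1.0],
}

# Values reported in the paper; assumed here to be keyword arguments
# for lightgbm.LGBMClassifier.
LIGHTGBM_PARAMS = {
    "num_leaves": 15,
    "min_child_weight": 1e-1,
    "subsample": 0.8,
    "random_state": 42,
}


def average_embedding(
    tokens: List[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the dimension-wise mean of the vectors of all known tokens."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        # No known token: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# "la" is not in the toy table, so only "road" and "students" are averaged.
comment_vector = average_embedding("road la students".split(), TOY_VECTORS, dim=3)
print(comment_vector)
```
</p><p>Each comment is thereby reduced to one fixed-length vector, which is the kind of input a gradient-boosted tree classifier such as LightGBM expects.</p></div>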
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>The research activity covered sentiment analysis as well as classifying comments as homophobic/transphobic or not. It involved two different machine learning models for the classification problem: part A is the decision-tree-based LightGBM <ref type="bibr" target="#b20">[21]</ref>, whereas part B is a hybrid model of masked language modeling and permuted language modeling <ref type="bibr" target="#b18">[19]</ref>.</p><p>To analyze the results of the model, a confusion matrix was constructed and the weighted F1 was calculated.</p><p>The results of the research activity are tabulated in Tables 5 and 6. </p></div>
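<div xmlns="http://www.tei-c.org/ns/1.0"><p>This section mentions weighted F1, while the abstract reports macro F1. The following Python sketch shows how both can be derived from a confusion matrix and how they can differ on imbalanced data. The confusion matrix is a hypothetical toy example, not a result from this study, and the function names are illustrative.</p>
<p>
```python
from typing import List


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """F1 per class from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true label differs
        fn = sum(cm[k]) - tp                       # true k, predicted label differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def weighted_f1(cm: List[List[int]]) -> float:
    """Mean of the per-class F1 scores, weighted by the number of true samples per class."""
    scores = per_class_f1(cm)
    supports = [sum(row) for row in cm]
    return sum(f * s for f, s in zip(scores, supports)) / sum(supports)


# Hypothetical toy matrix: one large class and two small ones.
toy_cm = [
    [8, 2, 0],
    [1, 1, 0],
    [1, 0, 0],
]

print("macro F1:", round(macro_f1(toy_cm), 4))
print("weighted F1:", round(weighted_f1(toy_cm), 4))
```
</p><p>In this toy case the weighted score is noticeably higher than the macro score, because the majority class dominates the weighted average while the macro average counts the poorly predicted minority classes equally.</p></div>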
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Error Analysis</head><p>The research activity shed light on the setbacks faced during the training and evaluation steps. One important reason for the models' performance is the lack of data for Dravidian languages compared to others, as can be seen in the results for homophobia/transphobia classification. There, the rank in the English task is 1, because MPNet is trained on a larger English-language corpus. That is why we explored fastText as an alternative. The other models in the same task were trained with fastText+LightGBM; even though fastText was also trained on Tamil/Malayalam corpora, the results were poor because of the differences between the colloquial language of the comments and the formal language on which the models were trained. Tamil-English ranked poorly compared to the others, likely because pre-trained models seldom come across code-switching and thus fail to provide good embedding representations. Dataset size is another aspect through which we analyze the results. The dataset available for Tamil is larger than the others, due to which performance is better, as can be clearly seen in the results. Malayalam, which had the lowest precision, had the least data after Kannada. Even with embeddings from transformers, the quantity of data was not enough for better generalization.</p><p>In terms of improvement, pre-training or fine-tuning the aforementioned models on the available dataset is expected to significantly increase the quality of predictions. We will also have to explore other methodologies to handle low-resource constraints and strive to achieve the best results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>We experimented with both fastText combined with LightGBM and MPNet, which provided some improvements over the baseline models. Even with considerable improvements, the models showed some shortcomings <ref type="bibr" target="#b21">[22]</ref> <ref type="bibr" target="#b17">[18]</ref>.</p><p>In future work, we are considering building custom transformers and enhanced architectures to obtain better results.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Architecture of the proposed system</figDesc><graphic coords="5,89.29,84.19,416.70,168.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Sentiment Analysis - Train Data. For sentiment analysis, the Tamil dataset contained 20069 positive, 4271 negative and 4020 mixed feeling comments; the Kannada dataset contained 2823 positive, 1188 negative and 574 mixed feeling comments; and the Malayalam dataset consisted of 6421 positive, 2105 negative and 926 mixed feeling comments. All of these were labeled by volunteers, as mentioned in <ref type="bibr" target="#b17">[18]</ref>.</figDesc><table><row><cell></cell><cell>Tamil</cell><cell>Kannada</cell><cell>Malayalam</cell></row><row><cell>Positive</cell><cell>20069</cell><cell>2823</cell><cell>6421</cell></row><row><cell>Negative</cell><cell>4271</cell><cell>1188</cell><cell>2105</cell></row><row><cell>Mixed Feeling</cell><cell>4020</cell><cell>574</cell><cell>926</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Sentiment Analysis - Dev Data. The development dataset contained 2257 positive, 480 negative and 438 mixed feeling comments for Tamil; 321 positive, 139 negative and 52 mixed feeling comments for Kannada; and 706 positive, 237 negative and 102 mixed feeling comments for Malayalam.</figDesc><table><row><cell></cell><cell>Tamil</cell><cell>Kannada</cell><cell>Malayalam</cell></row><row><cell>Positive</cell><cell>2257</cell><cell>321</cell><cell>706</cell></row><row><cell>Negative</cell><cell>480</cell><cell>139</cell><cell>237</cell></row><row><cell>Mixed Feeling</cell><cell>438</cell><cell>52</cell><cell>102</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Homophobia/Transphobia Analysis - Train Data. The homophobia/transphobia detection training dataset contained 2022 non-anti-LGBTQ+ content, 485 homophobic and 155 transphobic comments for Tamil; 3002 non-anti-LGBTQ+ content, 157 homophobic and 6 transphobic comments for English; 2434 non-anti-LGBTQ+ content, 491 homophobic and 189 transphobic comments for Malayalam; and 3438 non-anti-LGBTQ+ content, 311 homophobic and 112 transphobic comments for Tamil-English.</figDesc><table><row><cell></cell><cell>Tamil</cell><cell>English</cell><cell>Malayalam</cell><cell>Tamil-English</cell></row><row><cell>Non-anti-LGBT+ content</cell><cell>2022</cell><cell>3001</cell><cell>2434</cell><cell>3438</cell></row><row><cell>Homophobic</cell><cell>485</cell><cell>157</cell><cell>491</cell><cell>311</cell></row><row><cell>Transphobic</cell><cell>155</cell><cell>6</cell><cell>189</cell><cell>112</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 4</head><label>4</label><figDesc>Homophobia/Transphobia Analysis - Dev Data</figDesc><table><row><cell></cell><cell>Tamil</cell><cell>English</cell><cell>Malayalam</cell><cell>Tamil-English</cell></row><row><cell>Non-anti-LGBT+ content</cell><cell>526</cell><cell>732</cell><cell>692</cell><cell>862</cell></row><row><cell>Homophobic</cell><cell>103</cell><cell>58</cell><cell>133</cell><cell>66</cell></row><row><cell>Transphobic</cell><cell>37</cell><cell>2</cell><cell>41</cell><cell>38</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 6</head><label>6</label><figDesc>Results for Homophobia/Transphobia Detection</figDesc><table><row><cell>1-9</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sampath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Thenmozhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thangasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nallathambi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2109.00227</idno>
		<ptr target="https://arxiv.org/abs/2109.00227" />
		<title level="m">Dataset for identification of homophobia and transophobia in multilingual youtube comments</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">A benchmark dataset for learning to intervene in online hate speech</title>
		<author>
			<persName><forename type="first">J</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bethke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Belding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.1909.04251</idno>
		<ptr target="https://arxiv.org/abs/1909.04251" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The language of mental health problems in social media</title>
		<author>
			<persName><forename type="first">G</forename><surname>Gkotsis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oellrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hubbard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dobson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liakata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Velupillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dutta</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W16-0307</idno>
		<ptr target="https://aclanthology.org/W16-0307" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, Association for Computational Linguistics</title>
				<meeting>the Third Workshop on Computational Linguistics and Clinical Psychology, Association for Computational Linguistics<address><addrLine>San Diego, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="63" to="73" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Incorporating textual information on user behavior for personality prediction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Yamada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sasano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Takeda</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-2024</idno>
		<ptr target="https://aclanthology.org/P19-2024" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="177" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Developing a multilingual annotated corpus of misogyny and aggression</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bhattacharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhagat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dawer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lahiri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2003.07428</idno>
		<ptr target="https://arxiv.org/abs/2003.07428" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Let-mi: An arabic levantine twitter dataset for misogynistic language</title>
		<author>
			<persName><forename type="first">H</forename><surname>Mulki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ghanem</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2103.10195</idno>
		<ptr target="https://arxiv.org/abs/2103.10195" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Aggression identification using deep learning and data augmentation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Risch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krestel</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/W18-4418" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)</title>
				<meeting>the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="150" to="158" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Profiling Italian misogynist: An empirical study</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Boifava</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.restup-1.3" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language, European Language Resources Association (ELRA)</title>
				<meeting>the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language, European Language Resources Association (ELRA)<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Are you a racist or am I seeing things? annotator influence on hate speech detection on Twitter</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Waseem</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W16-5618</idno>
		<ptr target="https://aclanthology.org/W16-5618" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on NLP and Computational Social Science, Association for Computational Linguistics</title>
				<meeting>the First Workshop on NLP and Computational Social Science, Association for Computational Linguistics<address><addrLine>Austin, Texas</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="138" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.1810.04805</idno>
		<ptr target="https://arxiv.org/abs/1810.04805" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Xlnet: Generalized autoregressive pretraining for language understanding</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.1906.08237</idno>
		<ptr target="https://arxiv.org/abs/1906.08237" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.peoples-1.5" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotion&apos;s in Social Media, Association for Computational Linguistics</title>
				<meeting>the Third Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotion&apos;s in Social Media, Association for Computational Linguistics<address><addrLine>Barcelona, Spain (Online)</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="41" to="53" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<idno type="DOI">10.1145/3441501.3441517</idno>
		<ptr target="https://doi.org/10.1145/3441501.3441517" />
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation, FIRE 2020</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Findings of the shared task on offensive language identification in Tamil, Malayalam, and Kannada</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>McCrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.dravidianlangtech-1.17" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="133" to="145" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">TrollMeta@DravidianLangTech-EACL2021: Meme classification using deep learning</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B J</forename></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hs</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.dravidianlangtech-1.39" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="277" to="280" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The LiLaH emotion lexicon of Croatian, Dutch and Slovene</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ljubešić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fišer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.peoples-1.15" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotion&apos;s in Social Media, Association for Computational Linguistics</title>
				<meeting>the Third Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotion&apos;s in Social Media, Association for Computational Linguistics<address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="153" to="157" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Overview of the Shared Task on Sentiment Analysis and Homophobia Detection of YouTube Comments in Code-Mixed Dravidian Languages</title>
		<author>
			<persName><forename type="first">K</forename><surname>Shanmugavadivel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">B</forename></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chinnaudayar Navaneethakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">S K</forename></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Palanikumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation</title>
				<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sampath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Thenmozhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thangasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nallathambi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>McCrae</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2109.00227</idno>
		<title level="m">Dataset for identification of homophobia and transophobia in multilingual youtube comments</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">MPNet: Masked and permuted pre-training for language understanding</title>
		<author>
			<persName><forename type="first">K</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2004.09297</idno>
		<ptr target="https://arxiv.org/abs/2004.09297" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Enriching word vectors with subword information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.1607.04606</idno>
		<ptr target="https://arxiv.org/abs/1607.04606" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">LightGBM: A highly efficient gradient boosting decision tree</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Liu</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><forename type="middle">V</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">30</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Overview of the shared task on homophobia and transphobia detection in social media comments</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Durairaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>McCrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Buitelaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.ltedi-1.57</idno>
		<ptr target="https://aclanthology.org/2022.ltedi-1.57" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics</title>
				<meeting>the Second Workshop on Language Technology for Equality, Diversity and Inclusion, Association for Computational Linguistics<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="369" to="377" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
