<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Sentiment Analysis in Dravidian Code-Mixed YouTube Comments and Posts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sanjeepan</forename><surname>Sivapiran</surname></persName>
							<email>sanjeepan.18@cse.mrt.ac.lk</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science and Engineering</orgName>
								<orgName type="institution">University of Moratuwa</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Charangan</forename><surname>Vasantharajan</surname></persName>
							<email>charangan.18@cse.mrt.ac.lk</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science and Engineering</orgName>
								<orgName type="institution">University of Moratuwa</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Uthayasanker</forename><surname>Thayasivam</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science and Engineering</orgName>
								<orgName type="institution">University of Moratuwa</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Sentiment Analysis in Dravidian Code-Mixed YouTube Comments and Posts</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8DB963E391367F542C0EC609C54570E2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sentiment Analysis</term>
					<term>Code-Mixed</term>
					<term>Transformers</term>
					<term>Tamil</term>
					<term>ULMFiT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the methodologies implemented for sentiment analysis of Dravidian code-mixed comments and posts collected from social media. On a dataset of code-mixed Tamil, we experimented with transformer-based models, namely multilingual BERT and DistilBERT, as well as ULMFiT. This work was submitted to the track 'Sentiment Analysis for Dravidian Languages in Code-Mixed Text' organized by the Forum for Information Retrieval Evaluation. Although it received seventh rank for the Tamil task in the benchmark, this paper improves upon that result to attain a final weighted F1 score of 0.61 for the Tamil task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the past few years, the usage of social media platforms has increased drastically. With this trend, cyberbullying and hate speech have also increased, creating a need to analyze comments and posts on social media. Sentiment analysis is a study that uses Natural Language Processing to identify subjective opinions or emotional responses about a given topic <ref type="bibr" target="#b0">[1]</ref>. Multiple steps have already been taken to apply sentiment analysis to monolingual texts, but there has been an indispensable demand for sentiment analysis in code-mixed Dravidian languages (Tamil, Malayalam, and Kannada) <ref type="bibr" target="#b1">[2]</ref>. Code-mixing is a prevalent phenomenon in multilingual communities, and code-mixed texts are sometimes written in non-native scripts <ref type="bibr" target="#b2">[3]</ref>. Systems trained on monolingual data fail on code-mixed data due to the complexity of code-switching at different linguistic levels in the text.</p><p>The objective of our study is to classify code-mixed YouTube comments as positive, negative, neutral, mixed emotions, or not in Tamil <ref type="bibr" target="#b3">[4]</ref>. For this task, transformer-architecture models like multilingual BERT and DistilBERT yielded good results, since they are optimized for low-resourced languages like Tamil; yet ULMFiT produced the best results compared to the transformer models. Since the data was in code-mixed form, models had difficulty understanding semantic relationships and their respective contexts. To overcome this issue, we used translation and transliteration techniques, which convey a word from one writing system to another while preserving context and semantics.</p><p>The rest of the paper is organized as follows. Section 2 reviews related work in sentiment analysis. Section 3 describes the dataset provided in the shared task <ref type="bibr" target="#b4">[5]</ref>. Section 4 presents the system description, the experiments conducted with different approaches and features, and the results of our proposed system. Benchmark results are discussed in Section 4.5, followed by the conclusion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Cyberbullying and hateful speech are unpleasant parts of social media. To ensure the well-being of social media users, social media companies have always had to invest in sentiment analysis research, and as a result an adequate number of studies has already been conducted. Historically, there have been two approaches to sentiment analysis: lexicon-based and machine learning approaches <ref type="bibr" target="#b5">[6]</ref>. Even though these produce results of moderate quality, they fail on human-generated data. Consequently, deep learning models such as the Bidirectional Recurrent Neural Network (RNN) <ref type="bibr" target="#b6">[7]</ref> and the Long Short-Term Memory (LSTM) network <ref type="bibr" target="#b7">[8]</ref> were introduced. On the other hand, <ref type="bibr" target="#b8">[9]</ref> conducted experiments on Kannada-English using traditional learning approaches such as Logistic Regression (LR), Support Vector Machines (SVM), Multinomial Naive Bayes, K-Nearest Neighbors (KNN), Decision Trees (DT), and Random Forests (RF).</p><p>To address the sentiment analysis problem with the above techniques, a corpus is needed; however, social-media comments and posts do not follow strict grammar rules and are often written in non-native scripts as well as in code-mixed form <ref type="bibr" target="#b9">[10]</ref>. To overcome this situation, <ref type="bibr" target="#b10">[11]</ref> created a gold-standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comments and posts from YouTube. Moreover, Chakravarthi et al. <ref type="bibr" target="#b11">[12]</ref> created a standard corpus for Malayalam-English to advance sentiment analysis on code-mixed content.</p><p>[13] explored Tamil-English and Kannada-English, and <ref type="bibr" target="#b13">[14]</ref> Malayalam-English, using the transformer-based model mBERT. The model performed well but failed on some code-mixed text <ref type="bibr" target="#b14">[15]</ref>. As an extension of this work, <ref type="bibr" target="#b15">[16]</ref> conducted experiments on different kinds of models such as Bidirectional LSTM, mBERT, DistilBERT, and ULMFiT <ref type="bibr" target="#b16">[17]</ref> to overcome this issue. Moreover, they developed a standard translation and transliteration algorithm to convert the corpus into a monolingual one, and with this approach they were able to improve their system's performance.</p><p>Over the past decade, many kinds of models have been introduced, but in contrast to conventional Recurrent Neural Network (RNN) models, the efficiency and performance of transformer models such as BERT <ref type="bibr" target="#b17">[18]</ref>, DistilBERT <ref type="bibr" target="#b18">[19]</ref>, and mBERT <ref type="bibr" target="#b19">[20]</ref> are remarkably distinguished. BERT <ref type="bibr" target="#b20">[21]</ref> models are designed to contextualize text by jointly conditioning on both left and right contexts. As a result, transformer models can produce state-of-the-art results by fine-tuning just the output layer. After studying the above research, we decided to go with transformer models and ULMFiT.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dataset</head><p>The Tamil-English dataset is provided by the Dravidian-CodeMix-FIRE 2021 organizing committee and was extracted from Tamil YouTube comments and posts. It contains three parts (Train, Validation, Test); the training, validation, and test sets have 35,656, 3,962, and 4,392 comments, respectively, with annotated labels. The dataset contains three kinds of code-mixed sentences: inter-sentential switches, intra-sentential switches, and tag switching. Comments were written either in native Tamil script or in Tamil with English grammar; some comments were written in Tamil script with English words in between.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">System Description and Result Analysis</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Preprocessing</head><p>The dataset collected from YouTube does not follow any grammar rules and is in code-mixed form, so it undergoes the following steps so that it can be used efficiently.</p><p>• The first step is stemming and lemmatization of the words, and lower-casing only the romanized words, as there is no notion of case in Tamil script. • The next step is to remove all emojis, special characters, numbers, and punctuation, as they do not carry any meaning in the sentence. • Finally, we applied the algorithm introduced by <ref type="bibr" target="#b15">[16]</ref> to perform translation and transliteration on the comments and posts to create a monolingual corpus.</p></div>
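The preprocessing steps above can be sketched as follows. This is an illustrative sketch, not the authors' exact code: the stemming/lemmatization step is omitted, and emojis, numbers, and punctuation are dropped with a Unicode-category filter while Tamil letters and their combining vowel signs are kept.

```python
import re
import unicodedata

def preprocess(comment):
    # Lower-case only romanized (purely Latin) tokens; Tamil script has no case.
    tokens = []
    for tok in comment.split():
        if re.fullmatch(r"[A-Za-z]+", tok):
            tokens.append(tok.lower())
        else:
            tokens.append(tok)
    text = " ".join(tokens)
    # Keep letters (category L*) and combining marks (M*, needed for Tamil
    # vowel signs and the virama) plus spaces; drop emojis, digits, symbols.
    kept = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat.startswith(("L", "M")) or ch == " ":
            kept.append(ch)
    return re.sub(r"\s+", " ", "".join(kept)).strip()
```

For example, preprocess("Vera LEVEL trailer!! 100%") yields "vera level trailer", while native-script text such as "படம்" passes through intact.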
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Translation</head><p>After loading the dataset, we used an extensive corpus of English words from the NLTK corpus<ref type="foot" target="#foot_0">1</ref> to detect English words in a sentence: if a word is in the English dictionary, we translated it into native Tamil script; otherwise, we left the word untouched. For this purpose, we used the Google Translate API<ref type="foot" target="#foot_1">2</ref>.</p></div>
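The word-level routine described above can be sketched as follows. In the paper the lexicon is the NLTK words corpus and the translator is the Google Translate API; here a tiny stand-in lexicon and a caller-supplied translate function (hypothetical names) keep the sketch self-contained and offline.

```python
def translate_mixed_sentence(sentence, english_lexicon, translate_to_tamil):
    """Translate dictionary English words into Tamil script; leave every
    other token (romanized Tamil, native script) untouched.

    english_lexicon: a set of known English words (nltk.corpus.words in
    the paper). translate_to_tamil: stands in for a Google Translate
    API call and is a hypothetical parameter of this sketch."""
    out = []
    for word in sentence.split():
        if word.lower() in english_lexicon:
            out.append(translate_to_tamil(word))
        else:
            out.append(word)
    return " ".join(out)
```

Only dictionary hits are sent to the translator, so romanized Tamil words are left for the transliteration step described next.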
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Transliteration</head><p>Most of the comments are in code-mixed form, but comments should be in the native script to get state-of-the-art results from transformer models. Transliteration is the process of transferring a word from the alphabet of one language to another. All non-native Tamil words were converted into their Tamil-script equivalents using transliteration. To achieve this, we used AI4Bharat Transliteration<ref type="foot" target="#foot_2">3</ref>.</p></div>
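Which tokens get transliterated can be decided with a simple script check: anything already in the Tamil Unicode block stays as-is, dictionary English is handled by the translation step, and the remaining romanized tokens are assumed to be Tamil and sent to the transliterator. A minimal sketch under those assumptions (the xlit callable stands in for the ai4bharat-transliteration engine, whose exact API is not reproduced here):

```python
def is_tamil_script(token):
    # The Tamil Unicode block spans U+0B80..U+0BFF.
    return all(ord(ch) in range(0x0B80, 0x0C00) for ch in token)

def transliterate_sentence(sentence, english_lexicon, xlit):
    out = []
    for tok in sentence.split():
        if is_tamil_script(tok) or tok.lower() in english_lexicon:
            out.append(tok)  # already native script, or English (translated earlier)
        else:
            out.append(xlit(tok))  # assumed romanized Tamil: convert to native script
    return " ".join(out)
```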
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Models</head><p>Recently released transformer models such as BERT achieve state-of-the-art results in text classification tasks. Considering the performance of transformer models, we chose to start with multilingual BERT and DistilBERT. All of our transformer-based models are taken from the HuggingFace<ref type="foot" target="#foot_3">4</ref> transformers library, and the models' parameters are as stated in Table <ref type="table">3</ref>. Figure <ref type="figure" target="#fig_1">2</ref> depicts the architecture of our best-performing model (ULMFiT). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Common parameters for the models that we used during our experiments.</p><p>DistilBERT is a small, fast, and light transformer-based model trained on the Wikipedia dataset. It has 40% fewer parameters than BERT and runs 60% faster while preserving over 95% of BERT's performance. Since our purpose is to train a model in Tamil (a non-Latin script), we selected the distilbert-base-multilingual-cased model, which has six layers, 768 dimensions, 12 heads, and 134M parameters.</p><p>We also experimented with bert-base-multilingual-cased as our pre-trained multilingual model, which has approximately 110M parameters, 12 layers, and 768 hidden states.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Results and Analysis</head><p>Teams were ranked by the weighted average F1 score of their model, and we received 7th rank. Even though our model received this rank, the F1-score difference from the first-placed team is relatively small.</p><p>In the beginning, we started with our BERT model, and it did not perform well; this may be due to the limited amount of Tamil in the multilingual BERT base model's training data. In the next step, we approached the problem with the ULMFiT model, a transfer learning technique <ref type="bibr" target="#b21">[22]</ref>. ULMFiT's model architecture is different from transformer models, and it is an effective transfer learning method that can be applied to any task in NLP. ULMFiT yielded an F1-score of 0.6101, while DistilBERT and mBERT yielded 0.6014 and 0.5963, respectively. The precision and recall of the above models are shown in Table <ref type="table">4</ref>.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Precision, recall, and weighted F1-scores of the models on the test dataset.</figDesc><table><row><cell>Model</cell><cell>Precision</cell><cell>Recall</cell><cell>F1-Score</cell></row><row><cell>ULMFiT</cell><cell>0.6075</cell><cell>0.6045</cell><cell>0.6101</cell></row><row><cell>DistilBERT</cell><cell>0.5978</cell><cell>0.5984</cell><cell>0.6014</cell></row><row><cell>mBERT</cell><cell>0.5782</cell><cell>0.5627</cell><cell>0.5963</cell></row></table></figure>
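The ranking metric can be made concrete with a small pure-Python sketch of the support-weighted F1 score (equivalent to scikit-learn's f1_score with average='weighted'):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share
    of the gold labels. Classes absent from the predictions simply
    contribute an F1 of zero."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += (count / len(y_true)) * f1
    return total
```

Because the weights follow the gold-label distribution, the dominant positive class drives this score on the highly imbalanced test set.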
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this research, we have analyzed different NLP techniques to classify the sentiment of Tamil code-mixed YouTube comments <ref type="bibr" target="#b4">[5]</ref>. We used transliteration, a technique that improves accuracy across all three models. We also experimented with transformer models and a transfer learning technique (ULMFiT). Even though transformer models are more advanced, ULMFiT yields the best results for our task. Since Tamil is a low-resourced language <ref type="bibr" target="#b22">[23]</ref>, our approach can also be applied to other low-resourced languages without much difficulty.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Class distribution on the training set. The dataset is highly imbalanced: the number of positive comments/posts is much higher than in the other classes.</figDesc><graphic coords="4,192.22,212.73,208.34,136.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: ULMFiT model's architecture. To recreate this image, we used a source image from <ref type="bibr" target="#b7">[8]</ref>. This method involves fine-tuning a pre-trained language model (AWD-LSTM) on a new dataset in such a manner that it does not forget what it previously learned. After unfreezing all the layers, we trained for more epochs to tune the whole network rather than just the last few layers.</figDesc><graphic coords="6,89.29,84.19,416.69,239.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Dataset samples for each sentiment class. (The dataset has three parts: Train, Validation, Test; they contain 35,656, 3,962, and 4,392 annotated comments, respectively.)</figDesc><table><row><cell>Text</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>describes the dataset statistics, which are visualized in Figure 1. The following items show the five classes of comments with a definition:</figDesc><table /><note>• Positive: Comments which are not offensive e.g: ennaya trailer Ku mudi Ellam nikkudhu... Vera level trailer.. • Negative: Comments which are offensive e.g: எந் ெதந் த youtube channel காரங் க எல் லாம் இைத ஜாதி ெவறி படம் குறாங் கேளாேளா அவங் ெகல் லாம் அந் த ஜாதி என் றறிக • Mixed Feelings: Comments which are both negative and positive e.g:Kaagam karaindhu koodi unnum, Manidham ennum moodar koodam koodi serdhu pagaimai kollum... Idil yaar uyarthinai yaar agrinai • Unknown State: Comments which are not determined e.g:Vandha raja vah dhaan varuven Vera level str • Not in Tamil: Comments which are not in native Tamil e.g:Subtitle me traller dekhne wale like karo</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Number of comments for each class in the train, validation, and test sets.</figDesc><table><row><cell>Label</cell><cell cols="3">Train Dev Test</cell></row><row><cell>positive</cell><cell cols="3">20070 2257 3190</cell></row><row><cell>negative</cell><cell>4271</cell><cell>480</cell><cell>315</cell></row><row><cell cols="2">unknown_state 5628</cell><cell>611</cell><cell>288</cell></row><row><cell>mixed-feelings</cell><cell>4020</cell><cell>438</cell><cell>71</cell></row><row><cell>Not-Tamil</cell><cell>1667</cell><cell>176</cell><cell>160</cell></row><row><cell>Total</cell><cell cols="3">35656 3962 4392</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.nltk.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://pypi.org/project/googletrans/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://pypi.org/project/ai4bharat-transliteration/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/huggingface/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Findings of the shared task on offensive language identification in Tamil, Malayalam, and Kannada</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.dravidianlangtech-1.17" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="133" to="145" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Comparison of pretrained embeddings to identify hate speech in indian code-mixed text</title>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICACCCN51052.2020.9362731</idno>
	</analytic>
	<monogr>
		<title level="m">2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="21" to="25" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of the hasoc track at fire 2020: Hate speech and offensive language identification in tamil, malayalam, hindi, english and german</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Findings of the shared task on troll meme classification in Tamil</title>
		<author>
			<persName><forename type="first">S</forename><surname>Suryawanshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.dravidianlangtech-1.16" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="126" to="132" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chinnappa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Durairaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vasantharajan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2021 -Forum for Information Retrieval Evaluation</title>
				<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2021">2021. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Sentiment analysis using deep learning approaches: an overview</title>
		<author>
			<persName><forename type="first">O</forename><surname>Habimana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">X</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science China Information Sciences</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Bidirectional recurrent neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Paliwal</surname></persName>
		</author>
		<idno type="DOI">10.1109/78.650093</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Signal Processing</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="2673" to="2681" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Fundamentals of recurrent neural network (rnn) and long short-term memory (lstm) network</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sherstinsky</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.physd.2019.132306</idno>
		<ptr target="http://dx.doi.org/10.1016/j.physd.2019.132306.doi:10.1016/j.physd.2019.132306" />
	</analytic>
	<monogr>
		<title level="j">Physica D: Nonlinear Phenomena</title>
		<imprint>
			<biblScope unit="volume">404</biblScope>
			<biblScope unit="page">132306</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">KanCMD: Kannada CodeMixed dataset for sentiment analysis and offensive language detection</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.peoples-1.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotion&apos;s in Social Media, Association for Computational Linguistics</title>
				<meeting>the Third Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotion&apos;s in Social Media, Association for Computational Linguistics<address><addrLine>Barcelona, Spain (Online</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="54" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jayapal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<title level="m">Nuig-shubhanker@dravidian-codemix-fire2020: Sentiment analysis of code-mixed dravidian text using xlnet</title>
				<imprint>
			<publisher>FIRE</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Corpus creation for sentiment analysis in code-mixed Tamil-English text</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Muralidaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.sltu-1.28" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources association</title>
				<meeting>the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources association<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="202" to="210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A sentiment analysis dataset for code-mixed Malayalam-English</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Suryawanshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.sltu-1.25" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)</title>
		<meeting>the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="177" to="184" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube comments and posts</title>
		<author>
			<persName><forename type="first">C</forename><surname>Vasantharajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Thayasivam</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/2021.dravidianlangtech-1.26" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</title>
		<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="195" to="202" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sakuntharaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Madasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">B</forename></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chinnaudayar Navaneethakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>McCrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</title>
		<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Overview of the DravidianCodeMix 2021 shared task on sentiment detection in Tamil, Malayalam, and Kannada</title>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chinnappa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Durairaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation, FIRE 2021</title>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Towards offensive language identification for Tamil code-mixed YouTube comments and posts</title>
		<author>
			<persName><forename type="first">C</forename><surname>Vasantharajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Thayasivam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Universal language model fine-tuning for text classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Howard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">How multilingual is multilingual BERT?</title>
		<author>
			<persName><forename type="first">T</forename><surname>Pires</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schlinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garrette</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A comprehensive survey on transfer learning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Xi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">arXiv</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Overview of the track on sentiment analysis for dravidian languages in code-mixed text</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Muralidaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Suryawanshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>McCrae</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="21" to="24" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
