<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Hate Speech and Offensive Language Identification on Multilingual code-mixed Text using BERT</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Snehaan</forename><surname>Bhawal</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Kalinga Institute of Industrial Technology</orgName>
								<address>
									<settlement>Odisha</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pradeep</forename><surname>Kumar</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Indian Institute of Information Technology Surat</orgName>
								<address>
									<region>Gujarat</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Abhinav</forename><surname>Kumar</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Siksha &apos;O&apos; Anusandhan</orgName>
								<orgName type="institution">Deemed to be University</orgName>
								<address>
									<settlement>Bhubaneswar</settlement>
									<region>Odisha</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Hate Speech and Offensive Language Identification on Multilingual code-mixed Text using BERT</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">D1D0129B586869DBCF5A307DD18DA577</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Multilingual Text</term>
					<term>Hate Speech</term>
					<term>Deep Learning</term>
					<term>Machine Learning</term>
					<term>BERT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Hate Speech and Offensive Content detection in social media has been an active field of research for the last couple of years. For the majority of the world consisting of non-native English speakers, most of the time unofficial messages are written in code-mixed language in a combination of words in a native language with English text. The current study focuses on using Machine and Deep learning techniques for detection of Hate Speech and Offensive content in a Malayalam and Tamil code-mixed text collected from social media. The study showed that Deep learning models perform better than the machine learning models, specifically the implementation of BERT based transfer learning models performed best.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Hate Speech is generally defined as content that expresses hate or prejudice against a particular group, ethnicity, religion, nationality or sexual orientation <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3]</ref>. Social network platforms consist of a large amount of user-generated content, and because much of it is unmoderated, there is widespread use of targeted hate speech against certain individuals, which has become a critical issue <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>Humans cannot continuously moderate social media networks to read, identify, and deal with the hateful text that these platforms generate at such high frequency, affecting users mentally. Thus, there is a need for automation, and it has already been established that automated detection of such content is successful to a certain extent. Davidson et al. <ref type="bibr" target="#b5">[6]</ref> used Logistic Regression with n-gram TF-IDF features to classify Offensive and Non-Offensive text. In another paper, a neural network-based approach was presented by Badjatiya et al. <ref type="bibr" target="#b6">[7]</ref>, who used GloVe embeddings with CNNs and LSTMs to provide better results.</p><p>However, most of the research on hate speech and offensive language detection is predominantly for the English language <ref type="bibr" target="#b0">[1]</ref>. In a country like India, home to numerous regional languages, people have adapted to using a mix of regional and English languages to express themselves on social media <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. The current research is done on bilingual texts that contain words from both languages written in one script, called code-mixed text. Another way of combining words, in which native-script writing is mixed with English text, is known as script-mixed text. Such texts are far more challenging to work with, as they require a different tokenization process from the one needed for English text. Examples of popular code-mixed languages in India are Hinglish (Hindi and English), Tanglish (Tamil and English), Manglish (Malayalam and English), and a mixed language of Kannada and English <ref type="bibr" target="#b9">[10]</ref>.</p><note place="foot">FIRE 2021, Forum for Information Retrieval Evaluation, December 13-17, 2021. snehaan@gmail.com (S. Bhawal); pradeep.roy@iiitsurat.ac.in (P. K. Roy); abhinavkumar@soa.ac.in (A. Kumar). ORCID: 0000-0002-1072-5326 (S. Bhawal); 0000-0001-5513-2834 (P. K. Roy); 0000-0001-9367-7069 (A. Kumar).</note><p>Identifying Hate Speech in such code-mixed languages is much more challenging than in English <ref type="bibr" target="#b10">[11]</ref> due to the absence of sufficient NLP resources. Models trained on a monolingual corpus may find it difficult to provide satisfactory results, because the system learns and recognizes only the words present in the vocabulary seen during training. In the case of code-mixed text, many new words are introduced that are not present in the training vocabulary. Such words are marked as out-of-vocabulary tokens that contribute nothing to the model's estimation, and the performance of the model therefore decreases.</p><p>The current study focuses on Offensive language identification in the code-mixed languages Tanglish and Manglish, using the data set provided in the HASOC-Dravidian-CodeMix-FIRE2021 challenge. 
An overview of the dataset can be found in <ref type="bibr" target="#b11">[12]</ref>. We have implemented a number of machine learning and deep learning models, including transfer learning models such as BERT, to distinguish between Offensive and Non-Offensive text.</p><p>The rest of the article is organized as follows: Section 2 discusses the related works. Sections 3, 3.1, and 4 provide the task description and the pre-processing steps taken, followed by an explanation of the proposed methodology. The experimental results and discussion are presented in Sections 5 and 6, respectively. Section 7 concludes the work by highlighting the limitations and future scope.</p></div>
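The out-of-vocabulary effect described in the Introduction can be illustrated with a minimal sketch. The vocabulary and the Tamil-English example sentence below are purely hypothetical, not taken from the paper's data:

```python
# A tokenizer built over an English-only training vocabulary maps every
# unseen code-mixed word to a single <unk> id, so those words carry no
# signal for the downstream classifier.
vocab = {"<unk>": 0, "this": 1, "movie": 2, "is": 3, "great": 4}

def encode(sentence):
    """Map each whitespace token to its vocabulary id, or <unk> if absent."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]

encode("this movie is great")      # [1, 2, 3, 4] -- fully in-vocabulary
encode("intha movie romba nalla")  # [0, 2, 0, 0] -- Tamil words collapse to <unk>
```

Monolingual models therefore see code-mixed sentences as mostly unknown tokens, which is the performance degradation the study sets out to address.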
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Review</head><p>The use of Hate speech and Offensive language has become one of the major issues concerning social networking platforms and has hence received considerable attention from researchers worldwide <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9]</ref>. Roy et al. <ref type="bibr" target="#b0">[1]</ref> developed a deep learning-based framework to address the hate speech issue on Twitter. They used a Convolutional Neural Network to process tweets and predict whether they were Hate or non-Hate. They considered only tweets written in the English language, and their framework is hence unable to handle multilingual texts such as Tamil-English, Kannada-English and others. Badjatiya et al. <ref type="bibr" target="#b6">[7]</ref> developed a deep learning model to classify tweets into racist, sexist or neither categories; their model was evaluated on 16k labelled samples and outperformed existing models. The main issue with the existing works is language coverage: most existing research uses English datasets. However, people currently prefer to post messages on social platforms in code-mixed languages such as Hindi-English, Tamil-English and others.</p><p>Recent work by Kumar et al. <ref type="bibr" target="#b3">[4]</ref> proposed a deep learning-based framework to classify Tamil and Malayalam code-mixed YouTube comments into offensive and non-offensive categories. Many machine and deep learning models were experimented with, and the best result was obtained when character n-gram TF-IDF features were passed to a dense neural network; their model achieved a weighted F1-score of 0.95. Suryawanshi et al. <ref type="bibr" target="#b12">[13]</ref> developed resources for Tamil meme detection. The developed dataset consisted of two labels: troll and not_troll. A total of ten models were submitted, and the model with an F1-score of 0.55 secured the first rank among them. Banerjee et al. <ref type="bibr" target="#b13">[14]</ref> compared the performance of pre-trained models on a Hinglish code-mixed dataset for predicting Hate and non-Hate posts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Task and Data Description</head><p>The current study is an implementation and comparison of different machine and deep learning models for a Hate Speech and Offensive Language detection system for Tamil and Malayalam code-mixed texts in English. The dataset consists of sentences collected from comments or posts on social media. Table <ref type="table" target="#tab_0">1</ref> shows an overview of the data used in this analysis. There are two sets of data, Malayalam code-mixed and Tamil code-mixed, each consisting of code-mixed sentences, most of which also contain various emojis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Preprocessing</head><p>As the data was code-mixed, with Malayalam or Tamil mixed with English, no stop-word removal was done. The text, being informal in nature, contained emojis and emoticons, which were replaced with their respective textual meanings using data from the Unicode Consortium's emoji code repository via the demoji library. This was followed by the removal of punctuation, URLs, email ids, hyperlinks and numeric data from the text.</p></div>
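A minimal sketch of the cleaning steps described above, using only the standard library. The regular expressions and their order are our own illustrative choices, not the paper's exact implementation; the emoji-to-text step (via the demoji library, e.g. `demoji.replace_with_desc`) is assumed to have been applied beforehand:

```python
import re

def clean_text(text: str) -> str:
    """Remove URLs/hyperlinks, email ids, numeric data and punctuation,
    mirroring the preprocessing described above (emoji replacement via
    demoji is assumed to have run first)."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs and hyperlinks
    text = re.sub(r"\S+@\S+", " ", text)                # email ids
    text = re.sub(r"\d+", " ", text)                    # numeric data
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace

clean_text("Check www.example.com now!!! 123 mail me at a@b.com")
# -> "Check now mail me at"
```

Note that email removal must precede punctuation removal, otherwise stripping `@` would leave the address fragments behind.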
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Methodology</head><p>This section discusses the working of the implemented models in detail; the code can be found in the GitHub repository <ref type="foot" target="#foot_0">1</ref>. In the current study, three different approaches were used, as shown in Figure <ref type="figure" target="#fig_0">1:</ref> (i) conventional machine learning based models, (ii) neural network based models, and (iii) transfer learning based models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Traditional ML Models</head><p>In traditional ML-based models, we used a 1- to 5-gram word TF-IDF feature set. The extracted features were then fed to classifiers such as Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), XGBoost (XGB), and Support Vector Machine (SVM). The performance of these models was evaluated in terms of precision, recall, and F1-score <ref type="bibr" target="#b14">[15]</ref>. A detailed performance report of these models is provided in Section 5.</p></div>
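The TF-IDF + classifier setup above can be sketched with scikit-learn. The toy sentences and labels below are invented placeholders for the code-mixed corpus, and the pipeline is our own minimal reconstruction rather than the authors' exact code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the code-mixed corpus.
texts = ["nalla video bro", "worst video waste", "super movie", "waste fellow worst"]
labels = [0, 1, 0, 1]  # 0 = Not Offensive, 1 = Offensive (illustrative only)

# 1- to 5-gram word TF-IDF features fed to Logistic Regression,
# mirroring the LR configuration described above.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 5))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)
pred = model.predict(["worst waste video"])  # array with one 0/1 label
```

Swapping the `clf` step for `MultinomialNB`, `RandomForestClassifier`, or an SVM reproduces the other classifiers in the comparison.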
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Neural Network based models</head><p>In neural network-based models, the 1- to 5-gram TF-IDF features extracted while working with the machine learning models were used again as input to a simple deep neural network (DNN). This model consisted of four fully connected layers in sequential order, with 512, 256, 128 and 1 neurons in the first, second, third and fourth (output) layers. As the classification is between two distinct labels, a single output neuron was used. The hidden neurons used the ReLU activation function, while the output neuron used a sigmoid activation function, with Adam and binary cross-entropy as the chosen optimizer and loss function, respectively.</p><p>The second neural model experimented with is a Convolutional Neural Network (CNN) <ref type="bibr" target="#b15">[16]</ref>. The CNN consisted of one Conv1D layer followed by a Global Max Pooling and a Dropout layer connected to a fully connected sequential network with two hidden layers of 128 and 64 neurons, respectively. The activation function for the hidden neurons was ReLU, and the output was a single neuron with sigmoid activation. An embedding layer was used as the input layer, with the embedding dimension set to 50 and the input length set to 120; a (120, 50)-dimensional embedding matrix was therefore given as input to the CNN. The convolutional layer consisted of 64 filters with a kernel size of three.</p><p>Our final neural network based model was a Bidirectional Long Short-Term Memory (Bi-LSTM) model, consisting of 256 memory units followed by Global Max Pooling and Batch Normalization. The input layer was an embedding layer with 50 dimensions and length padded to 120, as in the previous model. Two fully connected dense layers served as the hidden layers, comprising 20 and 10 neurons, respectively, with ReLU as the activation function; these were connected to a single output neuron with sigmoid activation. Hyper-parameter tuning was subsequently done for the described models to find the optimal performance by adjusting the optimizer, learning rate and embedding dimensions. Our experiments yielded the best result with a learning rate of 0.0001 and the Adam optimizer. The embedding dimension was set to 50, as it gave the best result. Given the binary classification task and the overall balanced nature of the data set, binary cross-entropy was kept as the loss function, with a sigmoid activation function for the output neuron.</p></div>
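The CNN configuration described above (embedding dimension 50, padded length 120, 64 filters of kernel size 3, 128/64 hidden neurons, sigmoid output, Adam at learning rate 0.0001) can be sketched in Keras. The vocabulary size and dropout rate below are assumptions, as the paper does not report them:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size = 20000  # assumption: the paper does not report the vocabulary size

# Sketch of the described CNN; inputs are integer token-id sequences
# padded to length 120.
model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=50),  # (120,) -> (120, 50)
    layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),  # dropout rate is an assumption
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary OFF/NOT output
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```

Replacing the `Conv1D`/pooling pair with `layers.Bidirectional(layers.LSTM(256, return_sequences=True))` followed by pooling and batch normalization yields the Bi-LSTM variant described above.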
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Transfer Learning</head><p>This study implemented BERT (Bidirectional Encoder Representations from Transformers) models to exploit their transfer learning capabilities. For these models, no preprocessing was done. Three different variants of BERT models were studied:</p><p>i BERT (multilingual) <ref type="bibr" target="#b16">[17]</ref>.</p><p>ii IndicBERT <ref type="bibr" target="#b17">[18]</ref>. iii Multilingual Representations for Indian Languages (MuRIL) <ref type="bibr" target="#b18">[19]</ref>.</p><p>The multilingual BERT <ref type="bibr" target="#b16">[17]</ref> model was trained on 102 languages with masked language modelling. The case-sensitive model was chosen, as no prior data pre-processing was done for the transfer learning models. IndicBERT <ref type="bibr" target="#b17">[18]</ref> is a multilingual ALBERT model, pretrained exclusively on a corpus of 12 major Indian languages. Compared to other such BERT-based models, IndicBERT is smaller and has far fewer parameters. We used the ktrain <ref type="bibr" target="#b19">[20]</ref> library to develop the IndicBERT model. The last model that we used is MuRIL (Multilingual Representations for Indian Languages) <ref type="bibr" target="#b18">[19]</ref>. MuRIL is a BERT model trained over a corpus of 17 Indian languages along with their translated and transliterated counterparts. The differentiating factor is that IndicBERT is trained only on the native Indian scripts, whereas MuRIL is trained on the traditional scripts as well as their transliterated counterparts in Roman script. The benefit of this is evident in our experiments, which deal with code-mixed data of Indian languages and English written strictly in Roman script.</p></div>
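The fine-tuning setup shared by all three BERT variants amounts to a pretrained encoder with a two-label classification head. As a hedged sketch: in the actual experiments a pretrained checkpoint (e.g. `google/muril-base-cased` on the Hugging Face hub) would be loaded with `AutoModelForSequenceClassification.from_pretrained(...)`; here a small randomly initialised BERT of the same structure stands in so the head wiring can be shown without downloading weights, and all dimensions below are illustrative assumptions:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny illustrative configuration (real MuRIL/BERT-base is much larger).
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=2,  # OFF vs NOT
)
model = BertForSequenceClassification(config)

# Dummy batch of token ids; real inputs would come from the matching tokenizer.
input_ids = torch.randint(0, config.vocab_size, (2, 16))
logits = model(input_ids=input_ids).logits  # shape (batch, num_labels) = (2, 2)
```

Fine-tuning then proceeds by minimising cross-entropy over the two logits, exactly the binary objective used by the other models in this study.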
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>This section presents the results of all experiments done during this study, as described in Section 4. The results shown below correspond to the model predictions on the validation data and are reported in terms of precision, recall, and F1-score for the OFF (Offensive) and NOT (Not Offensive) classes. A model is considered the best if it reports the highest weighted averages of precision, recall, and F1-score. The best results for each data set are presented in bold for each model used in this study. Traditional ML models were built using 1- to 5-gram character TF-IDF features and included LR, RF, NB, XGB and SVM; their results are shown in Table <ref type="table" target="#tab_1">2</ref>. On the Malayalam code-mixed data set, the LR classifier gave the best performance, with recall and F1-score of 0.70. Similarly, on the Tamil code-mixed text, the LR classifier performed best and reported a precision of 0.83 with recall and F1-score of 0.82.</p><p>Results of the neural network models are presented in Table <ref type="table" target="#tab_2">3</ref>. A simple DNN with an embedding layer provided the best results on the Malayalam code-mixed data, with a precision of 0.75 and recall and F1-score of 0.74. On the Tamil code-mixed data, the CNN showed the best performance, with precision reaching 0.90 and recall and F1-score of 0.89.</p><p>In Table <ref type="table" target="#tab_3">4</ref> the results of the different BERT models are presented. On both the Malayalam and Tamil data, the MuRIL model performed best. On the Malayalam data, the precision was 0.79 with recall and F1-score of 0.78, and on the Tamil data, precision, recall and F1-score were all 0.91, the highest among all experimented models.</p></div>
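The weighted averages in the tables follow the standard definition: per-class scores weighted by class support. As a worked check, using the LR row for the Malayalam validation split (per-class F1 from Table 2, support from Table 1):

```python
# Per-class F1 for the LR classifier on Malayalam validation data (Table 2).
f1_off, f1_not = 0.69, 0.72
# Validation support for each class (Table 1): 478 Offensive, 473 Not Offensive.
n_off, n_not = 478, 473

# Weighted-average F1 = support-weighted mean of the per-class F1 scores.
weighted_f1 = (f1_off * n_off + f1_not * n_not) / (n_off + n_not)
print(round(weighted_f1, 2))  # 0.7 -- matches the Weighted Avg row in Table 2
```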
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>Among all experimented models, MuRIL, a transfer learning model, performed the best for both Malayalam and Tamil code-mixed data. The experimental outcomes show that the traditional machine learning models are unable to understand the context of the message and hence may not be a good choice for this task. A simple Deep Neural Network (DNN) with an embedding layer performed better than most of the machine learning models (Tables <ref type="table" target="#tab_2">2,  3</ref>). Although some machine learning results came close to those of the neural network models, we were dealing mostly with text data consisting of single sentences; for multi-sentence texts, a neural network able to hold some memory, such as an LSTM, would have outclassed the machine learning models <ref type="bibr" target="#b20">[21]</ref>.</p><p>As shown in Table <ref type="table" target="#tab_3">4</ref>, the IndicBERT model is not able to perform as well as the multilingual BERT model. This may be because the data set consisted of code-mixed data in the Roman script only. Had there been text written in the traditional scripts, the multilingual BERT model would have treated most of the tokens as unknown tokens, which would have affected its performance and benefited the IndicBERT model, as it was trained on monolingual Indian scripts. Finally, MuRIL, which was trained on a corpus of both the traditional scripts and their transliterations, performed better than all the other models. The results reported above (Tables <ref type="table" target="#tab_2">2, 3</ref>, 4) were based on predictions over the validation data set. On the test data, the proposed MuRIL model achieved precision, recall and F1-score values of 0.679, 0.673 and 0.636, respectively, for the Tamil code-mixed data, while on the Malayalam code-mixed data the precision, recall and F1-score values were 0.752, 0.727, and 0.734, respectively, for the best case.</p><p>The models were re-run on the labelled test data, and the results obtained with the different machine learning, neural network and transfer learning models are shown in Table <ref type="table" target="#tab_4">5</ref>. As on the validation data, MuRIL, a transfer learning model, produced the best prediction outcomes in terms of weighted average precision, recall and F1-score for both the Tanglish and Manglish test datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>Hate speech and offensive language detection is still a challenge for low-resource and code-mixed languages in NLP. We implemented various machine learning, deep learning and transfer learning models to find the most suitable model for the code-mixed Tamil and Malayalam datasets. The reported results show that the deep learning models, specifically the pre-trained models, outperformed the machine learning models. The MuRIL model performed the best, reporting a weighted F1-score of 0.636 on the Tamil code-mixed data; the same model provided a weighted F1-score of 0.734 on the Malayalam code-mixed data. On the test data, the BERT and MuRIL transfer learning models yielded almost similar outcomes. In the future, a better model may be built with additional preprocessing steps on the dataset to achieve better prediction accuracy.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Framework used to predict the offensive post</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Distribution of Data in the Training and Validation classes</figDesc><table><row><cell>Data Set</cell><cell>Class</cell><cell cols="3">Offensive Not Offensive Total</cell></row><row><cell>Malayalam</cell><cell>Train</cell><cell>1952</cell><cell>2047</cell><cell>3999</cell></row><row><cell>Code-Mixed</cell><cell>Validation</cell><cell>478</cell><cell>473</cell><cell>951</cell></row><row><cell>Tamil</cell><cell>Train</cell><cell>2019</cell><cell>1980</cell><cell>3999</cell></row><row><cell>Code-Mixed</cell><cell>Validation</cell><cell>465</cell><cell>465</cell><cell>940</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results of Traditional ML Models on Malayalam and Tamil code-mixed validation data set</figDesc><table><row><cell cols="2">Model Class</cell><cell cols="3">Malayalam Code-Mixed</cell><cell cols="3">Tamil Code-Mixed</cell></row><row><cell></cell><cell></cell><cell cols="6">Precision Recall F1-score Precision Recall F1-score</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.73</cell><cell>0.65</cell><cell>0.69</cell><cell>0.81</cell><cell>0.86</cell><cell>0.83</cell></row><row><cell>LR</cell><cell>Not Offensive</cell><cell>0.68</cell><cell>0.75</cell><cell>0.72</cell><cell>0.85</cell><cell>0.79</cell><cell>0.82</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.70</cell><cell>0.70</cell><cell>0.70</cell><cell>0.83</cell><cell>0.82</cell><cell>0.82</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.78</cell><cell>0.51</cell><cell>0.61</cell><cell>0.78</cell><cell>0.87</cell><cell>0.82</cell></row><row><cell>RF</cell><cell>Not Offensive</cell><cell>0.63</cell><cell>0.85</cell><cell>0.73</cell><cell>0.85</cell><cell>0.76</cell><cell>0.80</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.71</cell><cell>0.68</cell><cell>0.67</cell><cell>0.81</cell><cell>0.81</cell><cell>0.81</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.71</cell><cell>0.66</cell><cell>0.69</cell><cell>0.84</cell><cell>0.77</cell><cell>0.81</cell></row><row><cell>NB</cell><cell>Not Offensive</cell><cell>0.68</cell><cell>0.73</cell><cell>0.71</cell><cell>0.78</cell><cell>0.85</cell><cell>0.82</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.70</cell><cell>0.70</cell><cell>0.70</cell><cell>0.81</cell><cell>0.81</cell><cell>0.81</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.81</cell><cell>0.33</cell><cell>0.47</cell><cell>0.67</cell><cell>0.93</cell><cell>0.78</cell></row><row><cell>XGB</cell><cell>Not 
Offensive</cell><cell>0.58</cell><cell>0.92</cell><cell>0.71</cell><cell>0.89</cell><cell>0.52</cell><cell>0.66</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.70</cell><cell>0.63</cell><cell>0.59</cell><cell>0.77</cell><cell>0.73</cell><cell>0.72</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.72</cell><cell>0.60</cell><cell>0.66</cell><cell>0.76</cell><cell>0.88</cell><cell>0.82</cell></row><row><cell>SVM</cell><cell>Not Offensive</cell><cell>0.66</cell><cell>0.77</cell><cell>0.71</cell><cell>0.85</cell><cell>0.72</cell><cell>0.78</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.69</cell><cell>0.68</cell><cell>0.68</cell><cell>0.81</cell><cell>0.80</cell><cell>0.80</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Results of Neural Network based models on Malayalam and Tamil code-mixed data set</figDesc><table><row><cell cols="2">Model Class</cell><cell cols="3">Malayalam Code-Mixed</cell><cell cols="3">Tamil Code-Mixed</cell></row><row><cell></cell><cell></cell><cell cols="6">Precision Recall F1-score Precision Recall F1-score</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.73</cell><cell>0.67</cell><cell>0.70</cell><cell>0.84</cell><cell>0.83</cell><cell>0.83</cell></row><row><cell>DNN</cell><cell>Not Offensive</cell><cell>0.69</cell><cell>0.75</cell><cell>0.72</cell><cell>0.83</cell><cell>0.83</cell><cell>0.83</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.71</cell><cell>0.71</cell><cell>0.71</cell><cell>0.83</cell><cell>0.83</cell><cell>0.83</cell></row><row><cell>DNN+ Emb</cell><cell>Offensive Not Offensive Weighted Avg</cell><cell>0.79 0.70 0.75</cell><cell>0.65 0.82 0.74</cell><cell>0.72 0.76 0.74</cell><cell>0.84 0.91 0.87</cell><cell>0.92 0.82 0.87</cell><cell>0.88 0.86 0.87</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.78</cell><cell>0.61</cell><cell>0.68</cell><cell>0.95</cell><cell>0.84</cell><cell>0.89</cell></row><row><cell>CNN</cell><cell>Not Offensive</cell><cell>0.68</cell><cell>0.83</cell><cell>0.74</cell><cell>0.85</cell><cell>0.95</cell><cell>0.90</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.73</cell><cell>0.72</cell><cell>0.71</cell><cell>0.90</cell><cell>0.89</cell><cell>0.89</cell></row><row><cell>Bi-LSTM</cell><cell>Offensive Not Offensive Weighted Avg</cell><cell>0.77 0.64 0.70</cell><cell>0.52 0.84 0.68</cell><cell>0.62 0.72 0.67</cell><cell>0.92 0.79 0.86</cell><cell>0.76 0.93 0.84</cell><cell>0.83 0.86 0.84</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Results of Transfer Learning based models on Malayalam and Tamil code-mixed validation data set</figDesc><table><row><cell cols="2">Model Class</cell><cell cols="3">Malayalam Code-Mixed</cell><cell cols="3">Tamil Code-Mixed</cell></row><row><cell></cell><cell></cell><cell cols="6">Precision Recall F1-score Precision Recall F1-score</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.75</cell><cell>0.73</cell><cell>0.74</cell><cell>0.88</cell><cell>0.92</cell><cell>0.90</cell></row><row><cell>BERT</cell><cell>Not Offensive</cell><cell>0.74</cell><cell>0.75</cell><cell>0.74</cell><cell>0.92</cell><cell>0.87</cell><cell>0.89</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.74</cell><cell>0.74</cell><cell>0.74</cell><cell>0.90</cell><cell>0.90</cell><cell>0.90</cell></row><row><cell>In-</cell><cell>Offensive</cell><cell>0.71</cell><cell>0.69</cell><cell>0.70</cell><cell>0.78</cell><cell>0.82</cell><cell>0.80</cell></row><row><cell>dic</cell><cell>Not Offensive</cell><cell>0.70</cell><cell>0.72</cell><cell>0.71</cell><cell>0.81</cell><cell>0.76</cell><cell>0.78</cell></row><row><cell>BERT</cell><cell>Weighted Avg</cell><cell>0.71</cell><cell>0.71</cell><cell>0.71</cell><cell>0.79</cell><cell>0.79</cell><cell>0.79</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.82</cell><cell>0.72</cell><cell>0.77</cell><cell>0.92</cell><cell>0.91</cell><cell>0.92</cell></row><row><cell>MuRIL</cell><cell>Not Offensive</cell><cell>0.75</cell><cell>0.84</cell><cell>0.79</cell><cell>0.91</cell><cell>0.92</cell><cell>0.91</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.79</cell><cell>0.78</cell><cell>0.78</cell><cell>0.91</cell><cell>0.91</cell><cell>0.91</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Test Data Prediction Results on selected models</figDesc><table><row><cell cols="2">Model Class</cell><cell cols="3">Malayalam Code-Mixed</cell><cell cols="3">Tamil Code-Mixed</cell></row><row><cell></cell><cell></cell><cell cols="6">Precision Recall F1-score Precision Recall F1-score</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.51</cell><cell>0.65</cell><cell>0.57</cell><cell>0.52</cell><cell>0.56</cell><cell>0.54</cell></row><row><cell>LR</cell><cell>Not Offensive</cell><cell>0.81</cell><cell>0.70</cell><cell>0.75</cell><cell>0.70</cell><cell>0.66</cell><cell>0.68</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.71</cell><cell>0.69</cell><cell>0.69</cell><cell>0.62</cell><cell>0.62</cell><cell>0.62</cell></row><row><cell>DNN+ Emb</cell><cell>Offensive Not Offensive Weighted Avg</cell><cell>0.49 0.80 0.70</cell><cell>0.65 0.68 0.67</cell><cell>0.56 0.73 0.67</cell><cell>0.55 0.70 0.64</cell><cell>0.53 0.72 0.64</cell><cell>0.54 0.71 0.64</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.55</cell><cell>0.52</cell><cell>0.53</cell><cell>0.56</cell><cell>0.50</cell><cell>0.53</cell></row><row><cell>CNN</cell><cell>Not Offensive</cell><cell>0.78</cell><cell>0.79</cell><cell>0.78</cell><cell>0.69</cell><cell>0.75</cell><cell>0.72</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.70</cell><cell>0.70</cell><cell>0.70</cell><cell>0.64</cell><cell>0.65</cell><cell>0.64</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.56</cell><cell>0.65</cell><cell>0.60</cell><cell>0.61</cell><cell>0.84</cell><cell>0.71</cell></row><row><cell>BERT</cell><cell>Not Offensive</cell><cell>0.82</cell><cell>0.75</cell><cell>0.78</cell><cell>0.73</cell><cell>0.44</cell><cell>0.55</cell></row><row><cell></cell><cell>Weighted 
Avg</cell><cell>0.73</cell><cell>0.72</cell><cell>0.72</cell><cell>0.67</cell><cell>0.64</cell><cell>0.63</cell></row><row><cell></cell><cell>Offensive</cell><cell>0.55</cell><cell>0.60</cell><cell>0.58</cell><cell>0.58</cell><cell>0.43</cell><cell>0.50</cell></row><row><cell>MuRIL</cell><cell>Not Offensive</cell><cell>0.80</cell><cell>0.77</cell><cell>0.78</cell><cell>0.68</cell><cell>0.80</cell><cell>0.74</cell></row><row><cell></cell><cell>Weighted Avg</cell><cell>0.75</cell><cell>0.72</cell><cell>0.73</cell><cell>0.68</cell><cell>0.67</cell><cell>0.64</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/Sbhawal/HASOC-FIRE-2021-CODES</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A framework for hate speech detection using deep convolutional neural network</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Tripathy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-Z</forename><surname>Gao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="204951" to="204962" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Offensive language identification in Dravidian code-mixed social media text</title>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="36" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar M</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">NITP-AI-NLP@HASOC-Dravidian-CodeMix-FIRE2020: A machine learning approach to identify offensive languages from Dravidian code-mixed text</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">FIRE (Working Notes)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="384" to="390" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">NITP-AI-NLP@HASOC-FIRE2020: Fine-tuned BERT for the hate speech and offensive content identification from social media</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">FIRE (Working Notes)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="266" to="273" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automated hate speech detection and the problem of offensive language</title>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Macy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International AAAI Conference on Web and Social Media</title>
				<meeting>the International AAAI Conference on Web and Social Media</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">11</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Deep learning for hate speech detection in tweets</title>
		<author>
			<persName><forename type="first">P</forename><surname>Badjatiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th international conference on World Wide Web companion</title>
				<meeting>the 26th international conference on World Wide Web companion</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="759" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jayanthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2102.01051</idno>
		<title level="m">SJ_AJ@DravidianLangTech-EACL2021: Task-adaptive pre-training of multilingual BERT models for offensive language identification</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube comments and posts</title>
		<author>
			<persName><forename type="first">C</forename><surname>Vasantharajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Thayasivam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="195" to="202" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Findings of the shared task on offensive language identification in Tamil, Malayalam, and Kannada</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar M</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ponnusamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.dravidianlangtech-1.17" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="133" to="145" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Kumaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sakuntharaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Madasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>B</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chinnaudayar Navaneethakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</title>
				<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of the DravidianCodeMix 2021 shared task on sentiment detection in Tamil, Malayalam, and Kannada</title>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thavareesan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chinnappa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Durairaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation, FIRE 2021</title>
				<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Findings of the shared task on troll meme classification in Tamil</title>
		<author>
			<persName><forename type="first">S</forename><surname>Suryawanshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.dravidianlangtech-1.16" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics<address><addrLine>Kyiv</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="126" to="132" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Comparison of pre-trained embeddings to identify hate speech in Indian code-mixed text</title>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICACCCN51052.2020.9362731</idno>
	</analytic>
	<monogr>
		<title level="m">2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="21" to="25" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A novel hybrid credit scoring model based on ensemble feature selection and multilayer ensemble classification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Tripathi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Edla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cheruku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kuppili</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Intelligence</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="371" to="394" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Multilayer convolutional neural network to filter low-quality content from Quora</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Roy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Processing Letters</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="805" to="821" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kakwani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kunchukuttan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Golla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">N C</forename></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhattacharyya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Khapra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of EMNLP</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">MuRIL: Multilingual representations for Indian languages</title>
		<author>
			<persName><forename type="first">S</forename><surname>Khanuja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mehtani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khosla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gopalan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">K</forename><surname>Margam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>Nagipogu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C B</forename><surname>Gali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Talukdar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<idno type="arXiv">arXiv:2103.10730</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">ktrain: A low-code library for augmented machine learning</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Maiya</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.10703</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Deep learning to filter SMS spam</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Future Generation Computer Systems</title>
		<imprint>
			<biblScope unit="volume">102</biblScope>
			<biblScope unit="page" from="524" to="533" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
