<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">NITP-AI-NLP@HASOC-Dravidian-CodeMix-FIRE2020: A Machine Learning Approach to Identify Offensive Languages from Dravidian Code-Mixed Text</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Abhinav</forename><surname>Kumar</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Technology Patna</orgName>
								<address>
									<settlement>Patna</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">National Institute of Technology Patna</orgName>
								<address>
									<settlement>Patna</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sunil</forename><surname>Saumya</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Indian Institute of Information Technology Dharwad</orgName>
								<address>
									<region>Karnataka</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jyoti</forename><forename type="middle">Prakash</forename><surname>Singh</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Technology Patna</orgName>
								<address>
									<settlement>Patna</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">National Institute of Technology Patna</orgName>
								<address>
									<settlement>Patna</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">NITP-AI-NLP@HASOC-Dravidian-CodeMix-FIRE2020: A Machine Learning Approach to Identify Offensive Languages from Dravidian Code-Mixed Text</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">65EB0BE6FBD1AD93493B0189EB4885E0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T13:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Hate speech</term>
					<term>Code-mixed</term>
					<term>Script-mixed</term>
					<term>Machine learning</term>
					<term>Deep learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Hate speech on social media poses a threat to society. Several models for hate speech detection in a single language, mostly English, have been proposed recently. However, in countries where English is not the native language, communication often involves the scripts and constructs of more than one language, yielding code-mixed text. The current work classifies offensive and non-offensive tweets and YouTube comments written in code-mixed Tamil, code-mixed Malayalam, and script-mixed Malayalam. We explored deep learning models such as attention-based Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), as well as machine learning models such as Support Vector Machine, Logistic Regression, Random Forest, and Naive Bayes, to identify offensive posts in code-mixed and script-mixed text. From extensive experiments, we found that character N-gram Term Frequency-Inverse Document Frequency (TF-IDF) features play a promising role in identifying offensive social media posts. The character N-gram TF-IDF based Naive Bayes classifier performed best, with a weighted precision, recall, and 𝐹 1 -score of 0.90 for Tamil code-mixed text. The Logistic Regression classifier with character N-gram TF-IDF features performed best, with a weighted precision, recall, and 𝐹 1 -score of 0.78 for Malayalam code-mixed text. The Dense Neural Network with character N-gram TF-IDF features performed best, with a weighted precision of 0.96, recall of 0.95, and 𝐹 1 -score of 0.95 for Malayalam script-mixed text.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Social media platforms such as Facebook and Twitter are flooded with user-generated content <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. In particular, hate speech on social media is rising at a rapid pace 1 , posing a significant threat to a sustainable society. YouTube defines hate speech as "any speech that involves race, age, sexual orientation, disability, religion, and racism to promote hate or violence among groups" 2 . Through these social platforms, hate speech reaches the targeted person even in their bedroom and lasts forever <ref type="bibr" target="#b4">[5]</ref>. Hate speech has a terrible impact on users' mental state, resulting in depression, sleeplessness, and even suicide. It is challenging for the authorities to prove someone guilty due to anonymous identities and cross-border laws, since some countries grant freedom of expression for a message while others adopt a very stringent policy toward the same message <ref type="bibr" target="#b5">[6]</ref>.</p><p>FIRE 2020: Forum for Information Retrieval Evaluation, December 16-20, 2020, Hyderabad, India. Email: abhinavanand05@gmail.com (A. Kumar); sunil.saumya@iiitdwd.ac.in (S. Saumya); jps@nitp.ac.in (J.P. Singh)</p><p>The manual identification of hate speech is almost impossible, so it needs to be investigated thoroughly and automatically. A considerable amount of research on English hate speech has been published. <ref type="bibr">Davidson et al. [7]</ref> extracted N-gram TF-IDF features from tweets and applied logistic regression to classify each tweet into three classes: hate, offensive, and neither. Kumari et al. <ref type="bibr" target="#b7">[8]</ref> presented a model to identify cyberbullying instances using feature optimization with a genetic algorithm. 
Similarly, Agarwal and Sureka <ref type="bibr" target="#b8">[9]</ref> extracted linguistic, semantic, and sentiment features and learned an ensemble classifier to detect racist content. Kapil et al. <ref type="bibr" target="#b5">[6]</ref> proposed LSTM- and CNN-based models to identify hate speech in social media posts, whereas Badjatiya et al. <ref type="bibr" target="#b9">[10]</ref> learned semantic word embeddings to classify each tweet as racist, sexist, or neither. Kumari and Singh <ref type="bibr" target="#b10">[11]</ref> presented a deep learning model to detect hate speech in English text. Code-mixed and script-mixed sentences remain a major challenge for machine learning models due to the unavailability of sufficiently large datasets. Code-mixed datasets for Tamil-English (Tanglish) <ref type="bibr" target="#b11">[12]</ref> and Malayalam-English (Manglish) <ref type="bibr" target="#b12">[13]</ref> were recently proposed for the sentiment analysis task.</p><p>The purpose of this study is to recognize hate speech in Indian language settings such as code-mixed Tamil-English, code-mixed Malayalam-English, and script-mixed Malayalam-English. The dataset used in this study belongs to the HASOC-Dravidian-CodeMix-FIRE2020 challenge <ref type="bibr" target="#b13">[14]</ref>. The data were gathered from YouTube and Twitter, targeting two tasks. Task 1 asks for a classification system that differentiates script-mixed Malayalam comments into offensive and not-offensive. Task 2 requires building a classifier that differentiates Tanglish and Manglish (Tamil and Malayalam written using Roman characters) into offensive and not-offensive classes. The current paper explores several deep learning and machine learning models to identify offensive posts from the code-mixed and script-mixed posts. For deep learning, we utilized attention-based Bi-LSTM-CNN, BERT, and DNN models. 
In contrast, for conventional machine learning, Support Vector Machine, Logistic Regression, Random Forest, and Naive Bayes classifiers are used.</p><p>The rest of the article is organized as follows: the proposed methodology is explained in Section 2; the experimental settings and obtained results are discussed in Section 3; finally, we conclude the paper in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>A detailed description of the model submitted to the FIRE-2020 workshop is given in Section 2.1, whereas the details of the extensive experiments with different character N-gram TF-IDF features and different classifiers are given in Section 2.2. The detailed statistics of the datasets used in this study are listed in Table <ref type="table" target="#tab_0">1</ref>. While pre-processing the texts, we kept &amp; and @ in our dataset by translating them into 'and' and 'at', respectively, and removed other special characters. We also removed single-letter words and punctuation, and replaced all numeric digits with their corresponding English words (e.g., 1-one, 9-nine). Finally, all texts were converted into lowercase.</p></div>
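The pre-processing steps described above can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' published code; the function name and the digit-to-word map are our assumptions.

```python
import re

# Digit-to-word map (our assumption; the paper only gives 1-one, 9-nine as examples).
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def preprocess(text: str) -> str:
    # Keep & and @ by translating them into words.
    text = text.replace("&", " and ").replace("@", " at ")
    # Replace each numeric digit with its English word.
    text = re.sub(r"\d", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    # Lowercase, then drop remaining special characters and punctuation.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    # Remove single-letter words and collapse whitespace.
    return " ".join(w for w in text.split() if len(w) > 1)

print(preprocess("Call me @ 9 & bring 1 cake!"))  # → call me at nine and bring one cake
```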
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Model-1</head><p>A hybrid attention-based Bi-LSTM and CNN network is used for the Tamil and Malayalam code-mixed text. The overall diagram of the hybrid attention-based Bi-LSTM and CNN network can be seen in Figure <ref type="figure" target="#fig_0">1</ref>. We used character embeddings for the CNN network, whereas word embeddings are used in the attention-based Bi-LSTM network. For character embeddings, one-hot encoding vectors are used. For word embeddings, we created our own FastText<ref type="foot" target="#foot_0">3</ref> word embedding by utilizing the language- To process the word embeddings, two Bi-LSTM layers having 512 and 256 output dimensions are used, followed by an attention layer. Finally, the outputs of the attention-based Bi-LSTM and CNN branches are concatenated and passed through a softmax layer to predict offensive and not-offensive text. The detailed working of the CNN and attention-based Bi-LSTM networks can be seen in <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>. The performance of deep neural networks is sensitive to hyper-parameters. Therefore, we performed extensive experiments varying the learning rate, batch size, optimizer, number of epochs, loss function, and activation function. The proposed system worked best with a learning rate of 0.001, a batch size of 32, Adam as the optimizer, 100 epochs, binary cross-entropy as the loss function, ReLU activation in the internal layers of the network, and a softmax activation function at the output layer.</p><p>In the case of Malayalam script-mixed text, a fine-tuned pre-trained BERT <ref type="foot" target="#foot_1">4</ref> model is used to classify the text into offensive and not-offensive classes. 
We fixed the input length to 30 words and used a batch size of 32 and a learning rate of 2𝑒 −5 to fine-tune the pre-trained bert-base-multilingual-uncased BERT model. A detailed description of the BERT model can be found in <ref type="bibr" target="#b18">[19]</ref>.</p></div>
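The attention pooling applied on top of the Bi-LSTM outputs can be illustrated with a generic additive-attention computation in NumPy. This is a sketch of the standard mechanism, not the authors' exact layer; the scoring function and weight shapes are our assumptions, with the dimensions (30 word timesteps, 2×256 = 512-dimensional Bi-LSTM outputs) taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """Collapse Bi-LSTM outputs H (timesteps x dim) into one context vector.

    score_t = tanh(H_t . w);  alpha = softmax(score);  context = sum_t alpha_t * H_t
    """
    scores = np.tanh(H @ w)   # one scalar score per timestep
    alpha = softmax(scores)   # attention weights, non-negative and summing to 1
    return alpha @ H, alpha   # weighted sum over timesteps, plus the weights

# 30 word timesteps, 512-dim Bi-LSTM outputs (two directions of 256 units).
H = rng.standard_normal((30, 512))
w = rng.standard_normal(512)  # learned scoring vector (randomly initialized here)
context, alpha = attention_pool(H, w)
```

The resulting 512-dimensional context vector is what would be concatenated with the CNN branch output before the softmax classification layer.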
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Model-2</head><p>Along with the model submitted to the FIRE-2020 workshop, we also explored the use of different word and character N-gram TF-IDF features with conventional machine learning classifiers and a deep neural network. We experimented by extracting different combinations of 1-gram, 2-gram, 3-gram, 4-gram, 5-gram, and 6-gram word and character features from the text and applied Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Dense Neural Network (DNN) classifiers. For the DNN, we used a four-layer fully connected network with 1024, 256, 128, and 2 neurons. We used a dropout rate of 0.3, ReLU and softmax as the activation functions, a batch size of 32, binary cross-entropy as the loss function, and Adam as the optimizer. The various word-level N-gram TF-IDF features with the said machine learning models did not perform well compared to the submitted model; therefore, we are not reporting the results of these models. On the other hand, when the various machine learning models were trained with different character N-gram TF-IDF features (1-gram to 6-gram), the performance was remarkable. In the case of code-mixed Tamil and Malayalam text, the top 10,000 character N-gram (1-gram to 6-gram) TF-IDF features performed best. In the case of Malayalam script-mixed text, the top 20,000 character N-gram (1-gram to 6-gram) TF-IDF features performed best. The detailed results of each of the classifiers are listed in Section 3.</p></div>
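The character N-gram TF-IDF pipeline above can be sketched with scikit-learn. Only the feature settings (character 1- to 6-grams, capped at the top 10,000 features, as used for the code-mixed text) follow the paper; the toy corpus and labels below are ours, standing in for the HASOC-Dravidian-CodeMix data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus (illustrative only; OFF = offensive, NOT = not-offensive).
texts = ["semma padam bro", "worst movie", "super movie da", "chi worst da"]
labels = ["NOT", "OFF", "NOT", "OFF"]

# Character 1- to 6-gram TF-IDF features, top 10,000 by frequency,
# feeding a Naive Bayes classifier (the best model for Tamil code-mixed text).
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 6),
                             max_features=10_000)
clf = make_pipeline(vectorizer, MultinomialNB())
clf.fit(texts, labels)

pred = clf.predict(["padam semma"])
```

Swapping `MultinomialNB` for `LogisticRegression` reproduces the variant that worked best on Malayalam code-mixed text; for script-mixed Malayalam, `max_features` would be raised to 20,000.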
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>The results of the attention-based Bi-LSTM-CNN and BERT models for Tamil code-mixed, Malayalam code-mixed, and Malayalam script-mixed text are listed in Table <ref type="table" target="#tab_1">2</ref>. In the case of Tamil code-mixed   Next, experiments were performed for the machine learning models using character N-gram (1-gram to 6-gram) TF-IDF features. The results for the Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Dense Neural Network (DNN) are listed in Table <ref type="table">3</ref>. In the case of Tamil code-mixed text, the NB classifier performed best and achieved a weighted precision, recall, and 𝐹 1 -score of 0.90.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Proposed hybrid attention-based Bi-LSTM and CNN network</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Confusion matrix for Naive Bayes (Tamil code-mixed)</figDesc><graphic coords="5,115.55,252.28,81.97,81.97" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :Figure 4 :Figure 5 :Figure 6 :</head><label>3456</label><figDesc>Figure 3: ROC for Naive Bayes (Tamil code-mixed)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: ROC for DNN model (Malayalam script-mixed)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Data statistic used in this study Tamil and Malayalam text for Tamil and Malayalam models, respectively. We used skip-gram techniques and trained system for 10 epochs to create the FastText word embedding vectors. We fixed 200-characters for the input sequences in the case of CNN and 30-words for the input sequences in the case of attention based Bi-LSTM network. Finally, the character embedding matrix (200×70) and word embedding matrix (30×100) passes through the CNN and attention-based Bi-LSTM network, respectively. In CNN, 128 filters of 1-gram, 2-gram, 3-gram, and 4-gram are used at different layers of CNN. The output of the CNN layer is then passed through a dense layer having 128 neurons.</figDesc><table><row><cell>Language</cell><cell>Class</cell><cell cols="3">Not-offensive Offensive Total</cell></row><row><cell>Malayalam code-mixed</cell><cell>Training</cell><cell>2047</cell><cell>1953</cell><cell>4000</cell></row><row><cell></cell><cell>Testing</cell><cell>473</cell><cell>478</cell><cell>951</cell></row><row><cell>Tamil code-mixed</cell><cell>Training</cell><cell>2020</cell><cell>1980</cell><cell>4000</cell></row><row><cell></cell><cell>Testing</cell><cell>465</cell><cell>475</cell><cell>940</cell></row><row><cell cols="2">Malayalam script-mixed Training</cell><cell>2633</cell><cell>567</cell><cell>3200</cell></row><row><cell></cell><cell cols="2">Development 328</cell><cell>72</cell><cell>400</cell></row><row><cell></cell><cell>Testing</cell><cell>334</cell><cell>66</cell><cell>400</cell></row><row><cell>specific code-mixed</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results for the attention-based Bi-LSTM-CNN and BERT models</figDesc><table><row><cell>Language</cell><cell>Model</cell><cell>Class</cell><cell cols="3">Precision Recall 𝐹 1 -score</cell></row><row><cell>Tamil code-mixed</cell><cell cols="2">Attention-based Bi-LSTM-CNN Offensive</cell><cell>0.85</cell><cell>0.83</cell><cell>0.84</cell></row><row><cell></cell><cell></cell><cell>Not-offensive</cell><cell>0.83</cell><cell>0.85</cell><cell>0.84</cell></row><row><cell></cell><cell></cell><cell cols="2">Weighted Avg. 0.84</cell><cell>0.84</cell><cell>0.84</cell></row><row><cell>Malayalam code-mixed</cell><cell cols="2">Attention-based Bi-LSTM-CNN Offensive</cell><cell>0.71</cell><cell>0.71</cell><cell>0.71</cell></row><row><cell></cell><cell></cell><cell>Not-offensive</cell><cell>0.71</cell><cell>0.71</cell><cell>0.71</cell></row><row><cell></cell><cell></cell><cell cols="2">Weighted Avg. 0.71</cell><cell>0.71</cell><cell>0.71</cell></row><row><cell cols="2">Malayalam script-mixed BERT</cell><cell>Offensive</cell><cell>0.95</cell><cell>0.97</cell><cell>0.96</cell></row><row><cell></cell><cell></cell><cell>Not-offensive</cell><cell>0.83</cell><cell>0.74</cell><cell>0.78</cell></row><row><cell></cell><cell></cell><cell cols="2">Weighted Avg. 0.93</cell><cell>0.93</cell><cell>0.93</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://fasttext.cc/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://huggingface.co/transformers/pretrained_models.html</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Among all the models submitted to the FIRE-2020 workshop for this task, the best model achieved a precision, recall, and 𝐹 1 -score of 0.90 for Tamil code-mixed text, a precision, recall, and 𝐹 1 -score of 0.78 for Malayalam code-mixed text, and a precision, recall, and 𝐹 1 -score of 0.95 for Malayalam script-mixed text. Compared to the best submitted model, our proposed system performed equally well on Tamil and Malayalam code-mixed text. In the case of Malayalam script-mixed text, our proposed system performed slightly better, with a precision of 0.96.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>The identification of hate speech in code-mixed and script-mixed Dravidian sentences poses enormous challenges. This work explored the usability of several deep learning and machine learning-based models for classifying offensive and not-offensive sentences. The character N-gram TF-IDF based Naive Bayes classifier performed best, with a weighted precision, recall, and 𝐹 1 -score of 0.90 for Tamil code-mixed text. The Logistic Regression classifier with character N-gram TF-IDF features performed best, with a weighted precision, recall, and 𝐹 1 -score of 0.78 for Malayalam code-mixed text. The Dense Neural Network with character N-gram TF-IDF features performed best, with a weighted precision of 0.96, recall of 0.95, and 𝐹 1 -score of 0.95 for Malayalam script-mixed text.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Detection of spam reviews: A sentiment analysis approach</title>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CSI Transactions on ICT</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="137" to="148" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Spam review detection using LSTM autoencoder: an unsupervised approach</title>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronic Commerce Research</title>
		<imprint>
			<biblScope unit="page" from="1" to="21" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A comparative analysis of machine learning techniques for disaster-related tweet classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE R10 Humanitarian Technology Conference (R10-HTC)(47129)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2019">2019. 2019</date>
			<biblScope unit="page" from="222" to="227" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Relationship strength based access control in online social networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">C</forename><surname>Rathore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems</title>
				<meeting>First International Conference on Information and Communication Technology for Intelligent Systems</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="197" to="206" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Towards cyberbullying-free social media in smart cities: a unified multi-modal approach</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kumari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">K</forename><surname>Dwivedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">P</forename><surname>Rana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soft Computing</title>
		<imprint>
			<biblScope unit="page" from="11059" to="11070" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Investigating deep learning approaches for hate speech detection in social media</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kapil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ekbal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Das</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.14690</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Macy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1703.04009</idno>
		<title level="m">Automated hate speech detection and the problem of offensive language</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Identification of cyberbullying on multi-modal social media posts using genetic algorithm</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kumari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<idno type="DOI">10.1002/ett.3907</idno>
	</analytic>
	<monogr>
		<title level="j">Transactions on Emerging Telecommunications Technologies</title>
		<imprint>
			<biblScope unit="page">e3907</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sureka</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1701.04931</idno>
		<title level="m">Characterizing linguistic attributes for automatic classification of intent based racist/radicalized posts on tumblr micro-blogging website</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Deep learning for hate speech detection in tweets</title>
		<author>
			<persName><forename type="first">P</forename><surname>Badjatiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on WWW Companion</title>
				<meeting>the 26th International Conference on WWW Companion</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="759" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">AI_ML_NIT Patna at HASOC 2019: Deep learning approach for identification of abusive content</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kumari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation</title>
				<meeting>the 11th annual meeting of the Forum for Information Retrieval Evaluation</meeting>
		<imprint>
			<date type="published" when="2019-12">December 2019. 2019</date>
			<biblScope unit="page" from="328" to="335" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Corpus creation for sentiment analysis in code-mixed Tamil-English text</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Muralidaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Priyadharshini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Joint Workshop on SLTU and CCURL</title>
				<meeting>the 1st Joint Workshop on SLTU and CCURL<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="202" to="210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A sentiment analysis dataset for code-mixed Malayalam-English</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Suryawanshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sherly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Joint Workshop on SLTU and CCURL</title>
				<meeting>the 1st Joint Workshop on SLTU and CCURL<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="177" to="184" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Overview of the track on &quot;hasoc-offensive language identification-DravidianCodeMix</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Mccrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">B</forename></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of FIRE</title>
				<meeting>FIRE</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Location reference identification from tweets during emergencies: A deep learning approach</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Disaster Risk Reduction</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="365" to="375" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A deep multi-modal neural network for informative Twitter content classification during emergencies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">K</forename><surname>Dwivedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">P</forename><surname>Rana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of Operations Research</title>
		<imprint>
			<biblScope unit="page" from="1" to="32" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Attention-based LSTM network for rumor veracity estimation of tweets</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">P</forename><surname>Rana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">K</forename><surname>Dwivedi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems Frontiers</title>
		<imprint>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Predicting the helpfulness score of online reviews using convolutional neural network</title>
		<author>
			<persName><forename type="first">S</forename><surname>Saumya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">K</forename><surname>Dwivedi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soft Computing</title>
		<imprint>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<title level="m">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
