<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Machine Learning Models for Hate Speech Identification in Marathi Language</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Disha</forename><surname>Gajbhiye</surname></persName>
							<email>dishagajbhiye14@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Hope Foundation&apos;s International Institute of Information Technology</orgName>
								<orgName type="institution">Hinjawadi</orgName>
								<address>
									<settlement>Pune</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Swapnil</forename><surname>Deshpande</surname></persName>
							<email>swapnildeshpande412@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Hope Foundation&apos;s International Institute of Information Technology</orgName>
								<orgName type="institution">Hinjawadi</orgName>
								<address>
									<settlement>Pune</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Prerna</forename><surname>Ghante</surname></persName>
							<email>ghanteperi6@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Hope Foundation&apos;s International Institute of Information Technology</orgName>
								<orgName type="institution">Hinjawadi</orgName>
								<address>
									<settlement>Pune</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Abhijeet</forename><surname>Kale</surname></persName>
							<email>abhijeetkale459@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Hope Foundation&apos;s International Institute of Information Technology</orgName>
								<orgName type="institution">Hinjawadi</orgName>
								<address>
									<settlement>Pune</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Deptii</forename><surname>Chaudhari</surname></persName>
							<email>deptiic@isquareit.edu.in</email>
							<affiliation key="aff0">
								<orgName type="department">Hope Foundation&apos;s International Institute of Information Technology</orgName>
								<orgName type="institution">Hinjawadi</orgName>
								<address>
									<settlement>Pune</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Forum for Information Retrieval Evaluation</orgName>
								<address>
									<addrLine>December 13-17</addrLine>
									<postCode>2021</postCode>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Machine Learning Models for Hate Speech Identification in Marathi Language</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2B47E35CDB8E8375326B432286E7E1C5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Logistic Regression</term>
					<term>Random Forest Classifier</term>
					<term>TF-IDF Vectorizer</term>
					<term>Text Classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Hate speech content has become a significant issue in today's world. Hate speech detection is the automated task of identifying textual content that contains discriminatory language about a person or group based on who they are, such as their race, gender, or caste. In this paper, we discuss the models submitted by our team, Mind Benders, for Marathi subtask A of "Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC)" at the Forum for Information Retrieval Evaluation. Training and test datasets in the Marathi language, containing 1874 and 625 tweets respectively, were shared by the HASOC organizers. Using these datasets, we propose an approach to automatically classify the tweets into two categories: "NOT" (Non-Hate-Offensive) and "HOF" (Hate and Offensive). The classification models developed are then applied to the test dataset to predict the categories of the respective test data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The use of social media has increased in recent years. It plays a significant role in forming and shaping people's views on various issues. Users tend to send hateful and offensive messages to a person or community on social media platforms, leading to heated debates.</p><p>To make social networking sites a friendly knowledge-sharing environment, there is an acute need for an automated hate speech detection system.</p><p>Hate speech detection classifies tweets into two categories: hate speech and non-hate speech. Balancing the number of hate and non-hate tweets was the initial stage in developing our model. After preprocessing the data, we applied two classification approaches: Random Forest and Logistic Regression.</p><p>Random Forest is a supervised learning technique used for both classification and regression problems in machine learning. It builds decision trees on different samples and takes their majority vote for classification, or their average in the case of regression.</p><p>Logistic Regression is a machine learning approach used to forecast the likelihood of a target variable. It is a method for predicting a categorical dependent variable from a set of independent variables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Several studies on the automatic detection of hate speech and of offensive and non-offensive content have been published. Kulkarni et al. <ref type="bibr" target="#b0">[1]</ref> achieved the best accuracy using IndicBERT and a CNN with Indic fastText word embeddings; their dataset plays a crucial role in advancing NLP research for the Marathi language.</p><p>Aluru et al. <ref type="bibr" target="#b1">[2]</ref> worked on classification techniques for hate speech detection such as CNN-GRU, BERT, mBERT, and translation-based approaches. Pathak et al. <ref type="bibr" target="#b2">[3]</ref> applied a Support Vector Classifier, Multinomial Naive Bayes, Logistic Regression, a Random Forest Classifier, and n-gram models for text classification. Founta et al. <ref type="bibr" target="#b3">[4]</ref> worked on deep learning architectures such as a text classification network, a metadata network, a combination of the two classification paths, and trained combined networks. The related work shows that significant research has been done on detecting hate speech in many Indian languages.</p><p>The system developed by Khandelwal et al. <ref type="bibr" target="#b4">[5]</ref> is based on n-grams, CBOW, and reference tokens; it detects abusive language in English social media text. Another work, by Lakshmi B S et al. <ref type="bibr" target="#b5">[6]</ref>, detects offensive content in English and Kannada social media text. Sutejo et al. <ref type="bibr" target="#b6">[7]</ref> used word n-grams and Long Short-Term Memory (LSTM) deep learning to determine sentiment in the Indonesian language. Jiang et al. <ref type="bibr" target="#b7">[8]</ref> used two hate speech datasets published on Kaggle, containing 1000 uniquely labelled tweets, and applied multiple classifiers such as Logistic Regression and Support Vector Machines (SVMs) for classification.</p><p>Kovács et al. <ref type="bibr" target="#b8">[9]</ref> worked on text preprocessing methods and the cross-validation method used to train and evaluate models. Chaitanya et al. <ref type="bibr" target="#b9">[10]</ref> worked with the Natural Language Toolkit and Word2vec, combining the Continuous Bag-of-Words (CBOW) and Skip-Gram algorithms. Gaydhani et al. <ref type="bibr" target="#b10">[11]</ref> employed several techniques, such as SVM, Logistic Regression, and Naive Bayes, to classify tweets into offensive and non-offensive. Mandl et al. <ref type="bibr" target="#b11">[12]</ref> presented an overview of the tasks and results of the HASOC track at FIRE 2020.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Problem Definition</head><p>We propose a coarse-grained binary classification to classify tweets into two classes: Hate and Offensive (HOF) and Non-Hate-Offensive (NOT).</p><p>Non-Hate-Offensive (NOT) -the post does not contain any hate speech or profane, offensive content. Hate and Offensive (HOF) -the post contains hate, offensive, or profane content.</p><p>The best-performing features are obtained by extracting language-specific and language-independent characteristics of the given dataset. The approach applied for the classification of this text data is explained below. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Methodology</head><p>A supervised machine learning approach is used in the experimental work. While building the model, data preprocessing is a vital step. In NLP, the first step is to preprocess the data, i.e., to remove unnecessary noise from the textual content. This is followed by encoding the text into numeric vectors, since machine learning models need numeric input. This is done using encoding techniques such as Bag-of-Words, n-grams, TF-IDF, Word2Vec, etc. In our analysis, we implemented the TF-IDF feature extraction technique.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Data preprocessing</head><p>The data is usually in natural human form, i.e., sentences or paragraphs. Hence, before analysis, it needs to be transformed and cleaned so that the computer can process it. The preprocessing phase consists of the following steps:</p><p>• Removal of leading and trailing spaces: unnecessary whitespace at both ends of a line is removed using the Python strip() method. • Removal of irrelevant characters (numbers and punctuation): in our analysis, English letters and numbers, Marathi numbers, and punctuation are irrelevant, so they are removed to simplify the text content. • Removal of URLs and emojis: URLs and emojis are also needless in our analysis; hence they are removed from the text using regular expressions. • Removal of stopwords: a custom-made Marathi stopword list is defined for removing stopwords, which are commonly used words that carry no real value in the analysis.</p></div>
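The steps above can be sketched in Python as follows. This is a minimal illustration, not the exact pipeline from the paper: the regular expressions and the tiny five-word stopword list are assumptions made for the example (the paper uses a larger custom-made list).

```python
import re

# Illustrative stopword list only; the paper uses a larger custom-made one.
MARATHI_STOPWORDS = {"आणि", "आहे", "या", "तर", "की"}

def preprocess(text: str) -> str:
    """Clean one tweet following the steps listed above (a sketch)."""
    text = text.strip()                                  # leading/trailing spaces
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"[A-Za-z0-9]+", " ", text)            # English letters and digits
    text = re.sub(r"[०-९]+", " ", text)                  # Marathi (Devanagari) digits
    text = re.sub(r"[!\"#$%&'()*+,\-./:;<=>?@\[\]^_`{|}~।]+", " ", text)  # punctuation
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]+", " ", text)    # emojis
    tokens = [t for t in text.split() if t not in MARATHI_STOPWORDS]
    return " ".join(tokens)
```

The order matters: URLs are stripped before the English-character pass, otherwise the alphanumeric removal would shred the URL and leave its punctuation behind.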
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Features Extraction</head><p>For feature extraction, we applied the TF-IDF technique, which identifies the most important words. TF and IDF measure, respectively, the frequency of a word in a document and the uniqueness of the word across documents. To convert the sentences into vectors, the term frequency is multiplied by the inverse document frequency. This is done with scikit-learn, whose TF-IDF vectorizer is used to extract features from the documents. The result is a matrix of numeric values for the entire corpus.</p></div>
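The tf × idf product can be illustrated with a toy implementation using the classic weighting idf = log(N/df); note that scikit-learn's TfidfVectorizer additionally smooths the idf and L2-normalizes each row, so this sketch shows only the core idea.

```python
import math
from collections import Counter

def tfidf(docs):
    """Toy TF-IDF matrix: tf is the raw count of a term in a document,
    idf = log(N / df) where df counts documents containing the term."""
    n = len(docs)
    tokenized = [doc.split() for doc in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    vocab = sorted(df)
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows
```

A term that appears in every document gets idf = log(1) = 0 and thus zero weight, which is exactly why TF-IDF suppresses ubiquitous words and favors distinctive ones.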
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Classifier Models</head><p>Two classifier models were implemented: Logistic Regression and a Random Forest Classifier. The extracted feature set is used in the training phase. Around 70% of the observations from the training dataset are used for fitting the model, while the remaining portion is held out to make predictions and test the model's accuracy. We used classification accuracy to evaluate the performance of these classifiers. For this purpose, the parameter values with the best performance are found by varying the parameters of each classifier.</p></div>
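The 70/30 hold-out described above amounts to shuffling the labelled rows and cutting at 70%; a minimal sketch (the fraction and fixed seed are assumptions for illustration):

```python
import random

def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle labelled rows and split them ~70/30 for fitting vs. testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split reproducible
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]
```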
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Work</head><p>We developed a model for HASOC Marathi subtask A using a machine learning approach. Experiments were carried out with various classifier algorithms on the extracted feature set.</p><p>The classifier algorithms that we used for our experiments are as follows:</p><p>• Logistic Regression (LR): Linear regression uses a linear function to map input values to continuous values; the data is modeled using a straight line to predict the output of a variable. Logistic regression is similar, except that it predicts whether something is true or false instead of predicting continuous values. It is a supervised machine learning algorithm used to predict the probability of a target variable. The probability of the event of interest is represented as a function of a linear combination of predictor variables. It is used when the data is linearly separable and the output is binary or dichotomous in nature, which makes it suitable for binary classification problems. The target variable is divided into two classes: '1' for success/YES and '0' for failure/NO. Logistic regression's ability to provide probabilities and classify new samples using continuous and discrete measurements makes it a popular machine learning method. One big difference between linear regression and logistic regression is how the line is fit to the data. With linear regression, we fit the line using the least-squares method, i.e., we find the line that minimizes the sum of the squares of the residuals; we also use the residuals to calculate R² and to compare simple models to complicated models. Logistic regression does not have the same concept of a residual, so it cannot use the least-squares method. Instead, it uses maximum likelihood, whose goal is to find the optimal way to fit a distribution to the data. 
Instead of fitting a line to the data, logistic regression fits an "S"-shaped logistic function called the sigmoid function, which is used for classification. It maps any predicted value to a value between 0 and 1. LR uses the concept of a threshold value: if the predicted probability is above the threshold, the output tends to 1; below the threshold, it is 0. Logistic regression involves two hypotheses: the null hypothesis and the alternative hypothesis. We used the alternative hypothesis, under which the model's predictions differ significantly from the null. The output of the hypothesis depends on the estimated probability. The link function used in logistic regression is log(p/(1-p)), where p is the probability of success and 1-p the probability of failure; p must always be positive and less than or equal to 1. The quantity p/(1-p) is the odds. If the log-odds is positive, the probability of success is more than 50%; if it is negative, failure is more likely than success.</p><p>• Random Forest (RF): It is a machine learning algorithm used for classification and regression problems. Random forests are made of decision trees. Decision trees work well on the data used to create them, but they are not flexible when classifying new samples. Random forests combine the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy. A random forest builds several decision trees on various subsets of the given training dataset and provides output based on a majority vote. A decision tree consists of three components: a decision node, a leaf node, and a root node; the tree divides the training dataset into branches and further separates them into other branches. The difference between a decision tree and a random forest is that the former makes its prediction from a single tree, while the latter combines many decision trees. We used a bagging method for prediction known as bootstrap aggregation, the ensemble technique used in random forests. It involves using different samples of the data rather than one. The training dataset consists of observations and features that are used for prediction; each tree produces a different output depending on its training data. The final output is based on majority voting, and the collection of these outputs is called aggregation. We also tuned hyper-parameters, which help increase the predictive power of the model. One such hyper-parameter is n_estimators, the number of trees the algorithm builds before taking the majority vote or average prediction. As the number of trees increases, the model's performance improves and its predictions become more stable.</p></div>
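The aggregation step described above — each tree votes, and the forest returns the majority label — can be sketched as follows. The three stump-like "trees" and their feature names are hypothetical stand-ins, not the trees learned from the task data:

```python
from collections import Counter

def forest_predict(trees, sample):
    """Majority vote over the individual trees' predictions
    (the aggregation step of bagging)."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical stump-like "trees", each fit on a different bootstrap sample:
trees = [
    lambda s: "HOF" if s["has_profanity"] else "NOT",
    lambda s: "HOF" if s["tfidf_score"] > 0.5 else "NOT",
    lambda s: "NOT",
]
```

With `n_estimators` trees, the list simply grows longer; the vote over more (decorrelated) trees is what stabilizes the prediction.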
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>For better model performance, we used 70 percent of the training data for training the model; the remainder was used for testing. Table <ref type="table" target="#tab_1">2</ref> and Table <ref type="table" target="#tab_2">3</ref> show the Precision, Recall, F1, and Accuracy scores for Logistic Regression and Random Forest. Precision is defined as the ratio tp/(tp + fp), where tp is the number of true positives and fp the number of false positives; it measures the ability of the classifier not to label as positive a sample that is negative. The F1 score, also known as the balanced F-score or F-measure, is the harmonic mean of Precision and Recall; it reaches its best value at 1 and its worst at 0, and Precision and Recall contribute to it equally. The F1 score is defined as F1 = 2 · (precision · recall)/(precision + recall). Recall helps when the cost of false negatives is high. Recall is the ratio tp/(tp + fn), where tp is the number of true positives and fn the number of false negatives; it measures the ability of the classifier to find all the positive samples. The accuracy score tells us immediately whether a model is being trained correctly and how it may perform in general; it is simply the ratio of correctly predicted observations to the total observations. Logistic Regression has an F1 score of 0.84 for the non-offensive text and 0.54 for the hate-offensive text, with an accuracy of 0.7595. Random Forest has an F1 score of 0.83 for the non-offensive text and 0.67 for the hate-offensive text; this classifier gives an accuracy of 0.7770.</p></div>
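The formulas above can be checked in a few lines of Python; the tiny label lists in the usage example are made-up illustrations, not the task data:

```python
def prf1(y_true, y_pred, positive="HOF"):
    """Precision, recall and F1 for one class, per the formulas above."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```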
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>Hate speech continues to be a social media problem. This paper presents the experimental work and results of HASOC Marathi subtask A by Team Mind Benders. It proposes a solution for detecting Marathi hate speech and offensive content in a Twitter dataset through supervised machine learning approaches, namely Logistic Regression (LR) and Random Forest (RF). We performed an analysis of LR and RF on various sets of feature values and model parameters. For the identification of critical features in the data, we used the TF-IDF feature extraction technique. The results showed that Random Forest performs comparatively better than the Logistic Regression approach: we achieved a reasonable accuracy of 0.77 using the Random Forest classifier. Given all the challenges that remain, more research on this problem is needed.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Architectural View</figDesc><graphic coords="4,89.29,84.19,416.69,234.39" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Training and Test Dataset Statistics for the Marathi language</figDesc><table><row><cell cols="3">Language used Type of dataset Type of tweet</cell><cell>%</cell><cell>Total</cell></row><row><cell>Marathi</cell><cell>Training</cell><cell>HOF</cell><cell cols="2">1204 (64.27%) 1874</cell></row><row><cell>Marathi</cell><cell>Training</cell><cell>NOT</cell><cell>670 (35.73%)</cell><cell></cell></row><row><cell>Marathi</cell><cell>Test</cell><cell>HOF</cell><cell>Not known</cell><cell>625</cell></row><row><cell>Marathi</cell><cell>Test</cell><cell>NOT</cell><cell>Not known</cell><cell></cell></row><row><cell>3.1. Datasets</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note>We have chosen the task of identifying offensive and non-offensive content in the Marathi dataset released for the HASOC shared task, as discussed above, consisting of CSV files of comments. All given comments are in Marathi. The training dataset has columns named Text ID, Text, and Label. The Label column holds the value HOF, indicating offensive text, or NOT, indicating non-offensive text. The number of comments in the file is around 1874. This training dataset is used to carry out the experimental work of training the machine by applying appropriate machine learning algorithms. The test dataset has only two columns, Text ID and Text; the third column, Label, is missing. After training in the first phase, the machine learning algorithms have to predict the labels of the respective tweets. Approximately 625 comments are available in this dataset. Gaikwad et al.<ref type="bibr" target="#b12">[13]</ref> worked on a model for the Marathi language that described the task's data. Modha et al.<ref type="bibr" target="#b13">[14]</ref> have given an overview of the results and findings of HASOC 2021. 
Table 1 represents the statistical data about the Training and Test Datasets for the Marathi language.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results of Logistic Regression</figDesc><table><row><cell></cell><cell cols="4">Precision Recall F1 score Support</cell></row><row><cell>HOF</cell><cell>0.85</cell><cell>0.39</cell><cell>0.54</cell><cell>224</cell></row><row><cell>NOT</cell><cell>0.74</cell><cell>0.96</cell><cell>0.84</cell><cell>404</cell></row><row><cell>Accuracy</cell><cell></cell><cell></cell><cell>0.76</cell><cell>628</cell></row><row><cell>Macro avg</cell><cell>0.80</cell><cell>0.68</cell><cell>0.69</cell><cell>628</cell></row><row><cell>Weighted avg</cell><cell>0.78</cell><cell>0.76</cell><cell>0.73</cell><cell>628</cell></row><row><cell cols="4">Logistic Regression, Accuracy Score: 75.955%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Results of Random Forest</figDesc><table><row><cell></cell><cell cols="4">Precision Recall F1 score Support</cell></row><row><cell>HOF</cell><cell>0.70</cell><cell>0.65</cell><cell>0.67</cell><cell>224</cell></row><row><cell>NOT</cell><cell>0.81</cell><cell>0.85</cell><cell>0.83</cell><cell>404</cell></row><row><cell>Accuracy</cell><cell></cell><cell></cell><cell>0.78</cell><cell>628</cell></row><row><cell>Macro avg</cell><cell>0.76</cell><cell>0.75</cell><cell>0.75</cell><cell>628</cell></row><row><cell>Weighted avg</cell><cell>0.77</cell><cell>0.78</cell><cell>0.77</cell><cell>628</cell></row><row><cell cols="2">Random Forest, Accuracy Score: 77.707%</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Kulkarni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mandhane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Likhitkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kshirsagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Joshi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2103.11408</idno>
		<title level="m">L3cubemahasent: A marathi tweet-based sentiment analysis dataset</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Aluru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mathew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.06465</idno>
		<title level="m">Deep learning models for multilingual hate speech detection</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Pathak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mundada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Joshi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2102.09866</idno>
		<title level="m">Kbcnmujal@ hasoc-dravidian-codemix-fire2020: Using machine learning for detection of hate speech and offensive code-mixed social media text</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A unified deep learning architecture for abuse detection</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Founta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chatzakou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kourtellis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Blackburn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vakali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Leontiadis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM conference on web science</title>
				<meeting>the 10th ACM conference on web science</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="105" to="114" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Gender prediction in english-hindi code-mixed social media content: Corpus and baseline system</title>
		<author>
			<persName><forename type="first">A</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Swami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Akhtar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shrivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computación y Sistemas</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="1241" to="1247" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">An automatic language identification system for codemixed english-kannada social media text</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Lakshmi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Shambhavi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), IEEE</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Indonesia hate speech detection using deep learning</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Sutejo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Lestari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2018 International Conference on Asian Language Processing (IALP), IEEE</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="39" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Detecting hate speech from tweets for sentiment analysis</title>
		<author>
			<persName><forename type="first">L</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Suzuki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2019 6th International Conference on Systems and Informatics (ICSAI), IEEE</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="671" to="676" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Challenges of hate speech detection in social media</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kovács</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Saini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SN Computer Science</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="15" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Word level language identification in code-mixed data using word embedding methods for indian languages</title>
		<author>
			<persName><forename type="first">I</forename><surname>Chaitanya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Madapakula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thara</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Advances in Computing, Communications and Informatics (ICACCI)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1137" to="1141" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Gaydhani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Doma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kendre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bhagwat</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.08651</idno>
		<title level="m">Detecting hate speech and offensive language on Twitter using machine learning: An n-gram and TF-IDF based approach</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar M</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Forum for Information Retrieval Evaluation</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Cross-lingual offensive language identification for low resource languages: The case of Marathi</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gaikwad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Homan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of RANLP</title>
				<meeting>RANLP</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages and conversational hate speech</title>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Madhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satapara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event</title>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021-12">December 2021</date>
			<biblScope unit="page" from="13" to="17" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
