<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">SVM for Hate Speech and Offensive Content Detection</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Shyam</forename><surname>Ratan</surname></persName>
							<email>shyamratan2907@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Linguistics</orgName>
								<orgName type="institution">Dr. Bhimrao Ambedkar University</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sonal</forename><surname>Sinha</surname></persName>
							<email>sonalsinha2612@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Linguistics</orgName>
								<orgName type="institution">Dr. Bhimrao Ambedkar University</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Siddharth</forename><surname>Singh</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Centre for Transdisciplinary Studies</orgName>
								<orgName type="institution">Dr. Bhimrao Ambedkar University</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">SVM for Hate Speech and Offensive Content Detection</title>
					</analytic>
					<monogr>
						<meeting>Forum for Information Retrieval Evaluation<address><country key="IN">India</country></address></meeting>
						<imprint>
							<date when="2021">December 13-17, 2021</date>
						</imprint>
					</monogr>
					<idno type="MD5">95EB349A0F07F147F7077DD114ECF4FB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>English</term>
					<term>Hindi</term>
					<term>SVM</term>
					<term>Hate Speech</term>
					<term>Offensive Language</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the system description of S_cube, which was submitted to the FIRE Shared Task 2021 on Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC). Our team submitted a system for Subtask 1 in two languages, English and Hindi. Subtask 1 has two segments, Subtask 1A and Subtask 1B, for both languages. We experimented with classic machine learning using a Support Vector Machine (SVM). In this paper we discuss the system, its results, and the main findings for hate speech and offensive content identification. Our model achieves a macro F1 score of 0.7563 on English Subtask 1A, while the performance is lower for Hindi Subtask 1A (0.7195 F1).</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Communication on the internet has become faster than ever through social media platforms such as Facebook, Twitter, WhatsApp, Viber, Telegram and many more. The concern now is to check what kind of information and speech users are spreading, so that these platforms do not become hotbeds of hate speech and offensive content. A robust automatic filtering system is therefore required to remove such malicious content. Hate speech and offensive content range over several issues, such as politics, religion, colour, gender, caste and ethnicity, and hold the potential to polarise society <ref type="bibr" target="#b0">[1]</ref>. Anonymity and fake accounts on social media are major factors that make bullying easy and let hate speech and offensive language spread rapidly.</p><p>Prominent efforts have been made to develop systems that secure these platforms (notably <ref type="bibr" target="#b1">[2]</ref>, <ref type="bibr" target="#b2">[3]</ref>, <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b4">[5]</ref>, <ref type="bibr" target="#b5">[6]</ref>, <ref type="bibr" target="#b6">[7]</ref>, <ref type="bibr" target="#b7">[8]</ref>). In addition, many shared tasks are regularly organised to raise awareness and to produce practical outcomes for the automatic detection of hate speech, aggression and offensive content <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b9">[10]</ref>, <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b11">[12]</ref>, <ref type="bibr" target="#b12">[13]</ref>, <ref type="bibr" target="#b13">[14]</ref>.</p><p>One such effort is the FIRE 2021 shared task on Hate Speech and Offensive Content Detection in Indo-European Languages (HASOC 2021). In this paper, as part of the shared task, we describe an SVM-based system for automatic hate speech and offensive content identification and its development for both segments of Subtask 1 in two languages, Hindi and English. The remainder of the paper is divided into four sections. Section 2 describes the size and types of the corpus used for training and testing. Section 3 gives a detailed account of the experiments conducted for this task. Section 4 presents the results of the developed system and an error analysis with classified error types. Finally, Section 5 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Dataset</head><p>To conduct the experiments for the identification and classification of hate speech and offensive language, we used the annotated Twitter corpus for Hindi and English that was shared in the FIRE Shared Task HASOC 2021 <ref type="bibr" target="#b14">[15]</ref>. The size of the shared task corpus is given in Table <ref type="table" target="#tab_0">1</ref>. The corpus is labelled at two levels, which were presented as two segments of Subtask 1 for Hindi and English <ref type="bibr" target="#b15">[16]</ref>:</p><p>1. Subtask 1A: The corpus is annotated as HOF and NOT. HOF covers hate speech, offensive language and profanity, while NOT marks content that is neither hateful nor offensive. Hence it is a binary classification task.</p><p>2. Subtask 1B: This segment offers a fine-grained classification of hate speech and offensive language. Content marked HOF in Subtask 1A is further marked as Hate Speech (HATE), Offensive (OFFN) or Profanity (PRFN) at this stage. Hence it is a three-class classification task. Content marked NOT in Subtask 1A carries the label NONE in the Subtask 1B data (Table <ref type="table" target="#tab_0">1</ref>).</p></div>
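<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relation between the two label levels can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from Table 1, while the names TRAIN_1B, class_shares and hof_count are hypothetical helpers introduced here and are not part of the submitted system.</p>

```python
# Training-set label counts copied from Table 1 (Subtask 1B view).
# TRAIN_1B, class_shares and hof_count are illustrative names only.
TRAIN_1B = {
    "EN": {"HATE": 683, "OFFN": 622, "PRFN": 1196, "NONE": 1342},
    "HI": {"HATE": 566, "OFFN": 654, "PRFN": 213, "NONE": 3161},
}


def class_shares(counts):
    """Return each label's share of the training set as a fraction."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def hof_count(counts):
    """Subtask 1A HOF size: every Subtask 1B label except NONE."""
    return sum(n for label, n in counts.items() if label != "NONE")


for lang, counts in TRAIN_1B.items():
    shares = class_shares(counts)
    summary = ", ".join(f"{label} {share:.1%}" for label, share in shares.items())
    print(f"{lang}: HOF total {hof_count(counts)}; {summary}")
```

<p>With these counts, PRFN and NONE together account for about 66% of the English training data, while NONE alone makes up about 69% of the Hindi training data.</p></div>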
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments with SVM</head><p>We mainly experimented with an SVM classifier for Subtask 1A and 1B on the Hindi and English corpora. We used the scikit-learn implementation of SVM (<ref type="bibr" target="#b16">[17]</ref>, <ref type="bibr" target="#b17">[18]</ref> as cited in <ref type="bibr" target="#b0">[1]</ref>). Support Vector Machines (SVMs) <ref type="bibr" target="#b18">[19]</ref> are among the most efficient classic machine learning models for different kinds of text classification tasks. We experimented with binary and three-class problems, with the basic objective of exploring the efficiency and productivity of SVMs for the detection of hate speech and offensive content.</p><p>For our system, we experimented with SVM for both segments of Subtask 1 using the feature sets listed below (items 1, 2 and 3) and different C-values (0.001, 0.01, 0.1, 1, 5, 10) to find the best model. The n-gram features of our best classifier in both languages are given in Table <ref type="table" target="#tab_1">2</ref>. These combinations of word n-grams and character n-grams were selected because they gave the best system performance for Subtask 1A and Subtask 1B.</p><p>1. Character n-gram features (trigrams to five-grams).</p><p>2. Word n-gram features (unigrams, bigrams and trigrams).</p><p>3. A systematic combination of different character n-gram and word n-gram features.</p><p>From these experiments we find that, on this particular dataset, character five-grams and word trigrams with a C-value of 10 give the best performance for English Subtask 1A. For Hindi Subtask 1A, character four-grams and word bigrams with a C-value of 5 give the best performance. For English Subtask 1B, character four-grams and word trigrams with a C-value of 5 give the best performance. For Hindi Subtask 1B, character four-grams and word unigrams with a C-value of 10 give the best performance. Overall, the combination of character n-grams and word n-grams performed better for Subtask 1A in Hindi and English than for Subtask 1B. However, the improvement in scores across feature sets was very small in all subtasks. Word n-gram features are effective for Subtask 1A, which is a binary classification task, but they are not very helpful for the three-class classification of Subtask 1B.</p></div>
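<div xmlns="http://www.tei-c.org/ns/1.0"><p>A search of this kind could be set up in scikit-learn roughly as follows. This is a minimal sketch rather than the submitted system: the paper states only that the scikit-learn SVM was used with character and word n-grams and the listed C-values, so the TF-IDF weighting, the linear kernel (LinearSVC), the cross-validated grid search and the macro F1 selection criterion are all assumptions, and build_search is a hypothetical helper name.</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# C-values reported in the paper.
C_VALUES = [0.001, 0.01, 0.1, 1, 5, 10]


def build_search(char_range=(3, 5), word_range=(1, 3), cv=5):
    """Grid search over C for an SVM on combined char and word n-grams.

    Assumptions (not stated in the paper): TF-IDF weighting, a linear
    kernel, and model selection by cross-validated macro F1.
    """
    features = FeatureUnion([
        ("char", TfidfVectorizer(analyzer="char", ngram_range=char_range)),
        ("word", TfidfVectorizer(analyzer="word", ngram_range=word_range)),
    ])
    pipeline = Pipeline([("features", features), ("svm", LinearSVC())])
    return GridSearchCV(pipeline, {"svm__C": C_VALUES}, scoring="f1_macro", cv=cv)
```

<p>Calling build_search().fit(texts, labels) on a list of tweets and their labels would then expose the selected C-value as best_params_["svm__C"]. Restricting char_range or word_range to a single order, for example (4, 4), corresponds to testing an individual n-gram size.</p></div>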
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results and Error Analysis</head><p>Overall, our system performed best on the test set in English Subtask 1A, compared with Hindi Subtask 1A and with Subtask 1B in both languages. The macro F1 scores for all segments of Subtask 1 are presented in Table <ref type="table" target="#tab_2">3</ref>. Our SVM classifier was placed 34th in English Subtask 1A (with a macro F1 score 8 points below that of the top team), 25th in English Subtask 1B (almost 9 points below the best-performing team), 29th in Hindi Subtask 1A (7 points below the macro F1 score of the best team) and 15th in Hindi Subtask 1B (almost 11 points below the top team). The performance of our classifier and of the best classifier in all segments of Subtask 1 is summarised in Figure <ref type="figure" target="#fig_0">1</ref>. Beyond the scores, an analysis of the prediction errors on the test data is equally important. The system we developed has high precision and low recall in all Subtasks for Hindi and English. Comparing the predicted labels in Subtask 1A across both languages, our system predicted the HOF class far more often in English Subtask 1A (see Figure <ref type="figure" target="#fig_1">2</ref>) than in Hindi Subtask 1A (see Figure <ref type="figure">3</ref>), while the opposite holds for the NOT class. These differences across classes follow the class distribution of the training data: in Subtask 1A, HOF is the larger class in English, whereas NOT is the larger class in Hindi (Table <ref type="table" target="#tab_0">1</ref>).</p><p>Likewise, in the three-class classification of Subtask 1B, the system predicted PRFN and NONE better than HATE and OFFN in English (see Figure <ref type="figure" target="#fig_2">4</ref>). In Hindi (see Figure <ref type="figure">5</ref>), the NONE class is predicted far more often than the other classes (PRFN, OFFN and HATE). The same trend in the training data holds here: PRFN and NONE together make up about 66% of the English training data, whereas in Hindi PRFN is the smallest class and OFFN and HATE are also far smaller than NONE. Another reason for the lower performance of the system in the different Subtasks is the structure and morphology of the two languages; Hindi in particular is structurally more complex and morphologically rich. In the error analysis, we classified errors into types on the basis of the gold labels and the labels predicted by the system in both languages. These types are satire/sarcasm, slogans, coined and aggressive lexical items, idiomatic expressions, quotes and code-mixed data. These error types, which surface as lexical features in all segments of Subtask 1, are illustrated in Table <ref type="table" target="#tab_3">4</ref>.</p></div>
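<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because the discussion above relies on per-class precision and recall and on the macro-averaged F1 score, a short worked definition may help. The sketch below computes these quantities from gold and predicted labels in plain Python; the function names and the toy labels in the usage note are illustrative and do not come from the shared task data.</p>

```python
def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for every label seen."""
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(gold, predicted)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

<p>For a toy case with gold labels HOF, HOF, NOT, NOT and predictions HOF, NOT, NOT, NOT, the HOF class has precision 1.0 but recall 0.5, which is the high-precision, low-recall pattern described above. Since the macro average weights every class equally, small classes such as PRFN in Hindi influence the score as much as large ones.</p></div>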
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This paper gives a detailed description of the S_Cube system, which was developed for HASOC at FIRE 2021. The results of the experiments show that SVM performs better on the binary classification task of Subtask 1A, even with an uneven corpus, than on the three-class classification task of Subtask 1B. SVM achieves high precision but low recall for all Subtasks in both languages. We also observed that the lower performance in Subtask 1B could be broadly ascribed to the uneven corpus and the lack of ample training samples for the different classes. Phenomena such as satire, slogans, idioms, quotes and code-mixed data also contribute to the system's errors. Therefore, a more balanced corpus with a substantial number of training samples for each class could give better results in these cases.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Performance of SVM classifier vis-a-vis the best classifier in Subtask 1</figDesc><graphic coords="4,89.29,150.33,416.69,184.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Confusion Matrix for English sub-task 1A Figure 3: Confusion Matrix for Hindi sub-task 1A</figDesc><graphic coords="4,90.20,450.78,200.01,150.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Confusion Matrix for English Subtask 1B Figure 5: Confusion Matrix for Hindi Subtask 1B</figDesc><graphic coords="5,90.20,285.42,200.01,150.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>The HASOC Dataset</figDesc><table><row><cell></cell><cell cols="3">Train Sub-task 1A</cell><cell cols="5">Train Sub-task 1B</cell><cell>Test Set</cell></row><row><cell></cell><cell>TOTAL</cell><cell>HOF</cell><cell>NOT</cell><cell>TOTAL</cell><cell>HATE</cell><cell>OFFN</cell><cell>PRFN</cell><cell>NONE</cell><cell>TOTAL</cell></row><row><cell>EN</cell><cell>3,843</cell><cell>2,501</cell><cell>1,342</cell><cell>3,843</cell><cell>683</cell><cell>622</cell><cell>1,196</cell><cell>1,342</cell><cell>1,281</cell></row><row><cell>HI</cell><cell>4,594</cell><cell>1,433</cell><cell>3,161</cell><cell>4,594</cell><cell>566</cell><cell>654</cell><cell>213</cell><cell>3,161</cell><cell>1,532</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Comparison of character and word n-gram features for best SVM classifier</figDesc><table><row><cell></cell><cell cols="2">Sub-task 1A</cell><cell cols="2">Sub-task 1B</cell></row><row><cell></cell><cell>EN</cell><cell>HI</cell><cell>EN</cell><cell>HI</cell></row><row><cell>Character n-grams</cell><cell>4, 5</cell><cell>4</cell><cell>4</cell><cell>3, 4, 5</cell></row><row><cell>Word n-grams</cell><cell>1, 2, 3</cell><cell>1, 2, 3</cell><cell>3</cell><cell>1</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Macro F1 Score of Subtask 1</figDesc><table><row><cell>Subtask</cell><cell>Hindi Macro F1 Score</cell><cell>English Macro F1 Score</cell></row><row><cell>Subtask 1A</cell><cell>0.7195</cell><cell>0.7563</cell></row><row><cell>Subtask 1B</cell><cell>0.4513</cell><cell>0.5739</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Error classification with types</figDesc><table><row><cell>Error Types</cell><cell>Hindi &amp; English Examples</cell><cell>Translation &amp; Explanation</cell><cell>Gold Label -Predicted Label</cell></row><row><cell>Satire / Sarcasm</cell><cell>1. vodaafon ne ek kuttaa paalaa thaa bhut fems huaa fir mukesh anbaani ko shauk kdha. 2. Kangana did a terrible mistake of pointing the mistakes of supreme leader !! Betrayal &amp; amp</cell><cell>1. Vodafone had raised a dog, it became very famous then Mukesh Ambani was fond of it.</cell><cell>NOT -HOF, OFFN -HATE</cell></row><row><cell>Slogan</cell><cell>srkaar maun jntaa preshaan</cell><cell>The government is silent, the public is upset</cell><cell>NONE -HATE</cell></row><row><cell>Code-mixed data</cell><cell>1. fattu hain bjp vaale. 2. In this may day we want a new fresh govt. not like this feku govt.</cell><cell>1. People of BJP (Bhartiya Janta Party) are cowards. 2. We want a new government in this May, not like this government who makes false promises</cell><cell>HOF -NOT, HATE -NONE</cell></row><row><cell>Aggressive Lexical Items</cell><cell>aashutos tu vaakyi gadhaa hai</cell><cell>Ashutosh, you are an actual donkey</cell><cell>OFFN -NONE</cell></row><row><cell>Idiomatic Expressions</cell><cell>jaisi krni vaisi bhrni</cell><cell>As you sow, so you shall reap</cell><cell>NOT -HOF</cell></row><row><cell>Coined Lexical Items</cell><cell>1. godi midiyaa 2. nmaajvaadi paarti</cell><cell>1. Lapdog media. 2. It is a socialist political party which is inclined towards Muslims</cell><cell>NONE -OFFN</cell></row><row><cell>Famous Quotes</cell><cell>Old lions in the wild lay down and die with dignity when they can't hunt anymore.</cell><cell>This quote is used for supreme political leader of BJP.</cell><cell>HATE -PRFN</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Kmi-panlingua at HASOC 2019: SVM vs BERT for hate speech and offensive content detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2517/T3-14.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2019 -Forum for Information Retrieval Evaluation</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">P</forename><surname>Mehta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</editor>
		<meeting><address><addrLine>Kolkata, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">December 12-15, 2019</date>
			<biblScope unit="volume">2517</biblScope>
			<biblScope unit="page" from="285" to="292" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Predicting the type and target of offensive posts in social media</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Farra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technology (NAACL-HLT)</title>
				<meeting>the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technology (NAACL-HLT)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An evaluation of multilingual offensive language identification methods for the languages of India</title>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<idno type="DOI">10.3390/info12080306</idno>
		<ptr target="https://www.mdpi.com/2078-2489/12/8/306" />
	</analytic>
	<monogr>
		<title level="j">Information</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Challenges in discriminating profanity from hate speech</title>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Experimental &amp; Theoretical Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Understanding abuse: A typology of abusive language detection subtasks</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Waseem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
		<ptr target="http://aclweb.org/anthology/W17-3012" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Abusive Language Online, Association for Computational Linguistics</title>
				<meeting>the First Workshop on Abusive Language Online, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="78" to="84" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automated hate speech detection and the problem of offensive language</title>
		<author>
			<persName><forename type="first">T</forename><surname>Davidson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warmsley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Macy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICWSM</title>
				<meeting>ICWSM</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Aggression-annotated corpus of Hindi-English code-mixed data</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Reganti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhatia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Maheshwari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cieri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Goggi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Hasida</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Isahara</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Tokunaga</surname></persName>
		</editor>
		<meeting>the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Developing a multilingual annotated corpus of misogyny and aggression</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bhattacharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bhagat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dawer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lahiri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, European Language Resources Association (ELRA)</title>
				<meeting>the Second Workshop on Trolling, Aggression and Cyberbullying, European Language Resources Association (ELRA)<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="158" to="168" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Benchmarking aggression identification in social media</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/W18-4401" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)</title>
				<meeting>the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Evaluating aggression identification in social media</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying</title>
				<meeting>the Second Workshop on Trolling, Aggression and Cyberbullying<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association (ELRA)</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Farra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval)</title>
				<meeting>The 13th International Workshop on Semantic Evaluation (SemEval)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mandlia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Patel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation</title>
				<meeting>the 11th annual meeting of the Forum for Information Retrieval Evaluation</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jaiswal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nandini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Patel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<title level="m">Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2020-12">December 2020</date>
			<biblScope unit="page" from="16" to="20" />
		</imprint>
	</monogr>
	<note>FIRE 2020: Forum for Information Retrieval Evaluation, Virtual Event</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Anandkumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<title level="m">Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, English and German</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech</title>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Madhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satapara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021-12">December 2021</date>
			<biblScope unit="page" from="13" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Madhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satapara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ranasinghe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nandini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename></persName>
		</author>
		<ptr target="http://ceur-ws.org/" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">API design for machine learning software: experiences from the scikit-learn project</title>
		<author>
			<persName><forename type="first">L</forename><surname>Buitinck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Louppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mueller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Niculae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Grobler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Layton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Holt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECML PKDD Workshop: Languages for Data Mining and Machine Learning</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="108" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Support vector machines</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hearst</surname></persName>
		</author>
		<idno type="DOI">10.1109/5254.708428</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Intelligent Systems</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="18" to="28" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
