<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mangalore-University@INLI-FIRE-2017: Indian Native Language Identification using Support Vector Machines and Ensemble Approach</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Hamada</forename><forename type="middle">A</forename><surname>Nayel</surname></persName>
							<email>hamada.ali@fci.bu.edu.eg</email>
						</author>
						<author>
							<persName><forename type="first">H</forename><forename type="middle">L</forename><surname>Shashirekha</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<postCode>574199</postCode>
									<settlement>Mangalore</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">Benha University</orgName>
								<address>
									<postCode>-13518</postCode>
									<settlement>Benha</settlement>
									<country key="EG">Egypt</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Mangalore University</orgName>
								<address>
									<postCode>574199</postCode>
									<settlement>Mangalore, Karnataka</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Mangalore-University@INLI-FIRE-2017: Indian Native Language Identification using Support Vector Machines and Ensemble Approach</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B500E0517605B543CA496D62A9B7ED68</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Support Vector Machines</term>
					<term>Ensemble Learning</term>
					<term>Native Languages Identification</term>
					<term>Word Vector Space</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the systems submitted by our team for Indian Native Language Identification (INLI) task held in conjunction with FIRE 2017. Native Language Identification (NLI) is an important task that has different applications in different areas such as social-media analysis, authorship identification, second language acquisition and forensic investigation. We submitted two systems using Support Vector Machine (SVM) and Ensemble Classifier based on three different classifiers representing the comments (data) as vector space model for both systems and achieved accuracy of 47.60% and 47.30% respectively and secured second rank over all submissions for the task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Native Language Identification (NLI) aims at identifying the native language (L1) of users writing in a later learned language (L2). NLI is an important task with many applications in areas such as social-media analysis, authorship identification, second language acquisition and forensic investigation. In forensic analysis <ref type="bibr" target="#b6">[7]</ref>, NLI helps to glean information about the discriminating L1 cues in an anonymous text. Second Language Acquisition (SLA) <ref type="bibr" target="#b11">[12]</ref> studies the transfer effects of the native language on a later learned language. In education, automatic correction of grammatical errors is an important application of NLI <ref type="bibr" target="#b13">[14]</ref>. NLI can also be used as a feature in the authorship identification task <ref type="bibr" target="#b5">[6]</ref>, which aims at assigning a text to one of a predefined list of authors. Authorship identification is used in the investigation of terrorist communications <ref type="bibr" target="#b0">[1]</ref> and digital crimes <ref type="bibr" target="#b3">[4]</ref>.</p><p>Supervised approaches using machine learning algorithms have been used for NLI by many researchers. Jarvis et al. <ref type="bibr" target="#b8">[9]</ref> used the SVM classification algorithm to create a model for NLI and reported an accuracy of 83.6%, using features such as n-grams of words, Part-of-Speech (PoS) tags and lemmas. Combining multiple classifiers to enhance the final output, as in an ensemble classifier, was used for NLI by Tetreault et al. <ref type="bibr" target="#b14">[15]</ref>. Bykh and Meurers <ref type="bibr" target="#b2">[3]</ref> applied a tuned and optimized ensemble classifier to the NLI 2013 shared task dataset and achieved an accuracy of 84.82%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">TASK DESCRIPTION</head><p>Given a comment I =&lt;w 1 ,w 2 , . . . ,w N &gt; of an individual social media user, where each w i , i = 1..N , is either an English word or a word of a native language written in English (i.e., transliterated into English), the objective of the task is to identify the native language of the user. A comment may include English words in addition to words of any one native language written in English. The task considers six Indian languages, namely Tamil (TA), Hindi (HI), Kannada (KA), Malayalam (MA), Bengali (BE) and Telugu (TE). Considering the languages as a set of classes C = {T A,HI ,KA,MA,BE,T E} and the comments as individual instances I = {I 1 , I 2 , . . . , I n }, we formulated the task as a classification problem that assigns one of the six predefined classes of C to a new unlabelled instance I u .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">DATASET</head><p>The data sets provided for this task are a collection of comments from different regional newspapers' Facebook pages from April 2017 to July 2017. The training and test sets contain 1233 and 783 files respectively. Each training and test file consists of a set of comments. Table <ref type="table" target="#tab_0">1</ref> shows brief statistics of the training set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">SYSTEM DESCRIPTION</head><p>In this section, we describe the two systems submitted for the Indian Native Language Identification (INLI) <ref type="bibr" target="#b9">[10]</ref> task. The general framework of the classifier for both systems is shown in figure <ref type="figure" target="#fig_0">1</ref>. The first phase of our systems is data preprocessing, also known as corpus cleaning; in this phase we exclude non-informative tokens and phrases. The second phase comprises constructing a vector space model for the comments (input data). These two phases are common to both systems. The next phase is creating a model using a machine learning algorithm: Support Vector Machine (SVM) and Ensemble learning are used for the first and second submissions respectively. Details of each phase are given below. In the preprocessing phase, we tokenized each comment I j into a set of words or tokens and removed uninformative tokens as follows to obtain a bag of tokens:-</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Emoji removal</head><p>An emoji is a small image used as a visual representation of emotion. The first step in removing unrelated information is to remove emojis, as they are not important for identifying the native language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Special characters and digits</head><p>Digits and special characters such as #, %, ... appear frequently in the comments of all the languages. As such characters do not contribute to the identification of the native language, they are removed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Modified stop words</head><p>Stop words are words which appear frequently and do not contribute to the identification of the native language. Hence, to remove them we used the union of different stop word lists, namely: (1) the stop word list from the nltk.corpus 1 package,</p><p>(2) the stop word list from the stop_words 2 package,</p><p>and (3) manually written stop words. (The complete list of manually written stop words is given in Appendix A.) 1 www.nltk.org/nltk_data/ 2 pypi.python.org/pypi/stop-words</p></div>
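The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' exact code: the stop-word set here is a small stand-in for the union of the NLTK, stop_words-package and manual lists, and emoji removal is approximated by dropping non-ASCII characters.

```python
import re

# Stand-in for the union of the NLTK, stop_words and manual lists
# (illustrative subset; the paper uses the full union).
STOP_WORDS = {"a", "the", "is", "and", "to", "of", "in"}

def preprocess(comment: str) -> list[str]:
    """Tokenize a comment and drop uninformative tokens."""
    # Emoji removal, approximated by dropping non-ASCII symbols.
    text = comment.encode("ascii", errors="ignore").decode()
    # Remove digits and special characters such as #, %.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Tokenize on whitespace and drop stop words.
    return [t for t in text.lower().split() if t not in STOP_WORDS]

print(preprocess("Super movie!! 100% #blockbuster 😀 the hero is great"))
# → ['super', 'movie', 'blockbuster', 'hero', 'great']
```

The surviving tokens form the bag of tokens passed to the vector space model.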
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Constructing Vector Space Model</head><p>After preprocessing, each comment is represented in a vector space model. If &lt;t 1 ,t 2 , . . . ,t k &gt; are the unique tokens/terms in a comment I j , the vector space model of I j is represented as &lt;w j1 ,w j2 , . . . ,w jk &gt;, where w ji is the weight of the token/term t i in comment I j . For term weights, we used Term Frequency/Inverse Document Frequency (TF/IDF), calculated as follows:-</p><formula xml:id="formula_0">w ji = tf ji × log((N + 1)/(df i + 1))</formula><p>where tf ji is the number of occurrences of term t i in comment I j , df i is the number of comments in which the token/term t i occurs, and N is the total number of comments.</p></div>
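The smoothed TF/IDF weighting above can be computed directly. A small sketch, assuming comments are already preprocessed into token lists; the toy documents are invented for illustration.

```python
import math
from collections import Counter

def vectorize(comments: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by tf * log((N + 1) / (df + 1)), as in the paper."""
    n = len(comments)
    # df: number of comments in which each term occurs.
    df = Counter()
    for tokens in comments:
        df.update(set(tokens))
    vectors = []
    for tokens in comments:
        tf = Counter(tokens)  # term frequency within this comment
        vectors.append({t: tf[t] * math.log((n + 1) / (df[t] + 1))
                        for t in tf})
    return vectors

docs = [["movie", "hero"], ["movie", "song"], ["hero", "hero"]]
vecs = vectorize(docs)
print(vecs[2]["hero"])  # tf=2, df=2, N=3 → 2 * log(4/3)
```

The +1 smoothing in numerator and denominator keeps the logarithm finite and non-negative even for terms that occur in every comment.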
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Model Construction for First Submission using SVM</head><p>SVM is a binary classifier which creates a hyperplane that discriminates between two classes <ref type="bibr" target="#b4">[5]</ref>. SVM can be extended to multi-class problems by creating several binary SVMs and combining them using the one-vs-rest or one-vs-one method <ref type="bibr" target="#b7">[8]</ref>.</p><p>We implemented a six-class SVM corresponding to the six classes TA, HI, KA, MA, BE and TE, as per the framework shown in figure <ref type="figure" target="#fig_0">1</ref>, using Stochastic Gradient Descent (SGD) to optimize the parameters of the SVM model. The SGD algorithm updates the parameter θ of the objective function W (θ ) as</p><formula xml:id="formula_1">θ = θ − η∇ θ E [W (θ )]</formula><p>where η is the step size and E[W (θ )] is the cost function. Ensemble learning is a classification technique which uses a set of heterogeneous and diverse classifiers as base classifiers and combines their outputs in different ways to obtain the final output <ref type="bibr" target="#b12">[13]</ref>. The ensemble technique tries to overcome the weaknesses of some classifiers using the strengths of others. Figure <ref type="figure" target="#fig_1">2</ref> shows the framework of ensemble learning.</p></div>
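An SVM trained with SGD as described above can be sketched with scikit-learn's SGDClassifier, whose hinge loss yields a linear SVM and whose updates follow the rule θ = θ − η∇θ E[W(θ)]. This is an illustrative approximation of the setup, not the authors' exact configuration; the toy comments and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy transliterated comments standing in for the INLI training data.
X = ["namaskara hegiddira", "vanakkam eppadi irukkinga",
     "namaste kaise ho", "vanakkam nanba super", "bengaluru mast hegiddira"]
y = ["KA", "TA", "HI", "TA", "KA"]

# loss="hinge" gives a linear SVM optimized by stochastic gradient descent;
# multi-class handling is one-vs-rest internally.
clf = make_pipeline(TfidfVectorizer(),
                    SGDClassifier(loss="hinge", random_state=0))
clf.fit(X, y)
print(clf.predict(["eppadi irukkinga nanba"]))
```

With six classes, the classifier fits one binary hinge-loss model per class and predicts the class with the highest decision score.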
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Model Construction for Second Submission using Ensemble Approach</head><p>We used three base classifiers, namely multinomial Naive Bayes, SVM and random forest classifiers, and combined their results by weighted voting. The multinomial Naive Bayes classifier is an instance of the Naive Bayes classifier that captures word frequency information in documents <ref type="bibr" target="#b10">[11]</ref>. The random forest classifier is a supervised classifier comprising multiple decision trees, each of which depends on an independently sampled random vector <ref type="bibr" target="#b1">[2]</ref>. The base classifiers are designed as per the framework shown in figure <ref type="figure" target="#fig_0">1</ref>.</p></div>
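The three base classifiers combined by weighted voting can be sketched with scikit-learn's VotingClassifier. The voting weights and toy data below are illustrative assumptions, not the authors' reported configuration.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy transliterated comments standing in for the INLI training data.
X = ["namaskara hegiddira", "vanakkam eppadi irukkinga",
     "namaste kaise ho", "vanakkam nanba", "hegiddira guru"]
y = ["KA", "TA", "HI", "TA", "KA"]

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("nb", MultinomialNB()),
                    ("svm", LinearSVC()),
                    ("rf", RandomForestClassifier(random_state=0))],
        voting="hard",      # majority vote over predicted labels
        weights=[1, 2, 1],  # illustrative voting weights
    ),
)
ensemble.fit(X, y)
print(ensemble.predict(["vanakkam eppadi"]))
```

Hard voting is used here because LinearSVC exposes no class probabilities; each base classifier casts a weighted vote for its predicted label and the heaviest label wins.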
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">PERFORMANCE EVALUATION</head><p>Performance in the INLI task is measured by the overall accuracy of the system, in addition to class-wise accuracy calculated using Precision (P), Recall (R) and F1 measure 3 . For each class, P is the number of comments correctly assigned to the class over the total number of comments the system assigned to that class. R is the number of comments correctly assigned to the class over the actual number of comments in the class. The F1 measure is the harmonic mean of P and R, calculated as follows:-</p><formula xml:id="formula_2">F 1 = 2 * P * R P + R</formula></div>
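The class-wise metrics above can be computed directly from predicted and actual labels; a minimal sketch with invented labels:

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Class-wise Precision, Recall and F1 for one class label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    predicted = sum(p == cls for p in y_pred)  # assigned to cls by system
    actual = sum(t == cls for t in y_true)     # actually belonging to cls
    p = tp / predicted if predicted else 0.0
    r = tp / actual if actual else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f1

y_true = ["HI", "TA", "HI", "KA"]
y_pred = ["HI", "TA", "TA", "KA"]
print(precision_recall_f1(y_true, y_pred, "HI"))  # → (1.0, 0.5, 0.666...)
```

Overall accuracy is simply the fraction of comments whose predicted class matches the actual class, across all six classes.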
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">RESULTS AND DISCUSSION</head><p>The class-wise accuracy of the first submission, using SVM with the SGD algorithm to determine the parameters of the model, is shown in Table <ref type="table" target="#tab_1">2</ref> in terms of P, R and F1 measure. The overall accuracy of this submission is 47.60%, ranking second among all submissions. Table <ref type="table" target="#tab_2">3</ref> shows the performance evaluation of the second submission, where the Ensemble approach is used to combine the outputs of different models. The overall accuracy of this submission is 47.30%, ranking third among all submissions. The results of both submissions illustrate that performance is worst for identifying Hindi. The reason may be that native speakers of most of the other languages also have knowledge of Hindi. Our systems depend essentially on the effective words of each language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSION</head><p>In this work, SVM and Ensemble classifiers have been used for INLI. SVM outperforms the Ensemble classifier, which combines three different classifiers. Our Support Vector Machine (SVM) submission secured second rank over all submissions for the task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A COMPLETE LIST OF MANUALLY WRITTEN STOP WORDS</head><p>The following is the full list of stopwords used in our system:-{ a, about, above, across, after, afterwards, again, against, all, almost, alone, along, already, also, although, always, am, among, amongst, amoungst, amount, an, and, another, any, anyhow, anyone, anything, anyway, anywhere, are, around, as, at, back, be, side, since, sincere, six, sixty, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, system, take, ten, than, that, the, their, them, themselves, then, thence, there, thereafter, thereby, therefore, therein, thereupon, these, they, thick, thin, third, this, those, though, three, through, throughout, thru, thus, to, together, too, top, toward, towards, twelve, twenty, two, un, under, until, up, upon, us, very, via, was, we, well, were, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, whether, which, while, whither, who, whoever, whole, whom, whose, why, will, with, within, without, would, yet, you, your, yours, yourself, yourselves }</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Framework of classifier</figDesc><graphic coords="2,94.72,185.11,158.40,187.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Framework of Ensemble approach</figDesc><graphic coords="2,348.08,449.14,180.00,136.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Training set statistics</figDesc><table><row><cell cols="3">Language # of comments Ratio</cell></row><row><cell>TA</cell><cell>207</cell><cell>16.79%</cell></row><row><cell>HI</cell><cell>211</cell><cell>17.11%</cell></row><row><cell>KA</cell><cell>203</cell><cell>16.46%</cell></row><row><cell>MA</cell><cell>200</cell><cell>16.22%</cell></row><row><cell>BE</cell><cell>202</cell><cell>16.38%</cell></row><row><cell>TE</cell><cell>210</cell><cell>17.03%</cell></row><row><cell>Total</cell><cell>1233</cell><cell>100%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Results of SVM classifier based submission</figDesc><table><row><cell>Class</cell><cell>P</cell><cell>R</cell><cell>F1</cell></row><row><cell>BE</cell><cell>54.00%</cell><cell cols="2">84.90% 66.00%</cell></row><row><cell>HI</cell><cell>60.00%</cell><cell cols="2">7.20% 12.80%</cell></row><row><cell>KA</cell><cell>40.40%</cell><cell cols="2">54.10% 46.20%</cell></row><row><cell>MA</cell><cell>42.70%</cell><cell cols="2">66.30% 51.90%</cell></row><row><cell>TA</cell><cell>58.00%</cell><cell cols="2">58.00% 58.00%</cell></row><row><cell>TE</cell><cell>32.50%</cell><cell cols="2">48.10% 38.80%</cell></row><row><cell cols="2">Overall Accuracy</cell><cell>47.60%</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Results of Ensemble classifier based submission http://www.nltk.org/_modules/nltk/metrics/scores.html We used 10-fold cross-validation technique while training both classifiers, the cross validation accuracy of both submissions is given in Table 4.</figDesc><table><row><cell cols="2">Class P</cell><cell>R</cell><cell>F1</cell></row><row><cell>BE</cell><cell>56.50%</cell><cell cols="2">79.50% 66.10%</cell></row><row><cell>HI</cell><cell>60.70%</cell><cell>6.80%</cell><cell>12.20%</cell></row><row><cell>KA</cell><cell>38.40%</cell><cell cols="2">58.10% 46.20%</cell></row><row><cell>MA</cell><cell>40.40%</cell><cell cols="2">70.70% 51.40%</cell></row><row><cell>TA</cell><cell>58.00%</cell><cell cols="2">58.00% 58.00%</cell></row><row><cell>TE</cell><cell>32.80%</cell><cell cols="2">49.40% 39.40%</cell></row><row><cell cols="3">Overall Accuracy 47.30%</cell><cell></cell></row></table><note>3 </note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>10-fold cross-validation accuracy for both submissions</figDesc><table><row><cell>Submission 1</cell><cell>Submission 2</cell></row><row><cell>88.09%</cell><cell>87.30%</cell></row><row><cell>84.80%</cell><cell>84.80%</cell></row><row><cell>90.32%</cell><cell>90.32%</cell></row><row><cell>91.06%</cell><cell>91.06%</cell></row><row><cell>89.43%</cell><cell>86.18%</cell></row><row><cell>79.68%</cell><cell>80.49%</cell></row><row><cell>86.18%</cell><cell>90.24%</cell></row><row><cell>88.52%</cell><cell>89.34%</cell></row><row><cell>90.98%</cell><cell>90.16%</cell></row><row><cell>89.34%</cell><cell>91.80%</cell></row><row><cell cols="2">Mean = 87.84% Mean = 88.17%</cell></row><row><cell>STD = 3.32</cell><cell>STD = 3.33</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head></head><label></label><figDesc>, mill, mine, more, moreover, most, mostly, move, much, must, my, myself, name, namely, neither, never, nevertheless, next, nine, no, nobody, none, noone, nor, not, nothing, now, nowhere, of, off, often, on, once, one, only, onto, or, other, others, otherwise, our, ours, ourselves, out, over, own, part, per, perhaps, please, put, rather, re, same, see, seem, seemed, seeming, seems, serious, several, she, should, show,</figDesc><table><row><cell>became, because,</cell></row><row><cell>become, becomes, becoming, been, before, beforehand,</cell></row><row><cell>behind, being, below, beside, besides, between, beyond,</cell></row><row><cell>bill, both, bottom, but, by, call, can, cannot, cant,</cell></row><row><cell>co, con, could, couldnt, cry, de, describe, detail, do,</cell></row><row><cell>done, down, due, down, due, during, each, eg, eight,</cell></row><row><cell>either, eleven, else, elsewhere, empty, enough, etc,</cell></row><row><cell>even, ever, every, everyone, everything, everywhere,</cell></row><row><cell>except, few, fifteen, fifty, fill, find, fire, first,</cell></row><row><cell>five, for, former, formerly, forty, found, four, from,</cell></row><row><cell>front, full, further, get, give, go, had, has, hasnt,</cell></row><row><cell>have, he, hence, her, here, hereafter, hereby, herein,</cell></row></table><note>hereupon, hers, herself, him, himself, his, how, however, hundred, i, ie, if, in, inc, indeed, interest, into, is, it, its, itself, keep, last, latter, latterly, least, less, ltd, made, many, may, me, meanwhile, might</note></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Applying Authorship Analysis to Extremist-Group Web Forum Messages</title>
		<author>
			<persName><forename type="first">Ahmed</forename><surname>Abbasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hsinchun</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.1109/MIS.2005.81</idno>
		<ptr target="https://doi.org/10.1109/MIS.2005.81" />
	</analytic>
	<monogr>
		<title level="j">IEEE Intelligent Systems</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="67" to="75" />
			<date type="published" when="2005-09">Sept. 2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Random Forests</title>
		<author>
			<persName><forename type="first">Leo</forename><surname>Breiman</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1010933404324</idno>
		<ptr target="https://doi.org/10.1023/A:1010933404324" />
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="32" />
			<date type="published" when="2001-10-01">Oct. 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Exploring Syntactic Features for Native Language Identification: A Variationist Perspective on Feature Encoding and Ensemble Optimization</title>
		<author>
			<persName><forename type="first">Serhiy</forename><surname>Bykh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Detmar</forename><surname>Meurers</surname></persName>
		</author>
		<ptr target="http://aclanthology.coli.uni-saarland.de/pdf/C/C14/C14-1185.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</title>
				<meeting>COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</meeting>
		<imprint>
			<publisher>Dublin City University and Association for Computational Linguistics</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1962" to="1973" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Who's at the keyboard? Authorship attribution in digital evidence investigations</title>
		<author>
			<persName><forename type="first">Carole</forename><forename type="middle">E</forename><surname>Chaski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International journal of digital evidence</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Support-vector networks</title>
		<author>
			<persName><forename type="first">Corinna</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vladimir</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine learning</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="273" to="297" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Author profiling for English emails</title>
		<author>
			<persName><forename type="first">Dominique</forename><surname>Estival</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tanja</forename><surname>Gaustad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Son</forename><forename type="middle">Bao</forename><surname>Pham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Will</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ben</forename><surname>Hutchinson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics</title>
				<meeting>the 10th Conference of the Pacific Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="263" to="272" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Forensic linguistics: An introduction to language in the justice system</title>
		<author>
			<persName><forename type="first">John</forename><surname>Gibbons</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
			<publisher>Wiley-Blackwell</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Comparison of Methods for Multiclass Support Vector Machines</title>
		<author>
			<persName><forename type="first">Chih-Wei</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Jen</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.1109/72.991427</idno>
		<ptr target="https://doi.org/10.1109/72.991427" />
	</analytic>
	<monogr>
		<title level="j">Trans. Neur. Netw</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="415" to="425" />
			<date type="published" when="2002-03">March 2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Maximizing Classification Accuracy in Native Language Identification</title>
		<author>
			<persName><forename type="first">Scott</forename><surname>Jarvis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yves</forename><surname>Bestgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steve</forename><surname>Pepper</surname></persName>
		</author>
		<ptr target="http://aclanthology.coli.uni-saarland.de/pdf/W/W13/W13-1714.pdf" />
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="111" to="118" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification</title>
		<author>
			<persName><forename type="first">Anand</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Barathi</forename><surname>Ganesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">B</forename><surname>Shivkaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">P</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Notebook Papers of FIRE 2017, FIRE-2017</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Bangalore, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-12-08">December 8-10, 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A comparison of event models for naive bayes text classification</title>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Mccallum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kamal</forename><surname>Nigam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI-98 Workshop on Learning for Text Categorization</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="41" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Understanding Second Language Acquisition</title>
		<author>
			<persName><forename type="first">Lourdes</forename><surname>Ortega</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>Hodder Education</publisher>
			<pubPlace>Oxford</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Ensemble based systems in decision making</title>
		<author>
			<persName><forename type="first">R</forename><surname>Polikar</surname></persName>
		</author>
		<idno type="DOI">10.1109/MCAS.2006.1688199</idno>
		<ptr target="https://doi.org/10.1109/MCAS.2006.1688199" />
	</analytic>
	<monogr>
		<title level="j">IEEE Circuits and Systems Magazine</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="21" to="45" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Algorithm Selection and Model Adaptation for ESL Correction Tasks</title>
		<author>
			<persName><forename type="first">Alla</forename><surname>Rozovskaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Roth</surname></persName>
		</author>
		<ptr target="http://www.aclweb.org/anthology/P11-1093" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics</title>
				<meeting>the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics<address><addrLine>Portland, Oregon, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="924" to="933" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification</title>
		<author>
			<persName><forename type="first">Joel</forename><surname>Tetreault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><surname>Blanchard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aoife</forename><surname>Cahill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Chodorow</surname></persName>
		</author>
		<ptr target="http://aclanthology.coli.uni-saarland.de/pdf/C/C12/C12-1158.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2012. The COLING 2012 Organizing Committee</title>
				<meeting>COLING 2012. The COLING 2012 Organizing Committee</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="2585" to="2602" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
