<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the Second Shared Task on Indian Native Language Identification (INLI)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anand Kumar M</string-name>
          <email>manandkumar@nitk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barathi Ganesh H</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ajay S G</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soman K P</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology</institution>
          ,
          <addr-line>NITK Surathkal</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This overview paper describes the second shared task on Indian Native Language Identification (INLI), organized at FIRE 2018. Given a corpus of English comments collected from the Facebook pages of various newspapers, the objective of the task is to identify the author's native language among six Indian languages: Bengali, Hindi, Kannada, Malayalam, Tamil, and Telugu. Altogether, 31 submissions from 14 different teams were evaluated. In this paper, we give an overview of the participants' systems and report the results of the second INLI shared task. We also compare these results with those of the first INLI shared task, held at FIRE 2017.</p>
      </abstract>
      <kwd-group>
        <kwd>Native Language Identification</kwd>
        <kwd>Author Profiling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper presents an overview of the second INLI (Indian Native Language
Identification) shared task, held in conjunction with FIRE 2018. Native Language
Identification (NLI) is the task of automatically identifying the native language (L1)
of a writer based only on his or her writing in another language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Research on native language identification has grown in recent years because
of its applications in digital forensics and language learning. INLI is the first
shared task of this kind conducted specifically for Indian languages, and it continues
the previous shared task, INLI-2017, held at the FIRE 2017 conference. The task is
defined as classifying a set of user comments by the Indian native language of their
author. We collected user comments written in English from the regional news pages
of Facebook, assuming that only native speakers follow the regional news pages. The
motivation of the shared task is to create the first corpora for Indian native
language identification in social media and to provide an environment for directly
comparing different pre-processing methods, features, and algorithms. Even though
researchers and industry show growing interest in native language identification,
the development of such systems is slowed down by a primary issue: obtaining suitably
annotated corpora. Evaluating an NLI system requires a corpus of texts written in a
language other than the author's native language. The problem with collecting essays
and student assignments for native language identification is that, even when a
person belongs to a particular region or native language, we cannot be sure that the
person speaks or reads that native language: most Indians speak their native language,
but not all of them read and write it. The lack of such corpora for Indian languages
led us to collect a smaller INLI corpus and to evaluate the participants' systems on
it. A few prominent applications of native language identification are given below.
      </p>
      <p>Error correction and language proficiency: The language proficiency of a region
can be identified and analyzed with the help of a native language identification
system. People from different regions and with different mother tongues are known to
make different kinds of errors when they learn another language, so a native language
identification system can give targeted feedback to language learners.</p>
      <p>Marketing: Categorizing the geographical region and native language of authors
who provide opinions may help to improve marketing strategies.</p>
      <p>Politics: The comments of users who like or dislike government policies, and the
opinions of people from specific regions, can be identified automatically without
looking at their profiles. Getting the exact profile of a person is difficult in
social media, and native language is part of the user profile, so a mechanism is
needed to find the native language automatically by analyzing the person's use of
another language.</p>
      <p>Person identification and fake news identification: Analyzing fake news can help
to find out which region or native-language group created fake news or threatening
messages.</p>
      <p>In this overview paper, we describe the task and the data sets used, the features
and classifiers used by the participants, and the results and their comparison with
INLI-2017. The paper is organized as follows. Section 2 reviews related work on NLI,
and Section 3 describes the task. Section 4 gives the statistics of the INLI corpora
used in the shared task. Section 5 describes the participants' systems and the
features they used. Section 6 presents the results and discussion, and Section 7
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        NLI is most commonly treated as a supervised classification task, where features
are extracted from the text produced by non-native speakers. NLI is a recent but
rapidly growing area of research. While some early research was conducted in the
early 2000s, most work has appeared only in the last few years. The
work of Koppel et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (2005) was the first in the field; they explored a
multitude of features, many of which are employed in several of the systems in
the shared tasks. These features included character and POS n-grams, content
and function words, as well as spelling and grammatical errors (since language
learners tend to make certain errors depending on their L1 (Swan and
Smith, 2001)). An SVM model was trained on these features extracted from
a subsection of the ICLE corpus covering 5 L1s. N-gram features (word,
character, and POS) have figured prominently in prior work. Not only are they
easy to compute, but they can also be quite predictive. Wong and Dras (2011) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
utilized character and part-of-speech (POS) n-grams as well as cross-sections of
parse trees and Context-Free Grammar (CFG) features, i.e., local trees. Their
approach with a binary representation of non-lexicalized rules (except for those
rules lexicalized with function words and punctuation) outperformed a setup
using only lexical features, such as n-grams, on data from the International
Corpus of Learner English (ICLE; Granger et al., 2002)[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Swanson and Charniak
(2012)[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] used binary feature representations of CFG and Tree Substitution
Grammar (TSG) rules replacing terminals (except for function words) by a
special symbol. TSG outperformed CFG features in their settings. gs2 being widely
noted (Brooke and Hirst, 2012a)[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. More recently, TOEFL11, the first corpus
designed for NLI, was released (Blanchard et al., 2013) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. While it is the largest
NLI dataset available, it only contains argumentative essays, limiting analyses
to this genre. Research has also expanded to use non-English learner corpora
(Malmasi and Dras, 2014a; Malmasi and Dras, 2014c)[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Recently, Malmasi
and Dras (2014b)[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] introduced the Chinese Learner Corpus for NLI, and their
results indicate that feature performance may be similar across corpora and even
across L1-L2 pairs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Task Descriptions</title>
      <p>
        This shared task is the second edition of INLI-2017 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Given an XML file
containing Facebook comments written in English, the task is to identify the
native language of the author of the comments. The native languages considered in
the shared task are Hindi, Tamil, Malayalam, Kannada, Telugu, and Bengali. The
highest accuracy obtained in the first shared task,
INLI-2017 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], was 48.80%, which is comparatively low. We felt that there was
considerable room to improve the performance of INLI systems, so we conducted a
second edition of the shared task with the same training data set and a different
test set.
      </p>
      <p>
        Training Data: In this shared task, the training data set of the INLI-2017 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
shared task is reused, since this task is an extension of the earlier one. In total,
there are 1,233 XML files for the six native languages, where each file contains 8 to
10 Facebook comments written in English. The Facebook comments were collected between
April 2017 and July 2017.
      </p>
      <p>
        Test Set-1: Test set-1 is the test set used in the earlier INLI task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It consists of 874 XML documents collected during the same period as the
training data set. To allow comparison with the earlier results, we asked the
participants to also test their systems on this test set.
      </p>
      <p>Test Set-2: Test set-2 is a new set collected between May 2018 and June 2018.
Comments with a regional bias were removed in order to avoid topic bias. Author
bias was also removed, so, as expected, the performance of the participants is
comparatively lower on this set.</p>
      <p>The training data was released on 15th May 2018, and the unlabeled test data
was released about one month later. The training set is organized into folders
named after the six Indian languages. Each team was allowed to submit up to 3
different runs for test set-1 and test set-2, which allowed participants to
experiment with different variations of their systems. The participants are ranked
only on test set-2.</p>
    </sec>
    <sec id="sec-4">
      <title>Corpora Statistics</title>
      <p>Collecting corpora is an important challenge in INLI. We collected the comments
posted in English on the top regional news pages of Facebook. In order to avoid
topic bias, we removed the comments with a regional flavor and concentrated only
on comments about topics of national importance, such as "Budget", "Modi", "BJP",
and "Election". The training dataset used in the INLI-2018 shared task is the same
as the data set used in INLI-2017, but the test set is different: it was collected
more recently, between May 2018 and June 2018. In order to compare with the
previous shared task results, the participants were also asked to test their
systems on the INLI-2017 test set. The detailed dataset statistics are given in
Tables 1 and 2.</p>
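      <p>The keyword-based filtering described above can be sketched as a simple token filter. This is an illustration only, not the organizers' actual collection pipeline: the national keywords are the examples given in the text, while the regional-term list, the tokenizer, and the function names are hypothetical, since the exact filtering rules are not described here.</p>
```python
import re

# National-importance keywords named in the corpus description.
NATIONAL_KEYWORDS = {"budget", "modi", "bjp", "election"}

# Hypothetical regional terms; the organizers' actual list is not given.
REGIONAL_TERMS = {"chennai", "kerala", "bengaluru", "kolkata"}


def tokens(text):
    """Lowercase alphabetic word tokens of a comment."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filter_comments(comments):
    """Keep comments that mention a national topic and contain no regional term."""
    kept = []
    for comment in comments:
        words = tokens(comment)
        if words.isdisjoint(NATIONAL_KEYWORDS):
            continue  # not about a topic of national importance
        if not words.isdisjoint(REGIONAL_TERMS):
            continue  # regional flavor: a likely topic-bias cue
        kept.append(comment)
    return kept
```
      <p>Given the comments "Modi's budget helps farmers", "Chennai election results", and "Nice weather today", only the first is kept: the second names a (hypothetical) regional term, and the third mentions no national topic. A real filter would need a much larger, curated term list per region.</p>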
      <p>Figures 1 and 2 show the word clouds of the training and test sets. Each
language in the training data is represented by a separate word cloud.</p>
      <p>Figure 1 shows the top 50 words of the training data set as a word cloud.
Tamil, Malayalam, Kannada, and Telugu are spoken in the southern part of India,
Bengali is spoken in the eastern part, and Hindi is most common in the northern
region. Interestingly, all the keywords of the Hindi comments are also present in
the other five languages, so identifying Hindi as the native language is more
difficult than identifying the other languages.</p>
      <p>The comments of each language are visualized separately to show the words most
frequently used by each group of native speakers. The figure shows common words
such as "India", "country", "people", "Modi", "money", "politics", and
"government". Even though we removed region-specific words, some of the posts still
reflect regional information; this also depends on the news item under which the
comments were collected. Compared with the other languages, the word "farmers"
occurs more often in the Tamil comments, and, similarly, the word "army" occurs
more often in the comments from most border regions.</p>
    </sec>
    <sec id="sec-5">
      <title>System Descriptions of Participants</title>
      <p>In total, 14 teams submitted runs, with each team restricted to 3 runs.
Altogether, we received 31 submissions for test set-1 and test set-2.</p>
      <p>
        Ajees et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] from the CUSAT team applied a convolutional neural network
to native language identification. Four convolution layers, three max-pooling
layers, and two dense layers were used in the CNN. Instead of treating
the problem as document classification, they converted it into sentence/comment
classification, where each comment is tagged with the corresponding native
language. Each post in an XML file of the test set is then tagged by the model.
Since we created the training and test data so that each document contains
roughly the same number of comments, the model is not affected by the number of
comments in each document. The label predicted most often within a document is
taken as the label of the XML document.
Bharathi et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] from the SSNCSE team used statistical-test-based feature
selection to identify the native language of a document. They submitted three
runs to the INLI 2018 task, all using TF-IDF as the common initial feature; the
submissions differ in the feature selection method and the classifier. The first
run selects the best features using Analysis of Variance (ANOVA) F-values and
trains a MultiLayer Perceptron (MLP) classifier. The second run uses chi-square-based
feature selection with the MLP classifier. The third run uses chi-square-based
feature selection and a Stochastic Gradient Descent (SGD) classifier. For the MLP
classifier, the Rectified Linear Unit (ReLU) is used as the activation function and
the Adam optimizer is used for weight optimization. The SGD classifier supports
multi-class classification by combining multiple binary classifiers in a
one-versus-rest fashion. Their third submission, the SGD classifier with chi-square
feature selection, outperformed the other submissions to the shared task.
Thenmozhi et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] from the SSNNLP team also used
feature selection with traditional classifiers for native language identification.
As preprocessing, they removed punctuation but applied neither stemming nor
stop-word removal. To extract the features that contribute to native language
identification, they used the chi-square feature selection method. They tried
different combinations of features and machine learning classifiers and recorded
the cross-validation results. Finally, an MLP classifier with TF-IDF features
(without feature selection) and a Multinomial Naive Bayes classifier with
chi-square feature selection were submitted for evaluation. In their results, the
run with chi-square feature selection performed worse than the run using TF-IDF
features without feature selection. Soumik Mondal et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] from the Corplab team designed
an INLI system with TF-IDF features and a linear SVM classifier under three
different strategies. In preprocessing, they removed non-ASCII characters and
replaced runs of repeated characters, such as "......" or "sorryyyyyyy", with
"." or "sorry". In the first submission, they used a one-vs-rest classifier,
and in the second and third submissions, they used the pairwise coupling
strategies proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Ilia Markov et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] from the CIC-IPN team proposed a
system with an SVM classifier on a rich feature set including emotion-based
features. They used word and character n-grams, part-of-speech (POS) tag
n-grams, character n-grams from misspelled words, punctuation mark n-grams,
and emotion-based features. The features are weighted using the log-entropy
weighting scheme. Their emotion polarity features are similar to
the features proposed in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The well-known NRC emotion lexicon [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is used
to compute these features. Aman Gupta [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] from Team WebArch proposed a system using
n-gram-based TF-IDF features extracted from the given data set and trained with
logistic regression. He divided the training data set into training, test, and
validation sets and compared the validation accuracy with and without stop words,
for different n-gram ranges, and for TF-IDF versus count vectorization.
Rajesh Kumar et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] from the NLPRL team developed an INLI system using a hybrid
gated LSTM-CNN. Pre-trained GloVe word embeddings are used to obtain the
initial word-level representations of the tokens in the sentences. The word-level
input is converted into a sentence-level representation using a bidirectional LSTM,
by linearly combining the last hidden states of the forward and backward LSTMs.
The entire network is trained with the Adam optimizer for 15 epochs with a
mini-batch size of 10. During the testing phase, the model retrieved more relevant
documents for Tamil than for the other languages, and on test set-1 it achieved its
highest F1-scores for Hindi and Tamil. Ashish Patel et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] from the IIITV team
proposed Hyperdimensional Computing (HDC) as a supervised learning model
for identifying the Indian native language from users' social media comments
written in English. HDC represents language features as high-dimensional
vectors called hypervectors. First, comments are broken into character bi-grams
and tri-grams, which are used to generate comment hypervectors. These
hypervectors are then combined to create language profile vectors, and the
profile hypervectors are used to classify test comments. As preprocessing, they
removed non-English and special characters and converted the text to lowercase.
Hamada et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] from the Mangalore University team
used an Artificial Neural Network (ANN) model and an ensemble approach. Traditional
TF-IDF features were used to represent the comments. ANN-based classifiers were
used for the first and second submissions: the hidden layer contains 70 neurons in
the first submission and 80 neurons in the second, and the activation function is
the logistic function. An ensemble approach using majority voting was used for the
third submission.
      </p>
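      <p>Two of the systems above reduce to a majority vote: the CUSAT system labels a document with the most frequent comment-level prediction, and the third Mangalore University run combines classifiers by majority voting. The following is a minimal sketch, assuming any callable that returns a language label stands in for a trained classifier; the tie-breaking rule (the tied label that appears first wins) and the function names are our own assumptions, since the system descriptions do not specify them.</p>
```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the tied label that appears first."""
    if not labels:
        raise ValueError("majority_vote needs at least one label")
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def label_document(comments, classify_comment):
    """Comment-level labelling: classify every comment, then vote over the predictions."""
    return majority_vote([classify_comment(comment) for comment in comments])


def ensemble_predict(document, classifiers):
    """Ensemble labelling: let several document-level classifiers vote."""
    return majority_vote([classifier(document) for classifier in classifiers])
```
      <p>Voting over comments makes the document label robust to a few misclassified comments, which matters here because a single Facebook comment carries very little evidence about the author's native language.</p>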
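      <p>The log-entropy scheme used by the CIC-IPN team multiplies a logarithmic local weight by an entropy-based global weight, so that terms spread evenly across all documents get a weight close to 0 and terms concentrated in few documents keep a weight close to 1. The sketch below implements the textbook formulation in plain Python; the exact variant and tokenization used by the team may differ, and the function name is hypothetical.</p>
```python
import math
from collections import Counter


def log_entropy(docs):
    """Log-entropy weights for a list of tokenized documents.

    local weight  l_ij = log(1 + tf_ij)
    global weight g_i  = 1 + sum_j p_ij * log(p_ij) / log(n), with p_ij = tf_ij / gf_i
    Returns one {term: l_ij * g_i} dictionary per document.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    global_weight = {}
    for term, gf in global_freq.items():
        entropy = sum(
            (c[term] / gf) * math.log(c[term] / gf) for c in counts if c[term]
        )
        # With a single document log(n) is 0, so every term keeps weight 1.
        global_weight[term] = 1 + entropy / math.log(n) if n > 1 else 1.0

    return [
        {term: math.log(1 + tf) * global_weight[term] for term, tf in c.items()}
        for c in counts
    ]
```
      <p>For the two documents ["modi", "budget"] and ["modi", "army"], "modi" occurs equally in both and receives weight 0, while "budget" and "army" each occur in a single document and keep their full local weight log(2).</p>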
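      <p>The IIITV pipeline follows the usual hyperdimensional-computing recipe for language identification. The sketch below is our own minimal illustration, not the team's code. It uses random bipolar character hypervectors, encodes each n-gram by rotating and multiplying (binding) its character vectors, bundles n-gram vectors by addition into a comment vector, and sums comment vectors per language into a profile. The dimensionality, the seeding scheme, classification by cosine similarity, and all function names are assumptions.</p>
```python
import math
import random
from functools import lru_cache

DIM = 2000  # hypothetical dimensionality; HDC work often uses around 10,000


@lru_cache(maxsize=None)
def char_vector(ch):
    """Random bipolar (+1/-1) hypervector for a character, seeded by its code point."""
    rng = random.Random(ord(ch))
    return tuple(rng.choice((-1, 1)) for _ in range(DIM))


def rotate(vec, k):
    """Permute a hypervector by cyclically shifting it k positions to the right."""
    k %= len(vec)
    return vec[-k:] + vec[:-k]


def ngram_vector(gram):
    """Encode an n-gram: rotate each character vector by its position, then bind (multiply)."""
    result = [1] * DIM
    for position, ch in enumerate(gram):
        shifted = rotate(char_vector(ch), len(gram) - 1 - position)
        result = [a * b for a, b in zip(result, shifted)]
    return result


def add(u, v):
    """Bundle two hypervectors by element-wise addition."""
    return [a + b for a, b in zip(u, v)]


def text_vector(text, sizes=(2, 3)):
    """Comment hypervector: sum of its character bi-gram and tri-gram hypervectors."""
    total = [0] * DIM
    for n in sizes:
        for i in range(len(text) - n + 1):
            total = add(total, ngram_vector(text[i:i + n]))
    return total


def cosine(u, v):
    """Cosine similarity, or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_profiles(labelled_comments):
    """Sum the comment hypervectors of each language into a language profile vector."""
    profiles = {}
    for text, lang in labelled_comments:
        profiles[lang] = add(profiles.get(lang, [0] * DIM), text_vector(text.lower()))
    return profiles


def classify(text, profiles):
    """Return the language whose profile is most similar to the comment's hypervector."""
    vec = text_vector(text.lower())
    return max(profiles, key=lambda lang: cosine(vec, profiles[lang]))
```
      <p>Because every character vector is derived from a fixed seed, training and test comments are encoded with the same item memory without storing it separately. Distinct n-grams map to nearly orthogonal hypervectors, so a comment's similarity to a profile grows with the number of n-grams it shares with that language's training comments.</p>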
      <p>Table 3 lists the features used by the participating teams. Table 4 shows the
main preprocessing techniques, feature selection methods, and classifiers used by
the participants.</p>
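      <p>As an example of the preprocessing steps summarized here, the Corplab cleaning step (dropping non-ASCII characters and collapsing elongated runs such as "......" or "sorryyyyyyy") can be sketched with a regular expression. The threshold of three or more identical characters is our assumption: collapsing doubled letters would corrupt legitimate words such as "sorry" or "good". The function name is hypothetical.</p>
```python
import re

# Runs of three or more identical characters, e.g. "......" or "yyyyyyy".
REPEATS = re.compile(r"(.)\1{2,}")


def preprocess(comment):
    """Drop non-ASCII characters, then collapse each elongated run to one character."""
    ascii_only = comment.encode("ascii", errors="ignore").decode("ascii")
    return REPEATS.sub(r"\1", ascii_only)
```
      <p>With this rule, "sorryyyyyyy......" becomes "sorry.", while ordinary double letters are left untouched.</p>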
    </sec>
    <sec id="sec-6">
      <title>Results and Discussions</title>
      <p>The participants were asked to test their systems on two test sets. The accuracies
of the first and second INLI shared tasks are given in Table 5. The highest
accuracy in INLI-2017 was 46.6%. The highest accuracy on the same data set in
the second shared task is 37.0%, which is lower than in the previous shared
task. Table 6 presents the test set-1 results of the INLI 2018 shared task. The
highest accuracy was achieved with TF-IDF features and an ANN classifier.</p>
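      <p>Runs are compared by accuracy over test documents, and some system papers also report per-language F1 (for example, the NLPRL team's F1-scores for Hindi and Tamil). A minimal sketch of both metrics follows; the function names and the two-letter label codes in the example are hypothetical.</p>
```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of documents whose predicted native language matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_class_f1(gold, predicted):
    """F1 per language label, computed as 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    return {
        label: 2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in set(gold) | set(predicted)
    }
```
      <p>For gold labels TA, TA, HI, KN and predictions TA, HI, HI, TA, accuracy is 0.5, while the per-language F1 is 0.5 for TA, about 0.67 for HI, and 0 for KN. This shows how per-language scores can expose a language that a system never predicts correctly.</p>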
      <p>For test set-2, the highest accuracy was achieved by the SSNCSE team, who used
TF-IDF features and feature selection methods with MLP and SGD classifiers.</p>
      <p>Most of the teams used conventional TF-IDF features. The teams did not consider
socio-linguistic features and paid little attention to preprocessing. As expected,
deep learning methods did not outperform the traditional methods, owing to the
small size of the training data set. Feature selection on top of TF-IDF features
improved over the other methods.</p>
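      <p>Feature selection on top of TF-IDF, as used by the SSNCSE and SSNNLP teams (Section 5), can be sketched as follows. This is a plain-Python sketch under stated assumptions: it scores each term with the chi-square statistic of the 2x2 table of term presence versus class membership (taking the maximum over classes) and uses the tf * log(N / df) variant of TF-IDF. Library implementations and the teams' exact settings may differ, and all function names are hypothetical. The resulting vectors would then be passed to a classifier such as an MLP or SGD classifier.</p>
```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over the classes."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        df = sum(term in s for s in doc_sets)
        best = 0.0
        for cls, size in class_sizes.items():
            a = sum(term in s for s, y in zip(doc_sets, labels) if y == cls)  # in class, has term
            b = df - a        # other classes, has term
            c = size - a      # in class, lacks term
            d = n - df - c    # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def tfidf_vectors(docs, vocabulary):
    """TF-IDF vectors restricted to the selected vocabulary, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    return [
        [Counter(d)[t] * math.log(n / df[t]) if df[t] else 0.0 for t in vocabulary]
        for d in docs
    ]
```
      <p>With two Tamil-labelled comments mentioning "farmers" and two Hindi-labelled comments mentioning "army", both terms receive the maximum score, while terms shared evenly across classes (such as "modi") score 0 and are discarded before training.</p>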
      <p>There are several reasons for the low accuracy in this shared task. The data set
is very small, which is one reason the participants' systems did not perform at the
expected level. Facebook comments are also short compared with the essays used in
the NLI shared task. In addition, comments with topic bias were removed so that the
systems had to rely only on the users' writing style, which is the main evidence
for identifying their native language.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>For any language processing task, collecting annotated corpora is the
challenging part. The training data set of INLI 2018 is the same as the 2017 shared
task data set. The dataset collection is based on the assumption that only
native speakers read native language newspapers. Code-mixed comments and
comments related to regional topics were removed from the corpus, and
comments with common keywords discussed across the regions were retained in
order to avoid possible topic biases. To the best of our knowledge, this is the first
corpus for native language identification for Indian languages. The participants
used different feature sets to address the problem: content-based (among others:
bag of words, character n-grams, word n-grams, term vectors, word embeddings,
non-English words) and stylistic-based (among others: word frequencies, POS
n-grams, noun and adjective POS tag counts). Participants used models such as
hybrid gated LSTM-CNNs and ANNs, and some used pre-trained GloVe word
embeddings. Overall, the best-performing system obtained an accuracy of 46.6%,
which is 3.6% higher than the baseline, and three of the systems performed
better than the baseline. These systems used bag-of-words features extracted
from the users' texts, with feature vectors constructed from TF-IDF scores on the
training data, together with Artificial Neural Network (ANN) models and ensemble
approaches. The lowest overall accuracy was 15.2%, which is 27.8% below the
baseline. As future work, we believe that native language identification should
also take socio-linguistic features into account to further improve performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Anand Kumar</given-names>
            <surname>M</surname>
          </string-name>
          , et al.,
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <article-title>"Overview of the INLI PAN at FIRE2017 Track on Indian Native Language Identification."</article-title>
          <source>Notebook Papers of FIRE</source>
          (
          <year>2017</year>
          ):
          <fpage>8</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hamada</surname>
          </string-name>
          et al.
          <article-title>"Artificial Neural Network and Ensemble Based Models for INLI"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ajees</surname>
          </string-name>
          et al.
          <article-title>"A Native Language Identification System using Convolutional Neural Networks"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bharathi</surname>
          </string-name>
          et al.
          <article-title>"Statistical testing based feature selection for Native Language Identification"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Thenmozhi</surname>
          </string-name>
          et al.
          <article-title>"A Machine Learning Approach to Indian Native Language Identification"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Soumik</given-names>
            <surname>Mondal</surname>
          </string-name>
          et al.
          <article-title>"Identification of Indian Native Language using Pairwise Coupling"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Ilia</given-names>
            <surname>Markov</surname>
          </string-name>
          et al.
          <article-title>"CIC-IPN@INLI2018: Indian Native Language Identification"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Aman</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <article-title>"Team WebArch at FIRE-2018 Track on Indian Native Language Identification"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mundotiya</surname>
          </string-name>
          et al.
          <article-title>"NLPRL@INLI-2018: Hybrid gated LSTM-CNN model for Indian native language identification"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Patel</surname>
          </string-name>
          et al.
          <article-title>"IIITV@INLI-2018: Hyperdimensional Computing for Indian Native Language Identification"</article-title>
          .
          <source>In Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation</source>
          , Gandhinagar, 6th-9th December
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Moshe</given-names>
            <surname>Koppel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Schler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kfir</given-names>
            <surname>Zigdon</surname>
          </string-name>
          .
          <article-title>"Determining an author's native language by mining a text for errors."</article-title>
          <source>Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining</source>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Sze-Meng Jojo</given-names>
            <surname>Wong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dras</surname>
          </string-name>
          .
          <article-title>"Exploiting parse structures for native language identification."</article-title>
          <source>Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Sylviane</given-names>
            <surname>Granger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Hung</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stephanie</given-names>
            <surname>Petch-Tyson</surname>
          </string-name>
          , eds.
          <source>Computer Learner Corpora, Second Language Acquisition, and Foreign Language Teaching</source>
          . Vol.
          <volume>6</volume>
          . John Benjamins Publishing,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Swanson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eugene</given-names>
            <surname>Charniak</surname>
          </string-name>
          .
          <article-title>"Native language detection with tree substitution grammars."</article-title>
          <source>Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2</source>
          . Association for Computational Linguistics,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Julian</given-names>
            <surname>Brooke</surname>
          </string-name>
          and
          <string-name>
            <given-names>Graeme</given-names>
            <surname>Hirst</surname>
          </string-name>
          .
          <article-title>"Native language detection with 'cheap' learner corpora."</article-title>
          <source>Twenty Years of Learner Corpus Research. Looking Back, Moving Ahead: Proceedings of the First Learner Corpus Research Conference (LCR 2011)</source>
          . Vol.
          <volume>1</volume>
          . Presses universitaires de Louvain,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Joel</given-names>
            <surname>Tetreault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Blanchard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Aoife</given-names>
            <surname>Cahill</surname>
          </string-name>
          .
          <article-title>"A report on the first native language identification shared task."</article-title>
          <source>Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Shervin</given-names>
            <surname>Malmasi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dras</surname>
          </string-name>
          .
          <article-title>"Language identification using classifier ensembles."</article-title>
          <source>Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects</source>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>