<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KCE_DAlab@MAPonSMS-FIRE2018: Effective Word and Character-based Features for Multilingual Author Profiling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sharmila Devi V</string-name>
          <email>sharmiladevi1002@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kannimuthu S</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ravikumar G</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anand Kumar M</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering</institution>
          ,
          <addr-line>CIET, Coimbatore</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Technology, Karpagam College of Engineering</institution>
          ,
          <addr-line>Coimbatore</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Information Technology, National Institute of Technology-Karnataka</institution>
          ,
          <addr-line>Surathkal</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our work on the identification of gender and age-group in the Multilingual Author Profiling on SMS messages (MAPonSMS) shared task conducted at the Forum for Information Retrieval and Evaluation (FIRE 2018). To support the development of multilingual author profiling systems, the organizers released a training corpus of multilingual (Roman Urdu and English) SMS messages and their corresponding profiles. For gender identification, a profile is either male or female. The author's age-group falls into one of three categories: 15-19, 20-24, or 25-xx. We developed our author profiling system<sup>1</sup> using word and character-based Term Frequency-Inverse Document Frequency (TFIDF) features, classified with a Support Vector Machine (SVM). The proposed system achieved state-of-the-art performance in the multilingual author profiling on SMS task. The accuracy obtained is 65% for age-group identification and 87% for gender identification. When both are evaluated jointly, the accuracy is 57%. We also experimented with different parameter settings and report the cross-validation accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Author profiling</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>TFIDF</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Word and Character-based features</kwd>
        <kwd>Multilingual SMS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Social media, together with the rapid growth of electronic devices, has provided many ways to share information in day-to-day life. Transferring information and sharing perceptions on the World Wide Web has become commonplace. Social networking platforms such as Facebook, Twitter, blogs, and newsgroups are growing in popularity because they connect people who share and express their ideas on topics of interest around the world. Author profiling is generally defined as the identification of an author's demographic traits, such as gender, age-group, and nativity. It has useful applications in forensics, security, politics, and marketing. For example, from a security point of view, author profiling can examine the linguistic profile of a person who writes aggressive messages, which yields valuable background information for evaluating the context of the thread. Similarly, in online marketing, companies want to know the profiles of users who are or are not interested in their product. This information helps to recommend the same product to users with a similar profile (for example, the same demographics, age-group, and gender). The main challenge in developing an author profiling system is collecting annotated corpora that contain the different attributes of the users. In this paper, we describe a straightforward approach to the Multilingual Author Profiling on SMS messages (MAPonSMS) shared task. The task involves identifying user traits such as gender and age-group in code-mixed Roman Urdu text. The rest of the paper is organized as follows. Section 2 discusses related work on author profiling in various languages. Section 3 presents the corpus statistics and how the corpus was preprocessed. Section 4 explains the methodology used for author profiling, and Section 5 describes the experiments conducted and the results obtained. Section 6 concludes the paper and presents the limitations and future work.</p>
      <p><sup>1</sup> https://drive.google.com/drive/u/0/folders/1UIVUZfk98V_KvIITnncl856X66MVi0Z0</p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review</title>
      <p>The growth and popularity of social media platforms have created a new environment for social interaction between people and the internet, and thus a new network for collaboration and communication among individuals. In recent years, the research community has made several attempts to establish benchmark corpora for author profiling.</p>
      <p>
        Oren Halvani et al. (2017) [<xref ref-type="bibr" rid="ref1">1</xref>] proposed an intrinsic authorship verification method that produced competitive results compared with a number of state-of-the-art approaches based on support vector machines or neural networks. Michael Tschuggnall and Gunther Specht (2015) [<xref ref-type="bibr" rid="ref2">2</xref>] analyzed the full grammar trees of the sentences in a document, extracted their substructures using pq-grams, and reported that grammar analysis is highly effective for automatic author profiling. Francisco Rangel et al. (2016) [<xref ref-type="bibr" rid="ref3">3</xref>] developed a system that identifies age and gender based on the impact of emotions; they used the EmoGraph emotion-labeled graph method to attain better performance. Monika Briedienė et al. [<xref ref-type="bibr" rid="ref4">4</xref>] performed author profiling on Lithuanian texts using machine learning; the Naive Bayes Multinomial method gave the best accuracy for gender, age, education, marital status, and personality type. Fatima et al. [<xref ref-type="bibr" rid="ref5">5</xref>] explored multilingual author profiling on Facebook for English and Urdu, using content-based features and 64 stylistic features to identify age and gender on multilingual and translated corpora. Seifeddine Mechti et al. (2013) [<xref ref-type="bibr" rid="ref6">6</xref>] identified the age and gender of the author of an anonymous text, using the J48 algorithm to learn from English and Spanish corpora. Ben Verhoeven et al. [<xref ref-type="bibr" rid="ref7">7</xref>] discussed a gender profiling system for multiple languages that generates categorized discourse lexicons for three languages (English, Dutch, and German) using a Rhetorical Structure Theory (RST) discourse parser. Francisco Rangel et al. (2013) [<xref ref-type="bibr" rid="ref8">8</xref>] proposed a method for automatically identifying emotions in Spanish texts written on Facebook.
      </p>
      <p>
        Nandhini et al. (2015) [<xref ref-type="bibr" rid="ref9">9</xref>] described a method to detect cyberbullying activities on social media. Vineetha et al. (2018) [<xref ref-type="bibr" rid="ref10">10</xref>] explored Malayalam gender identification for WhatsApp data using conventional features and an SVM. Maarten Sap et al. (2014) [<xref ref-type="bibr" rid="ref11">11</xref>] developed lexica for predicting age and gender in social media, and also tried regression and classification models on Facebook, blog, and Twitter data. Francisco Rangel et al. (2017) [<xref ref-type="bibr" rid="ref12">12</xref>] addressed the identification of an author's age and gender in four languages: Arabic, English, Portuguese, and Spanish. Vasiliki Simaki et al. (2016) [<xref ref-type="bibr" rid="ref13">13</xref>] automatically identified age and gender based on sociolinguistic features and achieved better results. Their results point to a model built on knowledge-based features and on the linguistic choices preferred by social media users. Mohammed Ali Al-garadi et al. (2016) [<xref ref-type="bibr" rid="ref14">14</xref>] detected cyberbullying on Twitter using a supervised machine learning approach. Anand Kumar et al. [<xref ref-type="bibr" rid="ref15">15</xref>] conducted the shared task on Indian native language identification for six Indian languages at FIRE-2017.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset Description</title>
      <p>To support the development of multilingual author profiling systems for Roman Urdu, the MAPonSMS shared task organizers provided training and testing corpora consisting of multilingual (Roman Urdu and English) SMS messages grouped into documents. The training corpus contains 350 annotated documents, of which 210 are from male authors and the remaining 140 are from female authors. The testing corpus contains 150 documents. Both the training and testing documents are provided in ".txt" format. For gender, an author profile belongs to either the male or the female class; for age-group, there are three categories: 15-19, 20-24, and 25-xx. The detailed document counts by age-group and gender are given in Table 1.</p>
      <p>To examine how word usage differs between classes and to identify useful features, we grouped the data into five classes (three age-groups and two genders) and plotted a word cloud for each. We visualized the 50 most frequent words of each class. The top 50 words and their counts for the different age-groups are given in Fig. 1, and those for the two genders are shown in Fig. 2. Since we did not remove stop words, the top 50 words consist mostly of stop words, but their proportion of usage differs between classes. In the word clouds of Fig. 1 and Fig. 2, most of the top 50 words are the same across classes. However, in Fig. 1, the word "main" is not among the top 50 words of the 15-19 age-group, the word "mein" occurs only in the 25-xx age-group, and the word "to" is absent from the top 50 list of the 25-xx age-group. Interestingly, in Fig. 2, even though most of the words overlap between the genders, their proportions show some discrimination. For example, the words "yar", "ap", "bhai", and "gi" may discriminate between the genders.</p>
      <p>Fig. 1. Top 50 words by age-group: a) 15-19 age-group, b) 20-24 age-group, c) 25-xx age-group.</p>
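      <p>The per-class word counts behind such word clouds can be sketched as follows. This is a minimal illustration, not the authors' code: the function name <monospace>top_words_per_class</monospace>, the whitespace tokenization, and the sample messages are all assumptions made for the example. As in the analysis above, stop words are kept and casing is left untouched.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def top_words_per_class(documents, labels, k=50):
    """Return the k most frequent whitespace-separated words for each class.

    Stop words are kept and casing is left untouched.
    """
    counts = defaultdict(Counter)
    for text, label in zip(documents, labels):
        counts[label].update(text.split())
    return {label: c.most_common(k) for label, c in counts.items()}


# Made-up messages and labels, for illustration only (not from the corpus).
docs = ["bhai kya hal hai bhai", "ap kaisi ho", "yar kal milte hai"]
genders = ["male", "female", "male"]
top = top_words_per_class(docs, genders, k=3)
```
      </preformat>
      <p>Comparing the resulting lists between classes shows which frequent words are shared and which appear in only one class.</p>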
    </sec>
    <sec id="sec-4">
      <title>Author Profiling using word and char features</title>
      <p>We developed the author profiling system using word-based and character-based Term Frequency-Inverse Document Frequency (TFIDF) features. Since we are not familiar with the Urdu language and its stop words, especially in Roman Urdu, we did not try language-specific features. Before choosing the hybrid (word and character) features and their parameters, we experimented with word-based and character-based features separately, but the cross-validation results of these individual models were not promising. The detailed cross-validation results for the various feature sets are explained in Section 5.</p>
      <p>Fig. 3. Methodology of the author profiling system (flowchart stages: Training Corpora, Testing Corpora, TFIDF, Word features, Character features, SVM Classifier).</p>
      <sec id="sec-4-7">
        <title>Age group and Gender</title>
        <p>Fig. 3 outlines the methodology of the multilingual author profiling system for SMS messages that we developed for the shared task. In the preprocessing stage, we cleaned the corpora by removing multiple spaces, tabs, and unknown characters. The cleaned documents are then passed to the TFIDF feature extraction module. We considered word-level and character-level features in each document. The motivation behind selecting word and character features is twofold: a) we assumed that word and phrase usage in SMS messages differs between male and female authors and between age-groups; b) we believed that, since the corpus is code-mixed and multilingual, the transliteration style, spelling, and use of capital and small Roman letters carry author traits. We varied the word features from unigrams through bigrams to trigrams and the character features from bigrams to 5-grams. We also tried combinations such as unigram + bigram and bigram + trigram for both word and character features. Finally, we combined the word and character feature matrices and cross-validated the author profiling performance on the given training corpus. The features are classified using the well-known linear Support Vector Machine classifier with default parameter settings. The testing corpus is converted to word and character-based TFIDF features in the same way and given to the classifier. The classifier's output, together with the developed system, was submitted to the organizers for evaluation. Since the organizers accepted only one submission, we submitted the system that gave the highest cross-validation accuracy.</p>
        <sec id="sec-4-7-1">
          <p>Due to the time limit, we did not explore preprocessing techniques or features specific to the author profiling task. Another reason for not using well-known preprocessing steps such as case folding, stemming, and stop word removal is that we believed the original writing style of the text holds the personality and behavioral traits of the user.</p>
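          <p>The cleaning step and the hybrid n-gram features described in this section can be sketched roughly as below. This is an illustration under stated assumptions rather than the submitted system: the function names, the whitespace tokenizer, and the sample message are hypothetical, raw counts are produced instead of TFIDF weights, and the removal of unknown characters is omitted. The feature set shown (word unigrams plus character 2-, 3- and 4-grams) is one of the combinations discussed in the paper, and casing is deliberately preserved.</p>
          <preformat>
```python
import re
from collections import Counter


def clean(text):
    """Collapse runs of spaces and tabs; keep the original casing."""
    return re.sub(r"[ \t]+", " ", text).strip()


def word_ngrams(text, n):
    """Return word n-grams as space-joined strings."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams, including spaces and capital letters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def hybrid_counts(text):
    """Count word unigrams plus character 2-, 3- and 4-grams."""
    text = clean(text)
    counts = Counter()
    for gram in word_ngrams(text, 1):
        counts[("word", gram)] += 1
    for n in (2, 3, 4):
        for gram in char_ngrams(text, n):
            counts[("char", gram)] += 1
    return counts


sample = "Salam  bhai\tkya hal hai"  # made-up message, not from the corpus
features = hybrid_counts(sample)
```
          </preformat>
          <p>Keeping word and character features under separate keys mirrors the idea of concatenating two feature matrices, and skipping case folding lets capitalization habits survive as character n-grams.</p>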
        </sec>
        <sec id="sec-4-7-2">
          <title>Word and Char based TFIDF Features</title>
          <p>We used conventional word and character-based features to develop the document-level author profiling system. The experiments and the cross-validation results for the different n-gram combinations are given in Section 5.</p>
          <p>TFIDF stands for Term Frequency-Inverse Document Frequency, a weighting scheme widely used in applications such as document categorization in Information Retrieval and Text Mining. The importance of a word increases in proportion to the number of times it occurs in a document, and is offset by the number of documents in the corpus that contain it. The TFIDF weight is normally computed in the following steps.</p>
          <p>We first compute the Term Frequency (TF), i.e., the number of times a term (word) occurs in a document, divided by the total number of terms in that document for normalization.</p>
          <p>TF(t) = (Number of times term t occurs in a document) / (Total number of terms in the document)</p>
          <p>The Inverse Document Frequency (IDF) is computed as the logarithm of the total number of documents in the corpus divided by the number of documents in which the term occurs. This weighs down terms that are repeated across many documents and scales up the rarer ones.</p>
          <p>IDF(t) = log(Total number of documents / Number of documents with term t)</p>
          <p>The TFIDF weight of a term is then obtained by multiplying its Term Frequency by its Inverse Document Frequency:</p>
          <p>TFIDF(t) = TF(t) * IDF(t)</p>
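          <p>The three formulas can be worked through on a toy corpus as follows. This is a minimal sketch that follows the definitions in this section literally; the natural logarithm, the function names, and the sample documents are assumptions, and library implementations often apply smoothing or normalization that is not shown here.</p>
          <preformat>
```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """TF(t): occurrences of t divided by the number of terms in the document."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """IDF(t): log of (number of documents / documents containing t).

    Assumes the term occurs in at least one document of the corpus.
    """
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)


def tfidf(term, doc_tokens, corpus):
    """TFIDF(t) = TF(t) * IDF(t)."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Toy corpus of three tokenized documents (made up for illustration).
corpus = [
    ["kya", "hal", "hai", "bhai"],
    ["kya", "kar", "rahe", "ho"],
    ["ok", "see", "you", "bhai"],
]

w_kya = tfidf("kya", corpus[0], corpus)  # (1/4) * log(3/2)
w_hal = tfidf("hal", corpus[0], corpus)  # (1/4) * log(3/1)
```
          </preformat>
          <p>Both terms occur once in the first document, so their TF is the same; "hal" receives the larger weight because it appears in fewer documents.</p>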
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments and Results</title>
      <p>In this section, we present the cross-validation accuracy for multilingual author profiling on SMS messages. We used 10-fold cross-validation to find the best model among the different feature sets and parameters. The models identified the age-group, the gender, and both jointly. The results of the 10-fold cross-validation are shown in Table 2.</p>
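      <p>The 10-fold procedure can be sketched as below. This is an illustration only: the fold assignment rule, the function names, and in particular the majority-class stand-in classifier are assumptions, not the SVM used in the paper. The labels reproduce only the gender class sizes of the training corpus (210 male, 140 female).</p>
      <preformat>
```python
from collections import Counter


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; item i goes to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def cross_val_accuracy(labels, fit, predict, k=10):
    """Average the per-fold accuracy of a classifier over k folds."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit(train, [labels[i] for i in train])
        hits = sum(predict(model, i) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / k


# Stand-in classifier (an assumption, NOT the SVM used in the paper):
# always predict the most frequent label seen in the training folds.
def fit_majority(train_idx, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, idx):
    return model


# Gender labels with the class sizes of the training corpus.
labels = ["male"] * 210 + ["female"] * 140
baseline = cross_val_accuracy(labels, fit_majority, predict_majority)
```
      </preformat>
      <p>On these labels the majority-class stand-in scores about 60%, since 210 of the 350 training authors are male; this gives a simple reference point when reading the gender accuracies of the actual models.</p>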
      <p>In the proposed model, the accuracy of age-group identification is lower than that of gender identification, which shows that identifying the age-group is a harder task than detecting gender. For gender identification, the word-unigram model outperforms the word-bigram, word-trigram, and character-based models. For age-group identification, the character-based model outperforms the word-based models. As expected, the joint identification of age-group and gender is less accurate than either task alone. For the hybrid model, we combined the best-performing features of the word and character models; interestingly, this combination gives the highest cross-validation accuracy among the models tested, and the joint accuracy in particular increases considerably.</p>
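      <p>The relation between the three reported accuracies can be illustrated with a short sketch. It assumes that a joint prediction counts as correct only when both the gender and the age-group of an author are right; the function name and the four profiles are made up for the example.</p>
      <preformat>
```python
def accuracies(gold, predicted):
    """Return (gender, age-group, joint) accuracy for parallel lists of
    (gender, age_group) pairs."""
    n = len(gold)
    gender = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    age = sum(g[1] == p[1] for g, p in zip(gold, predicted)) / n
    joint = sum(g == p for g, p in zip(gold, predicted)) / n
    return gender, age, joint


# Four made-up profiles, for illustration only.
gold = [("male", "15-19"), ("female", "20-24"), ("male", "25-xx"), ("female", "15-19")]
pred = [("male", "15-19"), ("female", "25-xx"), ("female", "25-xx"), ("female", "15-19")]
gender_acc, age_acc, joint_acc = accuracies(gold, pred)
```
      </preformat>
      <p>Under this definition the joint accuracy can never exceed the lower of the two individual accuracies, which is consistent with the joint task being the hardest of the three.</p>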
      <p>After this extensive cross-validation analysis, we chose the hybrid model as the final submission. Before that, to find the discriminative features for gender identification, we plotted the most discriminative features of the SVM model for the male and female classes. Fig. 4 shows these features, where red indicates female and blue indicates male. It is easy to see that the words "ga" and "bhai" are used mostly by male authors, while "rai" is used mostly by female authors. Fig. 2 likewise shows that "ga" is used mostly in the male class.</p>
      <p>In this paper, we described our work on the identification of gender and age-group in the Multilingual Author Profiling on SMS messages (MAPonSMS) shared task for Roman Urdu and English. Using the training dataset, we developed a system based on word and character-based Term Frequency-Inverse Document Frequency (TFIDF) features, classified with a Support Vector Machine classifier. We discussed the dataset and the experiments carried out for the multilingual author profiling task. We performed 10-fold cross-validation with different feature sets and reported the results, and we also presented the results given by the organizers of the shared task. The submitted system, Word{Unigram} + Char{Bigram+Trigram+4-grams}, achieved state-of-the-art performance, with an accuracy of 87% for gender identification and 65% for age-group identification. The joint accuracy is 57%. Interestingly, our cross-validation results are very close to the results provided by the organizers, which shows the consistency of the dataset and of the method used in the shared task. A detailed error analysis can be carried out in the near future to further improve the accuracy of age-group detection. Preprocessing specific to author profiling can be incorporated. Linguistic and behavioral features, as well as statistical test features, can be added to improve the performance of age-group identification. The role of stop words and the identification of specific words tied to gender and age-group can be studied. Finally, character-based embeddings with deep neural networks can also be explored for large-scale author profiling.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>We would like to thank MAPonSMS organizers and Forum for Information
Retrieval Evaluation-FIRE 2018 for organizing the Author profiling task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Oren</given-names>
            <surname>Halvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lukas</given-names>
            <surname>Graner</surname>
          </string-name>
          .
          <source>Authorship Verification based on Compression-Models. arXiv:1706.00516 [cs.ir], 1 June 2017</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Tschuggnall</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gunther</given-names>
            <surname>Specht</surname>
          </string-name>
          .
          <article-title>Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors</article-title>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Francisco</given-names>
            <surname>Rangel</surname>
          </string-name>
          , Paolo Rosso.
          <article-title>On the impact of emotions on author profiling</article-title>
          .
          <source>Information Processing and Management</source>
          <volume>52</volume>
          (
          <year>2016</year>
          )
          <fpage>73</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Monika</given-names>
            <surname>Briediene</surname>
          </string-name>
          , Jurgita Kapociute Dzikiene.
          <article-title>An Automatic author profiling from Non-Normative Lithuanian Texts</article-title>
          .
          <source>CEUR-WS.org\</source>
          vol-
          <volume>2145</volume>
          \
          <fpage>p18</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Mehwish</given-names>
            <surname>Fatima</surname>
          </string-name>
          , Komal Hasan, Saba Anwar, Rao Muhammad Adeel Nawab (
          <year>2017</year>
          ),
          <article-title>"Multilingual author profiling on Facebook"</article-title>
          ,
          <source>Information Processing &amp; Management, Elsevier</source>
          , pp:
          <fpage>886</fpage>
          -
          <lpage>904</lpage>
          , Vol:
          <volume>53</volume>
          , Issue: 4, Standard:
          <fpage>0306</fpage>
          -
          <lpage>4573</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Seifeddine</given-names>
            <surname>Mechti</surname>
          </string-name>
          , Maher Jaoua , Lamia Hadrich Belguith , and
          <string-name>
            <given-names>Rim</given-names>
            <surname>Faiz</surname>
          </string-name>
          .
          <article-title>Author Profiling Using Style-based Features Notebook for PAN at CLEF</article-title>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Verhoeven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Walter</given-names>
            <surname>Daelemans</surname>
          </string-name>
          .
          <article-title>Discourse lexicon induction for multiple languages and its use for gender profiling</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Francisco Rangel , Paolo Rosso .
          <article-title>On the Identification of Emotions and Authors' Gender in Facebook Comments on the Basis of their Writing Style</article-title>
          . http://www.uniweimar.de/medien/webis/research/events/pan-13/pan13-web/author-profiling.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>B.</given-names>
            <surname>Sri Nandhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. I.</given-names>
            <surname>Sheeba</surname>
          </string-name>
          .
          <article-title>Online Social Network Bullying Detection Using Intelligence Techniques</article-title>
          .
          <source>International Conference on Advanced Computing Technologies and Applications</source>
          (ICACTA-
          <year>2015</year>
          ).
          <source>Procedia Computer Science</source>
          <volume>45</volume>
          (
          <year>2015</year>
          )
          <fpage>485</fpage>
          -
          <lpage>492</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Vineetha Rebecca Chacko,
          <string-name>
            <surname>Anand Kumar</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soman</surname>
            <given-names>K P</given-names>
          </string-name>
          ,
          <article-title>"Experimental Study Of Gender And Language Variety Identification in Social Media"</article-title>
          <source>In: Proceedings of the Second International Conference on Big Data and Cloud Computing, (Springer) Advances in Intelligent Systems and Computing (AISC)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Maarten</given-names>
            <surname>Sap</surname>
          </string-name>
          et al.
          <article-title>Developing Age and Gender Predictive Lexica over Social Media</article-title>
          .
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Francisco Rangel , Paolo Rosso , Martin Potthast ,
          <string-name>
            <given-names>Benno</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <source>Overview of the 5th Author Profiling Task at PAN</source>
          <year>2017</year>
          :
          <article-title>Gender and Language Variety Identification in Twitter</article-title>
          , http://webis.de/research/events/pan-13/pan13-web/author-profiling.html
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Vasiliki</given-names>
            <surname>Simaki</surname>
          </string-name>
          , Iosif Mporas,
          <string-name>
            <given-names>Vasileios</given-names>
            <surname>Megalooikonomou</surname>
          </string-name>
          .
          <article-title>Evaluation and Sociolinguistic Analysis of Text Features for Gender and Age Identification</article-title>
          .
          <source>American Journal of Engineering and Applied Sciences</source>
          <year>2016</year>
          ,
          <volume>9</volume>
          (
          <issue>4</issue>
          ):
          <fpage>868</fpage>
          -
          <lpage>876</lpage>
          . DOI: 10.3844/ajeassp.2016.868.876.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          Mohammed Ali Al-garadi, Kasturi Dewi Varathan, Sri Devi Ravana.
          <article-title>Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network</article-title>
          .
          <source>Computers in Human Behavior</source>
          . Volume
          <volume>63</volume>
          ,
          <year>October 2016</year>
          ,pages
          <fpage>433</fpage>
          -
          <lpage>443</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Anand Kumar</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barathi Ganesh</surname>
            <given-names>HB</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Shivkaran</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <surname>Soman KP</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <article-title>Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification</article-title>
          .
          <source>In Proc. of Forum for Information Retrieval Evaluation</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>