<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bits_Pilani@INLI-FIRE-2017:Indian Native Language Identification using Deep Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rupal Bhargava</string-name>
          <email>rupal.bhargava@pilani.bits-pilani.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaspreet Singh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shivangi Arora</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yashvardhan Sharma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Birla Institute of Technology and Science</institution>
          ,
          <addr-line>Pilani Campus, Pilani</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <kwd-group>
        <kwd>Native Language Identification</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Neural Network</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
      <fpage>4</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>The task of Native Language Identification involves identifying the first or native language of a user based on their writing style and/or analysis of speech and phonetics in a second language. There is a surplus of such data on social media sites, as well as organized datasets from bodies like the Educational Testing Service (ETS), which can be exploited to develop language-learning systems and to support forensic linguistics. In this paper we propose a deep neural network for this task that uses a hierarchical paragraph encoder with an attention mechanism to identify relevant features in the tendencies and errors a user exhibits in the second language, applied to the INLI task at FIRE 2017. The task involves six Indian languages as the native set and English as the second language, with data collected from users' social media accounts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Natural language
processing; Machine learning; • Information systems → Information
systems applications;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Native Language Identification (NLI) is the task of identifying the
native language of a user from their usage of a second language,
with the help of a computer program. The task is usually modelled
as a classification problem in which a machine learning algorithm is
trained in a supervised fashion and then used to predict
the native language of the author of a text.</p>
      <p>NLI rests on the fact that users' linguistic
background leads them to use particular phrases and styles
more often in their newly acquired languages. Despite the
increasing research in this field, there is a lack of NLI datasets covering
a wide span of languages, and existing ones are still fairly small.</p>
      <p>
        NLI is a non-trivial and challenging problem, resting on the assumption
that the native or first language influences Second Language
Acquisition (SLA) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. If machines could learn the tendencies and mistakes
that language learners make, it would help in the development of
education systems for learning and acquiring new languages. It
would also help educators develop techniques for teaching
difficult aspects of a second language based on the learner's first language
and its language-transfer patterns [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ]. NLI is closely related
to the task of authorship profiling, which aims to extract
information such as the age, gender and native origin of an author solely from
text, and which is useful for forensic linguistics. NLI can also be used to
improve the performance of automatic speech recognition (ASR)
for non-native speakers, using speech and phonetic features for the
task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        NLI is gaining popularity as an artificial intelligence challenge,
as can be seen from its inclusion in several shared tasks at various
events in recent years [
        <xref ref-type="bibr" rid="ref13 ref16 ref18">13, 16, 18</xref>
        ]. Models typically try to
extract patterns that speakers with different native languages
exhibit, in terms of different topic biases, misspellings,
mispronunciations, or usage frequencies of particular words. Some languages
also have specific linguistic styles; Japanese, for instance, is much more formal
in nature. Malmasi and Dras [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have extensively tested a series of linear
classifiers and observed that state-of-the-art results are achieved by
an ensemble model. The features they used are simple unigrams,
bigrams, and character n-grams, further including function words,
POS-tagged n-grams, and sentence dependencies to improve the
results. Character-level features generally outperform word-level
features for NLI. Stehwien and Padó [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] also analyze the
performance of SVMs on this task and use their results to identify
key features of the datasets. SVMs tend to outperform neural
networks when examined on this task with a similar
dataset by Malmasi et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Deep neural networks have not been
used much for the NLI task; even in the 2013 shared task there was no
deep neural network submission [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Previous approaches depended
upon features like the grammatical structure of the language,
string kernels [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and syntactic features [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The following sections
describe the design and performance of our model.
      </p>
    </sec>
    <sec id="sec-4">
      <title>DATA ANALYSIS</title>
      <p>
        The dataset provided by the task organizers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] contains information
collected from English speakers with six different native Indian languages.
It includes 1,233 texts written by the different speakers on social
media websites. All the data is in romanised script. The
distribution of classes and training instances can be seen in Table 1.
      </p>
    </sec>
    <sec id="sec-5">
      <title>PROPOSED TECHNIQUE</title>
      <p>
        We model the task as a text classification problem and
solve it using a hierarchical encoder [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]; the work therefore consists of
pre-processing the text for input to the network,
generating word embeddings, and designing the neural network model.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Pre-Processing</title>
      <p>
        The data is tokenized and capitalization is removed. English
stop words have not been removed, and punctuation
marks are also retained, as they may carry useful information for
classifying the native language: function words such as ‘which’, ‘the’, and
‘at’ have proven useful for distinguishing native languages [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Fixed-length
sentence runs are formed by delimiting on full stop, comma, and
semicolon, and are padded with zeros to keep 128 as the fixed
input length to the network.
      </p>
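      <p>A minimal sketch of this step follows (the tokenizer, the treatment of delimiters, and the id conventions are our assumptions, not the exact script used):</p>
      <preformat>
import re

MAX_LEN = 128        # fixed input length to the network
PAD_ID, UNK_ID = 0, 1

def preprocess(text, vocab):
    """Split a text into fixed-length, zero-padded sentence runs."""
    # Lowercase, then delimit runs on full stop, comma and semicolon.
    runs = re.split(r"[.,;]", text.lower())
    encoded = []
    for run in runs:
        tokens = run.split()          # simple whitespace tokenization
        if not tokens:
            continue
        ids = [vocab.get(tok, UNK_ID) for tok in tokens][:MAX_LEN]
        ids += [PAD_ID] * (MAX_LEN - len(ids))   # zero-pad to MAX_LEN
        encoded.append(ids)
    return encoded
      </preformat>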
    </sec>
    <sec id="sec-7">
      <title>Word Embeddings</title>
      <p>
        In each of the three runs we used a different approach
for generating the word vectors. The combined training and testing
data contains around 23,000 unique tokens in Roman script, and
includes slang and transliterated native-language words. The
different embedding inputs we tested are described below.
4.2.1 Pre-trained vectors. We used the Google News embeddings
produced by the word2vec [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] model, which have a vocabulary
of 3 million words and phrases and a dimensionality of 300, for Run
1. For Run 2 these were further trained by online learning jointly
over the training and testing corpus, giving the
embeddings additional context for the text to be handled.
We expect most pretrained word vectors to miss
large parts of our vocabulary, and even online learning is not enough
to capture context from such a small corpus.
      </p>
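      <p>As a hedged illustration of Runs 1 and 2 (the gensim 3.x API is assumed; the file name and hyperparameters such as the number of epochs are our choices, not the paper's):</p>
      <preformat>
from gensim.models import KeyedVectors, Word2Vec

# Run 1: pre-trained Google News vectors (300-d), used as-is.
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Run 2: continue training over the task corpus (online learning),
# warm-started from the pre-trained vectors.
sentences = [["this", "is", "an", "example", "run"]]  # in practice:
                                                      # tokenized train+test runs
model = Word2Vec(size=300, min_count=1)
model.build_vocab(sentences)
model.intersect_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True, lockf=1.0)
model.train(sentences, total_examples=len(sentences), epochs=5)
      </preformat>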
      <p>4.2.2 Randomly initialized word vectors. Due to the
shortcomings of pre-trained vectors pointed out above, we also used randomly
initialized vectors, again of dimension 300, as the embeddings for Run 3,
and trained these embeddings through backpropagation during
model fitting. While this can build embeddings even more
effectively, the model requires more data to train and risks
over-fitting.</p>
    </sec>
    <sec id="sec-8">
      <title>Classification</title>
      <p>
        The most intuitive design for a text-classifying neural network
is a recurrent architecture, owing to its retention of longer-term
dependencies; a bi-directional one can also capture context in
reverse order. We used GRU [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] cells instead of LSTM cells, as they
give comparable performance with less training. The first two runs used
networks of similar depth: the bi-directional layer has 256 GRU units
for sentence encoding and 128 GRU units for paragraph encoding.
For the randomly initialized embeddings, which had to be trained, the
model was kept shallow, with half the number of GRU cells in
both encoders. Both encoders have an attention layer added
after the recurrent units, which helps the model weight the words and
sentences that most effectively classify the text. For the paragraph encoder, the
final hidden state of the attention layer was fed to a fully connected
softmax layer, which returned a probability distribution over the six classes.
      </p>
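      <p>To make the architecture concrete, here is a minimal sketch in Keras (TensorFlow 2). The maximum sentence count per text, the initializers, optimizer, and loss are our assumptions, masking of padded positions is omitted, and the unit counts follow the description above:</p>
      <preformat>
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_WORDS, MAX_SENTS = 128, 30   # MAX_SENTS is an assumed cap per text
VOCAB, EMB_DIM, CLASSES = 23000, 300, 6

class Attention(layers.Layer):
    """Additive attention pooling, as in Yang et al. [19]."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(d, d),
                                 initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(d,), initializer="zeros")
        self.u = self.add_weight(name="u", shape=(d,),
                                 initializer="glorot_uniform")

    def call(self, h):
        # u_it = tanh(W h_it + b); alpha_it = softmax(u_it . u); sum over t
        uit = tf.tanh(tf.tensordot(h, self.W, axes=1) + self.b)
        alpha = tf.nn.softmax(tf.tensordot(uit, self.u, axes=1), axis=1)
        return tf.reduce_sum(h * tf.expand_dims(alpha, -1), axis=1)

# Word-level (sentence) encoder: bi-directional GRU with 256 units.
word_in = layers.Input(shape=(MAX_WORDS,), dtype="int32")
x = layers.Embedding(VOCAB, EMB_DIM)(word_in)
x = layers.Bidirectional(layers.GRU(256, return_sequences=True))(x)
sent_encoder = Model(word_in, Attention()(x))

# Sentence-level (paragraph) encoder: bi-directional GRU with 128 units,
# attention, then a fully connected softmax over the six classes.
doc_in = layers.Input(shape=(MAX_SENTS, MAX_WORDS), dtype="int32")
s = layers.TimeDistributed(sent_encoder)(doc_in)
s = layers.Bidirectional(layers.GRU(128, return_sequences=True))(s)
out = layers.Dense(CLASSES, activation="softmax")(Attention()(s))

model = Model(doc_in, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
      </preformat>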
    </sec>
    <sec id="sec-9">
      <title>ALGORITHM</title>
      <p>
        This project deals with classifying social media text by the
native language of its user. The basic assumption is that a text has
K sentences s_i and each sentence s_i contains T_i words, where w_it with t ∈
[1, T_i] denotes the t-th word of the i-th sentence. The model encodes
the raw text into a vector representation, which is passed to a neural
network to perform text classification. Below we describe how the
text-level vector is built from word vectors using two levels
of encoding [
        <xref ref-type="bibr" rid="ref15 ref8">8, 15</xref>
        ], as represented in Figure 1.
      </p>
    </sec>
    <sec id="sec-9a">
      <title>Word Level Layers</title>
      <p>
        5.1.1 Word Encoder. It takes a sentence as input: if a sentence
has words w_it with t ∈ [1, T_i], we first convert the words to
vectors using the embedding matrix created above. We use a
bidirectional Gated Recurrent Unit [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to get a representation h_it of each word, which contains the
information of the whole sentence centred on w_it from both
directions. (Figure 1 is reproduced from Hierarchical Attention Networks
for Document Classification, Yang et al. (2016), under the Creative
Commons Attribution-NonCommercial-ShareAlike 3.0 License.)
      </p>
      <p>
        5.1.2 Word Attention. Next, the hidden
representations computed above are passed to a word attention layer [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], since some words
are more important than others for representing the sentence meaning.
For this we pass the word annotation h_it through a one-layer
multilayer perceptron to get a hidden representation u_it; the
importance weight of each word is then computed from the similarity of u_it with
a word-level context vector u_w through a softmax function. After that, the
sentence vector s_i is computed as the weighted sum of the word
annotations.
      </p>
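      <p>Concretely, following Yang et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the word attention computes u_it = tanh(W_w h_it + b_w), weights α_it = exp(u_itᵀ u_w) / Σ_t exp(u_itᵀ u_w), and the sentence vector s_i = Σ_t α_it h_it, where the projection W_w, the bias b_w, and the context vector u_w are learnt jointly with the rest of the network.</p>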
    </sec>
    <sec id="sec-10">
      <title>Sentence Level Layer</title>
      <p>5.2.1 Sentence Encoder. The sentence vectors s_i generated by the
word encoder are passed to the sentence encoder, a bidirectional Gated
Recurrent Unit whose unit count was described in Section 4.3; it
returns an analogous hidden representation h_i, this
time with paragraph-level context.</p>
      <p>5.2.2 Sentence Attention. To weight the sentences that are more
relevant for classification, we use an attention layer and introduce a
sentence-level context vector u_s, again using the softmax function
for the similarity calculation. The text vector v is the weighted sum of the
encoded sentences; this vector v is passed to a fully connected
softmax layer to generate the class probabilities.</p>
    </sec>
    <sec id="sec-11">
      <title>EXPERIMENTS &amp; RESULTS</title>
      <p>We used an 80-20% split of the training data for validation. All
three models had a training accuracy of nearly 95% and validation
accuracies in the range of 60%-70%. Below is the confusion matrix
of the second run, the best of the three over the validation split.</p>
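      <p>Under the same assumptions as the architecture sketch above (random stand-in tensors; the stratified split helper, batch size, and epochs are our choices, not the paper's), the split and training call might look like:</p>
      <preformat>
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data; in practice X holds padded token ids from pre-processing
# and y holds one-hot native-language labels for the 1,233 texts.
X = np.random.randint(0, 23000, size=(1233, 30, 128))
y = np.eye(6)[np.random.randint(0, 6, size=1233)]

# 80-20% train/validation split, stratified over the six classes.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y.argmax(axis=1), random_state=42)

model.fit(X_train, y_train, validation_data=(X_val, y_val),
          batch_size=32, epochs=10)   # 'model' from the earlier sketch
      </preformat>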
      <p>
        As per the results published by the task organizers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the highest
accuracy of each team is shown in Figure 3. As can be seen,
despite good training and validation accuracies, the model did not
generalize sufficiently, possibly suffering from
low-context embeddings and vocabulary shortage.
      </p>
    </sec>
    <sec id="sec-12">
      <title>Error Analysis</title>
      <p>This paper aims to develop a system that performs effective
native language identification without the use of grammatical and
structural features of languages. Although the model fits the training
data well, it fails to capture generalized features and performs poorly on the
test set. The major issue is the lack of ample data for effectively training
a deep learning model. Another issue for such a word-level
model is the large number of slang and transliterated words present
in the data, whose context is not effectively captured by word
embeddings on such a small training corpus; moreover, the model
suffers from vocabulary shortage on the test data, which affects
classification performance.</p>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSION &amp; FUTURE WORK</title>
      <p>In this paper, we have outlined a native language identification
approach for Indian languages based on a hierarchical deep neural
network, describing our system for the INLI Task at FIRE 2017.
Although deep neural networks are able to learn features for this
task, traditional methods still perform better with current datasets
and models. In future we plan to continue working on this problem
and to develop a hybrid system that combines traditional approaches
of POS-tagged n-grams and sentence dependencies with deep
learning models.</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGMENTS</title>
      <p>We are thankful to the Department of Computer
Science at the Birla Institute of Technology and Science, Pilani, for
providing the infrastructure facilities used in
this research project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Julian</given-names>
            <surname>Brooke</surname>
          </string-name>
          and
          <string-name>
            <given-names>Graeme</given-names>
            <surname>Hirst</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Measuring Interlanguage: Native Language Identification with L1-influence Metrics</article-title>
          .
          <source>In LREC</source>
          .
          <fpage>779</fpage>
          -
          <lpage>784</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Serhiy</given-names>
            <surname>Bykh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Detmar</given-names>
            <surname>Meurers</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Exploring Syntactic Features for Native Language Identification: A Variationist Perspective on Feature Encoding and Ensemble Optimization</article-title>
          .
          <source>In COLING</source>
          .
          <fpage>1962</fpage>
          -
          <lpage>1973</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Radu Tudor</given-names>
            <surname>Ionescu</surname>
          </string-name>
          , Marius Popescu, and
          <string-name>
            <given-names>Aoife</given-names>
            <surname>Cahill</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>String kernels for native language identification: insights from behind the curtains</article-title>
          .
          <source>Computational Linguistics</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Scott</given-names>
            <surname>Jarvis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Scott A</given-names>
            <surname>Crossley</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Approaching Language Transfer Through Text Classification: Explorations in the Detectionbased Approach</article-title>
          . Vol.
          <volume>64</volume>
          . Multilingual Matters.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Moshe</given-names>
            <surname>Koppel</surname>
          </string-name>
          , Jonathan Schler, and
          <string-name>
            <given-names>Kfir</given-names>
            <surname>Zigdon</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Automatically determining an anonymous author's native language</article-title>
          .
          <source>Intelligence and Security Informatics</source>
          (
          <year>2005</year>
          ),
          <fpage>41</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Lado</surname>
          </string-name>
          .
          <year>1957</year>
          .
          <article-title>Linguistics Across Cultures: Applied Linguistics for Language Teachers</article-title>
          . (
          <year>1957</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jiwei</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Minh-Thang</given-names>
            <surname>Luong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A hierarchical neural autoencoder for paragraphs and documents</article-title>
          .
          <source>arXiv preprint arXiv:1506.01057</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Minh-Thang</given-names>
            <surname>Luong</surname>
          </string-name>
          , Hieu Pham, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Effective approaches to attention-based neural machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1508.04025</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Anand Kumar</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barathi Ganesh</surname>
            <given-names>HB</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shivkaran</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sonam</surname>
            <given-names>K P</given-names>
          </string-name>
          , and Paolo Rosso.
          <year>2017</year>
          .
          <article-title>Overview of the INLI PAN at FIRE-2017 Track on Indian Native Language Identification</article-title>
          .
          <source>In Notebook Papers of FIRE</source>
          <year>2017</year>
          , FIRE-2017, Bangalore, India, December 8-10. CEUR Workshop Proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Shervin</given-names>
            <surname>Malmasi</surname>
          </string-name>
          et al.
          <year>2016</year>
          .
          <article-title>Native language identification: explorations and applications</article-title>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Shervin</given-names>
            <surname>Malmasi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dras</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Native Language Identification using Stacked Generalization</article-title>
          .
          <source>arXiv preprint arXiv:1703.06541</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Shervin</given-names>
            <surname>Malmasi</surname>
          </string-name>
          , Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, and
          <string-name>
            <given-names>Yao</given-names>
            <surname>Qian</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A Report on the 2017 Native Language Identification Shared Task</article-title>
          .
          <source>In Proceedings of the 12th Workshop on Building Educational Applications Using NLP. Association for Computational Linguistics</source>
          , Copenhagen, Denmark.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Nikolaos</given-names>
            <surname>Pappas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrei</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Multilingual hierarchical attention networks for document classification</article-title>
          .
          <source>arXiv preprint arXiv:1707.00896</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Björn W</given-names>
            <surname>Schuller</surname>
          </string-name>
          , Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K Burgoon, Alice Baird, Aaron C Elkins, Yue Zhang, Eduardo Coutinho, and
          <string-name>
            <given-names>Keelan</given-names>
            <surname>Evanini</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity &amp; Native Language</article-title>
          .
          <source>In INTERSPEECH</source>
          .
          <fpage>2001</fpage>
          -
          <lpage>2005</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Sabrina</given-names>
            <surname>Stehwien</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Padó</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Generalization in Native Language Identification: Learners versus Scientists</article-title>
          .
          <source>CLiC-it</source>
          (
          <year>2015</year>
          ),
          <fpage>264</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Joel R</given-names>
            <surname>Tetreault</surname>
          </string-name>
          , Daniel Blanchard, and
          <string-name>
            <given-names>Aoife</given-names>
            <surname>Cahill</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Report on the First Native Language Identification Shared Task</article-title>
          .
          <source>In BEA@ NAACL-HLT</source>
          .
          <fpage>48</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Zichao</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Diyi</given-names>
            <surname>Yang</surname>
          </string-name>
          , Chris Dyer, Xiaodong He,
          <string-name>
            <surname>Alexander J Smola</surname>
          </string-name>
          , and
          <string-name>
            <surname>Eduard H Hovy</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Hierarchical Attention Networks for Document Classification</article-title>
          .
          <source>In HLT-NAACL</source>
          .
          <fpage>1480</fpage>
          -
          <lpage>1489</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>