<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>idrbt-team-a@IECSIL-FIRE-2018: Named Entity Recognition of Indian languages using Bi-LSTM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Nagesh Bhattu</string-name>
          <email>nageshbhattu@nitandhra.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N. Satya Krishna</string-name>
          <email>satya.krishna.nunna@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. V. L. N. Somayajulu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDRBT</institution>
          ,
          <addr-line>Road No.1 Castle Hills, Masab Tank, Hyderabad, Telangana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NIT Tadepalligudem</institution>
          ,
          <addr-line>West Godavari District, Andhra Pradesh</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NIT Warangal</institution>
          ,
          <addr-line>Telangana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Named entity recognition (NER) is a key task in the NLP pipeline, useful for applications such as search engines, question answering systems and sentiment analysis, in domains ranging from travel and bio-medical text to newswire and financial text. NER is effectively solved using sequence labeling approaches such as HMM and CRF. Although CRF, being discriminative, performs better than HMM, it uses discrete features and does not naturally capture semantic features. LSTM based RNNs can address this through their ability to deal with continuous valued features such as Word2Vec and GloVe. Another advantage of LSTM lies in its ability to capture long and short range dependencies through its gating structure. This work presents deep learning based NER using a special type of Recurrent Neural Network (RNN) called Bi-directional Long Short-Term Memory (Bi-LSTM). We use a two stage LSTM based network, with one stage acting at the character level to capture the n-gram patterns related to NER. Such features are crucial in NER for Indian languages, as suffixes in Indian languages often carry syntactic information. The character based embeddings, word2vec embeddings and sequence based Bi-LSTM embeddings together carry the features required for the NER prediction problem. We present experimental results on two test datasets for each of five Indian languages: Hindi, Kannada, Malayalam, Tamil and Telugu. The accuracies on the test-1 datasets of Hindi, Kannada, Malayalam, Tamil and Telugu are 97.82%, 97.04%, 97.46%, 97.41% and 97.54% respectively. These are the highest accuracies among all models presented by competitors in this shared task [2]. The accuracies on the test-2 datasets of Hindi, Kannada, Malayalam, Tamil and Telugu are 97.82%, 96.79%, 96.58%, 96.18% and 97.68% respectively.
On the test-2 datasets this model took first position for Hindi and second position for the remaining four languages. The shared task organizers released F-scores for the test-2 datasets of all languages. This model obtained F-scores of 94.0%, 84.55%, 84.78%, 89.55% and 91.44% on Hindi, Kannada, Malayalam, Tamil and Telugu respectively. All these F-scores are in second position compared with the other models. The overall average accuracy and F-score of this model over the five Indian languages are 97.01% and 86.99%, both in second position.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Named Entity Recognition (NER) is one of the major subtasks in the field of
Information Extraction (IE). The objective of NER is to identify the entities in
given unstructured text and classify them into predefined classes: person,
location, event, number, organization, occupation, datenum, name and other.
For example, consider the sentences shown in Table 1, in which each word is
annotated with a named entity tag.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <tbody>
            <tr><th>Sentence-1</th><td>john</td><td>working</td><td>as</td><td>an</td><td>assistant</td><td>professor</td><td>in</td><td>IITD</td></tr>
            <tr><th>NE tags</th><td>Person</td><td>other</td><td>other</td><td>other</td><td>occupation</td><td>occupation</td><td>other</td><td>organization</td></tr>
            <tr><th>Sentence-2</th><td>Columbus</td><td>discovered</td><td>America</td><td>in</td><td>1492</td><td /><td /><td /></tr>
            <tr><th>NE tags</th><td>Person</td><td>event</td><td>location</td><td>other</td><td>datenum</td><td /><td /><td /></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Identifying the entities in unstructured text solves problems in many different
applications. For example, news publishers can manage the large volume of data
created and generated on a daily basis by classifying it by major places,
events, organizations and people. NER can also be used in customer support
systems for processing customer feedback, and to improve the performance of
search algorithms on text data.</p>
      <p>
        NER has been an active area of work for three decades [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In the early days,
most of the work was done on English text using hand-crafted rule-based
techniques. Later, algorithms were developed using machine learning techniques
such as HMM and CRF. Unlike HMM, CRF is a discriminative model, and
it tends to perform better owing to the generalizability of log-linear models
based on maximum entropy.
      </p>
      <p>
        The first system development competition for the NER task was
introduced in 1995 by the 6th Message Understanding Conference (MUC-VI) on news article
data. Later, shared task events were conducted for NER in different
languages. CoNLL-2002 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and CoNLL-2003 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] conducted two shared
tasks with the same name, Language-Independent Named Entity Recognition. They
provided datasets annotated with four named entity types: person,
location, organization and miscellaneous name. IJCNLP-2008 conducted a shared task on five
South and South East Asian languages: Hindi, Bengali, Oriya, Telugu and Urdu.
      </p>
      <p>
        The CoNLL-2002 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] shared task organizers provided datasets in the form of
train, development and test data for two languages, Spanish and Dutch.
CoNLL-2003 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] organized the competition task on NER for English and German.
For each language they provided the dataset in four files: training,
development, test and unlabeled corpus files. The English
dataset was prepared from the Reuters Corpus<sup>4</sup>, which contains news
articles from a one-year period, mid-1996 to mid-1997. The
German dataset was prepared from the ECI Multilingual Text
Corpus<sup>5</sup>, which contains news articles from German newspapers.
      </p>
      <p>
        Black et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] presented two different approaches. One is modified
Transformation-based Learning (TBL) integrated with a named entity classifier, and
the other is a decision tree induction based approach. They reported F1-scores of
67.49 and 56.43 on the Spanish and Dutch test datasets respectively.
McNamee et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] applied a one-vs-rest multiclass classifier for NER using eight
linear kernel binary SVM classifiers. Cucerzan et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a statistical
language independent named entity recognizer using word internal
information (i.e. the current word and its prefix and suffix) and contextual
information (i.e. the previous and next words of the current word) extracted from the
given annotated training data.
      </p>
      <p>In this task we applied a deep learning based, language independent NER
system, built by integrating a Convolutional Neural Network (CNN)
followed by a Bi-directional Long Short-Term Memory (Bi-LSTM) network. This system
uses character and word level information extracted from an
unannotated corpus. We use this information in the form of two real valued vectors (aka
word embeddings) as input features for NER to classify the entities in a given
text. We evaluated our model on five datasets of different Indian languages:
Hindi, Kannada, Malayalam, Tamil and Telugu. In the results section, we
present the performance of our system on two test datasets from each
language.</p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>
        NER is a sequence labeling task, in which we predict a label for each word
in a sequence of words. Applying traditional machine learning methods to text
classification requires considerable feature engineering. Although
rule-based NER performs well, it requires many language dependent hand-crafted rules for
classification. To reduce the feature engineering overhead, our approach
applies deep learning based methods. As shown in Figure 1, we first build
the feature vectors from two sources of information, namely word level and
character level. Pre-trained word feature vectors<sup>6</sup> are used in our approach. The
word feature vectors for new words from the given dataset (those without pre-trained vectors)
are built using the model. Character-to-word feature
vectors are built using a Bi-LSTM to capture the character level information from
the character sequence of each word in the dataset. We replace each word in an
input sentence of the Bi-LSTM with a word embedding created by concatenating
the feature vectors corresponding to that word. We then predict the label
sequence for each sentence using these word embeddings.
      </p>
      <p><sup>4</sup> http://reuters.com/researchandstandards/</p>
      <p><sup>5</sup> http://www.idc.upenn.edu</p>
      <p><sup>6</sup> https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md</p>
      <p>In this section we describe the computation of word embeddings, the experimental
details of the approach described in the previous section, and the results on ten
test sets (two test sets from each language dataset).</p>
      <p>
        This model uses a word embedding as the input feature in place of each word in
the given input sentence. These word embeddings are created by combining the
word feature vector and the char-to-word feature vector. The word feature vectors
are built using the skip-gram model as defined in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the skip-gram model, the word
feature vector V<sub>w<sub>i</sub></sub> is learned on a large training corpus from the statistics
of occurrence of the context words of the word w<sub>i</sub> throughout all sentences
in the training data. For a large corpus represented as a sequence of words
w<sub>0</sub>, w<sub>1</sub>, w<sub>2</sub>, w<sub>3</sub>, ..., w<sub>L</sub>, it computes the log-likelihood
using equation 1.
Here, c indicates the context window size, w<sub>i</sub> is the current word in the sequence
and w<sub>i+j</sub> is a context word of w<sub>i</sub> at distance j from its position within the context
window c. p(w<sub>i+j</sub> | w<sub>i</sub>) is the probability of the context word
w<sub>i+j</sub> occurring given w<sub>i</sub>. We compute this value using the softmax function given in
equation 2. Here, v<sub>o</sub> is the output feature vector corresponding to w<sub>o</sub>, and v<sub>i</sub> is the feature
vector corresponding to the input word w<sub>i</sub>. V is the size of the vocabulary in the given
corpus.
      </p>
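      <p>
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>p(w_{i+j} \mid w_i) = p(w_o \mid w_i) = \frac{\exp(v_o^{T} v_i)}{\sum_{u=1}^{V} \exp(v_u^{T} v_i)}</tex-math>
        </disp-formula>
      </p>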
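      <p>As a rough illustration of the softmax in equation 2 and of the log-likelihood it feeds into, the following Python sketch computes p(w_o | w_i) over a set of output vectors and sums the log-probabilities over a context window of size c. The function names, the toy vectors, the max-subtraction for numerical stability and the unnormalized sum over positions are assumptions made for this illustration; they are not taken from the authors' implementation.</p>
      <preformat><![CDATA[
```python
import math


def softmax_prob(v_in, out_vectors, o):
    """p(w_o | w_i) = exp(v_o . v_i) / sum_u exp(v_u . v_i)."""
    scores = [sum(a * b for a, b in zip(v_u, v_in)) for v_u in out_vectors]
    m = max(scores)  # subtracting the max leaves the ratio unchanged
    exps = [math.exp(s - m) for s in scores]
    return exps[o] / sum(exps)


def window_log_likelihood(word_ids, in_vectors, out_vectors, c):
    """Sum of log p(w_{i+j} | w_i) over positions i and offsets 0 < |j| <= c."""
    total = 0.0
    for i, w_i in enumerate(word_ids):
        for j in range(-c, c + 1):
            k = i + j
            if j == 0 or k < 0 or k >= len(word_ids):
                continue  # skip the word itself and out-of-range positions
            total += math.log(softmax_prob(in_vectors[w_i], out_vectors, word_ids[k]))
    return total


# Hypothetical toy vocabulary of V = 3 words with 2-dimensional vectors.
toy_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_out = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
print(softmax_prob(toy_in[0], toy_out, 2))
print(window_log_likelihood([0, 1, 2], toy_in, toy_out, 1))
```
]]></preformat>
      <p>With all-zero vectors every word is equally likely, so each probability in this sketch reduces to 1/V.</p>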
      <sec id="sec-2-1">
        <title>Problem statement and Bi-LSTM model</title>
        <p>Since we treat NER as a sequence labeling task, we represent each
input sentence as a sequence of words w<sub>0</sub>, w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub> and its corresponding
output label sequence as t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>n</sub>. Instead of feeding the word sequence
directly to the Bi-LSTM model, we replace each word in the input sequence with its
corresponding word embedding and then feed it to the Bi-LSTM
model. The word embedding in our experiment has 450 dimensions (i.e. 300
word vector dimensions + 150 char-to-word vector dimensions). We
implemented the Bi-LSTM cell using two Long Short-Term Memory (LSTM) cells.
One LSTM cell scans the input vector sequence in the forward direction and the other
scans it in the backward direction. Several
versions of the LSTM cell have been defined. Among these, we applied the
LSTM cell defined in equation 3.</p>
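        <p>
          <disp-formula id="eq-3a"><label>(3a)</label><tex-math>x_t = \sigma(X \cdot [u_t; h_{t-1}] \oplus b_x)</tex-math></disp-formula>
          <disp-formula id="eq-3b"><label>(3b)</label><tex-math>y_t = \sigma(Y \cdot [u_t; h_{t-1}] \oplus b_y)</tex-math></disp-formula>
          <disp-formula id="eq-3c"><label>(3c)</label><tex-math>o_t = \sigma(O \cdot [u_t; h_{t-1}] \oplus b_o)</tex-math></disp-formula>
          <disp-formula id="eq-3d"><label>(3d)</label><tex-math>\tilde{c}_t = \tanh(S \cdot [u_t; h_{t-1}] \oplus b_s)</tex-math></disp-formula>
          <disp-formula id="eq-3e"><label>(3e)</label><tex-math>s_t = x_t \odot \tilde{c}_t \oplus s_{t-1} \odot o_t</tex-math></disp-formula>
          <disp-formula id="eq-3f"><label>(3f)</label><tex-math>h_t = y_t \odot \tanh(s_t)</tex-math></disp-formula>
        </p>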
        <p>Here, u<sub>t</sub> denotes the input word embedding corresponding to the word
w<sub>t</sub> at position t of the given input sequence. x<sub>t</sub>, y<sub>t</sub> and o<sub>t</sub> denote the activations
of the input, output and forget layers for the word w<sub>t</sub>. h<sub>t</sub> denotes the hidden state vector and s<sub>t</sub> represents the LSTM cell
state vector. The weight matrices X and Y are used by the LSTM cell in its input
and output layers. The weight matrices O and S are used by the LSTM cell in
its forget and context layers respectively. In the above equations, ⊕ denotes
element-wise vector addition and ⊙ denotes the element-wise vector product
operation.</p>
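        <p>The cell update of equations 3a to 3f, and the forward and backward scans of the Bi-LSTM, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the gate nonlinearity is read as the logistic sigmoid, initial hidden and cell states are zero, and the forward and backward hidden states are concatenated per position. The helper names and the parameter dictionary layout are hypothetical and do not come from the authors' code.</p>
        <preformat><![CDATA[
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W . v + b, with the matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(u_t, h_prev, s_prev, p):
    """One cell update; p holds the matrices X, Y, O, S and biases bx, by, bo, bs."""
    z = u_t + h_prev  # list concatenation [u_t; h_{t-1}]
    x_t = [sigmoid(a) for a in affine(p["X"], z, p["bx"])]    # input layer (3a)
    y_t = [sigmoid(a) for a in affine(p["Y"], z, p["by"])]    # output layer (3b)
    o_t = [sigmoid(a) for a in affine(p["O"], z, p["bo"])]    # forget layer (3c)
    c_t = [math.tanh(a) for a in affine(p["S"], z, p["bs"])]  # context layer (3d)
    s_t = [x * c + s * o for x, c, s, o in zip(x_t, c_t, s_prev, o_t)]  # (3e)
    h_t = [y * math.tanh(s) for y, s in zip(y_t, s_t)]        # (3f)
    return h_t, s_t


def run_lstm(inputs, p, hidden):
    """Scan a sequence of input vectors, starting from zero states (assumption)."""
    h, s = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for u_t in inputs:
        h, s = lstm_step(u_t, h, s, p)
        outputs.append(h)
    return outputs


def bi_lstm(inputs, p_fwd, p_bwd, hidden):
    """Forward scan plus backward scan, concatenated per position (assumption)."""
    fwd = run_lstm(inputs, p_fwd, hidden)
    bwd = run_lstm(inputs[::-1], p_bwd, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```
]]></preformat>
        <p>In the setting described above, each input vector would be the 450-dimensional word embedding of one word, and each position of the output carries context from both directions of the sentence.</p>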
      </sec>
      <sec id="sec-2-2">
        <title>Dataset</title>
        <p>
          The organizers of the Arnekt-IECSIL@FIRE2018 shared task [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] provided five datasets [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
for competitors, one for each of five Indian languages: Hindi, Kannada, Malayalam,
Tamil and Telugu. Each language dataset has three different files with text in
the corresponding language script. One of these three files is the training file,
which holds data in two-column format, with the label of each word
of a sentence in the second column. Sentences are separated by a line labeled newline.
The remaining two are the test-1 and test-2 files, which have the same format as the training file
but without labels. Each test file holds 20% of the overall dataset. Except for the
Kannada dataset, the datasets have a sufficient number of sentences
and words for learning word feature vectors using deep learning methods. The
training file in each dataset has nine distinct labels: datenum, number,
person, occupation, organization, location, name, things and other. Table 2
gives the number of sentences, the number of
words and the number of unique words in each file of all datasets.
        </p>
        <p>As per the evaluation procedure given by the shared task organizers, our model is
evaluated using the following metrics.</p>
        <p>
          <disp-formula id="eq-4"><label>(4)</label><tex-math>\text{Accuracy} = \frac{\text{No. of words assigned the correct label}}{\text{No. of words in the dataset}}</tex-math></disp-formula>
          <disp-formula id="eq-5"><label>(5)</label><tex-math>\text{Precision}\,(P_i) = \frac{\text{No. of words correctly labeled with } label_i}{\text{No. of words labeled with } label_i}</tex-math></disp-formula>
          <disp-formula id="eq-6"><label>(6)</label><tex-math>\text{Recall}\,(R_i) = \frac{\text{No. of words correctly labeled with } label_i}{\text{Total no. of words with } label_i \text{ in test data}}</tex-math></disp-formula>
          <disp-formula id="eq-7"><label>(7)</label><tex-math>\text{F-score}\,(F_i) = \frac{2 \times P_i \times R_i}{P_i + R_i}</tex-math></disp-formula>
          <disp-formula id="eq-8"><label>(8)</label><tex-math>\text{Overall F-score}\,(F) = \frac{1}{|L|} \sum_{i \in L} F_i</tex-math></disp-formula>
        </p>
        <p>Table 3 summarizes the accuracy on test sets 1 and 2 of the five Indian language
datasets. In the Pre-Evaluation, this model showed the highest
performance on all language datasets compared to all other models presented in the competition. In the
Final-Evaluation this model took first position in Hindi, second position in
Malayalam, Tamil and Telugu, and third position in Kannada.
Table 4 summarizes the F1-scores of the Final-Evaluation. This model gave the
second highest F-score on the Kannada, Malayalam, Tamil and Telugu
datasets. The actual F-score for the Hindi dataset is nearly 94.00%, but
the F-score given in Table 4 for Hindi is 85.9%, according to the results
given by the shared task organizers. The lower F-score is caused by a typographical
mistake we made while converting the predicted label index to its
corresponding label name, datenum, in the results file. Because the F-score for the
datenum label is zero, the overall F-score of this model is reduced to 85.9%. We
found this mistake after the shared task organizers announced the results.</p>
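        <p>The evaluation metrics of equations 4 to 8 can be sketched in Python as below. This is an illustrative sketch rather than the organizers' evaluation script: the function names and the toy label sequences are hypothetical, and returning zero when a denominator is zero is an assumption made here.</p>
        <preformat><![CDATA[
```python
def accuracy(gold, pred):
    """Fraction of words assigned the correct label (equation 4)."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


def per_label_scores(gold, pred, label):
    """Precision, recall and F-score of one label (equations 5 to 7)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    predicted = sum(1 for p in pred if p == label)  # words labeled with label_i
    actual = sum(1 for g in gold if g == label)     # words with label_i in test data
    precision = tp / predicted if predicted else 0.0  # zero-denominator rule is an assumption
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def overall_f_score(gold, pred, labels):
    """Unweighted mean of the per-label F-scores (equation 8)."""
    return sum(per_label_scores(gold, pred, l)[2] for l in labels) / len(labels)


# Hypothetical toy example in which the datenum label is never predicted.
gold = ["person", "other", "other", "location", "datenum"]
pred = ["person", "other", "location", "location", "other"]
labels = ["person", "other", "location", "datenum"]
print(accuracy(gold, pred))
print(overall_f_score(gold, pred, labels))
```
]]></preformat>
        <p>Because equation 8 averages over labels without weighting, a single label whose F-score is zero (datenum in this toy example) lowers the overall F-score noticeably even when accuracy remains high.</p>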
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>As part of our participation in the Arnekt-IECSIL@FIRE2018 shared task,
in this paper we have presented an NER system implemented using deep learning
methods, in which pre-trained word vectors and
character-to-word vectors are used as features. We have presented experimental results on five
Indian language datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Barathi</given-names>
            <surname>Ganesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.B.</given-names>
            ,
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.P.</given-names>
            ,
            <surname>Reshma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Mandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Prachi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anand</surname>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Information extraction for conversational systems in indian languages - arnekt iecsil</article-title>
          .
          <source>In: Forum for Information Retrieval Evaluation</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Barathi</given-names>
            <surname>Ganesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.B.</given-names>
            ,
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.P.</given-names>
            ,
            <surname>Reshma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Mandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Prachi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anand</surname>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Overview of arnekt iecsil at FIRE-2018 track on information extraction for conversational systems in indian languages</article-title>
          .
          <source>In: FIRE (Working Notes)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasilakopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Language-independent named entity classification by modified transformation-based learning and by decision tree induction</article-title>
          .
          <source>In: Proceedings of CoNLL-2002</source>
          . pp.
          <volume>159</volume>
          –
          <fpage>162</fpage>
          .
          <string-name>
            <surname>Taipei</surname>
          </string-name>
          ,
          <string-name>
            <surname>Taiwan</surname>
          </string-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          ,
          <issue>135</issue>
          –
          <fpage>146</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cucerzan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yarowsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Language independent ner using a unified model of internal and contextual evidence</article-title>
          .
          <source>In: Proceedings of CoNLL-2002</source>
          . pp.
          <volume>171</volume>
          –
          <fpage>174</fpage>
          .
          <string-name>
            <surname>Taipei</surname>
          </string-name>
          ,
          <string-name>
            <surname>Taiwan</surname>
          </string-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>McNamee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Mayfield, J.:
          <article-title>Entity extraction without language-specific resources</article-title>
          .
          <source>In: Proceedings of CoNLL-2002</source>
          . pp.
          <volume>183</volume>
          –
          <fpage>186</fpage>
          .
          <string-name>
            <surname>Taipei</surname>
          </string-name>
          ,
          <string-name>
            <surname>Taiwan</surname>
          </string-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nadeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sekine</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A survey of named entity recognition and classification</article-title>
          .
          <source>Lingvisticae Investigationes</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <volume>3</volume>
          –
          <fpage>26</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Tjong</given-names>
            <surname>Kim Sang</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.F.</surname>
          </string-name>
          :
          <article-title>Introduction to the conll-2002 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Proceedings of CoNLL-2002</source>
          . pp.
          <volume>155</volume>
          –
          <fpage>158</fpage>
          .
          <string-name>
            <surname>Taipei</surname>
          </string-name>
          ,
          <string-name>
            <surname>Taiwan</surname>
          </string-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Tjong</given-names>
            <surname>Kim Sang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.F.</given-names>
            ,
            <surname>De Meulder</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          :
          <article-title>Introduction to the conll-2003 shared task: Language-independent named entity recognition</article-title>
          . In: Daelemans,
          <string-name>
            <given-names>W.</given-names>
            ,
            <surname>Osborne</surname>
          </string-name>
          , M. (eds.)
          <source>Proceedings of CoNLL-2003</source>
          . pp.
          <volume>142</volume>
          –
          <fpage>147</fpage>
          . Edmonton, Canada (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>