<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SSN_NLP@IECSIL-FIRE-2018: Deep Learning Approach to Named Entity Recognition and Relation Extraction for Conversational Systems in Indian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D. Thenmozhi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B. Senthil Kumar</string-name>
          <email>senthil@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chandrabose Aravindan</string-name>
          <email>aravindanc@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of CSE, SSN College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>Named Entity Recognition (NER) focuses on the classification of proper nouns into generic named entity (NE) classes such as person names, organizations, locations, currency and dates. NER has several applications such as conversational systems, machine translation, automatic summarization and question answering. Relation Extraction (RE) is an information extraction process used to identify the relationships between NEs. RE is very important in applications such as short answer grading, conversational systems, question answering and ontology learning. NER and RE in Indian languages are difficult tasks due to their agglutinative nature and rich morphological structure. Further, developing a language-independent framework that supports all Indian languages is a challenging task. In this paper, we present a deep learning methodology for both NER and RE in five Indian languages, namely Hindi, Kannada, Malayalam, Tamil and Telugu. We propose a common approach that works for both the NER and RE tasks, and we have used a neural machine translation architecture to implement our methodology for these tasks. Our approach was evaluated using the data set given by the IECSIL@FIRE2018 shared task. For the NER task, we obtained accuracies of 94.41%, 95.23%, 95.97% and 96.02% for our four variations on the pre-evaluation test set, and 95.9%, 95.85% and 95.05% for the three runs on the final-evaluation test set. For the RE task, we obtained accuracies of 56.19%, 60.74%, 60.7%, 75.43% and 79.11% for our five variations on the pre-evaluation test set, and 79.44%, 76.01% and 61.11% for Run 1, Run 2 and Run 3 respectively on the final-evaluation test set.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition (NER)</kwd>
        <kwd>Relation Extraction</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Text mining</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Indian Languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Named Entity Recognition (NER) is an Information Extraction (IE) task that
identifies and classifies the proper names in text into predefined classes such
as names of persons, organizations, dates, locations, numbers and currency. NER
is very important for many NLP applications such as machine translation, text
summarization, question answering and short answer grading. NER has been studied
since the 1970s for English and other European languages, and several approaches
have been reported for NER in these languages. Deep learning methods have
also been employed for English, European and Chinese languages [
        <xref ref-type="bibr" rid="ref19 ref5 ref7 ref9">5, 9, 7, 19</xref>
        ].
However, NER for Indian languages is a very challenging task due to
characteristics such as the lack of a closed-set vocabulary, the absence of capitalization,
polysemy and ambiguity. Also, due to their complex morphological structure and
agglutinative nature, IE in Indian languages is still an open challenge. Several
methodologies, including rule-based, statistical and machine learning approaches, have been
reported for NER in Indian languages. However, these approaches are language
dependent, and no Indian language except Bengali has reported above 90% F-score
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Moreover, developing a language-independent framework that supports all
Indian languages is a challenging task.
      </p>
      <p>
Relation Extraction (RE) is the process of extracting the relationships between
NEs. It is also an IE task, which extracts and classifies the relationships between
entities. For example, “lives in” is the relationship between a person and a
location. RE is very important for applications such as ontology learning,
conversational systems, question answering and short answer grading. Several methods
have been reported in the literature for learning relations automatically from English
documents [
        <xref ref-type="bibr" rid="ref14 ref16 ref22 ref24 ref25 ref8">22, 24, 14, 16, 8, 25</xref>
        ]. Many of them are domain dependent [
        <xref ref-type="bibr" rid="ref25 ref8">8, 25</xref>
        ] and
a few are domain-independent approaches [
        <xref ref-type="bibr" rid="ref14 ref22">22, 14</xref>
        ]. These works use rule-based,
supervised and unsupervised approaches to learn taxonomic [
        <xref ref-type="bibr" rid="ref15 ref23">15, 23</xref>
        ] or semantic
relations [
        <xref ref-type="bibr" rid="ref17 ref8">8, 17</xref>
        ] between the entities. Only very few approaches have been presented
that learn both taxonomic and semantic relations for any domain [
        <xref ref-type="bibr" rid="ref22 ref26">22, 26</xref>
        ]. Deep
learning methods have also been employed for RE in recent years [
        <xref ref-type="bibr" rid="ref10 ref13 ref4">10, 13, 4</xref>
        ]. However,
RE for Indian languages [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] is still an open challenge. Further, developing a
language-independent framework that supports all Indian languages for extracting
relations is a challenging task.
      </p>
      <p>
The shared task IECSIL@FIRE2018 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] focuses on IE for conversational
systems in Indian languages, namely Hindi, Kannada, Malayalam, Tamil and
Telugu. It comprises two sub-tasks, namely the NER task and the RE task.
The goal of the IECSIL task is to research and develop techniques to extract
information using a language-independent framework. IECSIL@FIRE2018 is a shared
task on information extraction for conversational systems in Indian languages,
collocated with FIRE-2018 (Forum for Information Retrieval Evaluation). This
paper addresses both sub-tasks of IECSIL@FIRE2018, namely NER and RE.
We propose a common approach that identifies and classifies the NEs into one
of the generic classes, namely name, occupation, location, things, organization,
datenum, number and other, and also classifies the relations between NEs into
one of the classes namely information_1, information_2, information_3,
information_4, information_per, information_quant, information_closed,
information_so, information_neg, information_cc, action_1, action_2, action_3,
action_per, action_so, action_quant, action_neg, and other.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Methodology</title>
      <p>
We have used a deep learning approach based on the Sequence to Sequence (Seq2Seq)
model [
        <xref ref-type="bibr" rid="ref21 ref6">21, 6</xref>
        ]. We have adopted a common approach that addresses both the NER
and RE problems, using the Neural Machine Translation (NMT)
framework [
        <xref ref-type="bibr" rid="ref11 ref12">12, 11</xref>
        ] based on the Seq2Seq model for both tasks. Figures
1 and 2 depict the flow of our approach for the NER and RE tasks respectively.
      </p>
      <p>The steps are detailed below.</p>
      <p>The given text consists of the tokens of the sentences and their corresponding
NEs for the NER task, and of the input sentences and their corresponding
relation labels for the RE task. Sample inputs for both tasks are given in Figures 3
and 4.</p>
      <p>We have prepared the data in such a way that the Seq2Seq deep learning
algorithm can be applied. The input sentences and the NER / RE label sequences
are constructed separately based on the delimiter “newline”. For example, for the
NER task, the above sequence of tokens is converted to input sentences and
NER label sequences as shown in Figure 5 and Figure 6 respectively.</p>
      <p>Similarly, for the RE task, the input data is converted to input sentences and
relation label outputs, as shown in Figure 7 and Figure 8.</p>
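      <p>This preparation step can be sketched as follows (the tab-separated file layout and the function name are our illustrative assumptions, not the exact shared-task format): blank lines mark sentence boundaries, and each sentence yields one source sequence and one parallel target sequence.</p>

```python
def prepare_ner_data(lines):
    """Group token/NE-label pairs (blank line = sentence boundary)
    into parallel source and target sequences for Seq2Seq training."""
    sentences, label_seqs = [], []
    tokens, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:  # "newline" delimiter ends the current sentence
            if tokens:
                sentences.append(" ".join(tokens))
                label_seqs.append(" ".join(labels))
                tokens, labels = [], []
            continue
        token, label = line.split("\t")
        tokens.append(token)
        labels.append(label)
    if tokens:  # flush the last sentence
        sentences.append(" ".join(tokens))
        label_seqs.append(" ".join(labels))
    return sentences, label_seqs

# Toy example with placeholder tokens, not actual task data:
src, tgt = prepare_ner_data(["Ram\tname", "Chennai\tlocation", "", "born\tother"])
```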
      <p>Then the input sentences and the NER / RE label sequences are split
into training sets and development sets. The vocabularies for both the input sentences
and the NER / RE label sequences are determined. For the NER task, the input
sentence with n words w1, w2, ..., wn and the NER label sequence with n labels
l1, l2, ..., ln are given to the embedding layer. However, for the RE task, the input
sentence with n words w1, w2, ..., wn and a single relation label rl are
given to the embedding layer.</p>
      <p>To build a deep neural network model, a multi-layer RNN (Recurrent Neural
Network) with LSTM (Long Short-Term Memory) as the recurrent unit is used.
This neural network consists of several layers, namely an embedding layer, an
encoding layer, a decoding layer, a projection (softmax) layer and a loss layer. The
embedding layer learns weight vectors for the input sentence and the NER / RE
label sequence based on their vocabularies. These embeddings are fed into a
multi-layer LSTM where encoding and decoding are performed. The word
embedding vector x_wi for each word wi, where wi constitutes a time step, is the
input to the LSTM network. The computation of the hidden layer at time t and the
output can be represented as follows.</p>
      <p>i_t = σ(W_x^(i) x_t + W_h^(i) h_(t-1) + b^(i)) (1)
f_t = σ(W_x^(f) x_t + W_h^(f) h_(t-1) + b^(f) + 1) (2)
o_t = σ(W_x^(o) x_t + W_h^(o) h_(t-1) + b^(o)) (3)
c̃_t = tanh(W_x^(c) x_t + W_h^(c) h_(t-1) + b^(c)) (4)
c_t = f_t ∘ c_(t-1) + i_t ∘ c̃_t (5)
h_t^(b/f) = o_t ∘ tanh(c_t) (6)
where the W terms are the weight matrices, h_(t-1) is the hidden layer state at time
t-1, i_t, f_t and o_t are the input, forget and output gates respectively at time t,
c̃_t is the candidate cell state, and h_t^(b/f) is the hidden state of the backward /
forward LSTM cells. For better efficiency of the LSTM, the bias value in the
forget gate is set to a default value of 1.</p>
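      <p>A minimal NumPy sketch of one LSTM time step following these update equations, with the forget-gate bias set to 1 (weight shapes and initialization here are illustrative; the actual models use TensorFlow's LSTM cells):</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM time step. W maps [x; h_prev] to the four gate
    pre-activations (input, forget, output, candidate)."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i = sigmoid(i)                  # input gate
    f = sigmoid(f + forget_bias)    # forget gate, with bias set to 1
    o = sigmoid(o)                  # output gate
    g = np.tanh(g)                  # candidate cell state
    c = f * c_prev + i * g          # cell state update
    h = o * np.tanh(c)              # hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3
W = rng.standard_normal((4 * d_hid, d_in + d_hid))
b = np.zeros(4 * d_hid)
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_hid), np.zeros(d_hid), W, b)
```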
      <p>
        The attention mechanism [
        <xref ref-type="bibr" rid="ref1 ref11">1, 11</xref>
        ] is used to handle longer sentences. The
softmax or projection layer is a dense layer used to obtain the NER / RE label
output sequence. The loss layer computes the training loss during model
building. Once the model is built, the NER / RE label output sequences are
obtained by using the model for sequence mapping.
      </p>
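      <p>The scaled-luong variant used in our models is Luong's general attention with a learned scaling factor; a simplified NumPy sketch of the attention computation (the weight matrix and the fixed scale here are illustrative stand-ins for learned parameters):</p>

```python
import numpy as np

def luong_attention(h_t, H_s, W, scale=1.0):
    """Luong-style (general) attention: scores = scale * h_t^T W h_s,
    softmax over source positions, then a weighted context vector."""
    scores = scale * (H_s @ (W @ h_t))      # one score per encoder state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax attention weights
    context = weights @ H_s                 # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(1)
H_s = rng.standard_normal((5, 3))   # 5 encoder states of size 3
h_t = rng.standard_normal(3)        # current decoder state
W = rng.standard_normal((3, 3))
context, weights = luong_attention(h_t, H_s, W)
```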
      <p>The target sequences we have obtained are the sequences of NER labels with
respect to the given sentence. Thus, the NER label sequences are tokenized
further to obtain the NE class for each term. However, for the RE task, the target
sequence is itself an RE label, and thus it is not required to post-process the
output of our deep neural network.</p>
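      <p>Since the decoder emits one label per input token, this post-processing step for NER can be sketched as follows (the helper name is ours; the labels follow the task's tag set):</p>

```python
def align_labels(sentence, label_sequence):
    """Pair each input token with the corresponding predicted NE label.
    zip() truncates to the shorter side if the decoder emits
    too few or too many labels."""
    tokens = sentence.split()
    labels = label_sequence.split()
    return list(zip(tokens, labels))

# Illustrative example (English placeholders, not actual task data):
pairs = align_labels("Ram lives in Chennai", "name other other location")
```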
    </sec>
    <sec id="sec-3">
      <title>Implementation</title>
      <p>
        Our methodology was implemented using TensorFlow for the IECSIL shared tasks,
namely NER and RE. The data set [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used to evaluate the NER and RE tasks
consists of a training set and two test sets, namely a pre-evaluation set and a
final-evaluation set, for five Indian languages, namely “Hindi”, “Kannada”,
“Malayalam”, “Tamil” and “Telugu”. The details of the NER and RE data are
given in Tables 1 and 2.
      </p>
      <p>The input sentences and the NER / RE label sequences are constructed
based on the delimiter “newline”. We have split these sequences into training
and development sets to feed into the deep neural network. The details of the
splits are given in Tables 3 and 4 for the NER and RE tasks respectively.</p>
      <p>
        We have used TensorFlow code based on the tutorial code released by the Neural
Machine Translation project (https://github.com/tensorflow/nmt) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which was developed based on Sequence-to-Sequence
(Seq2Seq) models [
        <xref ref-type="bibr" rid="ref1 ref12 ref21">21, 1, 12</xref>
        ], to implement our deep learning approach for the NER
and RE tasks. We have implemented several variations of the Seq2Seq model by
varying the directionality, depth and number of training steps, with a dropout
of 0.2 and a batch size of 128, to show the effectiveness of our proposed model:
an 8-layer bi-LSTM with attention.
      </p>
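      <p>As an illustration, the strongest configuration can be expressed as a hyperparameter set; the key names mirror the tensorflow/nmt tutorial's command-line options, and any value not stated in the text is an assumption:</p>

```python
# Hyperparameters for the 8-layer bi-LSTM model with scaled-luong attention.
# Key names mirror the tensorflow/nmt command-line options.
nmt_hparams = {
    "attention": "scaled_luong",   # attention mechanism used in our models
    "encoder_type": "bi",          # bi-directional encoder
    "num_layers": 8,               # depth of the network
    "unit_type": "lstm",           # recurrent unit
    "dropout": 0.2,                # as used in all our variations
    "batch_size": 128,             # as used in all our variations
    "num_train_steps": 100000,     # largest step count among our variations
}
```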
      <p>The implementation details of model building for both the NER and RE tasks
are explained below.</p>
      <sec id="sec-3-1">
        <title>NER Models</title>
        <p>We have used four variations of model building for the NER task. The variations are
given below.</p>
        <p>– Model 1: 2-layer, uni-directional LSTM, without attention, 50,000 steps
– Model 2: 4-layer, uni-directional LSTM, with scaled-luong attention, 75,000
steps
– Model 3: 8-layer, bi-directional LSTM, with scaled-luong attention, 75,000
steps
– Model 4: 8-layer, bi-directional LSTM, with scaled-luong attention, 100,000
steps</p>
        <p>The development BLEU scores obtained for these variations are given in Table
5.</p>
        <p>It is observed from Table 5 that, except for the “Malayalam” language, the
bi-directional LSTM with attention, 8 layers of depth and a larger number of
training steps works well for all the other languages.</p>
      </sec>
      <sec id="sec-3-2">
        <title>RE Models</title>
        <p>We have implemented a total of five variations for finding the relation labels for
the given sentences, to show the effectiveness of our proposed model. The first
three variations use machine learning approaches and the last two use the deep
learning approach.</p>
        <p>The three variations using the machine learning approach are given below.
– Model 1: Term frequency vectorizer
– Model 2: TF-IDF vectorizer, ignoring terms with a document frequency
less than 1
– Model 3: TF-IDF vectorizer, ignoring terms with a document frequency
less than 2</p>
        <p>For these machine learning approaches, bag-of-words features are extracted
from the training instances. We have used the Scikit-learn machine learning library
to vectorize the training instances and to implement the classifier for the relation
extraction and classification task.</p>
        <p>In the first variation, the CountVectorizer of sklearn is used for vectorization.
Term Frequency - Inverse Document Frequency (TF-IDF) is used for
vectorization with min_df set to 1 and 2 in the second and third variations respectively.
min_df builds the vocabulary by ignoring terms that have a document
frequency lower than the given value. The TfidfVectorizer of sklearn is used for these
two variations. We have used a neural network classifier with the Stochastic Gradient
Descent (SGD) optimizer to classify the relation labels.</p>
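      <p>A minimal Scikit-learn sketch of the third variation, assuming MLPClassifier with the SGD solver as the neural network classifier (the toy sentences, labels and hidden-layer size are illustrative, not task data):</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# TF-IDF vectorizer ignoring terms with document frequency below 2,
# followed by a small neural network trained with the SGD solver.
clf = make_pipeline(
    TfidfVectorizer(min_df=2),
    MLPClassifier(hidden_layer_sizes=(16,), solver="sgd",
                  max_iter=500, random_state=0),
)

# Toy training sentences with relation labels (placeholders only):
X = ["he lives in the city", "she lives in the town",
     "he works at the office", "she works at the plant"]
y = ["information_1", "information_1", "action_1", "action_1"]
clf.fit(X, y)
preds = clf.predict(X)
```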
        <p>The two variations using the deep learning approach for RE are given below.
– Model 4: 4-layer, uni-directional LSTM, with scaled-luong attention, 50,000
steps
– Model 5: 8-layer, bi-directional LSTM, with scaled-luong attention, 75,000
steps</p>
        <p>The development accuracy scores obtained for these two variations are given
in Table 6.</p>
        <p>It is observed from Table 6 that, except for the “Telugu” language, the bi-directional
LSTM with attention and 8 layers of depth performs well for all the other languages
in the RE task.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>We have evaluated our models on the data set provided by the IECSIL shared task.
The results obtained for both the NER and RE tasks are discussed in this section.</p>
      <sec id="sec-4-1">
        <title>NER Results</title>
        <p>We have obtained the accuracies as 94.41%, 95.23%, 95.97% and 96.02% for
Model 1, Model 2, Model 3 and Model 4 respectively on the pre-evaluation test
data.</p>
        <p>We have submitted three runs based on our three models namely Model 4,
Model 3 and Model 2 as Run 1, Run 2 and Run 3 respectively for the task. Table
8 shows the accuracies we have obtained for the final-evaluation test data using
our three models. We have obtained the accuracies as 95.9%, 95.85% and 95.05%
for Run 1, Run 2 and Run 3 respectively.</p>
        <p>It is observed from Table 8 that the bi-directional LSTM with attention and 8
layers of depth works well for all five Indian languages. However, an increase in the
number of training steps alone does not show significant improvement.</p>
        <p>Table 9 shows the F1-scores we have obtained for the final-evaluation test
data using our three runs. This table shows the F1-score for the individual classes
of all five languages. It is observed from the table that we have obtained a low
F1-score for the “datenum” class and a high F1-score for the “other” class. This may
be due to a low recall value for the “datenum” class and a low precision value for
the “other” class. Also, the overall F1-score across all the classes is low for the Kannada
language. This is due to the size of the dataset, which is smaller for Kannada
compared with all the other languages.</p>
      </sec>
      <sec id="sec-4-2">
        <title>RE Results</title>
        <p>We have obtained the accuracies for the pre-evaluation test data as 56.19%,
60.74%, 60.7%, 75.43% and 79.11% for Model 1, Model 2, Model 3, Model 4 and
Model 5 respectively.</p>
        <p>We have submitted three runs based on our three models namely Model
5, Model 4 and Model 2 as Run 1, Run 2 and Run 3 respectively for the task.
Table 11 shows the accuracies we have obtained for the final-evaluation test data
using our three models. We have obtained the accuracies as 79.44%, 76.01% and
61.11% for Run 1, Run 2 and Run 3 respectively.</p>
        <p>Tables 10 and 11 show that the bi-directional LSTM with attention and 8
layers of depth performs better for all the languages except the “Telugu” language.</p>
        <p>Table 12 and Table 13 show the F1-scores we have obtained for the
final-evaluation test data using our three runs for the relation extraction task. These
tables show the F1-scores for the individual classes of all five languages, in which
“A_” and “I_” indicate the “Action_” and “Information_” classes respectively. It
is observed from the tables that the overall F1-score across all the classes is very
low for the Kannada language compared with all the other languages when we
apply the deep learning techniques. This is due to the size of the dataset, which is
smaller for the Kannada language. However, Kannada gives a better F1-score than
the Hindi and Malayalam languages when we use the machine learning
algorithms.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have presented a deep learning approach for NER and RE in Indian
languages, namely “Hindi”, “Kannada”, “Malayalam”, “Tamil” and “Telugu”. We
have used a neural machine translation model to implement both the NER and RE
tasks. Our approach is a common approach that identifies and classifies the NEs
into any of the generic classes, namely name, occupation, location, things,
organization, datenum, number and other, and also classifies the relations into any
of the relation labels such as information_1, action_1, etc. We have evaluated
four deep learning models for NER and two deep learning models for RE by
varying the directionality, depth and number of training steps, with and without
the scaled-luong attention mechanism. To show the effectiveness of the deep
learning approach, we have also implemented three variations of traditional machine
learning models for RE using bag-of-words features, term frequency / TF-IDF
vectorizers and a neural network classifier with the SGD optimizer, with minimum
document frequencies of one and two, to extract relations. We have evaluated these
models using the data set given by the IECSIL@FIRE2018 shared task for NER and
RE. We have used accuracy as the metric to measure the performance of the different
variations of our approach. For the NER task, we have obtained accuracies of
94.41%, 95.23%, 95.97% and 96.02% for Model 1, Model 2, Model 3 and Model 4
respectively on the pre-evaluation test data, and 95.9%, 95.85% and 95.05% for Run
1, Run 2 and Run 3 respectively on the final-evaluation test data. For the RE task, we
have obtained accuracies of 56.19%, 60.74%, 60.7%, 75.43% and 79.11% for
Model 1, Model 2, Model 3, Model 4 and Model 5 respectively on the pre-evaluation
test data, and 79.44%, 76.01% and 61.11% for Run 1, Run 2 and Run 3
respectively on the final-evaluation test data. The performance may be improved further
by incorporating different attention mechanisms, including more hidden layers,
and increasing the number of training steps.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.:</given-names>
          </string-name>
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Barathi</given-names>
            <surname>Ganesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.B.</given-names>
            ,
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.P.</given-names>
            ,
            <surname>Reshma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Mandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Prachi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anand</surname>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Information extraction for conversational systems in indian languages - arnekt iecsil</article-title>
          .
          <source>In: Forum for Information Retrieval Evaluation</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Barathi</given-names>
            <surname>Ganesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.B.</given-names>
            ,
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.P.</given-names>
            ,
            <surname>Reshma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            ,
            <surname>Mandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Prachi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anitha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Anand</surname>
          </string-name>
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Overview of arnekt iecsil at fire-2018 track on information extraction for conversational systems in indian languages</article-title>
          .
          <source>In: FIRE (Working Notes)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chikka</surname>
            ,
            <given-names>V.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlapalem</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A hybrid deep learning approach for medical relation extraction</article-title>
          .
          <source>arXiv preprint arXiv:1806.11189</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Named entity recognition with bidirectional lstm-cnns</article-title>
          .
          <source>arXiv preprint arXiv:1511.08308</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Merriënboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning phrase representations using rnn encoder-decoder for statistical machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1406.1078</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dugas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Deepnnner: Applying blstm-cnns and extended lexicons to named entity recognition in tweets</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)</source>
          . pp.
          <fpage>178</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Frunza</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inkpen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.:</given-names>
          </string-name>
          <article-title>A machine learning approach for identifying disease-treatment relations in short texts. Knowledge and Data Engineering</article-title>
          , IEEE Transactions on
          <volume>23</volume>
          (
          <issue>6</issue>
          ),
          <fpage>801</fpage>
          -
          <lpage>814</lpage>
          (
          <year>2011</year>
          ). https://doi.org/10.1109/TKDE.
          <year>2010</year>
          .152
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Neural architectures for named entity recognition</article-title>
          .
          <source>arXiv preprint arXiv:1603.01360</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Leng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm</article-title>
          .
          <source>Knowledge-Based Systems 100</source>
          ,
          <fpage>188</fpage>
          -
          <lpage>199</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Neural machine translation (seq2seq) tutorial</article-title>
          . https://github.com/tensorflow/nmt (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Effective approaches to attention-based neural machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1508.04025</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Clinical relation extraction with deep learning</article-title>
          .
          <source>IJHIT</source>
          <volume>9</volume>
          (
          <issue>7</issue>
          ),
          <fpage>237</fpage>
          -
          <lpage>248</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Paukkeri</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Plaza</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fresno</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unanue</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Honkela</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Learning a taxonomy from a set of text documents</article-title>
          .
          <source>Applied Soft Computing</source>
          <volume>12</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1138</fpage>
          -
          <lpage>1148</lpage>
          (
          <year>2012</year>
          ). https://doi.org/10.1016/j.asoc.2011.11.009
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Aneec: A quasi-automatic system for massive named entity extraction and categorization</article-title>
          .
          <source>The Computer Journal</source>
          <volume>56</volume>
          (
          <issue>11</issue>
          ),
          <fpage>1328</fpage>
          -
          <lpage>1346</lpage>
          (
          <year>2013</year>
          ). https://doi.org/10.1093/comjnl/bxs114
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Unsupervised ontology induction from text</article-title>
          .
          <source>In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>296</fpage>
          -
          <lpage>305</lpage>
          . ACL, Uppsala, Sweden (July 11-16,
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Punuru</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Learning non-taxonomical semantic relations from domain texts</article-title>
          .
          <source>Journal of Intelligent Information Systems</source>
          <volume>38</volume>
          (
          <issue>1</issue>
          ),
          <fpage>191</fpage>
          -
          <lpage>207</lpage>
          (
          <year>2012</year>
          ). https://doi.org/10.1007/s10844-011-0149-4
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Senthil Kumar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thenmozhi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Named entity recognition in dravidian languages - state of the art</article-title>
          .
          <source>International Journal of Applied Engineering Research</source>
          <volume>10</volume>
          (
          <issue>34</issue>
          ),
          <fpage>27295</fpage>
          -
          <lpage>27300</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yun</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kronrod</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anandkumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep active learning for named entity recognition</article-title>
          .
          <source>arXiv preprint arXiv:1707.05928</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Identification and classification of relations for indian languages using machine learning approaches for developing a domain specific ontology</article-title>
          .
          <source>In: 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT)</source>
          . pp.
          <fpage>415</fpage>
          -
          <lpage>420</lpage>
          . IEEE (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Thenmozhi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aravindan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>An automatic and clause based approach to learn relations for ontologies</article-title>
          .
          <source>The Computer Journal</source>
          <volume>59</volume>
          (
          <issue>6</issue>
          ),
          <fpage>889</fpage>
          -
          <lpage>907</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Velardi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faralli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Ontolearn reloaded: A graph-based algorithm for taxonomy induction</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>665</fpage>
          -
          <lpage>707</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnaghi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bargiela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Probabilistic topic models for learning terminological ontologies</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>22</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1028</fpage>
          -
          <lpage>1040</lpage>
          (
          <year>2010</year>
          ). https://doi.org/10.1109/TKDE.2009.122
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Weichselbraun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wohlgenannt</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scharl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Refining non-taxonomic relation labels with external structured data to support ontology learning</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>69</volume>
          (
          <issue>8</issue>
          ),
          <fpage>763</fpage>
          -
          <lpage>778</lpage>
          (
          <year>2010</year>
          ). https://doi.org/10.1016/j.datak.2010.02.010
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zouaq</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nkambou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Evaluating the generation of domain ontologies in the knowledge puzzle project</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>21</volume>
          (
          <issue>11</issue>
          ),
          <fpage>1559</fpage>
          -
          <lpage>1572</lpage>
          (
          <year>2009</year>
          ). https://doi.org/10.1109/TKDE.2009.25
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>