1613-0073

Joint classification of key-phrases and their relations in electronic health documents in Spanish

Salvador Medina

smedina@cs.upc.edu

Jordi Turmo

turmo@cs.upc.edu

TALP Research Center - Universitat Politècnica de Catalunya

This paper describes the approach presented by the TALP team for Task 3 of TASS-2018: a convolutional neural network that jointly handles the classification of key-phrases and of the relationships between them in eHealth documents written in Spanish. The results obtained are promising, as we ranked in first place in scenarios 2 and 3.

This article describes the model presented by the TALP team for solving sub-tasks B and C of Task 3 in the Taller de Análisis Semántico en la SEPLN 2018 (TASS-2018) (Martínez-Cámara et al., 2018). TASS-2018's Task 3 consists in recognizing and classifying key-phrases as well as identifying the relationships between them in Electronic Health Documents (i.e., eHealth documents) written in Spanish. Task 3 is divided into sub-tasks A, B and C, which correspond to key-phrase boundary recognition, key-phrase classification and relation detection, respectively.

In this task, a key-phrase stands for any sub-phrase included in eHealth documents that is relevant from the clinical viewpoint and can be classified as either Concept or Action. The relationships between them are classified into 6 types: 4 of them hold between Concepts (is-a, part-of, property-of and same-as), while the other two hold between an Action and another key-phrase (subject and target). The proposed task is similar to previous competitions such as SemEval-2017 Task 10: ScienceIE (Gonzalez-Hernandez et al., 2017), but uses a simpler categorization for key-phrases while considering a broader range of possible relationships.

Participants in the SemEval-2017 Task 10: ScienceIE (Gonzalez-Hernandez et al., 2017) shared task considered a wide range of supervised learning models, from Convolutional or Recurrent Neural Networks to Support Vector Machines, Conditional Random Fields and even rule-based systems, often applying radically different models to each of the three sub-tasks. Note that some of the teams did not participate in all three sub-tasks. This was in fact the case for the winners of sub-tasks BC (MayoNLP (Liu et al., 2017)) and C (MIT (Lee, Dernoncourt, and Szolovits, 2017)).

1.1 Joint classification of key-phrases and relationships

In our implementation we tackle both the classification of key-phrases and the identification of the relationships between them, corresponding to scenarios 2 and 3 of TASS-2018's Task 3, as a single task. The intuition behind this decision is that the categories of key-phrases are influenced by the relationships they hold with other key-phrases. For instance, a verb is an Action key-phrase if and only if it relates to another Action or Concept by either being the subject or target, which means that sometimes phrases are

2 Implementation

The architecture that we propose is represented in Figure 1 and consists of a two-layer Convolutional Neural Network (CNN) which takes a vectorial representation of the documents and the positions of two key-phrases as input and applies several convolution filters for window sizes from 1 to 4 tokens. The outputs of these filters are then max-pooled and fed to a fully connected output layer, which produces two kinds of output for the given pair of key-phrases: the probabilities of each key-phrase being an Action or a Concept, and the probabilities of the pair holding each possible kind of relationship, including "other" for no relationship.
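The forward pass described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it uses a single convolutional layer followed by the output layer, omits biases and dropout, and the number of filters per window size, the weight initialization and all function names are assumptions made for illustration only.

```python
import math
import random

RELATIONS = ["is-a", "part-of", "property-of", "same-as", "subject", "target", "other"]
KEYPHRASE_CLASSES = ["Concept", "Action"]


def softmax(scores):
    """Turn raw scores into probabilities that add up to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Small random weights (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv_max_pool(tokens, filters, window):
    """Apply each filter to every span of `window` tokens and keep its maximum."""
    spans = [sum(tokens[i:i + window], []) for i in range(len(tokens) - window + 1)]
    pooled = []
    for weights in filters:
        activations = [max(0.0, sum(w * x for w, x in zip(weights, span)))
                       for span in spans]
        pooled.append(max(activations, default=0.0))
    return pooled


def dense_softmax(features, weights):
    """Fully connected layer followed by a softmax."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


def init_params(rng, dim, filters_per_window=8):
    """Filters for window sizes 1 to 4 plus one weight matrix per output."""
    conv = {w: random_matrix(rng, filters_per_window, w * dim) for w in (1, 2, 3, 4)}
    pooled_size = len(conv) * filters_per_window
    return {
        "conv": conv,
        "source": random_matrix(rng, len(KEYPHRASE_CLASSES), pooled_size),
        "destination": random_matrix(rng, len(KEYPHRASE_CLASSES), pooled_size),
        "relation": random_matrix(rng, len(RELATIONS), pooled_size),
    }


def forward(tokens, params):
    """Return class probabilities for the source, the destination and the relation."""
    pooled = []
    for window, filters in sorted(params["conv"].items()):
        pooled.extend(conv_max_pool(tokens, filters, window))
    return {name: dense_softmax(pooled, params[name])
            for name in ("source", "destination", "relation")}


rng = random.Random(0)
# Toy input: 10 tokens with 6 features each (real inputs are much wider).
sentence = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(10)]
outputs = forward(sentence, init_params(rng, dim=6))
```

Each of the three entries in `outputs` is a separate probability distribution, one for each key-phrase of the pair and one for the relationship between them.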

At first glance, our architecture is similar to the one proposed by the MIT team for the ScienceIE task, which also consists of a CNN using word embeddings, relative positions and PoS-tags as input features. However, it presents some noticeable differences. First of all, our architecture jointly tackles sub-tasks B and C. For this reason, it does not take the key-phrase category as an input and has two additional outputs which hold the source and destination key-phrases' classes. Moreover, we optimize all three outputs at the same time and consequently our loss function is designed to reflect this.

2.1 Layout of the network and parameter optimization

Artificial Neural Networks (ANN), and more specifically CNNs, have proven capable of jointly identifying entities and relationships in various kinds of textual documents and relation extraction tasks, as demonstrated in recent articles such as (Singh et al., 2013), (Shickel et al., 2017) and (Li et al., 2017). This joint identification takes advantage of the correlation that exists between linked entities, aiming to provide better results for both named entity classification and relation extraction than a classical two-step system.

The loss function used by the parameter optimization algorithm is computed independently for each of the three outputs using softmax cross-entropy, and the three losses are then simply added together. Using an independent softmax for each output exploits the fact that the classes of a single output are mutually exclusive: their probabilities add up to one, independently of the other two outputs.
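As an illustration, the combined loss for a single key-phrase pair could be computed as in the sketch below. The function names and the example logits are hypothetical; only the idea of summing three independent softmax cross-entropy terms comes from the description above.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    peak = max(logits)
    # log of the softmax normalizer, computed in a numerically stable way
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target_index]


def joint_loss(source_logits, dest_logits, relation_logits,
               source_gold, dest_gold, relation_gold):
    """Sum of the three per-output losses for one key-phrase pair."""
    return (softmax_cross_entropy(source_logits, source_gold)
            + softmax_cross_entropy(dest_logits, dest_gold)
            + softmax_cross_entropy(relation_logits, relation_gold))


# Made-up logits: 2 classes per key-phrase, 7 relation classes (6 types + other).
loss = joint_loss([2.0, 0.5], [0.1, 1.2], [0.0] * 7,
                  source_gold=0, dest_gold=1, relation_gold=6)
```

Because the three terms are simply added, a gradient step updates the shared convolutional filters with respect to all three outputs at once.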

As for the optimization algorithm, we use TensorFlow's Adam optimizer with a learning rate of 0.005. The system was trained in batches of 128 sentences, which were previously truncated to at most 50 tokens and padded. We also apply a dropout rate of 0.5 to the fully connected output layer for regularization purposes. The parameter optimization process is stopped either when the average loss on the development corpus remains flat for 1000 iterations or when 1e5 iterations have been run.
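The preprocessing and stopping rule can be sketched as follows. The hyperparameter values are those reported above; the padding symbol, the `tolerance` threshold and the interpretation of "flat" as "no improvement over the best earlier value" are assumptions, as the text does not specify them.

```python
# Values reported in the text.
HYPERPARAMETERS = {
    "learning_rate": 0.005,
    "batch_size": 128,
    "max_tokens": 50,
    "dropout": 0.5,
}


def truncate_and_pad(tokens, max_len=50, pad="<pad>"):
    """Cut a sentence to max_len tokens and pad shorter ones (pad symbol is hypothetical)."""
    return tokens[:max_len] + [pad] * max(0, max_len - len(tokens))


def should_stop(dev_losses, patience=1000, max_iterations=100_000, tolerance=1e-4):
    """Decide whether to stop, given one average development loss per iteration.

    Training stops when the iteration budget is exhausted, or when the last
    `patience` iterations did not improve on the best earlier loss by more
    than `tolerance` (an assumed definition of a flat loss).
    """
    iterations = len(dev_losses)
    if iterations >= max_iterations:
        return True
    if iterations <= patience:
        return False
    best_before = min(dev_losses[:-patience])
    best_recent = min(dev_losses[-patience:])
    return best_recent > best_before - tolerance
```

In a training loop, `should_stop` would be called after each evaluation on the development corpus, with the history of losses collected so far.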

2.2 Input parameters and encoding

In order to obtain a manageable vectorial representation of the input sentences, they are first tokenized using FreeLing with the multi-word and quantity detection modules as well as the Named Entity Classification (NEC) module disabled, so that multiple tokens are never joined together. These tokens are then passed through a lookup table containing their pre-computed word-embedding vectors, which are concatenated with one-hot encodings of their relative positions with respect to the source and destination key-phrases and of their Part-of-Speech (PoS) tags, as determined by FreeLing's PoS-tagger module. A more detailed description of the input features is given below:

Word-Embedding: 300-dimensional vectorial representation of words in word2vec format. We used the pretrained general-purpose vectors from SBWCE (Cardellino, 2016), trained on multiple sources.

Distance to source or destination key-phrase: One-hot encoding of the distance with respect to the key-phrases. We considered two types of distance: the absolute distance, in terms of the number of tokens between each token and the key-phrase, and the number of arcs in the dependency tree between each token and the key-phrase, not taking into account the dependency class. The latter option was finally selected, as it yielded better results on the validation corpus.

Figure 1: Layout of the proposed Convolutional Neural Network architecture.

Part-of-Speech tag: One-hot encoding of the token's PoS-tag determined by FreeLing. For simplicity, we only consider the category and type positions of the PoS-tag, hence reducing the number of different tags to 33.
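The three input features listed above can be combined per token as in the following sketch. The dependency heads are a hand-made parse rather than FreeLing output, and the tag set, the embedding values, the maximum encoded distance and the choice of measuring the distance to the nearest token of a key-phrase are all illustrative assumptions.

```python
from collections import deque


def tree_distances(heads, targets):
    """Number of dependency arcs from every token to the nearest token in `targets`.

    heads[i] is the index of the head of token i, or -1 for the root.
    Arc labels and directions are ignored; a single connected tree is assumed.
    """
    neighbours = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].append(head)
            neighbours[head].append(child)
    distance = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:  # breadth-first search starting from the key-phrase tokens
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return [distance[i] for i in range(len(heads))]


def one_hot(index, size):
    """One-hot vector; indices beyond the range share the last slot."""
    vector = [0.0] * size
    vector[min(index, size - 1)] = 1.0
    return vector


def encode_token(embedding, dist_source, dist_dest, pos_tag, tagset, max_distance=10):
    """Concatenate the embedding with the two distance encodings and the PoS encoding."""
    return (list(embedding)
            + one_hot(dist_source, max_distance + 1)
            + one_hot(dist_dest, max_distance + 1)
            + one_hot(tagset.index(pos_tag), len(tagset)))


# "El tratamiento depende de la causa" with a hypothetical dependency parse.
tokens = ["El", "tratamiento", "depende", "de", "la", "causa"]
heads = [1, 2, -1, 5, 5, 2]
to_source = tree_distances(heads, [1])  # source key-phrase: "tratamiento"
to_dest = tree_distances(heads, [5])    # destination key-phrase: "causa"

tagset = ["DA", "NC", "VM", "SP"]  # toy tag set; the paper uses 33 tags
encoded = encode_token([0.1, 0.2, 0.3], to_source[5], to_dest[5], "NC", tagset)
```

With 300-dimensional embeddings and 33 tags, the same function would yield one vector of 300 + 2 × (max_distance + 1) + 33 values per token.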

2.3 Data augmentation

Relation extraction is a difficult task and usually requires large amounts of training examples in order to generalize the relationship classes correctly. This is especially so for ANN-based models, which can be prone to over-fitting. The training corpus that was provided for TASS-2018 is very limited and its classes are considerably unbalanced. To give an example, it only includes 30 instances of class same-as compared to the 911 examples provided for class target.

Because of this, we evaluated several data augmentation alternatives, which added slightly modified versions of the original training instances to the training set. These modifications included replacing some or all key-phrases by their class name or by other key-phrases in the training corpus, or trimming the sentences by removing some of their tokens. The alternative that worked best on the validation corpus, and was used in the final model, was to trim the context before and after the related key-phrases to 1 and 3 tokens. For instance, in the sentence "Un ataque de asma se produce cuando los síntomas empeoran.", the target relationship between produce and ataque de asma adds "Un ataque de asma se produce cuando" and "Un ataque de asma se produce cuando los síntomas", as well as the full sentence.
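The trimming strategy can be sketched as follows, using the example sentence above. The function name and its signature are hypothetical, and the sketch assumes that the pair of key-phrases is identified by the indices of the first and last tokens it covers.

```python
def trimmed_variants(tokens, first, last, context_sizes=(1, 3)):
    """Return copies of the sentence trimmed around tokens[first:last + 1],
    followed by the full sentence.

    `first` and `last` are the indices of the first and last token covered
    by the two related key-phrases.
    """
    variants = []
    for size in context_sizes:
        start = max(0, first - size)
        end = min(len(tokens), last + 1 + size)
        variant = tokens[start:end]
        # Skip trims that reproduce the full sentence or an earlier variant.
        if variant != tokens and variant not in variants:
            variants.append(variant)
    variants.append(tokens)
    return variants


sentence = "Un ataque de asma se produce cuando los síntomas empeoran .".split()
# The pair (produce, ataque de asma) spans tokens 1 ("ataque") to 5 ("produce").
augmented = [" ".join(v) for v in trimmed_variants(sentence, first=1, last=5)]
```

Each returned variant keeps the labels of the original pair, so every annotated relationship contributes up to three training instances.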

3 Results

As can be seen in Table 1, our model scored first in evaluation scenarios 2 and 3, which evaluate sub-tasks BC and C respectively. As mentioned in Section 1, our system was designed for sub-tasks B and C, so no submission was sent for scenario 1, which also evaluates sub-task A. In terms of the individual sub-tasks, our system ranked first for sub-task C but was outperformed by rriveraz's model in sub-task B.

3.1 Analysis of errors

In this subsection, we analyze the errors made by our model in Scenarios 2 and 3. Tables 2 and 3 show the confusion matrices for sub-tasks B and C in the evaluation of Scenario 2. Results for sub-task C in Scenario 3 are analogous to those of Scenario 2 and are not shown, as our model does not make use of the additional information given in Scenario 2.

Table 1: Micro-averaged F1 score for evaluation scenarios 1 to 3 and global average. The TALP column shows our model's score. N/A*: Not Available, counted as 0 in the average score.

3.1.1 Sub-Task B

Table 2 shows the confusion matrix for sub-task B in Scenario 2. Our model achieves similar precision for classes Concept and Action, but recall for the latter is 0.205 lower.

This is not only due to the fact that classes are unbalanced (439 and 154 instances of classes Concept and Action respectively), but also to other reasons listed below:

The shared task's description defines Actions as a particular kind of Concept that modifies another Concept. Consequently, in some cases, the same phrase can be either an Action or a Concept depending on whether or not the modified Concept is explicitly mentioned. As an illustration, the noun causa (cause) is labeled as a Concept in the sentence "El tratamiento depende de la causa." (The treatment depends on the cause.). However, in the sentence "Es una causa común de sordera." (It is a common cause of deafness.), it is labeled as an Action, as it is supposed to modify sordera (deafness).

Errors which were in part due to incorrect dependency parsing or PoS-tagging by FreeLing, especially when verbs are identified as nouns.

For example, the noun oído (ear) was identified as a verb by FreeLing in the sentence "Suele afectar solo un oído." (It usually affects just one ear.), which led to confusion. Similarly, in the sentence "Esto causa una acumulación de sustancias grasosas en el bazo, hígado, pulmones, huesos y, a veces, en el cerebro." (This causes an accumulation of fatty substances in the spleen, liver, lungs, bones and, sometimes, the brain.), causa (causes) is incorrectly labeled as a noun.

Other instances where it is difficult to determine the label assigned to the entity, even for us, as they do not seem to correspond to any of the criteria set out in the task description.

For instance, in the sentence "Si usted ya tiene diabetes, el mejor momento para controlar su diabetes es antes de quedar embarazada." (If you already have diabetes, the best moment to control your diabetes is before getting pregnant.), the adverb antes (before) is labeled as Action and is related to controlar (control, keep) and quedar (get, become) as subject and target respectively.

On the other hand, in the sentence "La exposición al arsénico puede causar muchos problemas de salud." (Exposure to arsenic can cause several health problems.), the noun exposición (exposure) is labeled as Concept, while we understand it as the Action of being exposed to something. This is inconsistent with other instances such as "No se conoce la causa de la destrucción celular." (The cause of cell destruction is not known.), where destrucción (destruction) is labeled as Action, i.e., the Action of being destroyed.

3.1.2 Sub-Task C

Table 3 shows the confusion matrix for sub-task C in Scenario 2. Class other is used for all pairs of entities that have no specified relationship in the training set, making it the most frequent class in the training set. The model seems to prioritize precision over recall, both of which vary from class to class. Recall and precision for same-as, although 0.000, are not significant, as just one instance is present in the test set.

Table 3: Confusion matrix, precision and recall of our model's predictions for sub-task C in scenario 2. F1 is micro-averaged over all classes.

The list below describes the reasons behind the most common errors produced by our model:

Annotated instances in both training and test sets are unbalanced. Relationship counts in the training set range from 991 for target and 693 for subject to 149 and 30 for part-of and same-as respectively. What is more, the auxiliary class other amounts to 16478 instances. More instances of the two least common classes seem to be required, as the model achieves much lower recall and precision for them than for the most common ones.

Relationships subject and target are prone to be mutually confused, especially for reflexive or passive verbs, and labeling is not always coherent. For example, in "Algunos sarpullidos se desarrollan inmediatamente." (Some skin rashes develop immediately.), sarpullidos (skin rashes) is subject of se desarrollan (develop). However, in the sentence "Existen muchas razones para someterse a una cirugía." (There are several reasons to have surgery.), razones (reasons) is target of existen (there are).

Multi-label relationships were not considered by our model, as we did not notice instances such as "Durante cada trimestre, el feto crece y se desarrolla." (During each trimester, the fetus grows and develops.), where the relationships between feto (fetus) and crece (grows), and similarly between feto and se desarrolla (develops), are both target and subject.

Errors due to incorrect parsing by FreeLing, which were already discussed in Section 3.1.1.
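For reference, the per-class precision and recall reported alongside the confusion matrices, as well as a micro-averaged F1, can be derived from a confusion matrix as in the sketch below. The matrix is a made-up toy example, not the paper's results, and excluding the auxiliary class other from the micro-average is an assumption about the scoring rather than a description of the official evaluation script.

```python
def per_class_scores(confusion, labels):
    """Precision and recall per class.

    confusion[i][j] counts instances of true class i predicted as class j.
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        predicted = sum(row[k] for row in confusion)
        actual = sum(confusion[k])
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def micro_f1(confusion, labels, ignore=("other",)):
    """Micro-averaged F1 over all classes except those in `ignore`."""
    kept = [k for k, label in enumerate(labels) if label not in ignore]
    true_positives = sum(confusion[k][k] for k in kept)
    predicted = sum(row[k] for row in confusion for k in kept)
    actual = sum(sum(confusion[k]) for k in kept)
    if not true_positives:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Toy confusion matrix (rows: true class, columns: predicted class).
labels = ["other", "subject", "target"]
confusion = [
    [90, 4, 6],
    [10, 15, 5],
    [8, 2, 30],
]
scores = per_class_scores(confusion, labels)
f1 = micro_f1(confusion, labels)
```

A class with no correct predictions, such as same-as in the evaluation above, gets a precision and recall of 0.0 under this definition.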

4 Conclusions and future work

In this paper, we have described the model presented by the TALP team for Task 3 of TASS-2018. In addition, we have discussed the main reasons why our model misclassifies some key-phrases and relationships.

The results achieved by our model, when compared to those of the other participants, show that a model that jointly classifies entities and relations can outperform traditional two-step systems in tasks where some entity classes are defined by the relationships they hold with others. There is, however, considerable room for improvement, especially in the relation extraction task, mainly due to its greater complexity and the limited number of examples available in the training set.

Our model was designed to solve the key-phrase classification and relation extraction tasks, leaving key-phrase recognition as future work, as our focus was joint recognition and we did not have enough time to design and optimize a single model that could tackle all three tasks. We are committed to continuing this line of investigation and extending the architecture so that it is also able to determine the key-phrases' boundaries.

Additionally, there are several improvements that could be applied to the current model, which we identified after analyzing its most common errors. To begin with, our model should allow for multi-label relation extraction, as mentioned in Section 3.1.2. Second, more syntactic features could be added, for instance by providing a complete and more appropriate encoding of the PoS-tags or by including not only the dependency tree distances but also the dependency types.

Acknowledgments

This work has been partially funded by the Spanish Government and by the European Union through the GRAPHMED project (TIN2016-77820-C3-3-R and AEI/FEDER, UE).

References

Cardellino, C. 2016. Spanish Billion Words Corpus and Embeddings, March.

Gonzalez-Hernandez, G., A. Sarker, K. O'Connor, and G. Savova. 2017. Capturing the patient's perspective: a review of advances in natural language processing of health-related text. Yearbook of Medical Informatics, 26(01):214–227.

Lee, J. Y., F. Dernoncourt, and P. Szolovits. 2017. MIT at SemEval-2017 Task 10: Relation extraction with convolutional neural networks. arXiv preprint arXiv:1704.01523.

Li, F., M. Zhang, G. Fu, and D. Ji. 2017. A neural joint model for entity and relation extraction from biomedical text. BMC Bioinformatics, 18(1):198.

Liu, S., F. Shen, V. Chaudhary, and H. Liu. 2017. MayoNLP at SemEval 2017 Task 10: Word embedding distance pattern for keyphrase classification in scientific publications. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 956–960.

Martínez-Cámara, E., Y. Almeida-Cruz, M. C. Díaz-Galiano, S. Estévez-Velarde, M. A. García-Cumbreras, M. García-Vega, Y. Gutiérrez, A. Montejo-Ráez, A. Montoyo, R. Muñoz, A. Piad-Morffis, and J. Villena-Román. 2018. Overview of TASS 2018: Opinions, health and emotions. In E. Martínez-Cámara, Y. Almeida Cruz, M. C. Díaz-Galiano, S. Estévez Velarde, M. A. García-Cumbreras, M. García-Vega, Y. Gutiérrez Vázquez, A. Montejo Ráez, A. Montoyo Guijarro, R. Muñoz Guillena, A. Piad Morffis, and J. Villena-Román, editors, Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS 2018), volume 2172 of CEUR Workshop Proceedings, Sevilla, Spain, September. CEUR-WS.

Shickel, B., P. J. Tighe, A. Bihorac, and P. Rashidi. 2017. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal of Biomedical and Health Informatics.

Singh, S., S. Riedel, B. Martin, J. Zheng, and A. McCallum. 2013. Joint inference of entities, relations, and coreference. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, pages 1–6. ACM.