=Paper=
{{Paper
|id=Vol-2172/p7-labda_tass2018
|storemode=property
|title=LABDA at TASS-2018 Task 3: Convolutional Neural Networks for Relation Classification in Spanish eHealth documents
|pdfUrl=https://ceur-ws.org/Vol-2172/p7-labda_tass2018.pdf
|volume=Vol-2172
|authors=Víctor Suárez-Paniagua,Isabel Segura-Bedmar,Paloma Martínez
|dblpUrl=https://dblp.org/rec/conf/sepln/Suarez-Paniagua18
}}
==LABDA at TASS-2018 Task 3: Convolutional Neural Networks for Relation Classification in Spanish eHealth documents==
TASS 2018: Workshop on Semantic Analysis at SEPLN, September 2018, pages 71-76
Víctor Suárez-Paniagua, Isabel Segura-Bedmar, Paloma Martínez
Computer Science Department
Carlos III University of Madrid
Leganés 28911, Madrid, Spain
vspaniag@inf.uc3m.es, isegura@inf.uc3m.es, pmf@inf.uc3m.es
Abstract: This work presents the participation of the LABDA team in the subtask of classification of relationships between two identified entities in electronic health (eHealth) documents written in Spanish. We used a Convolutional Neural Network (CNN) with the word embedding and the position embedding of each word to classify the type of the relation between two entities in the sentence. This machine learning method has previously shown good performance at capturing the relevant features of electronic health documents which describe relationships. Our architecture obtained an F1 of 44.44% in scenario 3 of the shared task, named Setting semantic relationships. Only five teams submitted results for this subtask. Our system achieved the second highest F1, very similar to the top score (micro F1=44.8%) and higher than those of the remaining teams. One of the main advantages of our approach is that it does not require any external knowledge resource as features.
Keywords: Relation Classification, Deep Learning, Convolutional Neural Network, biomedical texts
ISSN 1613-0073 Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes.

1 Introduction

Nowadays, there is a large increase in the publication of scientific articles every year, which demonstrates that we are living in an emerging knowledge era. This explosion of information makes it nearly impossible for doctors and biomedical researchers to keep up to date with the literature in their fields. The development of automatic systems to extract and analyse information from electronic health (eHealth) documents can significantly reduce the workload of doctors.

The TASS workshop proposes shared tasks on sentiment analysis in Spanish each year. Concretely, the goal of TASS-2018 Task 3 (Martínez-Cámara et al., 2018) is to create a competition where Natural Language Processing (NLP) experts can train their systems for extracting the relevant information from Spanish eHealth documents and evaluate them in an objective and fair way.

Recently, Deep Learning has had a big impact on NLP tasks, becoming the state-of-the-art technique. The Convolutional Neural Network (CNN) is a Deep Learning architecture which has shown good performance in Computer Vision tasks such as image classification (Krizhevsky, Sutskever, and Hinton, 2012) and face recognition (Lawrence et al., 1997).

The system described in (Kim, 2014) was the first work to use a CNN for an NLP task. It created a vector representation for each sentence by extracting the relevant information with different filters in order to classify sentences into predefined categories, obtaining good results. In addition, a CNN achieved good performance for relation classification between nominals in the work of (Zeng et al., 2014). Furthermore, this architecture has also been used in the biomedical domain for the extraction of drug-drug interactions in (Suárez-Paniagua, Segura-Bedmar, and Martínez, 2017a). This system did not require any external biomedical knowledge in order to obtain results very close to those achieved using many hand-crafted features. We also employed the same approach as (Suárez-Paniagua, Segura-Bedmar, and Martínez, 2017b), which was used for extracting relationships between keyphrases in SemEval-2017 Task 10: ScienceIE (Augenstein et al., 2017), a task which proposed subtasks very similar to those defined in TASS-2018 Task 3.

In this work, we describe the participation of LABDA in subtask C, the classification of relationships between two identified entities in Spanish documents about health. In this subtask, the test dataset includes the text, the boundaries and the types of the entities used to generate the predictions.

2 Dataset

The task provides an annotated corpus of MedlinePlus documents which is divided into a training set for the learning step, a development set for validation and a test set for the evaluation of the systems.

The relationships between entities defined as concepts are: is-a, part-of, property-of and same-as. There are also relationships defined as roles: subject and target. The training set contains 559 sentences with 3,276 entities, 1,012 relations and 1,385 roles; the development set contains another 285 sentences. A detailed description of the method used to collect and process the documents can be found in (Martínez-Cámara et al., 2018).

Unlike the two previous subtasks, the documents include annotated entities with boundaries and types. In this way, it is possible to measure and compare the different approaches focusing only on the goal of subtask C.

2.1 Pre-processing phase

As some of the relationship types are asymmetrical, for each pair of entities marked in the sentence we generate two instances. Thus, a sentence with n entities will have (n − 1) × n instances. Each instance is labelled with one of the six classes is-a, part-of, property-of, same-as, subject and target. In addition, a None class is also considered for the non-relationship between the entities.

Because there are some overlapped entities, we consider each sentence as a graph whose vertices are the entities and whose edges connect non-overlapping entities, in order to obtain recursively all the possible paths without overlapping; thus, we have different instances for each set of overlapped entities. Table 2 shows the resulting number of instances for each class on the train, validation and test sets.

Label         Train   Validation   Test
is-a            238          299     41
part-of         222          171     36
property-of     600          366     84
same-as          42           19      8
subject        1018          636    206
target         1510          988    308
None          27112        20631   5265

Table 2: Number of instances for each relationship type in each dataset: train, validation and test.

After that, we tokenize and clean the sentences following an approach similar to that described in (Kim, 2014): converting numbers to a common name, lower-casing words, replacing special Spanish characters with their unaccented equivalents (e.g. ñ with n), and separating special characters with white spaces by regular expressions.
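The instance generation described above (two ordered instances per entity pair, with the None class for unrelated pairs) can be sketched as follows; the data layout (entity ids and a gold-relation dictionary) is hypothetical and only meant to illustrate the (n − 1) × n counting:

```python
from itertools import permutations

def generate_instances(entities, gold_relations):
    """Generate one instance per ordered pair of entities.

    `entities` is a list of entity identifiers; `gold_relations` maps an
    ordered pair (e1, e2) to one of the six relation labels.  Ordered
    pairs without an annotated relation receive the None class.
    (Hypothetical layout; the actual corpus format differs.)
    """
    instances = []
    for e1, e2 in permutations(entities, 2):   # n * (n - 1) ordered pairs
        label = gold_relations.get((e1, e2), "None")
        instances.append(((e1, e2), label))
    return instances

pairs = generate_instances(["asma", "produce", "empeoran"],
                           {("produce", "empeoran"): "subject"})
# 3 entities -> 3 * 2 = 6 ordered pairs: one labelled subject, five None
```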
Relationship between entities    Instance after entity blinding    Label
(ataque de asma → produce)    'un entity1 se entity2 cuando los entity0 entity0 .'    None
(ataque de asma ← produce)    'un entity2 se entity1 cuando los entity0 entity0 .'    target
(ataque de asma → síntomas)    'un entity1 se entity0 cuando los entity2 entity0 .'    None
(ataque de asma ← síntomas)    'un entity2 se entity0 cuando los entity1 entity0 .'    None
(ataque de asma → empeoran)    'un entity1 se entity0 cuando los entity0 entity2 .'    None
(ataque de asma ← empeoran)    'un entity2 se entity0 cuando los entity0 entity1 .'    None
(produce → síntomas)    'un entity0 se entity1 cuando los entity2 entity0 .'    None
(produce ← síntomas)    'un entity0 se entity2 cuando los entity1 entity0 .'    None
(produce → empeoran)    'un entity0 se entity1 cuando los entity0 entity2 .'    subject
(produce ← empeoran)    'un entity0 se entity2 cuando los entity0 entity1 .'    None
(síntomas → empeoran)    'un entity0 se entity0 cuando los entity1 entity2 .'    None
(síntomas ← empeoran)    'un entity0 se entity0 cuando los entity2 entity1 .'    target
(asma → produce)    'un ataque de entity1 se entity2 cuando los entity0 entity0 .'    None
(asma ← produce)    'un ataque de entity2 se entity1 cuando los entity0 entity0 .'    None
(asma → síntomas)    'un ataque de entity1 se entity0 cuando los entity2 entity0 .'    None
(asma ← síntomas)    'un ataque de entity2 se entity0 cuando los entity1 entity0 .'    None
(asma → empeoran)    'un ataque de entity1 se entity0 cuando los entity0 entity2 .'    None
(asma ← empeoran)    'un ataque de entity2 se entity0 cuando los entity0 entity1 .'    None
(produce → síntomas)    'un ataque de entity0 se entity1 cuando los entity2 entity0 .'    None
(produce ← síntomas)    'un ataque de entity0 se entity2 cuando los entity1 entity0 .'    None
(produce → empeoran)    'un ataque de entity0 se entity1 cuando los entity0 entity2 .'    subject
(produce ← empeoran)    'un ataque de entity0 se entity2 cuando los entity0 entity1 .'    None
(síntomas → empeoran)    'un ataque de entity0 se entity0 cuando los entity1 entity2 .'    None
(síntomas ← empeoran)    'un ataque de entity0 se entity0 cuando los entity2 entity1 .'    target

Table 1: Instances for each entity pair after the pre-processing phase with entity blinding of the sentence 'Un ataque de asma se produce cuando los síntomas empeoran.'.
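The blinded strings in Table 1 can be produced with a sketch like the one below. The span layout (`entity id -> (start, end)` token offsets) is a hypothetical representation chosen for illustration; note that a multi-token entity such as 'ataque de asma' collapses to a single placeholder:

```python
def blind_entities(tokens, spans, e1, e2):
    """Collapse each annotated entity span to a single placeholder token:
    the candidate pair becomes entity1/entity2, every other entity entity0.
    `spans` maps entity id -> (start, end) token offsets, end exclusive
    (a hypothetical layout for illustration)."""
    tags = {ent: ("entity1" if ent == e1 else
                  "entity2" if ent == e2 else "entity0")
            for ent in spans}
    starts = {s: ent for ent, (s, e) in spans.items()}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            ent = starts[i]
            out.append(tags[ent])      # whole span -> one placeholder
            i = spans[ent][1]
        else:
            out.append(tokens[i].lower())
            i += 1
    return out

sent = "Un ataque de asma se produce cuando los síntomas empeoran .".split()
spans = {"ataque de asma": (1, 4), "produce": (5, 6),
         "síntomas": (8, 9), "empeoran": (9, 10)}
blinded = blind_entities(sent, spans, "ataque de asma", "produce")
# -> ['un', 'entity1', 'se', 'entity2', 'cuando', 'los', 'entity0', 'entity0', '.']
```

The overlapped entity 'asma' is omitted from this path, mirroring how overlapping entities are split into separate instance sets (the second half of Table 1).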
Furthermore, the two target entities of each instance are replaced by the labels "entity1" and "entity2", and the remaining entities by "entity0". This method is known as entity blinding, and it supports the generalization of the model. For instance, the sentence in Figure 1, 'Un ataque de asma se produce cuando los síntomas empeoran.', with the entities ataque de asma, asma, produce, síntomas and empeoran, is transformed into the relation instances shown in Table 1.

Figure 1: Relationships and entities in the sentence 'Un ataque de asma se produce cuando los síntomas empeoran.'.

We observed that some instances involve relationships between an entity and an entity overlapping it; we remove these from the dataset because we cannot deal with such relations in the entity blinding process. Moreover, some relationships have more than one label; in this case, we keep just one label because our system cannot cope with a multi-label problem.

3 CNN model

In this section, we present the CNN architecture used for the task of relation extraction in electronic health documents. Figure 2 shows the entire process of the CNN, starting from a sentence with marked entities and returning the prediction.

3.1 Word table layer

After the pre-processing phase, we created an input matrix suitable for the CNN architecture. The input matrix should represent all training instances for the CNN model; therefore, they should have the same length. We determined the maximum sentence length over all the instances (denoted by n), and then extended the sentences shorter than n by padding them with an auxiliary token "0".

Moreover, each word has to be represented by a vector. To do this, we randomly initialized a vector for each different word, which allows us to replace each word by its word embedding vector: We ∈ R^{|V| × me}, where V is the vocabulary and me is the word embedding dimension. Finally, we obtained a vector x = [x1; x2; ...; xn] for each instance, where each word of the sentence is represented by its corresponding word vector from the word embedding matrix. We denote by p1 and p2 the positions in the sentence of the two entities to be classified.
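The padding and embedding look-up of the word table layer can be sketched in plain Python. The dimensions here (n = 8, me = 4) and the dictionary-based embedding table are toy values for illustration; the paper uses n = 38 and me = 300:

```python
import random

def build_input(tokens, vocab, n, me=4, seed=0):
    """Pad a blinded sentence to length n with the auxiliary token "0",
    then replace each word by its randomly initialised embedding row.
    Toy dimensions; a sketch of the look-up, not the actual implementation."""
    random.seed(seed)
    We = {w: [random.uniform(-1, 1) for _ in range(me)] for w in vocab}
    We["0"] = [0.0] * me                       # padding token row
    padded = tokens + ["0"] * (n - len(tokens))
    return [We[w] for w in padded]

X = build_input(["un", "entity1", "se", "entity2", "."],
                vocab=["un", "entity1", "se", "entity2", "."], n=8)
# X is an 8 x 4 matrix: 5 word vectors followed by 3 zero padding rows
```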
[Figure 2: diagram of the full pipeline for the example sentence, from pre-processing through the look-up table layer (word embeddings We and position embeddings Wd1, Wd2), the convolutional layer, the pooling layer and the softmax layer with dropout.]

Figure 2: CNN model for the Setting semantic relationships subtask of TASS-2018 Task 3.
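The look-up table layer in Figure 2 concatenates each word's embedding with two position embeddings indexed by the word's shifted distance to the candidate entities p1 and p2. A minimal sketch of that distance computation (the shift into (1, 2n − 1) avoids negative indices):

```python
def relative_positions(n, p1, p2):
    """Distance of each token position i to the two candidate entities,
    shifted from (-n+1, n-1) into (1, 2n-1) so the values can index the
    position-embedding tables Wd1 and Wd2."""
    shift = n  # maps a distance d in (-n+1 .. n-1) to d + n in (1 .. 2n-1)
    d1 = [(i - p1) + shift for i in range(n)]
    d2 = [(i - p2) + shift for i in range(n)]
    return d1, d2

d1, d2 = relative_positions(n=8, p1=1, p2=3)
# each entity position itself maps to n, the centre of the (1, 2n-1) range
```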
The following step involves calculating the relative position of each word with respect to the two candidate entities as i − p1 and i − p2, where i is the word position in the sentence (padding tokens included), in the same way as (Zeng et al., 2014). In order to avoid negative values, we shifted the range (−n + 1, n − 1) to the range (1, 2n − 1). Then, we mapped these distances into real-valued vectors using two position embeddings Wd1 ∈ R^{(2n−1) × md} and Wd2 ∈ R^{(2n−1) × md}. Finally, we created an input matrix X ∈ R^{n × (me + 2md)}, represented by the concatenation of the word embedding and the two position embeddings of each word in the instance.

3.2 Convolutional layer

Once we obtained the input matrix, we applied a filter matrix f = [f1; f2; ...; fw] ∈ R^{w × (me + 2md)} over a context window of size w in the convolutional layer to create higher-level features. For each filter, we obtained a score sequence s = [s1; s2; ...; s_{n−w+1}] ∈ R^{(n−w+1) × 1} for the whole sentence as

s_i = g( Σ_{j=1}^{w} f_j · x_{i+j−1}^T + b )

where b is a bias term and g is a non-linear function (such as the hyperbolic tangent or the sigmoid). Note that in Figure 2 we represent the total number of filters, denoted by m, of the same size w as a matrix S ∈ R^{(n−w+1) × m}. However, the same process can be applied to filters with different sizes by creating additional matrices that are concatenated in the following layer.

3.3 Pooling layer

In this layer, the goal is to extract the most relevant feature of each filter using an aggregating function. We used the max function, which produces a single value for each filter as z_f = max{s} = max{s1; s2; ...; s_{n−w+1}}. Thus, we created a vector z = [z1, z2, ..., zm], whose dimension is the total number of filters m, representing the relation instance. If there are filters with different sizes, their output values are concatenated in this layer.

3.4 Softmax layer

Prior to performing the classification, we apply dropout to prevent overfitting. We obtain a reduced vector z_d by randomly setting elements of z to zero with probability p following a Bernoulli distribution. After that, we feed this vector into a fully connected softmax layer with weights Ws ∈ R^{m × k} to compute the output prediction values for the classification as o = z_d Ws + d, where d is a bias term; we have k = 6 relation classes in the dataset plus the "None" class.
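The convolution and max-pooling steps of Sections 3.2-3.3 can be sketched in plain Python with toy dimensions (one filter of width w = 2 over an n = 4, d = 2 input); this is an illustrative sketch, not the paper's implementation:

```python
import math

def conv_max_pool(X, filters, b, g=math.tanh):
    """One convolution + max-pooling pass.
    X: n x d input rows (word + position embeddings concatenated);
    each filter is a w x d weight matrix slid over windows of w
    consecutive rows.  Returns one pooled value z_f per filter."""
    n, w = len(X), len(filters[0])
    z = []
    for f in filters:
        scores = []
        for i in range(n - w + 1):             # n - w + 1 windows
            s = sum(f[j][k] * X[i + j][k]
                    for j in range(w) for k in range(len(X[0])))
            scores.append(g(s + b))
        z.append(max(scores))                  # max pooling over the sentence
    return z

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]   # n=4, d=2
filters = [[[1.0, 1.0], [1.0, 1.0]]]                    # one filter, w=2
z = conv_max_pool(X, filters, b=0.0)
# the three windows score tanh(2), tanh(3), tanh(2); pooling keeps tanh(3)
```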
Label         Correct  Missing  Spurious  Precision   Recall      F1
is-a                8       61         8     50.00%   11.59%  18.82%
part-of             5       27         5     50.00%   15.63%  23.81%
property-of         9       53        12     42.86%   14.52%  21.69%
same-as             1        4         0    100.00%   20.00%  33.33%
subject            50       87        37     57.47%   36.50%  44.64%
target            113       99        72     61.08%   53.30%  56.93%
Scenario 3        186      331       134     58.12%   35.98%  44.44%

Table 3: Results over the test set using a CNN with position embeddings.
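The scores in Table 3 follow from the Correct/Missing/Spurious counts via the precision, recall and F1 definitions given in Section 4; a quick check of the Scenario 3 row:

```python
def prf(correct, missing, spurious):
    """Precision, recall and F1 from the task's
    Correct/Missing/Spurious counting scheme."""
    p = correct / (correct + spurious)
    r = correct / (correct + missing)
    return p, r, 2 * p * r / (p + r)

# Scenario 3 row of Table 3: C=186, M=331, S=134
p, r, f1 = prf(186, 331, 134)
# -> p ≈ 0.5813, r ≈ 0.3598, f1 ≈ 0.4444
```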
At test time, the vector z of a new instance is directly classified by the softmax layer without dropout.

3.5 Learning

For the training phase, we need to learn the CNN parameter set θ = (We, Wd1, Wd2, Ws, d, Fm, b), where Fm denotes all of the m filters f. For this purpose, we use the conditional probability of a relation r obtained by the softmax operation,

p(r|x, θ) = exp(o_r) / Σ_{l=1}^{k} exp(o_l)

and minimize the cross-entropy function over all instances (x_i, y_i) in the training set T as follows:

J(θ) = − Σ_{i=1}^{|T|} log p(y_i | x_i, θ)

In addition, we minimize this objective function using stochastic gradient descent over shuffled mini-batches with the Adam update rule (Kingma and Ba, 2014) to learn the parameters.

4 Results and Discussion

The CNN model was trained on the training set, and we obtained the best value of each parameter by fine-tuning on the validation set (see Table 4).

Parameter                                  Value
Maximal sentence length in the dataset, n  38
Word embedding dimension, me               300
Position embedding dimension, md           10
Filter window sizes, w                     3, 4, 5
Filters for each window size, m            200
Dropout rate, p                            0.5
Non-linear function, g                     ReLU
Mini-batch size                            50
Learning rate                              0.001

Table 4: The CNN model parameters and the values used for the results.

The results were measured with precision (P), recall (R) and F1, defined as:

P = C / (C + S)    R = C / (C + M)    F1 = 2 · (P × R) / (P + R)

where Correct (C) counts the relations present in both the test set and the prediction, Missing (M) counts the relations that are in the test set but not in the prediction, and Spurious (S) counts the relations that are in the prediction but not in the test set.

Table 3 shows the results of the CNN configuration with position embeddings. We observe that the number of Missing relations is very high. This may be because the dataset is very unbalanced and these instances are classified as None by the system. In fact, we see that the better-represented classes obtain better Recall. To address this problem, we propose using sampling techniques to increase the number of instances of the less represented classes.

Only five teams submitted results for this subtask. Our system achieved the second highest F1, very similar to the top score (micro F1=44.8%) but much higher than those of the other teams, which are below 11% F1. One of the main advantages of our approach is that it does not require any external knowledge resource.

5 Conclusions and Future work

In this paper, we propose a CNN model for subtask C (Setting semantic relationships) of TASS-2018 Task 3. The official results for this model show that the CNN is a very promising system because neither expert domain knowledge nor external features are needed. The configuration of the architecture is very simple, with a basic preprocessing adapted for Spanish documents.
The results show that the system produces a large number of false negatives. We think that this may be due to the unbalanced nature of the dataset. To address this problem, we propose using oversampling techniques to increase the number of instances of the less represented classes. Our system also seems to have difficulties distinguishing the directionality of the relationships. For these reasons, we will explore more complex settings of the architecture to tackle the directionality problem.

Moreover, we plan to use external features as part of the embeddings, such as the entity labels given by the second subtask, the Part-of-Speech (PoS) tags and the dependency types of each word in the Spanish documents, in order to increase the information available for each sentence. We want to explore in detail the contribution of each feature and to fine-tune all the parameters. Furthermore, we will use some rules to distinguish the relations and the roles with the entity labels and train two different classifiers, which should make them more accurate. In addition, we will explore other neural network architectures, such as the Recurrent Neural Network, and possible combinations with the CNN.

Funding

This work was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project TIN2017-87548-C2-1-R).

References

Augenstein, I., M. Das, S. Riedel, L. Vikraman, and A. McCallum. 2017. SemEval 2017 Task 10: ScienceIE - extracting keyphrases and relations from scientific publications. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 546-555, Vancouver, Canada, August. Association for Computational Linguistics.

Kim, Y. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746-1751.

Kingma, D. P. and J. Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc.

Lawrence, S., C. L. Giles, A. C. Tsoi, and A. D. Back. 1997. Face recognition: a convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1):98-113, January.

Martínez-Cámara, E., Y. Almeida-Cruz, M. C. Díaz-Galiano, S. Estévez-Velarde, M. A. García-Cumbreras, M. García-Vega, Y. Gutiérrez, A. Montejo-Ráez, A. Montoyo, R. Muñoz, A. Piad-Morffis, and J. Villena-Román. 2018. Overview of TASS 2018: Opinions, health and emotions. In Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS 2018), volume 2172 of CEUR Workshop Proceedings, Sevilla, Spain, September. CEUR-WS.

Suárez-Paniagua, V., I. Segura-Bedmar, and P. Martínez. 2017a. Exploring convolutional neural networks for drug-drug interaction extraction. Database, 2017:bax019.

Suárez-Paniagua, V., I. Segura-Bedmar, and P. Martínez. 2017b. LABDA at SemEval-2017 Task 10: Relation classification between keyphrases via convolutional neural network. In Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3-4, 2017, pages 969-972.

Zeng, D., K. Liu, S. Lai, G. Zhou, and J. Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Technical Papers, pages 2335-2344, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.