PUCRJ-PUCPR-UFMG at eHealth-KD Challenge 2021: A Multilingual BERT-based System for Joint Entity Recognition and Relation Extraction

Lucas Pavanelli1[0000-0003-2228-7965], Elisa Terumi Rubel Schneider3[0000-0002-8921-5598], Yohan Bonescki Gumiel2,3[0000-0001-8239-2930], Thiago Castro Ferreira2[0000-0003-0200-3646], Lucas Ferro Antunes de Oliveira3[0000-0003-4052-7993], João Vitor Andrioli de Souza3[0000-0002-8950-0890], Giovanni Pazini Meneghel Paiva3[0000-0002-9789-9547], Lucas Emanuel Silva e Oliveira3[0000-0003-1811-5087], Claudia Maria Cabral Moro3[0000-0003-2637-3086], Emerson Cabrera Paraiso3[0000-0002-6740-7855], and Adriana Pagano2[0000-0002-3150-3503]

1 Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
lpavanelli@inf.puc-rio.br
2 Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
thiagocf05@ufmg.br
3 Pontifícia Universidade Católica do Paraná, Curitiba, Brazil
c.moro@pucpr.br

Abstract. This study introduces the system submitted to the eHealth-KD Challenge 2021 by the PUCRJ-PUCPR-UFMG team. We propose a multilingual BERT-based system for joint entity recognition and relation extraction in multidomain texts. Our end-to-end multi-task model benefits from the transformer architecture, which has proved to better capture the global dependencies of the input text. The use of a multilingual model also helped our system perform well even on the portion of the test set containing non-Spanish sentences. Our system ranked first in the entity recognition task and second in the Main scenario, in which both entity recognition and relation extraction had to be solved. The full code of our approach and further implementation details are publicly available.

Keywords: eHealth · Entity Recognition · Relation Extraction · BERT · Deep Learning

IberLEF 2021, September 2021, Málaga, Spain. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Recent advances in Natural Language Processing (NLP) allow the extraction of relevant information from clinical and biomedical texts, automatically acquiring a wide variety of knowledge from unstructured health documents [10]. Tasks such as named entity recognition (NER) and relation extraction between entities can support other tasks and assist healthcare decision-making. In the clinical domain, NER can identify clinical concepts such as symptoms, diseases and procedures, extracting valuable information about patients. The extraction of relations between entities, in turn, uncovers information such as drug interactions, which can support healthcare professionals in enhancing patient care.

The IberLEF eHealth Knowledge Discovery Challenge (eHealth-KD) 2021 [18] targets the recognition of entities and their relations in the clinical domain, encouraging researchers and scientists to discover new knowledge through text mining and NLP in the health domain. The challenge involves modeling human language in electronic health documents in Spanish with semantic interpretation, through the tasks of entity recognition and relation extraction. The semantic structure comprises four types of information units, which can hold relationships among themselves (13 types of semantic relations).
In addition to a larger dataset compared to the previous challenge [17], the 2021 edition also features cross-domain and multi-language settings, encouraging the development of more generic and adaptive systems that can be readily applied to several languages and domains. In this respect, the released dataset comprised texts in Spanish and English and covered both the healthcare and news domains.

In our method, we use a transformer-based model, as large pre-trained language models based on the transformer architecture [20] have reached the state of the art in various NLP tasks. We employ the multilingual version of Bidirectional Encoder Representations from Transformers (BERT) [5], which supports 104 languages, including Spanish. We implemented an end-to-end multi-task BERT-based model fine-tuned to extract entities from text and to classify relations between them. Our approach is based on the 2020 Vicomtech method [7], one of the best approaches in the 2020 edition of the challenge [17]. In order to contribute to research on entity and relation extraction, we also make our code available in a public repository [16], allowing our work to be easily reproduced.

The paper is organised as follows: Section 2 describes the proposed method, with architecture and implementation details; Section 3 presents the results of all competing systems; and Sections 4 and 5 provide discussions and conclusions drawn from the observed results.

2 System Description

Based on the 2020 Vicomtech approach [7], our method consists of an end-to-end multilingual BERT-based system that jointly predicts both entities and relations. During training, the proposed multi-task system is fine-tuned in 3 sequential steps: the first prioritizes the entity recognition task, whereas the second gives precedence to the relation extraction one. Finally, the third and last step trains the model for both tasks using a multi-task strategy. In this section, we detail the system's architecture, how we handle the inputs and the outputs and, finally, the parameters and training setup.

Fig. 1. Architecture of the approach submitted to the eHealth-KD Challenge 2021 by the PUCRJ-PUCPR-UFMG team (Steps 1-5: mBERT encoder, entity recognition, cross-operation, projection, and relation extraction).

2.1 Architecture

The architecture of our model is presented in Figure 1. In the following paragraphs we explain each of its components.

Encoder Like the 2020 Vicomtech system, our approach first tokenizes the input text and encodes its tokens (words or subwords) into vector representations (Step 1 in Figure 1). Unlike Vicomtech's previous approach, which used BETO [3], a BERT version trained on the Spanish language, our approach uses mBERT, a multilingual version of BERT pretrained on texts in 104 languages [5]. We used the bert-base-multilingual-cased setting, with 12 self-attention heads, 12 layers (transformer blocks), and an embedding length of 768 dimensions, which encodes multilingual cased texts.
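As an illustration, this encoding step can be reproduced with the Hugging Face transformers library roughly as follows. This is a minimal sketch with illustrative variable names, assuming the bert-base-multilingual-cased checkpoint named above; it is not the exact code released in our repository [16].

```python
import torch
from transformers import BertModel, BertTokenizerFast

# Load the multilingual cased BERT checkpoint used as the encoder.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

sentence = "El gluten es una proteína."
# WordPiece tokenization may split words into subwords (e.g., "g", "##lut", "##en");
# the offset mapping keeps each token's character span for later decoding.
batch = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
offsets = batch.pop("offset_mapping")

with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # shape: (1, S, 768)
```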
Entity Recognition Once the input text is encoded, as depicted in Step 2 of Figure 1, the encoded vector representations are fed into a softmax classifier for entity recognition. In the provided dataset, each entity can be classified into 4 categories: Concept, Action, Predicate and Reference. In order to know when a token is part of an entity mention and where each of these mentions starts and ends in terms of tokens, we used the IOB2 format, popular in Named Entity Recognition applications, so that each token of the text can be labeled by the classifier according to 9 categories: O, B-Concept, I-Concept, B-Action, I-Action, B-Predicate, I-Predicate, B-Reference and I-Reference. The O label marks tokens which are not part of an entity mention, whereas the labels starting with B- and I- indicate the beginning and subsequent tokens of a mention, respectively.

Relation Extraction Like the 2020 Vicomtech approach, we concatenate the logits of the entity recognition classifier with the vector representations of the respective tokens. A cross-operation is then performed by concatenating each pair of token vector representations, resulting in a tensor of dimension SxSx2(H+E), where S is the sequence length of the input text, H the 768 dimensions of the vector representations and E the 9 dimensions of the logits (Step 3 of Figure 1). This tensor is further fed into a projection layer with a Tanh activation function, which maps the input onto an SxSxD tensor, where D = 768 (Step 4 of Figure 1). Finally, the output of the previous operation is given as input to a classifier which predicts the relation of each pair of tokens according to the 13 categories of the dataset (is-a, part-of, has-property, causes, entails, in-context, in-place, in-time, subject, target, domain, arg and same-as), plus an O one, which indicates that there is no relation between the target pair.

Classifiers Both the entity recognition and relation extraction classifiers consist of a projection layer with a Mish [13] activation function and dropout of 0.2, followed by a softmax layer.
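To make the cross-operation and projection concrete, the sketch below builds the pair tensor and scores every token pair. It is a minimal PyTorch illustration with assumed dimensions and illustrative names, not our exact implementation.

```python
import torch
import torch.nn as nn

S, H, E, D, R = 16, 768, 9, 768, 14  # sequence length, hidden size, entity logits, projection size, 13 relations + O
hidden = torch.randn(S, H)      # mBERT token representations
ent_logits = torch.randn(S, E)  # logits from the entity recognition classifier

# Concatenate each token's representation with its entity logits: (S, H + E).
token_repr = torch.cat([hidden, ent_logits], dim=-1)

# Cross-operation: pair every token with every other token -> (S, S, 2(H + E)).
left = token_repr.unsqueeze(1).expand(S, S, H + E)
right = token_repr.unsqueeze(0).expand(S, S, H + E)
pairs = torch.cat([left, right], dim=-1)

# Projection with a Tanh activation onto (S, S, D), followed by the relation classifier.
projection = nn.Sequential(nn.Linear(2 * (H + E), D), nn.Tanh())
relation_classifier = nn.Linear(D, R)
relation_scores = relation_classifier(projection(pairs))  # (S, S, R)
```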
2.2 Input handling

Since the corpora are provided in a character span-based format and our network works at the token level, we tokenize the sentence text using BERT's default tokenizer [5], resulting in WordPiece information. Next, for each token, we assign Begin and Inside tags (IOB2 format) if it is part of an entity, and O otherwise. With this approach, we can represent consecutive entities spanning more than one token. However, it prevents us from representing discontinuous entities: e.g., considering the text span "uno o dos días", we cannot represent the entity ("un día") using the IOB2 format. In this case, we only consider the first entity ("un"). We opt for this simple approach instead of a more complex representation because we favor building a simpler and more efficient model. As for relations, we represent them as triples containing the first token of each entity in the relation and the relation type. We use these triples to fill the relation matrix. Figure 2 shows an example input of our model.

Fig. 2. Example of the model's input.

2.3 Output handling

The output of the model needs to be converted back to a character span-based format, so we implement a postprocessing module responsible for this conversion. The model's output contains a sequence of tokens, each one assigned to an entity tag, and an SxS matrix informing the relation between each pair of tokens. For each token, if it is the beginning of an entity, we identify the character range that it spans and add this span to the result. Next, we discard entities that are entirely contained within another one and that start with a stopword. Lastly, we construct the relations by linking entities that have at least one token related in the model's relation output.

2.4 Parameters and Training Setup

Our neural network approach was trained using the AdamW [11] optimizer combined with a linear scheduler which warms up the training process from an initial learning rate of 2e-6 up to 2e-5 over the first 10 epochs. Using a batch size of 1, we train the approach in 3 sequential steps.

In the first step, all the training parameters of the network are frozen except for those of mBERT and the entity recognition classifier. The model is then trained for 50 epochs with early stopping of patience 15 (i.e., training stops early if no progress is observed on the validation set for 15 epochs), computing the loss based only on the entity recognition task:

\[ J_{ent}(x^{(ent)}, y^{(ent)}) = \frac{1}{N} \sum_{n=1}^{N} x^{(ent)}_{y_n^{(ent)}} \tag{1} \]

where x^{(ent)} is the likelihood computed by the entity recognition classifier, y^{(ent)} are the gold standards and N is the size of the batch.

For the second step, which focuses on the relation extraction task, we only freeze the training parameters of the entity recognition classifier. The approach is also trained for 50 epochs with early stopping of patience 15, though, unlike the previous step, the loss is computed based on the relation extraction task:

\[ J_{rel}(x^{(rel)}, y^{(rel)}) = \frac{1}{N} \sum_{n=1}^{N} x^{(rel)}_{y_n^{(rel)}} \tag{2} \]

where x^{(rel)} is the likelihood computed by the relation extraction classifier, y^{(rel)} are the gold standards and N is the size of the batch.

Finally, we perform a third training step with 100 epochs and early stopping of patience 15 in order to fine-tune the model for both tasks. None of the training parameters are frozen and the loss is computed based on [9] in the following way:

\[ J = e^{-\alpha_{ent}} \times J_{ent}(x^{(ent)}, y^{(ent)}) + \alpha_{ent} + e^{-\alpha_{rel}} \times J_{rel}(x^{(rel)}, y^{(rel)}) + \alpha_{rel} \tag{3} \]

where \alpha_{ent} and \alpha_{rel} are training parameters as well.
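A minimal sketch of this combined objective, following the uncertainty weighting of [9], is given below; alpha_ent and alpha_rel are learned jointly with the rest of the network. The names are illustrative and the code is a simplified sketch, not our exact implementation.

```python
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Combines the entity and relation losses as in Eq. (3), with learnable task weights."""

    def __init__(self):
        super().__init__()
        self.alpha_ent = nn.Parameter(torch.zeros(()))
        self.alpha_rel = nn.Parameter(torch.zeros(()))

    def forward(self, j_ent: torch.Tensor, j_rel: torch.Tensor) -> torch.Tensor:
        # J = e^{-alpha_ent} * J_ent + alpha_ent + e^{-alpha_rel} * J_rel + alpha_rel
        return (torch.exp(-self.alpha_ent) * j_ent + self.alpha_ent
                + torch.exp(-self.alpha_rel) * j_rel + self.alpha_rel)
```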
Our approach was trained using the training and development sets released for the shared task. Although the third step was scheduled to run for 100 epochs, due to time constraints regarding the submission deadline we reported results after 67 epochs of this step. After submission, we performed the third training step with a full 100-epoch run. In the following section, we present the results of our officially submitted approach as well as the results obtained in the subsequent 100-epoch run of the third step.

3 Results

Table 1 displays the results (precision, recall and F1-score) reported by the participating systems in the eHealth-KD Challenge 2021 [18]. For the entity recognition task (Task A), our official approach, which ran for 67 epochs in the third training step and is labeled as Our Approach in the table, ranked first with an F-score of 70.60, outperforming Vicomtech, the second best in the task, developed by the winning team of the 2020 edition of the challenge [17]. On the other hand, for the relation extraction task (Task B) our system had a significant drop, ranking 4th with an F-score of 26.32, behind the IXA, Vicomtech and uhKD4 systems.

The intermediate performance of our approach in Task B was compensated by its good performance in the entity recognition task, so that our approach ranked second in the Main task (which combines both tasks), just behind Vicomtech.

Table 1. Participating systems' reported results for the eHealth-KD Challenge 2021

                         Main (Entity + Relation)   Task A (Entity Recognition)   Task B (Relation Extraction)
System                   #R  P      R      F1       #R  P      R      F1          #R  P      R      F1
Our Approach             2   56.85  50.28  52.84    1   71.49  69.73  70.60       4   36.66  20.54  26.32
Our Approach 100 epcs    -   53.63  49.39  51.42    -   71.49  69.20  70.33       -   32.31  22.96  26.85
Vicomtech [6]            1   54.08  53.46  53.11    2   69.99  74.71  68.41       2   54.19  28.31  37.19
IXA [2]                  3   46.46  53.86  49.89    3   61.37  69.80  65.33       1   45.36  40.95  43.04
uhKD4 [1]                4   48.53  37.43  42.26    5   51.75  53.74  52.73       3   55.62  22.24  31.77
UH-MMM [14]              5   29.16  40.37  33.87    4   54.60  68.50  60.77       5   7.73   4.13   5.38
CodestrangeTeam [12]     6   33.70  17.69  23.20    10  41.50  4.44   8.02        6   43.75  1.70   3.28
baseline [18]            7   33.70  17.69  23.20    7   35.03  27.17  30.60       7   43.75  1.70   3.28
JAD [15]                 8   10.95  23.44  7.14     8   31.58  22.46  26.25       8   37.50  0.365  0.722
Yunnan-Deep [8]          -   -      -      -        6   52.04  24.60  33.41       -   -      -      -
Yunnan-1 [21]            -   -      -      -        9   27.11  12.73  17.32       -   -      -      -

As mentioned in Section 2.4, due to time constraints we ran the third training step for 67 epochs instead of the originally scheduled 100. We initially hypothesized this to be the reason for the significant drop in performance of our model in the relation extraction task. However, this does not seem to have been the case, as evidenced by the results obtained after subsequently training for 100 epochs in the third step, reported as Our Approach 100 epcs. In terms of F-score, the 100-epoch version of our approach showed a slight improvement over our official results in that task, at the cost of a slight drop in performance in the entity recognition task.

4 Discussion

Our approach is a simplified version of the 2020 Vicomtech system, the winner of the 2020 challenge. After encoding a sentence using a BERT-based method, the original system uses two classifiers for entity recognition: the first predicts whether each token is part of an entity of a certain type; the second, a multiword classifier, predicts whether each pair of tokens is part of the same entity mention. Unlike the 2020 system, by adopting the IOB2 format we used a single classifier for the entity recognition task. Besides being simpler, our approach outperformed the original, ranking first in the entity recognition task.

A further distinction between the original 2020 Vicomtech approach and ours is that the former uses a DistilBERT [19] module to map the tensor with pairs of token representations, whereas ours uses a linear projection with a Tanh activation function, which makes our system much less computationally intensive. Moreover, the original approach uses three classifiers to solve the relation extraction task: the first predicts a bidirectional same-as relation between pairs of tokens; the second predicts whether there is any other kind of relation between tokens; and, if a relation is predicted, a third classifier predicts the relation type. Unlike the 2020 Vicomtech system, our approach is built with only one classifier for relation extraction, which predicts both whether there is a relation and its type. As a consequence of this simplification, our method was outperformed and ranked 4th in Task B, even after training for a total of 100 epochs.
5 Conclusions

This study has introduced the approach developed by the PUCRJ-PUCPR-UFMG team for the eHealth-KD Challenge 2021. The approach jointly solves the entity recognition and relation extraction tasks in multilingual texts, using a fine-tuned version of mBERT, a BERT version which supports 104 languages. Our system ranked first in the entity recognition task and second in the Main scenario, with a simple approach of low computational cost. As the trained model can benefit several downstream NLP tasks, we have publicly released our method [16] for researchers and the community in general. As a further step in our work, we intend to explore relation classification heuristics in order to improve our results for relation extraction.

Acknowledgments

Research partially funded by the Coordination for the Improvement of Higher Education Personnel (CAPES) under grant 88887.508597/2020-00 and Finance Code 001, and by the National Council for Scientific and Technological Development (CNPq) under grant 443653/2018-6.

References

1. Alfaro-González, D., Pérez-Perera, D., González-Rodríguez, G., Otaño-Barrera, A.J.: uhKD4 at eHealth-KD Challenge 2021: Deep Learning Approaches for Knowledge Discovery from Spanish Biomedical Documents. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
2. Andrés, E.: IXA at eHealth-KD Challenge 2021: Generic Sequence Labeling as Relation Extraction Approach. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
3. Cañete, J., Chaperon, G., Fuentes, R., Pérez, J.: Spanish Pre-Trained BERT Model and Evaluation Data. PML4DC at ICLR 2020 (2020)
4. Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish Pre-Trained BERT Model and Evaluation Data. In: PML4DC at ICLR 2020 (2020)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (Jun 2019). https://doi.org/10.18653/v1/N19-1423, https://www.aclweb.org/anthology/N19-1423
6. García-Pablos, A., Pérez, N., Cuadros, M.: Vicomtech at eHealth-KD Challenge 2021: Deep Learning Approaches to Model Health-related Text in Spanish. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
7. García-Pablos, A., Perez, N., Cuadros, M., Zotova, E.: Vicomtech at eHealth-KD Challenge 2020: Deep End-to-End Model for Entity and Relation Extraction in Medical Text. In: Proceedings of the Iberian Languages Evaluation Forum co-located with the 36th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN. vol. 2020 (2020)
8. Guan, Z., Liu, R.: Yunnan-Deep at eHealth-KD Challenge 2021: Deep Learning Model for Entity Recognition in Spanish Documents. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
9. Kendall, A., Gal, Y., Cipolla, R.: Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)
10. Kreimeyer, K., Foster, M., Pandey, A., Arya, N., Halford, G., Jones, S.F., Forshee, R., Walderhaug, M., Botsis, T.: Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review.
Journal of Biomedical Informatics 73, 14–29 (2017)
11. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. In: International Conference on Learning Representations (2019), https://openreview.net/forum?id=Bkg6RiCqY7
12. Marti, R., Bermudez, C., García, L., Gutiérrez, L.: CodeStrange at eHealth-KD Challenge 2021. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
13. Misra, D.: Mish: A Self Regularized Non-Monotonic Activation Function. In: 31st British Machine Vision Conference 2020, BMVC 2020, Virtual Event, UK, September 7–10, 2020. BMVA Press (2020), https://www.bmvc2020-conference.com/assets/papers/0928.pdf
14. Monteagudo-García, L., Marrero-Santos, A., Fernández-Arias, M.S., Cañizares-Díaz, H.: UH-MMM at eHealth-KD Challenge 2021. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
15. Navarro Comabella, J.G., Valle Diaz, J.D., Helguera Fleitas, A.: JAD at eHealth-KD Challenge 2021: Simple Neural Network with BERT for Joint Classification of Key-Phrases and Relations. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)
16. Pavanelli, L., Schneider, E.T.R., Gumiel, Y.B., Ferreira, T.C., Oliveira, L.F.A., De Souza, J.V.A., Paiva, G.P.M., Oliveira, L.E.S., Moro, C.M.C., Paraiso, E.C., Pagano, A.: PUCRJ-PUCPR-UFMG. https://github.com/eHealth-KD-PUCs-UFMG/pucrj-pucpr-ufmg (2021)
17. Piad-Morffis, A., Gutiérrez, Y., Cañizares-Diaz, S., Estevez-Velarde, S., Almeida-Cruz, Y., Muñoz, R., Montoyo, A.: Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2020. In: Proceedings of the Iberian Languages Evaluation Forum (2020)
18. Piad-Morffis, A., Gutiérrez, Y., Estevez-Velarde, S., Almeida-Cruz, Y., Muñoz, R., Montoyo, A.: Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2021. Procesamiento del Lenguaje Natural 67(0) (2021)
19. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108 (2019), http://arxiv.org/abs/1910.01108
20. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is All You Need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010 (2017)
21. Yang, M.: Yunnan-1 at eHealth-KD Challenge 2021: Deep-Learning Methods for Entity Recognition in Medical Text. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) (2021)