=Paper=
{{Paper
|id=Vol-2664/cantemist_paper15
|storemode=property
|title=ICB-UMA at CANTEMIST 2020: Automatic ICD-O Coding in Spanish with BERT
|pdfUrl=https://ceur-ws.org/Vol-2664/cantemist_paper15.pdf
|volume=Vol-2664
|authors=Guillermo López-García,José Manuel Jerez,Nuria Ribelles,Emilio Alba,Francisco Javier Veredas
|dblpUrl=https://dblp.org/rec/conf/sepln/Lopez-GarciaJV20
}}
==ICB-UMA at CANTEMIST 2020: Automatic ICD-O Coding in Spanish with BERT==
Guillermo López-García (a), José M. Jerez (a), Nuria Ribelles (b), Emilio Alba (b) and Francisco J. Veredas (a)

(a) Departamento de Lenguajes y Ciencias de la Computación, ETSI Informática, Universidad de Málaga, Málaga, Spain
(b) Unidad de Gestión Clínica Intercentros de Oncología, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain

Abstract

This working notes paper presents our contribution to the CANTEMIST track. Our team participated in the CANTEMIST-CODING subtask, the first shared task consisting of the automatic assignment of ICD-O-3 codes to Spanish oncology clinical cases. We addressed the task as a multi-label text classification problem using the BERT model [1]. In order to leverage all the language modelling capabilities of the BERT architecture when applied to the CANTEMIST corpus, we used an enhanced version of our fragment-based classification approach initially developed to tackle the CodiEsp-D task [2]. Applying this improved fragment-based strategy to the CANTEMIST corpus, we produced short fragments of text, each comprising a sequence of sentences from the long clinical documents present in the oncology corpus, and used them as input to the model. Two different versions of the BERT-Base model, namely the Multilingual BERT [3] and BERT-SciELO [4] models, were fine-tuned on the CANTEMIST-CODING annotated corpus. The Multilingual BERT model further pre-trained on an unlabeled Spanish corpus of oncology clinical cases retrieved from Galén [5] achieved the highest classification performance among our five submitted systems, obtaining a final Mean Average Precision (MAP) score of 0.847 on the evaluation set.

Keywords: Clinical NLP, BERT, Spanish oncology clinical cases, Automatic clinical coding, Transfer learning, Text classification

1. Introduction

There is a growing interest in processing clinical documents using text mining and Natural Language Processing (NLP) techniques, giving rise to a new scientific subdiscipline situated at the intersection of Medicine, Linguistics and Computer Science, namely Clinical NLP [6]. One of the most active areas of research in Clinical NLP is the development of tools that perform automatic clinical coding, i.e. the task of autonomously extracting valuable structured information from unstructured medical notes following standardised coding terminologies [7].

Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Contact: guilopgar@uma.es (G. López-García); jja@lcc.uma.es (J.M. Jerez); ealbac@uma.es (E. Alba); franveredas@uma.es (F.J. Veredas). ORCID: 0000-0001-5903-1483 (G. López-García); 0000-0002-7858-2966 (J.M. Jerez); 0000-0003-0739-2505 (F.J. Veredas). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Historically, Clinical NLP researchers have focused mainly on English text, generating and exploiting clinical coding resources in the English language [8, 9, 10]. With 483 million native speakers of Spanish [11], there is a noteworthy interest in processing medical documents written in Spanish. However, the lack of clinical linguistic resources for non-English languages makes it especially arduous to develop tools tailored to Spanish clinical documents. With the aim of promoting the application of Clinical NLP techniques to Spanish medical texts, the CANcer TExt MIning Shared Task (CANTEMIST) [12] has been organised in the context of the Iberian Languages Evaluation Forum (IberLEF 2020). CANTEMIST is the first shared task consisting of the automatic clinical coding of oncology medical cases written in Spanish.
The CANTEMIST corpus comprises 1.3K clinical cases manually annotated by experts in oncology using the Spanish version (eCIE-O-3.1) of the International Classification of Diseases for Oncology (ICD-O-3) codes. The CANTEMIST track is composed of three distinct subtasks: CANTEMIST-NER, CANTEMIST-NORM and CANTEMIST-CODING. The CANTEMIST-NER subtask requires identifying tumour morphology mentions contained in a free-text clinical case, whereas in the CANTEMIST-NORM subtask tumour morphology mentions must be both identified and normalised by assigning their corresponding ICD-O-3 codes. In turn, the CANTEMIST-CODING subtask requires assigning a set of ICD-O-3 codes to each medical document in the corpus.

In this work, we present our contribution to IberLEF 2020, where our team participated in the CANTEMIST-CODING subtask. We have addressed the task as a multi-label text classification problem using BERT [1], a Transformer-based [13] language model that achieved state-of-the-art results on several different NLP tasks. BERT was originally designed to receive short fragments of text as input, as opposed to the long oncology texts present in the CANTEMIST corpus. In order to leverage all the language modelling capabilities of the BERT architecture when applied to the CANTEMIST corpus, we have employed an improved version of our fragment-based classification approach originally developed for the CodiEsp-D task [2]. Hence, using the annotations available for the CANTEMIST-NORM subtask, we turned the CANTEMIST-CODING multi-label document classification problem into a multi-label short-fragment classification task, producing annotated short fragments of text with a full semantic meaning. Furthermore, we experimented with two different versions of the BERT-Base model: the Multilingual version [3] and the BERT-SciELO [4] model.
In addition, different alternatives were explored to adapt the two models to the clinical domain by further pre-training their weights on medical corpora before fine-tuning the models on the CANTEMIST corpus. For reproducibility purposes, the implementation of our approach is publicly available at https://github.com/guilopgar/CANTEMIST-2020.

2. Materials and Methods

2.1. Corpora

2.1.1. Clinical corpora

In this work, we experimented with two different BERT-Base models, namely Multilingual BERT and BERT-SciELO. Multilingual BERT was pre-trained on an extensive multilingual general-domain corpus comprising texts from 104 different languages [3], including Spanish. On the other hand, the BERT-SciELO model was pre-trained on a corpus of biomedical articles in Spanish [4].

Table 1
Summary of the Galén oncology and MIMIC-III discharge summaries unlabeled corpora. The number of tokens and the number of characters were obtained using the Linux wc -w and wc -c commands, respectively.

Corpus                          Documents   Tokens   Characters
Galén oncology                  30.9K       64.4M    437.6M
MIMIC-III discharge summaries   57.4K       81M      534.7M

With the aim of adapting both models to the distinctive features of the Spanish clinical-text domain, we decided to further pre-train the models on a corpus of de-identified medical texts in Spanish retrieved from the Galén Oncology Information System [5]. The corpus comprises 30.9K unlabeled documents containing oncology clinical notes written by physicians from the Oncology Departments of the Hospital Universitario Virgen de la Victoria (HUVV) and the Hospital Regional Universitario (HRU) in Málaga, Spain. Moreover, in order to exploit the cross-lingual features extracted by the Multilingual BERT model, in addition to the corpus of medical texts retrieved from Galén, we also used a clinical corpus in English to pre-train the Multilingual BERT model.
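For reference, Table 1's token and character counts can be reproduced in Python with the same semantics as wc -w (whitespace-delimited tokens) and wc -c (bytes). This is a minimal sketch; the sample sentence is illustrative, since the Galén and MIMIC-III corpora themselves are access-restricted:

```python
# Python equivalents of the wc -w and wc -c counts used to build Table 1.
# The sample text is illustrative; the clinical corpora are not public.

def corpus_stats(text: str) -> tuple:
    """Return (token_count, byte_count) for a corpus held as a string."""
    tokens = len(text.split())           # wc -w counts whitespace-separated words
    n_bytes = len(text.encode("utf-8"))  # wc -c counts bytes, not code points
    return tokens, n_bytes

sample = "El paciente presenta una tumoración en el lóbulo superior.\n"
print(corpus_stats(sample))  # → (9, 61)
```

Note that wc -c counts bytes, so accented Spanish characters such as "ó" contribute two bytes each under UTF-8.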
In this way, we joined the Galén oncology documents and the discharge summaries from the MIMIC-III database [14] into a single bilingual clinical corpus, used to perform the unsupervised pre-training of the multilingual model. From the multiple categories of documents stored in the MIMIC-III database, e.g. radiology, nursing, nutrition and pharmacy reports, we only selected the discharge summary texts, given the high similarity between the medical texts from Galén and the content of the discharge summaries. In Table 1, a brief description of both the Galén oncology corpus and the discharge summaries corpus retrieved from the MIMIC-III database is given.

2.1.2. CANTEMIST corpus

The CANTEMIST corpus contains 1.3K clinical cases manually curated by oncology experts, covering a wide variety of cancer types. For the CANTEMIST-CODING subtask, the documents from the corpus were annotated with ICD-O-3 codes. The entire corpus was divided into four distinct subsets of annotated texts: the training (501 documents), development-1 (249 documents), development-2 (250 documents) and test (300 documents) sets. Teams participating in the CANTEMIST-CODING subtask were evaluated on the test set.

In Table 2, a basic description of the CANTEMIST-CODING corpus is given. As can be seen from the table, the number of code annotations is scarce considering the limited number of documents contained in the corpus: on average, each ICD-O-3 code is present in only a few texts. This results in an imbalanced multi-label classification problem, in which, for each code, the annotated texts (positive samples) are clearly outnumbered by the documents where the code is not present (negative samples).

Table 2
Summary of the CANTEMIST-CODING annotated corpus.

                            Training   Development-1   Development-2   Test
Documents                   501        249             250             300
Total ICD-O codes           2756       1385            1279            1599
Avg. ICD-O codes per doc.   5.501      5.562           5.116           5.330
Unique ICD-O codes          493        338             334             386
Avg. docs. per ICD-O code   5.590      4.098           3.829           4.142
Unique unseen ICD-O codes   -          130             120             107

2.2. Classification system

We have tackled the CANTEMIST-CODING challenge using the BERT model [1]. One of the characteristic features of BERT is that its Transformer-based architecture is designed to process an input sequence of WordPiece [15] sub-tokens with a limited length N (N = 512 in the original implementation of BERT). This entails an important constraint when dealing with long-document classification tasks such as CANTEMIST-CODING, in which most of the clinical cases exhibit a WordPiece sub-token sequence length far above the maximum length supported by BERT.

In order to overcome this limitation, we have applied our fragment-based classification approach initially developed to address the CodiEsp-D task [2]. Given the high correspondence between the CodiEsp-D and CANTEMIST-CODING subtasks, the custom three-phase approach could be applied in a straightforward manner. In this way, we first split each clinical document of the CANTEMIST corpus into short fragments of text. Subsequently, using the ICD-O-3 code annotations available for the CANTEMIST-NORM subtask, we annotated each fragment with the oncology codes exclusively occurring within the fragment. Then, we used the annotated fragments to perform the supervised fine-tuning of the BERT model on a fragment-level multi-label classification task. Finally, since the evaluation of the CANTEMIST-CODING participating systems was performed at document level, we post-processed the probabilities predicted by the model at fragment level using a maximum-probability criterion, producing a list of codes ordered by relevance for each clinical document [2]. Nevertheless, in this work, we have not directly applied the first phase of the fragment-based classification approach described above. Instead, we have modified the first of the three phases forming the fragment-based strategy, performing the text segmentation at the sentence level.
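The maximum-probability post-processing in the third phase can be sketched as follows: a document's score for each code is the highest probability predicted for that code over all of the document's fragments, and codes are then ranked by that score. Function names and numbers below are illustrative, not the authors' actual implementation:

```python
# Sketch of the maximum-probability criterion used to turn fragment-level
# predictions into a document-level code ranking.

def document_ranking(fragment_probs, code_labels):
    """fragment_probs: one probability vector per fragment (one float per
    ICD-O-3 code). Returns (code, probability) pairs, most likely first."""
    doc_probs = [max(frag[i] for frag in fragment_probs)
                 for i in range(len(code_labels))]
    return sorted(zip(code_labels, doc_probs), key=lambda t: -t[1])

codes = ["8000/6", "8140/3", "9673/3"]
frags = [[0.10, 0.80, 0.05],   # fragment 1
         [0.95, 0.20, 0.15]]   # fragment 2
print(document_ranking(frags, codes))
# → [('8000/6', 0.95), ('8140/3', 0.8), ('9673/3', 0.15)]
```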
Concretely, during the splitting phase, for each clinical case the text was first split into sentences using the SPACCC Sentence Splitter tool [16]. Then, WordPiece tokenization was performed on each sentence, producing a sequence of sub-tokens s_i = (w_i1, ..., w_ik) for every sentence, with length k. For each sentence s_i, if k > N − 2 (considering that BERT always adds the sub-tokens [CLS] and [SEP] at the start and end positions, respectively, of an input sequence), we split s_i into ⌈k/(N − 2)⌉ further sentences, ensuring that each finally produced sentence had a maximum length of N − 2 sub-tokens. Hence, a final sequence s = (s_1, ..., s_m) = ((w_11, ..., w_1α), ..., (w_m1, ..., w_mω)) of m sub-token sentences was generated. Lastly, we split the sequence s into a sequence of fragments of contiguous sentences f = (f_1, f_2, ..., f_l) = ((s_1, ..., s_λ), (s_λ+1, ..., s_β), ..., (s_σ, ..., s_m)) using a simple greedy strategy: each fragment f_i contained the maximum number of adjacent sentences such that Σ_{s_j ∈ f_i} |s_j| ≤ N − 2. Using this enhanced version of the splitting phase, along with the other two stages of the original version of our fragment-based classification approach [2], we could produce annotated short fragments of text, each comprising a sequence of sentences with full semantic meaning.

2.3. Experiments

We used two different versions of the BERT-Base model, namely the Multilingual BERT and the BERT-SciELO models. To perform the unsupervised pre-training of both models on the clinical corpora described in Section 2.1.1, we made use of the original TensorFlow implementation of BERT [17]. As the vocabulary of the BERT-SciELO model does not account for any punctuation character, we performed a pre-processing procedure consisting of substituting all punctuation marks contained in the Galén oncology corpus (see Section 2.1.1) with spaces, and used the pre-processed version of the corpus to pre-train the weights of the model.
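The greedy splitting phase described above can be sketched as follows. This is a toy sketch: lists of sub-token strings stand in for real WordPiece output, and N is kept tiny for illustration (the paper uses N = 100 for Multilingual BERT and N = 72 for BERT-SciELO):

```python
# Sketch of the sentence-level splitting phase: chop over-long sentences,
# then greedily pack contiguous sentences into fragments of at most
# N - 2 sub-tokens (reserving the [CLS] and [SEP] positions).

N = 10
MAX_LEN = N - 2

def split_into_fragments(sentences, max_len):
    """Pack contiguous tokenised sentences into fragments of <= max_len sub-tokens."""
    # Step 1: chop any sentence longer than max_len into ceil(k/max_len) pieces.
    pieces = []
    for sent in sentences:
        for i in range(0, len(sent), max_len):
            pieces.append(sent[i:i + max_len])
    # Step 2: greedily extend the current fragment while the next sentence fits.
    fragments, current = [], []
    for sent in pieces:
        if current and sum(map(len, current)) + len(sent) > max_len:
            fragments.append(current)
            current = []
        current.append(sent)
    if current:
        fragments.append(current)
    return fragments

sents = [["w"] * 3, ["w"] * 4, ["w"] * 2, ["w"] * 12]  # sub-token lengths 3, 4, 2, 12
frags = split_into_fragments(sents, MAX_LEN)
print([sum(map(len, f)) for f in frags])  # → [7, 2, 8, 4]
```

The 12-sub-token sentence is first chopped into pieces of 8 and 4, and sentence order is always preserved, so each fragment remains a contiguous span of the original document.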
Once pre-trained, the model was fine-tuned on the CANTEMIST-CODING task, applying the same pre-processing procedure to the CANTEMIST corpus before training the architecture. In the case of the Multilingual BERT model, since punctuation marks are included in its vocabulary, the raw texts of both the clinical corpora and the CANTEMIST corpus were employed to pre-train and then fine-tune the architecture, respectively.

Regarding the models' hyper-parameters, for fine-tuning we used a maximum input sequence length of N = 100 for the Multilingual BERT model and a value of N = 72 for the BERT-SciELO model. For both models, we used RAdam [18] with a learning rate of 3 × 10^-5 and a batch size of 16; the number of epochs was experimentally determined on the CANTEMIST-CODING development-2 subset using early stopping, with an upper limit of 40 epochs. Finally, with respect to hardware resources, all experiments were executed on a single GeForce GTX 1080 Ti 11 GB GPU.

3. Results

In this section, we describe the results obtained by our team, ICB-UMA, at the CANTEMIST-CODING task. We submitted five different runs of our fragment-based classification system. The first (ICB-UMA run1) and the fourth (ICB-UMA run4) submissions corresponded to the original Multilingual BERT and BERT-SciELO models, respectively, fine-tuned on the CANTEMIST-CODING training, development-1 and development-2 corpora. Submissions ICB-UMA run2 and ICB-UMA run5 contained the codes predicted by the Multilingual BERT and BERT-SciELO models, respectively, further pre-trained on the Galén oncology corpus (see Section 2.1.1) and then fine-tuned on the CANTEMIST-CODING corpus. Finally, submission ICB-UMA run3 corresponded to the Multilingual BERT model further pre-trained on the bilingual clinical corpus comprising the texts from the Galén oncology corpus and the discharge summaries from the MIMIC-III database, and subsequently fine-tuned on the CANTEMIST-CODING corpus.
To fine-tune the weights of each of the submitted BERT models, the representation generated by BERT for the initial [CLS] sub-token was fed into an output multi-label classification layer of 743 units, the number of unique codes present in the training, development-1 and development-2 CANTEMIST-CODING corpora (see Table 2).

Table 3
Predictive performance of each submitted system assessed using MAP, the main evaluation metric of the CANTEMIST-CODING subtask.

Submission     MAP     MAP No-Code
ICB-UMA run1   0.821   0.794
ICB-UMA run2   0.847   0.821
ICB-UMA run3   0.837   0.813
ICB-UMA run4   0.800   0.769
ICB-UMA run5   0.812   0.784

Table 4
Predictive performance of each submitted system assessed according to additional evaluation metrics.

Submission     P       R       F1      P No-Code   R No-Code   F1 No-Code
ICB-UMA run1   0.007   0.928   0.013   0.006       0.914       0.011
ICB-UMA run2   0.007   0.928   0.013   0.006       0.914       0.011
ICB-UMA run3   0.007   0.928   0.013   0.006       0.914       0.011
ICB-UMA run4   0.007   0.928   0.013   0.006       0.914       0.011
ICB-UMA run5   0.007   0.928   0.013   0.006       0.914       0.011

In Table 3 and Table 4, we show the classification performance of our five submitted runs on the CANTEMIST-CODING test corpus. In particular, Table 3 describes the results obtained according to the main evaluation metric of the CANTEMIST-CODING subtask, i.e. the Mean Average Precision (MAP) [19]. The second column of the table shows the MAP values computed considering all codes contained in the CANTEMIST-CODING test annotated corpus, whereas the values presented in the last column (MAP No-Code) were calculated without considering the overrepresented metastasis ICD-O-3 code (8000/6). According to the results in Table 3, the Multilingual version of BERT outperformed the BERT-SciELO model, as the ICB-UMA run1, ICB-UMA run2 and ICB-UMA run3 systems achieved higher values than the ICB-UMA run4 and ICB-UMA run5 systems for the two analysed metrics in the table.
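The 743-unit output layer described at the beginning of this section can be illustrated with a minimal, dependency-free numeric sketch. The weights below are random placeholders, not trained parameters; the point is the shape of the computation, and in particular that the activation is a per-code sigmoid rather than a softmax, because a clinical case can carry several ICD-O-3 codes at once:

```python
import math
import random

# Numeric sketch of the multi-label output layer: the [CLS] representation
# (dimension H = 768 for BERT-Base) is projected onto one sigmoid unit per
# candidate ICD-O-3 code (743 in this work). Weights are random placeholders.

H, NUM_CODES = 768, 743
random.seed(0)
W = [[random.gauss(0.0, 0.02) for _ in range(NUM_CODES)] for _ in range(H)]
b = [0.0] * NUM_CODES

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(cls_vec):
    """Independent per-code probabilities: sigmoid, not softmax, because a
    document may be annotated with several ICD-O-3 codes simultaneously."""
    return [sigmoid(sum(cls_vec[h] * W[h][c] for h in range(H)) + b[c])
            for c in range(NUM_CODES)]

probs = classify([0.1] * H)  # a dummy [CLS] vector
print(len(probs), all(0.0 < p < 1.0 for p in probs))  # → 743 True
```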
The best performance was obtained by the Multilingual BERT model pre-trained on the Galén oncology corpus (ICB-UMA run2), followed by the same model pre-trained on the bilingual medical corpus (ICB-UMA run3). If we compare the ICB-UMA run4 and ICB-UMA run5 systems, we can see that the BERT-SciELO model further pre-trained on the Galén oncology corpus (ICB-UMA run5) outperformed the original version of the BERT-SciELO model (ICB-UMA run4). Thus, the results obtained in this work demonstrate that both the Multilingual BERT and the BERT-SciELO models, when adapted to the Spanish clinical-text domain by further pre-training their weights on an unlabeled medical corpus in Spanish, outperformed the original versions of the models on the CANTEMIST-CODING task.

On the other hand, to perform a more extensive analysis of the obtained results, the organisers of the CANTEMIST track evaluated the classification performance of the submitted systems according to a set of additional metrics. Hence, in Table 4, the second, third and fourth columns present the values computed using the precision (P), recall (R) and F-score (F1) metrics, respectively, taking into consideration all codes contained in the CANTEMIST-CODING test subset, while the last three columns (P No-Code, R No-Code and F1 No-Code) show the results calculated using the same three metrics but without considering the 8000/6 ICD-O-3 code. As can be seen from Table 4, our five submitted systems obtained very low values for both the precision and F-score metrics, while the values obtained for the recall metric were unusually high.
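The magnitude of these precision and F-score values follows directly from submitting every candidate code for every document. A small sketch with illustrative numbers (about five gold codes per case, as in Table 2, against 743 predicted codes, with one gold code outside the label space) reproduces the profile observed in Table 4:

```python
# Why returning every candidate code yields the Table 4 profile: with ~5
# gold codes per document and 743 submitted codes, precision collapses
# while recall stays near 1 (below 1 only when a gold code falls outside
# the 743-code label space). All numbers here are illustrative.

def precision_recall_f1(predicted, gold):
    tp = len(predicted & gold)                  # true positives
    p = tp / len(predicted)                     # precision
    r = tp / len(gold)                          # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f1

predicted = {f"code{i}" for i in range(743)}    # every candidate code
gold = {"code3", "code40", "code99", "code500", "unseen/1"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.005 0.8 0.011
```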
The reason is that, with the goal of maximising the score obtained for the main evaluation metric, MAP, as done in [2], for each clinical case from the test corpus we submitted all ICD-O-3 codes considered by the classification system (743, the number of units of the output classification layer of the models), sorted by their predicted probability of occurrence in descending order. By contrast, if we had aimed to maximise the precision, recall and F-score metrics, instead of submitting all considered codes we would have defined a classification threshold to select only a subset of the codes according to their predicted probabilities.

4. Conclusion

In this paper, we present our contribution to the CANTEMIST-CODING subtask of the CANTEMIST track [12], in the context of IberLEF 2020. This shared task consisted of the automatic assignment of ICD-O-3 codes to oncology clinical cases written in Spanish. We have addressed the task as a multi-label text classification problem using the BERT model [1]. With the goal of adapting BERT to the distinctive features of the CANTEMIST-CODING corpus, we have applied an improved version of our fragment-based classification approach initially developed for the CodiEsp-D task [2]. In this way, using the annotations available for the CANTEMIST-NORM subtask, we converted the CANTEMIST-CODING multi-label long-text classification task into a multi-label short-text classification problem, generating short fragments of text with full semantic meaning annotated with ICD-O-3 codes. We experimented with two different versions of the BERT-Base model, namely the Multilingual BERT [3] and the BERT-SciELO [4] models. The best classification performance among our five submitted systems was achieved by the Multilingual BERT model further pre-trained on a medical corpus of Spanish oncology clinical cases, obtaining a MAP score of 0.847 on the evaluation set.
In addition, both the Multilingual BERT and BERT-SciELO models further pre-trained on a medical corpus outperformed the original versions of the models on the CANTEMIST-CODING subtask, reinforcing the idea that a clinical-domain version of BERT achieves higher performance on medical classification tasks than a non-clinical-domain version of the model. In future work, given the promising results obtained by the Multilingual BERT model when applied to the CANTEMIST corpus, we will tackle other Spanish medical text classification tasks, such as the CodiEsp-D subtask [2], using the Multilingual BERT fragment-based classification approach developed in this work. Furthermore, we will investigate whether further pre-training the model architecture using not only Spanish and English medical texts, but also French, Italian or German clinical documents, could leverage all the cross-lingual features extracted by the Multilingual BERT model and improve the results presented in this work for a Spanish oncology text classification task.

Acknowledgments

This work was partially supported by the project TIN2017-88728-C2-1-R, MINECO, Plan Nacional de I+D+I, and the I Plan Propio de Investigación y Transferencia of the Universidad de Málaga.

References

[1] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.
[2] G. López-García, J. M. Jerez, F. J. Veredas, ICB-UMA at CLEF e-Health 2020 Task 1: Automatic ICD-10 coding in Spanish with BERT, in: Working Notes of Conference and Labs of the Evaluation Forum (CLEF), CEUR Workshop Proceedings, 2020.
[3] Google Research, Multilingual BERT, 2019.
URL: https://github.com/google-research/bert/blob/master/multilingual.md.
[4] L. Akhtyamova, P. Martínez, K. Verspoor, J. Cardiff, Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives, Preprint (Version 1) available at Research Square (2020). doi:10.21203/rs.2.22697/v1.
[5] N. Ribelles, J. M. Jerez, D. Urda, J. L. Subirats, A. Márquez, C. Quero, E. Torres, L. Franco, E. Alba, Galén: Sistema de Información para la gestión y coordinación de procesos en un servicio de Oncología, FeSALUD 6 (2010).
[6] K. Kreimeyer, M. Foster, A. Pandey, N. Arya, G. Halford, S. F. Jones, R. Forshee, M. Walderhaug, T. Botsis, Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review, Journal of Biomedical Informatics 73 (2017) 14–29. doi:10.1016/j.jbi.2017.07.012.
[7] M. H. Stanfill, M. Williams, S. H. Fenton, R. A. Jenders, W. R. Hersh, A systematic literature review of automated clinical coding and classification systems, Journal of the American Medical Informatics Association 17 (2010) 646–651. doi:10.1136/jamia.2009.001024.
[8] J. P. Pestian, C. Brew, P. Matykiewicz, D. J. Hovermale, N. Johnson, K. B. Cohen, W. Duch, A Shared Task Involving Multi-Label Classification of Clinical Free Text, in: Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, BioNLP ’07, Association for Computational Linguistics, USA, 2007, pp. 97–104.
[9] A. Perotte, R. Pivovarov, K. Natarajan, N. Weiskopf, F. Wood, N. Elhadad, Diagnosis code assignment: models and evaluation metrics, Journal of the American Medical Informatics Association 21 (2013) 231–237. doi:10.1136/amiajnl-2013-002159.
[10] M. Li, Z. Fei, M. Zeng, F. Wu, Y. Li, Y. Pan, J. Wang, Automated ICD-9 Coding via A Deep Learning Approach, IEEE/ACM Transactions on Computational Biology and Bioinformatics 16 (2019) 1193–1202.
[11] D. F. Vítores, El español: una lengua viva. Informe 2019.
Instituto Cervantes, 2019. URL: https://www.cervantes.es/imagenes/File/espanol_lengua_viva_2019.pdf.
[12] A. Miranda-Escalada, E. Farré, M. Krallinger, Named entity recognition, concept normalization and clinical coding: Overview of the CANTEMIST track for cancer text mining in Spanish, Corpus, Guidelines, Methods and Results, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings, 2020.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 5998–6008.
[14] A. E. W. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, R. G. Mark, MIMIC-III, a freely accessible critical care database, Scientific Data 3 (2016).
[15] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. Viégas, M. Wattenberg, G. Corrado, M. Hughes, J. Dean, Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, Transactions of the Association for Computational Linguistics 5 (2017) 339–351.
[16] Plan TL-Sanidad, The Sentence Splitter (SS) for Clinical Cases Written in Spanish, 2019. doi:10.5281/zenodo.2586995.
[17] Google Research, BERT, 2019. URL: https://github.com/google-research/bert.
[18] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, J. Han, On the Variance of the Adaptive Learning Rate and Beyond, 2019. arXiv:1908.03265.
[19] C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, USA, 2008.