=Paper=
{{Paper
|id=Vol-3224/paper03
|storemode=property
|title=ESAN: Automating medical scribing in Spanish
|pdfUrl=https://ceur-ws.org/Vol-3224/paper03.pdf
|volume=Vol-3224
|authors=Naiara Pérez,Aitor Álvarez,Arantza del Pozo,Andrés Arbona,Oihane Ibarrola,Marta Suarez,Pedro de la Peña Tejada,Itziar Cuenca
|dblpUrl=https://dblp.org/rec/conf/sepln/PerezAPAISTC22
}}
==ESAN: Automating medical scribing in Spanish==
Spanish title: ESAN: Automatización de la toma de notas clínicas

Naiara Perez¹, Aitor Álvarez¹, Arantza del Pozo¹, Andrés Arbona², Oihane Ibarrola², Marta Suarez², Pedro de la Peña Tejada³ and Itziar Cuenca³

¹ Fundación Vicomtech, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, 20009, Spain

² Biokeralty Research Institute AIE, Vitoria-Gasteiz, 01510, Spain

³ Instituto Ibermática de Innovación (i3B), Donostia-San Sebastián, 20009, Spain

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain.

Contact: nperez@vicomtech.org (N. Perez); aalvarez@vicomtech.org (A. Álvarez); adelpozo@vicomtech.org (A. d. Pozo); andres.arbona@keralty.com (A. Arbona); oihane.ibarrolv@keralty.com (O. Ibarrola); marta.suarez@keralty.com (M. Suarez); pm.delapena@ibermatica.com (P. d. l. P. Tejada); ia.cuenca@ibermatica.com (I. Cuenca)

ORCID: 0000-0001-8648-0428 (N. Perez); 0000-0002-7938-4486 (A. Álvarez)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
The ESAN research project aims at developing a Spanish digital scribe that reduces the administrative workload of clinicians and enhances the quality of the data collected in the medical records by automatically transcribing and structuring doctor-patient conversations. At present, the main goal of the consortium is to collect and annotate the data necessary for training and adapting speech and natural language processing models based on deep learning architectures.

Keywords: clinical data, EHR, speech recognition, data mining

===1. Introduction===
The past few decades have seen a worldwide, steady growth in the adoption of electronic health record (EHR) systems, with the ultimate goal of improving the efficiency and quality of the provided care. In spite of their many virtues, EHRs have also increased the administrative workload of healthcare professionals, to the point of having been identified as a direct cause of burnout and lack of meaningful doctor-patient eye contact [1, among others].

Meanwhile, the accumulation of massive amounts of digitised health records in the era of Big Data has boosted the pursuit of public policies aimed at accelerating the advent of new healthcare paradigms such as personalised medicine. Yet Big Data is no more profitable than the quality of the data allows. Currently, much of the data collected in EHRs is in the form of free text written in haste. It makes irregular use of grammar, standard medical terminology, and of the EHR structure itself. It may omit information that is not of evident immediate value. Moreover, it is barely codified (if at all). All of this hinders its automated exploitation.

More recently, the major and rapid advances of Deep Learning have prompted a surge of interest in the application of artificial intelligence to medical conversations, so much so that several tech giants have recently launched a workshop exclusively focused on this research topic [2, 3].

In this context we present the ESAN project (from “EStructuración de conversaciones en el ámbito SANitario”, Spanish for "structuring conversations in the health sector", but also “esan”, Basque for "say, tell"). ESAN is the first step of a joint, long-term effort towards alleviating the problems introduced above through the research and development of a Spanish digital scribe.

===2. Consortium and funding body===
ESAN is partially funded by the Basque Government through the Elkartek 2021 program of the SPRI Group under the grant agreement KK-2021/00117. It will run from 04/2021 to 12/2023.

The consortium includes the Vicomtech research centre (https://www.vicomtech.org), Grupo Keralty’s R&D division BioKeralty Research Institute (https://biokeralty.com), and Grupo Ibermática’s R&D business unit and project leader Instituto Ibermática de Innovación or i3B (https://ibermatica.com/en/innovacion/).

===3. Goals and expected results===
The long-term, main technical objective of the ESAN consortium is to develop a Spanish digital scribe. A digital scribe is, in short, a program capable of documenting the encounter between a patient and their doctor or nurse. It involves Automatic Speech Recognition (ASR) to transcribe the conversations, and Natural Language Processing (NLP) to understand and transform those transcripts as necessary (e.g., extract relevant information and classify it into EHR sections).

At this early stage, the identified challenges of the project (see Section 4) point primarily to the need for problem-specific data and the lack thereof. Thus, the focus of this initial venture of the ESAN consortium is set on building a new corpus. The expected results of this line of work are:

* 150 hours of anonymised recordings (∼1K encounters) in 4 medical specialities, along with their manual, enriched transcripts and the corresponding written medical notes, all in Spanish.
* Guidelines for the annotation of the dialogues regarding the information extraction (IE) and classification tasks, as well as the manual annotations resulting from their application.

Second, we plan to train benchmark models for enriched ASR and IE adapted to the application scenarios of ESAN, exploiting primarily the aforesaid corpus and other publicly available data that might be considered beneficial. The specific expected results in this regard are:

* Robust neural models for enriched ASR adapted to face-to-face clinician-patient conversations in Spanish, including automatic capitalisation and punctuation, and supervised diarisation.
* Initial neural IE and classification models to transform the dialogue transcripts into structured data that can be fed to an EHR.

The third and final major goal is to flesh out the next steps based on quantitative and qualitative evaluations of the obtained technology. The expected final outcome is then:

* A road map towards productisation, taking into account the performance of the ASR and NLP models and other aspects that are outside the current scope (e.g., usability, communication standards).

===4. Challenges===
The challenges faced by the ESAN research project are twofold because it must overcome major ethical and legal obstacles in addition to the scientific and technological ones.

Conversations between patients and their doctors are among the most sensitive pieces of information conceivable. Voice recordings alone qualify largely as personal data according to the many policies that we are subject to, from the international (e.g., the GDPR of the European Union) to the local (e.g., ethics committees). This means that there is no public dataset that we can leverage, and that we must overcome these ethical and legal barriers in order to collect one ourselves.

Regarding the scientific and technical challenges, at this stage of the project, the difficulties of developing a Spanish digital scribe stem also from the nature of the data to be processed, in all its facets:

Genre. The input to the scribe is spontaneous speech produced in the context of a dialogue between two or more people. Current ASR technology still struggles in this scenario due to a) the difficulty of obtaining quality audio, where all the interlocutors are recorded with optimal volume and energy, and b) linguistic phenomena inherent to spontaneous speech (overlapping, false starts, repetitions, etc.). Human-human conversations are a serious challenge for NLP systems too, for similar reasons. For example, questions may go unanswered or be answered at a later point in the dialogue, or relevant information may be transmitted through non-verbal means.

Domain. Along with the genre, the highly specialised application domain constitutes the key defining challenge of ESAN. Out-of-the-box, generic ASR and NLP solutions are not viable here simply because they are not prepared to deal with the specialised vocabulary and the extraction or classification targets of the clinical domain. Further, building new solutions and resources requires at least the guidance of expert knowledge.

Register. Conversations in consultations present the added difficulty that doctors tend to address their patients in technical terms, while the patients may be less formal and employ more colloquialisms. From the perspective of the technologies involved in the project, this discursive gap translates into an increased range of vocabulary and semantic complexity that the automated systems must recognise and understand.

Language. The ESAN consortium expects to gather data in (and, ultimately, be able to process) multiple varieties of the Spanish language, including the Colombian. The differences in pronunciation and vocabulary with standard Castilian Spanish pose an added important challenge both to ASR and NLP technologies and serve only to aggravate the problems listed above.

To these concerns, we must add the fact that the errors of the enriched ASR modules are cascaded down the pipeline to the text processing modules. In addition, it is noteworthy that the health sector is most demanding and intolerant of errors, due to the gravity of the consequences that could follow from decisions based on inaccurate data.

===5. Approach===
====5.1. Audio collection====
This is the most crucial yet sensitive task of the project. The strategy involves recording real doctor-patient encounters of at least 4 specialities in a private hospital.

Measures have been taken towards minimising the impact that this activity might have on the doctors’ primary job, such as training dedicated staff responsible for informing the patients about ESAN and asking for their consent in the waiting rooms, prior to meeting their doctors.

In addition, we have already tested a variety of commercial microphone arrays both in terms of quality and user-friendliness, so as to ensure their suitability before starting the audio collection campaign. We will make the recordings with the audio software Audacity (https://www.audacityteam.org) in PCM WAV format at 44.1 kHz and 24 bits.

====5.2. Enriched ASR====
The ASR models will be trained with the 150 hours of acoustic corpus to be recorded during the project. This corpus will be manually annotated through the Transcriber 1.5.1 tool (http://trans.sourceforge.net) with spoken literal transcriptions and speaker turn information. The annotation process will be assisted by ASR technology, which will be iteratively enhanced as new annotated audio sets are generated: the first set of drafts to be post-edited will be created with generic Castilian Spanish recognition models; once each set is manually corrected, new adapted versions of the ASR models will be trained incrementally. This process, aimed at making the annotation task more productive, will be repeated until all hours are manually revised.

The ASR models will be built using the nnet3 DNN setup of the Kaldi recognition toolkit [4] following our previous approach based on CNN layers and a TDNN-F network [5]. The ASR engine will also include n-gram language models for decoding and re-scoring the initial lattices. The transcriptions will be enriched with capitalisation and punctuation marks generated by the BERT-based AutoPunct system [6], which will also be adapted to the domain. Finally, new speaker diarisation models will be trained for the Kaldi X-Vectors-based system [7] to be developed.

====5.3. From transcripts to the EHR====
The corpus of transcribed dialogues will be manually annotated at a later stage to serve as training and testing data of IE and classification models.

The annotation policy, whose precise definition is another key task of ESAN, will be built around related efforts [8, 9, 10]. It will define guidelines for the annotation of information at different levels, including mentions of signs and symptoms, disorders, and medications, as well as related attributes (severity, location, dosage, etc.).

The models for the automatic detection and classification of this information will be based on the ubiquitous Transformers architecture [11]. We plan on exploiting the latest neural language models for Spanish and the biomedical domain [12, 13]. This line of work will also profit from previous work of consortium members on clinical IE [14, 15, 16].

====5.4. Validation====
Each of the above-mentioned technological modules will be assessed in isolation with gold standard data and the appropriate metrics (e.g., WER, F1-score) during their development. We will also measure the impact of the errors propagated from the ASR down the processing pipeline.

Equally, if not more, important in order to flesh out the productisation road map, we will carry out a qualitative evaluation of the technology as an integrated solution prototype. To that end, we intend to devise an initial integration of all the core modules, and to develop a graphic user interface for demonstration and testing purposes, through which expert testers will be able to identify potential areas of improvement.

===6. Conclusions===
We have presented the ESAN project, whose aim is to develop a Spanish digital scribe that reduces the administrative workload of clinicians and enhances the quality of the data collected in the EHRs.

The envisaged solution consists of a neural enriched ASR component followed by IE and classification modules, also based on neural architectures. To that end, the consortium will devote significant resources and effort to gathering the data necessary for adapting this technology to the challenging domain that doctor-patient face-to-face conversations pose. This emphasis on data collection and domain adaptation sets ESAN apart from related projects [17, among others].

===Acknowledgments===
ESAN is partially funded by the Basque Business Development Agency, SPRI, under the grant agreement KK-2021/00117.

===References===
[1] C. Sinsky, L. Colligan, L. Li, M. Prgomet, S. Reynolds, L. Goeders, J. Westbrook, M. Tutty, G. Blike, Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties, Ann Intern Med 165 (2016) 753–760.

[2] P. Bhatia, S. Lin, R. Gangadharaiah, B. Wallace, I. Shafran, C. Shivade, N. Du, M. Diab (Eds.), Proceedings of the 1st Workshop on NLPMC, 2020.

[3] C. Shivade, R. Gangadharaiah, S. Gella, S. Konam, S. Yuan, Y. Zhang, P. Bhatia, B. Wallace (Eds.), Proceedings of the 2nd Workshop on NLPMC, 2021.

[4] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, K. Vesely, The kaldi speech recognition toolkit, in: Proceedings of IEEE ASRU, 2011, pp. 1–4.

[5] A. Álvarez, H. Arzelus, I. G. Torre, A. González-Docasal, Evaluating novel speech transcription architectures on the Spanish RTVE2020 Database, Appl. Sci. 12 (2022) 1–16.

[6] A. González-Docasal, A. García-Pablos, H. Arzelus, A. Álvarez, AutoPunct: A BERT-based automatic punctuation and capitalisation system for Spanish and Basque, Proces. de Leng. Nat. 67 (2021) 59–68.

[7] D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, S. Khudanpur, X-Vectors: Robust DNN embeddings for speaker recognition, in: Proceedings of ICASSP, 2018, pp. 5329–5333.

[8] I. Shafran, N. Du, L. Tran, A. Perry, L. Keyes, M. Knichel, A. Domin, L. Huang, Y.-h. Chen, G. Li, M. Wang, L. El Shafey, H. Soltau, J. S. Paul, The Medical Scribe: Corpus development and model performance analyses, in: Proceedings of LREC, 2020, pp. 2036–2044.

[9] P. Chocrón, Á. Abella, G. de Maeztu, ContextMEL: Classifying contextual modifiers in clinical text, Proces. de Leng. Nat. 65 (2020) 45–52.

[10] B. Magnini, B. Altuna, A. Lavelli, M. Speranza, R. Zanoli, The E3C project: Collection and annotation of a multilingual Corpus of Clinical Cases, in: Proceedings of CLiC-it 2020, 2021, pp. 1–7.

[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of NIPS, 2017, pp. 6000–6010.

[12] G. López-García, J. M. Jerez, N. Ribelles, E. Alba, F. J. Veredas, Detection of tumor morphology mentions in clinical reports in spanish using transformers, in: Proceedings of IWANN, 2021, pp. 24–35.

[13] C. P. Carrino, J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé, J. Silveira-Ocampo, A. Valencia, A. Gonzalez-Agirre, M. Villegas, Pretrained biomedical language models for clinical NLP in Spanish, in: Proceedings of BioNLP, 2022, pp. 193–199.

[14] N. Perez, P. Accuosto, À. Bravo, M. Cuadros, E. Martínez-Garcia, H. Saggion, G. Rigau, Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and English, Bioinformatics 36 (2019) 1872–1880.

[15] S. Lima-López, N. Perez, M. Cuadros, G. Rigau, NUBes: A corpus of negation and uncertainty in Spanish clinical texts, in: Proceedings of LREC, 2020, pp. 5772–5781.

[16] A. García-Pablos, N. Perez, M. Cuadros, Vicomtech at eHealth-KD challenge 2021: Deep learning approaches to model health-related text in Spanish, in: Proceedings of IberLEF, 2021, pp. 712–724.

[17] P. J. Vivancos-Vicente, J. A. García-Díaz, J. S. Castejón-Garrido, R. Valencia-García, ISMR - Sistema basado en Deep Learning para la transcripción y extracción de conocimiento en entrevistas médico-paciente, in: Proceedings of SEPLN-PD, 2021, pp. 1–4.
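
===Illustrative sketch: validation metrics===
The validation plan names WER and F1-score as example metrics for the ASR and IE modules, without giving implementation details. The following is a minimal sketch of how the two metrics are commonly computed, not the consortium's evaluation code. The function names, the whitespace tokenisation, the exact-match criterion for annotations and the example labels (SYMPTOM, MEDICATION, DISORDER) are all assumptions made for illustration.

```python
from typing import Hashable, Set


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are obtained by simple whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def f1_score(gold: Set[Hashable], predicted: Set[Hashable]) -> float:
    """Exact-match F1 between gold and predicted annotation sets."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One word ("de") missing from a six-word reference.
    print(word_error_rate("el paciente tiene dolor de cabeza",
                          "el paciente tiene dolor cabeza"))
    # Annotations as (start, end, label) triples; labels are hypothetical.
    gold = {(0, 2, "SYMPTOM"), (5, 6, "MEDICATION")}
    pred = {(0, 2, "SYMPTOM"), (8, 9, "DISORDER")}
    print(f1_score(gold, pred))
```

To estimate the impact of propagated ASR errors, the same F1 computation could be run twice, once on IE output from manual transcripts and once on IE output from automatic transcripts, and the two scores compared. Mapping annotation offsets between the two transcript versions is left out of this sketch.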