=Paper=
{{Paper
|id=Vol-3224/paper03
|storemode=property
|title=ESAN: Automating medical scribing in Spanish
|pdfUrl=https://ceur-ws.org/Vol-3224/paper03.pdf
|volume=Vol-3224
|authors=Naiara Pérez,Aitor Álvarez,Arantza del Pozo,Andrés Arbona,Oihane Ibarrola,Marta Suarez,Pedro de la Peña Tejada,Itziar Cuenca
|dblpUrl=https://dblp.org/rec/conf/sepln/PerezAPAISTC22
}}
==ESAN: Automating medical scribing in Spanish==
<pdf width="1500px">https://ceur-ws.org/Vol-3224/paper03.pdf</pdf>
<pre>
ESAN: Automating medical scribing in Spanish
ESAN: Automatización de la toma de notas clínicas

Naiara Perez¹, Aitor Álvarez¹, Arantza del Pozo¹, Andrés Arbona², Oihane Ibarrola²,
Marta Suarez², Pedro de la Peña Tejada³ and Itziar Cuenca³
¹ Fundación Vicomtech, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, 20009, Spain
² Biokeralty Research Institute AIE, Vitoria-Gasteiz, 01510, Spain
³ Instituto Ibermática de Innovación (i3B), Donostia-San Sebastián, 20009, Spain


Abstract
The ESAN research project aims to develop a Spanish digital scribe that reduces the administrative
workload of clinicians and enhances the quality of the data collected in medical records by automatically
transcribing and structuring doctor-patient conversations. At present, the main goal of the consortium
is to collect and annotate the data necessary for training and adapting speech and natural language
processing models based on deep learning architectures.

Keywords
clinical data, EHR, speech recognition, data mining


1. Introduction

The past few decades have seen a worldwide, steady growth in the adoption of electronic health
record (EHR) systems, with the ultimate goal of improving the efficiency and quality of the
provided care. In spite of their many virtues, EHRs have also increased the administrative
workload of healthcare professionals, to the point of having been identified as a direct cause
of burnout and lack of meaningful doctor-patient eye contact [1, among others].
   Meanwhile, the accumulation of massive amounts of digitised health records in the era of Big
Data has boosted the pursuit of public policies aimed at accelerating the advent of new
healthcare paradigms such as personalised medicine. Yet Big Data is no more profitable than the
quality of the data allows. Currently, much of the data collected in EHRs is in the form of free
text written in haste. It makes irregular use of grammar, standard medical terminology, and of
the EHR structure itself. It may omit information that is not of evident immediate value.
Moreover, it is barely codified (if at all), all of which hinders its automated exploitation.
   More recently, the major and rapid advances of Deep Learning have prompted a surge of interest
in the application of artificial intelligence to medical conversations, so much so that several
tech giants have recently launched a workshop exclusively focused on this research topic [2, 3].
   In this context we present the ESAN project (from “EStructuración de conversaciones en el
ámbito SANitario” or Structuring conversations in the health sector in Spanish, but also “esan”
or say, tell in Basque). ESAN is the first step of a joint, long-term effort towards alleviating
the above introduced problems through the research and development of a Spanish digital scribe.

2. Consortium and funding body

ESAN is partially funded by the Basque Government through the Elkartek 2021 program of the SPRI
Group under the grant agreement KK-2021/00117. It will run from 04/2021 to 12/2023.
   The consortium includes the Vicomtech research centre (https://www.vicomtech.org), Grupo
Keralty’s R&D division BioKeralty Research Institute (https://biokeralty.com), and Grupo
Ibermática’s R&D business unit and project leader Instituto Ibermática de Innovación or i3B
(https://ibermatica.com/en/innovacion/).

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022:
Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain
Email: nperez@vicomtech.org (N. Perez); aalvarez@vicomtech.org (A. Álvarez);
adelpozo@vicomtech.org (A. d. Pozo); andres.arbona@keralty.com (A. Arbona);
oihane.ibarrolv@keralty.com (O. Ibarrola); marta.suarez@keralty.com (M. Suarez);
pm.delapena@ibermatica.com (P. d. l. P. Tejada); ia.cuenca@ibermatica.com (I. Cuenca)
ORCID: 0000-0001-8648-0428 (N. Perez); 0000-0002-7938-4486 (A. Álvarez)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

3. Goals and expected results

The long-term, main technical objective of the ESAN consortium is to develop a Spanish digital
scribe. A digital scribe is, in short, a program capable of documenting the encounter between a
patient and their doctor or nurse. It involves Automatic Speech Recognition (ASR) to transcribe
the conversations, and Natural Language Processing (NLP) to understand and transform those
transcripts as necessary (e.g., extract relevant information and classify it into EHR sections).
   At this early stage, the identified challenges of the project (see Section 4) point primarily
to the need for problem-specific data and the lack thereof. Thus, the focus of this initial
venture of the ESAN consortium is set on building a new corpus. The expected results of this
line of work are:

   • 150 hours of anonymised recordings (∼1K encounters) in 4 medical specialities, along with
     their manual, enriched transcripts and the corresponding written medical notes, all in
     Spanish.
   • Guidelines for the annotation of the dialogues regarding the information extraction (IE)
     and classification tasks, as well as the manual annotations resulting from their
     application.

   Second, we plan to train benchmark models for enriched ASR and IE adapted to the application
scenarios of ESAN, exploiting primarily the aforesaid corpus and other publicly available data
that might be considered beneficial. The specific expected results in this regard are:

   • Robust neural models for enriched ASR adapted to face-to-face clinician-patient
     conversations in Spanish, including automatic capitalisation and punctuation, and
     supervised diarisation.
   • Initial neural IE and classification models to transform the dialogue transcripts into
     structured data that can be fed to an EHR.

   The third and final major goal is to flesh out the next steps based on quantitative and
qualitative evaluations of the obtained technology. The expected final outcome is then:

   • A road map towards productisation, taking into account the performance of the ASR and NLP
     models and other aspects that are outside the current scope (e.g., usability,
     communication standards).

4. Challenges

The challenges faced by the ESAN research project are twofold because it must overcome major
ethical and legal obstacles in addition to the scientific and technological.
   Conversations between patients and their doctors are among the most sensitive pieces of
information conceivable. Voice recordings alone qualify largely as personal data according to
the many policies that we are subject to, from the international (e.g., the GDPR of the European
Union) to the local (e.g., ethics committees). This means that there is no public dataset that
we can leverage, and that we must overcome these ethical and legal barriers in order to collect
it ourselves.
   Regarding the scientific and technical challenges, at this stage of the project, the
difficulties of developing a Spanish digital scribe stem also from the nature of the data to be
processed, in all its facets:

Genre. The input to the scribe is spontaneous speech produced in the context of a dialogue
between two or more people. Current ASR technology still struggles in this scenario due to a)
the difficulty of obtaining quality audio, where all the interlocutors are recorded with optimal
volume and energy, and b) linguistic phenomena inherent to spontaneous speech (overlapping,
false starts, repetitions, etc.). Human-human conversations are a serious challenge for NLP
systems too for similar reasons. For example, questions may go unanswered or be answered at a
later point in the dialogue, or relevant information may be transmitted through non-verbal
means.

Domain. Along with the genre, the highly specialised application domain constitutes the key
defining challenge of ESAN. Out-of-the-box, generic ASR and NLP solutions are not viable here
simply because they are not prepared to deal with the specialised vocabulary and the extraction
or classification targets of the clinical domain. Further, building new solutions and resources
requires at least the guidance of expert knowledge.

Register. Conversations in consultations present the added difficulty that doctors tend to
address their patients in technical terms, while the patients may be less formal and employ more
colloquialisms. From the perspective of the technologies involved in the project, this
discursive gap is translated into an increased range of vocabulary and semantic complexity that
the automated systems must recognise and understand.


Language. The ESAN consortium expects to gather data in (and, ultimately, be able to process)
multiple varieties of the Spanish language, including Colombian Spanish. The differences in
pronunciation and vocabulary with standard Castilian Spanish pose an added important challenge
both to ASR and NLP technologies and serve only to aggravate the problems listed above.
   To these concerns, we must add the fact that the errors of the enriched ASR modules are
cascaded down the pipeline to the text processing modules. In addition, it is noteworthy that
the health sector is most demanding and intolerant of errors, due to the gravity of the
consequences that could follow from decisions based on inaccurate data.

5. Approach

5.1. Audio collection

This is the most crucial yet sensitive task of the project. The strategy involves recording real
doctor-patient encounters of at least 4 specialities in a private hospital.
   Measures have been taken towards minimising the impact that this activity might have on the
doctors’ primary job, such as training dedicated staff responsible for informing the patients
about ESAN and asking for their consent in the waiting rooms, prior to meeting their doctors.
   In addition, we have already tested a variety of commercial microphone arrays both in terms
of quality and user-friendliness, so as to ensure their suitability before starting the audio
collection campaign. We will make the recordings with the audio software Audacity
(https://www.audacityteam.org) in PCM WAV format at 44.1 kHz and 24 bits.

5.2. Enriched ASR

The ASR models will be trained with the 150 hours of acoustic corpus to be recorded during the
project.
   This corpus will be manually annotated through the Transcriber 1.5.1 tool
(http://trans.sourceforge.net) with spoken literal transcriptions and speaker turn information.
The annotation process will be assisted by ASR technology, which will be iteratively enhanced as
new annotated audio sets are generated: the first set of drafts to be post-edited will be
created with generic Castilian Spanish recognition models; once each set is manually corrected,
new adapted versions of the ASR models will be trained incrementally. This process, aimed at
making the annotation task more productive, will be repeated until all hours are manually
revised.
   The ASR models will be built using the nnet3 DNN setup of the Kaldi recognition toolkit [4]
following our previous approach based on CNN layers and a TDNN-F network [5]. The ASR engine
will also include n-gram language models for decoding and re-scoring the initial lattices. The
transcriptions will be enriched with capitalisation and punctuation marks generated by the
BERT-based AutoPunct system [6], which will also be adapted to the domain. Finally, new speaker
diarisation models will be trained for the Kaldi X-Vectors-based system [7] to be developed.

5.3. From transcripts to the EHR

The corpus of transcribed dialogues will be manually annotated at a later stage to serve as
training and testing data of IE and classification models.
   The annotation policy, whose precise definition is another key task of ESAN, will be built
around related efforts [8, 9, 10]. It will define guidelines for the annotation of information
at different levels, including mentions of signs and symptoms, disorders, and medications, as
well as related attributes (severity, location, dosage, etc.).
   The models for the automatic detection and classification of this information will be based
on the ubiquitous Transformer architecture [11]. We plan on exploiting the latest neural
language models for Spanish and the biomedical domain [12, 13]. This line of work will also
profit from previous work of consortium members on clinical IE [14, 15, 16].

5.4. Validation

Each of the above-mentioned technological modules will be assessed in isolation with gold
standard data and the appropriate metrics (e.g., WER, F1-score) during their development. We
will also measure the impact of the errors propagated from the ASR down the processing pipeline.
   Equally, if not more, important in order to flesh out the productisation road map, we will
carry out a qualitative evaluation of the technology as an integrated solution prototype. To
that end, we intend to devise an initial integration of all the core modules, and to develop a
graphical user interface for demonstration and testing purposes, through which expert testers
will be able to identify potential areas of improvement.

6. Conclusions

We have presented the ESAN project, whose aim is to develop a Spanish digital scribe that reduces the
administrative workload of clinicians and enhances the quality of the data collected in the EHRs.
   The envisaged solution consists of a neural enriched ASR component followed by IE and
classification modules, based too on neural architectures. To that end, the consortium will
devote significant resources and effort to gathering the data necessary for adapting this
technology to the challenging domain that doctor-patient face-to-face conversations pose. This
emphasis on data collection and domain adaptation sets ESAN apart from related projects
[17, among others].

Acknowledgments

ESAN is partially funded by the Basque Business Development Agency, SPRI, under the grant
agreement KK-2021/00117.

References

 [1] C. Sinsky, L. Colligan, L. Li, M. Prgomet, S. Reynolds, L. Goeders, J. Westbrook,
     M. Tutty, G. Blike, Allocation of physician time in ambulatory practice: a time and motion
     study in 4 specialties, Ann Intern Med 165 (2016) 753–760.
 [2] P. Bhatia, S. Lin, R. Gangadharaiah, B. Wallace, I. Shafran, C. Shivade, N. Du, M. Diab
     (Eds.), Proceedings of the 1st Workshop on NLPMC, 2020.
 [3] C. Shivade, R. Gangadharaiah, S. Gella, S. Konam, S. Yuan, Y. Zhang, P. Bhatia, B. Wallace
     (Eds.), Proceedings of the 2nd Workshop on NLPMC, 2021.
 [4] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann,
     P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, K. Vesely, The Kaldi speech
     recognition toolkit, in: Proceedings of IEEE ASRU, 2011, pp. 1–4.
 [5] A. Álvarez, H. Arzelus, I. G. Torre, A. González-Docasal, Evaluating novel speech
     transcription architectures on the Spanish RTVE2020 Database, Appl. Sci. 12 (2022) 1–16.
 [6] A. González-Docasal, A. García-Pablos, H. Arzelus, A. Álvarez, AutoPunct: A BERT-based
     automatic punctuation and capitalisation system for Spanish and Basque, Proces. de Leng.
     Nat. 67 (2021) 59–68.
 [7] D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, S. Khudanpur, X-Vectors: Robust DNN
     embeddings for speaker recognition, in: Proceedings of ICASSP, 2018, pp. 5329–5333.
 [8] I. Shafran, N. Du, L. Tran, A. Perry, L. Keyes, M. Knichel, A. Domin, L. Huang,
     Y.-h. Chen, G. Li, M. Wang, L. El Shafey, H. Soltau, J. S. Paul, The Medical Scribe: Corpus
     development and model performance analyses, in: Proceedings of LREC, 2020, pp. 2036–2044.
 [9] P. Chocrón, Á. Abella, G. de Maeztu, ContextMEL: Classifying contextual modifiers in
     clinical text, Proces. de Leng. Nat. 65 (2020) 45–52.
[10] B. Magnini, B. Altuna, A. Lavelli, M. Speranza, R. Zanoli, The E3C project: Collection and
     annotation of a multilingual Corpus of Clinical Cases, in: Proceedings of CLiC-it 2020,
     2021, pp. 1–7.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
     I. Polosukhin, Attention is all you need, in: Proceedings of NIPS, 2017, pp. 6000–6010.
[12] G. López-García, J. M. Jerez, N. Ribelles, E. Alba, F. J. Veredas, Detection of tumor
     morphology mentions in clinical reports in Spanish using transformers, in: Proceedings of
     IWANN, 2021, pp. 24–35.
[13] C. P. Carrino, J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé,
     J. Silveira-Ocampo, A. Valencia, A. Gonzalez-Agirre, M. Villegas, Pretrained biomedical
     language models for clinical NLP in Spanish, in: Proceedings of BioNLP, 2022, pp. 193–199.
[14] N. Perez, P. Accuosto, À. Bravo, M. Cuadros, E. Martínez-Garcia, H. Saggion, G. Rigau,
     Cross-lingual semantic annotation of biomedical literature: experiments in Spanish and
     English, Bioinformatics 36 (2019) 1872–1880.
[15] S. Lima-López, N. Perez, M. Cuadros, G. Rigau, NUBes: A corpus of negation and uncertainty
     in Spanish clinical texts, in: Proceedings of LREC, 2020, pp. 5772–5781.
[16] A. García-Pablos, N. Perez, M. Cuadros, Vicomtech at eHealth-KD challenge 2021: Deep
     learning approaches to model health-related text in Spanish, in: Proceedings of IberLEF,
     2021, pp. 712–724.
[17] P. J. Vivancos-Vicente, J. A. García-Díaz, J. S. Castejón-Garrido, R. Valencia-García,
     ISMR - Sistema basado en Deep Learning para la transcripción y extracción de conocimiento
     en entrevistas médico-paciente, in: Proceedings of SEPLN-PD, 2021, pp. 1–4.
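
Note on the recording format of Section 5.1. The text specifies PCM WAV at 44.1 kHz and 24 bits
for a target of 150 hours of audio. The sketch below estimates the raw storage this implies. The
function name pcm_size_bytes is hypothetical, and the channel count is an assumption (the paper
mentions microphone arrays but does not state how many channels are stored); WAV file headers
are ignored.

```python
def pcm_size_bytes(hours, sample_rate_hz=44_100, bit_depth=24, channels=1):
    """Raw size of uncompressed PCM audio, ignoring WAV container headers.

    channels=1 is an assumption; the source does not state the channel count.
    """
    bytes_per_sample = bit_depth // 8          # 24 bits -> 3 bytes
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return hours * 3600 * bytes_per_second


total = pcm_size_bytes(150)                    # corpus target from Section 3
print(f"{total / 1e9:.1f} GB")                 # 71.4 GB for a single channel
```

The figure scales linearly with the number of channels, so a multi-channel array recording
would need proportionally more space.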


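Note on the metrics of Section 5.4. The text names WER and F1-score as examples of the metrics
used to assess the ASR and IE modules. The sketch below shows the standard textbook definitions
of both, not the project's evaluation code. The function names, the example sentence and the
example annotations are invented for illustration; real evaluations usually also normalise
case and punctuation before scoring.

```python
from typing import Hashable, Set


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def f1_score(gold: Set[Hashable], predicted: Set[Hashable]) -> float:
    """Harmonic mean of precision and recall over exact-match annotations."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented example: the recogniser drops one of five reference words.
print(word_error_rate("el paciente tiene fiebre alta",
                      "el paciente tiene fiebre"))        # 0.2

# Invented example: one of two predicted mentions matches the gold set.
gold = {("fiebre", "symptom"), ("ibuprofeno", "medication")}
predicted = {("fiebre", "symptom"), ("tos", "symptom")}
print(f1_score(gold, predicted))                          # 0.5
```

Scoring the IE module once on gold transcripts and once on ASR output, and comparing the two F1
values, is one simple way to quantify the error propagation that Section 5.4 proposes to measure.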

</pre>