<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Section Classification in Spanish Clinical Narratives Using Chunked Named Entity Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrés Carvallo</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matías Rojas</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Muñoz-Castro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Aracena</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodrigo Guerra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamín Pizarro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jocelyn Dunstan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Mathematical Modeling</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Pontificia Universidad Católica de Chile</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Physical and Mathematical Sciences, Universidad de Chile</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute for Mathematical Computing, Pontificia Universidad Católica de Chile</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Millennium Institute for Foundational Research on Data</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>National Center for Artificial Intelligence</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The extraction and classification of important information from Spanish Electronic Clinical Narratives (ECNs) can be challenging due to the complexity of the clinical text and the limited availability of labeled data. In this paper, we introduce a chunked Named Entity Recognition model designed to parse and classify sections of ECNs into predefined categories. The model aims to improve section identification and classification accuracy within ECNs in the context of the IberLEF ClinAIS Task. Our system achieves promising performance, obtaining a weighted B2 score of 0.6958, which indicates that it can distinguish borders and boundaries between sections. The paper concludes with an analysis of the results, discussing potential implications and suggesting directions for further improvements in clinical text analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Clinical Narratives</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Section Segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Electronic Clinical Narratives (ECNs) are the predominant way of documenting and evaluating
important details concerning a patient’s clinical history and progress [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These records embody
comprehensive details about a patient’s prior medical conditions, treatment procedures
undertaken, the specific illnesses’ progression, and the respective medical interventions prescribed
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Beyond their primary function, these narratives serve secondary roles, aiding in tasks such
as identifying rare medical events [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], predicting potential readmissions to the hospital [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and
contributing to public health surveillance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Nonetheless, these narratives’ dense complexity
and expansive scope present a substantial hurdle in their successful segmentation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
manual extraction of critical treatment data and relevant information becomes a demanding,
time-intensive task within such extensive textual content [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Automatic section identification addresses this by partitioning
the text into semantically distinct segments, each labeled from a predefined set. The benefits of
such segmentation are manifold, offering novel insights about entities that vary depending on
the section in which they appear [
        <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
        ]. For instance, a past medical condition mentioned in
the patient’s history could be instrumental in forecasting potential health risks. In contrast, a
symptom mentioned in the Evolution section could hint at possible side effects of a treatment
regimen [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Introduced at IberLEF 2023 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the ClinAIS task [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] aims to tackle the challenge of
automatically identifying sections within unstructured Spanish clinical documents. The primary
focus of this task is the categorization of ECNs into seven predetermined medical sections:
Present Illness, Derived from/to, Past Medical History, Family History, Exploration, Treatment,
and Evolution, predominantly targeting progress notes. The intrinsic complexity of this task
arises from the fact that all lexical units within an ECN pertain to a section, with none falling
outside the boundary of the seven predefined medical categories. Moreover, this task carries
significant value as it helps bridge the linguistic gap in medical informatics resources available
for Spanish-speaking nations, facilitating the automation of section identification within ECNs.
      </p>
      <p>This paper presents a novel methodology to enhance the accuracy and efficiency of identifying
these predefined medical sections, thereby ensuring more structured and readily accessible data
for clinical decision-making. We detail our proposed solution for the ClinAIS shared task: a
chunked Named Entity Recognition model. This model identifies the start and end of each
section in the ECN and then classifies each segment under one of the seven possible
labels.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>This section provides a comprehensive review of automated techniques utilized for the
segmentation of Electronic Clinical Narratives (ECNs), comprising both rule-based and machine
learning systems. Rule-based systems are grounded on a predetermined set of patterns or
rules for discerning section boundaries, either derived by experts or through explicit methods.
In contrast, machine learning models rely on annotated corpora to train models that can
segment and classify sections in new texts. The proposed method for segmenting ECNs, based
on a Named Entity Recognition (NER) model, falls into the machine learning category.</p>
      <p>
        Rule-based methods used for ECN segmentation exhibit varying levels of complexity.
Certain studies [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12, 13, 14, 15</xref>
        ] employed exact matching with tagged headings, while others [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
used regular expressions for section identification in specific document types, such as
radiology reports. SecTag [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], a nuanced probabilistic approach employing hierarchical heading
terminology and a statistical model for section boundary detection, represents a more advanced
methodology that has been adopted in further research [
        <xref ref-type="bibr" rid="ref18">18, 19, 20</xref>
        ].
      </p>
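      <p>As a concrete illustration of the rule-based family, the following is a minimal sketch of heading matching with regular expressions. The heading patterns, the label names, and the sample note are hypothetical and chosen only for illustration; they are not the rules used by the cited systems.</p>
      <preformat>
```python
import re

# Hypothetical heading patterns; real systems rely on curated terminologies.
HEADING_PATTERNS = {
    "PAST_MEDICAL_HISTORY": re.compile(r"^\s*antecedentes personales\s*:", re.IGNORECASE),
    "EXPLORATION": re.compile(r"^\s*exploraci[oó]n( f[ií]sica)?\s*:", re.IGNORECASE),
    "TREATMENT": re.compile(r"^\s*tratamiento\s*:", re.IGNORECASE),
}


def label_lines(lines):
    """Assign each line the label of the most recently matched heading."""
    current = None  # no heading seen yet
    labeled = []
    for line in lines:
        for label, pattern in HEADING_PATTERNS.items():
            if pattern.search(line):
                current = label
                break
        labeled.append((current, line))
    return labeled


# Invented example note, one string per line.
note = [
    "Antecedentes personales: hipertensión arterial.",
    "Exploración física: abdomen blando.",
    "Sin masas palpables.",
    "Tratamiento: paracetamol 1 g cada 8 horas.",
]
labels = [label for label, _ in label_lines(note)]
```
      </preformat>
      <p>A line without a heading inherits the label of the previous heading, which is why such rules break down on notes that omit explicit headings.</p>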
      <p>
        Different techniques use machine learning (ML) to identify sections in Electronic
Clinical Narratives (ECNs). For instance, Bramsen et al. [21] identified sections
by detecting changes in temporal focus, while Li et al. [22] approached section mapping
as a sequence-labeling problem. Tepper et al. [23] used BIO tags combined with category
labels to classify text lines in ECNs.
The BIO tag scheme, widely used in Natural Language Processing (NLP), labels tokens as
"B" (Beginning), "I" (Inside), or "O" (Outside) based on their relation to a section, offering an
effective tool for ECN segmentation. Other approaches include Support
Vector Machine (SVM) classifiers within the SOAP (Subjective, Objective, Assessment, Plan)
framework [24] and Bayesian models for section detection [25]. These diverse ML strategies
underscore the versatility and adaptability of machine learning methods for segmenting clinical
narratives. Furthermore, annotated datasets are essential for training models for
entity recognition, with semi-automated [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14, 26</xref>
        ] or active learning techniques [27, 28, 29]
being employed to reduce human effort in the annotation process.
      </p>
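      <p>The BIO scheme described above can be sketched in a few lines. This is a minimal illustration, assuming a note that is already split into labeled segments; the function name, the label names, and the sample lines are hypothetical.</p>
      <preformat>
```python
def to_bio(segments):
    """Turn (label, units) segments into one BIO tag per unit (line or token)."""
    tags = []
    for label, units in segments:
        for position, _ in enumerate(units):
            # The first unit of a segment begins it; the rest are inside it.
            prefix = "B" if position == 0 else "I"
            tags.append(f"{prefix}-{label}")
    return tags


# Hypothetical, already segmented note: two sections, three lines in total.
segments = [
    ("PRESENT_ILLNESS", ["Paciente acude por dolor abdominal.", "Sin fiebre."]),
    ("TREATMENT", ["Se indica analgesia."]),
]
bio_tags = to_bio(segments)
```
      </preformat>
      <p>When every unit of the text belongs to some section, as in section segmentation, the "O" tag never appears and only the "B" and "I" prefixes are needed.</p>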
      <p>The use of NER systems for entity identification in clinical reports has significantly expanded
in recent years, primarily due to the advances in deep learning models. The clinical domain has
seen the application of numerous strategies, including a two-step method for clinical NER and
normalization [30], nested entity extraction from ECNs [31], and the creation of pre-trained
language models specifically for NER tasks [32]. These NER models have also been rigorously
evaluated under challenging tests to assess their robustness and generalization capabilities
[33, 34, 35].</p>
      <p>Several corpora have been established for Named Entity Recognition (NER) and normalization
tasks within the clinical and biomedical sectors, including but not limited to PharmaCoNER [36],
eHealth-KD [37], eHealth CLEF [38], Chilean Waiting List [39], Cantemist [40], and LivingNER
[41]. However, ClinAIS is the first shared task dedicated explicitly
to extracting sections from Electronic Clinical Narratives (ECNs). Previous methods, both
machine learning and rule-based, have limitations, particularly their restriction to identifying
entities at the level of words or word groups. Moreover, most previous work has centered on
English-language texts, leaving a gap in resources and models for other languages, such as
Spanish. A further persistent challenge is associating each word within an ECN with a
corresponding section, given that no word in the text lacks a related section; current methods
do not handle this adequately.</p>
      <p>Our work aims to reduce these gaps on three fronts. First, we propose a chunked NER model
that identifies sections and their spans in an ECN, expanding the scope of entity identification
beyond individual words or groups of words. Second, we concentrate specifically on section
classification in Spanish ECNs, developing and refining models that address the complexities of
processing clinical narratives in Spanish. Lastly, we confront the challenge of associating every
word within an ECN with a corresponding section. This work thus contributes to the field of NER
in clinical narratives by addressing challenges that previous methods have overlooked.</p>
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>
        The corpus was obtained from the CodiEsp dataset presented in the eHealth CLEF 2020
task [42]. The ClinAIS dataset [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] consists of a collection of 1,038 ECNs annotated with the
beginning and end spans for each of the seven predetermined medical sections:
        <list list-type="bullet">
          <list-item><p>Present Illness: Outlines the reason for consultation, previous treatments, diagnoses, explorations, and anamnesis.</p></list-item>
          <list-item><p>Derived from/to: Records any patient transfers, including requesting party and reasons.</p></list-item>
          <list-item><p>Past Medical History: Chronicles previous pathologies or notes the absence of such.</p></list-item>
          <list-item><p>Family History: Details family members’ pathologies or acknowledges their absence.</p></list-item>
          <list-item><p>Exploration: Covers physical examinations, studies, lab tests, and autopsy findings.</p></list-item>
          <list-item><p>Treatment: Describes treatments or procedures used, including dietary measures.</p></list-item>
          <list-item><p>Evolution: Traces patient’s health progression and possible differential diagnoses.</p></list-item>
        </list>
      </p>
      <p>The dataset has three splits: a training set with 781 ECNs (75% of the total), a development
set containing 127 ECNs (12.5%), and a test set with 130 ECNs (12.5%). Moreover, the dataset’s
distribution is stratified by category and annotator, balancing category representation and
accounting for different annotator expertise across all subsets.</p>
      <p>The following paragraphs provide an overview of the proposed model for the ClinAIS shared task on
automatic identification of sections in clinical documents.</p>
      <p>As shown in Figure 2, we developed a chunked variant of Named Entity Recognition (NER)
tailored explicitly for Electronic Clinical Narratives (ECNs). The model can recognize and
subsequently categorize distinct sections within an ECN. The categorization process is structured
to classify each ECN section into one of the following seven categories: Present Illness, Derived
from/to, Past Medical History, Family History, Exploration, Treatment, and Evolution.</p>
      <p>The model receives as input an Electronic Clinical Narrative (ECN). Then, we employ a text
chunking module designed to partition the input data into individual sequences, essentially
sectioning the text. This module uses a machine learning-based approach: it learns
how best to segment the data from annotated training data
labeled with BIO tags. By identifying and separating the text into distinct sections, the module
enables a more granular, context-aware analysis. The efficacy of such
a module lies in its ability to adapt and improve its chunking capabilities through iterative
learning, thereby becoming more proficient at discerning and distinguishing various sections
within a text.</p>
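      <p>To illustrate how predicted BIO tags translate into section chunks, the sketch below groups a tag sequence into labeled spans. It is a simplified decoder under the assumption that every token carries a "B" or "I" tag, which matches the task setting where no token falls outside a section; the function name and the example tags are hypothetical and do not reproduce the system's actual chunking module.</p>
      <preformat>
```python
def bio_to_spans(tags):
    """Group BIO tags into (label, start, end) spans; end is exclusive.

    Assumes every tag is "B-LABEL" or "I-LABEL" (no "O" tags).
    """
    spans = []
    start, label = None, None
    for index, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # A span starts on "B", or on an "I" whose label differs from the open span.
        if prefix == "B" or name != label:
            if label is not None:
                spans.append((label, start, index))
            start, label = index, name
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Invented tag sequence for a six-token note.
predicted_tags = [
    "B-PRESENT_ILLNESS", "I-PRESENT_ILLNESS", "I-PRESENT_ILLNESS",
    "B-EXPLORATION", "I-EXPLORATION",
    "B-TREATMENT",
]
spans = bio_to_spans(predicted_tags)
```
      </preformat>
      <p>Each resulting span covers a contiguous run of tokens, so the spans tile the whole note without gaps.</p>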
      <p>Each section is then converted into an embedding using a RoBERTa model. These embeddings
are subsequently processed through a RoBERTa encoder and then passed through an additional
RoBERTa encoder for more comprehensive processing. Afterward, the embeddings move
through a linear layer. This layer functions to classify each section into the most probable
category. It also provides a corresponding probability score, which tells us how confident the
model is of each classification. Thus, the final output for each section is a category assignment
with a confidence score.</p>
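      <p>The final classification step can be sketched as a linear layer followed by a normalization of the scores. The sketch below assumes a softmax is used to obtain the probability score, which the text does not state explicitly; the toy embedding, weights, and function names are illustrative only.</p>
      <preformat>
```python
import math

SECTION_LABELS = [
    "Present Illness", "Derived from/to", "Past Medical History",
    "Family History", "Exploration", "Treatment", "Evolution",
]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, bias):
    """Apply a linear layer (one weight row per label), then softmax."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return SECTION_LABELS[best], probabilities[best]


# Toy 3-dimensional embedding and hand-picked weights, for illustration only.
embedding = [1.0, 0.0, 2.0]
weights = [[0.1, 0.0, 0.0]] * 4 + [[0.5, 0.0, 1.0]] + [[0.0, 0.2, 0.1]] * 2
bias = [0.0] * 7
label, confidence = classify(embedding, weights, bias)
```
      </preformat>
      <p>With these toy values the fifth weight row produces the largest score, so the section is assigned to Exploration and the softmax value of that score serves as the confidence.</p>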
      <p>Our methodology employed a RoBERTa model fine-tuned specifically for the Spanish
language and Clinical NLP tasks, proposed by Carrino et al. [43]. To adapt the model to our
requirements, we set the hidden size to 256, which influences the complexity and capacity of
the model. A learning rate of 0.1 was selected to control the model’s rate of learning from the
data. Additionally, we used a mini-batch size of 4, enabling efficient computation and gradient
estimation during the training process. The model was trained over 20 epochs, meaning it
went through the entire dataset 20 times to better learn patterns. Importantly, for all layers
involved, including both encoders and the embedding layer, we relied on a transformer-based
RoBERTa model. This implementation ensured a consistent framework across all processing
stages, facilitating a cohesive understanding of the ECN data.</p>
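      <p>The reported hyperparameters can be collected in a small configuration object. The values come from the text above, while the field names and the helper function are hypothetical. The update count assumes one training example per ECN in the training set, which is a simplification because the model may split notes into several sequences.</p>
      <preformat>
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the text; field names are illustrative."""
    hidden_size: int = 256
    learning_rate: float = 0.1
    mini_batch_size: int = 4
    max_epochs: int = 20


def total_updates(config, num_training_examples):
    """Number of mini-batch updates over the full training run."""
    steps_per_epoch = math.ceil(num_training_examples / config.mini_batch_size)
    return steps_per_epoch * config.max_epochs


config = TrainingConfig()
# Assumption: one example per training ECN (781 notes in the training split).
updates = total_updates(config, num_training_examples=781)
```
      </preformat>
      <p>Under that assumption, a mini-batch size of 4 gives 196 updates per epoch and 3,920 updates over the 20 epochs.</p>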
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>
        The results show the model’s performance on the development set provided in the ClinAIS
challenge. We evaluated our model using the B2 score [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a metric designed specifically for
the ClinAIS challenge evaluation. This metric adapts the boundary distance B metric used in
text segmentation [44]. The B2 metric incorporates edit distance, including additions, deletions,
substitutions, and a transposition operation. This metric is based on borders and boundaries,
where a boundary represents the point between two sections. In our evaluation, each token in
the note was considered a border. We also calculated Edit Counts, which measure the number of
edits required at the boundaries of each ECN to achieve a perfect match between the gold standard
and predicted sections. Additionally, we assessed the performance of the proposed model by
counting added words, matches, and deletions in each section to determine on which sections the model
performed better.
      </p>
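      <p>To give an intuition for boundary-level edits, the sketch below compares the boundaries of a gold and a predicted segmentation. It is a deliberately simplified count and not the official B2 implementation: transpositions (near misses) are not modeled, so a boundary that is off by a few tokens is counted as one addition plus one deletion. The function names, the span format, and the example spans are assumptions.</p>
      <preformat>
```python
def boundaries(spans):
    """Map each section start (except position 0) to the label that starts there."""
    return {start: label for label, start, _ in spans if start > 0}


def boundary_edits(gold_spans, predicted_spans):
    """Count simple edits that turn the predicted boundaries into the gold ones."""
    gold = boundaries(gold_spans)
    predicted = boundaries(predicted_spans)
    shared = set(gold).intersection(predicted)
    return {
        "matches": sum(1 for p in shared if gold[p] == predicted[p]),
        "substitutions": sum(1 for p in shared if gold[p] != predicted[p]),
        # Gold boundaries that the prediction lacks.
        "additions": len(set(gold) - set(predicted)),
        # Predicted boundaries that do not exist in the gold standard.
        "deletions": len(set(predicted) - set(gold)),
    }


# Invented spans as (label, start, end) over a 30-token note.
gold_spans = [
    ("PRESENT_ILLNESS", 0, 10), ("EXPLORATION", 10, 25), ("TREATMENT", 25, 30),
]
predicted_spans = [
    ("PRESENT_ILLNESS", 0, 12), ("EXPLORATION", 12, 25), ("EVOLUTION", 25, 30),
]
edits = boundary_edits(gold_spans, predicted_spans)
```
      </preformat>
      <p>Here the boundary predicted at token 12 instead of token 10 yields one addition and one deletion, and the boundary at token 25 is a substitution because the label of the following section differs.</p>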
      <p>In the analysis of the results, the B2 score distribution is presented in Figure 3 (a). The
majority of the obtained values were above 0.5, indicating a tendency towards higher
B2 scores. This suggests that the model identifies section boundaries within
ECNs with reasonable accuracy. Moreover, the B2 score and its distribution on the development set
suggest that the model sometimes segments a note slightly differently from the
gold annotations. Disagreements with the annotations arise mainly at the extreme
tokens of the annotated sections, due to the substitutions, additions, deletions, or transpositions
involved.</p>
      <p>Figure 3 (b) reveals the distribution of the number of edits required for accurately detected
sections. Most instances necessitated fewer than five edits per ECN, suggesting a generally robust
performance of the model. Specific sections within ECNs, namely Present Illness, Exploration,
and Treatment, were identified with exceptional precision by the model, each achieving more
than 100 matches, as visualized in Figure 3 (e). Nevertheless, the model faced challenges with
specific sections, as illustrated in Figures 3 (c) and (d). The Exploration section required the
most token additions, amounting to 120, indicating potential under-specification in the model’s
extraction capabilities. Conversely, the Present Illness section necessitated the most token
deletions, reaching a total of 40, possibly reflecting over-specification in the model’s output.
This dissection of the results provides critical insights into the model’s strengths and potential
areas for improvement. It not only confirms the model’s generally sound performance but
also points to the necessity for enhanced precision in the detection and extraction of specific
sections within ECNs.</p>
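      <p>The per-section counts of matched, added, and deleted tokens can be approximated as follows. The sketch assumes one gold label and one predicted label per token, and it interprets an added token as one the prediction missed for its gold section and a deleted token as one the prediction wrongly placed in a section. This interpretation, the function name, and the example labels are assumptions, not the official evaluation script.</p>
      <preformat>
```python
from collections import Counter


def token_counts(gold_labels, predicted_labels):
    """Per-section counts of matched, added, and deleted tokens."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    matched, added, deleted = Counter(), Counter(), Counter()
    for gold, predicted in zip(gold_labels, predicted_labels):
        if gold == predicted:
            matched[gold] += 1
        else:
            added[gold] += 1         # token the prediction missed for its gold section
            deleted[predicted] += 1  # token wrongly placed in the predicted section
    return matched, added, deleted


# Invented six-token note where the prediction ends Present Illness one token late.
gold_labels = ["Present Illness"] * 3 + ["Exploration"] * 3
predicted_labels = ["Present Illness"] * 4 + ["Exploration"] * 2
matched, added, deleted = token_counts(gold_labels, predicted_labels)
```
      </preformat>
      <p>In this toy case one token must be added to Exploration and one deleted from Present Illness, the same kind of pattern the results above report for those two sections.</p>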
    </sec>
    <sec id="sec-5">
      <title>Limitations</title>
      <p>While the model reaches competitive performance in numerous scenarios for section
identification in ECNs, it has limitations. A primary limitation comes from the text-chunking module:
when it does not match the correct section, the error propagates to other sections, since in this
task all parts of the text have an assigned section. This impacts the accuracy of matching all
the other sections in the ECN. Another source of error is that the model’s performance
also depends on the accuracy of the section classification module, so a classification error also negatively
affects the overall matching accuracy of the model. The effectiveness of the second module is
intrinsically linked to the performance of the first module. Addressing these shortcomings could
enhance the accuracy and effectiveness of our approach, rendering it a more robust solution for
the section identification task within ECN texts.</p>
      <p>Figure 3: (a) Distribution of B2-score. (b) Distribution of the number of edits (x-axis: Count Edits). (c) Distribution of the number of added tokens to each section. (d) Distribution of the number of deleted tokens for each section. (e) Distribution of the number of matched tokens for each section.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>In conclusion, our model effectively segments and classifies Spanish Electronic Clinical
Narratives (ECNs) sections, exhibiting strong performance in most evaluated sections. However,
improvements are needed in token addition and deletion, particularly for the Exploration and
Present Illness sections. To address this, we propose two potential avenues for improvement.
The first approach involves a two-phase model. In the initial phase, the model would identify
section boundaries and accurately segment the text. The second phase would then perform
classification within these identified sections, aiming to enhance the precision of predictions.
Another potential strategy involves leveraging a model that trains on the current chunked NER
output. This model would focus on learning from instances where the predicted spans do not
align correctly with the actual sections. By emphasizing the correction of these misaligned
predictions, the model could learn to refine its performance and reduce the impact of error
propagation. Both of these strategies require further investigation and experimentation to
evaluate their effectiveness.</p>
      <p>Future research will focus on refining the model to reduce necessary adjustments and improve
accuracy across all section types. In this line, we highlight data augmentation by generating
paraphrases using generative language models [45]. We can create different variations of the
sections for each annotation type and enrich the data set for training in the ClinAIS task.
Subsequently, a two-phase model could be evaluated to address the problem of adding and
removing tokens in predictions.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was funded by ANID Chile: Basal Funds for Center of Excellence FB210017 (CENIA),
FB210005 (CMM); Millennium Science Initiative Program ICN17_002 (IMFD) and ICN2021_004
(iHealth), Fondecyt grant 11201250, and National Doctoral Scholarships 21211659 (Claudio
Aracena) and 21221155 (Carlos Muñoz-Castro).</p>
      <p>clinical notes, Journal of biomedical informatics 56 (2015) 292–299.</p>
      <p>[19] S. Doan, L. Bastarache, S. Klimkowski, J. C. Denny, H. Xu, Integrating existing natural language processing tools for medication extraction from discharge summaries, Journal of the American Medical Informatics Association 17 (2010) 528–531.</p>
      <p>[20] S. Mehrabi, A. Krishnan, A. M. Roch, H. Schmidt, D. Li, J. Kesterson, C. Beesley, P. Dexter, M. Schmidt, M. Palakal, et al., Identification of patients with family history of pancreatic cancer-investigation of an nlp system portability, Studies in health technology and informatics 216 (2015) 604.</p>
      <p>[21] P. Bramsen, P. Deshpande, Y. K. Lee, R. Barzilay, Finding temporal order in discharge summaries, in: AMIA annual symposium proceedings, volume 2006, American Medical Informatics Association, 2006, p. 81.</p>
      <p>[22] Y. Li, S. Lipsky Gorman, N. Elhadad, Section classification in clinical notes using supervised hidden markov model, in: Proceedings of the 1st ACM international health informatics symposium, 2010, pp. 744–750.</p>
      <p>[23] M. Tepper, D. Capurro, F. Xia, L. Vanderwende, M. Yetisgen-Yildiz, Statistical section segmentation in free-text clinical records, in: LREC, 2012, pp. 2001–2008.</p>
      <p>[24] D. Mowery, J. Wiebe, S. Visweswaran, H. Harkema, W. W. Chapman, Building an automated soap classifier for emergency department reports, Journal of biomedical informatics 45 (2012) 71–81.</p>
      <p>[25] P. J. Haug, X. Wu, J. P. Ferraro, G. K. Savova, S. M. Huff, C. G. Chute, Developing a section labeler for clinical documents, in: AMIA Annual Symposium Proceedings, volume 2014, American Medical Informatics Association, 2014, p. 636.</p>
      <p>[26] A. Carvallo, D. Parra, G. Rada, D. Perez, J. I. Vasquez, C. Vergara, Neural language models for text classification in evidence-based medicine, arXiv preprint arXiv:2012.00584 (2020).</p>
      <p>[27] B. Settles, Active learning literature survey (2009).</p>
      <p>[28] A. Carvallo, D. Parra, H. Lobel, A. Soto, Automatic document screening of medical literature using word and text embeddings in an active learning setting, Scientometrics 125 (2020) 3047–3084.</p>
      <p>[29] A. Carvallo, D. Parra, Comparing word embeddings for document screening based on active learning, in: BIRNDL@ SIGIR, 2019, pp. 100–107.</p>
      <p>[30] M. Rojas, J. Barros, M. Araneda, J. Dunstan, Flert-matcher: A two-step approach for clinical named entity recognition and normalization (2022).</p>
      <p>[31] P. Báez, F. Bravo-Marquez, J. Dunstan, M. Rojas, F. Villena, Automatic extraction of nested entities in clinical referrals in spanish, ACM Transactions on Computing for Healthcare (HEALTH) 3 (2022) 1–22.</p>
      <p>[32] M. Rojas, J. Dunstan, F. Villena, Clinical flair: a pre-trained language model for spanish clinical natural language processing, in: Proceedings of the 4th Clinical Natural Language Processing Workshop, 2022, pp. 87–92.</p>
      <p>[33] C. Aspillaga, A. Carvallo, V. Araujo, Stress test evaluation of transformer-based models in natural language understanding tasks, arXiv preprint arXiv:2002.06261 (2020).</p>
      <p>[34] V. Araujo, A. Carvallo, C. Aspillaga, D. Parra, On adversarial examples for biomedical nlp tasks, arXiv preprint arXiv:2004.11157 (2020).</p>
      <p>[35] V. Araujo, A. Carvallo, C. Aspillaga, C. Thorne, D. Parra, Stress test evaluation of biomedical word embeddings, arXiv preprint arXiv:2107.11652 (2021).</p>
      <p>[36] A. Gonzalez-Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger, Pharmaconer: Pharmacological substances, compounds and proteins named entity recognition track, in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 1–10.</p>
      <p>[37] L. Monteagudo-García, A. Marrero-Santos, M. Fernández-Arias, H. Canizares-Díaz, Uhmmm at ehealth-kd challenge 2021, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021), 2021.</p>
      <p>[38] L. Goeuriot, H. Suominen, L. Kelly, A. Miranda-Escalada, M. Krallinger, Z. Liu, G. Pasi, G. Gonzalez Saez, M. Viviani, C. Xu, Overview of the clef ehealth evaluation lab 2020, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22–25, 2020, Proceedings 11, Springer, 2020, pp. 255–271.</p>
      <p>[39] P. Báez, F. Villena, M. Rojas, M. Durán, J. Dunstan, The chilean waiting list corpus: a new resource for clinical named entity recognition in spanish, in: Proceedings of the 3rd clinical natural language processing workshop, 2020, pp. 291–300.</p>
      <p>[40] A. García-Pablos, N. Perez, M. Cuadros, Vicomtech at cantemist 2020, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings, volume 17, 2020, p. 25.</p>
      <p>[41] S. Francis, M.-F. Moens, Task-aware contrastive pre-training for spanish named entity recognition in livingner challenge (2022).</p>
      <p>[42] A. Miranda-Escalada, A. Gonzalez-Agirre, J. Armengol-Estapé, M. Krallinger, Overview of automatic clinical coding: Annotations, guidelines, and solutions for non-english clinical cases at codiesp track of clef ehealth 2020, CLEF (Working Notes) 2020 (2020).</p>
      <p>[43] C. P. Carrino, J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé, J. Silveira-Ocampo, A. Valencia, A. Gonzalez-Agirre, M. Villegas, Pretrained biomedical language models for clinical NLP in Spanish, in: Proceedings of the 21st Workshop on Biomedical Language Processing, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 193–199. URL: https://aclanthology.org/2022.bionlp-1.19. doi:10.18653/v1/2022.bionlp-1.19.</p>
      <p>[44] C. Fournier, Evaluating text segmentation using boundary edit distance, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013, pp. 1702–1712.</p>
      <p>[45] Y. Cao, S. Li, Y. Liu, Z. Yan, Y. Dai, P. S. Yu, L. Sun, A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt, arXiv preprint arXiv:2303.04226 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Tange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hasman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>de Vries Robbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Schouten</surname>
          </string-name>
          ,
          <article-title>Medical narratives in electronic medical records</article-title>
          ,
          <source>International journal of medical informatics</source>
          <volume>46</volume>
          (
          <year>1997</year>
          )
          <fpage>7</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomares-Quimbaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kreuzthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <article-title>Current approaches to identify sections within clinical narratives from electronic health records: a systematic review</article-title>
          ,
          <source>BMC medical research methodology</source>
          <volume>19</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iqbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ball</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Ibrahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broadbent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Dzahini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Johnston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Dobson</surname>
          </string-name>
          ,
          <article-title>Identification of adverse drug events from free text electronic patient records and information in a large mental health case register</article-title>
          ,
          <source>PloS one</source>
          <volume>10</volume>
          (
          <year>2015</year>
          )
          <elocation-id>e0134208</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mahmoudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kamdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Waljee</surname>
          </string-name>
          ,
          <article-title>Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review</article-title>
          ,
          <source>BMJ</source>
          <volume>369</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Newton-Dame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Thorpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Perlman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>McVeigh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Gourevitch</surname>
          </string-name>
          ,
          <article-title>The state of population health surveillance using electronic health records: a narrative review</article-title>
          ,
          <source>Population health management</source>
          <volume>18</volume>
          (
          <year>2015</year>
          )
          <fpage>209</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Poissant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tamblyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kawasumi</surname>
          </string-name>
          ,
          <article-title>The impact of electronic health records on time efficiency of physicians and nurses: a systematic review</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>12</volume>
          (
          <year>2005</year>
          )
          <fpage>505</fpage>
          -
          <lpage>516</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Apostolova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Channin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Furst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lytinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Raicu</surname>
          </string-name>
          ,
          <article-title>Automatic segmentation of clinical texts</article-title>
          , in:
          <source>2009 annual international conference of the IEEE engineering in medicine and biology society</source>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>5905</fpage>
          -
          <lpage>5908</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Demonceau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ruppar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kristanto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hughes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fargher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kardas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>De Geest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dobbels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urquhart</surname>
          </string-name>
          , et al.,
          <article-title>Identification and assessment of adherence-enhancing interventions in studies assessing medication adherence through electronically compiled drug dosing histories: a systematic literature review and meta-analysis</article-title>
          ,
          <source>Drugs</source>
          <volume>73</volume>
          (
          <year>2013</year>
          )
          <fpage>545</fpage>
          -
          <lpage>562</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
          , in:
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing (SEPLN 2023)</source>
          , CEUR-WS.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>I.</given-names>
            <surname>de la Iglesia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vivó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chocrón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Maeztu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gojenola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atutxa</surname>
          </string-name>
          ,
          <article-title>Overview of ClinAIS at IberLEF 2023: Automatic Identification of Sections in Clinical Documents in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>de la Iglesia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vivó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chocrón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Maeztu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gojenola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atutxa</surname>
          </string-name>
          ,
          <article-title>An Open Source Corpus and Automatic Tool for Section Identification in Spanish Health Records</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Markatou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hripcsak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <article-title>Selecting information in electronic health records for knowledge acquisition</article-title>
          ,
          <source>Journal of biomedical informatics</source>
          <volume>43</volume>
          (
          <year>2010</year>
          )
          <fpage>595</fpage>
          -
          <lpage>601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Edinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bedrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hersh</surname>
          </string-name>
          ,
          <article-title>Evaluation of clinical text segmentation to facilitate cohort retrieval</article-title>
          ,
          in:
          <source>AMIA Annual Symposium Proceedings</source>
          , volume
          <volume>2017</volume>
          , American Medical Informatics Association,
          <year>2017</year>
          , p.
          <fpage>660</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Delaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Florian</surname>
          </string-name>
          ,
          <article-title>Fast model adaptation for automated section classification in electronic medical records</article-title>
          ,
          <source>MedInfo</source>
          <volume>216</volume>
          (
          <year>2015</year>
          )
          <fpage>35</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Waranusast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haddawy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dailey</surname>
          </string-name>
          ,
          <article-title>Segmentation of text and non-text in on-line handwritten patient record based on spatio-temporal analysis</article-title>
          ,
          in:
          <source>Artificial Intelligence in Medicine: 12th Conference on Artificial Intelligence in Medicine, AIME 2009, Verona, Italy, July 18-22, 2009. Proceedings 12</source>
          , Springer,
          <year>2009</year>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Taira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Jakobovits</surname>
          </string-name>
          ,
          <article-title>Automatic structuring of radiology free-text reports</article-title>
          ,
          <source>Radiographics</source>
          <volume>21</volume>
          (
          <year>2001</year>
          )
          <fpage>237</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spickard</surname>
            <suffix>III</suffix>
          </string-name>
          ,
          <article-title>Development and evaluation of a clinical note section header terminology</article-title>
          , in:
          <source>AMIA annual symposium proceedings</source>
          , volume
          <volume>2008</volume>
          , American Medical Informatics Association,
          <year>2008</year>
          , p.
          <fpage>156</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spickard</surname>
            <suffix>III</suffix>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Speltz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rosenstiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Powers</surname>
          </string-name>
          ,
          <article-title>Using natural language processing to provide personalized learning opportunities from trainee</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>