<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>BIT.UA at MultiCardioNER: Adapting a Multi-head CRF for Cardiology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Richard A. A. Jonker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiago Almeida</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sérgio Matos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IEETA/DETI, LASI, University of Aveiro</institution>
          ,
          <addr-line>Aveiro</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>0</volume>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This paper presents the participation of the University of Aveiro Biomedical Informatics and Technologies (BIT.UA) group in the MultiCardioNER task at BioASQ 12, specifically in the CardioDis subtrack, which focuses on adapting Named Entity Recognition (NER) systems to Spanish cardiology case reports. We aimed to address two primary research questions: 1) the generalizability of a NER model trained on general medical concepts to the specialized sub-domain of cardiology, and 2) the robustness of our Multi-Head CRF model. Our team achieved the top result in the competition with an F1 score of 81.99, using the Multi-Head CRF model. Our findings indicate that task-specific data is beneficial to the overall performance of the model, although a model without this data can still be competitive. Additionally, our Multi-Head CRF model demonstrated consistent reliability and robustness, performing well on single-class NER tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Spanish Clinical Procedures</kwd>
        <kwd>Transformers</kwd>
        <kwd>Data Augmentation</kwd>
        <kwd>Multi-head CRF</kwd>
        <kwd>Robust ML</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Named Entity Recognition (NER) is a fundamental task in the field of natural language processing,
especially crucial in the medical domain where it aids significantly in structuring unstructured text for
enhanced patient care and medical research. While general NER technologies have seen considerable
advancement, their application to medical texts presents unique challenges due to the complexity and
specificity of the medical language. To address these challenges, numerous competitions have been
organized to foster the development of NER systems specifically tailored to the biomedical domain.</p>
      <p>
        Our team has continually engaged in these competitions, gaining experience through participation
in BioCreative events such as the NLM-Chem Track [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and the BioRED Track [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], both of which
were focused on English biomedical articles. These challenges allowed us to build a solid foundation in
NER methodologies, initially leveraging state-of-the-art BERT-based models and masked Conditional
Random Fields (CRF) [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ] over BIO-tagged sequences [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. We subsequently expanded our efforts to
include Spanish medical texts, participating in challenges like MedProcNER [8] and SympTEMIST [9],
where we secured first and second places, respectively, in the NER evaluations. This extensive experience
has led to the creation of our versatile Multi-Head CRF model [10], a highly competitive NER solution
that encapsulates our accumulated expertise.
      </p>
      <p>This year, as part of the BioASQ challenge [11], the Text Mining Unit (TEMU) at the Barcelona
Supercomputing Center (BSC) introduced the MultiCardioNER challenge [12, 13]. This challenge addresses
the need for better recognition of clinical variables in cardiology, given the high mortality rate from
cardiovascular diseases (CVDs), which cause approximately 17.9 million deaths annually [14]. It includes
two subtracks: CardioDis, which adapts NER systems to Spanish cardiology case reports, and MultiDrug,
which tests these systems on medication mentions in English, Spanish, and Italian. The dataset includes
a training set of 1,000 general clinical case reports, a development set of 258 cardiology cases, and a test
set of 250 cardiology reports. Systems are evaluated on micro-averaged Precision, Recall, and F-measure
to determine their adaptability and accuracy in diverse medical settings.</p>
      <p>This paper details the participation of the Biomedical Informatics and Technologies group of the University of
Aveiro (BIT.UA) in the MultiCardioNER challenge, where we utilize our Multi-Head CRF model [10]. Due
to time constraints, we focused our efforts solely on the CardioDis subtrack. Unlike other challenges, the
CardioDis subtrack is designed to adapt general concept recognition systems, trained on the DisTEMIST
dataset, to cardiology case reports. This leads to our first research question:</p>
      <p>1. Can a NER model trained on general disease concepts generalize to the specialized sub-domain of
cardiology?</p>
      <p>Additionally, we are utilizing this challenge to test the robustness of our Multi-Head CRF model,
prompting our second research question:</p>
      <p>2. Is our Multi-Head CRF model capable of delivering competitive out-of-the-box performance in a
specialized clinical setting?</p>
      <p>The remainder of the paper is organized as follows: Section 2 provides a review of related work,
focusing on the latest advancements in biomedical Named Entity Recognition. Section 3 describes our
methodology, detailing the application of our Multi-Head CRF model to address the specific challenges
of the MultiCardioNER competition. Section 4 presents our validation results and the official challenge
evaluations, showcasing the performance of our model. In Section 5, we discuss the outcomes in relation
to our initial research questions, providing insights into the adaptability and robustness of our approach.
Section 6 concludes the paper, summarizing our key findings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Named Entity Recognition (NER) in the biomedical domain presents unique challenges due to the
limited availability of annotated data. The annotation process is both time-consuming and requires a
high level of expertise, making it expensive [15, 16]. Most research in biomedical NER has been focused
on the English language [17], but there is a growing need to extend this work to other languages.</p>
      <p>Several competitions have aimed to address clinical NER in the Spanish language, focusing on
various entity types such as compounds and drugs (PharmaCoNER [18]), diseases (DisTEMIST [19]),
tumor morphology (CANTEMIST [20]), medical procedures (MedProcNER [21]), and symptoms
(SympTEMIST [22]). All of these competitions utilize the Spanish Clinical Case Corpus (SPACCC),
which comprises 1,000 clinical case reports from Spanish medical publications (SciELO).</p>
      <p>Recent advancements in NER predominantly utilize transformer-based models for sequence labelling,
which have proven effective in managing complex entity recognition tasks [8, 23, 24, 25]. These models
often leverage pretrained language-specific versions of BERT [26] and RoBERTa [27], such as BETO, a
BERT model trained on a Spanish corpus [28], and bsc-bio-es, a RoBERTa model tailored to Spanish
biomedical vocabulary [29].</p>
      <p>
        Further enhancements have been observed by integrating masked Conditional Random Fields
(CRFs) [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ] atop the transformer backbone, a technique that our research group has explored
extensively [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4, 8, 9</xref>
        ]. In these works, we also demonstrate strong transfer-learning capabilities,
effectively adapting a Spanish NER model trained on clinical notes [30] to the diverse target domains.
      </p>
      <p>Another significant development is the introduction of the SpanMarker model, which utilizes the
novel Packed Levitated Markers (PL-Marker) approach [31]. This model enhances NER performance
by employing a neighborhood-oriented packing strategy to accurately model entity boundaries and a
subject-oriented strategy for complex span pair classification tasks. Concurrently, innovative methods
like those in the AIONER system, which prepends special tokens to input texts, enable the adaptation
to annotated corpora lacking comprehensive coverage of all entity classes [32]. Similarly, HunFlair2
utilizes a multi-class BIO tagging scheme, enhancing its ability to distinguish between various entity
types such as genes and diseases, thereby showcasing the adaptability and versatility of these advanced
NER systems [33].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>In this section, we describe the dataset and the evaluation metrics, and provide a brief overview of the
system used.</p>
        <p>The MultiCardioNER challenge utilizes two datasets: the previously released DisTEMIST dataset and
the newly released CardioCCC dataset. The DisTEMIST dataset comprises 1,000 documents from the
SPACCC corpus, annotated with disease mentions. For the validation and evaluation of the system,
the CardioCCC dataset was created. This collection consists of 508 cardiology clinical case reports,
split into 258 documents for development and 250 for testing. The goal of the task is to train a generic
system capable of classifying diseases and to evaluate it within the more specific cardiology domain.
Whilst the goal of the competition is to evaluate the adaptation of the model, we also utilize the validation set
in order to train some models<sup>1</sup>.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation Metrics</title>
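        <p>The official metrics used in this work are the standard micro-averaged precision, recall and F1-scores.</p>
        <list list-type="bullet">
          <list-item>
            <p>Precision (P): The ratio of true positive (TP) predictions to the total number of positive predictions
(TP + FP). It is defined as:
              <disp-formula><tex-math>P = \frac{TP}{TP + FP}.</tex-math></disp-formula>
            </p>
          </list-item>
          <list-item>
            <p>Recall (R): The ratio of true positive (TP) predictions to the total number of actual positives (TP
+ FN). It is defined as:
              <disp-formula><tex-math>R = \frac{TP}{TP + FN}.</tex-math></disp-formula>
            </p>
          </list-item>
          <list-item>
            <p>F1-Score (F1): The harmonic mean of precision and recall. It is defined as:
              <disp-formula><tex-math>F_1 = 2 \cdot \frac{P \cdot R}{P + R}.</tex-math></disp-formula>
            </p>
          </list-item>
        </list>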
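        <p>As an illustration of how these micro-averaged scores can be computed at the entity level, the following Python sketch pools the TP, FP, and FN counts over all documents before taking the ratios. It is a minimal example and not the official evaluation script: the entity representation (start offset, end offset, label), the exact-match criterion, the function name micro_prf, the label DISEASE, and the toy data are all assumptions made for this example.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Set, Tuple

# Hypothetical representation: an entity is (start_offset, end_offset, label).
Entity = Tuple[int, int, str]


def micro_prf(gold: Dict[str, Set[Entity]],
              pred: Dict[str, Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entities.

    Counts are pooled over all documents before the ratios are taken.
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)  # predicted and annotated
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (invented for illustration only).
gold = {
    "doc1": {(0, 8, "DISEASE"), (20, 31, "DISEASE")},
    "doc2": {(5, 12, "DISEASE")},
}
pred = {
    "doc1": {(0, 8, "DISEASE"), (40, 45, "DISEASE")},
    "doc2": {(5, 12, "DISEASE")},
}
precision, recall, f1 = micro_prf(gold, pred)
```
]]></preformat>
        <p>In this toy example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.</p>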
      </sec>
      <sec id="sec-3-3">
        <title>3.3. System</title>
        <p>The system uses the Multi-Head CRF model [10] as its basis, which allows us to address the secondary objective of
this work: testing the robustness of that model. All experiments presented here
use the same code and methods as the original work. Although the Multi-Head CRF architecture was
designed and tested for multi-class NER, in this work we configure it to use
only one head for single-class classification, as illustrated in Figure 1. Since the general architecture is
unchanged, we provide only a brief overview of the Multi-Head CRF architecture here, in order to keep
this work self-contained.</p>
        <p>
          The architecture was inspired by several existing works [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2, 8, 9</xref>
          ], which achieved competitive results in
various challenges. The main idea behind the architecture is to utilize several CRF heads, one per entity
class, on top of a shared transformer, in this case a Spanish RoBERTa model [30]. Having
a separate classification head per class solves the problem of overlapping entity classes, since each entity
class is handled by its own head. Moreover, because the heads share the same transformer, the overhead is
significantly lower than that of training several individual classifiers. In more detail, the model uses the
well-known BIO tagging schema, with a separate tag sequence for each entity class. Each CRF
classification head consists of several dense layers, a classification layer, and a CRF layer, and
produces a sequence of labels corresponding to the BIO tagging for the specific entity class of the
head. The model is trained using a joint loss function, aggregated from each classification head.
        </p>
        <p><sup>1</sup>The event organizers explicitly mentioned: “Participants are encouraged to experiment with the documents and annotations
as they see fit.”</p>
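        <p>The per-class BIO tagging described in the architecture overview above can be illustrated with a short Python sketch. It only shows how one independent tag sequence per entity class (one per head) lets overlapping mentions of different classes coexist; it is not the implementation from [10] and does not include the transformer or the CRF layers. The function names, the token-level span format, and the class names PROCEDURE and DISEASE are hypothetical and chosen only for this example.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# Hypothetical token-level span: (first_token, last_token_exclusive, entity_class).
Span = Tuple[int, int, str]


def spans_to_bio_per_class(n_tokens: int, spans: List[Span],
                           classes: List[str]) -> Dict[str, List[str]]:
    """Build one independent BIO tag sequence per entity class (one per head)."""
    tags = {c: ["O"] * n_tokens for c in classes}
    for start, end, cls in spans:
        if cls not in tags:  # class not handled by any configured head
            continue
        tags[cls][start] = "B"
        for i in range(start + 1, end):
            tags[cls][i] = "I"
    return tags


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode a BIO sequence back into (start, end_exclusive) token spans."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if tag != "I" and start is not None:
            spans.append((start, i))
            start = None
        if tag == "B":
            start = i
    return spans


# Toy sentence with two overlapping mentions of different classes.
tokens = ["biopsia", "de", "tumor", "hepático"]
spans = [(0, 4, "PROCEDURE"), (2, 4, "DISEASE")]

# Multi-head setting: each class gets its own tag sequence.
multi_head = spans_to_bio_per_class(len(tokens), spans, ["PROCEDURE", "DISEASE"])

# Single-head setting, as configured in this work: only one class is tagged.
single_head = spans_to_bio_per_class(len(tokens), spans, ["DISEASE"])
```
]]></preformat>
        <p>With a single shared tag sequence, the two mentions in this toy sentence could not both be encoded; with one sequence per class, the PROCEDURE head sees B I I I and the DISEASE head sees O O B I.</p>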
        <p>The model also employs a document splitting system to overcome the maximum context length of the
transformer. Each document is split in a sliding window fashion, with each piece of the document being
surrounded by a fixed-length context. The work also utilizes data augmentation techniques,
namely random token replacement and a variation, random token replacement with unknown. In the
first technique, a random input token is replaced with a random token from the vocabulary, while
in the latter, the token is replaced with the special token ‘[UNK]’. In this work we follow the
conclusions of the original work and utilize only random token replacement, as it performed better. To
better control the augmentation, two hyperparameters are used: one determines the chance
of selecting a sample for augmentation (the augmentation probability), and the other determines how
many tokens within the sample get augmented (percentage tags).</p>
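        <p>The document splitting step mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code of [10]: the function name split_with_context and the parameters center_size and context_size are hypothetical, and the sketch operates on a plain list of tokens rather than on the tokenizer output of the transformer.</p>
        <preformat><![CDATA[
```python
from typing import List, Tuple


def split_with_context(tokens: List[str], center_size: int,
                       context_size: int) -> List[Tuple[List[str], int, int]]:
    """Split a token list into windows of `center_size` tokens, each padded
    with up to `context_size` neighbouring tokens on both sides.

    Returns (window_tokens, center_start, center_end), where the two offsets
    delimit the center inside the window; only those tokens would be labelled,
    while the surrounding context is only there to inform the model.
    """
    windows = []
    for start in range(0, len(tokens), center_size):
        end = min(start + center_size, len(tokens))
        left = max(0, start - context_size)
        right = min(len(tokens), end + context_size)
        windows.append((tokens[left:right], start - left, end - left))
    return windows


# Toy document of ten tokens, split into centers of 4 with 2 context tokens.
tokens = [f"t{i}" for i in range(10)]
windows = split_with_context(tokens, center_size=4, context_size=2)

# Concatenating the centers of all windows recovers the original document.
centers = [tok for window, s, e in windows for tok in window[s:e]]
```
]]></preformat>
        <p>Because every token belongs to the center of exactly one window, predictions can be merged back into a document-level output without conflicts, while tokens near a window boundary still see context from both sides.</p>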
        <p>Finally, following our previous NER submissions [8, 9], we employed an entity-level
ensemble to merge the outputs from various models, which proved to improve overall results. The
entity-level ensemble is a majority voting approach over the exact entities predicted by the models:
an entity is added to the final submission only if enough of the models predicted it.</p>
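        <p>A minimal sketch of such an entity-level vote is given below. It assumes that each model's output is a set of exact entities identified by document and character offsets and that an entity is kept when a strict majority of models predicts it; the entity format, the function name entity_level_ensemble, the min_votes parameter, and the majority threshold are assumptions made for this example and are not taken from the original implementation.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical exact-match entity: (doc_id, start_offset, end_offset).
Entity = Tuple[str, int, int]


def entity_level_ensemble(model_predictions: List[Set[Entity]],
                          min_votes: int) -> Set[Entity]:
    """Keep every exact entity predicted by at least `min_votes` models.

    Each model contributes a set, so it votes at most once per entity.
    """
    votes = Counter(e for preds in model_predictions for e in preds)
    return {entity for entity, count in votes.items() if count >= min_votes}


# Toy predictions from three runs (invented for illustration only).
run_a = {("doc1", 0, 8), ("doc1", 20, 31)}
run_b = {("doc1", 0, 8), ("doc2", 5, 12)}
run_c = {("doc1", 0, 8), ("doc1", 20, 31), ("doc2", 40, 47)}
runs = [run_a, run_b, run_c]

# Strict majority: more than half of the runs must agree on the exact span.
merged = entity_level_ensemble(runs, min_votes=len(runs) // 2 + 1)
```
]]></preformat>
        <p>Here only the two entities predicted by at least two of the three runs are kept. Raising min_votes trades recall for precision, which is why the threshold is treated as a tunable choice in this sketch.</p>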
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In this section we will present the results obtained with the proposed system. Initially, we evaluate the
performance of the model on a validation set, before discussing the results on the final test set of the
competition.</p>
      <sec id="sec-4-1">
        <title>4.1. Validation results</title>
        <p>In order to find the optimal hyperparameters for the models to be submitted, we performed basic
hyperparameter tuning, varying the number of training epochs, the augmentation
configuration, the context size, and the number of hidden layers. The validation set used
for this work was the 258-document development set, containing cardiology case reports. The results of this
basic hyperparameter search are shown in Figures 2–4. The best-performing model configurations on
validation are listed in Table 1.</p>
        <p>Looking first at Figure 2, we can see that the performance difference between random augmentation
and no augmentation is significant, and the use of augmentation improves the overall performance of
the system. This is in line with the conclusions drawn from the original Multi-Head CRF model; however,
we did not investigate the use of the ‘unk’ augmentation technique. We also note that training for more
epochs does not necessarily result in significant performance gains, especially for models with random
augmentation.</p>
        <p>Examining the optimal augmentation configuration in Figure 3, we observe that lower values of
percentage tags perform better, especially with increasing augmentation probabilities. This corresponds
to selecting a large number of documents and augmenting a small number of tokens. While these are not
exactly the configurations used in the original Multi-Head CRF work, they are relatively
similar and lead to similar conclusions.</p>
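        <p>To make the interplay of the two augmentation hyperparameters concrete, the following sketch implements random token replacement on a list of token ids. It is an illustration under stated assumptions rather than the original code: we interpret percentage tags as the fraction of tokens in a selected sample that are replaced, and the function name, the protected_ids argument (for special tokens), and the toy values are hypothetical.</p>
        <preformat><![CDATA[
```python
import random
from typing import List, Sequence


def random_token_replacement(token_ids: List[int], vocab_size: int,
                             augmentation_prob: float, percentage_tags: float,
                             rng: random.Random,
                             protected_ids: Sequence[int] = ()) -> List[int]:
    """With probability `augmentation_prob`, replace a fraction
    `percentage_tags` of the tokens in a sample with random vocabulary ids.

    The replacement id is drawn uniformly from the vocabulary, so it may
    occasionally coincide with the original id.
    """
    if rng.random() >= augmentation_prob:
        return list(token_ids)  # sample not selected for augmentation
    candidates = [i for i, t in enumerate(token_ids) if t not in protected_ids]
    n_replace = min(len(candidates), round(percentage_tags * len(token_ids)))
    augmented = list(token_ids)
    for i in rng.sample(candidates, n_replace):
        augmented[i] = rng.randrange(vocab_size)
    return augmented


rng = random.Random(0)
# Toy sample whose ids lie outside the toy vocabulary range, so that every
# replaced position is guaranteed to differ from the original.
sample = [1000 + i for i in range(8)]

# High selection probability, low token fraction: 2 of the 8 tokens change.
augmented = random_token_replacement(sample, vocab_size=100,
                                     augmentation_prob=1.0,
                                     percentage_tags=0.25, rng=rng)

# Probability 0.0 leaves the sample untouched.
unchanged = random_token_replacement(sample, vocab_size=100,
                                     augmentation_prob=0.0,
                                     percentage_tags=0.25, rng=rng)
```
]]></preformat>
        <p>Under this interpretation, a high augmentation probability combined with a low percentage tags value perturbs many samples lightly, which matches the configuration found to work best above.</p>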
        <p>Finally, discussing the context size and number of hidden layers as described in Figure 4, we note that
higher context size performs better on average, with the optimal number of hidden layers being either
1 or 3, which is the same as the optimal model in the original paper. We also note that the validation performance
of the 123 models in our search ranged from 69.47 to 74.87, with an average of 72.89 and a median of 72.96.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Competition results</title>
        <p>Below we present the official results of our systems in the competition. The competition uses the
F1-score as the official metric, with the test set containing 250 documents.</p>
        <p>For the competition, we submitted five different systems, following two separate approaches.
The first approach was to keep the validation set separate in order to test the adaptability of a system
trained exclusively on general disease data to directly identify cardiology-related concepts. Our second approach
was to utilize the validation set to train a model, in combination with the generic disease data, with the
intuition that more data is generally beneficial for training a model. A summary of our submitted systems
is given below, with their performance displayed in Table 2.</p>
        <list list-type="bullet">
          <list-item>
            <p>run0: This submission used an ensemble of our top 5 validation models (ranging from 74.20 to 74.87),
trained using all data including the validation set.</p>
          </list-item>
          <list-item>
            <p>run1: This submission used an ensemble of our top 17 runs (all above 74 on validation), trained using all data including the
validation set.</p>
          </list-item>
          <list-item>
            <p>run2: This submission used our best-performing model on validation, trained without validation
data.</p>
          </list-item>
          <list-item>
            <p>run3: This submission used an ensemble of our top 24 runs, all trained without validation data.</p>
          </list-item>
          <list-item>
            <p>run4: This submission is an ensemble of all submissions, containing 41 runs.</p>
          </list-item>
        </list>
        <p>Looking at the results, we obtained the best submission in the competition. This was achieved by
using a large ensemble with multiple runs that included validation data for training. It is not surprising
that run1 outperformed run0. Generally, from previous work, we have observed that larger ensembles
tend to perform better. We also have no assurance that the top 5 models on validation would achieve the
best results on the test set. Next, we note that the models trained using the validation
data greatly outperformed those that were not. While their performance was 6 percentage points higher,
the base system still performed relatively well, considering it was not directly trained for the cardiology
domain. Similarly to the comparison of run0 and run1, we see a slight performance increase with run3
over run2 as the number of models increases, although the gain is not as
significant. Finally, given these results, our final submission, run4, performed as expected, somewhere
between our best and worst models, given the large performance gap between the two training-data settings. Overall, all our
submissions achieved F1 scores above the median, with our best submission obtaining the top F1 score
and recall in the competition.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>In this work, we proposed two research questions: 1. investigating the impact of training with validation
data for domain-adaptation and 2. investigating the robustness of the multi-head CRF model.</p>
      <p>With regard to the first research question, we can look to our submission results. We utilized two
different approaches to measure the performance difference when using the validation data in training. The
results indicate that the models gain significantly in performance when using the
validation data. This was an expected outcome; however, the models that did not have access to this
data still obtained competitive performance, being well above the average submission.</p>
      <p>Discussing the second research question, we can see the robustness of the model by looking at
both the overall performance of the model in the competition and our validation results. Our model
performed well overall, obtaining the top performance within the competition. In the validation experiments
we performed, we obtained results comparable to those of the original work, indicating the overall robustness
and reliability of the model and showcasing its ability to perform well not only on multi-class NER but also
on single-class NER.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study aimed to address two research questions using the MultiCardioNER challenge as a basis. The
first question, related to the task, was whether an NER model trained on a specific domain (in this case,
diseases) could generalize to a sub-domain, in this case, cardiology data. Our results
indicate that while the system does generalize well, better performance
is achieved when using task-specific data.</p>
      <p>Our second research question was aligned with our previous work and focused on testing the
robustness of our Multi-Head CRF architecture [10]. In this work, we demonstrated that the architecture
is indeed robust, achieving top performance in the competition, which involves only a single entity class, as
opposed to previous work, which addressed multi-class NER. We further observed the robustness of the
model within our validation tests, where we drew many similar conclusions to those of the original
work. Overall, we believe that this Multi-Head CRF architecture stands as a solid basis for future work
in NER.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was funded by the Foundation for Science and Technology (FCT) in the context of the project
doi.org/10.54499/UIDB/00127/2020. Tiago Almeida is funded by the grant doi.org/10.54499/2020.05784.BD.
Richard A. A. Jonker is funded by the grant PRT/BD/154792/2023. This work was funded by FCT
I.P. under the project Advanced Computing Project 2023.10766.CPCA.A0, platform Vision at University
of Évora.</p>
      <p>
[8] T. Almeida, R. A. Jonker, R. Poudel, J. M. Silva, S. Matos, BIT.UA at MedProcNER: Discovering medical
procedures in Spanish using transformer models with MCRF and augmentation, Working Notes of
CLEF (2023).
[9] R. A. A. Jonker, T. Almeida, S. Matos, J. Almeida, Team BIT.UA @ BC8 SympTEMIST Track: A
Two-Step Pipeline for Discovering and Normalizing Clinical Symptoms in Spanish., 2023. URL:
https://doi.org/10.5281/zenodo.10103360. doi:10.5281/zenodo.10103360.
[10] R. A. A. Jonker, T. Almeida, R. Antunes, J. R. Almeida, S. Matos, Multi-head CRF classifier for
biomedical multi-class Named Entity Recognition on Spanish clinical notes, Database (to appear)
2024 (2024).
[11] A. Nentidis, G. Katsimpras, A. Krithara, S. Lima-López, E. Farré-Maduell, M. Krallinger,
N. Loukachevitch, V. Davydova, E. Tutubalina, G. Paliouras, Overview of BioASQ 2024: The
twelfth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering,
in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. Maria Di Nunzio, P. Galuščáková,
A. García Seco de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality,
Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF
Association (CLEF 2024), 2024.
[12] S. Lima-López, E. Farré-Maduell, J. Rodríguez-Miret, M. Rodríguez-Ortega, L. Lilli, J. Lenkowicz,
G. Ceroni, J. Kossof, A. Shah, A. Nentidis, A. Krithara, G. Katsimpras, G. Paliouras, M. Krallinger,
Overview of MultiCardioNER task at BioASQ 2024 on Medical Speciality and Language Adaptation
of Clinical NER Systems for Spanish, English and Italian, in: G. Faggioli, N. Ferro, P. Galuščáková,
A. García Seco de Herrera (Eds.), CLEF Working Notes, 2024.
[13] S. Lima-López, E. Farré-Maduell, J. Rodríguez-Miret, M. Krallinger, MultiCardioNER Corpus:
Multilingual Adaptation of Clinical NER Systems to the Cardiology Domain, 2024. URL:
https://doi.org/10.5281/zenodo.11368861. doi:10.5281/zenodo.11368861.
[14] World Health Organization, Cardiovascular diseases (CVDs), 2021. URL:
https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1, accessed: 08-06-2024.
[15] Z. Li, S. Zhang, Y. Song, J. Park, Extrinsic factors affecting the accuracy of biomedical NER, 2023. arXiv:2305.18152.
[16] D. Demner-Fushman, W. W. Chapman, C. J. McDonald, What can natural language processing do
for clinical decision support?, J. Biomed. Inform. 42 (2009) 760–772.
[17] E. French, B. T. McInnes, An overview of biomedical entity linking throughout the years, Journal
of biomedical informatics 137 (2023) 104252.
[18] A. Gonzalez-Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger,
Pharmaconer: Pharmacological substances, compounds and proteins named entity recognition track, in:
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019, pp. 1–10.
[19] A. Miranda-Escalada, L. Gascó, S. Lima-López, E. Farré-Maduell, D. Estrada, A. Nentidis, A. Krithara,
G. Katsimpras, G. Paliouras, M. Krallinger, Overview of distemist at bioasq: Automatic detection
and normalization of diseases from clinical texts: results, methods, evaluation and multilingual
resources., in: CLEF (Working Notes), 2022, pp. 179–203.
[20] A. Miranda-Escalada, E. Farré, M. Krallinger, Named entity recognition, concept normalization
and clinical coding: Overview of the cantemist track for cancer text mining in spanish, corpus,
guidelines, methods and results., IberLEF@ SEPLN (2020) 303–323.
[21] S. Lima-López, E. Farré-Maduell, L. Gascó, A. Nentidis, A. Krithara, G. Katsimpras, G. Paliouras,
M. Krallinger, Overview of medprocner task on medical procedure detection and entity linking at
bioasq 2023, Working Notes of CLEF (2023).
[22] S. Lima-López, E. Farré-Maduell, L. Gasco-Sánchez, J. Rodríguez-Miret, M. Krallinger, Overview of
symptemist at biocreative viii: corpus, guidelines and evaluation of systems for the detection and
normalization of symptoms, signs and findings from text, in: Proceedings of the BioCreative VIII
Challenge and Workshop: Curation and Evaluation in the era of Generative Models, 2023.
[23] S. Vassileva, G. Grazhdanski, S. Boytcheva, I. Koychev, Fusion @ bioasq medprocner:
Transformer-based approach for procedure recognition and linking in spanish clinical text, in: M.
Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of the Conference and Labs
of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023,
volume 3497 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 190–205. URL:
https://ceur-ws.org/Vol-3497/paper-017.pdf.
[24] E. Zotova, A. G. Pablos, M. Cuadros, G. Rigau, VICOMTECH at medprocner 2023:
Transformers-based sequence-labelling and cross-encoding for entity detection and normalisation in spanish
clinical texts, in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of the
Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th
to 21st, 2023, volume 3497 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 206–218. URL:
https://ceur-ws.org/Vol-3497/paper-018.pdf.
[25] G. Grazhdanski, S. Vassileva, I. Koychev, S. Boytcheva, Team Fusion@SU @ BC8 SympTEMIST
track: Transformer-based Approach for Symptom Recognition and Linking, 2023. URL:
https://doi.org/10.5281/zenodo.10103750. doi:10.5281/zenodo.10103750.
[26] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers
for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019
Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational
Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423.
doi:10.18653/v1/N19-1423.
[27] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.
[28] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained bert model and
evaluation data, in: PML4DC at ICLR 2020, 2020.
[29] C. P. Carrino, J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé, J. Silveira-Ocampo,
A. Valencia, A. Gonzalez-Agirre, M. Villegas, Pretrained biomedical language models for clinical
NLP in Spanish, in: D. Demner-Fushman, K. B. Cohen, S. Ananiadou, J. Tsujii (Eds.), Proceedings of
the 21st Workshop on Biomedical Language Processing, Association for Computational Linguistics,
Dublin, Ireland, 2022, pp. 193–199. URL: https://aclanthology.org/2022.bionlp-1.19.
doi:10.18653/v1/2022.bionlp-1.19.
[30] L. Campillos-Llanos, A. Valverde-Mateos, A. Capllonch-Carrión, A. Moreno-Sandoval, A clinical
trials corpus annotated with umls© entities to enhance the access to evidence-based medicine,
BMC Medical Informatics and Decision Making 21 (2021) 1–19.
[31] D. Ye, Y. Lin, P. Li, M. Sun, Packed levitated marker for entity and relation extraction, in:
S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational
Linguistics, Dublin, Ireland, 2022, pp. 4904–4917. URL: https://aclanthology.org/2022.acl-long.337.
doi:10.18653/v1/2022.acl-long.337.
[32] L. Luo, C.-H. Wei, P.-T. Lai, R. Leaman, Q. Chen, Z. Lu, Aioner: all-in-one scheme-based biomedical
named entity recognition using deep learning, Bioinformatics 39 (2023) btad310.
[33] M. Sänger, S. Garda, X. D. Wang, L. Weber-Genzel, P. Droop, B. Fuchs, A. Akbik, U. Leser, Hunflair2
in a cross-corpus evaluation of named entity recognition and normalization tools, arXiv preprint
arXiv:2402.12372 (2024).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Antunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matos</surname>
          </string-name>
          ,
          <article-title>Chemical detection and indexing in pubmed full text articles using deep learning and rule-based methods</article-title>
          ,
          <source>CDR</source>
          <volume>1500</volume>
          (
          <year>2021</year>
          )
          <fpage>15943</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Antunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matos</surname>
          </string-name>
          ,
          <article-title>Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics</article-title>
          ,
          <source>Database</source>
          <volume>2022</volume>
          (
          <year>2022</year>
          )
          baac047. URL: https://doi.org/10.1093/database/baac047. doi:10.1093/database/baac047.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A. A.</given-names>
            <surname>Jonker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matos</surname>
          </string-name>
          ,
          <article-title>BIT.UA at Biocreative VIII track 1: A joint model for relation classification and novelty detection</article-title>
          ,
          <year>2023</year>
          . URL: https://doi.org/10.5281/zenodo.10117952. doi:10.5281/zenodo.10117952.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A. A.</given-names>
            <surname>Jonker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Antunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Matos</surname>
          </string-name>
          ,
          <article-title>Towards Discovery: An End-to-End System for Uncovering Novel Biomedical Relations</article-title>
          ,
          <source>Database</source>
          (to appear)
          <volume>2024</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Masked conditional random fields for sequence labeling</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rumshisky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hakkani-Tur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>2024</fpage>
          -
          <lpage>2035</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.163. doi:10.18653/v1/2021.naacl-main.163.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ratinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Stevenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Carreras</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)</source>
          , Association for Computational Linguistics
          , Boulder, Colorado,
          <year>2009</year>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>155</lpage>
          . URL: https://aclanthology.org/W09-1119.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawakami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <article-title>Neural architectures for named entity recognition</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Knight</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nenkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rambow</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, San Diego, California,
          <year>2016</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          . URL: https://aclanthology.org/N16-1030. doi:10.18653/v1/N16-1030.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>