<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Y. Liu);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>LYX_DMIIP_FDU At BioASQ 2025: Utilizing BERT Embeddings For Biomedical Text Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuxuan Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shanfeng Zhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Zhangjiang Fudan International Innovation Center</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Due to the increasing volume of biomedical documents, biomedical text mining is becoming increasingly important for automatically gathering knowledge from large amounts of biomedical text. In this paper, we present our methods for three information extraction tasks in the BioASQ Lab 2025: ELCardioCC, BioNNE-L and GutBrainIE. In these tasks, we utilized different BERT models to generate embeddings for concept representation, which are then used for biomedical named entity recognition, entity linking and relation extraction. Our methods are simple yet efficient, ranking first on ELCardioCC and on the multilingual track of the BioNNE-L task. On GutBrainIE, our method surpassed the baseline on the Named Entity Recognition and Ternary Mention-Based Relation Extraction subtasks. Our results demonstrate that many biomedical information extraction tasks can be solved efficiently by utilizing BERT embeddings.</p>
      </abstract>
      <kwd-group>
        <kwd>Biomedical text mining</kwd>
        <kwd>Named entity recognition</kwd>
        <kwd>Named entity linking</kwd>
        <kwd>Relation extraction</kwd>
        <kwd>Transformers</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Over the past few years, the amount of biomedical text has grown quickly. These texts include biomedical
research papers, clinical notes and patient health records. Biomedical text mining methods are required
to efficiently extract useful information from biomedical text. For example, Named Entity Recognition
(NER) recognizes different types of entities in biomedical text, Named Entity Linking (NEN) maps the
entities found by NER to an entry in a given database, and Relation Extraction (RE) finds the
relationships between pairs of entities.</p>
      <p>
        Deep learning using Bidirectional Encoder Representations from Transformers (BERT) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is
currently the most prevalent method for biomedical text mining, due to its strong performance and
efficiency. Many BERT models are pretrained on large biomedical corpora and can be easily
fine-tuned for downstream tasks. Examples include PubMedBERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], BioLinkBERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
xlm-roberta-large-english-clinical [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>We utilized BERT embeddings to participate in three biomedical text mining tasks in the BioASQ
lab of CLEF 2025 [5]: ELCardioCC [6], BioNNE-L [7] and GutBrainIE [8]. Our methods ranked first on
ELCardioCC and on the multilingual track of BioNNE-L, and surpassed the baseline on two GutBrainIE subtasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>BERT has long been a useful tool for many natural language processing (NLP) tasks [9]. After intensive
pretraining, BERT embeddings can effectively capture the meaning of a token or a sentence
in its context, and are often fed into a classifier for classification tasks or used to
calculate similarity for entity linking. In the past few years, many works have utilized BERT
embeddings for biomedical text mining.</p>
      <p>For biomedical NER, sequence labeling is a common approach: the BERT embedding of each
token is first calculated, then each token is classified as the beginning of an entity (B), the continuation of an entity (I), or
outside any entity (O) for a given entity type. Based on this method, BERN2 [10] developed a multi-task
model that treats the recognition of different entity types as different tasks, sharing the same BERT
model across all entity types and using a separate classifier for each type. As a result, the model
can efficiently recognize all given entity types with a single BERT forward pass. Subsequent works
include AIONER [11] and Hunflair2 [12], which combine multiple biomedical NER datasets for training
and achieve good results on all of them.</p>
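      <p>The BIO scheme described above can be illustrated with a small decoder that turns per-token tags into entity spans. This is a minimal sketch for a single entity type; the function name <monospace>decode_bio</monospace>, the example tokens and the choice to treat an I tag without a preceding B as the start of a new entity are assumptions made for illustration, not details taken from the cited systems.</p>
      <preformat><![CDATA[
```python
def decode_bio(tags):
    """Convert a sequence of B/I/O tags into (start, end) token spans, end exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity begins; close any entity that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Assumption: an I without a preceding B starts a new entity.
            if start is None:
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


tokens = ["severe", "chest", "pain", "and", "atrial", "fibrillation"]
tags = ["B", "I", "I", "O", "B", "I"]
entities = [" ".join(tokens[s:e]) for s, e in decode_bio(tags)]
```
]]></preformat>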
      <p>For biomedical NEN, a database is first selected, then a preprocessing step calculates
the BERT embeddings of all entities within the database. After that, the BERT embedding of the target
entity is calculated and matched against the database embeddings to find the database entity with the most
similar embedding. This search is often accelerated with a
similarity-search library such as faiss [13]. However, a generic BERT model may not generate the embedding that
is most suitable for NEN, so SapBERT [14] performed additional pretraining using contrastive
learning specifically for NEN. More recent models such as geBERT [15] and BERGAMOT [16] used graph
neural networks to capture the relations between entities, together with carefully designed training objectives,
for a better entity representation.</p>
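      <p>The embedding-matching step can be sketched in plain Python as a brute-force cosine similarity search. The sketch below uses toy three-dimensional vectors and hypothetical identifiers (<monospace>ID_A</monospace>, <monospace>ID_B</monospace>, <monospace>ID_C</monospace>); in practice the vectors would come from a BERT model and the search would be delegated to a library such as faiss.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_entity(query_vec, db_vecs, db_ids):
    """Return the database identifier whose embedding is most similar to the query."""
    scores = [cosine(query_vec, v) for v in db_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return db_ids[best], scores[best]


# Toy 3-dimensional "embeddings"; real BERT embeddings have hundreds of dimensions.
db_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
db_ids = ["ID_A", "ID_B", "ID_C"]  # hypothetical identifiers
best_id, best_score = link_entity([0.9, 0.1, 0.0], db_vecs, db_ids)
```
]]></preformat>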
      <p>For biomedical RE, ATLOP [17] combines the BERT embeddings of the two target entities
and uses them for a 0/1 classification for each relation type. It also proposes a localized
context pooling technique that finds the information important to both entities by utilizing
attention weights in the BERT layers. BioREX [18] carefully combined multiple biomedical RE datasets
with different labeling standards and trained on them for better model performance. A more recent
work [19] trained the BERT model to predict not only the relation type but also the relation direction
and whether the relation is novel.</p>
    </sec>
    <sec id="sec-3">
      <title>3. ELCardioCC</title>
      <sec id="sec-3-1">
        <title>3.1. Task description</title>
        <p>The ELCardioCC task of BioASQ 2025 requires participants to extract entities from discharge letters written
in Greek that record patients’ conditions, treatments and outcomes. There are five types of entities:
chief complaint, diagnosis, prior medical history, drugs and cardiac echo.</p>
        <p>The task consists of three subtasks. The NER subtask requires accurate prediction of the span of each
entity. The EL subtask requires not only the entity span but also its associated ICD-10 code. The MLC-X subtask
requires identification of all the ICD-10 codes that are present in a letter, without the corresponding entity
spans. MLC-X also contains a further step that requires participants to identify
the mentions behind the predicted ICD-10 codes using explainable AI techniques. All the subtasks are evaluated using
precision, recall and F1-score.</p>
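        <p>As an illustration of the evaluation metrics, the following sketch computes precision, recall and F1-score over sets of predicted and gold items. It assumes exact matching of (document, start, end) tuples; the official scorer may define matches differently, and the example spans are invented.</p>
        <preformat><![CDATA[
```python
def precision_recall_f1(gold, predicted):
    """Micro P/R/F1 over sets of hashable items, e.g. (doc_id, start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # items that match exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Invented example spans: two of four predictions match two of three gold spans.
gold = {("doc1", 0, 12), ("doc1", 20, 31), ("doc2", 5, 9)}
pred = {("doc1", 0, 12), ("doc1", 20, 30), ("doc2", 5, 9), ("doc2", 40, 44)}
p, r, f1 = precision_recall_f1(gold, pred)
```
]]></preformat>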
        <p>The dataset of this task, containing 1500 discharge letters in Greek (1000 for training, 500 for testing),
was collected from the cardiology department of a tertiary hospital in Greece. Sensitive personal
information was removed from the dataset, and a professional team then labeled all the entities and
their corresponding ICD-10 codes. There are 10168 labeled entities in the dataset in total,
and the average entity length is 14.312 characters. The organizers also provided a supplementary file
that contains all the 324 ICD-10 codes used in the labeling process.</p>
        <p>The challenge of this task lies in the fact that biomedical data in Greek is scarce, so a
pretrained BERT model might not capture Greek language structures very well. Also, the entity types
are not given in the training dataset, so either an additional step must be performed to generate the
entity types or the model has to treat all types as a single unified entity type, which may degrade model
performance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Method</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. NER subtask</title>
          <p>For the NER subtask, we applied a standard sequence labeling scheme. For the BERT model, we used
bert-base-greek-uncased-v1 [20] from HuggingFace, a Greek BERT model pretrained
exclusively on Greek data, because our initial experiments showed that a Greek-only BERT model performs
better than multilingual models. We calculated the final-layer embedding of each token using the
Greek BERT model and fed the embeddings into a simple classifier with two MLP layers that outputs probabilities for
the BIO classes.</p>
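          <p>During training, we fine-tuned all BERT layers and the classifier simultaneously using a standard
cross-entropy loss for classification tasks:
<disp-formula id="eq-1"><label>(1)</label><tex-math><![CDATA[\mathcal{L} = \sum_{i=1}^{N} -\log \frac{\exp(x_{i})}{\sum_{j=1}^{C} \exp(x_{ij})}]]></tex-math></disp-formula></p>
          <p>where <italic>N</italic> is the total number of data points, <italic>C</italic> is the number of classes, <italic>x<sub>ij</sub></italic> is the model output
logit of class <italic>j</italic> for data point <italic>i</italic>, and <italic>x<sub>i</sub></italic> is the model output logit of the ground-truth class of data point <italic>i</italic>.</p>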
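          <p>For concreteness, the cross-entropy loss can be computed from raw logits as in the sketch below. It sums over data points, as in the description above (deep learning frameworks usually average by default), and subtracts the maximum logit for numerical stability; the example logits are invented.</p>
          <preformat><![CDATA[
```python
import math


def cross_entropy(logits, targets):
    """Sum over data points of -log softmax(logits)[target]."""
    total = 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[y]
    return total


# Two data points with three classes each (for example B, I and O).
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = cross_entropy(logits, targets)
```
]]></preformat>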
          <p>The context length was set to 512. We used AdamW [21] as the optimizer, with a learning rate of 3e-5 for
both the BERT and MLP layers and a weight decay of 0.1 for regularization; other parameters were left at their defaults.
Since the dataset is small, we trained the model for 50 epochs. The model achieved an NER F1-score of
0.7671 on the development dataset.</p>
          <p>During prediction, since a single document can be longer than the context length of the model, we
applied a sliding window technique that divides the document into overlapping windows of the context
length and runs inference on each window separately. We divide each overlapping region
into two equal halves. The tokens in the left half take their predictions from the left window, and the
tokens in the right half take their predictions from the right window.</p>
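          <p>The sliding-window scheme can be sketched as follows. The function returns, for each window, the token range fed to the model and the sub-range whose predictions are kept, so that each overlap is split at its midpoint. The overlap of 128 tokens and the choice to align the last window with the end of the document are assumptions; the paper does not state these values.</p>
          <preformat><![CDATA[
```python
def sliding_windows(n_tokens, window=512, overlap=128):
    """Return (win_start, win_end, keep_start, keep_end) token ranges, ends exclusive.

    Each window is run through the model in full, but only predictions in
    [keep_start, keep_end) are kept, so every token is predicted exactly once.
    """
    if n_tokens <= window:
        return [(0, n_tokens, 0, n_tokens)]
    stride = window - overlap
    # Regular windows, plus a final window aligned with the end of the document.
    starts = list(range(0, n_tokens - window, stride)) + [n_tokens - window]
    windows = []
    for k, start in enumerate(starts):
        end = start + window
        # Midpoint of the overlap with the previous window (if any).
        keep_start = 0 if k == 0 else (start + starts[k - 1] + window) // 2
        # Midpoint of the overlap with the next window (if any).
        keep_end = n_tokens if k == len(starts) - 1 else (starts[k + 1] + end) // 2
        windows.append((start, end, keep_start, keep_end))
    return windows


windows = sliding_windows(1000)
```
]]></preformat>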
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. EL subtask</title>
          <p>For the EL subtask, we selected SapBERT, a model pretrained on the UMLS database [22] with an entity
linking objective, as our BERT model. Since SapBERT works on English only, we used a
locally deployed LibreTranslate model to translate the entities recognized by NER into English. Since
the label set is small (only 324 different labels), we treated the problem as a classification task, using the
translated entity names as BERT input and the last-layer representation of the [CLS] token as the
representation of the entity. We then used two MLP layers to classify each entity into one of the 324
classes. As in the NER subtask, we fine-tuned SapBERT and the classifier simultaneously using the cross-entropy
loss described above.</p>
          <p>To enrich the dataset, we collected two versions of the ICD-10 database online, and added their entries
together with the corresponding ICD-10 codes to the training dataset. The training parameters of the
EL subtask are similar to those of the NER subtask, except that the learning rate is increased to 3e-4 for the MLP layers and 1e-4 for
SapBERT. The model achieved an F1-score of 0.7096 on the development dataset after training.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. MLC-X subtask</title>
          <p>For the MLC-X subtask, we did not design a specific method. We simply took the EL predictions of all
entities in a document and removed duplicates to obtain the document-level labels. We did not participate
in the explainable AI subtask.</p>
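          <p>The aggregation from entity-level to document-level labels amounts to a per-document set union, as in this minimal sketch. The document IDs are hypothetical and the ICD-10 codes are used purely as example values.</p>
          <preformat><![CDATA[
```python
def document_labels(entity_predictions):
    """Collapse per-entity ICD-10 predictions into a sorted label list per document."""
    labels = {}
    for doc_id, code in entity_predictions:
        labels.setdefault(doc_id, set()).add(code)  # the set removes duplicates
    return {doc_id: sorted(codes) for doc_id, codes in labels.items()}


# Hypothetical EL output: one (document, code) pair per recognized entity.
preds = [("letter_1", "I10"), ("letter_1", "I48"), ("letter_1", "I10"), ("letter_2", "I50")]
doc_level = document_labels(preds)
```
]]></preformat>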
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results and Discussion</title>
        <p>The official test results for the three subtasks are shown in Table 1, Table 2 and Table 3, respectively. For
simplicity, we only show the results of the best run of each team as well as the baseline results. We
do not show the results of the explainable AI subtask because only one team submitted to it and
their results were not satisfactory.</p>
        <p>Overall, we achieved good results for this task, surpassing the baseline and ranking first in
all subtasks. The results show that a Greek BERT model captures Greek clinical text well.
Using an extra model to predict the entity classes in the training dataset and training the NER model to
predict each entity type separately may further improve the results.</p>
        <p>For the EL subtask, translating the entity names to English may not be necessary, since one can
directly apply GreekBERT to the entity names and predict the ICD-10 codes. The svassileva team seems
to have applied this method and also achieved good results. That approach does not use the ICD-10
database (which is in English). Another alternative to translation is to use a multilingual model
that takes both Greek and English entities as input, but a multilingual model may
not perform as well as a monolingual one. Nevertheless, our improvement over the baseline is
slight, which suggests that adding the database entries to the training dataset does not bring a
significant performance boost.</p>
      </sec>
    </sec>
    <sec id="sec-bionne-l">
      <title>4. BioNNE-L</title>
      <sec id="sec-bionne-l-1">
        <title>4.1. Task description</title>
        <p>The BioNNE-L shared task is based on the BioNNE [23, 24] task of BioASQ 2024. Unlike BioNNE,
BioNNE-L requires participants to link the entities recognized by NER to the Unified Medical Language
System (UMLS) database. Some of the entities are nested, meaning that one can be a substring of
another. There are three tracks in this task: an English track, a Russian track and a bilingual track. These tracks
require the model to handle text written in different languages.</p>
        <p>The provided dataset consists of English and Russian scientific abstracts in the biomedical domain.
Three types of entities are used for this task: ANATOMY, CHEM and DISO. For each entity in the text,
the document ID, entity type, entity span and UMLS identifier are given. The statistics of the four splits
(English_train, English_development, Russian_train and Russian_development) are shown in Table 4. A vocab file in both English and Russian is also provided, which contains all
UMLS identifiers used in this task. However, the Russian version of the vocab is incomplete, so some of
the Russian entities have to be linked to items in the English version of UMLS. There are 1510431 different
identifiers in total, with 3902187 English entities and 145803 Russian entities.</p>
        <p>For evaluation, participants may submit up to 5 ranked predictions for each entity.
The submitted results are evaluated on Top-1 accuracy, Top-5 accuracy and Mean
Reciprocal Rank (MRR), which averages the inverse rank of the correct answer across all entities. A
baseline method and its results on the development dataset have also been provided. This baseline
directly matches the BERT embedding of the target entity against the embeddings of the vocab entities and
selects the most similar entities as predictions.</p>
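        <p>The three ranking metrics can be computed as in the sketch below. It assumes that an entity whose correct identifier is missing from the submitted list contributes a reciprocal rank of zero; the official scorer may handle this case differently. The example reuses three UMLS identifiers mentioned in this paper, with invented ranked lists.</p>
        <preformat><![CDATA[
```python
def ranking_metrics(gold_ids, ranked_predictions, k=5):
    """Return (top-1 accuracy, top-k accuracy, MRR) for ranked candidate lists."""
    top1 = topk = rr_sum = 0.0
    for gold, ranked in zip(gold_ids, ranked_predictions):
        ranked = ranked[:k]
        if gold in ranked:
            rank = ranked.index(gold) + 1  # ranks are 1-based
            topk += 1
            rr_sum += 1.0 / rank
            if rank == 1:
                top1 += 1
    n = len(gold_ids)
    return top1 / n, topk / n, rr_sum / n


gold = ["C0011570", "C0011581", "C1999266"]
preds = [
    ["C0011570", "C0011581"],              # correct at rank 1
    ["C0011570", "C1999266", "C0011581"],  # correct at rank 3
    ["C0011570", "C0011581"],              # correct answer missing
]
acc1, acc5, mrr = ranking_metrics(gold, preds)
```
]]></preformat>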
        <p>There are a few challenges in this task. Firstly, the bilingual data requires the
model to understand both English and Russian well. Secondly, the nested nature of the entities means
that their names can be quite similar to each other while their meanings differ, which requires the
model to clearly differentiate between them. Thirdly, the provided vocab file is very large, consisting of
millions of entities, and many entities in the vocab are quite similar despite having different identifiers.
For example, the ‘Depressed state’ entity has the identifier C0011570, the ‘Depressive Disorders’
entity has the identifier C0011581, and the ‘Depression’ entity has the identifier C1999266. Fourthly,
the UMLS identifiers are not organized in a tree structure. Instead, they only reflect the order in which they
were added to the database, and thus carry no meaning by themselves.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.2. Method</title>
        <p>For the BERT model, we chose the same models the baseline used: BERGAMOT-multilingual-GAT [15]
and gebert_eng_gat [16]. These models utilize graph representations of the relationships between
entities to obtain a better representation of each entity.</p>
        <p>Inspired by SapBERT [14], we fine-tuned the BERT model using contrastive learning to enhance
the BERT representations for entity linking. We fixed the representations of the vocab entities to the
embeddings generated by the original model during the entire fine-tuning process. Since the text
data was also provided, we used the following input format: [CLS] text_before [SEP] entity [SEP]
text_after. Both text_before and text_after are clipped to about 250 characters. This format enables the
output embedding of the [CLS] token to be the context-aware representation of the entity.</p>
        <p>The detailed training process is as follows. We first used a warm-up training phase to make the
model more familiar with the new input format: the output embedding of the [CLS] token when
the model is provided with context should be similar to the embedding when only the entity name is provided. We
applied a simple mean squared loss between the two embeddings for this phase:
<disp-formula id="eq-2"><label>(2)</label><tex-math><![CDATA[\mathcal{L}_{\mathrm{warm}} = \sum_{i=1}^{N} \frac{1}{D} \sum_{d=1}^{D} \left( e_{i,d} - e'_{i,d} \right)^{2}]]></tex-math></disp-formula></p>
        <p>where <italic>N</italic> is the number of data entities, <italic>D</italic> is the embedding dimension, <italic>e<sub>i,d</sub></italic> is component <italic>d</italic> of the
embedding of entity <italic>i</italic> without context, and <italic>e'<sub>i,d</sub></italic> is the corresponding component of the embedding with context.</p>
        <p>Then comes the contrastive learning phase, where for each target entity, we select the entities in the vocab
that are most similar to it for training. Entities that have the same identifier as the label
are used as positive examples, and entities that have different identifiers are used as negative examples.
For the loss function, we used InfoNCE, which is commonly used for contrastive learning:
<disp-formula id="eq-3"><label>(3)</label><tex-math><![CDATA[\mathcal{L}_{\mathrm{con}} = \sum_{i=1}^{N} -\log \frac{\exp\left( q_{i} \cdot k_{i}^{+} / \tau \right)}{\sum_{j=1}^{K} \exp\left( q_{i} \cdot k_{i,j} / \tau \right)}]]></tex-math></disp-formula></p>
        <p>where <italic>N</italic> is the number of target entities, <italic>K</italic> is the number of negative samples, <italic>q<sub>i</sub></italic> is the output
embedding of target entity <italic>i</italic>, <italic>k<sub>i</sub></italic><sup>+</sup> is the output embedding of its positive entity, <italic>k<sub>i,j</sub></italic> is the output embedding
of its <italic>j</italic>th negative entity, and <italic>τ</italic> is a hyperparameter.</p>
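        <p>The two training losses can be sketched in plain Python for a single entity. This is not the authors' implementation: the sketch uses dot-product similarity, a temperature of 0.05, and the common InfoNCE formulation in which the positive term is also included in the denominator. All three choices are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import math


def warmup_mse(emb_plain, emb_context):
    """Mean squared difference between two embeddings of the same entity."""
    d = len(emb_plain)
    return sum((a - b) ** 2 for a, b in zip(emb_plain, emb_context)) / d


def info_nce(query, positive, negatives, tau=0.05):
    """InfoNCE loss for one target entity, using dot-product similarity.

    Assumption: the positive is included in the denominator (common formulation).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    pos = dot(query, positive) / tau
    logits = [pos] + [dot(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos


# Toy 2-dimensional embeddings.
easy = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])  # positive matches the query
hard = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])  # negative matches the query
```
]]></preformat>
        <p>The loss is close to zero when the target embedding is much more similar to the positive than to any negative, and grows as a negative becomes more similar than the positive.</p>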
        <p>We used the AdamW optimizer for both phases, with a learning rate of 3e-5 and a weight decay of 0.1. We
trained the model for 1 epoch in the warm-up phase and 4 epochs in the contrastive learning
phase. During contrastive learning, we used faiss [13] to select the 20 vocab entities most similar to the
target entity, among which the last (lowest-ranked) positive entity and the top 10 negative entities are
used for loss calculation. If all 20 selected entities are negative, we randomly select one entity from the
vocab that has the correct identifier as the positive entity. If there are fewer than 10 negative entities, we fill
the rest with random entities from the vocab.</p>
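        <p>The candidate selection rule can be sketched as follows. The function name, the toy identifiers and the decision to draw random fillers only from entries with a different identifier are assumptions; the list scans are also far too slow for a vocab with millions of entries and are only meant to show the logic.</p>
        <preformat><![CDATA[
```python
import random


def select_training_candidates(ranked, gold_id, vocab, n_neg=10, rng=random):
    """Pick one positive and n_neg negatives from retrieved candidates.

    ranked: list of (entity_name, identifier) sorted from most to least similar.
    vocab:  list of (entity_name, identifier) covering the whole vocabulary.
    """
    positives = [c for c in ranked if c[1] == gold_id]
    negatives = [c for c in ranked if c[1] != gold_id][:n_neg]
    if positives:
        positive = positives[-1]  # the last (lowest-ranked) retrieved positive
    else:
        # No positive retrieved: sample one vocab entry with the correct identifier.
        positive = rng.choice([c for c in vocab if c[1] == gold_id])
    # Pad with random vocab entries when too few negatives were retrieved.
    pool = [c for c in vocab if c[1] != gold_id and c not in negatives]
    while len(negatives) < n_neg and pool:
        negatives.append(pool.pop(rng.randrange(len(pool))))
    return positive, negatives


# Toy vocabulary with hypothetical identifiers.
vocab = [("a", "ID1"), ("b", "ID2"), ("c", "ID1"), ("d", "ID3"), ("e", "ID4")]
ranked = [("a", "ID1"), ("b", "ID2"), ("c", "ID1")]
positive, negatives = select_training_candidates(
    ranked, "ID1", vocab, n_neg=2, rng=random.Random(0)
)
```
]]></preformat>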
        <p>During inference, we used faiss to match the output embeddings of the model against the fixed vocab
embeddings, and used the identifiers of the matched vocab entities as output. Among all
retrieved identifiers, we selected the top 5 distinct identifiers as the final prediction.</p>
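        <p>Because several vocab entities can share one identifier, the retrieved hits have to be deduplicated before the top 5 are taken. A minimal sketch of that step, with hypothetical names and identifiers, is:</p>
        <preformat><![CDATA[
```python
def top_k_identifiers(ranked_hits, k=5):
    """Keep the first k distinct identifiers from hits sorted by similarity."""
    seen = []
    for _name, identifier in ranked_hits:
        if identifier not in seen:
            seen.append(identifier)
            if len(seen) == k:
                break
    return seen


# Hypothetical retrieval result: two vocab entries share the identifier ID1.
hits = [("a", "ID1"), ("b", "ID2"), ("c", "ID1"), ("d", "ID3")]
prediction = top_k_identifiers(hits)
```
]]></preformat>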
        <p>We observed that some of the entity identifiers in the training dataset do not appear in the original
vocab, so before training, we added the entities in the training and
development datasets, together with their identifiers, to the vocab to enrich it (removing duplicates with the same entity name and
identifier). We then trained the final version of our model on both the training and development datasets.</p>
        <p>We applied our method to all three tracks. Since the BERGAMOT-multilingual-GAT model is
multilingual, it can be used for all three tracks. For the Russian track, a multilingual model is required
since some entities have to be linked to the English vocab. For the English track, we attempted to train
the English-only gebert model with only English data and vocab, but the result was not as good as that of the
multilingual model, so we submitted the prediction of the multilingual model as our final prediction.</p>
      </sec>
      <sec id="sec-3-5">
        <title>4.3. Results and Discussion</title>
        <p>The official test results for the three tracks are shown in Table 5, Table 6 and Table 7, respectively. Our
method achieved promising results, ranking first or second in all three tracks. Although we
used the same BERT model as the baseline, our fine-tuned model surpassed the baseline model by a
large margin (the Top-1 accuracy of the baseline model on the development dataset is 0.53), suggesting
that adapting the model to the characteristics of the task dataset is important.</p>
        <p>The model performs better on Russian data than on English data. This may be because there is more Russian
data in the training dataset. For the same reason, we observed that our fine-tuned bilingual model
linked almost all entities to Russian vocab entities (if available). This could also be the reason why our
English-only gebert model did not perform as well.</p>
        <p>As shown in the results, our model tends to have high Top-5 accuracy but relatively low Top-1 accuracy.
This may be closely linked to our contrastive training process: we selected the last positive entity
as our positive example instead of the first one, thus encouraging the model to form a more robust
representation. However, this may also reduce Top-1 accuracy.</p>
        <p>Adding the entities of the training dataset to the vocab improved our Top-1 accuracy by about 0.02,
because some identifiers that were not in the original vocab can now be predicted. Also, some entities
in the training dataset may reappear in the test dataset, so the prediction might be easier for these
entities. A drawback of this trick is that the same entity may have different UMLS identifiers
in different contexts, so directly adding the entity and its annotated identifier to the vocab may not be
exactly accurate and may sometimes cause wrong mappings.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. GutBrainIE</title>
      <sec id="sec-4-1">
        <title>5.1. Task description</title>
        <p>The GutBrainIE task of BioASQ 2025 focuses on extracting structured information from biomedical
abstracts related to the gut microbiota and its connections with Parkinson’s disease and mental health.
It consists of two subtasks: an NER subtask and an RE subtask. The RE subtask is further split into three
tasks: Binary Relation Extraction (BT-RE), Ternary Tag-Based Relation Extraction (TT-RE) and Ternary
Mention-Based Relation Extraction (TM-RE). The first task only requires participants to find all pairs of entity
classes that have a relation between them in a document, the second task also requires the exact
relation type, and the third task additionally requires the extraction of the entities that are involved in each
relation. All tasks are evaluated using micro and macro precision, recall and F1-score.</p>
        <p>Four dataset collections are provided in the training data of this task. The Platinum collection is manually
annotated by 7 experts and further validated by experts from another university; the Gold collection
is also manually annotated by experts but has not been validated; the Silver collection is manually
annotated by about 40 students trained by experts; and the Bronze collection is automatically generated
by baseline algorithms. There is also a development set annotated by experts. The dataset statistics
are shown in Table 8. Each dataset provides the entity spans as well as all relations between them in
a document. In total, there are 13 entity types in the dataset, and 25 relation types are defined between
them.</p>
        <p>This task is challenging for a few reasons. For the NER subtask, there are many entity types, and the
number of annotations per type is not balanced. For example, in the Platinum training dataset, there
are 1232 DDF entities, but only 20 food entities. This imbalance makes the recognition of food entities difficult. For
the RE subtask, the two entities in a relation are not necessarily in the same
sentence and can be very far apart, which makes RE difficult because the relation may not be explicitly
stated in these cases. Also, the RE subtask is based on the entities recognized in the NER subtask, so a
wrong NER prediction results in a wrong RE prediction as well.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Method</title>
        <sec id="sec-4-2-1">
          <title>5.2.1. NER subtask</title>
          <p>For the NER subtask, we used a sequence labeling scheme similar to the one we used for ELCardioCC. The
main difference is that we applied a multi-task scheme to recognize all entity types using a single
BERT model with a separate classifier for each entity type. We trained all classifiers and the BERT
model together using cross-entropy loss (Equation 1 in Section 3.2.1). To deal with the imbalance
between different entity types, we applied a simple re-weighting to increase the loss of entity types
that do not have enough training data:
<disp-formula id="eq-4"><label>(4)</label><tex-math><![CDATA[\mathcal{L} = \sum_{c=1}^{C} \frac{N}{N_{c}} \mathcal{L}_{c}]]></tex-math></disp-formula></p>
          <p>where <italic>C</italic> is the number of classes, <italic>N</italic> is the total number of entities, <italic>N<sub>c</sub></italic> is the number of entities of
class <italic>c</italic>, and <italic>L<sub>c</sub></italic> is the total loss of entities of class <italic>c</italic>.</p>
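          <p>To show the effect of the re-weighting, the sketch below weights each class loss by the total entity count divided by that class's entity count, which is our reading of the scheme described above. The two counts are the Platinum figures quoted in Section 5.1; the per-class loss values are invented, and only two of the 13 entity types are shown.</p>
          <preformat><![CDATA[
```python
def reweighted_loss(class_losses, class_counts):
    """Weight each class loss by N / N_c, where N is the total entity count."""
    total = sum(class_counts.values())
    return sum(total / class_counts[c] * loss for c, loss in class_losses.items())


counts = {"DDF": 1232, "food": 20}   # entity counts quoted in Section 5.1
losses = {"DDF": 0.8, "food": 0.8}   # invented per-class losses
weights = {c: sum(counts.values()) / n for c, n in counts.items()}
loss = reweighted_loss(losses, counts)
```
]]></preformat>
          <p>With these counts the food class receives a weight of 62.6 while the DDF class receives a weight close to 1, so errors on the rare class dominate the total loss.</p>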
          <p>We used the AdamW optimizer with a learning rate of 3e-5 and a weight decay of 0.1 for training, and we
trained on the Silver, Gold and Platinum training sets for 25 epochs. For the final submitted version of
the model, we also added the development set to the training data.</p>
          <p>
            We noticed that ensembling different BERT models by simply merging their outputs through voting
can slightly boost model performance. Specifically, after separate inference with each model, we used
the average predicted probability of the tokens in a span as the probability of that entity span, and filtered the
predicted entity spans based on their total probability across all models. We selected three BERT models
for the ensemble: BioLinkBERT_base [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], BiomedBERT_base_uncased_abstract_fulltext [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] and
xlm_roberta_large_english_clinical [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The NER results for each separate model and the
ensembled model on the development dataset are shown in Table 9.
          </p>
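          <p>One possible reading of this ensembling procedure is sketched below: each model scores a span by the mean probability of its tokens, the scores are summed over models, and spans whose total falls below a threshold are discarded. The threshold value, the data layout and the example spans are assumptions; the paper does not specify them.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict


def span_probability(token_probs):
    """Score a span by the mean predicted probability of its tokens."""
    return sum(token_probs) / len(token_probs)


def ensemble_spans(model_outputs, threshold):
    """Sum span scores over models and keep spans whose total reaches threshold.

    model_outputs: one dict per model mapping (start, end, type) to token probabilities.
    """
    totals = defaultdict(float)
    for spans in model_outputs:
        for span, token_probs in spans.items():
            totals[span] += span_probability(token_probs)
    return sorted(span for span, total in totals.items() if total >= threshold)


# Invented predictions from three models; tokenizers may split a span differently.
model_a = {(0, 14, "DDF"): [0.9, 0.8], (20, 28, "food"): [0.6]}
model_b = {(0, 14, "DDF"): [0.7, 0.9], (30, 35, "food"): [0.4]}
model_c = {(0, 14, "DDF"): [0.95], (20, 28, "food"): [0.7]}
kept = ensemble_spans([model_a, model_b, model_c], threshold=1.2)
```
]]></preformat>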
        </sec>
        <sec id="sec-4-2-2">
          <title>5.2.2. RE subtask</title>
          <p>For the RE subtask, we applied a simple classification scheme. We first enumerated all pairs of
entities recognized by our NER model and selected those pairs that could hold one of the 25
relations defined in the competition, then used BERT to do a 0/1 classification for each entity pair and
relation type. We also noticed that more than 80% of relations in the training dataset are between
entities that are not far from each other (with a distance of less than 200 characters), so we only selected close
entity pairs as relation candidates. This reduces the recall of the model, but makes the subsequent
classification much easier.</p>
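          <p>The candidate filtering can be sketched as follows. How the distance between two entities is measured is not stated in the paper; the sketch assumes the number of characters between the end of the earlier entity and the start of the later one. The allowed type pair in the example is illustrative and not taken from the task definition.</p>
          <preformat><![CDATA[
```python
from itertools import combinations


def candidate_pairs(entities, allowed_type_pairs, max_distance=200):
    """Return ordered entity pairs that are close and whose types may be related.

    entities: list of (start, end, type) character spans in one document.
    allowed_type_pairs: set of (subject_type, object_type) with a defined relation.
    """
    pairs = []
    for a, b in combinations(sorted(entities), 2):
        gap = b[0] - a[1]  # characters between the end of a and the start of b
        if gap >= max_distance:
            continue
        # Try both directions, since relations are directed.
        for subj, obj in ((a, b), (b, a)):
            if (subj[2], obj[2]) in allowed_type_pairs:
                pairs.append((subj, obj))
    return pairs


entities = [(0, 10, "food"), (50, 60, "DDF"), (400, 410, "DDF")]
allowed = {("food", "DDF")}  # illustrative type pair
pairs = candidate_pairs(entities, allowed)
```
]]></preformat>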
          <p>For the classification step, we used BioLinkBERT as our backbone model. We used the following
input format: [CLS] (EntityA) (Relation) (EntityB) [SEP] (text) [SEP] (EntityA) [SEP] (text) [SEP]
(EntityB) [SEP] (text) [SEP], which provides the BERT model with the relation type as well as the
entities along with their context. The context spans from the sentence containing the first entity to the
sentence containing the second entity. Since we only kept close entity pairs, the entire input
fits within the context length (512 tokens) of BioLinkBERT. We then used a classifier with two MLP
layers to map the final-layer embedding of the [CLS] token to the probability that the relation is valid.</p>
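          <p>The input format can be assembled as a plain string, as in this sketch. In practice a tokenizer would normally insert the special tokens itself; the explicit string, the function name and the example relation label are assumptions made for illustration.</p>
          <preformat><![CDATA[
```python
def build_re_input(entity_a, relation, entity_b, before, between, after):
    """Assemble the classifier input string in the format described above."""
    head = f"[CLS] {entity_a} {relation} {entity_b}"
    segments = [head, before, entity_a, between, entity_b, after]
    return " [SEP] ".join(segments) + " [SEP]"


example = build_re_input(
    "gut microbiota", "influence", "Parkinson's disease",
    "Changes in the", "may contribute to", "progression.",
)
```
]]></preformat>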
          <p>During training, we used AdamW with a learning rate of 2e-5 for BERT and 5e-5 for the classifier,
and a weight decay of 0.1. We trained the model for 5 epochs using cross-entropy loss (Equation 1
in Section 3.2.1). During inference, we first applied the fine-tuned BioLinkBERT model to the selected
entity pairs to obtain Ternary Mention-Based RE results, then combined all relations in each article to
obtain Binary RE and Ternary Tag-Based RE results.</p>
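          <p>The derivation of the two tag-based outputs from mention-level predictions can be sketched as a projection onto entity types followed by deduplication. The tuple layout and the example relation labels are hypothetical.</p>
          <preformat><![CDATA[
```python
def aggregate_relations(mention_relations):
    """Derive tag-level outputs from mention-level relation predictions.

    mention_relations: list of (subj_text, subj_type, predicate, obj_text, obj_type).
    """
    ternary_tag = sorted(
        {(s_type, pred, o_type) for _, s_type, pred, _, o_type in mention_relations}
    )
    binary_tag = sorted({(s_type, o_type) for _, s_type, _, _, o_type in mention_relations})
    return binary_tag, ternary_tag


# Invented mention-level predictions for one article.
mentions = [
    ("apple", "food", "influence", "anxiety", "DDF"),
    ("fiber", "food", "influence", "depression", "DDF"),
    ("fiber", "food", "is linked to", "depression", "DDF"),
]
binary, ternary = aggregate_relations(mentions)
```
]]></preformat>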
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>5.3. Results and Discussion</title>
        <p>The official results for the GutBrainIE task are shown in Table 10, Table 11, Table 12 and Table 13, where
the performance of the best run of each team is reported. Our team did not achieve the best results, yet
our NER and Ternary Mention-Based RE results surpassed the baseline, which is quite strong given
that it is also trained on the training datasets. Our results on the other two subtasks
are not satisfactory, because we did not optimize our methods for them.</p>
        <p>Overall, model ensembling seems to be an effective method, and many top-performing teams applied
it to improve model performance. Careful selection of the BERT models in the ensemble
may further enhance the results. Also, instead of combining the final entity spans, one can take the
embeddings generated by each model as input to the classifier, provided that all models share the same tokenizer.
This method may lead to a more effective ensemble of BERT models.</p>
        <p>In the RE subtask, our method was simple and effective, though it still has much room for
improvement. Ignoring entity pairs that are far apart degraded our model's performance. However, accurately
identifying relations between distant entities requires an effective method to select related information from the
context and ignore distracting information, as in the ATLOP baseline. Future work may explore more
effective approaches to RE in long documents.</p>
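        <p>The trade-off of distance-based pair selection can be seen in a small sketch. It assumes a character-gap threshold as the selection criterion. The threshold, the entity fields, and the example offsets are hypothetical and are not the values used in our system.</p>

```python
from itertools import combinations


def candidate_pairs(entities, max_gap):
    """Return entity pairs whose character gap is at most max_gap."""
    ordered = sorted(entities, key=lambda e: e["start"])
    pairs = []
    for first, second in combinations(ordered, 2):
        # Overlapping or nested mentions get a gap of zero.
        gap = max(0, second["start"] - first["end"])
        if max_gap >= gap:
            pairs.append((first["text"], second["text"]))
    return pairs


# Toy entity mentions with character offsets (made-up values).
entities = [
    {"text": "gut microbiota", "start": 0, "end": 14},
    {"text": "depression", "start": 40, "end": 50},
    {"text": "serotonin", "start": 400, "end": 409},
]
print(candidate_pairs(entities, max_gap=100))
print(len(candidate_pairs(entities, max_gap=1000)))
```

        <p>Any relation whose entities lie farther apart than the threshold is never scored, so the threshold caps the achievable recall regardless of how accurate the classifier is.</p>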
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>In this paper, we describe our methods and results for three tasks in the BioASQ lab at CLEF 2025:
ELCardioCC, BioNNE-L, and GutBrainIE. We utilized pretrained BERT models and designed a specific
technique for each task. Our methods achieved good results, ranking first in ELCardioCC
and the multilingual subtask of BioNNE-L, and surpassing the baseline on the NER and TM-RE
subtasks of GutBrainIE. These results further show the effectiveness of BERT embeddings for biomedical
text mining tasks.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We thank the organizers from Aristotle University of Thessaloniki for hosting
the ELCardioCC task, the organizers from HSE University, Lomonosov Moscow State University, and
Kazan Federal University for hosting the BioNNE-L task, and the organizers from the University of Padua for
hosting the GutBrainIE task. This work has been supported by the National Natural Science Foundation
of China (Grant No. 62272105).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
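      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>[4] L. Lange, H. Adel, J. Strötgen, D. Klakow, Clin-x: pre-trained language models and a study
on cross-task transfer for concept extraction in the clinical domain, Bioinformatics 38 (2022)
3267–3274.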
[5] A. Nentidis, G. Katsimpras, A. Krithara, M. Krallinger, M. Rodríguez-Ortega, E. Rodriguez-López,
N. Loukachevitch, A. Sakhovskiy, E. Tutubalina, D. Dimitriadis, G. Tsoumakas, G. Giannakoulas,
A. Bekiaridou, A. Samaras, G. M. Di Nunzio, N. Ferro, S. Marchesin, M. Martinelli, G. Silvello,
G. Paliouras, Overview of BioASQ 2025: The thirteenth BioASQ challenge on large-scale biomedical
semantic indexing and question answering, volume TBA of Lecture Notes in Computer Science,
Springer, 2025, p. TBA.
[6] D. Dimitriadis, V. Patsiou, E. Stoikopoulou, A. Toumpas, A. Kipouros, D. Papadopoulos, A.
Bekiaridou, K. Barmpagiannos, A. Vasilopoulou, A. Barmpagiannos, A. Samaras, G. Giannakoulas,
G. Tsoumakas, Overview of ElCardioCC Task on Clinical Coding in Cardiology at BioASQ
2025, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[7] A. Sakhovskiy, N. Loukachevitch, E. Tutubalina, Overview of the BioASQ BioNNE-L Task on
Biomedical Nested Entity Linking in CLEF 2025, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
CLEF 2025 Working Notes, 2025.
[8] M. Martinelli, G. Silvello, V. Bonato, G. M. Di Nunzio, N. Ferro, O. Irrera, S. Marchesin, L. Menotti,
F. Vezzani, Overview of GutBrainIE@CLEF 2025: Gut-Brain Interplay Information Extraction, in:
G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[9] M. V. Koroteev, Bert: a review of applications in natural language processing and understanding,
arXiv preprint arXiv:2103.11943 (2021).
[10] M. Sung, M. Jeong, Y. Choi, D. Kim, J. Lee, J. Kang, Bern2: an advanced neural biomedical named
entity recognition and normalization tool, Bioinformatics 38 (2022) 4837–4839.
[11] L. Luo, C.-H. Wei, P.-T. Lai, R. Leaman, Q. Chen, Z. Lu, Aioner: all-in-one scheme-based biomedical
named entity recognition using deep learning, Bioinformatics 39 (2023) btad310.
[12] M. Sänger, S. Garda, X. D. Wang, L. Weber-Genzel, P. Droop, B. Fuchs, A. Akbik, U. Leser, Hunflair2
in a cross-corpus evaluation of biomedical named entity recognition and normalization tools,
Bioinformatics 40 (2024) btae564.
[13] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini,
H. Jégou, The faiss library, arXiv preprint arXiv:2401.08281 (2024).
[14] F. Liu, E. Shareghi, Z. Meng, M. Basaldella, N. Collier, Self-alignment pretraining for biomedical
entity representations, arXiv preprint arXiv:2010.11784 (2020).
[15] A. Sakhovskiy, N. Semenova, A. Kadurin, E. Tutubalina, Graph-enriched biomedical entity
representation transformer, in: International Conference of the Cross-Language Evaluation Forum for
European Languages, Springer, 2023, pp. 109–120.
[16] A. Sakhovskiy, N. Semenova, A. Kadurin, E. Tutubalina, Biomedical entity representation with
graph-augmented multi-objective transformer, in: Findings of the Association for Computational
Linguistics: NAACL 2024, 2024, pp. 4626–4643.
[17] W. Zhou, K. Huang, T. Ma, J. Huang, Document-level relation extraction with adaptive thresholding
and localized context pooling, in: Proceedings of the AAAI conference on artificial intelligence,
volume 35, 2021, pp. 14612–14620.
[18] P.-T. Lai, C.-H. Wei, L. Luo, Q. Chen, Z. Lu, Biorex: improving biomedical relation extraction by
leveraging heterogeneous datasets, Journal of Biomedical Informatics 146 (2023) 104487.
[19] P.-T. Lai, C.-H. Wei, S. Tian, R. Leaman, Z. Lu, Enhancing biomedical relation extraction with
directionality, arXiv preprint arXiv:2501.14079 (2025).
[20] J. Koutsikakis, I. Chalkidis, P. Malakasiotis, I. Androutsopoulos, Greek-bert: The greeks visiting
sesame street, in: 11th Hellenic Conference on Artificial Intelligence, SETN 2020, Association
for Computing Machinery, New York, NY, USA, 2020, pp. 110–117. URL: https://doi.org/10.1145/3411408.3411440.
[21] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101
(2017).
[22] O. Bodenreider, The unified medical language system (umls): integrating biomedical terminology,
Nucleic acids research 32 (2004) D267–D270.
[23] N. Loukachevitch, A. Sakhovskiy, E. Tutubalina, Biomedical concept normalization over nested
entities with partial umls terminology in russian, in: Proceedings of the 2024 Joint International
Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING
2024), 2024, pp. 2383–2389.
[24] V. Davydova, N. Loukachevitch, E. Tutubalina, Overview of bionne task on biomedical nested
named entity recognition at bioasq 2024, CLEF Working Notes (2024).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in:
          <source>Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tinn</surname>
          </string-name>
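          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,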
          <string-name>
            <given-names>N.</given-names>
            <surname>Usuyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <article-title>Domain-specific language model pretraining for biomedical natural language processing</article-title>
          ,
          <source>ACM Transactions on Computing for Healthcare (HEALTH) 3</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Linkbert: Pretraining language models with document links</article-title>
          ,
          <source>arXiv preprint arXiv:2203.15827</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Strötgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klakow</surname>
          </string-name>
          ,
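          <article-title>Clin-x: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain</article-title>
          ,
          <source>Bioinformatics 38</source>
          (
          <year>2022</year>
          )
          <fpage>3267</fpage>
          -
          <lpage>3274</lpage>
          .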
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>