<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>CLEF 2025: Gut-Brain Interplay Information Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Martinelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmaria Silvello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vanessa Bonato</string-name>
          <email>vanessa.bonato@unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <email>nicola.ferro@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ornella Irrera</string-name>
          <email>ornella.irrera@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <email>stefano.marchesin@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Menotti</string-name>
          <email>laura.menotti@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Vezzani</string-name>
          <email>federica.vezzani@unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Linguistic and Literary Studies, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Recent studies link the gut microbiota to mental health conditions and to neurodegenerative diseases such as Parkinson's and Alzheimer's. However, the rapid pace at which this research field is evolving presents a significant challenge for clinicians and researchers who have to keep pace with an ever-expanding volume of biomedical literature. In this context, automatic tools for extracting and structuring information from scientific texts are becoming essential to support the understanding of the gut-brain axis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
      <p>
        In response to this challenge, the GutBrainIE-2025 task, part of the BioASQ Lab [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] and situated in the context of the EU-funded project HEREDITARY,1 introduces a Natural Language Processing (NLP)
challenge focused on extracting structured information from PubMed abstracts related to the gut–brain
axis. The task aims to foster the development of robust and effective Information Extraction (IE) systems
that support experts in analyzing the scientific literature, thereby contributing to biomedical knowledge
discovery and, in the long term, informed clinical decision-making.
      </p>
      <p>In its first edition, GutBrainIE proposes four subtasks of increasing complexity:
• Subtask 6.1 - Named Entity Recognition (NER): participants are asked to identify and classify
specific text spans (entity mentions) into one of the 13 predefined categories (e.g., bacteria,
chemical, microbiota).
• Subtask 6.2.1 - Binary Tag-based Relation Extraction (BT-RE): participants are provided
with a set of predefined relation types, each defined by a combination of compatible head and
tail entities (e.g., Chemical → Microbiome via Impact or Produced by), and are asked to identify
which entities are in relation within a document, without specifying the exact predicate or entity
mentions involved.
• Subtask 6.2.2 - Ternary Tag-based Relation Extraction (TT-RE): this subtask extends BT-RE
by requiring participants to predict the specific relation predicate connecting each head-tail entity
pair.
• Subtask 6.2.3 - Ternary Mention-based Relation Extraction (TM-RE): this is the most
challenging subtask, requiring participants to identify the exact entity mentions involved in a relation and
to assign the correct relation predicate.</p>
      <p>All subtasks target PubMed abstracts, leveraging a corpus of biomedical documents related to the
gut-brain axis. Each document contains a title and abstract, both annotated with entity mentions and
relations. Specifically, the GutBrainIE-2025 dataset consists of over 1500 annotated documents, split
into Training, Development, and Test sets. A noteworthy feature of the dataset is its tiered annotation
quality, organized as follows:
• Platinum Annotations: highest-quality annotations, expert-curated and internally reviewed;
• Gold Annotations: high-quality annotations and expert-curated;
• Silver Annotations: mid-quality annotations, created by trained students under expert
supervision;
• Bronze Annotations: automatically generated annotations with no manual correction.</p>
      <p>In particular, the Development and Test sets contain only expert annotations (Platinum and
Gold Standard Annotations).</p>
      <p>Submissions are evaluated using standard macro- and micro-averaged Precision, Recall, and F1
metrics. Results are compared against a baseline system shared with participants at the beginning of
the challenge to serve as a reference.</p>
      <p>This paper provides a comprehensive overview of the GutBrainIE-2025 task. Section 2 presents the
subtasks and their structure; Section 3 introduces the dataset structure and annotation schema; Section
4 presents participating teams and evaluation procedures; Section 5 reports the results and leaderboards
across subtasks; Section 6 describes the systems, models, and approaches employed by participating
teams; finally, Section 7 concludes the paper and proposes future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Overview</title>
      <p>In its first edition, GutBrainIE-2025 featured four subtasks:
1. Named Entity Recognition (NER).
2. Binary Tag-based Relation Extraction (BT-RE).
3. Ternary Tag-based Relation Extraction (TT-RE).
4. Ternary Mention-based Relation Extraction (TM-RE).</p>
      <p>Participants were free to develop their systems without constraints on architecture, training
methodology, or external resources, aiming to achieve the best possible performance. Overall, 17 teams submitted
a total of 395 runs. In the remainder of this section, we describe each task in detail.</p>
      <sec id="sec-2-1">
        <title>2.1. Subtask 1: Named Entity Recognition (NER)</title>
        <p>The NER subtask focuses on classifying entity mentions into one of the 13 predefined categories.
Participants were provided with PubMed abstracts related to the gut-brain axis and asked to identify
specific text spans corresponding to one of the 13 categories defined in Table 1.</p>
        <p>Each entity mention consists of the following elements:
• Location, indicating whether the entity mention appears in the title or in the abstract.
• Start and end indices, denoting character offsets of the entity mention within the text.
• Text span, representing the actual string of text corresponding to the mention.
• Label, specifying the entity label assigned to the mention.</p>
        <p>A predicted entity mention is considered correct only if all its fields exactly match an entry in the
ground truth.</p>
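        <p>For illustration, this exact-match criterion can be sketched in Python as follows (an illustrative sketch using the field names of the run format, not the official evaluation code):</p>
        <preformat>
```python
# Exact-match criterion for NER (illustrative sketch; field names follow
# the run format, this is not the official evaluation code).
def to_tuple(mention):
    """Normalize an entity mention dict into a hashable tuple of all fields."""
    return (mention["location"], mention["start_idx"], mention["end_idx"],
            mention["text_span"], mention["label"])

def exact_match_tp(predicted, ground_truth):
    """Count predictions whose fields all exactly match a gold mention."""
    gold = {to_tuple(m) for m in ground_truth}
    return sum(1 for m in predicted if to_tuple(m) in gold)

gold = [{"location": "title", "start_idx": 75, "end_idx": 82,
         "text_span": "patients", "label": "human"}]
pred = [{"location": "title", "start_idx": 75, "end_idx": 82,
         "text_span": "patients", "label": "human"},
        {"location": "title", "start_idx": 75, "end_idx": 82,
         "text_span": "patients", "label": "animal"}]  # wrong label: not a TP
print(exact_match_tp(pred, gold))  # 1
```
        </preformat>
        <p>Any mismatch in a single field (e.g., a correct span with a wrong label) yields no credit for that prediction.</p>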
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Subtask 2: Binary Tag-based Relation Extraction (BT-RE) Subtask</title>
        <p>The BT-RE subtask is one of the three GutBrainIE-2025 subtasks dealing with RE. In this subtask,
participants have to determine which pairs of entities are in relation within a document, considering
the set of relations defined in Table 2.</p>
        <p>Within BT-RE, participants are not required to predict a relation predicate. Therefore, a predicted
relation for this subtask will be a pair (subjectEntityLabel; objectEntityLabel), where entity labels are
taken from the ones reported in Table 1.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Subtask 3: Ternary Tag-based Relation Extraction (TT-RE) Subtask</title>
        <p>The TT-RE subtask complements BT-RE by requiring participants to predict, along with the pair of
entities in relation, the predicate of the relation holding among them. As in BT-RE, the set of relations
to be considered is reported in Table 2.</p>
        <p>Predicted relations for TT-RE will be triples (subjectEntityLabel; relationPredicate; objectEntityLabel).</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Subtask 4: Ternary Mention-based Relation Extraction (TM-RE) Subtask</title>
        <p>
          The TM-RE subtask is, among the three RE subtasks, the one most aligned with the standard NLP task
of Relation Extraction [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Here, participants are required to identify the entity mentions involved in a
relation, predict their entity labels, and specify the relation predicate that links them.
        </p>
        <p>Predicted relations for TM-RE will be tuples (subjectEntityTextSpan; subjectEntityLabel;
relationPredicate; objectEntityTextSpan; objectEntityLabel).</p>
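        <p>The three RE subtasks form a strict hierarchy: each TM-RE tuple determines a TT-RE triple by dropping the text spans, and each TT-RE triple determines a BT-RE pair by additionally dropping the predicate. A minimal Python sketch of this derivation (illustrative field names, not official tooling):</p>
        <preformat>
```python
# Derive TT-RE triples and BT-RE pairs from TM-RE tuples (illustrative sketch).
def to_ternary_tags(mention_relations):
    """Drop text spans, keep unique (subject_label, predicate, object_label)."""
    return sorted({(r["subject_label"], r["predicate"], r["object_label"])
                   for r in mention_relations})

def to_binary_tags(mention_relations):
    """Drop the predicate as well, keep unique (subject_label, object_label)."""
    return sorted({(r["subject_label"], r["object_label"])
                   for r in mention_relations})

tm = [{"subject_text_span": "intestinal microbiome", "subject_label": "microbiome",
       "predicate": "located in",
       "object_text_span": "patients", "object_label": "human"}]
print(to_ternary_tags(tm))  # [('microbiome', 'located in', 'human')]
print(to_binary_tags(tm))   # [('microbiome', 'human')]
```
        </preformat>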
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The released dataset for GutBrainIE-2025 consists of titles and abstracts of biomedical articles retrieved
from PubMed, focusing on the gut-brain axis and its implications in neurological and mental health.
Articles were manually annotated, either by experts or trained students,2 for entity mentions (i.e., text
spans mapped to one of the categories defined in Table 1) and relations (i.e., associations between
entities defined in Table 2).
2The students we are referring to are enrolled in the Master Degree in Modern Languages for International Communication
and Cooperation of the University of Padua. They received specific training on medical terminology during the course of
Translation-Oriented Terminography.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset Creation</title>
        <p>To build the GutBrainIE-2025 dataset, we first retrieved documents from PubMed using two separate
queries: "gut microbiota" AND "Parkinson" and "gut microbiota" AND "Mental Health". The
first retrieval was performed on 09/05/2024 and yielded 828 documents. A second retrieval using the
same queries was conducted on 31/10/2024, resulting in 834 additional documents not included in the
first batch. We then filtered out documents from the years 2014–2019 (for the “Mental Health” query)
and 2013–2020 (for the “Parkinson” query) due to the limited volume of relevant literature in those
periods, discarding 16 documents in total. The final collection includes 1,647 documents.</p>
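        <p>Such retrievals can be reproduced through the NCBI E-utilities API; the following sketch only constructs the esearch query URL for one of the two queries (the retmax value is illustrative):</p>
        <preformat>
```python
from urllib.parse import urlencode

# Build an NCBI E-utilities esearch URL for one of the two PubMed queries
# used to assemble the corpus (illustrative; retmax chosen arbitrarily).
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {"db": "pubmed",
          "term": '"gut microbiota" AND "Parkinson"',
          "retmax": 1000}
url = BASE + "?" + urlencode(params)
print(url)
```
        </preformat>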
        <p>
          Before starting manual annotation, documents were pre-annotated for NER using GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] in a zero-shot setting, aiming to speed up and facilitate the annotation process. We decided not to
pre-annotate documents for RE since, in a zero-shot setting, the likelihood of introducing noise was
significantly higher than that of adding valid relations. Excessive noise in pre-annotations could lead to
biases among annotators, ultimately impacting the quality of the final annotated dataset [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>The 13 entity categories of GutBrainIE-2025, with their ontology identifiers and explanations.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Category</th><th>Identifier</th><th>Explanation</th></tr>
            </thead>
            <tbody>
              <tr><td>Anatomical Location</td><td/><td>Named locations of or within the body.</td></tr>
              <tr><td>Animal</td><td>NCIT_C14182</td><td>A non-human living organism that has membranous cell walls, requires oxygen and organic foods, and is capable of voluntary movement, as distinguished from a plant or mineral.</td></tr>
              <tr><td>Biomedical Technique</td><td>NCIT_C15188</td><td>Research concerned with the application of biological and physiological principles to clinical medicine.</td></tr>
              <tr><td>Bacteria</td><td>NCBITaxon_2</td><td>One of the three domains of life (the others being Eukarya and Archaea), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal.</td></tr>
              <tr><td>Chemical</td><td>CHEBI_59999</td><td>A chemical substance is a portion of matter of constant composition, composed of molecular entities of the same type or of different types. This category also includes metabolites, which in biochemistry are the intermediate or end product of metabolism, and neurotransmitters, which are endogenous compounds used to transmit information across the synapses.</td></tr>
              <tr><td>Dietary Supplement</td><td>MESH_68019587</td><td>Products in capsule, tablet or liquid form that provide dietary ingredients, and that are intended to be taken by mouth to increase the intake of nutrients. Dietary supplements can include macronutrients, such as proteins, carbohydrates, and fats; and/or micronutrients, such as vitamins, minerals, and phytochemicals.</td></tr>
              <tr><td>Disease, Disorder, or Finding (DDF)</td><td>NCIT_C7057</td><td>A condition that is relevant to human neoplasms and non-neoplastic disorders. This includes observations, test results, history and other concepts relevant to the characterization of human pathologic conditions.</td></tr>
              <tr><td>Drug</td><td>CHEBI_23888</td><td>Any substance which when absorbed into a living organism may modify one or more of its functions. The term is generally accepted for a substance taken for a therapeutic purpose, but is also commonly used for abused substances.</td></tr>
              <tr><td>Food</td><td>NCIT_C1949</td><td>A substance consumed by humans and animals for nutritional purposes.</td></tr>
              <tr><td>Gene</td><td>SNOMEDCT_67261001</td><td>A functional unit of heredity which occupies a specific position on a particular chromosome and serves as the template for a product that contributes to a phenotype or a biological function.</td></tr>
              <tr><td>Human</td><td>NCBITaxon_9606</td><td>Members of the species Homo sapiens.</td></tr>
              <tr><td>Microbiome</td><td>OHMI_0000003</td><td>This term refers to the entire habitat, including the microorganisms (bacteria, archaea, lower and higher eukaryotes, and viruses), their genomes (i.e., genes), and the surrounding environmental conditions.</td></tr>
              <tr><td>Statistical Technique</td><td>NCIT_C19044</td><td>A method of calculating, analyzing, or representing statistical data.</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Articles were then distributed between expert and student annotators. In total, 7 experts and 26
students annotated documents. Documents from the first retrieval were annotated exclusively by
experts, while those from the second retrieval were assigned to students.</p>
        <p>The annotation process was conducted in two phases, each followed by iterative refinement. At
the end of each phase, expert annotators conducted a meeting to review progress, discuss critical
challenges noted during the annotation phase, and make any necessary adjustments to the guidelines.
These guidelines, publicly available at https://hereditary.dei.unipd.it/challenges/gutbrainie/2025/files/
GutBrainIE_2025_Annotation_Guidelines.pdf, were also shared with task participants so they could
better tailor and tune their systems.</p>
        <p>
          Once manual annotation was completed, we fine-tuned GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for NER and ATLOP [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for RE using the annotated entities and relations and used them to annotate the remaining unannotated
documents from both batches of the original retrieval. More detailed information about the fine-tuning
of these models can be found in Section 4.4.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset Folds</title>
        <p>The training set is divided into four parts:
1. Platinum Collection: highest-quality annotations, expert-curated and revised internally by a
subgroup of annotators to ensure consistency, uniformity, and alignment with the final annotation
guidelines;
2. Gold Collection: high-quality annotations, expert-curated and produced after the finalization of
the annotation guidelines. No subsequent revision was performed;
3. Silver Collection: mid-quality annotations, created by trained students under expert supervision.
Students were divided into two clusters:
• StudentA, including those with more consistent annotation performance;
• StudentB, including those with less consistent annotation performance;
4. Bronze Collection: automatically generated annotations obtained using fine-tuned GLiNER (for
NER) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and fine-tuned ATLOP (for RE) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. No manual revision was performed on this subset.</p>
        <p>The development and test sets are held-out selections of documents from the gold and platinum
collections, selected to ensure full representativeness and coverage of all entity and relation types.</p>
        <p>3.3. Dataset Format
Annotations are provided in JSON format. Each entry corresponds to a PubMed article, keyed by its
PubMed ID (PMID), and contains the following fields:
• Metadata: article-level information, including:
– title, author, journal, year, abstract;
– annotator_id: one of expert_1–expert_7, student_A, student_B, or distant
(automatically generated). Participants may decide to filter or weight examples differently based on
the annotator.
• Entities: an array of objects, each with:
– start, end: character offsets of the text span associated with the entity mention;
– location: “title” or “abstract”;
– text_span: the actual text span of the mention;
– label: the annotated entity label (such as bacteria, microbiome).
• Relations: an array of objects representing relations, each with:
– subject_start, subject_end, subject_location, subject_text_span, subject_label: the subject entity
mention;
– predicate and the corresponding object_* fields describing the object entity mention.</p>
        <p>3.3.1. Alternative Dataset Formats
For users preferring tabular data, each field above is also provided in both CSV and TSV formats:
• metadata.csv — metadata.tsv
• entities.csv — entities.tsv
• relations.csv — relations.tsv
• binary_tag_relations.csv — binary_tag_relations.tsv
• ternary_tag_relations.csv — ternary_tag_relations.tsv
• ternary_mention_relations.csv — ternary_mention_relations.tsv</p>
        <p>CSV files use the pipe symbol ( |) as a delimiter, while TSV files use the tab character ( \t).</p>
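        <p>As an illustration, the JSON annotations can be loaded and filtered by annotator in a few lines (the in-line example data below is fabricated; field names follow the format described above):</p>
        <preformat>
```python
import json

# Keep only expert-annotated documents from the parsed annotation JSON
# (the example data below is fabricated; field names follow Section 3.3).
def expert_docs(data):
    return {pmid: doc for pmid, doc in data.items()
            if doc["metadata"]["annotator_id"].startswith("expert")}

data = json.loads('''{
  "34870091": {"metadata": {"annotator_id": "expert_1"}, "entities": [], "relations": []},
  "12345678": {"metadata": {"annotator_id": "distant"},  "entities": [], "relations": []}
}''')
print(sorted(expert_docs(data)))  # ['34870091']
```
        </preformat>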
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Participation and Evaluation</title>
      <p>This section provides a concise overview of the teams that participated in GutBrainIE-2025. A
comprehensive description of the submitted systems can be found in Section 6 and in the participants’
individual papers reported in Table 5.</p>
      <p>Teams could participate in any of the four subtasks independently and submit up to 25 runs per
subtask.</p>
      <p>Although 85 teams from 29 different countries registered for the challenge, the final number of
teams submitting at least one run was 17, resulting in 395 submitted runs. Among these, 15 teams
also submitted a participant paper describing their methodologies, approaches, and systems. However,
the discussion presented in Section 6 includes all 17 teams that submitted at least one run. Table 4
summarizes participation across the various subtasks.</p>
      <p>The task began on February 3, 2025, with the release of the training and development sets. The test
set was made available on April 28, and final submissions were due by May 10.</p>
      <sec id="sec-4-1">
        <title>4.1. Guidelines</title>
        <p>Participating teams were required to satisfy the following guidelines:
• Runs should be submitted in the JSON format described below;
• Each team can submit a maximum of 25 runs per subtask.
4.1.1. Subtask 1 (NER) Run Format
Runs must be submitted as a JSON file ( .json) with the following structure:
"34870091": {
  "entities": [
    {
      "start_idx": 75,
      "end_idx": 82,
      "location": "title",
      "text_span": "patients",
      "label": "human"
    },
    {
      "start_idx": 250,
      "end_idx": 270,
      "location": "abstract",
      "text_span": "intestinal microbiome",
      "label": "microbiome"
    }
  ]
}
where:
• The top‐level key (e.g. “34870091”) is the PubMed ID of the document.
• entities is a list of entity objects.
• Each entity object represents a predicted entity and contains:
– start_idx and end_idx: character offsets of the span,
– location: “title” or “abstract”,
– text_span: the actual text,
– label: the entity type.
4.1.2. Subtask 2 (BT-RE) Run Format
Runs must be submitted as a JSON file ( .json) with the following structure:
"34870091": {
  "binary_tag_based_relations": [
    {
      "subject_label": "microbiome",
      "object_label": "human"
    }
  ]
}
where:
• The top‐level key (e.g. “34870091”) is the PubMed ID of the document.
• binary_tag_based_relations is a list of relation objects.
• Each relation object represents a predicted binary tag-based relation and contains:
– subject_label: the entity type of the relation’s subject,
– object_label: the entity type of the relation’s object.
4.1.3. Subtask 3 (TT-RE) Run Format
Submissions must be provided as a JSON file ( .json) with the following structure:
"34870091": {
  "ternary_tag_based_relations": [
    {
      "subject_label": "microbiome",
      "predicate": "located in",
      "object_label": "human"
    }
  ]
}
where:
• The top-level key (e.g. “34870091”) is the PubMed ID of the document.
• ternary_tag_based_relations is a list of relation objects.
• Each relation object represents a predicted ternary tag-based relation and contains:
– subject_label: the entity type of the relation’s subject,
– predicate: the relation type between the subject and object,
– object_label: the entity type of the relation’s object.
4.1.4. Subtask 4 (TM-RE) Run Format
Submissions must be provided as a JSON file ( .json) with the following structure:
"34870091": {
  "ternary_mention_based_relations": [
    {
      "subject_text_span": "intestinal microbiome",
      "subject_label": "microbiome",
      "predicate": "located in",
      "object_text_span": "patients",
      "object_label": "human"
    }
  ]
}
where:
• The top-level key (e.g. “34870091”) is the PubMed ID of the document.
• ternary_mention_based_relations is a list of relation objects.
• Each relation object represents a predicted ternary mention-based relation and contains:
– subject_text_span: the exact character sequence of the subject mention,
– subject_label: the entity type of the subject mention,
– predicate: the relation type between the subject and object,
– object_text_span: the exact character sequence of the object mention,
– object_label: the entity type of the object mention.
4.1.5. Submission Upload
All runs must be submitted as a single ZIP archive named &lt;teamID&gt;_GutBrainIE_2025.zip.
Within this archive, each run has to be placed in its own folder named
&lt;teamID&gt;_&lt;taskID&gt;_&lt;runID&gt;_&lt;systemDesc&gt; (without spaces or special characters), where:
• &lt;teamID&gt; is the name of the participating team;
• &lt;taskID&gt; is the identifier of the subtask the run is being submitted to (one of T61 for NER, T621
for BT-RE, T622 for TT-RE, or T623 for TM-RE);
• &lt;runID&gt; is a unique alphanumeric string (a–z, A–Z, 0–9) chosen by the team to distinguish among
their runs;
• &lt;systemDesc&gt; is an optional short label describing the system.</p>
        <p>Each run folder is required to contain exactly two files:
• &lt;teamID&gt;_&lt;taskID&gt;_&lt;runID&gt;_&lt;systemDesc&gt;.json
• &lt;teamID&gt;_&lt;taskID&gt;_&lt;runID&gt;_&lt;systemDesc&gt;.meta
The .json file holds the team’s predictions for the specified subtask on the test set. The accompanying
.meta file must include the following information:
• Team ID, Task ID, and Run ID;
• Type of training applied;
• Pre‐processing methods;
• Training data used;
• Relevant details of the run;
• A link to a public repository enabling reproducibility.</p>
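        <p>The packaging scheme above can be sketched as follows (team, run, and system identifiers are placeholders):</p>
        <preformat>
```python
import os, zipfile

# Assemble one run folder and the submission ZIP following the naming scheme
# (team/run/system identifiers below are placeholders).
team, task, run, desc = "MyTeam", "T61", "run1", "EnsembleBERT"
stem = f"{team}_{task}_{run}_{desc}"
os.makedirs(stem, exist_ok=True)
with open(os.path.join(stem, stem + ".json"), "w") as f:
    f.write("{}")  # the predictions for the test set go here
with open(os.path.join(stem, stem + ".meta"), "w") as f:
    f.write("Team ID: MyTeam\n")  # plus task/run IDs, training details, repo link
with zipfile.ZipFile(f"{team}_GutBrainIE_2025.zip", "w") as z:
    for name in sorted(os.listdir(stem)):
        z.write(os.path.join(stem, name))
print(sorted(os.listdir(stem)))
```
        </preformat>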
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Participants</title>
        <p>A total of 85 teams registered for the GutBrainIE-2025 task, of which 17 submitted at least one run and
thus participated in the evaluation.</p>
        <p>In total, 391 runs were submitted: 101 for NER, 100 for BT-RE, and 95 each for TT-RE and TM-RE.
Table 4 shows which tasks each team participated in and how many runs they submitted, while Table 5
reports their affiliations, countries of origin, and associated resources.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evaluation</title>
        <p>All submitted runs are evaluated using the standard IE metrics of precision (\(P\)), recall (\(R\)), and F1‐score (\(F_1\)),
assessed with both macro‐ and micro‐averaging. The same metrics apply to all four subtasks.</p>
        <p>Let \(TP_\ell\), \(FP_\ell\), and \(FN_\ell\) denote, respectively, the number of true positives, false positives, and false
negatives for label \(\ell\). We define the label set \(\mathcal{L}\) as:
• for subtask 1 (NER): the set of entity types;
• for subtask 2 (BT-RE): the set of pairs (subject label, object label);
• for subtasks 3 and 4 (TT-RE and TM-RE): the set of triples (subject label, predicate, object label).</p>
        <p>The macro-averaged metrics are computed as:
\(P_{\mathrm{macro}} = \frac{1}{|\mathcal{L}|} \sum_{\ell \in \mathcal{L}} \frac{TP_\ell}{TP_\ell + FP_\ell}\), (1a)
\(R_{\mathrm{macro}} = \frac{1}{|\mathcal{L}|} \sum_{\ell \in \mathcal{L}} \frac{TP_\ell}{TP_\ell + FN_\ell}\), (1b)
\(F_{1,\mathrm{macro}} = \frac{1}{|\mathcal{L}|} \sum_{\ell \in \mathcal{L}} \frac{2 P_\ell R_\ell}{P_\ell + R_\ell}\), (1c)
where \(P_\ell\) and \(R_\ell\) denote the precision and recall for label \(\ell\).</p>
        <p>The micro‐averaged metrics aggregate counts before division:
\(P_{\mathrm{micro}} = \frac{\sum_{\ell \in \mathcal{L}} TP_\ell}{\sum_{\ell \in \mathcal{L}} (TP_\ell + FP_\ell)}\), (2a)
\(R_{\mathrm{micro}} = \frac{\sum_{\ell \in \mathcal{L}} TP_\ell}{\sum_{\ell \in \mathcal{L}} (TP_\ell + FN_\ell)}\), (2b)
\(F_{1,\mathrm{micro}} = \frac{2 P_{\mathrm{micro}} R_{\mathrm{micro}}}{P_{\mathrm{micro}} + R_{\mathrm{micro}}}\). (2c)</p>
        <p>For each subtask, the micro‐averaged F1‐score (Eq. 2c) is adopted as the reference metric for the final
leaderboard.</p>
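        <p>The macro- and micro-averaged metrics can be sketched as follows, starting from per-label true-positive, false-positive, and false-negative counts (an illustrative computation, not the official evaluation script):</p>
        <preformat>
```python
# Macro- and micro-averaged precision/recall/F1 over per-label counts
# (illustrative sketch, not the official evaluation script).
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_micro(counts):
    """counts: dict mapping label to (tp, fp, fn); returns (macro, micro) P/R/F1."""
    per_label = [prf(*c) for c in counts.values()]
    n = len(per_label)
    macro = tuple(sum(vals) / n for vals in zip(*per_label))  # average per-label scores
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, prf(tp, fp, fn)  # micro aggregates counts before division

counts = {"bacteria": (8, 2, 2), "human": (1, 1, 3)}
macro, micro = macro_micro(counts)
print(round(micro[2], 3))  # 0.692
```
        </preformat>
        <p>Note how the micro average is dominated by frequent labels, while the macro average weights all labels equally.</p>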
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Baseline</title>
        <p>To support participants and provide a reference for performance evaluation, we developed a baseline
system for all four GutBrainIE subtasks. This system is the same one used to generate the automatic
annotations included in the Bronze fold of the training set (see Section 3.2).</p>
        <p>
          The system consists of two independent modules: a NER module based on GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and a RE
module based on ATLOP [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The NER module employs GLiNER, a bidirectional transformer encoder
trained for instruction-based named entity recognition [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. We used the NuNERZero checkpoint [24]
and fine-tuned the model on the Platinum, Gold, and Silver portions of the training data, applying a
confidence threshold of 0.6. After inference, we merged predicted entities having adjacent or overlapping
spans.
        </p>
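        <p>The merging of adjacent or overlapping predicted spans can be sketched as follows (illustrative; the baseline may resolve the labels of merged spans differently, here the label of the first span in a group is kept):</p>
        <preformat>
```python
# Merge adjacent or overlapping predicted spans (illustrative sketch;
# the label of the first span in a merged group is kept).
def merge_spans(spans):
    """spans: list of dicts with start_idx, end_idx, label; returns merged list."""
    spans = sorted(spans, key=lambda s: (s["start_idx"], s["end_idx"]))
    merged = []
    for s in spans:
        # adjacent or overlapping with the previous merged span
        if merged and merged[-1]["end_idx"] + 1 >= s["start_idx"]:
            merged[-1]["end_idx"] = max(merged[-1]["end_idx"], s["end_idx"])
        else:
            merged.append(dict(s))
    return merged

pred = [{"start_idx": 10, "end_idx": 15, "label": "microbiome"},
        {"start_idx": 16, "end_idx": 26, "label": "microbiome"},
        {"start_idx": 40, "end_idx": 45, "label": "human"}]
print(merge_spans(pred))
```
        </preformat>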
        <p>
          The RE module uses ATLOP, a document-level relation extraction model that employs localized
context pooling and adaptive thresholding [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. ATLOP receives the document text and the entities
predicted by the NER module and predicts relational triples within each document. The resulting
relations are filtered to exclude any relation not listed in Table 2. For fine-tuning, we used the Platinum, Gold,
and Silver collections as the manually annotated sets, and the Bronze collection as the distantly supervised
annotated set.
        </p>
        <p>Table 6 reports, for each participating team, the number of submitted runs that surpassed the baseline
system out of the total number of runs submitted for each subtask (considering the micro-averaged F1
score as the reference metric).</p>
        <p>The code implementing the baseline system is available at the following GitHub repository:
https://github.com/MMartinelli-hub/GutBrainIE_2025_Baseline.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section presents the performance results for each subtask, based on the evaluation metrics described
in Section 4.3.</p>
      <p>For each subtask, we report the leaderboard tables showing the best-performing run per team, ranked
by micro-averaged F1 score. Complete scores for every submitted run can be found in the appendix.</p>
      <sec id="sec-5-1">
        <title>5.1. Subtask 1 (NER) Results</title>
        <p>Most participating teams in the NER subtask adopted supervised fine-tuning of transformer-based
models pre-trained on large-scale biomedical corpora, with the most employed ones being PubMedBERT
[25], BioBERT [26], BioLinkBERT [27], and ELECTRA [28]. In addition to these, several teams employed
GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] fine-tuned on the training data. Ensemble approaches were widely utilized to improve
effectiveness, often combining models trained with different data, seeds, and configurations.</p>
        <table-wrap id="tab6">
          <label>Table 6</label>
          <caption>
            <p>Number of submitted runs surpassing the baseline system. For each team and subtask, the table reports the
number of submitted runs that achieved a higher micro-averaged F1 score than the baseline system out of the
total number of runs submitted.</p>
          </caption>
        </table-wrap>
        <table-wrap id="tab7">
          <label>Table 7</label>
          <caption>
            <p>Performance metrics of each team’s top run for NER. For each evaluation metric, the best result is in bold.</p>
          </caption>
          <table>
            <thead>
              <tr><th>team_id</th><th>run_id</th><th>system_desc</th></tr>
            </thead>
            <tbody>
              <tr><td>GutUZH</td><td>2</td><td>AugEnsemble</td></tr>
              <tr><td>Gut-Instincts</td><td>5eedev</td><td>ensemble1</td></tr>
              <tr><td>NLPatVCU</td><td>ensemble1</td><td>th10</td></tr>
              <tr><td>ICUE</td><td>ensemble5</td><td>EnsembleBERT</td></tr>
              <tr><td>LYX-DMIIP-FDU</td><td>run1</td><td>tranformer</td></tr>
              <tr><td>ata2425ds</td><td>trf</td><td>llmner</td></tr>
              <tr><td>greenday</td><td>1</td><td>NERWise</td></tr>
              <tr><td>Graphwise-1</td><td>13</td><td>NuNerZero-Finetuned</td></tr>
              <tr><td>BASELINE</td><td>Organizers</td><td/></tr>
              <tr><td>ataupd2425-gainer</td><td>ma</td><td>trainplatinumandgold</td></tr>
              <tr><td>DS@GT-bioasq-task6</td><td>1</td><td>glinerbiomed</td></tr>
              <tr><td>DS@GT-BioNER</td><td>run2</td><td>pubmedbert</td></tr>
              <tr><td>ataupd2425-pam</td><td>3</td><td>biosyn-sapbert-bc2gn-12</td></tr>
              <tr><td>Schemalink</td><td>1</td><td>SchemaBasedMultiPrompt</td></tr>
              <tr><td>BIU-ONLP</td><td>3</td><td>3_gliner_large_bio-v0.1</td></tr>
              <tr><td>lasigeBioTM</td><td>R1</td><td>BENTMistral</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>While the majority of teams used the platinum, gold, and silver folds, a few also included the noisier
bronze data, applying cleaning or re-weighting strategies. Some systems also incorporated additional
knowledge from external corpora or pseudo-labeled texts to enhance training coverage.</p>
        <p>A smaller number of teams experimented with prompt-based or zero-shot methods using Large
Language Models (LLMs). These approaches avoided traditional supervised learning and relied on
structured prompting and schema-guided extraction.</p>
        <p>Overall, systems that combined strong biomedical backbones with fine-tuning and ensemble strategies
tended to outperform others.</p>
        <p>[Table: Performance metrics of each team’s top run for BT-RE. For each evaluation metric, the best result
is in bold, the second-best is underlined. Per-team entries are not recoverable from this extraction.]</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Subtask 2 (BT-RE) Results</title>
        <p>A discussion of the systems and methodologies employed for BT-RE is provided in the section dedicated
to the TM-RE subtask (see Section 5.4), which offers an overview valid across all RE subtasks.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Subtask 3 (TT-RE) Results</title>
        <p>[Table: Performance metrics of each team’s top run for TT-RE. For each evaluation metric, the best result
is in bold, the second-best is underlined. Entries in ranking order (team_id, run_id, system_desc); metric
values are not recoverable from this extraction.
Gut-Instincts 6229eedev3re
ataupd2425-pam B7 RE-BiomedNLP-3NoRel-1epoch-COMPLETE_DATASET
ONTUG union ElectraCLEANR
Graphwise-1 105 AtlopOnto
ICUE run22 biolinkbertl_pp
BIU-ONLP 4 RobertaLarge
BASELINE Organizers Atlop-Finetuned
NLPatVCU C19 mixedCNNWLabModel4Preds
LYX-DMIIP-FDU run1 BioLinkBERT
Schemalink 1 gpt4re
ataupd2425-gainer td trainplatinumandgold
ToGS hermes8bragreorder CLEANR
lasigeBioTM R1 ConstParsing]</p>
        <p>A discussion of the systems and methodologies employed for TT-RE is provided in the section
dedicated to the TM-RE subtask (see Section 5.4), which offers an overview valid across all RE subtasks.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Subtask 4 (TM-RE) Results</title>
        <p>[Table: Performance metrics of each team’s top run for TM-RE. For each evaluation metric, the best result
is in bold, the second-best is underlined. Entries in ranking order (team_id, run_id, system_desc); metric
values are not recoverable from this extraction.
Gut-Instincts 6239eedev3re
Graphwise-1 107 AtlopOnto
ICUE run23 biolinkbertl_pp
LYX-DMIIP-FDU run1 BioLinkBERT
ONTUG union ElectraCLEANR
BASELINE Organizers Atlop-Finetuned
Schemalink 1 gpt4re
ataupd2425-pam C7 RE-BiomedNLP-3NoRel-1epoch-COMPLETE_DATASET
ataupd2425-gainer tms trainplatinumandgold
NLPatVCU C11 ensembleWLabModel4Preds
BIU-ONLP 4 RobertaLarge
ToGS hermes3bloraragreorder CLEANR
lasigeBioTM R1 ConstParsing]</p>
        <p>Most participating teams approached RE as a supervised classification task, using fine-tuned
biomedical transformers such as BioBERT [26], BioLinkBERT [27], PubMedBERT [25], and BioMedElectra [28].
Entity pairs were detected via upstream NER modules, explicitly marked in input texts and used to
generate relation-specific training instances.</p>
        <p>Some teams tackled RE at the document level, incorporating sampling strategies (e.g., negative
sampling, class-weighted losses) and architectural enhancements (e.g., query-based encoders, hypergraph
neural networks) to better capture long-tail relations. Data augmentation, input filtering, and relation
predicate-based constraints were also employed to refine candidate relation sets.</p>
        <p>Ensemble techniques, including majority voting and model fusion, were used by several
top-performing teams to improve systems’ effectiveness across the three RE subtasks.</p>
        <p>Few teams experimented with prompt-based or zero-shot approaches using LLMs guided by structured
templates or relation schemas, without any form of supervised training or fine-tuning.</p>
        <p>Overall, the most effective submissions combined strong biomedical encoders with supervised
fine-tuning and ensemble mechanisms.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This section provides an overview of the approaches adopted by participating teams in the
GutBrainIE-2025 task. We organize the discussion into two subsections: one dedicated to NER (subtask 1) and
another covering the RE subtasks (subtasks 2, 3, and 4).</p>
      <sec id="sec-6-1">
        <title>6.1. Subtask 1 (NER) Discussion</title>
        <p>Han et al. [18] (Team GutUZH) fine-tuned a BioMedBERT model [29] augmented with a Conditional
Random Field (CRF) layer to improve label dependency modeling [30]. Titles and abstracts were
processed separately, with special tokens ([TITLE], [ABSTRACT]) used to mark structural components
of the documents.</p>
        <p>The team experimented with multiple runs involving data augmentation and model ensembling. In
one setup, they pseudo-labeled 500 additional abstracts using ensemble predictions, integrating them
into a second training phase. Another variant trained on the full labeled set, including also
bronze-quality annotations, while a final run retrained the top-performing model using only manual annotations
(platinum, gold, silver sets) to reinforce the patterns learned from the most reliable examples.</p>
        <p>Training employed weighted loss functions for class imbalance, mixed-precision optimization, and
early stopping based on entity-level F1 score [31, 32]. Inference relied on Viterbi decoding [33], with
evaluation using the seqeval library [34].</p>
        <p>Andersen et al. [17] (Team Gut-Instincts) built a large ensemble system integrating multiple
biomedical transformers, including BioLinkBERT [27], BioMedBERT [29], and BioMedElectra [28], with different
decoding heads (dense layers, CRFs, LSTM-CRFs). In their runs, they combined from 3 to 17 models.</p>
        <p>All available training data were used, including a cleaned version of the silver and bronze sets and, in
some runs, also the development set.</p>
        <p>Preprocessing included boundary corrections using manually crafted dictionaries, while training
involved class-weighted losses to give more importance to high-quality data during optimization and
a custom learning rate scheduler. Post-processing rules were used to merge overlapping or adjacent
entities.</p>
        <p>Taylor et al. [22] (Team NLPatVCU) submitted ensembles of fine-tuned GLiNER models specialized for
biomedical NER [35]. These models differed in pretraining sources, training subsets, and configuration
parameters.</p>
        <p>
          Training data included all annotation tiers, and some models were additionally pretrained on external
corpora such as BC5CDR [36]. To improve training stability, the team adopted GLiNER’s probabilistic
masking mechanism [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], selectively ignoring potentially mislabeled non-entity spans during training.
In addition, focal loss was used to emphasize harder examples and counter class imbalance.
        </p>
        <p>Ensemble predictions were constructed by combining the outputs of the three models. Model 1,
based on GLiNER-BioMed [35], was trained on all annotation tiers and served as the primary model;
Model 2 introduced a two-stage training pipeline with initial fine-tuning on BC5CDR [36] to improve
performance on disease-like entities; Model 3 reused the same training data as Model 1 but employed
diferent focal loss parameters to adjust class sensitivity. Post-processing involved per-entity confidence
thresholds and merging rules derived heuristically from the development set.</p>
        <p>Lee et al. [19] (Team ICUE) explored both token classification and span-based approaches. Their
primary models were transformer-based classifiers using IOB2 tagging [37] and ensembled predictions
across 11 models trained separately with variations in architectural choices and span manipulation
strategies.</p>
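        <p>The IOB2 scheme cited above can be illustrated with a minimal sketch (the token/span offset conventions are assumptions for illustration, not the team’s code):</p>
        <p>
```python
# A minimal illustration of the IOB2 scheme: the first token of an
# entity is tagged "B-" plus the type, subsequent entity tokens "I-",
# and all remaining tokens "O". Offsets are assumed end-inclusive-free,
# i.e. a token is inside a span when fully covered by it.

def to_iob2(tokens, spans):
    """tokens: (start, end) char offsets; spans: (start, end, label) entities."""
    tags = ["O"] * len(tokens)
    for s_start, s_end, label in spans:
        first = True
        for i, (t_start, t_end) in enumerate(tokens):
            if t_start >= s_start and s_end >= t_end:  # token lies inside the span
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags
```
        </p>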
        <p>Training data comprised platinum, gold, and silver sets, with preprocessing involving token alignment,
label assignment, and filtering based on entity presence.</p>
        <p>The team employed BioLinkBERT [27] and PubMedBERT [25] as models, while span strategies
included union-span and bigger-span [38]. Some configurations further integrated PubTator annotations
as training data [39].</p>
        <p>Liu [21] (Team LYX-DMIIP-FDU) used a majority-vote ensemble of BioMedBERT [29], BioLinkBERT
[27], and a clinical variant of XLM-RoBERTa [40]. Each model was fine-tuned in a multi-task learning
setup, treating each entity class as a distinct prediction objective.</p>
        <p>Input annotations were converted to PubTator format before training [39]. Models were trained
on platinum, gold, silver, and development sets. During inference, span-level voting was applied to
determine final entity labels. Specifically, after separate inference with each model, they used the
average predicted probability of each token as the probability of each entity span, and filtered the
predicted entity spans based on the total probability across all models.</p>
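        <p>A minimal sketch of span-level voting in this spirit follows; the per-span probabilities and the acceptance threshold are assumed for illustration and are not taken from the team’s implementation:</p>
        <p>
```python
# Minimal sketch of span-level ensemble voting: sum each candidate
# span's probability across models and keep spans whose total clears a
# threshold (probabilities and threshold here are assumed values).

def vote_spans(model_predictions, min_total=1.5):
    """model_predictions: one {span: probability} dict per ensemble member."""
    totals = {}
    for preds in model_predictions:
        for span, prob in preds.items():
            totals[span] = totals.get(span, 0.0) + prob
    # keep spans whose summed probability across models clears the threshold
    return sorted(span for span, total in totals.items() if total >= min_total)
```
        </p>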
        <p>Team ata2425ds trained spaCy-based NER models using both static word embeddings and transformer
backbones [41].</p>
        <p>Two main pipelines were implemented: one using en_core_web_lg with tok2vec + NER layers, the
other based on en_core_web_trf with RoBERTa as the underlying encoder [42, 43]. Models were
trained on the full dataset, including bronze-quality annotations, with different tokenization and input
cleaning configurations.</p>
        <p>Preprocessing involved HTML tag removal using BeautifulSoup [44] and tokenization adjustments
to preserve annotated spans.</p>
        <p>Gupta et al. [16] (Team greenday) proposed a generation-based NER model by fine-tuning
GPT-4.1-mini [45] to perform entity annotation using inline text markers, following the approach adopted by
the GPT-NER framework [46].</p>
        <p>Training was conducted via OpenAI’s API on platinum and gold subsets, using specific prompts that
directly instructed entity tagging. The team experimented with zero- and few-shot settings, utilizing a
FAISS-based vector database of training examples for retrieval-augmented few-shot prompting [47, 48].</p>
        <p>Post-processing involved recovering token-level entity spans from the annotated output by resolving
discrepancies and misalignments introduced by inline annotations and hallucinations.</p>
        <p>
          Datseris et al. [15] (Team Graphwise-1) developed an ensemble approach combining fine-tuned
biomedical transformers, GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and data-augmentation strategies. Their pipeline integrates
BioBERT [26], ELECTRA-based models [28], and GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] fine-tuned on the full annotated dataset.
        </p>
        <p>
          To mitigate data imbalance in low-resource categories, they applied a data augmentation strategy
based on distant supervision. Specifically, they queried the PubMed API using MeSH-based queries
tailored to each entity type. Retrieved abstracts were then annotated using multiple NER systems,
including GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and BioBERT [26], and incorporated into an expanded bronze-quality collection.
        </p>
        <p>To further improve system robustness, the team experimented with spaCy pipelines enhanced with
domain-specific gazetteers [49].</p>
        <p>Ensemble predictions were constructed by selecting the best-performing model for each entity type.
Post-processing rules were applied to adjust entity boundaries based on systematic validation error
analysis.</p>
        <p>
          Piron et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] (Team ataupd2425-gainer) trained GliNER-based models initialized from the
NuNer_Zero checkpoint [24]. Training variants explored different dataset combinations: platinum+gold,
platinum+gold+dev, and platinum+gold+silver.
        </p>
        <p>Preprocessing involved concatenating titles and abstracts, applying the DeBERTa-v3-large tokenizer
[50], and mapping entity offsets across fields. Training used cosine learning rate scheduling, fixed batch
size (2), and variable training steps (6k-12k depending on the setting).</p>
        <p>
          Mehta [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] (Team DS@GT-bioasq-task6) submitted a single run using the GLiNER-biomed checkpoint
fine-tuned on platinum, gold, and silver annotations [35].
        </p>
        <p>Post-processing involved a dictionary-based refinement using external biomedical lexicons to correct
low-confidence or invalid predictions.</p>
        <p>Team DS@GT-BioNER submitted three runs based on BioBERT [26] and PubMedBERT [25] models
fine-tuned on platinum, gold, and silver folds. All annotations were converted to BIO format before
training [51].</p>
        <p>The first and second runs used BioBERT and PubMedBERT individually, while the third run ensembled
their outputs. Models were trained with HuggingFace’s default settings.</p>
        <p>
          Pamio et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] (Team ataupd2425-pam) explored CRF- and transformer-based models across ten
runs. Transformer models included BioBERT [26], BioMedBERT [29], NuNER [24], SapBERT [52], and
SciBERT [53].
        </p>
        <p>Some models were trained with class-weighted loss functions to address label imbalance. CRF-based
models used custom F1/F2 loss weighting strategies. For most of their submitted runs, models were
trained on the full dataset (all training and development sets), with data preprocessed by parsing entities
into token-label sequences.</p>
        <p>Team Schemalink applied a schema-driven in-context learning approach using OpenAI’s GPT-4o
[45]. No supervised training was employed.</p>
        <p>A LinkML schema derived from the ontology provided in the challenge materials was used to guide the
LLM [54], along with the incorporation of few-shot examples in the prompt. For each entity class, they
generated a separate prompt and used OpenAI’s response_format field to enforce structured extraction.
UTF-8 normalization was applied as a preprocessing step to improve model input compatibility.</p>
        <p>
          Keinan et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (Team BIU-ONLP) fine-tuned five variants of GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] on platinum, gold, and
silver tiers. Preprocessing included lowercasing and space normalization.
        </p>
        <p>Models differed by GLiNER backbone (e.g., domain-specific or multilingual). All were trained with
the same hyperparameters: 384-token inputs, a learning rate of 5e-5, a batch size of 8, and 3k training
epochs. The confidence threshold was fixed to 0.9 to retain only highly reliable predictions.</p>
        <p>Conceição et al. [20] (Team LasigeBioTM) submitted two zero-shot runs using Mistral-7B [55].</p>
        <p>The first run used the BENT tool [56] to insert inline entity annotations with unique IDs and label
types, which were then passed to Mistral for processing. The second run applied Mistral directly to raw
texts without tagging. No fine-tuning or labeled data was used in either run.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Subtasks 2, 3, and 4 (RE) Discussion</title>
        <p>Andersen et al. [17] (Team Gut-Instincts) extended their ensemble-based approach to all three RE
subtasks. Their approach combined fine-tuned transformers (BioLinkBERT [27], BioMedBERT [29],
BioMedElectra [28]) with specific adaptations to accommodate task-specific output structures.</p>
        <p>To improve training quality, they cleaned the silver and bronze datasets by correcting or removing
entity spans with misalignments and filtering out documents having more than 100 relations annotated.
Candidate entity pairs were marked in input texts, and a 10:1 negative sampling ratio was used to
balance the training data.</p>
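        <p>The 10:1 negative-sampling step can be sketched as follows; the data representation and the fixed seed are illustrative assumptions:</p>
        <p>
```python
# Illustrative sketch of 10:1 negative sampling for RE training: keep
# every annotated (positive) pair and at most ten sampled negatives per
# positive. The pair representation and seed are assumptions.
import random

def sample_negatives(positives, negatives, ratio=10, seed=0):
    """Return all positives plus at most ratio * len(positives) negatives."""
    rng = random.Random(seed)
    k = min(len(negatives), ratio * len(positives))
    return positives + rng.sample(negatives, k)
```
        </p>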
        <p>Training used class-weighted loss and a custom learning rate schedule. Final predictions were
generated via ensemble voting across the three top-performing models per configuration.</p>
        <p>Kantz et al. [23] (Team ToGS) submitted runs for all RE subtasks using a hybrid system combining
retrieval-augmented generation (RAG) [57], LoRA fine-tuning [58], transformer-based models such as
BioMedElectra [28] and Hermes-3 (LLaMA-3.2 3B and LLaMA-3.1 8B variants) [59], and prompting
with GPT-4o-mini [45].</p>
        <p>Prompts were dynamically built using training examples retrieved from a VectorDB and reordered
to prioritize high-quality (platinum and gold) annotations. In addition to prompting, LoRA-based
fine-tuning was applied to improve models’ specialization efficiently [58].</p>
        <p>Furthermore, Teams ToGS [23] and Graphwise-1 [15] submitted collaborative runs as Team ONTUG.
Here, the BiomedElectra [28] model was first fine-tuned on the binary relation extraction task (BT-RE)
and subsequently adapted for the mention-level task (TM-RE), leveraging shared entity representations
across subtasks.</p>
        <p>All models were trained for 100 epochs with the same hyperparameters: batch size of 2, gradient
accumulation of 2 steps, learning rate of 5e-5, and a warm-up ratio of 0.06. Two different output fusion
strategies (union and intersection) were evaluated to assess the impact of conservative and inclusive
inference fusion.</p>
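        <p>The two fusion strategies can be sketched over sets of predicted relation triples (an illustrative fragment, not the teams’ code):</p>
        <p>
```python
# The two output-fusion strategies over sets of predicted triples:
# union is the inclusive variant (higher recall), intersection the
# conservative one (higher precision). Illustrative sketch only.

def fuse(predictions, mode="union"):
    """predictions: list of sets of triples, one per model or run."""
    if not predictions:
        return set()
    result = set(predictions[0])
    for preds in predictions[1:]:
        result = result | preds if mode == "union" else result.intersection(preds)
    return result
```
        </p>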
        <p>Datseris et al. [15] (Team Graphwise-1) participated in all three RE subtasks, exploring
transformer- and encoder-based classifiers.</p>
        <p>For encoder models, BioMedElectra [28] and XLM-RoBERTa [40] were fine-tuned sequentially for
BT-RE, TT-RE, and TM-RE using consistent settings (up to 200 epochs, learning rate of 5e-5). Some
variants experimented with masked language modeling pre-training [60].</p>
        <p>They also employed fine-tuned REBEL-large [61] to perform end-to-end relation generation.</p>
        <p>
          Pamio et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] (Team ataupd2425-pam) submitted models for all RE subtasks using transformer
classifiers trained on relation-centric instances.
        </p>
        <p>Entity mentions were extracted via upstream NER (e.g., SapBERT [52], NuNER [24]) and injected
into text using marker tokens. These instances were then used to fine-tune multiple RE models, trained
for one epoch on the full dataset (platinum, gold, silver, bronze, and development sets). No significant
run-specific modifications or hyperparameter variations were reported across submissions.</p>
        <p>
          Keinan et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (Team BIU-ONLP) submitted twelve runs across all RE subtasks based on fine-tuning
ATLOP [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] on different language models, including SapBERT [52], BioBERT [26], and RoBERTa [43].
        </p>
        <p>Each model was trained using a standardized configuration (learning rate of 5e-5, batch size of 4, 500
training epochs, warmup ratio of 0.06).</p>
        <p>Only the platinum, gold, and silver folds were used. Preprocessing involved lowercasing and
whitespace normalization. No ensemble, augmentation, or post-processing strategies were applied.</p>
        <p>Liu [21] (Team LYX-DMIIP-FDU) used a unified binary classification approach across all RE subtasks.
Entity pairs were filtered by type compatibility and distance (&lt;200 characters) and formatted in PubTator
style with markers and contextual windows [39].</p>
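        <p>Candidate-pair construction of this kind can be sketched as follows. Only the 200-character window comes from the text above; the type-compatibility set is a hypothetical stub and the entity representation is assumed:</p>
        <p>
```python
# Sketch of candidate-pair filtering by type compatibility and distance.
# The 200-character window follows the text; the compatibility set is a
# hypothetical stub, and offsets are assumed 0-based, end-exclusive.

COMPATIBLE = {("bacteria", "DDF"), ("microbiome", "human")}  # stub pairs

def candidate_pairs(entities, max_dist=200, compatible=COMPATIBLE):
    """entities: dicts with 'start', 'end', 'type'."""
    pairs = []
    for a in entities:
        for b in entities:
            if a is b:
                continue
            gap = b["start"] - a["end"]  # characters separating the mentions
            if (a["type"], b["type"]) in compatible and max_dist > gap >= 0:
                pairs.append((a, b))
    return pairs
```
        </p>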
        <p>BioLinkBERT [27] was employed as the backbone, fine-tuned using platinum, gold, silver, and
development sets. The same model and pipeline were reused across all RE subtasks, with no task-specific
variation or augmentation.</p>
        <p>Taylor et al. [22] (Team NLPatVCU) explored two families of models: sentence-level CNN classifiers
[62] and document-level Hypergraph Neural Networks (HGNN) [63].</p>
        <p>CNNs were trained on sentences labeled with relations and sampled sentences with no relation, using
platinum, gold, and silver training datasets. Entity spans were derived from prior NER submissions,
and final outputs were aggregated via ensemble logic.</p>
        <p>HGNNs modeled entities and their interactions as nodes and hyperedges, using BioBERT embeddings
[26] and a hypergraph convolution layer [64]. The outputs obtained with these approaches supported
BT-RE and TT-RE predictions, but did not address TM-RE predictions.</p>
        <p>Team Schemalink used prompting-based approaches via OpenAI’s GPT-4o [45], operating in a fully
zero-shot setting.</p>
        <p>
          Entity mentions identified by GLiNER [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] were inserted into sentence-level prompts using custom
tags. Prompts included few-shot examples from the platinum set and targeted predefined relation
patterns (e.g., [bacteria] LOCATED IN [host]). The same system was applied to all subtasks with
no fine-tuning or augmentations.
        </p>
        <p>
          Piron et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] (Team ataupd2425-gainer) submitted runs for all three RE subtasks using
PubMedBERT [25] and BioBERT [26] trained via HuggingFace’s classification pipeline.
        </p>
        <p>Entity spans were marked using [E1] and [E2] tokens. Sentences were tokenized to a max length of
256 or 356, and negative sampling (0.2 or 0.3) was applied.</p>
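        <p>The marker-insertion step can be sketched as follows. Only the [E1] and [E2] markers are mentioned above; the closing [/E1] and [/E2] markers and the 0-based, end-exclusive offsets are assumptions for illustration:</p>
        <p>
```python
# Sketch of entity-marker insertion for RE input construction. The
# closing [/E1]/[/E2] markers and the offset conventions are assumed,
# not taken from the team's implementation.

def mark_entities(text, e1, e2):
    """e1, e2: (start, end) character spans, with e1 preceding e2."""
    s1, t1 = e1
    s2, t2 = e2
    assert s2 >= t1, "e1 must precede e2"
    return (text[:s1] + "[E1]" + text[s1:t1] + "[/E1]" + text[t1:s2]
            + "[E2]" + text[s2:t2] + "[/E2]" + text[t2:])
```
        </p>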
        <p>Across runs, models were fine-tuned for 5 to 8 epochs on stratified 80/20 train–validation splits with
batch sizes between 8 and 12, and learning rates ranging from 1e-5 to 2e-5. Training data spanned
diferent combinations of the platinum, gold, silver, and dev sets. No ensembles or post-processing
were used.</p>
        <p>Lee et al. [19] (Team ICUE) participated in all RE subtasks by framing RE as binary classification
over entity combinations using a query-based BioLinkBERT model [27]. Inputs were constructed by
inserting tagged entities and a natural language query representing the candidate relations.</p>
        <p>Balanced sampling was used to mitigate class imbalance. Some runs included second-stage reasoning
with a distilled LLM trained on synthetic binary-choice prompts: given a candidate relation and
supporting context, the LLM is asked whether the relation holds, choosing between a positive or
negative restatement. The LLM confidence was then fused with the classifier logits.</p>
        <p>Ensemble strategies were also explored, with final outputs selected based on majority voting across
models trained on distinct splits or using diferent sampling thresholds.</p>
        <p>Conceição et al. [20] (Team LasigeBioTM) participated in TT-RE and TM-RE using a zero-shot
approach combining BENT for entity tagging [56] and Mistral-7B for relation extraction [55].</p>
        <p>Tagged inputs contained nested entity labels and IDs. In some runs, syntactic features
(dependency paths, constituency parses) were added using spaCy [49]. All configurations relied solely on
prompting and required no model fine-tuning or training data.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and Future Work</title>
      <p>GutBrainIE-2025 marked the first edition of a shared task dedicated to information extraction on the
gut–brain axis, a research area of growing relevance in both neuroscience and microbiology.</p>
      <p>This first edition saw 85 teams registering and 17 teams submitting a total of 395 runs. Participants
tackled a diverse set of subtasks, from Named Entity Recognition to increasingly fine-grained Relation
Extraction, with results highlighting the efectiveness of ensemble-based methods and biomedical
transformers fine-tuned on domain-specific data.</p>
      <p>The released dataset, which includes over 1600 annotated PubMed abstracts stratified into annotation
quality tiers, represents a valuable resource for training and evaluating biomedical NLP systems.</p>
      <p>As future work, we plan to further improve the overall quality of the dataset by manually reviewing
and annotating the current bronze fold, which currently consists of fully automatic, unrevised
annotations. Additionally, we will leverage the pool of submitted predictions to identify possible annotation
errors, such as wrongly annotated entities or relations, as well as missing annotations that may have
been overlooked during the annotation process. Finally, we aim to extend the task by incorporating
entity linking. This will enable the inclusion of two additional subtasks: entity linking itself, and the
classical NLP task of Relation Extraction framed at the concept level rather than at the mention level.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This project has received funding from the HEREDITARY Project, as part of the European Union’s
Horizon Europe research and innovation programme under grant agreement No GA 101137074.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used GPT-4o and Grammarly in order to: Grammar
and spelling check. After using these tools, the author reviewed and edited the content as needed and
takes full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10">
      <title>References</title>
      <p>[14] Dictionary-Based Post-processing for BioASQ 2025 task 6, in: G. Faggioli, N. Ferro, P. Rosso,
D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum,
CEUR Workshop Proceedings, CEUR-WS, 2025.
[15] A. Datseris, M. Kuzmanov, I. Nikolova-Koleva, D. Taskov, S. Boytcheva, Graphwise @ CLEF-2025
GutBrainIE: Towards Automated Discovery of Gut-Brain Interactions: Deep Learning for NER
and Relation Extraction from PubMed Abstracts, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop
Proceedings, CEUR-WS, 2025.
[16] H. P. Gupta, R. Banerjee, LLMs for Biomedical NER, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR
Workshop Proceedings, CEUR-WS, 2025.
[17] L. R. Andersen, M. I. Gardshodn, M. H. Dolmer, J. M. Rodriguez, D. Dell’Aglio, Trusting Gut
Instincts: Transformer-Based Extraction of Structured Data from Gut-Brain Axis Publications, in:
G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and
Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS, 2025.
[18] J. Han, Y. Liu, GutUZH at CLEF2025 BioASQ Task 6: a method of SOTA performance with the best
results at GutBrainIE NER subtask 1, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working
Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings,
CEUR-WS, 2025.
[19] C. Lee, S. Doneva, M. Rodriguez-Cubillos, E. Castagnari, A. Lain, J. Posma, T. I. Simpson,
Understanding Gut-Brain Interplay in Scientific Literature: A Hybrid Approach from Classification to
Generative LLM Reasoning, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes
of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings,
CEUR-WS, 2025.
[20] S. I. R. Conceição, P. R. C. Lopes, F. M. Couto, lasigeBioTM at BioASQ25 Task GutBrainIE - Lean
Large language models with syntactic features, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop
Proceedings, CEUR-WS, 2025.
[21] Y. Liu, LYX_DMIIP_FDU at BioASQ 2025: Utilizing BERT embeddings for biomedical text mining,
in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and
Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS, 2025.
[22] S. Taylor, C. Dil, A. Shah, Jannat, C. Oldham, A. Upadhyay, J. Varughese, N. Yazbeck, B. T. McInnes,
NLP@VCU at BioASQ2025: Information Extraction on the GutBrainIE dataset, in: G. Faggioli,
N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and Labs of the
Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS, 2025.
[23] B. Kantz, P. Waldert, S. Lengauer, T. Schreck, Constrained Linked Entity ANnotation using RAG
(CLEANR), in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 –
Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS, 2025.
[24] S. Bogdanov, A. Constantin, T. Bernard, B. Crabbé, E. Bernard, NuNER: Entity Recognition Encoder
Pre-training via LLM-Annotated Data, 2024. arXiv:2402.15343.
[25] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, H. Poon, Domain-specific
language model pretraining for biomedical natural language processing, ACM Transactions
on Computing for Healthcare (HEALTH) 3 (2021) 1–23.
[26] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: pre-trained biomedical language
representation model for biomedical text mining, arXiv preprint arXiv:1901.08746 (2019).
[27] M. Yasunaga, J. Leskovec, P. Liang, LinkBERT: Pretraining Language Models with Document
Links, in: Association for Computational Linguistics (ACL), 2022.
[28] S. Alrowili, V. Shanker, BioM-Transformers: Building Large Biomedical Language Models with
BERT, ALBERT and ELECTRA, in: Proceedings of the 20th Workshop on Biomedical Language
Processing, Association for Computational Linguistics, Online, 2021, pp. 221–227. URL:
https://www.aclweb.org/anthology/2021.bionlp-1.24.
[29] S. Chakraborty, E. Bisong, S. Bhatt, T. Wagner, R. Elliott, F. Mosconi, BioMedBERT: A pre-trained
biomedical language model for QA and IR, in: Proceedings of the 28th international conference
on computational linguistics, 2020, pp. 669–679.
[30] C. Sutton, A. McCallum, et al., An introduction to conditional random fields, Foundations and
Trends® in Machine Learning 4 (2012) 267–373.
[31] P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston,
O. Kuchaiev, G. Venkatesh, et al., Mixed precision training, arXiv preprint arXiv:1710.03740 (2017).
[32] Z. Ji, J. Li, M. Telgarsky, Early-stopped neural networks are consistent, Advances in Neural
Information Processing Systems 34 (2021) 1805–1817.
[33] A. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding
algorithm, IEEE Transactions on Information Theory 13 (1967) 260–269.
[34] H. Nakayama, seqeval: A Python framework for sequence labeling evaluation, 2018. Software
available from https://github.com/chakki-works/seqeval.
[35] A. Yazdani, I. Stepanov, D. Teodoro, GLiNER-biomed: A Suite of Efficient Models for Open
Biomedical Named Entity Recognition, 2025. URL: https://arxiv.org/abs/2504.00676. arXiv:2504.00676.
[36] J. Li, Y. Sun, R. J. Johnson, D. Sciaky, C.-H. Wei, R. Leaman, A. P. Davis, C. J. Mattingly, T. C.</p>
      <p>Wiegers, Z. Lu, BioCreative V CDR task corpus: a resource for chemical disease relation extraction,
Database 2016 (2016).
[37] L. A. Ramshaw, M. P. Marcus, Text chunking using transformation-based learning, in: Natural
language processing using very large corpora, Springer, 1999, pp. 157–176.
[38] C. Liu, H. Fan, J. Liu, Span-based nested named entity recognition with pretrained language model,
in: Database Systems for Advanced Applications: 26th International Conference, DASFAA 2021,
Taipei, Taiwan, April 11–14, 2021, Proceedings, Part II 26, Springer, 2021, pp. 620–628.
[39] C.-H. Wei, H.-Y. Kao, Z. Lu, PubTator: a web-based text mining tool for assisting biocuration,</p>
      <p>Nucleic acids research 41 (2013) W518–W522.
[40] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, arXiv
preprint arXiv:1911.02116 (2019).
[41] H. Shelar, G. Kaur, N. Heda, P. Agrawal, Named entity recognition approaches and their comparison
for custom NER model, Science &amp; Technology Libraries 39 (2020) 324–337.
[42] R. Weischedel, M. Palmer, M. Marcus, E. Hovy, S. Pradhan, L. Ramshaw, N. Xue, A. Taylor,
J. Kaufman, M. Franchini, et al., OntoNotes Release 5.0 LDC2013T19, Linguistic Data Consortium,
Philadelphia, PA 23 (2013) 20.
[43] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,</p>
      <p>RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[44] L. Richardson, Beautiful soup documentation, 2007.
[45] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt,</p>
      <p>S. Altman, S. Anadkat, et al., GPT-4 technical report, arXiv preprint arXiv:2303.08774 (2023).
[46] S. Wang, X. Sun, X. Li, R. Ouyang, F. Wu, T. Zhang, J. Li, G. Wang, GPT-NER: Named entity
recognition via large language models, arXiv preprint arXiv:2304.10428 (2023).
[47] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini,</p>
      <p>H. Jégou, The Faiss library, arXiv preprint arXiv:2401.08281 (2024).
[48] M. Dong, Z. Cheng, C. Luo, T. He, Retrieval-Augmented Generation for Large Language Model
based Few-shot Chinese Spell Checking, in: Proceedings of the 31st International Conference on
Computational Linguistics, 2025, pp. 10767–10780.
[49] Y. Vasiliev, Natural language processing with Python and spaCy: A practical introduction, No</p>
      <p>Starch Press, 2020.
[50] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training
with Gradient-Disentangled Embedding Sharing, 2021. arXiv:2111.09543.
[51] L. Ramshaw, M. Marcus, Text Chunking using Transformation-Based Learning, in: Third Workshop
on Very Large Corpora, 1995. URL: https://aclanthology.org/W95-0107/.
[52] F. Liu, E. Shareghi, Z. Meng, M. Basaldella, N. Collier, Self-alignment pretraining for biomedical
entity representations, in: Proceedings of the 2021 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies, Association
for Computational Linguistics, Online, 2021, pp. 4228–4238. URL: https://aclanthology.org/2021.naacl-main.334.
doi:10.18653/v1/2021.naacl-main.334.
[53] I. Beltagy, K. Lo, A. Cohan, SciBERT: A Pretrained Language Model for Scientific Text, in: EMNLP,
2019. arXiv:1903.10676.
[54] S. A. Moxon, H. Solbrig, D. R. Unni, D. Jiao, R. M. Bruskiewich, J. P. Balhoff, G. Vaidya, W. D.</p>
      <p>Duncan, H. Hegde, M. Miller, et al., The Linked Data Modeling Language (LinkML): A General-Purpose
Data Modeling Framework Grounded in Machine-Readable Semantics, ICBO 3073 (2021)
148–151.
[55] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F.
Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao,
T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, 2023. URL: https://arxiv.org/abs/2310.06825.
arXiv:2310.06825.
[56] P. Ruas, F. M. Couto, NILINKER: attention-based approach to NIL entity linking, Journal of</p>
      <p>Biomedical Informatics 132 (2022) 104137.
[57] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih,
T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances
in neural information processing systems 33 (2020) 9459–9474.
[58] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al., LoRA: Low-rank
adaptation of large language models, ICLR 1 (2022) 3.
[59] R. Teknium, J. Quesnelle, C. Guang, Hermes 3 technical report, arXiv preprint arXiv:2408.11857
(2024).
[60] K. Sinha, R. Jia, D. Hupkes, J. Pineau, A. Williams, D. Kiela, Masked language modeling and the
distributional hypothesis: Order word matters pre-training for little, arXiv preprint arXiv:2104.06644
(2021).
[61] P.-L. Huguet Cabot, R. Navigli, REBEL: Relation extraction by end-to-end language generation,
in: Findings of the Association for Computational Linguistics: EMNLP 2021, Association for
Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 2370–2381. URL:
https://aclanthology.org/2021.findings-emnlp.204.
[62] A. Jarrahi, R. Mousa, L. Safari, SLCNN: Sentence-level convolutional neural network for text
classification, arXiv preprint arXiv:2301.11696 (2023).
[63] Y. Feng, H. You, Z. Zhang, R. Ji, Y. Gao, Hypergraph neural networks, in: Proceedings of the AAAI
conference on artificial intelligence, volume 33, 2019, pp. 3558–3565.
[64] S. Bai, F. Zhang, P. H. Torr, Hypergraph convolution and hypergraph attention, Pattern Recognition
110 (2021) 107637.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Subtask 6.1 (NER) Overall Results</title>
      <p>team_id run_id system_desc
ICUE single3 biolinkbert
ICUE single4 pubmedbertb
ICUE single5 pubmedbertb
ICUE single6 biolinkbertpubtator
LYX-DMIIP-FDU run1 EnsembleBERT
NLPatVCU ensemble1 ensemble1
NLPatVCU ensemble2 ensemble2
NLPatVCU ensemble3 ensemble3
NLPatVCU model4 model4
NLPatVCU model6 model6
Schemalink 1 SchemaBasedMultiPrompt
ata2425ds HTMLremoval tranformer
ata2425ds hyperparams
ata2425ds trf
ataupd2425-gainer ma trainplatinumandgold
ataupd2425-gainer md trainplatinumgolddev
ataupd2425-gainer ms trainplatinumgoldsilver
ataupd2425-pam 10 customCRF
ataupd2425-pam 1 biobert-base-cased-v1.2-14-CW-xtreme
ataupd2425-pam 2 biosyn-sapbert-bc2gn-8
ataupd2425-pam 3 biosyn-sapbert-bc2gn-12
ataupd2425-pam 4 BiomedNLP-BiomedBERT
ataupd2425-pam 5 NuNerv2.0-22-CW-xtreme
ataupd2425-pam 6 scibert-47
ataupd2425-pam 7 scibert-27
ataupd2425-pam 8 customCRF-LowF
ataupd2425-pam 9 customCRF-LowF
greenday 1 llmner
lasigeBioTM R1 BENTMistral
lasigeBioTM R1 MistralBaseline</p>
    </sec>
    <sec id="sec-11">
      <title>B. Subtask 6.2.1 (BT-RE) Overall Results</title>
      <p>team_id run_id system_desc
ToGS hermes3breorder CLEANR
ToGS hermes8b CLEANR
ToGS hermes8bragreorder CLEANR
ToGS hermes8breorder CLEANR
ToGS openai4omini CLEANR
ToGS openai4ominirag CLEANR
ToGS openai4ominiragreorder CLEANR
ToGS openai4ominireorder CLEANR
ataupd2425-gainer ba1 trainplatinumgolddev
ataupd2425-gainer ba2 trainplatinumgoldsilver
ataupd2425-gainer ba trainplatinumandgold
ataupd2425-gainer bd1 trainplatinumgolddev
ataupd2425-gainer bd2 trainplatinumgoldsilver
ataupd2425-gainer bd trainplatinumandgold
ataupd2425-gainer bp1 trainplatinumgolddev
ataupd2425-gainer bp2 trainplatinumgoldsilver
ataupd2425-gainer bp trainplatinumandgold
ataupd2425-gainer bs1 trainplatinumgolddev
ataupd2425-gainer bs2 trainplatinumgoldsilver
ataupd2425-gainer bs trainplatinumandgold
ataupd2425-pam A0 RE-BiomedNLP-1NoRel-1epoch
ataupd2425-pam A1 RE-BiomedNLP-1NoRel-1epoch
ataupd2425-pam A2 RE-BiomedNLP-1NoRel-1epoch
ataupd2425-pam A3 RE-BiomedNLP-2NoRel-1epoch
ataupd2425-pam A4 RE-BiomedNLP-2NoRel-1epoch
ataupd2425-pam A5 RE-BiomedNLP-2NoRel-1epoch
ataupd2425-pam A6 RE-BiomedNLP-3NoRel-1epoch
ataupd2425-pam A7 RE-BiomedNLP-3NoRel-1epoch
ataupd2425-pam A8 RE-BiomedNLP-3NoRel-1epoch</p>
    </sec>
    <sec id="sec-12">
      <title>C. Subtask 6.2.2 (TT-RE) Overall Results</title>
      <p>team_id run_id system_desc
ToGS openai4ominiragreorder CLEANR
ToGS openai4ominireorder CLEANR
ataupd2425-gainer ta1 trainplatinumgolddev
ataupd2425-gainer ta2 trainplatinumgoldsilver
ataupd2425-gainer ta trainplatinumandgold
ataupd2425-gainer td1 trainplatinumgolddev
ataupd2425-gainer td2 trainplatinumgoldsilver
ataupd2425-gainer td trainplatinumandgold
ataupd2425-gainer ts1 trainplatinumgolddev
ataupd2425-gainer ts2 trainplatinumgoldsilver
ataupd2425-gainer ts trainplatinumandgold
ataupd2425-pam B0 RE-BiomedNLP-1NoRel-1epoch
ataupd2425-pam B1 RE-BiomedNLP-1NoRel-1epoch
ataupd2425-pam B2 RE-BiomedNLP-1NoRel-1epoch
ataupd2425-pam B3 RE-BiomedNLP-2NoRel-1epoch
ataupd2425-pam B4 RE-BiomedNLP-2NoRel-1epoch
ataupd2425-pam B5 RE-BiomedNLP-2NoRel-1epoch
ataupd2425-pam B6 RE-BiomedNLP-3NoRel-1epoch
ataupd2425-pam B7 RE-BiomedNLP-3NoRel-1epoch
ataupd2425-pam B8 RE-BiomedNLP-3NoRel-1epoch
lasigeBioTM R1 BENTMistral
lasigeBioTM R1 BENTMistralSemantic
lasigeBioTM R1 Baseline
lasigeBioTM R1 ConstParsing</p>
    </sec>
    <sec id="sec-13">
      <title>D. Subtask 6.2.3 (TM-RE) Overall Results</title>
      <p>team_id run_id system_desc F1
ToGS openai4ominiragreorder CLEANR 0.0034
ToGS openai4ominireorder CLEANR 0.0000
ataupd2425-gainer tma1 trainplatinumgolddev 0.1552
ataupd2425-gainer tma2 trainplatinumgoldsilver 0.1492
ataupd2425-gainer tma trainplatinumandgold 0.2051
ataupd2425-gainer tmd1 trainplatinumgolddev 0.2035
ataupd2425-gainer tmd2 trainplatinumgoldsilver 0.1965
ataupd2425-gainer tmd trainplatinumandgold 0.2437
ataupd2425-gainer tms1 trainplatinumgolddev 0.2078
ataupd2425-gainer tms2 trainplatinumgoldsilver 0.2012
ataupd2425-gainer tms trainplatinumandgold 0.2542
ataupd2425-pam C0 RE-BiomedNLP-1NoRel-1epoch 0.2447
ataupd2425-pam C1 RE-BiomedNLP-1NoRel-1epoch 0.2439
ataupd2425-pam C2 RE-BiomedNLP-1NoRel-1epoch 0.2395
ataupd2425-pam C3 RE-BiomedNLP-2NoRel-1epoch 0.2602
ataupd2425-pam C4 RE-BiomedNLP-2NoRel-1epoch 0.2607
ataupd2425-pam C5 RE-BiomedNLP-2NoRel-1epoch 0.2553
ataupd2425-pam C6 RE-BiomedNLP-3NoRel-1epoch 0.2716
ataupd2425-pam C7 RE-BiomedNLP-3NoRel-1epoch 0.2738
ataupd2425-pam C8 RE-BiomedNLP-3NoRel-1epoch 0.2645
lasigeBioTM R1 BENTMistral 0.0101
lasigeBioTM R1 Baseline 0.0026
lasigeBioTM R1 ConstParsing 0.0000
lasigeBioTM R1 BENTMistralSemantic 0.0000</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <article-title>The gut-brain axis: influence of microbiota on mood and mental health</article-title>
          ,
          <source>Integrative Medicine: A Clinician's Journal</source>
          <volume>17</volume>
          (
          <year>2018</year>
          )
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Carabotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scirocco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Maselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Severi</surname>
          </string-name>
          ,
          <article-title>The gut-brain axis: interactions between enteric microbiota, central and enteric nervous systems</article-title>
          ,
          <source>Annals of gastroenterology: quarterly publication of the Hellenic Society of Gastroenterology</source>
          <volume>28</volume>
          (
          <year>2015</year>
          )
          <fpage>203</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Cryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. J.</given-names>
            <surname>O'Riordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dinan</surname>
          </string-name>
          ,
          <article-title>The gut microbiome in neurological disorders</article-title>
          ,
          <source>The Lancet Neurology</source>
          <volume>19</volume>
          (
          <year>2020</year>
          )
          <fpage>179</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghaisas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kanthasamy</surname>
          </string-name>
          ,
          <article-title>Gut microbiome in health and disease: Linking the microbiome-gut-brain axis and environmental factors in the pathogenesis of systemic and neurodegenerative diseases</article-title>
          ,
          <source>Pharmacology &amp; therapeutics 158</source>
          (
          <year>2016</year>
          )
          <fpage>52</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsimpras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodríguez-Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rodriguez-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitriadis</surname>
          </string-name>
          , G. Tsoumakas,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannakoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bekiaridou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Samaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martinelli</surname>
          </string-name>
          , G. Silvello, G. Paliouras,
          <article-title>Overview of BioASQ 2025: The thirteenth BioASQ challenge on large-scale biomedical semantic indexing and question answering</article-title>
          , volume TBA of
          <source>Lecture Notes in Computer Science</source>
          , Springer,
          <year>2025</year>
          , p. TBA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsimpras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodríguez-Ortega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhovskiy</surname>
          </string-name>
          , E. Tutubalina, G. Tsoumakas,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannakoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bekiaridou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Samaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Menotti</surname>
          </string-name>
          , G. Silvello, G. Paliouras, BioASQ at CLEF 2025:
          <article-title>The Thirteenth Edition of the Large-Scale Biomedical Semantic Indexing and Question Answering Challenge</article-title>
          , in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Nardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V</source>
          , volume
          <volume>15576</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          , pp.
          <fpage>407</fpage>
          -
          <lpage>415</lpage>
          . URL: https://doi.org/10.1007/978-3-031-88720-8_61. doi:10.1007/978-3-031-88720-8_61.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Badaskar</surname>
          </string-name>
          ,
          <article-title>A review of relation extraction</article-title>
          ,
          <source>Literature review for Language and Statistics II</source>
          <volume>2</volume>
          (
          <year>2007</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>U.</given-names>
            <surname>Zaratiana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holat</surname>
          </string-name>
          , T. Charnois,
          <article-title>GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer</article-title>
          , in: K. Duh,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , S. Bethard (Eds.),
          <source>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Mexico City, Mexico,
          <year>2024</year>
          , pp.
          <fpage>5364</fpage>
          -
          <lpage>5376</lpage>
          . URL: https://aclanthology.org/2024.naacl-long.300/. doi:10.18653/v1/2024.naacl-long.300.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belinkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Learning from others' mistakes: Avoiding dataset biases without modeling them</article-title>
          , arXiv preprint arXiv:2012.01300 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          , T. Ma, J. Huang,
          <article-title>Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Piron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <article-title>Named Entity Recognition with GLiNER and Relation Extraction with LLMs</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pamio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <article-title>BioASQ task GutBrainIE 2025 Task 6: Comparing CRF vs BERT Models for Named Entity Recognition and Relation Extraction</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Keinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D. N.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tsarfaty</surname>
          </string-name>
          , From Named Entities to Relations:
          <article-title>End-to-End Biomedical Information Extraction</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <article-title>Enhancing Biomedical Named Entity Recognition using GLiNER-BioMed with Targeted</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>