<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of BioASQ Tasks 12b and Synergy12 in CLEF2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anastasios Nentidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Katsimpras</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasia Krithara</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Paliouras</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aristotle University of Thessaloniki</institution>
          ,
          <addr-line>Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NCSR Demokritos</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an overview of the twelfth edition of the BioASQ challenge, which is part of the Conference and Labs of the Evaluation Forum (CLEF) 2024. BioASQ serves as a key platform for advancing large-scale biomedical information retrieval and question-answering (QA) systems and includes a variety of tasks. In this paper, we present an overview of the QA tasks b and Synergy of the BioASQ 12 challenge. Notably, BioASQ 12 introduces an additional phase (Phase A+) for task b, further expanding the challenge's scope. This year, 27 teams with more than 100 systems participated in the two tasks of the challenge, with 26 of them focusing on task 12b and 4 on task Synergy. While the total number of participating teams varies from year to year, the high rate of new team participation, as also observed in previous editions, highlights the impact of BioASQ in fostering robust biomedical QA solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>Biomedical knowledge</kwd>
        <kwd>Semantic Indexing</kwd>
        <kwd>Question Answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of the Tasks</title>
      <sec id="sec-2-1">
        <title>2.1. Biomedical semantic QA - Task 12b</title>
        <p>Task 12b introduces a comprehensive question-answering challenge in the biomedical field. Participants
are required to create systems that address all stages of question-answering. Similar to previous editions,
the task focuses on four question types: ‘yes/no,’ ‘factoid,’ ‘list,’ and ‘summary’ questions [6].</p>
        <p>In the twelfth edition of the BioASQ Challenge, participating teams were provided with a new version
of the BioASQ QA training dataset, containing 5,046 questions that had been annotated with relevant
golden elements and answers from previous task versions [7]. These questions served as the basis for
developing their systems. The details of both the training and testing sets for task 12b are outlined in
Table 1. These statistics reveal that the average number of documents and snippets in training data is
significantly larger than in the test batches. This can be attributed to two main factors. First, in the early
years of BioASQ the annotation with relevant documents and snippets by the experts was exhaustive,
in an attempt to identify as many relevant items as possible in the corpus. These questions are part of
the training datasets, affecting the average number of relevant items per question. Currently, only a
sufficient number of relevant answers is required when the initial version of the data is developed. Still,
when the participants submit their responses, the experts assess the submitted items and enrich the
ground-truth data with potential additional relevant items detected by the participants. The numbers
of relevant items for the test sets in Table 1 are preliminary, before the enrichment by the assessment
process which is still in progress. The final evaluation of the participants will be against these enriched
relevant items, ensuring that all the submitted items that are relevant are indeed handled as such.</p>
        <p>Unlike previous editions, task 12b consisted of three phases. An additional phase (Phase A+) was
introduced, in which answers (exact and/or ideal) are submitted before the golden documents and snippets
become available, i.e. answers based on documents identified by the participant systems themselves. The
goal of this additional phase is to compare the performance of the competing systems with and without
golden feedback. Task 12b was divided into four independent bi-weekly batches, and the three phases of
each batch, namely (phase A) retrieval of the required information, (phase A+) answering the questions
without golden feedback, and (phase B) answering the questions with golden feedback, ran on two
consecutive days. In each phase, the participants received the corresponding test set and had 24 hours
to submit the answers of their systems. This year, the test sets comprised 85 questions each. For each test set, the
respective questions, written in English, were released for phase A and the participants were expected
to identify and submit relevant elements from designated resources, including PubMed/MedLine articles
and snippets extracted from these articles. Then, these questions were also released in phase A+ and the
participating systems were asked to respond with exact answers, that is entity names or short phrases,
and ideal answers, that is natural language summaries of the requested information. Finally, during
phase B, manually selected relevant articles and snippets related to these questions were also made
available, and participating systems were once again asked to provide exact answers and ideal answers.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Synergy12 Task</title>
        <p>In the BioASQ challenge, the Synergy task was introduced in its ninth edition to foster collaboration
between biomedical experts studying COVID-19 and automated question-answering systems
participating in BioASQ. The goal is to create a synergy where experts assess system responses, and this feedback
is used to iteratively improve the systems.</p>
        <p>In the process depicted in Figure 1, competing systems provide their initial responses to open
questions related to emerging problems. These responses, along with relevant documents and snippets,
are evaluated by experts. Subsequently, the experts provide feedback to the systems and address any
new or pending questions.</p>
        <p>This version of the Synergy task (Synergy12) involved a series of four rounds, with a two-week
interval between each round. The task focused on emerging issues, drawing from relevant documents
in the current PubMed version. As with earlier versions, the questions posed were open-ended, allowing
for dynamic responses.</p>
        <p>In the Synergy task, during each round, the system responses and expert feedback address the same
questions, unless those questions have already been closed by experts due to receiving a comprehensive
and definite answer. Specifically, in Synergy12, a group of six biomedical experts contributed a total
of 72 open biomedical questions. They evaluated the retrieved material (including documents and
snippets) and the responses submitted by participating systems in all four rounds. Table 2 shows the
details of the datasets used in task Synergy12.</p>
        <p>Synergy12, similar to task 12b, explores four question types: yes/no, factoid, list, and summary, and
two types of answers, exact and ideal. Moreover, the evaluation of systems relies on the same measures
used in task 12b. By the end of the Synergy12 task, relevant material had been identified for answering
roughly 78% of the questions. Additionally, around 51% of the questions had at least one ideal answer
submitted by the systems that was deemed satisfactory by the expert who posed the question.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Overview of participation</title>
      <p>In this year’s BioASQ challenge, 27 teams with over 100 distinct systems in total engaged in tasks
12b and Synergy12. Specifically, 26 of these teams participated in task 12b and 4 in task Synergy12.
Furthermore, Figure 2 demonstrates the global interest in the challenge, with participating teams
representing various countries worldwide.</p>
      <p>In line with previous years, task b attracted more participants than Synergy. Furthermore, Figure 3
illustrates a considerable increase in the total number of participating teams this year in comparison to
last year. Additionally, the high percentage of teams joining the BioASQ challenge for the first time
(indicated by red circles in Figure 2) demonstrates the enduring interest of the community in large-scale
biomedical semantic indexing and question answering. Specifically, 16 new teams participated in this
year’s BioASQ tasks b and Synergy.</p>
      <sec>
        <title>3.1. Task 12b</title>
        <p>In task 12b, a total of 26 teams participated this year, contributing 89 different systems across all three
phases A, A+, and B. Specifically, 18 teams with 64 systems competed in phase A, 8 teams with 34
systems participated in phase A+, and 16 teams with 54 systems took part in phase B. Notably, 8 teams were
involved in all three phases, as depicted in Figure 4.</p>
      </sec>
      <sec id="sec-3-1">
        <title>3.2. Synergy Task</title>
        <p>In task Synergy12, 4 teams participated this year, contributing a total of 16 distinct systems. Since
Synergy12 shares some common concepts with task 12b, a few teams participated in both tasks.</p>
        <p>Specifically, 3 teams engaged in both task 12b and Synergy12, as depicted in Figure 5. However,
consistent with previous versions of the tasks, fewer teams participated in Synergy12 compared to
task 12b. This could be due to the particularities of open questions in Synergy, such as the volatility
of answers and the evolving nature of the relevant knowledge, which pose greater challenges than
traditional question answering.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this paper, we introduced the twelfth version of the BioASQ challenge, focusing on tasks b and
Synergy. These tasks have been well-established through previous versions of the challenge. Notably,
team participation has grown and we observed a significant increase in newly registered teams. As a
result, we consider that the challenge, along with the associated datasets, has sparked greater interest
within the research community and continues to advance the field of biomedical semantic indexing and
question answering.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Google was a proud sponsor of the BioASQ Challenge in 2023. The twelfth edition of BioASQ is also
sponsored by Ovid Technologies, Inc., Elsevier, and Atypon Systems inc. The MEDLINE/PubMed data
resources considered in this work were accessed courtesy of the U.S. National Library of Medicine.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Nentidis, G. Katsimpras, A. Krithara, S. Lima-López, E. Farré-Maduell, M. Krallinger, N. Loukachevitch, V. Davydova, E. Tutubalina, G. Paliouras, Overview of BioASQ 2024: The twelfth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. Maria Di Nunzio, P. Galuščáková, A. García Seco de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), 2024.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Nentidis, A. Krithara, G. Paliouras, M. Krallinger, L. G. Sanchez, S. Lima, E. Farre, N. Loukachevitch, V. Davydova, E. Tutubalina, BioASQ at CLEF2024: The Twelfth Edition of the Large-Scale Biomedical Semantic Indexing and Question Answering Challenge, in: European Conference on Information Retrieval, Springer, 2024, pp. 490-497.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Lima-López, E. Farré-Maduell, J. Rodríguez-Miret, M. Rodríguez-Ortega, L. Lilli, J. Lenkowicz, G. Ceroni, J. Kossof, A. Shah, A. Nentidis, A. Krithara, G. Katsimpras, G. Paliouras, M. Krallinger, Overview of MultiCardioNER task at BioASQ 2024 on Medical Speciality and Language Adaptation of Clinical NER Systems for Spanish, English and Italian, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco de Herrera (Eds.), CLEF Working Notes, 2024.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] V. Davydova, N. Loukachevitch, E. Tutubalina, Overview of BioNNE Task on Biomedical Nested Named Entity Recognition at BioASQ 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, 2024.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] G. Tsatsaronis, G. Balikas, P. Malakasiotis, I. Partalas, M. Zschunke, M. R. Alvers, D. Weissenborn, A. Krithara, S. Petridis, D. Polychronopoulos, Y. Almirantis, J. Pavlopoulos, N. Baskiotis, P. Gallinari, T. Artieres, A. Ngonga, N. Heino, E. Gaussier, L. Barrio-Alvers, M. Schroeder, I. Androutsopoulos, G. Paliouras, An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition, BMC Bioinformatics 16 (2015) 138. doi:10.1186/s12859-015-0564-6.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] G. Balikas, I. Partalas, A. Kosmopoulos, S. Petridis, P. Malakasiotis, I. Pavlopoulos, I. Androutsopoulos, N. Baskiotis, E. Gaussier, T. Artieres, P. Gallinari, Evaluation Framework Specifications, Project deliverable D4.1, UPMC, 2013.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Krithara, A. Nentidis, K. Bougiatiotis, G. Paliouras, BioASQ-QA: A manually curated corpus for Biomedical Question Answering, Scientific Data 10 (2023) 170.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>