<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <abstract>
        <p>1 https://clef2025.clef-initiative.eu/</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The CLEF 2025 conference is the twenty-sixth edition of the popular CLEF
campaign and workshop series that has run since 2000, contributing to the
systematic evaluation of multilingual and multimodal information access systems,
primarily through experimentation on shared tasks. In 2010, CLEF was launched
in a new format, as a conference with research presentations, panels, poster and
demo sessions, and laboratory evaluation workshops. These labs are proposed and
operated by groups of organizers who volunteer their time and effort to define,
promote, administer, and run an evaluation activity.</p>
      <p>CLEF 2025<sup>1</sup> was organized by the Universidad Nacional de Educación a
Distancia (UNED), Madrid, Spain, from 9 to 12 September 2025. CLEF 2025 was
the 16th year of the CLEF Conference and the 26th year of the CLEF initiative
as a forum for Information Retrieval (IR) Evaluation. The conference format
remained the same as in past years and consisted of keynotes, contributed papers,
lab sessions, and poster sessions, including reports from other benchmarking
initiatives from around the world. All sessions were held in person but also
allowed remote participation for those who were unable to attend
physically.</p>
      <p>A total of 20 lab proposals were received and evaluated in peer review based
on their innovation potential and the quality of the resources created. The 14
selected labs represented scientific challenges based on new datasets and real-world
problems in multimodal and multilingual information access. These datasets
provide unique opportunities for scientists to explore collections, to develop
solutions for these problems, to receive feedback on the performance of their solutions
and to discuss the challenges with peers at the workshops. In addition to these
workshops, the labs reported results of their year-long activities in overview talks
and lab sessions.</p>
      <p>We continued the mentorship program to support the preparation of lab
proposals for newcomers to CLEF. The CLEF newcomers mentoring program
offered help, guidance, and feedback on writing draft lab proposals by
assigning a mentor to proponents, who helped them prepare and mature
the lab proposal for submission.</p>
      <p>Building on previous experience, the Labs at CLEF 2025 demonstrate the
maturity of the CLEF evaluation environment by creating new tasks, new and
larger datasets, new evaluation methods, or more languages. Details of the
individual Labs are described by the Lab organizers in these proceedings.</p>
      <p>The 14 labs running as part of CLEF 2025 comprised 13 labs that
continued from previous editions at CLEF (BioASQ, CheckThat!, ELOQUENT,
eRisk, EXIST, ImageCLEF, JOKER, LifeCLEF, LongEval, PAN,
QuantumCLEF, SimpleText, and Touché) and one new pilot/workshop activity:
TalentCLEF. In the following, we give a few details on each of the labs organized at
CLEF 2025 (presented in alphabetical order).</p>
      <p>BioASQ: Large-scale Biomedical Semantic Indexing and Question
Answering<sup>2</sup> aims to push the research frontier towards systems that use
the diverse and voluminous information available online to respond directly
to the information needs of biomedical scientists. This edition of BioASQ
offered the following tasks: Task 1 (13b) – Biomedical Semantic Question
Answering: Benchmark datasets of biomedical questions in English, along
with gold standard (reference) answers constructed by a team of biomedical
experts. Participants must respond with relevant articles and snippets
from designated resources, as well as exact and “ideal” answers. Task 2 –
Synergy: Question Answering for developing problems: Biomedical experts pose
unanswered questions for developing problems, such as COVID-19, receive
the responses provided by the participating systems, and provide feedback,
together with updated questions in an iterative procedure that aims to
facilitate the incremental understanding of developing problems in biomedicine
and public health. Task 3 – MultiClinSum: Multilingual Clinical
Summarization: a shared task on the automatic summarization of lengthy clinical case
reports written in different languages. The organizers distribute lengthy
clinical case reports written in English, Spanish, French, and Portuguese. The
participants generate summaries of the clinical case reports. The evaluation
is based on a comparison with manual summaries of the clinical case reports.
Task 4 – BioNNE-L: Nested Named Entity Linking in Russian and English:
a shared task on Natural Language Processing (NLP) challenges in entity
linking, also known as medical concept normalization (MCN), for English
and Russian. The training and development datasets include annotated mentions
of disorders, anatomical structures, and chemicals. The participants
normalize the entity mentions to concept names and unique UMLS identifiers. The
evaluation is based on a comparison with manual nested named entity
linking annotations. Task 5 – ElCardioCC: Clinical Coding in Cardiology: this
task on automated clinical coding concerns (i) the assignment of
cardiology-related ICD-10 codes to discharge letters from Greek hospitals and (ii)
the extraction of the specific mentions of ICD-10 codes from the discharge
letters. The evaluation is based on metrics such as micro- and macro-averaged F-measure
for Subtask (i) and token F-measure for Subtask (ii). Task 6 – GutBrainIE:
Gut-Brain Interplay Information Extraction: The GutBrainIE task aims to
foster the development of Information Extraction (IE) systems that support
experts by automatically extracting and linking knowledge from scientific
literature, facilitating the understanding of gut-brain interplay and its role
in neurological diseases. The task is divided into two subtasks: (i) extracting
named entities and linking them to concepts in a reference ontology, and
(ii) identifying binary relations between entity pairs.</p>
    </sec>
    <sec id="sec-2">
      <title>2 https://www.bioasq.org/workshop2025</title>
      <p>CheckThat! Lab on Checkworthiness, Subjectivity, Persuasion, Roles,
Authorities and Adversarial Robustness.<sup>3</sup> The eighth edition of the
CheckThat! lab at CLEF presents a diverse set of challenges aimed at
advancing technology to support and enhance the journalistic verification
process. This edition revisits core tasks in the verification pipeline while also
introducing auxiliary tasks such as subjectivity identification, claim
normalization, and fact-checking numerical claims, with a particular emphasis on
scientific web discourse. These tasks pose complex classification and retrieval
problems at the document level, including in multilingual contexts. The
lab was organized into the following tasks: Task 1 – Subjectivity: Given a
sentence from a news article, determine whether it is subjective or
objective. This is a binary classification task offered in Arabic, English,
Bulgarian, German, and Italian in monolingual and multilingual settings.
Additionally, unseen languages such as French and Spanish are considered in
zero-shot settings. Task 2 – Claim Normalization: Given a noisy, unstructured
social media post, simplify it into a concise form. This is a
generation task offered in 20 languages: English, Arabic, Bengali, Czech,
German, Greek, French, Hindi, Korean, Marathi, Indonesian, Dutch,
Punjabi, Polish, Portuguese, Romanian, Spanish, Tamil, Telugu, and Thai. Task 3 –
Fact-Checking Numerical Claims: This task focuses on verifying claims with
numerical quantities and temporal expressions. Numerical claims are defined
as those requiring validation of explicit or implicit quantitative or temporal
details. Participants must classify each claim as True, False, or Conflicting
based on a short list of evidence. Task 4 – Scientific Web Discourse
Processing (SciWeb): This task was divided into two subtasks. Subtask 4.1
– SciWeb Discourse Detection: This subtask classifies the different
forms of science-related online discourse. Given a tweet, this
multi-label task detects whether the tweet contains a scientific claim or a scientific
reference, or refers to scientific contexts or entities. Subtask 4.2 – SciWeb
Claim-Source Retrieval: Given a tweet containing a scientific claim and an
informal reference to a scientific paper, this task aims at retrieving the
scientific paper that serves as the source for the claim from a given pool of
candidate scientific papers.</p>
      <p>ELOQUENT Lab for Evaluation of Generative Language Model
Quality<sup>4</sup> addresses high-level quality criteria through a set of open-ended shared
tasks designed to require minimal human assessment effort. It offered the
following tasks: Task 1 – Voight-Kampff: Generate text samples for a
classifier to distinguish between human-authored and machine-generated text.
Task 2 – Robustness and Consistency: Explore how much a generative
language model’s output is affected by stylistic, dialectal, or other non-topical
variation in the input. Task 3 – Preference Score Prediction: Predict
human preferences, collected from human assessors, between sets of LLM-generated
responses, and generate judgments that explain the choice made. Task
4 – Sensemaking: Given a set of possibly noisy texts, generate questions and
answers about the topic.</p>
    </sec>
    <sec id="sec-3">
      <title>3 https://checkthat.gitlab.io/clef2025/</title>
    </sec>
    <sec id="sec-4">
      <title>4 https://eloquent-lab.github.io/</title>
      <p>eRisk: Early Risk Prediction on the Internet<sup>5</sup> explores the evaluation
methodology, effectiveness metrics, and practical applications (particularly
those related to health and safety) of early risk detection on the Internet.
This year’s edition of eRisk included the following tasks: Task 1 – Search
for Symptoms of Depression: Rank sentences from users according to their
relevance to each of the 21 symptoms of the BDI-II questionnaire. Training
data consists of sentence-tagged datasets from 2023 and 2024, with new test
data including contextual information (previous and next sentences). Task
2 – Contextualized Early Detection of Depression: Participants analyze full
conversational interactions to classify users with signs of depression,
considering the conversational context beyond isolated user writings. The test
phase includes writings with full conversational dynamics, while the training
phase uses isolated user submissions. Pilot Task – Conversational
Depression Detection via LLMs: Participants interact with a persona powered by
a large language model (LLM) that is fine-tuned on different types of depressive
and non-depressive users. The objective is to detect signs of depression, with
participants limited to a specified number of messages to engage with the
LLM.</p>
      <p>EXIST: sEXism Identification in Social neTworks<sup>6</sup> aims to capture and
categorize sexism in social networks, from explicit misogyny to other subtle
behaviors. In 2024, the EXIST lab included multimedia content in the
form of memes, advancing research on more robust techniques to
identify sexism in social networks. Following this line, in 2025 EXIST
introduced TikTok videos into the challenge, so that the dataset covers the three
most important channels through which sexism spreads: text, images, and videos.
Because platforms’ algorithms often amplify content that perpetuates gender
stereotypes and internalized misogyny, it is essential to develop automated
multimodal tools capable of detecting sexism in text, images, and videos, in order to
raise alarms or automatically remove such content from social networks.
This lab contributes to the creation of applications that identify sexist
content in social media across all three formats. EXIST 2025 was divided
into three tasks, each split into three subtasks. Task 1 – Sexism
Identification and Characterization in Tweets. Subtask 1.1 – Sexism Identification
in Tweets: The first subtask is a binary classification task. Systems must
decide whether or not a given tweet contains or describes sexist expressions
or behaviors (i.e., it is sexist itself, describes a sexist situation, or criticizes a
sexist behavior). Subtask 1.2 – Source Intention in Tweets: This subtask aims
to categorize the sexist messages according to the intention of the author into
one of the following categories: (i) direct sexist message, (ii) reported sexist
message, and (iii) judgmental message. Subtask 1.3 – Sexism Categorization
in Tweets: The third subtask is a multiclass task that aims to categorize
the sexist messages according to the type or types of sexism they contain
(following a categorization proposed by experts that takes into
account the different facets of women that are undermined): (i) ideological and
inequality, (ii) stereotyping and dominance, (iii) objectification, (iv) sexual
violence, and (v) misogyny and non-sexual violence.</p>
    </sec>
    <sec id="sec-5">
      <title>5 https://erisk.irlab.org/</title>
    </sec>
    <sec id="sec-6">
      <title>6 https://nlp.uned.es/exist2025/</title>
      <p>Task 2 – Sexism
Identification and Characterization in Memes. Subtask 2.1 – Sexism Identification
in Memes: Similar to Subtask 1.1, Subtask 2.1 is a binary classification task
where participants must determine whether a meme contains or describes sexist
expressions or behaviors (i.e., it is sexist itself, describes a sexist situation, or
criticizes a sexist behavior). Subtask 2.2 – Source Intention in Memes: This
subtask aims to categorize the sexist messages according to the intention
of the author into one of the following categories: (i) direct sexist message or
(ii) judgmental message. Subtask 2.3 – Sexism Categorization in Memes:
Finally, this subtask addresses the problem of categorizing a sexist meme
according to the type of sexism it contains: (i) ideological and
inequality, (ii) stereotyping and dominance, (iii) objectification, (iv) sexual violence,
and (v) misogyny and non-sexual violence. Task 3 – Sexism Identification
and Characterization in TikTok Videos. Subtask 3.1 – Sexism Identification
in Videos: Similar to Subtasks 1.1 and 2.1, this subtask is a binary
classification task where participants must determine whether a video contains or
describes sexist expressions or behaviors (i.e., it is sexist itself, describes a
sexist situation, or criticizes a sexist behavior). Subtask 3.2 – Source Intention
in Videos: This subtask aims to categorize the sexist messages according to
the intention of the author into one of the following categories: (i) direct sexist
message or (ii) judgmental message. Subtask 3.3 – Sexism Categorization in
Videos: Finally, this subtask addresses the problem of categorizing a sexist
video according to the type of sexism it contains: (i) ideological and
inequality, (ii) stereotyping and dominance, (iii) objectification, (iv) sexual
violence, and (v) misogyny and non-sexual violence.</p>
      <p>ImageCLEF: Multimodal Challenge in CLEF<sup>7</sup> focuses on evaluating
technologies for annotating, indexing, classifying, retrieving, and generating
multimodal data, providing access to large datasets across a variety of
scenarios, including medical, social media, and internet-based applications.
Building on the success of recent editions, it encourages interdisciplinary methods
by engaging participants from diverse domains, providing large amounts of
challenging multimodal data, and providing an evaluation platform for a large
number of use cases. This year’s edition of ImageCLEF involved the following
tasks: Task 1 – ImageCLEFmedical: In its 21st edition, the task continued
all the medical subtasks from the last two years, namely: (i) the Caption
task with medical concept detection and caption prediction, (ii) the GAN
task focused on synthetic medical images, (iii) MEDVQA on Visual
Question Answering for gastrointestinal data, and (iv) MEDIQA-MAGIC,
which introduced a new use case on multimodal dermatology response generation.</p>
    </sec>
    <sec id="sec-7">
      <title>7 https://www.imageclef.org/2025</title>
      <p>Task 2 – Image Retrieval/Generation for Arguments: As a joint task between
Touché and ImageCLEF since 2022, the task aims to show the impact of
images in arguments, making them more compelling. In this year’s task,
participants must find suitable images that convey a given argument. Two
submission styles are possible: retrieval, or prompt generation
for an image generator. Task 3 – ImageCLEFtoPicto: The aim of this task is
to convert either speech or text into a meaningful sequence of pictograms,
aiding communication for people with language impairments, enhancing user
understanding, or helping with translation. It comprises two subtasks:
(i) Text-to-Picto, which involves generating pictograms from
a French text, and (ii) Speech-to-Picto, which focuses on translating speech
directly into pictograms. Task 4 – MultimodalReason: a new task focusing
on multilingual Visual Question Answering. Participants are given
multiple-choice questions with corresponding images and are asked to identify the
correct answer across multiple languages, disciplines, and difficulty levels. The
task aims to assess the reasoning abilities of modern LLMs across a wide
range of real-world situations.</p>
      <p>JOKER: Automatic Humour Analysis<sup>8</sup> aims to foster interdisciplinary
approaches to the (semi-)automatic analysis and processing of humor and
wordplay. Task 1 – Humor-aware Information Retrieval: For Task 1, the aim
is to retrieve short humorous texts from a document collection based on a
given query. The languages are English and Portuguese. Task 2 – Wordplay
Translation: For Task 2, the goal is to translate English punning jokes into
French. Task 3 – Onomastic Wordplay: For Task 3, the goal is to classify
proper names according to whether they are humorous, and to translate
them from English into French. Task 4 – Controlled Creativity: For Task
4, the goal is to identify the introduction of distorted or spurious content
(“hallucinations”) in generated creative texts.</p>
      <p>LifeCLEF: Challenges on Species Presence Prediction and
Identification, and Individual Animal Identification<sup>9</sup> focuses on advancing
Artificial Intelligence (AI)-driven solutions for biodiversity monitoring through
challenges on species and individual recognition and prediction. Task 1 –
AnimalCLEF: Multi-species individual animal identification. Task 2 –
BirdCLEF: Bird species identification in soundscape recordings. Task 3 –
FungiCLEF: Few-shot classification with rare fungi species. Task 4 –
GeoLifeCLEF: Multi-modal species prediction using remote sensing and large-scale
biodiversity data. Task 5 – PlantCLEF: Multi-species plant identification in
vegetation plot images.</p>
      <p>LongEval: Longitudinal Evaluation of Model Performance<sup>10</sup> aims to
ignite the development of Information Retrieval systems that can handle
temporal data evolution. The retrieval systems evaluated in this lab are
expected to maintain their retrieval efficiency over time as Web
documents and Web queries evolve.</p>
    </sec>
    <sec id="sec-8">
      <title>8 http://joker-project.com/</title>
    </sec>
    <sec id="sec-9">
      <title>9 https://www.imageclef.org/LifeCLEF2025 10 https://clef-longeval.github.io/</title>
      <p>To evaluate such properties of systems, we rely on collections of documents and queries
corresponding to real data acquired from actual Web search engines. LongEval 2025 included two tasks:
Task 1 – WebRetrieval: this task uses evolving Web data to evaluate IR
systems longitudinally, i.e., it assesses whether IR system performance
persists over time. Task 2 – SciRetrieval: Similar to Task 1, this task
examines how the effectiveness of IR systems changes over time as the
underlying document collection, consisting of scientific publications, changes.</p>
      <p>PAN: Lab on Stylometry and Digital Text Forensics<sup>11</sup> aims to advance
the state of the art and provide an objective evaluation on newly
developed benchmark datasets in these areas. The tasks proposed by PAN
this year included: Task 1 – Generated Content Analysis: Given a document,
decide whether it was written by a human, an AI, or both. Task 2 – Multilingual
Text Detoxification: Given a toxic piece of text, rewrite it in a non-toxic way
while preserving the main content as much as possible. Task 3 – Multi-author
Writing Style Analysis: Given a document, determine at which positions the
author changes. Task 4 – Generated Plagiarism Detection: Given a
generated and a human-written source document, identify the passages of reused
text between them.</p>
      <p>QuantumCLEF: Quantum Computing at CLEF.<sup>12</sup> The second edition
of the QuantumCLEF lab is composed of three tasks and aims at:
discovering and evaluating Quantum Annealing approaches compared to their
traditional counterparts; identifying new ways of formulating Information
Retrieval and Recommender Systems algorithms and methods so that they can
be solved with Quantum Annealing; and establishing collaborations among
researchers from different fields to harness their knowledge and skills to solve
the considered challenges and promote the usage of Quantum Annealing.
This lab allows participants to use real quantum computers provided by
CINECA, one of the most important computing centers worldwide. Task 1 –
Feature Selection: focuses on formulating the well-known NP-hard Feature
Selection problem and solving it with quantum annealers. Feature Selection
is a widespread problem for both Information Retrieval and Recommender
systems that requires identifying a subset of the available features (e.g.,
the most informative or least noisy ones) to train a learning model. This
problem has a large impact, since many of these systems involve the optimization
of learning models, and reducing the dimensionality and noise of the input
data can improve their performance. Task 2 – Instance Selection: focuses
on formulating the Instance Selection problem to solve it through Quantum
Annealing. Currently, transformer-based architectures, including 1st and 2nd
generation transformers (e.g., RoBERTa) as well as current large language
models (e.g., Llama3), are used and considered state-of-the-art in several
fields. Given the high cost of applying LLMs, one of the big challenges is
to fine-tune these models efficiently. Instance Selection focuses on selecting
a representative subset of instances from a dataset to make the training of
these models faster while maintaining a high level of effectiveness of the
trained model. Task 3 – Clustering: focuses on the formulation of the
clustering problem to solve it with a quantum annealer. Clustering is a relevant
problem for Information Retrieval and Recommender systems that involves
grouping items together according to their characteristics. Clustering can be
helpful for organizing large collections, helping users explore a collection,
and providing similar results to a query. It can also be used to divide users
according to their interests or to build user models from the cluster centroids,
boosting efficiency or effectiveness for users with limited data.</p>
      <p>11 http://pan.webis.de/
12 https://qclef.dei.unipd.it/</p>
      <p>SimpleText: Simplify Scientific Text (and Nothing More)<sup>13</sup> aims at
improving access to scientific information for everyone by developing
corpora, evaluation measures, and new IR/NLP models able to reduce
scientific text complexity while remaining strictly faithful to the original text. Task 1 –
Text Simplification: Simplify Scientific Text uses aligned biomedical
abstracts and lay summaries for sentence-level,
paragraph-level, and document-level text simplification. Task 2 – Controlled
Creativity: Identify and Avoid Hallucination aims to identify and avoid
hallucination, either by post-hoc detection on CLEF submissions with
overgeneration or by designing models that avoid taking creative license. Task 3 –
SimpleText 2024 Revisited: Selected Tasks by Popular Request reruns
selected tasks on scientific passage retrieval,
complex terminology detection, and tracking the state of the art in scholarly
papers.</p>
      <p>TalentCLEF: Skill and Job Title Intelligence for Human Capital
Management<sup>14</sup> aims to drive technological advancement in Human
Capital Management by establishing a public benchmark for NLP models that
facilitates their application in real-world Human Resources (HR)
scenarios, with evaluation criteria including multilingualism, fairness, and
cross-industry adaptability. The lab also seeks to build a community for
researchers and practitioners to generate, evaluate, and discuss ideas on the use
of AI in HR, pushing the state of the art of NLP applications
for HR. Task 1 – Multilingual Job Title Matching: involves the
development of systems that can identify and rank job titles most similar to
a given one. For each job title in a provided test set, participants must
generate a ranked list of similar job titles from a specified knowledge base. The
task includes multilingual and cross-lingual tracks, requiring participants to
develop systems adapted to English, Spanish, German, and optionally
Chinese. Task 2 – Job Title-Based Skill Prediction: involves developing systems
capable of retrieving relevant skills associated with a given job title.
Participants must train models that can retrieve a list of relevant skills from a
provided knowledge base, ranking them according to their relevance to the
job title. This task is in English.</p>
      <p>13 http://simpletext-project.com/
14 https://talentclef.github.io/talentclef/</p>
      <p>Touché: Argumentation Systems<sup>15</sup> focuses on computational
argumentation and causality. Touché 2025 included four tasks. Task 1 –
Retrieval-Augmented Debating: developing generative retrieval systems that
argue against their users, to support them in forming or confirming opinions
or to train their debating skills. Task 2 – Ideology and Power Identification in
Parliamentary Debates: predicting ideology and power in
parliamentary debates on a multilingual, multi-country dataset. Task 3
– Image Retrieval/Generation for Arguments (joint task with ImageCLEF):
finding images that support a particular point of view. Task 4 –
Advertisement in Retrieval-Augmented Generation: analyzing possibilities and
countermeasures for advertisements in retrieval-augmented search results.</p>
      <p>For many years, CLEF was backed by European projects that complemented the
incredible amount of volunteer work performed by Lab Organizers and the
CLEF community with the resources needed for its necessary central
coordination, in a similar manner to other major international evaluation initiatives
such as TREC, NTCIR, FIRE, and MediaEval. Since 2014, CLEF has no longer
had direct support from European projects and has been working
to transform itself into a self-sustaining activity. This is being made possible
thanks to the CLEF Association<sup>16</sup>, a non-profit legal entity established
in late 2013, which, through the support of its members, ensures the resources
needed to smoothly run and coordinate CLEF.</p>
      <sec id="sec-9-1">
        <title>Acknowledgments</title>
        <p>We would like to thank the mentor who helped in shepherding the preparation
of lab proposals by newcomers:
Martin Krallinger, Barcelona Supercomputing Center, Spain</p>
        <p>We would like to thank the members of CLEF-LOC (the CLEF Lab
Organization Committee) for their thoughtful and elaborate contributions to assessing
the proposals during the selection process:
Alberto Barrón-Cedeño, Università di Bologna, Italy
Martin Braschler, ZHAW Zurich University of Applied Sciences, Switzerland
Daryna Dementieva, Technical University of Munich, Germany
Liana Ermakova, Université de Bretagne Occidentale, France
Elisabetta Fersini, University of Milano-Bicocca, Italy
Marc Franco-Salvador, United Nations International Computing Centre (UNICC),
Spain
Anastasia Giachanou, Utrecht University, The Netherlands
José Ángel González, Symanto Research, Spain
Salud María Jiménez-Zafra, Universidad de Jaén, Spain</p>
        <p>15 https://touche.webis.de/
16 https://www.clef-initiative.eu/#association</p>
        <sec id="sec-9-1-1">
          <title>Jaap Kamps, University of Amsterdam, The Netherlands</title>
          <p>Evangelos Kanoulas, University of Amsterdam, The Netherlands
Jussi Karlgren, Silo AI, Finland
Johannes Kiesel, Bauhaus-Universität Weimar, Germany
David Losada, University of Santiago de Compostela, Spain
Maria Maistro, University of Copenhagen, Denmark
Alejandro Martín, Universidad Politécnica de Madrid, Spain
María-Teresa Martín-Valdivia, Universidad de Jaén, Spain
Arturo Montejo-Ráez, University of Jaén, Spain
Manuel Montes-Y-Gómez, Instituto Nacional de Astrofísica, Óptica y Electrónica,
Mexico
Roser Morante, Universidad Nacional de Educación a Distancia (UNED), Spain
Josiane Mothe, Université de Toulouse, France
Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, United
Arab Emirates
Joakim Nivre, Uppsala University and RISE, Sweden
Javier Parapar, Universidade da Coruña, Spain
Florina Piroi, Technische Universität Wien, Austria
Simone Paolo Ponzetto, University of Mannheim, Germany
Martin Potthast, University of Kassel, hessian.AI, and ScaDS.AI, Germany
Francisco Rangel, Symanto Research, Spain
Eric Sanjuan, Laboratoire Informatique d’Avignon - Université d’Avignon, France
Areg Mikael Sarvazyan, Symanto Research, Spain
Efstathios Stamatatos, University of the Aegean, Greece
Benno Stein, Bauhaus-Universität Weimar, Germany
Sara Tonelli, Fondazione Bruno Kessler, Italy
Theodora Tsikrika, Information Technologies Institute, CERTH, Greece
Rafael Valencia-Garcia, Universidad de Murcia, Spain
David Vilares, Universidade da Coruña, Spain
Esaú Villatoro, Idiap, Switzerland
Matti Wiegmann, Bauhaus-Universität Weimar, Germany
Christa Womser-Hacker, University of Hildesheim, Germany
Eva Zangerle, University of Innsbruck, Austria
Arkaitz Zubiaga, Queen Mary University of London, United Kingdom</p>
          <p>We thank the Friends of SIGIR program for covering the registration fees
for a number of student delegates; UNED for funding the coffee
breaks, providing institutional support, and granting access to the venues at
the Faculties of Education and Psychology; and the HiTZ Chair of Artificial
Intelligence and Language Technology at the University of the Basque Country
for their generous sponsorship. Last but not least, without the important and
tireless effort of the enthusiastic and creative proposal authors, the organizers of
the selected labs and workshops, the colleagues and friends involved in running
them, and the participants who contribute their time to making the labs and
workshops a success, the CLEF labs would not be possible.</p>
          <p>Thank you all very much!
July, 2025</p>
          <p>Organization
CLEF 2025, Conference and Labs of the Evaluation Forum – Experimental IR
meets Multilinguality, Multimodality, and Interaction, was hosted by the
Universidad Nacional de Educación a Distancia (UNED), Spain.</p>
        </sec>
      </sec>
      <sec id="sec-9-2">
        <title>General Chairs</title>
        <p>Jorge Carrillo-de-Albornoz, Universidad Nacional de Educación a Distancia (UNED),
Spain
Alba García Seco de Herrera, Universidad Nacional de Educación a Distancia
(UNED), Spain
Julio Gonzalo, Universidad Nacional de Educación a Distancia (UNED), Spain
Laura Plaza, Universidad Nacional de Educación a Distancia (UNED), Spain</p>
      </sec>
      <sec id="sec-9-3">
        <title>Program Chairs</title>
        <sec id="sec-9-3-1">
          <title>Josiane Mothe, Université de Toulouse, France</title>
        </sec>
        <sec id="sec-9-3-2">
          <title>Florina Piroi, Technische Universität Wien, Austria</title>
        </sec>
      </sec>
      <sec id="sec-9-4">
        <title>Lab Chairs</title>
        <sec id="sec-9-4-1">
          <title>Paolo Rosso, Universitat Politècnica de València, Spain</title>
        </sec>
        <sec id="sec-9-4-2">
          <title>Damiano Spina, RMIT University, Australia</title>
        </sec>
      </sec>
      <sec id="sec-9-5">
        <title>Proceedings Chairs</title>
        <sec id="sec-9-5-1">
          <title>Guglielmo Faggioli, University of Padua, Italy</title>
        </sec>
        <sec id="sec-9-5-2">
          <title>Nicola Ferro, University of Padua, Italy</title>
        </sec>
      </sec>
      <sec id="sec-9-6">
        <title>Local Organization Committee</title>
        <p>Víctor Fresno, Universidad Nacional de Educación a Distancia (UNED), Spain
Enrique Amigó, Universidad Nacional de Educación a Distancia (UNED), Spain</p>
        <p>CLEF Steering Committee</p>
      </sec>
      <sec id="sec-9-7">
        <title>Steering Committee Chair</title>
        <sec id="sec-9-7-1">
          <title>Nicola Ferro, University of Padua, Italy</title>
        </sec>
      </sec>
      <sec id="sec-9-8">
        <title>Steering Committee Co-Chairs</title>
        <p>Alba García Seco de Herrera, Universidad Nacional de Educación a Distancia (UNED), Spain
Alberto Barrón-Cedeño, University of Bologna, Italy</p>
      </sec>
      <sec id="sec-9-9">
        <title>Deputy Steering Committee Chair for the Conference</title>
        <sec id="sec-9-9-1">
          <title>Paolo Rosso, Universitat Politècnica de València, Spain</title>
        </sec>
      </sec>
      <sec id="sec-9-10">
        <title>Deputy Steering Committee Chair for the Evaluation Labs</title>
        <p>Martin Braschler, Zurich University of Applied Sciences, Switzerland</p>
      </sec>
      <sec id="sec-9-11">
        <title>Members</title>
        <sec id="sec-9-11-1">
          <title>Avi Arampatzis, Democritus University of Thrace, Greece</title>
          <p>Khalid Choukri, Evaluations and Language resources Distribution Agency (ELDA),
France
Fabio Crestani, Università della Svizzera italiana, Switzerland
Carsten Eickhoff, University of Tübingen, Germany
Norbert Fuhr, University of Duisburg-Essen, Germany
Petra Galuščáková, University of Stavanger, Norway
Anastasia Giachanou, Utrecht University, The Netherlands
Lorraine Goeuriot, Université Grenoble Alpes, France
Julio Gonzalo, National Distance Education University (UNED), Spain
Donna Harman, National Institute of Standards and Technology (NIST), USA
Bogdan Ionescu, University “Politehnica” of Bucharest, Romania
Evangelos Kanoulas, University of Amsterdam, The Netherlands
Birger Larsen, University of Aalborg, Denmark</p>
        </sec>
        <sec id="sec-9-11-2">
          <title>Maria Maistro, University of Copenhagen, Denmark</title>
        </sec>
        <sec id="sec-9-11-3">
          <title>Josiane Mothe, IRIT, Université de Toulouse, France</title>
          <p>Henning Müller, University of Applied Sciences Western Switzerland (HES-SO), Switzerland</p>
        </sec>
        <sec id="sec-9-11-4">
          <title>Jian-Yun Nie, Université de Montréal, Canada</title>
        </sec>
        <sec id="sec-9-11-5">
          <title>Gabriella Pasi, University of Milano-Bicocca, Italy</title>
        </sec>
        <sec id="sec-9-11-6">
          <title>Eric SanJuan, University of Avignon, France</title>
          <p>Laure Soulier, Pierre and Marie Curie University (Paris 6), France
Theodora Tsikrika, Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Greece</p>
        </sec>
      </sec>
      <sec id="sec-9-12">
        <title>Past Members</title>
        <sec id="sec-9-12-1">
          <title>Paul Clough, University of Sheffield, United Kingdom</title>
        </sec>
        <sec id="sec-9-12-2">
          <title>Djoerd Hiemstra, Radboud University, The Netherlands</title>
        </sec>
        <sec id="sec-9-12-3">
          <title>Jaana Kekäläinen, University of Tampere, Finland</title>
        </sec>
        <sec id="sec-9-12-4">
          <title>Séamus Lawless, Trinity College Dublin, Ireland</title>
          <p>David E. Losada, Universidade de Santiago de Compostela, Spain</p>
        </sec>
        <sec id="sec-9-12-5">
          <title>Mihai Lupu, Vienna University of Technology, Austria</title>
        </sec>
        <sec id="sec-9-12-6">
          <title>Carol Peters, ISTI, National Council of Research (CNR), Italy (Steering Committee Chair 2000–2009)</title>
          <p>Emanuele Pianta, Centre for the Evaluation of Language and Communication Technologies (CELCT), Italy
Maarten de Rijke, University of Amsterdam (UvA), The Netherlands</p>
        </sec>
        <sec id="sec-9-12-7">
          <title>Giuseppe Santucci, Sapienza University of Rome, Italy</title>
        </sec>
        <sec id="sec-9-12-8">
          <title>Jacques Savoy, University of Neuchâtel, Switzerland</title>
        </sec>
        <sec id="sec-9-12-9">
          <title>Alan Smeaton, Dublin City University, Ireland</title>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>