Preface

The CLEF 2021 conference is the twenty-second edition of the popular CLEF campaign and workshop series, which has run since 2000 and contributes to the systematic evaluation of multilingual and multimodal information access systems, primarily through experimentation on shared tasks. In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions, and laboratory evaluation workshops. These are proposed and operated by groups of organizers volunteering their time and effort to define, promote, administrate, and run an evaluation activity.

CLEF 2021 (http://clef2021.clef-initiative.eu/) was organized by the University “Politehnica” of Bucharest, Romania, from 21 to 24 September 2021. The continued outbreak of the COVID-19 pandemic affected the organization of CLEF 2021. After detailed discussions, the CLEF Steering Committee and the organizers of CLEF 2021 decided to run the conference fully virtually. The conference format remained the same as in past years and consisted of keynotes, contributed papers, lab sessions, and poster sessions, including reports from other benchmarking initiatives from around the world. All sessions were organized and run online.

Fifteen lab proposals were received and evaluated in peer review based on their innovation potential and the quality of the resources created. To identify the best proposals, besides the well-established criteria from previous editions of CLEF, such as topical relevance, novelty, potential impact on future world affairs, likely number of participants, and the quality of the organizing consortium, this year we further stressed the connection to real-life usage scenarios, and we tried as much as possible to avoid overlaps among labs in order to promote synergies and integration. The 12 selected labs represented scientific challenges based on new data sets and real-world problems in multimodal and multilingual information access.
These data sets provide unique opportunities for scientists to explore collections, to develop solutions for these problems, to receive feedback on the performance of their solutions, and to discuss the issues with peers at the workshops.

We continued the mentorship program to support the preparation of lab proposals by newcomers to CLEF. The CLEF newcomers mentoring program offered help, guidance, and feedback on the writing of draft lab proposals by assigning a mentor to proponents, who helped them prepare and mature the lab proposal for submission. If a lab proposal fell into the scope of an already existing CLEF lab, the mentor helped the proponents get in touch with those lab organizers and join forces.

Building on previous experience, the labs at CLEF 2021 demonstrate the maturity of the CLEF evaluation environment by creating new tasks, new and larger data sets, new ways of evaluation, or more languages. Details of the individual labs are described by the lab organizers in these proceedings. Below is a short summary of them.

ARQMath: Answer Retrieval for Mathematical Questions (https://www.cs.rit.edu/~dprl/ARQMath) considers the problem of finding answers to new mathematical questions among posted answers on the community question answering site Math Stack Exchange. The goal of the lab is to develop methods for mathematical information retrieval based on both text and formula analysis.

BioASQ (http://www.bioasq.org/workshop2021) challenges researchers with large-scale biomedical semantic indexing and question answering (QA). The challenges include tasks relevant to hierarchical text classification, machine learning, information retrieval, QA from texts and structured data, multi-document summarization, and many other areas. The aim of the BioASQ workshop is to push the research frontier towards systems that use the diverse and voluminous information available online to respond directly to the information needs of biomedical scientists.
CheckThat!: Identification and Verification of Political Claims (https://sites.google.com/view/clef2021-checkthat) aims to foster the development of technology capable of both spotting and verifying check-worthy claims in political debates in English, Arabic, and Italian. The concrete tasks were to assess the check-worthiness of a claim in a tweet, check whether a (similar) claim has been previously verified, retrieve evidence to fact-check a claim, and verify the factuality of a claim.

ChEMU: Cheminformatics Elsevier Melbourne University (http://chemu2021.eng.unimelb.edu.au/) proposes two key information extraction tasks over chemical reactions from patents. Task 1 aims to identify chemical compounds and their specific types, i.e., to assign the label of a chemical compound according to the role it plays within a chemical reaction. Task 2 requires identification of event trigger words (e.g., “added” and “stirred”), which all have the same type, “EVENT TRIGGER”, and then determination of the chemical entity arguments of these events.

CLEF eHealth (https://clefehealth.imag.fr/) aims to support the development of techniques to aid laypeople, clinicians, and policy-makers in easily retrieving and making sense of medical content to support their decision making. The goals of the lab are to develop processing methods and resources in a multilingual setting to enrich difficult-to-understand eHealth texts and provide valuable documentation.

eRisk: Early Risk Prediction on the Internet (https://erisk.irlab.org/) explores challenges of evaluation methodology, effectiveness metrics, and other processes related to early risk detection. Early detection technologies can be employed in different areas, particularly those related to health and safety. The 2020 edition of the lab focused on texts written in social media for the early detection of signs of self-harm and depression.
ImageCLEF: Multimedia Retrieval (https://www.imageclef.org/2021) provides an evaluation forum for visual media analysis, indexing, classification/learning, and retrieval in medical, nature, security, and lifelogging applications, with a focus on multimodal data, that is, data from a variety of sources and media.

LifeCLEF: Multimedia Life Species Identification (https://www.imageclef.org/LifeCLEF2021) aims at boosting research on the identification and prediction of living organisms in order to bridge the taxonomic gap and improve our knowledge of biodiversity. Through its biodiversity informatics challenges, LifeCLEF is intended to push the boundaries of the state of the art in several research directions at the frontier of multimedia information retrieval, machine learning, and knowledge engineering.

LiLAS: Living Labs for Academic Search (https://clef-lilas.github.io/) aims to bring together researchers interested in the online evaluation of academic search systems. The long-term goal is to foster knowledge on improving the search for academic resources such as literature and research data, and the interlinking between these resources, in fields from the life sciences and the social sciences. The immediate goal of this lab is to develop ideas, best practices, and guidelines for a full online evaluation campaign at CLEF 2021.

PAN: Digital Text Forensics and Stylometry (http://pan.webis.de/) is a networking initiative for digital text forensics, where researchers and practitioners study technologies that analyze texts with regard to originality, authorship, and trustworthiness. PAN provides evaluation resources consisting of large-scale corpora, performance measures, and web services that allow for meaningful evaluations. The main goal is to provide sustainable and reproducible evaluations and to get a clear view of the capabilities of state-of-the-art algorithms.
SimpleText: (Re)Telling Right Scientific Stories to Non-Specialists via Text Simplification (https://www.irit.fr/simpleText/) aims to create a community interested in generating simplified summaries of scientific documents and to contribute to making science truly open and accessible to everyone. The goal is to generate a simplified abstract of multiple scientific documents based on a given query.

Touché: Argument Retrieval (https://touche.webis.de/) is the first shared task on the topic of argument retrieval. Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side challenges the other with a why-question, which is a prompt to justify one’s stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval.

As a group, the 152 lab organizers were based in 22 countries, with Germany and France leading the distribution. Despite CLEF’s traditionally Europe-based audience, 44 (28.9%) organizers were affiliated with institutions outside of Europe. The gender distribution was skewed: 75% of the organizers were male.

CLEF has always been backed by European projects that complement the incredible amount of volunteer work performed by lab organizers and the CLEF community with the resources needed for its necessary central coordination, in a similar manner to the other major international evaluation initiatives such as TREC, NTCIR, FIRE, and MediaEval. Since 2014, however, CLEF no longer has direct support from European projects and is working to transform itself into a self-sustainable activity.
This has been made possible thanks to the establishment in late 2013 of the CLEF Association (http://www.clef-initiative.eu/association), a non-profit legal entity which, through the support of its members, ensures the resources needed to smoothly run and coordinate CLEF.

Acknowledgments

We would like to thank the mentors who helped shepherd the preparation of lab proposals by newcomers: Bogdan Ionescu, University “Politehnica” of Bucharest, Romania; Henning Müller, University of Applied Sciences Western Switzerland (HES-SO), Switzerland.

We would like to thank the members of CLEF-LOC (the CLEF Lab Organization Committee) for their thoughtful and elaborate contributions to assessing the proposals during the selection process: Donna Harman, National Institute of Standards and Technology (NIST), USA; Martin Braschler, Zurich University of Applied Sciences, Switzerland; Paolo Rosso, Universitat Politècnica de València, Spain.

Last but not least, without the important and tireless effort of the enthusiastic and creative proposal authors, the organizers of the selected labs and workshops, the colleagues and friends involved in running them, and the participants who contribute their time to making the labs and workshops a success, the CLEF labs would not be possible. Thank you all very much!

July 2021

Guglielmo Faggioli, Nicola Ferro, Alexis Joly, Maria Maistro, Florina Piroi

Organization

CLEF 2021, Conference and Labs of the Evaluation Forum – Experimental IR meets Multilinguality, Multimodality, and Interaction, was hosted (online) by the University “Politehnica” of Bucharest, Romania.

General Chairs

K. Selçuk Candan, Arizona State University, USA
Bogdan Ionescu, University “Politehnica” of Bucharest, Romania

Program Chairs

Lorraine Goeuriot, Université Grenoble Alpes, France
Birger Larsen, Aalborg University Copenhagen, Denmark
Henning Müller, University of Applied Sciences Western Switzerland, Switzerland

Lab Chairs

Alexis Joly, INRIA Sophia-Antipolis, France
Maria Maistro, University of Copenhagen, Denmark
Florina Piroi, Vienna University of Technology, Austria

Lab Mentorship Chair

Lorraine Goeuriot, Université Grenoble Alpes, France

Publicity Chairs

Liviu-Daniel Stefan, University “Politehnica” of Bucharest, Romania
Mihai Dogariu, University “Politehnica” of Bucharest, Romania

Outreach Program Chairs

Yu-Gang Jiang, Fudan University, China - Asian Liaison
Hugo Jair Escalante, Instituto Nacional de Astrofisica, Optica y Electronica, Mexico - Central American Liaison
Fabio A. Gonzalez, National University of Colombia, Colombia - South American Liaison
Ben Herbst, Praelexis, South Africa - African Liaison
Abdulmotaleb El Saddik, University of Ottawa, Canada - North American Liaison

Industry & Sponsorship Chairs

Şeila Abdulamit, Vodafone, Romania
Mihai-Gabriel Constantin, University “Politehnica” of Bucharest, Romania
Bogdan Boteanu, University “Politehnica” of Bucharest, Romania

Website & Social Media Chair

Denisa Ionașcu, University “Politehnica” of Bucharest, Romania

Finance Chair

Ion Marghescu, University “Politehnica” of Bucharest, Romania

Proceedings Chairs

Guglielmo Faggioli, University of Padua, Italy
Nicola Ferro, University of Padua, Italy

CLEF Steering Committee

Steering Committee Chair

Nicola Ferro, University of Padua, Italy

Deputy Steering Committee Chair for the Conference

Paolo Rosso, Universitat Politècnica de València, Spain

Deputy Steering Committee Chair for the Evaluation Labs

Martin Braschler, Zurich University of Applied Sciences, Switzerland

Members

Khalid Choukri, Evaluations and Language resources Distribution Agency (ELDA), France
Paul Clough, University of Sheffield, United Kingdom
Fabio Crestani, Università della Svizzera italiana, Switzerland
Carsten Eickhoff, Brown University, USA
Norbert Fuhr, University of Duisburg-Essen, Germany
Lorraine Goeuriot, Université Grenoble Alpes, France
Julio Gonzalo, National Distance Education University (UNED), Spain
Donna Harman, National Institute of Standards and Technology (NIST), USA
Evangelos Kanoulas, University of Amsterdam, The Netherlands
Birger Larsen, University of Aalborg, Denmark
David E. Losada, Universidade de Santiago de Compostela, Spain
Mihai Lupu, Vienna University of Technology, Austria
Josiane Mothe, IRIT, Université de Toulouse, France
Henning Müller, University of Applied Sciences Western Switzerland (HES-SO), Switzerland
Jian-Yun Nie, Université de Montréal, Canada
Eric SanJuan, University of Avignon, France
Giuseppe Santucci, Sapienza University of Rome, Italy
Jacques Savoy, University of Neuchâtel, Switzerland
Laure Soulier, Pierre and Marie Curie University (Paris 6), France
Theodora Tsikrika, Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH), Greece
Christa Womser-Hacker, University of Hildesheim, Germany

Past Members

Djoerd Hiemstra, Radboud University, The Netherlands
Jaana Kekäläinen, University of Tampere, Finland
Séamus Lawless, Trinity College Dublin, Ireland
Carol Peters, ISTI, National Council of Research (CNR), Italy (Steering Committee Chair 2000–2009)
Emanuele Pianta, Centre for the Evaluation of Language and Communication Technologies (CELCT), Italy
Maarten de Rijke, University of Amsterdam (UvA), The Netherlands
Alan Smeaton, Dublin City University, Ireland