<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF 2024 SimpleText Task 1: Retrieve Passages to Include in a Simplified Summary</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Éric SanJuan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stéphane Huet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liana Ermakova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Avignon Université</institution>
          ,
          <addr-line>LIA</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Bretagne Occidentale</institution>
          ,
          <addr-line>HCTI</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents an overview of the CLEF 2024 SimpleText Task 1 on Content Selection, asking systems to retrieve scientific abstracts in response to a query prompted by a popular science article. Overall, the SimpleText track provides an evaluation platform for the automatic simplification of scientific texts. We discuss the details of the task set-up: first, the SimpleText Corpus with over 4 million academic papers and abstracts; second, the Topics based on 40 popular science articles in the news and the 114 Queries prompted by them; third, the Formats of requests and results, together with the Evaluation labels and Evaluation measures used; and fourth, the Results of the runs submitted by our participants.</p>
      </abstract>
      <kwd-group>
<kwd>information retrieval</kwd>
        <kwd>scientific documents</kwd>
        <kwd>text simplification</kwd>
        <kwd>scientific information retrieval</kwd>
        <kwd>non-expert queries</kwd>
        <kwd>press outlets</kwd>
<kwd>relevance judgments (qrels)</kwd>
        <kwd>popularized science</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task 1: Retrieve Passages to Include in a Simplified Summary</title>
<p>This section details Task 1: Content Selection, i.e., retrieving passages to include in a simplified summary.</p>
      <sec id="sec-2-1">
<title>2.1–2.2. Description and Data</title>
<p>Given a popular science article targeted at a general audience, this task aims at retrieving, from a large corpus of academic abstracts and bibliographic metadata, passages that can help readers understand the article. Relevant passages should relate to any of the topics in the source article. We use popular science articles as a source for the types of topics the general public is interested in and as a validation of the reading level that is suitable for them. The main corpus is a large set of scientific abstracts plus associated metadata covering the fields of computer science and engineering. We reuse the collection of academic abstracts from the Citation Network Dataset (12th version, released in 2020)1 [5]. This collection was extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. It includes 4,232,520 abstracts in English, published before 2020.</p>
<p>Search requests are based on popular press articles targeted at a general audience, drawn from The Guardian and Tech Xplore. Each of these popular science articles represents a general topic that has to be analyzed to retrieve relevant scientific information from the corpus.</p>
<p>We provide the URL to the original article, the title, and the textual content of each popular science article as a general topic. Each general topic was also enriched with one or more specific keyword queries manually extracted from its content, creating a familiar information retrieval task of ranking passages or abstracts in response to a query. Available training data from 2023 includes 29 (train) and 34 (test) queries, with the latter set having an extensive recall base due to the large number of submissions in 2023 [6].</p>
<p>In 2024, we added between 2 and 5 new queries (with IDs of the form G*.C*) for each of the 20 articles from the Guardian. These topics were generated by ChatGPT 4, with a prompt asking it to list the main subtopics related to computer science; they were manually inspected to check that they are linked to the original article and are not redundant. They are longer, containing around ten words and focusing on a specific point related to the article. An example of a keyword query is “system on chip” (T06.1) and an example of a long query is “How AI systems, especially virtual assistants, can perpetuate gender stereotypes?” (G01.C1).</p>
<p>The C1 queries were generated based on the following prompt: “In the attached article from the Guardian, list the main sub topics related to computer science and for each topic find at least five related references to scientific publications before 2019 that would have been relevant to be cited in this article. Just provide the references, don’t try to get the full text.” We then used the first subtopic as the query. We also considered using the ChatGPT results as a complete run, but few references were returned, many were not indexed in computer science, and some did not even exist. This emphasizes the real difficulty of the task of retrieving references to be included in a popular science article.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Baselines</title>
<p>An ElasticSearch index is provided to participants with access through an API. A JSON dump of the index is also available to participants. This index can be queried online, e.g.
https://clef.termwatch.eu/dblp1/_search?q=biases&amp;size=1000 for the query “biases.”</p>
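For illustration, such a request can be built as in the following minimal Python sketch; only the endpoint and the `q`/`size` parameters come from the example above (they are the standard Elasticsearch URI-search parameters), and the helper name is ours:

```python
from urllib.parse import urlencode

# Minimal sketch (not an official client) of querying the provided
# Elasticsearch index over HTTP; "q" and "size" are standard
# Elasticsearch URI-search parameters.
BASE = "https://clef.termwatch.eu/dblp1/_search"

def build_search_url(query: str, size: int = 1000) -> str:
    """Build a URI-search request against the DBLP abstract index."""
    return f"{BASE}?{urlencode({'q': query, 'size': size})}"

url = build_search_url("biases")
# requests.get(url).json() would then return the matching abstracts.
```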
        <p>We additionally provided two supplementary baselines leveraging bag-of-words models and sparse
vector document representations. The first baseline (denoted by “meili” in the results tables) was
generated using the Meilisearch system2 and relies on a bucket sort approach. The second baseline
(denoted by “boolean”) was constructed using a simple boolean model powered by PostgreSQL GIN
text indexing.</p>
<p>For each topic, the organizers manually checked each proposed keyword query against the Elasticsearch-powered baseline run, ensuring that it retrieved at least five relevant documents. As a consequence, the boolean system, which retrieves all abstracts containing every query keyword, is expected to artificially achieve high recall levels at a depth of 5. However, this approach suffers from two limitations: it misses relevant abstracts that do not contain all keywords, and it retrieves irrelevant abstracts that happen to contain all query keywords.</p>
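Both limitations follow directly from the conjunctive matching rule, which can be sketched as follows (the toy collection and function name are ours, for illustration only):

```python
# Minimal sketch of the conjunctive boolean baseline described above:
# an abstract matches only if it contains ALL query keywords.
def boolean_retrieve(query, abstracts):
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in abstracts.items()
            if terms <= set(text.lower().split())]

docs = {
    "d1": "gender bias in virtual assistants",
    "d2": "bias mitigation in ranking models",
    "d3": "virtual reality headsets",
}
print(boolean_retrieve("virtual bias", docs))  # ['d1']
```

Note that "d2" is missed for the query "virtual bias" even though it is about bias, illustrating the first limitation.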
        <p>In the case of the long C1 queries, we manually extracted the largest subset of terms that retrieved
at least five relevant documents. For these queries, the boolean approach is essentially a manual run,
which is indicated by an asterisk (*) in the results tables.</p>
<p>Despite their effectiveness, neural models are computationally expensive, requiring significant training data and processing power. Consequently, most participants rely on a hybrid document retrieval approach with a two-stage process:
1. Initial retrieval: this phase employs a more traditional and less resource-intensive method, such as tf-idf vectorization, to identify a set of potentially relevant documents.
2. Re-ranking: the documents retrieved in the first stage are then re-ranked using the more nuanced dense representations provided by neural models. This step refines the initial retrieval results based on the semantic understanding of the neural models.</p>
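The two stages above can be sketched as follows; this is a toy illustration in which a simple lexical-overlap function stands in for a neural cross-encoder, and all data and names are our own assumptions:

```python
import math
from collections import Counter

# Sketch of the two-stage pipeline: a cheap tf-idf first stage, then a
# re-ranking of the top-k candidates. "dense_score" stands in for a
# neural cross-encoder; everything here is toy data.
def tfidf_score(query, text, docs):
    N = len(docs)
    df = Counter(t for d in docs.values() for t in set(d.split()))
    tf = Counter(text.split())
    return sum(tf[t] * math.log(N / df[t]) for t in query.split() if t in tf)

def retrieve_then_rerank(query, docs, dense_score, k=2):
    # Stage 1: keep the k best documents by tf-idf.
    candidates = sorted(docs, key=lambda d: tfidf_score(query, docs[d], docs),
                        reverse=True)[:k]
    # Stage 2: reorder those candidates with the (stub) dense scorer.
    return sorted(candidates, key=lambda d: dense_score(query, docs[d]),
                  reverse=True)

docs = {"d1": "ai bias study", "d2": "chip design", "d3": "ai chip bias survey"}
overlap = lambda q, t: len(set(q.split()) & set(t.split())) / len(set(t.split()))
print(retrieve_then_rerank("ai bias", docs, overlap))  # ['d1', 'd3']
```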
<p>In previous editions, participants relied on the provided ElasticSearch baseline for the initial retrieval phase. To enhance run diversity and address resource limitations, the organizers this year provided access to two vector databases containing pre-computed paragraph embeddings (for titles and abstracts). These vector databases make it possible to compare the efficiency of scientific document retrieval techniques using asymmetric sparse document retrieval (based on tf-idf) and symmetric dense passage retrieval (based on pre-computed embeddings).</p>
<p>The two embedding vectors were computed with the MS MARCO-trained MiniLM sentence encoder (all-MiniLM-L6-v2)3. These embeddings, along with a search API based on them, have been released to participants. Documents are ranked based on the dot product between the query and the abstract (vir_abstract) or the title (vir_title), using the pg_vector4 PostgreSQL extension and an ivfflat dense vector index (k-means vector clustering with √|D| centroids, where |D| is the number of indexed vectors).</p>
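The ranking principle is just a dot product between a query embedding and the stored document embeddings. A toy sketch (real vectors would come from all-MiniLM-L6-v2; the three-dimensional vectors below are made up for illustration):

```python
# Toy sketch of ranking by dot product between a query embedding and
# pre-computed document embeddings, mirroring the pg_vector set-up.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_by_dot(query_vec, doc_vecs, k=2):
    return sorted(doc_vecs, key=lambda d: dot(query_vec, doc_vecs[d]),
                  reverse=True)[:k]

doc_vecs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.2, 0.8, 0.1],
    "d3": [0.4, 0.4, 0.4],
}
print(rank_by_dot([1.0, 0.0, 0.0], doc_vecs))  # ['d1', 'd3']
```

An ivfflat index approximates this exhaustive scan by probing only the k-means clusters closest to the query.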
<p>These dense vector and boolean baselines can be accessed online through a CGI API5 with three parameters:
• corpus: title, abstract, or bool
• phrase: text passage used as the query
• length: number of results to be retrieved
2https://www.meilisearch.com/
3https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
4https://github.com/pgvector/pgvector
5https://clef.termwatch.eu/stvir_test</p>
<p>For a non-boolean query, this API generates the vector embedding of the query on the fly before retrieving results using SQL syntax. For example, for the query “Exploring the use of AI to improve success rates and speed in the pharmaceutical research field”, the top 100 documents whose abstracts are most similar to the query (based on dot product) can be retrieved in JSON format using the following syntax:
https://clef.termwatch.eu/stvir_test?corpus=abstract&amp;phrase=Exploring the use of AI to improve success rates and speed in the pharmaceutical research field&amp;length=100</p>
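Such a request can be assembled programmatically; the sketch below uses only the three documented parameters (the helper name is ours, and `urlencode` percent-encodes the spaces that the example above leaves literal):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Sketch of building a request to the CGI baseline API with its three
# documented parameters: corpus, phrase, length.
BASE = "https://clef.termwatch.eu/stvir_test"

def build_vector_query(corpus, phrase, length):
    return f"{BASE}?{urlencode({'corpus': corpus, 'phrase': phrase, 'length': length})}"

url = build_vector_query(
    "abstract",
    "Exploring the use of AI to improve success rates and speed "
    "in the pharmaceutical research field",
    100,
)
```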
        <p>In addition to the dot product similarity measure, we also experimented with cosine distance. However,
this alternative approach yielded comparable results.</p>
<p>The boolean and dense vector baselines are provided as a PostgreSQL database containing four tables:
1. Complete documents (JSON): full documents in JSON format, enabling access to all content.
2. Textual content (boolean search): titles and abstracts of documents, facilitating efficient boolean search operations.
3. Title embeddings: pre-computed dense vector representations (embeddings) of the document titles.
4. Truncated abstract embeddings: pre-computed dense vector representations (embeddings) of the first 110 tokens of each document’s abstract.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Formats</title>
<p>
          Ad-hoc passage retrieval Participants should retrieve, for each topic and each query, DBLP abstracts related to the query and relevant to be inserted as a citation in the paper associated with the topic. We encourage participants to take into account passage complexity as well as its credibility/influentialness.
Open passage retrieval (optional) Participants are encouraged to extract supplementary relevant queries from the titles or content of the articles and to provide results based on these supplementary queries.
Output format Results should be provided in a TREC-style JSON format with the following fields:
1. run_id: Run ID starting with &lt;team_id&gt;_&lt;task_id&gt;_&lt;method_used&gt;, e.g. UBO_Task1_TFIDF
2. manual: Whether the run is manual {0,1}
3. topic_id: Topic ID
4. query_id: Query ID used to retrieve the document (if one of the queries provided for the topic was used; 0 otherwise)
5. doc_id: ID of the retrieved document (to be extracted from the JSON output)
6. rel_score: Relevance score of the passage (on a [0–1] scale)
7. comb_score: General score that may combine relevance with other aspects such as readability or citation measures (on a [0–1] scale)
8. passage: Text of the selected passage
        </p>
<p>For each query, at most 100 distinct DBLP references (doc_id field) may be returned, and the total length of the passages should not exceed 1,000 tokens. The idea of taking complexity into account is to favor passages that are easier to understand for non-experts, while the credibility score aims at guiding them on the expertise of the authors and the value of the publication with respect to the article topic. For example, complexity can be evaluated using readability scores and credibility using bibliometrics.</p>
        <p>Here is an output format example:</p>
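A sketch of a single record with the eight fields above; all values are invented for illustration, except the run_id pattern and the query ID G01.C1, which appear earlier in the text:

```python
import json

# Hedged illustration of one result record in the requested TREC-style
# JSON format; field values are invented.
record = {
    "run_id": "UBO_Task1_TFIDF",  # <team_id>_<task_id>_<method_used>
    "manual": 0,                   # 0 = automatic run
    "topic_id": "G01",
    "query_id": "G01.C1",
    "doc_id": "1234567",
    "rel_score": 0.87,             # topical relevance, [0-1] scale
    "comb_score": 0.65,            # relevance combined with e.g. readability
    "passage": "Text of the selected passage",
}
print(json.dumps(record, indent=2))
```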
      </sec>
      <sec id="sec-2-4">
        <title>2.5. Evaluation</title>
<p>To assess topical relevance, we assigned a 0–2 score to each retrieved document based on its content alignment with the original article. To expand the training data for relevance judgments (qrels), we pooled all documents retrieved at depth 10 from all submitted systems. This approach significantly increased the size of the qrels by 9,990 documents, with a particular focus on the newly introduced long queries for the Guardian corpus and on the T06–T11 queries that previously lacked relevance assessments.</p>
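Depth-k pooling simply takes the union of the top-k documents over all submitted runs and sends it for assessment; a minimal sketch with toy run data:

```python
# Sketch of depth-k pooling as used to build the qrels: the union of
# the top-k documents over all submitted runs.
def pool(runs, depth=10):
    return {doc for ranking in runs.values() for doc in ranking[:depth]}

runs = {
    "runA": ["d1", "d2", "d3"],
    "runB": ["d2", "d4", "d5"],
}
print(sorted(pool(runs, depth=2)))  # ['d1', 'd2', 'd4']
```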
        <p>Table 2 summarizes the test collection constructed for the CLEF 2024 SimpleText Task 1, in relation
to the earlier years (note that earlier topics have been reused in the “train” data).</p>
<p>While generating the long C1 queries using state-of-the-art LLMs, we were surprised by the inability of these models, specifically ChatGPT 4, to find relevant references in the computer science domain suitable for inclusion in large-audience tech articles. This raises questions about the inherent difficulty of the task and the potential necessity of combining multiple retrieval systems to improve recall. This need was addressed by both participants and organizers this year.</p>
<p>Many participants employed multiple LLMs, not for initial retrieval, but as rerankers within their systems. Additionally, several participants utilized different implementations of BM25 than the one provided by the organizers for the retrieval stage. These novel end-to-end retrieval approaches, coupled with the 4 new baselines provided, resulted in an unexpectedly high number of unassessed documents among the top ten retrieved documents per run. This phenomenon included queries from previous editions. For instance, among queries G01–G10, there were 3,843 new documents not returned in the top ten of previous editions. Notably, 954 of these documents appeared relevant to at least one existing topic, and 576 were relevant to one of the newly introduced long C1 queries. This confirms the task’s inherent difficulty but also demonstrates the potential to achieve high recall levels at depth 10.</p>
        <p>In addition to topical relevance, we took into account other key aspects of the track, such as the
text complexity and the credibility of the retrieved results. These evaluations were performed using
automatic metrics.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Scientific Passage Retrieval Approaches</title>
<p>In this section, we discuss the range of scientific text retrieval approaches applied by the participants of the track. A total of 11 teams submitted 42 runs.</p>
      <p>AB/DPV Varadi and Bartulović [7] submitted 1 run for Task 1. They used our ElasticSearch API and
took into account an FKGL readability score for their combined score.</p>
<p>Sharingans Ali et al. [8] also submitted 1 run. They experimented with the ColBERT neural ranker and used GPT 3.5 to select the most informative and concise passages for inclusion in the summary.</p>
      <p>Tomislav/Rowan Mann and Mikulandric [9] submitted a total of 2 runs. They took the top 100 results retrieved by ElasticSearch. Then, they used cosine similarity on TF-IDF vectors as the relevance score and the FKGL score as the combined score.</p>
      <p>Petra/Regina Elagina and Vučić [10] submitted 1 run, for the first 3 queries only, with the same
approach as the previous system.</p>
<p>AIIRLab Largey et al. [11] submitted a total of 5 runs and proposed several models. First, since input queries are short keyword terms, they used query expansion with LLaMA 3 and reranked the top 5,000 results retrieved by TF-IDF with a bi-encoder or a cross-encoder. Second, they applied LLaMA 3 as a pairwise re-ranker. Third, they leveraged ElasticSearch with fine-tuned cross-encoders.</p>
      <p>UBO Vendeville et al. [12] submitted a total of 1 run. They used PyTerrier6 to retrieve documents from TF-IDF scores. Then, the MonoT5 reranker provided by PyTerrier was employed to reorder all extracted documents.</p>
      <p>UAmsterdam Bakker et al. [13] submitted a total of 6 runs for Task 1. First, they focused on regular information retrieval effectiveness with 2 vanilla baseline runs on an Anserini index, using either BM25 or BM25+RM3, and 2 other runs generated with neural cross-encoder rerankings of these runs by an MS MARCO-trained ranker. Second, 2 further runs filter out the most complex abstracts per request, using the median FKGL readability measure.</p>
      <p>Elsevier Capari et al. [14] submitted a total of 10 runs. Their approaches mainly centered on creating
a ranking model. They started by assessing the performance of several models on a proprietary test
collection of scientific papers. Then, the top-performing model was fine-tuned on a large set of unlabeled
documents using the Generative Pseudo Labeling approach. They also experimented with generating
new search queries.</p>
      <p>LIA submitted a total of 5 runs as baselines for Task 1. All five have been included in the pool of
results for qrel evaluation.</p>
      <p>Ruby This team (No paper received) submitted a total of 1 run for Task 1. Their approach relies on
ElasticSearch and a TF-IDF score.</p>
<p>Arampatzis This team (no paper received) submitted a total of 9 runs for Task 1. As these runs are very similar, the tables below only report the evaluation of their first run.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Released database</title>
        <p>This section details the results of the task, for both the train and test data.</p>
<p>All data and results have been organized within a relational database, which will be released to all active participants. This release will facilitate:
• computation of diverse scores;
• addressing qrel issues;
• easy generation of supplementary runs.</p>
<p>One particular benefit of the relational database is the ability to easily extend the qrels based on dense vector similarity and similarity thresholds. This capability is especially relevant given the observation that seemingly identical abstracts in the DBLP dataset appear with different relevance labels.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Train results</title>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Test results</title>
<p>† Evaluated on comb_score.
The results distinguish the long Guardian queries, on topics that can be a source of public debate (like privacy, quantum computing, bitcoins...), from those established on the short Tech Xplore queries, which are more specific and related to a scientific paper in peer-reviewed venues (indoor positioning systems, RISC-V architecture for space computing, underwater WiFi developed using LEDs and lasers...). Rankings on these two subsets are very similar, which shows the consistency of relevance results across queries.</p>
<p>5. Analysis
This section provides further analysis of the submitted runs, and the task as a whole.</p>
<p>We complement the evaluation above by taking into consideration other aspects essential for Task 1. Table 7 highlights credibility and text complexity. We used simple automatic metrics to provide an overview of the importance and the complexity of the articles. First, the average number of bibliographic references among the top 10 results of each query is provided. Second, we provide several metrics computed by the Python library readability7: the average size of vocabulary per abstract, the average ratio of words considered long (i.e., with at least 7 characters), the average ratio of words considered complex (i.e., absent from the Dale-Chall word list of 3,000 words recognized by 80% of fifth graders), and the average and median FKGL readability metrics.</p>
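For reference, the FKGL metric combines average sentence length and average syllables per word. A rough sketch of the computation (the vowel-group syllable counter is a crude heuristic of our own, so scores only approximate those of the readability library):

```python
import re

# Rough sketch of the Flesch-Kincaid Grade Level (FKGL) computation;
# the syllable counter is a crude vowel-group heuristic.
def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

print(round(fkgl("The cat sat on the mat."), 2))  # -1.45
```

A score around 15 thus corresponds to text readable by a student with roughly 15 years of schooling, i.e. university level.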
<p>A large majority of runs have a similar FKGL of 15, corresponding to university-level texts, which can be expected since the documents deal with advanced scientific topics. However, AIIRLab runs obtained with bi- or cross-encoders and ordered according to comb scores exhibit significantly higher FKGL readability scores. This difference is related to longer sentences being retrieved with this score than with the relevance score (average length of 31 words vs. 23 words).</p>
<p>Only one run (Sharingans_Task1_marco-GPT3) provided a rephrased extract from the retrieved abstracts, while the other runs gave the abstracts in full. This is reflected in the table by a smaller vocabulary size in their passages.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Discussion and Conclusions</title>
<p>This concludes the results for the CLEF 2024 SimpleText Task 1: Content Selection, retrieving passages to include in a simplified summary. Our main findings are the following. First, the relevance tables are dominated by neural rankers, in particular cross-encoders and LLaMA 3 used as a pairwise reranker. Second, a majority of participants relied on ElasticSearch search results. While neural models in later processing steps leveraged these results, other IR systems turned out to be competitive: for instance, LIA_vir_title, operating with sentence embeddings, and UAms_Task1_Anserini_rm3, using an Anserini index, obtained high relevance evaluations. Third, as expected, the ranking of systems differs according to the considered criterion. Runs filtered against readability measures tend to have shorter sentences, with a more or less pronounced drop in relevance. Remarkably, LLaMA 3 used as a reranker seems to help select not only more relevant documents but also more concise sentences.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This track would not have been possible without the great support of numerous individuals. We want to
thank in particular the colleagues and the students who participated in data construction and evaluation.
Please visit the SimpleText website for more details on the track.8</p>
<p>Liana Ermakova is funded by the French National Research Agency (ANR) Automatic Simplification of Scientific Texts project (ANR-22-CE23-0019-01),9 and the MaDICS research group.10
8https://simpletext-project.com/
9https://anr.fr/Project-ANR-22-CE23-0019
10https://www.madics.fr/ateliers/simpletext/</p>
      <p>[5] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su, Arnetminer: Extraction and mining of academic social networks, in: KDD’08, 2008, pp. 990–998.
[6] E. SanJuan, S. Huet, J. Kamps, L. Ermakova, Overview of the CLEF 2023 SimpleText task 1: Passage selection for a simplified summary, in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023, volume 3497 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 2823–2834. URL: https://ceur-ws.org/Vol-3497/paper-238.pdf.
[7] D. P. Varadi, A. Bartulović, SimpleText 2024: Scientific Text Made Simpler Through the Use of AI, in: [15], 2024.
[8] S. M. Ali, H. Sajid, O. Aijaz, O. Waheed, F. Alvi, A. Samad, Improving Scientific Text Comprehension: A Multi-Task Approach with GPT-3.5 Turbo and Neural Ranking, in: [15], 2024.
[9] R. Mann, T. Mikulandric, CLEF 2024 SimpleText Tasks 1-3: Use of LLaMA-2 for text simplification, in: [15], 2024.
[10] R. Elagina, P. Vučić, AI Contributions to Simplifying Scientific Discourse in SimpleText 2024, in: [15], 2024.
[11] N. Largey, R. Maarefdoust, S. Durgin, B. Mansouri, AIIR Lab Systems for CLEF 2024 SimpleText: Large Language Models for Text Simplification, in: [15], 2024.
[12] B. Vendeville, L. Ermakova, P. De Loor, UBO NLP report on the SimpleText track at CLEF 2024, in: [15], 2024.
[13] J. Bakker, G. Yüksel, J. Kamps, University of Amsterdam at the CLEF 2024 SimpleText Track, in: [15], 2024.
[14] A. Capari, H. Azarbonyad, G. Tsatsaronis, Z. Afzal, Enhancing Scientific Document Simplification through Adaptive Retrieval and Generative Models, in: [15], 2024.
[15] G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vezzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bonato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
<article-title>Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult concepts</article-title>
          ,
          <source>in: [15]</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Laimé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>McCombie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text</article-title>
          ,
          <source>in: [15]</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
<string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kabongo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. B.</given-names>
            <surname>Giglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 SimpleText Task 4: SOTA? Tracking the State-of-the-Art in Scholarly Publications</article-title>
          , in: [15],
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
<string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>SanJuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vezzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 SimpleText track: Improving access to scientific texts for everyone</article-title>
, in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Quénot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>