<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF 2024, Grenoble, France</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF 2024 SimpleText Task 3: Simplify Scientific Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liana Ermakova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentin Laimé</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helen McCombie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université de Bretagne Occidentale</institution>
          ,
          <addr-line>BTU</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Bretagne Occidentale</institution>
          ,
          <addr-line>HCTI</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This article provides a comprehensive summary of the CLEF 2024 SimpleText Task 3, which focuses on simplifying scientific text based on specific queries. We discuss in detail the motivation for lay access to scholarly literature, and provide an overview of the setup of the scientific text simplification task. One of the main innovations of the CLEF 2024 SimpleText Task 3 is to complement sentence-level text simplification with a document-level text simplification task. We describe the resulting sentence-level and document-level text simplification test collection in detail, which consists of a corpus of over 1,500 paired source and reference sentences, and a corpus of over 250 paired source and reference abstracts, both containing the source text from scientific abstracts with direct reference simplifications produced by human annotators. We present the results of the participants’ submissions, with 15 teams submitting 52 sentence-level text simplification runs and 9 teams submitting 31 document-level text simplification runs. The article concludes with an in-depth analysis, including information distortion and potential LLM “hallucinations” in the simplified sentences submitted by participants.</p>
      </abstract>
      <kwd-group>
        <kwd>automatic text simplification</kwd>
        <kwd>science popularization</kwd>
        <kwd>information distortion</kwd>
        <kwd>error analysis</kwd>
        <kwd>lexical complexity</kwd>
        <kwd>syntactic complexity</kwd>
        <kwd>LLMs hallucination</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>1. Task 1 on Content Selection: retrieve passages to include in a simplified summary.
2. Task 2 on Complexity Spotting: identify and explain difficult concepts.</p>
      <sec id="sec-1-1">
        <title>3. Task 3 on Text Simplification: simplify scientific text.</title>
        <p>4. Task 4 on SOTA?: track the state-of-the-art in scholarly publications.</p>
        <p>
          This paper presents an overview of the CLEF 2024 SimpleText Task 3 on Text Simplification. For a
comprehensive overview of the other tasks, the task overview papers on Task 1 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], Task 2 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and
Task 4 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], as well as the track overview paper [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], provide detailed information and further insights.
        </p>
        <p>
          The CLEF 2024 SimpleText Task 3 directly addresses the technical and evaluation challenges associated
with making scientific information accessible to a wide audience, including students and non-experts. We
describe the data and benchmarks provided for scientific text simplification, along with the participants’
results and further analysis. This task on simplifying scientific text is a direct continuation of the CLEF
2023 Task 3 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. One of the key innovations in 2024 is the introduction of both sentence-level and
document (abstract) level scientific text simplification subtasks, as Task 3.1 and Task 3.2.
        </p>
        <p>A total of 45 teams registered for our SimpleText track at CLEF 2024. Overall, 20 teams submitted
207 runs for the track, of which 15 teams submitted a total of 83 runs for Task 3. The statistics
for the submitted Task 3 runs are presented in Table 1. However, some runs had problems that we could
not resolve; we do not detail these runs, nor the zero-scored runs, in this paper.</p>
        <p>This introduction is followed by Section 2, presenting the text simplification task with the datasets and
evaluation metrics used. Section 3 gives an overview of the text simplification approaches for scientific text
deployed by the participants. In Section 4, we present and discuss the results of the official submissions.
In Section 5, a thorough analysis of the results is carried out, covering several important aspects: the
relationship between difficult scientific terms and the simplification process, information distortion that
may occur during simplification, and instances of large language models (LLMs) generating hallucinations
and producing inaccurate information. Section 6 concludes by summarizing the findings and drawing
perspectives for future work.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Task 3: Simplify Scientific Text</title>
      <p>This section details Task 3: Text Simplification, on simplifying scientific text.</p>
      <sec id="sec-2-1">
        <title>2.1. Description</title>
        <p>The goal of this task is to provide a simplified version of sentences extracted from scientific abstracts.
Participants are provided with popular science articles and queries, along with matching abstracts of
scientific papers, either split into individual sentences or as entire abstracts. This year features
both sentence-level (Task 3.1) and document- or abstract-level (Task 3.2) text simplification.</p>
        <p>Table 2 shows an example of a human reference simplification, combining the input sentences
belonging to the abstract of the document with id 130055196 retrieved for query G01.1. In Table 2, the
deletions and insertions are shown relative to the source input sentences. The resulting reference
simplification reads: “The rise of output devices like high-resolution printers and PDA (Personal Digital
Assistant) displays has increased the need for high-quality resolution conversion. The paper proposes a
new method to make images bigger while maintaining high quality. The main issue with enlarging images
is that jagged edges can become exaggerated. To solve this problem, we suggest a new interpolation
method that helps us to estimate the value of the newly generated pixels using a neural network. The
experiment’s results are presented and analyzed. We evaluate the effectiveness of our methods by
comparing them to traditional approaches.”
2.1.1. Data</p>
        <p>Table 3: Statistics of the Task 3 corpora.
Level | Role | Source | Reference
Sentence | Train | 893 sentences | 958 simplified sentences
Sentence | Test | 578 sentences | 578 simplified sentences
Sentence | Combined | 1,471 sentences | 1,536 simplified sentences
Document | Train | 175 abstracts | 175 simplified abstracts
Document | Test | 103 abstracts | 103 simplified abstracts
Document | Combined | 278 abstracts | 278 simplified abstracts</p>
        <p>Task 3 uses a corpus based on the high-ranked abstracts retrieved for the requests of the CLEF 2024
SimpleText Task 1. Our training data is a truly parallel corpus of directly simplified sentences coming
from scientific abstracts from the DBLP Citation Network Dataset for Computer Science and Google
Scholar and PubMed articles on Health and Medicine. Other existing text simplification corpora used
post-hoc aligned sentences [e.g., 13].</p>
        <p>In 2024, we expanded the training and evaluation data. In addition to sentence-level text simplification,
we provide document-level or abstract-level input and reference simplifications. In order to make
the sentence-level and document-level tasks fairly comparable, both use the exact same reference
simplifications. The sentences from scientific abstracts were simplified either by master
students in Technical Writing and Translation or by a domain expert (a computer scientist) and a
professional translator (a native English speaker) working together.</p>
        <p>Table 3 gives an overview of all the SimpleText Task 3 scientific text simplification corpora constructed
in 2024. The SimpleText corpus contains 1,536 directly simplified sentences, corresponding to 278
scientific abstracts. This is a useful addition to existing high-quality corpora like Newsela [ 13], with
2,259 sentences in Newsela-Manual. Our track is the first to focus on the simplification of scientific text,
which has a much higher text complexity than news articles.</p>
        <p>
          Available Task 3 training data is derived from the CLEF 2023 edition [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and includes 893 source
sentences from 175 scientific abstracts paired with the corresponding manual reference simplifications.
The new test data created in 2024 consists of 578 sentences paired with reference simplifications
for the sentence-level task (Task 3.1), and 103 abstracts paired with reference simplifications for the
document-level task (Task 3.2).
2.1.2. Formats
Sources The source data are provided in JSON format with the following fields:
1. snt_id (Task 3.1) or abs_id (Task 3.2): a unique sentence (or abstract) identifier
2. source_snt (Task 3.1) or source_abs (Task 3.2): passage text (sentence or abstract)
3. doc_id: a unique source document identifier
4. query_id: a query ID
5. query_text: difficult terms should be extracted from sentences with regard to this query
An example of the Task 3.1 JSON source input is:
{
  "query_id": "G11.1",
  "query_text": "drones",
  "doc_id": 2892036907,
  "snt_id": "G11.1_2892036907_2",
  "source_snt": "With the ever increasing number of unmanned aerial vehicles getting involved in activities in the civilian and commercial domain, there is an increased need for autonomy in these systems too."
}
References The references are provided in a format very similar to that of the predictions described
below. An example of a Task 3.1 reference in JSON is:
{
  "snt_id": "G11.1_2892036907_2",
  "simplified_snt": "Drones are increasingly used in the civilian and commercial domain and need to be autonomous."
}
Predictions Predictions or submissions of participants were also requested in JSON format with
the following fields:
1. run_id: Run ID starting with &lt;team_id&gt;_&lt;task_id&gt;_&lt;method_used&gt;, e.g. UBO_Task3.1_BLOOM
2. manual: Whether the run is manual {0,1}
3. snt_id (Task 3.1) or abs_id (Task 3.2): a unique sentence or abstract identifier from the input file
4. simplified_snt (Task 3.1) or simplified_abs (Task 3.2): simplified text for the sentence or abstract
An example of the Task 3.1 submission in JSON is:
{
  "run_id": "Elsevier@SimpleText_Task3.1_run1",
  "manual": 0,
  "snt_id": "G11.1_2892036907_2",
  "simplified_snt": "As more and more drones are used for civilian and commercial purposes, there is a growing need for them to operate independently."
}
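As an illustration, the required fields of a submission file can be checked with a few lines of Python. This is a hypothetical helper, not part of the official task tooling; only the field names above are taken from the task description:

```python
import json

# Required fields per subtask, as listed in the format description.
REQUIRED = {"3.1": {"run_id", "manual", "snt_id", "simplified_snt"},
            "3.2": {"run_id", "manual", "abs_id", "simplified_abs"}}

def validate_run(records, task="3.1"):
    """Return a list of (record index, problems) for a parsed JSON run."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED[task] - rec.keys()
        if missing:
            problems.append((i, sorted(missing)))
        elif rec["manual"] not in (0, 1):
            problems.append((i, ["manual must be 0 or 1"]))
    return problems

run = json.loads('[{"run_id": "UBO_Task3.1_BLOOM", "manual": 0, '
                 '"snt_id": "G11.1_2892036907_2", "simplified_snt": "..."}]')
print(validate_run(run))  # → []
```

A well-formed record yields no problems; a record with missing fields or an out-of-range manual flag is reported with its index.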
2.1.3. Evaluation
In 2024, we emphasize large-scale automatic evaluation measures (SARI, BLEU, compression, readability)
that provide a reusable test collection. This automatic evaluation will be supplemented with a detailed
human evaluation of other aspects, essential for deeper analysis. Almost all participants used generative
models for text simplification, yet existing evaluation measures are blind to potential hallucinations
with extra or distorted content [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In 2024, we provide further analysis of ways to detect and quantify
spurious content in the output, potentially corresponding to what is informally called “hallucinations.”
        </p>
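        <p>To illustrate the main automatic measure, a toy, unigram-only, single-reference variant of SARI can be sketched as follows. This is only a sketch of the idea: the official SARI uses n-grams up to length four, multiple references, and precision-based deletion scoring, so its scores will differ, and in practice an established implementation (e.g., the EASSE toolkit) should be used:

```python
def unigram_sari(source: str, prediction: str, reference: str) -> float:
    """Toy SARI: mean F1 of keep/add/delete word operations (unigrams only)."""
    s, p, r = set(source.split()), set(prediction.split()), set(reference.split())

    def f1(sys, gold):
        if not sys and not gold:   # nothing to do and nothing done: perfect
            return 1.0
        tp = len(sys & gold)
        prec = tp / len(sys) if sys else 0.0
        rec = tp / len(gold) if gold else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    keep = f1(s & p, s & r)    # words kept that the reference also keeps
    add = f1(p - s, r - s)     # words added that the reference also adds
    delete = f1(s - p, s - r)  # words deleted that the reference also deletes
    return (keep + add + delete) / 3
```

        Rewarding deletions is what distinguishes SARI from pure overlap measures like BLEU: an output that deletes the same material as the human reference is credited, not penalized.</p>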
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Scientific Text Simplification Approaches</title>
      <p>In this section, we discuss a range of text simplification approaches that have been applied to scientific
text as provided by the track. A total of 15 teams submitted 83 runs in total.</p>
      <p>AB/DPV Varadi and Bartulović [14] submitted one run for Task 3. Their approach is an LSTM model
for the sentence-level task.</p>
      <p>AIIRLab Largey et al. [15] submitted a total of eight runs for Task 3. Their approach uses LLaMA3 and
Mistral models with diferent prompting and fine-tuning, for both the sentence-level and abstract-level
tasks.</p>
      <p>Arampatzis (No paper received) submitted a total of eight runs for Task 3. Their approach is a range
of models (DistilBERT, T5) for both the sentence-level and abstract-level tasks.</p>
      <p>Dajana/Katya (No paper with run details received) submitted one run for Task 3. Their approach
which follows standard text simplification approaches is applied to the sentence-level task.
Elsevier Capari et al. [16] submitted a total of ten runs for Task 3. Their approach is based on
a GPT-3.5 model experimenting with zero-shot and few-shot prompts for both sentence-level and
abstract-level tasks.</p>
      <p>Frane/Andrea (No paper with run details received) submitted one run for Task 3. Their approach
which follows standard text simplification approaches is applied to the sentence-level task.
Petra/Diana Elagina and Vučić [17] submitted one run for Task 3. Their approach is a LLaMA model
for the sentence-level task.</p>
      <p>PiTheory (No paper with run details received) submitted a total of twenty runs for Task 3. Their
approach uses pre-trained BART and T5 models but contains very few results for both the sentence-level
and abstract-level tasks.</p>
      <p>Ruby (No paper received) submitted two runs for Task 3. Their approach uses standard models for
both sentence-level and abstract-level tasks.</p>
      <p>Sharigans Ali et al. [18] submitted a total of two runs for Task 3. Their approach is a GPT-3.5 model
for both the sentence-level and abstract-level tasks.</p>
      <p>SONAR (No paper received) submitted a single run for Task 3. Their approach is a standard model
for the sentence-level task.</p>
      <p>Tomislav/Rowan Mann and Mikulandric [19] submitted a total of two runs for Task 3. Their approach
is the LLama 2 model with a range of prompts and post-processing for both the sentence-level and
abstract-level tasks. Their submission only covers a part of the train topics.</p>
      <p>UAmsterdam Bakker et al. [20] submitted a total of ten runs for Task 3. They experiment with
GPT-2, and Wiki and Cochrane-trained models at the sentence, paragraph, and document-level text
simplification, for both sentence-level and document-level tasks.</p>
      <p>UBO Vendeville et al. [21] submitted a total of four runs for Task 3. Their approach is to prompt a
smaller Phi3 model for lexical and grammatical text simplifications, for both the sentence-level and
abstract-level tasks.</p>
      <p>UZHPandas Michail et al. [22] submitted a total of ten runs for Task 3. They experiment with a
multi-prompt Minimum Bayes Risk (MBR) decoding approach to the sentence-level task. Their approach
is a refinement of their CLEF 2023 approach, which was recognized with a prestigious Best of the Labs
award, and published as part of the CLEF 2024 LNCS proceedings [23].</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section details the results of the task, for both the sentence-level and abstract-level text simplification
subtasks.</p>
      <sec id="sec-4-1">
        <title>4.1. Task 3.1: Sentence-level scientific text simplification</title>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Task 3.2: Abstract-level scientific text simplification</title>
        <p>Table 5 shows the Task 3.2 (abstract-level text simplification) results. Again, we restrict the table to
submissions covering a sufficient number of input abstracts.</p>
        <p>We make a number of observations. First, in terms of evaluation measures like SARI we see again
similar encouraging performance levels when evaluating against the human reference simplifications.
This is partly due to the use of proven sentence-level text simplification models with the output merged
back into the entire abstract. Second, there remains room for improvement in capturing the human
simplifications more closely, as the BLEU score remains low throughout. Here, the more conservative
approaches seem to obtain better scores. Third, we see less extreme values on the other indicators, but
still considerable variation in the compression ratio, the number of splits, and the proportions of additions
and deletions. We investigate below how much of the output is grounded in the source sentences and
abstracts.</p>
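        <p>The surface statistics reported in the result tables can be approximated with a short sketch. This is illustrative only; the tokenization and sentence splitting used in the official evaluation may differ:

```python
import re

def surface_stats(source: str, simplified: str) -> dict:
    """Compression ratio (output/input words) and number of sentence splits."""
    def sents(text):
        return [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_src, n_out = len(source.split()), len(simplified.split())
    return {
        "compression": n_out / n_src,
        "splits": max(0, len(sents(simplified)) - len(sents(source))),
    }

print(surface_stats("The jaggy edges are exaggerated on image enlargement.",
                    "Images get jagged edges. Enlarging makes this worse."))
# → {'compression': 1.0, 'splits': 1}
```

        A compression ratio well below 1 indicates deletion-heavy output; a high split count indicates that long source sentences were broken up.</p>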
        <p>Many submissions rely on proven sentence-level text simplification approaches, with results closely
mirroring those observed for the sentence-level task. It is encouraging to see solid performance for the
approaches that perform text simplification on the entire abstract in one pass. This holds the promise of
incorporating the discourse structure, using more complex text simplification operations such as deletions
and merges, and deploying planner-based approaches to the text simplification of long documents.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Train results</title>
        <p>In this section, we show the results over the train data for sentence-level and abstract-level scientific
text simplification. This analysis includes those submissions restricted to the train data and left out above.
4.3.1. Task 3.1: Sentence-level scientific text simplification
Table 6 shows the sentence-level text simplification results for the train data.</p>
        <p>We make the following observations. First, we observed very high performance, with SARI scores
up to 65% for systems fine-tuned on the train data. Even more striking are the very high BLEU scores of
over 50%. This is a signal of potential overfitting, although the top-performing systems on train still
perform reasonably on the new test data. The majority of runs perform similarly on train and test, as
expected, since most are not particularly trained or fine-tuned on the relatively small set of train
sentences and abstracts.</p>
        <p>Second, we observe again a clear reduction of FKGL readability, in particular for systems with a
high proportion of sentence splits. We repeat the proviso that shorter sentences, and shorter or more
common words, are only a weak proxy for text complexity, as complex terminology and brief
abbreviations may remain and stay opaque for lay users. A very simple grammar is common at youth
reading levels, such as those targeted by the popular Newsela-auto [13] data, making FKGL a popular
readability score. However, in plain English summaries of scientific text we do not observe such a
reduction [25].</p>
        <p>Third, while we observe higher scores on the train data in Table 6 than on the test data above,
there still seems to be room for improvement. Throughout the table, we see many low BLEU scores,
and very high fractions of additions risk the gratuitous introduction of new content, and hence
“hallucination.”
4.3.2. Task 3.2: Abstract-level scientific text simplification
Table 7 shows the abstract-level text simplification results for the train data.</p>
        <p>We make the following observations. First, we observe higher scores for systems that deploy
fine-tuning, which does not seem to generalize to the unseen test evaluation above. Most systems,
however, were not particularly trained or fine-tuned on the train data and show similar performance on
both train and test.</p>
        <p>Second, we observe solid performance for the more complex document-level scientific text
simplification task, but this is largely due to many systems deploying proven sentence-level text
simplification technology and merging the sentence-level output back into complete abstracts.</p>
        <p>Third, while a sentence-level approach to document-level text simplification is a pragmatic choice
and viable strategy, several models perform direct abstract-level or paragraph-level simplification, taking
the discourse structure and more complex sentence reordering and deletion into account. These
document-level text simplification approaches tend to lead to far greater compression, including
whole-sentence deletions, making their output far more succinct than sentence-level approaches to
document-level text simplification. Given their succinct output, and in light of the sentence-level
construction of the human reference simplifications, the scores of the direct abstract-level or
paragraph-level approaches are impressive. Further research on such document-level text simplification
approaches would be important for the future of the CLEF SimpleText track.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Analysis</title>
      <p>This section provides further analysis of the submitted runs, and the task as a whole.</p>
      <p>[Table 8: Example system output for the abstract of document 130055196 (cf. Table 2), with whole
inserted sentences that have no counterpart in the source, e.g., “It will involve using a combination of
high-speed imaging and high-resolution video.”, “The results are compared to other studies and found to be
inconclusive.”, and “Our methods are designed to help people with mental health problems, not just as a
way to cure them.”]</p>
      <sec id="sec-5-1">
        <title>5.1. Human Evaluation</title>
        <p>Due to the delayed submission deadline, as well as follow-up correspondence with teams on partial or
incorrect output, the manual annotation of system output has been limited to a small sample and is still
ongoing. We report here only initial observations from the translation professionals conducting this
analysis, based on the expectation of what a professional editor would provide as reference output. We
looked in particular at the novel document-level simplifications of entire abstracts, and their coherence
and discourse structure.</p>
        <p>First and foremost, something is working. The automatic text simplifications are generally of
impressive quality, despite the remaining limitations that are the focus of this section. The fluency
and language variation are impressive, and far exceed earlier language generation technology, which
often reflected the protocols, templates, or rule-based systems underlying it.</p>
        <p>
          Second, changes can be neither necessary nor helpful. Frequently, as we observed in our work on the
project last year [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the information is written in another way but does not offer simplification.
Sometimes the vocabulary does not change but is simply rearranged.
        </p>
        <p>Third, discourse structure matters. In other examples, the resulting text is not shaped as a whole,
with a proper beginning, middle, and end, but is reordered to the detriment of clarity. For example, the
first sentence of the “simplified” abstract can contain a reference back to information already given.
Another example: a first sentence starting with “However, . . . ” in the simplification when the source text
started with “It is the purpose of this study, . . . ”, or with “For example, . . . ” when the original first
sentence presented the subject.</p>
        <p>Fourth, brevity does not always mean clarity. Although some examples shorten the sentences within an
abstract, thus technically simplifying, their interrelation is not necessarily maintained, producing a
choppy style. Better results were produced when the new text was split into subsections dedicated to
particular subtopics, including their explanation.</p>
        <p>Fifth, gratuitous additions are problematic. Another type of problem is illustrated by the creation
of a cumbersome nominal group, “the 21st Century managed care needs of patients, . . . ”, which does
not exist in the original, where we instead had an evocative example: “the emergency room at home.”
Here, though, both things belong to the same domain. Elsewhere, seeming hallucinations appeared,
for example, through the addition of an off-topic sentence. For example, in an abstract about digital
tools to aid Parkinson’s sufferers, we found the following last sentence added during simplification:
“It includes advice on how to manage consultant work, such as research and development.” Although, in
terms of meaning, this has no equivalent in the source text, the source text’s opening sentence was: “The
paper also discusses how a practitioner can accomplish UCSD in the context of product development and
consultant work.”, which mentions the topic in a different context.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Spurious or overgeneration</title>
        <p>We conduct a deeper analysis of how much of the generated simplified output sentences and abstracts
can be traced to the source input. In particular, we look at spurious generated content and its prevalence
in the submitted text simplifications. Such content is at risk of having been introduced gratuitously by
the generative model, and corresponds to what is informally referred to as “hallucinations.”</p>
        <p>Earlier, in Table 2, we showed an example of a human reference simplification, combining the input
sentences belonging to the abstract of the document with id 130055196 retrieved for query G01.1. We can
do the same for the automatically generated scientific text simplifications, again showing the deletions
and insertions relative to the source input sentences. Table 8 shows an example output simplification of
one of the participating teams, for the same input sentences as in Table 2 above. Most simplifications
are revisions of the input, but we also observe that sometimes an entire sentence is inserted (shown as
xxx in Table 8). The example in Table 8 is an extreme case, picked to illustrate both the importance and
complexity of detecting such spurious content.</p>
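        <p>The deletion and insertion views in Tables 2 and 8 can be reproduced with a standard sequence alignment. A minimal sketch using Python's difflib, where the [-...-] and {+...+} markup conventions are ours, for illustration:

```python
import difflib

def word_diff(source: str, simplified: str) -> str:
    """Render word-level deletions as [-...-] and insertions as {+...+}."""
    src, out = source.split(), simplified.split()
    parts = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=src, b=out).get_opcodes():
        if op in ("delete", "replace"):
            parts.append("[-" + " ".join(src[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            parts.append("{+" + " ".join(out[j1:j2]) + "+}")
        if op == "equal":
            parts.append(" ".join(src[i1:i2]))
    return " ".join(parts)

print(word_diff("The experimental results are shown and evaluated .",
                "The experiment 's results are presented and analyzed ."))
```

        The alignment is word-based; the same opcodes can drive any rendering, such as strikethrough for deletions in a print layout.</p>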
        <p>We provide a detailed analysis quantifying the prevalence of spurious content in the CLEF 2024
SimpleText Task 3 submissions. Table 9 quantifies how often such spurious generation occurs. We
re-aligned the generated output with the original source sentences, and flag here only entire output
sentences that do not share a single token with the input. Our analysis reveals that the amount of
spurious content varies but is far from infrequent. A total of 17 out of 36 submissions (47%) have spurious
whole sentences in at least 10% of the input sentences. In fact, 14 submissions (39%) have them in at least
20% of the input, and 7 submissions (19%) in at least 50% of the input sentences. The detection of
non-aligned output sentences is indicative but imperfect. For example, a significant reordering of content
may lead to false positives in rare cases, and unusual tokenization or formatting may affect the alignment
with the source, even systematically. Note also that the detected additions may introduce helpful background
knowledge or other useful information to contextualize the information in the source sentences.</p>
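        <p>The flagging of non-aligned output sentences described above can be approximated as follows. This is an illustrative reimplementation; our actual tokenization and alignment may differ:

```python
import re

def spurious_sentences(source: str, output: str) -> list:
    """Output sentences sharing no token with the source input: candidates
    for spurious ('hallucinated') content."""
    def tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    src = tokens(source)
    sents = [s.strip() for s in re.split(r"[.!?]+", output) if s.strip()]
    return [s for s in sents if not tokens(s) & src]

print(spurious_sentences(
    "The experimental results are shown and evaluated .",
    "The results are presented. It includes advice on managing consultant work."))
# → ['It includes advice on managing consultant work']
```

        As noted above, significant reordering or unusual tokenization can still cause occasional false positives.</p>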
        <p>Table 10 quantifies how often such spurious generation occurs for the abstract-level output. Here we
look again at the spurious output at the end of the input abstract, rather than conducting a sentence-level
analysis as done above. Aligning longer texts is more complex than aligning individual sentences. For those generating true
paragraph or document level simplifications, we observe more variation involving content of multiple
input sentences leading to a more complex alignment. Hence we focus on detecting spurious content at
the end of the generated abstract. As a result, for those aggregating sentence-level output merged into
the abstracts, we are only able to detect spurious content for the final sentence.</p>
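        <p>The end-of-abstract check described here can be sketched in the same spirit as the sentence-level flagging: walk backwards from the end of the generated abstract and collect trailing sentences with no token overlap with the source. Again, this is illustrative and the official analysis may differ in detail:

```python
import re

def trailing_spurious(source_abs: str, output_abs: str) -> list:
    """Trailing output sentences that share no token with the source abstract."""
    def tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    src = tokens(source_abs)
    sents = [s.strip() for s in re.split(r"[.!?]+", output_abs) if s.strip()]
    tail = []
    for sent in reversed(sents):    # scan from the end of the abstract
        if tokens(sent) & src:      # first grounded sentence ends the tail
            break
        tail.append(sent)
    return list(reversed(tail))

print(trailing_spurious("We propose an interpolation method.",
                        "A new method is proposed. It helps mental health patients."))
# → ['It helps mental health patients']
```

        Restricting the check to the tail avoids the harder problem of aligning reordered or merged content in the middle of a document-level simplification.</p>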
        <p>We make a number of observations based on our analysis in this section. First, the fraction of
sentences with spurious content is very low for some submissions; for other submissions, however, the
fraction is very substantial. Second, the standard evaluation measures used for text simplification, and
in fact for any text generation task in NLP, do not take this aspect into account. A submission with
significant spurious content can still obtain very high text overlap with the reference, and hence obtain
a very high performance score. Third, and more generally, human evaluation and this type of analysis
seem crucial to accurately evaluate generative models for the NLP and IR challenges addressed in our
track and in CLEF in general.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>
        The paper provides an overview of the CLEF 2024 SimpleText Task 3: Text Simplification, which focuses
on the simplification of scientific text. The objective of the task is to simplify either the separate
sentences or the entire scientific abstracts in order to enhance their accessibility and comprehensibility
for a general audience. We highlighted the key aspects and goals of the task within the broader context
of the CLEF 2024 SimpleText track [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Our main findings are the following. First, we observe competitive performance for scientific text
simplification, both in evaluation against the human reference simplifications and in text statistics
such as the FKGL readability score. Second, the abstract-level text simplification results are a mixture of
sentence-level and passage-level text simplification approaches. Third, our analysis reveals a high
and varying rate of spurious text generation, which is not detected by standard evaluation measures and is a
major concern for the use of these models in a real-world setting. More generally, almost all participants
use generative models (for the task, the track, and CLEF in general), and the track offers a unique setting
to study some of the inherent limitations of generative models.</p>
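      <p>For reference, the FKGL score mentioned above follows the standard Flesch-Kincaid Grade Level formula. The sketch below uses a crude vowel-group syllable heuristic of our own, so its scores can deviate from dedicated tools such as textstat:</p>

```python
# Sketch of the Flesch-Kincaid Grade Level (FKGL) formula:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
# The syllable counter is an approximate heuristic, not a dictionary lookup.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (minimum 1 per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sents)) + 11.8 * (syllables / len(words)) - 15.59

complex_sentence = "The pharmacological intervention demonstrated statistically significant efficacy."
simple_sentence = "The drug worked well in our tests."
print(fkgl(complex_sentence) > fkgl(simple_sentence))  # True: the simplification reads at a lower grade level
```

      <p>Lower FKGL scores indicate text readable at a lower school grade level, which is why FKGL is a useful complement to reference-based overlap measures for this task.</p>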
      <p>
        The main aim of our task, the track, and the CLEF evaluation forum as a whole, is i) to foster a
community of IR, NLP, and AI researchers working together on the important task of making science
more accessible for everyone, and ii) to construct corpora and evaluation resources to stimulate research
on scientific text summarization and simplification. In terms of building a community researching
scientific text summarization and simplification, the task saw record attendance in 2024: thanks in part
to the additional abstract-level task, we received 83 runs from 15 teams, the largest number of participating
teams ever. In fact, the community is broadening beyond CLEF and raising general interest in generative
scientific text summarization and simplification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Within the CLEF 2024 SimpleText Task 3, we have constructed extensive corpora and manually
labeled evaluation data for scientific text simplification. Specifically, in 2024 we added a parallel
corpus of manually simplified sentences and abstracts from the scientific literature:
• Train, sentence level: 958 source sentences from scientific abstracts paired with corresponding
human reference simplifications.
• Test, sentence level: 578 source sentences from scientific abstracts paired with corresponding
human reference simplifications.
• Train, abstract level: 175 source scientific abstracts paired with corresponding human reference
simplifications.
• Test, abstract level: 103 source scientific abstracts paired with corresponding human reference
simplifications.</p>
      <p>These reusable corpora and evaluation resources are available to participants and other researchers who
want to work on the important problem of making scientific information open and easily accessible for
everyone.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This track would not have been possible without the great support of numerous individuals. We want to
thank in particular the colleagues and the students who participated in data construction and evaluation.
Please visit the SimpleText website for more details on the track.1</p>
      <p>Liana Ermakova is funded by the French National Research Agency (ANR) Automatic Simplification of
Scientific Texts project (ANR-22-CE23-0019-01),2 and the MaDICS research group.3 Jaap Kamps is partly
funded by the Netherlands Organization for Scientific Research (NWO CI # CISC.CC.016, NWO NWA #
1518.22.105), the University of Amsterdam (AI4FinTech program), and ICAI (AI for Open Government
Lab). Views expressed in this paper are not necessarily shared or endorsed by those funding the
research.</p>
      <sec id="sec-7-1">
        <title>Notes</title>
        <p>1. https://simpletext-project.com/ 2. https://anr.fr/Project-ANR-22-CE23-0019 3. https://www.madics.fr/ateliers/simpletext/</p>
        <p>[12] (continued) 2023, pp. 2855–2875. URL: https://ceur-ws.org/Vol-3497/paper-240.pdf.</p>
        <p>[13] W. Xu, C. Callison-Burch, C. Napoles, Problems in current text simplification research: New data can help, Transactions of the Association for Computational Linguistics 3 (2015) 283–297. URL: https://aclanthology.org/Q15-1021. doi:10.1162/tacl_a_00139.</p>
        <p>[14] D. P. Varadi, A. Bartulović, SimpleText 2024: Scientific Text Made Simpler Through the Use of AI, in: [26], 2024.</p>
        <p>[15] N. Largey, R. Maarefdoust, S. Durgin, B. Mansouri, AIIR Lab Systems for CLEF 2024 SimpleText: Large Language Models for Text Simplification, in: [26], 2024.</p>
        <p>[16] A. Capari, H. Azarbonyad, G. Tsatsaronis, Z. Afzal, Enhancing Scientific Document Simplification through Adaptive Retrieval and Generative Models, in: [26], 2024.</p>
        <p>[17] R. Elagina, P. Vučić, AI Contributions to Simplifying Scientific Discourse in SimpleText 2024, in: [26], 2024.</p>
        <p>[18] S. M. Ali, H. Sajid, O. Aijaz, O. Waheed, F. Alvi, A. Samad, Improving Scientific Text Comprehension: A Multi-Task Approach with GPT-3.5 Turbo and Neural Ranking, in: [26], 2024.</p>
        <p>[19] R. Mann, T. Mikulandric, CLEF 2024 SimpleText Tasks 1-3: Use of LLaMA-2 for text simplification, in: [26], 2024.</p>
        <p>[20] J. Bakker, G. Yüksel, J. Kamps, University of Amsterdam at the CLEF 2024 SimpleText Track, in: [26], 2024.</p>
        <p>[21] B. Vendeville, L. Ermakova, P. De Loor, UBO NLP report on the SimpleText track at CLEF 2024, in: [26], 2024.</p>
        <p>[22] A. Michail, P. S. Andermatt, T. Fankhauser, Scientific Text Simplification Using Multi-Prompt Minimum Bayes Risk Decoding: Examining MBR’s Decisions, in: [26], 2024.</p>
        <p>[23] A. Michail, P. S. Andermatt, T. Fankhauser, Scientific text simplification using multi-prompt minimum Bayes risk decoding: SimpleText best of labs in CLEF 2023, in: [27], 2024.</p>
        <p>[24] L. Ermakova, J. Kamps, Complexity-aware scientific literature search: Searching for relevant and accessible scientific text, in: G. M. Di Nunzio, F. Vezzani, L. Ermakova, H. Azarbonyad, J. Kamps (Eds.), Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 16–26. URL: https://aclanthology.org/2024.determit-1.2.</p>
        <p>[25] J. Bakker, J. Kamps, Plan-guided simplification of biomedical documents, in: Under Submission, 2024.</p>
        <p>[26] G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2024.</p>
        <p>[27] L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. Di Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, 2024.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] G. M. Di Nunzio, F. Vezzani, L. Ermakova, H. Azarbonyad, J. Kamps (Eds.), Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024. URL: https://aclanthology.org/2024.determit-1.0.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] S. Štajner, H. Saggion, M. Shardlow, F. Alva-Manchego (Eds.), Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability, INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 2023. URL: https://aclanthology.org/2023.tsar-1.0.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Štajner, H. Saggion, D. Ferrés, M. Shardlow, K. C. Sheang, K. North, M. Zampieri, W. Xu (Eds.), Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Virtual), 2022. URL: https://aclanthology.org/2022.tsar-1.0.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] H. Saggion, S. Štajner, D. Ferrés, K. C. Sheang (Eds.), Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021) co-located with the 37th Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), Online (initially located in Málaga, Spain), September 21st, 2021, volume 2944 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/Vol-2944.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] L. Ermakova, P. Bellot, P. Braslavski, J. Kamps, J. Mothe, D. Nurbakova, I. Ovchinnikova, E. SanJuan, Overview of SimpleText 2021 - CLEF workshop on text simplification for scientific information access, in: K. S. Candan, B. Ionescu, L. Goeuriot, B. Larsen, H. Müller, A. Joly, M. Maistro, F. Piroi, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction - 12th International Conference of the CLEF Association, CLEF 2021, Virtual Event, September 21-24, 2021, Proceedings, volume 12880 of Lecture Notes in Computer Science, Springer, 2021, pp. 432–449. URL: https://doi.org/10.1007/978-3-030-85251-1_27. doi:10.1007/978-3-030-85251-1_27.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] L. Ermakova, E. SanJuan, J. Kamps, S. Huet, I. Ovchinnikova, D. Nurbakova, S. Araújo, R. Hannachi, É. Mathurin, P. Bellot, Overview of the CLEF 2022 SimpleText lab: Automatic simplification of scientific texts, in: A. Barrón-Cedeño, G. D. S. Martino, M. D. Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction - 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, September 5-8, 2022, Proceedings, volume 13390 of Lecture Notes in Computer Science, Springer, 2022, pp. 470–494. URL: https://doi.org/10.1007/978-3-031-13643-6_28. doi:10.1007/978-3-031-13643-6_28.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] L. Ermakova, E. SanJuan, S. Huet, H. Azarbonyad, O. Augereau, J. Kamps, Overview of the CLEF 2023 SimpleText lab: Automatic simplification of scientific texts, in: A. Arampatzis, E. Kanoulas, T. Tsikrika, S. Vrochidis, A. Giachanou, D. Li, M. Aliannejadi, M. Vlachos, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction - 14th International Conference of the CLEF Association, CLEF 2023, Thessaloniki, Greece, September 18-21, 2023, Proceedings, volume 14163 of Lecture Notes in Computer Science, Springer, 2023, pp. 482–506. URL: https://doi.org/10.1007/978-3-031-42448-9_30. doi:10.1007/978-3-031-42448-9_30.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] E. SanJuan, S. Huet, J. Kamps, L. Ermakova, Overview of the CLEF 2024 SimpleText task 1: Retrieve passages to include in a simplified summary, in: [26], 2024.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] G. M. Di Nunzio, F. Vezzani, V. Bonato, H. Azarbonyad, J. Kamps, L. Ermakova, Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult concepts, in: [26], 2024.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. D'Souza, S. Kabongo, H. B. Giglou, Y. Zhang, Overview of the CLEF 2024 SimpleText Task 4: SOTA? Tracking the State-of-the-Art in Scholarly Publications, in: [26], 2024.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] L. Ermakova, E. SanJuan, S. Huet, H. Azarbonyad, G. M. Di Nunzio, F. Vezzani, J. D'Souza, J. Kamps, Overview of the CLEF 2024 SimpleText track: Improving access to scientific texts for everyone, in: [27], 2024.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] L. Ermakova, S. Bertin, H. McCombie, J. Kamps, Overview of the CLEF 2023 SimpleText task 3: Simplification of scientific texts, in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023, volume 3497 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 2855–2875. URL: https://ceur-ws.org/Vol-3497/paper-240.pdf.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>