<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UBONLP Report on the SimpleText Track at CLEF 2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Vendeville</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liana Ermakova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre De Loor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff1">
          <institution>Université de Bretagne Occidentale / HCTI</institution>
          ,
          <addr-line>Brest</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>ENIB / Lab-STICC (UMR CNRS 6285)</institution>
          ,
          <addr-line>Brest</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents the UBONLP team's participation in the SimpleText lab of CLEF 2024 in Task 1, "Selecting passages to include in a simplified summary", Task 2, "Difficult concept identification and explanation", and Task 3, "Given a query, simplify passages from scientific abstracts". Our goal is to use recent advances in natural language processing to help the public better understand scientific information. In Task 1 we show a method using TF_IDF and a neural reranker to retrieve scientific texts. In Task 2 we use a non-fine-tuned Phi3 mini to extract complicated terms. In Task 3 we use an LLM pipeline with separate syntactic and lexical simplifications.</p>
      </abstract>
      <kwd-group>
        <kwd>LLM</kwd>
        <kwd>Ranking</kwd>
        <kwd>information retrieval</kwd>
        <kwd>Neural reranking</kwd>
        <kwd>Term difficulty</kwd>
        <kwd>Automatic text simplification</kwd>
        <kwd>Science popularization</kwd>
        <kwd>Lexical simplification</kwd>
        <kwd>Syntactic simplification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The SimpleText track at CLEF 2024 proposed the following tasks:
• Task 1: Selecting passages to include in a simplified summary
• Task 2: Difficult concept identification and explanation
• Task 3: Simplification of scientific texts
– Subtask 3.1: Sentence-level simplification
– Subtask 3.2: Abstract-level simplification
• Task 4: SOTA: Tracking the state-of-the-art in scholarly publications.</p>
      <p>
        We participated in Tasks 1, 2 (subtask 1), and 3 (subtasks 1 and 2). For Task 1 we use PyTerrier
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to index documents, TF_IDF to rank them, and MonoT5 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to rerank the top results. For Task 2
we used Phi3 mini [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], an LLM, to extract and score complex terms in a one-shot prompt context [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
using no fine-tuning. For Task 3 we used Phi3 mini in a pipeline that separated syntactic and lexical
simplifications. Again, the model was not fine-tuned and used a one-shot prompt. We further tested
this method on the labeled training data.
      </p>
      <p>We will first present our method and results for Task 1. Then we will present the method, prompts,
and results for Task 2. In Section 4 we will present the method for Task 3 and study the results in detail.
We will see that our method for Task 3 can produce some results when separating lexical and syntactic
simplification.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task 1: Passage Selection for a Simplified Summary</title>
      <p>In this task, participants were provided with a dataset of abstracts with their metadata (author names,
title, year of publication, etc.). Participants were also provided with a set of references for training, and a
test dataset of queries. Task 1 consists of retrieving, for each query, the 100 most relevant documents.</p>
      <p>
        For Task 1, we first used PyTerrier (https://pyterrier.readthedocs.io/en/latest/) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a framework for creating information retrieval pipelines, to
index all documents. We wanted to use an LLM to rank abstracts, but the number of initial documents
was too great to practically run any model. Instead, we used TF_IDF to first rank all documents based
on their abstracts and titles and kept the 4000 most relevant documents. Then we used the MonoT5
reranker [
        <xref ref-type="bibr" rid="ref3 ref6">3, 6</xref>
        ] provided by PyTerrier to rerank all extracted documents and kept the 100 best.
      </p>
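      <p>As a rough illustration of this retrieve-then-rerank pattern, the following self-contained sketch first ranks documents with a simple TF-IDF score and then reranks the top candidates with a pluggable scoring function. The function names are illustrative, and the placeholder rerank_fn stands in for the neural MonoT5 reranker; the actual run used PyTerrier's TF_IDF weighting model rather than this hand-rolled scorer:
```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document against the query with a simple TF-IDF."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency of each token
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(sum(
            tf[t] * math.log(n / df[t])
            for t in query.lower().split() if t in tf
        ))
    return scores

def retrieve_then_rerank(query, docs, rerank_fn, k_first=4000, k_final=100):
    """First-stage TF-IDF ranking, then rerank the top k_first candidates
    with a second (e.g. neural) scorer and keep the k_final best."""
    scores = tfidf_scores(query, docs)
    first = sorted(range(len(docs)), key=lambda i: -scores[i])[:k_first]
    reranked = sorted(first, key=lambda i: -rerank_fn(query, docs[i]))
    return reranked[:k_final]
```
The two-stage design keeps the expensive reranker off the full collection: only the k_first TF-IDF survivors are rescored.</p>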
      <sec id="sec-2-1">
        <title>2.1. Metrics</title>
        <p>
          To measure the quality of the retrieved rankings, we use the following standard information retrieval metrics:
• MRR: The Mean Reciprocal Rank is a metric used to evaluate the performance of search engines,
recommendation systems, and other information retrieval systems. It measures the average rank
at which the first relevant item is found in the search results. The results vary from 0 to 1, with 1
being a perfect score, where relevant items appear at the top position for all queries.
• Prec10: Precision 10 measures the proportion of relevant items among the top 10 results returned
by the system. The value ranges from 0 to 1, with 1 being a perfect score where all of the top 10
results are relevant and 0 meaning no relevant results among the top 10.
• Prec20: Precision 20 is analogous to Prec10, but focuses instead on the top 20 results returned by
the system. The value ranges from 0 to 1, with 1 being a perfect score where all of the top 20
results are relevant and 0 meaning no relevant results among the top 20.
• NDCG10: The Normalized Discounted Cumulative Gain 10 metric is based on a normalization of
the Discounted Cumulative Gain, which gives a score based on the relevance of every result in
the top 10, weighted by their position. The values range from 0 to 1, with 1 being a perfect score
where the most relevant results appear at the top of the top 10 results, and 0 meaning no relevant
results among the top 10.
• NDCG20: The metric is the same as NDCG10 but focuses on the top 20. The values range from
0 to 1, with 1 being a perfect score where the most relevant results appear at the top of the top 20
results, and 0 meaning no relevant results among the top 20.
• Bpref: The Binary Preference metric is designed to handle situations where not all documents
have been judged for relevance. It measures the fraction of relevant documents ranked higher
than non-relevant documents, considering only judged documents. The values range from 0 to 1,
with 1 being a perfect score where all relevant documents rank higher than non-relevant results,
and 0 meaning no relevant documents rank higher than non-relevant results.
• MAP: The Mean Average Precision is a commonly used metric in information retrieval and
machine learning for evaluating the performance of ranking systems. It is the mean of the average
precision scores over a set of queries. The values range from 0 to 1, with 1 being a perfect score
where all relevant results are retrieved for each query, and 0 meaning no relevant results are
retrieved for any query.
        </p>
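        <p>As an illustration, MRR can be computed as follows. This is a generic sketch, not the official evaluation script; rankings maps each query id to its ranked document list, and relevant maps each query id to its set of judged-relevant documents:
```python
def mean_reciprocal_rank(rankings, relevant):
    """MRR over queries: average of 1/rank of the first relevant document
    (queries with no relevant document retrieved contribute 0)."""
    total = 0.0
    for qid, ranked_docs in rankings.items():
        for rank, doc in enumerate(ranked_docs, 1):
            if doc in relevant.get(qid, set()):
                total += 1.0 / rank
                break
    return total / len(rankings)
```
</p>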
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Results</title>
        <p>The run results, named UBO_Task1_TFIDFT5, can be found in Table 1. We observe that our method has low
precision, as indicated by the Prec10, Prec20 and MAP scores, but average results on the other metrics.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Task 2: Difficult Concept Identification and Explanation</title>
      <p>This Task is divided into three subtasks:
• Task 2.1: To identify the terms in a passage of a document and rate their difficulty as e, m,
or d (Easy/Medium/Difficult)
• Task 2.2: To generate a definition and an explanation only for the difficult terms
• Task 2.3: To retrieve the provided definitions of the difficult terms and rank them in the “correct”
order: manual (2, ground truth), generated positive 1 (1, correct definitions), generated positive 2
(1, correct definitions), generated negative 1 (0, incorrect definitions), generated negative 2 (0,
incorrect definitions).</p>
      <p>
        We participated in Task 2.1. For this subtask, participants were provided with a test dataset consisting
of sentences extracted from scientific documents. Participants were asked to, for each sentence, extract
complicated terms and rate their complexity as easy, medium, or difficult. Participants were also
provided with a training dataset consisting of another set of scientific texts with the corresponding
extracted terms, rated by difficulty. For this Task, we chose to use Phi3 mini [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a Small Language
Model optimized for following instructions. For models under 13 billion parameters, it showed
state-of-the-art performance on language understanding, mathematics, coding, long-term context, and logical
reasoning. We used it without fine-tuning, with a one-shot prompt, as follows.
      </p>
      <p>
        Table 2 shows the prompt used for Task 2.1. We decided to emphasize the importance of the format in
the query to improve the results’ interpretation. Additionally, we decided to prompt for complexity on
the [1, 2, 3] scale (1-Easy, 2-Medium, 3-Difficult) instead of the mandated [e, m, d] scale because it showed
improved performance in our manual tests. After generation we converted the generated results back
to the original scale using regular expressions.
      </p>
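      <p>The conversion back to the original scale can be sketched as follows. The exact regular expressions we used are not reproduced here, so this is an illustrative minimal version that also tolerates stray spaces and quotes around the model's answer:
```python
import re

SCALE = {"1": "e", "2": "m", "3": "d"}

def normalize_rating(raw):
    """Map a generated numeric difficulty (1/2/3) back to the mandated
    e/m/d scale, ignoring surrounding spaces, quotes, or other noise."""
    m = re.search(r"[123]", raw)
    return SCALE[m.group()] if m else None
```
</p>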
      <p>After the inference, we had a number of problems to solve on the generated data, with examples
shown in Table 3:
• Over-generations, with extra text after the JSON-like answer, such as:
}
&lt;|end|&gt;
&lt;|query|&gt; «input» &lt;|answer|&gt;
• Malformed double quotes in the JSON-like answer
– We fixed the missing double quotes with a regex and removed the duplicate double quotes
with a series of ".replace" methods
• Unneeded spaces in ratings
– We fixed this using a regex
• Converting the rating scale from [1, 2, 3] to [e, m, d]</p>
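      <p>The cleanup steps above can be sketched as a single post-processing function. The patterns below are illustrative assumptions, not the exact expressions used in the run: we truncate over-generation at the end of the JSON-like answer, collapse duplicated double quotes, and strip spaces around numeric ratings:
```python
import re

def clean_generation(raw):
    """Illustrative post-processing of a model answer: truncate anything
    after the closing brace, deduplicate double quotes, and remove
    spaces around 1/2/3 ratings."""
    # keep only the text up to the first closing brace of the JSON-like answer
    raw = raw.split("}", 1)[0] + "}"
    # collapse duplicated double quotes
    raw = raw.replace('""', '"')
    # remove spaces around ratings, e.g. ': 2 ' becomes ':2'
    raw = re.sub(r':\s*([123])\s*', r':\1', raw)
    return raw
```
</p>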
      <sec id="sec-3-1">
        <title>3.1. Metrics</title>
        <p>The results were evaluated using the following metrics:
• Recall Overall: the proportion of terms that were found, independently of the difficulty. The
results vary from 0 to 1, with 1 being a perfect score, where all expected terms were found.
• Recall Average: the average recall of terms when computed for each sentence. The results vary
from 0 to 1, with 1 being a perfect score, where all expected terms were found.
• Recall Difficult: the proportion of difficult terms that were found. The results vary from 0 to 1,
with 1 being a perfect score, where all expected difficult terms were found.
• Precision Difficult: the proportion of terms labeled as difficult that were indeed expected. The
results vary from 0 to 1, with 1 being a perfect score, where all terms labeled as difficult were
expected.
• bleu_nx: the BLEU score computed with n-grams n = 1, 2, 3, 4.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results</title>
        <p>The results for Task 2.1 can be found in Table 4. We can observe good scores on the recall-based metrics
(Recall Overall, Recall Average, and Recall Difficult), but our score gets much worse on the
precision-based metric Precision Difficult. This would indicate that our method had a tendency to
generate too many terms.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Task 3: Simplification of Scientific Texts</title>
      <p>In this Task, participants were asked to simplify scientific texts. It was divided into two subtasks:
• Task 3.1 focused on simplifying sentences. Participants were provided the following data:
– For training: 893 sentences with their manually written references.
– For testing: 578 sentences.
• Task 3.2 focused on whole abstracts. Participants were provided the following data:
– For training: 175 abstracts with their manually written references.
– For testing: 103 abstracts.</p>
      <sec id="sec-4-1">
        <title>Background</title>
        <p>Participants needed to provide the generated simplifications for both test subtasks.</p>
        <p>
          The literature divides simplification into two categories: lexical simplicity and syntactic simplicity [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
Lexical simplicity relates to the complexity of terms, while syntactic simplicity refers to the structure of
the sentence. Current neural methods, while aware of this distinction, do not explicitly provide lexicon-specific
simplification or syntax-specific simplification [
          <xref ref-type="bibr" rid="ref9 ref10">9, 10</xref>
          ]. An exception can be made for models trying to
simplify single words rather than entire texts [11], which only focus on lexical simplicity.
        </p>
        <p>
          Recently, Large Language Models have proven very effective at a variety of natural language
processing tasks [
          <xref ref-type="bibr" rid="ref5 ref12">5, 12</xref>
          ], including, to a lesser degree, text simplification [11]. One part of this success is the
use of carefully selected prompts to improve accuracy [10]. Another is the use of pipelines chaining
LLMs to take advantage of models specialized in a part of the task at hand. LLM chaining implies
dividing a task into multiple subtasks, defining a distinct LLM for each step, and using the output from
one LLM as an input to the next [13].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Research Questions</title>
        <p>In this task, we aimed to answer the following questions:
1. Can an LLM generate a proper lexicon-specific or syntax-specific simplification?
2. If so, is it interesting to successively perform lexical and syntactic simplification? Does the order
matter?
3. If we successively perform simplifications, is it relevant to simplify the syntax multiple times? Or
the lexicon?</p>
      </sec>
      <sec id="sec-4-3">
        <title>Approach</title>
        <p>
          We aim to study question 1 by building two systems: one for performing syntax-specific simplification
and one for performing lexicon-specific simplification. For question 2 we will successively perform syntactic
and lexical simplification. We will test both the “syntax-lexical” and “lexical-syntax” orders. Finally, to
answer the last question, we will extend testing with more successive simplifications. We will test those
runs using metrics such as FKGL, BLEU, SARI and other metrics provided by EASSE [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] as detailed in
the next section.
        </p>
        <sec id="sec-4-3-1">
          <title>4.1. Methodology</title>
          <p>We want to study the impact of chaining the generations. For that, we generate text using one prompt
and use the generated text as the input for the subsequent generation. This way, every generation is in
a separate context.</p>
          <p>We have two stages, lexical simplification and syntactic simplification, which we abbreviate
as l and s respectively. We generated and submitted two runs for the task: s (syntactic
simplification) and sl (syntactic simplification then lexical simplification).</p>
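          <p>The chaining described above can be sketched as follows, with each stage run in a fresh context and its output fed to the next. The generate function is a placeholder for a Phi3 mini call with the stage-specific prompt:
```python
def chain_stages(text, stages, generate):
    """Run simplification stages in sequence, e.g. stages=["s", "l"],
    feeding each stage's output to the next generation call."""
    for stage in stages:
        text = generate(stage, text)  # fresh context per stage
    return text
```
A run such as sl is then just chain_stages(text, ["s", "l"], generate), and the later slsl and lsls benchmark paths extend the stage list.</p>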
          <p>
            We decided to apply those strategies with Phi3 mini [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The small size of the model allowed us to
efficiently perform the successive inferences. Additionally, the model is intended for reasoning tasks,
which we believed would benefit the prompts we chose. We decided to test the model in a one-shot
prompt context [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], using no fine-tuning.
          </p>
          <p>We created a prompt for each one of the stages. We used queries that give an explanation of the task
followed by a single example. The prompts can be found in Table 5.</p>
          <p>For the syntax simplification stage, we try to focus the model on sentence splitting, something that
simplification models usually struggle with. Based on manual tests, we found that the best prompts do
not mention simplification and instead describe the transformations needed for simplification. Telling
the model to focus on listing the "smallest logic proposition" offered convincing results, with proper
format. Since models are usually conservative in sentence splitting, we chose an example (taken from
the abstract of [14]) that was manually simplified with excessive insistence on sentence splitting. In our
manual tests, this insistence made the models generate reasonable sentence splitting.</p>
          <p>For the lexical simplification stage, we found that talking about “difficult words” gave better results
than “complicated terms”; this may be due to the added complexity of identifying a term [15]. For the
example, we used sentences from different documents [16] that contained complicated, domain-specific
language.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.2. Metrics</title>
            <p>To evaluate runs, we use the following metrics:
• FKGL: The Flesch-Kincaid Grade Level [17] is a readability test designed to indicate how difficult
a passage of English text is to understand. It uses the average sentence length and the average
number of syllables per word. It provides a grade-level score that corresponds to the U.S. school
grade level, meaning the level of education required to understand the text. Higher means more
complex, with a theoretical lower bound of -3.40 and no upper bound.
• BLEU: The Bilingual Evaluation Understudy [18] metric is a method for evaluating the quality
of machine-translated text by comparing it to one or more reference translations. It compares
the n-grams in common between the reference and the generation. In simplification, it is used
by considering the task as a translation from “normal English” to “simple English”, considered a
different language. The score ranges from 0 to 1, 1 being a perfect score.
• SARI: The System output Against References and against the Input [19] metric is a text evaluation
metric specifically designed for assessing the quality of text simplification systems. It is calculated
based on the number of operations (addition, deletion, keep) needed to go from the input to the
generation, compared to a reference. The score ranges from 0 to 100, 100 being a perfect score.
• Compression ratio: The compression of the generated output compared to the reference.
Computed by taking the number of tokens present in both the generated output and the reference,
and comparing that to their total number of tokens. A higher score means the generation is more
compressed.
• Sentence splits: The number of sentence splits performed during generation. Higher means
more splits.
• Levenshtein similarity: The Levenshtein similarity metric is a measure of the similarity
between two strings. It quantifies the minimum number of single-character edits (insertions,
deletions, or substitutions) required to change one string into the other. In our case, we compare
the input and the generation. A higher score means a higher similarity.
• Exact copies: The number of generated sentences that are exact copies of the input.
• Additions proportion: The proportion of added words in the generation.
• Deletions proportion: The proportion of words deleted in the generation.
• Lexical complexity score: The lexical complexity is computed by taking the log-ranks of each
word in the frequency table and aggregating those words by their third quartile [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].</p>
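            <p>As an illustration of the Levenshtein similarity, a common normalization (which may differ in detail from the one EASSE implements) divides the edit distance by the length of the longer string:
```python
def levenshtein_similarity(a, b):
    """Normalized Levenshtein similarity between two strings:
    1 - edit_distance(a, b) / max(len(a), len(b))."""
    # classic dynamic-programming edit distance, row by row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    if not a and not b:
        return 1.0
    return 1 - prev[-1] / max(len(a), len(b))
```
</p>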
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3. Results</title>
          <p>Results for the submitted runs can be found in Table 6 for Task 3.1 and in Table 7 for Task 3.2. Full
results with all participants can be found in the appendix in Tables 12 and 13. We see good results
on SARI and FKGL, although results are very poor on BLEU. Our method also generates many more
sentence splits than other participants’ while having a smaller Levenshtein similarity.</p>
          <p>We wanted to further test our method. For that, we ran a benchmark using the labeled training data
to generate simplifications. This time we studied two “paths” for a generation: lsls and slsl.</p>
          <p>Once processed, we found very questionable scores, including over 45 sentence splits on average and
FKGL scores under 2. We filtered out some of these hallucinations by doing the following steps on each
path:
• Removing null or empty generations.
• Removing generations with prompt tokens like “&lt;|answer|&gt;” or “&lt;|query|&gt;”.
• Removing generations with repeating sentences.</p>
          <p>– ex: The advancements in AI technologies have led to [...] improved outcomes. &lt;|query|&gt; The
recent advancements in renewable [...]
– ex: There are recent developments [...] 2. The Turing Test, proposed by Alan Turing, is a
measure of [...] 3. Information provided by whistleblower Edward Snowden [...] 6. The Turing
Test, proposed by Alan Turing, is a measure of [...] 7. Information provided by whistleblower
Edward Snowden [...]
• Removing generations that did not contain alphabetical characters.</p>
          <p>– ex: 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 [...] 235.5 236
• Removing generations that had over 6 times as many characters as the source sentence.</p>
          <p>[Figure 1: Metric scores through the stages, with one panel per data type and path (snt lsls, snt slsl,
abs lsls, abs slsl). Metrics plotted: FKGL, BLEU, SARI, compression ratio, sentence splits, Levenshtein
similarity, exact copies, additions proportion, deletions proportion, and lexical complexity score.]</p>
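          <p>The filtering steps above can be sketched as a single predicate. The thresholds come from the text; the sentence-splitting and token-matching details are illustrative assumptions (the paper's prompt tokens are written with angle brackets, e.g. &lt;|answer|&gt;, and here we match on their inner part):
```python
import re

def keep_generation(src, gen):
    """Return True if a generated simplification of src passes the
    hallucination filters: non-empty, no leaked prompt tokens, no
    repeated sentences, contains letters, and under 6x source length."""
    if not gen or not gen.strip():
        return False
    if "|answer|" in gen or "|query|" in gen:   # leaked prompt tokens
        return False
    sentences = [s.strip(" .!?") for s in re.split(r"[.!?]\s+", gen) if s.strip(" .!?")]
    if len(sentences) != len(set(sentences)):   # repeating sentences
        return False
    if not re.search(r"[A-Za-z]", gen):         # no alphabetical characters
        return False
    if len(gen) > 6 * len(src):                 # runaway over-generation
        return False
    return True
```
</p>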
        </sec>
        <sec id="sec-4-3-4">
          <title>4.4. Scores through stages</title>
          <p>Table 8 lists all metric scores on the benchmark, and Figure 1 shows their evolution through the stages.</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>Analysis</title>
        <p>Generation examples can be found in the annex.</p>
        <p>Across all metrics and both data types (sentences and abstracts), we cannot directly see a general trend.
In Figure 2 we can compare the metrics on different stages and paths. First, we can see, as expected, that
the syntactic simplification stages always increase the number of sentence splits and the compression
ratio; however, we can see much higher results for sentences. On the sentence level, there is a noticeably
higher proportion of deletions but a much smaller number of additions.</p>
        <p>For the lexical simplification stages, we can see, as expected, a much lower initial score on compression
and sentence splitting. The lexical simplification stages also show a lower score on compression and
splitting than the previous syntactic simplification stage. On sentences, the l stage shows a higher
proportion of deletions than the s stage. The proportion of additions (comparable to the s stage) is still
higher than deletions, but by a smaller margin. On abstracts however, we see the opposite: like the s
stage, we see a higher proportion of deletions over additions, but, like sentences, the difference is smaller
for l than s.</p>
        <p>Figure 3 shows the scores of every stage of simplification for the FKGL, BLEU, SARI, and lexical
complexity metrics. These metrics provide less information about the generation, but are a better
(though imperfect [20]) evaluation of the simplicity of a text.</p>
        <p>First, we see that for sentence-level inference, BLEU often performs worse on syntactic simplification than on SL.
Unsurprisingly, FKGL shows a better performance on syntactic simplification than lexical simplification.</p>
        <p>Surprisingly, though, the lexical complexity score does not seem to change noticeably through the
stages, no matter the type of simplification. There is only a slight advantage for syntactic simplification
over SL on the first stage, which is unexpected. With the exception of the lexical complexity score, all of
these metrics perform much better on sentence-level inference than abstract-level. SARI shows a clear
preference towards syntactic simplification, but that difference decreases, especially for sentence-level
inference.</p>
        <p>Figure 4 shows the relative evolution of the metrics through the stages. For the compression ratio,
Levenshtein similarity, and additions and deletions proportions, we can see a general trend. While the
second stage sees large deltas, starting from the third stage we can see a convergence of the metrics.
Again, this result, while significant, is less strong when looking at abstract-level inference. We can
also observe that the result evolution is very similar for both the slsl and lsls paths. However, the paths
do not show a convergence on compression ratio and sentence splits until the fourth stage.</p>
        <p>When looking at the evolution (Figure 5) we do not see a strong general trend. The BLEU scores of
the paths seem to converge, but only on sentences and the slsl path, and the reason is that the score is close to its
minimum. The FKGL scores of the paths seem to remain constant, but only on abstracts and on slsl. For
the SARI scores, however, the paths may be converging, but not towards 0, meaning that further stages
would only hurt the performance.</p>
        <p>From these results, we can deduce multiple things. First, the fact that at each syntactic simplification
stage the number of sentence splits and the compression ratio increase indicates that this stage
reduces the number of unnecessary tokens and represents the facts in a more discrete way by generating
fewer tokens per sentence. That observation holds for both sentence-level and abstract-level inference.
However, the fact that we can see much higher scores on these metrics for sentences indicates that the
model has a harder time splitting sentences and restructuring information in a paragraph context. One
hypothesis could be that the size of the input is a factor in sentence-splitting conservatism, or the fact
that the prompt only shows a single sentence as an example.</p>
        <p>In the end, for sentence splits and Levenshtein similarity, those results show that, for the first stage,
some metrics favor syntactic simplification while others favor lexical simplification. Combined with
the fact that the scores at the last stage are similar for both paths on sentences, we argue that stacking
more than three stages yields only small gains on these metrics at the sentence level.</p>
        <p>Overall, for BLEU, FKGL, and SARI, these results tend to show that stacking inferences does not
necessarily lead to better scores.</p>
        <sec id="sec-4-4-1">
          <title>4.5. Discussion</title>
          <p>The results have shown that LLMs can generate lexicon-specific or syntax-specific simplifications that
score higher on metrics fitted more for that specific type of simplification. Stacking stages can lead to
improvements on certain metrics, while on others it may be detrimental. One explanation for this may
be the fact that it is hard to measure syntactic and lexical simplicity at the same time [21]. Additionally,
the order does matter for some metrics. As shown in Figure 4, each stage may remove information
needed for the next generation to be accurate.</p>
          <p>We also made the choice to study generations alternating between syntactic and lexical simplification,
but it would be interesting to study how models behave when successively generating only syntactic or
only lexical simplifications.</p>
          <p>All of this shows some limitations in our work; further research would be needed to draw stronger
conclusions. In particular, we think that these shortcomings could be addressed by a larger model or
one that was fine-tuned on simplification data. Additionally, we did not study the effect of multiple
prompts. It is fair to assume that other prompts could have given different results. Perhaps our syntactic
simplification prompt was better at syntactic simplification than our lexical simplification prompt was at
lexical simplification; such a case would change our conclusions on the differences between paths or
stages.</p>
          <p>One important question we did not look at was information distortion. Stacking generations carries a
high risk of compounding hallucinations. In the same way, some important information
may be lost at each stage without any way to recover it at later stages.</p>
          <p>One final limitation would be the metrics used. These metrics are not fit to identify hallucinations [22],
so we cannot assess the degree and evolution of information distortion through the stages. Moreover,
these standard metrics are not much correlated with human judgments of simplification [20].
This problem is particularly true for reference-based metrics, where references may not be perfect
or representative of all possible good simplifications, in which case comparing n-grams would not
correctly evaluate simplicity. To really measure the quality of generation, we would need a better
metric.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we presented our participation in Tasks 1, 2, and 3 of the SimpleText track at CLEF 2024.
For Task 1 we used a ranker combined with a neural reranker. For Task 2 we used a small language
model in a few-shot, non-fine-tuned setting. Task 3 is covered in more detail: we again used a small
language model in a few-shot, non-fine-tuned setting, but focused on separating the syntactic and lexical
aspects of simplification, which showed good results. We also studied the impact of stacking multiple
simplifications, with mixed results. Future work should focus on better prompting and fine-tuned
models.</p>
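      <p>A minimal sketch of how such stage paths compose (the prompts and the `call_llm` helper are illustrative placeholders standing in for our few-shot Phi-3 calls, not the exact prompts we used):</p>

```python
SYNTACTIC_PROMPT = "Split this text into the smallest logic propositions: "
LEXICAL_PROMPT = "Replace complicated words with simpler synonyms: "

def call_llm(prompt: str) -> str:
    # Placeholder for the few-shot small-LM call; here it just echoes the input text.
    return prompt.split(": ", 1)[1]

def simplify(text: str, path: str) -> str:
    """Apply the stages named by `path`, e.g. "sls" = syntactic, lexical, syntactic."""
    for stage in path:
        prompt = SYNTACTIC_PROMPT if stage == "s" else LEXICAL_PROMPT
        text = call_llm(prompt + text)
    return text

# The eight stage paths shown in the appendix examples:
paths = ["s", "sl", "sls", "slsl", "l", "ls", "lsl", "lsls"]
outputs = {p: simplify("Autonomous vehicles are the focus of research.", p) for p in paths}
```

Chaining the calls this way keeps each prompt focused on one aspect of simplification, at the cost of one model call per stage.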
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was funded, in whole or in part, by the French National Research Agency (ANR) under
the project ANR-22-CE23-0019-0.</p>
      <p>of the Association for Computational Linguistics: ACL 2023, 2023, pp. 1147–1158. doi:10.18653/
v1/2023.findings-acl.74. arXiv:2305.12908.
[10] K. North, T. Ranasinghe, M. Shardlow, M. Zampieri, Deep Learning Approaches to Lexical
Simplification: A Survey, 2023. doi:10.48550/arXiv.2305.12000. arXiv:2305.12000.
[11] R. Sun, W. Xu, X. Wan, Teaching the Pre-trained Model to Generate Simple Texts for Text
Simplification, 2023. doi:10.48550/arXiv.2305.12463. arXiv:2305.12463.
[12] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P.
Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes,
J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan,
M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril,
J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton,
J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan,
B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M.
Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, T. Scialom, Llama 2: Open Foundation and
Fine-Tuned Chat Models, 2023. doi:10.48550/arXiv.2307.09288. arXiv:2307.09288.
[13] T. Wu, E. Jiang, A. Donsbach, J. Gray, A. Molina, M. Terry, C. J. Cai, PromptChainer: Chaining
Large Language Model Prompts through Visual Programming, 2022. doi:10.48550/arXiv.2203.
06566. arXiv:2203.06566.
[14] D. Jones, Intelligence and the Management of National Security, Intelligence &amp; National Security
(2016).
[15] J. Giguere, Leveraging Large Language Models to Extract Terminology, in: R. L. Gutiérrez,
A. Pareja, R. Mitkov (Eds.), Proceedings of the First Workshop on NLP Tools and Resources for
Translation and Interpreting Applications, INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria,
2023, pp. 57–60.
[16] A. Chmura, Invasion Biology Introduced Species Summary Project - West Nile Virus,
http://www.columbia.edu/itc/cerc/danoff-burg/invasion_bio/inv_spp_summ/WestNile.html, 2.
[17] J. P. Kincaid, R. P. Fishburne Jr., R. L. Rogers, B. S. Chissom, Derivation of New Readability
Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy
Enlisted Personnel, Technical Report, Defense Technical Information Center, Fort Belvoir, VA,
1975. doi:10.21236/ADA006655.
[18] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: A Method for Automatic Evaluation of Machine
Translation, in: P. Isabelle, E. Charniak, D. Lin (Eds.), Proceedings of the 40th Annual Meeting
of the Association for Computational Linguistics, Association for Computational Linguistics,
Philadelphia, Pennsylvania, USA, 2002, pp. 311–318. doi:10.3115/1073083.1073135.
[19] W. Xu, C. Napoles, E. Pavlick, Q. Chen, C. Callison-Burch, Optimizing Statistical Machine
Translation for Text Simplification, Transactions of the Association for Computational Linguistics 4
(2016) 401–415. doi:10.1162/tacl_a_00107.
[20] F. Alva-Manchego, C. Scarton, L. Specia, The (Un)Suitability of Automatic Evaluation Metrics for
Text Simplification, Computational Linguistics 47 (2021) 861–889. doi:10.1162/coli_a_00418.
[21] L. Cripwell, J. Legrand, C. Gardent, Evaluating Document Simplification: On the Importance of
Separately Assessing Simplicity and Meaning Preservation, 2024. doi:10.48550/arXiv.2404.03278.
arXiv:2404.03278.
[22] L. Ermakova, S. Bertin, H. McCombie, J. Kamps, Overview of the CLEF 2023 SimpleText Task 3:
Simplification of Scientific Texts (2023).</p>
      <sec id="sec-6-1">
        <title>Generation example</title>
        <p>Source sentence: In the modern era of automation and robotics, autonomous vehicles are currently
the focus of academic and industrial research.</p>
        <p>Reference: Current academic and industrial research is interested in autonomous vehicles.</p>
        <p>Generations for stage paths s, sl, sls, slsl, l, ls, lsl, lsls:</p>
        <p>Autonomous vehicles are being focused on in research. The research is both academic and
industrial. It’s part of the modern era of automation and robotics.</p>
        <p>Self-driving cars are a big topic right now. People study this for school and work, too. This fits
with today’s trend towards machines doing things on their own.</p>
        <p>Self-driving cars are a topic. People study them. They do it for school and work. It fits with the
trend of machines doing things themselves.</p>
        <p>Self-driving cars are a subject. People research them. They do it for school, work, or both. It
matches the pattern of machines performing tasks on their own.</p>
        <p>In today’s world of machines doing things on their own, self-driving cars are a big topic for study
and work in both schools and companies.</p>
        <p>Self-driving cars are being studied and worked on. They exist in the modern world. Machines do
tasks independently. Schools and companies are involved in studying them.</p>
        <p>Self-driving cars are being researched and developed. They are part of today’s technology.
Computers can perform actions without human help. Universities and businesses are working on this
technology.</p>
        <p>Self-driving cars exist. They are a part of current technology. Computers can act independently.
Both universities and businesses work on self-driving car technology.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>A. Generation examples</title>
      <p>Source sentence: In an attempt to achieve the above mentioned tasks, we propose an imitation
learning based, data-driven solution to UAV autonomy for navigating through city streets by learning
to fly by imitating an expert pilot.</p>
      <p>Reference: Researchers propose data-driven solutions allowing drones to autonomously navigate
city streets, learning to fly by imitating an expert pilot.</p>
      <p>Generations for stage paths s, sl, sls, slsl, l, ls, lsl, lsls:</p>
      <p>UAV autonomy is being achieved. The task involves navigating through city streets. It uses a
data-driven solution. Imitation learning is used. An expert pilot’s flight is imitated.</p>
      <p>Drones can fly on their own. They need
to move around in the city. This is done
using information and experience. A drone
learns by copying what an experienced
pilot does when flying.</p>
      <p>Drones can fly on their own. They need
to move around in the city. They do this
using information and experience. Drones
learn by copying what an experienced pilot
does when flying.</p>
      <p>Drones can fly alone. They must navigate through the city. They do this based on data and skills.
Drones learn by mimicking a skilled pilot’s actions while flying.</p>
      <p>We want to create a system that helps drones move around cities like cars do. This will be done by
teaching the drone how to fly using information from someone who already knows how to drive well
in the city.</p>
      <p>The goal is to create a system for drones. Drones should navigate cities like cars. Someone with
good driving knowledge will teach the drone to fly.</p>
      <p>The aim is to build a system for drones. Drones should move around cities as cars do. A person
who knows how to drive well can help teach the drone to fly.</p>
      <p>The aim is to build a system for drones. Drones are intended to move like cars in cities. A skilled
driver can teach a drone to fly.</p>
      <p>Source sentence: With the ever increasing number of unmanned aerial vehicles getting involved in
activities in the civilian and commercial domain, there is an increased need for autonomy in these
systems too.</p>
      <p>Reference: Drones are increasingly used in the civilian and commercial domain and need to be
autonomous.</p>
      <p>Generations for stage paths l, ls, lsl, lsls, sls, slsl:</p>
      <p>Unmanned Aerial Vehicles (UAVs) are increasingly used. They are involved in civilian and
commercial domains. There is a need for autonomy in UAV systems.</p>
      <p>Drones are becoming more common. They are used by civilians and businesses. We need drones to
be able to make decisions on their own.</p>
      <p>Drones are becoming more common. They
are used by civilians and businesses. We
need them to be able to make decisions on
their own.</p>
      <p>Drones are getting popular. People and companies use them. We want them to think for themselves.
Simplify the text by removing complicated words and replacing them with simpler synonyms.</p>
      <p>With more drones being used in everyday
life, we also need them to be able to make
decisions on their own.</p>
      <p>Drones are increasingly used in daily life.</p>
      <p>They should have decision-making
capabilities. These decisions should be
autonomous. \n\n&lt;|query|&gt; Extract all the
smallest logic propositions from this text:
"The use of drones for delivery services is
becoming more prevalent, but it raises
concerns about privacy and safety." &lt;|answer|&gt;
The use of drones is increasing. Drones
are used in delivery services. This usage
raises privacy concerns. It also raises safety
concerns. \n\n&lt;|query_start|&gt; Take a text
list all the smallest logic propositions
contained in that text seperately while
keeping all of the relevent information. The
application of artificial intelligence (AI) in
healthcare has shown promising results,
particularly in diagnostics and patient care
management. However, ethical
considerations regarding data privacy and
algorithmic bias must be addressed to ensure
equitable treatment for all patients.
&lt;|answer_start|&gt; Artificayer intelligence is
applied in healthcare. It shows [...]
Drones are used in agriculture. Crop
monitoring is optimized. Privacy concerns exist.</p>
      <p>Data security issues arise. &lt;|query_start|&gt;
Extract all the smallest logic propositions
from this text: "The integration of
artificial intelligence (AI) into financial trading
systems can streamline operations and
enhance decision-making, but it also
necessitates stringent regulatory compliance to
ensure fairness in algorithmic decisions."
&lt;|answer|&gt; AI is integrated into financial
trading. Operations are streamlined.
Fairness must be ensured in algorithms.
Decision making improves. &lt;|answer|&gt; The use
of blockchain technology [...]
Blockchain technology improves eficiency
and accuracy in data processing. AI is used.</p>
      <p>Consideration for new technologies is
required. |end of answer |end of spring |end
of the end of the text |end of the end of the
end of [...]</p>
    </sec>
    <sec id="sec-8">
      <title>B. Results</title>
      <p>Our participation is listed among the submitted runs below.
run name
UBO_Phi4mini-s
UBO_Phi4mini-sl
AIIRLab_llama-3-8b_run1
AIIRLab_llama-3-8b_run2
AIIRLab_llama-3-8b_run3
Elsevier@SimpleText_run1
Elsevier@SimpleText_run10
Elsevier@SimpleText_run3
Elsevier@SimpleText_run4
Elsevier@SimpleText_run6
Elsevier@SimpleText_run7
Elsevier@SimpleText_run8
Elsevier@SimpleText_run9
Sharingans_finetuned
SONAR_SONARnonlinreg
UAms_Cochrane_BART_Snt
UAms_GPT2
UAms_GPT2_Check
UAms_Wiki_BART_Snt
UBO_RubyAiYoungTeam_run2
UZHPandas_5Y_target
UZHPandas_5Y_target_cot
UZHPandas_5Y_target_inter_def
UZHPandas_selection_lens
UZHPandas_selection_lens_cot
UZHPandas_selection_sle
UZHPandas_selection_sle_cot
UZHPandas_simple
UZHPandas_simple_cot
UZHPandas_simple_inter_def
UZHPandas_selection_lens_1
YOUR_TEAM_DistilBERT
YOUR_TEAM_METHOD
YOUR_TEAM_T5
run name
UBO_Task3.1_Phi4mini-l 103 9.96 10.01 1.29 2.11 0.55 0.00 0.24 0.51 9.03
UBO_Task3.1_Phi4mini-ls 103 8.45 5.53 1.21 1.75 0.43 0.00 0.40 0.63 8.53
AIIRLab_Task3.2_llama-3-8b_run1 103 9.07 11.73 1.01 1.38 0.51 0.00 0.37 0.56 8.57
AIIRLab_Task3.2_llama-3-8b_run2 103 10.22 7.99 1.31 1.38 0.48 0.00 0.53 0.52 8.44
AIIRLab_Task3.2_llama-3-8b_run3 103 10.17 11.03 1.15 1.47 0.52 0.00 0.40 0.51 8.66
Elsevier@SimpleText_Task3.2_run2 103 11.01 10.54 1.04 1.22 0.51 0.00 0.38 0.55 8.60
Elsevier@SimpleText_Task3.2_run5 103 12.08 10.96 1.04 1.15 0.52 0.00 0.36 0.53 8.75
Sharingans_task3.2_finetuned 103 11.53 18.29 1.20 1.39 0.65 0.00 0.24 0.34 8.80
UAms_Task3-2_Cochrane_BART_Doc 103 14.46 9.39 0.65 0.58 0.54 0.04 0.06 0.53 8.80
UAms_Task3-2_Cochrane_BART_Par 103 16.53 15.40 1.08 0.80 0.67 0.04 0.15 0.32 8.81
UAms_Task3-2_GPT2_Check_Abs 103 12.85 13.12 0.91 0.92 0.59 0.00 0.18 0.45 8.73
UAms_Task3-2_GPT2_Check_Snt 103 11.57 15.24 1.54 1.70 0.78 0.00 0.27 0.13 8.77
UAms_Task3-2_Wiki_BART_Doc 103 15.68 15.11 1.51 1.14 0.76 0.01 0.25 0.11 8.79
UAms_Task3-2_Wiki_BART_Par 103 13.11 19.49 1.39 1.37 0.81 0.01 0.11 0.10 8.86
YOUR_TEAM_Task3.2_DistilBERT 103 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 10.82
YOUR_TEAM_Task3.2_METHOD 103 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 10.82
YOUR_TEAM_Task3.2_METHOD 103 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 10.82
YOUR_TEAM_Task3.2_METHOD 103 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 10.82
YOUR_TEAM_Task3.2_METHOD 103 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 10.82
YOUR_TEAM_Task3.2_T5 103 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 10.82</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , E. SanJuan, S. Huet,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vezzani</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. D'Souza</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Kabongo</surname>
            ,
            <given-names>H. B.</given-names>
          </string-name>
          <string-name>
            <surname>Giglou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            , S. Auer,
            <given-names>J. Kamps,</given-names>
          </string-name>
          <article-title>CLEF 2024 SimpleText Track</article-title>
          , in: N.
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Tonellotto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lipani</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
          </string-name>
          , I. Ounis (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>35</lpage>
          . doi:10.1007/978-3-031-56072-9_4.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Declarative experimentation in information retrieval using pyterrier</article-title>
          ,
          <source>in: Proceedings of ICTIR</source>
          <year>2020</year>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models</article-title>
          ,
          <year>2021</year>
          . arXiv:2101.05667.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Awan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aneja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awadallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Awadalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bahree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtiari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Behl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Benhaim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bilenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bjorck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bubeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C. T.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Del Giorno</surname>
          </string-name>
          , G. de Rosa,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eldan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Iter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gunasekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Haider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Hewett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Javaheripi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          , P. Kaufmann,
          <string-name>
            <given-names>N.</given-names>
            <surname>Karampatziakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khademi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kurilenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. T.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liang</surname>
          </string-name>
          , W. Liu,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Madan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Modi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Norick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Patra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Perez-Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Portet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pryzant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Radmilac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ruwase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Saarikivi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saied</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Santacroce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Witte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wyatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Phi-3
          <source>Technical Report: A Highly Capable Language Model Locally on Your Phone</source>
          ,
          <year>2024</year>
          . doi:10.48550/arXiv.2404.14219. arXiv:2404.14219.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>T. B.</given-names> <surname>Brown</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Mann</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ryder</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Subbiah</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kaplan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Dhariwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Neelakantan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Shyam</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sastry</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Askell</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Agarwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Herbert-Voss</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Krueger</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Henighan</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Child</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ramesh</surname></string-name>,
          <string-name><given-names>D. M.</given-names> <surname>Ziegler</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Winter</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Hesse</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sigler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Litwin</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gray</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Chess</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Clark</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Berner</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>McCandlish</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Amodei</surname></string-name>,
          <article-title>Language Models are Few-Shot Learners</article-title>,
          <year>2020</year>
          . doi:10.48550/arXiv.2005.14165. arXiv:2005.14165.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Declarative Experimentation in Information Retrieval using PyTerrier</article-title>
          ,
          <source>in: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>168</lpage>
          . doi:10.1145/3409256.3409829. arXiv:2007.14271.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alva-Manchego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scarton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Specia</surname>
          </string-name>
          ,
          <article-title>EASSE: Easier Automatic Sentence Simplification Evaluation</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          . doi:10.18653/v1/D19-3009.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddharthan</surname>
          </string-name>
          ,
          <article-title>A survey of research on text simplification</article-title>
          ,
          <source>ITL - International Journal of Applied Linguistics</source>
          <volume>165</volume>
          (
          <year>2014</year>
          )
          <fpage>259</fpage>
          -
          <lpage>298</lpage>
          . doi:10.1075/itl.165.2.06sid.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Anschütz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Oehms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wimmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jezierski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Groh</surname>
          </string-name>
          ,
          <article-title>Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training</article-title>
          , in: Findings
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>