<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SINAI Participation in SimpleText Task 2 at CLEF 2024: Zero-shot Prompting on GPT-4-Turbo for Lexical Complexity Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jenny Ortiz-Zambrano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>César Espin-Riofrio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arturo Montejo-Ráez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Guayaquil</institution>
          ,
          <addr-line>Delta Av. s/n, Guayaquil, 090510</addr-line>
          ,
          <country country="EC">Ecuador</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Jaén</institution>
          ,
          <addr-line>Las Lagunillas s/n, Jaén, 23071</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this article, we present our participation in Tasks 2.1 and 2.2 of the SimpleText track of CLEF 2024. Our work focused on the implementation of zero-shot learning using the GPT-4 Turbo autoregressive model. To this end, we develop and evaluate various cues to optimize the model's ability to predict lexical complexity. The results of our experiments indicated that GPT-4 Turbo can perform this task with remarkably robust performance, demonstrating its potential to assess language complexity efectively without the need for additional training.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Lexical Complexity Prediction</kwd>
        <kwd>Auto-regressive models</kwd>
        <kwd>GPT-4 Turbo</kwd>
        <kwd>Prompting</kwd>
        <kwd>Zero-shot learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Readability is defined as the quality that makes a text more accessible and easier to read [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but for
many people, the way a text is written can be a barrier to understanding its content [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] due to the
presence of infrequent or unknown words, and phrases with lexical and semantic complexity which
drastically complicate the reader’s understanding [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; this is especially evident in cases such as children,
non-native speakers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and people with various cognitive abilities or reading disabilities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The
success or failure of understanding a text will depend on the reader’s prior knowledge of the meaning
of the words [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Information technologies have facilitated access to a wide and abundant amount of information in
various fields such as education, news, social networks, health, government, and also science; In the
case of scientific literature to the general public, it has increased thanks to digitalization. However,
this wealth of information is not available to everyone, since many people face significant obstacles to
understanding , such as the complexity of grammatical structures, the use of technical language and
the length of sentences, which directly afects individuals with intellectual disabilities, people with low
levels of literacy, and even university students who, despite their academic training and specialized
knowledge, can also be found among those who experience dificulties in reading and understanding
complex texts [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In the case of scientific information, a significant barrier persists that makes direct
access to scientific knowledge from the original sources dificult, where one of the main obstacles lies
in the complexity of scientific texts, which presents dificulties for those without experience in the field
due to to the lack of prior specialized knowledge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        The SimpleText laboratory [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is part of the CLEF 2024 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] initiative, which promotes the systematic
evaluation of information access systems through experimentation with shared tasks. SimpleText
addresses the challenges of text simplification, especially in the context of making scientific information
accessible. In this sense, it provides relevant data and reference points, as the general public tends
to avoid reliable sources such as scientific literature due to its complex language and lack of prior
knowledge. Instead, they rely on superficial and derivative sources on the web and social media, which
are often published for commercial or political rather than informational purposes.
      </p>
      <p>Can simplifying the text help overcome these barriers to access? That is the central question that
this task addresses. Its objective is to generate a simplified summary of several scientific documents,
based on a popular science query. This summary will provide the user with an accessible overview of
the specific topic, with the hope of making scientific information more accessible and understandable
to a broader audience.</p>
      <p>The main objective of this research is to demonstrate the capacity of the language model based on
Transformers like GPT-4 Turbo for lexical simplification. To achieve this, several variants of prompts
without samples have been created and evaluated. This approach allows us to determine which
concepts in scientific summaries require additional explanation and contextualization to help the reader
understand the scientific text more efectively.</p>
      <p>The article follows the following structure: first, a brief description is given of the current
stateof-the-art in the prediction of lexical complexity and the identification of complex words (which are
synonymous with the same natural language processing task). Section 3 presents Task 2 of the Simple
Text track. Section 5 details our solution and the results obtained using diferent indication variants.
Finally, Section 6 presents the conclusions and raises some ideas for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In recent years, Complex Word Identification (CWI) has attracted great interest in the scientific
community and among computational linguistics researchers. These initiatives have significantly boosted
development and research in the field of text simplification and improving information accessibility.
The proposals include conferences, workshops and specific tasks that address challenges and
promote advances in the simplification of texts, thus facilitating greater access to information for various
audiences.</p>
      <p>
        This advancement is reflected in the development of computational semantic analysis systems, as
illustrated by several shared Complex Word Identification (CWI) tasks at notable events such as SemEval
2016 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], NAACL-HTL 2018 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], ALexS at IberLEF 2020 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the 15th edition of SemEval and the first
lexical complexity prediction task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], TSAR-2022 - the workshop on simplicfiation, accessibility and
readability of texts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the SimpleText tasks in CLEF in 2021 [16], 2022 [17], and 2023 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] among other
relevant initiatives in this area.
      </p>
      <sec id="sec-2-1">
        <title>2.1. GPT-4 Turbo for solving NLP tasks</title>
        <p>State-of-the-art deep learning models, such as BERT [18], RoBERTa [19], GPT-3 [20], and GPT-4 Turbo
[21], significantly outperform existing traditional approaches. GPT-4 Turbo, the transformer-based
large language model generation developed by OpenAI1, is a prominent example of these advances.
These models have demonstrated superior capabilities in various natural language processing tasks,
raising the bar for performance and accuracy in the field.</p>
        <p>GPT-3 Contains 175 billion parameters. Due to its extensive language knowledge, enormous
processing power, and potential to learn from large amounts of online text data, GPT-3 can perform a wide
variety of general natural language-based tasks with unprecedented ease, such as the generation and
classification of text. [ 22]. The enormous scale of the model allows generating results with quality,
precision and diversity of the generated content. This development has sparked considerable interest
and concern in the field of NLP, the machine learning industry in general, the media, the AI ethics
communities and civil society [23], The latest GPT-4 Turbo model of the OpenAI has been recognized
for its ability to process text and images [24].</p>
        <p>Although GTP-3 is a generative model, several approaches may be necessary to classify text, mainly
ranging from zero-shot classification to single-shot and few-shot classification [ 22]. In zero-shot learning
no samples of data are needed [25] .</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. SimpleText@CLEF-2024 Tasks</title>
      <p>3.1. Task 2: “What is unclear?” Dificult concept identification and explanation
The goal of this task is to identify key concepts that need to be contextualized with a definition, example,
and/or use-case and provide useful and understandable explanations for them. Thus, there are three
subtasks:
1. To predict what are the terms in a passage of a document and their dificulty
(easy/medium/dificult).
2. To generate a definition and an explanation only for the dificult terms.
3. To retrieve the provided definitions of the dificult terms in “correct” order: manual (2), generated
positive 1 (1), generated positive 2 (1), generated negative 1 (0), generated negative 2 (0).</p>
      <p>In Task 2.1, for each passage of a document, participants should provide a list of terms with
corresponding dificulty scores (easy/medium/dificult). Passages (sentences) are considered to be
independent, so term repetition is allowed (the same term can be detected in diferent sentences, even
in the same document).</p>
      <p>In Task 2.2, for each dificult term (terms that have been evaluated with the highest level of dificulty),
participants should provide the definition and explanation which will be evaluated both from a
qualitative point of view (manual review by terminologists) and from a quantitative point of view
(applying overlapping text measures like BLUE, ROUGE, etc.).</p>
      <p>In Task 2.3, participants should rank the set of definitions provided for the dificult terms in a
way that the “best” definitions are ranked higher in the list of definitions. In particular, for each
term there will be one manual definition (considered the best one) and two automatically generated
good definitions that should be placed at the top of the list of retrieved definitions. Quantitative
metrics (for example, P@1, P@3, rank correlation measures) will be used to evaluate participants’results.</p>
      <p>Task 2.1 and Task 2.3 can be performed independently. Participants that want to run experiments on
Task 2.2 need to accomplish Task 2.1 first.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed system</title>
      <p>As mentioned above, our approach is to apply learning cues without prior examples to the GPT-4 Turbo
model, extracting predictions from the generated sequence. Details of the model configuration are
presented in Table 1. We use the OpenAI API in Python to interact with the model, allowing us to
easily integrate GPT-4 Turbo into our workflow. In addition, the OpenAI Playground ofers an intuitive
web interface that facilitates experimentation and rapid generation of code necessary for our tests and
developments.</p>
      <sec id="sec-4-1">
        <title>4.1. Experiments with zero-shot learning</title>
        <p>We execute the model applying zero-shot learning where no examples are specified for learning and a
sample of a generated output</p>
        <sec id="sec-4-1-1">
          <title>Prompt PRM_ZS_TASK2_V1:</title>
          <p>Performing the following actions according to a given text:
Select a number from 1 to 5 terms within the sentence named text
Assign the difficulty of the words: easy, medium and difficult, giving
the initial letter of each difficulty as an answer ([e]fácil/[m]edium/[d]ifficul)
Give the definition of each difficult term, add an explanation and use
case according to context of the sentences.</p>
          <p>Provide the response in following JSON format:
{
"word": &lt;word&gt;,
"difficulty": &lt;word_difficult&gt;,
"definition": "&lt;definition&gt;",
"explanation": &lt;explanation&gt;
}
Context: {{context}}
Text : {{text}}
Response: .
##
Context: Digital assistant
Text : Abstract Novel technological advances in mobile devices and
applications can be exploited in wildfire confrontation, enabling
endusers to easily conduct several everyday tasks, such as access to data
and information, sharing of intelligence and coordination of personnel
and vehicles.</p>
          <p>Response:
json
[
{
},
{
"word": "exploited",
"difficulty": "m",
"definition": "make full use of and derive benefit from (a
resource)",
"explanation": "In the context, ’exploited’ refers to utilizing
the new technological capabilities of mobile devices and
applications to their full advantage, specifically in managing
and combating wildfires.",
"use_case": "The novel technologies can be exploited to improve
the efficiency and effectiveness of wildfire management."
"word": "confrontation",
"difficulty": "m",
"definition": "a hostile or argumentative meeting or situation
between opposing parties",
"explanation": "Here, ’confrontation’ is used metaphorically to
describe the act of dealing with wildfires, suggesting a direct
and active engagement with the fires.",
"use_case": "Mobile technologies are being used in the
confrontation of wildfires, aiding in direct engagement and
management."
"word": "coordination",
"difficulty": "m",
"definition": "the organization of the different elements of a
complex body or activity so as to enable them to work together
effectively",
"explanation": "In this context, ’coordination’ refers to the
organized management of personnel and vehicles to tackle
wildfires efficiently, facilitated by mobile technology.",
"use_case": "Effective coordination of personnel and vehicles
is crucial for the successful management of wildfire incidents."</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Prompt PRM_ZS_TASK2_V2:</title>
          <p>Depending on the text, select up to 5 terms considered complex
within the sentence, then assign the complexity of the terms to
easy, medium and difficult using the first letter of each word, for
example if the complexity is easy add the letter "e", If the
complexity of the term is medium, add the letter "m", and if the
complexity of the term is difficult, you should add the letter "d".
Then, from each complex term generate its definition, an explanation
and a use case according to the context of the sentence. Finally,
provide the response in the following JSON format:
{
}
"word": &lt;word&gt;,
"difficulty" &lt;difficult_word&gt;,
"definition": "&lt;definition&gt;",
"explanation" : &lt;explanations&gt;
##
Context: Drug discovery
Sentence: A pharmacophore analysis approach was used to investigate and
compare different classes of compounds relevant to the drug discovery
process (specifically, drug molecules, compounds in high throughput
screening libraries, combinatorial chemistry building blocks and
nondrug molecules).</p>
          <p>Response:
json
[
{
},
{
"word": "pharmacophore",
"difficulty": "d",
"definition": "A set of structural features in a molecule that
is recognized by a biological macromolecule and is responsible
for its biological activity.",
"explanation": "In drug discovery, a pharmacophore analysis is
used to identify these structural features which can interact
with a biological target, such as a protein, to induce a
therapeutic effect. This helps in designing or identifying new
drugs that can mimic or block the biological activity.",
"use_case": "In the sentence, a pharmacophore analysis approach
helps in comparing different classes of compounds to find
potential drug candidates."
"word": "combinatorial chemistry",
"difficulty": "d",
"definition": "A method in chemistry where different
combinations of building blocks are systematically mixed to
generate a large number of different compounds.",
"explanation": "Combinatorial chemistry is used in drug
discovery to rapidly synthesize and screen large libraries of
compounds for potential drug activity. It allows researchers to
},
{
},
{
},
{
explore a wide variety of chemical structures.",
"use_case": "In the sentence, combinatorial chemistry building
blocks are mentioned as part of the classes of compounds
relevant to drug discovery."
"word": "high throughput screening",
"difficulty": "d",
"definition": "A method used in drug discovery to quickly
conduct millions of chemical, genetic, or pharmacological
tests.",
"explanation": "High throughput screening (HTS) is crucial in
the early stages of drug discovery. It allows researchers to
quickly identify active compounds, antibodies, or genes that
modulate a particular biomolecular pathway.",
"use_case": "The sentence refers to compounds in high
throughput screening libraries, indicating these are tested in
large-scale assays to find promising drug candidates."
"word": "nondrug",
"difficulty": "m",
"definition": "Substances or compounds that are not considered
drugs and do not have therapeutic effects.",
"explanation": "In the context of drug discovery, nondrug
molecules are those that are used as controls or are part of
the compound libraries but are not expected to lead to a
therapeutic drug.",
"use_case": "The sentence compares nondrug molecules with drug
molecules and other compounds to understand their relevance or
irrelevance in the drug discovery process."
"word": "compounds",
"difficulty": "e",
"definition": "Chemical substances composed of two or more
elements chemically bonded together.",
"explanation": "In the context of the sentence, ’compounds’
refers to various chemical entities that are being investigated
in the drug discovery process to find those with potential as
drug molecules.",
"use_case": "The sentence discusses investigating different
classes of compounds in relation to their relevance to drug
discovery."</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>Prompt PRM_ZS_TASK2_V3</title>
          <p>Identify up to a total of 5 complex words in the sentence and determine
the difficulty of each complex word by classifying it into the category
of "easy", "medium" or "difficult". For the "easy" difficulty place the
letter "e", for the "medium" difficulty place the letter "m", and for
the "hard" difficulty place the letter "d". Furthermore, in the case of
words identified as complex taking into account the context of the
sentence, three things must be generated: a definition, an explanation,
and a use case.</p>
          <p>Finally, provide the response in the following JSON format:
{
"word": &lt;word&gt;,
"difficulty" :&lt;hard_word&gt;,
"definition": "&lt;definition&gt;",
"explanation" : &lt;explanations&gt;
##
Context: Humanoid robots
Sentence: In many situations, users walk on typical paths between
specific destinations at which the service of a mobile robot is needed.
},
{
},
{
}
"word": "humanoid",
"difficulty": "m",
"definition": "having an appearance or character resembling
that of a human.",
"explanation": "In the context of robots, ’humanoid’ refers to
robots designed to look and sometimes behave like humans, often
used to interact with environments built for human use.",
"use_case": "The humanoid robot was programmed to assist
customers in the store."
"word": "destinations",
"difficulty": "m",
"definition": "the place to which someone or something is going
or being sent.",
"explanation": "In the given sentence, ’destinations’ refers to
the specific places where the mobile robot is required to
perform services.",
"use_case": "The tourists set off early in the morning to reach
their destinations before sunset."
"word": "mobile",
"difficulty": "e",
"definition": "able to move or be moved freely or easily.",
"explanation": "Here, ’mobile’ describes a robot that is
capable of moving around rather than being fixed in one
location.",
"use_case": "The company developed a mobile app to increase
accessibility for users on the go."
]
###</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>The Table 2 presents various results in response to the approach of sub-task 2.1. To automatically
support this task, we have proposed an innovative approach that explores the potential of GPT-4 Turbo,
an advanced large language model (LLM) developed by OpenAI as in the work done by [21]. On
this occasion, we seek to evaluate the improvements and additional capabilities of GPT-4 Turbo in
comparison with the proposal made for the same task in the SimpleText 2023 workshop where the
results obtained were outstanding, demonstrating the efectiveness of the GPT-3 model, text-davinci-003
version [26].</p>
        <p>The Table 3 presents several results of sub-task 2.2 applying GPT-4 Turbo with zero-shot learning
according to the PRM_ZS_TASK2_V2 prompt. As we can see, the table illustrates an example that
corresponds to Snt_id G01.1_1000902583_1. In the solution proposed for sub-task 2.2, the model
generates a definition, an explanation, and also generates a use case illustrating in a broad way the
complexity of the terms for diferent groups of users, making suggestions for definitions, explanations
and use cases based on the context of the sentence (Abstract).</p>
        <p>As we can see, the table illustrates an example that corresponds to Snt_id G01.1_1000902583_1. In the
solution proposed for sub-task 2.2, the model generates a definition, an explanation, and also generates
a use case illustrating in a broad way the complexity of the terms for diferent groups of users, making
suggestions for definitions, explanations and use cases based on the context of the sentence (Abstract).</p>
        <p>Table 4, we present the oficial results published by the organizers [ 27], including the scores obtained
d
m
m
d
m
d
m
m
d
m
e
m
d
m
in our participation in subtasks 2.1 and 2.2. The meaning of each column is the following:
• recall overall: the proportion of terms (independently from the dificulty) that were found.
• recall average: the average of the recall of terms computed per sentence.
• recall dificult terms:</p>
        <p>the proportion of dificult terms that were found.
• precision dificult:</p>
        <p>the precision of terms that were labeled as dificult.</p>
        <p>• bleu_nx: the BLEU score computed with ngrams n =1, 2, 3, 4.</p>
        <p>According to the results presented in table 4, the UboNLP_Task2.1_phi3-oneshot team showed
the best overall performance in terms of recall_overall (0.5424) and recall_average (0.5620), which
indicates a high ability to retrieve relevant terms. However, its precision in dificult terms is relatively
low (0.3743). Sharingans_Task2.2_GPT also performs well with a recall_overall of 0.4722 and a
recall_average of 0.5302, in addition, it has a high recall_terms (0.5448), demonstrating a good ability to
handle dificult terms.</p>
        <p>According to the performance analysis in the dificult terms, the AIIRLab_Task2.2_LLaMAFT
and SINAI_task_2_PRM_ZS_TASK2_V3 teams have a precision_dificult_terms of 1.0000 and 0.8333
respectively, which indicates that these models are very precise with dificult terms, although their
recall_overall and recall_average are not as high in comparison with other equipment.</p>
        <p>In the performance in BLEU Scores, the unipd_t21t22_chatgpt teams and their variants (mod1
and mod2) have the best bleu_n1_average, bleu_n2_average, bleu_n3_average and bleu_n4_average
demonstrating good performance in terms of precision of n-grams in text generation. In the
case of AIIRLab_Task2.2_Mistral, this team has a good balance with a recall_overall of 0.4118
and precision_dificult_terms of 0.4907, suggesting that their model is quite robust in general
terms and in dificult terms. The SINAI team variants show variations in their results, with
SINAI_task_2_PRM_ZS_TASK2_V2 having the best performance among its variants, with a recall_overall
term
exploited
confrontation
coordination
d
m
m</p>
        <p>Definition: make full use of and derive benefit from (a
resource).</p>
        <p>Explication: In the context of the sentence, exploited
refers to utilizing the new technological capabilities of
mobile devices and applications to their fullest potential
in managing and combating wildfires. Use case: The
novel technologies can be exploited to improve the
eficiency and efectiveness of wildfire management.</p>
        <p>Definition: a hostile or argumentative meeting or
situation between opposing parties.</p>
        <p>Explication: Here, confrontation is used metaphorically
to describe the act of dealing with or managing wildfires,
which are seen as the opposing party to human safety
and property. Use case: Technological advances aid in
the confrontation of wildfires by providing better tools
for managing the crisis.</p>
        <p>Definition: the organization of the diferent elements
of a complex body or activity so as to enable them to
work together efectively.</p>
        <p>Explication: In this context, coordination refers to the
organized management of personnel and vehicles,
ensuring they operate in a synchronized and eficient manner
during wildfire emergencies. Use case: Efective
coordination of personnel and vehicles is crucial for rapid and
eficient wildfire suppression.</p>
        <p>Snt_id: G01.1_1000902583_1
Abstract: Novel technological advances in mobile devices and applications can be exploited in wildfire
confrontation, enabling end-users to easily conduct several everyday tasks, such as access to data and
information, sharing of intelligence and coordination of personnel and vehicles.</p>
        <p>dificulty
of 0.1556 and precision_dificult_terms of 0.7746. There are teams with values of 0 in almost all metrics,
which indicates very low performance or problems in the implementation of their models.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>The model has proven to be able to generate robust responses based on the text or instruction (message)
provided. We observe that GPT-4 Turbo analyzes the text, uses its extensive knowledge to identify
words that can be considered complex, and categorizes them according to their complexity. Our
preliminary analysis evaluates the model’s ability to understand and generate arguments in specific
contexts. The results show that GPT-4 Turbo is highly competent in natural language processing tasks,
such as predicting lexical complexity, demonstrating its efectiveness in identifying and categorizing
complex terms accurately and consistently.</p>
      <p>We have applied the GPT-4 Turbo model in the construction of several solutions for sub-tasks 2.1
and 2.2, where the model has demonstrated robust performance in the execution of natural language
processing tasks, specifically in lexical simplification and identification of complex words. Not only has
it successfully tackled these tasks, but it has also been able to generate detailed illustrations by creating
definitions, explanations and use cases based on the context of the sentence (text). By identifying
complex words and phrases and ofering simpler versions, it facilitates access to scientific sources in an
understandable way, helping readers to understand the complexity of terms in diferent user groups,
which makes it very useful for a general audience. and especially valuable for readers with reading
dificulties or cognitive disabilities.</p>
      <p>The results showed a variety of performances among the participating teams. Some teams, such as
UboNLP_Task2.1_phi3-oneshot and Sharingans_Task2.2_GPT, stand out for their overall performance
and ability to handle dificult terms, while other teams show specific strengths such as accuracy on
dificult terms or good BLEU scores. However, there are also teams with significantly low performances,
which could indicate the need for improvements in their approaches or implementations.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work has been partially supported by projects CONSENSO (PID2021-122263OB-C21), MODERATES
(TED2021-130145B-I00), SocialTOX (PDC2022-133146-C21) funded by the Spanish Government.
in: Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022),
2022.
[16] L. Ermakova, P. Bellot, P. Braslavski, J. Kamps, J. Mothe, D. Nurbakova, I. Ovchinnikova, E. Sanjuan,
Overview of SimpleText CLEF 2021 workshop and pilot tasks, in: 12th Conference and Labs of the
Evaluation Forum (CLEF 2021), volume 2936, CEUR-WS. org, 2021, pp. 2212–2227.
[17] L. Ermakova, E. SanJuan, J. Kamps, S. Huet, I. Ovchinnikova, D. Nurbakova, S. Araújo, R.
Hannachi, E. Mathurin, P. Bellot, Overview of the clef 2022 simpletext lab: Automatic simplification
of scientific texts, in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani,
C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets
Multilinguality, Multimodality, and Interaction, Springer International Publishing, Cham, 2022, pp.
470–494.
[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers
for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[19] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692
(2019).
[20] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information
processing systems 33 (2020) 1877–1901.
[21] K. K. Shahandashti, M. Sivakumar, M. M. Mohajer, A. B. Belle, S. Wang, T. C. Lethbridge,
Evaluating the efectiveness of gpt-4 turbo in creating defeaters for assurance cases, 2024.
arXiv:2401.17991.
[22] S. Kublik, S. Saboo, GPT-3: Building innovative NLP products using large language models,</p>
      <p>O’Reilly Media (2022).
[23] A. Chan, GPT-3 and InstructGPT: technological dystopianism, utopianism, and “Contextual”
perspectives in AI ethics and industry, AI and Ethics 3 (2023) 53–64.
[24] Y. Hirano, S. Hanaoka, T. Nakao, S. Miki, T. Kikuchi, Y. Nakamura, Y. Nomura, T. Yoshikawa,
O. Abe, Gpt-4 turbo with vision fails to outperform text-only gpt-4 turbo in the japan diagnostic
radiology board examination, Japanese Journal of Radiology (2024) 1–9.
[25] A. Bhattacharjee, R. Morafah, J. Garland, H. Liu, Zero-shot llm-guided counterfactual generation
for text, 2024. arXiv:2405.04793.
[26] J. Ortiz-Zambrano, C. Espin-Riofrio, A. Montejo-Ráez, Sinai participation in simpletext task 2 at
clef 2023: Gpt-3 in lexical complexity prediction for general audience (2023).
[27] G. M. D. Nunzio, et al., Overview of the CLEF 2024 SimpleText task 2: Identify and explain dificult
concepts, in: G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of the Evaluation
Forum (CLEF 2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. H. DuBay,</surname>
          </string-name>
          <article-title>The principles of readability</article-title>
          .,
          <string-name>
            <surname>Online Submission</surname>
          </string-name>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Drndarevic</surname>
          </string-name>
          ,
          <article-title>Making it simplext: Implementation and evaluation of a text simplification system for spanish</article-title>
          ,
          <source>ACM Transactions on Accessible Computing (TACCESS) 6</source>
          (
          <issue>2015</issue>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>North</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shardlow</surname>
          </string-name>
          ,
          <article-title>Lexical complexity prediction: An overview</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Segura-Bedmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <article-title>Simplifying drug package leaflets written in spanish by using word embedding</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>8</volume>
          (
          <year>2017</year>
          ).
          <source>doi:10.1186/s13326-017-0156-7.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Drndarevic</surname>
          </string-name>
          ,
          <article-title>Making it simplext: Implementation and evaluation of a text simplification system for spanish</article-title>
          ,
          <source>ACM Trans. Access. Comput. 6</source>
          (
          <year>2015</year>
          ). URL: https://doi.org/10.1145/2738046. doi:
          <volume>10</volume>
          .1145/2738046.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Anula</surname>
          </string-name>
          ,
          <article-title>Lecturas adaptadas a la enseñanza del español como l2: variables lingüísticas para la determinación del nivel de legibilidad, La evaluación en el aprendizaje y la enseñanza del español como LE L 2 (</article-title>
          <year>2008</year>
          )
          <fpage>162</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alarcón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <surname>Hulat-ALexS CWI</surname>
          </string-name>
          Task-
          <article-title>CWI for Language and</article-title>
          Learning Disabilities Applied to University Educational Texts,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2020</year>
          ),
          <article-title>CEUR-WS, Malaga</article-title>
          , Spain,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bertin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Augereau</surname>
          </string-name>
          ,
          <article-title>Overview of the clef 2023 simpletext task 2: Dificult concept identification and explanation (</article-title>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , et al.,
          <source>Overview of CLEF</source>
          <year>2024</year>
          <article-title>SimpleText track on improving access to scientific texts</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
          </string-name>
          , et al. (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . García Seco de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2024</year>
          , CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          , L. Specia,
          <source>SemEval 2016 Task</source>
          <volume>11</volume>
          :
          <string-name>
            <surname>Complex Word Identification</surname>
          </string-name>
          ,
          <year>2016</year>
          , pp.
          <fpage>560</fpage>
          -
          <lpage>569</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>S16</fpage>
          -1085.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Specia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <source>A Report on the Complex Word Identification Shared Task</source>
          <year>2018</year>
          ,
          <source>in: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications</source>
          , Association for Computational Linguistics, New Orleans, Louisiana,
          <year>2018</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>78</lpage>
          . URL: https://aclanthology. org/W18-0507. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>W18</fpage>
          -0507.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ortiz-Zambrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <source>Overview of ALexS 2020: First workshop on lexical analysis at SEPLN, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2020</year>
          ), volume
          <volume>2664</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shardlow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          , M. Zampieri, SemEval
          <article-title>-2021 task 1: Lexical complexity prediction</article-title>
          ,
          <source>in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval2021)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . URL: https://aclanthology. org/
          <year>2021</year>
          .semeval-
          <volume>1</volume>
          .1. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .semeval-
          <volume>1</volume>
          .1.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferrés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shardlow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Sheang</surname>
          </string-name>
          , K. North,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , W. Xu,
          <source>Proceedings of the Workshop on Text Simplification</source>
          , Accessibility, and
          <string-name>
            <surname>Readability</surname>
          </string-name>
          (TSAR-
          <year>2022</year>
          ),
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>