<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNIPD@SimpleText2024: A Semi-Manual Approach on Prompting ChatGPT for Extracting Terms and Write Terminological Definitions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Gallina</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Vezzani</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Linguistic and Literary Studies, University of Padova</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this experimental work, we explore Task 2 of the SimpleText Lab, which aims to enhance text simplification technologies using manually annotated datasets. The objective of this work is to propose a methodology for evaluating the capability of Large Language Models to identify and explain difficult terms through optimal prompting. Additionally, we assess improvements by manually correcting the extracted terms and definitions, aiming to refine and advance the utility of text simplification tools for broader applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Simplification</kwd>
        <kwd>Automatic Term Extraction</kwd>
        <kwd>Terminological Definition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our participation in this task aims to study the capability of a Large Language Model to
extract difficult terms and, given the right prompt, to build terminological definitions that explain those terms.
In addition, we evaluate the improvement (if any) over the initial results obtained by manually
correcting the extracted terms and the provided definitions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Our participation in Task 2 focuses on identifying and explaining difficult content using Large Language
Models (LLMs) to enhance text simplification. The methodology involves iterative experimentation with
various prompting strategies to optimize the performance of the model on this task. We designed the
methodology with the help of a Master's student in Translation-oriented Terminography; it consists of
the following steps:
• Analyze a diverse set of complex texts to identify common linguistic and contextual
difficulties.
• Design and test a series of prompts that guide the LLM not only to detect these difficult sections
but also to provide clear and concise explanations or simplifications.
• Refine the prompts based on feedback and evaluation metrics such as readability, clarity, and fidelity
to the original meaning.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setting</title>
      <p>
        To find the most suitable prompt to submit to ChatGPT 3.5 (the experiment was performed on
April 15, 2024), we followed the procedure presented in the previous section. In particular,
we started by analyzing the abstract of the paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and tried different prompts in order to obtain an
output that covered terminology extraction, identification of the level of difficulty of
each term, and the formulation of definitions for the terms considered difficult.
      </p>
      <p>An example of the first prompt is shown in Figure 1 (initial prompt) and Figure 3 (output).</p>
      <p>A second and a third version of the prompt were necessary to make the request more precise. For
this reason, we added to the input two brief definitions of “term” and “intensional definition” (according to ISO
1087:2019, an intensional definition “conveys the intension of a concept by stating the immediate generic
concept and the delimiting characteristic(s)”), and we explicitly stated
that the difficulty of each term should be evaluated from the point of view of a general public
user.</p>
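      <p>A prompt of this kind could be assembled as in the following minimal sketch. The instruction wording, the definition of “term”, the placeholder abstract, and the function name build_prompt are illustrative assumptions and not the prompt actually submitted to ChatGPT; only the ISO 1087:2019 quotation is taken from the text above.</p>
      <preformat>
```python
# Hypothetical prompt template: the wording is an assumption,
# not the prompt used in the experiments.

# Illustrative wording (assumption), not a quotation from a standard.
TERM_DEFINITION = (
    "A term is a word or multi-word expression that designates "
    "a concept in a specialized domain."
)

# Quotation from ISO 1087:2019 as reported in the paper.
INTENSIONAL_DEFINITION = (
    "An intensional definition conveys the intension of a concept by stating "
    "the immediate generic concept and the delimiting characteristic(s)."
)


def build_prompt(abstract: str) -> str:
    """Assemble one prompt covering extraction, difficulty, and definitions."""
    parts = [
        TERM_DEFINITION,
        INTENSIONAL_DEFINITION,
        "The reader is a member of the general public.",
        "1. Extract the terms from the abstract below.",
        "2. Rate the difficulty of each term for this reader.",
        "3. Write an intensional definition for each difficult term.",
        "Abstract:",
        abstract.strip(),
    ]
    # Blank lines keep the definitions, the instructions, and the text apart.
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder input; a real call would pass one abstract of the dataset.
    print(build_prompt("PLACEHOLDER ABSTRACT TEXT"))
```
      </preformat>
      <p>Keeping the two definitions and the target reader as separate parts makes it easy to add or remove one of them between prompt versions.</p>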
      <p>The output produced by ChatGPT maintained the same terms extracted in the previous attempt (the
second one, not shown here in the figures), while adding “coarse-to-fine tuning strategy” to the terms
considered difficult to understand. As already seen in the second attempt, the definitions provided
contain elements of the intensional definition (superordinate concept and delimiting characteristics) in
the first part of the output related to subtask 2.3 (building definitions), while the second part contains a
further explanation aimed at deepening the terms analyzed. The results are shown in Figure 3.</p>
      <p>After this preliminary analysis to tune the prompt, we ran the same prompt on each abstract of
the dataset and collected all the extracted terms, their difficulty, and their intensional definitions.</p>
      <p>We produced three runs:
• “unipd_t21t22_chatgpt” contains the ChatGPT output without any modification;
• “unipd_t21t22_chatgpt_mod1” contains the output of the original run minus the
elements that we do not consider to be terms (the only operation was to remove elements
from the original run);
• “unipd_t21t22_chatgpt_mod2” contains additional manual corrections, such as:
– partial or non-meaningful multi-word terms are removed;
– for cases like “body mass (BM)”, “body mass” and “BM” are separated into two entries;
– incomplete terms are completed;
– terms assigned to an incorrect sentence are reassigned to the correct sentence.</p>
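      <p>The corrections in this work were made manually. Purely as an illustration, the separation of entries such as “body mass (BM)” could be approximated as in the sketch below; the regular expression and the function names split_abbreviation and expand_entries are assumptions, not part of the submitted runs.</p>
      <preformat>
```python
import re

# Matches "long form (ABBR)" where the parenthesised part has no spaces.
# The pattern is an assumption about how such entries look.
ABBREVIATION_PATTERN = re.compile(r"^(.+?)\s*\(([^()\s]+)\)$")


def split_abbreviation(term):
    """Return [long form, abbreviation] when the term ends with '(ABBR)'."""
    cleaned = term.strip()
    match = ABBREVIATION_PATTERN.match(cleaned)
    if match is None:
        # No trailing abbreviation: keep the entry unchanged.
        return [cleaned]
    return [match.group(1), match.group(2)]


def expand_entries(terms):
    """Apply the split to every extracted term, preserving the order."""
    expanded = []
    for term in terms:
        expanded.extend(split_abbreviation(term))
    return expanded


if __name__ == "__main__":
    print(expand_entries(["body mass (BM)", "coarse-to-fine tuning strategy"]))
```
      </preformat>
      <p>A rule like this only covers the simplest pattern; removing non-meaningful terms or completing truncated ones still requires human judgment.</p>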
      <p>In addition, we created an unofficial, fully manual run (not submitted to this task)
named:</p>
      <p>• “unipd_t21t22_manual”.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In this section, we present a summary of the quantitative results obtained with the three official runs
plus the additional manual run that was prepared afterwards. For all the runs, we report the following
information:
• name of the run;
• recall overall: the proportion of terms (independently of their difficulty) that were found;
• precision overall: the proportion of extracted terms (independently of their difficulty) that were correctly categorized
as terms;
• f1 overall score;
• recall average: the average of the recall of terms computed per sentence;
• precision average: the average of the precision of terms computed per sentence;
• f1 average score;
• recall difficult overall: the proportion of difficult terms that were found;
• precision difficult overall: the precision of terms that were labeled as difficult;
• f1 difficult overall score;
• recall difficult average: the average of the recall of difficult terms computed per sentence;
• precision difficult average: the average of the precision of terms labeled as difficult computed per sentence;
• f1 difficult average score;
• bleu_nx: the BLEU score computed with n-grams of order x = 1, 2, 3, 4.</p>
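      <p>The difference between the “overall” and the “average” scores listed above can be sketched as follows. This is a minimal illustration and not the official evaluation script: it assumes exact string matching of terms, pools (sentence, term) pairs for the overall scores, and takes the average f1 as the mean of the per-sentence f1 values. The function names and the toy data are hypothetical.</p>
      <preformat>
```python
def prf(found, gold):
    """Precision, recall, and F1 for one set of extracted items."""
    correct = len(found.intersection(gold))
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def overall_scores(run, gold):
    """Pool (sentence id, term) pairs over all sentences, then score once."""
    found_pairs = {(sid, t) for sid, terms in run.items() for t in terms}
    gold_pairs = {(sid, t) for sid, terms in gold.items() for t in terms}
    return prf(found_pairs, gold_pairs)


def average_scores(run, gold):
    """Score each gold sentence separately, then average the three values."""
    per_sentence = [prf(run.get(sid, set()), terms) for sid, terms in gold.items()]
    if not per_sentence:
        return 0.0, 0.0, 0.0
    n = len(per_sentence)
    return tuple(sum(scores[i] for scores in per_sentence) / n for i in range(3))


if __name__ == "__main__":
    # Toy data (hypothetical): sentence id mapped to a set of terms.
    gold = {"s1": {"body mass", "BM"}, "s2": {"tuning strategy"}}
    run = {"s1": {"body mass"}, "s2": {"tuning strategy", "results"}}
    print("overall:", overall_scores(run, gold))
    print("average:", average_scores(run, gold))
```
      </preformat>
      <p>The same two functions apply to the difficult-term scores if both dictionaries are first restricted to the terms labeled as difficult.</p>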
      <p>[Figure: recall-precision plots; y-axis labels “precision overall” and “precision overall (difficult)”.]</p>
      <p>In particular, in Table 1, we show the overall results for the term extraction independently of the
difficulty of the term. In Table 2, we show the overall results for the term extraction only for difficult
terms. In Table 3, we present the scores of the BLEU measure for the provided definitions. In Figure 4,
we show the recall-precision plot of the overall scores and the scores averaged per sentence for all the
terms; in Figure 5, the same information for difficult terms only; in Figure 6, we display the BLEU score,
for n = 1 and n = 2, in relation to the f1 value for difficult terms.</p>
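      <p>As a reminder of what a BLEU score with n-grams up to order n measures, the sketch below implements a simplified single-reference BLEU: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, lowercasing, and no smoothing, and it is not the scorer used by the task organizers; the function names are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Simplified BLEU using n-grams of order 1..max_n and one reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if not cand or min(precisions) == 0.0:
        # Without smoothing, one empty order makes the geometric mean zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_mean)


if __name__ == "__main__":
    # Toy definitions (hypothetical), not taken from the dataset.
    generated = "a measure of the mass of a body"
    reference = "a measure of the total mass of a body"
    for order in (1, 2):
        print(order, bleu(generated, reference, order))
```
      </preformat>
      <p>Because higher orders require longer exact matches, the score can only stay the same or drop as n grows, which is one reason why short generated definitions tend to obtain low values for n = 3 and n = 4.</p>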
      <p>For almost all the results, the performance of all the runs is much better than the
median values for the task, for both the extraction of terms and the generation of definitions. Manual
interventions on the output of ChatGPT are beneficial, as expected, in particular in increasing the recall
while maintaining a high precision in the extraction of the terms. The fully manual run achieved the best
recall but a slightly worse precision across all the terms. On the other
hand, the manual correction of definitions did not improve the BLEU score significantly.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Final Considerations</title>
      <p>In this paper, we described the methodology and the experiments submitted to the SimpleText Lab for
Task 2, which is about identifying and explaining difficult concepts. The objective of this work was to
analyze the performance of a Large Language Model, specifically ChatGPT 3.5, in extracting terms,
evaluating their difficulty, and creating intensional definitions to explain the difficult terms. The preliminary</p>
      <p>Acknowledgments</p>
      <p>This work is partially supported by the HEREDITARY Project, as part of the European Union’s Horizon Europe
research and innovation programme under grant agreement No GA 101137074. This work is also part of the
initiatives carried out by the Center for Studies in Computational Terminology (CENTRICO) of the University of
Padua and in the research directions of the Italian Common Language Resources and Technology Infrastructure
CLARIN-IT.</p>
      <p>[7] G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024: Conference
and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , E. SanJuan, S. Huet,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vezzani</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. D'Souza</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 SimpleText track: Improving access to scientific texts for everyone</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>G. Q.</given-names>
          </string-name>
          <string-name>
            <surname>Philippe Mulhem</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>SanJuan</surname>
          </string-name>
          , S. Huet,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2023 simpletext task 1: Passage selection for a simplified summary</article-title>
          , in: M.
          <string-name>
            <surname>Aliannejadi</surname>
            , G. Faggioli,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ferro</surname>
          </string-name>
          , M. Vlachos (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2023</year>
          ), Thessaloniki, Greece,
          <source>September 18th to 21st</source>
          ,
          <year>2023</year>
          , volume
          <volume>3497</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2823</fpage>
          -
          <lpage>2834</lpage>
          . URL: https://ceur-ws.org/Vol-3497/paper-238.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vezzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bonato</surname>
          </string-name>
          , H. Azarbonyad,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult concepts</article-title>
          ,
          <source>in: [7]</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Laimé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>McCombie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text</article-title>
          ,
          <source>in: [7]</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>J. D'Souza</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 4: Track the state-of-the-art in scholarly publications</article-title>
          ,
          <source>in: [7]</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Luo,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Shi</surname>
          </string-name>
          , G. Cheng,
          <article-title>Dense Re-Ranking with Weak Supervision for RDF Dataset Search</article-title>
          , in: T. R.
          <string-name>
            <surname>Payne</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Presutti</surname>
            , G. Qi,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Poveda-Villalón</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Kaoudi</surname>
          </string-name>
          , G. Cheng, J. Li (Eds.),
          <source>The Semantic Web - ISWC 2023</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>40</lpage>
          . doi:10.1007/978-3-031-47240-4_2.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>