
UNIPD@SimpleText2024: A Semi-Manual Approach on Prompting ChatGPT for Extracting Terms and Write Terminological Definitions

Giorgio Maria Di Nunzio

Elena Gallina

Federica Vezzani

Department of Information Engineering, University of Padova, Italy

Department of Linguistic and Literary Studies, University of Padova, Italy

In this experimental work, we explore Task 2 of the SimpleText Lab, which aims to enhance text simplification technologies using manually annotated datasets. The objective of this work is to propose a methodology for evaluating the capability of Large Language Models to identify and explain difficult terms through optimal prompting. Additionally, we assess the improvements obtained by manually correcting the extracted terms and definitions, aiming to refine and advance the utility of text simplification tools for broader applications.

Keywords: Text Simplification, Automatic Term Extraction, Terminological Definition

1. Introduction

Our participation in this task aims to study the capability of a Large Language Model, given the right prompt, to extract difficult terms and build terminological definitions that explain those terms. In addition, we evaluate the improvement (if any) over the initial results obtained by manually correcting the extracted terms and the provided definitions.

2. Methodology

Our participation in Task 2 focuses on identifying and explaining difficult content using Large Language Models (LLMs) to enhance text simplification. The methodology involves iterative experimentation with various prompting strategies to optimize the performance of the model in this task. The methodology, which we designed with the help of a Master's student in Translation-oriented Terminography, followed these steps:
• Analyze a diverse set of complex texts to identify common linguistic and contextual difficulties.
• Design and test a series of prompts to guide the LLM not only to detect these difficult sections but also to provide clear and concise explanations or simplifications.
• Refine the prompts based on feedback and evaluation metrics like readability, clarity, and fidelity to the original meaning.

3. Experimental Setting

In order to find the most suitable prompt to submit to ChatGPT 3.5 (the experiment was performed on April 15, 2024), we followed the procedure presented in the previous section. In particular, we started by analyzing the abstract of the paper [6] and tried different prompts in order to obtain an output that performed the tasks of terminology extraction, identification of the level of difficulty of each term, and formulation of definitions for the terms considered difficult.

An example of the first prompt is shown in Figure 1 (initial prompt) and Figure 3 (output).

A second and a third version of the prompt were necessary to make the request more precise. For this reason, we included in the input two brief definitions, of “term” and of “intensional definition” (according to ISO 1087:2019, an intensional definition “conveys the intension of a concept by stating the immediate generic concept and the delimiting characteristic(s)”), and we explicitly stated that the difficulty of each term should be evaluated assuming that the user is a member of the general public.
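To make the structure of the refined prompt concrete, the following is a minimal sketch of how such a prompt could be assembled. It is not the prompt used in the experiments: the function name `build_prompt`, the constant names, the placeholder wording for “term”, and the task phrasing are all assumptions introduced for illustration. Only the quoted ISO 1087:2019 wording for the intensional definition and the general-public assumption come from the text above.

```python
# Hypothetical sketch of a prompt builder; the wording is illustrative only
# and does not reproduce the prompt submitted to ChatGPT in the experiments.

# Placeholder wording (assumption): the paper does not report its definition of "term".
TERM_NOTE = "A term is a designation of a concept that belongs to a specific domain."

# Wording quoted in the paper from ISO 1087:2019.
INTENSIONAL_DEFINITION = (
    "An intensional definition conveys the intension of a concept by stating "
    "the immediate generic concept and the delimiting characteristic(s)."
)


def build_prompt(abstract: str, audience: str = "a general public user") -> str:
    """Assemble one prompt covering extraction, difficulty rating, and definitions."""
    parts = [
        f"Definition of 'term': {TERM_NOTE}",
        f"Definition of 'intensional definition': {INTENSIONAL_DEFINITION}",
        "Tasks:\n"
        "1. Extract the terms from the abstract below.\n"
        f"2. Rate the difficulty of each term, assuming the reader is {audience}.\n"
        "3. Write an intensional definition for each term rated as difficult.",
        f"Abstract:\n{abstract.strip()}",
    ]
    # Blank lines separate the definitions, the task list, and the input text.
    return "\n\n".join(parts)
```

Keeping the two definitions and the target audience as separate, named pieces makes it easy to vary one element at a time between prompt attempts.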

The output produced by ChatGPT maintained the same terms extracted in the previous attempt (the second one, not shown in the figures), while adding “coarse-to-fine tuning strategy” to the terms considered difficult to understand. As already seen in the second attempt, the definitions provided contain elements of the intensional definition (superordinate concept and delimiting characteristics) in the first part of the output related to subtask 2.3 (building definitions), while the second part contains a further explanation aimed at elaborating on the terms analyzed. The results are shown in Figure 3.

After this preliminary analysis to tune the prompt, we ran the same prompt on each abstract of the dataset and collected all the extracted terms, their difficulty, and the intensional definitions.

We produced three runs:
• “unipd_t21t22_chatgpt” contains the ChatGPT output without any modification;
• “unipd_t21t22_chatgpt_mod1” contains the output of the original run minus the elements that we do not consider to be terms (the only operation we performed was eliminating elements from the original run);
• “unipd_t21t22_chatgpt_mod2” contains additional manual corrections:
– partial or non-meaningful multi-word terms are removed;
– for cases like “body mass (BM)”, “body mass” and “BM” are separated into two entries;
– incomplete terms are completed;
– terms assigned to an incorrect sentence are reassigned to the correct sentence.

In addition, we created an unofficial, completely manual run (not submitted to this Task) named:

• “unipd_t21t22_manual”.
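The corrections behind the “mod1” and “mod2” runs were performed manually. Purely as an illustration, two of them (deleting rejected entries, and splitting a term such as “body mass (BM)” into two entries) could be applied to a list of extracted entries as sketched below. The entry layout (a dictionary with a `term` key), the function names, and the abbreviation pattern are assumptions made for this sketch, not part of the submitted runs.

```python
import re

# Assumed pattern: "full form (ABBR)", where the abbreviation starts with an
# uppercase letter and is 2 to 10 characters long.
_ABBR = re.compile(r"^(?P<full>.+?)\s*\((?P<abbr>[A-Z][A-Za-z0-9-]{1,9})\)$")


def split_abbreviation(term: str) -> list[str]:
    """Split 'body mass (BM)' into ['body mass', 'BM']; leave other terms unchanged."""
    cleaned = term.strip()
    match = _ABBR.match(cleaned)
    if match is None:
        return [cleaned]
    return [match.group("full"), match.group("abbr")]


def drop_rejected(entries: list[dict], rejected: set[str]) -> list[dict]:
    """mod1-style step: only delete entries whose term the annotator rejected."""
    return [e for e in entries if e["term"] not in rejected]


def split_entries(entries: list[dict]) -> list[dict]:
    """mod2-style step: emit one entry per part, copying the other fields."""
    result = []
    for entry in entries:
        for part in split_abbreviation(entry["term"]):
            result.append({**entry, "term": part})
    return result
```

The remaining “mod2” corrections (completing truncated terms, reassigning terms to the correct sentence) depend on human judgment and are not covered by this sketch.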

4. Results

In this section, we present a summary of the quantitative results obtained with the three official runs plus the additional manual run that was prepared afterwards. For all the runs, we report the following information:
• name of the run;
• recall overall: the proportion of terms (independently of their difficulty) that were found;
• precision overall: the proportion of extracted terms (independently of their difficulty) that were correctly categorized as terms;
• f1 overall score;
• recall average: the average of the recall of terms computed per sentence;
• precision average: the average of the precision of terms computed per sentence;
• f1 average score;
• recall difficult overall: the proportion of difficult terms that were found;
• precision difficult overall: the precision of terms that were labeled as difficult;
• f1 difficult overall score;
• recall difficult average: the recall of difficult terms averaged per sentence;
• precision difficult average: the precision of terms labeled as difficult averaged per sentence;
• f1 difficult average score;
• bleu_nx: the BLEU score computed with n-grams of order x = 1, 2, 3, 4.
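The difference between the “overall” and the “average” scores listed above can be sketched as follows. This is not the official evaluation script of the Lab. The data layout (a mapping from sentence identifier to a set of terms), the function names, the treatment of empty sets as a score of 0, and the choice to average per-sentence F1 values are all assumptions of this sketch.

```python
def prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 of a predicted set against a gold set."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def overall_scores(gold_by_sent: dict, pred_by_sent: dict) -> tuple[float, float, float]:
    """Pool (sentence, term) pairs over the whole collection, then score once."""
    gold = {(s, t) for s, terms in gold_by_sent.items() for t in terms}
    pred = {(s, t) for s, terms in pred_by_sent.items() for t in terms}
    return prf(gold, pred)


def average_scores(gold_by_sent: dict, pred_by_sent: dict) -> tuple[float, float, float]:
    """Score every sentence separately, then take the mean of each measure."""
    sentences = sorted(set(gold_by_sent) | set(pred_by_sent))
    if not sentences:
        return 0.0, 0.0, 0.0
    per_sentence = [
        prf(set(gold_by_sent.get(s, ())), set(pred_by_sent.get(s, ())))
        for s in sentences
    ]
    n = len(per_sentence)
    precision, recall, f1 = (sum(score[i] for score in per_sentence) / n for i in range(3))
    return precision, recall, f1
```

The pooled scores give more weight to sentences with many terms, whereas the averaged scores treat every sentence equally, so the two can diverge when term density varies across sentences. Restricting both dictionaries to terms labeled as difficult would give the corresponding “difficult” variants.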

[Figure: plot panels with y-axes “precision overall” and “precision overall (difficult)”, each ranging from 0.00 to 1.00]

In particular, Table 1 shows the overall results for term extraction independently of the difficulty of the term. Table 2 shows the overall results for term extraction for difficult terms only. Table 3 presents the BLEU scores for the provided definitions. Figure 4 shows the recall-precision plot of the overall scores and of the scores averaged per sentence for all the terms; Figure 5 shows the same information for difficult terms only; Figure 6 displays the BLEU score, for n = 1 and n = 2, in relation to the f1 value for difficult terms.
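For readers unfamiliar with BLEU, the sketch below computes a sentence-level score of a generated definition against a single reference definition, using clipped n-gram precisions up to a maximum order and a brevity penalty. It illustrates the general measure only: the whitespace tokenization, the lowercasing, the absence of smoothing, and the function names are assumptions of this sketch, and the Lab's evaluation may differ in each of these choices.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 1) -> float:
    """Unsmoothed BLEU against one reference, with uniform weights up to max_n."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision_sum)
```

Because a single missing higher-order n-gram match drives the unsmoothed score to zero, scores for n = 3 and n = 4 on short definitions tend to be much lower than those for n = 1 and n = 2.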

For almost all the results, the performance of all the runs is much better than the median values for the task, for both the extraction of terms and the generation of definitions. Manual interventions on the output of ChatGPT are beneficial, as expected, in particular in increasing the recall while maintaining a high precision in the extraction of the terms. The fully manual run showed the best recall but a slightly worse precision across all the terms. On the other hand, the manual correction of definitions did not improve the BLEU score significantly.

5. Final Considerations

In this paper, we described the methodology and the experiments submitted to the SimpleText Lab for Task 2, which is about identifying and explaining difficult concepts. The objective of this work was to analyze the performance of a Large Language Model, specifically ChatGPT 3.5, in extracting terms, evaluating their difficulty, and creating intensional definitions to explain the difficult terms. The preliminary

Acknowledgments

This work is partially supported by the HEREDITARY Project, as part of the European Union’s Horizon Europe research and innovation programme under grant agreement No GA 101137074. This work is also part of the initiatives carried out by the Center for Studies in Computational Terminology (CENTRICO) of the University of Padua and in the research directions of the Italian Common Language Resources and Technology Infrastructure CLARIN-IT.

[7] G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2024.

[1] Ermakova, E. SanJuan, S. Huet, H. Azarbonyad, G. M. Di Nunzio, F. Vezzani, J. D'Souza, J. Kamps, Overview of the CLEF 2024 SimpleText track: Improving access to scientific texts for everyone, in: L. Goeuriot, G. Q. Philippe Mulhem, D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, 2024.

[2] E. SanJuan, S. Huet, J. Kamps, Ermakova, Overview of the CLEF 2023 SimpleText task 1: Passage selection for a simplified summary, in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023, volume 3497 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 2823-2834. URL: https://ceur-ws.org/Vol-3497/paper-238.pdf.

[3] G. M. Di Nunzio, F. Vezzani, Bonato, H. Azarbonyad, J. Kamps, Ermakova, Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult concepts, in: [7], 2024.

[4] Ermakova, Laimé, McCombie, J. Kamps, Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text, in: [7], 2024.

[5] J. D'Souza, et al., Overview of the CLEF 2024 SimpleText task 4: Track the state-of-the-art in scholarly publications, in: [7], 2024.

[6] Chen, Huang, Zhang, W. Luo, Lin, Shi, G. Cheng, Dense Re-Ranking with Weak Supervision for RDF Dataset Search, in: T. R. Payne, V. Presutti, G. Qi, M. Poveda-Villalón, G. Stoilos, L. Hollink, Z. Kaoudi, G. Cheng, J. Li (Eds.), The Semantic Web - ISWC 2023, Springer Nature Switzerland, Cham, 2023, pp. 23-40. doi: 10.1007/978-3-031-47240-4_2.