=Paper= {{Paper |id=Vol-3740/paper-317 |storemode=property |title=CLEF 2024 SimpleText Tasks 1-3: Use of Llama-2 for Text Simplification |pdfUrl=https://ceur-ws.org/Vol-3740/paper-317.pdf |volume=Vol-3740 |authors=Rowan Mann,Tomislav Mikulandric |dblpUrl=https://dblp.org/rec/conf/clef/MannM24c }} ==CLEF 2024 SimpleText Tasks 1-3: Use of Llama-2 for Text Simplification== https://ceur-ws.org/Vol-3740/paper-317.pdf
                         CLEF 2024 SimpleText Tasks 1-3: Use of Llama-2 for Text
                         Simplification⋆
                         Notebook for the SimpleText Lab at CLEF 2024

                         Rowan Mann1 , Tomislav Mikulandric2
                         1
                             Christian-Albrechts-Universität zu Kiel (CAU), Christian-Albrechts-Platz 4, 24118 Kiel
                         2
                             The University of Split, Ul. Ruđera Boškovića 31, 21000, Split, Croatia


                                         Abstract
                                         In an era defined by the vast availability of information, the challenge of discerning reliable information is
                                         more pressing than ever. Our paper presents findings from Tasks 1, 2, and 3 of the SimpleText track at the
                                         15th Conference and Labs of the Evaluation Forum (CLEF) 2024, aimed at advancing research in automatic
                                         simplification of scientific texts using LLaMA-2.
                                             Task 1 involved selecting relevant passages for simplified summaries, leveraging ElasticSearch and TF-IDF with
                                         cosine similarity for evaluating relevance. We achieved an average Flesch-Kincaid grade level of 0.6, indicating a
                                         moderate complexity suitable for further simplification.
                                             Task 2 focused on identifying and explaining difficult concepts. Using the LLaMA-2 13B model, we extracted
                                         and rated the difficulty of scientific terms, generating explanations for the most challenging ones. However,
                                         reliance on Wikipedia for definitions proved inconsistent, highlighting a limitation in our methodology.
                                             Task 3 addressed the simplification of scientific abstracts and sentences. We utilized LLaMA-2 to generate
                                         simplified versions, effectively maintaining the original meaning while reducing complexity and length. Human
                                         validation confirmed the preservation of essential content in the simplified texts.
                                             Our research demonstrates the efficacy of LLaMA-2 for text simplification tasks, albeit with noted challenges
                                         in obtaining reliable definitions from external sources like Wikipedia. These findings contribute to the broader
                                         goal of enhancing scientific literacy through accessible information.

                                         Keywords
                                         LLMs, text simplification, LLaMA-2




                         1. Introduction
                         We live in an era characterised by an abundance of information available to all, almost instantaneously.
                         However, far from creating a world defined by truth and understanding, our era seems to be accurately
                         defined by misinformation and polarity. Fake news and algorithmically determined “echo-chambers”
                         have helped spread conspiracy and division across the world, with consequences that reverberate far
                         beyond their origins in cyberspace.
                            For the average person, it’s more difficult than ever to know what information to believe. We all need
                         to be able to understand our world, so scientific literacy is a more important skill than ever.
                            This paper presents the results of our analysis of Tasks 1, 2 and 3 of the SimpleText track as part of
                         the 15th Conference and Labs of the Evaluation Forum 2024. The main goal of SimpleText is to advance
                         research in the area of automatic simplification of scientific texts [1].
                            The paper deals with:

                                 • Task 1: What is in (or out)? Selecting passages to include in a simplified summary.
                                 • Task 2: What is unclear? Difficult concept identification and explanation
                                 • Task 3: Rewrite this! Given a query, simplify passages from scientific abstracts.

                         CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
                         ⋆
                           CLEF 2024 SimpleText Tasks 1-3: Use of LLaMA-2 for Text Simplification
                         *
                           Corresponding author.
                         †
                           These authors contributed equally.
                         $ rowanmann93@gmail.com (R. Mann); tomislav.mikulandric@gmail.com (T. Mikulandric)
                                      © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
2. Task 1: Experimental Setup
2.1. Data Description
The data provided to us by CLEF consisted of 2 folders, “corpus” and “topics qrels”. The corpus includes
a new vector database with sentence embedding scores and retains the previously released ElasticSearch
Index for field-specific document searches. The ElasticSearch Index allows querying fields such as id,
abstract, authors, title, year, and doi from the DBLP dump and is suitable for various applications like
passage retrieval, Latent Dirichlet Allocation models, and training Graph Neural Networks. The Vector
Database stores each article’s id and sentence-embedding vectors from their title and abstract, excluding
articles with empty or very short abstracts, supporting longer queries enabled by sentence embedding.
   The SimpleText 2024 Task 1 Corpus includes topics defined by articles from The Guardian’s tech
section (G01 to G20) and Tech Xplore (T01 to T20), with URLs and textual content provided for participant
use. Queries associated with each topic, manually verified for relevance, enable retrieval of relevant
DBLP passages. This edition introduces new queries for The Guardian articles, generated by ChatGPT
4.0, focusing on specific sub-topics and provided in CSV and JSON formats. The Simpletext 2024 task1
train.qrels file offers quality relevance judgments on a 0-2 scale for abstracts, incorporating data from
previous editions and new judgments for topics G01-G15, excluding articles with nearly empty abstracts
to ensure consistency with the new vector database.

2.2. Method
We created an ElasticSearch function “query elasticsearch” to query the ElasticSearch database. It took
two parameters: query (the search query) and size (the number of results to return, defaulting to 100).
The function sent a GET request to the ElasticSearch URL with the specified query and size and returned
the search results in JSON format.
# Function to query the E l a s t i c S e a r c h
d e f q u e r y _ e l a s t i c s e a r c h ( query , s i z e = 1 0 0 ) :
      r e s p o n s e = r e q u e s t s . g e t ( f " { ES_URL } ? q = { q u e r y }& s i z e = { s i z e } " , a u t h
            =( ’ inex ’ , ’ qatc2011 ’ ) )
      i f r e s p o n s e . s t a t u s _ c o d e == 2 0 0 :
              return response . json ( ) [ ’ hits ’ ] [ ’ hits ’ ]
      else :
              print ( " F a i l e d to fetch data : " , response . status_code )
              return []
  We used the first five examples from the “simpletext 2024 task1 queries.json” file and went over every
index. (Appendix A)
  We created a function to calculate how relevant the abstracts retrieved were to our search. The
function created a Text Frequency Inverse Document Frequency (TF IDF), for vectorizing the texts,
which assessed relevancy of words with regards to our corpus, then calculated the cosine similarity of
our vectorised words. This function could then return a relevance score (rel score)
  To create the combined score, we calculated word difficulty based on the Flesch Kincaid grade level.
The Flesch–Kincaid grade level is one of the formulas used for assessing reading-ease, scores indicate
the grade a person would have to be in US education system to understand the text.
def flesch_kincaid_grade_level ( text ) :
    # Constants for the formula
    ASL = a v e r a g e _ s e n t e n c e _ l e n g t h ( t e x t )
    ASW = a v e r a g e _ s y l l a b l e s _ p e r _ w o r d ( t e x t )


       # Calculating the score
       s c o r e = 0 . 3 9 ∗ ASL + 1 1 . 8 ∗ ASW − 1 5 . 5 9


       # N o r m a l i z e s c o r e t o r a n g e from 0 t o 1
       n o r m a l i z e d _ s c o r e = normalize ( score , min_score =0 , max_score =25)
            # Adjust max_score as needed


       return normalized_score


3. Task 1: Experimental Results
We analysed our success by using elastic search to select passages and calculated scores using FKGL
and normalisation. The mean of these scores was close to 0.6 which meant that the texts were more
complex than everyday speech and appropriate to be used for the next tasks.

Table 1
Official results for Task 1
                     MMR      Precision 10   Precision 20   NDCG 10   NDCG 20   Bpref     MAP
             T11 0.217           0,0233         0,0150       0,0121    0,0106   0,0062   0,0025
             T12 0,5444          0,3733         0,2750       0,2443    0,2183   0,0963   0,0601



4. Task 2: Experimental Setup
4.1. Data Description
The dataset for "Task 2: Identifying and Explaining Difficult Concepts" in the SimpleText Lab is divided
into training and validation folders, each containing several tab-separated files. The training folder
includes documents.tsv (576 rows, 115 documents), documents users.tsv (145 rows, document and
expert IDs), terms.tsv (1,910 rows, terms, difficulty, expert ID), definitions explanations.tsv (1,046 rows,
definitions, explanations, expert ID), and definitions generated.tsv (589 rows, automatically generated
definitions). The validation folder contains definitions explanations.tsv (960 rows, definitions without
explanations), definitions generated.tsv (932 rows, automatically generated definitions), and terms.tsv
(680 rows, terms, difficulty). Initial annotations were performed by multiple experts, with a second
round of validation by an external expert to identify additional terms and definitions. The dataset will
later include test files for the evaluation phase.
   The test dataset for "Task 2: Identifying and Explaining Difficult Concepts" in the SimpleText Lab
includes several tab-separated files. The documents.tsv file contains 501 rows across 55 documents with
columns for document ID, sentence ID, and sentence text. The terms.tsv and definitions explanations.tsv
files, available after the evaluation phase, provide annotated sentence IDs, extracted terms, difficulty
levels (easy, medium, difficult), user-provided definitions, and explanations. Finally, the definitions
generated.tsv file contains 3,816 rows with unique definition IDs and the corresponding definitions to
be ranked.

4.2. Method
We created a prompt for LLAMA-2 13B model that asked the LLM to iterate over each of our source
sentences and extract three scientific terms from the phrase.
prompt_terms = " " "
    You a r e a r o b o t t h a t ONLY o u t p u t s JSON .
      You r e p l y i n JSON f o r m a t w i t h t h e f i e l d ’ terms ’ .
      You p r o v i d e ONLY s e m i c o l o n − s e p a r a t e d    l i s t o f MAXIMUM 3
          s c i e n t i f i c t e r m s o f a s o u r c e s e n t e n c e ONLY .
      You DO NOT add ’ Sure , Here a r e t h e s c i e n t i f i c t e r m s o f your
          sentence : ’ .
      Example s o u r c e s e n t e n c e : I n t h e modern e r a o f a u t o m a t i o n and
          robotics , \
      autonomous v e h i c l e s a r e c u r r e n t l y t h e f o c u s o f a c a d e m i c and
          industrial research .? \
      Example answer : { ’ terms ’ : ’ r o b o t i c s ; autonomous v e h i c l e s ’ }
      Now h e r e i s my s e n t e n c e :
"""
   We used Regex to help us deal with regular expressions, removing unnecessary content in the outputs.
(Appendix B)
   The terms were then sorted into three rows, with duplicates removed, one term per row and we
prompted Llama to give us a difficulty rating of easy, medium, or difficult for our terms. (Appendix C)
   We used wikipedia to return definitions for the difficult terms, with limited success. (Appendix D)
   We also asked the LLM to provide an explanation. When creating our prompt for our LLM, we gave
it a few examples of correct return phrases, that were taken from the document provided. This was to
improve the ability to achieve “few-shot” results. (Appendix E)
   We then created a function to remove unnecessary text. (Appendix F)
   Finally, we compiled our results in a JSON file, with those terms considered “d” for difficult, generating
definitions. (Appendix G)


5. Task 2: Experimental Results
The LLM was successful in generating definitions for our difficult terms, but an issue we encountered
was that Wikipedia was unsuccessful in generating definitions for our terms. Therefore, this certainly
harms the appropriateness of this method as many of our definitions are missing.

Table 2
Official results for task 2
                                               Recall overall   Recall avg
                                   Task 2.2       0,0069         0,0040
                                   Task 2.21      0,0083         0,0084



6. Task 3: Experimental Setup
6.1. Method
We used LLAMA-2 13B once more, creating a larger context window of 4096. We gave the sentences to
the LLM, asking it to simplify the texts. Again, we instructed the LLM to remove fluff words like “Sure!”
etc. This gave us an additional column for our simplified sentences, simplified snt. (Appendix H)
   Once again, it was important to remove unnecessary text therefore we created a function to carry
out this task. (Appendix I)


7. Task 3: Experimental Results
Our results from the LLAMA 13B model for simplifying both the source abstracts and source sentences
seems promising. Based on human validation of the simplified phrases, it’s seems clear that the meaning
has been preserved while reducing the complexity of words and the length of the sentences.
8. Conclusion
Our research has shown LLAMA-2 18B to be an effective model for selecting and simplifying passages
from scientific texts. However, we’ve also highlighted the unreliability of relying on wikipedia for the
provision of definitions in this context.


Acknowledgments
We’d like to extend our gratitude to the University of Brest for organising the Blended Intensive
Programme (BIP) AI For Humanities. We would also like to thank Liana Ermakova for her teaching of
the course and Caroline L’haridon for her support during our stay in Brest.


References
[1] L. Ermakova, et al., Overview of CLEF 2024 SimpleText track on improving access to scientific texts,
    in: L. Goeuriot, et al. (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction.
    Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture
    Notes in Computer Science, Springer, 2024.
[2] E. SanJuan, et al., Overview of the CLEF 2024 SimpleText task 1: Retrieve passages to include in a
    simplified summary, in: G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of the
    Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[3] G. M. D. Nunzio, et al., Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult
    concepts, in: G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of the Evaluation
    Forum (CLEF 2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[4] L. Ermakova, et al., Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text, in:
    G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF
    2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[5] J. D’Souza, et al., Overview of the CLEF 2024 SimpleText task 4: Track the state-of-the-art in
    scholarly publications, in: G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of
    the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.


.1. Appendix A

d e f main ( ) :
      # Read q u e r i e s from JSON f i l e i n t o a d a t a f r a m e
      q u e r i e s = pd . r e a d _ j s o n ( ’ / c o n t e n t / d r i v e / MyDrive / BIP / S i m p l e T e x t /
           task 1/ task 1/ t o p i c s _ q r e l s / s i m p l e t e x t _ 2 0 2 4 _ t a s k 1 _ q u e r i e s . json
            ’)
      q u e r i e s = q u e r i e s . head ( 5 )


      all_results = []
      f o r i n d e x , query_row i n q u e r i e s . i t e r r o w s ( ) :
            q u e r y _ t e x t = query_row [ ’ q u e r y _ t e x t ’ ]
            t o p i c _ i d = query_row [ ’ t o p i c _ i d ’ ]
            q u e r y _ i d = query_row [ ’ q u e r y _ i d ’ ]


             docs = q u e r y _ e l a s t i c s e a r c h ( query_text )
             s c o r e s = c a l c u l a t e _ r e l e v a n c e ( docs , q u e r y _ t e x t )
             r e s u l t s = f o r m a t _ r e s u l t s ( docs , s c o r e s , t o p i c _ i d , q u e r y _ i d )
             a l l _ r e s u l t s . extend ( r e s u l t s )


      # Output r e s u l t s t o a JSON f i l e
      w i t h open ( ’ r e s u l t s . j s o n ’ , ’w ’ ) a s f :
             j s o n . dump ( a l l _ r e s u l t s , f , i n d e n t = 4 )


.2. Appendix B


def extract_value_inside_curly_braces ( text ) :
    # Use r e g e x t o f i n d t h e v a l u e i n s i d e c u r l y b r a c e s
    match = r e . s e a r c h ( r " \ { ( [ ^ { } ] ∗ ) \ } " , t e x t )


      i f match :
           r e t u r n match . group ( 1 )
      else :
           r e t u r n None


.3. Appendix C

prompt_difficulty ="""
    You a r e a r o b o t t h a t r a t e s t h e d i f f i c u l t y o f d i f f e r e n t t e r m s .
    You p r o v i d e ONE LEVEL o d i f f i c u l t y f o r s c i e n t i f i c t e r m s .
    You need t o c o n s i d e r two words a s one term .
    P r o v i d e ONE r a t i n g f o r t h e u n d e r s t a b l i t y d i f f i c u l t y o f term
          provided .
    There a r e 3 l e v e l s . You need t o u s e : e f o r easy , m f o r medium
          and d f o r d i f f i c u l t .
    Give t h e r a t i n g i n s i d e o f c u r l y b r a c e s l i k e t h i s { e }
    You can r e p l y w i t h ONLY one word .
    Example s o u r c e : autonomous v e h i c l e s
    Example answer : { ’ m’ }
    Now h e r e i s my s e n t e n c e :
"""


.4. Appendix D


import wikipedia


d e f g e t _ w i k i p e d i a _ d e f i n i t i o n ( term ) :
      try :
             # F e t c h W i k i p e d i a summary f o r t h e term
             summary = w i k i p e d i a . summary ( term )
              r e t u r n summary
      except wikipedia . exceptions . DisambiguationError as e :
           # I f there ’ s a d i s a m b i g u a t i o n e r r o r , handle i t as needed
           r e t u r n " D i s a m b i g u a t i o n E r r o r : Ambiguous term "
       except wikipedia . exceptions . PageError as e :
           # I f t h e page doesn ’ t e x i s t , h a n d l e i t a s n e e d e d
           r e t u r n " P a g e E r r o r : Term n o t f o u n d "
       except Exception as e :
           # Handle o t h e r e x c e p t i o n s
           return str ( e )


# Assuming t e s t [ ’ d i f f i c u l t y ’ ] c o n t a i n s t e r m s f o r which you want
      Wikipedia d e f i n i t i o n s
# t e s t [ ’ wi k i ’ ] = t e s t [ ’ term ’ ] . a p p l y ( g e t _ w i k i p e d i a _ d e f i n i t i o n )
t e s t . l o c [ t e s t [ ’ d i f f i c u l t y ’ ] == ’ d ’ , ’ w i ki ’ ] = t e s t . l o c [ t e s t [ ’
      d i f f i c u l t y ’ ] == ’ d ’ , ’ term ’ ] . a p p l y ( g e t _ w i k i p e d i a _ d e f i n i t i o n )
test


.5. Appendix E

prompt_explanation =" ""
    You a r e a r o b o t t h a t e x p l a i n s d i f f i c u l t s c i e n t i f i c t e r m s .
   DO NOT add i n t r o l i k e " Sure , I ’ d be happy t o h e l p ! "
    Use o n l y once s e n t a n c e and wrap t h e s e n t a n c e i n c u r l y b r a c e s .
     D o n t j u s t i f y your a n s w e r s . D o n t g i v e i n f o r m a t i o n n o t
        m e n t i o n e d i n t h e CONTEXT INFORMATION .
    Example s o u r c e : w i r e l e s s network e n v i r o n m e n t
    Example answer : { ’ a s y s t e m i n which d e v i c e s makes u s e o f R a d i o
        F r e q u e n c y c o n n e c t i o n s between n o d e s i n t h e network a s y s t e m
        i n which d e v i c e s a r e c o n n e c t e d t o a network w i t h o u t t h e need
        f o r p h y s i c a l c a b l e s or wires ’ }
    Example s o u r c e : B l u e t o o t h w i r e l e s s t e c h n o l o g y
    Example answer : { ’ s h o r t − r a n g e w i r e l e s s c o m m u n i c a t i o n t e c h n o l o g y
        t h a t a l l o w s d e v i c e s t o c o n n e c t and e x c h a n g e d a t a . I t
        f a c i l i t a t e s d a t a e x c h a n g e between d e v i c e s l i k e s m a r t p h o n e s ,
        co m p u t e r s , and p e r i p h e r a l s s u c h a s h e a d p h o n e s o r m e d i c a l
        d e v i c e s . B l u e t o o t h t e c h n o l o g y e l i m i n a t e s t h e need f o r
        p h y s i c a l c a b l e s , p r o v i d i n g c o n v e n i e n c e and v e r s a t i l i t y i n
        device connectivity . ’ }
    Example s o u r c e : a p p l i c a t i o n
    Example answer : { ’ s o f t w a r e program o r t o o l d e s i g n e d t o p e r f o r m
        s p e c i f i c t a s k s o r f u n c t i o n s on e l e c t r o n i c d e v i c e s . I t can
        r a n g e from p r o d u c t i v i t y t o o l s and games t o u t i l i t i e s and
        c o m m u n i c a t i o n p l a t f o r m s on e l e c t r o n i c d e v i c e s s u c h a s
        co m p u t e r s , s m a r t p h o n e s , o r t a b l e t s . ’ }
    Example s o u r c e : PDA
    Example answer : { ’ PDA i s t h e acronym f o r p e r s o n a l d i g i t a l
        a s s i s t a n t , which i s a h a n d h e l d e l e c t r o n i c d e v i c e d e s i g n e d f o r
          p e r s o n a l o r g a n i z a t i o n , communication , and i n f o r m a t i o n a c c e s s
        . PDAs may i n c l u d e f e a t u r e s s u c h a s c a l e n d a r s , c o n t a c t l i s t s ,
          and note − t a k i n g c a p a b i l i t i e s , s e r v i n g a s p o r t a b l e t o o l s f o r
        managing d a i l y t a s k s . PDA i s t h e acronym f o r p e r s o n a l d i g i t a l
           a s s i s t a n t , which i s a h a n d h e l d e l e c t r o n i c d e v i c e c r a f t e d f o r
           p e r s o n a l o r g a n i z a t i o n , communication , and i n f o r m a t i o n
         r e t r i e v a l . PDAs o f t e n i n c o r p o r a t e f e a t u r e s l i k e c a l e n d a r s ,
         c o n t a c t l i s t s , and note − t a k i n g c a p a b i l i t i e s , f u n c t i o n i n g a s
         p o r t a b l e t o o l s f o r managing d a i l y t a s k s and s t a y i n g c o n n e c t e d
         . While modern s m a r t p h o n e s have l a r g e l y r e p l a c e d t r a d i t i o n a l
         PDAs , t h e c o n c e p t i n f l u e n c e d t h e d e v e l o p m e n t o f c o n t e m p o r a r y
         mobile devices . ’ }
      Example s o u r c e : p i l o t s t u d y
      Example answer : { ’ a p r e l i m i n a r y r e s e a r c h i n v e s t i g a t i o n c o n d u c t e d
           on a s m a l l s c a l e t o a s s e s s t h e f e a s i b i l i t y , and p o t e n t i a l
         c h a l l e n g e s o f a l a r g e r r e s e a r c h p r o j e c t . an i n i t i a l and
         smaller − s c a l e research i n v e s t i g a t i o n undertaken to evaluate
         t h e f e a s i b i l i t y , methodology , and p o t e n t i a l o b s t a c l e s o f a
         l a r g e r r e s e a r c h p r o j e c t . I t s e r v e s a s a t e s t i n g ground t o
         r e f i n e t h e s t u d y d e s i g n , i d e n t i f y l o g i s t i c a l i s s u e s , and
         en h a nc e t h e o v e r a l l r o b u s t n e s s and e f f e c t i v e n e s s o f t h e
         planned f u l l − s c a l e r e s e a r c h endeavor . ’ }
      Now h e r e i s my ONE s e n t e n c e e x p l a n a t i o n :
"""


.6. Appendix F


def remove_redundant_text ( t e x t ) :
    # Define patterns to search for
    patterns = [
        r ’ ^ Hey t h e r e ! ’ ,
        r ’^ Sure ! ’ ,
        r ’ ^ As a s c i e n t i f i c j o u r n a l i s t , ’ ,
        r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s
              f o r you \ . ’ ,
        r ’ Here \ ’ s a s i m p l i f i e d v e r s i o n o f t h e t e x t ’ ,
        r ’ L e t me b r e a k i t down f o r you : ’ ,
        r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s
              f o r you \ . ’ ,
        r ’ I \ ’m h e r e t o b r e a k down complex s c i e n t i f i c c o n c e p t s i n t o
              s i m p l e , easy − to − u n d e r s t a n d l a n g u a g e . ’ ,
        r ’ I \ ’m h e r e t o b r e a k down a complex t o p i c i n t o s i m p l e r t e r m s
                f o r you . So , l e t \ ’ s t a l k ab o u t ’ ,
        r ’ Here i s my one s e n t e n c e e x p l a n a t i o n of ’


      ]


.7. Appendix G


      # Add d e f i n i t i o n and e x p l a n a t i o n i f t h e y a r e n o t empty
      i f row [ " d i f f i c u l t y " ] == " d " :
            d e f i n i t i o n = row . g e t ( " d e f i n i t i o n " , None )
              e x p l a n a t i o n = row . g e t ( " e x p l a n a t i o n " , None )
              if definition :
                      json_obj [" definition "] = definition
              i f explanation :
                      json_obj [" explanation "] = explanation


       return json_obj


.8. Appendix H

# Example u s a g e
def simplify ( snt ) :
    c = model . c r e a t e _ c h a t _ c o m p l e t i o n (
            m e s s a g e s =[
                    { " r o l e " : " s y s t e m " , " c o n t e n t " : " You a r e a s c i e n t i f i c
                          j o u r n a l i s t who p o p u l a r i z e s s c i e n t i f i c r e s u l t s . " } ,
                    { " r ol e " : " user " , " content " : " Simplify the following t e x t
                          : \ n" + snt }
            ]
    )
    r e t u r n c [ ’ c h o i c e s ’ ] [ 0 ] [ ’ message ’ ] [ ’ c o n t e n t ’ ] . s t r i p ( )


def simplify ( snt ) :
  c = model . c r e a t e _ c h a t _ c o m p l e t i o n (
          messages = [
                { " r o l e " : " s y s t e m " , " c o n t e n t " : " You a r e a s c i e n t i f i c
                      j o u r n a l i s t who p o p u l a r i z e s s c i e n t i f i c r e s u l t s . " } ,
                {
                        " role " : " user " ,
                        " c o n t e n t " : " S i m p l i f y the f o l l o w i n g t e x t : \ n"+ s n t
                }
          ]
  )
  r e t u r n c [ ’ c h o i c e s ’ ] [ 0 ] [ ’ message ’ ] [ ’ c o n t e n t ’ ] . s t r i p ( )


s i m p l i f y ( " With t h e e v e r i n c r e a s i n g number o f unmanned a e r i a l
     v e h i c l e s g e t t i n g i n v o l v e d i n a c t i v i t i e s i n t h e c i v i l i a n and
     c o m m e r c i a l domain , t h e r e i s an i n c r e a s e d need f o r autonomy i n
     these systems too . " )


.9. Appendix I


def remove_redundant_text ( t e x t ) :
    # Define patterns to search for
    patterns = [
        r ’ ^ Hey t h e r e ! ’ ,
        r ’^ Sure ! ’ ,
      r ’ ^ As a s c i e n t i f i c j o u r n a l i s t , ’ ,
      r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s
            f o r you \ . ’ ,
      r ’ Here \ ’ s a s i m p l i f i e d v e r s i o n o f t h e t e x t ’ ,
      r ’ L e t me b r e a k i t down f o r you : ’ ,
      r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s
            f o r you \ . ’ ,
      r ’ I \ ’m h e r e t o b r e a k down complex s c i e n t i f i c c o n c e p t s i n t o
            s i m p l e , easy − to − u n d e r s t a n d l a n g u a g e . ’ ,
      r ’ I \ ’m h e r e t o b r e a k down a complex t o p i c i n t o s i m p l e r t e r m s
              f o r you . So , l e t \ ’ s t a l k ab o u t ’ ,
      r ’ Sure , I \ ’ d be happy t o h e l p ! ’ ,
      r ’ Here \ ’ s a s i m p l i f i e d e x p l a n a t i o n of ’ ,
      r ’ I n o t h e r words , ’ ,
      r ’ I n s i m p l e terms , ’


]
# Compile r e g u l a r e x p r e s s i o n s
regex_patterns = [ re . compile ( pattern ) for pattern in p a t t e r n s ]


# Remove p a t t e r n s from t e x t
for pattern in regex_patterns :
    t e x t = r e . sub ( p a t t e r n , ’ ’ , t e x t ) . s t r i p ( )


return text