<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF 2024 SimpleText Tasks 1-3: Use of Llama-2 for Text Simplification⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rowan Mann</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomislav Mikulandric</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Split</institution>
          ,
          <addr-line>Ul. Ruđera Boškovića 31, 21000, Split</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In an era defined by the vast availability of information, the challenge of discerning reliable information is more pressing than ever. Our paper presents findings from Tasks 1, 2, and 3 of the SimpleText track at the 15th Conference and Labs of the Evaluation Forum (CLEF) 2024, aimed at advancing research in automatic simplification of scientific texts using LLaMA-2. Task 1 involved selecting relevant passages for simplified summaries, leveraging ElasticSearch and TF-IDF with cosine similarity for evaluating relevance. We achieved an average Flesch-Kincaid grade level of 0.6, indicating a moderate complexity suitable for further simplification. Task 2 focused on identifying and explaining dificult concepts. Using the LLaMA-2 13B model, we extracted and rated the dificulty of scientific terms, generating explanations for the most challenging ones. However, reliance on Wikipedia for definitions proved inconsistent, highlighting a limitation in our methodology. Task 3 addressed the simplification of scientific abstracts and sentences. We utilized LLaMA-2 to generate simplified versions, efectively maintaining the original meaning while reducing complexity and length. Human validation confirmed the preservation of essential content in the simplified texts. Our research demonstrates the eficacy of LLaMA-2 for text simplification tasks, albeit with noted challenges in obtaining reliable definitions from external sources like Wikipedia. These findings contribute to the broader goal of enhancing scientific literacy through accessible information.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;LLMs</kwd>
        <kwd>text simplification</kwd>
        <kwd>LLaMA-2</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>• Task 1: What is in (or out)? Selecting passages to include in a simplified summary.
• Task 2: What is unclear? Dificult concept identification and explanation
• Task 3: Rewrite this! Given a query, simplify passages from scientific abstracts.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task 1: Experimental Setup</title>
      <sec id="sec-2-1">
        <title>2.1. Data Description</title>
        <p>The data provided to us by CLEF consisted of 2 folders, “corpus” and “topics qrels”. The corpus includes
a new vector database with sentence embedding scores and retains the previously released ElasticSearch
Index for field-specific document searches. The ElasticSearch Index allows querying fields such as id,
abstract, authors, title, year, and doi from the DBLP dump and is suitable for various applications like
passage retrieval, Latent Dirichlet Allocation models, and training Graph Neural Networks. The Vector
Database stores each article’s id and sentence-embedding vectors from their title and abstract, excluding
articles with empty or very short abstracts, supporting longer queries enabled by sentence embedding.</p>
        <p>The SimpleText 2024 Task 1 Corpus includes topics defined by articles from The Guardian’s tech
section (G01 to G20) and Tech Xplore (T01 to T20), with URLs and textual content provided for participant
use. Queries associated with each topic, manually verified for relevance, enable retrieval of relevant
DBLP passages. This edition introduces new queries for The Guardian articles, generated by ChatGPT
4.0, focusing on specific sub-topics and provided in CSV and JSON formats. The Simpletext 2024 task1
train.qrels file ofers quality relevance judgments on a 0-2 scale for abstracts, incorporating data from
previous editions and new judgments for topics G01-G15, excluding articles with nearly empty abstracts
to ensure consistency with the new vector database.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Method</title>
        <p>We created an ElasticSearch function “query elasticsearch” to query the ElasticSearch database. It took
two parameters: query (the search query) and size (the number of results to return, defaulting to 100).
The function sent a GET request to the ElasticSearch URL with the specified query and size and returned
the search results in JSON format.
# F u n c t i o n t o q u e r y t h e E l a s t i c S e a r c h
d e f q u e r y _ e l a s t i c s e a r c h ( q u e r y , s i z e = 1 0 0 ) :
r e s p o n s e = r e q u e s t s . g e t ( f " { ES_URL } ? q = { q u e r y }&amp; s i z e = { s i z e } " , a u t h
= ( ’ i n e x ’ , ’ q a t c 2 0 1 1 ’ ) )
i f r e s p o n s e . s t a t u s _ c o d e == 2 0 0 :</p>
        <p>r e t u r n r e s p o n s e . j s o n ( ) [ ’ h i t s ’ ] [ ’ h i t s ’ ]
e l s e :
p r i n t ( " F a i l e d t o f e t c h d a t a : " , r e s p o n s e . s t a t u s _ c o d e )
r e t u r n [ ]</p>
        <p>We used the first five examples from the “simpletext 2024 task1 queries.json” file and went over every
index. (Appendix A)</p>
        <p>We created a function to calculate how relevant the abstracts retrieved were to our search. The
function created a Text Frequency Inverse Document Frequency (TF IDF), for vectorizing the texts,
which assessed relevancy of words with regards to our corpus, then calculated the cosine similarity of
our vectorised words. This function could then return a relevance score (rel score)</p>
        <p>To create the combined score, we calculated word dificulty based on the Flesch Kincaid grade level.
The Flesch–Kincaid grade level is one of the formulas used for assessing reading-ease, scores indicate
the grade a person would have to be in US education system to understand the text.
d e f f l e s c h _ k i n c a i d _ g r a d e _ l e v e l ( t e x t ) :
# C o n s t a n t s f o r t h e f o r m u l a
ASL = a v e r a g e _ s e n t e n c e _ l e n g t h ( t e x t )
ASW = a v e r a g e _ s y l l a b l e s _ p e r _ w o r d ( t e x t )
# C a l c u l a t i n g t h e s c o r e
s c o r e = 0 . 3 9 ∗ ASL + 1 1 . 8 ∗ ASW − 1 5 . 5 9
# N o r m a l i z e s c o r e t o r a n g e from 0 t o 1
n o r m a l i z e d _ s c o r e = n o r m a l i z e ( s c o r e , m i n _ s c o r e = 0 , m a x _ s c o r e = 2 5 )
# A d j u s t m a x _ s c o r e a s n e e d e d
r e t u r n n o r m a l i z e d _ s c o r e</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Task 1: Experimental Results</title>
      <p>We analysed our success by using elastic search to select passages and calculated scores using FKGL
and normalisation. The mean of these scores was close to 0.6 which meant that the texts were more
complex than everyday speech and appropriate to be used for the next tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Task 2: Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Data Description</title>
        <p>The dataset for "Task 2: Identifying and Explaining Dificult Concepts" in the SimpleText Lab is divided
into training and validation folders, each containing several tab-separated files. The training folder
includes documents.tsv (576 rows, 115 documents), documents users.tsv (145 rows, document and
expert IDs), terms.tsv (1,910 rows, terms, dificulty, expert ID), definitions explanations.tsv (1,046 rows,
definitions, explanations, expert ID), and definitions generated.tsv (589 rows, automatically generated
definitions). The validation folder contains definitions explanations.tsv (960 rows, definitions without
explanations), definitions generated.tsv (932 rows, automatically generated definitions), and terms.tsv
(680 rows, terms, dificulty). Initial annotations were performed by multiple experts, with a second
round of validation by an external expert to identify additional terms and definitions. The dataset will
later include test files for the evaluation phase.</p>
        <p>The test dataset for "Task 2: Identifying and Explaining Dificult Concepts" in the SimpleText Lab
includes several tab-separated files. The documents.tsv file contains 501 rows across 55 documents with
columns for document ID, sentence ID, and sentence text. The terms.tsv and definitions explanations.tsv
ifles, available after the evaluation phase, provide annotated sentence IDs, extracted terms, dificulty
levels (easy, medium, dificult), user-provided definitions, and explanations. Finally, the definitions
generated.tsv file contains 3,816 rows with unique definition IDs and the corresponding definitions to
be ranked.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Method</title>
        <p>We created a prompt for LLAMA-2 13B model that asked the LLM to iterate over each of our source
sentences and extract three scientific terms from the phrase.
p r o m p t _ t e r m s = " " "</p>
        <p>You a r e a r o b o t t h a t ONLY o u t p u t s JSON .</p>
        <p>You r e p l y i n JSON f o r m a t w i t h t h e f i e l d ’ t e r m s ’ .</p>
        <p>You p r o v i d e ONLY s e m i c o l o n − s e p a r a t e d l i s t o f MAXIMUM 3</p>
        <p>s c i e n t i f i c t e r m s o f a s o u r c e s e n t e n c e ONLY .</p>
        <p>You DO NOT add ’ S u r e , Here a r e t h e s c i e n t i f i c t e r m s o f y o u r
s e n t e n c e : ’ .</p>
        <p>Example s o u r c e s e n t e n c e : I n t h e modern e r a o f a u t o m a t i o n and
r o b o t i c s , \
autonomous v e h i c l e s a r e c u r r e n t l y t h e f o c u s o f a c a d e m i c and
i n d u s t r i a l r e s e a r c h . ? \
Example a n s w e r : { ’ t e r m s ’ : ’ r o b o t i c s ; autonomous v e h i c l e s ’ }
Now h e r e i s my s e n t e n c e :
" " "</p>
        <p>We used Regex to help us deal with regular expressions, removing unnecessary content in the outputs.
(Appendix B)</p>
        <p>The terms were then sorted into three rows, with duplicates removed, one term per row and we
prompted Llama to give us a dificulty rating of easy, medium, or dificult for our terms. (Appendix C)
We used wikipedia to return definitions for the dificult terms, with limited success. (Appendix D)
We also asked the LLM to provide an explanation. When creating our prompt for our LLM, we gave
it a few examples of correct return phrases, that were taken from the document provided. This was to
improve the ability to achieve “few-shot” results. (Appendix E)</p>
        <p>We then created a function to remove unnecessary text. (Appendix F)</p>
        <p>Finally, we compiled our results in a JSON file, with those terms considered “d” for dificult, generating
definitions. (Appendix G)</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Task 2: Experimental Results</title>
      <p>The LLM was successful in generating definitions for our dificult terms, but an issue we encountered
was that Wikipedia was unsuccessful in generating definitions for our terms. Therefore, this certainly
harms the appropriateness of this method as many of our definitions are missing.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Task 3: Experimental Setup</title>
      <sec id="sec-6-1">
        <title>6.1. Method</title>
        <p>We used LLAMA-2 13B once more, creating a larger context window of 4096. We gave the sentences to
the LLM, asking it to simplify the texts. Again, we instructed the LLM to remove fluf words like “Sure!”
etc. This gave us an additional column for our simplified sentences, simplified snt. (Appendix H)</p>
        <p>Once again, it was important to remove unnecessary text therefore we created a function to carry
out this task. (Appendix I)</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Task 3: Experimental Results</title>
      <p>Our results from the LLAMA 13B model for simplifying both the source abstracts and source sentences
seems promising. Based on human validation of the simplified phrases, it’s seems clear that the meaning
has been preserved while reducing the complexity of words and the length of the sentences.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>Our research has shown LLAMA-2 18B to be an efective model for selecting and simplifying passages
from scientific texts. However, we’ve also highlighted the unreliability of relying on wikipedia for the
provision of definitions in this context.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments References</title>
      <p>
        We’d like to extend our gratitude to the University of Brest for organising the Blended Intensive
Programme (BIP) AI For Humanities. We would also like to thank Liana Ermakova for her teaching of
the course and Caroline L’haridon for her support during our stay in Brest.
.1. Appendix A
d e f main ( ) :
# Read q u e r i e s from JSON f i l e i n t o a d a t a f r a m e
q u e r i e s = pd . r e a d _ j s o n ( ’ / c o n t e n t / d r i v e / MyDrive / B I P / S i m p l e T e x t /
t a s k 1 / t a s k 1 / t o p i c s _ q r e l s / s i m p l e t e x t _ 2 0 2 4 _ t a s k 1 _ q u e r i e s . j s o n
’ )
q u e r i e s = q u e r i e s . h e a d (
        <xref ref-type="bibr" rid="ref5">5</xref>
        )
a l l _ r e s u l t s = [ ]
f o r i n d e x , q u e r y _ r o w i n q u e r i e s . i t e r r o w s ( ) :
q u e r y _ t e x t = q u e r y _ r o w [ ’ q u e r y _ t e x t ’ ]
t o p i c _ i d = q u e r y _ r o w [ ’ t o p i c _ i d ’ ]
q u e r y _ i d = q u e r y _ r o w [ ’ q u e r y _ i d ’ ]
d o c s = q u e r y _ e l a s t i c s e a r c h ( q u e r y _ t e x t )
s c o r e s = c a l c u l a t e _ r e l e v a n c e ( d o c s , q u e r y _ t e x t )
r e s u l t s = f o r m a t _ r e s u l t s ( docs , s c o r e s , t o p i c _ i d , q u e r y _ i d )
a l l _ r e s u l t s . e x t e n d ( r e s u l t s )
# Output r e s u l t s t o a JSON f i l e
with open ( ’ r e s u l t s . j s o n ’ , ’w’ ) a s f :
      </p>
      <p>j s o n . dump ( a l l _ r e s u l t s , f , i n d e n t = 4 )
.2. Appendix B
d e f e x t r a c t _ v a l u e _ i n s i d e _ c u r l y _ b r a c e s ( t e x t ) :
# Use r e g e x t o f i n d t h e v a l u e i n s i d e c u r l y b r a c e s
match = r e . s e a r c h ( r " \ { ( [ ^ { } ] ∗ ) \ } " , t e x t )
i f match :</p>
      <p>
        r e t u r n match . group (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
e l s e :
      </p>
      <p>r e t u r n None
.3. Appendix C
p r o m p t _ d i f f i c u l t y = " " "</p>
      <p>You a r e a r o b o t t h a t r a t e s t h e d i f f i c u l t y o f d i f f e r e n t ter ms .
You p r o v i d e ONE LEVEL o d i f f i c u l t y f o r s c i e n t i f i c term s .
You need t o c o n s i d e r two words a s one term .</p>
      <p>P r o v i d e ONE r a t i n g f o r t h e u n d e r s t a b l i t y d i f f i c u l t y o f term
p r o v i d e d .</p>
      <p>There a r e 3 l e v e l s . You need t o use : e f o r easy , m f o r medium
and d f o r d i f f i c u l t .</p>
      <p>Give t h e r a t i n g i n s i d e o f c u r l y b r a c e s l i k e t h i s { e }
You can r e p l y with ONLY one word .</p>
      <p>Example s o u r c e : autonomous v e h i c l e s
Example answer : { ’m’ }</p>
      <p>Now h e r e i s my s e n t e n c e :
" " "
.4. Appendix D
i m p o r t w i k i p e d i a
d e f g e t _ w i k i p e d i a _ d e f i n i t i o n ( term ) :
t r y :
# F e t c h W i k i p e d i a summary f o r t h e term
summary = w i k i p e d i a . summary ( term )
r e t u r n summary
e x c e p t w i k i p e d i a . e x c e p t i o n s . D i s a m b i g u a t i o n E r r o r a s e :
# I f t h e r e ’ s a d i s a m b i g u a t i o n e r r o r , h a n d l e i t a s needed
r e t u r n " D i s a m b i g u a t i o n E r r o r : Ambiguous term "
e x c e p t w i k i p e d i a . e x c e p t i o n s . P a g e E r r o r a s e :
# I f t h e page doesn ’ t e x i s t , h a n d l e i t a s needed
r e t u r n " P a g e E r r o r : Term not found "
e x c e p t E x c e p t i o n a s e :
# Handle o t h e r e x c e p t i o n s
r e t u r n s t r ( e )
# Assuming t e s t [ ’ d i f f i c u l t y ’ ] c o n t a i n s term s f o r which you want</p>
      <p>W i k i p e d i a d e f i n i t i o n s
# t e s t [ ’ wiki ’ ] = t e s t [ ’ term ’ ] . a p p l y ( g e t _ w i k i p e d i a _ d e f i n i t i o n )
t e s t . l o c [ t e s t [ ’ d i f f i c u l t y ’ ] == ’ d ’ , ’ wiki ’ ] = t e s t . l o c [ t e s t [ ’
d i f f i c u l t y ’ ] == ’ d ’ , ’ term ’ ] . a p p l y ( g e t _ w i k i p e d i a _ d e f i n i t i o n )
t e s t
.5. Appendix E
p r o m p t _ e x p l a n a t i o n = " " "</p>
      <p>You a r e a r o b o t t h a t e x p l a i n s d i f f i c u l t s c i e n t i f i c term s .
DO NOT add i n t r o l i k e " Sure , I ’ d be happy t o h e l p ! "
Use o n l y once s e n t a n c e and wrap t h e s e n t a n c e i n c u r l y b r a c e s .
D o n t j u s t i f y your answers . D o n t g i v e i n f o r m a t i o n not
mentioned i n t h e CONTEXT INFORMATION .</p>
      <p>Example s o u r c e : w i r e l e s s network environment
Example answer : { ’ a system i n which d e v i c e s makes use o f Radio
Frequency c o n n e c t i o n s between nodes i n t h e network a system
i n which d e v i c e s a r e c o n n e c t e d t o a network w i t h o u t t h e need
f o r p h y s i c a l c a b l e s or wires ’ }
Example s o u r c e : B l u e t o o t h w i r e l e s s t e c h n o l o g y
Example answer : { ’ s h o r t − range w i r e l e s s communication t e c h n o l o g y
t h a t a l l o w s d e v i c e s t o c o n n e c t and exchange d a t a . I t
f a c i l i t a t e s d a t a exchange between d e v i c e s l i k e smartphones ,
computers , and p e r i p h e r a l s such a s headphones or m e d i c a l
d e v i c e s . B l u e t o o t h t e c h n o l o g y e l i m i n a t e s t h e need f o r
p h y s i c a l c a b l e s , p r o v i d i n g c o n v e n i e n c e and v e r s a t i l i t y i n
d e v i c e c o n n e c t i v i t y . ’ }
Example s o u r c e : a p p l i c a t i o n
Example answer : { ’ s o f t w a r e program or t o o l d e s i g n e d t o perform
s p e c i f i c t a s k s or f u n c t i o n s on e l e c t r o n i c d e v i c e s . I t can
range from p r o d u c t i v i t y t o o l s and games t o u t i l i t i e s and
communication p l a t f o r m s on e l e c t r o n i c d e v i c e s such a s
computers , smartphones , or t a b l e t s . ’ }
Example s o u r c e : PDA
Example answer : { ’ PDA i s t h e acronym f o r p e r s o n a l d i g i t a l
a s s i s t a n t , which i s a handheld e l e c t r o n i c d e v i c e d e s i g n e d f o r
p e r s o n a l o r g a n i z a t i o n , communication , and i n f o r m a t i o n a c c e s s
. PDAs may i n c l u d e f e a t u r e s such a s c a l e n d a r s , c o n t a c t l i s t s ,
and note − t a k i n g c a p a b i l i t i e s , s e r v i n g a s p o r t a b l e t o o l s f o r
managing d a i l y t a s k s . PDA i s t h e acronym f o r p e r s o n a l d i g i t a l
a s s i s t a n t , which i s a handheld e l e c t r o n i c d e v i c e c r a f t e d f o r
p e r s o n a l o r g a n i z a t i o n , communication , and i n f o r m a t i o n
r e t r i e v a l . PDAs o f t e n i n c o r p o r a t e f e a t u r e s l i k e c a l e n d a r s ,
c o n t a c t l i s t s , and note − t a k i n g c a p a b i l i t i e s , f u n c t i o n i n g a s
p o r t a b l e t o o l s f o r managing d a i l y t a s k s and s t a y i n g c o n n e c t e d
. While modern smartphones have l a r g e l y r e p l a c e d t r a d i t i o n a l
PDAs , t h e c o n c e p t i n f l u e n c e d t h e development o f contemporary
m o b i l e d e v i c e s . ’ }
Example s o u r c e : p i l o t s t u d y
Example answer : { ’ a p r e l i m i n a r y r e s e a r c h i n v e s t i g a t i o n c o n d u c t e d
on a s m a l l s c a l e t o a s s e s s t h e f e a s i b i l i t y , and p o t e n t i a l
c h a l l e n g e s o f a l a r g e r r e s e a r c h p r o j e c t . an i n i t i a l and
s m a l l e r − s c a l e r e s e a r c h i n v e s t i g a t i o n u n d e r t a k e n t o e v a l u a t e
t h e f e a s i b i l i t y , methodology , and p o t e n t i a l o b s t a c l e s o f a
l a r g e r r e s e a r c h p r o j e c t . I t s e r v e s a s a t e s t i n g ground t o
r e f i n e t h e s t u d y d e s i g n , i d e n t i f y l o g i s t i c a l i s s u e s , and
enhance t h e o v e r a l l r o b u s t n e s s and e f f e c t i v e n e s s o f t h e
p l a n n e d f u l l − s c a l e r e s e a r c h e n d e a v o r . ’ }</p>
      <p>Now h e r e i s my ONE s e n t e n c e e x p l a n a t i o n :
" " "
.6. Appendix F
d e f r e m o v e _ r e d u n d a n t _ t e x t ( t e x t ) :
# D e f i n e p a t t e r n s t o s e a r c h f o r
p a t t e r n s = [
r ’ ^ Hey t h e r e ! ’ ,
r ’ ^ Sure ! ’ ,
r ’ ^ As a s c i e n t i f i c j o u r n a l i s t , ’ ,
r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s
f o r you \ . ’ ,
r ’ Here \ ’ s a s i m p l i f i e d v e r s i o n o f t h e t e x t ’ ,
r ’ L e t me b r e a k i t down f o r you : ’ ,
r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s
f o r you \ . ’ ,
r ’ I \ ’m h e r e t o b r e a k down complex s c i e n t i f i c c o n c e p t s i n t o
s i m p l e , easy − to − u n d e r s t a n d l a n g u a g e . ’ ,
r ’ I \ ’m h e r e t o b r e a k down a complex t o p i c i n t o s i m p l e r t e r m s
f o r you . So , l e t \ ’ s t a l k about ’ ,
r ’ Here i s my one s e n t e n c e e x p l a n a t i o n of ’
.7. Appendix G
# Add d e f i n i t i o n and e x p l a n a t i o n i f t h e y a r e not empty
i f row [ " d i f f i c u l t y " ] == " d " :
d e f i n i t i o n = row . g e t ( " d e f i n i t i o n " , None )
e x p l a n a t i o n = row . g e t ( " e x p l a n a t i o n " , None )
i f d e f i n i t i o n :</p>
      <p>j s o n _ o b j [ " d e f i n i t i o n " ] = d e f i n i t i o n
i f e x p l a n a t i o n :</p>
      <p>j s o n _ o b j [ " e x p l a n a t i o n " ] = e x p l a n a t i o n
r e t u r n j s o n _ o b j
.8. Appendix H
# Example usage
d e f s i m p l i f y ( s n t ) :
c = model . c r e a t e _ c h a t _ c o m p l e t i o n (
messages =[
{ " r o l e " : " system " , " c o n t e n t " : " You a r e a s c i e n t i f i c</p>
      <p>j o u r n a l i s t who p o p u l a r i z e s s c i e n t i f i c r e s u l t s . " } ,
{ " r o l e " : " u s e r " , " c o n t e n t " : " S i m p l i f y t h e f o l l o w i n g t e x t
: \ n " + s n t }
)
r e t u r n c [ ’ c h o i c e s ’ ] [ 0 ] [ ’ message ’ ] [ ’ c o n t e n t ’ ] . s t r i p ( )
d e f s i m p l i f y ( s n t ) :
c=model . c r e a t e _ c h a t _ c o m p l e t i o n (
messages = [
{ " r o l e " : " system " , " c o n t e n t " : " You a r e a s c i e n t i f i c
j o u r n a l i s t who p o p u l a r i z e s s c i e n t i f i c r e s u l t s . " } ,
" r o l e " : " u s e r " ,
" c o n t e n t " : " S i m p l i f y t h e f o l l o w i n g t e x t : \ n "+ s n t
)
r e t u r n c [ ’ c h o i c e s ’ ] [ 0 ] [ ’ message ’ ] [ ’ c o n t e n t ’ ] . s t r i p ( )
s i m p l i f y ( " With t h e e v e r i n c r e a s i n g number o f unmanned a e r i a l
v e h i c l e s g e t t i n g i n v o l v e d i n a c t i v i t i e s i n t h e c i v i l i a n and
commercial domain , t h e r e i s an i n c r e a s e d need f o r autonomy i n
t h e s e s y s t e m s t o o . " )
.9. Appendix I
d e f r e m o v e _ r e d u n d a n t _ t e x t ( t e x t ) :
# D e f i n e p a t t e r n s t o s e a r c h f o r
p a t t e r n s = [
r ’ ^ Hey t h e r e ! ’ ,
r ’ ^ Sure ! ’ ,
r ’ ^ As a s c i e n t i f i c j o u r n a l i s t , ’ ,
r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t erms
f o r you \ . ’ ,
r ’ Here \ ’ s a s i m p l i f i e d v e r s i o n o f t h e t e x t ’ ,
r ’ L e t me b r e a k i t down f o r you : ’ ,
r ’ I \ ’m h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t erms
f o r you \ . ’ ,
r ’ I \ ’m h e r e t o b r e a k down complex s c i e n t i f i c c o n c e p t s i n t o
s i m pl e , easy − to − u n d e r s t a n d l a n g u a g e . ’ ,
r ’ I \ ’m h e r e t o b r e a k down a complex t o p i c i n t o s i m p l e r t erms
f o r you . So , l e t \ ’ s t a l k about ’ ,
r ’ Sure , I \ ’ d be happy t o h e l p ! ’ ,
r ’ Here \ ’ s a s i m p l i f i e d e x p l a n a t i o n of ’ ,
r ’ I n o t h e r words , ’ ,
r ’ I n s i m p l e terms , ’
]
# Compile r e g u l a r e x p r e s s i o n s
r e g e x _ p a t t e r n s = [ r e . c o m p i l e ( p a t t e r n ) f o r p a t t e r n i n p a t t e r n s ]
# Remove p a t t e r n s from t e x t
f o r p a t t e r n i n r e g e x _ p a t t e r n s :</p>
      <p>t e x t = r e . sub ( p a t t e r n , ’ ’ , t e x t ) . s t r i p ( )</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , et al.,
          <source>Overview of CLEF</source>
          <year>2024</year>
          <article-title>SimpleText track on improving access to scientific texts</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
          </string-name>
          , et al. (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>SanJuan</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 1: Retrieve passages to include in a simplified summary</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>G. M. D. Nunzio</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 2: Identify and explain dificult concepts</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>J. D'Souza</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2024 SimpleText task 4: Track the state-of-the-art in scholarly publications</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2024</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>