<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">CLEF 2024 SimpleText Tasks 1-3: Use of Llama-2 for Text Simplification ⋆ Notebook for the SimpleText Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rowan</forename><surname>Mann</surname></persName>
							<email>rowanmann93@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Christian</orgName>
								<orgName type="institution">Albrechts-Universität zu Kiel (CAU)</orgName>
								<address>
									<addrLine>Christian-Albrechts-Platz 4</addrLine>
									<postCode>24118</postCode>
									<settlement>Kiel</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tomislav</forename><surname>Mikulandric</surname></persName>
							<email>tomislav.mikulandric@gmail.com</email>
							<affiliation key="aff1">
								<orgName type="institution">The University of Split</orgName>
								<address>
									<addrLine>Ul. Ruđera Boškovića 31</addrLine>
									<postCode>21000</postCode>
									<settlement>Split</settlement>
									<country key="HR">Croatia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">CLEF 2024 SimpleText Tasks 1-3: Use of Llama-2 for Text Simplification ⋆ Notebook for the SimpleText Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">97DF434D08A24F9E691E6AD6039F657B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>LLMs</term>
					<term>text simplification</term>
					<term>LLaMA-2</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In an era defined by the vast availability of information, the challenge of discerning reliable information is more pressing than ever. Our paper presents findings from Tasks 1, 2, and 3 of the SimpleText track at the 15th Conference and Labs of the Evaluation Forum (CLEF) 2024, aimed at advancing research in automatic simplification of scientific texts using LLaMA-2.</p><p>Task 1 involved selecting relevant passages for simplified summaries, leveraging ElasticSearch and TF-IDF with cosine similarity for evaluating relevance. We achieved an average Flesch-Kincaid grade level of 0.6, indicating a moderate complexity suitable for further simplification.</p><p>Task 2 focused on identifying and explaining difficult concepts. Using the LLaMA-2 13B model, we extracted and rated the difficulty of scientific terms, generating explanations for the most challenging ones. However, reliance on Wikipedia for definitions proved inconsistent, highlighting a limitation in our methodology.</p><p>Task 3 addressed the simplification of scientific abstracts and sentences. We utilized LLaMA-2 to generate simplified versions, effectively maintaining the original meaning while reducing complexity and length. Human validation confirmed the preservation of essential content in the simplified texts.</p><p>Our research demonstrates the efficacy of LLaMA-2 for text simplification tasks, albeit with noted challenges in obtaining reliable definitions from external sources like Wikipedia. These findings contribute to the broader goal of enhancing scientific literacy through accessible information.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>We live in an era characterised by an abundance of information available to all, almost instantaneously. However, far from creating a world defined by truth and understanding, our era seems to be accurately defined by misinformation and polarity. Fake news and algorithmically determined "echo-chambers" have helped spread conspiracy and division across the world, with consequences that reverberate far beyond their origins in cyberspace.</p><p>For the average person, it's more difficult than ever to know what information to believe. We all need to be able to understand our world, so scientific literacy is a more important skill than ever.</p><p>This paper presents the results of our analysis of Tasks 1, 2 and 3 of the SimpleText track as part of the 15th Conference and Labs of the Evaluation Forum 2024. The main goal of SimpleText is to advance research in the area of automatic simplification of scientific texts <ref type="bibr" target="#b0">[1]</ref>.</p><p>The paper deals with:</p><p>• Task 1: What is in (or out)? Selecting passages to include in a simplified summary.</p><p>• Task 2: What is unclear? Difficult concept identification and explanation • Task 3: Rewrite this! Given a query, simplify passages from scientific abstracts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Task 1: Experimental Setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data Description</head><p>The data provided to us by CLEF consisted of 2 folders, "corpus" and "topics qrels". The corpus includes a new vector database with sentence embedding scores and retains the previously released ElasticSearch Index for field-specific document searches. The ElasticSearch Index allows querying fields such as id, abstract, authors, title, year, and doi from the DBLP dump and is suitable for various applications like passage retrieval, Latent Dirichlet Allocation models, and training Graph Neural Networks. The Vector Database stores each article's id and sentence-embedding vectors from their title and abstract, excluding articles with empty or very short abstracts, supporting longer queries enabled by sentence embedding. The SimpleText 2024 Task 1 Corpus includes topics defined by articles from The Guardian's tech section (G01 to G20) and Tech Xplore (T01 to T20), with URLs and textual content provided for participant use. Queries associated with each topic, manually verified for relevance, enable retrieval of relevant DBLP passages. This edition introduces new queries for The Guardian articles, generated by ChatGPT 4.0, focusing on specific sub-topics and provided in CSV and JSON formats. The Simpletext 2024 task1 train.qrels file offers quality relevance judgments on a 0-2 scale for abstracts, incorporating data from previous editions and new judgments for topics G01-G15, excluding articles with nearly empty abstracts to ensure consistency with the new vector database.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Method</head><p>We created an ElasticSearch function "query elasticsearch" to query the ElasticSearch database. It took two parameters: query (the search query) and size (the number of results to return, defaulting to 100). The function sent a GET request to the ElasticSearch URL with the specified query and size and returned the search results in JSON format. # F u n c t i o n t o q u e r y t h e E l a s t i c S e a r c h d e f q u e r y _ e l a s t i c s e a r c h ( query , s i z e = 1 0 0 ) : r e s p o n s e = r e q u e s t s . g e t ( f " { ES_URL } ? q = { q u e r y }&amp; s i z e = { s i z e } " , a u t h = ( ' i n e x ' , ' q a t c 2 0 1 We created a function to calculate how relevant the abstracts retrieved were to our search. The function created a Text Frequency Inverse Document Frequency (TF IDF), for vectorizing the texts, which assessed relevancy of words with regards to our corpus, then calculated the cosine similarity of our vectorised words. This function could then return a relevance score (rel score)</p><p>To create the combined score, we calculated word difficulty based on the Flesch Kincaid grade level. The Flesch-Kincaid grade level is one of the formulas used for assessing reading-ease, scores indicate the grade a person would have to be in US education system to understand the text. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Task 1: Experimental Results</head><p>We analysed our success by using elastic search to select passages and calculated scores using FKGL and normalisation. The mean of these scores was close to 0.6 which meant that the texts were more complex than everyday speech and appropriate to be used for the next tasks. The test dataset for "Task 2: Identifying and Explaining Difficult Concepts" in the SimpleText Lab includes several tab-separated files. The documents.tsv file contains 501 rows across 55 documents with columns for document ID, sentence ID, and sentence text. The terms.tsv and definitions explanations.tsv files, available after the evaluation phase, provide annotated sentence IDs, extracted terms, difficulty levels (easy, medium, difficult), user-provided definitions, and explanations. Finally, the definitions generated.tsv file contains 3,816 rows with unique definition IDs and the corresponding definitions to be ranked.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Method</head><p>We created a prompt for LLAMA-2 13B model that asked the LLM to iterate over each of our source sentences and extract three scientific terms from the phrase. The terms were then sorted into three rows, with duplicates removed, one term per row and we prompted Llama to give us a difficulty rating of easy, medium, or difficult for our terms. (Appendix C)</p><p>We used wikipedia to return definitions for the difficult terms, with limited success. (Appendix D)</p><p>We also asked the LLM to provide an explanation. When creating our prompt for our LLM, we gave it a few examples of correct return phrases, that were taken from the document provided. This was to improve the ability to achieve "few-shot" results. (Appendix E)</p><p>We then created a function to remove unnecessary text. (Appendix F) Finally, we compiled our results in a JSON file, with those terms considered "d" for difficult, generating definitions. (Appendix G)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Task 2: Experimental Results</head><p>The LLM was successful in generating definitions for our difficult terms, but an issue we encountered was that Wikipedia was unsuccessful in generating definitions for our terms. Therefore, this certainly harms the appropriateness of this method as many of our definitions are missing. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Task 3: Experimental Setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Method</head><p>We used LLAMA-2 13B once more, creating a larger context window of 4096. We gave the sentences to the LLM, asking it to simplify the texts. Again, we instructed the LLM to remove fluff words like "Sure!" etc. This gave us an additional column for our simplified sentences, simplified snt. (Appendix H) Once again, it was important to remove unnecessary text therefore we created a function to carry out this task. (Appendix I)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Task 3: Experimental Results</head><p>Our results from the LLAMA 13B model for simplifying both the source abstracts and source sentences seems promising. Based on human validation of the simplified phrases, it's seems clear that the meaning has been preserved while reducing the complexity of words and the length of the sentences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusion</head><p>Our research has shown LLAMA-2 18B to be an effective model for selecting and simplifying passages from scientific texts. However, we've also highlighted the unreliability of relying on wikipedia for the provision of definitions in this context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>.1. Appendix A d e f main ( ) :</head><p># Read q u e r i e s from JSON f i l e i n t o a d a t a f r a m e q u e r i e s = pd . r e a d _ j s o n ( ' / c o n t e n t / d r i v e / MyDrive / BIP / S i m p l e T e x t / t a s k 1 / t a s k 1 / t o p i c s _ q r e l s / s i m p l e t e x t _ 2 0 2 4 _ t a s k 1 _ q u e r i e s . j s o n ' ) q u e r i e s = q u e r i e s . head <ref type="bibr" target="#b4">( 5 )</ref> a l l _ r e s u l t s = [ ] f o r i n d e x , query_row i n q u e r i e s . i t e r r o w s ( ) :</p><p>q u e r y _ t e x t = query_row [ ' q u e r y _ t e x t ' ] t o p i c _ i d = query_row [ ' t o p i c _ i d ' ] q u e r y _ i d = query_row [ ' q u e r y _ i d ' ] d o c s = q u e r y _ e l a s t i c s e a r c h ( q u e r y _ t e x t ) s c o r e s = c a l c u l a t e _ r e l e v a n c e ( docs , q u e r y _ t e x t ) r e s u l t s = f o r m a t _ r e s u l t s ( docs , s c o r e s , t o p i c _ i d , q u e r y _ i d ) a l l _ r e s u l t s . e x t e n d ( r e s u l t s ) # Output r e s u l t s t o a JSON f i l e w i t h open ( ' r e s u l t s . j s o n ' , 'w ' ) a s f : j s o n . dump ( a l l _ r e s u l t s , f , i n d e n t = 4 )</p><p>. on a s m a l l s c a l e t o a s s e s s t h e f e a s i b i l i t y , and p o t e n t i a l c h a l l e n g e s o f a l a r g e r r e s e a r c h p r o j e c t . an i n i t i a l and s m a l l e r − s c a l e r e s e a r c h i n v e s t i g a t i o n u n d e r t a k e n t o e v a l u a t e t h e f e a s i b i l i t y , methodology , and p o t e n t i a l o b s t a c l e s o f a l a r g e r r e s e a r c h p r o j e c t . I t s e r v e s a s a t e s t i n g ground t o r e f i n e t h e s t u d y d e s i g n , i d e n t i f y l o g i s t i c a l i s s u e s , and { " r o l e " : " s y s t e m " , " c o n t e n t " : " You a r e a s c i e n t i f i c j o u r n a l i s t who p o p u l a r i z e s s c i e n t i f i c r e s u l t s . " } , { " r o l e " : " u s e r " , " c o n t e n t " : " S i m p l i f y t h e f o l l o w i n g t e x t : \ n " + s n # Remove p a t t e r n s from t e x t f o r p a t t e r n i n r e g e x _ p a t t e r n s : t e x t = r e . sub ( p a t t e r n , ' ' , t e x t ) . s t r i p ( ) r e t u r n t e x t</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Appendix B</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>d e f f l e s c h _ k i n c a i d _ g r a d e _ l e v e l ( t e x t ) : # C o n s t a n t s f o r t h e f o r m u l a ASL = a v e r a g e _ s e n t e n c e _ l e n g t h ( t e x t ) ASW = a v e r a g e _ s y l l a b l e s _ p e r _ w o r d ( t e x t ) # C a l c u l a t i n g t h e s c o r e s c o r e = 0 . 3 9 * ASL + 1 1 . 8 * ASW − 1 5 . 5 9 # N o r m a l i z e s c o r e t o r a n g e from 0 t o 1 n o r m a l i z e d _ s c o r e = n o r m a l i z e ( s c o r e , m i n _ s c o r e = 0 , m a x _ s c o r e = 2 5 ) # A d j u s t m a x _ s c o r e a s n e e d e d r e t u r n n o r m a l i z e d _ s c o r e</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>p r o m p t _ t e r m s = " " " You a r e a r o b o t t h a t ONLY o u t p u t s JSON . You r e p l y i n JSON f o r m a t w i t h t h e f i e l d ' terms ' . You p r o v i d e ONLY s e m i c o l o n − s e p a r a t e d l i s t o f MAXIMUM 3 s c i e n t i f i c t e r m s o f a s o u r c e s e n t e n c e ONLY . You DO NOT add ' Sure , Here a r e t h e s c i e n t i f i c t e r m s o f your s e n t e n c e : ' . Example s o u r c e s e n t e n c e : I n t h e modern e r a o f a u t o m a t i o n and r o b o t i c s , \ autonomous v e h i c l e s a r e c u r r e n t l y t h e f o c u s o f a c a d e m i c and i n d u s t r i a l r e s e a r c h . ? \ Example answer : { ' terms ' : ' r o b o t i c s ; autonomous v e h i c l e s ' } Now h e r e i s my s e n t e n c e : " " " We used Regex to help us deal with regular expressions, removing unnecessary content in the outputs. (Appendix B)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>d e f e</head><label></label><figDesc>x t r a c t _ v a l u e _ i n s i d e _ c u r l y _ b r a c e s ( t e x t ) : # Use r e g e x t o f i n d t h e v a l u e i n s i d e c u r l y b r a c e s match = r e . s e a r c h ( r " \ { ( [ ^{ } ] * ) \ } " , t e x t ) i f match : r e t u r n match . group ( 1 ) e l s e : r e t u r n None .3. Appendix C p r o m p t _ d i f f i c u l t y = " " " You a r e a r o b o t t h a t r a t e s t h e d i f f i c u l t y o f d i f f e r e n t t e r m s . You p r o v i d e ONE LEVEL o d i f f i c u l t y f o r s c i e n t i f i c t e r m s . You need t o c o n s i d e r two words a s one term . P r o v i d e ONE r a t i n g f o r t h e u n d e r s t a b l i t y d i f f i c u l t y o f term p r o v i d e d . There a r e 3 l e v e l s . You need t o u s e : e f o r easy , m f o r medium and d f o r d i f f i c u l t . Give t h e r a t i n g i n s i d e o f c u r l y b r a c e s l i k e t h i s { e } You can r e p l y w i t h ONLY one word . Example s o u r c e : autonomous v e h i c l e s Example answer : { ' m' } Now h e r e i s my s e n t e n c e : " " " .4. Appendix D i m p o r t w i k i p e d i a d e f g e t _ w i k i p e d i a _ d e f i n i t i o n ( term ) : t r y : # F e t c h W i k i p e d i a summary f o r t h e term summary = w i k i p e d i a . summary ( term ) r e t u r n summary e x c e p t w i k i p e d i a . e x c e p t i o n s . D i s a m b i g u a t i o n E r r o r a s e : # I f t h e r e ' s a d i s a m b i g u a t i o n e r r o r , h a n d l e i t a s n e e d e d r e t u r n " D i s a m b i g u a t i o n E r r o r : Ambiguous term " e x c e p t w i k i p e d i a . e x c e p t i o n s . P a g e E r r o r a s e : # I f t h e page doesn ' t e x i s t , h a n d l e i t a s n e e d e d r e t u r n " P a g e E r r o r : Term n o t f o u n d " e x c e p t E x c e p t i o n a s e : # Handle o t h e r e x c e p t i o n s r e t u r n s t r ( e ) # Assuming t e s t [ ' d i f f i c u l t y ' ] c o n t a i n s t e r m s f o r which you want W i k i p e d i a d e f i n i t i o n s # t e s t [ ' wi k i ' ] = t e s t [ ' term ' ] . a p p l y ( g e t _ w i k i p e d i a _ d e f i n i t i o n ) t e s t . l o c [ t e s t [ ' d i f f i c u l t y ' ] == ' d ' , ' w i ki ' ] = t e s t . l o c [ t e s t [ ' d i f f i c u l t y ' ] == ' d ' , ' term ' ] . a p p l y ( g e t _ w i k i p e d i a _ d e f i n i t i o n ) t e s t .5. Appendix E p r o m p t _ e x p l a n a t i o n = " " " You a r e a r o b o t t h a t e x p l a i n s d i f f i c u l t s c i e n t i f i c t e r m s . DO NOT add i n t r o l i k e " Sure , I ' d be happy t o h e l p ! " Use o n l y once s e n t a n c e and wrap t h e s e n t a n c e i n c u r l y b r a c e s . D o n t j u s t i f y your a n s w e r s . D o n t g i v e i n f o r m a t i o n n o t m e n t i o n e d i n t h e CONTEXT INFORMATION . Example s o u r c e : w i r e l e s s network e n v i r o n m e n t Example answer : { ' a s y s t e m i n which d e v i c e s makes u s e o f R a d i o F r e q u e n c y c o n n ec t i o n s between n o d e s i n t h e network a s y s t e m i n which d e v i c e s a r e c o n n e c t e d t o a network w i t h o u t t h e need f o r p h y s i c a l c a b l e s o r w i r e s ' } Example s o u r c e : B l u e t o o t h w i r e l e s s t e c h n o l o g y Example answer : { ' s h o r t − r a n g e w i r e l e s s c o m m u n i c a t i o n t e c h n o l o g y t h a t a l l o w s d e v i c e s t o c o n n e c t and e x c h a n g e d a t a . I t f a c i l i t a t e s d a t a e x c h a n g e between d e v i c e s l i k e s m a r t p h o n e s , co m p u t e r s , and p e r i p h e r a l s s u c h a s h e a d p h o n e s o r m e d i c a l d e v i c e s . B l u e t o o t h t e c h n o l o g y e l i m i n a t e s t h e need f o r p h y s i c a l c a b l e s , p r o v i d i n g c o n v e n i e n c e and v e r s a t i l i t y i n d e v i c e c o n n e c t i v i t y . ' }Example s o u r c e : a p p l i c a t i o n Example answer : { ' s o f t w a r e program o r t o o l d e s i g n e d t o p e r f o r m s p e c i f i c t a s k s o r f u n c t i o n s on e l e c t r o n i c d e v i c e s . I t can r a n g e from p r o d u c t i v i t y t o o l s and games t o u t i l i t i e s and c o m m u n i c a t i o n p l a t f o r m s on e l e c t r o n i c d e v i c e s s u c h a s co m p u t e r s , s m a r t p h o n e s , o r t a b l e t s . ' } Example s o u r c e : PDA Example answer : { ' PDA i s t h e acronym f o r p e r s o n a l d i g i t a l a s s i s t a n t , which i s a h a n d h e l d e l e c t r o n i c d e v i c e d e s i g n e d f o r p e r s o n a l o r g a n i z a t i o n , communication , and i n f o r m a t i o n a c c e s s . PDAs may i n c l u d e f e a t u r e s s u c h a s c a l e n d a r s , c o n t a c t l i s t s , and note − t a k i n g c a p a b i l i t i e s , s e r v i n g a s p o r t a b l e t o o l s f o r managing d a i l y t a s k s . PDA i s t h e acronym f o r p e r s o n a l d i g i t a l a s s i s t a n t , which i s a h a n d h e l d e l e c t r o n i c d e v i c e c r a f t e d f o r p e r s o n a l o r g a n i z a t i o n , communication , and i n f o r m a t i o n r e t r i e v a l . PDAs o f t e n i n c o r p o r a t e f e a t u r e s l i k e c a l e n d a r s , c o n t a c t l i s t s , and note − t a k i n g c a p a b i l i t i e s , f u n c t i o n i n g a s p o r t a b l e t o o l s f o r managing d a i l y t a s k s and s t a y i n g c o n n e c t e d . While modern s m a r t p h o n e s have l a r g e l y r e p l a c e d t r a d i t i o n a l PDAs , t h e c o n c e p t i n f l u e n c e d t h e d e v e l o p m e n t o f c o n t e m p o r a r y m o b i l e d e v i c e s . ' } Example s o u r c e : p i l o t s t u d y Example answer : { ' a p r e l i m i n a r y r e s e a r c h i n v e s t i g a t i o n c o n d u c t e d</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>en h a nc e t h e o v e r a l l r o b u s t n e s s and e f f e c t i v e n e s s o f t h e p l a n n e d f u l l − s c a l e r e s e a r c h e n d e a v o r . ' } Now h e r e i s my ONE s e n t e n c e e x p l a n a t i o n : " " " .6. Appendix F d e f r e m o v e _ r e d u n d a n t _ t e x t ( t e x t ) : # D e f i n e p a t t e r n s t o s e a r c h f o r p a t t e r n s = [ r ' ^Hey t h e r e ! ' , r ' ^S u r e ! ' , r ' ^As a s c i e n t i f i c j o u r n a l i s t , ' , r ' I \ 'm h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s f o r you \ . ' , r ' Here \ ' s a s i m p l i f i e d v e r s i o n o f t h e t e x t ' , r ' L e t me b r e a k i t down f o r you : ' , r ' I \ 'm h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s f o r you \ . ' , r ' I \ 'm h e r e t o b r e a k down complex s c i e n t i f i c c o n c e p t s i n t o s i m p l e , easy − to − u n d e r s t a n d l a n g u a g e . ' , r ' I \ 'm h e r e t o b r e a k down a complex t o p i c i n t o s i m p l e r t e r m s f o r you . So , l e t \ ' s t a l k ab o u t ' , r ' Here i s my one s e n t e n c e e x p l a n a t i o n of ' ] .7. Appendix G # Add d e f i n i t i o n and e x p l a n a t i o n i f t h e y a r e n o t empty i f row [ " d i f f i c u l t y " ] == " d " : d e f i n i t i o n = row . g e t ( " d e f i n i t i o n " , None ) e x p l a n a t i o n = row . g e t ( " e x p l a n a t i o n " , None ) i f d e f i n i t i o n : j s o n _ o b j [ " d e f i n i t i o n " ] = d e f i n i t i o n i f e x p l a n a t i o n : j s o n _ o b j [ " e x p l a n a t i o n " ] = e x p l a n a t i o n r e t u r n j s o n _ o b j .8. Appendix H # Example u s a g e d e f s i m p l i f y ( s n t ) : c = model . c r e a t e _ c h a t _ c o m p l e t i o n ( m e s s a g e s =[</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>t } ] ) r e t u r n c [ ' c h o i c e s ' ] [ 0 ] [ ' message ' ] [ ' c o n t e n t ' ] . s t r i p ( ) d e f s i m p l i f y ( s n t ) : c = model . c r e a t e _ c h a t _ c o m p l e t i o n ( m e s s a g e s = [{ " r o l e " : " s y s t e m " , " c o n t e n t " : " You a r e a s c i e n t i f i c j o u r n a l i s t who p o p u l a r i z e s s c i e n t i f i c r e s u l t s . " } , { " r o l e " : " u s e r " , " c o n t e n t " : " S i m p l i f y t h e f o l l o w i n g t e x t : \ n " + s n t } ] ) r e tu r n c [ ' c h o i c e s ' ] [ 0 ] [ ' message ' ] [ ' c o n t e n t ' ] . s t r i p ( ) s i m p l i f y ( " With t h e e v e r i n c r e a s i n g number o f unmanned a e r i a l v e h i c l e s g e t t i n g i n v o l v e d i n a c t i v i t i e s i n t h e c i vi l i a n and c o m m e r c i a l domain , t h e r e i s an i n c r e a s e d need f o r autonomy i n t h e s e s y s t e m s t o o . " ) .9. Appendix I d e f r e m o v e _ r e d u n d a n t _ t e x t ( t e x t ) : # D e f i n e p a t t e r n s t o s e a r c h f o r p a t t e r n s = [ r ' ^Hey t h e r e ! ' , r ' ^S u r e ! ' , r ' ^As a s c i e n t i f i c j o u r n a l i s t , ' , r ' I \ 'm h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s f o r you \ . ' , r ' Here \ ' s a s i m p l i f i e d v e r s i o n o f t h e t e x t ' , r ' L e t me b r e a k i t down f o r you : ' , r ' I \ 'm h e r e t o b r e a k down a complex s t u d y i n t o s i m p l e t e r m s f o r you \ . ' , r ' I \ 'm h e r e t o b r e a k down complex s c i e n t i f i c c o n c e p t s i n t o s i m p l e , easy − to − u n d e r s t a n d l a n g u a g e . ' , r ' I \ 'm h e r e t o b r e a k down a complex t o p i c i n t o s i m p l e r t e r m s f o r you . So , l e t \ ' s t a l k ab o u t ' , r ' Sure , I \ ' d be happy t o h e l p ! ' , r ' Here \ ' s a s i m p l i f i e d e x p l a n a t i o n of ' , r ' I n o t h e r words , ' , r ' I n s i m p l e terms , ' ] # Compile r e g u l a r e x p r e s s i o n s r e g e x _ p a t t e r n s = [ r e . c o m p i l e ( p a t t e r n ) f o r p a t t e r n i n p a t t e r n s ]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell>Official results for Task 1</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="6">MMR Precision 10 Precision 20 NDCG 10 NDCG 20 Bpref</cell><cell>MAP</cell></row><row><cell>T1 1 0.217</cell><cell>0,0233</cell><cell>0,0150</cell><cell>0,0121</cell><cell>0,0106</cell><cell cols="2">0,0062 0,0025</cell></row><row><cell>T1 2 0,5444</cell><cell>0,3733</cell><cell>0,2750</cell><cell>0,2443</cell><cell>0,2183</cell><cell cols="2">0,0963 0,0601</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>4. Task 2: Experimental Setup 4.1. Data Description The</head><label></label><figDesc></figDesc><table /><note>dataset for "Task 2: Identifying and Explaining Difficult Concepts" in the SimpleText Lab is divided into training and validation folders, each containing several tab-separated files. The training folder includes documents.tsv (576 rows, 115 documents), documents users.tsv (145 rows, document and expert IDs), terms.tsv (1,910 rows, terms, difficulty, expert ID), definitions explanations.tsv (1,046 rows, definitions, explanations, expert ID), and definitions generated.tsv (589 rows, automatically generated definitions). The validation folder contains definitions explanations.tsv (960 rows, definitions without explanations), definitions generated.tsv (932 rows, automatically generated definitions), and terms.tsv (680 rows, terms, difficulty). Initial annotations were performed by multiple experts, with a second round of validation by an external expert to identify additional terms and definitions. The dataset will later include test files for the evaluation phase.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 Official</head><label>2</label><figDesc></figDesc><table><row><cell>results for task 2</cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="2">Recall overall Recall avg</cell></row><row><cell>Task 2.2</cell><cell>0,0069</cell><cell>0,0040</cell></row><row><cell>Task 2.2 1</cell><cell>0,0083</cell><cell>0,0084</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We'd like to extend our gratitude to the University of Brest for organising the Blended Intensive Programme (BIP) AI For Humanities. We would also like to thank Liana Ermakova for her teaching of the course and Caroline L'haridon for her support during our stay in Brest.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of CLEF 2024 SimpleText track on improving access to scientific texts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the CLEF 2024 SimpleText task 1: Retrieve passages to include in a simplified summary</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sanjuan</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of the CLEF 2024 SimpleText task 2: Identify and explain difficult concepts</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of the CLEF 2024 SimpleText task 3: Simplify scientific text</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of the CLEF 2024 SimpleText task 4: Track the state-of-the-art in scholarly publications</title>
		<author>
			<persName><forename type="first">J</forename><surname>Souza</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
