<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">AIIR Lab Systems for CLEF 2024 SimpleText: Large Language Models for Text Simplification Notebook for the SimpleText Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nicholas</forename><surname>Largey</surname></persName>
							<email>nicholas.largey@maine.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Southern Maine</orgName>
								<address>
									<settlement>Portland</settlement>
									<region>Maine</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Reihaneh</forename><surname>Maarefdoust</surname></persName>
							<email>reihaneh.maarefdoust@maine.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Southern Maine</orgName>
								<address>
									<settlement>Portland</settlement>
									<region>Maine</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Shea</forename><surname>Durgin</surname></persName>
							<email>shea.durgin@maine.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Southern Maine</orgName>
								<address>
									<settlement>Portland</settlement>
									<region>Maine</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Behrooz</forename><surname>Mansouri</surname></persName>
							<email>behrooz.mansouri@maine.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Southern Maine</orgName>
								<address>
									<settlement>Portland</settlement>
									<region>Maine</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">AIIR Lab Systems for CLEF 2024 SimpleText: Large Language Models for Text Simplification Notebook for the SimpleText Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B70C8FBE92E8EE53A96E84B063D40598</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Scientific Text Simplification</term>
					<term>Definition Extraction</term>
					<term>Large Language Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the participation of the Artificial Intelligence and Information Retrieval (AIIR) Lab from the University of Southern Maine in the CLEF 2024 SimpleText Lab. SimpleText has three main Tasks. Five systems are proposed for the first Task, which involves retrieving passages to include in a simplified summary. These systems select candidates using TF-IDF with expanded queries via LLaMA3. The re-ranking is performed using a bi-encoder, a cross-encoder, and LLaMA3. In Task 2, which involves identifying and explaining difficult concepts, three models utilizing LLaMA3 and Mistral are employed. Finally, for Task 3, which focuses on simplifying scientific text, four systems are introduced. Similar to Task 2, LLaMA3 and Mistral are used with different prompting and fine-tuning approaches. The experimental results show the proposed systems in Task 1 are the most effective, and for Tasks 2 and 3 are comparable with other systems proposed in the SimpleText lab.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The CLEF 2024 SimpleText lab <ref type="bibr" target="#b0">[1]</ref> is dedicated to enhancing accessibility to scientific information for all users, encompassing both information retrieval and natural language processing aspects. Unlike traditional text simplification methods, which often target lower literacy levels by making general texts more accessible to younger readers, Scientific Text Simplification has a distinct focus.</p><p>The Artificial Intelligence and Information Retrieval (AIIR) lab from the University of Southern Maine (USA) participated in three Tasks of the CLEF 2024 SimpleText lab. With advances in large language models (LLMs), our team considered using them for different Tasks, mainly focusing on two models: LLaMA3 1 and Mistral <ref type="bibr" target="#b1">[2]</ref>.</p><p>SimpleText Task 1 <ref type="bibr" target="#b2">[3]</ref>, Retrieving Passages to Include in a Simplified Summary, aims to retrieve passages from a vast collection of academic abstracts and bibliographic metadata that can aid in understanding this article. These relevant passages should pertain to any of the topics covered in the article. For Task 1, we submitted five runs using different techniques, ranging from cross and bi-encoders to large language models (LLMs) for query expansion and re-ranking.</p><p>In Task 2 <ref type="bibr" target="#b3">[4]</ref>, Identifying and Explaining Difficult Concepts, the goal is to identify which concepts in scientific abstracts need explanation and contextualization to assist readers in understanding the scientific text. Our team participated in two Subtasks: 2.1) retrieve up to 5 difficult terms in a given passage from a scientific abstract, and 2.2) provide an explanation of these difficult terms. 
For these Subtasks, in addition to LLaMA3 and Mistral, we used a fine-tuned LLaMA3 model.</p><p>Finally, Task 3 <ref type="bibr" target="#b4">[5]</ref>, Simplify Scientific Text, tackles the problem of creating simplified versions of sentences taken from scientific abstracts. The input for the systems is popular science articles, queries, and corresponding scientific paper abstracts, all divided into individual sentences. For this Task, we again used a fine-tuned LLaMA3 model and Mistral as our proposed approaches.</p><p>The reported results show that our proposed systems achieve high effectiveness across all three Tasks. For Task 1 and Subtask 3.2, our proposed models were the most effective, while for Task 2 and Subtask 3.1 they are comparable to the leading systems. In the next sections, we describe our systems for each Task, followed by evaluation results and analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Task 1: Retrieving Passages to Include in a Simplified Summary</head><p>This section first describes the data for Task 1 <ref type="bibr" target="#b2">[3]</ref>. Then we describe our five proposed systems. Finally, we will provide the results and analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Topic and Collection</head><p>As described by the organizers, the topics for this Task are from two resources: 1) the tech section of The Guardian<ref type="foot" target="#foot_0">2</ref> newspaper (topics G01 to G20), and 2) Tech Xplore<ref type="foot" target="#foot_1">3</ref> website (topics T01 to T20). Each topic represents a query selected from one of these resources. For instance, for the topic 'G13.1', the query is "digital marketing", with its context being an article titled "Baffled by digital marketing? Find your way out of the maze", from The Guardian. Participants have access to the whole article, its title, and the query.</p><p>The main corpus consists of a large set of scientific abstracts and associated metadata in the field of computer science and engineering. The 12th version of the Citation Network Dataset <ref type="bibr" target="#b5">[6]</ref>, released in 2020, provides this data extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. It contains 4,894,083 bibliographic references published before 2020, 4,232,520 English abstracts, 3,058,315 authors with affiliations, and 45,565,790 ACM citations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Proposed Models</head><p>AIIR Lab submitted five runs, of which three participated in the pooling and assessment process. Here, we explain each of our proposed approaches:</p><p>• Query Expansion with LLaMA3, Search with Bi-Encoder / Cross-Encoder. (LLaMA Bi-Encoder/CrossEncoder): For Task 1, input queries are short keyword terms (e.g., "drones", "advertising", "gene editing") selected from technical articles. To contextualize and potentially expand these queries, we consider their related articles and leverage LLaMA3<ref type="foot" target="#foot_2">4</ref> for query reformulation/expansion. Following the approach proposed by Anand et al. <ref type="bibr" target="#b6">[7]</ref>, we provide the query and the article to the model, and use the system prompt for query rewriting/expansion shown in Table <ref type="table">9</ref> in the Appendix.</p><p>Using our system prompt, we then pass the query, the related article title, and context to LLaMA3 and expand the initial query. Table <ref type="table" target="#tab_0">1</ref> shows examples of expanded queries. After this step, we use TF-IDF from PyTerrier <ref type="bibr" target="#b7">[8]</ref> with default parameters to get the top-5000 results for each expanded query.</p><p>We then re-rank the candidates using two architectures of SentenceBERT <ref type="bibr" target="#b8">[9]</ref>: bi-and cross-encoder.</p><p>For the bi-encoder, we use 'all-mpnet-base-v2' model due to its demonstrated effectiveness in capturing semantic similarity between queries and documents across various information retrieval Tasks. This model is used without further fine-tuning. The input query for the bi-encoder combines the initial query, related article title, and LLaMA-expanded query. We consider the title and abstract of each passage as the document for comparison with the query. 
For our second run, based on observations from previous lab participation <ref type="bibr" target="#b9">[10]</ref>, we fine-tune a cross-encoder model, 'ms-marco-MiniLM-L-6-v2'. For fine-tuning, we use the data from previous years of the SimpleText lab, split into 90% training and 10% validation sets. We fine-tune the model for 25 epochs, choosing the hyperparameters with the highest MRR@10 (Mean Reciprocal Rank) on the validation set. The input queries were fed to this model as 'Initial Query + [TOP] + Article's Title + [CON] + Expanded Query', where the initial query is the query specified by the organizers, the Article's Title corresponds to the topic text, and the Expanded Query is the context generated by LLaMA3. For example, the input query for topic G11.1 would be: drones + [TOP] + UK wants new drones in wake of Azerbaijan military success + [CON] + UK military drones Nagorno-Karabakh conflict Azerbaijan Armenia. Documents in the collection are represented as 'title + [ABS] + abstract'. In our fine-tuning process, three special tokens {TOP, CON, ABS} are included to separate the different text types. After fine-tuning the cross-encoder model, we re-rank the top-100 results retrieved by the bi-encoder model.</p><p>• Re-ranking with LLaMA (LLaMA Re-Ranker): While we used LLaMA3 for query expansion in our first two runs, for our next two runs we used it as a pairwise re-ranker. Following the approach proposed by Qin et al. <ref type="bibr" target="#b10">[11]</ref>, we used the system prompt for pairwise re-ranking shown in Table <ref type="table">9</ref> (Appendix). Two variations of this architecture were implemented, differing in the user message provided to LLaMA3. In one version, the user message included the query, related article title, and context generated from the previous runs (i.e., the expanded query from LLaMA3). The other version omitted the context. Essentially, LLaMA3 was tasked with determining which of the two documents was more relevant to the query based on the provided information. We re-ranked the top-100 candidates retrieved by the bi-encoder model. Since LLaMA3's outputs in this context might not be suitable as direct confidence scores, we assigned a simple ranking based on enumeration. 
The highest-ranked document received a score of 100, with scores decreasing by 1 for lower ranks.</p></div>
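<div xmlns="http://www.tei-c.org/ns/1.0"><p>The enumeration-based scoring above can be sketched as follows; the prefer callback is a hypothetical stand-in for the LLaMA3 pairwise relevance judgment, not the actual prompt-driven implementation:</p><p>
```python
from functools import cmp_to_key

def pairwise_rerank(candidates, prefer):
    """Order candidates with a pairwise preference function and assign
    enumeration scores: the top document gets 100, then 99, 98, ..."""
    def cmp(a, b):
        # prefer(a, b) is True when a is judged more relevant than b
        return -1 if prefer(a, b) else 1
    ranked = sorted(candidates, key=cmp_to_key(cmp))
    return {doc: 100 - i for i, doc in enumerate(ranked)}
```
</p></div>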
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Fine-Tuned Cross-Encoder combined with ElasticSearch (CERRF): Our last run leverages</head><p>ElasticSearch, provided by the organizers. We first retrieve the top-100 results for each topic using a combination of the query and topic text. Subsequently, we re-rank these results using a fine-tuned cross-encoder 'ms-marco-MiniLM-L-6-v2'. For fine-tuning, the training data from previous labs was used. We represented each input query as "&lt;query&gt; [QSP] &lt;topic text&gt;", while the papers were represented as "&lt;title&gt; [TSP] &lt;abstract&gt;". Here, [QSP] and [TSP] are special tokens separating the query text from the topic text and the paper title from its abstract, respectively. To select optimal hyperparameters, topics G10 and G11 were chosen for validation. The 2023 test set was used for the final evaluation. After hyperparameter selection, the model was fine-tuned on all available training topics.</p><p>In addition to the cross-encoder approach, we also perform a separate retrieval using Elasticsearch with only the query (without the topic text). The results from both methods are then combined using the modified Reciprocal Rank Fusion (MRRF) technique <ref type="bibr" target="#b11">[12]</ref> as EQ.1, where 𝑑 is the document, 𝑠 𝑚 and 𝑟 𝑚 are the model's similarity score and the rank, respectively. The underlying principle of MRRF is that documents ranked highly by both retrieval methods are likely more relevant than those ranked highly by only one method.</p><formula xml:id="formula_0">𝑅𝑅𝐹 𝑠𝑐𝑜𝑟𝑒(𝑑 ∈ 𝐷) = ∑︁ 𝑚∈𝑀 𝑠 𝑚 (𝑑) 60 + 𝑟 𝑚 (𝑑)<label>(1)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Experimental Results and Analysis</head><p>Table <ref type="table" target="#tab_1">2</ref> shows the effectiveness of our proposed models, reported by the organizers. Except for P@20, the LLaMABiEncoder archives the highest effectiveness across all measures. Another aspect evaluated in Task 1, is credibility and text complexity, for which the results from our systems are shown in Table <ref type="table" target="#tab_2">3</ref>.</p><p>Looking at the LLaMABiEncoder results, for only 10% of topics, the MRR value is not 1. The lowest MRR is for the topic 'G02.C1', at 0.33 (P@10 of 0.7). For this topic, the query text by the organizers is defined as "concerns related to the handling of sensitive information by voice assistants". With LLaMA3, the expanded query is "voice assistants handling sensitive information concerns Apple Siri recordings", does not seem to add any new useful terms to the original query. The top retrieved results for this topic is an article titled, "Poster: A First Look at the Privacy Risks of Voice Assistant Apps. ", assessed as non-relevant. For topics like 'T11.1' the original query "character relationship" is expanded to "character relationship network map The Witcher", helping find more relevant results, leading to MRR and P@10 of 1.</p><p>Comparing our LLaMA3 re-ranking approach system, LLaMAReranker2 against LLaMABiEncoder, there is no significant difference between the two systems, using Two-sided Paired Student's t-Test (p-value=0.05). Interestingly, both models have the same topics for which they did not achieve MMR of 1. For topic 'G02.C1', the MMR drops to 0.2 with LLaMAReranker2 (P@10 of 0.3). Investigating the results for this topic, LLaMA3 gave higher ranks to articles that have only titles (abstract missing) such as the article titled "Examining the Use of Voice Assistants: A Value-Focused Thinking Approach". 
With the article's abstract missing, these articles are assessed as non-relevant. Overall, using LLaMA3 for either re-ranking or query expansion showed similar effectiveness, while re-ranking with a bi-encoder proved more efficient.</p></div>
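<div xmlns="http://www.tei-c.org/ns/1.0"><p>The significance test used here can be sketched as below, assuming per-topic effectiveness scores for the two systems; the helper computes the paired t statistic from scratch (in practice, scipy.stats.ttest_rel would be the usual choice):</p><p>
```python
from math import sqrt
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired Student's t statistic over per-topic scores; the two-sided
    p-value of the resulting t is then compared against alpha = 0.05."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```
</p></div>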
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Task 2: Identifying and Explaining Difficult Concepts</head><p>This section describes the data for Task 2 <ref type="bibr" target="#b3">[4]</ref>, our proposed models, and evaluation results. We rely on LLaMA3 and Mistral <ref type="bibr" target="#b1">[2]</ref> language models and propose three systems for Subtasks 1 and 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Training and Test Data:</head><p>For Task 2, 576 sentences from 115 documents are provided for training. For these sentences, 2590 annotated difficult terms are available. Subtask 2.2 leverages a dataset of 501 sentences across 55 documents, containing 2,006 explanations and 1,521 definitions. These documents are selected from high-ranked abstracts to the requests of Task 1. Participants are asked to detect difficult terms, along with the difficulty level for Subtask 2.1, and provide definitions and explanations of detected difficult terms for Subtask 2.2 <ref type="bibr" target="#b0">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Proposed Models</head><p>Our team participated in Task 2, with three proposed systems, based on LLaMA3 and Mistral. Here we describe our models:</p><p>• LLaMA: Our first model uses LLaMA3-8B-Instruct, using a system prompt to instruct the model to act as a knowledgeable high school student (details in Table <ref type="table" target="#tab_0">10</ref>). This prompt achieved the best performance among those studied on the training data. We process each sentence from the test set using the following user message:</p><p>For the sentence: SENTENCE, what are difficult terms (one to five consecutive terms)?</p><p>What is the difficulty level? Your output is term or terms: difficulty level (e, m, or d).</p><p>Do not provide explanation, just give the answer.</p><p>where SENTENCE represents the actual sentence. We specify the output format, as LLaMA can add unnecessary information. After identifying difficult terms, we again utilize LLaMA to generate definitions and explanations. As shown in Table <ref type="table" target="#tab_0">10</ref>, we instruct LLaMA to act as a technician with knowledge of technical terms and request definitions and explanations. The following user message is used for this step:</p><p>You have identified term "TERM" in the sentence: "SENTENCE" as an unclear term. Provide its definition and explain what it is. The output should be like: Definition: Give definition here, Explanation: Give explanation here where TERM represents the term identified earlier and SENTENCE is the sentence it originated from. • LLaMA Fine-tuned (LLaMAFT): Our second approach is based on prompt engineering and reinforcement learning with human feedback to improve the quality of outputs generated by the LLaMA model. We designed several models to enhance the feedback loop, ultimately aiming for better results. Our exploration resulted in three distinct models, shown in Figure <ref type="figure">1</ref>. 
Our models differ in how the user and system messages are sent to LLaMA.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1:</head><p>Our studied approaches for using LLaMA3 for Task 2. Sentence represent the input sentences for which difficult terms should be extracted and defined. The output shows human-annotated data, including the extracted terms, their definitions, and explanations. Four different approaches were studied; Model M3's consistent high-quality performance on the training data makes it the preferred choice for further evaluation and testing phases.</p><p>• Mistral: Similar to our LLaMA3-based model, our approach with Mistral-7B leverages a system prompt (details in Table <ref type="table" target="#tab_0">10</ref>). This prompt instructs Mistral to identify difficult terms. We process training examples through a series of prompts and responses with Mistral to achieve this.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Experimental Results and Analysis</head><p>Our proposed systems results on the test set are summarized in Table <ref type="table" target="#tab_4">4</ref>. For each run, the organizers reported:</p><p>• Recall of all the terms, independently from the level of difficulty • Precision of all the terms, independently from the level of difficulty • Recall of the difficult terms • Precision of the difficult terms • BLEU score computed for bigrams Our proposed Mistral approach provided better results compared to LLaMA3. Providing an example, for the sentence "Cryptocurrency was built initially as a possible implementation of digital currency, then various derivatives were created in a variety of fields such as financial transactions, capital management, and even nonmonetary applications. " (sentence ID: G08.1_2972302621_1), Table <ref type="table">5</ref> shows the ground-truth, and the results generated by Mistral and LLaMA, for Subtask 2.1. As can be seen, LLaMA tends to extract fewer terms for each sentence, leading to lower recall; however, the precision for correctly identifying difficulty level is more precise.</p><p>Another interesting aspect of Task 2 is duplicate sentences. The organizers have provided repeated sentences to study whether LLMs provide the same results. Our results show while Mistral mostly produces the same responses, LLaMA3 responses seem to differ each time. For a short sentence, "This is especially true for self-driving vehicles deployed in public transport services. ", LLaMA3 once extracts the terms 'self-driving', 'vehicles', 'public transport' and the next time extracts 'self-driving', 'deployed'. Mistral extracted terms, however, remained the same.</p><p>Note on LLaMAFT Run: We have identified a mistake while submitting this run. Our studies for different models (M0 to M3) used a two-stage process of first identifying the difficult terms and then generating the definitions. 
In our submitted model for the test data, we mistakenly used a single prompt for all the Subtasks. Upon correction, including previous related documents and human answers improved the results (Precision: 0.28, Recall: 0.41).</p></div>
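<div xmlns="http://www.tei-c.org/ns/1.0"><p>One illustrative way to compute precision and recall over extracted terms, treating each run's terms as a set and ignoring difficulty levels (an assumption for illustration; the official scoring may differ):</p><p>
```python
def term_precision_recall(gold_terms, predicted_terms):
    """Set-based precision/recall over extracted difficult terms."""
    gold, pred = set(gold_terms), set(predicted_terms)
    tp = len(gold.intersection(pred))  # correctly extracted terms
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```
</p></div>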
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Task 3: Simplify Scientific Text</head><p>This section describes the data, proposed models, and evaluation results for Task 3 <ref type="bibr" target="#b4">[5]</ref>. LLaMA3-8B-Instruct and Mistral were utilized for both Subtasks 3.1 and 3.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 6</head><p>Task 3.1 training data example. The source shows the input sentence, and the target is its simplified version. snt_id G06.2_855132903_1 source In this paper we present queuing-theoretical methods for the modeling, analysis, and control of autonomous mobility-ondemand MOD systems wherein robotic, self-driving vehicles transport customers within an urban environment and rebalance themselves to ensure acceptable quality of service throughout the network. target Queuing models are used for autonomous mobility-on-demand MOD systems. A queuing model is constructed so that queue lengths and waiting time can be predicted. In MOD systems, robotic, self-driving vehicles transport customers within an urban environment and rebalance themselves to ensure quality of service.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Topic and Collection</head><p>The training data consists of a collection of parallel text passages (source and simplified versions). These simplified sentences are directly created from original scientific abstracts in the DBLP Citation Network Dataset for Computer Science, Google Scholar, and PubMed articles on Health and Medicine (all from 2023). The dataset includes 648 sentences for training and 245 sentences for testing. The simplification process involved either master's students in Technical Writing and Translation or a team of a computer scientist and a professional translator (native English speaker). An example of this source (original) and target (simplified) sentence pair is provided in Table <ref type="table">6</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Proposed Models</head><p>AIIR Lab submitted a total of four runs for both Subtasks 3.1 (sentence-level) and 3.2 (abstract-level). Three of the runs utilized a fine-tuned LLaMA3-8B model and one used Mistral, with prompt engineering. Our proposed approaches are as follows:</p><p>• Prompt Engineering with Instruction-tuned LLaMA3-8B: Our first three runs for this Task utilized LLaMA3-8B which was instruction-tuned with the provided training data for both the sentence and abstract levels. We used a split of 90:10 for training and validation. For instruction tuning with LLaMA, we used Quantized Low-Rank Adaptation (QLoRA). QLoRA, as shown in Figure <ref type="figure" target="#fig_1">3</ref>, is a method used for fine-tuning processes to reduce the amount of memory required and computational cost <ref type="bibr" target="#b12">[13]</ref>. The model's weights are first converted from 16-bit floating point numbers to 4-bit NormalFloat. These reduced-size weight matrices are then approximated to low-rank matrices by reducing the number of parameters, speeding up computation time, and reducing the data footprint. These 4-bit embeddings then utilize NVIDIA's unified memory feature, which allows for automatic paging optimization before updating the weights. This paging optimization allows for the CPU RAM to be accessed by the GPU directly for page-to-page transferring, preventing the possibility of running out of GPU memory space as long as sufficient system memory is available.</p><p>During the training process, the data was first run through QLoRA for the token embeddings to be resized. The hyperparameters are set as follows: an alpha of 32, a dropout of 0.1, a Task type of "CASUAL_LM" and an R-value of 8. The output data was then fed to LLaMA3-8B with the hyperparameters of a learning rate of e-4, a paged_adam_32 optimization function, 20 epochs and a batch size of 8.  
The training instances combined a prompt (P), a source (S), and a target (T), where the prompt, for all training samples, was the one used for Run 1 (Table <ref type="table" target="#tab_1">12</ref>). The source and target values would be the output token embeddings from QLoRA. We believe this gives LLaMA3 a better understanding of the linguistic styles in the desired target simplifications. For prompt engineering, we focused on the average FKGL (Flesch-Kincaid Grade Level) score for the provided test sentences and abstracts. The data was passed into our instruction-tuned model, and an FKGL score was averaged at the end of each run.</p></div>
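<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reported fine-tuning hyperparameters can be restated as plain configuration dictionaries; these mirror what would be passed to peft.LoraConfig and transformers.TrainingArguments in an actual QLoRA run, and the optimizer string follows the standard QLoRA recipe rather than being quoted from our code:</p><p>
```python
# LoRA adapter settings: rank 8, alpha 32, dropout 0.1, causal LM task
lora_settings = dict(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# trainer settings: learning rate 1e-4, paged AdamW, 20 epochs, batch size 8
trainer_settings = dict(
    learning_rate=1e-4,
    optim="paged_adamw_32bit",
    num_train_epochs=20,
    per_device_train_batch_size=8,
)
```
</p></div>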
<div xmlns="http://www.tei-c.org/ns/1.0"><head>As shown in</head><p>• Mistral (RUN 4): Using Mistral 7B, we used the system prompt as shown in Table <ref type="table" target="#tab_1">12</ref>. We then used three sample sentences from training data, along with their simplified versions, to provide examples for Mistral. As our final user message, we passed the test sentence/abstract to mistral with the prompt:</p><p>Now do the same for this text, simplify by explaining technical terms or replacing them with easier words without removing context: TEXT where TEXT is the input sentence/abstract. Note: While submitting this run, we only evaluated the model on the training data by mistake. Therefore, this run was excluded from the evaluation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Experimental Results and Analysis</head><p>Task 3 results are evaluated based on several metrics, with SARI <ref type="bibr" target="#b13">[14]</ref> score against the human reference simplifications as the main measure. Table <ref type="table" target="#tab_6">7</ref> shows our results for both Subtasks 3.1 (sentence-level) and 3.2 (abstract-level). While for Subtask 3.1, our team runs are ranked second in terms of SARI score, we achieved the highest SARI score for Subtask 3.2 between the participating teams. For the level of text complexity, the FKGL readability measure is used. Compared to the references, our models have high compression ratios and sentence splits, as LLaMA's outputs are lengthier. An example of this is shown in Table <ref type="table">8</ref>, where our simplified version of the original input text is compared against the ground-truth and for Subtask 3.1. For Subtask 3.1, all LLaMA3's Sari scores fell within a ±0.82 difference from one another. The Sari scores for Subtask 3.2 were similar to Subtask 3.1, in that, they varied by a relatively narrow margin of ±1.25. The original sentences have an FKGL of 13-14 corresponding to a university-level text, with the reference scores being 8.86 for Subtask 3.1 and 8.91 for Subtask 3.2. Our FKGL results for all runs in both Tasks fell within the 8.39 to 10.33 FKGL range, with our run 1 scores being 0.47 points below for Task 3.1 and 0.16 points above for Task 3.2 compared to the reference FKGL score. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 8</head><p>Our results for sentence-level simplification for sentence ID 'G01.1_1552637960_1' in Subtask 3.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Original text</head><p>The goal of the MOST project is to develop a novel, inexpensive, easy-to-use digital talking device for blind and visually impaired users based on off-theshelf handheld computers (Personal Digital Assistant).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Simplification system</head><p>Simplified Result</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Ground-truth</head><p>The goal of the MOST project is to create a new talking device for blind people. LLaMA3-8B Run 1</p><p>The MOST project aims to create a simple, affordable, and easy-to-use digital talking device for blind and visually impaired people using ordinary handheld computers. LLaMA3-8B Run 2</p><p>The goal of the MOST project is to create a simple, affordable, and easy-touse digital device that can talk to blind and visually impaired people using handheld computers. LLaMA3-8B Run 3</p><p>The MOST project aims to create a simple, affordable, and user-friendly digital talking device for blind and visually impaired people using common handheld computers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>AIIR lab participated in SimpleText CLEF 2024 lab Tasks 1 to 3, relying on large language models, namely LLaMA3 and Mistral. In Task 1, we submitted five runs leveraging LLaMA for query expansion, TF-IDF for candidate selection, and both bi-encoder and fine-tuned cross-encoder models for re-ranking. We also explored LLaMA for re-ranking within this Task. Our bi-encoder model and LLaMA re-ranker models were the most effective systems among the participating teams. For Task 2, we had three runs, using LLaMA and Mistral. Our Mistral-based model provided better effectiveness compared to LLaMA, providing higher recall and precision in detecting difficult terms. However, LLaMA model was better at detecting difficulty levels. Finally, for Task 3, we participated in both Subtasks, submitting four runs that employed LLaMA and Mistral. Our LLaMA models had high SARI scores for Subtasks 3.1 and 3.2. For future work, we aim to explore large language models further for these Tasks, incorporating techniques such as chain-of-thoughts to study the effectiveness of these models for the related Tasks.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 12</head><label>12</label><figDesc>System prompts used for Task 3.</figDesc><table><row><cell>Model</cell><cell>Prompt</cell></row><row><cell>LLaMA3-8B Run 1</cell><cell>Simplify this text for English speaking science students in college. Maximize the use of simple words and short sentences, but include keywords from the original text. Optimize the output ROUGE, SARI, and BLEU scores</cell></row><row><cell>LLaMA3-8B Run 2</cell><cell>You are a skilled editor, known for your ability to simplify complex text while preserving its meaning. You have a strong understanding of readability principles and how to apply them to improve text comprehension.</cell></row><row><cell>LLaMA3-8B Run 3</cell><cell>Simplify the following scientific text for an average American citizen. Keep, but define, any keywords and subjects with less complex words and phrases.</cell></row><row><cell>Mistral</cell><cell>You are a skilled editor, known for your ability to simplify complex text while preserving it. You explain the technical terms, defining what they are (e.g., terms like Blockchain, Cryptojacking, all abbreviations), without removing sentences or summarizing them.</cell></row></table></figure><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: Prompts used for Subtask 2.1 to extract difficult terms. The SENTENCE represents the test sentence passed to Mistral, and the final response from Mistral provides the identified difficult terms.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: QLoRA embedding and paging pipeline.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Query rewriting/expansion using LLaMA3.</figDesc><table><row><cell>Initial Query</cell><cell>Expanded Query</cell></row><row><cell>drones</cell><cell>UK military drones Nagorno-Karabakh conflict Azerbaijan Armenia</cell></row><row><cell>advertising</cell><cell>advertising digital marketing channels</cell></row><row><cell>gene editing</cell><cell>Gene editing Crispr therapy diseases treatment prospects</cell></row></table><note>years of the SimpleText lab, splitting into 90% training and 10% validation sets. We fine-tune the model for 25 epochs, choosing the hyperparameters with the highest MRR@10 (Mean Reciprocal Rank) on the validation set. The input queries were fed to this model as: Initial Query + [TOP] + Article's Title + [CON] + Expanded Query</note></figure>
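The hyperparameters above are selected by MRR@10 on the validation set, and the fine-tuned model consumes queries in the `[TOP]`/`[CON]` format shown in Table 1. A minimal sketch of both pieces follows; the function names are illustrative, not from the paper's code, and the exact spacing around the separator tokens is an assumption.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean Reciprocal Rank truncated at depth k.

    ranked_lists: one ranked list of document ids per query.
    relevant_sets: one set of relevant document ids per query.
    """
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


def build_input(initial_query, title, expanded_query):
    """Serialize a query following the format in Table 1
    (whitespace around the markers is assumed)."""
    return f"{initial_query} [TOP] {title} [CON] {expanded_query}"
```

For example, a query whose first relevant document sits at rank 2 contributes 0.5 to the mean, and a query with no relevant document in the top k contributes 0.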
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>AIIRLab systems results for CLEF 2024 SimpleText Task 1 on the test Qrels (G01.C1-G10.C1 and T06-T11).</figDesc><table><row><cell>Model</cell><cell>MRR</cell><cell>P@10</cell><cell>P@20</cell><cell>NDCG@10</cell><cell>NDCG@20</cell><cell>Bpref</cell><cell>MAP</cell></row><row><cell>LLaMABiEncoder</cell><cell>0.9444</cell><cell>0.8167</cell><cell>0.5517</cell><cell>0.6170</cell><cell>0.5166</cell><cell>0.3559</cell><cell>0.2304</cell></row><row><cell>LLaMAReranker2</cell><cell>0.9300</cell><cell>0.7933</cell><cell>0.5417</cell><cell>0.5943</cell><cell>0.5004</cell><cell>0.3495</cell><cell>0.2177</cell></row><row><cell>LLaMAReranker</cell><cell>0.8944</cell><cell>0.7967</cell><cell>0.5583</cell><cell>0.5889</cell><cell>0.5011</cell><cell>0.3541</cell><cell>0.2200</cell></row><row><cell>LLaMACrossEncoder</cell><cell>0.7975</cell><cell>0.6933</cell><cell>0.5100</cell><cell>0.4745</cell><cell>0.4240</cell><cell>0.3404</cell><cell>0.1970</cell></row><row><cell>CERRF</cell><cell>0.7264</cell><cell>0.5033</cell><cell>0.4000</cell><cell>0.3584</cell><cell>0.3239</cell><cell>0.2204</cell><cell>0.1309</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Evaluation of AIIRLab systems for complexity and credibility in Task 1 (over all 176 queries).</figDesc><table><row><cell>Model</cell><cell>Avg. #Refs</cell><cell>Avg. Sentence Length</cell><cell>Avg. Syllables per Word</cell></row><row><cell>LLaMABiEncoder</cell><cell>9.5</cell><cell>31.0</cell><cell>1.865</cell></row><row><cell>LLaMAReranker2</cell><cell>8.6</cell><cell>20.9</cell><cell>1.707</cell></row><row><cell>LLaMAReranker</cell><cell>8.8</cell><cell>22.1</cell><cell>1.772</cell></row><row><cell>LLaMACrossEncoder</cell><cell>10.0</cell><cell>30.6</cell><cell>1.890</cell></row><row><cell>CERRF</cell><cell>10.6</cell><cell>22.0</cell><cell>1.895</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 11</head><label>11</label><figDesc>The order of prompts used for each model. Each model mainly follows a two-step process. Step 1: after instructing the model with the prompts in Table 11, the user message is built from the sentence and, in some cases, incorporates human-annotated data (output) from the training data. This output represents the desired outcome for the Task, including identified difficult terms and their corresponding definitions and explanations.</figDesc><table><row><cell>(Sentence i , Output i )</cell><cell>Sentence i</cell><cell>Sentence j</cell><cell>(Sentence j , Output j )</cell></row><row><cell>LLaMA3</cell><cell>LLaMA3</cell><cell>LLaMA3</cell><cell>LLaMA3</cell></row><row><cell>Sentence i</cell><cell>(Sentence i , Output i )</cell><cell>(Output j , Sentence i )</cell><cell>Sentence i</cell></row><row><cell>LLaMA3</cell><cell>LLaMA3</cell><cell>LLaMA3</cell><cell>LLaMA3</cell></row><row><cell>M0</cell><cell>M1</cell><cell>M2</cell><cell>M3</cell></row></table><note>Step 2: using the result generated by LLaMA in Step 1 and a new user prompt, a second round of results is produced. Each model was studied with different combinations of training data and prompts. In our experiments, Model M3 outperformed the other approaches and was used as our second run.</note></figure>
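The two-step process in Table 11 can be sketched as a small chat-message loop. This is a hedged illustration only: `generate` stands in for an actual chat-model call (e.g. LLaMA3), and here it is simply any function mapping a message list to a string, demonstrated with a stub.

```python
def two_step_simplify(generate, system_prompt, sentence, step2_prompt):
    """Step 1: send the system prompt plus the sentence.
    Step 2: feed Step 1's response back together with a new user prompt,
    and return the second-round result."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": sentence},
    ]
    first = generate(messages)                      # Step 1 output
    messages += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": step2_prompt},  # Step 2 instruction
    ]
    return generate(messages)                       # Step 2 output


# Stub "model" for illustration: echoes the last user message.
echo = lambda msgs: "RESULT: " + msgs[-1]["content"]
```

With a real model, `generate` would wrap a chat-completion call; the message-list shape above matches common chat-LLM APIs but is an assumption about the authors' setup.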
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>AIIRLab systems results for CLEF 2024 SimpleText Task 2.</figDesc><table><row><cell>Model</cell><cell>Recall</cell><cell>Precision</cell><cell>Rec_Difficult</cell><cell>Prec_Difficult</cell><cell>BLEU</cell></row><row><cell>Mistral</cell><cell>0.41</cell><cell>0.69</cell><cell>0.19</cell><cell>0.49</cell><cell>0.13</cell></row><row><cell>LLaMA</cell><cell>0.28</cell><cell>0.65</cell><cell>0.26</cell><cell>0.67</cell><cell>0.15</cell></row><row><cell>LLaMAFT</cell><cell>0.01</cell><cell>0.99</cell><cell>0.00</cell><cell>1.00</cell><cell>0.12</cell></row></table></figure><figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 5</head><label>5</label><figDesc>Extracted difficult terms with their difficulty levels for sentence ID 'G08.1_2972302621_1' from SimpleText 2024. Letters 'd', 'm', and 'e' show difficult, medium, and easy terms, respectively.</figDesc><table><row><cell>Ground-truth Term</cell><cell>Difficulty</cell><cell>Mistral Term</cell><cell>Difficulty</cell><cell>LLaMA Term</cell><cell>Difficulty</cell></row><row><cell>cryptocurrency</cell><cell>m</cell><cell>cryptocurrency</cell><cell>d</cell><cell>cryptocurrency</cell><cell>d</cell></row><row><cell>digital currency</cell><cell>m</cell><cell>digital currency</cell><cell>m</cell><cell>digital currency</cell><cell>m</cell></row><row><cell>capital management</cell><cell>m</cell><cell>capital management</cell><cell>m</cell><cell>derivatives</cell><cell>m</cell></row><row><cell>nonmonetary applications</cell><cell>d</cell><cell>nonmonetary applications</cell><cell>m</cell><cell></cell><cell></cell></row><row><cell>financial transactions</cell><cell>e</cell><cell>financial transactions</cell><cell>e</cell><cell></cell><cell></cell></row></table></figure>
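One plausible reading of the recall and precision columns in Table 4 is set-based matching of extracted terms against the ground truth; the sketch below assumes exact string matching, which may differ from the official evaluation's matching rules.

```python
def term_recall_precision(predicted, gold):
    """Set-based recall and precision over extracted terms.
    Assumes exact string matching between predicted and gold terms."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision
```

Using the Table 5 example, LLaMA extracts three terms of which two ('cryptocurrency', 'digital currency') appear among the five ground-truth terms, giving recall 0.40 and precision about 0.67.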
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6</head><label>6</label><figDesc>Each entry in the data is paired into source and target values. We passed training data for LLaMA3 instruction-tuning as:</figDesc><table><row><cell>"Instruction:" + [P] + "Input: " + [S] + "Response: " + [T]</cell></row></table></figure>
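The instruction-tuning template in Table 6 can be sketched as a small serializer; the paper only shows the field order, so the exact whitespace and newlines below are assumptions.

```python
def format_example(prompt, source, target):
    """Serialize one training pair following the Table 6 template:
    "Instruction:" + [P] + "Input: " + [S] + "Response: " + [T].
    Newline separators between fields are an assumption."""
    return (
        f"Instruction: {prompt}\n"
        f"Input: {source}\n"
        f"Response: {target}"
    )
```

Each (prompt, source sentence, target simplification) triple from the training data would be flattened into one such string before tokenization.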
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7</head><label>7</label><figDesc>AIIRLab systems results for CLEF 2024 SimpleText Task 3.</figDesc><table><row><cell></cell><cell cols="3">Subtask 3.1</cell><cell cols="3">Subtask 3.2</cell></row><row><cell>Model</cell><cell>FKGL</cell><cell>SARI</cell><cell>BLEU</cell><cell>FKGL</cell><cell>SARI</cell><cell>BLEU</cell></row><row><cell>LLaMA3-8B Run1</cell><cell>8.39</cell><cell>40.58</cell><cell>7.53</cell><cell>9.07</cell><cell>43.44</cell><cell>11.73</cell></row><row><cell>LLaMA3-8B Run3</cell><cell>9.47</cell><cell>40.36</cell><cell>6.26</cell><cell>10.17</cell><cell>43.21</cell><cell>11.03</cell></row><row><cell>LLaMA3-8B Run2</cell><cell>10.33</cell><cell>39.76</cell><cell>5.46</cell><cell>10.22</cell><cell>42.19</cell><cell>7.99</cell></row></table></figure>
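Table 7 reports FKGL (Flesch-Kincaid Grade Level) alongside SARI and BLEU. The standard FKGL formula is 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59; the sketch below uses a crude vowel-group syllable heuristic, whereas the official evaluation presumably used a dedicated readability library, so treat this as illustrative only.

```python
import re

def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels.
    Real evaluations typically use a dedicated library."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)
```

Lower FKGL indicates simpler text, which is why Run 1's 8.39 on Subtask 3.1 is the best readability score in the table.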
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://www.theguardian.com/uk/technology</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://techxplore.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">We used the Meta-LLaMA3-8B-Instruct model from HuggingFace.</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Prompts</head><p>This section shows the prompts used for the SimpleText lab Tasks we participated in. For query rewriting/expansion and re-ranking, we used the system prompts shown in Table <ref type="table">9</ref> with LLaMA3. For Task 2, Table <ref type="table">10</ref> shows the system prompts that we used for Subtasks 2.1 and 2.2. Table <ref type="table">11</ref> shows our prompts for fine-tuning LLaMA for Task 2. Finally, Table <ref type="table">12</ref> shows our prompts for Task 3.</p></div>			</div>
		</back>
	</text>
</TEI>
