<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Argument Mining in BioMedicine: Zero-Shot, In-Context Learning and Fine-tuning with LLMs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jérémie</forename><surname>Cabessa</surname></persName>
							<email>jeremie.cabessa@uvsq.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory">David Lab</orgName>
								<orgName type="institution" key="instit1">University of Versailles Saint-Quentin (UVSQ)</orgName>
								<orgName type="institution" key="instit2">University of Paris-Saclay</orgName>
								<address>
									<postCode>78000</postCode>
									<settlement>Versailles</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">Czech Academy of Sciences</orgName>
								<address>
									<postCode>18207</postCode>
									<settlement>Prague 8</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hugo</forename><surname>Hernault</surname></persName>
							<email>hugoh@playtika.com</email>
							<affiliation key="aff2">
								<orgName type="institution">Playtika Ltd</orgName>
								<address>
									<postCode>CH-1003</postCode>
									<settlement>Lausanne</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Umer</forename><surname>Mushtaq</surname></persName>
							<email>umer.mushtaq@univ-lr.fr</email>
							<affiliation key="aff3">
								<orgName type="laboratory">Laboratoire Informatique, Image, Interaction (L3i)</orgName>
								<orgName type="institution">University of La Rochelle</orgName>
								<address>
									<postCode>17042</postCode>
									<settlement>La Rochelle</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="department">Tenth Italian Conference on Computational Linguistics</orgName>
								<address>
									<addrLine>Dec 04 -06</addrLine>
									<postCode>2024</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Argument Mining in BioMedicine: Zero-Shot, In-Context Learning and Fine-tuning with LLMs</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7847365F2C5FCD922628FF2D8A480F50</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Argument Mining</term>
					<term>NLP</term>
					<term>LLMs</term>
					<term>LLaMA-3</term>
					<term>Zero-Shot Learning</term>
					<term>In-Context Learning</term>
					<term>Fine-tuning</term>
					<term>Ensembling</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Argument Mining (AM) aims to extract the complex argumentative structure of a text, and Argument Type Classification (ATC) is an essential sub-task of AM. Large Language Models (LLMs) have shown impressive capabilities in most NLP tasks and beyond. However, fine-tuning LLMs can be challenging. In-Context Learning (ICL) has been suggested as a bridging paradigm between the training-free and fine-tuning settings for LLMs. In ICL, an LLM is conditioned to solve tasks using a few solved demonstration examples included in its prompt. We focus on AM in the biomedical AbstRCT dataset. We address ATC using quantized and unquantized LLaMA-3 models through zero-shot learning, in-context learning, and fine-tuning approaches. We introduce a novel ICL strategy that combines kNN-based example selection with majority vote ensembling, along with a well-designed fine-tuning strategy for ATC. In the zero-shot setting, we show that LLaMA-3 fails to achieve acceptable classification results, suggesting the need for additional training modalities. However, in our training-free ICL setting, LLaMA-3 can leverage relevant information from only a few demonstration examples to achieve very competitive results. Finally, in our fine-tuning setting, LLaMA-3 achieves state-of-the-art performance on the ATC task on the AbstRCT dataset.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Argument Mining (AM) focuses on extracting the underlying argumentative and discursive structure from raw text <ref type="bibr" target="#b0">[1]</ref>. Argument Type Classification (ATC), which involves classifying argumentative units in text according to their argumentative roles, is a crucial sub-task of AM. Research has shown that the argumentative role of a unit cannot be inferred solely from its text: additional structural and contextual information is needed <ref type="bibr" target="#b1">[2]</ref>. This additional information can be incorporated via feature engineering <ref type="bibr" target="#b1">[2]</ref>, memory-enabled neural architectures <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref> or LLM-based hybrid methods <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>Large Language Models (LLMs) have become ubiquitous in deep learning and have shown impressive capabilities in most NLP tasks <ref type="bibr" target="#b6">[7]</ref>. In the main, LLMs are used in two distinct settings: (i) training-free, where the pretrained LLM is used for inference without any parameter adjustment, and (ii) fine-tuning, where the parameters of the LLM are updated through supervised training to enable transfer learning on a downstream task. Zero-shot learning refers to the training-free approach where a pretrained LLM is prompted to solve tasks on completely unseen data samples.</p><p>Recently, In-Context Learning (ICL) has been proposed as a bridging paradigm between the training-free and fine-tuning settings. ICL is a prompt engineering technique whereby an LLM is conditioned to solve tasks by means of a few solved demonstration examples included as part of its input prompt <ref type="bibr" target="#b7">[8]</ref>. 
Generally, the input prompt includes the task instructions, the current input sample to be solved, and several solved input-output example pairs. In this way, ICL maintains the training-free posture (parameters frozen) of the LLM while at the same time providing it with some supervision through demonstration examples. It also enables the direct incorporation of selected features inside the prompt template, thereby obviating the need for architecture customization. Creative ICL strategies combining kNN-based example selection, generated chain-of-thought (CoT) prompting, and majority vote ensembling have been proposed and shown to outperform fine-tuning approaches <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>. In the main, kNN-based example selection optimizes the process of learning from few examples, and ensembling increases the robustness of the predictions <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>This work focuses on AM in the biomedical AbstRCT dataset <ref type="bibr" target="#b13">[14]</ref>. More specifically, we address the ATC task using quantized and unquantized LLaMA-3 models, among the most capable openly available LLMs (cf. leaderboard), through zero-shot learning, in-context learning, and fine-tuning approaches. Our contributions are as follows:</p><p>• In the zero-shot learning setting, we show that LLaMA-3 fails to achieve acceptable classification results, suggesting the need for implementing additional training modalities.</p><p>• We introduce a novel ICL strategy that combines kNN-based example selection with majority vote ensembling. In this training-free setting, LLaMA-3 can leverage relevant information from only a few demonstration examples to achieve very competitive results.</p><p>• We further experiment with a fine-tuning strategy for LLaMA-3. 
In this setting, we achieve state-of-the-art performance on the ATC task on the AbstRCT dataset.</p><p>Our code is freely available on GitHub.</p></div>
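The prompt composition described above (task instructions, a few solved demonstrations, then the current input) can be sketched as follows. This is an illustrative helper, not the paper's exact template; the function name and the section markers are assumptions.

```python
# Minimal sketch of ICL prompt assembly: task instructions, a few solved
# demonstration examples, then the query sample to classify.
# Helper name, wording and layout are illustrative, not the paper's template.

def build_icl_prompt(instructions, demonstrations, query):
    parts = [instructions]
    for i, (text, label) in enumerate(demonstrations, start=1):
        parts.append(f"### Example {i}\nArgument: {text}\nClass: {label}")
    parts.append(f"### Query\nArgument: {query}\nClass:")
    return "\n\n".join(parts)

prompt = build_icl_prompt(
    "Classify each argument as Claim or Premise.",
    [("At 6 weeks, both groups showed improvement in several HQL domains.", "Premise")],
    "Treatment with mitoxantrone plus prednisone improved several HQL domains.",
)
```

Setting the demonstration list empty recovers the zero-shot prompt, which is how the k = 0 case below degenerates from the general ICL case.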
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>In early works, Argument Mining was approached using both classical algorithms such as SVMs <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref> and recurrent neural network models such as BiLSTMs <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b3">4]</ref>. Transformer-based models, such as BERT <ref type="bibr" target="#b19">[20]</ref>, have also been utilized for AM, including multi-scale argument modelling and customized feature-injected BERT-based models <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b24">25]</ref>. AM in the biomedical AbstRCT dataset has been approached using LSTMs <ref type="bibr" target="#b27">[26,</ref><ref type="bibr" target="#b28">27]</ref>, sequential transfer learning <ref type="bibr" target="#b29">[28]</ref> as well as transformer-based models <ref type="bibr" target="#b30">[29,</ref><ref type="bibr" target="#b31">30,</ref><ref type="bibr" target="#b32">31]</ref>.</p><p>More recently, AM sub-tasks have been modeled as text generation tasks using LLMs. For the Argument Type Classification (ATC) sub-task, this approach involves using a prompt template to generate the corresponding class of an argument component. This method has been applied to various AM use-cases, such as podcast transcripts and legal documents <ref type="bibr" target="#b33">[32,</ref><ref type="bibr" target="#b34">33,</ref><ref type="bibr" target="#b35">34]</ref>. 
The latest approach in this 'AM using LLM text generation' direction involves a prompt that includes the argument component as the query and the complete text as the context, to output the class of the argument component using a generative model <ref type="bibr" target="#b36">[35]</ref>. In that study, the three AM sub-tasks are modeled using the Persuasive Essays (PE) and AbstRCT datasets.</p><p>In contrast to the fine-tuning approach, a training-free ICL prompting strategy for LLMs has been proposed <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b10">11]</ref>. This strategy combines kNN-based example selection, generated chain-of-thought prompting, and majority vote ensembling for few-shot classification. Interestingly, this ICL strategy outperforms the fine-tuning approach on the datasets used in that study.</p><p>Our work sits at the intersection of zero-shot learning, in-context learning and fine-tuning. We implement and compare the performance of the latest openly available LLMs using these three approaches for AM on the AbstRCT dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Datasets</head><p>We consider the AbstRCT dataset, which consists of abstracts of 650 Randomized Controlled Trials selected from the biomedical database PubMed <ref type="bibr" target="#b13">[14]</ref>. For the AbstRCT dataset, the Neoplasm train set (Neo-train) consists of 350 abstracts, whereas the three Neoplasm, Glaucoma and Mixed test sets (Neo-test, Gla-test and Mix-test, respectively) consist of 100 abstracts each. The statistics of the AbstRCT dataset are given in Table <ref type="table">1</ref>. A sample of the AbstRCT dataset is provided below. The argument components (ACs) and their corresponding classes are indicated by bold tags.</p><p>&lt;AC1: Major Claim&gt;A combination of mitoxantrone plus prednisone is preferable to prednisone alone for reduction of pain in men with metastatic, hormone-resistant, prostate cancer.&lt;/AC1&gt; The purpose of this study was to assess the effects of these treatments on health-related quality of life (HQL). Men with metastatic prostate cancer (n = 161) were randomized to receive either daily prednisone alone or mitoxantrone (every 3 weeks) plus prednisone. Those who received prednisone alone could have mitoxantrone added after 6 weeks if there was no improvement in pain. HQL was assessed before treatment initiation and then every 3 weeks using the European Organization for Research and Treatment of Cancer Quality-of-Life Questionnaire C30 (EORTC QLQ-C30) and the Quality of Life Module-Prostate 14 (QOLM-P14), a trial-specific module developed for this study. An intent-to-treat analysis was used to determine the mean duration of HQL improvement and differences in improvement duration between groups of patients. &lt;AC2: Premise&gt;At 6 weeks, both groups showed improvement in several HQL domains&lt;/AC2&gt;, and &lt;AC3: Premise&gt;only physical functioning and pain were better in the mitoxantrone-plus-prednisone group than in the prednisone-alone group&lt;/AC3&gt;. 
&lt;AC4: Premise&gt;After 6 weeks, patients taking prednisone showed no improvement in HQL scores, whereas those taking mitoxantrone plus prednisone showed significant improvements in global quality of life (P =.009), four functioning domains, and nine symptoms (.001 &lt; P &lt; .01)&lt;/AC4&gt;, and &lt;AC5: Premise&gt;the improvement (&gt; 10 units on a scale of 0 to 100) lasted longer than in the prednisone-alone group (.004 &lt; P &lt; .05)&lt;/AC5&gt;. &lt;AC6: Premise&gt;The addition of mitoxantrone to prednisone after failure of prednisone alone was associated with improvements in pain, pain impact, pain relief, insomnia, and global quality of life (.001 &lt; P &lt; .003).&lt;/AC6&gt; &lt;AC7: Claim&gt;Treatment with mitoxantrone plus prednisone was associated with greater and longer-lasting improvement in several HQL domains and symptoms than treatment with prednisone alone.&lt;/AC7&gt;</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Zero-Shot Learning and In-Context Learning</head><p>In the ZSL and ICL paradigms, the pretrained LLM M is conditioned on a context C consisting of task instructions and k solved demonstration examples (x, y) in text format, where X = {x1, x2, . . . } and Y = {y1, . . . , yk} are the sets of possible inputs and outputs, respectively. The ZSL and ICL paradigms correspond to the cases where k = 0 and k &gt; 0, respectively. For input x, the LLM M predicts the output ŷ such that</p><formula xml:id="formula_0">ŷ = arg max_{yi ∈ Y} P_M(yi | C; x) ,</formula><p>where P_M(yi | C; x) is the probability that M generates yi when C and x are given as prompt. The main rationale behind ZSL and ICL is that the consideration of a well-chosen context C increases the probability of M predicting the correct answer y for input x, i.e., that P_M(y | C; x) &gt; P_M(y | x).</p><p>We consider a 2-step ICL strategy for argument type classification (ATC) inspired by a recent study <ref type="bibr" target="#b8">[9]</ref> (see Figure <ref type="figure">1</ref>). More precisely, let A be an abstract containing argument components (ACs) c1, . . . , cm with corresponding true classes y1, . . . , ym, where each yi ∈ {Claim, Premise}. Given the ACs c1, . . . 
, cm in the prompt, the LLM generates the corresponding class predictions ŷ1, . . . , ŷm as follows:</p><p>(1) kNN-based example selection (k = 3, 5): First, 2k neighboring abstracts A1, . . . , A2k of A are selected according to the following similarity measure. For any abstract Ai, let the signature of Ai be the embedding of the first sentence of Ai using the BioBERT model. The abstracts A1, . . . , A2k are the ones whose signatures are the closest, with respect to cosine similarity, to the signature of A. Then, k abstracts, Ai1, . . . , Aik, are randomly chosen from A1, . . . , A2k. Afterwards, a prompt containing all the ACs and their corresponding classes in these k abstracts is constructed (kNN). Finally, the LLM predicts the classes ŷ1, . . . , ŷm of c1, . . . , cm on the basis of this prompt.</p><p>(2) n-Ensembling (n = 3, 5): The kNN-based example selection step, which involves randomness, is repeated n times (nEns), leading to a set of n sequences of class predictions {(ŷi,1, . . . , ŷi,m) : i = 1, . . . , n}. The final class predictions ŷ1, . . . , ŷm of c1, . . . , cm are obtained by applying a component-wise majority vote to the n prediction sequences.</p><p>The kNN-based example selection optimizes learning from few examples by selecting samples most similar to the current instance, rather than choosing them randomly. The ensembling step increases prediction robustness by selecting the most frequent predictions. Note that the relevance of the ensembling step relies on the random selection in the kNN step. This randomness ensures that the same predictions are not always produced, allowing for majority voting and thereby increasing robustness.</p><p>To aid the LLM in generating predictions, additional task-specific information is typically included in the prompt. For example, definitions of the 'Claim' and 'Premise' classes, along with their statistics in the Neo-train set, can be incorporated in the prompt (info). 
Moreover, in addition to the ACs c1, . . . , cm whose classes are to be predicted, the abstract text from which these ACs originate can be included in the prompt (abstract). According to this ICL strategy, the classes ŷ1, . . . , ŷm of c1, . . . , cm are predicted all-at-once (see Figure <ref type="figure">1</ref>). Therefore, a prompt of the form 'info + abstract + 3NN + 3Ens' (see Table <ref type="table">3</ref>) indicates that the argument components (ACs) of the abstract are predicted all-at-once, by incorporating additional information and the entire abstract text as contextual cues in the prompt, and employing the ICL strategy with 3NN-based example selection and 3-ensembling. A similar ICL strategy, where the classes ŷ1, . . . , ŷm are inferred one-by-one (i.e., each model inference leads to a single prediction ŷj), has also been considered but shown to be significantly less effective. Due to space constraints, the latter results are omitted from this work.</p></div>
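The two steps above can be sketched in plain Python. The toy vectors below stand in for the BioBERT first-sentence embeddings (the "signatures"); the function names and data layout are illustrative, not the authors' implementation.

```python
import random
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def knn_then_sample(query_sig, signatures, k, rng):
    # Step 1: rank abstracts by cosine similarity between signatures,
    # keep the 2k nearest, then draw k of them at random.
    top_2k = sorted(signatures,
                    key=lambda a: cosine(query_sig, signatures[a]),
                    reverse=True)[:2 * k]
    return rng.sample(top_2k, k)

def majority_vote(prediction_runs):
    # Step 2: component-wise majority vote over the n prediction sequences.
    return [Counter(column).most_common(1)[0][0]
            for column in zip(*prediction_runs)]

rng = random.Random(0)
signatures = {"A1": [1.0, 0.0], "A2": [0.9, 0.1],
              "A3": [0.0, 1.0], "A4": [0.8, 0.2]}
chosen = knn_then_sample([1.0, 0.0], signatures, k=1, rng=rng)

runs = [["Claim", "Premise", "Premise"],
        ["Claim", "Claim", "Premise"],
        ["Premise", "Claim", "Premise"]]
final = majority_vote(runs)  # → ["Claim", "Claim", "Premise"]
```

The random draw of k abstracts from the 2k nearest is exactly what makes the n ensembling runs differ, so the majority vote has genuine variation to aggregate over.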
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Fine-tuning</head><p>Fine-tuning (FT) refers to the process of further training a pre-trained LLM on a downstream task. Previous studies indicate that relying solely on the text of an argument component is insufficient for predicting its argumentative class; additional contextual information is essential for achieving competitive classification accuracy <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. Therefore, we propose a fine-tuning strategy that models the ATC task at the document level. Specifically, we incorporate task-specific information into each training sample and generate the class label predictions for the ACs of an abstract all-at-once.</p><p>Figure 1: 2-step ICL approach: a kNN-based example selection (k = 3, 5) step followed by an n-Ensembling (n = 3) step (cf. text for further details). For each abstract A, the class predictions ŷ1, . . . , ŷm of all of its ACs x1, . . . , xm are generated in one inference step (all-at-once modality).</p></div>
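A document-level training sample of this kind can be sketched as follows. The target mirrors the all-at-once JSON answer format shown in Appendix A; the Alpaca-style instruction/input/output field names are an assumption, not the authors' exact record format.

```python
import json

def make_ft_sample(task_info, abstract_text, acs, labels):
    # Build one document-level supervised sample: the prompt carries the
    # task information, the full abstract and all of its ACs; the target
    # lists the classes of all ACs at once (JSON, as in Appendix A).
    # Field names follow the common Alpaca-style layout (an assumption).
    arguments = "\n".join(f"Argument {i}: {ac}"
                          for i, ac in enumerate(acs, start=1))
    target = {f"Argument {i}": label
              for i, label in enumerate(labels, start=1)}
    return {
        "instruction": task_info,
        "input": f"### Abstract:\n{abstract_text}\n\n### Arguments:\n{arguments}",
        "output": json.dumps(target),
    }

sample = make_ft_sample(
    "Classify all arguments into one of two classes: Claim or Premise.",
    "A combination of mitoxantrone plus prednisone is preferable ...",
    ["A combination of mitoxantrone plus prednisone is preferable ...",
     "At 6 weeks, both groups showed improvement ..."],
    ["Claim", "Premise"],
)
```

Keeping the whole abstract in the input is what lets the fine-tuned model exploit the sequential flow of arguments rather than each AC in isolation.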
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Implementation Details</head><p>As the embedding engine, we use dmis-lab's BioBERT<ref type="foot" target="#foot_0">1</ref> . For zero-shot learning, ICL and fine-tuning, we experiment with the LLaMA-3-8B-Instruct and LLaMA-3-70B-Instruct models, as well as various GGML-quantized configurations of them <ref type="foot" target="#foot_1">2</ref> . For ICL, we set the generation temperature to 0.1. For fine-tuning, we use LoRA adapters with a loraplus_lr_ratio of 16.0, a batch size of 2, and a learning rate of 5e-5. For implementation, we use the LLaMA-Factory<ref type="foot" target="#foot_2">3</ref> framework <ref type="bibr" target="#b37">[36]</ref>. Examples of the prompts we use for zero-shot learning, in-context learning and fine-tuning with LLaMA-3 are given in Appendix A.</p></div>
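A LLaMA-Factory run with these hyperparameters could be configured roughly as below. The key names follow LLaMA-Factory's documented YAML training format, but this specific file is a hedged sketch: the dataset name and output path are placeholders, not the authors' actual files.

```yaml
# Hypothetical LLaMA-Factory LoRA config reflecting the reported settings
# (batch size 2, learning rate 5e-5, loraplus_lr_ratio 16.0).
# Dataset name and output_dir are placeholders, not the authors' files.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
loraplus_lr_ratio: 16.0
template: llama3
dataset: abstrct_atc          # placeholder dataset name
per_device_train_batch_size: 2
learning_rate: 5.0e-5
output_dir: saves/llama3-8b-atc-lora   # placeholder path
```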
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Zero-Shot Learning</head><p>The results for zero-shot learning (ZSL) on the ATC task are reported in Table <ref type="table">2</ref>. Recall that zero-shot learning corresponds to the prompting strategy where no nearest neighbors are included as demonstration examples, referred to as 'info + abstract + 0NN' in our notation. In an initial experimentation phase, we observed that adding complementary information (info) (definitions of 'Claim' and 'Premise' and dataset statistics) and including the entire text of the abstract (abstract) significantly improve the results. These expected observations serve as an ablation study and justify the use of the additional information and full abstract text (prompt template 'info + abstract') in all subsequent experiments.</p><p>In all experiments, we observed that the models consistently generated the correct number of classes for each inference task. This observation remains valid for the subsequent ICL and fine-tuning settings. It demonstrates the models' capability to understand the correspondence between the number of input ACs and the number of classes to predict.</p><p>In the training-free ZSL setting, across the Neo, Gla and Mix test sets, the performance of the LLMs correlates strongly with model complexity, achieving maximal macro F1-scores of 0.698, 0.819 and 0.725, respectively. Overall, in ZSL, the LLMs fail to achieve acceptable results. These considerations underscore the need for implementing additional learning modalities to address the ATC task effectively.</p></div>
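The C and P columns of Table 2 are per-class F1-scores, and the reported F1 is their unweighted (macro) average; a minimal sketch (the paper presumably uses a standard library implementation):

```python
def f1_per_class(y_true, y_pred, cls):
    # Standard per-class F1 computed from true/predicted label lists.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, classes=("Claim", "Premise")):
    # Macro F1: unweighted mean of the per-class F1-scores.
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)

y_true = ["Claim", "Premise", "Premise", "Claim"]
y_pred = ["Claim", "Premise", "Claim", "Claim"]
score = macro_f1(y_true, y_pred)  # (0.8 + 2/3) / 2 ≈ 0.733
```

Because the macro average weighs Claim and Premise equally despite the roughly 32/68 class imbalance, a model that defaults to Premise is penalized, which is exactly what the low ZSL scores reflect.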
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">In-Context Learning</head><p>The results for in-context learning (ICL) on the ATC task are reported in Table <ref type="table">3</ref>. First, note that the transition from zero-shot learning ('info + abstract + 0NN', Table <ref type="table">2</ref>) to in-context learning ('info + abstract + kNN', Table <ref type="table">3</ref>) drastically improves the results. This validates the effectiveness of the kNN-based example selection method.</p><p>In addition, except for the Mix test set, the 3NN strategy consistently outperforms the 5NN strategy, suggesting that three examples suffice for optimally learning the ATC task in an ICL setting. The inclusion of more demonstration examples correlates with a significant increase in prompt length, potentially hindering the performance of the LLM or exceeding the maximum size of its context. Furthermore, the ensembling strategy consistently improves the results, even if only slightly, confirming that the robustness of the results can indeed be strengthened by ensembling predictions.</p><p>Overall, the training-free ICL strategy achieves very competitive F1-scores of 0.912, 0.910, and 0.929 on the Neo, Mix, and Gla test sets, respectively. However, these results remain lower than those obtained by previous training-dependent models (see Table <ref type="table">4</ref>, upper rows).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model C P F1</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Neo test</head><p>LLaMA-3-8b-Instruct-bnb-4bit 0.529 0.539 0.534</p><p>LLaMA-3-8b-Instruct 0.544 0.558 0.551</p><p>LLaMA-3-70b-Instruct-bnb-4bit 0.642 0.753 0.698</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Gla test</head><p>LLaMA-3-8b-Instruct-bnb-4bit 0.553 0.635 0.594</p><p>LLaMA-3-8b-Instruct 0.569 0.692 0.631</p><p>LLaMA-3-70b-Instruct-bnb-4bit 0.755 0.882 0.819</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mix test</head><p>LLaMA-3-8b-Instruct-bnb-4bit 0.546 0.524 0.535</p><p>LLaMA-3-8b-Instruct 0.563 0.564 0.563</p><p>LLaMA-3-70b-Instruct-bnb-4bit 0.671 0.779 0.725</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Zero-shot results for ATC on the three test sets of the AbstRCT dataset using LLaMA-3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Fine-Tuning</head><p>The results achieved by the fine-tuning (FT) strategy on the ATC task are reported in Table <ref type="table">4</ref>. Our results show that fine-tuning significantly outperforms ICL. These findings suggest that the argumentative flow within abstracts cannot be inferred solely from the knowledge acquired during pre-training, and requires additional parameter updates to be effectively learned.</p><p>In this training-dependent context, we achieve maximal F1-scores of 0.935, 0.913, and 0.951 on the Neo, Gla, and Mix test sets, respectively, establishing new state-of-the-art results for the Neo and Mix test sets. These results suggest once again that the sequentiality of arguments inside a specific corpus requires fine-tuning to be optimally captured.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this work, we address argument type classification (ATC) in the biomedical AbstRCT dataset with openly available LLaMA-3 models from the three-fold perspective of zero-shot learning (ZSL), in-context learning (ICL) and fine-tuning (FT). We show that ZSL fails to achieve acceptable performance, ICL significantly improves the results, and FT reaches state-of-the-art performance. These results support the fact that the ATC task cannot be solved in a zero-shot setting by relying solely on the general-purpose language capabilities acquired during pre-training. Additional learning is essential, either in the form of solved demonstration examples (ICL) or via parameter updates (FT). We conjecture that the sequential flow of arguments within a text is a corpus-specific feature that cannot be inferred through zero-shot methods.</p><p>Previous works demonstrated that the text of argument components alone does not suffice to infer their argumentative roles <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b5">6]</ref>. Additional contextual, structural and syntactic features are necessary. In our ICL and FT settings, comprehensive contextual and structural information is incorporated through the task-specific information and complete abstract text provided in the prompt. This information enables the model to discern the sequence of arguments, their associated markers, and other characteristics closely associated with their argumentative roles.</p><p>For future work, the design and implementation of a full AM pipeline using LLMs represents a major milestone. In this scenario, the LLM would take raw text as input and produce a detailed map of the argumentative structure as output. We believe that LLMs will substantially transform the landscape of AM and its practical applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prompt</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Results for ATC on the three test sets of the AbstRCT dataset with LLaMA-3 models using the 2-step ICL strategy described in the text.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model Neo Gla Mix</head><p>ResAttArg (Ensemble) <ref type="bibr" target="#b28">[27]</ref> 0.879 0.877 0.897</p><p>SeqMT <ref type="bibr" target="#b29">[28]</ref> 0.919 0.924 0.922</p><p>MRC_GEN <ref type="bibr" target="#b36">[35]</ref> 0.928 0.926 0.940</p><p>GIAM <ref type="bibr" target="#b24">[25]</ref> 0.930 0.928 0.936</p><p>LLaMA-3-8B-Instruct 0.919 0.908 0.939</p><p>LLaMA-3-8B-Instruct-bnb-4bit 0.935 0.910 0.953</p><p>LLaMA-3-70B-Instruct 0.929 0.913 0.940</p><p>LLaMA-3-70B-Instruct-bnb-4bit 0.921 0.908 0.951</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Fine-tuning results for the ATC task on the three test sets of the AbstRCT dataset using LLaMA-3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Appendix</head><p>Examples of prompts for LLaMA 3 for the zero-shot learning (ZSL), in-context learning (ICL) and fine-tuning (FT) settings are provided below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1. Zero-Shot Learning</head><p>### Task description: You are an expert biomedical assistant that takes 1) an abstract text and 2) the list of all arguments from this abstract text, and must classify all arguments into one of two classes: Claim or Premise. 68.0052% of examples are of type Premise and 31.9948% of type Claim. You must absolutely not generate any text or explanation other than the following JSON format {"Argument 1": &lt;predicted class for Argument 1 (str)&gt;, ..., "Argument n": &lt;predicted class for Argument n (str)&gt;} ### Class definitions: Claim = A claim in the abstract of an RCT is a statement or conclusion about the findings of the study. Premise = A premise in the abstract of an RCT is a statement that provides an evidence or proof for a claim.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>### Abstract: Few controlled clinical trials exist to support oral combination therapy in pulmonary arterial hypertension (PAH). Patients with PAH (idiopathic [IPAH] or associated with connective tissue disease [APAH-CTD]) taking bosentan (62.5 or 125 mg twice daily at a stable dose for ≥3 months) were randomized (1:1) to sildenafil (20 mg, 3 times daily; n = 50) or placebo (n = 53). The primary endpoint was change from baseline in 6-min walk distance (6MWD) at week 12, assessed using analysis of covariance. Patients could continue in a 52-week extension study. An analysis of covariance main-effects model was used, which included categorical terms for treatment, baseline 6MWD (&lt;325 m; ≥325 m), and baseline aetiology; sensitivity analyses were subsequently performed. In sildenafil versus placebo arms, week-12 6MWD increases were similar (least squares mean difference [sildenafil-placebo], -2.4 m [90% CI: -21.8 to 17.1 m]; P = 0.6); mean ± SD changes from baseline were 26.4 ± 45.7 versus 11.8 ± 57.4 m, respectively, in IPAH (65% of population) and -18.3 ± 82.0 versus 17.5 ± 59.1 m in APAH-CTD (35% of population). One-year survival was 96%; patients maintained modest 6MWD improvements. Changes in WHO functional class and Borg dyspnoea score and incidence of clinical worsening did not differ. Headache, diarrhoea, and flushing were more common with sildenafil. Sildenafil, in addition to stable (≥3 months) bosentan therapy, had no benefit over placebo for 12-week change from baseline in 6MWD. The influence of PAH aetiology warrants future study. </p></div>
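The ZSL prompt above constrains the model to answer in a fixed JSON format. Post-processing such an answer into an ordered list of labels can be sketched as follows; this parser is illustrative, since the paper does not spell out its own post-processing.

```python
import json

def parse_predictions(raw_output, n_arguments):
    # Parse the constrained JSON answer ({"Argument 1": ..., ...}) into an
    # ordered list of class labels, validating both count and label names.
    # (Illustrative post-processing; not described in the paper.)
    data = json.loads(raw_output)
    if len(data) != n_arguments:
        raise ValueError(f"expected {n_arguments} predictions, got {len(data)}")
    preds = [data[f"Argument {i}"] for i in range(1, n_arguments + 1)]
    bad = [p for p in preds if p not in ("Claim", "Premise")]
    if bad:
        raise ValueError(f"unexpected labels: {bad}")
    return preds

labels = parse_predictions('{"Argument 1": "Claim", "Argument 2": "Premise"}', 2)
# → ["Claim", "Premise"]
```

The count check mirrors the observation in Section 4.1 that the models consistently generate one class per input AC.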
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. In-Context Learning (ICL)</head><p>### Task description: You are an expert biomedical assistant that takes 1) an abstract text, 2) the list of all arguments from this abstract text, and must classify all arguments into one of two classes: Claim or Premise. 68.0052% of examples are of type Premise and 31.9948% of type Claim. You must absolutely not generate any text or explanation other than the following JSON format {"Argument 1": &lt;predicted class for Argument 1 (str)&gt;, ..., "Argument n": &lt;predicted class for Argument n (str)&gt;} ### Class definitions: Claim = A claim in the abstract of an RCT is a statement or conclusion about the findings of the study. Premise = A premise in the abstract of an RCT is a statement that provides an evidence or proof for a claim.</p><formula xml:id="formula_2">### Examples: ## Example 1 # Abstract:</formula><p>Treatment of patients with advanced or metastatic esophagogastric adenocarcinoma should not only prolong life but also provide relief of symptoms and improve quality of life (QOL). Esophagogastric adenocarcinoma mainly occurs in elderly patients, but they are underrepresented in most clinical trials and often do not receive effective combination chemotherapy, most probably for fear of intolerance. Using validated instruments, we prospectively assessed QOL within the randomized FLOT65+ phase II trial. Within the FLOT65+ trial, a total of 143 patients aged ≥65 years were randomly allocated to receive biweekly oxaliplatin plus 5-fluorouracil (5-FU) continuous infusion and folinic acid (FLO) or the same regimen in combination with docetaxel 50 mg/m(2) (FLOT). The European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire C30 (EORTC QLQ-C30) and the gastric module STO22 were administered every 8 weeks until progression. Time to definitive deterioration of QOL parameters was analyzed and compared within the treatment arms. 
The median age of patients was 70 years. Patients receiving FLOT exhibited higher response rates and had improved disease-free and progression-free survival (PFS). The proportions of patients with evaluable baseline EORTC QLQ-C30 and STO22 questionnaires were balanced (83 % in FLOT and 89 % in FLO). Considering evaluable patients with assessable questionnaires (n = 123), neither functioning nor symptom parameters differed significantly in favor of one of the two treatment groups. Particularly, there was no significant difference regarding time to definitive deterioration of global health status/quality of life from baseline (primary endpoint). Notably, patients receiving FLO or FLOT as palliative treatment (n = 98) achieved comparable QOL results. Although toxicity was higher in patients receiving FLOT, no negative impact of the addition of docetaxel on QOL parameters could be demonstrated. Thus, elderly patients in need of intensified chemotherapy may receive FLOT without compromising patient-reported outcome parameters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head># Arguments:</head><p>Argument 1=Patients receiving FLOT exhibited higher response rates and had improved disease-free and progression-free survival (PFS). Argument 2=there was no significant difference regarding time to definitive deterioration of global health status/quality of life from baseline (primary endpoint). Argument 3=patients receiving FLO or FLOT as palliative treatment (n = 98) achieved comparable QOL results. Argument 4=Although toxicity was higher in patients receiving FLOT, Argument 5=no negative impact of the addition of docetaxel on QOL parameters could be demonstrated. Argument 6=elderly patients in need of intensified chemotherapy may receive FLOT without compromising patient-reported outcome parameters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head># Result:</head><p>{"Argument 1": "Premise", "Argument 2": "Premise", "Argument 3": "Premise", "Argument 4": "Premise", "Argument 5": "Premise", "Argument 6": "Claim"} ## Example 2 # Abstract: Chemotherapy prolongs survival and improves quality of life (QOL) for good performance status (PS) patients with advanced non-small cell lung cancer (NSCLC). Targeted therapies may improve chemotherapy effectiveness without worsening toxicity. SGN-15 is an antibody-drug conjugate (ADC), consisting of a chimeric murine monoclonal antibody recognizing the Lewis Y (Le(y)) antigen, conjugated to doxorubicin. Le(y) is an attractive target since it is expressed by most NSCLC. SGN-15 was active against Le(y)-positive tumors in early phase clinical trials and was synergistic with docetaxel in preclinical experiments. This Phase II, open-label study was conducted to confirm the activity of SGN-15 plus docetaxel in previously treated NSCLC patients. Sixty-two patients with recurrent or metastatic NSCLC expressing Le(y), one or two prior chemotherapy regimens, and PS&lt; or =2 were randomized 2:1 to receive SGN-15 200 mg/m2/week with docetaxel 35 mg/m2/week (Arm A) or docetaxel 35 mg/m2/week alone (Arm B) for 6 of 8 weeks. Intrapatient dose-escalation of SGN-15 to 350 mg/m2 was permitted in the second half of the study. Endpoints were survival, safety, efficacy, and quality of life. Forty patients on Arm A and 19 on Arm B received at least one treatment. Patients on Arms A and B had median survivals of 31.4 and 25.3 weeks, 12-month survivals of 29% and 24%, and 18-month survivals of 18% and 8%, respectively Toxicity was mild in both arms. QOL analyses favored Arm A. SGN-15 plus docetaxel is a well-tolerated and active second and third line treatment for NSCLC patients . Ongoing studies are exploring alternate schedules to maximize synergy between these agents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head># Arguments:</head><p>Argument 1=Chemotherapy prolongs survival and improves quality of life (QOL) for good performance status (PS) patients with advanced non-small cell lung cancer (NSCLC). Argument 2=Targeted therapies may improve chemotherapy effectiveness without worsening toxicity. Argument 3=Le(y) is an attractive target since it is expressed by most NSCLC. Argument 4=SGN-15 was active against Le(y)-positive tumors in early phase clinical trials and was synergistic with docetaxel in preclinical experiments. Argument 5=Patients on Arms A and B had median survivals of 31.4 and 25.3 weeks, 12-month survivals of 29% and 24%, and 18-month survivals of 18% and 8%, respectively Argument 6=Toxicity was mild in both arms. Argument 7=QOL analyses favored Arm A. Argument 8=SGN-15 plus docetaxel is a well-tolerated and active second and third line treatment for NSCLC patients # Result: {"Argument 1": "Claim", "Argument 2": "Claim", "Argument 3": "Claim", "Argument 4": "Premise", "Argument 5": "Premise", "Argument 6": "Premise", "Argument 7": "Premise", "Argument 8": "Claim"} ## Example 3 # Abstract:</p><p>The impact of treatment on health-related quality of life (HRQoL) is an important consideration in the adjuvant treatment of operable breast cancer. Here we report mature HRQoL outcomes from the ATAC trial, comparing anastrozole with tamoxifen as primary adjuvant therapy for postmenopausal women with localized breast cancer. Patients completed the Functional Assessment of Cancer Therapy-Breast (FACT-B) questionnaire plus endocrine subscale (ES) at baseline, 3 and 6 months, and every 6 months thereafter. Baseline characteristics in the HRQoL sub-protocol were well balanced between the anastrozole (n = 335) and tamoxifen (n = 347) groups in the primary analysis population. 
As with previously published results at 2 years, there was no statistically significant difference in the Trial Outcome Index of the FACT-B, the primary endpoint of the study, between treatments at 5 years. There were no statistically significant differences between treatment groups in ES total scores. Consistent with the 2-year analysis, there were differences between treatment groups in patient-reported side effects: diarrhea (anastrozole 3.1% vs. tamoxifen 1.3%), vaginal dryness (18.5% vs. 9.1%), diminished libido (34.0% vs. 26.1%), and dyspareunia (17.3% vs. 8.1%) were significantly more frequent with anastrozole compared to tamoxifen. Dizziness (3.1% vs. 5.4%) and vaginal discharge (1.2% vs. 5.2%) were significantly less frequent with anastrozole compared to tamoxifen. In this, the first report of HRQoL over 5 years of initial adjuvant therapy with an aromatase inhibitor, we conclude that anastrozole and tamoxifen had similar impacts on HRQoL, which was maintained or slightly improved during the treatment period for both groups.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head># Arguments:</head><p>Argument 1=The impact of treatment on health-related quality of life (HRQoL) is important consideration in the adjuvant treatment of operable breast cancer. Argument 2=As with previously published results at 2 years, there was no statistically significant difference in the Trial Outcome Index of the FACT-B, the primary endpoint of the study, between treatments at 5 years. Argument 3=There were no statistically significant differences between treatment groups in ES total scores. Argument 4=there were differences between treatment groups in patient-reported side effects: Argument 5=diarrhea (anastrozole 3.1% vs. tamoxifen 1.3%), vaginal dryness (18.5% vs. 9.1%), diminished libido (34.0% vs. 26.1%), and dyspareunia (17.3% vs. 8.1%) were significantly more frequent with anastrozole compared to tamoxifen. Argument 6=Dizziness (3.1% vs. 5.4%) and vaginal discharge (1.2% vs. 5.2%) were significantly less frequent with anastrozole compared to tamoxifen. Argument 7=In this, the first report of HRQoL over 5 years of initial adjuvant therapy with an aromatase inhibitor, we conclude that anastrozole and tamoxifen had similar impacts on HRQoL, which was maintained or slightly improved during the treatment period for both groups.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head># Result:</head><p>{"Argument 1": "Claim", "Argument 2": "Premise", "Argument 3": "Premise", "Argument 4": "Claim", "Argument 5": "Premise", "Argument 6": "Premise", "Argument 7": "Claim"} # Abstract: Few controlled clinical trials exist to support oral combination therapy in pulmonary arterial hypertension (PAH). Patients with PAH (idiopathic [IPAH] or associated with connective tissue disease [APAH-CTD]) taking bosentan (62.5 or 125 mg twice daily at a stable dose for ≥3 months) were randomized (1:1) to sildenafil (20 mg, 3 times daily; n = 50) or placebo (n = 53). The primary endpoint was change from baseline in 6-min walk distance (6MWD) at week 12, assessed using analysis of covariance. Patients could continue in a 52-week extension study. An analysis of covariance main-effects model was used, which included categorical terms for treatment, baseline 6MWD (&lt;325 m; ≥325 m), and baseline aetiology; sensitivity analyses were subsequently performed. In sildenafil versus placebo arms, week-12 6MWD increases were similar (least squares mean difference [sildenafil-placebo], -2.4 m [90% CI: -21.8 to 17.1 m]; P = 0.6); mean ± SD changes from baseline were 26.4 ± 45.7 versus 11.8 ± 57.4 m, respectively, in IPAH (65% of population) and -18.3 ± 82.0 versus 17.5 ± 59.1 m in APAH-CTD (35% of population). One-year survival was 96%; patients maintained modest 6MWD improvements. Changes in WHO functional class and Borg dyspnoea score and incidence of clinical worsening did not differ. Headache, diarrhoea, and flushing were more common with sildenafil. Sildenafil, in addition to stable (≥3 months) bosentan therapy, had no benefit over placebo for 12-week change from baseline in 6MWD. The influence of PAH aetiology warrants future study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head># Arguments:</head><p>Argument 1=In sildenafil versus placebo arms, week-12 6MWD increases were similar (least squares mean difference [sildenafil-placebo], -2.4 m [90% CI: -21.8 to 17.1 m]; P = 0.6); mean ± SD changes from baseline were 26.4 ± 45.7 versus 11.8 ± 57.4 m, respectively, in IPAH (65% of population) and -18.3 ± 82.0 versus 17.5 ± 59.1 m in APAH-CTD (35% of population). Argument 2=Changes in WHO functional class and Borg dyspnoea score and incidence of clinical worsening did not differ. Argument 3=Headache, diarrhoea, and flushing were more common with sildenafil. Argument 4=Sildenafil, in addition to stable (≥3 months) bosentan therapy, had no benefit over placebo for 12-week change from baseline in 6MWD. # Result:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3. Fine-Tuning (FT)</head><p>### You are an expert in medical analysis. You are given the abstract of a random controlled trial which contains numbered argument components enclosed by &lt;AC&gt;&lt;/AC&gt; tags. Your task is to classify each argument components in the essay as either "Claim" or "Premise". You must return a list of argument component types in following JSON format: "component_types": [component_type (str), component_type (str), ..., component_type (str)] ### Here is the abstract text: An open, randomized study was performed to assess the effects of supportive pamidronate treatment on morbidity from bone metastases in breast cancer patients. Eighty-one pamidronate patients and 80 control patients were monitored for a median of 18 and 21 months, respectively, for events of skeletal morbidity and the radiologic course of metastatic bone disease. The oral pamidronate dose was 600 mg/d (high dose [HD]) during the earliest study years, then changed to 300 mg/d (low dose [LD]) because of gastrointestinal toxicity. Twenty-nine of 81 pamidronate (HD/LD) patients first received 600 mg/d and were then changed to 300 mg/d; 52 of 81 pamidronate LD patients received 300 mg/d throughout the study. Tumor treatment was unrestricted. An overall intent-to-treat analysis was performed.&lt;AC&gt; In the pamidronate group, the occurrence of hypercalcemia, severe bone pain, and symptomatic impending fractures decreased by 65%, 30%, and 50%, respectively; event-rates of systemic treatment and radiotherapy decreased by 35% (P &lt; or = .02). &lt;/AC&gt;&lt;AC&gt; The event-free period (EFP), radiologic course of disease, and survival did not improve. &lt;/AC&gt;&lt;AC&gt; Subgroup analyses suggested a dose-dependent treatment effect. &lt;/AC&gt;&lt;AC&gt; Compared with their controls, in pamidronate HD/LD patients, events occurred 60% to 90% less frequently (P &lt; or = .03) and the EFP was prolonged (P = .002). 
&lt;/AC&gt;&lt;AC&gt; In pamidronate LD patients, event-rates decreased by 15% to 45% (P &lt; or = .04). &lt;/AC&gt;&lt;AC&gt; Gastrointestinal toxicity of pamidronate caused a 23% drop-out rate, &lt;/AC&gt;&lt;AC&gt; but other cancer-associated factors seemed to contribute to this toxicity. &lt;/AC&gt;&lt;AC&gt; Pamidronate treatment of breast cancer patients efficaciously reduced skeletal morbidity. &lt;/AC&gt;&lt;AC&gt; The effect appeared to be dose-dependent. &lt;/AC&gt;&lt;AC&gt; Further research on dose and mode of treatment is mandatory. &lt;/AC&gt; {"component_types": ["Premise", "Premise", "Claim", "Premise", "Premise", "Premise", "Claim", "Claim", "Claim", "Claim"]}</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>#</head><label></label><figDesc>## Arguments: Argument 1=In sildenafil versus placebo arms, week-12 6MWD increases were similar (least squares mean difference [sildenafil-placebo], -2.4 m [90% CI: -21.8 to 17.1 m]; P = 0.6); mean ± SD changes from baseline were 26.4 ± 45.7 versus 11.8 ± 57.4 m, respectively, in IPAH (65% of population) and -18.3 ± 82.0 versus 17.5 ± 59.1 m in APAH-CTD (35% of population). Argument 2=Changes in WHO functional class and Borg dyspnoea score and incidence of clinical worsening did not differ. Argument 3=Headache, diarrhoea, and flushing were more common with sildenafil. Argument 4=Sildenafil, in addition to stable (≥3 months) bosentan therapy, had no benefit over placebo for 12-week change from baseline in 6MWD. ### Result:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table</head><label></label><figDesc></figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1</head><label>1</label><figDesc>. The argument type classification (ATC) task consists of predicting the type of each argument component (AC) as 'Major Claim', 'Claim' or 'Premise'. Following previous approaches, we combine the 'Major Claim' and 'Claim' classes into a single class 'Claim'. AbstRCT dataset statistics.</figDesc><table><row><cell cols="2">Dataset Split Abstracts</cell><cell>ACs</cell></row><row><cell>Neo-train</cell><cell>350</cell><cell>2,291</cell></row><row><cell>Neo-test</cell><cell>100</cell><cell>691</cell></row><row><cell>Gla-test</cell><cell>100</cell><cell>615</cell></row><row><cell>Mix-test</cell><cell>100</cell><cell>609</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>3.2. Zero-Shot Learning (ZSL) and In-Context Learning (ICL)</head><label></label><figDesc></figDesc><table><row><cell>Zero-shot learning (ZSL) is the paradigm where the LLM</cell></row><row><cell>is asked to solve a downstream task without receiving</cell></row><row><cell>any specific solved examples in the prompt. By contrast,</cell></row><row><cell>in-context learning (ICL) refers to the emergent ability of</cell></row><row><cell>LLMs to solve a downstream task based on a few demon-</cell></row><row><cell>stration examples given in the prompt as contextual in-</cell></row><row><cell>formation [8]. As the major advantage, ZSL and ICL</cell></row><row><cell>paradigms do not require any fine-tuning of the model's</cell></row><row><cell>parameters (i.e. training-free framework).</cell></row><row><cell>Formally, let x be a query input text and C =</cell></row><row><cell>[I; t(xi 1 , yi 1 ); . . . ; t(xi k , yi k )] be a context composed</cell></row><row><cell>of instructions I concatenated with input-output pairs</cell></row><row><cell>(xj, yi j</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/dmis-lab</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/ggerganov/ggml</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/hiyouga/LLaMA-Factory</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work benefited from access to the computing resources of the L3i laboratory, operated and hosted by the University of La Rochelle. It is financed by the French government and the Region Nouvelle-Acquitaine. This research also benefited from institutional support RVO: 67985807 and partially supported by the grant of the Czech Science Foundation No. GA22-02067S. Finally, we are grateful to Playtika Ltd. for their support for this research.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Argumentation mining: The detection, classification and structure of arguments in text</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Palau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-F</forename><surname>Moens</surname></persName>
		</author>
		<idno type="DOI">10.1145/1568234.1568246</idno>
		<idno>doi:10.1145/1568234.1568246</idno>
		<ptr target="https://doi.org/10.1145/1568234.1568246" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICAIL 2019, ICAIL &apos;09</title>
				<meeting>ICAIL 2019, ICAIL &apos;09<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="98" to="107" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Parsing argumentation structures in persuasive essays</title>
		<author>
			<persName><forename type="first">C</forename><surname>Stab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.1162/COLI_a_00295</idno>
		<ptr target="https://aclanthology.org/J17-3005.doi:10.1162/COLI_a_00295" />
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="619" to="659" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Here&apos;s my point: Joint pointer architecture for argument mining</title>
		<author>
			<persName><forename type="first">P</forename><surname>Potash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Romanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rumshisky</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/D17-1143</idno>
		<ptr target="https://doi.org/10.18653/v1/d17-1143.doi:10.18653/V1/D17-1143" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of EMNLP 2017, ACL</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">P</forename></persName>
		</editor>
		<meeting>EMNLP 2017, ACL</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1364" to="1373" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An empirical study of span representations in argumentation structure parsing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kuribayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ouchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Inoue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Reisert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miyoshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Suzuki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1464</idno>
		<ptr target="https://aclanthology.org/P19-1464.doi:10.18653/v1/P19-1464" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2019, ACL</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">K</forename></persName>
		</editor>
		<meeting>ACL 2019, ACL<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4691" to="4698" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Argument classification with BERT plus contextual, structural and syntactic features as text</title>
		<author>
			<persName><forename type="first">U</forename><surname>Mushtaq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cabessa</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-99-1639-9_52</idno>
		<ptr target="https://doi.org/10.1007/978-981-99-1639-9_52.doi:10.1007/978-981-99-1639-9\_52" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICONIP 2022</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">T</forename></persName>
		</editor>
		<meeting>ICONIP 2022</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">1791</biblScope>
			<biblScope unit="page" from="622" to="633" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Argument mining with modular BERT and transfer learning</title>
		<author>
			<persName><forename type="first">U</forename><surname>Mushtaq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cabessa</surname></persName>
		</author>
		<idno type="DOI">10.1109/IJCNN54540.2023.10191968</idno>
		<ptr target="https://doi.org/10.1109/IJCNN54540.2023.10191968.doi:10.1109/IJCNN54540.2023.10191968" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of IJCNN 2023</title>
				<meeting>IJCNN 2023</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">A survey of large language models</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wen</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2303.18223</idno>
		<idno type="arXiv">arXiv:2303.18223</idno>
		<ptr target="/ARXIV.2303.18223" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">A survey on in-context learning</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Sui</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2301.00234</idno>
		<idno type="arXiv">arXiv:2301.00234</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2301.00234.doi:10.48550/ARXIV.2301.00234" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Can generalist foundation models outcompete special-purpose tuning? case study in medicine</title>
		<author>
			<persName><forename type="first">H</forename><surname>Nori</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2311.16452</idno>
		<idno type="arXiv">arXiv:2311.16452</idno>
		<ptr target="/ARXIV.2311.16452" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Chain-of-thought prompting elicits reasoning in large language models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ichter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of NeurIPS 2022</title>
				<editor>
			<persName><forename type="first">S</forename><forename type="middle">K</forename></persName>
		</editor>
		<meeting>NeurIPS 2022</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="24824" to="24837" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2309.11911</idno>
		<idno type="arXiv">arXiv:2309.11911</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2309.11911" />
		<title level="m">InstructERC: Reforming emotion recognition in conversation with a retrieval multi-task LLMs framework</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.11171</idno>
		<title level="m">Self-consistency improves chain of thought reasoning in language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Capabilities of GPT-4 on medical challenge problems</title>
		<author>
			<persName><forename type="first">H</forename><surname>Nori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>King</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mckinney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Carignan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Horvitz</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.2303.13375</idno>
		<idno type="arXiv">arXiv:2303.13375</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2303.13375" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mayer</surname></persName>
		</author>
		<ptr target="https://theses.hal.science/tel-03209489" />
		<title level="m">Argument Mining on Clinical Trials</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
		<respStmt>
			<orgName>Université Côte d&apos;Azur</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Argumentation mining</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mochales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moens</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10506-010-9104-x</idno>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence and Law</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="1" to="22" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Argumentation mining in user-generated web discourse</title>
		<author>
			<persName><forename type="first">I</forename><surname>Habernal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.1162/COLI_a_00276</idno>
		<ptr target="https://aclanthology.org/J17-1004" />
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="125" to="179" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Context dependent claim detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bilu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hershcovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Aharoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Slonim</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:18847466" />
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>ICCL</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Neural end-to-end learning for computational argumentation mining</title>
		<author>
			<persName><forename type="first">S</forename><surname>Eger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Daxenberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P17-1002</idno>
		<ptr target="https://aclanthology.org/P17-1002" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2017, ACL</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</editor>
		<meeting>ACL 2017, ACL<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="11" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Argument mining with structured SVMs and RNNs</title>
		<author>
			<persName><forename type="first">V</forename><surname>Niculae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cardie</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P17-1091</idno>
		<ptr target="https://aclanthology.org/P17-1091" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2017, ACL</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Barzilay</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M.-Y</forename><surname>Kan</surname></persName>
		</editor>
		<meeting>ACL 2017, ACL<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="985" to="995" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">BERT: pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/N19-1423</idno>
		<ptr target="https://doi.org/10.18653/v1/n19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT 2019, ACL</title>
		<meeting>NAACL-HLT 2019, ACL</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Enhancing legal argument mining with domain pre-training and neural networks</title>
		<author>
			<persName><forename type="first">G</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nulty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lillis</surname></persName>
		</author>
		<idno>CoRR abs/2202.13457</idno>
		<ptr target="https://arxiv.org/abs/2202.13457" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Argumentation mining on essays at multi scales</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hong</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.coling-main.478</idno>
		<ptr target="https://aclanthology.org/2020.coling-main.478" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2020</title>
		<meeting>COLING 2020<address><addrLine>Barcelona, Spain (Online</addrLine></address></meeting>
		<imprint>
			<publisher>ICCL</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5480" to="5493" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Linguistic feature injection for efficient natural language processing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Fioravanti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zugarini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Giannini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rigutini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maggini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Diligenti</surname></persName>
		</author>
		<idno type="DOI">10.1109/IJCNN54540.2023.10191680</idno>
		<ptr target="https://doi.org/10.1109/IJCNN54540.2023.10191680" />
	</analytic>
	<monogr>
		<title level="m">IJCNN 2023</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2023">June 18-23, 2023</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A neural transition-based model for argumentation mining</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-long.497</idno>
		<ptr target="https://aclanthology.org/2021.acl-long.497" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the ACL and the 11th International Joint Conference on Natural Language Processing</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Zong</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</editor>
		<meeting>the 59th Annual Meeting of the ACL and the 11th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="6354" to="6364" />
		</imprint>
	</monogr>
	<note>: Long Papers), ACL</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Schlegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>Batista-Navarro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ananiadou</surname></persName>
		</author>
		<title level="a" type="main">Global information-aware argument mining based on a top-down multi-turn QA model</title>
		<idno type="DOI">10.1016/j.ipm.2023.103445</idno>
		<ptr target="https://doi.org/10.1016/j.ipm.2023.103445" />
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page">103445</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Argumentative link prediction using residual networks and multiobjective learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Galassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lippi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Torroni</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/W18-5201</idno>
		<ptr target="https://aclanthology.org/W18-5201" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Workshop on Argument Mining, ACL</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Slonim</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Aharonov</surname></persName>
		</editor>
		<meeting>the 5th Workshop on Argument Mining, ACL<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Multi-task attentive residual networks for argument mining</title>
		<author>
			<persName><forename type="first">A</forename><surname>Galassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lippi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Torroni</surname></persName>
		</author>
		<idno type="DOI">10.1109/TASLP.2023.3275040</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE/ACM Transactions on Audio, Speech, and Language Processing</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="1877" to="1892" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Biomedical argument mining based on sequential multi-task learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Si</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1109/TCBB.2022.3173447</idno>
		<ptr target="https://doi.org/10.1109/TCBB.2022.3173447" />
	</analytic>
	<monogr>
		<title level="j">IEEE/ACM Trans. Comput. Biol. Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="864" to="874" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Transformer-based argument mining for healthcare applications</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cabrio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Villata</surname></persName>
		</author>
		<idno type="DOI">10.3233/FAIA200334</idno>
		<ptr target="https://doi.org/10.3233/FAIA200334" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ECAI 2020</title>
		<meeting>ECAI 2020</meeting>
		<imprint>
			<publisher>IOS Press</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">325</biblScope>
			<biblScope unit="page" from="2108" to="2115" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Acta 2.0: A modular architecture for multi-layer argumentative analysis of clinical trials</title>
		<author>
			<persName><forename type="first">B</forename><surname>Molinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cabrio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Villata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mayer</surname></persName>
		</author>
		<idno type="DOI">10.24963/ijcai.2022/859</idno>
		<ptr target="https://doi.org/10.24963/ijcai.2022/859" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of IJCAI-22, International Joint Conferences on Artificial Intelligence Organization</title>
				<editor>
			<persName><forename type="first">L</forename><forename type="middle">D</forename><surname>Raedt</surname></persName>
		</editor>
		<meeting>IJCAI-22, International Joint Conferences on Artificial Intelligence Organization</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="5940" to="5943" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Enhancing evidence-based medicine with natural language argumentative analysis of clinical trials</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cabrio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Villata</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.artmed.2021.102098</idno>
		<ptr target="https://doi.org/10.1016/j.artmed.2021.102098" />
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence in Medicine</title>
		<imprint>
			<biblScope unit="volume">118</biblScope>
			<biblScope unit="page">102098</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Will it blend? mixing training paradigms &amp; prompting for argument quality prediction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Van Der Meer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reuver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Khurana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Krause</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">B</forename><surname>Santamaría</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.argmining-1.8" />
	</analytic>
	<monogr>
		<title level="m">ArgMining@COLING 2022</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Lapesa</surname></persName>
		</editor>
		<imprint>
			<publisher>ICCL</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="95" to="103" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Argument mining from podcasts using ChatGPT</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pojoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dumani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schenkel</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3438/paper_10.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICCBR-WS 2023</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Malburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Verma</surname></persName>
		</editor>
		<meeting>ICCBR-WS 2023</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3438</biblScope>
			<biblScope unit="page" from="129" to="144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Performance analysis of large language models in the domain of legal argument mining</title>
		<author>
			<persName><forename type="first">A</forename><surname>Al Zubaer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Granitzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mitrović</surname></persName>
		</author>
		<idno type="DOI">10.3389/frai.2023.1278796</idno>
		<ptr target="https://www.frontiersin.org/articles/10.3389/frai.2023.1278796" />
	</analytic>
	<monogr>
		<title level="j">Frontiers in Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Argument mining as a multi-hop generative machine reading comprehension task</title>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Schlegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Batista-Navarro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ananiadou</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=KTFxOnrbvu" />
	</analytic>
	<monogr>
		<title level="m">The 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">LlamaFactory: Unified efficient fine-tuning of 100+ language models</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ma</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2403.13372" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 62nd Annual Meeting of the ACL (Volume 3: System Demonstrations), ACL</title>
				<meeting>the 62nd Annual Meeting of the ACL (Volume 3: System Demonstrations), ACL<address><addrLine>Bangkok, Thailand</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
