<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using Parameter Efficient Fine-Tuning on Legal Artificial Intelligence</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Kuo-Chun</forename><surname>Chien</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National Central University</orgName>
								<address>
									<addrLine>No. 300, Zhongda Rd., Zhongli District</addrLine>
									<postCode>320317</postCode>
									<settlement>Taoyuan City</settlement>
									<country key="TW">Taiwan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chia-Hui</forename><surname>Chang</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National Central University</orgName>
								<address>
									<addrLine>No. 300, Zhongda Rd., Zhongli District</addrLine>
									<postCode>320317</postCode>
									<settlement>Taoyuan City</settlement>
									<country key="TW">Taiwan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ren-Der</forename><surname>Sun</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National Central University</orgName>
								<address>
									<addrLine>No. 300, Zhongda Rd., Zhongli District</addrLine>
									<postCode>320317</postCode>
									<settlement>Taoyuan City</settlement>
									<country key="TW">Taiwan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<address>
									<postCode>2023</postCode>
									<settlement>Sherbrooke</settlement>
									<region>Québec</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using Parameter Efficient Fine-Tuning on Legal Artificial Intelligence</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">42B7A3EEA65C0F5FC07646E6384B604C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:26+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Legal AI</term>
					<term>Legal Judgment Prediction</term>
					<term>Parameter-Efficient Fine-Tuning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Legal AI has a wide range of applications, such as predicting whether a prosecution will result in punishment, or whether the punishment will be a prison sentence or a fine. However, recent advances in natural language processing have produced an ever-increasing number of language models, and the cost of fine-tuning pre-trained language models and storing the fine-tuned copies grows increasingly expensive. To address this issue, we adopt the concept of Parameter-Efficient Fine-Tuning (PEFT) and apply it to the field of Legal AI. By leveraging PEFT techniques, particularly the Low-Rank Adaptation (LoRA) architecture, we achieve promising results in fine-tuning pre-trained language models: comparable, if not superior, performance with a significant reduction in the time required for model adjustment. This demonstrates the potential of PEFT techniques for adapting language models to different legal frameworks, enhancing the accuracy and relevance of legal knowledge services, and making Legal AI more accessible to individuals without legal backgrounds.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Legal AI refers to the utilization of artificial intelligence (AI) technology in the legal sector. It is an expanding field that harnesses sophisticated algorithms and machine learning techniques to assist in the organization, analysis, and interpretation of extensive legal documentation. Applications of legal AI encompass various areas, including case management <ref type="bibr" target="#b0">[1]</ref>, legal judgment prediction (LJP) <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, court views generation <ref type="bibr" target="#b3">[4]</ref>, among others. From overseeing compliance to managing legal risks, from streamlining contract management to conducting due diligence, AI technology can automate and enhance the legal workflow, leading to improved efficiency, accuracy, and convenience for legal professionals. Ultimately, the implementation of legal AI has the potential to revolutionize the legal industry, making legal services more accessible and cost-effective for individuals and businesses alike.</p><p>Legal cases typically fall into two main categories: civil law and criminal law. Since gathering facts and evidence for civil cases can be challenging <ref type="bibr" target="#b4">[5]</ref>, most research efforts in LJP have primarily concentrated on criminal cases <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>, utilizing verdicts as the primary dataset for predicting potential legal articles, charges, and terms based on given factual information. However, in the field of Legal Judgment Prediction (LJP) in criminal cases, there are not only verdicts but also indictments and various prediction tasks. 
For example, prosecutors may want to know whether the case ultimately went to trial according to the legal provisions and charges in the indictment; if it went to trial, whether the punishment resulted in jail time or a fine; and if it was dismissed, whether the dismissal was due to immunity from prosecution or a finding of not guilty.</p><p>In recent years, significant progress has also been made on many legal tasks based on pre-trained models, including accusation prediction <ref type="bibr" target="#b8">[9]</ref>, prison term classification <ref type="bibr" target="#b1">[2]</ref>, criminal element extraction <ref type="bibr" target="#b2">[3]</ref>, and court view generation <ref type="bibr" target="#b3">[4]</ref>. However, current advances in natural language processing have resulted in an ever-increasing number of language models, and the cost of fine-tuning a pre-trained language model for each LJP task and storing the fine-tuned models grows increasingly expensive. Training a separate large language model for every sub-task would consume excessive time and resources. This highlights the need for adaptive methods such as Parameter-Efficient Fine-Tuning (PEFT), which selectively updates or adds a small number of parameters to train the model for new tasks.</p><p>In this study, we propose the use of PEFT to fine-tune pre-trained language models. Specifically, we adopt Low-Rank Adaptation of Large Language Models (LoRA) <ref type="bibr" target="#b9">[10]</ref> as an implementation of PEFT, which reduces computational resources and fine-tuning time while maintaining or surpassing model performance. This makes it particularly valuable for refining large models with billions of parameters.</p><p>The rest of the paper is organized as follows: Section 2 introduces related work on Legal AI, LJP, and Parameter-Efficient Fine-Tuning (PEFT). The problem definition and dataset construction are detailed in Section 3. Section 4 explains PEFT. 
We report the experimental results in Section 5. Finally, Section 6 concludes the paper and suggests directions for future research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Legal AI</head><p>Legal artificial intelligence (LegalAI) has drawn increasing attention from NLP researchers because of the vast amount of legal documents. Zhong et al. <ref type="bibr" target="#b10">[11]</ref> surveyed research on LegalAI and categorized its applications into three types: legal judgment prediction (LJP), similar case matching, and legal question answering.</p><p>Among them, legal judgment prediction has been widely studied for decades, and several related LJP datasets exist, such as CAIL <ref type="bibr" target="#b1">[2]</ref>, CAIL-Long <ref type="bibr" target="#b11">[12]</ref>, and ECHR <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>. CAIL is the first Chinese legal judgment prediction dataset, collecting criminal cases from the Supreme People's Court of China. CAIL-Long collects additional cases from the Supreme People's Court of China, covering both civil and criminal cases. ECHR <ref type="bibr" target="#b13">[14]</ref> is an English legal judgment prediction dataset collected from the European Court of Human Rights, containing cases in which a state allegedly breached human rights provisions of the European Convention on Human Rights.</p><p>LegalAI research methods can be divided into symbol-based methods and embedding-based methods <ref type="bibr" target="#b10">[11]</ref>. In the past, researchers used traditional machine learning methods for feature extraction, attempting to extract or create specific features from the description of criminal facts, using additional labeling to help characterize the crime. For example, Hu et al. <ref type="bibr" target="#b5">[6]</ref> combined ten discriminative legal features to help predict low-frequency charges. Shaikh et al. 
<ref type="bibr" target="#b14">[15]</ref> identified and extracted 19 features of murder-related criminal cases to train a binary classifier to judge guilt. However, these features are difficult to apply to large-scale datasets <ref type="bibr" target="#b15">[16]</ref> because fact descriptions are expressed in different ways and some of these features require additional labels.</p><p>To address these scaling issues, researchers have attempted to incorporate legal knowledge into neural networks via automatic learning. For example, Luo et al. <ref type="bibr" target="#b16">[17]</ref> adopted a two-step approach that filters out irrelevant law articles and retains the top k, scaling to a large number of law articles. They built a binary classifier for each article focusing on its relevance to the input case. The advantage of such an approach is that new articles can be added while leaving the existing classifiers untouched. Similarly, Bao et al. <ref type="bibr" target="#b8">[9]</ref> proposed an attention neural network, LegalAtt, which uses relevant articles to improve the performance and interpretability of the charge prediction task. Gan et al. <ref type="bibr" target="#b17">[18]</ref> injected legal knowledge in the form of a set of first-order logic rules and integrated these rules into a co-attention network-based model, making the prediction more interpretable for civil loan cases. Kang et al. <ref type="bibr" target="#b15">[16]</ref> constructed auxiliary fact representations from the definitions of behavioral reasons to enhance fact descriptions. Lyu et al. 
<ref type="bibr" target="#b2">[3]</ref> introduced four types of criminal elements as bridges between the fact description and the article, and used reinforcement learning to jointly identify similar articles and confusing fact descriptions in the legal judgment prediction task.</p><p>Multi-task learning is a machine learning approach that trains multiple related tasks simultaneously, improving the performance of each task. It can use a shared layer to extract common features for all tasks and then use specialized layers to handle the details of each task, or use separate layers to extract features for each task and constrain the differences between the parameters of those layers. Zhong et al. <ref type="bibr" target="#b6">[7]</ref> proposed the TopJudge model, which uses a topological graph to exploit the relationships among legal judgments, predicting articles, charges, and terms. Yang et al. <ref type="bibr" target="#b18">[19]</ref> proposed a multi-layer forward prediction and backward validation framework to effectively utilize the dependency relationships between multiple sub-tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Parameter-Efficient Fine-Tuning</head><p>It has been shown that it is feasible to update or add only a very small number of parameters, as opposed to updating all parameters of the pre-trained model as in ordinary fine-tuning. The addition of adapters, tiny trainable feed-forward networks inserted between the layers of the fixed pre-trained model, was suggested by Houlsby et al. <ref type="bibr" target="#b19">[20]</ref> (see Figure <ref type="figure" target="#fig_0">1</ref>). Since then, a wide range of advanced PEFT techniques have been put forth, e.g., low-rank adaptation by Hu et al. <ref type="bibr" target="#b9">[10]</ref> and prefix-tuning by <ref type="bibr" target="#b20">[21]</ref>. Houlsby et al. (<ref type="formula">2019</ref>) place two adapters sequentially within each transformer layer, so typical adapters perform sequential computation. In contrast, prefix-tuning and LoRA can be thought of as computation "parallel" to the PLM layer. A unified view of parameter-efficient transfer learning was proposed by He et al. <ref type="bibr" target="#b21">[22]</ref>. </p></div>
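The adapter mechanism described above can be sketched as a small bottleneck feed-forward network with a residual connection. This is a minimal illustration, not the implementation of Houlsby et al.; the hidden size and bottleneck dimension below are assumed values.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a frozen transformer sub-layer.

    Only the two small projections are trained; the surrounding
    pre-trained weights stay frozen. Sizes are illustrative.
    """
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen layer's output
        return x + self.up(self.act(self.down(x)))
```

Because of the residual connection, initializing the adapter near zero leaves the pre-trained model's behavior almost unchanged at the start of training.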
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Problem Formulation and Dataset Construction</head><p>A criminal proceeding consists of four steps: investigation, prosecution, trial, and execution. Among these, the public is most interested in the investigation and trial steps. The investigation procedure refers to the process in which law enforcement agents look into potential criminal events and gather evidence under the direction of the prosecutor. The prosecution files charges and begins the trial process if it believes the defendant is strongly suspected of committing a crime. An impartial, unbiased judge oversees the trial process and determines whether the defendant actually committed a crime based on the evidence presented by the prosecutor. Today, judgment documents are the data source in the majority of publicly accessible datasets for LJP research. However, the language employed in judgment documents is frequently more eloquent, and the content concentrates on the facts and procedures, leading to greater document lengths and making the documents harder to comprehend for readers without legal training. In contrast, prosecutors, drawing on their involvement in the investigation, describe the criminal facts in the indictment with language that is shorter and closer to that of the general public. Hence, rather than judgment documents, we employ indictments as our data source.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset Construction</head><p>We collected indictments from the public document inquiry system of the Ministry of Justice of Taiwan from June 15, 2018 to June 30, 2021. The defendant, charges, criminal facts, and legal provisions were extracted from the indictments using regular expressions, and the material was then organized into JSON format. The original dataset contained 355,295 cases covering 533 articles under 41 laws and 183 charges.</p><p>A recurring issue in creating an LJP dataset is how many articles and charges to include in the prediction model. To keep the experiment fair and avoid classes with insufficient training or testing data, which could affect the experimental outcomes, we screened out charges and articles with too few instances (e.g., fewer than 30 cases). Furthermore, the first 100 articles of Taiwan's Criminal Code contain definitions of terms such as attempted offenses and criminal responsibility; we did not include these articles in our dataset because they do not specify actual penalties. After excluding the above cases, the total number of articles decreased significantly to 165, and the number of charges decreased from 183 to 94. A total of 12,541 cases were removed, accounting for 3.5% of the total dataset. 
It is worth noting that a case may violate more than one charge, but often only the primary</p><formula xml:id="formula_0">Facts ⋯知悉將帳戶存簿、金融卡及密碼交 付他人使用，恐為不法者充作詐騙被 害人匯入款項之犯罪工具，亦不違背 其本意之洗錢及幫助詐欺取財之犯意， 將其之存摺及提款卡等資料，並提供 提款卡密碼，以寄送包裹之方式，租借 寄予詐欺集團成員，容任該人及其所 屬之詐騙集團持以犯罪使用。⋯</formula><p>⋯Knowing that handing over account passbooks, financial cards, and passwords to others may serve as tools for criminals to commit fraud by transferring funds, and also not deviating from their intention of money laundering and aiding in fraudulent schemes, providing the passbooks, withdrawal cards, and supplying the PIN codes through parcel delivery to members of a fraudulent group enables that person and their affiliated fraudulent organization to utilize them for criminal purposes. ⋯ An example indictment document of a criminal case (original Chinese text and its English translation).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Laws</head><p>We have highlighted the criminal intent and the article in blue and green, respectively.</p><p>charge is listed in the indictment. Thus, charges are more difficult to predict than articles (even though the number of articles in our dataset is greater than the number of charges). As one might anticipate, the distribution of instances in this dataset is unequal: the top 10 charges in the indictments make up about 85% of all cases, while the lowest 10 charges cover just 0.14% of the instances. To split the data fairly, we divided the cases into categories based on the charges in the indictment, using 80% of the instances in each category as training data, 10% as validation data, and the final 10% as testing data. Lastly, we created a dataset called TWLJP 1 (TaiWan Legal Judgment Prediction Datasets) by combining the data from all categories into training, validation, and testing sets. Table <ref type="table" target="#tab_0">1</ref> displays an example of an indictment, with the criminal intent and articles marked in blue and green, respectively. In this instance, a suspect gave fraudsters access to his bank account, and the group tricked the victim into wiring money to the account before withdrawing it. The defendant allegedly violated both the Money Laundering Control Act and the Criminal Code; however, the indictment listed only fraud as the charge.</p><p>Dataset: TWLJP; # cases: 342,754; # laws: 33; # articles: 165; # charges: 94; avg. length: 376.31; avg. articles per case: 1.16</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Statistics of the TWLJP dataset</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Problem Formulation</head><p>Let 𝐷 = (𝑑 1 , 𝑑 2 , ⋯ , 𝑑 𝑛 ) denote a dataset with 𝑛 cases, where each case 𝑑 𝑖 is described by a sequence of 𝑚 words 𝑑 𝑖 = (𝑤 𝑖 1 , 𝑤 𝑖 2 , ⋯ , 𝑤 𝑖 𝑚 ). Each case is associated with three label vectors: a law vector 𝑙 𝑖 = (𝑙 𝑖 1 , 𝑙 𝑖 2 , ⋯ , 𝑙 𝑖 𝑝 ) in 𝑅 𝑝 , a charge vector 𝑐 𝑖 = (𝑐 𝑖 1 , 𝑐 𝑖 2 , ⋯ , 𝑐 𝑖 𝑞 ) in 𝑅 𝑞 , and an article vector 𝑎 𝑖 = (𝑎 𝑖 1 , 𝑎 𝑖 2 , ⋯ , 𝑎 𝑖 𝑟 ) in 𝑅 𝑟 , where 𝑝, 𝑞, and 𝑟 denote the numbers of laws, charges, and articles, respectively, and 𝑙 𝑖 𝑗 , 𝑐 𝑖 𝑗 , 𝑎 𝑖 𝑗 in {0, 1}. The law vector 𝑙 𝑖 and the charge vector 𝑐 𝑖 are one-hot, while the article vector 𝑎 𝑖 is multi-hot, since a case may cite several articles.</p></div>
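The label encoding above can be sketched as follows; the index values are illustrative, and the dimensions use TWLJP's sizes (33 laws, 94 charges, 165 articles) as an example.

```python
import numpy as np

def encode_case(law_idx, charge_idx, article_idxs, p, q, r):
    """Encode one case's labels: one-hot law (dim p), one-hot charge (dim q),
    multi-hot articles (dim r), as defined in Section 3.2."""
    l = np.zeros(p)
    l[law_idx] = 1.0           # exactly one law
    c = np.zeros(q)
    c[charge_idx] = 1.0        # exactly one (primary) charge
    a = np.zeros(r)
    a[article_idxs] = 1.0      # a case may cite several articles
    return l, c, a
```

For instance, a case citing two articles yields an article vector with two ones, while its law and charge vectors each contain a single one.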
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Proposed Models</head><p>Current models like Lawformer and TopJudge, as well as other state-of-the-art Legal Judgment Prediction (LJP) models, showcase the potential of neural network models in terms of accuracy and efficiency in predicting legal judgments. However, these models have certain limitations when applied to the legal systems of different countries. For example, Lawformer is a pre-trained language model that utilizes legal documents from mainland China as training data. It has shown impressive performance on the CAIL dataset; however, when applied to the TWLJP dataset, its performance is not as good as Chinese BERT. The reason can be attributed to variations in legal terminology, penalties, and writing styles of legal documents across different countries.</p><p>As mentioned before, Parameter-Efficient Fine-Tuning (PEFT) is an alternative approach that allows a model to learn a new task with minimal updates. In PEFT, a pre-trained model is fine-tuned by selectively updating or adding a small number of parameters. Recent advancements in PEFT techniques have demonstrated performance comparable to fine-tuning the entire model while modifying only a fraction (e.g., 0.01%) of its parameters <ref type="bibr" target="#b22">[23]</ref>.</p><p>In this paper, we adopt LoRA <ref type="bibr" target="#b9">[10]</ref> to reduce the number of trainable parameters by learning pairs of rank-decomposition matrices while keeping the original weights frozen. The idea behind LoRA is that when adapting a pre-trained language model to a specific task or dataset, only a few features need to be emphasized or re-learnt. This means that the update matrix (ΔW) can be a low-rank matrix. 
As shown in Figure <ref type="figure" target="#fig_1">2</ref>, the update of a pre-trained weight matrix 𝑊 0 ∈ 𝑅 𝑑×𝑘 is constrained by a low-rank decomposition 𝑊 0 + ∆𝑊 = 𝑊 0 + 𝐵𝐴, where 𝐵 ∈ 𝑅 𝑑×𝑟 , 𝐴 ∈ 𝑅 𝑟×𝑘 , and the rank 𝑟 is a hyperparameter less than or equal to the minimum of 𝑑 and 𝑘. During training, 𝑊 0 remains unchanged and receives no gradient updates, while 𝐴 and 𝐵 contain the trainable parameters.</p><p>Since our LJP tasks include the prediction of laws, articles, and charges, we add three fully connected layers on top of the CLS output, as depicted in Figure <ref type="figure" target="#fig_1">2</ref>. With the PLM frozen, this approach substantially reduces the computational resources and time required for fine-tuning while preserving the model's performance, which makes it particularly valuable when refining highly capable large models with billions of parameters.</p></div>
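The low-rank update 𝑊 0 + 𝐵𝐴 can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's training code; the scaling factor and dropout used in the full LoRA method are omitted for brevity.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Computes (W0 + BA) x with W0 frozen; only A (r x k) and B (d x r)
    receive gradients. B starts at zero so the update is zero initially."""
    def __init__(self, d: int, k: int, r: int = 8):
        super().__init__()
        self.W0 = nn.Linear(k, d, bias=False)
        self.W0.weight.requires_grad_(False)        # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))    # zero init: delta-W = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (W0 + BA) x, computed without materializing the full d x k update
        return self.W0(x) + x @ self.A.T @ self.B.T
```

With d = k = 768 and r = 8, the trainable parameters number only 2 × 8 × 768 = 12,288 per adapted matrix, versus 589,824 in the frozen weight.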
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiment</head><p>To evaluate the performance of the TWLJP dataset that we have collected across different pre-trained language models, we conducted training and evaluation using the following settings:</p><p>Multi-task BERT As depicted in Figure 2, we use multi-task learning to model the prediction of law, charge, and article, given the criminal fact description in the indictment as input.</p><p>We utilize the Huggingface <ref type="bibr" target="#b23">[24]</ref> Chinese pre-trained language model bert-base-chinese. The optimizer for Multi-task BERT is BERT Adam with a learning rate of 1e-5, a maximum length of 512, and a hidden size of 768 for the parameters of the pre-trained language model.</p></div>
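The three task-specific heads over the encoder's [CLS] vector can be sketched as follows. The class counts match TWLJP (33 laws, 94 charges, 165 articles); the module name and wiring are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Three classification heads over the [CLS] embedding, one per sub-task.
    The encoder (e.g. bert-base-chinese, hidden size 768) produces `cls`."""
    def __init__(self, hidden: int = 768, n_law: int = 33,
                 n_charge: int = 94, n_article: int = 165):
        super().__init__()
        self.law = nn.Linear(hidden, n_law)          # one-hot target
        self.charge = nn.Linear(hidden, n_charge)    # one-hot target
        self.article = nn.Linear(hidden, n_article)  # multi-hot target

    def forward(self, cls: torch.Tensor):
        return self.law(cls), self.charge(cls), self.article(cls)
```

In training, the law and charge logits would typically feed a cross-entropy loss, while the multi-label article logits would feed a per-class binary cross-entropy.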
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Multi-task Lawformer</head><p>Lawformer <ref type="bibr" target="#b11">[12]</ref> is a pre-trained language model based on the CAIL-long dataset, capable of processing documents up to 4,096 characters in length. However, since Lawformer is trained on the CAIL-long dataset in Simplified Chinese and our data is in Traditional Chinese, we first used the OpenCC package to convert the crime facts to Simplified Chinese before training. The optimizer for Multi-task Lawformer is AdamW with a learning rate of 1e-5, a maximum length of 512, and a hidden size of 768 for the parameters of the pre-trained language model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>LoRA</head><p>We utilized the LoRA implementation from Hugging Face's PEFT package <ref type="bibr" target="#b24">[25]</ref> and the bert-base-chinese model to generate embeddings. In the LoRA setting, the value of r is set to 8. The optimizer for LoRA is AdamW with a learning rate of 3e-4, a maximum length of 512, and a hidden size of 768 for the parameters of the pre-trained language model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation Metric</head><p>We adopt micro precision (MiP), recall (MiR), and F1 score (MiF), as well as macro precision (MaP), recall (MaR), and F1 score (MaF), as the evaluation metrics. Macro precision/recall/F1 are computed by averaging the per-class scores, a commonly used approach in multi-label classification tasks. The performance of charge prediction on the TWLJP dataset is reported with Mi meaning micro and Ma meaning macro.</p></div>
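Micro and macro averaging differ in how per-class scores are aggregated: micro pools all decisions, while macro averages per-class scores. A small multi-label example with scikit-learn (toy data, not from TWLJP):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy multi-label ground truth and predictions (3 samples, 3 classes)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

# Micro: pool TP/FP/FN over all classes before computing the scores
mip, mir, mif, _ = precision_recall_fscore_support(
    y_true, y_pred, average='micro', zero_division=0)

# Macro: compute per-class precision/recall/F1, then average
map_, mar, maf, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro', zero_division=0)
```

On this toy data every prediction is correct but two true labels are missed, so MiP = 1.0 while MiR = 0.6; the macro scores are pulled down by the class that is never predicted.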
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Performance on TWLJP</head><p>To evaluate the performance of the TWLJP dataset across different models, we conducted experiments using the models introduced in the previous section. The performance of each model on TWLJP is shown in Tables <ref type="table" target="#tab_2">3, 4</ref>, and 5. In each experiment, we selected the epoch with the best performance on the validation dataset and tested it on the testing dataset. The performance shown in the tables is the average over five runs, reported with twice the standard deviation. We conducted the experiments on a GeForce RTX 4070 Ti graphics card; the training time per epoch and the parameter information of each model are presented in Table <ref type="table" target="#tab_4">6</ref> (the training time and parameter information for each model on the TWLJP dataset).</p><p>Based on the experimental results, it is evident that the models built on the Lawformer pre-trained language model did not meet our expectations. Upon analysis, we determined that this is because Lawformer was trained on legal documents from mainland China. Despite our converting the input criminal facts from Traditional Chinese to Simplified Chinese, there are significant differences between the legal systems and terminologies of mainland China and Taiwan. This mismatch in legal terminology and usage negatively impacted Lawformer's performance on the TWLJP dataset.</p><p>Under the LoRA training architecture, performance comparable to, and in some cases better than, Multi-task BERT is achieved on charge, article, and law prediction. The training time for one epoch is 1 hour and 58 minutes, approximately half the 3 hours and 49 minutes required by Multi-task BERT. 
Regarding the parameter count, Multi-task BERT has a total of 102,716,744 parameters, all of which need to be adjusted. In the LoRA architecture, the total number of parameters is 103,011,656, but only 744,008 need to be trained, approximately 0.72% of the trainable parameters in Multi-task BERT. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Performance on CAIL</head><p>To ensure fairness in our experiments, we also utilized the publicly available CAIL dataset. We conducted multi-task training on the dataset, focusing on the charges and articles. The performance of each model is shown in Tables <ref type="table" target="#tab_6">7 and 8</ref>. Since the main objective of our experiment was to compare the performance, time, and parameter counts of large language models, we did not compare them to other related models. For each experiment, we selected the epoch with the best performance on the validation dataset and tested it on the test dataset. We conducted the experiments on a GeForce RTX 4070 Ti graphics card; the training time per epoch and the parameter information of each model are provided in Table <ref type="table" target="#tab_7">9</ref> (the training time and parameter information for each model on the CAIL dataset). From the experimental results, it can be observed that the performance of the Lawformer pre-trained language model did not meet expectations. Although Lawformer was trained on legal documents from mainland China, it is based on the Longformer architecture, which allows input lengths of up to 4,096 tokens. However, we used a maximum length of 512 tokens, since increasing it would exhaust the graphics card's memory. As a result, the weights of some models were not updated, leading to poor training performance.</p><p>On the other hand, under the LoRA training framework, performance comparable to Multi-task BERT was achieved for charges and articles. The training time for one epoch was 1 hour and 14 minutes, compared to 2 hours and 19 minutes for Multi-task BERT, approximately half the time. In terms of parameter count, LoRA required only approximately 0.86% of the trainable parameters of Multi-task BERT.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>Legal AI plays a crucial role in providing legal knowledge services to individuals with legal backgrounds, as well as assisting non-legal professionals. However, due to the diverse range of sub-tasks in Legal AI and the increasing size of pre-trained language models, training and storing a separate language model for each sub-task can be costly and resource-intensive. To address this challenge, we have embraced the concept of Parameter Efficient Fine-Tuning (PEFT) and applied it to the field of Legal AI.</p><p>By leveraging the PEFT approach, specifically through the implementation of the LoRA architecture, we have observed promising results in fine-tuning pre-trained language models. This approach allows us to achieve comparable, if not superior, performance while significantly reducing the time required for model adjustments. In our experiments, we found that using the LoRA framework required only about half the time compared to fine-tuning the entire model, without sacrificing performance. This innovative methodology opens up new possibilities for adapting language models to different legal contexts efficiently.</p><p>The success of our approach highlights the potential of PEFT techniques in the Legal AI domain. By efficiently adjusting and fine-tuning language models, we can tailor them to specific legal frameworks, taking into account the variations in legal definitions, documents, and terminologies across different countries. This advancement not only enhances the accuracy and relevance of legal knowledge services but also extends the accessibility of Legal AI to individuals without a legal background.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Three PEFT mechanism: Adapter, prefix tuning and LoRA (Adopted from [22])</figDesc><graphic coords="4,198.43,109.70,195.92,170.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Model Architecture</figDesc><graphic coords="7,154.66,217.72,283.47,142.61" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell></cell><cell>洗錢防制法、刑法</cell><cell>Money Laundering Control Act, Criminal Code</cell></row><row><cell>Articles</cell><cell>⋯是核被告所為，係犯洗錢防制法第 2 條第 2 款、第 14 條第 1 項之洗錢罪嫌及刑法第 30 條第 1 項前段、第 339 條第 1 項之幫助詐欺取財罪嫌⋯</cell><cell>The acts committed by the defendant constitute the crime of money laundering under Article 2, paragraph 2, and Article 14, paragraph 1 of the Money Laundering Control Act, and the crime of assisting in fraudulent acquisition of money under the first part of Article 30, paragraph 1, and Article 339, paragraph 1 of the Criminal Code.</cell></row><row><cell>Charge</cell><cell>詐欺</cell><cell>Fraud</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>The performance of Law prediction on TWLJP dataset. Mi means Micro, Ma means Macro.</figDesc><table><row><cell>Sub-task</cell><cell cols="6">Law</cell></row><row><cell>Model/Metric</cell><cell>MiP</cell><cell>MiR</cell><cell>MiF</cell><cell>MaP</cell><cell>MaR</cell><cell>MaF</cell></row><row><cell>Multi-task BERT</cell><cell>99.46±0.1</cell><cell>99.04±0.1</cell><cell>99.24±0.1</cell><cell>96.2±1.0</cell><cell>93.28±2.0</cell><cell>94.52±0.7</cell></row><row><cell>Multi-task Lawformer</cell><cell>99.30±0.1</cell><cell>98.46±0.3</cell><cell>98.88±0.1</cell><cell>95.0±2.7</cell><cell>87.02±5.6</cell><cell>89.98±3.2</cell></row><row><cell>LoRA(r=8)</cell><cell>99.43±0.1</cell><cell>99.10±0</cell><cell>99.27±0.1</cell><cell>96.50±0.5</cell><cell>93.87±1.6</cell><cell>95.03±1.0</cell></row><row><cell>Sub-task</cell><cell cols="6">Article</cell></row><row><cell>Model/Metric</cell><cell>MiP</cell><cell>MiR</cell><cell>MiF</cell><cell>MaP</cell><cell>MaR</cell><cell>MaF</cell></row><row><cell>Multi-task BERT</cell><cell>96.76±0.3</cell><cell>94.60±0.7</cell><cell>95.68±0.3</cell><cell>80.60±5.1</cell><cell>72.20±2.9</cell><cell>74.6±3.7</cell></row><row><cell>Multi-task Lawformer</cell><cell>95.60±0.6</cell><cell>91.88±1.0</cell><cell>93.72±0.3</cell><cell>73.50±3.7</cell><cell>62.94±2.0</cell><cell>65.9±2.2</cell></row><row><cell>LoRA(r=8)</cell><cell>96.63±0.1</cell><cell>95.10±0</cell><cell>95.87±0.1</cell><cell>84.90±4.7</cell><cell>78.07±2.3</cell><cell>80.23±3.4</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>The performance of Charge prediction on TWLJP dataset. Mi means Micro, Ma means Macro.</figDesc><table><row><cell>Sub-task</cell><cell cols="6">Charge</cell></row><row><cell>Model/Metric</cell><cell>MiP</cell><cell>MiR</cell><cell>MiF</cell><cell>MaP</cell><cell>MaR</cell><cell>MaF</cell></row><row><cell>Multi-task BERT</cell><cell>94.08±0.2</cell><cell>93.46±0.3</cell><cell>93.74±0.1</cell><cell>69.36±3.5</cell><cell>64.14±2.6</cell><cell>65.10±2.7</cell></row><row><cell>Multi-task Lawformer</cell><cell>93.00±1.0</cell><cell>92.46±0.2</cell><cell>92.76±0.5</cell><cell>64.44±3.2</cell><cell>59.14±2.5</cell><cell>59.94±1.6</cell></row><row><cell>LoRA(r=8)</cell><cell>94.53±0.2</cell><cell>93.53±0.1</cell><cell>94.00±0</cell><cell>71.10±1.6</cell><cell>65.17±1.9</cell><cell>66.93±1.6</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 5</head><label>5</label><figDesc></figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 6</head><label>6</label><figDesc></figDesc><table><row><cell>TWLJP</cell><cell>Multi-task BERT</cell><cell>Multi-task Lawformer</cell><cell>LoRA</cell></row><row><cell>Time</cell><cell>3hrs 49mins</cell><cell>3hrs 44mins</cell><cell>1hr 58mins</cell></row><row><cell># Parameters</cell><cell>102,716,744</cell><cell>105,470,792</cell><cell>103,011,656</cell></row><row><cell># Trainable Parameters</cell><cell>102,716,744</cell><cell>105,470,792</cell><cell>744,008</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 7</head><label>7</label><figDesc>The performance of Article prediction on CAIL dataset. Mi means Micro, Ma means Macro.</figDesc><table><row><cell>Sub-task</cell><cell cols="6">Article</cell></row><row><cell>Model/Metric</cell><cell>MiP</cell><cell>MiR</cell><cell>MiF</cell><cell>MaP</cell><cell>MaR</cell><cell>MaF</cell></row><row><cell>Multi-task BERT</cell><cell>84.1</cell><cell>85.7</cell><cell>84.9</cell><cell>79.0</cell><cell>71.6</cell><cell>73.4</cell></row><row><cell>Multi-task Lawformer</cell><cell>79.3</cell><cell>79.8</cell><cell>79.6</cell><cell>70.9</cell><cell>59.6</cell><cell>62.7</cell></row><row><cell>LoRA(r=8)</cell><cell>84.6</cell><cell>86.9</cell><cell>85.8</cell><cell>79.9</cell><cell>71.6</cell><cell>73.9</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 8</head><label>8</label><figDesc>The performance of Charge prediction on CAIL dataset. Mi means Micro, Ma means Macro.</figDesc><table><row><cell>Sub-task</cell><cell cols="6">Charge</cell></row><row><cell>Model/Metric</cell><cell>MiP</cell><cell>MiR</cell><cell>MiF</cell><cell>MaP</cell><cell>MaR</cell><cell>MaF</cell></row><row><cell>Multi-task BERT</cell><cell>89.0</cell><cell>89.1</cell><cell>89.0</cell><cell>84.4</cell><cell>77.1</cell><cell>79.4</cell></row><row><cell>Multi-task Lawformer</cell><cell>82.4</cell><cell>81.9</cell><cell>82.1</cell><cell>74.5</cell><cell>62.1</cell><cell>65.5</cell></row><row><cell>LoRA(r=8)</cell><cell>89.0</cell><cell>88.3</cell><cell>88.7</cell><cell>84.8</cell><cell>76.7</cell><cell>79.1</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 9</head><label>9</label><figDesc>The training time and parameter information for each model on the CAIL dataset.</figDesc><table><row><cell>CAIL</cell><cell>Multi-task BERT</cell><cell>Multi-task Lawformer</cell><cell>LoRA</cell></row><row><cell>Time</cell><cell>2hrs 19mins</cell><cell>2hrs 11mins</cell><cell>1hr 14mins</cell></row><row><cell># Parameters</cell><cell>102,859,778</cell><cell>105,613,826</cell><cell>103,154,690</cell></row><row><cell># Trainable Parameters</cell><cell>102,859,778</cell><cell>105,613,826</cell><cell>887,042</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">doi: 10.17632/gxxcv4jcgg.1</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A comparative study of automated legal text classification using random forests and deep learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ding</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ipm.2021.102798</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S0306457321002764" />
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page">102798</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">CAIL2018: A large-scale legal dataset for judgment prediction</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.48550/ARXIV.1807.02478</idno>
		<ptr target="https://arxiv.org/abs/1807.02478" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Improving legal judgment prediction through reinforced criminal element extraction</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Song</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ipm.2021.102780</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S0306457321002600" />
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page">102780</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Interpretable charge predictions for criminal cases: Learning to generate court views from fact descriptions</title>
		<author>
			<persName><forename type="first">H</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chao</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N18-1168</idno>
		<ptr target="https://aclanthology.org/N18-1168" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>New Orleans, Louisiana</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1854" to="1864" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Legal judgment prediction with multi-stage case representation learning in the real court setting</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1145/3404835.3462945</idno>
		<ptr target="https://doi.org/10.1145/3404835.3462945" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;21</title>
				<meeting>the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;21<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="993" to="1002" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Few-shot charge prediction with discriminative legal attributes</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/C18-1041" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 27th International Conference on Computational Linguistics, Association for Computational Linguistics<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="487" to="498" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Legal judgment prediction via topological learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1390</idno>
		<ptr target="https://aclanthology.org/D18-1390" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3540" to="3549" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Distinguish confusing law articles for legal judgment prediction</title>
		<author>
			<persName><forename type="first">N</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.280</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.280" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3086" to="3095" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Charge prediction with legal attention</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-32233-5_35</idno>
		<ptr target="https://doi.org/10.1007/978-3-030-32233-5_35" />
	</analytic>
	<monogr>
		<title level="m">Natural Language Processing and Chinese Computing: 8th CCF International Conference, NLPCC 2019</title>
				<meeting><address><addrLine>Dunhuang, China; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2019">October 9-14, 2019. 2019</date>
			<biblScope unit="page" from="447" to="458" />
		</imprint>
	</monogr>
	<note>Proceedings, Part I</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.09685</idno>
		<title level="m">LoRA: Low-rank adaptation of large language models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">How does NLP benefit legal system: A summary of legal artificial intelligence</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.466</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.466" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5218" to="5230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Lawformer: A pre-trained language model for Chinese legal long documents</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.aiopen.2021.06.003</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S2666651021000176" />
	</analytic>
	<monogr>
		<title level="j">AI Open</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="79" to="84" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective</title>
		<author>
			<persName><forename type="first">N</forename><surname>Aletras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tsarapatsanis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Preoţiuc-Pietro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lampos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PeerJ Computer Science</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page">e93</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Neural legal judgment prediction in English</title>
		<author>
			<persName><forename type="first">I</forename><surname>Chalkidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Androutsopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Aletras</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1424</idno>
		<ptr target="https://aclanthology.org/P19-1424" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4317" to="4323" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Predicting outcomes of legal cases based on legal factors using classifiers</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Shaikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">P</forename><surname>Sahu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Anand</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.procs.2020.03.292</idno>
		<ptr target="https://www.sciencedirect.com/science/article/pii/S1877050920307584" />
	</analytic>
	<monogr>
		<title level="m">International Conference on Computational Intelligence and Data Science</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">167</biblScope>
			<biblScope unit="page" from="2393" to="2402" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Label definitions augmented interaction model for legal charge prediction</title>
		<author>
			<persName><forename type="first">L</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ye</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-72113-8_18</idno>
		<ptr target="https://doi.org/10.1007/978-3-030-72113-8_18" />
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2021-04-01">March 28 -April 1, 2021. 2021</date>
			<biblScope unit="page" from="270" to="283" />
		</imprint>
	</monogr>
	<note>Proceedings, Part I</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Learning to predict charges for criminal cases with legal basis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D17-1289</idno>
		<ptr target="https://aclanthology.org/D17-1289" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics<address><addrLine>Copenhagen, Denmark</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2727" to="2736" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Judgment prediction via injecting legal knowledge into neural networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<ptr target="https://ojs.aaai.org/index.php/AAAI/article/view/17522" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="12866" to="12874" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Legal judgment prediction via multi-perspective bifeedback network</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luo</surname></persName>
		</author>
		<idno type="DOI">10.24963/ijcai.2019/567</idno>
		<ptr target="https://doi.org/10.24963/ijcai.2019/567" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization</title>
				<meeting>the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4085" to="4091" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Parameter-efficient transfer learning for NLP</title>
		<author>
			<persName><forename type="first">N</forename><surname>Houlsby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Giurgiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jastrzebski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Morrone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>De Laroussilhe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gesmundo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Attariyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gelly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2790" to="2799" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Prefix-tuning: Optimizing continuous prompts for generation</title>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-long.353</idno>
		<ptr target="https://aclanthology.org/2021.acl-long.353" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</title>
		<title level="s">Long Papers</title>
		<meeting>the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4582" to="4597" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Berg-Kirkpatrick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<title level="m">Towards a unified view of parameter-efficient transfer learning</title>
				<imprint>
			<publisher>ICLR</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Muqeeth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mohta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Raffel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="1950" to="1965" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Rush</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/2020.emnlp-demos.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Mangrulkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Belkada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Paul</surname></persName>
		</author>
		<ptr target="https://github.com/huggingface/peft" />
		<title level="m">PEFT: State-of-the-art parameter-efficient fine-tuning methods</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
