<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Cross-prompt Automated Essay Scoring by Selecting Training Data Based on Reinforcement Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Takumi</forename><surname>Shibata</surname></persName>
							<email>shibata@ai.lab.uec.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">The University of Electro-Communications</orgName>
								<address>
									<addrLine>1-5-1 Chofugaoka</addrLine>
									<settlement>Chofu, Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Masaki</forename><surname>Uto</surname></persName>
							<email>uto@ai.lab.uec.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">The University of Electro-Communications</orgName>
								<address>
									<addrLine>1-5-1 Chofugaoka</addrLine>
									<settlement>Chofu, Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Cross-prompt Automated Essay Scoring by Selecting Training Data Based on Reinforcement Learning</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">3EA439019CBFEE1B4BBBC901506A7B71</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Cross-prompt automated essay scoring</term>
					<term>reinforcement learning</term>
					<term>data valuation</term>
					<term>transfer learning</term>
					<term>educational measurement</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Automated essay scoring (AES) aims to automatically grade essays, thereby reducing the time and cost associated with manual scoring. The most common AES methods are classified under the prompt-specific approach, which involves developing a scoring model exclusively for a target prompt by using a dataset of scored essays corresponding to that prompt. Meanwhile, recent studies have emphasized the cross-prompt approach, which leverages scored essay data from other prompts, referred to as source prompts, to build an AES model for the target prompt. However, these cross-prompt methods have limitations in that they do not consider the presence of source prompt essays that can potentially have a negative impact on the construction of the AES model for the target prompt. To address this limitation, we propose a novel cross-prompt AES method that utilizes data valuation with reinforcement learning (DVRL). The proposed method enables the selective use of source prompt essays, which positively contributes to improving the scoring accuracy of AES for the target prompt. Experiments on a benchmark dataset demonstrate that the proposed method enhances the performance of various AES models in cross-prompt scoring settings.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, dynamic changes in social structures have led to a growing emphasis on practical skills such as critical thinking and expressive abilities in educational settings. The essay exam has gained attention as a popular method for assessing these practical abilities <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. However, grading essays incurs substantial costs in terms of personnel, time, and money, and it is also challenging to ensure consistency and fairness in scoring <ref type="bibr" target="#b2">[3]</ref>. To address these issues, automated essay scoring (AES) methods, which employ artificial intelligence technologies to automatically score essays, have been extensively explored in recent years (e.g., <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23]</ref>).</p><p>AES methods can be broadly classified into two categories <ref type="bibr" target="#b21">[22]</ref>: prompt-specific and crossprompt methods. 
Prompt-specific AES methods construct a specialized scoring model for a single target prompt by using a training dataset consisting of scored essays corresponding to that prompt <ref type="foot" target="#foot_0">1</ref> . Traditional prompt-specific AES methods have relied on feature-based methods, which involve extracting specific features such as essay length and grammatical error rate from essays and training machine learning models using these features <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. However, these methods require substantial effort in feature engineering and their performance depends heavily on manually designed features. To address these limitations, deep learning-based approaches have gained popularity in recent years. These methods directly input the word sequences of essays into deep neural networks, eliminating the need for manual feature design <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref>. In particular, pre-trained transformer-encoder-based models, such as those using BERT <ref type="bibr" target="#b23">[24]</ref> or its variants, have been widely adopted over the past few years, and have demonstrated high performance <ref type="bibr" target="#b24">[25]</ref>. Furthermore, recent research has begun to explore the potential of large language models (LLMs) for AES, investigating their enhanced knowledge retention and language-understanding capabilities <ref type="bibr" target="#b25">[26,</ref><ref type="bibr" target="#b26">27]</ref>, although they are not necessarily superior to the AES models using BERT or its variants.</p><p>Although these prompt-specific AES models demonstrate high performance on the target prompt for which they were trained, there is no guarantee that directly applying the trained model to other prompts will yield high performance. 
To enhance the scoring performance for other prompts, it is generally necessary to collect an additional scored essay dataset tailored to each prompt and subsequently retrain the AES model using those data. To avoid such retraining processes, cross-prompt AES methods have recently been proposed <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b27">28,</ref><ref type="bibr" target="#b28">29]</ref>. Cross-prompt AES methods build an AES model for a target prompt by leveraging scored essay data collected from other prompts, referred to as source prompts. The effective use of source prompt data can enhance the performance of an AES model for a target prompt, even when there are no or only a limited number of scored essays corresponding to that prompt.</p><p>Various cross-prompt AES methods have been explored recently. For example, Li et al. <ref type="bibr" target="#b22">[23]</ref> proposed a feature-based AES model using prompt-independent features, constructed by domain adversarial neural networks (DANN) <ref type="bibr" target="#b29">[30]</ref>. Furthermore, Ridley et al. <ref type="bibr" target="#b10">[11]</ref> proposed a deep neural network model that integrates prompt-independent features and is designed to receive sequences of part-of-speech (POS) tags instead of word sequences as input in order to mitigate the influence of prompt-specific information. More recently, Chen et al. 
<ref type="bibr" target="#b21">[22]</ref> introduced a technique that employs a contrastive learning approach to obtain more consistent prompt-independent features, thereby achieving the current state-of-the-art.</p><p>However, these existing cross-prompt AES methods are assumed to utilize all source prompt essays, ignoring the presence of essays that can potentially have a negative impact on the construction of the AES model for the target prompt <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31,</ref><ref type="bibr" target="#b31">32]</ref>. Because some essays from source prompts that exhibit significantly different characteristics compared with the target prompt essays can act as noise, proper data selection to omit such essays is expected to improve scoring accuracy.</p><p>For this reason, we propose a cross-prompt AES method that follows the approach of data valuation by using reinforcement learning (DVRL) <ref type="bibr" target="#b31">[32]</ref> to select source prompt essays that are valuable in constructing AES models for the target prompt. DVRL is a reinforcement learning framework that estimates the value of each data sample based on its contribution to performance improvement in a specific target task. In our method, we adapt DVRL to construct a data value estimator, which assigns higher values to source prompt essays that positively contribute to AES performance on the target prompt and assigns lower values to those that might negatively impact the AES performance. The data selected using our DVRL framework can be used to construct any type of AES model, enhancing their AES performance on the target prompt compared with scenarios that use all source prompt data. In this study, we evaluate the effectiveness of our proposed method, using a benchmark dataset and several popular AES models, including BERT, Llama-2 <ref type="bibr" target="#b32">[33]</ref>, and the models proposed by Ridley et al. 
<ref type="bibr" target="#b10">[11]</ref> and Chen et al. <ref type="bibr" target="#b21">[22]</ref>. The experimental results show that the proposed method succeeded in improving performance across all AES models.</p><p>The remainder of this paper is structured as follows: Section 2 provides further details on conventional cross-prompt AES models. Section 3 explains the data valuation methods. Section 4 describes the proposed method, and Section 5 evaluates its effectiveness, using a benchmark dataset. Finally, Section 6 summarizes the study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Conventional Cross-Prompt AES Methods</head><p>This section provides an overview of conventional cross-prompt AES methods and discusses the limitations and drawbacks of these approaches.</p><p>Jin et al. <ref type="bibr" target="#b16">[17]</ref> proposed a cross-prompt AES method based on a two-stage approach. In the first stage, a RankSVM <ref type="bibr" target="#b33">[34]</ref> is trained using essays from source prompts. This RankSVM is then used to generate prediction scores for essays of the target prompt, which serve as pseudo-scores for the next stage. In the second stage, a prompt-specific AES model is trained for the target prompt, using these pseudo-scores.</p><p>Li et al. <ref type="bibr" target="#b22">[23]</ref> also proposed a two-stage AES method that utilizes DANN in the first stage. DANN is a deep learning approach that learns domain-independent features through an adversarial training process. This adversarial training uses two models: a main model that solves a target task and a domain classifier that identifies the domain each datum belongs to. These models are trained to maximize the performance of the main model while minimizing that of the domain classifier. The first stage of the method of Li et al. <ref type="bibr" target="#b22">[23]</ref> uses the DANN to construct a feature extractor that produces prompt-independent features. Then, an AES model is constructed using source prompt data of essays that are vectorized by the feature extractor to generate pseudo-scores for the target prompt essays. The second stage trains a prompt-dependent AES model for the target prompt, using the target prompt essays with the pseudo-scores.</p><p>Meanwhile, Ridley et al. <ref type="bibr" target="#b10">[11]</ref> introduced a model called the prompt-agnostic essay scorer (PAES), which learns an AES model in an end-to-end fashion. 
PAES is a deep neural network model that integrates manually designed prompt-independent features. This neural model is designed to receive sequences of POS tags instead of word sequences as input in order to mitigate the influence of prompt-specific information.</p><p>Chen et al. <ref type="bibr" target="#b21">[22]</ref> proposed a model called prompt-mapping contrastive learning for cross-prompt automated essay scoring (PMAES), which uses contrastive learning to learn more consistent prompt-independent features. PMAES utilizes PAES as an encoder to generate feature vectors for essays. It then employs contrastive learning to bring the vectors from the essays of source prompts closer to those from the target prompt. This process contributes to the construction of more consistent prompt-independent features, which are effective for cross-prompt scoring. PMAES has achieved state-of-the-art performance in cross-prompt AES methods.</p><p>As discussed above, conventional cross-prompt AES methods have focused primarily on learning prompt-independent features in order to extract transferable knowledge in essay scoring from source prompt data to target prompt data. However, these existing cross-prompt AES methods are assumed to utilize all source prompt essays, ignoring the presence of essays that can negatively impact the construction of the AES model for the target prompt <ref type="bibr" target="#b29">[30,</ref><ref type="bibr" target="#b30">31,</ref><ref type="bibr" target="#b31">32]</ref>. Although these methods assume the source prompts to be a mixture of multiple prompts <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b27">28,</ref><ref type="bibr" target="#b28">29]</ref>, not all of the source prompts will necessarily share similar characteristics with the target prompt. 
Thus, the inclusion of source prompt essays that are greatly dissimilar to the target prompt essays can act as noise in the construction of an AES model for the target prompt. This issue becomes particularly relevant in conditions where there is a large variety of source prompts in terms of topics and writing styles. These insights suggest that a careful selection of source prompt essays would be effective for obtaining accurate cross-prompt AES models. The idea of our study is thus to apply data valuation methods to construct a selector of valuable source prompt essays.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data Valuation Methods</head><p>Data valuation is a method for quantifying the importance of each sample in a dataset. Quantifying the value of data is regarded as an important task in various machine learning problems, including domain adaptation, discovering noisy samples, learning robust models, and improving the quality of datasets.</p><p>Representative data valuation methods include leave-one-out and data Shapley <ref type="bibr" target="#b34">[35]</ref>. Leave-one-out is a method that estimates the importance of each sample by calculating the change in performance of a target task when removing each sample one by one. Data Shapley evaluates the value of data, using the Shapley value from cooperative game theory. Specifically, data Shapley calculates the marginal contribution of each sample by evaluating the prediction performance of a target task when using each possible combination of samples. Moreover, another method using the Banzhaf value, which originates from cooperative game theory as well, has also been proposed <ref type="bibr" target="#b35">[36]</ref>.</p><p>Several data valuation methods based on meta-learning have also been proposed. One example is ChoiceNet <ref type="bibr" target="#b36">[37]</ref>, a valuation method that identifies noisy data within training datasets by separately estimating the distributions of meaningful data and noise data. Learning to reweight <ref type="bibr" target="#b37">[38]</ref> is another method that calculates the weights of each sample in the source dataset based on the performance of a target task on a validation dataset. Furthermore, as a recent meta-learning-based data valuation method, Yoon et al. <ref type="bibr" target="#b31">[32]</ref> proposed a method called data valuation using reinforcement learning (DVRL). DVRL employs a reinforcement learning strategy that simultaneously optimizes a data value estimator and a predictor model for a target task. 
In this study, we apply the framework of DVRL to cross-prompt AES.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Proposed Method</head><p>Suppose that a scored essay dataset for the source prompts, 𝒟 𝑠 = {(𝑥 𝑠 𝑖 , 𝑦 𝑠 𝑖 )} 𝑁 𝑠 𝑖=1 , and a small scored essay dataset for the target prompt, 𝒟 𝑡 = {(𝑥 𝑡 𝑖 , 𝑦 𝑡 𝑖 )} 𝑁 𝑡 𝑖=1 , are given. Here, 𝑥 𝑠 𝑖 and 𝑥 𝑡 𝑖 represent the 𝑖-th essay in the source and target prompt essays, respectively, while 𝑦 𝑠 𝑖 and 𝑦 𝑡 𝑖 denote their corresponding scores. 𝑁 𝑠 and 𝑁 𝑡 represent the total numbers of essays for the source prompts and target prompt, respectively.</p><p>Our study aims to develop an AES model that can accurately predict scores for unscored essays corresponding to the target prompt by executing the following two steps.</p><p>1. Construct a data value estimator, using DVRL to assign value scores to each essay in the source prompt essays.</p><p>2. Train an AES model for the target prompt, using a subset of source prompt essays assigned high-value scores by the data value estimator.</p><p>Note that this study exclusively uses 𝒟 𝑠 in the AES training process, while both 𝒟 𝑠 and 𝒟 𝑡 are used in the DVRL process 2 . The following sections describe the details of each step.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Data Valuation Using DVRL</head><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates the outline of our DVRL framework. It consists of two models: a data value estimator 𝑓 𝜃 that estimates the value of each scored essay, and a predictor 𝑔 𝜑 that outputs the predicted score of the essay.<note place="foot" n="2">It should be noted that 𝒟 𝑡 is also available to train the AES model constructed in step 2. However, we do not use 𝒟 𝑡 because this study focuses on how data selection by the proposed method affects AES performance compared with scenarios in which all source prompt data are used. A detailed evaluation of the effect of integrating 𝒟 𝑡 as AES training data remains a subject for future research.</note> Here, 𝜃 and 𝜑 are the model parameters of the data value estimator and predictor, respectively. In the figure, ℎ 𝑠 𝑖 and ℎ 𝑡 𝑖 represent feature vectors corresponding to 𝑥 𝑠 𝑖 and 𝑥 𝑡 𝑖 , respectively. The method for creating these feature vectors depends on the type of AES model that will ultimately be constructed. Specifically, when we intend to use AES models that accept word sequences as input, we use distributed essay representation vectors obtained from DeBERTa-v3-large <ref type="bibr" target="#b38">[39,</ref><ref type="bibr" target="#b39">40]</ref> as the feature vectors. Meanwhile, when we intend to use cross-prompt AES models such as PAES and PMAES, we utilize manually designed prompt-independent features.</p><p>The learning process of DVRL is formulated as the following optimization problem:</p><formula xml:id="formula_0">max 𝑓 𝜃 E (ℎ 𝑡 ,𝑦 𝑡 )∼𝒫 𝑡 [𝑅(𝜑)] s.t. 𝑔 * 𝜑 = arg min 𝑔 𝜑 E (ℎ 𝑠 ,𝑦 𝑠 )∼𝒫 𝑠 [𝑓 𝜃 (ℎ 𝑠 , 𝑦 𝑠 )ℒ(𝑔 𝜑 (ℎ 𝑠 ), 𝑦 𝑠 )] .<label>(1)</label></formula><p>Here, 𝑅(𝜑) represents the reward, which is the performance of the predictor 𝑔 𝜑 trained using the source prompt data 𝒟 𝑠 and evaluated using 𝒟 𝑡 as test data. 
The reward is measured using the quadratic weighted kappa (QWK) metric, which assesses the agreement between the predicted scores and the ground truth scores and is widely used in AES studies <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b9">10]</ref>. ℒ denotes the mean squared error (MSE) loss function used to train the predictor, as explained in Section 4.2.2. 𝒫 𝑠 and 𝒫 𝑡 represent the distributions of the source prompt data and the target prompt data, respectively. Solving this optimization problem yields a data value estimator that estimates the value score of each essay. The following subsections explain the specific calculation procedures.</p></div>
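For concreteness, the QWK reward can be computed as in the following self-contained sketch (a standard quadratic weighted kappa implementation in numpy; the variable names are ours, not the authors'):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK between two integer rating sequences on the same scale."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max_rating - min_rating + 1
    # Observed co-occurrence matrix of (true, predicted) ratings
    O = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        O[t - min_rating, p - min_rating] += 1.0
    # Expected matrix under independent marginal rating histograms
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / len(y_true)
    # Quadratic disagreement weights
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Since the predictor outputs continuous scores in [0, 1], they would first be rescaled and rounded to the prompt's integer score range before evaluating the reward.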
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Data Value Estimator</head><p>For each essay vector ℎ 𝑠 𝑖 and its score 𝑦 𝑠 𝑖 for the source prompt essays in 𝒟 𝑠 , the data value estimator 𝑓 𝜃 outputs its data value 𝑝 𝑖 ∈ [0, 1] as 𝑝 𝑖 = 𝑓 𝜃 (ℎ 𝑠 𝑖 , 𝑦 𝑠 𝑖 ). The data value estimator 𝑓 𝜃 is implemented using a deep neural network with six stacked dense layers, where the output layer is designed as a linear layer with sigmoid activation; it also incorporates marginal information 𝑚 𝑖 into its intermediate layer. The marginal information 𝑚 𝑖 is a quantity expected to correlate with the data value of each essay 𝑖 and can be written as 𝑚 𝑖 = |𝑦 𝑠 𝑖 − 𝑔 ˆ𝜑(ℎ 𝑠 𝑖 )|, where 𝑔 ˆ𝜑 is a predictor trained on 𝒟 𝑡 .</p><p>Using the calculated data value 𝑝 𝑖 , the selection indicator 𝑠 𝑖 ∈ {0, 1} for each essay is determined by sampling from a Bernoulli distribution with probability 𝑝 𝑖 ; that is, 𝑠 𝑖 ∼ Ber(𝑝 𝑖 ), where 𝑠 𝑖 = 1 means that the 𝑖-th data is selected, and 𝑠 𝑖 = 0 means that it is not selected.</p></div>
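A minimal numpy sketch of the forward pass of 𝑓 𝜃 may help fix ideas. The paper specifies six stacked dense layers, a sigmoid output, and injection of the marginal information 𝑚 𝑖 at an intermediate layer; the layer widths, split point, and weight initialization below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a stack of dense layers (widths are illustrative)."""
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def estimate_value(h, y, m, front, back):
    """p_i = f_theta(h_i, y_i): ReLU dense layers on [h_i, y_i], with the
    marginal information m_i concatenated at an intermediate layer and a
    sigmoid output so that p_i lies in [0, 1]."""
    x = np.concatenate([h, [y]])
    for W, b in front:
        x = np.maximum(x @ W + b, 0.0)   # ReLU hidden layers
    x = np.concatenate([x, [m]])         # inject marginal information m_i
    for W, b in back[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = back[-1]
    z = (x @ W + b)[0]
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid output layer

feat_dim = 8                             # illustrative feature dimensionality
front = init_mlp([feat_dim + 1, 16, 16, 16])   # 3 layers before injection
back = init_mlp([16 + 1, 16, 16, 1])           # 3 layers after, 6 in total
h_i = rng.normal(size=feat_dim)
p_i = estimate_value(h_i, y=0.7, m=0.1, front=front, back=back)
s_i = rng.binomial(1, p_i)               # selection indicator s_i ~ Ber(p_i)
```

The final line realizes the Bernoulli sampling step that turns continuous data values into binary selection indicators.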
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Predictor</head><p>The source prompt data selected through the above procedure are used to train the predictor 𝑔 𝜑 . The predictor is designed as a multi-layer perceptron with a linear output layer with sigmoid activation <ref type="foot" target="#foot_1">3</ref> . The weighted loss function ℒ 𝑝𝑟𝑒𝑑 used for learning is calculated as follows:</p><formula xml:id="formula_1">ℒ 𝑝𝑟𝑒𝑑 (𝜑) = 1 𝑁 𝑠 ∑︁ (𝑥 𝑠 𝑖 ,𝑦 𝑠 𝑖 )∈𝒟 𝑠 𝑠 𝑖 • ℒ(𝑦 ˆ𝑠 𝑖 , 𝑦 𝑠 𝑖 ),<label>(2)</label></formula><p>where 𝑦 ˆ𝑠 𝑖 is the predicted score of the predictor 𝑔 𝜑 for the 𝑖-th essay of the source prompt data.</p><p>As the loss function ℒ, we use the MSE between the predicted score 𝑦 ˆ𝑠 𝑖 and the ground truth score 𝑦 𝑠 𝑖 . Note that the ground truth scores 𝑦 𝑠 𝑖 are assumed to be normalized to the range [0, 1] because the predicted scores are within this range too, as a result of the sigmoid activation in the output layer.</p></div>
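The weighted loss in Eq. (2) is straightforward to express in code (a sketch; `s` is the vector of sampled selection indicators, and essay feature extraction is omitted):

```python
import numpy as np

def weighted_mse_loss(y_pred, y_true, s):
    """Eq. (2): selection-weighted MSE averaged over the N_s source essays.
    Essays with s_i = 0 contribute nothing to the predictor's loss."""
    y_pred, y_true, s = (np.asarray(a, dtype=float) for a in (y_pred, y_true, s))
    return float(np.mean(s * (y_pred - y_true) ** 2))

# Ground-truth scores are assumed normalized to [0, 1],
# matching the predictor's sigmoid output layer
loss = weighted_mse_loss([0.8, 0.2, 0.5], [1.0, 0.2, 0.9], s=[1, 0, 1])
```

Note that the deselected second essay is masked out, so only the first and third terms enter the average.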
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.3.">Reinforcement Learning</head><p>Using the trained predictor, our method computes the reward 𝑅(𝜑) for reinforcement learning as the QWK between the predicted scores and the ground truth scores evaluated using the dataset 𝒟 𝑡 . The reward 𝑅(𝜑) is used to update the parameters 𝜃 of the data value estimator 𝑓 𝜃 . Specifically, the parameters 𝜃 are updated using the REINFORCE algorithm <ref type="bibr" target="#b40">[41]</ref>, a reinforcement learning algorithm, with the following loss function <ref type="bibr" target="#b31">[32]</ref>:</p><formula xml:id="formula_2">ℒ 𝑅𝐿 (𝜃) = 𝑅(𝜑) * log 𝑃 ((𝑠 1 , 𝑠 2 , . . . , 𝑠 𝑁𝑠 ) | 𝜃),<label>(3)</label></formula><p>where 𝑃 ((𝑠 1 , 𝑠 2 , . . . , 𝑠 𝑁𝑠 ) | 𝜃) represents the joint probability of the selection indicators given the parameters 𝜃. Note that each essay is selected independently, meaning that the joint probability can be written as</p><formula xml:id="formula_3">∏︀ 𝑁𝑠 𝑖=1 𝑝 𝑠 𝑖 𝑖 (1 − 𝑝 𝑖 ) 1−𝑠 𝑖 .</formula><p>Using this loss function, the parameters 𝜃 are updated by gradient ascent as follows:</p><formula xml:id="formula_4">𝜃 ← 𝜃 + 𝛼∇ 𝜃 ℒ 𝑅𝐿 (𝜃),<label>(4)</label></formula><p>where 𝛼 represents the learning rate, which is set to 0.001 in this study. Adam <ref type="bibr" target="#b41">[42]</ref> is used as the optimization method for parameter updates. Finally, by repeating the above steps until the model converges, the data value estimator 𝑓 𝜃 is trained.</p></div>
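Because the selection indicators are independent Bernoulli variables, the log of the joint probability in Eq. (3) reduces to a sum of Bernoulli log-likelihoods, which can be sketched as follows (numpy; the clipping constant is an implementation detail we add for numerical stability):

```python
import numpy as np

def selection_log_prob(p, s):
    """log P(s_1, ..., s_Ns | theta) for independent Bernoulli selections:
    sum_i [ s_i * log p_i + (1 - s_i) * log(1 - p_i) ]."""
    p = np.clip(np.asarray(p, dtype=float), 1e-8, 1.0 - 1e-8)
    s = np.asarray(s, dtype=float)
    return float(np.sum(s * np.log(p) + (1.0 - s) * np.log(1.0 - p)))

def reinforce_loss(reward, p, s):
    """Eq. (3): L_RL(theta) = R(phi) * log P(s | theta). The estimator's
    parameters are then moved by gradient ascent on this quantity (Eq. 4)."""
    return reward * selection_log_prob(p, s)
```

In practice, implementations of this REINFORCE scheme commonly subtract a moving average of past rewards from 𝑅(𝜑) as a baseline to reduce gradient variance; we omit that detail here.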
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Train an Arbitrary AES Model Based on Estimated Data Values</head><p>Through the above process, we can obtain the data value estimator 𝑓 𝜃 and the resulting data value scores for essays in the source prompt data 𝒟 𝑠 . Thus, our last step is to construct an AES model for the target prompt, using source prompt essays with high-value scores. However, it is not clear how much data should be selected based on their value scores. Thus, we employ the following approach, which is inspired by that described in <ref type="bibr" target="#b31">[32]</ref>, to select essays based on their value scores.</p><p>1. Sort the source prompt essays in descending order based on their estimated value scores.</p><p>2. Train an AES model using essays with top 10% value scores and repeat this process with different data usage percentages, ranging from 10% to 100%, in increments of 10%.</p><p>3. For the ten constructed models, evaluate their MSE loss, using 𝒟 𝑡 as test data. The model with the lowest MSE loss is selected as the optimal one and is used for scoring the unscored target prompt essays. </p></div>
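The three-step selection procedure above can be sketched as follows, where `train_and_eval` is a hypothetical callback that trains an AES model on the chosen essays and returns its MSE loss on 𝒟 𝑡:

```python
import numpy as np

def select_best_fraction(values, train_and_eval):
    """Sort source essays by estimated value score (descending), train one
    AES model per data usage fraction (10%, 20%, ..., 100%), and keep the
    fraction whose model attains the lowest MSE on the target data."""
    order = np.argsort(values)[::-1]           # step 1: sort by value score
    fractions = np.arange(0.1, 1.01, 0.1)
    best_frac, best_mse = None, np.inf
    for frac in fractions:                     # step 2: vary data usage
        k = max(1, int(round(frac * len(values))))
        mse = train_and_eval(order[:k])
        if best_mse > mse:                     # step 3: keep the best model
            best_frac, best_mse = frac, mse
    return best_frac, best_mse
```

The callback abstracts away the AES model itself, since this selection step applies to any of the models considered in this study.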
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiment</head><p>We conducted an evaluation experiment using real-world data to demonstrate the score prediction performance of the proposed method compared with the conventional method, which uses all source data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Dataset</head><p>In this experiment, we used the ASAP (Automated Student Assessment Prize) <ref type="foot" target="#foot_2">4</ref> dataset as real-world data. The ASAP dataset is used in Kaggle's automated essay-scoring competition and is widely used as a benchmark dataset in many AES studies. The ASAP dataset contains a total of eight essay prompts across three genres: argumentative, source-dependent response, and narrative. Each prompt includes students' essays and their scores. The details of the dataset characteristics are shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Performance Evaluation of our Proposed Method</head><p>In line with previous cross-prompt AES studies, the present experiment was conducted using prompt-wise cross-validation <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b21">22]</ref>. In prompt-wise cross-validation, one prompt is used as the target prompt, while all remaining prompts are used as source prompts for training. This operation is performed sequentially for all prompts, and the average is calculated to evaluate performance.</p><p>Our proposed method requires 𝒟 𝑡 , a small set of scored essays sampled from the target prompt. In this experiment, the size of 𝒟 𝑡 was set to 30, and the set of samples was selected so that the sum of the Euclidean distances between the distributed essay representation vectors obtained from DeBERTa-v3-large was maximized.</p><p>Our proposed method can be used for any AES model. The present experiment used four representative AES models: BERT, Llama-2-7B <ref type="bibr" target="#b32">[33]</ref>, PAES, and PMAES. Note that PMAES with the same hyper-parameters as in <ref type="bibr" target="#b21">[22]</ref> could not be run on our GPU (RTX 4090); thus, we changed some hyper-parameters. Specifically, the number of mini-batches was changed from 2 to 20. The experiments were conducted in two settings, All source and Proposed, and the score prediction accuracies were compared. All source is a setting in which each AES model is trained using all source prompt data, which is equivalent to the case where all essays are selected in the proposed method. Proposed is a setting in which each AES model is trained using a subset of source prompt data selected using our method. 
The prediction performance of each trained model is evaluated by QWK using the target prompt essays, excluding the 30 essays in 𝒟 𝑡 .</p><p>Table <ref type="table" target="#tab_1">2</ref> shows the experimental results. The results show that the proposed method outperforms the All source setting for all models. The improvement is particularly significant for BERT and Llama-2-7B. These models take word sequences as input, increasing the difference in feature vector characteristics between the source and target prompts. This would enhance the negative impact of using source prompt essays irrelevant to the target prompt, thereby degrading the AES model trained using all source prompt data.</p><p>For PAES and PMAES, the improvement margin is smaller because they mitigate the difference in the feature space between prompts by using prompt-independent features and POS sequences as input. However, even for these models, the proposed method succeeds in improving their performance by selecting relevant essays that align better with the target prompt's characteristics.</p><p>Moreover, BERT achieves higher performance with the proposed method than do PAES and PMAES without the proposed method. This suggests that the proposed method applied to BERT can achieve performance comparable to these cross-prompt AES models. This is a significant result because it indicates that by simply selecting essays that are effective for the target prompt, it is possible to achieve performance comparable to conventional cross-prompt AES models without relying on complex techniques to align features across prompts.</p><p>These results demonstrate the effectiveness of the proposed method in selecting the most relevant essays from source prompts, leading to improved performance of conventional AES models.</p></div>
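The 30 essays in 𝒟 𝑡 described above are chosen to maximize the sum of Euclidean distances between their representation vectors. The paper does not state the algorithm used for this maximization; a greedy farthest-point heuristic such as the following is one plausible sketch:

```python
import numpy as np

def select_diverse(vectors, k, seed=0):
    """Greedily pick k essays whose representation vectors are mutually far
    apart in Euclidean distance (an approximation to maximizing the
    pairwise distance sum; the exact problem is combinatorial)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    chosen = [int(rng.integers(len(X)))]           # random starting essay
    while len(chosen) != k:
        # total distance from every essay to the already-chosen set
        d = np.linalg.norm(X[:, None, :] - X[None, chosen, :], axis=-1).sum(axis=1)
        d[chosen] = -np.inf                        # never re-pick an essay
        chosen.append(int(np.argmax(d)))
    return chosen
```

In the experiment, `vectors` would be the DeBERTa-v3-large essay representations and k = 30.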
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Validity Evaluation of Estimated Data Values</head><p>In this section, we investigate whether the value estimates of the proposed method appropriately relate to the score prediction performance. To confirm this point, we examined the prediction accuracy, QWK, of an AES model trained using source prompt essays, excluding those with the top or bottom 𝑛% of value scores. The removal ratio 𝑛 was varied from 0% to 90% in increments of 10%. This analysis uses PAES as the AES model because, as reported above, it demonstrated the highest performance among the models to which the proposed method was applied.</p><p>The experimental results for Prompt 1 are presented in Figure <ref type="figure" target="#fig_1">2</ref>, which shows the ratio of excluded essays on the horizontal axis and the QWK on the vertical axis. The blue line represents the QWK when essays are excluded in order of the highest value scores, while the orange line represents the QWK when essays are excluded in order of the lowest value scores.</p><p>The figure demonstrates that, for the range where the ratio of removed essays is small to medium, QWK tends to increase as essays with low value scores are sequentially excluded, whereas it tends to decrease when essays with high value scores are sequentially excluded. For the range where the ratio of removed essays is extremely large, both cases yielded low QWK values due to the removal of too much training data, which is a reasonable trend.</p><p>These results suggest that the value scores estimated by the proposed method appropriately relate to the scoring performance of the constructed AES model for the target prompt.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This study introduced a novel cross-prompt AES approach that leverages a data valuation method to select source prompt essays valuable for improving the accuracy of an AES model for the target prompt. The experimental results demonstrate the effectiveness of our method in improving the performance of AES models.</p><p>In future work, we will conduct further analyses of the proposed model to gain a deeper understanding of its characteristics and behavior. Additional experiments are needed to evaluate the effects of using the small set of scored target prompt essays, denoted as 𝒟 𝑡 , to train the AES model itself, in addition to its use in our DVRL process. We also aim to explore methods that do not rely on 𝒟 𝑡 , because this requirement may not always be feasible in real-world scenarios. Furthermore, we intend to develop an end-to-end model that integrates the data value estimation and AES components into a single, unified framework, enabling a more streamlined and efficient approach to cross-prompt AES.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Model architecture of DVRL</figDesc><graphic coords="5,114.80,84.19,365.65,193.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Relationship between the ratio of essays and QWK for Prompt 1.</figDesc><graphic coords="10,198.43,84.19,198.42,121.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Details of the ASAP.</figDesc><table><row><cell cols="3">Prompt No. of essays Avg. len.</cell><cell>Genre</cell><cell>Score range</cell></row><row><cell>1</cell><cell>1783</cell><cell>350</cell><cell>Argumentative</cell><cell>2-12</cell></row><row><cell>2</cell><cell>1800</cell><cell>350</cell><cell>Argumentative</cell><cell>1-6</cell></row><row><cell>3</cell><cell>1726</cell><cell>150</cell><cell>Source-dependent</cell><cell>0-3</cell></row><row><cell>4</cell><cell>1772</cell><cell>150</cell><cell>Source-dependent</cell><cell>0-3</cell></row><row><cell>5</cell><cell>1805</cell><cell>150</cell><cell>Source-dependent</cell><cell>0-4</cell></row><row><cell>6</cell><cell>1800</cell><cell>150</cell><cell>Source-dependent</cell><cell>0-4</cell></row><row><cell>7</cell><cell>1569</cell><cell>250</cell><cell>Narrative</cell><cell>0-30</cell></row><row><cell>8</cell><cell>723</cell><cell>650</cell><cell>Narrative</cell><cell>0-60</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Experimental results.</figDesc><table><row><cell>Model</cell><cell>Setting</cell><cell>1</cell><cell>2</cell><cell>3</cell><cell>Prompts 4 5</cell><cell>6</cell><cell>7</cell><cell>8</cell><cell>Avg.</cell></row><row><cell>BERT</cell><cell cols="9">All source .513 .541 .578 .582 .637 .600 .529 .431 .551 Proposed .640 .581 .684 .631 .683 .636 .597 .628 .635</cell></row><row><cell>Llama-2-7B</cell><cell cols="9">All source .481 .556 .545 .610 .690 .582 .583 .424 .559 Proposed .530 .522 .661 .589 .704 .574 .686 .558 .603</cell></row><row><cell>PAES</cell><cell cols="9">All source .654 .583 .612 .605 .730 .565 .706 .542 .625 Proposed .787 .600 .588 .588 .747 .573 .737 .560 .648</cell></row><row><cell>PMAES</cell><cell cols="9">All source .799 .634 .591 .589 .716 .567 .658 .366 .615 Proposed .800 .627 .559 .606 .749 .613 .664 .523 .643</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Note that the term prompt refers to the writing task or instructions given to a student, distinct from prompts used as inputs for large language models.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">In our study, we used different multi-layer perceptrons depending on the input data type. Specifically, a two-layer perceptron is used when inputting distributed essay representation vectors obtained from DeBERTa-v3-large, while a single-layer perceptron is used when inputting manually designed prompt-independent features.</note>
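The two value-estimator heads described in footnote 3 can be sketched as a single forward-pass function. The hidden size, ReLU activation, and sigmoid output below are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

def value_head_forward(x, weights, use_two_layers):
    """Forward pass of a value-estimator head over an essay feature vector.

    `weights` holds (W1, b1, W2, b2) for the two-layer case (distributed
    DeBERTa-v3-large representations) or (W, b) for the single-layer case
    (handcrafted prompt-independent features).
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    if use_two_layers:
        W1, b1, W2, b2 = weights
        h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer (assumed)
        return sigmoid(h @ W2 + b2)       # value score in (0, 1)
    W, b = weights                        # single-layer perceptron
    return sigmoid(x @ W + b)
```

With zero-initialized weights both variants output 0.5, i.e. an uninformative value score before training.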
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://www.kaggle.com/c/asap-aes</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Analyzing rater severity in a freshman composition course using many facet Rasch measurement</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">D</forename><surname>Erguvan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Aksu</forename><surname>Dunya</surname></persName>
		</author>
		<idno type="DOI">10.1186/s40468-020-0098-3</idno>
		<ptr target="https://doi.org/10.1186/s40468-020-0098-3" />
	</analytic>
	<monogr>
		<title level="j">Language Testing in Asia</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="1" to="20" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A generalized many-facet Rasch model and its Bayesian estimation using Hamiltonian Monte Carlo</title>
		<author>
			<persName><forename type="first">M</forename><surname>Uto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ueno</surname></persName>
		</author>
		<idno type="DOI">10.1007/s41237-020-00115-7</idno>
		<ptr target="https://doi.org/10.1007/s41237-020-00115-7" />
	</analytic>
	<monogr>
		<title level="j">Behaviormetrika</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="469" to="496" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A neural approach to automated essay scoring</title>
		<author>
			<persName><forename type="first">K</forename><surname>Taghipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Ng</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D16-1193</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1882" to="1891" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Automated essay scoring with e-rater® v.2</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Attali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</author>
		<idno type="DOI">10.1002/j.2333-8504.2004.tb01972.x</idno>
		<ptr target="https://doi.org/10.1002/j.2333-8504.2004.tb01972.x" />
	</analytic>
	<monogr>
		<title level="j">The Journal of Technology, Learning and Assessment</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Automated essay scoring by maximizing human-machine agreement</title>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1741" to="1752" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Flexible domain adaptation for automated essay scoring using correlated linear regression</title>
		<author>
			<persName><forename type="first">P</forename><surname>Phandi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M A</forename><surname>Chai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Ng</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D15-1049</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="431" to="439" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">ReaderBench learns Dutch: Building a comprehensive automated essay scoring system for Dutch language</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dascalu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Westera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ruseti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Trausan-Matu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kurvers</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-61425-0_5</idno>
	</analytic>
	<monogr>
		<title level="m">International Conference on Artificial Intelligence in Education</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="52" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Active learning for improving machine learning of student explanatory essays</title>
		<author>
			<persName><forename type="first">P</forename><surname>Hastings</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hughes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Britt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-93843-1_11</idno>
	</analytic>
	<monogr>
		<title level="m">International Conference on Artificial Intelligence in Education</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="140" to="153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Prediction of writing true scores in automated scoring of essays by best linear predictors and penalized best linear predictors</title>
		<author>
			<persName><forename type="first">L</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Haberman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1002/ets2.12248</idno>
		<ptr target="https://doi.org/10.1002/ets2.12248" />
	</analytic>
	<monogr>
		<title level="j">ETS Research Report Series</title>
		<imprint>
			<biblScope unit="page" from="1" to="27" />
			<date type="published" when="2019">2019. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A review of deep-neural automated essay scoring models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Uto</surname></persName>
		</author>
		<idno type="DOI">10.1007/s41237-021-00142-y</idno>
	</analytic>
	<monogr>
		<title level="j">Behaviormetrika</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="1" to="26" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Prompt agnostic essay scorer: A domain generalization approach to cross-prompt automated essay scoring</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ridley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2008.01441</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Automatic text scoring using neural networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Alikaniotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rei</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P16-1068</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 54th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="715" to="725" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Automatic features for essay scoring-an empirical study</title>
		<author>
			<persName><forename type="first">F</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D16-1115</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2016 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1072" to="1077" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Attention-based recurrent convolutional neural network for automatic essay scoring</title>
		<author>
			<persName><forename type="first">F</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/K17-1017</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st Conference on Computational Natural Language Learning</title>
				<meeting>the 21st Conference on Computational Natural Language Learning</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="153" to="162" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">SkipFlow: Incorporating neural coherence features for end-to-end automatic text scoring</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Tay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Phan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Tuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Hui</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v32i1.12045</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">32</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Neural automated essay scoring and coherence modeling for adversarially crafted input</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Farag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yannakoudakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Briscoe</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N18-1024</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="263" to="271" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">TDNN: A two-stage deep neural network for prompt-independent automated essay scoring</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P18-1100</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 56th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1088" to="1097" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Language models and automated essay scoring</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">U</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jafari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Ormerod</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.09482</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Neural automated essay scoring incorporating handcrafted features</title>
		<author>
			<persName><forename type="first">M</forename><surname>Uto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ueno</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.coling-main.535</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on Computational Linguistics</title>
				<meeting>the 28th International Conference on Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6077" to="6088" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Learning automated essay scoring models using item-response-theory-based scores to decrease effects of rater biases</title>
		<author>
			<persName><forename type="first">M</forename><surname>Uto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Okano</surname></persName>
		</author>
		<idno type="DOI">10.1109/TLT.2022.3145352</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Learning Technologies</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="763" to="776" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Analytic automated essay scoring based on deep neural networks integrating multidimensional item response theory</title>
		<author>
			<persName><forename type="first">T</forename><surname>Shibata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Uto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics</title>
				<meeting>the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="2917" to="2926" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">PMAES: Prompt-mapping contrastive learning for cross-prompt automated essay scoring</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-long.83</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 61st Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1489" to="1503" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">SEDNN: Shared and enhanced deep neural network model for cross-prompt automated essay scoring</title>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Nie</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.knosys.2020.106491</idno>
		<ptr target="https://doi.org/10.1016/j.knosys.2020.106491" />
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">210</biblScope>
			<biblScope unit="page">106491</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Enhancing automated essay scoring performance via fine-tuning pre-trained language models with combination of regression and ranking</title>
		<author>
			<persName><forename type="first">R</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.141</idno>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1560" to="1569" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Applying large language models and chain-of-thought for automatic scoring</title>
		<author>
			<persName><forename type="first">G.-G</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Latif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhai</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.caeai.2024.100213</idno>
		<ptr target="https://doi.org/10.1016/j.caeai.2024.100213" />
	</analytic>
	<monogr>
		<title level="j">Computers and Education: Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">100213</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Stahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Biermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nehring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wachsmuth</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2404.15845</idno>
		<title level="m">Exploring LLM prompting strategies for joint essay scoring and feedback generation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Domain-adaptive neural automated essay scoring</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<idno type="DOI">10.1145/3397271.3401037</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1011" to="1020" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Improving domain generalization for prompt-aware essay scoring via disentangled representation learning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Gu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-long.696</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="12456" to="12470" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Domain-adversarial training of neural networks</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Ganin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ustinova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ajakan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Germain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Laviolette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marchand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lempitsky</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-58347-1_10</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>Springer International Publishing</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Domain adaptation for large-scale sentiment classification: A deep learning approach</title>
		<author>
			<persName><forename type="first">X</forename><surname>Glorot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on International Conference on Machine Learning</title>
		<meeting>the 28th International Conference on International Conference on Machine Learning</meeting>
		<imprint>
			<publisher>Omnipress</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="513" to="520" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Data valuation using reinforcement learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yoon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">O</forename><surname>Arik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pfister</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th International Conference on Machine Learning</title>
		<meeting>the 37th International Conference on Machine Learning</meeting>
		<imprint>
			<publisher>JMLR</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Albert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Almahairi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Babaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bashlykov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Batra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhargava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhosale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bikel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Blecher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Ferrer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cucurull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Esiobu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fernandes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Fuller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Goswami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hartshorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hosseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Inan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kardas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kerkez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khabsa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kloumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korenev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Koura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liskovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mihaylov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Molybog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poulton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Reizenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rungta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Saladi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Schelten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">E</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">X</forename><surname>Kuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Zarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kambadur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Stojnic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Edunov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Scialom</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09288</idno>
		<title level="m">Llama 2: Open foundation and fine-tuned chat models</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Optimizing search engines using clickthrough data</title>
		<author>
			<persName><forename type="first">T</forename><surname>Joachims</surname></persName>
		</author>
		<idno type="DOI">10.1145/775047.775067</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
		<meeting>the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="133" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Data shapley: Equitable valuation of data for machine learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ghorbani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
		<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2242" to="2251" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Data Banzhaf: A robust data valuation framework for machine learning</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Artificial Intelligence and Statistics</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.06431</idno>
		<title level="m">ChoiceNet: Robust learning by revealing output correlations</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Learning to reweight examples for robust deep learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Urtasun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
		<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="4334" to="4343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.03654</idno>
		<title level="m">DeBERTa: Decoding-enhanced BERT with disentangled attention</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2111.09543</idno>
		<title level="m">DeBERTaV3: Improving DeBERTa using ELECTRA-style pretraining with gradient-disentangled embedding sharing</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Simple statistical gradient-following algorithms for connectionist reinforcement learning</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Williams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="229" to="256" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
