<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">DeBERTa-v3 with R-Drop regularization for Multi-Author Writing Style Analysis Notebook for the PAN Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zhijian</forename><surname>Huang</surname></persName>
							<email>huangzhijian1024@163.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Foshan University</orgName>
								<address>
									<settlement>Foshan</settlement>
									<region>Guangdong</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Leilei</forename><surname>Kong</surname></persName>
							<email>kongleilei@fosu.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="institution">Foshan University</orgName>
								<address>
									<settlement>Foshan</settlement>
									<region>Guangdong</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">DeBERTa-v3 with R-Drop regularization for Multi-Author Writing Style Analysis Notebook for the PAN Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">9006DEE04B9712BD123A74B9227E0CCD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Multi-Author Writing Style Analysis</term>
					<term>DeBERTa-v3</term>
					<term>R-Drop</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Multi-Author Writing Style Analysis task aims to identify points within a multi-author document where the author changes, using variations in writing style as indicators. Existing approaches face challenges in achieving high robustness due to the complexity of distinguishing between different authors' styles. To address these challenges, we use a model based on base version of the DeBERTa-v3 model combined with R-Drop regularization. We trained the DeBERTa-v3 model independently on three different datasets representing varying difficulty levels, using R-Drop during training to enhance the model's performance by reducing uncertainty and improving generalization. In experiments, our method achieves F1 scores of 0.985, 0.815, and 0.826 on Task 1, Task 2, and Task 3 of the official test set for the PAN 2024 Multi-Author Writing Style Analysis, respectively.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Multi-Author Writing Style Analysis aims to identify points within a multi-author document where authorship changes occur. This task is based on the hypothesis that variations in writing style can serve as indicators of changes in authorship. PAN's evaluation focuses on distinguishing authorship changes at the paragraph level under varying conditions of topical similarity <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>.</p><p>Various methods have been proposed to tackle this task, ranging from traditional machine learning algorithms to advanced deep learning models. Earlier approaches predominantly relied on features extracted from the text, such as lexical and syntactic markers, to differentiate between authors. However, these methods often fall short in scenarios where stylistic differences are minute. More recent approaches employ pre-trained language models <ref type="bibr" target="#b2">[3]</ref> like BERT <ref type="bibr" target="#b3">[4]</ref> and its variants, which have shown promise in capturing deeper contextual information.</p><p>To further enhance the robustness of pre-trained language model approaches, we use the advanced pre-trained language model DeBERTa(Decoding-enhanced BERT with Disentangled Attention)-v3 <ref type="bibr" target="#b4">[5]</ref> and combine it with the R-Drop regularization <ref type="bibr" target="#b5">[6]</ref>. The DeBERTa-v3 model, known for its ability to capture complex language structures and patterns through its decoding-enhanced attention mechanism, serves as the foundation. By incorporating R-Drop, which introduces dual regularization during training, we enhance the model's performance by reducing uncertainty and improving generalization, thereby effectively mitigating overfitting and maintaining model stability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head><p>Pre-trained language models have revolutionized the field of natural language processing (NLP), demonstrating significant improvements across various tasks, including multi-author writing style analysis. Models such as BERT <ref type="bibr" target="#b3">[4]</ref>, RoBERTa <ref type="bibr" target="#b6">[7]</ref> and DeBERTa <ref type="bibr" target="#b4">[5]</ref> have set new benchmarks by leveraging large-scale unsupervised pre-training followed by fine-tuning on specific tasks. These models capture contextual information bidirectionally, making them highly effective for tasks requiring nuanced understanding of both writing styles and semantic content.</p><p>In multi-author writing style analysis, pre-trained language models are utilized to encode textual features that are indicative of different authors' styles. For instance, Chen et al. <ref type="bibr" target="#b7">[8]</ref> demonstrated the use of a pre-trained language model for generating sentence embeddings optimized through contrastive learning for detecting writing style changes in multi-author documents. Similarly, Huang et al. <ref type="bibr" target="#b8">[9]</ref> proposed an encoded classifier using knowledge distillation, leveraging a large pre-trained model as the teacher to train a smaller student model for style change detection.</p><p>Regularization techniques are critical in enhancing the generalization capabilities of neural networks by preventing overfitting, especially on small datasets. Traditional methods such as L2 regularization, dropout, and early stopping have been widely used to improve model robustness. Dropout, in particular, randomly sets a fraction of the input units to zero during training, which helps in preventing co-adaptation of hidden units.</p><p>Building on the concept of dropout, R-Drop (Regularized Dropout) is a more recent technique proposed by Liang et al. <ref type="bibr" target="#b5">[6]</ref>. R-Drop enhances regularization by applying dropout twice during training and minimizing the divergence between the two forward passes. This approach encourages the model to produce consistent outputs despite the dropout noise, thereby learning more robust representations.</p><p>In the context of multi-author writing style analysis, R-Drop can be particularly beneficial when combined with pre-trained language models. The regularization helps mitigate overfitting on training datasets, which is often a challenge in style change detection tasks. By ensuring that the model maintains consistent outputs despite the dropout noise, R-Drop enhances the robustness of the learned writing style representations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head><p>In the Multi-Author Writing Style Analysis task, our goal is to identify points within a document where the author changes, using variations in writing style as indicators. We use a model based on the base version of the DeBERTa-v3 model combined with R-Drop regularization. We trained the base version of the DeBERTa-v3 model independently on three different datasets representing varying difficulty levels, using R-Drop during training to enhance the model's performance by reducing uncertainty and improving generalization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Encoder and Classifier</head><p>We used the base version of the DeBERTa-v3 model as the encoder, which excels in capturing language structures and patterns. DeBERTa utilizes enhanced decoding and disentangled attention mechanisms to better understand contextual information. On top of DeBERTa-v3, we added a binary classification layer to detect style changes between paragraphs. This layer is trained using a loss function that combines cross-entropy loss and KL divergence for enhanced regularization, as described in the R-Drop regularization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">R-Drop regularization</head><p>The R-Drop method (Regularized Dropout) aims to reduce model uncertainty by introducing dual regularization during training. Specifically, for each training batch, we perform two forward and backward passes and calculate the KL divergence between the two forward pass results as a regularization term. The core formula is as follows:</p><formula xml:id="formula_0">𝐿 total = 𝐿 ce + 𝛼𝐿 kl</formula><p>where 𝐿 ce is the cross-entropy loss, 𝐿 kl is the KL divergence between the two forward pass results, and 𝛼 is the weighting parameter.</p><p>Given the input data 𝑥 𝑖 at each training step, we feed 𝑥 𝑖 through the forward pass of the network twice, obtaining two distributions of the model predictions, denoted as 𝑃 𝑤1 (𝑦 𝑖 |𝑥 𝑖 ) and 𝑃 𝑤2 (𝑦 𝑖 |𝑥 𝑖 ). Since the dropout operator randomly drops units in a model, the two forward passes are based on two different sub models. The KL divergence between these two output distributions is then calculated as follows:</p><formula xml:id="formula_1">𝐿 kl = 1 2 (𝐷 KL (𝑃 𝑤1 (𝑦 𝑖 |𝑥 𝑖 )‖𝑃 𝑤2 (𝑦 𝑖 |𝑥 𝑖 )) + 𝐷 KL (𝑃 𝑤2 (𝑦 𝑖 |𝑥 𝑖 )‖𝑃 𝑤1 (𝑦 𝑖 |𝑥 𝑖 )))</formula><p>where 𝐷 KL denotes the Kullback-Leibler divergence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">DeBERTa-v3 with R-Drop regularization</head><p>As Algorithm 1 demonstrates, the training process for DeBERTa-v3 with R-Drop regularization includes data input, model training, and model parameters output, ensuring a comprehensive understanding of the method. We start by loading and preprocessing the data, splitting it into training, validation, and test sets. The preprocessed paragraph pairs are then input into the model. We fine-tune the DeBERTa-v3-base model on the training data using the R-Drop method. This method involves performing two forward and backward passes for each training batch and calculating the KL divergence between the two forward pass results as a regularization term, helping to reduce model uncertainty and improve robustness.</p><p>During training, for each batch, we compute the forward pass twice with dropout, calculate the cross-entropy and KL divergence losses, and update the model parameters. Early stopping <ref type="bibr" target="#b9">[10]</ref> is implemented based on the evaluation set to prevent overfitting, monitoring the validation loss and halting training when it ceases to improve. We assess the model's performance on the validation set using the F1-score to measure its effectiveness. Finally, we evaluate the final model on the test set to measure its overall performance. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Datasets</head><p>The datasets were provided by the PAN (Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection) initiative as part of the PAN 2024 lab at CLEF (Conference and Labs of the Evaluation Forum). The datasets used for this task are derived from Reddit comments, combined into documents that represent different levels of difficulty. These datasets are designed to test the models' ability to detect style changes under various conditions of topical similarity. Each dataset is split into three subsets: training, validation, and test sets, with respective proportions of 70%, 15%, and 15%. The difficulty levels of the task are as follows:</p><p>• Easy: This dataset consists of documents where the paragraphs cover a wide variety of topics.</p><p>The diverse topics make it easier for models to leverage topic changes as signals for detecting authorship changes. • Medium: The documents in this dataset have a limited number of topics, requiring models to focus more on subtle changes in writing style rather than topic shifts to detect changes in authorship. • Hard: All paragraphs in the documents within this dataset are on the same topic. This scenario poses the greatest challenge as the models must rely entirely on stylistic differences to identify authorship changes</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experimental Setup</head><p>For our experiments, our preprocessing of the data involves several key steps. First, the dataset is loaded, and each document is read. The documents are then split into natural paragraphs. Following this, we generate pairs of consecutive paragraphs and label each pair to indicate whether there is a style change between them. This labeling transforms the task into a binary classification problem. Each labeled pair of paragraphs is then used as an input for the training of our model. These steps result in the creation of features and labels for the training, validation, and test sets, which are used for model training and evaluation.</p><p>We utilized the DeBERTa-v3-base(the base version of the DeBERTa-v3 mode) and DeBERTa-v3-base+R-Drop models. The DeBERTa-v3-base model served as our baseline, whereas the DeBERTa-v3-base+R-Drop model incorporated the R-Drop regularization technique to enhance performance by reducing variance between different forward passes. For all three datasets, the R-Drop hyperparameters were set as follows: kl_alpha, was set to 5 based on the experimental results from the original paper <ref type="bibr" target="#b5">[6]</ref>. The Dropout rate was set to 0.1, which is the default value for DeBERTa-v3-Base.</p><p>The DeBERTa-v3-base+R-Drop model was fine-tuned on the training sets using the Adam optimizer <ref type="bibr" target="#b10">[11]</ref> with a learning rate of 1 × 10 −5 . The batch size was set to 16, and the models were trained for 10 epochs. To prevent overfitting, early stopping was implemented based on the validation loss. Additionally, we included results from two simple baselines: one where the prediction is always 1 and another where the prediction is always 0 <ref type="bibr" target="#b11">[12]</ref>, and compared these to the outcomes from our models on the test dataset.</p><p>Evaluation of the models was conducted using the F1-score on both the validation and test sets. The F1-score was chosen as the primary metric due to its balance between precision and recall, which is crucial for accurately detecting changes in writing style.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Results</head><p>In Table <ref type="table">1</ref>, we report the F1 scores on the validation set for the multi-author writing style analysis task. We present the scores for the validation and test sets, comparing the performance of DeBERTa-v3-base and DeBERTa-v3-base+R-Drop.The results show that the DeBERTa-v3-base+R-Drop model generally outperforms the baseline DeBERTa-v3-base model across all difficulty levels on the validation set. However, this improvement is more noticeable in the Easy category compared to the Medium and Hard categories.</p><p>The Easy category shows a significant performance boost, which might be attributed to the R-Drop regularization reducing overfitting and enhancing model stability. On the other hand, the performance gains in the Medium and Hard categories are relatively modest. This could be due to the increased complexity and reduced topical variety in these categories, which pose a greater challenge for the model. The smaller performance gains suggest that while R-Drop helps, it might not fully address the intricacies involved in these more difficult tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>F1 scores on validation set for multi-author writing style analysis task using DeBERTa-v3 and DeBERTa-v3 + R-Drop. The tasks included Task 1 (easy dataset), Task 2 (medium dataset), and Task 3 (hard dataset). In Table <ref type="table">2</ref>, we report the F1 scores on the test set for the multi-author writing style analysis task. The DeBERTa-v3-base+R-Drop model maintains its performance, but with noticeable variability in the Medium and Hard categories. This variability indicates that the model's generalization capability, while improved by R-Drop, still faces challenges with more complex and less distinct style changes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Approach</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>F1 scores on test set for multi-author writing style analysis task using DeBERTa-v3 and two sample baselines. The tasks included Task 1 (easy dataset), Task 2 (medium dataset), and Task 3 (hard dataset).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Approach</head><p>Task 1 Task 2 </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>Our study shows that combining the base version of the DeBERTa-v3 model with R-Drop regularization significantly improves the accuracy of detecting authorship changes across documents of varying difficulty. We trained the model on datasets with different levels of topic diversity, showing marked improvements particularly in documents with diverse topics. However, in documents with limited topical diversity, performance gains are modest, indicating the need for further refinement. Future work should focus on enhancing model performance in more complex scenarios and exploring complementary methods to address the identified challenges.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Algorithm 1 6 : 9 : 10 :</head><label>16910</label><figDesc>Training Process for DeBERTa-v3 with R-Drop regularization 1: Input: Number of training epochs 𝑒𝑝𝑜𝑐ℎ𝑠, Training data loader 𝑡𝑟𝑎𝑖𝑛_𝑙𝑜𝑎𝑑𝑒𝑟, Validation data loader 𝑣𝑎𝑙_𝑙𝑜𝑎𝑑𝑒𝑟, Loss function 𝑙𝑜𝑠𝑠_𝑓 𝑐𝑡, Weighting parameter 𝛼, Optimizer 𝑜𝑝𝑡𝑖𝑚𝑖𝑧𝑒𝑟, Evaluation step 𝑒𝑣𝑎𝑙_𝑠𝑡𝑒𝑝 2: Output: Trained model parameters 𝜃 3: Initialize model parameters 𝜃 with initial values 4: for epoch 𝑒 in range 𝑒𝑝𝑜𝑐ℎ𝑠 do 5: for each batch (𝑥, 𝑦) in 𝑡𝑟𝑎𝑖𝑛_𝑙𝑜𝑎𝑑𝑒𝑟 do Set model to training mode 7: Compute model output 𝑜𝑢𝑡𝑝𝑢𝑡1 for the current batch 𝑥 8: Compute model output 𝑜𝑢𝑡𝑝𝑢𝑡2 for the current batch 𝑥 Calculate the cross-entropy loss 𝐿 𝑐𝑒 = 0.5 • (𝑙𝑜𝑠𝑠_𝑓 𝑐𝑡(𝑜𝑢𝑡𝑝𝑢𝑡1, 𝑦) + 𝑙𝑜𝑠𝑠_𝑓 𝑐𝑡(𝑜𝑢𝑡𝑝𝑢𝑡2, 𝑦)) Compute the KL divergence loss 𝐿 𝑘𝑙 = 𝑐𝑜𝑚𝑝𝑢𝑡𝑒_𝑘𝑙_𝑙𝑜𝑠𝑠(𝑜𝑢𝑡𝑝𝑢𝑡1, 𝑜𝑢𝑡𝑝𝑢𝑡2) 11: Combine the losses: 𝐿 𝑡𝑜𝑡𝑎𝑙 = 𝐿 𝑐𝑒 + 𝛼 • 𝐿 𝑘𝑙 12: Perform backpropagation and update model parameters using 𝑜𝑝𝑡𝑖𝑚𝑖𝑧𝑒𝑟 13: if current step % 𝑒𝑣𝑎𝑙_𝑠𝑡𝑒𝑝 == 0 then Trained model parameters 𝜃</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work is supported by the National Social Science Foundation of China (22BTQ101)</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the Multi-Author Writing Style Analysis Task at PAN 2024</title>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<ptr target=".org" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 -Conference and Labs of the Evaluation Forum</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>Herrera</surname></persName>
		</editor>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Deep contextualized word representations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/N18-1202</idno>
		<ptr target="https://doi.org/10.18653/v1/n18-1202.doi:10.18653/V1/N18-1202" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Walker</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Ji</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Stent</surname></persName>
		</editor>
		<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018<address><addrLine>New Orleans, Louisiana, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">June 1-6, 2018. 2018</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2227" to="2237" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017</title>
				<editor>
			<persName><forename type="first">I</forename><surname>Guyon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Luxburg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">V N</forename><surname>Vishwanathan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<meeting><address><addrLine>Long Beach, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">December 4-9, 2017. 2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">BERT: pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/N19-1423</idno>
		<ptr target="https://doi.org/10.18653/v1/n19-1423.doi:10.18653/V1/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019<address><addrLine>Minneapolis, MN, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">June 2-7, 2019. 2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Deberta: decoding-enhanced bert with disentangled attention</title>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=XPZIaotutsD" />
	</analytic>
	<monogr>
		<title level="m">9th International Conference on Learning Representations, ICLR 2021, Virtual Event</title>
				<meeting><address><addrLine>, Austria</addrLine></address></meeting>
		<imprint>
			<publisher>OpenReview</publisher>
			<date type="published" when="2021">May 3-7, 2021. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">R-drop: Regularized dropout for neural networks</title>
		<author>
			<persName><forename type="first">X</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2021/hash/5a66b9200f29ac3fa0ae244cc2a51b39-Abstract.html" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Beygelzimer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Vaughan</surname></persName>
		</editor>
		<meeting><address><addrLine>NeurIPS; virtual</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-12-06">2021. December 6-14, 2021. 2021</date>
			<biblScope unit="page" from="10890" to="10905" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno>volume abs/1907.11692</idno>
		<ptr target="http://arxiv.org/abs/1907.11692.arXiv:1907.11692" />
		<title level="m">Roberta: A robustly optimized BERT pretraining approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A writing style embedding based on contrastive learning for multi-author writing style analysis</title>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Han</surname></persName>
		</author>
		<idno>WS.org</idno>
		<ptr target="https://ceur-ws.org/Vol-3497/paper-206.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Andf Guglielmo Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Vlachos</surname></persName>
		</editor>
		<meeting><address><addrLine>Thessaloniki, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">September 18th to 21st, 2023. 2023</date>
			<biblScope unit="volume">3497</biblScope>
			<biblScope unit="page" from="2562" to="2567" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Encoded classifier using knowledge distillation for multi-author writing style analysis</title>
		<author>
			<persName><forename type="first">M</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kong</surname></persName>
		</author>
		<idno>WS.org</idno>
		<ptr target="https://ceur-ws.org/Vol-3497/paper-214.pdf" />
	</analytic>
	<monogr>
		<title level="m">Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Aliannejadi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Vlachos</surname></persName>
		</editor>
		<meeting><address><addrLine>Thessaloniki, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">September 18th to 21st, 2023. 2023</date>
			<biblScope unit="volume">3497</biblScope>
			<biblScope unit="page" from="2629" to="2634" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping</title>
		<author>
			<persName><forename type="first">R</forename><surname>Caruana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lawrence</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Giles</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2000/hash/059fdcd96baeb75112f09fa1dcc740cc-Abstract.html" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000</title>
				<editor>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Leen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><forename type="middle">G</forename><surname>Dietterich</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Tresp</surname></persName>
		</editor>
		<meeting><address><addrLine>Denver, CO, USA</addrLine></address></meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="402" to="408" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1412.6980" />
	</analytic>
	<monogr>
		<title level="m">3rd International Conference on Learning Representations, ICLR 2015</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</editor>
		<meeting><address><addrLine>San Diego, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">May 7-9, 2015. 2015</date>
		</imprint>
	</monogr>
	<note>Conference Track Proceedings</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Continuous Integration for Reproducible Shared Tasks with TIRA</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kolyada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elstner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Loebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-28241-6_20</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Maistro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Joho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Davis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Gurrin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Caputo</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="236" to="241" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
