<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Continual Transfer Learning With Progress Prompt for Multi-Author Writing Style Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhanhong Ye</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yutong Zhong</string-name>
          <email>yutongz115@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chen Huang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <email>kongleilei@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Zhongnan University of Economics and Law</institution>
          ,
          <addr-line>Wuhan, Hubei</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces a method utilizing forward knowledge transfer in continual learning to address the Multi-Author Writing Style Analysis task at PAN 2024. The motivation is to transfer knowledge of varying difficulty levels to the current training task. Therefore, we employ continual learning with forward knowledge transfer to train the model on task sequences composed of datasets of varying difficulty levels. This approach allows us to gradually transfer knowledge of different difficulties to the current training task. We then evaluate the method on the Multi-Author Writing Style Analysis datasets provided by PAN. Finally, we select the model weights with the best validation-set performance from each sequence. We achieve F1 scores of 0.993, 0.830, and 0.832 on the test sets of the three difficulty levels.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2024</kwd>
        <kwd>Multi-Author Writing Style Analysis 2024</kwd>
        <kwd>continual learning</kwd>
        <kwd>transfer learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Multi-author style identification involves determining whether the writing styles of two authors are
consistent. Specifically, the style change detection task aims to determine whether the writing style
changes between two consecutive paragraphs in a given multi-author document. Multi-author writing
style analysis is extensively applied in plagiarism detection and author identification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Furthermore, style
change detection can aid in uncovering anonymous authorships, verifying claimed authorships, or
developing new technologies for writing support.
      </p>
      <p>
        Recent studies [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] have employed the multi-task learning (MTL) method, which involves solving
multiple tasks jointly. However, one of the biggest challenges in MTL is balancing the convergence
schedule across tasks. Differences in task difficulty can result in faster convergence on some tasks
than on others. As a result, even though the three difficulty-level datasets provided by
PAN [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] can be accessed simultaneously, it is sub-optimal to directly apply the MTL method and
mix the data from the three difficulty levels together [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Progress prompts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] differ from traditional MTL in that they transform multiple tasks into a sequential
learning process. This approach effectively avoids the sub-optimal outcomes often associated with the
simultaneous training of tasks in MTL. Hence, the progress prompt method is a better solution than
simply adding together the losses of all tasks, which is typically sub-optimal
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], especially when each dataset is evaluated independently, rather than evaluating the performance
of all datasets simultaneously.
      </p>
      <p>Progress prompts [<xref ref-type="bibr" rid="ref6">6</xref>] combine prompt tuning [<xref ref-type="bibr" rid="ref7">7</xref>] with continual learning [<xref ref-type="bibr" rid="ref8">8</xref>], retain a learnable soft
prompt [<xref ref-type="bibr" rid="ref9">9</xref>] for each incoming task, and sequentially concatenate it with previously trained soft prompts.
The purpose of this approach is to facilitate forward knowledge transfer [<xref ref-type="bibr" rid="ref10">10</xref>], focusing on learning
multiple tasks sequentially rather than simultaneously.</p>
      <p>In this paper, we leverage the progress prompts method proposed in [<xref ref-type="bibr" rid="ref6">6</xref>] to transfer knowledge
of varying difficulty levels from previous tasks to the current task using learnable soft prompts. Different
from the MTL method, we employ the progress prompts method, which involves training a soft prompt
for the current task and concatenating it with previously trained soft prompts. This allows for the
transfer of knowledge across datasets of varying difficulty levels. Regarding the model architecture,
the model has three parts. The first part consists of the soft prompt parameters, which combine the
parameters of the soft prompt for the current task with those of the soft prompts for previous
tasks. The second part is the deberta-v3-base [<xref ref-type="bibr" rid="ref11">11</xref>] model, which handles the current task. The
third part is the classifier with its classification loss.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Network Architecture</title>
      <p>
        First, let $T_i$ be a dataset, where $i \in \{1, 2, 3\}$. Each $T_i$ defines a binary classification task for a style change.
We convert the easy, medium, and hard difficulty datasets in the Multi-Author Writing Style Analysis
task [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] into binary classification tasks. This means that in any dataset $T_i$, the data input, called $x$,
is a paragraph pair, with an output of 0 or 1: 0 indicates that the paragraph pair has no style
change, while 1 indicates that the paragraph pair has a style change. We then form a sequence of tasks
from the easy, medium, and hard datasets, $(T_1, T_2, T_3)$.
      </p>
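      <p>As a minimal sketch of this conversion, assuming the PAN data layout in which each document is a plain-text file with one paragraph per line and each truth file carries a binary "changes" list for the consecutive paragraph pairs (the file names here are illustrative):</p>
      <preformat>
import json
from pathlib import Path

def document_to_pairs(txt_path: Path, truth_path: Path):
    """Yield ((paragraph_a, paragraph_b), label) for consecutive pairs."""
    paragraphs = txt_path.read_text(encoding="utf-8").splitlines()
    truth = json.loads(truth_path.read_text(encoding="utf-8"))
    # "changes" holds one 0/1 label per consecutive paragraph pair.
    for (a, b), label in zip(zip(paragraphs, paragraphs[1:]),
                             truth["changes"]):
        yield (a, b), label

pairs = list(document_to_pairs(Path("problem-1.txt"),
                               Path("truth-problem-1.json")))
      </preformat>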
      <p>The goal is to utilize the DeBERTa-v3 model to sequentially implement this binary classification for
a style change task on the task sequence. After training on $T_i$, we obtain the model's classification
performance on $T_i$ and then proceed to the next classification task. The core feature of the method
is the progress prompts approach, which involves learning a distinct soft prompt [<xref ref-type="bibr" rid="ref9">9</xref>] $P_i$ for each task
$T_i$, $i \in \{1, 2, 3\}$. Note that the soft prompt $P_i$ has parameters provided by the embedding layer of the
pre-trained language model. In addition, we not only learn a soft prompt $P_i$ for each task $T_i$ but also
concatenate it with all previously trained soft prompts $P_j$, $j &lt; i \le 3$. As shown in
Figure 1, the model consists of an encoder block, a classifier, and the soft prompt parameters. The first part is the encoder
block. We use the DeBERTa-v3 [<xref ref-type="bibr" rid="ref11">11</xref>] model to encode the input, which consists of pairs of paragraphs
from the current difficulty dataset. Next comes the classification part, where we use linear layers as
classifiers to classify the encoded content, making it possible to complete the current downstream
task. Then, concerning the soft prompt parameters, they are initialized using the parameters from the
embedding layer of the DeBERTa-v3 model. The details of the progress prompts are given in Section 2.1. Overall,
the primary loss function $\mathcal{L}$ for training task $T_i$ can be defined as follows.</p>
      <p>$$\mathcal{L} = \mathcal{L}_{CE} \tag{1}$$
The loss $\mathcal{L}_{CE}$ is a cross-entropy loss used to optimize the encoder block, the classifier, and the soft prompt
parameters.</p>
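      <p>To make this three-part design concrete, the following is a minimal PyTorch sketch, not our exact implementation: the public checkpoint name microsoft/deberta-v3-base, the use of the first position for classification, and all variable names are assumptions made for illustration.</p>
      <preformat>
import torch
import torch.nn as nn
from transformers import AutoModel

class ProgressPromptClassifier(nn.Module):
    """Soft prompts per task + DeBERTa-v3 encoder + linear classifier."""

    def __init__(self, prompt_len: int = 10, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("microsoft/deberta-v3-base")
        hidden = self.encoder.config.hidden_size  # 768 for the base model
        self.prompt_len = prompt_len
        self.prompts = nn.ParameterList()         # one soft prompt per task
        self.classifier = nn.Linear(hidden, num_labels)

    def add_task_prompt(self, init_embeds: torch.Tensor):
        # Freeze prompts learned on earlier tasks, then add a trainable one
        # for the current task, initialized from embedding-layer parameters.
        for p in self.prompts:
            p.requires_grad_(False)
        self.prompts.append(nn.Parameter(init_embeds.clone()))

    def forward(self, input_embeds, attention_mask):
        batch = input_embeds.size(0)
        # Concatenate [P_1; ...; P_i] in front of the input embeddings e(x).
        prompt = torch.cat(list(self.prompts), dim=0)
        prompt = prompt.unsqueeze(0).expand(batch, -1, -1)
        embeds = torch.cat([prompt, input_embeds], dim=1)
        prompt_mask = torch.ones(batch, prompt.size(1),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        hidden = self.encoder(inputs_embeds=embeds,
                              attention_mask=mask).last_hidden_state
        return self.classifier(hidden[:, 0])  # logits for the two classes
      </preformat>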
      <sec id="sec-2-1">
        <title>2.1. Progress prompt</title>
        <p>Firstly, PAN has provided three difficulty-level datasets for Multi-Author Writing Style Analysis. Given a
batch named $B$, which comes from the current training task $T_i$, the contents of $B$ can be defined as
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \in B$, where $x$ denotes a paragraph pair and $y$ is the corresponding
label. Then we retain a learnable soft prompt $P_i$ for each $T_i$ and sequentially concatenate it with all
previously learned soft prompts $P_j$, $j &lt; i \le 3$. The soft prompt is obtained through the embedding
layer of the pre-trained language model. Specifically, we select the last $m$ tokens from the pre-trained
language model vocabulary $V$ as pseudo tokens and then pass these pseudo tokens into the embedding
layer of the pre-trained language model to obtain all soft prompts. Then, we combine $P_j$, $P_i$, and $e(x)$,
which is the current input embedding, and send them to the pre-trained model, which consists of
transformer [<xref ref-type="bibr" rid="ref12">12</xref>] blocks, to obtain the corresponding hidden state $\mathcal{H}$. $e(x)$ represents
the input $x$ encoded by the embedding layer of the pre-trained language model.</p>
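        <p>The pseudo-token initialization can be sketched as follows; this is a hedged example using the public microsoft/deberta-v3-base checkpoint, with the prompt length $m$ and the example paragraphs chosen arbitrarily.</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
embedding = model.get_input_embeddings()

m = 10                                          # prompt length per task
vocab_size = embedding.weight.size(0)
pseudo_ids = torch.arange(vocab_size - m, vocab_size)  # last m token ids in V
init_prompt = embedding(pseudo_ids).detach()    # initial P_i, shape (m, hidden)

# e(x): embed a tokenized paragraph pair, then prepend the soft prompt(s).
enc = tokenizer("first paragraph ...", "second paragraph ...",
                return_tensors="pt")
e_x = embedding(enc["input_ids"])               # (1, seq_len, hidden)
inputs = torch.cat([init_prompt.unsqueeze(0), e_x], dim=1)  # [P_i; e(x)]
        </preformat>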
        <p>[Figure 1: Model architecture, showing the input embeddings, the prompts for tasks 1 and 2, the bidirectional attention block, causal masking, and the frozen versus trainable parameters.]</p>
        <sec id="sec-2-1-1">
          <title>Encoder Block</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>Classifier</title>
          <p>After obtaining the hidden state $\mathcal{H}$, we use a classifier to generate the soft labels for each category:
$$s = f(x) = (s_1, s_2) \tag{2}$$
$$s_c = \frac{\exp(g(\mathcal{H})_c)}{\sum_{c'=1}^{C} \exp(g(\mathcal{H})_{c'})}, \quad c \in \{1, 2\} \tag{3}$$
where $f(\cdot)$ is the soft label of sample $x$, $g(\cdot)_c$ indicates the output of the linear layer for category $c$,
and $C$ represents the total number of categories. Then we calculate the cross-entropy loss for the
classification:
$$\mathcal{L} = -\sum_{x, y \in B} \log p(y \mid e(x), \theta, \theta_P^i) \tag{4}$$
where $\theta$ refers to the model parameters of the encoder and classifier, and $\theta_P^i$ denotes the trainable
parameters of the soft prompt for the $i$-th task in the embedding layer. By training with Equation (4),
we obtain the final pre-trained language model $M_i$ for the current task $T_i$. Since the PAN committee
provides three datasets of varying difficulty for the Multi-Author Writing Style Analysis task, these datasets
can be arranged in different combinations to form six task sequences with different orders. We apply
our proposed method to these six task sequences. Then, we select and save the model weights that
achieve the highest performance on the validation set for the easy, medium, and hard datasets from
these six task sequences.</p>
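          <p>A small numeric sketch of Equations (2)-(4), with illustrative shapes: a linear layer $g$ produces one logit per category, a softmax yields the soft labels, and the batch loss is the cross-entropy.</p>
          <preformat>
import torch
import torch.nn.functional as F

H = torch.randn(4, 768)               # hidden states for a batch of 4 pairs
g = torch.nn.Linear(768, 2)           # g(.): one output per category, C = 2
logits = g(H)                         # g(H)_c for c = 1, 2
soft_labels = F.softmax(logits, dim=-1)   # Equations (2)-(3)
y = torch.tensor([0, 1, 1, 0])        # style-change labels for the batch
loss = F.cross_entropy(logits, y)     # Equation (4): -sum log p(y | ...)
          </preformat>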
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <sec id="sec-3-1">
        <title>3.1. Data analysis</title>
        <p>The PAN organizers have provided all data, available in three difficulty levels: easy,
medium, and hard. Each difficulty dataset is divided into a training set, a validation set, and a test
set, with a distribution of 70%, 15%, and 15%, respectively. Statistical analysis
reveals that the token length of most entries is less than 512. We then organize the data according
to the method described in Section 2.1. In addition, when the documents in the datasets of all three
difficulty levels are written by only two authors (as given in the ground truth), it is possible to further
analyze which author wrote each paragraph in a document. Therefore, besides using consecutive
pairs of paragraphs as paragraph pairs, we incorporate additional non-consecutive pairs of paragraphs
into our paragraph-pair set and assign them labels based on the inferred relationships between the
authors. For example, if the same author is believed to have written both paragraphs, we assume that
the style has not changed, and vice versa.</p>
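        <p>A minimal sketch of this augmentation, assuming a per-paragraph author-id list has already been inferred from the two-author ground truth (the function and variable names are illustrative):</p>
        <preformat>
from itertools import combinations

def augment_pairs(paragraphs, authors):
    """Return ((para_a, para_b), label) for non-consecutive pairs.

    Label 0 when both paragraphs share an author (no style change),
    label 1 otherwise.
    """
    pairs = []
    for i, j in combinations(range(len(paragraphs)), 2):
        if j - i == 1:
            continue  # consecutive pairs are already in the base pair set
        pairs.append(((paragraphs[i], paragraphs[j]),
                      0 if authors[i] == authors[j] else 1))
    return pairs
        </preformat>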
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiment setting</title>
        <p>In this work, the DeBERTa-v3 base model is selected for classification. It consists of 12 transformer
encoder layers with a hidden size of 768. The three difficulty datasets are formed into six different task
sequences, as Table 1 depicts. We train the model sequentially on the datasets of varying task difficulty,
following the given sequence. We set the early-stopping patience to 10, the prompt length to 10 tokens for each
difficulty dataset, and the learning rates to 5e-5, 3e-5, and 3e-5 for the three datasets, respectively. All
experiments are conducted on an NVIDIA A800 GPU with 80 GB of memory and a batch size of 64.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>The six task sequences formed from the three difficulty datasets.</p>
          </caption>
          <table>
            <thead>
              <tr><th>order</th><th>Task sequence</th></tr>
            </thead>
            <tbody>
              <tr><td>i</td><td>easy, medium, hard</td></tr>
              <tr><td>ii</td><td>easy, hard, medium</td></tr>
              <tr><td>iii</td><td>medium, easy, hard</td></tr>
              <tr><td>iv</td><td>medium, hard, easy</td></tr>
              <tr><td>v</td><td>hard, easy, medium</td></tr>
              <tr><td>vi</td><td>hard, medium, easy</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results</title>
        <p>We conduct four experiments on the validation datasets: the fine-tuning method with DeBERTa-v3;
the best performance on the validation set; different datasets from all sequences with data augmentation;
and different datasets from all sequences, including partial datasets with or without data
augmentation. The results are presented in Tables 2-5, respectively. We then select the model weights
that achieve the highest F1 scores on the validation sets corresponding to the difficulty levels across all
sequences and submit these to the TIRA platform [<xref ref-type="bibr" rid="ref13">13</xref>]. The final test-set results are presented in Table 6.</p>
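        <p>The selection over the six sequences can be sketched as follows; train_on_task and evaluate_f1 are illustrative stand-ins for the per-task training and validation routines, not functions from our codebase.</p>
        <preformat>
from itertools import permutations

def select_best_weights(train_on_task, evaluate_f1):
    """Track the best validation F1 per difficulty across all 6 sequences."""
    best = {d: (0.0, None) for d in ("easy", "medium", "hard")}
    for sequence in permutations(("easy", "medium", "hard")):
        state = None  # weights and prompts carried along the sequence
        for difficulty in sequence:
            state = train_on_task(difficulty, state)  # adds a new soft prompt
            f1 = evaluate_f1(difficulty, state)
            if f1 > best[difficulty][0]:
                best[difficulty] = (f1, state)
    return best
        </preformat>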
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we have completed the task set by PAN by employing the progress prompt method
to tackle the Multi-Author Writing Style Analysis task. Instead of using traditional multi-task
learning (MTL) techniques, we utilize the progress prompt method to transfer knowledge from datasets of
varying difficulty to the current training dataset. The proposed method achieves F1 scores of 0.993, 0.830,
and 0.832 on the three test datasets. These results validate the effectiveness of our proposed method in
performing the Multi-Author Writing Style Analysis task.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments References</title>
      <p>This research was supported by the Natural Science Foundation of Guangdong Province, China
(No.2022A1515011544)
Analysis, and Generative AI Authorship Verification, in: Experimental IR Meets Multilinguality,
Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the
CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg
New York, 2024.
[6] A. Razdaibiedina, Y. Mao, R. Hou, M. Khabsa, M. Lewis, A. Almahairi, Progressive prompts:</p>
      <p>Continual learning for language models, arXiv preprint arXiv:2301.12314 (2023).
[7] B. Lester, R. Al-Rfou, N. Constant, The power of scale for parameter-eficient prompt tuning, arXiv
preprint arXiv:2104.08691 (2021).
[8] S. Thrun, Lifelong learning algorithms, in: Learning to learn, Springer, 1998, pp. 181–209.
[9] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, J. Tang, Gpt understands, too, AI Open (2023).
[10] Z. Ke, B. Liu, N. Ma, H. Xu, L. Shu, Achieving forgetting prevention and knowledge transfer in
continual learning, Advances in Neural Information Processing Systems 34 (2021) 22443–22456.
[11] P. He, J. Gao, W. Chen, Debertav3: Improving deberta using electra-style pre-training with
gradient-disentangled embedding sharing, arXiv preprint arXiv:2111.09543 (2021).
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,</p>
      <p>Attention is all you need, Advances in neural information processing systems 30 (2017).
[13] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/
978-3-031-28241-6_20.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qi</surname>
          </string-name>
          , Y. Han,
          <article-title>Supervised contrastive learning for multi-author writing style analysis</article-title>
          ,
          <source>in: Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajagopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nigam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zeng</surname>
          </string-name>
          , T. Chilimbi,
          <article-title>Asynchronous convergence in multi-task learning via knowledge distillation from converged tasks</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hashemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Enhancing writing style change detection using transformer-based models and data augmentation</article-title>
          , Working Notes of CLEF (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Multi-Author Writing Style Analysis Task at PAN 2024</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          ,
          <source>in: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. Razdaibiedina, Y. Mao, R. Hou, M. Khabsa, M. Lewis, A. Almahairi, Progressive prompts: Continual learning for language models, arXiv preprint arXiv:2301.12314 (2023).</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Lester, R. Al-Rfou, N. Constant, The power of scale for parameter-efficient prompt tuning, arXiv preprint arXiv:2104.08691 (2021).</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Thrun, Lifelong learning algorithms, in: Learning to Learn, Springer, 1998, pp. 181-209.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, J. Tang, GPT understands, too, AI Open (2023).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Z. Ke, B. Liu, N. Ma, H. Xu, L. Shu, Achieving forgetting prevention and knowledge transfer in continual learning, Advances in Neural Information Processing Systems 34 (2021) 22443-22456.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing, arXiv preprint arXiv:2111.09543 (2021).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous integration for reproducible shared tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>