<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Continual Transfer Learning With Progress Prompt for Multi-Author Writing Style Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhanhong Ye</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yutong Zhong</string-name>
          <email>yutongz115@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chen Huang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <email>kongleilei@fosu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Zhongnan University of Economics and Law</institution>
          ,
          <addr-line>Wuhan, Hubei</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces a method utilizing forward knowledge transfer in continual learning to address the Multi-Author Writing Style Analysis task at PAN 2024. The motivation is to transfer knowledge of varying difficulty levels to the current training task. Therefore, we employ continual learning with forward knowledge transfer to train the model on task sequences composed of datasets of varying difficulty levels. This approach allows us to gradually transfer knowledge of different difficulties to the current training task. We then evaluate the method on the Multi-Author Writing Style Analysis datasets provided by PAN. Finally, we select the model weights with the best validation-set performance from each sequence. We achieve F1 scores of 0.993, 0.830, and 0.832 on the test sets of the three difficulty levels.</p>
      </abstract>
      <kwd-group>
        <kwd>PAN 2024</kwd>
        <kwd>Multi-Author Writing Style Analysis 2024</kwd>
        <kwd>continual learning</kwd>
        <kwd>transfer learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Multi-author style identification involves determining whether the writing styles of two authors are
consistent. Specifically, the style change detection task aims to determine whether the writing style
changes between two consecutive paragraphs in a given multi-author document. Multi-author writing
style analysis is extensively applied in plagiarism detection and author identification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Furthermore, style
change detection can aid in uncovering anonymous authorships, verifying claimed authorships, or
developing new technologies for writing support.
      </p>
      <p>
        Recent studies [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] have employed the multi-task learning (MTL) method, which involves solving
multiple tasks jointly. However, one of the biggest challenges in MTL is balancing the convergence
schedule across tasks. Differences in task difficulty can result in faster convergence on some tasks
than on others. As a result, even though the three difficulty-level datasets provided by
PAN [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] can be accessed simultaneously, it is sub-optimal to directly apply the MTL method and
mix the data from the three difficulty levels together [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Progress prompts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] differ from traditional MTL in that they transform multiple tasks into a sequential
learning process. This approach effectively avoids the sub-optimal outcomes often associated with the
simultaneous training of tasks in MTL. Hence, the progress prompt method is a better solution than
simply adding together the losses of all tasks, which is typically sub-optimal
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], especially when each dataset is evaluated independently, rather than evaluating the performance
of all datasets simultaneously.
      </p>
      <p>Progress prompts [<xref ref-type="bibr" rid="ref6">6</xref>] combine prompt tuning [<xref ref-type="bibr" rid="ref7">7</xref>] with continual learning [<xref ref-type="bibr" rid="ref8">8</xref>], retain a learnable soft
prompt [<xref ref-type="bibr" rid="ref9">9</xref>] for each incoming task, and sequentially concatenate it with previously trained soft prompts.
The purpose of this approach is to facilitate forward knowledge transfer [<xref ref-type="bibr" rid="ref10">10</xref>], focusing on learning
multiple tasks sequentially rather than simultaneously.</p>
      <p>In this paper, we leverage the progress prompts method proposed in [<xref ref-type="bibr" rid="ref6">6</xref>] to transfer knowledge
of varying difficulty levels from previous tasks to the current task using learnable soft prompts. Different
from the MTL method, we employ the progress prompts method, which involves training a soft prompt
for the current task and concatenating it with previously trained soft prompts. This allows for the
transfer of knowledge across datasets of varying difficulty levels. Regarding the model architecture,
the model has three parts. The first part consists of the soft prompt parameters, which combine the
parameters of the soft prompt for the current task with those of the soft prompts for previous
tasks. The second part is the deberta-v3-base [<xref ref-type="bibr" rid="ref11">11</xref>] model, which handles the current task. The
third part is the classifier with its classification loss.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Network Architecture</title>
      <p>
        First, let $T_i$ be a dataset, where $i \in \{1, 2, 3\}$. Each $T_i$ defines a binary classification task for a style change.
We convert the easy, medium, and hard difficulty datasets in the Multi-Author Writing Style Analysis
task [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] into binary classification tasks. This means that in any dataset $T_i$, the data input, called $x$,
is a paragraph pair, with an output of 0 or 1: 0 indicates that the paragraph pair has no style
change, while 1 indicates that the paragraph pair has a style change. We then form a sequence of tasks
from the easy, medium, and hard datasets, $(T_1, T_2, T_3)$.
      </p>
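      <p>As a minimal sketch of this conversion, assuming the PAN data layout in which each document is a plain-text file with one paragraph per line and each truth file carries a binary "changes" list for the consecutive paragraph pairs (the file names here are illustrative):</p>
      <preformat>
import json
from pathlib import Path

def document_to_pairs(txt_path: Path, truth_path: Path):
    """Yield ((paragraph_a, paragraph_b), label) for consecutive pairs."""
    paragraphs = txt_path.read_text(encoding="utf-8").splitlines()
    truth = json.loads(truth_path.read_text(encoding="utf-8"))
    # "changes" holds one 0/1 label per consecutive paragraph pair.
    for (a, b), label in zip(zip(paragraphs, paragraphs[1:]),
                             truth["changes"]):
        yield (a, b), label

pairs = list(document_to_pairs(Path("problem-1.txt"),
                               Path("truth-problem-1.json")))
      </preformat>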
      <p>The goal is to utilize the DeBERTa-v3 model to sequentially implement this binary classification for
a style change task on the task sequence. After training on $T_i$, we obtain the model's classification
performance on $T_i$ and then proceed to the next classification task. The core feature of the method
is the progress prompts approach, which involves learning a distinct soft prompt [<xref ref-type="bibr" rid="ref9">9</xref>] $P_i$ for each task
$T_i$, $i \in \{1, 2, 3\}$. Note that the soft prompt $P_i$ has parameters provided by the embedding layer of the
pre-trained language model. In addition, we not only learn a soft prompt $P_i$ for each task $T_i$ but also
concatenate it with all previously trained soft prompts $P_j$, $j &lt; i \le 3$. As shown in
Figure 1, the model consists of an encoder block, a classifier, and the soft prompt parameters. The first part is the encoder
block. We use the DeBERTa-v3 [<xref ref-type="bibr" rid="ref11">11</xref>] model to encode the input, which consists of pairs of paragraphs
from the current difficulty dataset. Next comes the classification part, where we use linear layers as
classifiers to classify the encoded content, making it possible to complete the current downstream
task. Then, concerning the soft prompt parameters, they are initialized using the parameters from the
embedding layer of the DeBERTa-v3 model. The details of the progress prompts are given in Section 2.1. Overall,
the primary loss function $\mathcal{L}$ for training task $T_i$ can be defined as follows.</p>
      <p>$$\mathcal{L} = \mathcal{L}_{CE} \tag{1}$$
The loss $\mathcal{L}_{CE}$ is a cross-entropy loss used to optimize the encoder block, the classifier, and the soft prompt
parameters.</p>
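      <p>To make this three-part design concrete, the following is a minimal PyTorch sketch, not our exact implementation: the public checkpoint name microsoft/deberta-v3-base, the use of the first position for classification, and all variable names are assumptions made for illustration.</p>
      <preformat>
import torch
import torch.nn as nn
from transformers import AutoModel

class ProgressPromptClassifier(nn.Module):
    """Soft prompts per task + DeBERTa-v3 encoder + linear classifier."""

    def __init__(self, prompt_len: int = 10, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("microsoft/deberta-v3-base")
        hidden = self.encoder.config.hidden_size  # 768 for the base model
        self.prompt_len = prompt_len
        self.prompts = nn.ParameterList()         # one soft prompt per task
        self.classifier = nn.Linear(hidden, num_labels)

    def add_task_prompt(self, init_embeds: torch.Tensor):
        # Freeze prompts learned on earlier tasks, then add a trainable one
        # for the current task, initialized from embedding-layer parameters.
        for p in self.prompts:
            p.requires_grad_(False)
        self.prompts.append(nn.Parameter(init_embeds.clone()))

    def forward(self, input_embeds, attention_mask):
        batch = input_embeds.size(0)
        # Concatenate [P_1; ...; P_i] in front of the input embeddings e(x).
        prompt = torch.cat(list(self.prompts), dim=0)
        prompt = prompt.unsqueeze(0).expand(batch, -1, -1)
        embeds = torch.cat([prompt, input_embeds], dim=1)
        prompt_mask = torch.ones(batch, prompt.size(1),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        hidden = self.encoder(inputs_embeds=embeds,
                              attention_mask=mask).last_hidden_state
        return self.classifier(hidden[:, 0])  # logits for the two classes
      </preformat>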
      <sec id="sec-2-1">
        <title>2.1. Progress prompt</title>
        <p>Firstly, PAN has provided three difficulty-level datasets for Multi-Author Writing Style Analysis. Given a
batch named $B$, which comes from the current training task $T_i$, the contents of $B$ can be defined as
$\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \in B$, where $x$ denotes a paragraph pair and $y$ is the corresponding
label. Then we retain a learnable soft prompt $P_i$ for each $T_i$ and sequentially concatenate it with all
previously learned soft prompts $P_j$, $j &lt; i \le 3$. The soft prompt is obtained through the embedding
layer of the pre-trained language model. Specifically, we select the last $m$ tokens from the pre-trained
language model vocabulary $V$ as pseudo tokens and then pass these pseudo tokens into the embedding
layer of the pre-trained language model to obtain all soft prompts. Then, we combine $P_j$, $P_i$, and $e(x)$,
which is the current input embedding, and send them to the pre-trained model, which consists of
transformer [<xref ref-type="bibr" rid="ref12">12</xref>] blocks, to obtain the corresponding hidden state $\mathcal{H}$. $e(x)$ represents
the input $x$ encoded by the embedding layer of the pre-trained language model.</p>
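        <p>The pseudo-token initialization can be sketched as follows; this is a hedged example using the public microsoft/deberta-v3-base checkpoint, with the prompt length $m$ and the example paragraphs chosen arbitrarily.</p>
        <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
embedding = model.get_input_embeddings()

m = 10                                          # prompt length per task
vocab_size = embedding.weight.size(0)
pseudo_ids = torch.arange(vocab_size - m, vocab_size)  # last m token ids in V
init_prompt = embedding(pseudo_ids).detach()    # initial P_i, shape (m, hidden)

# e(x): embed a tokenized paragraph pair, then prepend the soft prompt(s).
enc = tokenizer("first paragraph ...", "second paragraph ...",
                return_tensors="pt")
e_x = embedding(enc["input_ids"])               # (1, seq_len, hidden)
inputs = torch.cat([init_prompt.unsqueeze(0), e_x], dim=1)  # [P_i; e(x)]
        </preformat>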
        <p>[Figure 1: Model architecture, showing the input embeddings, the prompts for tasks 1 and 2, the bidirectional attention block, causal masking, and the frozen versus trainable parameters.]</p>
        <sec id="sec-2-1-1">
          <title>Encoder Block</title>
        </sec>
        <sec id="sec-2-1-2">
          <title>Classifier</title>
          <p>After obtaining the hidden state $\mathcal{H}$, we use a classifier to generate the soft labels for each category:
$$s = f(x) = (s_1, s_2) \tag{2}$$
$$s_c = \frac{\exp(g(\mathcal{H})_c)}{\sum_{c'=1}^{C} \exp(g(\mathcal{H})_{c'})}, \quad c \in \{1, 2\} \tag{3}$$
where $f(\cdot)$ is the soft label of sample $x$, $g(\cdot)_c$ indicates the output of the linear layer for category $c$,
and $C$ represents the total number of categories. Then we calculate the cross-entropy loss for the
classification:
$$\mathcal{L} = -\sum_{x, y \in B} \log p(y \mid e(x), \theta, \theta_P^i) \tag{4}$$
where $\theta$ refers to the model parameters of the encoder and classifier, and $\theta_P^i$ denotes the trainable
parameters of the soft prompt for the $i$-th task in the embedding layer. By training with Equation (4),
we obtain the final pre-trained language model $M_i$ for the current task $T_i$. Since the PAN committee
provides three datasets of varying difficulty for the Multi-Author Writing Style Analysis task, these datasets
can be arranged in different combinations to form six task sequences with different orders. We apply
our proposed method to these six task sequences. Then, we select and save the model weights that
achieve the highest performance on the validation set for the easy, medium, and hard datasets from
these six task sequences.</p>
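          <p>A small numeric sketch of Equations (2)-(4), with illustrative shapes: a linear layer $g$ produces one logit per category, a softmax yields the soft labels, and the batch loss is the cross-entropy.</p>
          <preformat>
import torch
import torch.nn.functional as F

H = torch.randn(4, 768)               # hidden states for a batch of 4 pairs
g = torch.nn.Linear(768, 2)           # g(.): one output per category, C = 2
logits = g(H)                         # g(H)_c for c = 1, 2
soft_labels = F.softmax(logits, dim=-1)   # Equations (2)-(3)
y = torch.tensor([0, 1, 1, 0])        # style-change labels for the batch
loss = F.cross_entropy(logits, y)     # Equation (4): -sum log p(y | ...)
          </preformat>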
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <sec id="sec-3-1">
        <title>3.1. Data analysis</title>
        <p>The PAN organizers have provided all data, available in three difficulty levels: easy,
medium, and hard. Each difficulty dataset is divided into a training set, a validation set, and a test
set, with a distribution of 70%, 15%, and 15%, respectively. Statistical analysis
reveals that the token length of most entries is less than 512. We then organize the data according
to the method described in Section 2.1. In addition, when the documents in the datasets of all three
difficulty levels are written by only two authors (as given in the ground truth), it is possible to further
analyze which author wrote each paragraph in a document. Therefore, besides using consecutive
pairs of paragraphs as paragraph pairs, we incorporate additional non-consecutive pairs of paragraphs
into our paragraph-pair set and assign them labels based on the inferred relationships between the
authors. For example, if the same author is believed to have written both paragraphs, we assume that
the style has not changed, and vice versa.</p>
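        <p>A minimal sketch of this augmentation, assuming a per-paragraph author-id list has already been inferred from the two-author ground truth (the function and variable names are illustrative):</p>
        <preformat>
from itertools import combinations

def augment_pairs(paragraphs, authors):
    """Return ((para_a, para_b), label) for non-consecutive pairs.

    Label 0 when both paragraphs share an author (no style change),
    label 1 otherwise.
    """
    pairs = []
    for i, j in combinations(range(len(paragraphs)), 2):
        if j - i == 1:
            continue  # consecutive pairs are already in the base pair set
        pairs.append(((paragraphs[i], paragraphs[j]),
                      0 if authors[i] == authors[j] else 1))
    return pairs
        </preformat>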
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiment setting</title>
        <p>In this work, the DeBERTa-v3 base model is selected for classification. It consists of 12 transformer
encoder layers with a hidden size of 768. The three difficulty datasets are formed into six different task
sequences, as Table 1 depicts. We train the model sequentially on the datasets of varying task difficulty,
following the given sequence. We set the early-stopping patience to 10, the prompt length to 10 tokens for each
difficulty dataset, and the learning rates to 5e-5, 3e-5, and 3e-5 for the three datasets, respectively. All
experiments are conducted on an NVIDIA A800 GPU with 80 GB of memory and a batch size of 64.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>The six task sequences formed from the three difficulty datasets.</p>
          </caption>
          <table>
            <thead>
              <tr><th>order</th><th>Task sequence</th></tr>
            </thead>
            <tbody>
              <tr><td>i</td><td>easy, medium, hard</td></tr>
              <tr><td>ii</td><td>easy, hard, medium</td></tr>
              <tr><td>iii</td><td>medium, easy, hard</td></tr>
              <tr><td>iv</td><td>medium, hard, easy</td></tr>
              <tr><td>v</td><td>hard, easy, medium</td></tr>
              <tr><td>vi</td><td>hard, medium, easy</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results</title>
        <p>We conduct four experiments on the validation datasets: the fine-tuning method with DeBERTa-v3;
the best performance on the validation set; different datasets from all sequences with data augmentation;
and different datasets from all sequences, including partial datasets with or without data
augmentation. The results are presented in Tables 2-5, respectively. We then select the model weights
that achieve the highest F1 scores on the validation sets corresponding to the difficulty levels across all
sequences and submit these to the TIRA platform [<xref ref-type="bibr" rid="ref13">13</xref>]. The final test-set results are presented in Table 6.</p>
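        <p>The selection over the six sequences can be sketched as follows; train_on_task and evaluate_f1 are illustrative stand-ins for the per-task training and validation routines, not functions from our codebase.</p>
        <preformat>
from itertools import permutations

def select_best_weights(train_on_task, evaluate_f1):
    """Track the best validation F1 per difficulty across all 6 sequences."""
    best = {d: (0.0, None) for d in ("easy", "medium", "hard")}
    for sequence in permutations(("easy", "medium", "hard")):
        state = None  # weights and prompts carried along the sequence
        for difficulty in sequence:
            state = train_on_task(difficulty, state)  # adds a new soft prompt
            f1 = evaluate_f1(difficulty, state)
            if f1 > best[difficulty][0]:
                best[difficulty] = (f1, state)
    return best
        </preformat>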
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we have completed the task set by PAN by employing the progress prompt method
to tackle the Multi-Author Writing Style Analysis task. Instead of using traditional multi-task
learning (MTL) techniques, we utilize the progress prompt method to transfer knowledge from datasets of
varying difficulty to the current training dataset. The proposed method achieves F1 scores of 0.993, 0.830,
and 0.832 on the three test datasets. These results validate the effectiveness of our proposed method in
performing the Multi-Author Writing Style Analysis task.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments References</title>
      <p>This research was supported by the Natural Science Foundation of Guangdong Province, China
(No.2022A1515011544)
Analysis, and Generative AI Authorship Verification, in: Experimental IR Meets Multilinguality,
Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the
CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg
New York, 2024.
[6] A. Razdaibiedina, Y. Mao, R. Hou, M. Khabsa, M. Lewis, A. Almahairi, Progressive prompts:</p>
      <p>Continual learning for language models, arXiv preprint arXiv:2301.12314 (2023).
[7] B. Lester, R. Al-Rfou, N. Constant, The power of scale for parameter-eficient prompt tuning, arXiv
preprint arXiv:2104.08691 (2021).
[8] S. Thrun, Lifelong learning algorithms, in: Learning to learn, Springer, 1998, pp. 181–209.
[9] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, J. Tang, Gpt understands, too, AI Open (2023).
[10] Z. Ke, B. Liu, N. Ma, H. Xu, L. Shu, Achieving forgetting prevention and knowledge transfer in
continual learning, Advances in Neural Information Processing Systems 34 (2021) 22443–22456.
[11] P. He, J. Gao, W. Chen, Debertav3: Improving deberta using electra-style pre-training with
gradient-disentangled embedding sharing, arXiv preprint arXiv:2111.09543 (2021).
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,</p>
      <p>Attention is all you need, Advances in neural information processing systems 30 (2017).
[13] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast,
Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot,
F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances
in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes
in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/
978-3-031-28241-6_20.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qi</surname>
          </string-name>
          , Y. Han,
          <article-title>Supervised contrastive learning for multi-author writing style analysis</article-title>
          ,
          <source>in: Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajagopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nigam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zeng</surname>
          </string-name>
          , T. Chilimbi,
          <article-title>Asynchronous convergence in multi-task learning via knowledge distillation from converged tasks</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hashemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Enhancing writing style change detection using transformer-based models and data augmentation</article-title>
          , Working Notes of CLEF (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Multi-Author Writing Style Analysis Task at PAN 2024</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          , A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 -
          <article-title>Conference and Labs of the Evaluation Forum, CEUR-WS</article-title>
          .org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</article-title>
          ,
          <source>in: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. Razdaibiedina, Y. Mao, R. Hou, M. Khabsa, M. Lewis, A. Almahairi, Progressive prompts: Continual learning for language models, arXiv preprint arXiv:2301.12314 (2023).</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Lester, R. Al-Rfou, N. Constant, The power of scale for parameter-efficient prompt tuning, arXiv preprint arXiv:2104.08691 (2021).</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Thrun, Lifelong learning algorithms, in: Learning to Learn, Springer, 1998, pp. 181-209.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, J. Tang, GPT understands, too, AI Open (2023).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Z. Ke, B. Liu, N. Ma, H. Xu, L. Shu, Achieving forgetting prevention and knowledge transfer in continual learning, Advances in Neural Information Processing Systems 34 (2021) 22443-22456.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing, arXiv preprint arXiv:2111.09543 (2021).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous integration for reproducible shared tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236-241. doi:10.1007/978-3-031-28241-6_20.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>