Team foshan-university-of-guangdong at PAN: Adaptive Entropy-Based Stability-Plasticity for Multi-Author Writing Style Analysis

Notebook for the PAN Lab at CLEF 2024

Xurong Liu1,*, Hui Chen2 and Jiajun Lv1
1 Foshan University, Foshan, Guangdong, China
2 Shenzhen University, Shenzhen, Guangdong, China

Abstract
In this paper, we address the Multi-Author Writing Style Analysis task at PAN 2024, which involves detecting style changes at the paragraph level within multi-author documents. To tackle this problem, we adopt the Entropy-based Stability-Plasticity (ESP) method, which dynamically adjusts the learning rates of different neural network layers based on their entropy values. This approach balances stability and plasticity, allowing the model to retain essential knowledge from previous tasks while efficiently learning new information, thereby mitigating catastrophic forgetting. Our experiments, conducted on datasets of three difficulty levels (Easy, Medium, Hard), show that ESP outperforms the provided baselines in detecting writing style changes. The results highlight the effectiveness of ESP in leveraging prior knowledge and reducing interference between tasks, making it a robust framework for continual learning in text analysis applications.

Keywords
style change detection, ESP, lifelong learning

1. Introduction

The style change detection task aims to identify the positions within a given multi-author document at which the author switches [1]. When no comparison texts are provided and a text has been written by multiple authors together, style change detection is the only way to find evidence of this fact, for example to detect plagiarism within a document. Likewise, style change detection can help to uncover gift authorships, to verify a claimed authorship, or to develop new technology for writing support [2].
Style change detection is a branch of authorship verification focusing on the examination of a document for differences in authorial style [3]. The application areas of writing style change detection range from plagiarism detection, cyber security, and forensics to, more recently, fake news detection [4, 5, 6]. Efforts to detect changes in writing style have been made under the headings of author diarization or clustering and style change detection [7, 8, 9]. The ability to continuously learn remains elusive for deep learning models, which cannot accumulate knowledge in their weights when learning new tasks. For the Multi-Author Writing Style Analysis 2024 task at PAN, which provides three datasets of different difficulty (Easy, Medium, and Hard), we adopted the Entropy-based Stability-Plasticity (ESP) method [10] for lifelong learning, which dynamically decides how much each model layer should be modified via a plasticity factor. The results show that the approach provides a robust framework for leveraging prior knowledge by reducing interference, offering slight improvements over the baseline in maintaining performance across sequential tasks.

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* Corresponding author.
Email: Liu_xurongx@163.com (X. Liu); chenhui_yeah0101@163.com (H. Chen); lvjiajun.96@gmail.com (J. Lv)
ORCID: 0009-0000-4386-5336 (X. Liu); 0009-0007-2695-3220 (H. Chen); 0000-0002-8755-5310 (J. Lv)

2. Background

For the task of Multi-Author Writing Style Analysis, some studies employ combinations of features such as lexical, syntactic, and character features to analyze the variance in the writing styles of different authors [11, 12, 13, 14].
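To illustrate what such feature combinations look like, the following is a minimal sketch of extracting a few lexical and character-level stylometric features per paragraph. It is an illustrative toy, not the feature set used by any cited system; the feature names are our own.

```python
import re
from collections import Counter

def stylometric_features(paragraph: str) -> dict:
    """Compute a few simple lexical and character-level style features
    for one paragraph (an illustrative selection, not the PAN feature set)."""
    words = re.findall(r"[A-Za-z']+", paragraph.lower())
    chars = [c for c in paragraph if not c.isspace()]
    n_words = max(len(words), 1)
    return {
        # lexical features
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(Counter(words)) / n_words,
        # character features
        "punctuation_ratio": sum(c in ",.;:!?" for c in chars) / max(len(chars), 1),
        "uppercase_ratio": sum(c.isupper() for c in chars) / max(len(chars), 1),
    }

# Comparing the feature vectors of two consecutive paragraphs hints at a style change.
a = stylometric_features("Hello there! How are you today?")
b = stylometric_features("The methodology, therefore, necessitates rigorous evaluation.")
```

Differences between such vectors for consecutive paragraphs can then be fed to a classifier as a simple style-change signal.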
Some of these methods rely on the analysis of different stylometric features to detect stylistic changes in a document [15], while others adapted the outlier detection methods used in plagiarism detection. In addition, some studies investigated the use of artificial neural networks to solve this problem [16, 15]. The Multi-Author Writing Style Analysis 2024 task at PAN is defined as follows: for a given text, find all positions of writing style change at the paragraph level (i.e., for each pair of consecutive paragraphs, assess whether there was a style change) [1]. The simultaneous shift in authorship and topic is carefully controlled; Task 1, Task 2, and Task 3 correspond to datasets of three difficulty levels: Easy, Medium, and Hard. The datasets are described as follows:

• Easy: The paragraphs of a document cover a variety of topics, allowing approaches to use topic information to detect authorship changes.
• Medium: The topical variety in a document is small (though still present), forcing the approaches to focus more on style to effectively solve the detection task.
• Hard: All paragraphs in a document are on the same topic.

Artificial neural networks learn in a bounded environment, where the input distribution is assumed to be fixed. When the input distribution changes, the model must adapt its weights to perform correctly on the new task. Due to these modifications, the model overwrites previously learned patterns, creating interference between old and new tasks and causing a problem known as catastrophic forgetting [17, 18]. Lifelong learning methods that alleviate catastrophic forgetting can be categorized into three classes, based on how task-specific information is stored and used throughout the sequential learning process: replay methods, regularization-based methods, and parameter isolation methods [19].
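The interference described above can be demonstrated with a deliberately tiny, hypothetical example: a single weight is fit to one task by gradient descent, then to a second task, after which its performance on the first task collapses.

```python
# Minimal illustration of catastrophic forgetting (a toy, not our model):
# one weight w is fit to task A, then to task B; training on B overwrites
# the solution for A, so the task-A loss rises again.

def loss(w, target):
    return (w - target) ** 2

def train(w, target, steps=100, lr=0.1):
    for _ in range(steps):
        w -= lr * 2 * (w - target)  # gradient of (w - target)^2
    return w

w = 0.0
w = train(w, target=1.0)       # task A drives w toward 1.0
loss_a_before = loss(w, 1.0)   # near zero after training on A
w = train(w, target=-1.0)      # task B drives w toward -1.0
loss_a_after = loss(w, 1.0)    # task-A performance collapses
```

The three method families above differ in how they prevent this overwriting: by replaying old data, by penalizing changes to important weights, or by isolating task-specific parameters.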
Vladimir Araujo et al. propose the ESP method, which relies on an entropy-based criterion to decide how much a model has to modify the weights in each of its layers; it performs well compared to its Stability-only variant [10] and outperforms all baselines, including Replay, in their experiments.

3. System Overview

The Entropy-Based Stability-Plasticity (ESP) method [10] for lifelong learning utilizes an entropy-based criterion to manage the trade-off between stability and plasticity. The plasticity factor is a crucial component in ESP, determining how much each layer of the neural network should be updated. This factor is computed using entropy to assess the importance of the parameters in each layer. The plasticity factor P_i for the i-th layer can be expressed as:

P_i = H_i / Σ_{j=1}^{L} H_j    (1)

where H_i is the entropy of the i-th layer and L is the total number of layers. The entropy H_i of the i-th layer is calculated from the activations of the neurons in that layer and indicates how much information the layer is processing:

H_i = − Σ_{k=1}^{N_i} p_k log p_k    (2)

where N_i is the number of neurons in the i-th layer and p_k is the probability associated with the k-th neuron's activation. ESP uses the plasticity factor to scale the gradients during backpropagation. This ensures that layers with lower entropy, whose predictions are confident and which are therefore deemed more important, receive smaller updates, preserving their stability, while layers with higher entropy are updated more, enhancing plasticity. The gradient update for the i-th layer is scaled by its plasticity factor P_i:

Δθ_i = P_i · ∇θ_i    (3)

where ∇θ_i is the gradient of the loss with respect to the parameters of the i-th layer. The ESP training process involves the following steps:

1. Forward pass: compute the output and activations of each layer.
2. Entropy calculation: calculate the entropy of each layer.
3. Plasticity factor calculation: determine the plasticity factors from the entropies.
4. Backward pass: scale the gradients using the plasticity factors and update the model parameters.

Figure 1: Overview of the method. During the forward step, the backbone processes an example and generates prediction and plasticity factor values for each block (left). During the backward pass, the plasticity factor is used to adjust the final amount of modification each layer will receive (right) [10].

ESP balances stability and plasticity by dynamically adjusting the effective learning rate of each layer based on its entropy: confident, low-entropy layers are preserved, while high-entropy layers remain flexible to learn new tasks. These formulas and the underlying methodology provide a robust framework for lifelong learning, addressing the challenge of catastrophic forgetting while allowing the model to adapt to new information.

We use BERT [20] as the encoder. For the decoder, following the original BERT model, we use the first token of the sequence (the special token [CLS]) and a classifier to predict the class. Additionally, we use the default BERT vocabulary in our experiments. We use the Adam optimizer with a learning rate of 3 × 10^-5 and a training batch size of 32. To enhance training for Task 2 and Task 3, we order the training data as hard → medium → easy.

4. Results

Following the above experimental design, the results are shown in Table 1.

Table 1
F1 scores for the multi-author writing style task in detecting the positions at which the author changes, for Task 1, Task 2, and Task 3.

Approach           | Task 1 | Task 2 | Task 3
emerald-callable   | 0.517  | 0.394  | 0.352
Baseline Predict 1 | 0.466  | 0.343  | 0.320
Baseline Predict 0 | 0.112  | 0.323  | 0.346
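To make the method concrete, the entropy-based scaling of Equations (1)–(3) can be sketched in plain Python. This is an illustrative toy, not our training code: the per-layer probability distributions are assumed to be given (in ESP they come from predictions attached to each block), and "layers" here are plain lists of weights rather than BERT parameters.

```python
import math

def entropy(probs):
    """Eq. (2): H = -sum p_k log p_k over a layer's activation distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def plasticity_factors(layer_probs):
    """Eq. (1): P_i = H_i / sum_j H_j, i.e. normalized per-layer entropies."""
    H = [entropy(p) for p in layer_probs]
    total = sum(H)
    return [h / total for h in H]

def esp_update(params, grads, factors, lr=3e-5):
    """Eq. (3): scale each layer's gradient step by its plasticity factor P_i."""
    return [
        [w - lr * P * g for w, g in zip(layer_w, layer_g)]
        for layer_w, layer_g, P in zip(params, grads, factors)
    ]

# Two toy layers: one confident (low entropy), one uncertain (high entropy).
layer_probs = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]
P = plasticity_factors(layer_probs)
# The uncertain, high-entropy layer receives the larger share of the update.
```

In the full system this scaling is applied to the gradients of each BERT layer during backpropagation rather than to toy weight lists.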
Conclusion

In this work, we addressed the Multi-Author Writing Style Analysis task at PAN 2024, which involves detecting style changes at the paragraph level in multi-author documents. Our approach, Entropy-based Stability-Plasticity (ESP), manages the trade-off between stability and plasticity in lifelong learning scenarios by dynamically adjusting the learning rates of different network layers based on their entropy values. Layers with low entropy, whose predictions are confident and which are therefore deemed more important, receive smaller updates to preserve stability, while layers with high entropy are updated more to enhance plasticity.

Our experiments used BERT as the encoder and demonstrated the effectiveness of ESP across datasets of different difficulty (Easy, Medium, Hard). The results show that ESP outperforms the provided baselines by leveraging prior knowledge and reducing interference between tasks. In particular, ESP's adaptive gradient scaling allows the model to retain essential information from previous tasks while efficiently learning new ones, thus mitigating catastrophic forgetting.

In conclusion, the ESP method provides a robust framework for continual learning in multi-author writing style analysis, offering slight improvements over the baseline in maintaining performance across sequential tasks. Future work may explore the integration of ESP with other neural architectures and its application to domains beyond text analysis.

References

[1] M. Fröbe, M. Wiegmann, N. Kolyada, B. Grahm, T. Elstner, F. Loebe, M. Hagen, B. Stein, M. Potthast, Continuous Integration for Reproducible Shared Tasks with TIRA.io, in: J. Kamps, L. Goeuriot, F. Crestani, M. Maistro, H. Joho, B. Davis, C. Gurrin, U. Kruschwitz, A. Caputo (Eds.), Advances in Information Retrieval.
45th European Conference on IR Research (ECIR 2023), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2023, pp. 236–241. doi:10.1007/978-3-031-28241-6_20.
[2] J. Bevendorff, X. B. Casals, B. Chulvi, D. Dementieva, A. Elnagar, D. Freitag, M. Fröbe, D. Korenčić, M. Mayerl, A. Mukherjee, A. Panchenko, M. Potthast, F. Rangel, P. Rosso, A. Smirnova, E. Stamatatos, B. Stein, M. Taulé, D. Ustalov, M. Wiegmann, E. Zangerle, Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, 2024.
[3] V. A. Oloo, C. Otieno, L. A. Wanzare, A literature survey on writing style change detection based on machine learning: State-of-the-art review, Int. J. Comput. Trends Technol. 70 (2022) 15–32.
[4] D. Castro-Castro, C. A. Rodríguez-Lozada, R. Muñoz, Mixed style feature representation and B-maximal clustering for style change detection, in: CLEF (Working Notes), 2020.
[5] A. Iyer, S. Vosoughi, Style change detection using BERT, CLEF (Working Notes) 93 (2020) 106.
[6] C. Zuo, Y. Zhao, R. Banerjee, Style change detection with feed-forward neural networks, CLEF (Working Notes) 93 (2019).
[7] S. Nath, Style change detection using Siamese neural networks, in: CLEF (Working Notes), 2021, pp. 2073–2082.
[8] P. Rosso, F. Rangel, M. Potthast, E. Stamatatos, M. Tschuggnall, B.
Stein, Overview of PAN'16: New challenges for authorship analysis: cross-genre profiling, clustering, diarization, and obfuscation, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association, CLEF 2016, Évora, Portugal, September 5–8, 2016, Proceedings 7, Springer, 2016, pp. 332–350.
[9] E. Zangerle, M. Mayerl, M. Potthast, B. Stein, Overview of the style change detection task at PAN 2020, CLEF (Working Notes) 93 (2020).
[10] V. Araujo, J. Hurtado, A. Soto, M.-F. Moens, Entropy-based stability-plasticity for lifelong learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 3721–3728.
[11] E. Zangerle, M. Mayerl, M. Potthast, B. Stein, Overview of the Multi-Author Writing Style Analysis Task at PAN 2024, in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, CEUR-WS.org, 2024.
[12] M. L. Brocardo, I. Traore, S. Saad, I. Woungang, Authorship verification for short messages using stylometry, in: 2013 International Conference on Computer, Information and Telecommunication Systems (CITS), IEEE, 2013, pp. 1–6.
[13] M. P. Kuznetsov, A. Motrenko, R. Kuznetsova, V. V. Strijov, Methods for intrinsic plagiarism detection and author diarization, in: CLEF (Working Notes), 2016, pp. 912–919.
[14] K. Safin, R. Kuznetsova, Style breach detection with neural sentence embeddings, in: CLEF (Working Notes), 2017.
[15] A. Rexha, M. Kröll, H. Ziak, R. Kern, Authorship identification of documents with high content similarity, Scientometrics 115 (2018) 223–237.
[16] M. Kestemont, M. Tschuggnall, E. Stamatatos, W. Daelemans, G. Specht, B. Stein, M. Potthast, Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection, in: Working Notes Papers of the CLEF 2018 Evaluation Labs.
Avignon, France, September 10–14, 2018, L. Cappellato et al. (Eds.), 2018, pp. 1–25.
[17] M. McCloskey, N. J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem, in: Psychology of Learning and Motivation, volume 24, Elsevier, 1989, pp. 109–165.
[18] R. Ratcliff, Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions, Psychological Review 97 (1990) 285.
[19] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, T. Tuytelaars, A continual learning survey: Defying forgetting in classification tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2021) 3366–3385.
[20] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).