Style Change Detection Based On Bi-LSTM And Bert
Notebook for PAN at CLEF 2022

Jiayang Zi a, Ling Zhou a, Zhengyao Liu a

a Foshan University, Foshan, China



                Abstract
                This article gives an overview of our approach to the PAN 2022 Style Change
                Detection task. The goal of this shared task is to identify the positions in a
                given multi-author document at which the author switches. PAN divides the problem
                into subtasks: task 1 looks at documents written by exactly two authors and asks
                for the single position at which the author changes; task 2 looks at documents
                written by two or more authors and asks for all positions at which the writing
                style changes; task 3 refines the requirements of task 2 to a finer granularity,
                asking where within a paragraph, between sentences, the writing style changes.
                This paper designs a method for the task based on a model composed of BERT, a
                bidirectional long short-term memory network and convolutional neural networks,
                using binary classification to judge style changes and author labels. The
                obtained F1 scores are 0.67 for task 1, 0.40 for task 2 and 0.65 for task 3.

                Keywords
                Style Change Detection, Bi-LSTM, BERT, CNN

1. Introduction

    Today is an era that emphasizes intellectual property rights, yet the means of plagiarism
are numerous and covert. Finding out whether an article is suspected of plagiarism by manual
inspection is difficult, and the labor efficiency is very low. Writing style detection makes
the difficult task of detecting plagiarism much easier: by screening articles automatically,
the problematic paragraphs can be marked and then passed on to manual inspection, which
improves both the efficiency and the accuracy of detection. Besides detecting plagiarism, the
same techniques can also partition an article according to its different authors. This idea
fits perfectly with task 1 of PAN's 2022 Style Change Detection task [1]. Task 2 is to find
all positions where the writing style changes in a text written by two or more authors, and
to assign each paragraph a corresponding author number. Detecting author switches means using
a model to learn the characteristics of the training set, judging on that basis whether the
author changes within the article, and then counting the distinct authors and assigning them
to the corresponding paragraphs. Task 3 performs a more fine-grained search on top of task 2:
since a change of writing style may also occur between different sentences of the same
paragraph, task 3 marks the positions where the writing style changes between sentences
[2][3].
    After a comprehensive analysis of all tasks, this paper proposes a solution for the PAN
2022 Style Change Detection task based on the BBCG model, which consists of BERT, Bi-LSTM,
CNN and GlobalMaxPooling. BERT encodes the features in the text and converts them into word
vectors that better express the relationships between words. Bi-LSTM addresses the problem of
context dependence and extracts the relevant features. CNN extracts the associated features
further, which helps to avoid overfitting and improves the generalization ability of the
model, and finally a pooling layer improves the running speed and accuracy of the model. On
the data provided for this PAN task, the model can effectively identify and judge the
authors, and it achieves good results.
1 CLEF 2022 – Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
EMAIL: 1109618450@qq.com (A. 1); zhoulingfsu@gmail.com (A. 2);
ORCID: 0000-0002-4307-622X (A. 1); 0000-0001-6861-8980 (A. 2)
© 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
2. Background

    The PAN 2022 Style Change Detection task is divided into three subtasks.
    1. For a text written by two authors that contains a single style change only, find the position of
this change.
    2. For a text written by two or more authors, find all positions of writing style change.
    3. For a text written by two or more authors, find all positions of writing style change, where style
changes now not only occur between paragraphs, but at the sentence level.
    In addition, PAN provides three datasets for testing the algorithms, one per task, and
each dataset is split into three parts:
    1. training set: Contains 70% of the whole dataset and includes ground truth data.
    2. validation set: Contains 15% of the whole dataset and includes ground truth data.
    3. test set: Contains 15% of the whole dataset, no ground truth data is given.
    The model in this paper is trained and evaluated on the above training and validation
sets, and finally the test set is fed into the model to obtain the experimental results. The
generated results are submitted to the TIRA [7] platform, where the evaluation of the model
is completed and the F1 score of the results is output.
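    As a hedged illustration of the reported measure, an F1 score of this kind can be
reproduced locally with scikit-learn on validation labels; the label arrays below are
placeholders, not data from the PAN 2022 datasets.

# Sketch of the reported evaluation measure; the labels below are
# illustrative placeholders, not data from the PAN 2022 datasets.
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1]         # ground-truth style-change labels
y_pred = [0, 1, 0, 0, 1]         # model predictions
print(f1_score(y_true, y_pred))  # 0.8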




   Figure 1: Example scenarios and the expected outputs of the Style Change Detection tasks [1]




3. Method

   The neural network model proposed for the task is named the BBCG model in this paper. It
is a neural network structure composed of BERT [4][5], Bi-LSTM, a one-dimensional convolution
layer, a pooling layer and a fully connected layer, as shown in Figure 2.
Figure 2: Architecture diagram of the model
3.1 Word embedding layer
   Word embedding serves as the data input layer of the model. In this task, the pre-trained
BERT model, proposed within the past four years, is used first. BERT no longer relies on the
traditional unidirectional language model or on the shallow concatenation of two
unidirectional language models, but instead uses the masked language modeling (MLM) technique
[6]. BERT is a deep bidirectional pre-trained language model that uses Transformers as its
feature extractor; it has deeper layers and better parallelism, and has achieved quite good
results in various language tasks. This layer encodes the words in the text together with
their corresponding features and converts them into word vectors for input; the mapped
vector form better expresses the relationships between words.
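
   As a minimal sketch of this encoding step, contextual word vectors can be obtained with
the Hugging Face transformers library; the bert-base-uncased checkpoint and the sequence
length of 128 are assumptions, since the paper does not report which configuration was used.

# Minimal sketch of the BERT encoding step; "bert-base-uncased" and
# max_length=128 are assumptions, the paper does not name a checkpoint.
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")

def encode(text: str, max_length: int = 128):
    """Map a paragraph to one 768-dimensional contextual vector per token."""
    inputs = tokenizer(text, truncation=True, padding="max_length",
                       max_length=max_length, return_tensors="tf")
    return bert(**inputs).last_hidden_state  # shape (1, max_length, 768)

vectors = encode("The quick brown fox jumps over the lazy dog.")
print(vectors.shape)  # (1, 128, 768)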

3.2 Bi-LSTM layer
   The long short-term memory network (LSTM) is a variant of the RNN, mainly used to solve
the problem of context dependence. Bi-LSTM (Bidirectional Long Short-Term Memory) is composed
of a forward and a backward LSTM. A single unidirectional LSTM can only obtain past
information, not future information; being composed of two LSTMs, a bidirectional LSTM can
therefore extract more comprehensive features than a unidirectional one.
   In an LSTM, the state h_t at the current time is computed from the value f_t of the forget
gate, the value i_t of the memory gate, the cell state c_t, the temporary cell state g_t and
the output gate o_t. Finally, Bi-LSTM outputs the h_t of the two LSTMs running in opposite
directions and combines them to obtain the output at time t. The calculation formulas are as
follows:
                                 f_t = σ(W_f x_t + U_f h_{t-1} + b_f),               (1)
                                 i_t = σ(W_i x_t + U_i h_{t-1} + b_i),               (2)
                                 g_t = tanh(W_g x_t + U_g h_{t-1} + b_g),            (3)
                                 c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t,                    (4)
                                 o_t = σ(W_o x_t + U_o h_{t-1} + b_o),               (5)
                                 h_t = o_t ⊙ tanh(c_t),                              (6)
where W and U represent the weight matrices, b represents the bias, σ(·) is the sigmoid
function and ⊙ denotes element-wise multiplication.
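   To make equations (1)–(6) concrete, the following is a minimal NumPy sketch of one LSTM
step; the parameter dictionaries W, U and b are illustrative stand-ins for the trained
weights.

# NumPy sketch of one LSTM step following equations (1)-(6); the
# parameter dictionaries are illustrative, not the trained weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step; W, U, b are dicts of arrays keyed by gate name."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # memory gate, Eq. (2)
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # temporary cell, Eq. (3)
    c_t = f_t * c_prev + i_t * g_t                          # cell state, Eq. (4)
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate, Eq. (5)
    h_t = o_t * np.tanh(c_t)                                # hidden state, Eq. (6)
    return h_t, c_t

# A Bi-LSTM runs one such LSTM left-to-right and a second one
# right-to-left over the sequence, then concatenates the two h_t streams.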
3.3 CNN layer
   On the feature relationships extracted by Bi-LSTM, a convolution layer is used to extract
features further, thereby reducing the number of parameters and the running time of the
model; this also helps to avoid overfitting and improves the generalization ability of the
model.
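   A minimal tf.keras sketch of this convolution stage follows; the filter count and kernel
size are assumptions, as the paper does not report its hyperparameters.

# Sketch of the convolution stage; filters=64 and kernel_size=3 are
# assumed values, the paper does not report its hyperparameters.
import tensorflow as tf

seq = tf.random.normal((1, 128, 256))  # Bi-LSTM output: (batch, steps, features)
conv = tf.keras.layers.Conv1D(filters=64, kernel_size=3,
                              activation="relu", padding="same")
print(conv(seq).shape)  # (1, 128, 64): fewer features per time step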
3.4 Pooling layer
   Pooling, also known as downsampling, extracts the average or maximum value of a region
(called average pooling and max pooling, respectively); global pooling additionally reduces
the dimensionality, for example from three dimensions to one. The main purpose of the pooling
layer is to reduce the dimensionality, reduce the number of parameters, improve the running
speed and accuracy of the model, and avoid overfitting. In this task, global max pooling is
used, taking the maximum over the whole sequence for each feature.
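
   A short tf.keras sketch of this pooling step, with illustrative shapes:

# Global max pooling collapses the time axis, keeping only the strongest
# response of each feature; the shapes here are illustrative.
import tensorflow as tf

features = tf.random.normal((1, 128, 64))  # convolution output
pooled = tf.keras.layers.GlobalMaxPooling1D()(features)
print(pooled.shape)  # (1, 64): one value per feature, time axis removed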

3.5 Fully connected layer
   In the fully connected layer, the nonlinear activation function ReLU, a piecewise linear
function, is applied first. Finally, the Softmax function is applied to the computed output
features to obtain the class probabilities for the input text.
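
   Putting the layers of Sections 3.1–3.5 together, the following is a minimal tf.keras
sketch of the whole BBCG stack on top of precomputed BERT vectors; all layer sizes are
assumptions, since the paper does not report its hyperparameters.

# Minimal end-to-end sketch of the BBCG stack from Sections 3.1-3.5,
# built on precomputed BERT vectors; all layer sizes are assumptions.
import tensorflow as tf

def build_bbcg(seq_len: int = 128, bert_dim: int = 768,
               num_classes: int = 2) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(seq_len, bert_dim))  # BERT word vectors
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(inputs)
    x = tf.keras.layers.Conv1D(64, 3, activation="relu", padding="same")(x)
    x = tf.keras.layers.GlobalMaxPooling1D()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)  # ReLU hidden layer
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_bbcg()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()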

4. Result

Table 1
F1 score and accuracy of the trained model on the validation set
     Validation set                 Task1                      Task2                      Task3
        Accuracy                    0.7581                     0.7445                     0.6767
        F1 score                    0.6627                     0.4295                     0.6623


Table 2
Test set F1 score of the trained model
        Test set                    Task1                      Task2                      Task3
        F1 score                    0.6690                     0.4012                     0.6483


   The result for task 1 is acceptable, but task 2 still has a lot of room for improvement.
A likely cause is that an early misprediction of whether the author has changed leads to
errors in the assigned author numbers, and these errors then accumulate. In task 3 the units
to be judged are sentences, so the model must decide whether the author changes between
shorter texts, and this could be done better.

5. Conclusion

   This paper proposes a neural network based model to deal with the PAN 2022 Style Change
Detection task, attempting to answer the three subtasks posed by the task. Although the
results obtained with the model are better than the baseline, they do not reach the desired
level, which shows that for tasks 1 and 2 the model still has a lot of room for improvement.
In task 2, texts with two or more authors must be handled; if there are multiple authors
within each paragraph, this undoubtedly increases the difficulty of detecting changes of
writing style across the paragraphs of the whole text. In the future, we will also consider
how to reduce the misjudgments caused by similar styles when detecting styles in long texts.
At the same time, the Style Change Detection task admits more interesting challenges, such as
detecting whether a piece of text deliberately imitates another author in order to confuse
automated inspection; such tasks will be even more challenging.
6. References

[1] E. Zangerle, M. Mayerl, M. Potthast, and B. Stein, “Overview of the Style Change Detection
    Task at PAN 2022,” in CLEF 2022 Labs and Workshops, Notebook Papers. CEUR-WS.org, 2022.
[2] Z. Zhang, Z. Han, L. Kong, et al., Style Change Detection Based On Writing Style
    Similarity, in: CLEF 2021 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2021.
[3] C. Zuo, Y. Zhao, R. Banerjee, Style Change Detection with Feed-forward Neural Networks, in: L.
    Cappellato, N. Ferro, D. Losada, H. Müller (Eds.), CLEF 2019 Labs and Workshops, Notebook
    Papers, CEUR-WS.org, 2019. URL: http://ceur-ws.org/Vol-2380/.
[4] A. Iyer, S. Vosoughi, Style Change Detection Using BERT—Notebook for PAN at CLEF 2020, in: L.
    Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), CLEF 2020 Labs and Workshops, Notebook
    Papers, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/.
[5] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
    transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[6] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning word vectors for 157 languages,
    arXiv preprint arXiv:1802.06893 (2018).
[7] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, TIRA Integrated Research Architecture, in:
    N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The
    Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019.
    doi:10.1007/978-3-030-22948-1_5.