A Intelligent Detection Method for Irony and Stereotype Based on Hybird Neural Networks Notebook for PAN at CLEF 2022 Zexian Yang1 , Li Ma1 , Wenyin Yang1 , Qidi Lao 1 , Zhenlin Tan1 1 Foshan University, Foshan, China Abstract For the task of Profiling Irony and Stereotype Spreaders on Twitter[1,2], a deep learning model based on a combination of RNN and CNN is proposed in this paper. A special RNN is used to solve the context’s long-term dependency, and CNN is used to further extract relational features. The task involves classifying authors as Ironic or non-Ironic based on the number of their tweets, and the task is a judgment for those authors who use irony to spread a stereotype (ISS), that the task does as a binary classification task. After training and predicting on the task datasets given by PAN 22, the accuracy of the model announced by the organizer is about 0.9056. Keywords 1 Author Profiling, Irony and Stereotype Spreaders, Bi-LSTM 1. Introduction Today, with the birth of various new technologies such as big data and cloud computing, the technology of online social platforms is becoming more and more mature. Freely express personal remarks, so that people can express their personal remarks more freely on the online communication platform. Nowadays, because people wantonly publish such inflammatory remarks that are not conducive to the stable development of the country, social stability, and the physical and mental health of others, such remarks will cause serious harm to indiviuuals or the entire society [3]. Therefore, the social platform designs a corresponding algorithm to identify whether the speech sent by the user is excessive, incitement, hatred, or other speech to be restricted [4]. However, people's expressions today are also improving with the advancement of technology. After social platforms have restricted excessive speech, people use language in a metaphorical and subtle way to express the opposite of the literal meaning. That is, the language is ironic and negative. However, this type of language is offensive irony, used to ridicule and despise victims, causing certain psychological trauma to users. Considering the huge amount of daily information on social platforms, it is time-consuming, expensive, and inefficient to manually detect such ironic remarks. Therefore, it is necessary to develop an algorithm that can automatically identify ironic speech [5]. Therefore, the task of Profiling Irony and Stereotype Spreaders on Twitter at PAN 2022 is to verify whether the authors are likely to spread ironic remarks. Based on preprocessing the datasets with a custom function, this paper proposes a Bidirectional Long Short-Term Memory network (Bi- LSTM) and a Convolutional Neural Network (CNN) [6] composition. The spaces in the text are segmented through Textvectorization, and the segmented words are generated one by one corresponding to numerical values, thereby constructing a dictionary. Each word segmented from the training set will be used as a value, and each word is mapped to the value of the key in the dictionary. 1 CLEF 2022 – Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy EMAIL: roiv@qq.com (A. 1); molly_917@fosu.edu.cn (A. 2)(*corresponding author); cswyyang@163.com (A. 3) ORCID: 0000-0001-6060-7603 (A. 1); 0000-0002-5013-052X (A. 2); 0000-0003-4842-9060 (A. 3) - 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) Proceedings Then the positive integer sequence obtained by the Textvectorization text preprocessing function, and then the keys in the dictionary are mapped to the 120-dimensional word embedding layer. Put the data into the designed model to get the final desired result. 2. Datasets Profiling Irony and Stereotype Spreaders on Twitter provided a training set and a test set. The data sets are shown in Table 1. The datasets are all composed of XML files. Each XML file corresponds to an author, and there are 200 tweets in the XML file corresponding to each author. In the official training data set, there is also a real value file, giving each author the corresponding XML file tag of N or NI. Table 1 Statistics of datasets Datasets Number of texts Number of author Number of datas Training set 420 420 84000 Test set 180 180 36000 3. Irony and Stereotype Evangelist Identification Model Structure The neural network model proposed in this paper is to realize the discrimination task. The model consists of a textvectorization layer, an embedding layer, a Bi-LSTM layer,a convolutional layer and a fully connected layer. The neural network structure is shown in Figure 1. Figure 1: Architecture diagram for model 3.1. Textvectorization Layer The data after preprocessing is passed into the Textvectorization layer of the model. This layer mainly passes in the processed XML file data, divides the words according to spaces, and maps the words into the required integer sequence. In the dictionary learned by the Textvectorization function, in addition to the learned content, it also includes an empty character as padding (filling if the sentence length is not enough), and Unknown (UNK) represents that the character does not exist. It will be performed using UNK. This layer will further process the data for the word embedding layer mapping. 3.2. Word Embedding Layer Embedding layer [7] as a dictionary, that is, map integer indices (specific words) into dense vectors, will receive integers as input, look up these integers in the internal dictionary, and return the associated vector. And we will use the tensor input composed of the previous layer of integers to map to a 120-dimensional vector, and use the vector to solve the disadvantage that the integer encoding cannot express the relationship between words. 3.3. Bi-LSTM Layer A special model in RNN (Recurrent neural network) is called LSTM [8] which is used to solve the context dependence problem in RNN and is suitable for processing time series data. The structure is shown in Figure 2. Figure 2: LSTM structure Since LSTM can only use historical data, it cannot use future data information. Thus, the forward LSTM and the backward LSTM are combined to obtain a new Bi-LSTM structure [9]. Using Bi- LSTM is to insert the same input sequence into the forward and backward two LSTM, and then connect the hidden layers of the two networks together. computable information is improved so that the model can obtain historical and future information. Bi-LSTM [10] includes four parts: memory gate i, forgetting gate f, output gate o, and cell state c. The calculation process of LSTM is: layer state at the previous moment ℎ −1 , and obtain the value of the forgetting gate (A) Choose the forgotten information, enter the word at the current moment through the hidden . The formula is as follows: = ( + ℎ −1 + ) (1) moment ℎ −1 and the word at the current moment (B) By selecting the information to be memorized by inputting the hidden layer state at the previous , the value of the memory gate and the temporary cell state are obtained. The formula is as follows: = ( + ℎ −1 + ) (2) = ℎ( + ℎ −1 + ) (3) (C) By inputting the value of the memory gate , the value of the forgetting gate and the temporary cell state , the cell state at the current moment is obtained. The formula is as follows: = × −1 + × (4) (D) Through the hidden layer state at the previous moment ℎ −1 , the input word at the current at the current moment ℎ are obtained. The formula is as follows: moment , and the cell state at the current moment , the output gate and the hidden layer state = ( + ℎ −1 + ) (5) ℎ = × ℎ( ) (6) (E) Finally, since Bi-LSTM has forward LSTM and reverse LSTM represented by ℎ and ℎ , respectively, represent the output context hidden layer state vector, and connect and get the output of Bi-LSTM [9] at time t as ℎ = × ℎ( ) (7) In the formula, W and u represent the weight matrix, and b represents the offset. 3.4. CNN Layer Because Bi-LSTM can extract the feature relationship of the bidirectional time series dimension of the text, the CNN layer [11] is utilized to further extract the associated features in order to improve semantic analysis on the association between neighboring features. The complexity and quantity of parameters used in neural network model training can be decreased while maintaining the essential characteristics. It can successfully prevent overfitting and enhance the model’s capacity for generalization. 3.5. DNN Layer In the fully connected layer of the last two layers, the first layer uses the nonlinear activation function "Relu" for classification, and the last layer uses a simple linear activation function for the final result classification, and obtains the final two-classification result definition. Positive value is NI and negative value is I. 4. Experiments and Results 4.1. Experimental setting The word embedding layer included with Keras is used in this study to map words into 120- dimensional vectors. The activity of the model is then increased by adjusting the rate of SpatialDropout1D to 0.2. The Bi-LSTM was designed with 128 units. Relu is used as the activation function in Conv1D along with 64 convolution kernels, a convolution kernel stride of 1, a size of 4, GlobalMaxPooling1D for pooling calculation, and a rate of 0.3 dropout to avoid overfitting. The output unit of the first fully connected layer is 128 and the activation function is Relu. The weight matrix for classification is initialized by a unique kernel initializer in the final fully linked layer. During the training process, set the epoch to 5, and its optimization is Adam. 4.2. Results The data given by the organizer is divided into 80% for training and 20% for verification, and the trained model is used to verify the model. The training sample uses 5 epochs (E1, E2, E3, E4, E5). The results are shown in Table 2. Table 2 The result of training set Epoch Accuracy Loss val_accuracy val_loss E1 0.6310 0.6393 0.6220 0.6786 E2 0.6518 0.6038 0.8452 0.4877 E3 0.8363 0.4370 0.8571 0.3242 E4 0.9524 0.1464 0.8810 0.3010 E5 0.9851 0.0219 0.8810 0.3124 Task organizers invited participants to deploy their model on TIRA[12]. Through the five epochs, it can be clearly seen that the model is continuously trained, the accuracy is continuously improved, and the loss is reduced. After the accuracy of the validation set reaches the fifth time, the accuracy does not change, and the loss starts to increase. The organizer’s test set is used to verify the model, and the accuracy attained is 0.9056. 5. Conclusion In this paper, we describe the ironic speech task at PAN 22, in which we propose a deep learning- based model to detect Twitter users who spread ironic speech. By fine-tuning the hyperparameters during the training process of our proposed model, the model achieves the best accuracy of 0.9056. As the organizers announced, the model achieved accuracy on the English training set and on the final test set. At the same time, the experiment shows that the task is more challenging. Twitter is more than just text; it also has numerous emojis, which can be used sarcastically, and some people intentionally misspell the text. Errors never go away to avoid machine detection. This type of more complex detection remains a huge challenge, and people must think about it in order to devise better solutions to these problems. 6. Acknowledgments This work was supported by grants from the Basic and Applied Basic Research Fund of Guangdong Province No.2019A1515111080. 7. References [1] Bevendorff J, Chulvi B, Fersini E, et al. Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, Style Change Detection, and Trigger Detection[C]//European Conference on Information Retrieval. Springer, Cham, 2022: 331- 338. [2] Ortega-Bueno R., Chulvi B., Rangel F., Rosso P. and Fersini E., “Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO) at PAN 2022,” in CLEF 2022 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2022. [3] Bevendorff J, Chulvi B, Peña Sarracén G L D L, et al. Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection[C]//International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, Cham, 2021: 419-431. [4] Rangel F, Giachanou A, Ghanem B H H, et al. Overview of the 8th author profiling task at pan 2020: Profiling fake news spreaders on twitter[C]//CEUR Workshop Proceedings. Sun SITE Central Europe, 2020, 2696: 1-18. [5] Rangel F, Sarracén G, Chulvi B, et al. Profiling hate speech spreaders on twitter task at PAN 2021[C]//CLEF. 2021. [6] Siino M, Di Nuovo E, Tinnirello I, et al. Detection of hate speech spreaders using convolutional neural networks[C]//CLEF. 2021. [7] Wang B, Wang A, Chen F, et al. Evaluating word embedding models: Methods and experimental results[J]. APSIPA transactions on signal and information processing, 2019, 8. [8] Yu Y, Si X, Hu C, et al. A review of recurrent neural networks: LSTM cells and network architectures[J]. Neural computation, 2019, 31(7): 1235-1270. [9] Liu G, Guo J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification[J]. Neurocomputing, 2019, 337: 325-338. [10]Zhang Y, Rao Z. n-BiLSTM: BiLSTM with n-gram Features for Text Classification[C]//2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). IEEE, 2020: 1056-1059. [11]Yamashita R, Nishio M, Do R K G, et al. Convolutional neural networks: an overview and application in radiology[J]. Insights into imaging, 2018, 9(4): 611-629. [12]Potthast M, Gollub T, Wiegmann M, et al. TIRA integrated research architecture[M]//Information Retrieval Evaluation in a Changing World. Springer, Cham, 2019: 123-160.