=Paper=
{{Paper
|id=Vol-3180/paper-232
|storemode=property
|title=Irony and Stereotype Spreaders Detection using BERT-large and AutoGulon
|pdfUrl=https://ceur-ws.org/Vol-3180/paper-232.pdf
|volume=Vol-3180
|authors=Yuning Zhang,Hui Ning
|dblpUrl=https://dblp.org/rec/conf/clef/ZhangN22
}}
==Irony and Stereotype Spreaders Detection using BERT-large and AutoGulon==
<pdf width="1500px">https://ceur-ws.org/Vol-3180/paper-232.pdf</pdf>
<pre>
Irony and Stereotype Spreaders Detection using
BERT-large and AutoGulon
Yuning Zhang, Hui Ning*
Harbin Engineering University (HEU), 145 Nantong Street, Nangang District, Harbin, China


Abstract
As the Internet has developed, it has become the mainstream way for people to socialize, and the
volume of online content keeps growing. The growth of social networks has also brought a rise in
ironic and stereotyped remarks, making the need for an automatic detection system more urgent.
This paper presents a solution to the "Profiling Irony and Stereotype Spreaders on Twitter
(IROSTEREO)" task proposed by PAN at CLEF 2022. We use BERT-large and AutoGluon to process the
data and make predictions, and the final submitted run achieves an accuracy of 94.44%.

Keywords
Irony detection, Twitter, BERT-large, AutoGluon


1. Introduction
Online social media plays a vital role in people's daily lives. As the Internet develops and its
functions improve, the share of online social media in daily life will grow, and more people will
socialize online. People can communicate freely on Twitter, which has led to a stream of ironic
and stereotypical comments, often directed at women or LGBT people. Because online media spread
content quickly, allow anonymity and are easy to access [1, 2, 3], such improper remarks are even
more rampant. These statements are spread quickly by large numbers of people, so identifying and
reviewing them manually is impractical, and it makes sense to identify them automatically.
This paper addresses the task "Profiling Irony and Stereotype Spreaders on Twitter" [4] organized
by PAN in 2022 [5]. We implement an algorithm that identifies users who spread ironic and
stereotypical remarks and submit it on TIRA [6]. Our approach extracts text features with
BERT-large embeddings and then uses AutoGluon to train classifiers and obtain the predictions.


2. Related Works
Successive editions of PAN at CLEF have published similar classification tasks. Some of them are
PAN 2018: Multimodal Gender Identification in Twitter [7], PAN 2019: Bots and Gender Profiling
in Twitter [8], PAN 2020: Profiling Fake News Spreaders on Twitter [9] and PAN 2021: Profiling
Hate Speech Spreaders on Twitter [10]. In last year's task, Uzan et al. used methods ranging from
classic machine learning (Support Vector Classifier, Multi-Layer Perceptron, Logistic Regression,
Random Forest, Ada-Boost Classifier and K-Neighbors Classifier) to more recent deep learning
methods such as BERT and Bidirectional LSTM [11]. In addition, many researchers have tried
different approaches to identify hateful and ironic speech. Salminen et al. experimented with
several classification algorithms (Logistic Regression, Naive Bayes, Support Vector Machines,
XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec,
BERT, and their combination) [12]. González et al. describe a model for irony detection based
on the contextualization of pre-trained Twitter word embeddings using the Transformer
architecture [13].

* Corresponding author
CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
pigeon_zyn@163.com (Y. Zhang); ninghui@hrbeu.edu.cn (H. Ning)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


3. Methodology
This paper presents a system built on AutoGluon, an automated machine learning (AutoML) tool,
submitted to the task "Profiling Irony and Stereotype Spreaders on Twitter". The task can be
viewed as a binary text classification problem: Twitter users are categorized as "IROSTEREO
spreaders" or "non-IROSTEREO spreaders" based on their tweets.

3.1. Corpus
The task's corpus consists of 420 XML files, one per author. Each file contains 200
tweets from that author. In addition to the tweet content, each XML file includes markup tags.
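
As a rough illustration of reading one author file, the sketch below parses a small in-memory XML string with Python's standard library. The element names (author, documents, document) and the lang attribute are assumptions made for illustration; the text above only states that each file holds 200 tweets plus tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature author file; the real tag names may differ.
SAMPLE_XML = """<author lang="en">
  <documents>
    <document><![CDATA[First tweet with #URL# and @USER]]></document>
    <document><![CDATA[Second tweet]]></document>
  </documents>
</author>"""


def read_tweets(xml_text):
    """Return the text of every <document> element in one author file."""
    root = ET.fromstring(xml_text)
    return [doc.text or "" for doc in root.iter("document")]


tweets = read_tweets(SAMPLE_XML)
print(len(tweets), "tweets read; first one:", tweets[0])
```

With the real corpus, the same function would be called once per file, giving 420 lists of 200 tweets each.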

3.2. Pre-processing
First, we preprocess the text to improve prediction accuracy. For example, we remove the
'URL' and 'USER' placeholders, convert all text to a single case, convert emojis to the
corresponding text, and so on. The specific preprocessing steps are listed in Table 1.

Table 1
The Method of Pre-processing
Item                      Processing method
'URL' and 'USER'          Eliminate
Text case                 Convert all text to lowercase
Emoji                     Convert to the corresponding text
Sentence abbreviations    Convert sentence abbreviations to expanded form
Duplicate words           Delete and simplify
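
The steps of Table 1 can be sketched as a single function. This is a minimal illustration, not the pipeline used for the submission: the two lookup tables are tiny hypothetical stand-ins, the placeholder patterns (URL, #URL#, @USER) are assumed, and "sentence abbreviations" is interpreted here as contractions such as "can't".

```python
import re

# Tiny illustrative lookup tables (hypothetical; real ones would be far larger).
EMOJI_TO_TEXT = {"\U0001F602": " face with tears of joy "}
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}


def preprocess(tweet):
    """Apply the five steps of Table 1 to one tweet."""
    # 1. Remove URL/USER placeholders (assumed to look like URL, #URL# or @USER).
    text = re.sub(r"[#@]?\b(?:URL|USER)\b#?", " ", tweet)
    # 2. Lowercase everything (and normalise curly apostrophes).
    text = text.lower().replace("\u2019", "'")
    # 3. Replace emojis with a textual description.
    for symbol, name in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, name)
    # 4. Expand contractions.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # 5. Collapse immediately repeated words ("so so so" becomes "so").
    text = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text)
    # Tidy up the whitespace left behind by the removals.
    return " ".join(text.split())


print(preprocess("@USER I can't can't believe it \U0001F602 #URL#"))
```

The placeholders are removed before lowercasing so that the ordinary word "user" in a tweet is left untouched.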


3.3. Data prediction by AutoGluon
AutoGluon is a robust and accurate automated machine learning (AutoML) tool for structured
data [14], developed by Amazon. Its purpose is to extract features from the input, select
suitable machine learning models and train them with as little human help as possible. There
are several AutoML frameworks, most of which are based on hyperparameter search: they select
a good model from dozens or hundreds of hyperparameter candidates in the hope of avoiding
manual tuning. AutoGluon, on the other hand, tries to avoid hyperparameter search. Instead,
it trains multiple different models without searching over hyperparameters and combines them,
aiming to achieve better results than hyperparameter search.
   AutoGluon uses three techniques to achieve this effect. The first is stacking: multiple
different models, such as KNN, tree models or complex neural networks, are trained on the same
data set. The outputs of these models are then fed into a linear model to obtain the final
output, which is the weighted sum of the outputs, where the weights are obtained by training.


Figure 1: The schematic diagram of the stacking
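
The idea of a trained weighted sum can be sketched in a few lines of plain Python. The base-model outputs below are made-up numbers, and the weights are fitted by simple gradient descent on the squared error; this is only an illustration of stacking with a linear combiner, not AutoGluon's internal procedure.

```python
def weighted_sum(weights, outputs):
    """Final stacked output: weighted sum of the base models' outputs."""
    return sum(w * o for w, o in zip(weights, outputs))


def mse(weights, base_outputs, labels):
    """Mean squared error of the stacked output against the labels."""
    return sum((weighted_sum(weights, row) - y) ** 2
               for row, y in zip(base_outputs, labels)) / len(labels)


def fit_stacker(base_outputs, labels, lr=0.1, epochs=2000):
    """Learn one weight per base model by gradient descent on the squared error."""
    n, m = len(labels), len(base_outputs[0])
    weights = [1.0 / m] * m  # start from a plain average
    for _ in range(epochs):
        grad = [0.0] * m
        for row, y in zip(base_outputs, labels):
            err = weighted_sum(weights, row) - y
            for j, o in enumerate(row):
                grad[j] += 2.0 * err * o / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical outputs of three base models (columns) on six examples (rows).
base_outputs = [
    [0.9, 0.6, 0.5],
    [0.8, 0.7, 0.4],
    [0.7, 0.4, 0.6],
    [0.2, 0.5, 0.5],
    [0.1, 0.3, 0.6],
    [0.3, 0.6, 0.4],
]
labels = [1, 1, 1, 0, 0, 0]

weights = fit_stacker(base_outputs, labels)
print("learned weights:", weights)
print("MSE, plain average:  ", mse([1 / 3] * 3, base_outputs, labels))
print("MSE, trained weights:", mse(weights, base_outputs, labels))
```

Starting from equal weights makes the comparison direct: training can only move away from the plain average if doing so lowers the error on the training outputs.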


   The second is repeated k-fold bagging. Bagging trains multiple models of the same class,
which may use different initial weights and different data chunks, and finally averages the
outputs of these models to reduce the variance of the prediction. K-fold bagging stems from
k-fold cross-validation.


Figure 2: The schematic diagram of the k-fold bagging
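
A minimal sketch of one round of k-fold bagging follows, assuming a toy one-dimensional data set and a toy "nearest class mean" base model (both hypothetical). Each of the k models is trained with one fold held out, predicts that held-out fold, and the k models are averaged for new inputs.

```python
def mean(values):
    return sum(values) / len(values)


def k_fold_indices(n, k):
    """Split range(n) into k disjoint folds (no shuffling, for clarity)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_centroids(xs, ys):
    """Toy base model: remember the mean feature value of each class."""
    return (mean([x for x, y in zip(xs, ys) if y == 0]),
            mean([x for x, y in zip(xs, ys) if y == 1]))


def predict(model, x):
    """Return 1.0 if x is closer to the class-1 mean, else 0.0."""
    c0, c1 = model
    return 1.0 if abs(x - c1) < abs(x - c0) else 0.0


def k_fold_bagging(xs, ys, k):
    """Train k models, each with one fold held out, and collect out-of-fold predictions."""
    n = len(xs)
    models, out_of_fold = [], [None] * n
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit_centroids(train_x, train_y)
        models.append(model)
        for i in fold:
            out_of_fold[i] = predict(model, xs[i])
    return models, out_of_fold


def bagged_predict(models, x):
    """Average the outputs of all k models for a new input."""
    return sum(predict(m, x) for m in models) / len(models)


xs = [0.1, 0.2, 0.3, 0.4, 0.5, 1.5, 1.6, 1.7, 1.8, 1.9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
models, oof = k_fold_bagging(xs, ys, k=5)
print("out-of-fold predictions:", oof)
print("bagged prediction for x=2.0:", bagged_predict(models, 2.0))
```

The out-of-fold predictions come from models that never saw the corresponding example, which is what makes them usable as inputs for a further stacking layer. Repeating the procedure with different fold splits gives repeated k-fold bagging.
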

   The last is multi-layer stack ensembling. It feeds the outputs of multiple models, together
with the original data, into another round of stacking. That is, multiple models are trained on
top of these inputs, and a linear model then produces the final output. To prevent overfitting,
multi-layer stack ensembling is often used in conjunction with repeated k-fold bagging.


Figure 3: The schematic diagram of the multi-layer stack ensembling
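
How the inputs of a higher stacking layer could be assembled is sketched below: each example's original features are concatenated with the previous layer's predictions for that example. The function name and the numbers are hypothetical, and the predictions are assumed to be out-of-fold predictions from k-fold bagging.

```python
def next_layer_inputs(features, layer_predictions):
    """Concatenate each example's original features with the previous layer's outputs.

    features: one feature list per example.
    layer_predictions: one list per model, holding that model's prediction per example.
    """
    return [list(x) + [preds[i] for preds in layer_predictions]
            for i, x in enumerate(features)]


features = [[0.1, 2.0], [0.9, 1.0]]            # two examples, two original features
layer_predictions = [[0.2, 0.8], [0.3, 0.7]]   # two layer-1 models, two examples
stacked = next_layer_inputs(features, layer_predictions)
print(stacked)
```

The models of the next layer are then trained on these widened rows in the same way as the first layer.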


   We extract embeddings from the last hidden layer of the BERT-large model. We then average
these tweet-level features to obtain one feature vector per user. Finally, these features are
fed to the AutoGluon tabular predictor for classification. To limit overfitting and
underfitting, we use 5-fold cross-validation. For each categorical feature, AutoGluon's neural
network uses a separate embedding layer, and the dimension of the embedding is proportional to
the number of unique levels observed for the feature [15]. The candidate models include neural
networks (ANN), LightGBM boosted trees [16], CatBoost boosted trees [17], random forests (RF),
extremely randomized trees (Extra Trees) and k-nearest neighbors (KNN). In the neural network,
the embedded categorical features and the numerical features are concatenated, fed into a
three-layer feedforward network and also connected directly to the output prediction.
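
The user-level averaging step can be illustrated with plain Python lists. The three-dimensional vectors below are toy values standing in for the tweet embeddings, and the function name is hypothetical.

```python
def average_embeddings(tweet_embeddings):
    """Element-wise mean of equally sized vectors (one vector per tweet)."""
    n = len(tweet_embeddings)
    dim = len(tweet_embeddings[0])
    return [sum(vec[d] for vec in tweet_embeddings) / n for d in range(dim)]


# Two toy "tweet embeddings" for one user.
toy_embeddings = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
user_vector = average_embeddings(toy_embeddings)
print(user_vector)
```

Applied to the corpus, this yields one row per author (420 rows in total), which is the tabular form a tabular predictor expects.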


4. Results
As shown in Table 2, we ran AutoGluon five times, and the prediction results differed
somewhat depending on the machine learning algorithms used. We then selected the run with
the best experimental results for submission. The table shows that the predictions are
relatively accurate, with the highest accuracy on the training set being 95.238%. The
accuracy on the test set is 94.444%.

Table 2
Results
Number   Models                                                                          Average accuracy (%)
1        LightGBMXT, RandomForestGini, Neuralnettorch, LightGBMXT, LightGBMLarge         94.048
2        RandomForestEntr, NeuralnetFastAI, LightGBM, LightGBM, Neuralnettorch           95.238
3        Neuralnettorch, RandomForestGini, NeuralnetFastAI, LightGBMXT, ExtraTreesEntr   95.476
4        NeuralnetFastAI, RandomForestGini, LightGBM, ExtraTreesGini, LightGBMXT         94.524
5        LightGBMXT, RandomForestGini, ExtraTreesGini, Neuralnettorch, Neuralnettorch    95.238


5. Conclusions
In this paper, we describe our participation in the task "Profiling Irony and Stereotype Spreaders
on Twitter (IROSTEREO)" organized by PAN @ CLEF 2022 and detail how we completed it. We
preprocessed the text and embedded it with BERT-large. AutoGluon was then used to train the
classifiers and make the predictions. The final accuracy on the test set reached 94.44%. In the
future, we will try more NLP and text categorization tasks, using different methods to achieve
the best results.


6. Acknowledgments
Whenever we encountered problems, we received timely help from the organizers. Many thanks to
the organizers and reviewers for their guidance and support.

References
 [1] M. Khan, A. Abbas, A. Rehman, R. Nawaz, HateClassify: A service framework for hate
     speech identification on social media, IEEE Internet Computing PP (2020) 1–1.
 [2] A. T. Martini, M. Farrukh, H. Ge, Recognition of ironic sentences in Twitter using
     attention-based LSTM, International Journal of Advanced Computer Science and
     Applications (2018).
 [3] N. Alkiviadou, Hate speech on social media networks: towards a regulatory framework?,
     Information & Communications Technology Law (2018) 1–17.
 [4] R. Ortega-Bueno, B. Chulvi, F. Rangel, P. Rosso, E. Fersini, Profiling Irony and Stereotype
     Spreaders on Twitter (IROSTEREO) at PAN 2022, in: CLEF 2022 Labs and Workshops,
     Notebook Papers, CEUR-WS.org, 2022.
 [5] J. Bevendorff, B. Chulvi, E. Fersini, A. Heini, M. Kestemont, K. Kredens, M. Mayerl,
     R. Ortega-Bueno, P. Pezik, M. Potthast, F. Rangel, P. Rosso, E. Stamatatos, B. Stein,
     M. Wiegmann, M. Wolska, E. Zangerle, Overview of PAN 2022: Authorship Verification,
     Profiling Irony and Stereotype Spreaders, and Style Change Detection, in:
     A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald,
     G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental
     IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth
     International Conference of the CLEF Association (CLEF 2022), volume 13390 of Lecture
     Notes in Computer Science, Springer, 2022.
 [6] M. Potthast, T. Gollub, M. Wiegmann, B. Stein, TIRA Integrated Research Architecture,
     in: N. Ferro, C. Peters (Eds.), Information Retrieval Evaluation in a Changing World, The
     Information Retrieval Series, Springer, Berlin Heidelberg New York, 2019.
     doi:10.1007/978-3-030-22948-1_5.
 [7] E. Stamatatos, F. Rangel, M. Tschuggnall, B. Stein, M. Potthast, Overview of PAN 2018: 9th
     International Conference of the CLEF Association, CLEF 2018, Avignon, France, September
     10-14, 2018, Proceedings, Experimental IR Meets Multilinguality, Multimodality, and
     Interaction, 2018.
 [8] W. Daelemans, M. Kestemont, E. Manjavacas, M. Potthast, F. Rangel, P. Rosso, G. Specht,
     E. Stamatatos, B. Stein, M. Tschuggnall, Overview of PAN 2019: Bots and gender
     profiling, celebrity profiling, cross-domain authorship attribution and style change
     detection, Springer, Cham (2019).
 [9] J. Bevendorff, B. Ghanem, A. Giachanou, M. Kestemont, E. Zangerle, Overview of PAN
     2020: Authorship verification, celebrity profiling, profiling fake news spreaders on
     Twitter, and style change detection, Springer, Cham (2020).
[10] Overview of PAN 2021: Authorship verification, profiling hate speech spreaders on Twitter,
     and style change detection, in: European Conference on Information Retrieval, 2021.
[11] M. Uzan, Y. Hacohen-Kerner, Detecting hate speech spreaders on Twitter using LSTM and
     BERT in English and Spanish - notebook for PAN at CLEF 2021, in: CLEF 2021
     – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings
     (CEUR-WS.org), 2021.
[12] J. Salminen, M. Hopf, S. A. Chowdhury, S. G. Jung, B. J. Jansen, Developing an online hate
     classifier for multiple social media platforms, Human-centric Computing and Information
     Sciences 10 (2020) 1.
[13] J. González, L. F. Hurtado, F. Pla, Transformer based contextualization of pre-trained
     word embeddings for irony detection in Twitter, Information Processing &
     Management 57 (2020).
[14] N. Erickson, J. Mueller, A. Shirkov, H. Zhang, A. Smola, AutoGluon-Tabular: Robust and
     accurate AutoML for structured data (2020).
[15] C. Guo, F. Berkhahn, Entity embeddings of categorical variables (2016).
[16] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, LightGBM: A highly
     efficient gradient boosting decision tree, in: Neural Information Processing Systems, 2017.
[17] A. V. Dorogush, V. Ershov, A. Gulin, CatBoost: gradient boosting with categorical features
     support (2018).

</pre>