<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Profiling Cryptocurrency Influencers with Few-Shot Learning Using Data Augmentation and ELECTRA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Siino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Tesconi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilenia Tinnirello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cyber Intelligence Lab, Institute of Informatics and Telematics, National Research Council</institution>
          ,
          <addr-line>Pisa, 56127</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Engineering, University of Palermo</institution>
          ,
          <addr-line>Palermo, 90128</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>21</lpage>
      <abstract>
<p>With this work we propose an application of the ELECTRA Transformer, fine-tuned on two augmented versions of the same training dataset. Our team developed this framework to take part in the Profiling Cryptocurrency Influencers with Few-Shot Learning task hosted at PAN@CLEF2023. Our strategy consists of an early data augmentation stage followed by the fine-tuning of ELECTRA. In the first stage, we augment the original training dataset provided by the organizers using backtranslation. Using this augmented version of the training dataset, we fine-tune ELECTRA. Finally, using the fine-tuned ELECTRA, we infer the labels of the samples provided in the test set. To develop and test our model we used a two-way validation on the training set: first we evaluate all the metrics on the augmented training set, and then we evaluate them on the original training set. The metrics we considered are Accuracy, Macro F1, Micro F1, Recall and Precision. According to the official evaluator, our best submission reached a Macro F1 value equal to 0.3762.</p>
      </abstract>
      <kwd-group>
        <kwd>cryptocurrency influencers</kwd>
        <kwd>few-shot learning</kwd>
        <kwd>author profiling</kwd>
        <kwd>text classification</kwd>
        <kwd>Twitter</kwd>
        <kwd>data augmentation</kwd>
        <kwd>electra</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>English tweets each were available and classes available for predictions were: (1) subjective
opinion, (2) financial information, (3) advertising, (4) announcement. In this paper we discuss
the framework we used to participate in the first subtask (i.e., low-resource influencer profiling).</p>
      <p>After this introductory section, Section 2 discusses some traditional and deep approaches
to text classification, along with a brief discussion of some of the architectures proposed in
previous editions of PAN. Section 3 describes our method, including
the training and simulation steps. Section 4 details the experimental setup and the
evaluation of our framework, reporting the results obtained. Section 5 introduces some
future work and concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        A comprehensive discussion on the proposed task for PAN@CLEF2023 is conducted in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To
develop our proposed approaches [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], we evaluated the best performing methods
participating in the previous shared tasks organized by PAN. We looked at the results of the
winning team at the author profiling task in 2021, where the best performing model was
a shallow CNN presented in [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. We also considered the winning model at PAN@CLEF2022
where the authors won the challenge thanks to a soft voting ensemble technique that combines
BERTweet models with various loss functions and a BERT feature-based CNN model. In the
2020 edition of the author profiling task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], based on their most recent 100 tweets, the aim was
to identify the authors likely to disseminate false information. The winners at the shared task
were [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. On the given test set, their models’ total accuracy was 0.77. The winning
strategies are based on n-grams, an SVM, and an ensemble of other machine learning models.
Other ensemble models have been proposed in later PAN tasks on irony
and stereotype spreader detection [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <p>
        We also examined a number of contemporary models for text categorization problems. It is
important to note that Explainable Artificial Intelligence (XAI) techniques are increasingly being
used in place of black-box strategies. Several of these graph-based techniques are
applied in real-world applications like text classification [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], traffic prediction [14], computer vision
[15] and social networking [16]. The authors in [17] compare SVM, Naive Bayes, Logistic Regression,
and Recurrent Neural Networks (RNNs), as well as other popular machine learning methods.
Experimental results demonstrate that SVM and Naive Bayes outperform the other approaches on
the dataset employed. Apart from the RNN, however, they do not evaluate CNNs or other deep
learning-based models. In another relevant comparative study [18], scholars assess seven
machine learning methods on three separate datasets: Gradient Boosting, Gaussian Naive
Bayes, SVM, Random Forest, AdaBoost, KNN and a Multi-Layer Perceptron. Gradient Boosting
surpasses the others in terms of accuracy and F1 score, although this work too reports no
additional experiments with deep models.
      </p>
      <p>In [19] the authors address the task of automatically detecting spreaders of COVID-19 fake news
by extending the CoAID dataset [20]. Their stacked Transformer-based
neural network combines a deep learning model with a Transformer's
ability to produce language embeddings.</p>
      <p>In [21], the authors profile fake news spreaders using psycholinguistic and linguistic
features as input to a CNN. The outcomes of their experiments demonstrate how well the suggested
model categorizes users as fake news spreaders. The dataset used for the authors' comparison
was created expressly for their goal. However, only BERT was tested as a Transformer model, and
no further investigation is provided of the performance of deep models. Their model has
also been evaluated on the PAN2020 dataset in [22]. On the English and Spanish datasets, the
tested model achieves a binary accuracy of 0.52 and 0.51, respectively. In the same work [22],
the authors propose a novel model that outperforms the two winning models of PAN@CLEF2020
on both languages by utilizing personality data and visual features.</p>
      <p>In the work conducted in [23], scholars suggest using a CNN for sentiment classification.
Through tests on three well-known datasets, the authors demonstrate that using consecutive
convolutional layers is efficient for categorizing lengthy texts.</p>
      <p>With regard to cryptocurrencies, the authors in [24] develop a number of sequence-to-sequence
hyperbolic models suitable for bubble detection, based on the
power-law dynamics of cryptocurrencies and user activity on social media. The study described
in [25] is intriguing from the standpoint of NLP. The authors use a combination of statistical
models and NLP techniques to examine what happened on social media starting in June 2019,
with a focus on the rise of the Ethereum and Bitcoin prices, in order to better understand the
connections between cryptocurrency values and social media.</p>
      <p>Finally, the survey in [26] gives a succinct rundown of various text classification algorithms.
This overview discusses several methods for extracting text features, dimensionality reduction,
existing algorithms and methods, and evaluation strategies.</p>
      <p>
        Given the performances shown in another international multi-label text classification
challenge [27] and, as discussed in [
        <xref ref-type="bibr" rid="ref14">28, 29</xref>
        ], presuming that conventional natural language processing
methods can truly be outperformed by deep AI models, we decided to employ a Transformer-based
architecture (namely, ELECTRA [
        <xref ref-type="bibr" rid="ref15">30</xref>
        ]). Considering that the proposed task hosted at
PAN@CLEF2023 involves few-shot learning, we also evaluated the augmentation technique
discussed in [
        <xref ref-type="bibr" rid="ref16">31</xref>
        ]. In this work the authors propose a data augmentation technique based on
backtranslation to augment samples in the dataset.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The Proposed Approach</title>
      <p>
        An empirical experiment with three stages is used to assess the suggested framework. First,
datasets without our augmentation modules are used to construct the baseline of author profiling
models. In the second phase, backtranslation from English to a target language and back to
English is used to create enriched data. For our two submissions we used two different target
languages. The first one is Italian, according to our previous study discussed in [
        <xref ref-type="bibr" rid="ref16">31</xref>
        ]. The second
language we used was German. This choice was motivated by the promising results obtained in
a similar study based on backtranslation [
        <xref ref-type="bibr" rid="ref17">32</xref>
        ]. The backtranslated sample is then concatenated
to the original one. In the final stage, the augmented data are used to train ELECTRA [
        <xref ref-type="bibr" rid="ref15">30</xref>
        ]
and to compare the performances with and without the backtranslation module. In our setting,
each sample is a user's set of tweets, and we hypothesise that semantically enriching the user's
tweets with our proposed modules can improve performance. By augmenting each sample
with one or multiple translations, we aim to increase the diversity and informativeness of the
data and improve the representation of the input, ultimately leading to better classification
performance of different NLP models. Our results outperform the non-augmented baseline,
showing that expanding samples with multiple languages using backtranslation leads to
improved performance in author profiling tasks.
      </p>
      <p>
        No preprocessing is applied to the source text in the training datasets. In Figure 1 we show
the frameworks we used for our two submissions to subtask 1. In the first submission we
augmented the training set by backtranslating through Italian [
        <xref ref-type="bibr" rid="ref16">31</xref>
        ] and in the second submission
we backtranslated through German. In [
        <xref ref-type="bibr" rid="ref16">31</xref>
        ], as a last stage classifier, the authors did not use a
Transformer but a shallow CNN instead.
      </p>
      <p>
        The training of our model is performed on the augmented versions of the datasets. For the
first submission we fine-tuned ELECTRA for 30 epochs on the dataset augmented using the
backtranslation technique with Italian as the pivot language. For the second one we used German as
the target language. We chose ELECTRA both for its interesting performance in terms of training
and inference time and for its results [
        <xref ref-type="bibr" rid="ref15 ref18">30, 33</xref>
        ]. In both cases we backtranslated the samples using the
Google Translate API1. After the training phase, we used the fine-tuned ELECTRA to predict
on the unlabeled test set provided by the task organizers.
      </p>
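      <p>The augmentation step described above can be sketched as follows. This is an illustrative sketch, not our actual code: the translator here is a stub standing in for the Google Translate API call, and the function names are invented for the example.</p>

```python
def backtranslate_augment(text, translate, pivot="it"):
    """Translate a sample into a pivot language (e.g. Italian or German),
    translate it back to English, and concatenate the backtranslation
    to the original sample."""
    pivoted = translate(text, src="en", dest=pivot)
    back = translate(pivoted, src=pivot, dest="en")
    return text + " " + back

# Stub translator used only for this example; a real pipeline would
# call a translation API here instead.
def fake_translate(text, src, dest):
    return text.upper() if dest == "en" else text

print(backtranslate_augment("to the moon", fake_translate))  # → to the moon TO THE MOON
```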
      <sec id="sec-3-1">
        <title>Notes</title>
        <p>1. https://pypi.org/project/googletrans/</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>
          Our training and inference models, developed in TensorFlow using the Simple Transformers2
library, are publicly available as a Jupyter Notebook on GitHub3. For both the training and the
inference phases we made use of ELECTRA. As stated in [
          <xref ref-type="bibr" rid="ref15">30</xref>
          ], ELECTRA
replaces certain tokens with plausible alternatives sampled from a small generator
network, instead of masking the input as in BERT. A discriminative model is then trained to
predict whether each token in the corrupted input was replaced by a generator sample or not,
as opposed to a model that predicts the original identities of the corrupted tokens.
ELECTRA can also be employed as an embedding layer alongside a graph neural network,
as in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. In our experiments, the original version of ELECTRA, presented in [
          <xref ref-type="bibr" rid="ref15">30</xref>
          ], was used.
In both submissions we used a batch size of 1. We fine-tuned ELECTRA for 30 epochs; no
improvement was obtained by fine-tuning for more epochs. Furthermore, we repeated the
fine-tuning for five runs.
        </p>
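        <p>As a toy illustration of the replaced-token-detection objective described above, the following sketch (sentence, vocabulary and function names invented for the example) builds a corrupted input and the per-token binary targets the discriminator is trained on.</p>

```python
import random

random.seed(0)

tokens = ["the", "chef", "cooked", "the", "meal"]
vocab = ["the", "chef", "cooked", "meal", "ate", "dog", "ran"]

def corrupt(tokens, positions, vocab):
    """Replace the tokens at the given positions with sampled vocabulary
    items, recording which positions were replaced (1) vs. original (0).
    In ELECTRA the replacements come from a small generator network."""
    corrupted = list(tokens)
    labels = [0] * len(tokens)
    for i in positions:
        candidates = [w for w in vocab if w != tokens[i]]
        corrupted[i] = random.choice(candidates)
        labels[i] = 1
    return corrupted, labels

corrupted, labels = corrupt(tokens, positions=[2], vocab=vocab)
print(corrupted)
print(labels)  # the discriminator predicts these 0/1 targets per token
```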
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The Dataset</title>
        <p>The dataset provided by the PAN organizers consists of a set of Twitter authors and a variable
number of corresponding tweets. For each author in the training set the labels are also provided.
Figure 2 reports the image from the official task website4.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p>The official metric used for the author profiling task at PAN@CLEF2023 is the Macro F1. This
metric, used along with the others in the rest of this section, is defined in (1) as the average of
the per-class F1 scores over the number of classes:</p>
        <p>Macro F1 = (Σ F1_c) / #classes (1)</p>
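        <p>Since Macro F1 averages the per-class F1 scores, it can be computed with a few lines of code; the following is a minimal self-contained implementation of the metric (not the official evaluator).</p>

```python
def macro_f1(y_true, y_pred):
    """Macro F1: compute the F1 score of each class separately,
    then average over the number of classes."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append(f1)
    return sum(per_class) / len(per_class)

labels = ["announcement", "announcement", "advertising", "opinion"]
preds = ["announcement", "advertising", "advertising", "opinion"]
print(round(macro_f1(labels, preds), 4))  # → 0.7778
```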
        <p>In Table 1 we report the results of our two submissions on the augmented versions of the
datasets in terms of Macro F1. We report the highest Macro F1 across the 30 epochs of training and
also the median one. The median is calculated over five random initializations and fine-tunings
of ELECTRA. We also report the loss at the end of the training stage.</p>
        <p>In Table 2 we report the results using all the metrics provided by the official evaluator available
on GitHub5 for all the available classes, using the original non-augmented version of the
training set. Finally, in Table 3, we report the results with the metrics already presented in Table
1, but using the original non-augmented version of the training set.</p>
        <p>Although the Macro F1 and the accuracy show that ELECTRA fine-tuned on the Italian
backtranslated version of the dataset outperforms the German one, as can be seen from Table 2,
for three out of five classes the Precision is higher for the submission using German
backtranslation. However, a further investigation of the effect of backtranslation on the
original samples could eventually explain these differences among the classes.
Finally, while on the augmented dataset used for training both fine-tuned ELECTRA models
reach a Macro F1 equal to 0.9937, the version fine-tuned with the Italian backtranslation
appears to generalize better, with a gap of 5-6% in Macro F1 and Accuracy when
evaluated on the original non-augmented training set. On the official test set provided, our best
submission reached a Macro F1 value equal to 0.3762.</p>
        <p>2. https://simpleTransformers.ai/about/ 3. https://github.com/marco-siino/PAN-CRYPTO-2023 4. https://pan.webis.de/clef23/pan23-web/author-profiling.html 5. https://github.com/pan-webis-de/pan-code/tree/master/clef23/profiling-cryptocurrency-influencers</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Works</title>
      <p>In this paper we have described our submitted model for our participation at the author profiling
task hosted at PAN@CLEF 2023. It consists of a backtranslation layer followed by an expansion
module to expand every sample in the dataset. These augmented versions of the samples are
then provided to ELECTRA both for training and inference phases.</p>
      <p>We intend to assess performance using different backtranslation techniques and other
languages in future studies. For future work we also plan to perform an error analysis on the
authors who were incorrectly classified, to assess their impact on performance for the
considered classification task. Increasing the model's complexity, perhaps by utilizing other recent
generative tools (e.g., ChatGPT), is another way that could eventually boost accuracy in author
profiling tasks. Given the size of the dataset that was provided, additional data augmentation
techniques could also be used. Before the training and testing phases of our model, some
research into the content of each tweet could inform strategies to remove noise (i.e., irrelevant
features) from the input samples. According to our research, enhancing samples with their
respective backtranslations can lead to performance improvements.</p>
      <p>As future work, it would also be interesting to investigate the performance of our approach
on other datasets used for author profiling tasks. Furthermore, it could also be of interest
to evaluate the impact of other languages used in the backtranslation module discussed here.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank anonymous reviewers for their comments and suggestions that have
helped to improve the presentation of the paper.</p>
    </sec>
    <sec id="sec-7">
      <title>CRediT Authorship Contribution Statement</title>
      <p>Marco Siino: Conceptualization, Formal analysis, Investigation, Methodology, Resources,
Software, Validation, Visualization, Writing - original draft, Writing - review &amp; editing. Maurizio
Tesconi: Writing - review &amp; editing. Ilenia Tinnirello: Writing - review &amp; editing.</p>
      <p>and Labs of the Evaluation Forum, CLEF '2022, Bologna, Italy, 2022, pp. 573-583.
[14] Y. Li, R. Yu, C. Shahabi, Y. Liu, Diffusion convolutional recurrent neural network:
Data-driven traffic forecasting, arXiv preprint arXiv:1707.01926 (2017).
[15] P. Pradhyumna, G. Shreya, et al., Graph neural network (gnn) in image and video
understanding using deep learning for computer vision applications, in: 2021 Second
International Conference on Electronics and Sustainable Communication Systems (ICESC), IEEE,
2021, pp. 1183–1189.
[16] M. Siino, M. La Cascia, I. Tinnirello, Whosnext: Recommending twitter users to follow
using a spreading activation network based approach, in: 2020 International Conference
on Data Mining Workshops (ICDMW), IEEE, 2020, pp. 62–70.
[17] E. M. Mahir, S. Akhter, M. R. Huq, et al., Detecting fake news using machine learning and
deep learning algorithms, in: 2019 7th International Conference on Smart Computing &amp;
Communications (ICSCC), IEEE, 2019, pp. 1–5.
[18] A. P. S. Bali, M. Fernandes, S. Choubey, M. Goel, Comparative performance of machine
learning algorithms for fake news detection, in: International conference on advances in
computing and data sciences, Springer, 2019, pp. 420–430.
[19] S. Leonardi, G. Rizzo, M. Morisio, Automated classification of fake news spreaders to break
the misinformation chain, Information 12 (2021) 248.
[20] L. Cui, D. Lee, Coaid: Covid-19 healthcare misinformation dataset, arXiv preprint
arXiv:2006.00885 (2020).
[21] A. Giachanou, B. Ghanem, E. A. Ríssola, P. Rosso, F. Crestani, D. Oberski, The impact of
psycholinguistic patterns in discriminating between fake news spreaders and fact checkers,
Data &amp; Knowledge Engineering 138 (2022) 101960.
[22] R. Cervero, P. Rosso, G. Pasi, Profiling Fake News Spreaders: Personality and Visual
Information Matter, in: International Conference on Applications of Natural Language to
Information Systems, Springer, 2021, pp. 355–363.
[23] H. Kim, Y.-S. Jeong, Sentiment classification using convolutional neural networks, Applied Sciences 9 (2019) 2347.</p>
      <p>
[24] R. Sawhney, S. Agarwal, V. Mittal, P. Rosso, V. Nanda, S. Chava, Cryptocurrency bubble
detection: a new stock market dataset, financial task &amp; hyperbolic models, arXiv preprint
arXiv:2206.06320 (2022).
[25] M. Ortu, S. Vacca, G. Destefanis, C. Conversano, Cryptocurrency ecosystems and social
media environments: An empirical analysis through hawkes’ models and natural language
processing, Machine Learning with Applications 7 (2022) 100229.
[26] K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, D. Brown, Text
classification algorithms: A survey, Information 10 (2019) 150.
[27] M. Siino, M. La Cascia, I. Tinnirello, McRock at SemEval-2022 task 4: Patronizing and
condescending language detection using multi-channel CNN, hybrid LSTM, DistilBERT
and XLNet, in: Proceedings of the 16th International Workshop on Semantic Evaluation
(SemEval-2022), Association for Computational Linguistics, Seattle, United States, 2022,
pp. 409–417. URL: https://aclanthology.org/2022.semeval-1.55. doi:10.18653/v1/2022.
semeval-1.55.
[28] H. Wu, Y. Liu, J. Wang, Review of text classification methods on deep learning, CMC-Computers, Materials &amp; Continua 63 (2020) 1309-1321.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Online Resources</title>
      <sec id="sec-8-1">
        <title>The source code of our model is available via</title>
        <p>• GitHub</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Borrego-Obrador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Heini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kredens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pęzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolska</surname>
          </string-name>
          , E. Zangerle, Overview of PAN 2023:
          <article-title>Authorship Verification, Multi-Author Writing Style Analysis, Profiling Cryptocurrency Influencers, and Trigger Detection</article-title>
          , in: A.
          <string-name>
            <surname>Arampatzis</surname>
            , E. Kanoulas,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Tsikrika</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          <string-name>
            <surname>Stefanos Vrochidis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Aliannejadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Vlachos</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF</source>
          <year>2023</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Rios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Borrego-Obrador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Profiling Cryptocurrency Influencers with Few shot Learning at PAN 2023, in: CLEF 2023 Labs and Workshops</article-title>
          , Notebook Papers,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chinea-Ríos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Heini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Körner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kredens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pęzik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          , et al.,
          <source>Overview of pan</source>
          <year>2023</year>
          :
          <article-title>Authorship verification, multi-author writing style analysis, profiling cryptocurrency influencers, and trigger detection</article-title>
          ,
          <source>in: Advances in Information Retrieval: 45th European Conference on Information Retrieval</source>
          ,
          <string-name>
            <surname>ECIR</surname>
          </string-name>
          <year>2023</year>
          , Dublin, Ireland, April 2-
          <issue>6</issue>
          ,
          <year>2023</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>III</given-names>
          </string-name>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>518</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tesconi</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Tinnirello</surname>
          </string-name>
          ,
          <article-title>Profiling cryptocurrency influencers with few-shot learning using data augmentation and electra</article-title>
          ,
          <source>in: CLEF 2023 Labs and Workshops</source>
          , Notebook Papers,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Tinnirello,</surname>
          </string-name>
          <article-title>Xlnet on augmented dataset to profile cryptocurrency influencers</article-title>
          ,
          <source>in: CLEF 2023 Labs and Workshops</source>
          , Notebook Papers,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Nuovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tinnirello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Cascia</surname>
          </string-name>
          ,
          <article-title>Detection of hate speech spreaders using convolutional neural networks</article-title>
          ,
          <source>in: PAN 2021 Profiling Hate Speech Spreaders on Twitter@ CLEF</source>
          , volume
          <volume>2936</volume>
          , CEUR,
          <year>2021</year>
          , pp.
          <fpage>2126</fpage>
          -
          <lpage>2136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Nuovo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tinnirello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Cascia</surname>
          </string-name>
          ,
          <article-title>Fake news spreaders detection: Sometimes attention is not all you need</article-title>
          ,
          <source>Information</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>426</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giachanou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. H. H.</given-names>
            <surname>Ghanem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of the 8th author profiling task at PAN 2020: Profiling fake news spreaders on Twitter</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>2696</volume>
          ,
          Sun SITE Central Europe,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pizarro</surname>
          </string-name>
          ,
          <article-title>Using n-grams to detect fake news spreaders on Twitter</article-title>
          ,
          <source>in: CLEF</source>
          ,
          <year>2020</year>
          , p.
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Buda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bolonyai</surname>
          </string-name>
          ,
          <article-title>An ensemble model using n-grams and statistical features to identify fake news spreaders on Twitter</article-title>
          ,
          <source>in: CLEF</source>
          ,
          <year>2020</year>
          , p.
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Croce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garlisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <article-title>An SVM ensemble approach to detect irony and stereotype spreaders on Twitter</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3180</volume>
          , CEUR,
          <year>2022</year>
          , pp.
          <fpage>2426</fpage>
          -
          <lpage>2432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tinnirello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Cascia</surname>
          </string-name>
          ,
          <article-title>T100: A modern classic ensemble to profile irony and stereotype spreaders</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3180</volume>
          , CEUR,
          <year>2022</year>
          , pp.
          <fpage>2666</fpage>
          -
          <lpage>2674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Lomonaco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Donabauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <article-title>COURAGE at CheckThat! 2022: Harmful tweet detection using graph neural networks and ELECTRA</article-title>
          ,
          <source>in: Working Notes of CLEF 2022-Conference</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hashida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sakai</surname>
          </string-name>
          ,
          <article-title>Classifying tweets using convolutional neural networks with multi-channel distributed representation</article-title>
          ,
          <source>IAENG International Journal of Computer Science</source>
          <volume>46</volume>
          (
          <year>2019</year>
          )
          <fpage>68</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
          , arXiv preprint arXiv:2003.10555 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mangione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Garbo</surname>
          </string-name>
          ,
          <article-title>Improving irony and stereotype spreaders detection using data augmentation and convolutional neural network</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3180</volume>
          , CEUR,
          <year>2022</year>
          , pp.
          <fpage>2585</fpage>
          -
          <lpage>2593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Beddiar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Jahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oussalah</surname>
          </string-name>
          ,
          <article-title>Data expansion using back translation and paraphrasing for hate speech detection</article-title>
          ,
          <source>Online Social Networks and Media</source>
          <volume>24</volume>
          (
          <year>2021</year>
          )
          <fpage>100153</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M.</given-names>
            <surname>Naseer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Asvial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Sari</surname>
          </string-name>
          ,
          <article-title>An empirical comparison of BERT, RoBERTa, and ELECTRA for fact verification</article-title>
          ,
          <source>in: 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>