ABCD Team at HOPE 2024: Hope Detection with BERTology Models and Data Augmentation

HongBui Son University of Information Technology-VNUHCM

Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City Vietnam

Vietnam National University

Ho Chi Minh City Vietnam

LeMinh Quan University of Information Technology-VNUHCM

Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City Vietnam

Vietnam National University

Ho Chi Minh City Vietnam

DangVan Thin University of Information Technology-VNUHCM

Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City Vietnam

Vietnam National University

Ho Chi Minh City Vietnam

Keywords: Hope classification, Spanish language, English language, sentiment analysis, aspect-based sentiment analysis

This paper presents our participation in the HOPE tasks at IberLEF 2024 [1, 2, 3, 4, 5], focusing on two of them: Task 1: Hope for Equality, Diversity, and Inclusion, and Task 2: Hope as Expectations. To address Task 1, we implemented and investigated different techniques and strategies. We first investigated the effectiveness of pre-processing steps for social media texts. Second, we employed two data augmentation strategies to tackle the class imbalance issue in the training dataset. Finally, we implemented a fine-tuning approach based on pre-trained language models combined with a simple ensemble technique. The private test results show that our best system achieved a top-5 ranking in Task 1. For Task 2, we achieved 2nd place in the binary classification subtask on the Spanish datasets and 1st place in the same subtask on the English datasets. Furthermore, our best results ranked 1st in the multi-class classification subtask for both languages in the competition.

Introduction

HOPE at IberLEF 2024 [1, 2, 3, 4, 5] is a competition that aims to analyze the multifaceted concept of hope through Natural Language Processing (NLP). The HOPE shared task consists of two tasks. Task 1 - Hope for Equality, Diversity, and Inclusion. This task is to identify messages that promote hope and acceptance for marginalized groups on social media platforms. The challenge is for competitors to develop NLP models capable of distinguishing messages that uplift and empower these communities from those that do not. Success hinges on a model's ability to accurately detect hope-oriented messages within this specific social media context. Task 2 - Hope as Expectations. This second task focuses on hope as it relates to future expectations and desires. The challenge here is to build NLP models proficient in detecting expressions of hope within social media text. These models need not only to identify hope, but also to categorize its nature, distinguishing between realistic and unrealistic aspirations, as well as positive hope for the future. Participating in HOPE 2024 is an opportunity to advance NLP and to address complex problems with real-world impact, improving NLP tools and the understanding of hope, social media, and human behavior.

In the previous year, HOPE at IberLEF 2023 [6] was also organized, focusing on the task of "Multilingual Hope Speech Detection". Various approaches were proposed and made public by numerous authors. Among these, the I2C-Huelva team [7] applied BERTuit, a transformer model proposed for the Spanish language. This team achieved the second position in the Spanish subtask and the first position in the English subtask. The same main approach was used by NLP URJC [8], with a small difference: this team applied BERT for the English subtask and BETO for the Spanish subtask. With their obtained results, they would have ranked 8th in the Spanish subtask and 1st in the English one. However, they missed the deadline for the paper submission. Unlike the two preceding teams, besides testing XLM-R with different model setups, the Zootopi team [9] proposed two prompting scenarios for a Large Language Model (ChatGPT), one for the English subtask and one for the Spanish subtask. In the end, they achieved the 1st position in the Spanish subtask and ranked 9th in the English subtask. As we expected, transformer-based models were used in both subtasks, and the majority of their results are at the top of the competition's leaderboard. However, we cannot conclude that transformer-based models yield better results than other approaches, such as traditional machine learning techniques like KNN (used by the Zavira team [10]), CNNs (used by the LIDOMA team [11]), or ChatGPT prompting as applied by the Zootopi team.

We now describe the dataset for each task. For Task 1, the dataset was collected between 2020 and 2023. It is an improved and extended version of the SpanishHopeEDI dataset [2]. The version of the dataset for IberLEF 2024 consists of training and dev sets of LGTB-related tweets and a test set of tweets related to the LGTBI collective and other EDI topics (unknown domains). A tweet is considered HS if the text:

• i) explicitly supports the social integration of minorities;
• ii) is a positive inspiration for the collective;
• iii) explicitly encourages people who might find themselves in a situation;
• iv) unconditionally promotes tolerance.

On the contrary, a tweet is marked as NHS if the text:

• i) expresses negative sentiment towards a community;
• ii) explicitly seeks violence;
• iii) uses gender-based insults.

The dataset is composed of 2,000 tweets.

In terms of Task 2, the data collection commenced by retrieving the most recent 50,000 tweets between January and June 2022. Following this, an additional batch of 50,000 tweets was acquired within the same temporal scope using keywords associated with sentiments of hope. The dataset encompassed English and Spanish tweets originating from the first half of 2022, amounting to an aggregate of approximately 100,000 tweets per language.

Methodology

To address this challenge, we employ fine-tuning with different pre-trained language models for two tasks. We also investigate how pre-processing steps affect the models' performance. This is because the data originates from a social media platform, where proper pre-processing can significantly improve overall performance. Furthermore, we utilize various data augmentation techniques to enrich the training data. Finally, we implement a simple ensemble strategy to enhance performance for both tasks further. Figure 1 illustrates our overall pipeline for the HOPE 2024 shared task. Details of our main components are presented below.

Pre-processing Component

While analyzing the data, we discovered that the dataset contained noise and inconsistencies. Applying pre-processing steps helps clean and standardize the data, which allows the models to understand the context better and ultimately leads to more accurate results. To demonstrate the importance of pre-processing, we compare two strategies: a simple strategy and a specific strategy. We apply both to Task 1 and Task 2 to determine whether these pre-processing methods improve performance.

• Simple pre-processing steps: For this strategy, we only apply whitespace handling and punctuation removal. Figure 2 illustrates the steps in the simple pre-processing strategy.
• Specific pre-processing steps: For this strategy, we leverage the tweet-preprocessor library¹ because it offers pre-processing functionalities that include emoji removal, username removal, specific substring removal, hyperlink removal, and text normalization. Figure 3 shows an example of the specific pre-processing strategy.

Raw text: "A veces si me gusta como salgo en las fotos #transgirl #transgender #trans #transwoman #transisbeautiful"
Preprocessed text: "A veces si me gusta como salgo en las fotos"
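The simple strategy (whitespace handling and punctuation removal) can be sketched as follows. This is a minimal illustration rather than our exact implementation: the function name simple_preprocess is hypothetical, and the set of characters treated as punctuation (Python's string.punctuation, which covers ASCII punctuation only and therefore not "¿" or "¡") is an assumption.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: the paper does not list the exact punctuation set.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def simple_preprocess(text: str) -> str:
    """Remove ASCII punctuation, then collapse runs of whitespace."""
    no_punct = text.translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", no_punct).strip()


if __name__ == "__main__":
    print(simple_preprocess("A veces,   si me gusta!  "))
```

Removing punctuation before collapsing whitespace avoids leaving double spaces where a standalone punctuation mark used to be.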

Data Augmentation

We observed an imbalance between classes in Task 2. To improve overall performance, we aimed to expand the training data by applying two data augmentation strategies that create new samples and help the model learn more robust features. The two strategies, which were applied only to Task 2, are briefly introduced below:

• Data Combination: In this method, we combine the training datasets for English and Spanish into a single final dataset. We employ this strategy because we use multilingual models as our primary classifiers. Combining the datasets increases the number of data samples and leverages the strengths of multilingual language models.
• Data Augmentation through Large Language Model: Our main idea for this approach is to use a pre-trained large language model to diversify the samples of the under-represented classes. This work uses the Gemini models to create new samples through prompt engineering via the API². We iterate through each text sample of the training set and send a request via the API. For each sample, we instruct Gemini to generate a distinct text in the same language and with the same structure while maintaining the expressiveness of the original text.
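The two strategies above can be sketched as follows. This is an illustrative outline under stated assumptions, not our exact code: the helper names, the prompt wording, and the choice to augment only the listed minority labels are hypothetical, and the call to the language model is abstracted behind a generate callable so that no specific Gemini client signature is assumed.

```python
from typing import Callable, Iterable, List, Set, Tuple

Sample = Tuple[str, str]  # (text, label)

# Assumed prompt wording; the exact prompt is not given in the paper.
PROMPT_TEMPLATE = (
    "Rewrite the following text as a distinct text in the same language, "
    "keeping its structure and expressiveness:\n{text}"
)


def combine_datasets(english: Iterable[Sample], spanish: Iterable[Sample]) -> List[Sample]:
    """Data Combination: merge both training sets into one list."""
    return list(english) + list(spanish)


def augment_minority(
    samples: List[Sample],
    minority_labels: Set[str],
    generate: Callable[[str], str],
) -> List[Sample]:
    """LLM augmentation: add one generated variant per minority-class sample.

    `generate` is any function mapping a prompt to generated text,
    for example a thin wrapper around an LLM API client.
    """
    augmented = list(samples)
    for text, label in samples:
        if label not in minority_labels:
            continue
        new_text = generate(PROMPT_TEMPLATE.format(text=text)).strip()
        # Keep only non-empty outputs that differ from the source text.
        if new_text and new_text != text:
            augmented.append((new_text, label))
    return augmented
```

Passing the generator as a parameter keeps the augmentation loop testable offline, since a stub function can stand in for the remote model.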

Classification Model

The HOPE shared task³ consists of two sub-tasks: Task 1: Hope for Equality, Diversity and Inclusion, and Task 2: Hope as Expectations. These sub-tasks involve binary classification and multi-class classification problems, respectively. To address these different tasks, we employ fine-tuning based on pre-trained BERTology language models. Since several pre-trained language models support both English and Spanish, we implemented various models to investigate their performance on this shared task. A brief description of the models is presented below.

• XLM-R (Conneau et al. [12]): This language model tackles tasks across 100 languages. It is pre-trained with self-supervised learning on 2.5TB of filtered CommonCrawl data, without any human labelling: an automated process creates both training examples and labels from the raw text itself. In this competition, we used both XLM-R-base and XLM-R-large.
• DeBERTa (He et al. [13]): DeBERTa is a transformer-based neural language model designed to improve on BERT and RoBERTa with two techniques: a disentangled attention mechanism and an enhanced mask decoder. We applied DeBERTa-v3-base, an improved version of DeBERTa, to verify whether it gives superior results.
• mDeBERTa-v3 (He et al. [14]): Building upon the success of DeBERTa, mDeBERTa-v3 extends its capabilities to handle multiple languages. It retains the core structure of DeBERTa but is trained on the multilingual CC100 data (2.5 TB of text across 100 languages). The base version has 12 layers and a hidden size of 768. While the backbone has 86 million parameters, the embedding layer for its vocabulary adds another 190 million. This extensive vocabulary helps mDeBERTa-v3 handle a wide range of languages.
• RoBERTuito (Pérez et al. [15]): A pre-trained language model for Spanish social media text, trained on 500 million tweets following the RoBERTa guidelines. RoBERTuito comes in 3 flavors: cased, uncased, and uncased+deaccented. In our experiments, we use the base model.
• Twitter-roBERTa (Barbieri et al. [16]): This RoBERTa-base model specializes in the sentiment of English tweets. It was trained on 58 million tweets and fine-tuned for sentiment analysis with the TweetEval benchmark.
• Twitter-XLM-roBERTa (Barbieri et al. [17]): This XLM-RoBERTa model goes beyond English. Trained on nearly 200 million tweets and fine-tuned for sentiment analysis in eight languages (Arabic, English, French, German, Hindi, Italian, Spanish, and Portuguese), it identifies positive, negative, or neutral sentiment in social media posts. Although it is fine-tuned on these specific languages, it may also transfer to others. We used this model to check whether it is effective at classifying different labels of social media texts.
• Bertin-RoBERTa (de la Rosa et al. [18]): A series of BERT-based models for Spanish text. We applied this model to observe whether it performs better than a traditional BERT model on the Spanish subtasks.

Ensemble Learning approach

To improve the overall performance of our models for the HOPE at IberLEF 2024 shared task, we leverage a max voting ensemble method. This technique is commonly used for classification tasks, which aligns well with the binary and multi-class classification problems in Hope's subtasks. In max voting, multiple models make predictions for each data point in the test set. Each model's prediction is considered a "vote," and the final prediction is the class label that receives the most votes from the ensemble.
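The max voting rule described above can be sketched as follows. The function name max_vote and the tie-breaking rule (on a tie, the label predicted by the earliest model in the list wins) are assumptions for illustration; the paper does not specify how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def max_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label predictions by majority vote.

    predictions[m][i] is the label that model m assigns to sample i.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    final = []
    for votes in zip(*predictions):  # votes for one sample across models
        counts = Counter(votes)
        top = max(counts.values())
        # Assumed tie-break: first model (in list order) voting for a top label.
        final.append(next(v for v in votes if counts[v] == top))
    return final


if __name__ == "__main__":
    model_a = ["hs", "nhs", "hs"]
    model_b = ["hs", "hs", "nhs"]
    model_c = ["nhs", "nhs", "nhs"]
    print(max_vote([model_a, model_b, model_c]))
```

With an odd number of models and two classes, ties cannot occur; the tie-break only matters for the multi-class subtask or an even number of models.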

Experimental Setup

Datasets and Evaluation Metrics

Task 1: Hope for Equality, Diversity and Inclusion

For Task 1, we used the official datasets provided by the organizers to train our models. To facilitate a comprehensive understanding of the data, we present both a table outlining the data distribution and a diagram illustrating the sequence lengths. Table 1 presents the data distribution for the datasets used in Task 1, which are divided into a training set (1400 samples), a validation set (200 samples), and a test set (400 samples). The task is to classify texts as "Hope Speech" (hs) or "Not Hope Speech" (nhs). The training set is balanced (700 samples each for hs and nhs), as is the validation set (100 samples for each class). However, the samples are unevenly distributed across the three sets. This class balance plays a crucial role in training and fine-tuning our models while also facilitating the resolution of data-related issues.

Besides, Figure 4 depicts the distribution of sequence length, that is, the number of words within a sequence, for the two classes in the datasets. There appear to be two clusters of data points, suggesting a possible separation between the sequence lengths of "Hope Speech" and "Not Hope Speech" samples. Overall, however, the sequence length distributions of the two classes are very similar. The "hs" class contains some samples with shorter sequences, while the "nhs" class exhibits a broader distribution, covering a wider range of sequence lengths, including a small number of longer samples.

Task 2: Hope as Expectations

In Task 2, we also use the original datasets provided by the organizers to train our models. Table 2 describes the statistics of the datasets for Task 2, showing the distribution of data across the training, validation, and test sets for the binary and multi-class classification subtasks. Hope can be categorized either in a binary fashion (Hope or Not Hope) or in a multi-class fashion (Not Hope, Generalized Hope, Unrealistic Hope, or Realistic Hope). The table separates the data into three sets (Train, Validation, and Test) and shows how many instances of each label are included in each set.

In terms of the Spanish corpus, the data is imbalanced across the categories. For both binary and multi-class classifications, there are significantly more instances of Not Hope compared to the positive sentiment labels ("Hope" in Binary and "Generalized Hope", "Unrealistic Hope", and "Realistic Hope" in multi-class). The imbalanced nature of the data can make it difficult for our model to learn the positive sentiment label accurately. The model might become biased towards the majority class ("Not Hope") and misclassify positive sentiment instances.
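To make the imbalance concrete, the following sketch computes the share of each class and a simple majority-to-minority ratio from the training-set counts in Table 2. The helper names class_shares and imbalance_ratio are hypothetical, and the ratio is only one of several possible imbalance measures.

```python
from typing import Dict

# Training-set counts per multi-class label, as reported in Table 2.
TRAIN_COUNTS: Dict[str, Dict[str, int]] = {
    "Spanish": {
        "Not Hope": 4701,
        "Generalized Hope": 1151,
        "Unrealistic Hope": 546,
        "Realistic Hope": 505,
    },
    "English": {
        "Not Hope": 3088,
        "Generalized Hope": 1726,
        "Unrealistic Hope": 648,
        "Realistic Hope": 730,
    },
}


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the training set that belongs to each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Size of the largest class divided by the size of the smallest."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    for language, counts in TRAIN_COUNTS.items():
        shares = class_shares(counts)
        print(language, {k: round(v, 3) for k, v in shares.items()},
              round(imbalance_ratio(counts), 2))
```

On these counts, the ratio is larger for the Spanish corpus than for the English one, which is consistent with the imbalance discussed above.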

System Settings

We implemented our main framework with the Hugging Face Transformers library. All models were trained for 10 epochs, with the learning rate set to 2e-4 for base models and 5e-5 for large models. Depending on the size of the pre-trained language model, we chose a batch size of either 32 or 16. The hyperparameters of the models were tuned on the validation set. Most of our training runs were performed on Kaggle with the P100 accelerator. In both tasks, we used the AutoTokenizer of the pre-trained model imported from Hugging Face, with a maximum sequence length of 512. For all our experiments, we set a fixed random seed of 42 to train the models in both Task 1 and Task 2 (English and Spanish datasets). Although the datasets used in Task 2 cover two different languages, Spanish and English, we decided to apply the same pre-processing methods to all datasets. Pre-processing is one of the main approaches in our experiments and is discussed in more detail later.
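The settings above can be collected in a small configuration object, sketched below. The class and function names are hypothetical, and two details are assumptions: detecting a large model from the substring "large" in its name, and pairing the batch size of 16 with large models and 32 with base models (the text only states that the batch size depends on model size).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    model_name: str
    learning_rate: float
    batch_size: int
    num_epochs: int = 10    # all models are trained for 10 epochs
    max_length: int = 512   # maximum tokenized sequence length
    seed: int = 42          # fixed random seed for all experiments


def config_for(model_name: str) -> FineTuneConfig:
    """Pick the learning rate and batch size from the model size."""
    is_large = "large" in model_name.lower()  # assumed naming convention
    return FineTuneConfig(
        model_name=model_name,
        learning_rate=5e-5 if is_large else 2e-4,
        batch_size=16 if is_large else 32,    # assumed size-to-batch mapping
    )


if __name__ == "__main__":
    for name in ("xlm-roberta-base", "xlm-roberta-large"):
        print(config_for(name))
```

Keeping the settings in a frozen dataclass makes each run's hyperparameters explicit and easy to log next to its validation score.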

Experiment Results and Discussion

Task 1: Hope for Equality, Diversity and Inclusion.

In Task 1, we observe and evaluate whether diversifying the provided datasets improves the final results. We also investigate whether Ensemble Learning yields different or improved results compared to the base models. Table 3 reports the performance of four models (XLM-R-base, RoBERTuito, DeBERTa-v3-base, mDeBERTa-v3-base) on the datasets with simple pre-processing, and of two of them (XLM-R-base, mDeBERTa-v3-base) on the datasets with specific pre-processing. When trained on data with simple pre-processing, DeBERTa-v3-base achieved the highest Avg. Macro F1, the metric used to evaluate models, at 56.06%. Other models obtained scores ranging from 48.79% to 54.81%. Remarkably, both XLM-R-base and mDeBERTa-v3-base improved considerably when trained on the dataset with specific pre-processing. The mDeBERTa-v3-base model achieved a high Macro F1-score in this scenario, reaching 60.54%. The remaining models

Conclusion

This work presented our system architecture, experimental procedures, and final ranking in the HOPE 2024 competition. We implemented various techniques and investigated their performance on this shared task, including simple and specific pre-processing steps, dataset combination across languages, and data augmentation with large language models. We evaluated these methodologies using pre-trained models for the sub-tasks. Finally, our approach achieved top scores in various sub-tasks. Specifically, our best system ranked in the Top 5 for Task 1, and Top 2 and Top 1 for Task 2 - PolyHope Binary (Spanish and English, respectively). For Task 2 - PolyHope multi-class, we reached Top 1 for both English and Spanish.

Figure 1: Our overall pipeline for the HOPE 2024 shared task.

Figure 2: Simple pre-processing sample.

Figure 3: Specific pre-processing steps.

Figure 4: The distribution of sample length for each class in the training and validation sets.

Raw text: "#USER# #USER# #USER# Ps, there are Anons who are working on military airports and installations right now. The work takes time? And even if ruskies expect them, there? s nothing they can do to stop them"
Preprocessed text: "Ps there are Anons who are working on military airports and installations right now The work takes time And even if ruskies expect them theres nothing they can do to stop them"

Table 1: The distribution of experimental datasets.

Labels                   Training set   Validation set   Test set
Hope Speech (hs)         700            100              -
Not Hope Speech (nhs)    700            100              -
Total                    1400           200              400

Table 2: Statistics of official datasets for Task 2.

Type of labels                    Spanish                              English
Binary     Multi-class            Train set  Validation set  Test set  Train set  Validation set  Test set
Not Hope   Not Hope               4701       799             -         3088       502             -
Hope       Generalized Hope       1151       186             -         1726       300             -
Hope       Unrealistic Hope       546        91              -         648        102             -
Hope       Realistic Hope         505        74              -         730        128             -
Total                             6903       1150            1152      6192       1032            1032

Table 3: Experimental results for Task 1: Hope for Equality, Diversity and Inclusion.

Datasets                  Model              Avg. Macro F1   hs(a) Precision   hs(a) Recall   hs(a) Macro-F1   nhs(b) Precision   nhs(b) Recall   nhs(b) Macro-F1
Simple pre-processing     XLM-R-base         58.79%          80.95%            80.95%         62.58%           73.33%             73.33%          55.00%
Simple pre-processing     RoBERTuito         54.81%          73.68%            73.68%         63.64%           49.43%             49.43%          45.99%
Simple pre-processing     DeBERTa-v3-base    56.06%          78.46%            78.46%         61.82%           62.69%             62.69%          50.30%
Simple pre-processing     mDeBERTa-v3-base   59.30%          85.00%            85.00%         63.57%           64.00%             64.00%          54.86%
Specific pre-processing   XLM-R-base         60.54%          74.36%            74.36%         65.17%           60.47%             60.47%          55.91%
Specific pre-processing   mDeBERTa-v3-base   60.26%          75.31%            75.31%         67.40%           61.04%             61.04%          53.11%
Ensemble Learning - Max Voting               61.11%          82.35%            82.35%         66.67%           62.50%             62.50%          55.56%

¹ https://pypi.org/project/tweet-preprocessor/
² https://ai.google.dev/gemini-api/docs/api-overview?hl=vi
³ https://codalab.lisn.upsaclay.fr/competitions/17714

Acknowledgements

This research was supported by The VNUHCM-University of Information Technology's Scientific Research Support Fund.

also witnessed improvements, with scores ranging from 59.30% to 60.26%. In addition, Ensemble Learning improves the overall classification performance, achieving an average Macro F1 score of 61.11%, the highest among all evaluated models. These findings suggest that applying a wider range of pre-processing techniques can enhance the performance of sentiment analysis models on social media data. While the DeBERTa-v3-base model achieved the highest score with simple pre-processing, all models exhibited performance gains thanks to the enhanced dataset with additional processing steps.

Besides, we explore the application of Ensemble Learning, specifically Max Voting, to enhance the performance of sentiment analysis models for social media data. Our findings show that, although some individual metrics of the ensemble remain suboptimal, they still improve on those of several single models. These results underscore the effectiveness of ensemble learning in boosting sentiment analysis performance and highlight the potential for further optimization through more sophisticated techniques.
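For reference, the standard macro-averaged F1 score, which weights every class equally regardless of its size, can be computed as in the sketch below. This follows the textbook definition; how the organizers derive the reported "Avg. Macro F1" from per-class scores is not specified here, so the function is illustrative only.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["hs", "hs", "nhs", "nhs"]
    pred = ["hs", "nhs", "nhs", "nhs"]
    print(round(macro_f1(gold, pred), 4))
```

Because each class contributes equally, a model that ignores a minority class is penalized even when its overall accuracy is high.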

Task 2: Hope as expectations

We report the experimental results for Task 2 in four tables. Table 4 and Table 5 present the results of the binary classification subtask on the Spanish and English datasets, respectively, while Table 6 and Table 7 present the results of the multi-class classification subtask on the Spanish and English datasets.

Subtask 2.a: Binary Hope Speech Detection

Table 4 presents the experimental findings for the Task 2 binary subtask on the Spanish datasets. Among the models trained on the dataset with simple pre-processing, Twitter-XLM-roBERTa achieved the best performance with an M_F1 score of 83.61%. However, we decided to use the RoBERTuito model for further experiments in this task, as it is specifically trained on Spanish social media data. Despite employing additional approaches, the subsequent methods failed to yield any improvements. Only by applying Ensemble Learning to the previously obtained results did we observe an improvement and achieve the highest M_F1 score of 84.09%.

Table 5 shows the influence of various techniques on the performance of our BERT models. Among the evaluated models, XLM-R-base exhibited the most promising performance on the basic dataset with simple pre-processing, achieving the highest F1-score of 86.63%. The remaining models trained on the same datasets obtained M_F1 scores ranging from 84.88% to 85.37%. Notably, applying additional pre-processing or data augmentation techniques did not result in any significant improvements for these models. In some cases, it even decreased performance compared to the simple pre-processing scenario. Besides, while Ensemble Learning did not achieve the absolute best results, it showed a notable improvement over the results of individual models.

Subtask 2.b: Multi-class Hope Speech Detection

As described in Table 6, the models performed well on the dataset subjected to simple pre-processing. Among these, the XLM-R-base and Bertin-RoBERTa models achieved the highest and second-highest M_F1 scores of 65.29% and 64.09%, respectively. However, we decided to apply additional approaches to Bertin-RoBERTa to obtain more objective results using a model fine-tuned specifically for Spanish texts. Methods such as specific pre-processing, training the model on a combined training dataset, or generating more data did not produce any remarkable results, while applying the Max Voting ensemble technique resulted in the best performance, with an M_F1 score of 66.68%.

Table 7 presents the experimental results of the Task 2 multi-class classification subtask on the English dataset. Overall, DeBERTa-v3 achieved a remarkable performance on the dataset with simple pre-processing, with an M_F1 score of 69.92%, compared to Twitter-XLM-RoBERTa with an M_F1 score of 69.00%. Nonetheless, we chose Twitter-XLM-RoBERTa for further investigations because it is a pre-trained model for sentiment analysis of social media text. Upon com-

System Ranking

Concerning Task 1, among the employed models, Ensemble Learning ultimately achieved the best Average Macro F1 score of 61.11%. However, the XLM-R-base model achieved the highest Precision score, so we submitted its predictions, which achieved fifth place in the overall ranking with an Average Macro F1 score of 58.79%. In terms of Task 2, the official ranking results are presented in Table 8. For Task 2 - Subtask 2.a: Binary Hope Speech Detection on the Spanish datasets, Ensemble Learning emerged as the most effective method, achieving an M_F1 score of 84.09%, which serves as the benchmark metric for ranking. Our system attained the second position in this task. For Task 2 - Subtask 2.a: Binary Hope Speech Detection on the English datasets, we attained the first position with an M_F1 score of 86.58% using the XLM-R model, a better outcome than in the preceding two tasks. Turning to Task 2 - Subtask 2.b: Multi-class Hope Speech Detection on the Spanish datasets, our method reached an M_F1 score of 66.68% and secured the best rank using the Ensemble Learning technique. Finally, in Task 2 - Subtask 2.b: Multi-class Hope Speech Detection on the English datasets, with an M_F1 score of 72.00%, we attained the top position using the XLM-R model.

References

[1] L. Chiruzzo, S. M. Jiménez-Zafra, F. Rangel, Overview of IberLEF 2024: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS, 2024.
[2] D. García-Baena, F. Balouchzahi, S. Butt, M. Á. García-Cumbreras, A. Lambebo Tonja, J. A. García-Díaz, S. Bozkurt, B. R. Chakravarthi, H. G. Ceballos, R. Valencia-García, G. Sidorov, L. A. Ureña-López, A. Gelbukh, S. M. Jiménez-Zafra, Overview of HOPE at IberLEF 2024: Approaching Hope Speech Detection in Social Media from Two Perspectives, for Equality, Diversity and Inclusion and as Expectations, Procesamiento del Lenguaje Natural 73 (2024).
[3] D. García-Baena, M. Á. García-Cumbreras, S. M. Jiménez-Zafra, J. A. García-Díaz, R. Valencia-García, Hope speech detection in Spanish: The LGBT case, Language Resources and Evaluation (2023).
[4] F. Balouchzahi, G. Sidorov, A. Gelbukh, PolyHope: Two-level hope speech detection from tweets, Expert Systems with Applications 225 (2023) 120078. doi:10.1016/j.eswa.2023.120078.
[5] G. Sidorov, F. Balouchzahi, S. Butt, A. Gelbukh, Regret and hope on transformers: An analysis of transformers on regret and hope speech detection datasets, Applied Sciences 13 (2023) 3983.
[6] S. M. Jiménez-Zafra, F. Rangel, M. Montes-y-Gómez, Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), CEUR-WS, 2023.
[7] J. L. D. Olmedo, J. M. Vázquez, V. P. Álvarez, I2C-Huelva at HOPE2023@IberLEF: Simple use of transformers for automatic hope speech detection, 2023.
[8] M. Á. Rodríguez-García, A. Riaño-Martínez, S. M. Herranz, URJC-Team at HOPE2023@IberLEF: Multilingual hope speech detection using transformers architecture, 2023.
[9] A. Ngo, H. T. H. Tran, Zootopi at HOPE2023@IberLEF: Is zero-shot ChatGPT the future of hope speech detection, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), CEUR-WS, 2023.
[10] Z. Ahani, G. Sidorov, O. Kolesnikova, A. Gelbukh, Zavira at HOPE2023@IberLEF: Hope speech detection from text using TF-IDF features and machine learning algorithms, 2023.
[11] M. S. Tash, J. Armenta-Segura, O. Kolesnikova, G. Sidorov, A. F. Gelbukh, LIDOMA at HOPE2023@IberLEF: Hope speech detection using lexical features and convolutional neural networks, in: IberLEF@SEPLN, 2023.
[12] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020. doi:10.18653/v1/2020.acl-main.747.
[13] P. He, X. Liu, J. Gao, W. Chen, DeBERTa: Decoding-enhanced BERT with disentangled attention, in: International Conference on Learning Representations, 2020.
[14] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing, arXiv:2111.09543, 2021.
[15] J. M. Pérez, D. A. Furman, L. Alonso Alemany, F. M. Luque, RoBERTuito: a pre-trained language model for social media text in Spanish, in: Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022.
[16] F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, L. Neves, TweetEval: Unified benchmark and comparative evaluation for tweet classification, in: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, 2020. doi:10.18653/v1/2020.findings-emnlp.148.
[17] F. Barbieri, L. Espinosa Anke, J. Camacho-Collados, XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022.
[18] J. de la Rosa, E. G. Ponferrada, M. Romero, P. Villegas, P. González de Prado Salas, M. Grandury, BERTIN: Efficient pre-training of a Spanish language model using perplexity sampling, Procesamiento del Lenguaje Natural 68 (2022).