<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ABCD Team at HOPE 2024: Hope Detection with BERTology Models and Data Augmentation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hong</forename><surname>Bui</surname></persName>
						</author>
						<author>
							<persName><surname>Son</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Information Technology-VNUHCM</orgName>
								<address>
									<addrLine>Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi</addrLine>
									<settlement>Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Vietnam National University</orgName>
								<address>
									<addrLine>Ho Chi</addrLine>
									<settlement>Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Le</forename><surname>Minh Quan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Information Technology-VNUHCM</orgName>
								<address>
									<addrLine>Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi</addrLine>
									<settlement>Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Vietnam National University</orgName>
								<address>
									<addrLine>Ho Chi</addrLine>
									<settlement>Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dang</forename><surname>Van Thin</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Information Technology-VNUHCM</orgName>
								<address>
									<addrLine>Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi</addrLine>
									<settlement>Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Vietnam National University</orgName>
								<address>
									<addrLine>Ho Chi</addrLine>
									<settlement>Minh City</settlement>
									<country key="VN">Vietnam</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ABCD Team at HOPE 2024: Hope Detection with BERTology Models and Data Augmentation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">3D19C6147A21910A5FA7C3457B2342A4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:41+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Hope classification</term>
					<term>Spanish language</term>
					<term>English language</term>
					<term>sentiment analysis</term>
					<term>aspect-based sentiment analysis</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents our participation in the HOPE tasks at IberLEF 2024 <ref type="bibr" target="#b3">[1,</ref> 2,<ref type="bibr" target="#b5">3,</ref><ref type="bibr" target="#b6">4,</ref> 5], focusing on two of them: Task 1: Hope for Equality, Diversity, and Inclusion, and Task 2: Hope as Expectations. To address Task 1, we implemented and investigated different techniques and strategies. We first investigated the effectiveness of pre-processing steps for social media texts. Second, we employed two data augmentation strategies to tackle the class imbalance issue in the training dataset. Finally, we implemented a fine-tuning approach based on pre-trained language models combined with a simple ensemble technique. The private test results show that our best system achieved a top 5 ranking in Task 1. For Task 2, we achieved 2nd place in the binary classification subtask for Spanish datasets and 1st place for the same subtask on English datasets. Furthermore, our best results ranked 1st in the multi-classification subtask for both languages in the competition.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>HOPE at IberLEF 2024 <ref type="bibr" target="#b3">[1,</ref><ref type="bibr">2,</ref><ref type="bibr" target="#b5">3,</ref><ref type="bibr" target="#b6">4,</ref><ref type="bibr">5]</ref> is a competition that aims to analyze the multifaceted concept of hope through Natural Language Processing (NLP). The HOPE shared task consists of two different tasks. Task 1: Hope for Equality, Diversity, and Inclusion. This task is to identify messages that promote hope and acceptance for marginalized groups on social media platforms. The challenge is designed for competitors to develop NLP models capable of recognizing messages that uplift and empower these communities. Success hinges on a model's ability to accurately detect hope-oriented messages within this specific social media context. Task 2: Hope as Expectations. This second task focuses on hope as it relates to future expectations and desires. The challenge here is to build NLP models proficient in detecting expressions of hope within social media text. These models need not only to identify hope but also to categorize its nature, distinguishing between realistic and unrealistic aspirations, as well as positive hope for the future. Participating in HOPE 2024 is an opportunity to address complex problems with real-world impact, pushing the boundaries of NLP tools and enhancing the understanding of hope, social media, and human behavior.</p><p>In the previous year, HOPE at IberLEF 2023 <ref type="bibr" target="#b8">[6]</ref> was also organized, focusing on the task of "Multilingual Hope Speech Detection". Various approaches were proposed and made public by numerous authors. Among these, the I2C-Huelva Team <ref type="bibr" target="#b9">[7]</ref> applied BERTuit, a transformer model proposed for the Spanish language. 
This team achieved second place in the Spanish subtask and first place in the English subtask. NLP URJC <ref type="bibr" target="#b10">[8]</ref> followed the same main approach, with one difference: the team applied BERT for the English subtask and BETO for the Spanish subtask. With their obtained results, they would have ranked 8th in the Spanish subtask and 1st in the English one. However, they missed the deadline for the paper submission. Unlike the two preceding teams, besides testing XLM-R with different model setups, the Zootopi Team <ref type="bibr" target="#b11">[9]</ref> proposed two prompting scenarios for a Large Language Model (ChatGPT), one for the English subtask and one for the Spanish subtask. In the end, they achieved 1st position in the Spanish subtask and ranked 9th in the English subtask. As expected, transformer-based models were used in both subtasks, and the majority of their results are at the top of the competition's leaderboard. However, we cannot conclude that transformer-based models yield better results than other approaches, such as KNN (used by the Zavira team <ref type="bibr" target="#b12">[10]</ref>), CNN (used by the LIDOMA Team <ref type="bibr" target="#b13">[11]</ref>), or ChatGPT prompting as applied by the Zootopi Team.</p><p>We now describe the dataset for each task. For Task 1, the dataset was collected between 2020 and 2023. It is an improved and extended version of the SpanishHopeEDI dataset <ref type="bibr">[2]</ref>. The version of the dataset for IberLEF 2024 consists of training and dev sets of LGTB-related tweets and a test set of tweets related to the LGTBI collective and other EDI topics (unknown domains). 
A tweet is considered Hope Speech (HS) if the text:</p><p>• i) explicitly supports the social integration of minorities; • ii) is a positive inspiration for the collective; • iii) explicitly encourages people who might find themselves in a situation; • iv) unconditionally promotes tolerance.</p><p>On the contrary, a tweet is marked as Not Hope Speech (NHS) if the text:</p><p>• i) expresses negative sentiment towards a community; • ii) explicitly seeks violence; • iii) uses gender-based insults.</p><p>The dataset is composed of 2,000 tweets.</p><p>In terms of Task 2, the data collection commenced by retrieving the most recent 50,000 tweets between January and June 2022. Following this, an additional batch of 50,000 tweets was acquired within the same period using keywords associated with sentiments of hope. The dataset encompassed English and Spanish tweets originating from the first half of 2022, amounting to approximately 100,000 tweets per language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>To address this challenge, we fine-tune different pre-trained language models for the two tasks. We also investigate how pre-processing steps affect the models' performance, because the data originates from a social media platform, where proper pre-processing can significantly improve overall performance. Furthermore, we use data augmentation techniques to enrich the training data. Finally, we implement a simple ensemble strategy to further enhance performance for both tasks. Figure <ref type="figure" target="#fig_0">1</ref> illustrates our overall pipeline for the HOPE 2024 shared task. Details of our main components are presented below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Pre-processing Component</head><p>While analyzing the data, we discovered that the dataset contained noise and inconsistencies. Applying pre-processing steps helped clean and standardize the data, which allowed the models to understand the context better and ultimately led to more accurate results. To demonstrate the importance of pre-processing, we compare two strategies: a simple strategy and a specific strategy. We apply both to Task 1 and Task 2 to determine whether they improve performance.</p><p>• Simple pre-processing steps: For this strategy, we only apply whitespace handling and punctuation removal. Figure <ref type="figure" target="#fig_1">2</ref> illustrates the steps in the simple pre-processing strategy. • Specific pre-processing steps: For this strategy, we leverage the tweet-preprocessor library <ref type="foot" target="#foot_0">1</ref> because it offers pre-processing functionalities that include emoji removal, username removal, specific substring removal, hyperlink removal, and text normalization. Figure <ref type="figure" target="#fig_2">3</ref> shows an example of the specific pre-processing strategy.</p><p>Raw text: "A veces si me gusta como salgo en las fotos #transgirl #transgender #trans #transwoman #transisbeautiful" Preprocessed text: "A veces si me gusta como salgo en las fotos"</p><formula xml:id="formula_0">Raw text: "#USER# #USER# #USER#</formula></div>
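<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the simple strategy, the following minimal Python sketch removes punctuation and normalizes whitespace. The function name simple_preprocess and the choice of string.punctuation as the punctuation set are assumptions made for illustration; the exact rules used in the experiments may differ.</p><p>
```python
import re
import string

# Assumption: "punctuation" means the ASCII characters in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def simple_preprocess(text: str) -> str:
    """Remove punctuation, then collapse runs of whitespace."""
    without_punct = text.translate(_PUNCT_TABLE)
    # Collapse tabs, newlines and repeated spaces into a single space.
    return re.sub(r"\s+", " ", without_punct).strip()


if __name__ == "__main__":
    raw = "The work takes time?  And even if they expect them,\tnothing can stop them. "
    print(simple_preprocess(raw))
```
</p><p>Under this rule a hashtag such as #trans only loses its # symbol; removing whole hashtags, user mentions, or hyperlinks is left to the specific strategy.</p></div>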
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Data Augmentation</head><p>We observed an imbalance between classes in Task 2. To improve overall performance, we aimed to expand the training data. To achieve this, we applied two data augmentation strategies to create new samples that can help the model learn more robust features. We briefly introduce the two strategies, which were applied only to Task 2, below:</p><p>• Data Combination: In this method, we combine the training datasets for English and Spanish into a single final dataset. We employ this strategy because we use multilingual models as our primary classifiers. Combining the datasets increases the number of data samples and leverages the strengths of multilingual language models. • Data Augmentation through Large Language Model: The main idea of this approach is to use a pre-trained large language model to diversify the samples of the imbalanced classes. This work uses the Gemini models to create new samples through prompt engineering with the API <ref type="foot" target="#foot_1">2</ref> . We send API requests while iterating over each text sample of the training set. For each sample, we instruct Gemini to generate a distinct text in the same language and with the same structure while maintaining the expressiveness of the original text.</p></div>
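<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two strategies can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names combine_datasets and augment_with_llm are illustrative, the paraphrase argument is a hypothetical stand-in for the request sent to the language model, and restricting generation to a chosen set of labels is an assumption rather than a documented detail of our pipeline.</p><p>
```python
from typing import Callable, Iterable, List, Set, Tuple

Sample = Tuple[str, str]  # (text, label)


def combine_datasets(english: Iterable[Sample], spanish: Iterable[Sample]) -> List[Sample]:
    """Data Combination: merge both languages into one training set."""
    return list(english) + list(spanish)


def augment_with_llm(
    samples: Iterable[Sample],
    paraphrase: Callable[[str], str],
    target_labels: Set[str],
) -> List[Sample]:
    """Append one generated variant for every sample of a target class.

    paraphrase is a hypothetical stand-in for the LLM request; it should
    return a new text in the same language that keeps the original meaning.
    """
    original = list(samples)
    generated = [
        (paraphrase(text), label)
        for text, label in original
        if label in target_labels
    ]
    return original + generated


if __name__ == "__main__":
    en = [("I hope things get better", "Generalized Hope"), ("It is raining", "Not Hope")]
    es = [("Ojala gane la loteria", "Unrealistic Hope")]
    merged = combine_datasets(en, es)
    # Dummy paraphraser, used only so the sketch runs without network access.
    augmented = augment_with_llm(
        merged, lambda t: t + " (variant)", {"Generalized Hope", "Unrealistic Hope"}
    )
    print(len(merged), len(augmented))
```
</p><p>Passing the generator as a function keeps the sketch independent of any particular API client; in practice the callable would wrap the prompt and the request to the model.</p></div>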
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Classification Model</head><p>The Hope shared task<ref type="foot" target="#foot_2">3</ref> consists of two sub-tasks: Task 1: Hope for Equality, Diversity and Inclusion, and Task 2: Hope as Expectations. These sub-tasks involve binary classification and multi-class classification problems. To address them, we fine-tune pre-trained BERTology language models. Since several pre-trained language models support both English and Spanish, we implemented various models to investigate their performance on this shared task. A brief description of the models is presented below.</p><p>• XLM-R (Conneau et al. <ref type="bibr" target="#b14">[12]</ref>): This language model covers 100 languages. It is trained with self-supervised learning on a massive dataset (2.5TB of filtered CommonCrawl data) without human annotation: an automated process creates both the training inputs and the labels from the raw text itself. In this competition, we used both XLM-R-base and XLM-R-large. • DeBERTa (He et al. <ref type="bibr" target="#b15">[13]</ref>): DeBERTa is a transformer-based neural language model designed to improve on BERT and RoBERTa with two techniques: a disentangled attention mechanism and an enhanced mask decoder. We applied DeBERTa-v3-base, an improved version of DeBERTa, to verify whether it yields superior results. • mDeBERTa-v3 (He et al. <ref type="bibr" target="#b16">[14]</ref>): Building upon DeBERTa, mDeBERTa-v3 extends its capabilities to multiple languages. It retains the core structure of DeBERTa but is trained on the multilingual CC100 data (the 2.5TB corpus also used for XLM-R) across 100 languages. The base version has 12 layers and a hidden size of 768. While the backbone has 86 million parameters, the embedding layer for its large multilingual vocabulary adds another 190 million. This extensive vocabulary helps mDeBERTa-v3 handle a wide range of languages. • RoBERTuito (Pérez et al. <ref type="bibr" target="#b17">[15]</ref>): A pre-trained language model for Spanish social media text, trained on 500 million tweets following the RoBERTa guidelines. RoBERTuito comes in 3 flavors: cased, uncased, and uncased+deaccented. In our experiments, we use the base model. • Twitter-roBERTa (Barbieri et al. <ref type="bibr" target="#b18">[16]</ref>): This RoBERTa-base model specializes in understanding the sentiment of English tweets. It was trained on 58 million tweets and fine-tuned for sentiment analysis on the TweetEval benchmark. • Twitter-XLM-roBERTa (Barbieri et al. <ref type="bibr" target="#b19">[17]</ref>): This XLM-RoBERTa model goes beyond English. Trained on nearly 200 million tweets and fine-tuned in eight languages (Arabic, English, French, German, Hindi, Italian, Spanish, and Portuguese), it can identify positive, negative, or neutral sentiment in social media posts. Although it is fine-tuned on these specific languages, it may also handle sentiment in others. We decided to use this model to check whether it is effective at classifying the different labels of social media texts. • Bertin-RoBERTa <ref type="bibr">[18]</ref>: A series of BERT-based models for Spanish text. We applied this model to observe whether it performs better than a traditional BERT on the Spanish subtasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Ensemble Learning approach</head><p>To improve the overall performance of our models for the HOPE at IberLEF 2024 shared task, we leverage a max voting ensemble method. This technique is commonly used for classification tasks, which aligns well with the binary and multi-class classification problems in Hope's subtasks. In max voting, multiple models make predictions for each data point in the test set. Each model's prediction is considered a "vote," and the final prediction is the class label that receives the most votes from the ensemble.</p></div>
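<div xmlns="http://www.tei-c.org/ns/1.0"><p>The max voting rule described above can be sketched in a few lines of Python. The function name max_voting and the tie-breaking rule (ties go to the label voted by the model listed first) are assumptions made for illustration, since the text does not specify how ties are handled.</p><p>
```python
from collections import Counter
from typing import List, Sequence


def max_voting(model_predictions: Sequence[Sequence[str]]) -> List[str]:
    """Return the majority label per sample.

    model_predictions[m][i] is the label predicted by model m for sample i.
    Assumption: ties are resolved in favour of the model listed first.
    """
    if not model_predictions:
        return []
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    final = []
    for votes in zip(*model_predictions):
        # most_common keeps first-seen order among equally frequent labels.
        final.append(Counter(votes).most_common(1)[0][0])
    return final


if __name__ == "__main__":
    predictions = [
        ["hs", "nhs", "hs"],   # model A
        ["hs", "hs", "nhs"],   # model B
        ["nhs", "hs", "nhs"],  # model C
    ]
    print(max_voting(predictions))
```
</p><p>Using an odd number of models avoids ties in the binary subtasks; in the multi-class subtask ties remain possible, so the order of the models matters under this rule.</p></div>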
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experimental Setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Datasets and Evaluation Metrics</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Task 1: Hope for Equality, Diversity and Inclusion</head><p>For Task 1, we used the official datasets provided by the organizers to train our models. To facilitate a comprehensive understanding of the data, we present both a table outlining the data distribution and a diagram illustrating the sequence lengths. Table <ref type="table" target="#tab_1">1</ref> presents the data distribution for the datasets used in Task 1, which are divided into a training set (1400 samples), a validation set (200 samples) and a test set (400 samples). The data concerns classifying "Hope Speech" (hs) and "Not Hope Speech" (nhs). The training set is balanced (700 samples each for hs and nhs), as is the validation set (100 samples for each category). The table thus indicates that both classes have the same number of samples, whereas the samples are distributed unevenly across the three sets. This balance plays a crucial role in training and fine-tuning our models while also facilitating the resolution of data-related issues.</p><p>Besides, Figure <ref type="figure" target="#fig_3">4</ref> depicts the distribution of sequence length, that is, the number of words within a sequence, for the two categories in the datasets. There appear to be two distinct clusters of data points, suggesting a possible separation between the sequence lengths of "Hope Speech" and "Not Hope Speech" samples. Overall, however, the sequence length distributions of both categories are highly similar. The "hs" category appears to have some samples with shorter sequences, while the "nhs" category exhibits a broader distribution, encompassing a wider range of sequence lengths, including a small number of longer samples. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Task 2: Hope as Expectations</head><p>In Task 2, we also use the original datasets provided by the organizers to train our models. Table <ref type="table" target="#tab_2">2</ref> describes the statistics of the datasets for Task 2, showing the distribution of data across the training, validation, and test sets for the binary and multi-class classification subtasks. Hope can be categorized as either binary (Hope or Not Hope) or multi-class (Not Hope, Generalized Hope, Unrealistic Hope, or Realistic Hope). The table separates the data into three sets (Train, Validation, and Test) and shows how many instances of each label are included in each set.</p><p>In terms of the Spanish corpus, the data is imbalanced across the categories. For both binary and multi-class classification, there are significantly more instances of Not Hope than of the positive labels ("Hope" in binary and "Generalized Hope", "Unrealistic Hope", and "Realistic Hope" in multi-class). This imbalance can make it difficult for our model to learn the positive labels accurately. The model might become biased towards the majority class ("Not Hope") and misclassify positive instances.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">System Settings</head><p>We built our main framework on the Hugging Face Transformers library. All models were trained for 10 epochs, with the learning rate set to 2e-4 for base models and 5e-5 for large models. Considering the size of the pre-trained language models, we chose a batch size of either 32 or 16. The hyperparameters of the models were tuned on the validation set. Most of our training runs were performed on Kaggle using the P100 accelerator. For tokenization, in both tasks we used the AutoTokenizer of the corresponding pre-trained model imported from Hugging Face. The maximum sequence length produced by the tokenizer is 512. For all our experiments, we set a fixed random seed of 42 to train the models in both Task 1 and Task 2 (English datasets and Spanish datasets). The datasets used in Task 2 cover two languages, Spanish and English; nevertheless, we applied the same pre-processing methods to all datasets. The pre-processing strategy is itself one of the main approaches in our experiments and is discussed in more detail later.</p></div>
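<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the settings above can be collected in a small configuration object. The following Python sketch is illustrative only: the names TrainingSettings and settings_for are hypothetical, detecting the model size from the word "large" in the model name is a simplification, and pairing the batch size of 16 with large models is an assumption, since the text only states that 32 or 16 was chosen depending on model size.</p><p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    model_name: str
    learning_rate: float
    batch_size: int
    num_epochs: int = 10        # all models are trained for 10 epochs
    max_seq_length: int = 512   # maximum tokenized sequence length
    seed: int = 42              # fixed random seed for every experiment


def settings_for(model_name: str) -> TrainingSettings:
    """Pick hyperparameters from the model size encoded in its name."""
    is_large = "large" in model_name.lower()
    return TrainingSettings(
        model_name=model_name,
        learning_rate=5e-5 if is_large else 2e-4,
        # Assumption: the smaller batch size goes with the larger models.
        batch_size=16 if is_large else 32,
    )


if __name__ == "__main__":
    for name in ("xlm-roberta-base", "xlm-roberta-large"):
        print(settings_for(name))
```
</p><p>Keeping these values in one immutable object makes it easier to pass identical settings to every run and to report them alongside the results.</p></div>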
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiment Results and Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Task 1: Hope for Equality, Diversity and Inclusion.</head><p>In Task 1, we evaluate whether diversifying the provided datasets improves the final results. We also investigate whether Ensemble Learning yields different or improved results compared to the base models. Table <ref type="table" target="#tab_3">3</ref> reports the performance of four models (XLM-R-base, RoBERTuito, DeBERTa-v3-base, mDeBERTa-v3-base) on the simply pre-processed datasets and of two of them (XLM-R-base, mDeBERTa-v3-base) on the specifically pre-processed datasets. When trained on data with simple pre-processing, the DeBERTa-v3-base model achieved the highest Macro F1-score, the metric used to evaluate models, at 56.06%. Other models obtained scores ranging from 48.79% to 54.81%. Remarkably, both XLM-R-base and mDeBERTa-v3-base exhibited a significant improvement when trained on the dataset with specific pre-processing. The mDeBERTa-v3-base model achieved the highest Macro F1-score in this scenario, reaching 60.54%. The remaining models</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This work presented our system architecture, experimental procedures, and final ranking in the HOPE 2024 competition. We implemented various techniques and investigated their performance on this shared task, including simple and specific pre-processing steps, dataset combination across languages, and data augmentation with large language models. We evaluated these methods using pre-trained models for the sub-tasks. Our approach achieved top scores in several sub-tasks. Specifically, our best system ranked in the Top 5 for Task 1, and 2nd and 1st for Task 2 PolyHope Binary on the Spanish and English datasets, respectively. For Task 2 PolyHope multi-class, we ranked 1st for both English and Spanish.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Our overall pipeline for the HOPE 2024 shared task.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Simple pre-processing sample.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Specific pre-processing steps.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: The distribution of sample length for each class in the training and validation sets.</figDesc><graphic coords="7,129.72,84.19,333.36,252.21" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Ps, there are Anons who are working on military airports and installations right now. The work takes time? And even if ruskies expect them, there? s nothing they can do to stop them " Preprocessed text: "Ps there are Anons who are working on military airports and installations right now The work takes time And even if ruskies expect them theres nothing they can do to stop them"</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>The distribution of experimental datasets.</figDesc><table><row><cell>Labels</cell><cell>Training set</cell><cell>Validation set</cell><cell>Test set</cell></row><row><cell>Hope Speech (hs)</cell><cell>700</cell><cell>100</cell><cell>-</cell></row><row><cell>Not Hope Speech (nhs)</cell><cell>700</cell><cell>100</cell><cell>-</cell></row><row><cell>Total</cell><cell>1400</cell><cell>200</cell><cell>400</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Statistics of official datasets for Task 2.</figDesc><table><row><cell cols="2">Type of labels</cell><cell cols="3">Spanish</cell><cell cols="3">English</cell></row><row><cell>Binary</cell><cell>Multi-class</cell><cell>Train set</cell><cell>Validation set</cell><cell>Test set</cell><cell>Train set</cell><cell>Validation set</cell><cell>Test set</cell></row><row><cell>Not Hope</cell><cell>Not Hope</cell><cell>4701</cell><cell>799</cell><cell>-</cell><cell>3088</cell><cell>502</cell><cell>-</cell></row><row><cell>Hope</cell><cell>Generalized Hope</cell><cell>1151</cell><cell>186</cell><cell>-</cell><cell>1726</cell><cell>300</cell><cell>-</cell></row><row><cell>Hope</cell><cell>Unrealistic Hope</cell><cell>546</cell><cell>91</cell><cell>-</cell><cell>648</cell><cell>102</cell><cell>-</cell></row><row><cell>Hope</cell><cell>Realistic Hope</cell><cell>505</cell><cell>74</cell><cell>-</cell><cell>730</cell><cell>128</cell><cell>-</cell></row><row><cell cols="2">Total</cell><cell>6903</cell><cell>1150</cell><cell>1152</cell><cell>6192</cell><cell>1032</cell><cell>1032</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Experimental result Task 1: Hope for Equality, Diversity and Inclusion.</figDesc><table><row><cell>Datasets</cell><cell>Model</cell><cell>Avg. Macro F1</cell><cell cols="2">hs(a) Precision Recall</cell><cell cols="3">nhs(b) Macro-F1 Precision Recall</cell><cell>Macro-F1</cell></row><row><cell></cell><cell>XLM-R-base</cell><cell>58.79%</cell><cell>80.95%</cell><cell>80.95%</cell><cell>62.58%</cell><cell>73.33%</cell><cell>73.33%</cell><cell>55.00%</cell></row><row><cell>Simple pre-processing</cell><cell>RoBERTuito DeBERTa-v3-base</cell><cell>54.81% 56.06%</cell><cell>73.68% 78.46%</cell><cell>73.68% 78.46%</cell><cell>63.64% 61.82%</cell><cell>49.43% 62.69%</cell><cell>49.43% 62.69%</cell><cell>45.99% 50.30%</cell></row><row><cell></cell><cell>mDeBERTa-v3-base</cell><cell>59.30%</cell><cell>85.00%</cell><cell>85.00%</cell><cell>63.57%</cell><cell>64.00%</cell><cell>64.00%</cell><cell>54.86%</cell></row><row><cell>Specific pre-processing</cell><cell>XLM-R-base mDeBERTa-v3-base</cell><cell>60.54% 60.26%</cell><cell>74.36% 75.31%</cell><cell>74.36% 75.31%</cell><cell>65.17% 67.40%</cell><cell>60.47% 61.04%</cell><cell>60.47% 61.04%</cell><cell>55.91% 53.11%</cell></row><row><cell cols="2">Ensemble Learning -Max Voting</cell><cell>61.11%</cell><cell>82.35%</cell><cell>82.35%</cell><cell>66.67%</cell><cell>62.50%</cell><cell>62.50%</cell><cell>55.56%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://pypi.org/project/tweet-preprocessor/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://ai.google.dev/gemini-api/docs/api-overview?hl=vi</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://codalab.lisn.upsaclay.fr/competitions/17714</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This research was supported by The VNUHCM-University of Information Technology's Scientific Research Support Fund.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>also witnessed improvements, with scores ranging from 59.30% to 60.26%. In addition, Ensemble Learning improves the overall classification performance, achieving an average Macro F1 score of 61.11%, the highest among all evaluated models. These findings suggest that applying a wider range of pre-processing techniques can enhance the performance of sentiment analysis models on social media data. While the DeBERTa-v3-base model achieved the highest score with simple pre-processing, all models trained on the dataset with additional processing steps exhibited performance gains.</p><p>Besides, we explore the application of Ensemble Learning, specifically Max Voting, to enhance the performance of sentiment analysis models for social media data. Our findings demonstrate that, while some individual metrics remain suboptimal, the ensemble still improves on several single models. These results underscore the effectiveness of ensemble learning in boosting sentiment analysis performance and highlight the potential for further optimization through more sophisticated techniques.</p></div>
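<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the Macro F1 metric used in this discussion concrete, the sketch below computes the per-class F1 score as 2TP / (2TP + FP + FN) and averages it over the classes with equal weight. It is a plain-Python illustration; the function names per_class_f1 and macro_f1 are hypothetical, and the organizers' official evaluation script may differ in details such as the handling of classes that never occur.</p><p>
```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 score of every label that occurs in the gold or predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    gold = ["hs", "hs", "nhs", "nhs"]
    pred = ["hs", "nhs", "nhs", "nhs"]
    print(per_class_f1(gold, pred))
    print(round(macro_f1(gold, pred), 4))
```
</p><p>Because every class contributes equally to the average, a model that favours the majority class is penalized, which is why this metric suits the imbalanced Task 2 datasets.</p></div>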
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Task 2: Hope as expectations</head><p>We report the experimental results for Task 2 in four tables. Table <ref type="table">4</ref> and Table <ref type="table">5</ref> present the results of the binary classification task on the Spanish and English datasets, respectively, while Table <ref type="table">6</ref> and Table <ref type="table">7</ref> present the results of the multi-class classification task on the Spanish and English datasets, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Subtask 2.a: Binary Hope Speech Detection</head><p>Table <ref type="table">4</ref> presents the experimental findings for Task 2 Binary on the Spanish datasets. Among the models trained on the simply pre-processed dataset, Twitter-XLM-roBERTa achieved the best performance with an M_F1 score of 83.61%. Nevertheless, we decided to use the RoBERTuito model for further experiments in this task, as it is specifically trained on Spanish social media data. Despite employing more approaches, the subsequent methods failed to yield any improvement. Only by implementing Ensemble Learning on the previously obtained results did we observe an improvement, achieving the highest M_F1 score of 84.09%.</p><p>Table <ref type="table">5</ref> depicts the influence of various techniques on the performance of our BERT models. Among the evaluated models, XLM-R-base exhibited the most promising performance on the dataset with simple pre-processing, achieving the highest F1-score of 86.63%. The remaining models trained on the same datasets obtained M_F1 scores ranging from 84.88% to 85.37%. Applying additional pre-processing or data augmentation techniques did not result in any significant improvement for these models; in some cases, it even decreased performance compared to the simple pre-processing scenario. Besides, while Ensemble Learning did not achieve the absolute best results, it demonstrated a notable improvement over the individual models' results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Subtask 2.b: Multi-class Hope Speech Detection</head><p>As shown in Table <ref type="table">6</ref>, the models performed well on the dataset subjected to basic pre-processing. Among these, XLM-R-base and Bertin-RoBERTa achieved the highest and second-highest M_F1 scores of 65.29% and 64.09%, respectively. Nevertheless, we applied the additional approaches to Bertin-RoBERTa in order to obtain more objective results with a model trained specifically on Spanish text. Specific pre-processing, training the model on a combined training dataset, and generating more data did not bring any notable gains, whereas the Max Voting ensemble technique gave the best performance, with an M_F1 score of 66.68%.</p><p>Table <ref type="table">7</ref> presents the experimental results of the Task 2 multi-class classification on the English dataset. Overall, DeBERTa-v3 performed best on the dataset with simple pre-processing, with an M_F1 score of 69.92%, compared with 69.00% for Twitter-XLM-RoBERTa. Nonetheless, we chose Twitter-XLM-RoBERTa for further investigation because it is a pre-trained model for sentiment analysis of social media text. Upon com-</p></div>			</div>
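			<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Max Voting ensemble and the M_F1 (macro-averaged F1) metric referred to in this section can be illustrated with a minimal Python sketch. It assumes hard-label majority voting over per-model predictions, with ties resolved in favour of the earliest-listed model; the tie rule, the function names max_vote and macro_f1, and the toy labels and predictions are illustrative assumptions, not details of the submitted system or of the task data.</p>
<p>
```python
from collections import Counter


def max_vote(predictions):
    """Return the majority label per sample.

    predictions is a list of per-model label lists of equal length.
    Ties go to the earliest-listed model (an assumed tie rule).
    """
    ensemble = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Scan the votes in model order and keep the first label with the top count.
        ensemble.append(next(v for v in sample_votes if counts[v] == top))
    return ensemble


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): three models, four samples, two placeholder labels.
gold = ["hope", "none", "hope", "none"]
model_a = ["hope", "none", "none", "none"]
model_b = ["hope", "hope", "hope", "none"]
model_c = ["none", "none", "hope", "none"]

ensemble = max_vote([model_a, model_b, model_c])
print("ensemble predictions:", ensemble)
print("macro F1 of model_a:", macro_f1(gold, model_a))
print("macro F1 of ensemble:", macro_f1(gold, ensemble))
```
</p><p>In this toy case each model makes one error on a different sample, so the vote recovers every gold label. This shows how an ensemble can outscore each of its members, but it says nothing about the size of the gains reported in the tables.</p></div>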
			<div xmlns="http://www.tei-c.org/ns/1.0"><head>System Ranking</head><p>Concerning Task 1, Ensemble Learning ultimately gave the best Average Macro F1 score among the employed models, at 61.11%. However, the XLM-R-base model achieved the highest Precision score, so we submitted its predictions, which placed fifth in the overall ranking with an Average Macro F1 score of 58.79%. For Task 2, the official ranking results are presented in Table <ref type="table">8</ref>. For Task 2, Subtask 2.a (Binary Hope Speech Detection) on the Spanish dataset, Ensemble Learning was the most effective method, achieving an M_F1 score of 84.09%, which is the metric used for ranking; our system took second place. For Task 2, Subtask 2.a (Binary Hope Speech Detection) on the English dataset, we took first place with an M_F1 score of 86.58% using the XLM-R model, a better outcome than in the preceding two tasks. For Task 2, Subtask 2.b (Multi-class Hope Speech Detection) on the Spanish dataset, our system reached an M_F1 score of 66.68% and secured the best rank using the Ensemble Learning technique. Finally, for Task 2, Subtask 2.b (Multi-class Hope Speech Detection) on the English dataset, we also took first place, with an M_F1 score of 72.00% using the XLM-R model.</p></div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of IberLEF 2024: Natural Language Processing Challenges for Spanish and other Iberian Languages</title>
		<author>
			<persName><forename type="first">L</forename><surname>Chiruzzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jiménez-Zafra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing</title>
				<meeting>the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing<address><addrLine>SEPLN</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of HOPE at IberLEF 2024: Approaching Hope Speech Detection in Social Media from Two Perspectives, for Equality, Diversity and Inclusion and as Expectations</title>
		<author>
			<persName><forename type="first">D</forename><surname>García-Baena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Balouchzahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Butt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lambebo Tonja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>García-Díaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bozkurt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Chakravarthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">G</forename><surname>Ceballos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Valencia-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Ureña-López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jiménez-Zafra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">73</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hope speech detection in Spanish: The LGBT case</title>
		<author>
			<persName><forename type="first">D</forename><surname>García-Baena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jiménez-Zafra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>García-Díaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Valencia-García</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Language Resources and Evaluation</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1" to="28" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">PolyHope: Two-level hope speech detection from tweets</title>
		<author>
			<persName><forename type="first">F</forename><surname>Balouchzahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2023.120078</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">225</biblScope>
			<biblScope unit="page">120078</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Regret and hope on transformers: An analysis of transformers on regret and hope speech detection datasets</title>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Balouchzahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Butt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">3983</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jiménez-Zafra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing</title>
				<meeting>the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing<address><addrLine>SEPLN</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">I2C-Huelva at HOPE2023@IberLEF: Simple use of transformers for automatic hope speech detection</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L D</forename><surname>Olmedo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Vázquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">P</forename><surname>Álvarez</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Rodríguez-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Riaño-Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Herranz</surname></persName>
		</author>
		<title level="m">URJC-Team at HOPE2023@IberLEF: Multilingual hope speech detection using transformers architecture</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Zootopi at HOPE2023@IberLEF: Is zero-shot ChatGPT the future of hope speech detection</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ngo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T H</forename><surname>Tran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing</title>
				<meeting>the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing<address><addrLine>SEPLN</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Ahani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kolesnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<title level="m">Zavira at HOPE2023@IberLEF: Hope speech detection from text using TF-IDF features and machine learning algorithms</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">LIDOMA at HOPE2023@IberLEF: Hope speech detection using lexical features and convolutional neural networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Tash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Armenta-Segura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Kolesnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Gelbukh</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:265309454" />
	</analytic>
	<monogr>
		<title level="m">IberLEF@SEPLN</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Unsupervised cross-lingual representation learning at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.747</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.747" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8440" to="8451" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">DeBERTa: Decoding-enhanced BERT with disentangled attention</title>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2111.09543</idno>
		<title level="m">DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">RoBERTuito: a pre-trained language model for social media text in Spanish</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Furman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Alonso Alemany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Luque</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.lrec-1.785" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association</title>
				<meeting>the Language Resources and Evaluation Conference, European Language Resources Association<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="7235" to="7243" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">TweetEval: Unified benchmark and comparative evaluation for tweet classification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa Anke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Neves</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.148</idno>
		<ptr target="https://aclanthology.org/2020.findings-emnlp.148" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1644" to="1650" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond</title>
		<author>
			<persName><forename type="first">F</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa Anke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.lrec-1.27" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association</title>
				<meeting>the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="258" to="266" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">BERTIN: Efficient pre-training of a Spanish language model using perplexity sampling</title>
		<author>
			<persName><forename type="first">J</forename><surname>De la Rosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">G</forename><surname>Ponferrada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Romero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Villegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>González de Prado Salas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Grandury</surname></persName>
		</author>
		<ptr target="http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6403" />
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="13" to="23" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
