<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Sentiment Analysis for Spanish Tweets based on Continual Pre-training and Data Augmentation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yingwen</forename><surname>Fu</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Information Science and Technology</orgName>
								<orgName type="institution">Guangdong University of Foreign Studies</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ziyu</forename><surname>Yang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Information Science and Technology</orgName>
								<orgName type="institution">Guangdong University of Foreign Studies</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nankai</forename><surname>Lin</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Information Science and Technology</orgName>
								<orgName type="institution">Guangdong University of Foreign Studies</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Lianxi</forename><surname>Wang</surname></persName>
							<email>wanglianxi@gdufs.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">School of Information Science and Technology</orgName>
								<orgName type="institution">Guangdong University of Foreign Studies</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">Guangzhou Key Laboratory of Multilingual Intelligent Processing</orgName>
								<orgName type="institution">Guangdong University of Foreign Studies</orgName>
								<address>
									<settlement>Guangzhou</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Feng</forename><surname>Chen</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Information Science and Technology</orgName>
								<orgName type="institution">Guangdong University of Foreign Studies</orgName>
								<address>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Sentiment Analysis for Spanish Tweets based on Continual Pre-training and Data Augmentation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">170B0C406844BF5DFE3C7B00982ECE23</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:22+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sentiment Analysis</term>
					<term>BERT</term>
					<term>Continual Pre-training</term>
					<term>Back Translation</term>
					<term>Mix up</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we report the solution of the team BERT4EVER for the sentiment analysis task for Spanish tweets in EmoEvalEs@IberLEF 2021, which aims to classify Spanish tweets into one of the following emotional categories: Anger, Disgust, Fear, Joy, Sadness, Surprise or Others. We adopt the monolingual Spanish BERT model to tackle the problem. In addition, we leverage two augmented strategies to enhance the classic fine-tuned model, namely continual pre-training and data augmentation to improve the generalization capability. Experimental results demonstrate the effectiveness of the BERT model and two augmented strategies.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Sentiment analysis is an important task in the field of natural language processing (NLP). It is often used to determine which type of emotion a text expresses <ref type="bibr" target="#b0">[1]</ref>. However, due to the lack of voice modulation and facial expressions, understanding the emotions expressed by users on social media such as Twitter is a difficult task <ref type="bibr" target="#b1">[2]</ref>.</p><p>Researchers are constantly pursuing efficient algorithms to achieve better classification results <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. Therefore, in EmoEvalEs@IberLEF 2021 <ref type="bibr" target="#b13">[14]</ref>, a sentiment analysis task was proposed <ref type="bibr" target="#b14">[15]</ref>, requiring participants to perform sentiment analysis and evaluation of tweets in Spanish and to classify them into one of the following emotional categories: Anger, Disgust, Fear, Joy, Sadness, Surprise or Others. This track provides Spanish tweets and the corresponding categories for participants to conduct sentiment classification experiments. However, the task poses two main challenges:</p><p>1) The dataset size is relatively small, far from the amount of data required to train commonly used classification models such as BERT <ref type="bibr" target="#b4">[5]</ref> and Bi-LSTM <ref type="bibr" target="#b5">[6]</ref>.</p><p>2) The class proportions are extremely imbalanced: in the provided dataset, the proportions of Fear and Disgust are much smaller than those of Others and Joy.</p><p>To tackle the issues above, we, the BERT4EVER team, leverage two strategies to boost classification performance: Continual Pre-training and Data Augmentation. 
These two strategies effectively compensate for the small data size and the imbalanced class proportions, so that the trained model yields better performance.</p><p>The remainder of the article is organized as follows. Section 2 describes the task and the dataset provided by the organizers in detail. Section 3 presents our specific implementation. The experimental results and conclusions are given in Sections 4 and 5, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Task Description</head><p>The aim of the task is to classify the sentiment conveyed in a Spanish tweet. The task is difficult because tweets lack facial expressions and intonation. The sentiment is divided into the following classes: Anger, Disgust, Fear, Joy, Sadness, Surprise or Others (the sentiment conveyed in the tweet is 'neutral' or there is no sentiment).</p><p>The datasets <ref type="bibr" target="#b6">[7]</ref> involved in this task were provided by the organizers on Codalab. The training set contains about 18,000 instances. In addition to the tweet text, the labels also indicate whether the tweet is offensive and which event the tweet is about. Some statistics about the training set are shown in Table <ref type="table" target="#tab_0">1</ref>. In our experiments, in order to fairly explore the effectiveness of the different strategies, we used 5-fold cross-validation, dividing the data into 5 parts to obtain an ensemble model with better generalization performance: 4 parts are used for training and the remaining part for validation. We then take the average results of the 5 cross-validation models as an estimate of the effectiveness of each strategy. BERT (Bidirectional Encoder Representations from Transformers) <ref type="bibr" target="#b4">[5]</ref> is a pre-trained language model (PLM) that shows excellent performance on multiple downstream NLP tasks. The model architecture is shown in Fig. <ref type="figure" target="#fig_0">1</ref>. It reads the input sequence at once and learns via two objectives, i.e., masked language modeling (MLM) and next sentence prediction (NSP). MLM randomly masks 15 percent of the input tokens, replacing them with other tokens, and then predicts the masked words. NSP predicts whether two input sentences are consecutive in the text, in order to model the relationship between sentences. 
In this paper, we leverage BETO <ref type="bibr" target="#b12">[13]</ref> as our base model. BETO is a BERT model trained on a large Spanish corpus hosted on Zenodo. BETO is similar in size to BERT-Base and was trained with the whole-word masking technique. It uses a vocabulary of about 31k BPE <ref type="bibr" target="#b7">[8]</ref> subwords constructed with SentencePiece and was trained for 2M steps.</p><p>However, since our dataset consists of Spanish tweets, a general pre-trained model applied directly to it may be limited by insufficient domain knowledge. At the same time, class imbalance (as discussed in the Introduction) is also a problem we need to solve. We therefore propose two strategies, continual pre-training and data augmentation, to alleviate these problems.</p></div>
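As a hedged illustration of the MLM objective described above, the following minimal Python sketch (a toy re-implementation for exposition, not code from this work) selects about 15 percent of the tokens as prediction targets and applies the 80/10/10 mask/random/keep replacement scheme from the BERT paper:

```python
import random

def mlm_mask(tokens, mask_token="[MASK]", vocab=None, mask_prob=0.15, seed=0):
    """Select ~mask_prob of positions as MLM targets; of those, 80% become
    [MASK], 10% a random vocabulary token, and 10% stay unchanged."""
    rng = random.Random(seed)
    vocab = vocab or tokens  # fall back to the sentence itself as a toy vocabulary
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                out[i] = mask_token
            elif r < 0.9:
                out[i] = rng.choice(vocab)
            # else: keep the original token unchanged
    return out, targets
```

During continual pre-training, only the positions recorded in `targets` contribute to the MLM loss; all other positions are left untouched.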
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Continual Pre-training</head><p>Inspired by <ref type="bibr" target="#b10">[11]</ref>, our continual pre-training approach to domain adaptation is straightforward: we continue pre-training BETO on a large corpus of unlabeled domain-specific text. Specifically, we try two domain corpora: (1) the training set of EmoEvalEs@IberLEF 2021, where we ignore the labels and only use the raw text for continual pre-training; and (2) a general Spanish tweet corpus plus the training set of EmoEvalEs@IberLEF 2021, where, in addition to the unlabeled training data of this track, we also leverage a large general Spanish tweet corpus <ref type="bibr" target="#b11">[12]</ref> for domain-adaptive pre-training.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Data Augmentation</head><p>Data augmentation tackles over-fitting at the data level and improves the generalization of the model. By increasing the diversity of the training samples, the model can learn more essential features of the data and become more robust to subtle variations in the samples. Back Translation. In order to generate more training data, we use back translation to construct a paraphrase x′_u of a sentence x_u. The paraphrase x′_u, generated by translating x_u into an intermediate language and then translating it back, describes the same content as x_u and should be semantically close to it. In terms of labels, x_u and its back-translated sample x′_u share the same label. We use English as the intermediate language for back translation.</p><p>By observing the Spanish dataset, we find that three categories, Disgust, Fear, and Surprise, account for the lowest proportions. Therefore, we only perform back translation on these three categories. This increases the proportion of the low-frequency categories, which not only enriches the training data but also reduces the model's misjudgment rate on these three low-proportion labels.</p><p>Mix Up. Mix up <ref type="bibr" target="#b8">[9]</ref> is a simple and fast data augmentation method. It randomly draws two samples from the training set and computes a simple weighted sum of them. 
The labels of the two samples are combined with the same weights; the loss is then computed against this weighted label, and the parameters are updated through backpropagation.</p><formula xml:id="formula_0">x̃ = λx_i + (1 − λ)x_j ,  ỹ = λy_i + (1 − λ)y_j<label>(1)</label></formula><p>where x_i, x_j are raw input vectors and y_i, y_j are one-hot label encodings. In this task, we simply set λ to 0.5, which yields more stable predictions.</p></div>
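Eq. (1) can be sketched in a few lines of plain Python (a toy illustration with list-valued inputs; in practice the interpolation is applied to model inputs and one-hot label vectors):

```python
def mixup(x_i, x_j, y_i, y_j, lam=0.5):
    """Interpolate two samples and their one-hot labels with weight lam (Eq. 1)."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y

# With lam = 0.5, as used in this paper, the mixed sample is simply
# the average of the pair, and the mixed label splits mass evenly.
```

For example, `mixup([1.0, 0.0], [0.0, 2.0], [1, 0, 0], [0, 1, 0])` returns `([0.5, 1.0], [0.5, 0.5, 0.0])`.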
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiment</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experiment Settings</head><p>We use the Transformers library with PyTorch as the backend to construct the BERT-based models, and scikit-learn to construct the machine learning models. The hyperparameters are shown in Table <ref type="table" target="#tab_1">2</ref>. For evaluation, we use the weighted macro-averaged F1 score as our metric. </p></div>
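The evaluation metric can be sketched as follows; this is a hedged pure-Python re-implementation of the standard weighted-average F1 definition (per-class F1 weighted by class support), not the organizers' official scoring script:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1  # weight per-class F1 by its support
    return score
```

Unlike the plain macro average, the support weighting keeps rare classes such as Disgust and Fear from dominating the score.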
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Experiment Results</head><p>We first report the offline results of several machine learning methods, such as Support Vector Machine (SVM), Logistic Regression (LR) and Random Forest (RF), and of recent neural methods, such as fine-tuned XLM <ref type="bibr" target="#b9">[10]</ref> and fine-tuned BETO, as well as the augmented strategies, including continual pre-training and back translation. The results are shown in Table <ref type="table" target="#tab_2">3</ref> and Table <ref type="table" target="#tab_3">4</ref>. Based on the offline results, we use the models (soft voting over the 5 cross-validation models) of ID 9, ID 10 and the combination of ID 9 and ID 10 (in Table <ref type="table" target="#tab_2">3</ref>) as our final submissions. The online results are shown in Table <ref type="table" target="#tab_4">5</ref>. We achieved second place in the competition. As Table <ref type="table" target="#tab_4">5</ref> shows, Fine-tuned BETO + Training set pre-training + Low proportion data back translation achieves the best accuracy of 0.7222. It is worth noting that the offline performance of Fine-tuned BETO + Training set pre-training + Mix up is excellent, but its online performance is not as good. This is also why the combination of the two models performs worse than the single model. We believe this model over-fits during training, resulting in poor generalization and thus degraded performance on the test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>For the sentiment analysis task for Spanish tweets in EmoEvalEs@IberLEF 2021, we adopt a monolingual pre-trained Spanish BERT model as our base model and fine-tune it on the labeled tweets. In addition, to address the two problems of small data size and class imbalance in the original training set, we leverage two augmented strategies to enhance the classic fine-tuned model, namely continual pre-training and data augmentation. Specifically, we try two data augmentation methods: back translation and mix up. Experimental results demonstrate the effectiveness of the two augmented strategies. In the future, we will explore more data augmentation methods to achieve better results on the sentiment analysis task for Spanish tweets.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. BERT Model.</figDesc><graphic coords="3,213.90,194.40,178.89,183.07" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 . Statistics of the dataset. Class Num. of Training Instances</head><label>1</label><figDesc></figDesc><table><row><cell>Happy</cell><cell>4908</cell></row><row><cell>Fear</cell><cell>260</cell></row><row><cell>Anger</cell><cell>2356</cell></row><row><cell>Surprise</cell><cell>952</cell></row><row><cell>Sad</cell><cell>2772</cell></row><row><cell>Disgust</cell><cell>89</cell></row><row><cell>Others</cell><cell>2356</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Hyperparameters.</figDesc><table><row><cell>Parameter</cell><cell>Value</cell></row><row><cell>Learning Rate</cell><cell>1e-5</cell></row><row><cell>Batch Size</cell><cell>16</cell></row><row><cell>Epoch</cell><cell>15</cell></row><row><cell>Optimizer</cell><cell>Adam</cell></row><row><cell>Device</cell><cell>Nvidia 1080i</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Correspondence between model and ID.</figDesc><table><row><cell>ID</cell><cell>Model</cell></row><row><cell>1</cell><cell>LR</cell></row><row><cell>2</cell><cell>SVM</cell></row><row><cell>3</cell><cell>RF</cell></row><row><cell>4</cell><cell>Fine-tuned BETO</cell></row><row><cell>5</cell><cell>Fine-tuned XLM</cell></row><row><cell>6</cell><cell>ID 4 + Training set pre-training</cell></row><row><cell>7</cell><cell>ID 4 + General corpus pre-training</cell></row><row><cell>8</cell><cell>ID 6 + Whole data back translation</cell></row><row><cell>9</cell><cell>ID 6 + Low proportion data back translation</cell></row><row><cell>10</cell><cell>ID 6 + Mix up</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Offline Performance. From the table above, we can see that among the machine learning methods SVM works best, outperforming LR and RF by 0.0407 and 0.0245, respectively. In addition, the neural methods are far superior to the machine learning methods, indicating the superiority of neural, and especially BERT-based, methods. Among the BERT-based methods, the monolingual BETO achieves better performance than the multilingual XLM, with an improvement of almost 0.1, demonstrating the effectiveness of monolingual BETO for this task. Besides, the two augmented strategies leveraged in this paper both improve the base model, among which Mix up augmentation achieves the best effect, reaching an average accuracy of 0.7266. In addition, continual pre-training with the training set and low-proportion back translation outperform continual pre-training with the general corpus and whole-data back translation, respectively.</figDesc><table><row><cell>ID</cell><cell></cell><cell></cell><cell cols="2">Accuracy</cell><cell></cell><cell></cell></row><row><cell></cell><cell>Fold 1</cell><cell>Fold 2</cell><cell>Fold 3</cell><cell>Fold 4</cell><cell>Fold 5</cell><cell>Average</cell></row><row><cell>1</cell><cell>0.5163</cell><cell>0.5113</cell><cell>0.5305</cell><cell>0.5236</cell><cell>0.5236</cell><cell>0.5205</cell></row><row><cell>2</cell><cell>0.5351</cell><cell>0.5598</cell><cell>0.5704</cell><cell>0.5612</cell><cell>0.5797</cell><cell>0.5612</cell></row><row><cell>3</cell><cell>0.5346</cell><cell>0.5461</cell><cell>0.531</cell><cell>0.5461</cell><cell>0.5216</cell><cell>0.5367</cell></row><row><cell>4</cell><cell>0.708</cell><cell>0.7005</cell><cell>0.7133</cell><cell>0.7019</cell><cell>0.7044</cell><cell>0.7056</cell></row><row><cell>5</cell><cell>0.5969</cell><cell>0.6132</cell><cell>0.6158</cell><cell>0.6088</cell><cell>0.6108</cell><cell>0.6091</cell></row><row><cell>6</cell><cell>0.7036</cell><cell>0.7126</cell><cell>0.7197</cell><cell>0.7119</cell><cell>0.7126</cell><cell>0.7121</cell></row><row><cell>7</cell><cell>0.7121</cell><cell>0.7068</cell><cell>0.7112</cell><cell>0.7106</cell><cell>0.7042</cell><cell>0.709</cell></row><row><cell>8</cell><cell>0.7161</cell><cell>0.7098</cell><cell>0.7167</cell><cell>0.7172</cell><cell>0.7162</cell><cell>0.7172</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Online Performance.</figDesc><table><row><cell>Model</cell><cell cols="2">Accuracy Precision</cell><cell>Recall</cell><cell>F1-Score</cell></row><row><cell>ID 9</cell><cell>0.7222</cell><cell>0.7047</cell><cell>0.7222</cell><cell>0.7114</cell></row><row><cell>ID 10</cell><cell>0.7047</cell><cell>0.6927</cell><cell>0.7047</cell><cell>0.6942</cell></row><row><cell>Combination of ID 9 and ID 10</cell><cell>0.7204</cell><cell>0.7082</cell><cell>0.7204</cell><cell>0.7098</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work was supported by the National Social Science Foundation of China (No. 17CTQ045), the Soft Science Research Project of Guangdong Province (No.2019A101002108), the Science and Technology Program of Guangzhou (No.202002030227), the National Natural Science Foundation of China (No. 61572145) and the Key Field Project for Universities of Guangdong Province (No. 2019KZDZX1016). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Synthesis Lectures on Human Language Technologies</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="167" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">SemEval-2017 task 4: Sentiment analysis in twitter</title>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Farra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</title>
				<meeting>the 11th International Workshop on Semantic Evaluation (SemEval-2017)<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="502" to="518" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">BB twtr at SemEval-2017 task 4: Twitter sentiment analysis with CNNs and LSTMs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cliche</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</title>
				<meeting>the 11th International Workshop on Semantic Evaluation (SemEval-2017)<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="573" to="580" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Arasteh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Monajem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Christlein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Heinrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Evert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 15th International Conference on Semantic Computing (ICSC)</title>
				<imprint>
			<date type="published" when="2021">2021. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT</title>
				<meeting>NAACL-HLT</meeting>
		<imprint>
			<date type="published" when="2019">2019. 2019</date>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">EmoEvent: A Multilingual Emotion Corpus based on different Events</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Plaza-Del-Arco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Strapparava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Urena Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Martin</forename><surname>Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th Language Resources and Evaluation Conference</title>
				<meeting>the 12th Language Resources and Evaluation Conference<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1492" to="1498" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Google&apos;s neural machine translation system: Bridging the gap between human and machine translation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Norouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Macherey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krikun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Macherey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CoRR</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">mixup: Beyond Empirical Risk Minimization</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cisse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lopez-Paz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICLR</title>
				<imprint>
			<date type="published" when="2018">2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Unsupervised cross-lingual representation learning at scale</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL 2020</title>
				<meeting>ACL 2020</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8440" to="8451" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Don&apos;t Stop Pretraining: Adapt Language Models to Domains and Tasks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gururangan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Marasović</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Swayamdipta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Downey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL</title>
				<meeting>ACL</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8342" to="8360" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">TWilBert: Pre-trained Deep Bidirectional Transformers for Spanish Twitter</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Á</forename><surname>González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pla</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">426</biblScope>
			<biblScope unit="page" from="58" to="69" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Spanish Pre-Trained BERT Model and Evaluation Data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cañete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chaperon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fuentes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICLR</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<author>
			<persName><forename type="first">M</forename><surname>Montes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum<address><addrLine>IberLEF</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Plaza-Del-Arco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jiménez Zafra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Molina González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Ureña López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martín-Valdivia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">0</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
