<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Transformers Pipeline for Offensiveness Detection in Mexican Spanish Social Media</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Victor</forename><surname>Gómez-Espinosa</surname></persName>
							<email>victor.gomez@cimat.mx</email>
							<affiliation key="aff0">
								<orgName type="department">Mathematics Research Center (CIMAT)</orgName>
								<address>
									<postCode>66628</postCode>
									<settlement>Monterrey</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Victor</forename><surname>Muñiz-Sanchez</surname></persName>
							<email>victorm@cimat.mx</email>
							<affiliation key="aff0">
								<orgName type="department">Mathematics Research Center (CIMAT)</orgName>
								<address>
									<postCode>66628</postCode>
									<settlement>Monterrey</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adrián</forename><forename type="middle">Pastor</forename><surname>López-Monroy</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Mathematics Research Center (CIMAT)</orgName>
								<address>
									<postCode>36023</postCode>
									<settlement>Guanajuato</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Transformers Pipeline for Offensiveness Detection in Mexican Spanish Social Media</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">26E7600B24D568139FD31661A914076F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Offensiveness Detection</term>
					<term>Mexican Spanish</term>
					<term>Transformers</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we describe the methodology we proposed for the MeOffendEs@IberLEF 2021 competition, Subtask 3: Non-contextual binary classification for Mexican Spanish, which consists of classifying tweets as offensive or non-offensive. We propose a Transformers-based pipeline consisting of a series of pre-processing steps and the use of an extended corpus, followed by an ensemble of BERT models. The proposed strategy obtained the best results on this task, ranking first.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years, the NLP and machine learning communities have launched many initiatives to guide research toward the automatic detection of threats and risks to users of social networks. These threats include aggressiveness, hate speech, harassment, racism, and misogyny, among many others. For the Spanish language, these efforts have been promoted by academic competitions on specific tasks, such as the events organized by TASS <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b9">10]</ref>, PAN <ref type="bibr" target="#b4">[5]</ref> and, in particular, MEX-A3T@IberLEF <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b6">7]</ref> and MeOffendEs@IberLEF <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>, which include tracks for aggressiveness and offensiveness identification, respectively, in tweets written in Mexican Spanish.</p><p>Detecting offensive comments or posts in social media is not easy, because offensiveness does not depend on the presence or absence of specific words. As an example, consider the following tweets taken from the MEX-A3T 2020 training corpus:</p><p>In the first case, the tweet is non-offensive even though it contains vulgar and rude language. The second tweet is offensive, although its language is less vulgar than that of the first. Based on this, we argue that it is necessary to take into account the context in which words are used.</p><p>There are many proposals to tackle offensive and aggressive content detection in social media for the Spanish language, using document representations based on n-grams (at word and character level) and word embeddings, together with classifiers based on standard machine learning and deep learning approaches <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b6">7]</ref>. 
However, in the last year there has been a visible trend toward contextualized word representations, such as Bi-LSTM, Bi-GRU, and Transformer-based models such as BETO <ref type="bibr" target="#b7">[8]</ref>, with and without fine-tuning <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b16">17]</ref>. State-of-the-art results for Mexican Spanish on this task have been reached with a bagging-like scheme that combines different BERT models trained on different augmented datasets <ref type="bibr" target="#b10">[11]</ref>.</p><p>Similar to <ref type="bibr" target="#b10">[11]</ref>, we propose an ensemble of BERT models; in addition, we use a pre-processing step to obtain useful text descriptions of the specialized language used in tweets, followed by an extension of the training corpus. The empirical evaluation shows that the proposed approach obtained the best results in the challenge, ranking first. The following sections explain our proposal in detail.</p><p>This document is organized as follows: Section 2 describes the dataset and the experimental settings. Section 3 describes the proposed pipeline, and Section 3.4 the experimental results. Finally, Section 4 outlines the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Dataset and model settings</head><p>The OffendMex corpus consists of a training set of 5060 tweets and a validation set of 76 tweets; of the total, 80% was used for training and 20% for evaluation. The tweets have a mean length of 24.11 tokens and a maximum of 60 tokens, and the class imbalance ratio is 2.66, with the offensive class as the minority.</p><p>For this task, we used a BERT model pre-trained on Spanish <ref type="bibr" target="#b7">[8]</ref>. For the fine-tuning step on small datasets (fewer than 100,000 examples), we ran an exhaustive search over the recommended hyperparameters <ref type="bibr" target="#b8">[9]</ref> and chose the best configuration on the evaluation set. As a result, we used a BERT model with a training batch size of 16 for 4 epochs and an Adam optimizer with a learning rate of 3e-5.</p></div>
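<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the exhaustive search over fine-tuning hyperparameters can be sketched as follows. The value ranges are the ones suggested for fine-tuning in <ref type="bibr" target="#b8">[9]</ref>; the evaluate callable and the toy scorer are hypothetical stand-ins introduced for this sketch and are not the code used for the reported experiments.</p><p><code lang="python">
```python
from itertools import product

# Fine-tuning ranges suggested in the BERT paper (Devlin et al., 2019).
BATCH_SIZES = (16, 32)
LEARNING_RATES = (5e-5, 3e-5, 2e-5)
EPOCHS = (2, 3, 4)


def grid_search(evaluate):
    """Return the (batch_size, learning_rate, epochs) tuple with the highest score.

    `evaluate` is a hypothetical callable that fine-tunes a model with the
    given configuration and returns its F1 score on the evaluation split.
    """
    configs = list(product(BATCH_SIZES, LEARNING_RATES, EPOCHS))
    return max(configs, key=lambda cfg: evaluate(*cfg))


def toy_evaluate(batch_size, learning_rate, epochs):
    # Stand-in scorer for illustration only; it does not train anything.
    score = 0.0
    score += 1.0 if batch_size == 16 else 0.0
    score += 1.0 if learning_rate == 3e-5 else 0.0
    score += 1.0 if epochs == 4 else 0.0
    return score


best = grid_search(toy_evaluate)  # (16, 3e-05, 4) for this toy scorer
```
</code></p><p>The grid has 18 configurations, so with a real evaluate function each call would correspond to one full fine-tuning run.</p></div>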
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Pipeline</head><p>This section describes the proposed pipeline: first, the corpus pre-processing step; second, the corpus extension step; and finally, the BERT ensemble step.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Step 1: Pre-processing</head><p>Work on other classification tasks, such as irony or sentiment classification, has shown that adding text descriptions of tweet jargon such as hashtags, emojis, and emoticons improves tweet classification with deep learning models like BERT <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>.</p><p>Our procedure is the following:</p><p>- Hashtags are split into words (see Figure <ref type="figure" target="#fig_0">1</ref>) using the Python wordninja library (https://github.com/keredson/wordninja) with a Spanish dictionary built from the Spanish fastText vocabulary (https://fasttext.cc/docs/en/crawl-vectors.html).</p><p>- Emojis are replaced with their text meaning in Spanish (see Figure <ref type="figure" target="#fig_0">1</ref>), as given by the Python library emoji (https://github.com/carpedm20/emoji/).</p><p>- Emoticons are replaced with a text representation in Spanish similar to the emoji meanings (see Table <ref type="table">1</ref>).</p><p>- Out-of-vocabulary words are replaced with their corresponding text representations (see Table <ref type="table">1</ref>).</p><p>After the pre-processing step, the maximum tweet length in the corpus doubles (see Figure <ref type="figure" target="#fig_1">2</ref>); for this reason, we set the maximum sequence length of the BERT model to 128.</p><p>Table <ref type="table">1</ref>. Expressions and their Spanish text representations for emoticons and out-of-vocabulary words. The main idea is to describe the meaning of the emoticon in words. For example, the emoticon :) is replaced with "smiling face".</p></div>
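<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pre-processing steps of Section 3.1 can be sketched in plain Python as follows. This is a minimal, self-contained illustration under stated assumptions: the tiny vocabulary and the emoticon and emoji mappings are hypothetical examples, and the dynamic-programming splitter is only a stand-in for the wordninja library and the emoji library named in the text.</p><p><code lang="python">
```python
import re

# Hypothetical toy resources; the paper uses wordninja with a Spanish fastText
# vocabulary, the emoji library, and the mappings of Table 1 instead.
VOCAB = {"buenos", "dias", "mexico", "feliz", "lunes"}
EMOTICONS = {":)": "cara sonriente", ":(": "cara triste"}
EMOJIS = {"\U0001F602": "cara llorando de risa"}


def split_hashtag(tag, vocab=VOCAB):
    """Split a hashtag body into vocabulary words, preferring fewer words."""
    tag = tag.lower()
    # best[i] holds the shortest word list that covers tag[:i], or None.
    best = [None] * (len(tag) + 1)
    best[0] = []
    for end in range(1, len(tag) + 1):
        for start in range(end):
            piece = tag[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(best[end]) > len(candidate):
                    best[end] = candidate
    # Fall back to the unsplit tag when no full segmentation exists.
    return best[-1] if best[-1] is not None else [tag]


def preprocess(tweet):
    """Replace hashtags, emojis, and emoticons with plain-text descriptions."""
    tweet = re.sub(r"#(\w+)", lambda m: " ".join(split_hashtag(m.group(1))), tweet)
    for symbol, text in {**EMOJIS, **EMOTICONS}.items():
        tweet = tweet.replace(symbol, " " + text + " ")
    return " ".join(tweet.split())


example = preprocess("#FelizLunes mexico :)")
# example == "feliz lunes mexico cara sonriente"
```
</code></p><p>Because every symbol becomes several words, the pre-processed text is longer than the original tweet, which is consistent with the increase in sequence length reported above.</p></div>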
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Step 2: Extend training corpus</head><p>It has been shown <ref type="bibr" target="#b15">[16]</ref> that augmenting the training corpus with examples from other corpora labeled for a related task, such as hate speech, can improve model performance. In this pipeline, we use hate speech examples from the Spanish HatEval 2019 corpus <ref type="bibr" target="#b3">[4]</ref> and negative sentiment examples from the TASS 2019 corpus for Mexican Spanish <ref type="bibr" target="#b9">[10]</ref>. The methodology for adding examples from other corpora, as a way to improve model performance and reduce the class imbalance ratio, is shown in Figure <ref type="figure" target="#fig_2">3</ref> and consists of three steps. In the first step, the corpora are pre-processed with the method described in Section 3.1. In the second step, a model is trained on the OffendMex 2021 corpus with the method described in Section 3.3 and used for inference on the HatEval and TASS corpora; we then select only those examples whose weights in the classifier are greater than or equal to 0.95 and add them to the OffendMex corpus as offensive examples. Finally, in the third step, the model is trained again from scratch on the extended corpus. The intuition behind this procedure is to augment the training data only with those instances that could improve the classification score.</p></div>
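<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection rule of the second step can be sketched as follows. The 0.95 cut-off comes from the text; the function names, the label encoding (1 for offensive), and the reading of the classifier weight as a score for the offensive class are assumptions made for this sketch.</p><p><code lang="python">
```python
THRESHOLD = 0.95  # confidence cut-off stated in the text


def select_confident(examples, predict_offensive_weight, threshold=THRESHOLD):
    """Keep external examples the trained model scores at or above the threshold."""
    return [text for text in examples if predict_offensive_weight(text) >= threshold]


def extend_corpus(train_set, external_examples, predict_offensive_weight):
    """Return the training set plus confident external examples labeled offensive (1)."""
    selected = select_confident(external_examples, predict_offensive_weight)
    return train_set + [(text, 1) for text in selected]


# Toy usage: a lookup table stands in for the model trained on OffendMex.
toy_weights = {"tweet x": 0.97, "tweet y": 0.50, "tweet z": 0.95}
extended = extend_corpus([("tweet a", 0)], list(toy_weights), toy_weights.get)
# extended == [("tweet a", 0), ("tweet x", 1), ("tweet z", 1)]
```
</code></p><p>Only high-confidence examples are added, and all of them go to the minority class, which is how this step lowers the imbalance ratio.</p></div>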
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Step 3: BERT ensemble</head><p>Fine-tuning BERT on small and unbalanced datasets is unstable. To alleviate this, it was shown <ref type="bibr" target="#b10">[11]</ref> that treating single BERT models as weak learners and combining 20 of them in an ensemble with a weighted voting scheme yields a more robust model (see Figure <ref type="figure" target="#fig_3">4</ref>). Weighted voting here means accumulating the softmax layer outputs of all models and selecting the class with the maximum accumulated weight.</p></div>
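<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the weighted voting scheme, assuming each model returns a softmax vector ordered as (non-offensive, offensive). The function name and the three toy probability vectors are illustrative and are not outputs of the actual models.</p><p><code lang="python">
```python
def weighted_vote(softmax_outputs):
    """Sum per-class softmax outputs over all models and return the arg-max class.

    `softmax_outputs` holds one probability vector per model, all for the
    same tweet, e.g. [[p_non_offensive, p_offensive], ...].
    """
    n_classes = len(softmax_outputs[0])
    totals = [0.0] * n_classes
    for probs in softmax_outputs:
        for cls, p in enumerate(probs):
            totals[cls] += p
    best_class = max(range(n_classes), key=lambda cls: totals[cls])
    return best_class, totals


# Three toy models: one is very confident in class 0, two lean weakly to class 1.
label, totals = weighted_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]])
# label == 0, because the accumulated weights are about 1.75 versus 1.25
```
</code></p><p>In this toy case a plain majority vote would choose class 1, whereas accumulating the softmax outputs lets a confident model outweigh two uncertain ones.</p></div>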
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Results</head><p>Following the pipeline described in Section 3, we obtained the results shown in Table <ref type="table" target="#tab_1">2</ref>, where it can be seen that each step of the proposed pipeline helps to improve model performance, as we expected. The best result, an F1 score of 71.07 on the evaluation set, was achieved with a 20-BERT ensemble, pre-processing, and the extended training corpus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>This work presented a three-step pipeline for offensiveness detection in Mexican Spanish social media that achieved first place in the MeOffendEs@IberLEF 2021 Subtask 3 competition, with an F1 score of 0.7026 on the test set. Our experimental results on the evaluation set showed that each step of the pipeline improves model performance. We believe this pipeline could be applied quickly and successfully to other related tasks, such as aggressiveness detection.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Above: Tweet before pre-processing. Below: After pre-processing.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. OffendMex token lengths statistics. Left) before pre-processing. Right) after pre-processing.</figDesc><graphic coords="3,150.93,441.31,313.50,156.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Methodology to add examples from other labeled corpus related to this task.</figDesc><graphic coords="5,134.77,115.83,345.83,135.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. F1 score for single BERT models and the ensemble (1 to 20 BERTs) with the weighted voting scheme. The blue line indicates the F1 score as the number of BERT models in the ensemble increases, whereas the red crosses show the score of the individual BERT added to the ensemble at each step.</figDesc><graphic coords="5,197.28,414.67,220.80,158.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>F1 score on a preliminary evaluation split of the original training data (80% for training and 20% for validation).</figDesc><table><row><cell>Model</cell><cell>F1(%)</cell></row><row><cell>Single BERT, no pre-processing</cell><cell>67.19</cell></row><row><cell>20 BERT ensemble, no pre-processing</cell><cell>70.28</cell></row><row><cell>20 BERT ensemble, with pre-processing</cell><cell>71.00</cell></row><row><cell>20 BERT ensemble, after extending the training corpus</cell><cell>71.07</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>Gómez-Espinosa thanks CONACYT for master's degree scholarship number 1002761.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of MEX-A3T at IberLEF 2019: Authorship and aggressiveness analysis in Mexican Spanish tweets</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Aragón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á Á</forename><surname>Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Escalante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">V</forename><surname>Pineda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moctezuma</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2019</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2019<address><addrLine>Bilbao, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-09-24">September 24th, 2019</date>
			<biblScope unit="volume">2421</biblScope>
			<biblScope unit="page" from="478" to="494" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of MEX-A3T at IberLEF 2020: Fake news and aggressiveness analysis in Mexican Spanish</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Aragón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Jarquín-Vásquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Escalante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">V</forename><surname>Pineda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Posadas-Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bel-Enguix</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)<address><addrLine>Málaga, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-09-23">September 23rd, 2020</date>
			<biblScope unit="volume">2664</biblScope>
			<biblScope unit="page" from="222" to="235" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Author profiling and aggressiveness detection in Spanish tweets: MEX-A3T 2018</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Aragón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>López-Monroy</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)<address><addrLine>Sevilla, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-09-18">September 18th, 2018</date>
			<biblScope unit="volume">2150</biblScope>
			<biblScope unit="page" from="134" to="139" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</title>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Rangel Pardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanguinetti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Workshop on Semantic Evaluation</title>
				<meeting>the 13th International Workshop on Semantic Evaluation<address><addrLine>Minneapolis, Minnesota, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019-06">Jun 2019</date>
			<biblScope unit="page" from="54" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of PAN 2021: Authorship verification, profiling hate speech spreaders on Twitter, and style change detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L D L P</forename><surname>Sarracén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Manjavacas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wolska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Hiemstra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mothe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Perego</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Sebastiani</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="567" to="573" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Overview of TASS 2018: Opinions, health and emotions</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Almeida-Cruz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Estévez-Velarde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á G</forename><surname>Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Vega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montoyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Muñoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piad-Morffis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Villena-Román</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN, TASS@SEPLN 2018, co-located with 34th SEPLN Conference (SEPLN 2018)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>TASS 2018: Workshop on Semantic Analysis at SEPLN, TASS@SEPLN 2018, co-located with 34th SEPLN Conference (SEPLN 2018)<address><addrLine>Sevilla, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-09-18">September 18th, 2018</date>
			<biblScope unit="volume">2172</biblScope>
			<biblScope unit="page" from="13" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á Á</forename><surname>Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Guzmán-Falcón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Escalante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">V</forename><surname>Pineda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Reyes-Meza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Sulayes</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2018)<address><addrLine>Sevilla, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-09-18">September 18th, 2018</date>
			<biblScope unit="volume">2150</biblScope>
			<biblScope unit="page" from="74" to="96" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Spanish pretrained BERT model and evaluation data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cañete</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chaperon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fuentes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pérez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">PML4DC at ICLR</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019-06">Jun 2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Overview of TASS 2019: One more further for the global Spanish sentiment analysis corpus</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Vega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Casasola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chiruzzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á G</forename><surname>Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moctezuma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A S</forename><surname>Cabezudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Tellez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Graff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Miranda-Jiménez</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2019</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum co-located with 35th Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2019<address><addrLine>Bilbao, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-09-24">September 24th, 2019</date>
			<biblScope unit="volume">2421</biblScope>
			<biblScope unit="page" from="550" to="560" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Transformers and data augmentation for aggressiveness detection in Mexican Spanish</title>
		<author>
			<persName><forename type="first">M</forename><surname>Guzman-Silverio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Á</forename><surname>Balderas-Paredes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>López-Monroy</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)<address><addrLine>Málaga, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-09-23">September 23rd, 2020</date>
			<biblScope unit="volume">2664</biblScope>
			<biblScope unit="page" from="293" to="302" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Montes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Aragón</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Agerri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Álvarez-Carmona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Álvarez Mellado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chiruzzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Freitas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gutiérrez</surname></persName>
		</author>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum</title>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Jiménez-Zafra</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Lima</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Plaza-Del-Arco</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</editor>
		<meeting>the Iberian Languages Evaluation Forum (IberLEF)</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Overview of the MeOffendEs task on offensive text detection at IberLEF 2021</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Plaza-Del-Arco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Casavantes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Escalante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martin-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jarquín-Vásquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Villaseñor-Pineda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">0</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">An effective BERT-based pipeline for Twitter sentiment analysis: A case study in Italian</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ventura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Catelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Incorporating emoji descriptions improves tweet classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blanco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Jin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019-06">Jun 2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2096" to="2101" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Detecting aggressiveness in Mexican Spanish social media content by fine-tuning transformer-based models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tanase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zaharia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cercel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dascalu</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020)<address><addrLine>Málaga, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-09-23">September 23rd, 2020</date>
			<biblScope unit="volume">2664</biblScope>
			<biblScope unit="page" from="236" to="245" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Atanasova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Karadzhov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mubarak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Pitenis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Çöltekin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourteenth Workshop on Semantic Evaluation</title>
		<meeting>the Fourteenth Workshop on Semantic Evaluation<address><addrLine>Barcelona (online)</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020-12">Dec 2020</date>
			<biblScope unit="page" from="1425" to="1447" />
		</imprint>
	</monogr>
	<note>International Committee for Computational Linguistics</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
