<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="es">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Overview of TASS 2016 Resumen de TASS 2016</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Miguel</forename><surname>Ángel</surname></persName>
						</author>
						<author>
							<persName><forename type="first">García</forename><surname>Cumbreras</surname></persName>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Julio</forename><forename type="middle">Villena</forename><surname>Román</surname></persName>
							<affiliation key="aff24">
								<address>
									<postCode>28034</postCode>
									<settlement>Sngular, Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff26">
								<orgName type="department">Sngular Data&amp;Analytics</orgName>
								<address>
									<addrLine>Av, Llano Castellano 13, Planta 5</addrLine>
									<postCode>28034</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Eugenio</forename><forename type="middle">Martínez</forename><surname>Cámara</surname></persName>
							<email>emcamara@ujaen.es</email>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Manuel</forename><surname>Carlos</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Díaz</forename><surname>Galiano</surname></persName>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><surname>Teresa</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Martín</forename><surname>Valdivia</surname></persName>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">L</forename><forename type="middle">Alfonso</forename><surname>Ureña</surname></persName>
							<email>laurena@ujaen.es</email>
						</author>
						<author>
							<persName><forename type="first">Julio</forename><surname>Villena-Román</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Miguel</forename><forename type="middle">Á</forename><surname>García Cumbreras</surname></persName>
						</author>
						<author>
							<persName><forename type="first">T</forename><forename type="middle">U</forename><surname>Darmstadt</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Manuel</forename><forename type="middle">C</forename><surname>Díaz Galiano</surname></persName>
						</author>
						<author>
							<persName><forename type="first">L</forename><surname>Alfonso</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Ureña</forename><surname>López</surname></persName>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexandra</forename><surname>Balahur</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Jose</forename><surname>María</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Gómez</forename><surname>Hidalgo</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Sara</forename><forename type="middle">Lana</forename><surname>Serrano</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Constantine</forename><surname>Orasan</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Jose</forename><surname>Manuel</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Perea</forename><surname>Ortega</surname></persName>
						</author>
						<author>
							<persName><forename type="first">José</forename><forename type="middle">Antonio</forename><surname>Troyano Jiménez</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Antonio</forename><surname>Quirós</surname></persName>
							<email>antonio.quiros@sngular.team</email>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff24">
								<address>
									<postCode>28034</postCode>
									<settlement>Sngular, Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
							<affiliation key="aff26">
								<orgName type="department">Sngular Data&amp;Analytics</orgName>
								<address>
									<addrLine>Av, Llano Castellano 13, Planta 5</addrLine>
									<postCode>28034</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Isabel</forename><surname>Segura-Bedmar</surname></persName>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paloma</forename><surname>Martínez</surname></persName>
							<affiliation key="aff23">
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<postCode>23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff25">
								<orgName type="department">Departamento de Informática</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid Avd. de la Universidad</orgName>
								<address>
									<addrLine>30</addrLine>
									<postCode>28911</postCode>
									<settlement>Leganés</settlement>
									<region>Madrid</region>
									<country key="ES">España</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jhon</forename><surname>Adrián Cerón-Guzmán</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="laboratory">Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento Edgar Casasola Murillo</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff5">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff6">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff7">
								<orgName type="institution">Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff8">
								<orgName type="institution">M. Teresa Martín Valdivia Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff9">
								<orgName type="institution">L. Alfonso Ureña López Universidad de Jaén</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff10">
								<orgName type="laboratory">EC-Joint Research Centre (Italia</orgName>
								<orgName type="institution">José Carlos Cortizo Universidad Europea de Madrid (España)</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff11">
								<orgName type="department">Lluís F</orgName>
								<orgName type="institution" key="instit1">José Carlos González-Cristobal Universidad Politécnica de Madrid (España</orgName>
								<orgName type="institution" key="instit2">Hurtado Universidad de Valencia (España</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff12">
								<orgName type="department">Carlos A</orgName>
								<orgName type="institution">Iglesias Fernández Universidad Politécnica de Madrid (España</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff13">
								<orgName type="department">Zornitsa Kozareva Information Sciences Institute (EE.UU</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff14">
								<orgName type="institution" key="instit1">Universidad Politécnica de Madrid (España</orgName>
								<orgName type="institution" key="instit2">Mitkov University of Wolverhampton</orgName>
								<orgName type="institution" key="instit3">Reino Unido</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff15">
								<orgName type="institution">Andrés Montoyo Universidad de Alicante (España</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff16">
								<orgName type="institution">Rafael Muñoz Universidad de Alicante (España</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff17">
								<orgName type="institution">University of Wolverhampton (Reino Unido)</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff18">
								<orgName type="institution">Universidad de Extremadura (España</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff19">
								<orgName type="institution">Ferran Pla Santamaría Universidad de Valencia (España) María Teresa Taboada Gómez Simon Fraser University</orgName>
								<address>
									<settlement>Canadá</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff20">
								<orgName type="institution">Thelwall University of Wolverhampton (Reino Unido</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff21">
								<orgName type="department">CEUR Workshop Proceedings</orgName>
								<orgName type="institution">Universidad de Sevilla (España)</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff22">
								<orgName type="laboratory">Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento Edgar Casasola Murillo</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff27">
								<orgName type="department">Análisis de Sentimiento</orgName>
								<address>
									<addrLine>Vectores de palabras</addrLine>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff28">
								<orgName type="institution">Santiago de Cali</orgName>
								<address>
									<settlement>Valle del Cauca</settlement>
									<country key="CO">Colombia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Overview of TASS 2016 Resumen de TASS 2016</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">395AE1833BBC708C1AD0F021A8378DFC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>TASS 2016</term>
					<term>análisis de opiniones</term>
					<term>medios sociales TASS 2016</term>
					<term>sentiment analysis</term>
					<term>social media Sentiment Analysis</term>
					<term>Word embeddings Análisis de sentimientos</term>
					<term>clasificación de polaridad</term>
					<term>combinación de clasificadores</term>
					<term>normalización léxica</term>
					<term>tuis en español</term>
					<term>Twitter Ensemble classifier</term>
					<term>lexical normalization</term>
					<term>polarity classification</term>
					<term>sentiment analysis</term>
					<term>Spanish tweets</term>
					<term>Twitter</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Resumen: Este artículo describe la quinta edición del taller de evaluación experimental TASS 2016, enmarcada dentro del Congreso Internacional SEPLN 2016. El principal objetivo de TASS es promover la investigación y el desarrollo de nuevos algoritmos, recursos y técnicas para el análisis de sentimientos en medios sociales (concretamente en Twitter), aplicado al idioma español. Este artículo describe las tareas propuestas en TASS 2016, así como el contenido de los corpus utilizados, los participantes en las distintas tareas, los resultados generales obtenidos y el análisis de estos resultados.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="es">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Agradecimientos</head><p>La organización de TASS ha contado con la colaboración de investigadores que participan en los siguientes proyectos de investigación:</p><p>• REDES (TIN2015-65136-C2-1-R) CEUR Workshop Proceedings ISSN: 1613-0073</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preámbulo</head><p>Actualmente el español es la segunda lengua materna del mundo por número de hablantes tras el chino mandarín, y la segunda lengua mundial en cómputo global de hablantes. Esa segunda posición se traduce en un 6,7% de población mundial que se puede considerar hispanohablante. La presencia del español en el mundo no tiene una correspondencia directa con el nivel de investigación en el ámbito del Procesamiento del Lenguaje Natural, y más concretamente en la tarea que nos atañe, el Análisis de Opiniones. Por consiguiente, el Taller de Análisis de Sentimientos en la SEPLN <ref type="bibr">(TASS)</ref> tiene como objetivo la promoción de la investigación del tratamiento del español en sistemas de Análisis de Opiniones, mediante la evaluación competitiva de sistemas de procesamiento de opiniones.</p><p>En la edición de 2016 han participado 7 equipos, de los que 6 han enviado un artículo describiendo el sistema que han presentado, habiendo sido aceptados los 6 artículos tras ser revisados por el comité organizador. La revisión se llevó a cabo con la intención de publicar sólo aquellos que tuvieran un mínimo de calidad científica.</p><p>La edición de 2016 tendrá lugar en el seno del XXXII Congreso Internacional de la Sociedad Española para el Procesamiento del Lenguaje Natural, que se celebrará el próximo mes de septiembre en Salamanca (España) dentro del V Congreso Español de Informática <ref type="bibr">(CEDI 2016)</ref>.</p><p>Septiembre de 2016 Los editores CEUR Workshop Proceedings ISSN: 1613-0073</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preamble</head><p>Currently Spanish is the second native language in the world by number of speakers after the Mandarin Chinese. This second position means that the 6.7% of the world population is Spanish-speaking. The presence of the Spanish language in the world has not a direct correspondence with the number of research works related to the treatment of Spanish language in the context of Natural Language Processing, and specially in the field of Sentiment Analysis. Therefore, the Workshop on Sentiment Analysis at SEPLN <ref type="bibr">(TASS)</ref> aims to promote the research of the treatment of texts written in Spanish in Sentiment Analysis systems by means of the competitive assessment of opinion processing systems.</p><p>Seven teams have participated in the 2016 edition of the workshop. Six of the seven teams have submitted a description paper of their systems. After a review process, the organizing committee has accepted the 6 papers, because all of them reached an acceptable scientific quality level.</p><p>The 2016 edition will be held at the 32 nd International Conference of the Spanish Society for Natural Language Processing (SEPLN 2016), which will take place at Salamanca in September framed by the 5 th Spanish Conference of Computer Science <ref type="bibr">(CEDI 2016)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>September 2016</head><p>The editors</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>TASS is an experimental evaluation workshop, a satellite event of the annual SEPLN Conference, with the aim to promote the research on Sentiment Analysis in social media focused on the Spanish language. The fifth edition will be held on September 13th, 2016 at the University of Salamanca, Spain.</p><p>Sentiment Analysis (SA) is traditionally defined as the computational treatment of opinion, sentiment and subjectivity in texts <ref type="bibr">(Pang &amp; Lee, 2008)</ref>. However, <ref type="bibr" target="#b0">Cambria and Hussain (2012)</ref> offer a more updated definition: Computational techniques for the extraction, classification, understanding and evaluation of opinions and comments published on the Internet and other kind of user generated contents. It is a hard task because even humans often disagree on the polarity of a given text. And it is a harder task when the text has only 140 characters (Twitter messages or tweets).</p><p>Although SA is not a new task, it is still challenging, because the state of the art has not yet resolved some problems related to multilingualism, domain adaptation, text genre adaptation and polarity classification at fine grained level. Polarity classification has usually been tackled following two main approaches. The first one applies machine learning algorithms in order to train a polarity classifier using a labelled corpus <ref type="bibr" target="#b19">(Pang et al. 2002)</ref>. This approach is also known as the supervised approach. 
The second one is known as semantic orientation, or the unsupervised approach, and it integrates linguistic resources in a model in order to identify the valence of the opinions <ref type="bibr">(Turney 2002)</ref>.</p><p>The aim of TASS is to provide a competitive forum where the newest research works in the field of SA in social media, specifically focused on Spanish tweets, are described and discussed by scientific and business communities.</p><p>The rest of the paper is organized as follows. Section 2 describes the different corpus provided to participants. Section 3 shows the different tasks <ref type="bibr" target="#b46">of TASS 2016</ref>. Section 4 describes the participants and the overall results are presented in Section 5. Finally, the last section shows some conclusions and future directions.</p><p>2 Corpus <ref type="bibr" target="#b46">TASS 2016</ref> experiments are based on two corpora, specifically built for the different editions of the workshop.</p><p>The two corpora will be made freely available to the community after the workshop. Please send an email to tass@sngularmeaning.team filling in the TASS Corpus License agreement with your email, affiliation (institution, company or any kind of organization) and a brief description of your research objectives, and you will be given a password to download the files in the password protected area. The only requirement is to include a citation to a relevant paper and/or the TASS website.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">General corpus</head><p>The General Corpus contains over 68.000 tweets, written in Spanish, about 150 wellknown personalities and celebrities of the world of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the context of extraction has a Spanish-focused bias, the diverse nationality of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, makes the corpus reach a global coverage in the Spanishspeaking world. Each tweet includes its ID (tweetid), the creation date (date) and the user ID (user). Due to restrictions in the Twitter API Terms of Service (https://dev.twitter.com/terms/apiterms), it is forbidden to redistribute a corpus that includes text contents or information about users. However, it is valid if those fields are removed and instead IDs (including Tweet IDs and user IDs) are provided. The actual message content can be easily obtained by making queries to the Twitter API using the tweetid.</p><p>The general corpus has been divided into training set (about 10%) and test set (90%). The training set was released, so the participants could train and validate their models. The test corpus was provided without any tagging and has been used to evaluate the results.</p><p>Obviously, it was not allowed to use the test data from previous years to train the systems.</p><p>Each tweet was tagged with its global polarity (positive, negative or neutral sentiment) or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no sentiment tag (NONE).</p><p>In addition, there is also an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. 
This is especially useful to make out whether a neutral sentiment comes from neutral keywords or else the text contains positive and negative sentiments at the same time.</p><p>Moreover, the polarity values related to the entities that are mentioned in the text are also included for those cases when applicable. These values are similarly tagged with 6 possible values and include the level of agreement as related to each entity.</p><p>This corpus is based on a selection of a set of topics. Thematic areas such as "política" ("politics"), "fútbol" ("soccer"), "literatura" ("literature") or "entretenimiento" ("entertainment"). Each tweet in the training and test set has been assigned to one or several of these topics (most messages are associated to just one topic, due to the short length of the text).</p><p>The annotation has been semi-automatically done: a baseline machine learning model is first run and then all tags are checked by human experts. In the case of the polarity at entity level, due to the high volume of data to check, the human annotation has only been done for the training set.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows a summary of the training and test corpora provided to participants. Users were journalists (periodistas), politicians (políticos) or celebrities (famosos). The only language involved was Spanish (es).</p><p>The list of topics that have been selected is the following:</p><p>• Politics (política)</p><p>• Entertainment (entretenimiento)</p><p>• Economy (economía)</p><p>• Music (música)</p><p>• Soccer (fútbol)</p><p>• Films (películas)</p><p>• Technology (tecnología)</p><p>• Sports (deportes)</p><p>• Literature (literatura)</p><p>• Other (otros)</p><p>The corpus is encoded in XML. Figure <ref type="figure" target="#fig_0">1</ref> shows the information of two tweets. The first tweet is only annotated with the polarity at tweet level because there is not any entity in the text. 
However, the second one is annotated with the global polarity of the message and the polarity associated to each of the entities that appear in the text (UPyD and Foro Asturias). system, environmental policy... Each aspect is related to one or several entities that correspond to one of the main political parties in Spain, which are:</p><formula xml:id="formula_0">• Partido_Popular (PP) • Partido_Socialista_Obrero_Español (PSOE) • Izquierda_Unida (IU) • Podemos • Ciudadanos (C's) • Unión_Progreso_y_Democracia (UPyD)</formula><p>Each tweet in the corpus has been manually annotated by two annotators, and a third one in case of disagreement, with the sentiment polarity at aspect level. Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. Again, no difference is made between no sentiment and a neutral sentiment (neither positive nor negative). Each political aspect is linked to its correspondent political party and its polarity.    </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Description of tasks</head><p>Since the first edition of TASS, a new task and a new corpus have been published. However, one of the aims of TASS is the evaluation of the progress of the research on SA. Thus, the edition of 2016 was focused on the analysis and the comparison of the systems with the submissions of previous editions. The edition of 2016 was focused on two tasks: polarity classification at tweet level and polarity classification at entity level. The polarity classification task has been proposed with the same corpus since the first edition of TASS, but the polarity classification at aspect level has been proposed with a different corpus each edition. In the edition of 2016 the classification at aspect level uses the STOMPOL corpus, which was published the first time in the edition of 2015.</p><p>Participants are expected to submit up to 3 results of different experiments for one or both of these tasks, in the appropriate format described below.</p><p>Along with the submission of experiments, participants have been invited to submit a paper to the workshop in order to describe their experiments and discussing the results with the audience in a regular workshop session.</p><p>The two proposed tasks are described next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Task 1: Sentiment Analysis at Global Level</head><p>This task consists on performing an automatic polarity classification to determine the global polarity of each message in the test set of the General Corpus. The training set of the corpus was provided to the participants with the aim they could train and validate their models with it. There were two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE).</p><p>Participants are expected to submit (up to 3) experiments for the 6-labels evaluation, and they are also allowed to submit (up to 3) specific experiments for the 4-labels scenario.</p><p>Results must be submitted in a plain text file with the following format:</p><p>tweetid \t polarity where polarity can be:</p><p>• P+, P, NEU, N, N+ and NONE for the 6-labels case • P, NEU, N and NONE for the 4-labels case.</p><p>The same test corpus of previous years was used for the evaluation in order to develop a comparison among the systems. The accuracy is one of the measures used to evaluate the systems, however due to the fact that the training corpus is not totally balanced the systems were also assessed by the macroaveraged precision, macro-averaged recall and macro-averaged F1-measure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Task 2: Aspect-based sentiment analysis</head><p>A corpus with the entities and the aspects identified was provided to the participants, so the goal of the systems is the inference of the polarity at the aspect-level. As in 2015, STOMPOL corpus was the corpus used in this task. STOMPOL was divided into training and test sets, the first one for the development and validation of the systems, and the second for evaluation.</p><p>Participants are expected to submit up to 3 experiments for each corpus, each in a plain text file with the following format: tweetid \t aspect-entity \t polarity Allowed polarity values are: P, N and NEU. For the evaluation, a single label combining "aspect-polarity" has been considered. As in the first task, accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been calculated for the global result.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Participants and Results</head><p>This year 7 (7 last year) groups submitted their systems. The list of active participant groups is shown in Table <ref type="table" target="#tab_3">3</ref>, including the tasks in which they have participated.</p><p>Six of the seven participant groups sent a report describing their experiments and results achieved. Papers were reviewed and included in the workshop proceedings. References are listed in Table <ref type="table" target="#tab_4">4</ref>.  </p><formula xml:id="formula_1">Group 1 2 jacerong X ELiRF-UPV X X LABDA X INGEOTEC X GASUCR X GTI X SINAI_w2v X Total 6 1</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>This section will be focused on the description and the analysis of the results and the systems submitted by the participants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Task 1: Sentiment Analysis at Global Level</head><p>Submitted runs and results for Task 1, evaluation based on 5 polarity levels with the whole General test Corpus are shown in Table <ref type="table" target="#tab_6">5</ref>. Accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been used to evaluate each individual label and to rank the systems.  In order to perform a more in-depth evaluation, results are calculated considering the classification only in 3 levels (POS, NEU,  NEG) and no sentiment (NONE) merging P and P+ in only one category, as well as N and N+ in another one. The results reached by the submitted systems are shown in Table <ref type="table" target="#tab_8">6</ref>.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Task 2: Aspect-based Sentiment Analysis</head><p>Submitted runs and results for Task 2, with the STOMPOL corpus, are shown in Table <ref type="table" target="#tab_10">7</ref>. Accuracy, macro-averaged precision, macro-averaged recall and macro-averaged F1-measure have been used to evaluate each individual label and to rank the systems.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Description of the systems</head><p>The systems submitted in the edition of 2016 represent the next step of the ones submitted in the previous edition. The systems may be clustered into two groups: those that rely on the classification power of the ensemble of several base classifiers, and those systems that replace the traditional Bag-of-Words model with vectors of word embeddings in order to represent the meaning of each word. In the subsequent paragraphs the main features of the systems submitted are going to be depicted. <ref type="bibr" target="#b39">Hurtado and Pla (2016)</ref> describe the participation of the team ELiRF-UPV in the two tasks <ref type="bibr" target="#b46">of TASS 2016</ref>. The only difference between the systems submitted for the two tasks is the fact that the one focused on the second task has a module for the identification of the context of each of the entities and aspects annotated on the tweets. The polarity classification system relies on the ensemble of 192 configurations of SVM classifiers. For the combination of the set of classifiers they evaluate the performance of an approach based on voting and another based on stacking.</p><p>The system depicted in (Cerón-Guzmán, 2016) is also based on an approach of ensemble classifiers. In this case the base classifiers are based on logistic regression and they are combined by voting. <ref type="bibr">Alvarez et al. (2016)</ref> exposed the participation of the team GTI on the task 2. The system is similar to the system of the team ELiRF-UPV in the sense that it is composed of two layers: context identification and polarity classification. Regarding the identification of the context, the authors design a heuristic method based on lexical markers. 
The polarity classification system is an SVM classifier that uses different types of features in order to represent the contexts of the entities and the aspects.</p><p>Montejo-Ráez and Díaz-Galiano (2016) introduce a system based on a supervised learning algorithm over vectors resulting from a weighted vector. This vector is computed using a Word2Vec algorithm. This method, which is inspired by neural-network language modelling, was executed with a collection of tweets written in Spanish and the Spanish Wikipedia in order to generate a set of word embeddings for the representation of the words of the General Corpus of TASS as dense vectors. The creation of the collection of tweets written in Spanish followed a distant supervision approach by means of the assumption that tweets with happy and sad emoticons express emotions or opinions. Their experiments show that massive data from Twitter can lead to a slight improvement in classification accuracy.</p><p>The system presented by the team LABDA (Quirós, Segura-Bedmar and Paloma <ref type="bibr">Martínez, 2016)</ref> is similar to the one submitted by SINAI <ref type="bibr" target="#b3">(Montejo-Ráez and Díaz-Galiano, 2016)</ref> because it also used word embeddings as the schema of representation of the meaning of the words of the tweets. Quirós, Segura-Bedmar and Paloma <ref type="bibr">Martínez (2016)</ref> assessed the performance of the SVM and Logistic Regression as classifiers.</p><p>Casasola Murillo and Marín Reventós (2016) submitted an unsupervised system based on the system described in Turney ( <ref type="formula">2002</ref>), but with a specific adaptation to the classification of tweets written in Spanish.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Analysis</head><p>Table <ref type="table" target="#tab_8">5 and Table 6</ref> show the results of each system, ranked by the F1-score reached, so it is not hard to know which is the best system in the edition of 2016.</p><p>On the other hand, how many tweets were rightly classified by the submitted systems? Is there a set of tweets that were not rightly classified by any system? What are the most difficult tweets to classify? These questions are going to be answered in the following paragraphs.</p><p>Table <ref type="table" target="#tab_12">8</ref> shows the rate of tweets that are rightly classified by a number of systems. There are about 6% of tweets whose polarity is not inferred by any of the submitted systems. In other words, the submitted systems in the edition of 2016 are able to classify about 94% of the test set. So, what are the main features of that 6% of tweets whose polarity no system inferred?     All the systems submitted are based on linear classifiers that do not take into account the context of each word, which means a big drawback for understanding the meaning of a span of text.</p><p>The tweets of the Figures <ref type="figure" target="#fig_5">3, 4 and 5</ref> show that opinions and emotions are not only expressed by lexical markers, so the future participants should take into account the challenging task of implicit opinion analysis, irony and sarcasm detection. These new problems may be framed on the semantic level of Natural Language Processing and should be tackled by the research community in order to go a step further in the understanding of the subjective information, which is continuously published on the Internet.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Id: 171304000392663040</head><p>Sacarle 17 puntos en la final de Copa al Barça CB en el Palau Sant Jordi es una pasada. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions and Future Work</head><p>TASS was the first workshop about SA focused on the processing of texts written in Spanish. In the three first editions of TASS, the research community was mainly formed by Spanish researchers; however, since the last edition, the number of researchers that come from South America has been growing, so it is evidence that the research community of Sentiment Analysis in Spanish is not only located in Spain but is formed by the Spanish-speaking countries.</p><p>Anyway, the developed corpus and gold standards, and the reports from participants will for sure be helpful for knowing the state of the art in SA in Spanish.</p><p>The future work will be mainly focused on the definition of a new General Corpus because of the following reasons: 1. The language used on Twitter changes faster than the language used in traditional genres of texts, so the update of the corpus is required in order to cover a real use of the language on Twitter. 2. After several editions of the workshop, we realize that the quality of the annotation is not extremely good, so it is required to define a new corpus with a high quality annotation in order to provide a real gold standard for Spanish SA on Twitter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The research community deeply knows the</head><p>General Corpus of TASS and it wants a new challenge. A significant amount of new tasks is currently being defined in Natural Language Processing, so some of them, such as stance classification, will be studied to be proposed for the next edition of TASS. Resumen: Se describe el sistema para análisis de sentimiento desarrollado por el Grupo de Análisis de Sentimiento GAS-UCR de la Universidad de Costa Rica para la tarea 1 del workshop TASS 2016. El sistema propuesto está basado en el uso de vectores de características de baja dimensión para representación del texto. Se propone un modelo simple fundamentado en la normalización de texto con identificación de marcadores de énfasis, el uso de modelos de lenguaje para representar las características locales y globales del texto, y características como emoticones y partículas de negación. Los primeros experimentos muestran las mejoras que se obtienen en la precisión al identificar la polaridad de textos completos conforme se van incorporando las características aquí mencionadas. Palabras clave: análisis de sentimiento, clasificación de textos por polaridad, textos cortos Abstract: The Sentiment Analysis System developed by GAS-UCR team of the University of Costa Rica for task 1 of TASS 2016 workshop is presented. Preliminary evaluation results of the proposed Sentiment Analysis System are presented. The system is based on low dimension feature vectors for text representation. The proposed model is based on text normalization with emphasis mark identification, the use of local and global language models, and other features like emoticons and negation terms. Initial experimentation shows that the introduction of the selected features has a positive impact on precision at the polarity classification task. Keywords: sentiment analysis, polarity-based text classification, short texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introducción</head><p>Este trabajo tiene como propósito describir el sistema utilizado por el grupo de investigación en análisis de sentimiento de la Universidad de Costa Rica en su participación en el taller TASS2016 <ref type="bibr">(García-Cumbreras et al., 2016)</ref>. El enfoque del trabajo del grupo ha sido el estudio de los factores que van incidiendo en las mejoras en la precisión obtenida al llevar a cabo la clasificación de la polaridad de tweets en idioma español. Nuestro sistema se fundamenta en tres elementos básicos que son: la normalización del texto en la etapa de preprocesamiento identificando los poten- * Este trabajo se ha llevado a cabo gracias al apoyo económico de la Universidad de Costa Rica y el Gobierno de la República de Costa Rica a través del MICITT. Se agradece a los asistentes del grupo de investigación GAS-UCR por su trabajo ciales marcadores de énfasis presentes en el mismo, la creación de vectores de características de dimensión reducida para disminuir el efecto de la dispersión de los datos, y la exploración del impacto del uso de diccionarios de polaridad que se generan mediante la utilización de diferentes modelos de representación del lenguaje asociados tanto al contexto local como global de los datos. Para esto estamos utilizando una adaptación propia del algoritmo de Turney (Turney, 2002)sobre un corpus de 5 millones de tweets en español. Estos modelos se almacenan en forma de diccionarios con polaridad para su posterior reutilización. 
Nos interesa particularmente la investigación en este campo dado que si bien desde el año 2013 se identificó una brecha importante entre la cantidad de investigación y tecnología del lenguaje desarrollada para el idioma inglés y el español <ref type="bibr">(Cambria et al., 2013</ref><ref type="bibr">) (Melero et al., 2012)</ref>, de la misma forma debemos tener presente que no necesariamente las soluciones para español peninsular van a tener los mismos resultados al aplicarse a variantes de español americano, por lo que los recursos y métodos que utilizamos tienen la intención de aportar a la investigación en español y colaborar para su posterior aplicación en otros contextos de habla hispana.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Antecedentes</head><p>Entre los resultados obtenidos con sistemas con enfoques basados en aprendizaje máquina, el uso de máquina de soporte vectorial (MSV) ha ofrecido buenos resultados tanto en inglés (Kiritchenko, Zhu, y Mohammad, 2014) y (Batista y Ribeiro, 2013) como en español donde 9 de los 14 sistemas para el español presentados en TASS2015 <ref type="bibr">(Villena-Román et al., 2015)</ref> hacían uso de este tipo de clasificador. Sin embargo, la dependencia del lenguaje hace que estos clasificadores dependan de los vectores de características con los que son representados los comentarios de texto. Esta extracción de características ha sido el foco de atención de múltiples trabajos como (Cabanlit y Junshean Espinosa, 2014) , <ref type="bibr">(Feldman, 2013)</ref>, (Guo y Wan, 2012), (Sharma y Dey, 2012) y <ref type="bibr" target="#b66">(Wang et al., 2011)</ref>. En trabajos recientes de análisis de sentimiento en español tales como el trabajo de <ref type="bibr">(Martínez-Cámara et al., 2015)</ref> se utilizan varios diccionarios de polaridad y se representan utilizando un modelo de espacio vectorial MEV. El diccionario en sí se convierte en un modelo de lenguaje que sirve como recurso para lograr representaciones eficientes de los vectores utilizados para la clasificación.</p><p>En los últimos años la representación vectorial basada en modelos de lenguaje como unigramas y bigramas se movió hacia representaciones de características ya que la cantidad de términos introduce un problema asociado a su alta dispersión en el vector <ref type="bibr">(Cambria et al., 2013)</ref>. Si los vectores contienen un alto número de atributos diferentes, uno por término, los conjuntos de datos para entrenamiento deben contener una mayor cantidad de textos anotados que atributos para un buen entrenamiento de los clasificadores. 
Es por esto que los modelos de representación del lenguaje basados en unigramas, bigramas o bien skipgramas requieren de una representación vectorial eficiente. Trabajos recientes buscan la representación vectorial de las palabras en el espacio continuo como es el caso del uso de Word2Vec (Díaz-Galiano y Montejo-Ráez, 2015).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Descripción del sistema</head><p>Nuestro sistema se fundamenta en cuatro elementos que consideramos importantes de mencionar. Primero nos referiremos a la forma en que construimos nuestro diccionario con la polaridad de los términos y las razones para haber construido uno propio. Posteriormente nos referimos a nuestro proceso de preprocesamiento e identificación de potenciales marcadores de énfasis durante esta etapa inicial. En la siguiente subsección explicamos la forma en que construimos vectores de baja dimensión con información y hacemos uso del diccionario. Finalmente se menciona la forma en que se pretende capturar en los vectores de características aspectos locales con respecto a los datos de entrenamiento, y globales, a partir de modelos de representación del lenguaje general.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Creación del diccionario polarizado</head><p>Decidimos desarrollar diccionarios de polaridad propios, en lugar de utilizar los existentes, ya que consideramos que desde el punto de vista del procesamiento de lenguaje natural tradicional (Indurkhya y Damerau, 2010) estos diccionarios con polaridad pueden ser vistos cada uno, como un modelo de lenguaje particular. Por este motivo tratamos de desarrollar y evaluar una adaptación del tradicional método de generación de estos recursos lingüísticos <ref type="bibr">de (Turney, 2002)</ref>. La decisión anterior no se debió a la no existencia de diccionarios polarizados ya que claramente en trabajos como <ref type="bibr">(Martínez-Cámara et al., 2015)</ref> se hace uso de varios de ellos, sino con el fin de incorporar la etapa de creación de diccionario dentro de la metodología de trabajo para que posteriores investigaciones en otros países de habla hispana puedan replicar el trabajo y disminuir la barrera inicial asociada a la falta de recursos lingüísticos propios y el efecto del uso del diccionario polarizado sobre la calidad de los resultados de clasificación.</p><p>El diccionario de polaridad creado utiliza un corpus recolectado durante el año 2013, con 5 millones de tweets en español. La variante con respecto al algoritmo propuesto por Turney (Turney, 2002) es la siguiente. Para el cálculo de la orientación semántica de un término, tal y como lo define Turney en su artículo original, se utilizaron grupos de palabras semilla en lugar de un solo término, y en lugar de utilizar consultas a motores de búsqueda para obtener la cantidad de textos donde aparecen las palabras analizadas cerca de las palabras positivas o negativas se utilizó el motor de búsqueda implementado con el software libre Solr http://lucene.apache.org/solr/. Con el motor se indexaron los 5 millones de tweets por lo que las consultas se ejecutaron en forma local. 
Este método cuenta con la ventaja de que se puede calcular entonces la orientación semántica de un término directamente o bien almacenarlo en un diccionario. En nuestro caso precalculamos la polaridad y la almacenamos en forma de diccionario. Por el momento solo se han llevado a cabo los cálculos para términos individuales.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Normalizador de texto con marcadores de énfasis</head><p>Luego de un proceso de análisis de las características presentes en el texto desarrollamos un sistema para normalización del texto. Para este preprocesamiento se segmentan los términos potenciales, signos de puntuación y emoticones. Se lleva a cabo un marcado y conversión de los términos. El proceso que seguimos hace una eliminación de los términos que son identificados en el diccionario. Este proceso se muestra en la figura 1. Las repeticiones de letras, repeticiones de sílabas y mayúsculas son identificadas y eliminadas pero estos términos se marcan como potenciales identificadores de énfasis. Ejemplos son: EXCELENTE, graciassss, buenisísimo. En esta fase se identifican los tweets que contienen palabras positivas con énfasis para su posterior uso.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Representación vectorial de baja dimensión</head><p>Dos características representadas en los vectores tienen que ver con la presencia y polaridad de los emoticones y con la presencia de partículas de negación. Además, al desarrollar esta investigación se pudo observar que los términos positivos con marcadores de énfasis son un potencial identificador de la polaridad positiva de los textos que los contienen, por lo tanto esta característi-Figura 1: Proceso de normalizacion del texto ca también fue incorporada. La presencia de marcadores de énfasis tales como repetición de caracteres, de sílabas, o mayúsculas sobre términos que aparecen como negativos en algún contexto son registrados como una característica importante en el vector. Los vectores generados utilizan la polaridad de los términos para determinar la posición en el vector de características creado. Cabe dejar claro que dependiendo del modelo de datos los términos pueden ser unigramas, bigramas o skipgramas. En el caso de los unigramas, por ejemplo, si se construye un vector con la frecuencia de los términos según su polaridad con valores de polaridad desde -1.0 hasta 1.0, el vector que se obtiene sería como el que se muestra en la figura 2. En este vector por ejemplo se muestran dos términos con polaridad, según diccionario, entre el -0.8 y -0.9, un término con polaridad entre 0.1 y 0.2, y otro con polaridad mayor a 0.9. En este caso, en nuestro diccionario, la polaridad se representa con valores distribuidos desde lo más negativo hasta lo positivo con valores entre -1.0 y 0 para los negativos y 0 a 1.0 para los positivos.</p><p>Para el taller TASS2016 quisimos evaluar inicialmente el uso de vectores con la menor dimensión posible, así que en lugar de vectores de 20 celdas utilizamos solo vectores de 5 celdas para cada grupo de características, en lugar de saltos de 0.1 el rango utilizado es de Figura 2: Vector de características 0.5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Modelos locales y globales de representación del lenguaje</head><p>Nuestra propuesta pretende representar en los vectores de características información propia obtenida durante el proceso de entrenamiento, al igual que datos que representen información obtenida de modelos de lenguaje del español en general. En nuestro caso se utilizó inicialmente el diccionario generado a partir del corpus recolectado como insumo para obtener de él la información general del español. En el momento de entrenamiento, la polaridad de los términos en cada tweet son conocidos para ese conjunto de datos. La información global es la que se ha calculado previamente y se encuentra almacenada en forma de diccionarios. En nuestra propuesta lo que queremos hacer es representar en el vector las frecuencias de los términos de cada tweet distribuidos según su polaridad pero utilizar diferentes modelos de representación de lenguaje para llevar a cabo este cálculo. El diccionario utilizado en estos experimentos fue nuestra versión con unigramas. Se pretende utilizar representaciones con bigramas y una versión de skipgramas que incluye solo los términos anteriores a la palabra que se desea representar. Durante el entrenamiento, la polaridad obtenida en forma local es almacenada al igual que las frecuencias tomadas de diccionarios de polaridad global. Por lo tanto, los vectores cuentan con entradas para las distribuciones de polaridad local y las distribuciones de polaridad global. Aquí es donde incorporamos los diferentes modelos de lenguaje. Inicialmente trabajamos con unigramas para obtener resultados base para posteriores experimentos. Posteriormente, se genera un diccionario para bigramas y otro para lo que definimos como skip-gramas previos. Por el momento estas variantes no fueron enviadas como experimentos a TASS2016 sino solo las versiones iniciales.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Metodología</head><p>Utilizando el diccionario, el normalizador y el modelo de representación vectorial se procedió a crear vectores de representación con diferentes configuraciones. Primeramente se construyó una versión con vectores de dimensión 20 distribuyendo la polaridad de los términos según la polaridad almacenada para unigramas en el diccionario local. En este caso se pretende evaluar solamente el uso del diccionario y los marcadores de énfasis como repeticiones y mayúsculas. Este primer experimento es el denominado GASUCR-01. El segundo experimento consistió en evaluar un modelo un poco más robusto a nivel local con bigramas y la polaridad para el unigrama en el diccionario, si el bigrama no está presente durante el proceso de evaluación. En este caso se crearon vectores de menor dimensión para los datos locales, con solo cinco campos. Esta ejecución se identificó como experimento GASUCR-01-noEMO-noPartNeg. Esta es la implementación base para luego evaluar el uso de bigramas tomados del contexto global. Esta versión base también fue enviada a la tarea de 4 categorías. En este caso, lo que se hizo fue unir las categorías +P y P en una sola, y la categoría +N con la N. El tercer experimento agregaba al anterior el uso de los emoticones, aparición de términos positivos con énfasis y las partículas negativas. En los resultados esta versión se identificó como GASUCR-04. En esta versión de TASS no nos dio tiempo de ejecutar las versiones con bigramas globales, ni skipgramas. Estos casos se fueron seleccionando para ir evaluando en forma incremental cada uno de los aspectos relacionados a nuestra propuesta. Con cada característica nueva se trata de determinar su impacto sobre los valores de exactitud, precisión y exhaustividad.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusiones y trabajo futuro</head><p>El marco de evaluación de TASS es provechoso para los grupos que inician la investigación en análisis de sentimiento en español con el fin de extenderla a otras latitudes. En nuestro caso pudimos evaluar y comparar la calidad de los resultados de los primeros casos base de nuestro trabajo. Observamos los primeros resultados con un sistema que utiliza un método de normalización con identificación de potenciales marcadores de énfasis, un modelo de representación basado en vectores de baja dimensión, y modelos de representación del texto con características locales y globales. El trabajo además hace uso de características comunes con otros como los son el uso de emoticones y partículas negativas. Como trabajo futuro tenemos pendiente la evaluación usando 3 categorías de los datos que hacen uso de contexto local con bigramas y características adicionales como uso de emoticones, palabras positivas con énfasis, y partículas de negación. Esperamos que los mejores resultados sean obtenidos al incorporar los nuevos modelos de lenguaje que estamos calculando para bigramas y skipgramas previos al unirlo con nuestro método de representación en vectores de baja dimensión. Se desea estudiar el efecto de la reducción del tamaño del vector al igual que técnicas de extrapolación de la polaridad en los modelos para los términos que no aparecen en los datos de entrenamiento. LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task *</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Knowing the opinion of customers or users has become a priority for companies and organizations in order to improve the quality of their services and products. With the ongoing explosion of social media, it affords a significant opportunity to poll the opinion of many Internet users by processing their comments. However, it should be noted that sentiment analysis, which can be defined as the automatic analysis of opinion in texts <ref type="bibr">(Pang and Lee, 2008)</ref>, is a challenging task because it is not strange that different people assign different polarities to a given text. On Twitter, the task is even more difficult, because the texts are small (only 140 characters) and are characterized by their informal style language, many grammatical errors and spelling mistakes, slang and vulgar vocabulary and abbreviations.</p><p>Since their introduction in 2013, the TASS shared task editions have had as main goal to promote the development of methods and resources for sentiment analysis of tweets in Spanish. This paper describes the participation of the LABDA group at the Task 1 (Sentiment Analysis at global level). In this task, the participating systems have to determine the global polarity of each tweet in the test dataset. There are two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE). A detailed description of the task can be found in the overview paper <ref type="bibr" target="#b46">of TASS 2016</ref><ref type="bibr">(García-Cumbreras et al., 2016)</ref>. Our approach exploits word embedding representations for tweets and machine learning algorithms such as SVM and logistic regression. The word embedding model can yield significant dimensionality reduction compared to the classical Bag-Of-Word (BoW) model. 
The dimensionality reduction can have several positive effects on our algorithms such as faster training, avoiding overfitting and better performance.</p><p>The paper is organized as follows. Section 2 describes our approach. The experimental results are presented and discussed in Section 3. We conclude in Section 4 with a summary of our findings and some directions for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">System</head><p>In this paper, we study the use of word embeddings (also known as word vectors) in order to represent tweets and then examine several machine learning algorithms to classify them. Word embeddings have shown promising results in NLP tasks, such as named entity recognition (Segura-Bedmar, Suárez-Paniagua, and Martínez, 2015), relation extraction <ref type="bibr">(Alam et al., 2016)</ref>, sentiment analysis <ref type="bibr" target="#b12">(Socher et al., 2013b)</ref> or parsing <ref type="bibr" target="#b11">(Socher et al., 2013a)</ref>. A word embedding is a function to map words to low dimensional vectors, which are learned from a large collection of texts. At present, Neural Network is one of the most used learning techniques for generating word embeddings <ref type="bibr">(Mikolov and Dean, 2013)</ref>. The essential assumption of this model is that semantically close words will have similar vectors (in terms of cosine similarity). Word embeddings can help to capture semantic and syntactic relationships of the corresponding words.</p><p>While the well-known Bag-of-Words (BoW) model involves a very large number of features (as many as the number of non-stopword words with at least a minimum number of occurrences in the training data), the word embedding representation allows a significant reduction in the feature set size (in our case, from millions to just 300). The dimensionality reduction is a desirable goal, because it helps in avoiding overfitting and leads to a reduction of the training and classification times, without any performance loss.</p><p>As a preprocessing step, tweets must be cleaned. First, we remove all links and urls. We then remove usernames which can be easily recognized because their first character is the symbol @. We then transform the hashtags to words by removing their first character (that is, the symbol #). 
Taking advantage of regular expressions, the emoticons are detected and classified in order to count the number of positive and negative emoticons in each tweet and then we remove them from the text. Table <ref type="table" target="#tab_0">1</ref> shows the list of positive and negative emoticons, which were taken from the wikipedia page https://en.wikipedia.org/wiki/List\_of\_emoticons. We convert the tweets to lowercase and replace misspelled accented letters with the correct one (for instance "à" with "á"). We also treat elongations (that is, the repetition of a character) by removing the repetition of a character after its second occurrence (for example, "hoooolaaaa" would be translated to "hola"). We then decided to take into account laughs (for instance "jajaja") which turned out to be challenging because of the diverse ways they are expressed (i.e. expressions like "jajajaja" or "jejeje" and even misspelled ones like "jajjajaaj"). We addressed this using regular expressions to standardize the different forms (i.e. "jajjjaaj" to "jajaja") and then replace them with the word "risas". Finally we remove all non-letter characters and all stopwords present in tweets<ref type="foot" target="#foot_1">1</ref> . Table <ref type="table" target="#tab_0">1</ref>: List of positive and negative emoticons</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Orientation Emoticons</head><p>Once the tweets are preprocessed, they are tokenized using the NLTK toolkit (a Python package for NLP); we also performed experimentation by lemmatizing each tweet using MeaningCloud<ref type="foot" target="#foot_2">2</ref> Text Analytic software to compare both approaches. Then, for each token, we search its vector in the word embedding model. We use a pretrained model <ref type="bibr" target="#b4">(Cardellino, 2016)</ref>, which was generated by using the word2vec algorithm <ref type="bibr">(Mikolov and Dean, 2013)</ref> from a collection of Spanish texts with approximately 1.5 billion words. The dimension of the word embedding is 300. It should be noted that these texts were taken from different resources such as Spanish Wikipedia, WikiSource and Wikibooks, but none of them contains tweets. Therefore, it is possible that the main characteristics of the social media texts (such as informal style language, noisy, plenty of grammatical errors and spelling mistakes, slang and vulgar vocabulary, abbreviations, etc) are not correctly represented in this model. One of the main problems is that there is a significant number of words (almost 13 % of the vocabulary, representing 6 % of word occurrences) that are not found in the model. 
We perform a review of a small sample of these words, showing that most of them were mainly hashtags.</p><p>In our approach, a tweet of n tokens (T = w_1, w_2, ..., w_n) is represented as the centroid of the word vectors w_i of its tokens, as shown in the following equation:</p><formula xml:id="formula_2">T = \frac{1}{n}\sum_{i=1}^{n} w_i = \frac{\sum_{j=1}^{N} w_j \cdot TF(w_j, t)}{\sum_{j=1}^{N} TF(w_j, t)}<label>(1)</label></formula><p>where N is the vocabulary size, that is, the total number of distinct words, while TF(w_j, t) refers to the number of occurrences of the j-th vocabulary word in the tweet T.</p><p>We also explore the effect of including the inverse document frequencies IDF to represent tweets (see Equation <ref type="formula">2</ref>). This helps to increase the weight of words that occur often, but only in a few documents, while it reduces the relevance of words that occur very frequently in a larger number of texts.</p><formula xml:id="formula_3">T = \frac{1}{n}\sum_{i=1}^{n} w_i = \frac{\sum_{j=1}^{N} w_j \cdot TF(w_j, t) \cdot IDF(w_j)}{\sum_{j=1}^{N} TF(w_j, t) \cdot IDF(w_j)}<label>(2)</label></formula><p>having IDF(w_j) = \log\frac{|D|}{|\{tw \in D : w_j \in tw\}|} where |D| refers to the number of tweets.</p><p>In addition to using the centroid, we assess the impact of complementing the tweet model with the following additional features: posWords: number of positive words present in the tweet. negWords: number of negative words present in the tweet. posEmo: number of positive emoticons present in the tweet. negEmo: number of negative emoticons present in the tweet.</p><p>For the posWords and negWords features we used the iSOL lexicon <ref type="bibr">(Molina-González et al., 2013)</ref>, a list composed of 2,509 positive words and 5,626 negative words. 
As described before, for the emoticons we used the listed in Table <ref type="table" target="#tab_0">1</ref>, but also added to the positive ones the number of laughs detected; and also, we included the number of recommendations present in the form of a "Follow Friday" hashtag (#FF), due to its ease of detection and its positive bias.</p><p>Classification is performed using scikitlearn, a Python module for machine learning. This package provides many algorithms such as Random Forest, Support Vector Machine (SVM) and so on. One of its main advantages is that it is supported by extensive documentation. Moreover, it is robust, fast and easy to use.</p><p>As stated before, we have two main training models: Averaged centroids and the averaged centroids including the inverted document frequency, for both the lemmatized and not-lemmatized texts. We performed experiments using three different classifiers: Random Forests, Support Vector Machines and Logistic Regression because these classifiers often achieved the best results for text classification and sentiment analysis.</p><p>Also we evaluated the impact of applying a set of emoticon's rules as a pre-classification stage, similar to <ref type="bibr" target="#b5">(Chikersal et al., 2015)</ref>, in which we determine a first stage polarity for each tweet as follows:</p><p>If posEmo is greater than zero and negEmo is equal to zero, the tweet is marked as "P".</p><p>If negEmo is greater than zero and posEmo is equal to zero, the tweet is marked as "N".</p><p>If both posEmo and negEmo are greater than zero, the tweet is marked as "NEU".</p><p>If both posEmo and negEmo are equal to zero, the tweet is marked as "NONE".</p><p>Then, after the classification takes place we made three tests: i) Applying no rule, ii) honoring the polarity defined by the rule, which means, we keep the predefined polarity if the tweet was marked as "P" or "N", otherwise we take the value estimated by the classifier, and iii) a mixed approach where we 
give each polarity a value (N+: -2; N: -1; NEU,NONE: 0; P: 1; P+: 2) and performed an arithmetic sum of both the predefined and estimated polarity if and only if they are not equal; with that for instance, if the classifier marked a tweet as "N" and the rules marked it as "P" the tweet will be classified as "NEU".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results</head><p>In order to choose the best-performing classifiers, we use 10-fold cross-validation because there is no development dataset and this strategy has become the standard method in practical terms. Our experiments showed that, although the results were similar<ref type="foot" target="#foot_3">3</ref> , the best settings for the 5-levels task are: RUN-1: Support Vector Machine, over the averaged centroids without applying any rules for pre-defining polarities.</p><p>RUN-2: Support Vector Machine, over the averaged centroids and applying the mixed rules approach. RUN-3: Logistic Regression, over the centroids with inverse document frequency and applying the mixed rules approach.</p><p>and for the 3-levels task are: RUN-1: Support Vector Machine, over the averaged centroids and applying the mixed rules approach. RUN-2: Logistic Regression, over the centroids with inverse document frequency and applying the mixed rules approach.</p><p>RUN-3: Logistic Regression, over the averaged centroids and applying the mixed rules approach.</p><p>Tables <ref type="table" target="#tab_25">2 and 3</ref> show the results for these settings provided by the TASS submission system. For each run, accuracy is provided as well as the macro-averaged precision, recall and F1-measure. As expected, the results for 3 levels are higher than for 5 levels because the training dataset is larger. With the settings mentioned above, the obtained results are extremely similar, but we can state that, in terms of Accuracy, Logistic Regression reports the best results; and, even though it is not measured in this work, it is worth mentioning that Logistic Regression's performance was observably faster.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Run</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions and future work</head><p>This paper explores the use of word embeddings for the task of sentiment analysis. Instead of using the bag-of-words model to represent tweets, these are represented as word vectors taken from a pre-trained model of word embeddings. An important advantage of the word embedding model compared to the technique of bag-of-words representation is that it achieves a significant dimensional reduction of the feature set needed to represent tweets and leads, therefore, to a reduction of training and testing time of the algorithms.</p><p>In order to use word embedding models properly, a preprocessing stage had to be completed before training a classifier. Due to the unstructured nature of the tweets, this preprocessing proved to be a very important step in order to standardize to some degree the input data. The experimentation showed that the three tested classifiers obtained very similar results, with Random Forest having slightly worse performance and Logistic Regression being slightly better and much faster.</p><p>One of the main drawbacks of our approach is that many words do not have a word vector in the word embedding model used for our experiments. An analysis showed that many of these words come from hashtags, which are usually short phrases. Therefore, we should apply a more sophisticated method in order to extract the words forming a hashtag.</p><p>As future work, we also plan to use a word embedding model trained on a collection of text from Spanish social media. We think that this will have a positive effect on the performance of our system to identify the polarity of tweets because this model will be generated from documents characterized by the main features that describe social media texts (for example, informal style language, plenty of grammatical errors and spelling mistakes, slang and vulgar vocabulary).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>JACERONG at TASS 2016: An Ensemble Classifier for 1 Introduction</head><p>What people say on social media about issues of their everyday life, the society, and the world in general, has turned into a rich source of information to understand social behavior. Twitter content, in particular, has caught the attention of researchers who have investigated its potential for conducting studies on the human subjectivity at large scale, which was not feasible using tradi-tional methods. Around election time, sentiment analysis of political tweets has been widely used to capture trends in public opinion regarding important issues such as voting intention (Gayo-Avello, 2013). However, analyzing this content also presents several challenges, including the development of text analysis approaches based on Natural Language Processing techniques, which properly adapt to the informal genre and the free writ-ing style of Twitter <ref type="bibr" target="#b15">(Han and Baldwin, 2011;</ref><ref type="bibr">Cerón-Guzmán and León-Guzmán, 2016)</ref>.</p><p>TASS is a workshop aimed at fostering research on sentiment analysis of Spanish Twitter data, which provides a benchmark evaluation to compare the latest advances in the field <ref type="bibr">(García-Cumbreras et al., 2016)</ref>. One of the proposed tasks is to determine the opinion orientation expressed in tweets at global level. Task 1 consists on assigning one of six labels (P+, P, NEU, N, N+, NONE) to a tweet in the six-labels evaluation; or one of four labels (P, NEU, N, NONE) in the four-labels evaluation. Here, P, N, and NEU, stand for positive, negative, and neutral, respectively; NONE, instead, means no sentiment. The "+" symbol is used as intensifier.</p><p>This paper presents an ensemble-based approach to polarity classification of Spanish tweets, developed to participate in Task 1 proposed by the organizing committee of the TASS workshop. 
The ensemble members are (relatively) highly correct classifiers with the lowest absolute correlation with each other. The output from each classifier, which may be either a class label or probabilities for each class, is used to assign the polarity to a tweet based on a majority rule or on the highest unweighted average probability. Moreover, classifiers are adapted to deal with non-standard lexical forms in tweets, in order to improve the quality of natural language analysis.</p><p>The remainder of this paper is organized as follows. Section 2 describes the common architecture of the ensemble members (i.e., classifiers). Next, the submitted experiments, as well as the obtained results, are discussed in Section 3. Finally, Section 4 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The System Architecture</head><p>The tweet text is passed through the pipeline of each system in order to assign it a class label or a probability to be of a certain class. The pipeline, which goes from text preprocessing to machine learning classification, is described below. Note that the system term is preferred over the classifier term, because a machine learning classifier receives a feature vector and produces a class label or probabilities for each class; instead, the system term enables to conceive the whole process, from preprocessing to machine learning classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Preprocessing</head><p>The process of text cleaning and normalization is performed in two phases: basic preprocessing and advanced preprocessing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1">Basic Preprocessing</head><p>The following simple rules are implemented as regular expressions:</p><p>• Removing URLs and emails.</p><p>• HTML entities are mapped to textual representations (e.g., "&amp;lt;" → "&lt;").</p><p>• Specific Twitter terms such as mentions (@user) and hashtags (#topic) are replaced by placeholders.</p><p>• Unknown characters are mapped to their closest ASCII variant, using the Python Unidecode module for the mapping.</p><p>• Consecutive repetitions of a same character are reduced to one occurrence.</p><p>• Emoticons are recognized and then classified into positive and negative, according to the sentiment they convey (e.g., ":)" → "EMO POS", ":(" → "EMO NEG").</p><p>• Unification of punctuation marks (Vilares, Alonso, and Gómez-Rodrıguez, 2014).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.2">Advanced Preprocessing</head><p>Once the set of simple rules has been applied, the tweet text is tokenized and morphologically analyzed by FreeLing <ref type="bibr">(Padró and Stanilovsky, 2012)</ref>. In this way, for each resulting token, its lemma and Part-of-Speech (POS) tag are assigned. Taking these data as input, the following advanced preprocessing is applied:</p><p>• Lexical normalization. Each token is passed through a set of basic modules of FreeLing (e.g., dictionary lookup, suffixes check, detection of numbers and dates, and named entity recognition) for identifying standard word forms and other valid constructions. If a token is not recognized by any of the modules, it is marked as out-of-vocabulary (OOV) word. Then, a confusion set is formed by normalization candidates which are identical or similar to the graphemes or phonemes that make the OOV word. These candidates are elements of the union of a dictionary of Spanish standard word forms and a gazetteer of proper nouns. The best normalization candidate for the OOV word is which best fits a statistical language model. The language model was estimated from the Spanish Wikipedia corpus. Lastly, the selected candidate is capitalized according to the capitalization rules of the Spanish language. Extensive research on lexical normalization of Spanish tweets can be read in (Cerón-Guzmán and León-Guzmán, 2016).</p><p>• Negation handling. Inspired by the approach proposed by Pang et al. <ref type="bibr" target="#b19">(Pang, Lee, and Vaithyanathan, 2002)</ref>, this research defined a negated context as a segment of the tweet that starts with a (Spanish) negation word and ends with a punctuation mark (i.e., "!", ",", ":", "?", ".", ";"), but only the first n [0, 3] or all tokens labeled with any or a specific POS tag (i.e., verb, adjective, adverb, and common noun) are affected by adding it the " NEG" suffix. 
Note that when n = 0, no token is affected.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Feature Extraction</head><p>In this stage, the normalized tweet text is transformed into a feature vector that feeds the machine learning classifier. The features are grouped into basic features and n-gram features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Basic Features</head><p>Some of these features are computed before the process of text cleaning and normalization is performed.</p><p>• The number of words completely in uppercase.</p><p>• The number of words with more than two consecutive repetitions of a same character.</p><p>• The number of consecutive repetitions of exclamation marks, question marks, and both punctuation marks (e.g., "!!", "??", "?!") and whether the text ends with an exclamation or question mark.</p><p>• The number of occurrences of each class of emoticons (i.e., positive and negative) and whether the last token of the tweet is an emoticon.</p><p>• The number of positive and negative words, relative to the ElhPolar lexicon <ref type="bibr">(Saralegi and Vicente, 2013)</ref>, the AFINN lexicon <ref type="bibr" target="#b17">(Nielsen, 2011)</ref>, or an union of both lexicons. In a negated context, the label of a polarity word is inverted (i.e., positive words become negative words, and vice versa). Additionally, a third feature labels the tweet with the class whose number of polarity words in the text is the highest.</p><p>• The number of negated contexts.</p><p>• The number of occurrences of each Partof-Speech tag.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">N-gram Features</head><p>The fixed-length set of basic features is always extracted from tweets. However, the tweet text varies from another in terms of length, number of tokens, and vocabulary used. For that reason, a process that transforms textual data into numerical feature vectors of fixed length is required. This process, known as vectorization, is performed by applying the tf-idf weighting scheme <ref type="bibr">(Manning, Raghavan, and Schütze, 2008)</ref>. Thus, each document (i.e., a tweet text) is represented as a vector d = {t 1 , . . . , t n } R V , where V is the size of the vocabulary that was built by considering word n-grams with n [1, 4], or character n-grams with n [3, 5] in the collection (i.e., the training set). The vector is, hence, formed by word n-grams, character n-grams, or a concatenation of word and character n-grams.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Machine Learning Classification</head><p>At the last stage, the sentiment analysis system classifies a given tweet as either P+, P, NEU, N, N+, or NONE, or assigns probabilities for each class. After receiving as input the feature vector, an L2-regularized Logistic Regression classifier assigns a class label to the tweet or a probability to be of a certain class. The classifier was trained on the training set, using the Scikit-learn <ref type="bibr">(Pedregosa et al., 2011)</ref> implementation of the Logistic Regression algorithm.</p><p>3 Experiments 1,720 different sentiment analysis systems were trained on the training set via 5-fold cross-validation, in order to find the best parameter settings, namely: negation handling, polarity lexicon, order of word and character n-grams, and other parameters related to the vectorization process (e.g., lowercasing, frequency thresholds, etc.). The systems were sorted by their mean cross-validation score, and thus the top 50 ranked were filtered to build the ensemble. The training set is a collection of 7,219 tweets, each of which is tagged with one of six labels (i.e., P+, P, NEU, N, N+, and NONE). Note that the systems were trained for the six-labels evaluation, and therefore the P+ and P labels were merged into P, as well as the N+ and N labels were merged into N, to produce an output in accordance with the four-labels evaluation. Further description of the provided corpus, as well as of the training and test sets, can be read in <ref type="bibr">(García-Cumbreras et al., 2016)</ref>.</p><p>Next, the top 50 systems assigned a class label to each tweet in a collection of 1,000 tweets, which was drawn from the untagged test set with a similar class distribution to the training set. At this stage, the objective was to find the systems with the lowest absolute correlation with each other; therefore, the performance was not evaluated. 
Then, the less-correlated combinations of 5, 10, and 25 systems, were used to build the ensembles, whose outputs correspond to the submitted experiments. These experiments are described below:</p><p>• run-1: the less-correlated combination of 5 systems, which chooses the class label that represents the majority in the predictions made by the ensemble members.</p><p>• run-2: the less-correlated combination of 10 systems, which chooses the class with the highest unweighted average probability.</p><p>• run-3: the less-correlated combination of 25 systems, which chooses the class with the highest unweighted average probability.</p><p>Tables <ref type="table" target="#tab_25">1 and 2</ref> show the performance evaluation on the test set (i.e., a collection of 60,798 tweets) for six and four labels, respectively. Accuracy has been defined as the official metric for ranking the systems. In summary, the main gain occurs among the "run-1" and "run-2" experiments, with an increment of 0.5% in accuracy in the six-labels As a final point, Table <ref type="table" target="#tab_3">3</ref> shows how the overall performance is affected by the low discriminative power of the ensembles (in this case, the one that correspond to "run-3") for the NEU class. With this in mind, it is proposed as future work to deal with the low representativeness of the NEU class in the training data (i.e., 9.28% of tweets), in order to properly characterize this kind of tweets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>This paper has described an ensemble-based approach for sentiment analysis of Spanish Twitter data at global level, developed in order to participate in Task 1 proposed by the organization of TASS workshop. Three ensembles were built on the combination of sentiment analysis systems with the lowest absolute correlation with each other. The systems were adapted to the informal genre and the free writing style that characterize Twitter, in order to improve the quality of natural language analysis. In this way, the predicted class label for a particular tweet was based on a majority rule or on the highest average probability. Experimental results showed that the less-correlated combination of 25 systems, which chose the class with the highest unweighted average probability, was the setting that best suited to the task. However, there is a great room for improvement in the learning of a proper characterization of neutral tweets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Participación de SINAI en TASS 2016 *</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>SINAI participation in TASS 2016</head><p>A. Montejo-Ráez University of Jaén 23071 Jaén (Spain) amontejo@ujaen.es</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>M.C. Díaz-Galiano</head><p>University of Jaén 23071 Jaén (Spain) mcdiaz@ujaen.es</p><p>Resumen: Este artículo describe el sistema de clasificación de la polaridad utilizado por el equipo SINAI en la tarea 1 del taller TASS 2016. Como en participaciones anteriores, nuestro sistema se basa en un método supervisado con SVM a partir de vectores de palabras. Dichos vectores se calculan utilizando la técnicas de deeplearning Word2Vec, usando modelos generados a partir de una colección de tweets expresamente generada para esta tarea y el volcado de la Wikipedia en español. Nuestros experimentos muestran que el uso de colecciones de datos masivos de Twitter pueden ayudar a mejorar sensiblemente el rendimiento del clasificador.</p><p>Palabras clave: Análisis de sentimientos, clasificación de la polaridad, deeplearning, Word2Vec</p><p>Abstract: This paper introduces the polarity classification system used by the SI-NAI team for the task 1 at the TASS 2016 workshop. Our approach is based on a supervised learning algorithm over vectors resulting from a weighted vector. This vector is computed using a deep-learning algorithm called Word2Vec. The algorithm is applied so as to generate a word vector from a deep neural net trained over a specific tweets collection and the Spanish Wikipedia. Our experiments show massive data from Twitter can lead to a slight improvement in classificaciones accuracy. Keywords: Sentiment analysis, polarity classification, deep learning, Word2Vec, Doc2Vec</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introducción</head><p>En este trabajo describimos las aportaciones realizadas para participar en la tarea 1 del taller TASS (Sentiment Analysis at global level), en su edición de 2016 <ref type="bibr">(García-Cumbreras et al., 2016)</ref>. Nuestra solución continúa con las técnicas aplicadas en el TASS 2014 (Montejo-Ráez, García-Cumbreras, y Díaz-Galiano, 2014) y 2015 (Díaz-Galiano y Montejo-Ráez, 2015), utilizando aprendizaje profundo para representar el texto y una colección de entrenamiento creada con tweets que contienen emoticonos que expresan emociones de felicidad o tristeza. Para ello utilizamos el método Word2Vec, ya que ha obtenido los mejores resultados en años anteriores. Por lo tanto, generamos un vector de pesos para cada palabra del tweet utilizando Word2Vec, y realizamos la media * Este estudio está parcialmente financiado por el proyecto TIN2015-65136-C2-1-R otorgado por el Ministerio de Economía y Competitividad del Gobierno de España.</p><p>de dichos vectores para obtener una única representación vectorial. Nuestros resultados demuestran que el rendimiento del sistema de clasificación puede verse sensiblemente mejorado gracias a la introducción de estos datos en la generación del modelo de palabras, no así en el entrenamiento del clasificador de polaridad final.</p><p>La tarea del TASS en 2016 denominada Sentiment Analysis at global level consiste en el desarrollo y evaluación de sistemas que determinan la polaridad global de cada tweet del corpus general. Los sistemas presentados deben predecir la polaridad de cada tweet utilizando 6 o 4 etiquetas de clase (granularidad fina y gruesa respectivamente).</p><p>El resto del artículo está organizado de la siguiente forma. El apartado 2 describe el estado del arte de los sistemas de clasificación de polaridad en español. A continuación, se describe la colección de tweets con emoticonos utilizada para entrenar el clasificador. 
En el apartado 4 se describe el sistema desarro-llado y en el apartado 5 los experimentos realizados, los resultados obtenidos y el análisis de los mismos. Finalmente, en el último apartado exponemos las conclusiones y el trabajo futuro.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Clasificación de la polaridad en español</head><p>La mayor parte de los sistemas de clasificación de polaridad están centrados en textos en inglés, y para textos en español el sistema más completo, en cuanto a técnicas lingüísticas aplicadas, posiblemente sea The Spanish SO Calculator <ref type="bibr" target="#b24">(Brooke, Tofiloski, y Taboada, 2009)</ref>, que además de resolver la polaridad de los componentes clásicos (adjetivos, sustantivos, verbos y adverbios) trabaja con modificadores como la detección de negación o los intensificadores.</p><p>Los algoritmos de aprendizaje profundo (deep-learning en inglés) están dando buenos resultados en tareas donde el estado del arte parecía haberse estancado <ref type="bibr" target="#b48">(Bengio, 2009)</ref>. Estas técnicas también son de aplicación en el procesamiento del lenguaje natural (Collobert y Weston, 2008), e incluso ya existen sistemas orientados al análisis de sentimientos, como el de Socher et al. <ref type="bibr" target="#b36">(Socher et al., 2011)</ref>. Los algoritmos de aprendizaje automático no son nuevos, pero sí están resurgiendo gracias a una mejora de las técnicas y la disposición de grandes volúmenes de datos necesarios para su entrenamiento efectivo.</p><p>En la edición de TASS en 2012 el equipo que obtuvo mejores resultados (Saralegi Urizar y San Vicente Roncal, 2012) presentaron un sistema completo de pre-procesamiento de los tweets y aplicaron un lexicón derivado del inglés para polarizar los tweets. Sus resultados eran robustos en granularidad fina (65 % de accuracy) y gruesa (71 % de accuracy).</p><p>En la edición de TASS en 2013 el mejor equipo <ref type="bibr" target="#b27">(Fernández et al., 2013)</ref> tuvo todos sus experimentos en el top 10 de los resultados, y la combinación de ellos alcanzó la primera posición. 
Presentaron un sistema con dos variantes: una versión modificada del algoritmo de ranking (RA-SR) utilizando bigramas, y una nueva propuesta basada en skipgrams. Con estas dos variantes crearon lexicones sobre sentimientos, y los utilizaron junto con aprendizaje automático (SVM) para detectar la polaridad de los tweets.</p><p>En 2014 el equipo con mejores resultados en TASS se denominaba ELiRF-UPV (Hur-tado y <ref type="bibr">Pla, 2014)</ref>. Abordaron la tarea como un problema de clasificación, utilizando SVM. Utilizaron una estrategia uno-contratodos donde entrenan un sistema binario para cada polaridad. Los tweets fueron tokeninizados para utilizar las palabras o los lemas como características y el valor de cada característica era su coeficiente tf-idf. Posteriormente realizaron una validación cruzada para determinar el mejor conjunto de características y parámetros a utilizar.</p><p>El equipo ELiRF-UPV <ref type="bibr">(Hurtado, Pla, y Buscaldi, 2015)</ref> volvió a obtener los mejores resultados en la edición de TASS 2015 con una técnica muy similar a la edición anterior (SVM, tokenización, clasificadores binarios y coeficientes tf-idf). En este caso utilizaron un sistema de votación simple entre un mayor número de clasificadores con parámetros distintos. Los mejores resultados los obtuvieron con un sistema que combinaba 192 sistemas SVM con configuraciones diferentes, utilizando un nuevo sistema SVM para realizar dicha combinación.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Colección de tweets con emoticonos</head><p>Los algoritmos de deep-learning necesitan grandes volúmenes de datos para su entrenamiento. Por ese motivo se ha creado una colección de tweets específica para la detección de polaridad. Para crear dicha colección se han recuperado tweets con las siguientes características:</p><p>Que contengan emoticonos que expresen la polaridad del tweet. En este caso se han utilizado los siguientes emoticonos:</p><p>• Positivos: :) :-) :D :-D</p><p>• Negativos: :( :-(</p><p>Que los tweets no contengan URLs, para evitar tweets cuyo contenido principal se encuentra en el enlace.</p><p>Que no sean retweets, para reducir el número de tweets repetidos.</p><p>La captura de dichos tweets se realizó durante 22 días, del 18/07/2016 hasta el 9/08/2016, recuperando unos 100.000 tweets diarios aproximadamente. Tal y como se ve en la Figura 1 la recuperación fue muy homogénea y se obtuvieron más de 2.000.000 de tweets. Eliminar menciones (nombres de usuario que empiezan el caracter @).</p><p>Sustituir letras acentuadas por sus versiones sin acentuar.</p><p>Quitar las palabras vacías de contenido (stopwords).</p><p>Normalizar las palabras para que no contengan letras repetidas, sustituyendo las repeticiones de letras contiguas para dejar sólo 3 repeticiones.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Descripción del sistema</head><p>Word2Vec 1 es una implementación de la arquitectura de representación de las palabras mediante vectores en el espacio continuo, basada en bolsas de palabras o n-gramas concebida por Tomas Mikolov et al. <ref type="bibr">(Mikolov et al., 2013)</ref>. Su capacidad para capturar la semántica de las palabras queda comprobada en su aplicabilidad a problemas como la analogía entre términos o el agrupamiento de palabras. El método consiste en proyectar las palabras a un espacio n-dimensional, cuyos pesos se determinan a partir de una estructura de red neuronal mediante un algoritmo recurrente. El modelo se puede configurar para que utilice una topología de bolsa de palabras (CBOW) o skip-gram, muy similar al 1 https://code.google.com/p/word2vec/ anterior, pero en la que se intenta predecir los términos acompañantes a partir de un término dado. Con estas topologías, si disponemos de un volumen de textos suficiente, esta representación puede llegar a capturar la semántica de cada palabra. El número de dimensiones (longitud de los vectores de cada palabra) puede elegirse libremente. Para el cálculo del modelo Word2Vec hemos recurrido al software indicado, creado por los propios autores del método.</p><p>Tal y como se ha indicado, para obtener los vectores Word2Vec representativos para cada palabra tenemos que generar un modelo a partir de un volumen de texto grande. Para ello hemos utilizado los parámetros que mejores resultados obtuvieron en nuestra participación del 2014 (Montejo-Ráez, García-Cumbreras, y Díaz-Galiano, 2014). Por lo tanto, a partir de un volcado de Wikipedia<ref type="foot" target="#foot_4">2</ref> en Español de los artículos en XML, hemos extraído el texto de los mismos. 
Obtenemos así unos 2,2 GB de texto plano que alimenta al programa word2vec con los parámetros siguientes: una ventana de 5 términos, el modelo skip-gram y un número de dimensiones esperado de 300, logrando un modelo con más de 1,2 millones de palabras en su vocabulario.</p><p>Como puede verse en la Figura 2, nuestro sistema realiza la clasificación de los tweets utilizando dos fases de aprendizaje, una en la que entrenamos el modelo Word2Vec haciendo uso de un volcado de la enciclopedia on-line Wikipedia, en su versión en español, como hemos indicado anteriormente. De esta forma representamos cada tweet con el vector resultado de calcular la media de los vectores Word2Vec de cada palabra en el tweet y su desviación típica (por lo que cada vector de palabras por modelo es de 600 dimensiones). Se lleva a cabo una simple normalización previa sobre el tweet, eliminando repetición de letras y poniendo todo a minúsculas. La segunda fase de entrenamiento utiliza el algoritmo SVM y se entrena con la colección de tweets con emoticonos explicada en el apartado 3. La implementación de SVM utilizada es la basada en kernel lineal con entrenamiento SGD (Stochastic Gradient Descent) proporcionada por la biblioteca Sci-kit Learn<ref type="foot" target="#foot_5">3</ref>  <ref type="bibr">(Pedregosa et al., 2011)</ref>.</p><p>Esta solución es la utilizada en las dos variantes de la tarea 1 del TASS con predicción de 4 clases: la que utiliza el corpus de tweets completo (full test corpus) y el que utiliza el corpus balanceado (1k test corpus).</p><p>Figura 2: Flujo de datos del sistema completo 5 Resultados obtenidos Hemos experimentado con el efecto que tienen en el rendimiento del sistema el uso de una colección de datos generada a partir de la captura de tweets y que han sido etiquetados según los emoticonos que contienen en la forma comentada anteriormente. 
La colección de más de 1,7 millones de tweets ha sido utilizada al completo para generar un modelo de vectores de palabras, cuya combinación con el de Wikipedia se ha analizado. También hemos comprobado cómo el uso de dicha colección de tweets afecta cuando se usa para el entrenamiento del modelo de clasificación de la polaridad. Para ello se han seleccionado 500,000 tweets aleatoriamente de esta colección, con sus correspondientes etiquetas P (positivo) o N (negativo) y se han combinado con la colección de entrenamiento <ref type="bibr">de TASS.</ref> Los resultados según las medidas de Accuracy y Macro F1 obtenidas se muestran en la tabla 1. La primera columna nos indica a partir de cuáles datos se han generado los modelos de vectores de palabras, bien sólo con Wikipedia (W) o como combinación de ésta con los tweets del corpus construido (W+T). La segunda columna indica cómo se ha entrenado el clasificador de polaridad a partir de los textos etiquetados vectorizados con los modelos generados en el paso previo, bien sólo usando los datos de entrenamiento proporcionados por la organización <ref type="bibr">(TASS)</ref> o incorporando los etiquetados a partir de emoticonos (TASS+T).</p><p>Como podemos observar, el uso de una colección de tweets para ampliar la capacidad de representar un modelo basado en vectores de palabras mejora sensiblemente al ge- Esto nos lleva a plantearnos la pregunta de qué ocurriría si utilizáramos sólo los tweets recopilados para generar un modelo de vectores de palabras. Los resultados que se obtienen son un 59,05 % de ajuste y un 44,43 % de F1. No cabe duda de que conviene explorar el uso de modelos de generación de características a partir de vectores de palabras.</p><p>Estos resultados mejoran nuestros datos del año pasado, en los que obtuvimos un ajuste del 61,19 % combinando vectores de palabras (Word2Vec) y vectores de documentos (Doc2Vec).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusiones y trabajo futuro</head><p>A partir de los resultados obtenidos, encontramos que resulta interesante la incorporación de texto no formal (tweets) para la generación de los modelos de palabras, lo cual tiene su sentido en una tarea de clasificación que, precisamente, trabaja sobre textos no formales que tienen la misma red social como fuente. En cambio, el considerar que los emoticonos en un tweet pueden ayudar a un clasificador como SVM a mejorar en la determinación de la polaridad ha resultado una hipótesis fallida. Esto puede entenderse echando un vistazo a algunos de los tweets capturados por el sistema, donde se evidencia la dificultad, incluso para una persona, de poner en contexto el sentido del tweet y su consideración como positivo o negativo si no disponemos de un emoticono asociado.</p><p>Como trabajo futuro nos proponemos diseñar una red neuronal profunda más elaborada, pero que parta también de textos de entrenamiento tanto formales como no formales, si bien teniendo en cuenta información lingüística más avanzada como la sintáctica, en lugar de trabajar con simples bolsas de palabras. También queremos explorar el uso de redes de este tipo en el proceso de clasificación y no en la generación de características. Una posibilidad es utilizar una red de tipo DBN (Deep Belief Network) (Hinton y Salakhutdinov, 2006) en la que se añade una última fase donde se realiza el etiquetado de los ejemplos.</p><p>ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter Resumen: En este trabajo se describe la participación del equipo del grupo de investigación ELiRF de la Universitat Politècnica de València en el Taller TASS2016. Este taller es un evento enmarcado dentro de la XXXII edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural. 
Este trabajo presenta las aproximaciones utilizadas para las dos tareas planteadas en el taller, los resultados obtenidos y una discusión de los mismos. Nuestra participación se ha centrado principalmente en explorar diferentes aproximaciones para combinar un conjunto de sistemas con lo que se han obtenido los mejores resultados en ambas tareas. Palabras clave: Twitter, Análisis de Sentimientos.</p><p>Abstract: This paper describes the participation of the ELiRF research group of the Universitat Politècnica de València at TASS2016 Workshop. This workshop is a satellite event of the XXXII edition of the Annual Conference of the Spanish Society for Natural Language Processing. This work describes the approaches used for the two tasks of the workshop, the results obtained and a discussion of these results. Our participation has focused primarily on exploring different approaches for combining a set of systems. Using these approaches we have achieved the best results in both tasks. Keywords: Twitter, Sentiment Analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introducción</head><p>El Taller de Análisis de Sentimientos <ref type="bibr">(TASS)</ref> en sus cinco ediciones ha venido planteando tareas relacionadas con el análisis de sentimientos en Twitter. El objetivo principal es el de comparar y evaluar diferentes aproximaciones a estas tareas. Además, desarrolla recursos de libre acceso, básicamente, corpora anotados con polaridad, temática, tendencia política, aspectos, que son de gran utilidad para la comparación de diferentes aproximaciones a las tareas propuestas.</p><p>En esta quinta edición del TASS se proponen dos tareas de ediciones anteriores <ref type="bibr">(García-Cumbreras et al., 2016)</ref>: 1) Determinación de la polaridad en tweets, con diferentes grados de intensidad en la polaridad: 6 etiquetas y 4 etiquetas y 2) Determinación de la polaridad de los aspectos en el corpus STOMPOL. Este corpus consta de un con-junto de tweets sobre diferentes aspectos pertenecientes al dominio de la política.</p><p>El presente artículo resume la participación del equipo ELiRF-UPV de la Universitat Politècnica de València en todas las tareas planteadas en este taller. Primero se describen las aproximaciones y recursos utilizados en cada tarea. A continuación se presenta la evaluación experimental realizada y los resultados obtenidos. Finalmente se muestran las conclusiones y posibles trabajos futuros.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Descripción de los sistemas</head><p>Los sistemas presentados en el TASS 2016 se basan en el sistema desarrollado en la edicion anterior del TASS 2015 <ref type="bibr">(Hurtado, Pla, y Buscaldi, 2015)</ref>. Muchas de las características y recursos de este sistema fueron utilizados en las ediciones en las que nuestro equipo ha participado <ref type="bibr" target="#b42">(Pla y Hurtado, 2013</ref>) <ref type="bibr">(Hurtado y Pla, 2014)</ref> . El preproceso de los tweets utiliza la estrategia descrita en el trabajo TASS 2013 Hurtado, Esta consiste básicamente en la adaptación para el castellano del tokenizador de tweets Tweetmotif <ref type="bibr">(Connor, Krieger, y Ahn, 2010)</ref>. También se ha usado Freeling (Padró y Stanilovsky, 2012)<ref type="foot" target="#foot_7">1</ref> como lematizador, detector de entidades nombradas y etiquetador morfosintáctico, con las correspondientes modificaciones para el dominio de Twitter. Usando esta aproximación, la tokenización ha consistido en agrupar todas las fechas, los signos de puntuación, los números y las direcciones web. Se han conservado los hashtags y las menciones de usuario. Se ha considerado y evaluado el uso de palabras y lemas como tokens así como la detección de entidades nombradas.</p><p>Todas las tareas se han abordado como un problema de clasificación. Se han utilizado Máquinas de Soporte Vectorial (SVM) por su capacidad para manejar con éxito grandes cantidades de características. En concreto usamos dos librerías (LibSVM<ref type="foot" target="#foot_8">2</ref> y LibLinear<ref type="foot" target="#foot_9">3</ref> ) que han demostrado ser eficientes implementaciones de SVM que igualan el estado del arte. El software está desarrollado en Python y para acceder a las librerías de SVM se ha utilizado el toolkit scikit-learn<ref type="foot" target="#foot_10">4</ref> . 
<ref type="bibr">(Pedregosa et al., 2011)</ref>.</p><p>En este trabajo se ha explotado la técnica de combinación de diferentes configuraciones de clasificadores para aprovechar su complementariedad. Se ha utilizado la técnica de votación simple utilizada en trabajos anteriores <ref type="bibr" target="#b42">(Pla y Hurtado, 2013)</ref>  <ref type="bibr" target="#b44">(Pla y Hurtado, 2014b)</ref> pero en este caso extendiéndola a un número mayor de clasificadores, con diferentes parámetros y características (palabras, lemas, n-gramas de palabras y lemas) así como estrategias de combinación alternativas.</p><p>Cada tweet se ha representado como un vector que contiene los coeficientes tf-idf de las características consideradas. En toda la experimentación realizada, las características y los parámetros de los clasificadores se han elegido mediante una validación cruzada de 10 iteraciones (10-fold cross-validation) sobre el conjunto de entrenamiento.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Tarea 1: Análisis de sentimientos en tweets</head><p>Esta tarea consiste en determinar la polaridad de los tweets y la organización ha definido dos subtareas. La primera distingue seis etiquetas de polaridad: N y N+ que expresan polaridad negativa con diferente intensidad, P y P+ para la polaridad positiva con diferente intensidad, NEU para la polaridad neutra y NONE para expresar ausencia de polaridad. La segunda sólo distinguen 4 etiquetas de polaridad: N, P, NEU y NONE.</p><p>El corpus proporcionado por la organización del TASS consta de un conjunto de entrenamiento, compuesto por 7219 tweets etiquetados con la polaridad usando seis etiquetas, y un conjunto de test, de 60798 tweets, al cual se le debe asignar la polaridad. La distribución de tweets según su polaridad en el conjunto de entrenamiento se muestra en la Tabla 1. Tabla 1: Distribución de tweets en el conjunto de entrenamiento según su polaridad.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Polaridad</head><p>A partir de la tokenización propuesta se realizó un proceso de validación cruzada (10fold cross validation) para determinar el mejor conjunto de características y los parámetros del modelo. Como características se probaron diferentes tamaños de n-gramas de palabras y de lemas. También se exploró la combinación de los modelos mediante diferentes técnicas de votación para aprovechar su complementariedad y mejorar las prestaciones finales. Algunas de éstas técnicas proporcionaron mejoras significativas sobre el mismo conjunto de datos, como se muestra en <ref type="bibr" target="#b44">(Pla y Hurtado, 2014b)</ref>. En todos los casos se han utilizado diccionarios de polaridad, tanto de lemas (Saralegi y San Vicente, 2013), como de palabras <ref type="bibr" target="#b40">(Martínez-Cámara et al., 2013)</ref> y el diccionario Afinn <ref type="bibr">(Hansen et al., 2011)</ref> traducido automáticamente del inglés al castellano.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Se han considerado dos alternativas para abordar la</head><p>La primera combina mediante un sistema de votación ponderada la de 192 clasificadores basados en el uso de SVM. La diferencia entre los clasificadores radica en el preprocesado y la tokenización utilizada, las características seleccionadas y los valores de los parámetros del propio modelo SVM. En concreto se realizaron todas las combinaciones posibles entre 8 tokenizaciones (lemas o palabras, detectar NE o no, detectar menciones a usuarios y hashtags, ...); 4 conjuntos distinto de características (palabras o bigramas con y sin diccionarios de polaridad) y 6 valores distintos del parámetro c del modelo SVM con kernel lineal. La clase asignada a cada tweet t viene determinada por la siguiente fórmula.</p><formula xml:id="formula_4">ĉ = argmax c∈C (N t (c) • P (c))<label>(1)</label></formula><p>Donde C es el conjunto de todas las clases, N t (c) es el número de clasificadores que asignan la clase c al tweet t, y P (c) es la probabilidad a priori de la clase c calculada utilizando el corpus de entrenamiento.</p><p>run2 La segunda alternativa explora la combinación de modelos mediante el aprendizaje de un metaclasificador. Utilizando las salidas de los mismos 192 clasificadores que en el run anterior, se ha aprendido un segundo modelo SVM que sirve para proporcionar la nueva salida combinada. Se ha destinado una parte del corpus de entrenamiento para ajustar los parámetros del metamodelo. Esta aproximación es la misma que la utilizada en la edición del TASS 2015.</p><p>Para la subtarea de 4 etiquetas el run1 se ha aprendido utilizando el corpus de aprendizaje con 4 etiquetas mientras que el run2, dada la complejidad del ajuste de parámetros del metamodelo se ha optado por adaptar el resultado de la subtarea de 6 etiquetas uniendo P y P+ como P y N y N+ como N.</p><p>En la Tabla 2 se muestran los valores de Accuracy obtenidos para las dos subtareas. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Tarea 2: Análisis de Polaridad de Aspectos en Twitter</head><p>Esta tarea consiste en asignar la polaridad a los aspectos que aparecen marcados en el corpus. Una de las dificultades de la tarea consiste en definir qué contexto se le asigna a cada aspecto para poder establecer su polaridad. Para un problema similar, detección de la polaridad a nivel de entidad, en la edición del TASS 2013, propusimos una segmentación de los tweets basada en un conjunto de heurísticas <ref type="bibr" target="#b42">(Pla y Hurtado, 2013)</ref>. Esta aproximación también se utilizó para la tarea de detección de la tendencia política de los usuarios de Twitter (Pla y Hurtado, 2014a) y para este caso proporcionó buenos resultados. En este trabajo se propone una aproximación más simple que consiste en determinar el contexto de cada aspecto a través de una ventana fija definida a la izquierda y derecha de la instancia del aspecto. Esta aproximación es la que se utilizó en nuestro sistema del TASS 2015 la cual utiliza ventanas de diferente longitud. La longitud de la ventana óptima se ha determinado experimentalmente sobre el conjunto de entrenamiento mediante una validación cruzada. Para entrenar nuestro sistema, se ha considerado el conjunto de entrenamiento únicamente, se han determinado los segmentos para cada aspecto y se ha seguido una aproximación similar a la Tarea 1.</p><p>El corpus de la tarea, corpus STOMPOL, se compone de un conjunto de tweets relacionados con una serie de aspectos políticos (como economía, sanidad, etc.) enmarcados en la campaña política de las elecciones andaluzas de 2015. Cada aspecto se relaciona con una o varias entidades que se corresponden con uno de los principales partidos políticos en España (PSOE, IU y Podemos). El corpus consta de 1.284 tweets y ha sido dividido en un conjunto de entrenamiento (784 tweets) y un conjunto de evaluación (500 tweets).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Aproximación y resultados</head><p>A continuación presentamos una pequeña descripción de las características de nuestro sistema así como el proceso seguido en la fase de entrenamiento. El sistema utiliza un clasificador basado en SVM. Para aprender los modelos sólo se utiliza el conjunto de entrenamiento proporcionado para la tarea y los diccionarios de polaridad previamente descritos. Antes de abordar el entrenamiento se determinan los segmentos de tweet que constituyen el contexto de cada una de los aspectos presentes. Se ha tenido en cuenta tres tamaños de ventana de longitudes 5, 7 y 10 palabras a la izquierda y derecha del aspecto. Cada uno de los segmentos se tokeniza y se utiliza Freeling para determinar sus lemas y ciertas entidades. A continuación se aprenden diferentes modelos combinando tamaños de ventana, parámetros del modelo y diferentes características (palabras, lemas, NE, etc). Mediante validación cruzada se elige el mejor modelo. Para esta tarea sólo hemos presentado un modelo.</p><p>Run Accuracy STOMPOL run1 0.633 Tabla 3: Resultados oficiales del equipo ELiRF-UPV en la Tarea 2 de la competición TASS-2016 para el corpus STOMPOL.</p><p>En la Tabla 3 se presentan los resultados obtenidos para la Tarea 2 con lo que nuestra aproximación ha obtenido la primera posición en dicha tarea.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusiones y trabajos futuros</head><p>En este trabajo se ha presentado la participación del grupo ELiRF-UPV en las 2 tareas planteadas en TASS 2016. Nuestro equipo ha utilizado aproximaciones basadas en máquinas de soporte vectorial y se ha centrado principalmente en combinar diferentes sistemas.</p><p>Haciendo un análisis del número de participantes y de los resultados obtenidos en las dos últimas ediciones del TASS, creemos que se está cerca de alcanzar los mejores resultados posibles en la tarea de Análisis de sentimientos tal y como se ha venido planteando hasta el momento.</p><p>A la vista de los buenos resultados que se han obtenido mediante la combinación de sistemas, como trabajo futuro nos planteamos desarrollar nuevos métodos de combinación de sistemas más sofisticados así como la inclusión de otros paradigmas de clasificación más hetereogéneos (distintos de los SVM) para aumentar la complementariedad de los sistemas combinados.</p><p>Además, se pretende extender el sistema para otros idiomas. El sistema descrito ya ha sido utilizado, con ligeras modificaciones, en tareas de análisis de sentimientos para el Inglés en la competición Semeval <ref type="bibr" target="#b39">(Martínez, Pla, y Hurtado, 2016)</ref> aunque con resultados no tan satisfactorios como en las tareas del TASS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Agradecimientos</head><p>Este trabajo ha sido parcialmente subvencionado por el MINECO mediante el proyecto ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (TIN2014-54288-C4-3-R). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bibliografía</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The social media activity is being profused in the recent years, users post opinions and comments in Twitter and in other social platforms. Due to this, there is a huge amount of information available that could be useful for business, in order to design marketing campaigns or to apply any kind of business analysis.</p><p>As a consequence, the research on text mining and also on the field of Sentiment Analysis (sa) has grown considerably these days. sa is the part of Natural Language Processing (nlp) responsible for determining the polarity of a text or a whole sentence. The sa applied to Twitter has to be conducted in a restricted scenario due to the maxi-mum length of the post. However, tweets have other elements we have to consider, like hashtags, mentions and retweets. More concretely, aspect-based sentiment analysis (absa) consists of extracting opinions, i.e. determining the sentiment polarity, from specific entities in the text <ref type="bibr" target="#b58">(Liu, 2012)</ref>. Therefore, this task becomes a challenge on the field of nlp.</p><p>The tass Workshop (García-Cumbreras et al., 2016) and the sepln conference offer an opportunity for participants to know about the latest advances on the field of nlp for Spanish language.</p><p>Many approaches applied to sa can be found in the literature, where it is possible to distinguish between knowledge based approaches <ref type="bibr">(Brooke, Tofiloski, and Taboada, 2009;</ref><ref type="bibr" target="#b54">Fernández-Gavilanes et al., 2016)</ref>, using grammars and thesaurus and others based on machine learning approaches <ref type="bibr">(Mo-hammad, Kiritchenko, and Zhu, 2013)</ref>. In the last years we can also find deep learning approaches <ref type="bibr" target="#b48">(Bengio, 2009)</ref>, applied to this task.</p><p>We our supervised machine (ml) system which consists of a Support Vector Machine (svm) classifier. 
Our objective is to conduct the sa process at an aspect level, task 2, determining the polarity of a specific given part of a sentence.</p><p>The article is structured as follows. Section 2 is a review of the research involving sa in the Twitter domain. Then, the Section 3 describes the applied approach and the implemented system. In Section 4, we show the experimental results of our system. Finally, in Section 5 we present the conclusions and future works.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related work</head><p>A large amount of literature related to Opinion Mining (om) and sa can be found <ref type="bibr">(Pang and Lee, 2008;</ref><ref type="bibr" target="#b59">Martínez-Cámara et al., 2016)</ref>. Most of the systems are applied to Twitter. However others are applied to social media platforms within the micro-blog context. Due to this, the approaches are varied technically and in connection with the purpose.</p><p>Two main approaches exist in sa: supervised and unsupervised learning ones. Supervised systems implement classification methods like svm, Logistic Regression (lr), Conditional Random Fields (crf), K-Nearest Neighbors (knn), etc. <ref type="bibr" target="#b52">Cui, Mittal, and Datar (2006)</ref> affirmed that svm are more appropriate for sentiment classification than generative models, due to their capability for working with ambiguity, that is, dealing with mixed feelings. Supervised algorithms are used when the number of classes, as well as the representative members of each class, are known.</p><p>Unsupervised systems are based on linguistic knowledge like lexicons, and syntactic features in order to infer the polarity <ref type="bibr" target="#b62">(Paltoglou and Thelwall, 2012)</ref>. These last techniques represent a more effective approach in the cross-domain context and for multilingual applications. The unsupervised classification algorithms do not work with a training set, in contrast, some of them use clustering algorithms in order to distinguish groups <ref type="bibr" target="#b57">(Li and Liu, 2010)</ref>.</p><p>As noted earlier, the special case of ap-plying sa to Twitter has been fully addressed <ref type="bibr" target="#b61">(Pak and Paroubek, 2010;</ref><ref type="bibr" target="#b15">Han and Baldwin, 2011)</ref>. 
Within the chosen solutions, we highlight the text normalization approach <ref type="bibr" target="#b53">(Fabo, Cuadros, and Etchegoyhen, 2013)</ref> and the use of key elements in classification approach <ref type="bibr" target="#b66">(Wang et al., 2011)</ref>. Others hold the advantages of using deep learning techniques in this task <ref type="bibr">(dos Santos and Gatti, 2014)</ref>.</p><p>According to the purpose of the developed systems, it is possible to find applications like classification of product reviews and political sentiment and election results prediction <ref type="bibr" target="#b49">(Bermingham and Smeaton, 2011)</ref>, among others.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">System Overview</head><p>In this section we make a brief description of the system submitted for Task 2: Aspectbased sentiment analysis. We developed a supervised system, based on a svm classifier using different features. In the next subsections we explain the different steps required.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Preprocessing</head><p>Before applying any supervised approach to our corpus, some preprocessing is needed. First of all, we have to normalize the text, since in Twitter language we can find abbreviations, mentions, hashtags, URLs or misspellings. In order to do that, we replace the URLs with the "URL" tag and we replace the abbreviations or misspellings with the correct entire word. For mentions and hashtags, we keep them unchanged but deleting the "@" or "#" symbols. Moreover, when a hashtag is composed of several words, we split and treat them as different tokens.</p><p>After this, a lexical analysis is carried out. It consists of lemmatization and POS tagging, which are performed by means of Freeling tool <ref type="bibr" target="#b47">(Atserias et al., 2006)</ref>.</p><p>Once we have analysed lexically the texts, we decided to separate the sentences by the different aspects. For doing that, the scope of each aspect is determined, applying the following rules, which are adapted from our English aspect based sentiment analysis system <ref type="bibr">(Alvarez-López et al., 2016)</ref> • If there is only one aspect in the sentence, we keep the sentence unchanged, and introduce it entirely as input for the next step.</p><p>• If there are multiple aspects, we separate the sentences by punctuation marks, conjunctions or other aspects found.</p><p>• If there are several aspects with no words between them, we assume that they belong to the same context, and assign the same polarity to all of them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">SVM classifier</head><p>In this section we describe the strategy followed to determine the sentiment (positive, negative or neutral) for each aspect predefined in corpus.</p><p>We develop a svm classifier, using the libsvm library <ref type="bibr" target="#b51">(Chang and Lin, 2011)</ref>. The inputs for the svm will be the sentences separated by contexts, as explained in the previous subsection. The features extracted are the following:</p><p>• Word tokens of nouns, adjectives and verbs in the sentence.</p><p>• Lemmas of verbs, nouns and adjectives that appear in each sentence.</p><p>• POS tags of nouns, adjectives and verbs.</p><p>• N-grams of different length, grouping the words in each sentence.</p><p>• Aspects appearing in the sentence. We join "aspect"-"entity", defined in each target as a feature.</p><p>• Negations. We create a negation dictionary, which contains several particles indicating negation, such as "no", "nunca", etc.</p><p>The previous features are all binary ones, assigning the value 1 if the current feature is present in the tweet and the value 0, if not.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experimental Results</head><p>The Task 2: Sentiment Analysis at the aspect level consists of assigning a polarity label to each aspect, which were initially marked in the stompol corpus <ref type="bibr">(Martínez-Cámara et al., 2016)</ref> raised by the tass organization. In this way, this corpus provides both polarity labels and the identification of the aspects that appear in each tweet. The aim is to be able to correctly assign to each aspect a positive, negative or neutral polarity.</p><p>In this regard, the stompol corpus consists of a set of Spanish tweets related to a number of political issues, such as health or economy, among others. These issues are framed in the political campaign of Andalusian elections in 2015, where each aspect relates to one or several entities that correspond to one of the main political parties in Spain (PP, PSOE, IU, UPyD, Cs and Podemos). The corpus is composed by 1,284 tweets, and has been divided into a training set (784 tweets) and a set of evaluation (500 tweets).</p><p>In order to evaluate the performance of the various features for polarity classification at an aspect-based level, we perform a series of ablation experiments as shown in Table 1. We start with the word token baseline classifier, and then add all four sets of features that help to increase performance as measured by accuracy. As we might expect, including the aspect feature has the most marked effect on the performance of polarity classification, although all the features contributed to improving overall performance on stompol corpus. Due to the low participation of research teams in task 2 this year, we decided to compare our proposal to the systems presented this year and also to that ones of last year, because of the use of the same dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Type</head><p>For this reason, Table <ref type="table" target="#tab_1">2</ref> compares results for our approach with different official ones submitted in 2015 and 2016 tass editions. In this way, we compared our results for a ml approach based on well-known squared-regularised logistic regression with a snippet of length 4 (Lys-2) described in <ref type="bibr" target="#b65">Vilares et al. (2015)</ref>, a clustering method focused on grouping authors with similar sociolinguistic insights (TID-spark) described in <ref type="bibr" target="#b64">Park (2015)</ref>, a recurrent neural network composed of a single long short term memory and a logistic function (Lys-1) described in <ref type="bibr" target="#b65">Vilares et al. (2015)</ref>, a ml approach based on a svm with a snippet of length 5,7 and 10 (ELiRF) described in <ref type="bibr">Hurtado, Plà, and Buscaldi (2015)</ref>, and the best performing run of the actual task edition (ELiRF-UPV). Comparing the results, the performance of our current model is close to the top-ranking systems of this and last year.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Task edition Accuracy</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ELiRF</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and future works</head><p>This paper describes the participation of the GTI group in the tass 2016, Task 2: Aspect-Based Sentiment Analysis. We developed a supervised system based on a svm classifier for the aspect-based sentiment analysis. The performance of our approach has been compared to the ones submitted this year but also to the ones submitted last year. Experimental results suggest that we need to explore new features, such as word embedding representations or paraphrase <ref type="bibr" target="#b67">(Zhao and Lan, 2015)</ref>, in order to improve the performance.</p><p>As future work we plan to include new features explained before and to develop a new system which combines different ml classification methods. We are also interested in considering different paradigms of heterogeneous classification, such as deep learning to increase the performance.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Sample tweets (General corpus)</figDesc><graphic coords="15,71.38,382.90,212.40,268.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2</head><label>2</label><figDesc>Figure2shows the information of two sample tweets.</figDesc><graphic coords="15,312.58,598.66,212.40,81.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Sample tweets (STOMPOL corpus)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Tweet not rightly classified by any system</figDesc><graphic coords="19,93.70,416.98,168.24,128.88" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>Figures Figure3,Figure 4Figure 5 are three examples of tweets that were not rightly classified by any system. The common feature of the three tweets is that they do not have any lexical marker that express emotion or opinion. Moreover, the tweet of the Figure 4 is sarcastic, which means an additional challenging for SA because requires a deep understanding of the language.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Tweet not rightly classified by any system</figDesc><graphic coords="19,331.78,294.58,174.00,165.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>Positive :-), :), :D, :o), :], D:3, :c), :&gt;, =], 8), =), :}, :ˆ), :-D, 8-D, 8D, x-D, xD, X-D, XD, =-D, =D, =-3, =3, BˆD, :'), :'), :*, :-*, :ˆ*, ;-), ;), *-), *), ;-], ;], ;D, ;ˆ), &gt;:P, :-P, :P, X-P, x-p, xp, XP, :-p, :p, =p, :-b, :b Negative &gt;:[, :-(, :(, :-c, :-&lt;, :&lt;, :-[, :[, :{, ;(, :-||, &gt;:(, :'-(, :'(, D:&lt;, D=, v.v</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figura 1 :</head><label>1</label><figDesc>Figura 1: Número de tweets recuperados cada 12 horas</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="26,72.00,83.96,453.55,127.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Corpus statistics</figDesc><table><row><cell>Attribute</cell><cell>Value</cell></row><row><cell>Tweets</cell><cell>68.017</cell></row><row><cell>Tweets (test)</cell><cell>60.798 (89%)</cell></row><row><cell>Tweets (test)</cell><cell>7.219 (11%)</cell></row><row><cell>Topics</cell><cell>10</cell></row><row><cell>Users</cell><cell>154</cell></row><row><cell>Date start (train)</cell><cell>2011-12-02</cell></row><row><cell>Date end (train)</cell><cell>2012-04-10</cell></row><row><cell>Date start (test)</cell><cell>2011-12-02</cell></row><row><cell>Date end (test)</cell><cell>2012-04-10</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Overview of<ref type="bibr" target="#b46">TASS 2016</ref> </figDesc><table><row><cell>Entity</cell><cell>Train</cell><cell>Test</cell></row><row><cell>PP</cell><cell>205</cell><cell>125</cell></row><row><cell>PSOE</cell><cell>136</cell><cell>70</cell></row><row><cell>C's</cell><cell>119</cell><cell>87</cell></row><row><cell>Podemos</cell><cell>98</cell><cell>80</cell></row><row><cell>IU</cell><cell>111</cell><cell>43</cell></row><row><cell>UPyD</cell><cell>97</cell><cell>124</cell></row><row><cell>Total</cell><cell>766</cell><cell>529</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Number of tweets per entity and per corpus subset</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>Participant groups</figDesc><table><row><cell>Group</cell><cell>Report</cell></row><row><cell></cell><cell>ELiRF-UPV en TASS 2016:</cell></row><row><cell>ELiRF</cell><cell>Análisis de Sentimientos en</cell></row><row><cell></cell><cell>Twitter</cell></row><row><cell></cell><cell>GTI at TASS 2016:</cell></row><row><cell>GTI</cell><cell>Supervised Approach for Aspect Based Sentiment</cell></row><row><cell></cell><cell>Analysis in Twitter</cell></row><row><cell></cell><cell>JACERONG at TASS 2016:</cell></row><row><cell>jacerong</cell><cell>An Ensemble Classifier for Sentiment Analysis of Spanish</cell></row><row><cell></cell><cell>Tweets at Global Level</cell></row><row><cell></cell><cell>LABDA at the 2016 TASS</cell></row><row><cell>LABDA</cell><cell>challenge task: using word embedding for the sentiment</cell></row><row><cell></cell><cell>analysis task</cell></row><row><cell>SINAI</cell><cell>Participación de SINAI en TASS 2016</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 :</head><label>4</label><figDesc>Participant reports</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5 :</head><label>5</label><figDesc>Results for Task 1, 5 levels</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 6 :</head><label>6</label><figDesc>Results for Task 1, 3 levels</figDesc><table><row><cell>Overview of TASS 2016</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head>Table 7 :</head><label>7</label><figDesc>Results for Task 2</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_11"><head>Number of systems Rate of tweets</head><label></label><figDesc></figDesc><table><row><cell>0</cell><cell>0.056%</cell></row><row><cell>1</cell><cell>0.065%</cell></row><row><cell>2</cell><cell>0.063%</cell></row><row><cell>3</cell><cell>0.067%</cell></row><row><cell>4</cell><cell>0.059%</cell></row><row><cell>5</cell><cell>0.061%</cell></row><row><cell>6</cell><cell>0.074%</cell></row><row><cell>7</cell><cell>0.078%</cell></row><row><cell>8</cell><cell>0.081%</cell></row><row><cell>9</cell><cell>0.112%</cell></row><row><cell>10</cell><cell>0.122%</cell></row><row><cell>11</cell><cell>0.082%</cell></row><row><cell>12</cell><cell>0.062%</cell></row><row><cell>13</cell><cell>0.011%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_12"><head>Table 8 :</head><label>8</label><figDesc>Rate of tweets rightly classified (6 classes) by a number of systems</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_13"><head>Beating Barça by 17 points in the Copa is amazing Polarity: P+ Id: 177439342497767424</head><label></label><figDesc></figDesc><table><row><cell>hahahahahaha "@Absolutexe: ¿Le</cell></row><row><cell>han cambiado ya el nombre a la</cell></row><row><cell>Junta de Andalucía por la Banda de</cell></row><row><cell>Andalucía o aún no?"</cell></row><row><cell>hahahahahaha "@Absolutexe: Has the</cell></row><row><cell>Junta de Andalucía renamed Gang of</cell></row><row><cell>Andalucía or not yet?"</cell></row><row><cell>Polarity: N+</cell></row><row><cell>Id: 177439342497767424</cell></row><row><cell>Rubalcaba pide a Rajoy que</cell></row><row><cell>presente ya los Presupuestos y dice</cell></row><row><cell>que no lo hace porque espera a las</cell></row><row><cell>elecciones andaluzas</cell></row><row><cell>Rubalcaba requires Rajoy to submit the</cell></row><row><cell>Budget and says that he didn't because</cell></row><row><cell>he is waiting the results of the elections</cell></row><row><cell>in Andalucia</cell></row><row><cell>Polarity: NONE</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_14"><head></head><label></label><figDesc>Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento *</figDesc><table><row><cell>Turney, P. D. 2002. Thumbs up or thumbs</cell><cell></cell><cell></cell><cell></cell></row><row><cell>down?: Semantic orientation applied to</cell><cell></cell><cell></cell><cell></cell></row><row><cell>unsupervised classification of reviews. In</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Proceedings of the 40th Annual Meeting on</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Association for Computational Linguistics,</cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="5">ACL '02, pp: 417-424. Association for Evaluation of Reduced Dimension Vector Text Representation Models for</cell></row><row><cell cols="2">Computational Linguistics, Stroudsburg, Sentiment Analysis</cell><cell></cell><cell></cell></row><row><cell>PA, USA. doi:10.3115/1073083.1073153.</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Villena-Román, J., Sara, L. S., Eugenio M. C., Edgar Casasola Murillo</cell><cell cols="3">Gabriela Marín Raventós</cell></row><row><cell>and José Carlos G. C. 2013. TASS -Universidad de Costa Rica</cell><cell cols="3">Universidad de Costa Rica</cell></row><row><cell>Workshop on Sentiment Analysis at SEPLN. San José, Costa Rica</cell><cell cols="2">San José, Costa Rica</cell><cell></cell></row><row><cell>Revista de Procesamiento del Lenguaje edgar.casasola@ucr.ac.cr</cell><cell cols="3">gabriela.marin@ucr.ac.cr</cell></row><row><cell>Natural, 50, pp 37-44.</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Villena-Román, J., Janine G. M., Sara L. S. and</cell><cell></cell><cell></cell><cell></cell></row><row><cell>José Carlos G. C. 2014. 
TASS 2013 -A</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Second Step in Reputation Analysis in</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Spanish. Revista de Procesamiento del</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Lenguaje Natural, 52, pp 37-44.</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="4">Pang, B., Lillian Lee and Shivakumar</cell></row><row><cell></cell><cell>Vaithyanathan.</cell><cell>2002.</cell><cell>Thumbs</cell><cell>up?:</cell></row><row><cell></cell><cell cols="4">Sentiment classification using machine</cell></row><row><cell></cell><cell cols="4">learning techniques. In Proceedings of the</cell></row><row><cell></cell><cell cols="4">ACL-02 Conference on Empirical Methods</cell></row><row><cell></cell><cell cols="4">in Natural Language Processing -Volume</cell></row><row><cell></cell><cell cols="4">10, EMNLP '02, páginas 79-86. Association</cell></row><row><cell></cell><cell cols="4">for Computational Linguistics, Stroudsburg,</cell></row><row><cell></cell><cell cols="4">PA, USA. doi:10.3115/1118693.1118704.</cell></row><row><cell></cell><cell cols="4">Pang, B. and Lillian Lee (2008). Opinion</cell></row><row><cell></cell><cell cols="4">mining and sentiment analysis. Foundations</cell></row><row><cell></cell><cell cols="4">and Trends in Information Retrieval, 2(1-</cell></row><row><cell></cell><cell>2):1-135.</cell><cell>ISSN</cell><cell cols="2">1554-0669.</cell></row><row><cell></cell><cell cols="2">doi:10.1561/1500000011.</cell><cell></cell></row><row><cell></cell><cell cols="4">Quirós, A., Isabel S. B. and Paloma M. 2016.</cell></row><row><cell></cell><cell cols="4">LABDA at the 2016 TASS challenge task:</cell></row><row><cell></cell><cell cols="4">using word embeddings for the sentiment</cell></row><row><cell></cell><cell cols="4">analysis task. 
In Proceedings of TASS 2016:</cell></row><row><cell></cell><cell cols="4">Workshop on Sentiment Analysis at SEPLN</cell></row><row><cell></cell><cell cols="4">co-located with the 32nd SEPLN</cell></row><row><cell></cell><cell cols="4">Conference (SEPLN 2016), Salamanca,</cell></row><row><cell></cell><cell>September</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_17"><head>Table 3 :</head><label>3</label><figDesc>Results for Sentiment Analysis at global level (3 levels, Full test corpus)</figDesc><table><row><cell></cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>Acc</cell></row><row><cell cols="5">RUN-1 0.411 0.449 0.429 0.527</cell></row><row><cell cols="5">RUN-2 0.412 0.448 0.429 0.527</cell></row><row><cell cols="5">RUN-3 0.402 0.436 0.418 0.549</cell></row><row><cell cols="5">Table 2: Results for Sentiment Analysis at</cell></row><row><cell cols="5">global level (5 levels, Full test corpus)</cell></row><row><cell>Run</cell><cell>P</cell><cell>R</cell><cell>F1</cell><cell>Acc</cell></row><row><cell cols="5">RUN-1 0.506 0.510 0.508 0.652</cell></row><row><cell cols="5">RUN-2 0.508 0.508 0.508 0.652</cell></row><row><cell cols="5">RUN-3 0.512 0.511 0.511 0.653</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_18"><head>Table 1 :</head><label>1</label><figDesc>Performance on the test set in the six-labels evaluation</figDesc><table><row><cell cols="2">Experiment Accuracy</cell><cell>Macro-Precision</cell><cell>Macro-Recall</cell><cell>Macro-F1</cell></row><row><cell>run-1</cell><cell>0.614</cell><cell>0.471</cell><cell>0.531</cell><cell>0.499</cell></row><row><cell>run-2</cell><cell>0.619</cell><cell>0.476</cell><cell>0.535</cell><cell>0.504</cell></row><row><cell>run-3</cell><cell>0.620</cell><cell>0.477</cell><cell>0.532</cell><cell>0.503</cell></row><row><cell cols="2">Experiment Accuracy</cell><cell>Macro-Precision</cell><cell>Macro-Recall</cell><cell>Macro-F1</cell></row><row><cell>run-1</cell><cell>0.702</cell><cell>0.564</cell><cell>0.565</cell><cell>0.564</cell></row><row><cell>run-2</cell><cell>0.704</cell><cell>0.567</cell><cell>0.568</cell><cell>0.567</cell></row><row><cell>run-3</cell><cell>0.705</cell><cell>0.568</cell><cell>0.567</cell><cell>0.568</cell></row><row><cell cols="5">Table 2: Performance on the test set in the</cell></row><row><cell cols="3">four-labels evaluation</cell><cell></cell><cell></cell></row><row><cell cols="5">Class Precision Recall F1-score</cell></row><row><cell>P</cell><cell>0.755</cell><cell>0.786</cell><cell cols="2">0.770</cell></row><row><cell>NEU</cell><cell>0.128</cell><cell>0.093</cell><cell cols="2">0.107</cell></row><row><cell>N</cell><cell>0.631</cell><cell>0.812</cell><cell cols="2">0.710</cell></row><row><cell cols="2">NONE 0.758</cell><cell>0.578</cell><cell cols="2">0.656</cell></row><row><cell cols="5">Table 3: Discriminative power for each class</cell></row><row><cell cols="3">in the four-labels evaluation</cell><cell></cell><cell></cell></row><row><cell cols="5">evaluation, and of 0.2% in the four-labels</cell></row><row><cell cols="5">evaluation; instead, a negligible gain occurs</cell></row><row><cell cols="5">among the "run-2" and" run-3" 
experiments,</cell></row><row><cell cols="5">taking additionally into account the compu-</cell></row><row><cell cols="4">tational cost of running the latter.</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_24"><head>Table 1 :</head><label>1</label><figDesc>Results for polarity feature ablation experiments on stompol corpus</figDesc><table><row><cell></cell><cell cols="2">Accuracy Improvement</cell></row><row><cell>Word token</cell><cell>56.12</cell><cell></cell></row><row><cell>+Lemmas</cell><cell>57.64</cell><cell>+1.52%</cell></row><row><cell>+pos tags</cell><cell>58.26</cell><cell>+0.62%</cell></row><row><cell>+Aspects</cell><cell>59.94</cell><cell>+1.68%</cell></row><row><cell>+Negations</cell><cell>60.60</cell><cell>+0.66%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_25"><head>Table 2 :</head><label>2</label><figDesc>Results of different approaches in 2015/2016 tass editions on stompol corpus</figDesc><table><row><cell>-UPV</cell><cell>2016</cell><cell>63.3</cell></row><row><cell>ELiRF</cell><cell>2015</cell><cell>63.3</cell></row><row><cell>GTI</cell><cell>2016</cell><cell>60.6</cell></row><row><cell>LyS-1</cell><cell>2015</cell><cell>59.9</cell></row><row><cell>TID-spark</cell><cell>2015</cell><cell>55.7</cell></row><row><cell>Lys-2</cell><cell>2015</cell><cell>54.0</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Publicado en http://ceur-ws.org/. CEUR-WS.org es una publicación en serie con ISSN reconocido ISSN 1613-0073</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">http://snowball.tartarus.org/algorithms/spanish/stop.txt</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">https://www.meaningcloud.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">Experiments showed that not-lemmatized text performed better in all settings, hence the best settings reported here is using not-lematized model</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_4">http://dumps.wikimedia.org/eswiki</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_5">http://scikit-learn.org/ Participación de SINAI en<ref type="bibr" target="#b46">TASS 2016</ref> </note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_6">Participación de SINAI en<ref type="bibr" target="#b46">TASS 2016</ref> </note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_7">http://nlp.lsi.upc.edu/freeling/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_8">http://www.csie.ntu.edu.tw/˜cjlin/libsvm/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_9">http://www.csie.ntu.edu.tw/˜cjlin/liblinear/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_10">http://scikit-learn.org/stable/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_11">Álvarez-López, M. Fernández-Gavilanes, S. García-Méndez, J. Juncal-Martínez, F. J. González-Castaño</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work has been partially supported by a grant from the Fondo Europeo of Desarrollo Regional (FEDER) and REDES project (TIN2015-65136-C2-1-R) from the Spanish Government.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by eGovernAbility-Access project (TIN2014-52665-C2-2-R).</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task Antonio Quirós, Isabel Segura-Bedmar, Paloma Martínez .LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task Antonio Quirós, Isabel Segura-Bedmar, Paloma Martínez .* This work was supported by eGovernAbility-Access project (TIN2014-52665-C2-2-R). * This work was partially supported by the Ministerio de Economía y Competitividad under project COINS (TEC2013-47016-C2-1-R) and by Xunta de Galicia (GRC2014/046).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>GTI at TASS 2016: Supervised Approach for Aspect Based</head><p>Sentiment Analysis in Twitter * GTI en TASS 2016: Una aproximación supervisada para el de sentimiento aspectos en Twitter Tamara Álvarez-López, Milagros Fernández-Gavilanes, Silvia García-Méndez, Jonathan Juncal-Martínez, Francisco Javier González-Castaño GTI Research Group, AtlantTIC University of Vigo, 36310 Vigo, Spain {talvarez,mfgavilanes,sgarcia,jonijm}@gti.uvigo.es, javier@det.uvigo.es</p><p>Resumen: Este artículo describe la participación del grupo de investigación GTI, del centro AtlantTIC, perteneciente a la Universidad de Vigo, en el tass 2016. Este taller es un evento enmarcado dentro de la XXXII edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural. En este trabajo se propone una aproximación supervisada, basada en clasificadores, para la tarea de análisis de sentimiento basado en aspectos. Mediante esta técnica hemos conseguido mejorar las prestaciones de ediciones anteriores, obteniendo una solución acorde con el estado del arte actual. Palabras clave: Análisis de sentimiento, aspectos, SVM, aprendizaje automático, Twitter Abstract: This paper describes the participation of the GTI research group of AtlantTIC, University of Vigo, in tass 2016. This workshop is framed within the XXXII edition of the Annual Congress of the Spanish Society for Natural Language Processing event. In this work we propose a supervised approach based on classifiers, for the aspect based sentiment analysis task. Using this technique we managed to improve the performance of previous years, obtaining a solution reflecting the actual state-of-the-art. Keywords: Sentiment analysis, aspects, SVM, machine learning, Twitter</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">JACERONG at TASS 2016: An Ensemble Classifier for Sentiment Analysis of Spanish Tweets at Global Level</title>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Amir Hussain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Cerón ; Guzmán</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-94-007-5070-8</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<meeting>TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)<address><addrLine>Salamanca</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Netherlands</publisher>
			<date type="published" when="2012">2012. 2016. September</date>
			<biblScope unit="volume">2</biblScope>
		</imprint>
	</monogr>
	<note>Sentic Computing. Techniques, Tools and Applications</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Evaluación de Modelos de Representación del Texto con Vectores de Dimensión Reducida para Análisis de Sentimiento</title>
		<author>
			<persName><forename type="first">E</forename><surname>Casasola Murillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Gabriela</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<meeting>TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)<address><addrLine>Salamanca</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-09">2016. September</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">ELiRF-UPV en TASS 2016: Análisis de Sentimientos en Twitter</title>
		<author>
			<persName><forename type="first">Ll</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ferran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<meeting>TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)<address><addrLine>Salamanca</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-09">2016. September</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Participación de SINAI en TASS 2016</title>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Corazza</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Lavelli</surname></persName>
		</editor>
		<editor>
			<persName><surname>Zanoli</surname></persName>
		</editor>
		<meeting>TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)<address><addrLine>Salamanca</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016. 2016</date>
			<biblScope unit="page">71</biblScope>
		</imprint>
	</monogr>
	<note>A knowledge-poor approach to chemical-disease relation extraction</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Spanish Billion Words Corpus and Embeddings</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cardellino</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016-03">2016. March</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Modelling public sentiment in twitter: using linguistic patterns to enhance supervised learning</title>
		<author>
			<persName><forename type="first">P</forename><surname>Chikersal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Siong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Intelligent Text Processing and Computational Linguistics</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="49" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of tass 2016</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Villena-Román</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A U</forename><surname>López</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN colocated with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<meeting>TASS 2016: Workshop on Sentiment Analysis at SEPLN colocated with the 32nd SEPLN Conference (SEPLN 2016)<address><addrLine>Salamanca, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-09">2016. September</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Semantic orientation for polarity classification in spanish reviews</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Molina-González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>-T. Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Perea-Ortega</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">18</biblScope>
			<biblScope unit="page" from="7250" to="7257" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Opinion mining and sentiment analysis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and trends in information retrieval</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="1" to="135" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Exploring word embedding for drug name recognition</title>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Suárez-Paniagua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martınez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIXTH INTERNATIONAL WORKS-HOP ON HEALTH TEXT MINING AND INFORMATION ANALYSIS (LOUHI)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page">64</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Parsing with compositional vector grammars</title>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACL (1)</title>
				<imprint>
			<date type="published" when="2013">2013a</date>
			<biblScope unit="page" from="455" to="465" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Recursive deep models for semantic compositionality over a sentiment treebank</title>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Perelygin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Potts</surname></persName>
		</author>
		<author>
			<persName><surname>Citeseer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference Companion on World Wide Web, WWW&apos;16 Companion</title>
				<editor>
			<persName><forename type="first">J</forename><forename type="middle">A</forename></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>León-Guzmán</surname></persName>
		</editor>
		<meeting>the 25th International Conference Companion on World Wide Web, WWW&apos;16 Companion</meeting>
		<imprint>
			<date type="published" when="2013">2013b. 2016</date>
			<biblScope unit="volume">1631</biblScope>
			<biblScope unit="page" from="605" to="610" />
		</imprint>
	</monogr>
	<note>International World Wide Web Conferences Steering Committee</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Overview of tass 2016</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Villena-Román</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Urena-López</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<meeting>TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)<address><addrLine>Salamanca, Spain, September</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A meta-analysis of state-of-the-art electoral prediction from Twitter data</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gayo-Avello</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soc. Sci. Comput. Rev</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="649" to="679" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Lexical normalisation of short text messages: Makn sens a #Twitter</title>
		<author>
			<persName><forename type="first">B</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies -Volume 1, HLT&apos;11</title>
				<meeting>the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies -Volume 1, HLT&apos;11<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="368" to="378" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Scoring, weighting and the vector space model</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">An Introduction to Information Retrieval</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Cambridge University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A new anew: evaluation of a word list for sentiment analysis in microblogs</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">Å</forename><surname>Nielsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ESWC2011 Workshop on &apos;Making Sense of Microposts&apos;: Big things come in small packages</title>
				<meeting>the ESWC2011 Workshop on &apos;Making Sense of Microposts&apos;: Big things come in small packages</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="93" to="98" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Freeling 3.0: Towards wider multilinguality</title>
		<author>
			<persName><forename type="first">L</forename><surname>Padró</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stanilovsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Language Resources and Evaluation Conference (LREC 2012)</title>
				<meeting>the Language Resources and Evaluation Conference (LREC 2012)<address><addrLine>Istanbul, Turkey</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2012-05">2012. May</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Thumbs up?: Sentiment classification using machine learning techniques</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vaithyanathan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing -Volume 10</title>
				<meeting>the ACL-02 Conference on Empirical Methods in Natural Language Processing -Volume 10</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="79" to="86" />
		</imprint>
	</monogr>
	<note>EMNLP &apos;02</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Elhuyar at tass 2013</title>
		<author>
			<persName><forename type="first">X</forename><surname>Saralegi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Vicente</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sentiment Analysis Workshop at SEPLN (TASS2013)</title>
				<meeting>the Sentiment Analysis Workshop at SEPLN (TASS2013)</meeting>
		<imprint>
			<date type="published" when="2013-09">2013. September</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">On the usefulness of lexical and syntactic processing in polarity classification of twitter messages</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vilares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gómez-Rodríguez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Learning deep architectures for ai</title>
		<author>
			<persName><forename type="first">Bibliografía</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Foundations and trends in Machine Learning</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="127" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Cross-linguistic sentiment analysis: From english to spanish</title>
		<author>
			<persName><forename type="first">Julian</forename><surname>Brooke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Milan</forename><surname>Tofiloski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maite</forename><surname>Taboada</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">RANLP 2009 Organising Committee / ACL</title>
				<editor>
			<persName><forename type="first">En</forename><surname>Galia</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Angelova</forename></persName>
		</editor>
		<editor>
			<persName><forename type="first">Kalina</forename><surname>Bontcheva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Ruslan</forename><surname>Mitkov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicolas</forename><surname>Nicolov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nikolai</forename><surname>Nikolov</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="50" to="54" />
		</imprint>
	</monogr>
	<note>editores</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">A unified architecture for natural language processing: Deep neural networks with multitask learning</title>
		<author>
			<persName><forename type="first">Ronan</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Weston</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Machine Learning, ICML &apos;08</title>
				<meeting>the 25th International Conference on Machine Learning, ICML &apos;08<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="160" to="167" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Participación de SINAI DW2Vec en TASS 2015</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<ptr target=".org" />
	</analytic>
	<monogr>
		<title level="m">Proc. of TASS 2015: Workshop on Sentiment Analysis at SEPLN</title>
				<meeting>of TASS 2015: Workshop on Sentiment Analysis at SEPLN</meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2015">2015. 1397</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Sentiment analysis of spanish tweets using a ranking algorithm and skipgrams</title>
		<author>
			<persName><forename type="first">Javi</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoan</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>José</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patricio</forename><surname>Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrés</forename><surname>Martínez-Barco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rafael</forename><surname>Montoyo</surname></persName>
		</author>
		<author>
			<persName><surname>Muñoz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the TASS workshop at SEPLN</title>
				<meeting>of the TASS workshop at SEPLN</meeting>
		<imprint>
			<date type="published" when="2013">2013. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Overview of tass 2016</title>
		<author>
			<persName><forename type="first">Miguel</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julio</forename><surname>Ángel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eugenio</forename><surname>Villena-Román</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manuel</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Carlos Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Teresa Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><surname>Alfonso Ureña-López</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">En Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SEPLN 2016)</title>
				<meeting><address><addrLine>Salamanca, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-09">2016. September</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Reducing the dimensionality of data with neural networks</title>
		<author>
			<persName><forename type="first">Geoffrey</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ruslan</surname></persName>
		</author>
		<author>
			<persName><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">313</biblScope>
			<biblScope unit="issue">5786</biblScope>
			<biblScope unit="page" from="504" to="507" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Elirfupv en tass 2014: Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en twitter</title>
		<author>
			<persName><forename type="first">Lluís</forename><forename type="middle">F</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Pla</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the TASS workshop at SEPLN</title>
				<meeting>of the TASS workshop at SEPLN</meeting>
		<imprint>
			<date type="published" when="2014">2014. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Elirf-upv en tass 2015: Análisis de sentimientos en twitter</title>
		<author>
			<persName><forename type="first">Lluís-F</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Davide</forename><surname>Buscaldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of TASS 2015: Workshop on Sentiment Analysis at SEPLN. CEUR-WS.org</title>
				<meeting>of TASS 2015: Workshop on Sentiment Analysis at SEPLN. CEUR-WS.org</meeting>
		<imprint>
			<date type="published" when="2015">2015. 1397</date>
			<biblScope unit="page" from="35" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
		<idno>CoRR, abs/1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Participación de SINAI Word2Vec en TASS 2014</title>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the TASS workshop at SEPLN</title>
				<meeting>of the TASS workshop at SEPLN</meeting>
		<imprint>
			<date type="published" when="2014">2014. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in python</title>
		<author>
			<persName><forename type="first">Fabian</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gaël</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexandre</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bertrand</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Olivier</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mathieu</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ron</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><surname>Others</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">Tass: Detecting sentiments in spanish tweets</title>
		<author>
			<persName><forename type="first">Saralegi</forename><surname>Urizar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xabier</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Iñaki</forename><surname>San</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vicente</forename><surname>Roncal</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note>En TASS 2012 Working Notes</note>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Semi-supervised recursive autoencoders for predicting sentiment distributions</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><forename type="middle">H</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">En Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP &apos;11</title>
		<title level="s">Association for Computational Linguistics</title>
		<meeting><address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="151" to="161" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<monogr>
		<title level="m">Future information technology</title>
				<imprint>
			<publisher>Springer</publisher>
			<biblScope unit="page" from="34" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Análisis de sentimientos, detección de tópicos y análisis de sentimientos de aspectos en twitter</title>
		<author>
			<persName><forename type="first">Lluís</forename><forename type="middle">F</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Davide</forename><surname>Buscaldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Elirfupv en tass</title>
				<imprint>
			<date type="published" when="2014">2015. 2014. 2014. TASS2014</date>
		</imprint>
	</monogr>
	<note>Elirf-upv en tass 2015: Análisis de en twitter</note>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">twitter using a support vector machine approach</title>
		<author>
			<persName><surname>Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Víctor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lluís-F</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><surname>Hurtado</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Dsic-elirf at semeval-2016 task 4: Message polarity classification in</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Bilingual Experiments on an Opinion Comparable Corpus</title>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Molina-González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Ureña-López</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</title>
				<meeting>the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="87" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Freeling 3.0: Towards wider multilinguality</title>
		<author>
			<persName><forename type="first">Lluís</forename><surname>Padró</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evgeny</forename><surname>Stanilovsky ; Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">En Proceedings of the Language Resources and Evaluation Conference (LREC 2012)</title>
				<meeting><address><addrLine>Istanbul</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2012. 2011</date>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
		</imprint>
	</monogr>
	<note>Scikit-learn: Machine learning in Python</note>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Tass-2013: Análisis de sentimientos en twitter</title>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">-F</forename><surname>Lluís</surname></persName>
		</author>
		<author>
			<persName><surname>Hurtado</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">En Proceedings of the TASS workshop at SEPLN 2013. IV Congreso Español de Informática</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">Political tendency identification in twitter using sentiment analysis techniques</title>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">-</forename><forename type="middle">F</forename><surname>Lluís</surname></persName>
		</author>
		<author>
			<persName><surname>Hurtado</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">En Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, páginas</title>
				<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>City University and Association for Computational Linguistics</publisher>
			<date type="published" when="2014-08">2014a. August</date>
			<biblScope unit="page" from="183" to="192" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<author>
			<persName><forename type="first">Ferran</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">-</forename><forename type="middle">F</forename><surname>Lluís</surname></persName>
		</author>
		<author>
			<persName><surname>Hurtado</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">editores, Natural Language Processing and Information Systems</title>
		<title level="s">de Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">Elisabeth</forename><surname>Métais</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Mathieu</forename><surname>Roche</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Maguelonne</forename><surname>Teisseire</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014b</date>
			<biblScope unit="volume">8455</biblScope>
			<biblScope unit="page" from="208" to="213" />
		</imprint>
	</monogr>
	<note>Sentiment analysis in twitter for spanish</note>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">Elhuyar at tass 2013</title>
		<author>
			<persName><forename type="first">Xabier</forename><surname>Saralegi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">San</forename><surname>Iñaki</surname></persName>
		</author>
		<author>
			<persName><surname>Vicente</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the TASS workshop at SEPLN 2013</title>
		<title level="s">IV Congreso Español de Informática</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">Gti at semeval-2016 task 5: Svm and crf for aspect detection and unsupervised aspect-based sentiment analysis</title>
		<author>
			<persName><forename type="first">Elirf-Upv En Tass ;</forename></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Juncal-Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fernández-Gavilanes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Costa-Montenegro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>González-Castaño</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SemEval</title>
				<meeting>SemEval</meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
			<biblScope unit="page" from="306" to="311" />
		</imprint>
	</monogr>
	<note>Análisis de Sentimientos en Twitter References Alvarez-López</note>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">Freeling 1.3: Syntactic and semantic services in an open-source NLP library</title>
		<author>
			<persName><forename type="first">J</forename><surname>Atserias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Casas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Comelles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Padró</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Padró</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC</title>
				<meeting>LREC</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="48" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Learning deep architectures for AI</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Found. Trends Mach. Learn</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="127" />
			<date type="published" when="2009-01">2009. January</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<monogr>
		<title level="m" type="main">On using Twitter to monitor political sentiment and predict election results</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bermingham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Smeaton</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<analytic>
		<title level="a" type="main">Cross-linguistic sentiment analysis: From english to spanish</title>
		<author>
			<persName><forename type="first">J</forename><surname>Brooke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tofiloski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taboada</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">RANLP 2009 Organising Committee / ACL</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Angelova</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bontcheva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Mitkov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Nicolov</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Nikolov</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="50" to="54" />
		</imprint>
	</monogr>
	<note>RANLP</note>
</biblStruct>

<biblStruct xml:id="b51">
	<analytic>
		<title level="a" type="main">Libsvm: a library for support vector machines</title>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology (TIST)</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">27</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">Comparative experiments on sentiment classification for online product reviews</title>
		<author>
			<persName><forename type="first">H</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mittal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Datar</surname></persName>
		</author>
		<author>
			<persName><surname>Santos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st National Conference on Artificial Intelligence -Volume 2, AAAI&apos;06</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Gatti</surname></persName>
		</editor>
		<meeting>the 21st National Conference on Artificial Intelligence -Volume 2, AAAI&apos;06</meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2006">2006. 2014</date>
			<biblScope unit="page" from="69" to="78" />
		</imprint>
	</monogr>
	<note>COLING</note>
</biblStruct>

<biblStruct xml:id="b53">
	<analytic>
		<title level="a" type="main">Lexical normalization of spanish tweets with preprocessing rules, domain-specific edit distances, and language models</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">R</forename><surname>Fabo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cuadros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Etchegoyhen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Tweet Normalization Workshop co-located with 29th Conference of the Spanish Society for Natural Language Processing (SEPLN 2013)</title>
				<meeting>the Tweet Normalization Workshop co-located with 29th Conference of the Spanish Society for Natural Language Processing (SEPLN 2013)<address><addrLine>Madrid, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013-09-20">2013. September 20th, 2013</date>
			<biblScope unit="page" from="59" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b54">
	<analytic>
		<title level="a" type="main">Unsupervised method for sentiment analysis in online texts</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fernández-Gavilanes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Álvarez-López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Juncal-Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Costa-Montenegro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>González-Castaño</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="page" from="57" to="75" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b55">
	<analytic>
		<title level="a" type="main">Lexical normalisation of short text messages: Makn sens a #twitter</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Villena-Román</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Ureña-López ; Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies -Volume 1, HLT &apos;11</title>
				<meeting>the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies -Volume 1, HLT &apos;11<address><addrLine>Salamanca, Spain; B. and T. Baldwin; Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2011">2016. 2011</date>
			<biblScope unit="page" from="368" to="378" />
		</imprint>
	</monogr>
	<note>Proceedings of TASS 2016: Workshop on Sentiment Analysis at SEPLN co-located with the 32nd SEPLN Conference (SE-PLN 2016)</note>
</biblStruct>

<biblStruct xml:id="b56">
	<analytic>
		<title level="a" type="main">ELiRF-UPV en TASS 2015: Análisis de sentimientos en Twitter</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">-</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN colocated with 31st SEPLN Conference (SE-PLN 2015)</title>
				<meeting>TASS 2015: Workshop on Sentiment Analysis at SEPLN colocated with 31st SEPLN Conference (SE-PLN 2015)<address><addrLine>Alicante, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015-09-15">2015. September 15, 2015</date>
			<biblScope unit="page" from="75" to="79" />
		</imprint>
	</monogr>
	<note>caldi</note>
</biblStruct>

<biblStruct xml:id="b57">
	<analytic>
		<title level="a" type="main">A clustering-based approach on sentiment analysis</title>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Intelligent Systems and Knowledge Engineering (ISKE), 2010 International Conference on</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="331" to="337" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b58">
	<analytic>
		<title level="a" type="main">Sentiment Analysis and Opinion Mining</title>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Synthesis Lectures on Human Language Technologies</title>
				<imprint>
			<publisher>Morgan &amp; Claypool Publishers</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b59">
	<analytic>
		<title level="a" type="main">Tass 2015 -the evolution of the spanish opinion mining systems</title>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez-Cámara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Villena-Román</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>García-Morera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="33" to="40" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b60">
	<analytic>
		<title level="a" type="main">Nrc-canada: Building the state-of-the-art in sentiment analysis of tweets</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mohammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kiritchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013)</title>
				<meeting>the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013)<address><addrLine>Atlanta, Georgia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013-06">2013. June</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b61">
	<analytic>
		<title level="a" type="main">Twitter as a corpus for sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Paroubek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC&apos;10)</title>
				<editor>
			<persName><forename type="first">N</forename><forename type="middle">C C</forename><surname>Chair</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">)</forename></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Rosner</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Tapias</surname></persName>
		</editor>
		<meeting>the Seventh International Conference on Language Resources and Evaluation (LREC&apos;10)<address><addrLine>Valletta, Malta</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b62">
	<analytic>
		<title level="a" type="main">Twitter, myspace, digg: Unsupervised sentiment analysis in social media</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paltoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thelwall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology (TIST)</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">66</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b63">
	<analytic>
		<title level="a" type="main">Opinion mining and sentiment analysis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Found. Trends Inf. Retr</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="1" to="135" />
			<date type="published" when="2008-01">2008. January</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b64">
	<analytic>
		<title level="a" type="main">Sentiment classification using sociolinguistic clusters</title>
		<author>
			<persName><forename type="first">S</forename><surname>Park</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located with 31st SEPLN Conference (SEPLN 2015)</title>
				<meeting>TASS 2015: Workshop on Sentiment Analysis at SEPLN co-located with 31st SEPLN Conference (SEPLN 2015)<address><addrLine>Alicante, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015-09-15">2015. September 15, 2015</date>
			<biblScope unit="page" from="99" to="104" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b65">
	<analytic>
		<title level="a" type="main">Lys at TASS 2015: Deep learning experiments for sentiment analysis on spanish tweets</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vilares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Doval</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gómez-Rodríguez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TASS 2015: Workshop on Sentiment Analysis at SEPLN colocated with 31st SEPLN Conference (SE-PLN 2015)</title>
				<meeting>TASS 2015: Workshop on Sentiment Analysis at SEPLN colocated with 31st SEPLN Conference (SE-PLN 2015)<address><addrLine>Alicante, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015-09-15">2015. September 15, 2015</date>
			<biblScope unit="page" from="47" to="52" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b66">
	<analytic>
		<title level="a" type="main">Topic sentiment analysis in Twitter: A graph-based hashtag sentiment classification approach</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM &apos;11</title>
				<meeting>the 20th ACM International Conference on Information and Knowledge Management, CIKM &apos;11<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1031" to="1040" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b67">
	<analytic>
		<title level="a" type="main">Ecnu: Leveraging word embeddings to boost performance for paraphrase in Twitter</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</title>
				<meeting>the 9th International Workshop on Semantic Evaluation (SemEval 2015)<address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015-06">2015. June</date>
			<biblScope unit="page" from="34" to="39" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
