<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multi-task Learning in Deep Neural Networks at EVALITA 2018</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Cimino</surname></persName>
							<email>andrea.cimino@ilc.cnr.it</email>
						</author>
						<author>
							<persName><forename type="first">Lorenzo</forename><surname>De Mattei</surname></persName>
							<email>lorenzo.demattei@di.unipi.it</email>
						</author>
						<author>
							<persName><forename type="first">Felice</forename><surname>Dell'Orletta</surname></persName>
							<email>felice.dellorletta@ilc.cnr.it</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR)</orgName>
								<orgName type="laboratory">ItaliaNLP Lab - www.italianlp.it</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Dipartimento di Informatica</orgName>
								<orgName type="institution">Università di Pisa</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Multi-task Learning in Deep Neural Networks at EVALITA 2018</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3DE7E2935D43335DA963FD3D438EDBBA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we describe the system used for our participation in the ABSITA, GxG, HaSpeeDe and IronITA shared tasks of the EVALITA 2018 conference. We developed a classifier that can be configured to use Bidirectional Long Short Term Memories (Bi-LSTMs) or linear Support Vector Machines as learning algorithms. When using Bi-LSTMs we tested a multi-task learning approach, which optimizes the parameters of the network by exploiting all the annotated dataset labels simultaneously, and a multi-classifier voting approach based on a k-fold technique. In addition, we developed generic and task-specific word embedding lexicons to further improve classification performance. When evaluated on the official test sets, our system ranked 1st in almost all subtasks of each shared task, showing the effectiveness of our approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Description of the System</head><p>The EVALITA 2018 edition has been one of the most successful in terms of number of shared tasks proposed. In particular, a large part of the proposed tasks can be tackled as binary document classification problems. This gave us the opportunity to test a new system specifically designed for this EVALITA edition.</p><p>We implemented a system which relies on Bi-LSTMs <ref type="bibr" target="#b13">(Hochreiter et al., 1997)</ref> and SVMs, two learning algorithms widely used for document classification. The learning algorithm can be selected in a configuration file. In this work we used the Keras <ref type="bibr" target="#b4">(Chollet, 2016)</ref> library and the liblinear <ref type="bibr" target="#b11">(Fan et al., 2008)</ref> library to generate the Bi-LSTM and SVM statistical models respectively. Since our approach relies on morphosyntactically tagged text, training and test data were automatically tagged by the PoS tagger described in <ref type="bibr">(Cimino and Dell'Orletta, 2016)</ref>. We also developed sentiment polarity and word embedding lexicons with the aim of improving the overall accuracy of our system. Some task-specific adaptations were made to account for the characteristics of each shared task.</p><p>In the Aspect-based Sentiment Analysis (ABSITA) 2018 shared task <ref type="bibr" target="#b1">(Basile et al., 2018)</ref>, participants were asked, given a training set of Booking hotel reviews, to detect the aspect categories mentioned in a review among a set of 8 fixed categories (ACD task) and to assign a polarity (positive, negative, positive-negative or neutral) to each detected aspect (ACP task). Since each Booking review in the training set is labeled with 24 binary labels (8 indicating the presence of an aspect, 8 indicating positivity and 8 indicating negativity w.r.t. an aspect), we addressed the ABSITA 2018 shared task as 24 binary classification problems. Due to the label constraints in the dataset, if our system classified an aspect as not present, we forced the related positive and negative labels to be classified as not positive and not negative.</p><p>The Gender X-Genre (GxG) 2018 shared task <ref type="bibr" target="#b10">(Dell'Orletta and Nissim, 2018)</ref> consisted in the automatic identification of the gender (female or male) of the author of a text. Five different training and test sets were provided by the organizers for five different genres: Children essays (CH), Diary (DI), Journalism (JO), Twitter posts (TW) and YouTube comments (YT). For each test set, participants were requested to submit a system trained on the in-domain training dataset and a system trained on cross-domain data only.</p><p>The IronITA task <ref type="bibr" target="#b5">(Cignarella et al., 2018)</ref> consisted of two subtasks. In the first, participants had to automatically label a message as ironic or not. The second was more fine-grained: given a message, participants had to classify it as sarcastic, ironic but not sarcastic, or not ironic.</p><p>Finally, the HaSpeeDe 2018 shared task <ref type="bibr" target="#b3">(Bosco et al., 2018)</ref> consisted in automatically annotating messages from Twitter and Facebook with a boolean value indicating the presence (or not) of hate speech. In particular, three tasks were proposed: HaSpeeDe-FB, where only the Facebook dataset could be used to classify Facebook comments; HaSpeeDe-TW, where only Twitter data could be used to classify tweets; and Cross-HaSpeeDe, where only the Facebook dataset could be used to classify the Twitter test set and vice versa (Cross-HaSpeeDe FB, Cross-HaSpeeDe TW).</p></div>
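The label-constraint post-processing described above can be sketched as follows; the dict keyed by (aspect, kind) pairs and the function name are hypothetical, a minimal illustration rather than the actual implementation:

```python
def enforce_aspect_constraints(labels):
    """Post-process the 24 binary ABSITA labels (8 aspects x presence/pos/neg):
    if an aspect is predicted as absent, force its polarity labels to 0.
    `labels` maps (aspect_id, kind) -> 0/1, with kind in {"presence", "pos", "neg"}."""
    fixed = dict(labels)
    for aspect in range(8):
        if fixed[(aspect, "presence")] == 0:
            fixed[(aspect, "pos")] = 0
            fixed[(aspect, "neg")] = 0
    return fixed

# toy predictions: aspect 3 present and positive, aspect 5 absent but
# (inconsistently) predicted positive
preds = {(a, k): 0 for a in range(8) for k in ("presence", "pos", "neg")}
preds[(3, "presence")] = 1
preds[(3, "pos")] = 1
preds[(5, "pos")] = 1
fixed = enforce_aspect_constraints(preds)
```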
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Lexical Resources</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.1">Automatically Generated Sentiment Polarity Lexicons for Social Media</head><p>For the purpose of modeling word usage in generic, positive and negative contexts of social media texts, we developed three lexicons, which we named TW_GEN, TW_NEG and TW_POS. Each lexicon reports the relative frequency of a word in one of three different corpora. The main idea behind these lexicons is that positive and negative words should show a higher relative frequency in TW_POS and TW_NEG respectively. The three corpora were generated by first downloading approximately 50,000,000 tweets and then applying filtering rules to build the positive and negative corpora (no filtering rules were applied to build the generic corpus). To build the corpus of positive tweets, we constrained the downloaded tweets to contain at least one positive emoji among hearts and kisses.</p><p>Since emojis are rarely used in negative tweets, to build the negative corpus we created a list of words commonly used in negative language and constrained the tweets to contain at least one of these words.</p></div>
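A minimal sketch of how such relative-frequency lexicons could be built; the emoji set, the negative word list and the toy tweets are invented for illustration (the paper does not publish its filter lists):

```python
from collections import Counter

# Hypothetical filters: a couple of positive emojis (hearts/kisses) and a
# tiny negative-word list standing in for the real, unpublished ones.
POSITIVE_EMOJIS = {"❤", "😘"}
NEGATIVE_WORDS = {"odio", "schifo"}

def relative_frequencies(tweets):
    """Map each token to its relative frequency in the given corpus."""
    counts = Counter(tok for tweet in tweets for tok in tweet.split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

tweets = ["che schifo questo hotel", "❤ amo questo hotel", "odio il lunedì"]
tw_gen = relative_frequencies(tweets)  # generic: no filtering
tw_pos = relative_frequencies([t for t in tweets if set(t) & POSITIVE_EMOJIS])
tw_neg = relative_frequencies([t for t in tweets if set(t.split()) & NEGATIVE_WORDS])
```

A positive word such as "amo" then shows up in TW_POS with a higher relative frequency than in TW_GEN, which is exactly the signal these lexicons are meant to capture.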
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.2">Automatically Translated Sentiment Polarity Lexicons</head><p>We used the Multi-Perspective Question Answering (hereafter MPQA) Subjectivity Lexicon <ref type="bibr" target="#b20">(Wilson et al., 2005)</ref>, which consists of approximately 8,200 English words with their associated polarity. To use this resource for the Italian language, we translated all the entries through the Yandex translation service<ref type="foot" target="#foot_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.3">Word Embedding Lexicons</head><p>We generated four word embedding lexicons using the word2vec<ref type="foot" target="#foot_1">2</ref> toolkit <ref type="bibr">(Mikolov et al., 2013)</ref>. As recommended in <ref type="bibr">(Mikolov et al., 2013)</ref>, we used the CBOW model, which learns to predict the word in the middle of a symmetric window based on the sum of the vector representations of the words in the window. For our experiments, we considered a context window of 5 words. The word embedding lexicons were built starting from the following corpora, which were tokenized and PoS-tagged by the PoS tagger for Twitter described in <ref type="bibr">(Cimino and Dell'Orletta, 2016</ref>):</p><p>• The first lexicon was built using the itWaC corpus<ref type="foot" target="#foot_2">3</ref>. The itWaC corpus is a 2 billion word corpus constructed from the Web by limiting the crawl to the .it domain and using medium-frequency words from the Repubblica corpus and basic Italian vocabulary lists as seeds.</p><p>• The second lexicon was built using the set of 50,000,000 tweets we downloaded to build the sentiment polarity lexicons described in subsection 1.1.1.</p><p>• The third and the fourth lexicons were built using a corpus of 538,835 Booking reviews scraped from the web. Since each review on the Booking site is split into a positive section (indicated by a plus mark) and a negative section (indicated by a minus mark), we split these reviews, obtaining 338,494 positive reviews and 200,341 negative reviews. Starting from the positive and the negative reviews, we obtained two different word embedding lexicons.</p><p>Each entry of the lexicons maps a pair (word, PoS) to the associated word embedding, allowing us to mitigate polysemy problems which can lead to poorer classification results. In addition, the corpora were preprocessed in order to 1) map each URL to the word "URL" and 2) distinguish between all-uppercased and non-uppercased words (e.g. "mai" vs "MAI"), since all-uppercased words are usually used in negative contexts. Since each task has its own characteristics in terms of the information the classifiers need to capture, we decided to use a subset of the word embeddings in each task. Table <ref type="table">1</ref> sums up the word embeddings used in each shared task.</p></div>
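The (word, PoS) keying can be sketched as a simple lookup; the random vectors and helper names are hypothetical stand-ins for the word2vec-trained embeddings, and the extra component flags unknown words:

```python
import numpy as np

EMB_DIM = 128

def make_lookup(keys):
    """Hypothetical lexicon: maps (word, coarse PoS) pairs to 128-d vectors
    (random here; word2vec-trained in the real system)."""
    rng = np.random.default_rng(0)
    return {key: rng.standard_normal(EMB_DIM) for key in keys}

def embed(lexicon, word, pos):
    """Return 129 dims: the 128-d embedding plus one 'unknown word' flag."""
    vec = lexicon.get((word, pos))
    if vec is None:
        return np.concatenate([np.zeros(EMB_DIM), [1.0]])
    return np.concatenate([vec, [0.0]])

# "vista" gets distinct vectors as a noun ("view") and as a past participle,
# which is the polysemy the (word, PoS) keying is meant to mitigate
lex = make_lookup([("vista", "NOUN"), ("vista", "VERB")])
noun_vec = embed(lex, "vista", "NOUN")
unk_vec = embed(lex, "xyzzy", "NOUN")
```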
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table">1</ref>: Word embedding lexicons (Booking, itWaC, Twitter) used by our system in each shared task (ABSITA, GxG, HaSpeeDe, IronITA).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">The Classifier</head><p>The classifier we built for our participation in the tasks was designed with the aim of testing different learning algorithms and learning strategies. More specifically, our classifier implements two workflows, which allow testing SVMs and recurrent neural networks as learning algorithms. In addition, when recurrent neural networks are chosen, our classifier can perform neural network multi-task learning (MTL) using an external dataset in order to share knowledge between related tasks. We decided to test the MTL strategy since, as demonstrated in (De <ref type="bibr" target="#b9">Mattei et al., 2018)</ref>, it can improve the performance of the classifier on emotion recognition tasks. The benefits of this approach were investigated also by <ref type="bibr" target="#b17">Søgaard and Goldberg (2016)</ref>, who showed that MTL is appealing since it allows incorporating prior knowledge about task hierarchies into neural network architectures. Furthermore, <ref type="bibr" target="#b15">Ruder et al. (2017)</ref> showed that MTL is useful for combining even loosely related tasks, letting the network automatically learn the task hierarchy.</p><p>Both workflows share a common pattern used in machine learning classifiers, consisting of a document feature extraction phase and a learning phase based on the extracted features; but since SVMs and Bi-LSTMs take as input 2-dimensional and 3-dimensional tensors respectively, a different feature extraction phase is involved for each algorithm. In addition, when the Bi-LSTM workflow is selected, the classifier can take as input an extra file which is used for the MTL learning approach. Furthermore, when the Bi-LSTM workflow is selected, the classifier performs a 5-fold training approach: we build 5 different models using different training and validation sets. These models are then exploited in the classification phase: the assigned labels are the ones that obtain the majority among all the models. The 5-fold strategy was chosen in order to generate a global model which should be less prone to overfitting or underfitting w.r.t. a single learned model.</p></div>
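The 5-fold training and majority-voting strategy can be sketched as follows; the contiguous fold splitting and the dummy model predictions are illustrative, not the actual classifier:

```python
from collections import Counter

def kfold_indices(n, k=5):
    """Split range(n) into k (train, validation) index pairs with contiguous folds."""
    splits = []
    for i in range(k):
        val = set(range(i * n // k, (i + 1) * n // k))
        train = [j for j in range(n) if j not in val]
        splits.append((train, sorted(val)))
    return splits

def majority_vote(per_model_predictions):
    """Given one label sequence per model, return the per-document majority label."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*per_model_predictions)]

# predictions of five hypothetical fold models on four test documents
per_model = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
voted = majority_vote(per_model)
```

In the real system each of the 5 models is a Bi-LSTM trained on a different train/validation split, and the vote above is taken per label.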
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.1">The SVM classifier</head><p>The SVM classifier exploits a wide set of features ranging across different levels of linguistic description. With the exception of the word embedding combination, these features were already tested in our previous participation in the EVALITA 2016 SENTIPOLC edition <ref type="bibr">(Cimino et al., 2016)</ref>. The features are organised into three main categories: raw and lexical text features, morpho-syntactic features and lexicon features. Due to size constraints we report only the feature names.</p><p>Raw and Lexical Text Features: number of tokens, character n-grams, word n-grams, lemma n-grams, repetition of character n-grams, number of mentions, number of hashtags, punctuation.</p></div>
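A few of the raw and lexical text features can be sketched as follows; the helper names and the example tweet are hypothetical, and real extraction would work on the PoS-tagged text:

```python
def char_ngrams(text, n):
    """All contiguous character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(tokens, n):
    """All contiguous word n-grams of the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def raw_text_features(text):
    """A small subset of the raw/lexical features named above."""
    tokens = text.split()
    return {
        "n_tokens": len(tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "word_bigrams": word_ngrams(tokens, 2),
        "char_trigrams": char_ngrams(text, 3),
    }

feats = raw_text_features("@hotel ottima #vista mare")
```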
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Morpho-syntactic Features coarse grained</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.2">The Deep Neural Network classifier</head><p>We tested two different models based on Bi-LSTMs: one that learns to classify the labels without sharing information among the labels in the training phase (single-task learning, STL), and one that learns to classify the labels exploiting the related information through a shared Bi-LSTM (multi-task learning, MTL). We employed Bi-LSTM architectures since they capture long-range dependencies from both directions of a document by constructing bidirectional links in the network <ref type="bibr" target="#b16">(Schuster et al., 1997)</ref>. We applied a dropout factor to both the input gates and the recurrent connections in order to prevent overfitting, a typical issue in neural networks <ref type="bibr">(Gal and Ghahramani, 2015)</ref>. We chose a dropout factor of 0.50.</p><p>For what concerns GxG, since we had to deal with longer documents such as news articles, we employed a two-layer Bi-LSTM encoder: the first Bi-LSTM layer encodes each sentence as a token sequence, and the second layer encodes the sequence of sentences. For what concerns IronITA, we added a task-specific Bi-LSTM for each subtask before the dense layer.</p><p>Figure <ref type="figure" target="#fig_1">1</ref> shows a graphical representation of the STL and MTL architectures we employed. For what concerns the optimization process, the binary cross-entropy function is used as loss function and optimization is performed by the rmsprop optimizer <ref type="bibr" target="#b19">(Tieleman and Hinton, 2012)</ref>.</p></div>
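Assuming a Keras backend as cited above, a shared-encoder MTL model of this kind could be sketched as follows; the layer sizes, sequence length, feature dimension and number of labels are illustrative, not the values used in the experiments:

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, FEAT_DIM, N_LABELS = 50, 150, 3  # illustrative sizes

inputs = keras.Input(shape=(SEQ_LEN, FEAT_DIM))
# shared Bi-LSTM encoder with dropout 0.5 on inputs and recurrent connections
encoded = layers.Bidirectional(
    layers.LSTM(64, dropout=0.5, recurrent_dropout=0.5))(inputs)
# MTL: one sigmoid head per binary label, all sharing the encoder parameters
outputs = [layers.Dense(1, activation="sigmoid", name=f"label_{i}")(encoded)
           for i in range(N_LABELS)]
model = keras.Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="binary_crossentropy")
```

The STL variant simply trains one such network per label instead of attaching all the heads to a single shared encoder.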
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Each input word is represented by a vector</head><p>which is composed by: Word embeddings: the concatenation of the word embeddings extracted from the available word embedding lexicons (128 dimensions for each word embedding); for each word embedding an extra component was added to handle the "unknown word" case (1 dimension for each lexicon used). Word polarity: the word polarity obtained by exploiting the sentiment polarity lexicons; this results in 3 components, one for each possible lexicon outcome (negative, neutral, positive) (3 dimensions), and we assumed that a word not found in the lexicons has neutral polarity. Automatically generated sentiment polarity lexicons for social media: the presence or absence of the word in a lexicon and its relative frequency if found; since we built TW_GEN, TW_POS and TW_NEG, 6 dimensions are needed, 2 for each lexicon. Coarse-grained part-of-speech: 13 dimensions. End of sentence: a component (1 dimension) indicating whether the sentence has been completely read.</p></div>
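Putting the components together, the per-token input vector can be assembled as follows (a sketch with two word embedding lexicons; the helper name and default values are illustrative):

```python
import numpy as np

EMB, N_LEX, POS_TAGS = 128, 2, 13  # two word-embedding lexicons in this sketch

def token_vector(word_embs, unk_flags, polarity, tw_lexicons, cgpos, eos):
    """Concatenate the components listed above into one per-token input vector:
    N_LEX * 128 embedding dims + N_LEX unknown-word flags + 3 polarity dims +
    6 TW-lexicon dims + 13 coarse PoS dims + 1 end-of-sentence dim."""
    return np.concatenate(list(word_embs) + [unk_flags, polarity, tw_lexicons, cgpos, eos])

vec = token_vector(
    [np.zeros(EMB), np.zeros(EMB)],   # one embedding per lexicon
    np.zeros(N_LEX),                  # unknown-word flags
    np.array([0.0, 1.0, 0.0]),        # polarity one-hot: neutral by default
    np.zeros(6),                      # TW_GEN/TW_POS/TW_NEG presence + rel. freq.
    np.eye(POS_TAGS)[0],              # coarse-grained PoS one-hot
    np.array([0.0]),                  # end-of-sentence flag
)
```

With two lexicons this gives 2*128 + 2 + 3 + 6 + 13 + 1 = 281 dimensions per token; the count varies with the number of lexicons used in each task.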
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Results and Discussion</head><p>Table <ref type="table" target="#tab_0">2</ref> reports the official results obtained by our best runs on all the task we participated. As it can be noted our system performed extremely well, achieving the best scores almost in every single subtask. In the following subsections a discussion of the results obtained in each task is provided.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">ABSITA</head><p>We tested five learning configurations of our system based on linear SVM and DNN learning algorithms using the features described in sections 1.2.1 and 1.2.2. All the experiments were aimed at testing the contribution, in terms of f-score, of MTL vs STL, of the k-fold technique and of the external resources. For what concerns the Bi-LSTM learning algorithm, we tested Bi-LSTMs both in the STL and MTL scenarios. In addition, to test the contribution of the Booking word embeddings, we created a configuration which uses a shallow Bi-LSTM in the MTL setting without these embeddings (MTL NO BOOKING-WE). Finally, to test the contribution of the k-fold technique, we created a configuration which does not use it (MTL NO K-FOLD). To obtain fair comparisons in the last case, we ran all the experiments 5 times and averaged the scores of the runs. To test the proposed classification models, we created an internal development set by randomly selecting documents from the training sets distributed by the task organizers; the resulting development set is composed of approximately 10% (561 documents) of the whole training set. The MTL configuration was the best performing among all the models, but the difference in terms of f-score among the DNN configurations is small.</p><p>When analyzing the results obtained on the ACP task, we can notice remarkable differences among the performances of the models. Again the linear SVM was the worst performing model, this time with a difference of 6 f-score points with respect to MTL, the best performing model on the task. It is interesting to notice that the results achieved by the DNN models show bigger differences in terms of f-score with respect to the ACD task: this suggests that the external resources and the k-fold technique contributed significantly to obtaining the best result in the ACP task. The configuration that does not use the k-fold technique scored 2 f-score points less than the MTL configuration. We can also notice that the Booking word embeddings were particularly helpful in this task: the MTL NO BOOKING-WE configuration in fact scored 5 points less than the best configuration. The results obtained on the internal development set led us to choose the models for the official runs on the provided test set. Table <ref type="table">4</ref> reports the overall accuracies achieved by all our classifier configurations on the official test set; the official submitted runs are starred in the table.</p><p>As can be noticed, the best scores in both the ACD and ACP tasks were obtained by the DNN models. Surprisingly, the differences in terms of f-score were reduced in both tasks, with the exception of linear SVM, which performed 4 and 8 f-score points less in the ACD and ACP tasks respectively when compared to the best DNN models. The STL model outperformed the MTL models in the ACD task, even though the difference in terms of f-score is not relevant. When the results on ACP are considered, the MTL model outperformed all the other models, even though the difference in terms of f-score with respect to the STL model is not noticeable. It is worth noticing that the k-fold technique and the Booking word embeddings seemed to again contribute to the final accuracy of the MTL system: the MTL NO BOOKING-WE model and the MTL NO K-FOLD model scored 1.2 and 1.5 f-score points less than the MTL system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">GxG</head><p>We tested three different learning configurations of our system based on linear SVM and DNN learning algorithms using the features described in sections 1.2.1 and 1.2.2. For what concerns the Bi-LSTM learning algorithm, we tested both the STL and MTL approaches. We tested the three configurations for each of the five in-domain subtasks and for each of the five cross-domain subtasks. Table <ref type="table">8</ref>: Classification results of the different learning models on the official test set in terms of accuracy for the cross-domain tasks (* marks runs that outperformed all the systems that participated in the task).</p><p>Tables <ref type="table" target="#tab_6">7</ref> and 8 report the overall accuracy, computed as the average accuracy for the two classes (male and female), achieved by the models on the official test sets for the in-domain and the cross-domain tasks respectively (* marks the runs that obtained the best results in the competition). For what concerns the in-domain subtasks, the performances appear not to be in line with the ones obtained on the development set, but our models still outperform the other participants' systems in four out of five subtasks. The MTL model provided the best results for the Children and Diary test sets, while on the other test sets all the models performed quite poorly. Again, when trained on all the datasets, in- and cross-domain, the SVM (SVMa) performs worse than when trained on in-domain data only (SVM). For what concerns the cross-domain subtasks, while our model achieves the best performance in three out of five subtasks, the results confirm poor performance over all the subtasks, again indicating that the models have difficulties in cross-domain generalization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">HaSpeeDe</head><p>We tested seven learning configurations of our system based on linear SVM and DNN learning algorithms using the features described in sections 1.2.1 and 1.2.2. All the experiments were aimed at testing the contribution, in terms of f-score, of the number of layers, of MTL vs STL, of the k-fold technique and of the external resources. For what concerns the Bi-LSTM learning algorithm, we tested one- and two-layer Bi-LSTMs both in the STL and MTL scenarios. In addition, to test the contribution of the sentiment lexicon features, we created a configuration which uses a 2-layer Bi-LSTM in the MTL setting without these features (1L MTL NO SNT). Finally, to test the contribution of the k-fold technique, we created a configuration which does not use it (1 STL NO K-FOLD). To obtain fair results in the last case, we ran all the experiments 5 times and averaged the scores of the runs. To test the proposed classification models, we created two internal development sets, one for each dataset, by randomly selecting documents from the training sets distributed by the task organizers. The resulting development sets are composed of 10% (300 documents) of the whole training sets.</p><p>Table <ref type="table">9</ref> reports the overall accuracies achieved by the models on our internal development sets for all the tasks. In addition, the results of a baseline system (baseline row), which always emits the most probable label according to the label distribution in the training set, are reported. The accuracy is calculated as the f-score obtained using the evaluation tool provided by the organizers. 
For what concerns the k-fold learning strategy, we can notice that the results achieved by the model not using it (1 STL NO K-FOLD) are always lower than those of the counterpart which used the k-fold approach (+2.5 f-score points gained in the C TW task), showing the benefits of this technique. These results led us to choose the models for the official runs on the provided test set. Table <ref type="table">10</ref> reports the overall accuracies achieved by all our classifier configurations on the official test set; the official submitted runs are starred in the table. The best official system row reports, for each task, the best official results submitted by the participants of the EVALITA 2018 HaSpeeDe shared task. As we can note, the best scores in each task were obtained by the Bi-LSTMs in the MTL setting, showing that MTL networks seem to be more effective than STL networks. For what concerns the Twitter in-domain task, we obtained results similar to the development set ones. A noticeable drop in performance is observed in the FB task w.r.t. the development set (-5 f-score points on average); still, the Bi-LSTM models outperformed the linear SVM model by 5 f-score points. In the cross-domain tasks, all the models performed similarly to what was observed on the development set. It is worth observing that the linear SVM performed almost as a baseline system in the C FB task. In addition, in the same task the model exploiting the sentiment lexicon (1L MTL) showed a better performance (+1.5 f-score points) w.r.t. the 1L MTL NO SNT model. It is worth noticing that the k-fold learning strategy was beneficial also on the official test set: the 1L STL model obtained better results (approximately +2 f-score points in each task) w.r.t. the model that did not use the k-fold learning strategy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">IronITA</head><p>We tested the four designed learning configurations of our system based on linear SVM and deep neural network (DNN) learning algorithms using the features described in sections 1.2.1 and 1.2.2. To select the proposed classification models, we used k-fold cross validation (k=4).</p><p>Table <ref type="table" target="#tab_8">11</ref> reports the overall average f-score achieved by the models on the cross-validation sets for both the irony and sarcasm detection tasks.</p><p>We can observe that the SVM obtains good results on irony detection, but the MTL neural approach noticeably outperforms it. We also note that the usage of the additional Polarity and Hate Speech datasets leads to better performance. These results led us to choose the MTL models trained with the additional datasets for the two official run submissions.</p><p>Table <ref type="table" target="#tab_9">12</ref> reports the overall accuracies achieved by all our classifier configurations on the official test set; the official submitted runs are starred in the table. The accuracies have been computed in terms of f-score using the official evaluation script. We submitted the runs MTL+Polarity and MTL+Polarity+Hate. The run MTL+Polarity ranked first in subtask A and third in subtask B on the official leaderboard; the run MTL+Polarity+Hate ranked second in subtask A and fourth in subtask B.</p><p>The results on the test set confirm the good performance of the SVM classifier on the irony detection task and that the MTL neural approaches outperform the SVM. The model trained on the IronITA and SENTIPOLC datasets outperformed all the systems that participated in subtask A, while in subtask B it slightly underperformed the best participant system. The model trained on the IronITA, SENTIPOLC and HaSpeeDe datasets outperformed all the systems that participated in subtask A except our model trained on the IronITA and SENTIPOLC datasets only. However, the best scores in both tasks were obtained by the MTL network trained on the IronITA dataset only: this model would have outperformed all the systems submitted by all participants to both subtasks. It seems that for these tasks the usage of additional datasets leads to overfitting issues.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusions</head><p>In this paper we reported the results of our participation in the ABSITA, GxG, HaSpeeDe and IronITA shared tasks of the EVALITA 2018 conference. By resorting to a system which uses Support Vector Machines and Deep Neural Networks (DNNs) as learning algorithms, we achieved the best scores in almost every task, showing the effectiveness of our approach. In addition, when a DNN was used as learning algorithm, we introduced a new multi-task learning approach and a majority vote classification approach to further improve the overall accuracy of our system. The proposed system proved a very effective solution, achieving the first position in almost all subtasks of each shared task.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Part-Of-Speech n-grams, Fine grained Part-Of-Speech n-grams, Coarse grained Part-Of-Speech distribution Lexicon features Emoticons Presence, Lemma sentiment polarity n-grams, Polarity modifier, PMI score, sentiment polarity distribution, Most frequent sentiment polarity, Sentiment polarity in text sections, Word embeddings combination.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: STL and MTL architectures.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2 :</head><label>2</label><figDesc>Classification results of our best runs on the ABSITA, GxG, HaSpeeDe and IronITA test sets.</figDesc><table><row><cell>Task</cell><cell cols="2">Our Score Best Score</cell><cell>Rank</cell></row><row><cell></cell><cell>ABSITA</cell><cell></cell><cell></cell></row><row><cell>ACD</cell><cell>0.811</cell><cell>0.811</cell><cell>1</cell></row><row><cell>ACP</cell><cell>0.767</cell><cell>0.767</cell><cell>1</cell></row><row><cell></cell><cell cols="2">GxG IN-DOMAIN</cell><cell></cell></row><row><cell>CH</cell><cell>0.640</cell><cell>0.640</cell><cell>1</cell></row><row><cell>DI</cell><cell>0.676</cell><cell>0.676</cell><cell>1</cell></row><row><cell>JO</cell><cell>0.555</cell><cell>0.585</cell><cell>2</cell></row><row><cell>TW</cell><cell>0.595</cell><cell>0.595</cell><cell>1</cell></row><row><cell>YT</cell><cell>0.555</cell><cell>0.555</cell><cell>1</cell></row><row><cell cols="3">GxG CROSS-DOMAIN</cell><cell></cell></row><row><cell>CH</cell><cell>0.640</cell><cell>0.640</cell><cell>1</cell></row><row><cell>DI</cell><cell>0.595</cell><cell>0.635</cell><cell>2</cell></row><row><cell>JO</cell><cell>0.510</cell><cell>0.515</cell><cell>2</cell></row><row><cell>TW</cell><cell>0.609</cell><cell>0.609</cell><cell>1</cell></row><row><cell>YT</cell><cell>0.513</cell><cell>0.513</cell><cell>1</cell></row><row><cell></cell><cell>HaSpeeDe</cell><cell></cell><cell></cell></row><row><cell>TW</cell><cell>0.799</cell><cell>0.799</cell><cell>1</cell></row><row><cell>FB</cell><cell>0.829</cell><cell>0.829</cell><cell>1</cell></row><row><cell>C TW</cell><cell>0.699</cell><cell>0.699</cell><cell>1</cell></row><row><cell>C FB</cell><cell>0.607</cell><cell>0.654</cell><cell>5</cell></row><row><cell></cell><cell>IronITA</cell><cell></cell><cell></cell></row><row><cell>IRONY</cell><cell>0.730</cell><cell>0.730</cell><cell>1</cell></row><row><cell>SARCASM</cell><cell>0.516</cell><cell>0.520</cell><cell>3</cell></row><row><cell cols="4">an internal development set by randomly selecting documents from the training sets distributed by the task organizers. The resulting development set comprises approximately 10% (561 documents) of the whole training set.</cell></row><row><cell cols="2">Configuration</cell><cell cols="2">ACD ACP</cell></row><row><cell>baseline</cell><cell></cell><cell cols="2">0.313 0.197</cell></row><row><cell>linear SVM</cell><cell></cell><cell cols="2">0.797 0.739</cell></row><row><cell>STL</cell><cell></cell><cell cols="2">0.821 0.795</cell></row><row><cell cols="2">MTL</cell><cell cols="2">0.824 0.804</cell></row><row><cell cols="2">MTL NO K-FOLD</cell><cell cols="2">0.819 0.782</cell></row><row><cell cols="2">MTL NO BOOKING-WE</cell><cell cols="2">0.817 0.757</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 :</head><label>3</label><figDesc>Classification results (micro f-score) of the different learning models on our ABSITA development set.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>reports the overall accuracies achieved by the models on the internal development set for all the tasks. In addition, the results of the baseline system (baseline row), which always emits the most probable label according to the label distributions in the training set, are reported. The accuracy is calculated as the micro f-score obtained using the evaluation tool provided by the organizers. As regards the ACD task, it is worth noting that the models based on DNNs always outperform the linear SVM, even though the difference in terms of f-score is small (approximately 2 f-score points).</figDesc><table><row><cell>Configuration</cell><cell>ACD</cell><cell>ACP</cell></row><row><cell>baseline</cell><cell>0.338</cell><cell>0.199</cell></row><row><cell>linear SVM</cell><cell>0.772*</cell><cell>0.686*</cell></row><row><cell>STL</cell><cell>0.814</cell><cell>0.765</cell></row><row><cell>MTL</cell><cell>0.811*</cell><cell>0.767*</cell></row><row><cell>MTL NO K-FOLD</cell><cell>0.801</cell><cell>0.755</cell></row><row><cell>MTL NO BOOKING-WE</cell><cell>0.808</cell><cell>0.753</cell></row><row><cell cols="3">Table 4: Classification results (micro f-score) of the different learning models on the ABSITA official test set.</cell></row></table></figure>
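The micro f-score used in these tables pools true positives, false positives and false negatives over all documents before computing precision and recall. A minimal sketch (our illustration, not the organizers' evaluation tool; the aspect labels are invented):

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-label annotations.

    gold, pred: lists of sets of labels, one set per document.
    Counts are pooled over all documents, so frequent labels
    weigh more than rare ones.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but wrong
        fn += len(g - p)  # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [{"cleanliness"}, {"staff", "location"}]
pred = [{"cleanliness", "staff"}, {"staff"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Under this scheme tp=2, fp=1, fn=1 in the example, giving precision = recall = 2/3.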
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>To test the proposed classification models, we created internal development sets by randomly selecting documents from the training sets distributed by the task organizers. The resulting development sets comprise approximately 10% of each data set. As for the in-domain task, we trained the SVM classifier both on in-domain data only and on in-domain plus cross-domain data.</figDesc><table><row><cell cols="2">Model CH</cell><cell>DI</cell><cell>JO</cell><cell>TW</cell><cell>YT</cell></row><row><cell cols="6">SVMa 0.667 0.626 0.485 0.582 0.611</cell></row><row><cell>SVM</cell><cell cols="5">0.701 0.737 0.560 0.728 0.619</cell></row><row><cell>STL</cell><cell cols="5">0.556 0.545 0.500 0.724 0.596</cell></row><row><cell>MTL</cell><cell cols="5">0.499 0.817 0.625 0.729 0.632</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Classification results of the different learning models on the development set in terms of accuracy for the in-domain tasks.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5</head><label>5</label><figDesc>and 6 report the overall accuracy, computed as the average accuracy over the two classes (male and female), achieved by the models on the development data sets for the in-domain and the cross-domain tasks respectively.</figDesc><table><row><cell cols="2">Model CH</cell><cell>DI</cell><cell>JO</cell><cell>TW</cell><cell>YT</cell></row><row><cell>SVM</cell><cell cols="5">0.530 0.565 0.580 0.588 0.568</cell></row><row><cell>STL</cell><cell cols="5">0.550 0.535 0.505 0.625 0.580</cell></row><row><cell>MTL</cell><cell cols="5">0.523 0.549 0.538 0.500 0.556</cell></row><row><cell cols="6">Table 6: Classification results of the different learning models on the development set in terms of accuracy for the cross-domain tasks.</cell></row><row><cell cols="6">For the in-domain tasks we observe that the SVM performs well on the smaller datasets (Children and Diary), while the MTL neural network has the best overall performance. When trained on all the datasets, in- and cross-domain, the SVM (SVMa) performs worse than when trained on in-domain data only (SVM). As for the cross-domain datasets, we observe poor performance on all the subtasks with all the employed models, implying that the models have difficulty generalizing across domains.</cell></row><row><cell cols="2">Model CH</cell><cell>DI</cell><cell>JO</cell><cell>TW</cell><cell>YT</cell></row><row><cell cols="2">SVMa 0.545</cell><cell>0.514</cell><cell cols="2">0.475 0.539</cell><cell>0.585</cell></row><row><cell>SVM</cell><cell>0.550</cell><cell>0.649</cell><cell cols="2">0.555 0.567</cell><cell>0.555*</cell></row><row><cell>STL</cell><cell>0.545</cell><cell>0.541</cell><cell cols="3">0.500 0.595* 0.512</cell></row><row><cell>MTL</cell><cell cols="4">0.640* 0.676* 0.470 0.561</cell><cell>0.546</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7 :</head><label>7</label><figDesc>Classification results of the different learning models on the official test set in terms of accuracy for the in-domain tasks (* marks runs that outperformed all the systems that participated in the task).</figDesc><table><row><cell cols="2">Model CH</cell><cell>DI</cell><cell>JO</cell><cell>TW</cell><cell>YT</cell></row><row><cell>SVM</cell><cell>0.540</cell><cell cols="3">0.514 0.505 0.586</cell><cell>0.513*</cell></row><row><cell>STL</cell><cell cols="5">0.640* 0.554 0.495 0.609* 0.510</cell></row><row><cell>MTL</cell><cell>0.535</cell><cell cols="3">0.595 0.510 0.500</cell><cell>0.500</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 11 :</head><label>11</label><figDesc>Classification results of the different learning models under k-fold cross-validation in terms of average F1-score.</figDesc><table><row><cell>Configuration</cell><cell cols="2">Irony Sarcasm</cell></row><row><cell>linear SVM</cell><cell cols="2">0.734 0.512</cell></row><row><cell>MTL</cell><cell cols="2">0.745 0.530</cell></row><row><cell>MTL+Polarity</cell><cell cols="2">0.757 0.562</cell></row><row><cell cols="3">MTL+Polarity+Hate 0.760 0.557</cell></row><row><cell>Configuration</cell><cell>Irony</cell><cell>Sarcasm</cell></row><row><cell>baseline-random</cell><cell>0.505</cell><cell>0.337</cell></row><row><cell>baseline-mfc</cell><cell>0.334</cell><cell>0.223</cell></row><row><cell>best participant</cell><cell>0.730</cell><cell>0.52</cell></row><row><cell>linear SVM</cell><cell>0.701</cell><cell>0.493</cell></row><row><cell>MTL</cell><cell>0.736</cell><cell>0.530</cell></row><row><cell>MTL+Polarity</cell><cell cols="2">0.730* 0.516*</cell></row><row><cell cols="3">MTL+Polarity+Hate 0.713* 0.503*</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>Table 12 :</head><label>12</label><figDesc>Classification results of the different learning models on the official test set in terms of F1-score (* submitted run).</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://api.yandex.com/translate/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://code.google.com/p/word2vec/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://wacky.sslmit.unibo.it/doku.php?id=corpora</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 used for this research.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the EVALITA 2016 SENTiment POLarity Classification Task</title>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Danilo</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicole</forename><surname>Novielli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of EVALITA &apos;16, Evaluation of NLP and Speech Tools for Italian</title>
				<meeting>EVALITA &apos;16, Evaluation of NLP and Speech Tools for Italian<address><addrLine>Naples, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-12">2016. December</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">Pierpaolo</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Danilo</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Polignano</surname></persName>
		</author>
		<title level="m">Overview of the EVALITA Aspect-based Sentiment Analysis (ABSITA) Task</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m">Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<meeting>the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)<address><addrLine>Turin, Italy</addrLine></address></meeting>
		<imprint>
			<date>December</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of the Evalita 2018 Hate Speech Detection Task</title>
		<author>
			<persName><forename type="first">Cristina</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felice</forename><surname>Dell'orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Poletto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manuela</forename><surname>Sanguinetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maurizio</forename><surname>Tesconi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of EVALITA &apos;18, Evaluation of NLP and Speech Tools for Italian</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<meeting>EVALITA &apos;18, Evaluation of NLP and Speech Tools for Italian<address><addrLine>Turin, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-12">2018. December</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
<persName><forename type="first">François</forename><surname>Chollet</surname></persName>
		</author>
		<ptr target="https://github.com/fchollet/keras/tree/master/keras" />
		<title level="m">Keras</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">Alessandra</forename><surname>Cignarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Simona</forename><surname>Frenda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cristina</forename><surname>Bosco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
		<title level="m">Overview of the Evalita 2018 Task on Irony Detection in Italian Tweets</title>
				<imprint>
			<publisher>IronITA</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m">Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<meeting>the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)</meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Tandem LSTM-SVM Approach for Sentiment Analysis</title>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felice</forename><surname>Dell'orletta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of EVALITA &apos;16, Evaluation of NLP and Speech Tools for Italian</title>
				<meeting>EVALITA &apos;16, Evaluation of NLP and Speech Tools for Italian<address><addrLine>Naples, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-12">2016. December</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Building the state-of-the-art in POS tagging of Italian Tweets</title>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felice</forename><surname>Dell'orletta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</title>
				<meeting>Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)<address><addrLine>Napoli, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-12-05">2016. December 5-7, 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multi-Task Learning in Deep Neural Network for Sentiment Polarity and Irony classification</title>
		<author>
			<persName><forename type="first">Lorenzo</forename><surname>De Mattei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Cimino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felice</forename><surname>Dell'Orletta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Natural Language for Artificial Intelligence</title>
				<meeting>the 2nd Workshop on Natural Language for Artificial Intelligence<address><addrLine>Trento, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-11-22">2018. November 22-23, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of the EVALITA Cross-Genre Gender Prediction in Italian (GxG) Task</title>
		<author>
			<persName><forename type="first">Felice</forename><surname>Dell'Orletta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Caselli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</editor>
		<meeting>the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research</title>
		<author>
			<persName><forename type="first">Rong-En</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cho-Jui</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiang-Rui</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chih-Jen</forename><surname>Lin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1871" to="1874" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">A theoretically grounded application of dropout in recurrent neural networks</title>
		<author>
			<persName><forename type="first">Yarin</forename><surname>Gal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zoubin</forename><surname>Ghahramani</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1512.05287</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">Sepp</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
	</analytic>
	<monogr>
		<title level="m">Neural Computation</title>
				<imprint>
			<date type="published" when="1997">1997. 2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>Long short-term memory</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">SemEval-2016 task 4: Sentiment analysis in Twitter</title>
		<author>
			<persName><forename type="first">Preslav</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alan</forename><surname>Ritter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sara</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabrizio</forename><surname>Sebastiani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Veselin</forename><surname>Stoyanov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Workshop on Semantic Evaluation</title>
				<meeting>the 10th International Workshop on Semantic Evaluation<address><addrLine>SemEval-</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Sluice networks: Learning what to share between loosely related tasks</title>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Ruder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joachim</forename><surname>Bingel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Isabelle</forename><surname>Augenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anders</forename><surname>Søgaard</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.08142</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Bidirectional recurrent neural networks</title>
		<author>
			<persName><forename type="first">Mike</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kuldip</forename><forename type="middle">K</forename><surname>Paliwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Signal Processing</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="2673" to="2681" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Deep multi-task learning with low level tasks supervised at lower layers</title>
		<author>
			<persName><forename type="first">Anders</forename><surname>Søgaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoav</forename><surname>Goldberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 54th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="231" to="235" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Document modeling with gated recurrent neural network for sentiment classification</title>
		<author>
			<persName><forename type="first">Duyu</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bing</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ting</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of EMNLP 2015</title>
				<meeting>EMNLP 2015<address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1422" to="1432" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude</title>
		<author>
			<persName><forename type="first">Tijmen</forename><surname>Tieleman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoffrey</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">COURSERA: Neural Networks for Machine Learning</title>
				<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Recognizing contextual polarity in phraselevel sentiment analysis</title>
		<author>
			<persName><forename type="first">Theresa</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zornitsa</forename><surname>Kozareva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Preslav</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sara</forename><surname>Rosenthal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Veselin</forename><surname>Stoyanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alan</forename><surname>Ritter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of HLT-EMNLP 2005</title>
				<meeting>HLT-EMNLP 2005<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACL</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="347" to="354" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">UNIMELB at SemEval-2016 Tasks 4A and 4B: An Ensemble of Neural Networks and a Word2Vec Based Model for Sentiment Classification</title>
		<author>
			<persName><forename type="first">Xingyi</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Huizhi</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Timothy</forename><surname>Baldwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Workshop on Semantic Evaluation</title>
				<meeting>the 10th International Workshop on Semantic Evaluation<address><addrLine>SemEval-</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
