<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tandem LSTM-SVM Approach for Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Cimino</string-name>
          <email>andrea.cimino@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <email>felice.dellorletta@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William Cohen. 2016. Tweet2Vec: Character-Based Distributed Representations for Social Media. In Proceedings of the 54th Annual Meeting of the ACL. Berlin</institution>
          ,
          <addr-line>Germany</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Felice Dell'Orletta. 2009. Ensemble system for Part-of-Speech tagging. In Proceedings of EVALITA '09, Evaluation of NLP and Speech Tools for Italian. December</institution>
          ,
          <addr-line>Reggio Emilia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>In this paper we describe our approach to the EVALITA 2016 SENTIPOLC task. We participated in all the subtasks in the constrained setting: Subjectivity Classification, Polarity Classification and Irony Detection. We developed a tandem architecture in which a Long Short Term Memory (LSTM) recurrent neural network learns the feature space and captures temporal dependencies, while Support Vector Machines (SVMs) perform the classification. The SVMs combine the document embedding produced by the LSTM with a wide set of general-purpose features qualifying the lexical and grammatical structure of the text. We achieved the second best accuracy in Subjectivity Classification, the third position in Polarity Classification and the sixth position in Irony Detection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In this paper we describe the system we used
to address the subtasks of the SENTIPOLC task
at the EVALITA 2016 conference. In this edition we
participated in all the subtasks in the constrained
setting, that is, without using manually annotated
resources other than those distributed by the organizers.
For this participation we developed a method that
combines a Long Short Term Memory recurrent neural
network, used to learn the feature space and to capture
temporal dependencies, with Support Vector Machines for
classification. The SVMs combine the document
representation produced by the LSTM with a wide set of
features describing the lexical and grammatical structure
of the text. With this system we ranked second in
Subjectivity classification, third in Polarity
classification and sixth in Irony detection.</p>
    </sec>
    <sec id="sec-2">
      <title>Description of the system</title>
      <p>
        We addressed the EVALITA 2016 SENTIPOLC
task
        <xref ref-type="bibr" rid="ref1">(Barbieri et al., 2016)</xref>
        as three classification
problems: two binary classification tasks
(Subjectivity Classification and Irony Detection) and a
four-class classification task (Polarity
Classification).
      </p>
      <p>
        We implemented a tandem LSTM-SVM
classifier operating on morpho-syntactically tagged
texts. We used this architecture since similar
systems were successfully employed to tackle
different classification problems such as keyword spotting
        <xref ref-type="bibr" rid="ref11">(Wöllmer et al., 2009)</xref>
        or the automatic estimation
of human affect from the speech signal
        <xref ref-type="bibr" rid="ref12">(Wöllmer et
al., 2010)</xref>
        , showing that tandem architectures
outperform the individual classifiers.
      </p>
      <p>In this work we used the Keras (Chollet, 2016) deep
learning framework and LIBSVM (Chang et al.,
2001) to build the LSTM and the
SVM statistical models, respectively.</p>
      <p>Since our approach relies on
morpho-syntactically tagged texts, both training
and test data were automatically
tagged by the POS tagger described
in (Dell’Orletta, 2009). In addition, in order to
improve the overall accuracy of our system
(described in Section 1.2), we developed the sentiment polarity
and word embedding lexicons<sup>1</sup> described below.</p>
      <p><sup>1</sup> All the created lexicons are made freely available at the
following website: http://www.italianlp.it/.</p>
      <sec id="sec-2-1">
        <title>1.1 Lexical resources</title>
      </sec>
      <sec id="sec-2-2">
        <title>1.1.1 Sentiment Polarity Lexicons</title>
        <p>Sentiment polarity lexicons provide mappings
between a word and its sentiment polarity (positive,
negative, neutral). For our experiments, we used a
publicly available lexicon for Italian and two
English lexicons that we automatically translated. In
addition, we adopted an unsupervised method to
automatically create a lexicon specific to the
language of Italian Twitter.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Existing Sentiment Polarity Lexicons</title>
        <p>We used the Italian sentiment polarity lexicon
(hereafter referred to as OPENER) (Maks et
al., 2014) developed within the OpeNER
European project<sup>2</sup>. This is a freely available lexicon
for the Italian language<sup>3</sup> and includes 24,000
Italian word entries. It was automatically created
using a propagation algorithm, and the most frequent
words were manually reviewed.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Automatically translated Sentiment Polarity Lexicons</title>
        <p>
          The Multi-Perspective Question Answering
(hereafter referred to as MPQA)
Subjectivity Lexicon
          <xref ref-type="bibr" rid="ref10">(Wilson et al., 2005)</xref>
          . This
lexicon consists of approximately 8,200 English
words with their associated polarity. In order
to use this resource for the Italian language,
we translated all the entries through the
Yandex translation service<sup>4</sup>.
        </p>
        <p>
          The Bing Liu Lexicon (hereafter referred to
as BL)
          <xref ref-type="bibr" rid="ref2">(Hu and Liu, 2004)</xref>
          . This lexicon
includes approximately 6,000 English words
with their associated polarity. This resource
was automatically translated by the Yandex
translation service.
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>Automatically created Sentiment Polarity Lexicons</title>
        <p>We built a corpus of positive and negative tweets
following the approach of Mohammad et al. (2013),
adopted in the SemEval 2013 sentiment polarity
detection task. For this purpose we queried the
Twitter API with a set of hashtag seeds that
indicate positive and negative sentiment polarity.
We selected 200 positive word seeds (e.g.
“vincere” to win, “splendido” splendid, “affascinante”
fascinating), and 200 negative word seeds (e.g.
“tradire” betray, “morire” die). These terms were
chosen from the OPENER lexicon. The
resulting corpus is made up of 683,811 tweets extracted
with positive seeds and 1,079,070 tweets extracted
with negative seeds.</p>
        <p><sup>2</sup> http://www.opener-project.eu/
<sup>3</sup> https://github.com/opener-project/public-sentimentlexicons
<sup>4</sup> http://api.yandex.com/translate/</p>
        <p>The main purpose of this procedure was to
assign a polarity score to each n-gram occurring
in the corpus. For each n-gram (we considered
n-grams of up to five tokens) we calculated the
corresponding sentiment polarity score with the following
scoring function: <inline-formula><tex-math>\mathrm{score}(ng) = \mathrm{PMI}(ng, pos) - \mathrm{PMI}(ng, neg)</tex-math></inline-formula>,
where PMI stands for pointwise
mutual information.</p>
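        <p>The scoring function can be illustrated with a minimal sketch. It assumes that tweets are already tokenized, that PMI is estimated from raw n-gram counts in the positive and negative corpora, and that add-one smoothing handles n-grams seen on one side only; the smoothing and the names <monospace>extract_ngrams</monospace> and <monospace>pmi_scores</monospace> are illustrative assumptions, not details taken from the system described here.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def extract_ngrams(tokens, max_n=5):
    """Yield every n-gram (as a tuple) of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def pmi_scores(pos_tweets, neg_tweets, max_n=5, smoothing=1.0):
    """Map each n-gram to PMI(ng, pos) - PMI(ng, neg)."""
    pos_counts = Counter(ng for t in pos_tweets for ng in extract_ngrams(t, max_n))
    neg_counts = Counter(ng for t in neg_tweets for ng in extract_ngrams(t, max_n))
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    scores = {}
    for ng in set(pos_counts) | set(neg_counts):
        # p(ng) cancels in the difference of the two PMI terms, leaving
        # log2(freq(ng, pos) * freq(neg) / (freq(ng, neg) * freq(pos))).
        # The additive smoothing is an assumption of this sketch.
        num = (pos_counts[ng] + smoothing) * neg_total
        den = (neg_counts[ng] + smoothing) * pos_total
        scores[ng] = math.log2(num / den)
    return scores


# Toy corpora: one positive and one negative tweet.
toy_scores = pmi_scores([["bello", "giorno"]], [["brutto", "giorno"]], max_n=2)
```
]]></preformat>
        <p>In this toy example, n-grams seen only with positive seeds get a positive score, n-grams seen only with negative seeds get a negative score, and n-grams equally frequent on both sides score zero.</p>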
      </sec>
      <sec id="sec-2-8">
        <title>1.1.2 Word Embedding Lexicons</title>
        <p>Since the lexical information in tweets can be very
sparse, we built two word embedding lexicons
to overcome this problem.</p>
        <p>
          For this purpose, we trained two predictive
models using the word2vec<sup>5</sup> toolkit
          <xref ref-type="bibr" rid="ref4">(Mikolov et al.,
2013)</xref>
          . As recommended in
          <xref ref-type="bibr" rid="ref4">(Mikolov et al., 2013)</xref>
          ,
we used the CBOW model, which learns to
predict the word in the middle of a symmetric
window based on the sum of the vector
representations of the words in the window. For our
experiments, we considered a context window of
5 words. These models learn lower-dimensional
word embeddings. Embeddings are represented by
a set of latent (hidden) variables, and each word is
a multidimensional vector that represents a specific
instantiation of these variables. We built two Word
Embedding Lexicons starting from the following
corpora:
        </p>
        <p>The first lexicon was built using a tokenized
version of the itWaC corpus<sup>6</sup>. The itWaC
corpus is a 2 billion word corpus constructed
from the Web, limiting the crawl to the .it
domain and using medium-frequency words
from the Repubblica corpus and basic Italian
vocabulary lists as seeds.</p>
        <p>The second lexicon was built from a
tokenized corpus of tweets. This corpus was
collected using the Twitter APIs and is made up
of 10,700,781 Italian tweets.</p>
      </sec>
      <sec id="sec-2-9">
        <title>1.2 The LSTM-SVM tandem system</title>
        <p>
          SVMs are extremely efficient learning algorithms
and are hard to outperform. Unfortunately, this type
of algorithm captures “sparse” and “discrete”
features in document classification tasks, which makes
it really hard to detect relations in sentences,
often the key factor in detecting the
overall sentiment polarity of a document
          <xref ref-type="bibr" rid="ref8">(Tang et al.,
2015)</xref>
          . In contrast, Long Short Term
Memory (LSTM) networks are a specialization of
Recurrent Neural Networks (RNN) that are able
to capture long-term dependencies in a sentence.
This type of neural network was recently tested
on Sentiment Analysis tasks
          <xref ref-type="bibr" rid="ref8">(Tang et al., 2015)</xref>
          ,
          <xref ref-type="bibr" rid="ref13">(Xu et al., 2016)</xref>
          , where it was shown to
outperform commonly used learning algorithms in several
sentiment analysis tasks
          <xref ref-type="bibr" rid="ref6">(Nakov et al., 2016)</xref>
          ,
with improvements of 3-4 points. For this
work, we implemented a tandem LSTM-SVM to
take advantage of the two classification
strategies.
        </p>
        <p><sup>5</sup> http://code.google.com/p/word2vec/
<sup>6</sup> http://wacky.sslmit.unibo.it/doku.php?id=corpora</p>
        <p>Figure 1 shows a graphical representation of
the proposed tandem architecture. This
architecture is composed of 2 sequential machine learning
steps, both involved in the training and classification
phases. In the training phase, the LSTM network
is trained on the training documents and
the corresponding gold labels. Once the
statistical model of the LSTM neural network is
computed, a document vector (document embedding)
is computed for each document of the training set
by feeding the document to the LSTM network and
taking the output of the penultimate network layer
(the layer before the SoftMax classifier). The
document embeddings are used as features during the
training phase of the SVM classifier in
conjunction with a set of widely used document
classification features. Once the training phase of the
SVM classifier is completed, the tandem
architecture is considered trained. The same stages are
involved in the classification phase: for each
document that must be classified, an embedding
vector is obtained from the previously trained
LSTM network. Finally, the embedding is used
jointly with the other document classification features
by the SVM classifier, which outputs the predicted
class.</p>
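        <p>The data flow of the two stages can be sketched as follows. This is only an illustration of how the document embedding and the other features are joined: the functions <monospace>toy_embed</monospace>, <monospace>toy_handcrafted</monospace>, <monospace>toy_fit</monospace> and <monospace>toy_predict</monospace> are hypothetical stand-ins for the trained LSTM, the feature extractor and the SVM, and do not reproduce the Keras or LIBSVM models.</p>
        <preformat><![CDATA[
```python
def tandem_features(document, lstm_embed, handcrafted):
    """Concatenate the LSTM document embedding with hand-crafted features."""
    return list(lstm_embed(document)) + list(handcrafted(document))


def train_tandem(documents, labels, lstm_embed, handcrafted, fit_classifier):
    """Second training step: the LSTM behind lstm_embed is already trained."""
    rows = [tandem_features(d, lstm_embed, handcrafted) for d in documents]
    return fit_classifier(rows, labels)


def classify(document, model, lstm_embed, handcrafted, predict):
    """Classification follows the same two stages as training."""
    return predict(model, tandem_features(document, lstm_embed, handcrafted))


# Toy stand-ins (hypothetical), only to make the data flow runnable.
def toy_embed(doc):
    # Stand-in for the penultimate-layer activations of a trained LSTM.
    return [len(doc) / 100.0, doc.count("!") / 10.0]


def toy_handcrafted(doc):
    # Stand-in for the document classification features.
    return [float(doc.count("#")), float(doc.count("@"))]


def toy_fit(rows, labels):
    # Stand-in for SVM training: simply memorise the labelled rows.
    return list(zip(rows, labels))


def toy_predict(model, row):
    # Stand-in for SVM prediction: label of the nearest memorised row.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda example: sq_dist(example[0], row))[1]


model = train_tandem(["che bello! #felice", "che noia @tizio"], ["pos", "neg"],
                     toy_embed, toy_handcrafted, toy_fit)
prediction = classify("bello! #sole", model, toy_embed, toy_handcrafted, toy_predict)
```
]]></preformat>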
      </sec>
      <sec id="sec-2-10">
        <title>1.2.1 The LSTM network</title>
        <p>In this part, we describe the LSTM model
employed in the tandem architecture. The LSTM unit
was initially proposed by Hochreiter and
Schmidhuber (Hochreiter and Schmidhuber, 1997). LSTM units are
able to propagate an important feature that came
early in the input sequence over a long distance,
thus capturing potential long-distance
dependencies.</p>
        <p>Figure 1: The LSTM-SVM tandem architecture. Components shown:
Unlabeled Tweets and itWaC Corpus; Twitter/itWaC Word Embeddings; LSTM;
Sentence embeddings Extraction; Document feature extraction;
SVM Model Generation; Final statistical model.</p>
        <p>LSTM is a state-of-the-art learning algorithm
for semantic composition and makes it possible to compute
the representation of a document from the
representations of its words with multiple abstraction levels.
Each word is represented by a low-dimensional,
continuous and real-valued vector, also known
as a word embedding, and all the word vectors are
stacked in a word embedding matrix.</p>
        <p>
          We employed a bidirectional LSTM
architecture since this kind of architecture captures
long-range dependencies from both directions
of a document by constructing bidirectional links
in the network
          <xref ref-type="bibr" rid="ref7">(Schuster and Paliwal, 1997)</xref>
          . In addition,
we applied dropout to both the input gates
and the recurrent connections in order to
prevent overfitting, which is a typical issue in
neural networks
          <xref ref-type="bibr" rid="ref8">(Gal and Ghahramani, 2015)</xref>
          . As
suggested in
          <xref ref-type="bibr" rid="ref8">(Gal and Ghahramani, 2015)</xref>
          , we
chose a dropout value in the
optimum range [0.3, 0.5], more specifically 0.45 for
this work. As for the optimization
process, categorical cross-entropy is used as the loss
function and optimization is performed by the
rmsprop optimizer
          <xref ref-type="bibr" rid="ref9">(Tieleman and Hinton, 2012)</xref>
          .
        </p>
        <p>Each input word to the LSTM architecture is
represented by a 262-dimensional vector which is
composed of:</p>
        <p>Word embeddings: the concatenation of the two
word embeddings extracted from the two available
Word Embedding Lexicons (128 dimensions for
each word embedding, a total of 256 dimensions);
for each word embedding an extra component
was added in order to handle the “unknown word” case
(2 dimensions).</p>
        <p>Word polarity: the corresponding word polarity
obtained by exploiting the Sentiment Polarity
Lexicons. This results in 3 components, one for each
possible lexicon outcome (negative, neutral,
positive) (3 dimensions). We assumed that a word not
found in the lexicons has a neutral polarity.</p>
        <p>End of Sentence: a component (1 dimension)
indicating whether or not the sentence was totally
read.</p>
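        <p>As a worked check of the dimensions (128 + 1 + 128 + 1 + 3 + 1 = 262), the following sketch assembles one input vector. The ordering of the components, the use of a zero vector for unknown words and the single merged polarity dictionary are assumptions of this sketch; the names <monospace>lookup</monospace> and <monospace>word_input_vector</monospace> are hypothetical.</p>
        <preformat><![CDATA[
```python
EMB_DIM = 128
POLARITIES = ("negative", "neutral", "positive")


def lookup(word, embedding_lexicon):
    """Return 129 values: the embedding plus an 'unknown word' flag."""
    if word in embedding_lexicon:
        return list(embedding_lexicon[word]) + [0.0]
    # Unknown word: zero embedding (an assumption) and the flag set to 1.
    return [0.0] * EMB_DIM + [1.0]


def word_input_vector(word, twitter_emb, itwac_emb, polarity_lexicon, is_last):
    """Build the 262-dimensional input vector for one word."""
    # Words missing from the polarity lexicon are treated as neutral.
    polarity = polarity_lexicon.get(word, "neutral")
    one_hot = [1.0 if polarity == p else 0.0 for p in POLARITIES]
    end_of_sentence = [1.0 if is_last else 0.0]
    return (lookup(word, twitter_emb) + lookup(word, itwac_emb)
            + one_hot + end_of_sentence)


# Toy lexicons: "bello" is known only to the first embedding lexicon.
vec = word_input_vector("bello", {"bello": [0.1] * EMB_DIM}, {},
                        {"bello": "positive"}, is_last=True)
```
]]></preformat>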
      </sec>
      <sec id="sec-2-11">
        <title>1.2.2 The SVM classifier</title>
        <p>The SVM classifier exploits a wide set of
features ranging across different levels of
linguistic description. With the exception of the word
embedding combination, these features were
already tested in our previous participation at the
EVALITA 2014 SENTIPOLC edition (Cimino et
al., 2014). The features are organised into three
main categories: raw and lexical text features,
morpho-syntactic features and lexicon features.</p>
      </sec>
      <sec id="sec-2-12">
        <title>Raw and Lexical Text Features</title>
        <p>Topic: the manually annotated class of topic
provided by the task organizers for each tweet.</p>
        <p>Number of tokens: number of tokens occurring
in the analyzed tweet.</p>
        <p>Character n-grams: presence or absence of
contiguous sequences of characters in the analyzed
tweet.</p>
        <p>Word n-grams: presence or absence of
contiguous sequences of tokens in the analyzed tweet.</p>
        <p>Lemma n-grams: presence or absence of
contiguous sequences of lemmas occurring in the
analyzed tweet.</p>
      </sec>
      <sec id="sec-2-13">
        <p>Repetition of n-grams chars: presence or absence
of contiguous repetition of characters in the
analyzed tweet.</p>
        <p>Number of mentions: number of mentions (@)
occurring in the analyzed tweet.</p>
        <p>Number of hashtags: number of hashtags
occurring in the analyzed tweet.</p>
        <p>Punctuation: checks whether the analyzed tweet
finishes with one of the following punctuation
characters: “?”, “!”.</p>
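        <p>Several of the raw text features above can be sketched in a few lines. The sketch assumes whitespace tokenization and treats three or more identical consecutive characters as a repetition; both choices, and the names <monospace>raw_features</monospace> and <monospace>char_ngrams</monospace>, are illustrative assumptions rather than the settings of the described system.</p>
        <preformat><![CDATA[
```python
import re


def char_ngrams(text, n):
    """Set of character n-grams, for presence/absence features."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def raw_features(tweet):
    """Compute a few raw text features for one tweet."""
    tokens = tweet.split()  # whitespace tokenization is an assumption
    return {
        "n_tokens": len(tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        # True when the tweet finishes with "?" or "!".
        "ends_with_punct": tweet.rstrip().endswith(("?", "!")),
        # True when a character occurs at least three times in a row.
        "char_repetition": re.search(r"(.)\1{2,}", tweet) is not None,
    }


features = raw_features("@anna che bellooo #sole !")
```
]]></preformat>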
      </sec>
      <sec id="sec-2-14">
        <title>Morpho-syntactic Features</title>
      </sec>
      <sec id="sec-2-15">
        <p>Coarse grained Part-Of-Speech n-grams: presence
or absence of contiguous sequences of
coarse-grained PoS, corresponding to the main
grammatical categories (noun, verb, adjective).</p>
      </sec>
      <sec id="sec-2-16">
        <p>Fine grained Part-Of-Speech n-grams: presence
or absence of contiguous sequences of
fine-grained PoS, which represent subdivisions of the
coarse-grained tags (e.g. the class of nouns is
subdivided into proper vs common nouns, verbs into
main verbs, gerund forms, past participles).</p>
      </sec>
      <sec id="sec-2-17">
        <p>Coarse grained Part-Of-Speech distribution:
the distribution of nouns, adjectives, adverbs,
numbers in the tweet.</p>
      </sec>
      <sec id="sec-2-18">
        <title>Lexicon features</title>
        <p>Emoticons: presence or absence of positive or
negative emoticons in the analyzed tweet. The
lexicon of emoticons was extracted from the site
http://it.wikipedia.org/wiki/Emoticon and
manually classified.</p>
        <p>Lemma sentiment polarity n-grams: for each
n-gram of lemmas extracted from the analyzed
tweet, the feature checks the polarity of each
component lemma in the existing sentiment polarity
lexicons. Lemmas that are not present are marked
with the ABSENT tag. This is for example the
case of the trigram “tutto molto bello” (all very
nice) that is marked as “ABSENT-POS-POS”
because molto and bello are marked as positive in
the considered polarity lexicon and tutto is absent.
The feature is computed for each existing
sentiment polarity lexicon.</p>
        <p>Polarity modifier: for each lemma in the tweet
occurring in the existing sentiment polarity
lexicons, the feature checks the presence of adjectives
or adverbs in a left context window of size 2. If
this is the case, the polarity of the lemma is
assigned to the modifier. This is for example the case
of the bigram “non interessante” (not interesting),
where “interessante” is a positive word, and “non”
is an adverb. Accordingly, the feature “non POS”
is created. The feature is computed 3 times,
checking all the existing sentiment polarity lexicons.</p>
        <p>PMI score: for each set of unigrams, bigrams,
trigrams, four-grams and five-grams that occur in
the analyzed tweet, the feature computes the score
given by <inline-formula><tex-math>\sum_{i\text{-gram} \in \mathit{tweet}} \mathrm{score}(i\text{-gram})</tex-math></inline-formula> and
returns the minimum and the maximum of the
five values (approximated to the nearest integer).</p>
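        <p>Two of the lexicon features above lend themselves to a short sketch: the lemma sentiment polarity n-grams and the PMI score feature. The sketch assumes that a polarity lexicon is a dictionary from lemma to tag, that n-grams missing from the PMI lexicon contribute zero, and that n-gram orders longer than the tweet sum to zero; these are assumptions, as are the function names.</p>
        <preformat><![CDATA[
```python
def polarity_ngrams(lemmas, lexicon, n=3):
    """Replace each lemma by its polarity tag (or ABSENT) and join n-grams."""
    tags = [lexicon.get(lemma, "ABSENT") for lemma in lemmas]
    return ["-".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def pmi_min_max(tokens, pmi_lexicon, max_n=5):
    """Sum the PMI scores per n-gram order, return (min, max) as integers."""
    sums = []
    for n in range(1, max_n + 1):
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        # Unseen n-grams contribute 0.0 (an assumption of this sketch).
        sums.append(sum(pmi_lexicon.get(g, 0.0) for g in grams))
    return round(min(sums)), round(max(sums))


trigram_feature = polarity_ngrams(["tutto", "molto", "bello"],
                                  {"molto": "POS", "bello": "POS"})
extremes = pmi_min_max(["a", "b"],
                       {("a",): 1.2, ("b",): -0.4, ("a", "b"): 2.6})
```
]]></preformat>
        <p>With the toy lexicon above, the trigram “tutto molto bello” is mapped to the single feature “ABSENT-POS-POS”, matching the example given in the text.</p>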
      </sec>
      <sec id="sec-2-19">
        <p>Distribution of sentiment polarity: this feature
computes the percentage of positive, negative and
neutral lemmas that occur in the tweet. To
overcome the sparsity problem, the percentages are
rounded to the nearest multiple of 5. The feature
is computed for each existing lexicon.</p>
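        <p>The rounding step can be made concrete with a small sketch. It assumes that percentages are taken over all the lemmas of the tweet and that ties are rounded upwards; the source does not specify either detail, and the name <monospace>polarity_distribution</monospace> is hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def polarity_distribution(lemmas, lexicon):
    """Percentage of lemmas per polarity, rounded to a multiple of 5."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for lemma in lemmas:
        polarity = lexicon.get(lemma)
        if polarity in counts:
            counts[polarity] += 1
    total = len(lemmas)
    if total == 0:
        return {polarity: 0 for polarity in counts}
    rounded = {}
    for polarity, count in counts.items():
        percentage = 100.0 * count / total
        # Nearest multiple of 5, with ties rounded upwards.
        rounded[polarity] = int(math.floor(percentage / 5.0 + 0.5)) * 5
    return rounded


distribution = polarity_distribution(
    ["tutto", "molto", "bello"],
    {"molto": "positive", "bello": "positive"})
```
]]></preformat>
        <p>Here two of the three lemmas are positive (66.7%), which is rounded to 65.</p>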
      </sec>
      <sec id="sec-2-20">
        <p>Most frequent sentiment polarity: the feature
returns the most frequent sentiment polarity of the
lemmas in the analyzed tweet. The feature is
computed for each existing lexicon.</p>
      </sec>
      <sec id="sec-2-21">
        <p>Sentiment polarity in tweet sections: the feature
first splits the tweet into three equal sections. For
each section the most frequent polarity is
computed using the available sentiment polarity
lexicons. This feature aims at
identifying changes of polarity within the same tweet.</p>
        <p>Word embeddings combination: the feature
returns the vectors obtained by computing
separately the average of the word embeddings of the
nouns, adjectives and verbs of the tweet. It is
computed once for each word embedding lexicon,
obtaining a total of 6 vectors for each tweet.</p>
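        <p>The word embeddings combination can be sketched as follows. The coarse PoS labels (<monospace>NOUN</monospace>, <monospace>ADJ</monospace>, <monospace>VERB</monospace>), the zero vector used when a category has no known word and the function names are assumptions of this sketch.</p>
        <preformat><![CDATA[
```python
def average(vectors, dim):
    """Component-wise mean; a zero vector when there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embedding_combination(tagged_tokens, embedding_lexicons, dim):
    """Return one averaged vector per (lexicon, PoS category) pair."""
    result = []
    for lexicon in embedding_lexicons:
        for category in ("NOUN", "ADJ", "VERB"):
            vectors = [lexicon[word] for word, pos in tagged_tokens
                       if pos == category and word in lexicon]
            result.append(average(vectors, dim))
    return result


# Toy 2-dimensional lexicons; the second one knows none of the words.
tagged = [("gatto", "NOUN"), ("cane", "NOUN"), ("bello", "ADJ")]
first_lexicon = {"gatto": [1.0, 0.0], "cane": [3.0, 2.0], "bello": [0.5, 0.5]}
combined = embedding_combination(tagged, [first_lexicon, {}], dim=2)
```
]]></preformat>
        <p>With two embedding lexicons and three PoS categories, the function returns the 6 vectors per tweet mentioned above.</p>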
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and Discussion</title>
      <p>We tested five different learning configurations of
our system: linear and quadratic support vector
machines (linear SVM, quadratic SVM) using the
features described in Section 1.2.2, with the
exception of the document embeddings generated by
the LSTM; the LSTM using the word embeddings
described in 1.2.2; and a tandem LSTM-SVM
combination with linear and quadratic SVM kernels
(linear Tandem, quadratic Tandem) using the features
described in Section 1.2.2 and the document
embeddings generated by the LSTM. To test the
proposed classification models, we created an internal
development set randomly selected from the
training set distributed by the task organizers. The
resulting development set consists of 10%
(740 tweets) of the whole training set.</p>
      <sec id="sec-3-1">
        <p>Table 1: results of the five configurations (linear SVM,
quadratic SVM, LSTM, linear Tandem, quadratic Tandem); recoverable
column headers: Configuration, Subject., Polarity.</p>
      </sec>
      <sec id="sec-3-2">
        <p>The tandem systems
outperform the SVM and LSTM ones. In addition,
the quadratic models perform better than the linear
ones. These results led us to choose the linear and
quadratic tandem models as the final systems to be
used on the official test set.</p>
        <p>Table 2 reports the overall accuracies achieved
by all our classifier configurations on the official
test set; the officially submitted runs are starred in
the table. The best official Runs row reports, for
each task, the best official results in EVALITA
2016 SENTIPOLC. As can be seen, the
accuracies of the different learning models show a different
trend when tested on the development and the test
sets. Unlike what we observed in the
development experiments, the best system turns out to
be the LSTM one, and the gap in terms of
accuracy between the linear and quadratic models is
smaller or absent. In addition, the
accuracies of all the systems are clearly lower than the
ones obtained in our development experiments. In
our opinion, such results may depend on the
occurrence of out-of-domain tweets in the test set with
respect to the ones contained in the training set.
Different groups of annotators could be a further
explanation for these different results and trends.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        In this paper, we reported the results of our
participation in the EVALITA 2016 SENTIPOLC tasks.
By resorting to a tandem LSTM-SVM system we
achieved the second place in the Subjectivity
Classification task, the third place in the Sentiment
Polarity Classification task and the sixth place in the
Irony Detection task. This tandem system
combines the ability of the bidirectional LSTM to
capture long-range dependencies between words from
both directions of a tweet with SVMs, which are
able to exploit the document embeddings produced by
the LSTM in conjunction with a wide set of
general-purpose features qualifying the lexical and
grammatical structure of a text. Our current direction of
research is introducing a character-based LSTM
        <xref ref-type="bibr" rid="ref4 ref5">(dos
Santos and Zadrozny, 2013)</xref>
        in the tandem system.
Character-based LSTMs have proven to be particularly
suitable for analyzing social media texts
(Dhingra et al., 2016).
      </p>
      <p>Andrea Cimino, Stefano Cresci, Felice Dell’Orletta,
Maurizio Tesconi. 2014. Linguistically-motivated
and Lexicon Features for Sentiment Analysis of
Italian Tweets. In Proceedings of EVALITA ’14,
Evaluation of NLP and Speech Tools for Italian.
December, Pisa, Italy.</p>
      <p>Yarin Gal and Zoubin Ghahramani. 2015. A
theoretically grounded application of dropout in recurrent
neural networks. arXiv preprint arXiv:1512.05287.</p>
      <p>Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long
short-term memory. Neural computation.</p>
      <p>Minqing Hu and Bing Liu. 2004. Mining and summarizing
customer reviews. In Proceedings of the tenth
ACM SIGKDD international conference on
Knowledge discovery and data mining, KDD ’04. 168-177,
New York, NY, USA. ACM.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli,
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 SENTiment POLarity Classification Task</article-title>
          . In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro and Rachele Sprugnoli, editors,
          <source>Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</source>
          . December, Naples, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Minqing</given-names>
            <surname>Hu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Mining and summarizing customer reviews</article-title>
          .
          <source>In Proceedings of the tenth</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Isa</given-names>
            <surname>Maks</surname>
          </string-name>
          , Ruben Izquierdo, Francesca Frontini, Montse Cuadros, Rodrigo Agerri and
          <string-name>
            <given-names>Piek</given-names>
            <surname>Vossen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Generating Polarity Lexicons with WordNet propagation in 5 languages</article-title>
          .
          <source>9th LREC, Language Resources and Evaluation Conference</source>
          . Reykjavik, Iceland.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Saif</given-names>
            <surname>Mohammad</surname>
          </string-name>
          , Svetlana Kiritchenko and
          <string-name>
            <given-names>Xiaodan</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets</article-title>
          .
          <source>In Proceedings of the Seventh international workshop on Semantic Evaluation Exercises, SemEval-2013</source>
          .
          <fpage>321</fpage>
          -
          <lpage>327</lpage>
          , Atlanta, Georgia, USA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          , Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>SemEval-2016 task 4: Sentiment analysis in Twitter</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Mike</given-names>
            <surname>Schuster</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kuldip K.</given-names>
            <surname>Paliwal</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Transactions on Signal Processing</source>
          <volume>45</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Duyu</given-names>
            <surname>Tang</surname>
          </string-name>
          , Bing Qin and
          <string-name>
            <given-names>Ting</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Document modeling with gated recurrent neural network for sentiment classification</article-title>
          .
          <source>In Proceedings of EMNLP</source>
          <year>2015</year>
          .
          <fpage>1422</fpage>
          -
          <lpage>1432</lpage>
          , Lisbon, Portugal.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Tijmen</given-names>
            <surname>Tieleman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude</article-title>
          .
          <source>In COURSERA: Neural Networks for Machine Learning.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Theresa</given-names>
            <surname>Wilson</surname>
          </string-name>
          , Janyce Wiebe and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Recognizing contextual polarity in phrase-level sentiment analysis</article-title>
          .
          <source>In Proceedings of HLT-EMNLP</source>
          <year>2005</year>
          .
          <fpage>347</fpage>
          -
          <lpage>354</lpage>
          , Stroudsburg, PA, USA. ACL.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Wöllmer</surname>
          </string-name>
          , Florian Eyben, Alex Graves, Björn Schuller and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Rigoll</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Tandem BLSTM-DBN architecture for keyword spotting with enhanced context modeling</article-title>
          .
          <source>Proc. of NOLISP</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Wöllmer</surname>
          </string-name>
          , Björn Schuller, Florian Eyben and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Rigoll</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening</article-title>
          .
          <source>IEEE Journal of Selected Topics in Signal Processing</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>XingYi</given-names>
            <surname>Xu</surname>
          </string-name>
          , HuiZhi Liang and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>UNIMELB at SemEval-2016 Tasks 4A and 4B: An Ensemble of Neural Networks and a Word2Vec Based Model for Sentiment Classification</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>