<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural Sentiment Analysis for a Real-World Application</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniele Bonadiman</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Castellucci</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Favalli</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raniero Romagnoli</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Moschitti</string-name>
        </contrib>
      </contrib-group>
      <issue>64</issue>
      <abstract>
        <p>In this paper, we describe our neural network models for a commercial application of sentiment analysis. Unlike academic work, which is oriented towards complex networks for achieving a marginal improvement, real scenarios require flexible and efficient neural models. The possibility of using the same models on different domains and languages plays an important role in the selection of the most appropriate architecture. We found that a small modification of the state-of-the-art network according to academic benchmarks led to a flexible neural model that also preserves high accuracy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract (Italian)</title>
      <p>In this work, we describe our neural network models for a commercial application based on sentiment analysis. Unlike the academic world, where research is oriented towards possibly complex networks for achieving a marginal improvement, real usage scenarios require flexible, efficient and simple neural models. The possibility of using the same models for varied domains and languages plays an important role in the choice of the architecture. We found that a small modification of the state-of-the-art network with respect to academic benchmarks produces a flexible neural model that also preserves high accuracy.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        In recent years, Sentiment Analysis (SA) in
Twitter has been widely studied. Its popularity has
been fed by the remarkable interest of the
industrial world in this topic, as well as by the relatively
easy access to data, which, among other things, allowed
the academic world to promote evaluation
campaigns, e.g.,
        <xref ref-type="bibr" rid="ref12">(Nakov et al., 2016)</xref>
        , for different
languages. Many models have been developed and
tested on these benchmarks, e.g.,
        <xref ref-type="bibr" rid="ref10 ref16 ref3 ref9">(Li et al., 2010;
Kiritchenko et al., 2014; Severyn and Moschitti,
2015; Castellucci et al., 2016)</xref>
        . They all appear
very appealing from an industrial perspective, as
SA is strongly connected to many types of
business through specific KPIs (Key Performance
Indicators, i.e., strategic factors enabling the
measurement of a process or activity). However, previous
academic work has not provided clear indications
on how to select the most appropriate learning
architecture for industrial applications.
      </p>
      <p>In this paper, we report on our experience in
adapting academic models of SA to a
commercial application: a social media and
microblogging monitoring platform used to analyze brand
reputation, competition, the voice of the customer
and customer experience. More in detail,
sentiment analysis algorithms register customers’
opinions and feedback on services and products, both
direct and indirect.</p>
      <p>An important aspect is that clients push for
easily adaptable and reliable solutions. Indeed,
multi-tenant applications and varying sentiment analysis
requirements cause a high variability of
approaches to the task within the same platform,
which must be capable of managing multi-domain
and multi-channel content in different languages,
as it provides services for several clients in
different market segments. Moreover, scalability and
lightweight use of computational resources while
preserving accuracy are also important.
Finally, dealing with different client domains and
data potentially requires constantly training new
models with limited time availability.</p>
      <p>
        To meet the above requirements, we started from
the state-of-the-art model proposed in
        <xref ref-type="bibr" rid="ref16">(Severyn
and Moschitti, 2015)</xref>
        , which is a Convolutional
Neural Network (CNN) with few layers, mainly
devoted to encoding a sentence representation. We
modified it by adopting a recurrent pooling layer,
which allows the network to learn longer
dependencies in the input sentence. An additional
benefit is that such a simple architecture makes the
network more robust to biases from the dataset,
generalizing better on the less represented classes.
Our experiments on the SemEval data in English,
as well as on a commercial dataset in Italian, show
a consistent improvement of our networks over the
state of the art.
      </p>
      <p>In the following, Section 2 places the current
work in the literature. Section 3 introduces the
application scenario. Sections 4 and 5 present,
respectively, our proposal for a flexible architecture
and the experimental results. Finally, Section 6
reports the conclusions.</p>
    </sec>
    <sec id="sec-3">
      <title>2 Related Work</title>
      <p>
        Although sentiment analysis has been around for
a decade, clear and exact comparisons of
models have been achieved thanks to the organization
of international evaluation campaigns. The main
campaign for SA in Twitter in English is SemEval,
which has been organized since 2013. A similar
campaign for the Italian language (SENTIPOLC)
        <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2016)</xref>
        has been promoted within Evalita
since 2014.
      </p>
      <p>
        Among other approaches, Neural Networks
(NNs), and in particular CNNs, outperformed the
previous state-of-the-art techniques
        <xref ref-type="bibr" rid="ref1 ref16 ref3 ref5">(Severyn and
Moschitti, 2015; Castellucci et al., 2016; Attardi
et al., 2016; Deriu et al., 2016)</xref>
        . Those systems
share some architectural choices: (i) the use of
convolutional sentence encoders
        <xref ref-type="bibr" rid="ref7">(Kim, 2014)</xref>
        , (ii)
leveraging pre-trained word2vec embeddings
        <xref ref-type="bibr" rid="ref11">(Mikolov
et al., 2013)</xref>
        and (iii) use of distant supervision to
pre-train the network
        <xref ref-type="bibr" rid="ref6">(Go et al., 2009)</xref>
        . Although
such a network is simple and provides state-of-the-art
results, it does not model long-term dependencies
in the tweet by construction.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 Application Scenario</title>
      <p>Our commercial application is a social media
and micro-blogging monitoring platform, which is
used to analyze brand reputation, competitors, the
voice of the customer and customer experience. It
is capable of managing multi-domain and
multi-channel content in different languages, and it is
provided as a service to several clients in
different market segments.</p>
      <p>The application uses an SA algorithm to analyze
the customers’ opinions and feedback on services
and products, both direct and indirect. The
sentiment metric is used by the application’s clients to
highlight customer experience, expectations, and
perception. The final aim is to react promptly,
identify improvement opportunities and,
afterward, measure the impact of the adopted
initiatives.</p>
      <sec id="sec-4-1">
        <title>3.1 Focused Problem Description</title>
        <p>Industrial applications, used by demanding
clients and dealing with real data, tend to require
easily adaptable and reliable solutions. Major
problems are related to multi-tenant applications
with several client requirements on the sentiment
analysis problem, often requiring variations
in task approaches within the same platform.
Moreover, high attention is paid to scalability
and lightweight use of computational resources,
while preserving accurate performance. Finally, dealing
with different client domains and data potentially
requires constantly training new models with
limited time availability.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Data Description</title>
        <p>The commercial social media and micro-blogging
monitoring platform continuously acquires data
coming from several sources; among these, we
selected Twitter data as the main source for our
purposes.</p>
        <p>First, the public Twitter stream was collected for
several months without specific domain restriction
to build the dataset used for the word embedding
training. The data amounts to 100 million Italian
tweets and 50 million English tweets.</p>
        <p>Then, a dataset was constructed for a
specific market sector in Italian. The data collection
was performed on the public Twitter stream with
keyword restrictions in order to
filter the tweets of interest in the automotive
domain. Afterward, the commercial platform applies
different techniques in order to exclude from these
collections the tweets that are not relevant for the
specific insight analysis.</p>
        <p>The messages were then used to construct the
dataset for our experiments. A manual
annotation phase was performed together with the
client in order to best suit the insight
objective requirements. Even though structured
guidelines were agreed upon before creating the
dataset and continuously checked against, this
approach tended to produce particular dataset characteristics:
in particular, an unbalanced distribution of the
examples over the different classes was
measured. This makes necessary a flexible model
capable of handling such phenomena without the
need for costly tuning phases and/or network
re-engineering.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 Our Neural Network Approach</title>
      <p>
        The task of SA in Twitter aims at
classifying a tweet t 2 T into one of the three
sentiment classes c 2 C, where C =
fpositive; neutral; negativeg. This can be
achieved by learning function f : T ! C through
a neural network. The architecture here proposed
is based on
        <xref ref-type="bibr" rid="ref16">(Severyn and Moschitti, 2015)</xref>
        and it
is structured in three steps: (i) a tweet is encoded
into an embedding matrix, (ii) an encoder maps
the tweet matrix into a fixed size vector and (iii)
a single output layer (a logistic regression layer)
classifies this vector over the three classes.
      </p>
      <p>In contrast to Severyn and Moschitti (2015), we
adopted a recurrent pooling layer that allows the
network to learn longer dependencies in the input
sentence (i.e., sentiment shifts). This architectural
change makes the network less sensitive to
biases from the dataset, and it therefore generalizes
better on poorly represented classes.</p>
      <p>Embedding: a tweet t is represented as a
sequence of words {w1, .., wj, .., wN}. Tweets are
encoded into a sentence matrix t ∈ R^(d×|t|),
obtained by concatenating its word vectors wj,
where d is the size of the word embeddings.
Sentence Encoder: this is a function that maps the
sentence matrix t into a fixed-size vector x
representing the whole sentence. Severyn and
Moschitti (2015) used a convolutional layer followed
by a global max-pooling layer to encode tweets.
The convolution applies a sliding-window
operation (with a window of size m) over the
input sentence matrix. More specifically, it applies
a non-linear transformation generating an output
matrix x̃ ∈ R^(N×dconv), where dconv is the
number of convolutional filters and N is the length of
the sentence. The max-pooling operation applies
an element-wise max operation to the transformed
sentence matrix x̃, resulting in a fixed-size vector
representing the whole sentence.</p>
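<p>For illustration, the convolutional encoder with global max pooling can be sketched in NumPy as follows. This is a minimal sketch, not our actual implementation: the zero padding, filter layout and shapes are illustrative assumptions.</p>

```python
import numpy as np

def conv_encoder(t, W, b):
    """Encode a sentence matrix t (N x d) into a fixed-size vector.

    A window of size m slides over the rows of t (zero-padded so the
    output keeps N positions); each window is mapped through the filter
    bank W (m x d x dconv) plus bias b and a ReLU, and the resulting
    N x dconv matrix is max-pooled over positions.
    """
    N, d = t.shape
    m, _, dconv = W.shape
    pad = m // 2
    tp = np.pad(t, ((pad, pad), (0, 0)))   # zero-pad the sentence
    x = np.empty((N, dconv))
    for i in range(N):                     # sliding-window convolution
        x[i] = np.tensordot(tp[i:i + m], W, axes=([0, 1], [0, 1])) + b
    x = np.maximum(x, 0.0)                 # ReLU non-linearity
    return x.max(axis=0)                   # global max pooling
```

<p>With the setting of Section 5.1 (m = 5, dconv = 128), the encoder maps a tweet of any length into a 128-dimensional vector.</p>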
      <p>
        In this work, we propose to substitute the
max-pooling operation with a Bidirectional Gated
Recurrent Unit (BiGRU)
        <xref ref-type="bibr" rid="ref15 ref4">(Chung et al., 2014;
Schuster and Paliwal, 1997)</xref>
        . The GRU is a gated
recurrent neural network capturing long-term
dependencies over the input. A GRU processes the
input in one direction (e.g., from left to right),
updating a hidden state that keeps the memory of what
the network has processed so far. In this way, a
whole sentence can be represented by taking the
hidden state at the last step. In order to capture
dependencies in both directions, i.e., a stronger
representation of the sentence, we apply a BiGRU,
which performs a GRU operation in both
directions: BiGRU(x̃) = [GRU→(x̃); GRU←(x̃)].
Classification: the final module of the network
is the output layer (a logistic regression) that
performs a linear transformation over the sentence
vector, mapping it into a dclass-dimensional
vector followed by a softmax activation, where dclass
is the number of classes.
      </p>
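<p>The recurrent pooling step can be sketched as follows: a minimal plain-NumPy sketch with hypothetical weight names, whereas in the real model these parameters are learned jointly with the rest of the network.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_last_state(X, Wz, Uz, Wr, Ur, Wh, Uh):
    """Scan the rows of X (N x dconv) with a GRU and return the final
    hidden state. Shapes: W* are (dconv, h) input weights, U* are
    (h, h) recurrent weights."""
    h = np.zeros(Wz.shape[1])
    for x in X:                                   # one step per word
        z = sigmoid(x @ Wz + h @ Uz)              # update gate
        r = sigmoid(x @ Wr + h @ Ur)              # reset gate
        h_cand = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
        h = (1.0 - z) * h + z * h_cand
    return h

def bigru_pool(X, fwd, bwd):
    """BiGRU(X): run one GRU left-to-right and one over the reversed
    sequence, then concatenate the two final hidden states into a
    single fixed-size sentence vector."""
    return np.concatenate([gru_last_state(X, *fwd),
                           gru_last_state(X[::-1], *bwd)])
```

<p>With 150 hidden units per direction, as in Section 5.1, the concatenation yields the 300-dimensional sentence vector fed to the output layer.</p>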
    </sec>
    <sec id="sec-6">
      <title>5 Experiments</title>
      <sec id="sec-6-1">
        <title>5.1 Setup</title>
        <p>Similarly to Severyn and Moschitti (2015), for the
CNN we use a convolution of window size 5
and dconv = 128 with rectified linear unit
(ReLU) activation. For the BiGRU, we use 150 hidden
units for both the forward and backward GRUs, obtaining a fixed-size
vector of size 300.</p>
        <p>Word embeddings: for all the proposed
models, we pre-initialize the word embedding matrices
with the standard skip-gram embedding of
dimensionality 50 trained on tweets retrieved from the
Twitter Stream.</p>
        <p>
          Training: the network is trained using SGD
with shuffled mini-batches using the Adam
update rule
          <xref ref-type="bibr" rid="ref4 ref8 ref9">(Kingma and Ba, 2014)</xref>
          and an early
stopping
          <xref ref-type="bibr" rid="ref13">(Prechelt, 1998)</xref>
          strategy with patience
p = 10. Early stopping avoids
overfitting and improves the generalization
capabilities of the network. We also add
dropout
          <xref ref-type="bibr" rid="ref17">(Srivastava et al., 2014)</xref>
          with a rate of 0.2
to improve generalization and avoid co-adaptation
of features
          <xref ref-type="bibr" rid="ref17">(Srivastava et al., 2014)</xref>
          .
        </p>
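<p>The early-stopping strategy with patience p = 10 can be sketched as the following schematic loop, where train_step and validate are placeholders for one epoch of mini-batch Adam updates and a validation-set evaluation.</p>

```python
def train_with_early_stopping(train_step, validate,
                              max_epochs=200, patience=10):
    """Stop training once the validation score has not improved for
    `patience` consecutive epochs (Prechelt, 1998). `validate` returns
    a score to maximize, e.g., Macro-F1 on the validation set."""
    best_score, best_epoch = float("-inf"), -1
    for epoch in range(max_epochs):
        train_step(epoch)             # one pass over the mini-batches
        score = validate()
        if score > best_score:        # improvement: keep this model
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break                     # patience exhausted
    return best_score, best_epoch
```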
        <p>
          Datasets: we trained and evaluated our
architecture on two datasets. The first is the English dataset of
SemEval 2015
          <xref ref-type="bibr" rid="ref14">(Rosenthal et al., 2015)</xref>
          described in
Table 1 in terms of the size of the data splits and
of positive, negative and neutral instances. We used
the validation set for parameter tuning and to apply
early stopping, whereas the systems are evaluated
on the two test sets of 2013 and 2015, respectively.
        </p>
        <p>The Italian dataset was built in-house for the
automotive domain: we collected tweets from the Twitter
stream as explained in Section 3.2 and divided them
into three different splits for training, validation
and testing, respectively. Table 2 shows the size of
the splits. Due to the nature of the domain, many
tweets in the dataset are neutral or objective; this
makes the label distribution much different from
the usual benchmarks. For example, the neutral
class is the least represented in the English dataset
(see Table 1) and the most represented in the
Italian data. This imbalance can potentially bias
neural networks towards the most represented class.
One of the features of our approach is to diminish
such an effect.</p>
        <p>Evaluation metrics: we used the following
evaluation metrics: Macro-F1, the average of the
F1 over the three sentiment categories, and F1pn,
the average F1 of the positive and negative classes.
The latter metric is the official evaluation score of the SemEval
competition.</p>
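<p>Both metrics can be computed from the per-class F1 scores, as in the following sketch (plain Python, assuming string class labels).</p>

```python
def f1_scores(gold, pred, labels=("negative", "neutral", "positive")):
    """Per-class F1, Macro-F1 (mean over the three classes), and F1pn
    (mean F1 of the positive and negative classes only, the official
    SemEval score)."""
    f1 = {}
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro_f1 = sum(f1.values()) / len(labels)
    f1_pn = (f1["positive"] + f1["negative"]) / 2
    return f1, macro_f1, f1_pn
```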
      </sec>
      <sec id="sec-6-2">
        <title>5.2 Results on English Data</title>
        <p>Table 3 presents the results on the English dataset
of SemEval 2015. The first row shows the
outcome reported by Severyn and Moschitti (2015)
(S&amp;M). CNN+Max is a reimplementation of the
above system with Convolution and Max-Pooling
but trained just on the official training data without
distant supervision. This system is used as a strong
baseline in all our experiments. Lastly, we report
the results obtained with the BiGRU pooling
strategy described in Section 4. The proposed
architecture presents a slight improvement over the strong
baseline (about 1 point in both F1 and F1pn score on
the test sets).</p>
      </sec>
      <sec id="sec-6-3">
        <title>Results on Italian Data</title>
        <p>Table 4 presents the results on the Italian
dataset. Although on this dataset the proposed
CNN+BiGRU model obtains lower F1 scores, it
shows improved performance in terms of F1pn (about 5
points on both validation and test sets). This
suggests that the proposed model tends to generalize
better on the less represented classes, which, in the
case of the Italian training dataset, are the positive
and negative classes (as pointed out in Table 2).</p>
      </sec>
      <sec id="sec-6-4">
        <title>Discussion of the Results</title>
        <p>We analyzed the classification scores of some
words to show that our approach is less affected
by the skewed distribution of the dataset. The
sentiment trends, as captured by the neural
network in terms of scores, are shown in Table 5.
For example, the word Mexico classified by
CNN+Max produces the scores 0.06, 0.35, 0.57,
while the CNN+BiGRU outcome is 0.18, 0.52, 0.30,
for the negative, neutral and positive classes,
respectively. This shows that CNN+BiGRU is less
biased by the data distribution of the sampled word
in the dataset, which is 0, 1, 5, i.e., Mexico
appears 5 times more often in positive than in neutral
messages and never in negative messages.</p>
        <p>This skewed distribution biased CNN+Max
more, as the positive class gets 0.57
while the negative one gets only 0.06. CNN+BiGRU,
instead, is able to recover the correct neutral class.
We believe that CNN+Max is more influenced by
the distribution bias as the max-pooling operation
seems to capture very local phenomena. In
contrast, the BiGRU exploits the entire word sequence
and thus can better capture the larger informative
context.</p>
        <p>A similar analysis in Italian shows the same
trends. For example, the word panda is classified
as 0.05, 0.28, 0.66 by CNN+Max and as 0.07, 0.56,
0.35 by CNN+BiGRU, for the negative, neutral and
positive classes, respectively. Again, the
distribution of this word in the Italian training set is very
skewed towards the positive class: this confirms that
CNN+Max is more influenced by the distribution
bias, while our architecture can better deal with it.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 Conclusions</title>
      <p>In this paper, we have studied state-of-the-art
neural networks for the Sentiment Analysis of
Twitter text associated with a real application scenario.
We modified the network architecture by
applying a recurrent pooling layer enabling the learning
of longer dependencies between words in tweets.
The recurrent pooling layer makes the network
more robust to unbalanced data distributions. We
have tested our models on an academic
benchmark and, most importantly, on our data derived
from a real-world commercial application. The
results show that our approach works well for
both English and Italian. Finally, we
observed that our network suffers less from the
dataset distribution bias.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Attardi</surname>
          </string-name>
          , Daniele Sartiano, Chiara Alzetta, and
          <string-name>
            <given-names>Federica</given-names>
            <surname>Semplici</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Convolutional neural networks for sentiment analysis on italian tweets</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ), Napoli, Italy, December 5-7, 2016. CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the evalita 2016 sentiment polarity classification task</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Castellucci</surname>
          </string-name>
          , Danilo Croce, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Basili</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Context-aware convolutional neural networks for twitter sentiment analysis in italian</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ), Napoli, Italy, December 5-7, 2016. CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Junyoung</given-names>
            <surname>Chung</surname>
          </string-name>
          , Caglar Gulcehre, KyungHyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>arXiv preprint arXiv:1412.3555</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jan</given-names>
            <surname>Deriu</surname>
          </string-name>
          , Maurice Gonzenbach, Fatih Uzdilli, Aurelien Lucchi, Valeria De Luca, and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Jaggi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Swisscheese at semeval-2016 task 4: Sentiment classification using an ensemble of convolutional neural networks with distant supervision</article-title>
          .
          <source>In SemEval@ NAACL-HLT</source>
          , pages
          <fpage>1124</fpage>
          -
          <lpage>1128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Alec</given-names>
            <surname>Go</surname>
          </string-name>
          , Richa Bhayani, and
          <string-name>
            <given-names>Lei</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Convolutional neural networks for sentence classification</article-title>
          . In Alessandro Moschitti, Bo Pang, and Walter Daelemans, editors,
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29</source>
          ,
          <year>2014</year>
          , Doha, Qatar,
          <article-title>A meeting of SIGDAT, a Special Interest Group of the ACL</article-title>
          , pages
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Diederik</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Svetlana</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          , Xiaodan Zhu, and
          <string-name>
            <surname>Saif</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Mohammad</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Sentiment analysis of short informal texts</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>50</volume>
          (
          <issue>1</issue>
          ):
          <fpage>723</fpage>
          -
          <lpage>762</lpage>
          , May.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Shoushan</given-names>
            <surname>Li</surname>
          </string-name>
          , Sophia Yat Mei Lee, Ying Chen, ChuRen Huang, and
          <string-name>
            <given-names>Guodong</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Sentiment classification and polarity shifting</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on Computational Linguistics</source>
          , pages
          <fpage>635</fpage>
          -
          <lpage>643</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          , Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Semeval2016 task 4: Sentiment analysis in twitter</article-title>
          .
          <source>In SemEval@ NAACL-HLT</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Lutz</given-names>
            <surname>Prechelt</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Early stopping-but when? In Neural Networks: Tricks of the Trade, This Book is an Outgrowth of a 1996 NIPS Workshop</article-title>
          , pages
          <fpage>55</fpage>
          -
          <lpage>69</lpage>
          , London, UK, UK. Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          , Preslav Nakov, Svetlana Kiritchenko, Saif Mohammad, Alan Ritter, and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2015</year>
          . Semeval-2015 task 10:
          <article-title>Sentiment analysis in twitter</article-title>
          .
          <source>In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2015</year>
          ), pages
          <fpage>451</fpage>
          -
          <lpage>463</lpage>
          , Denver, Colorado, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Mike</given-names>
            <surname>Schuster</surname>
          </string-name>
          and Kuldip K. Paliwal
          .
          <year>1997</year>
          .
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>IEEE Transactions on Signal Processing</source>
          ,
          <volume>45</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Aliaksei</given-names>
            <surname>Severyn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Moschitti</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Twitter sentiment analysis with deep convolutional neural networks</article-title>
          .
          <source>In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>959</fpage>
          -
          <lpage>962</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>