=Paper=
{{Paper
|id=Vol-1749/paper_029
|storemode=property
|title=Context–aware Convolutional Neural Networks for Twitter Sentiment Analysis in Italian
|pdfUrl=https://ceur-ws.org/Vol-1749/paper_029.pdf
|volume=Vol-1749
|authors=Giuseppe Castellucci,Danilo Croce,Roberto Basili
|dblpUrl=https://dblp.org/rec/conf/clic-it/CastellucciCB16
}}
==Context–aware Convolutional Neural Networks for Twitter Sentiment Analysis in Italian==
Context-aware Convolutional Neural Networks for Twitter Sentiment Analysis in Italian

Giuseppe Castellucci, Danilo Croce, Roberto Basili
Department of Enterprise Engineering, University of Roma, Tor Vergata
Via del Politecnico 1, 00133 Roma, Italy
castellucci@ing.uniroma2.it, {croce,basili}@info.uniroma2.it

Abstract

English. This paper describes the Unitor system that participated in the SENTIment POLarity Classification (SENTIPOLC) task proposed at Evalita 2016. The system implements a classification workflow made of several Convolutional Neural Network classifiers that generalize the linguistic information observed in the training tweets by also considering their context. Moreover, sentiment-specific information is injected into the training process by using Polarity Lexicons automatically acquired through the analysis of unlabeled collections of tweets. Unitor achieved the best results in the Subjectivity Classification sub-task, and it scored 2nd in the Polarity Classification sub-task, among about 25 different submissions.

Italiano (translated). This work describes the Unitor system evaluated in the SENTIment POLarity Classification task proposed within Evalita 2016. The system is based on a classification workflow implemented with Convolutional Neural Networks, which generalize the evidence observable in the training data by analyzing their contexts and by exploiting automatically generated sentiment-specific lexicons. The system obtained very good results, achieving the best performance in the Subjectivity Classification task and the second best in the Polarity Classification task.

1 Introduction

In this paper, the Unitor system participating in the SENTIment POLarity Classification (SENTIPOLC) task (Barbieri et al., 2016) within the Evalita 2016 evaluation campaign is described. The system is based on a cascade of three classifiers based on Deep Learning methods, and it has been applied to all three sub-tasks of SENTIPOLC: Subjectivity Classification, Polarity Classification and the pilot task called Irony Detection. Each classifier is implemented with a Convolutional Neural Network (CNN) (LeCun et al., 1998) according to the modeling proposed in (Croce et al., 2016). The adopted solution extends the CNN architecture proposed in (Kim, 2014) with (i) sentiment-specific information derived from an automatically acquired polarity lexicon (Castellucci et al., 2015a), and (ii) the contextual information associated with each tweet (see (Castellucci et al., 2015b) for more information about contextual modeling for Sentiment Analysis in Twitter). The Unitor system ranked 1st in the Subjectivity Classification task and 2nd in the Polarity Classification task among the unconstrained systems, making it one of the best solutions in the challenge. This is a remarkable result, as the CNNs have been trained without any complex feature engineering, adopting almost the same modeling in each sub-task. The proposed solution achieves state-of-the-art results in the Subjectivity Classification and Polarity Classification tasks by applying unsupervised analysis of unlabeled data that can be easily gathered from Twitter.

In Section 2 the deep learning architecture adopted in Unitor is presented, while the classification workflow is described in Section 3. In Section 4 the experimental results are reported and discussed, while Section 5 derives the conclusions.
2 A Sentiment and Context-aware Convolutional Neural Network

The Unitor system is based on the Convolutional Neural Network (CNN) architecture for text classification proposed in (Kim, 2014), and further extended in (Croce et al., 2016). This deep network is characterized by 4 layers (see Figure 1).

The first layer represents the input through word embeddings: a low-dimensional representation of words, which is derived by the unsupervised analysis of large-scale corpora, with approaches similar to (Mikolov et al., 2013). The embedding of a vocabulary V is a look-up table E, where each element is the d-dimensional representation of a word. Details about this representation are discussed in the next sections. Let x_i ∈ R^d be the d-dimensional representation of the i-th word. A sentence of length n is represented through the concatenation of the word vectors composing it, i.e., a matrix I whose dimension is n × d.

The second layer represents the convolutional features that are learned during the training stage. A filter, or feature detector, W ∈ R^{f×d}, is applied over the input layer matrix, producing the learned representations. In particular, a new feature c_i is learned according to c_i = g(W · I_{i:i+f-1} + b), where g is a non-linear function, such as the rectifier function, b ∈ R is a bias term and I_{i:i+f-1} is a portion of the input matrix along the first dimension. The filter slides over the input matrix, producing a feature map c = [c_1, ..., c_{n-f+1}]. The filter is applied over the whole input matrix under two key assumptions: local invariance and compositionality. The former specifies that the filter should learn to detect patterns in texts without considering their exact position in the input. The latter specifies that each local patch of height f, i.e., an f-gram, of the input should be considered in the learned feature representations. Ideally, an f-gram is composed through W into a higher-level representation.

In practice, multiple filters of different heights can be applied, resulting in a set of learned representations, which are combined in a third layer through the max-over-time operation, i.e., c̃ = max{c}. It is expected to select the most important features, i.e., the ones with the highest value, for each feature map. The max-over-time pooling operation also serves to make the learned features of a fixed size: it allows dealing with variable sentence lengths and adopting the learned features in fully connected layers.

This representation is finally used in the fourth layer, which is a fully connected softmax layer. It classifies the example into one of the categories of the task. In particular, this layer is characterized by a parameter matrix S and a bias term b_c that are used to classify a message, given the learned representation c̃. The final classification y is obtained through y = argmax_{y ∈ Y} softmax(S · c̃ + b_c), where Y is the set of classes of interest.

In order to reduce the risk of over-fitting, two forms of regularization are applied, as in (Kim, 2014). First, a dropout operation over the penultimate layer (Hinton et al., 2012) is adopted to prevent co-adaptation of hidden units by randomly dropping out, i.e., setting to zero, a portion of the hidden units during forward-backpropagation. The second regularization is obtained by constraining the l2 norm of S and b_c.

Figure 1: The Convolutional Neural Network architecture adopted for the Unitor system: the input layer looks up the word embedding and DPL channels, followed by convolution filters of widths (2,3,4), max pooling and a fully connected softmax layer.
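As an illustration of the four layers just described, the following minimal NumPy sketch computes the forward pass for a single-channel input: convolution with filters of height f, max-over-time pooling, and the final softmax classification. It is not the authors' TensorFlow implementation; the rectifier non-linearity, the function names and the omission of dropout are simplifying assumptions.

<pre>
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cnn_forward(I, filters, S, b_c):
    """Forward pass of the 4-layer CNN described above.

    I       : (n, d) input matrix, one embedding row per token
    filters : list of (W, b) pairs, W of shape (f, d), b a scalar
    S, b_c  : softmax layer parameters, S of shape (|Y|, len(filters))
    """
    pooled = []
    for W, b in filters:
        f = W.shape[0]
        n = I.shape[0]
        # feature map c = [c_1, ..., c_{n-f+1}], with c_i = g(W . I_{i:i+f-1} + b)
        c = np.array([relu(np.sum(W * I[i:i + f]) + b) for i in range(n - f + 1)])
        # max-over-time pooling: keep the strongest activation of each filter
        pooled.append(c.max())
    c_tilde = np.array(pooled)
    # fully connected softmax layer: y = argmax softmax(S . c_tilde + b_c)
    probs = softmax(S @ c_tilde + b_c)
    return int(np.argmax(probs)), probs
</pre>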
2.1 Injecting Sentiment Information through Polarity Lexicons

In (Kim, 2014), the use of word embeddings is advised to generalize lexical information. These word representations can capture paradigmatic relationships between lexical items, and they are well suited to help the generalization of learning algorithms in natural language tasks. However, paradigmatic relationships do not always reflect the relative sentiment between words. In Deep Learning, it is common practice to make the input representations trainable in the final learning stages. This is a valid strategy, but it makes the learning process more complex: the number of learnable parameters increases significantly, resulting in the need for more annotated examples in order to adequately estimate them.

We instead advocate the adoption of a multi-channel input representation, which is typical of CNNs in image processing. A first channel is dedicated to hosting representations derived from a word embedding. A second channel is introduced to inject sentiment information of words through a large-scale polarity lexicon, which is acquired according to the methodology proposed in (Castellucci et al., 2015a). This method leverages word embedding representations to assign polarity information to words by transferring it from sentences whose polarity is known. The resulting lexicons are called Distributional Polarity Lexicons (DPLs). The process is based on the capability of word embeddings to represent both sentences and words in the same space (Landauer and Dumais, 1997). First, sentences (here tweets) are labeled with some polarity classes: in (Castellucci et al., 2015a) this labeling is achieved by applying a Distant Supervision heuristic (Go et al., 2009). The labeled dataset is projected into the embedding space by applying a simple but effective linear combination of the word vectors composing each sentence. Then, a polarity classifier is trained over these sentences in order to emphasize those dimensions of the space more related to the polarity classes. The DPL is generated by classifying each word (represented in the embedding through a vector) with respect to each targeted class, using the confidence level of the classification to derive a word polarity signature. For example, in a DPL the word ottimo is 0.89 positive, 0.04 negative and 0.07 neutral (see Table 1). For more details, please refer to (Castellucci et al., 2015a).

This method has two main advantages: first, it allows deriving a signature for each word in the embedding to be used in the CNN; second, it allows assigning sentiment information to words by observing their usage. This represents an interesting setting to observe sentiment-related phenomena, as a word often does not carry a sentiment if not immersed in a context (i.e., a sentence). As proposed in (Croce et al., 2016), in order to keep the computational complexity of the CNN training phase limited, we augment each vector from the embedding with the polarity scores derived from the DPL (we normalize the embedding and the DPL vectors before the juxtaposition). In Table 1, the most similar words of some polarity carriers are compared when the polarity lexicon is not adopted (second column) and when the multi-channel schema is adopted (third column). Notice that the DPL positively affects the vector representations for Sentiment Analysis: for example, the word pessimo is no longer in the set of the 3 most similar words of the word ottimo. The polarity information captured in the DPL makes words that are semantically related and whose polarity agrees nearer in the space.

Term (pos, neg, neu)        | w/o DPL                          | w/ DPL
ottimo (0.89, 0.04, 0.07)   | pessimo, eccellente, ottima      | ottima, eccellente, fantastico
peggiore (0.17, 0.57, 0.26) | peggior, peggio, migliore        | peggior, peggio, peggiori
triste (0.04, 0.82, 0.14)   | deprimente, tristissima, felice  | deprimente, tristissima, depressa

Table 1: Most similar words in the embedding without (2nd column) and with (3rd column) the DPL; the DPL scores (positivity, negativity, neutrality) are reported next to each term.
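A minimal sketch of the juxtaposition step described above: each normalized embedding vector is extended with its normalized 3-dimensional DPL signature, yielding the multi-channel word representation fed to the CNN. The dictionary-based interface and the uniform fallback for words missing from the DPL are assumptions of this sketch, not part of the original method.

<pre>
import numpy as np

def l2_normalize(v, eps=1e-8):
    return v / (np.linalg.norm(v) + eps)

def augment_with_dpl(embedding, dpl):
    """Juxtapose each word embedding with its DPL polarity signature.

    embedding : dict word -> np.ndarray of shape (250,)  (Skip-gram vector)
    dpl       : dict word -> np.ndarray of shape (3,)    (pos, neg, neu scores)
    Returns a dict word -> np.ndarray of shape (253,).
    """
    augmented = {}
    for word, vec in embedding.items():
        # assumption: words missing from the DPL get a flat, uninformative signature
        signature = dpl.get(word, np.array([1 / 3, 1 / 3, 1 / 3]))
        # both channels are normalized before the juxtaposition
        augmented[word] = np.concatenate([l2_normalize(vec), l2_normalize(signature)])
    return augmented
</pre>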
2.2 Context-aware Model for SA in Twitter

In (Severyn and Moschitti, 2015) a pre-training strategy is suggested for the Sentiment Analysis task: the adoption of heuristically classified tweet messages is advised to initialize the network parameters. The selection of messages is based on the presence of emoticons (Go et al., 2009) that can be related to polarities, e.g. :) and :(. However, selecting messages only with emoticons could potentially introduce many topically unrelated messages that use out-of-domain linguistic expressions, limiting the contribution of the pre-training. We instead suggest another strategy for the selection of pre-training data. We draw on the work in (Vanzo et al., 2014), where topically related messages of the target domain are selected by considering the reply-to or hashtag contexts of each message. The former (conversational context) is made of the stream of messages belonging to the same conversation in Twitter, while the latter (hashtag context) is composed of tweets preceding a target message and sharing at least one hashtag with it. In (Vanzo et al., 2014), these messages are first classified through a context-unaware SVM classifier. Here, we leverage contextual information for the selection of pre-training material for the CNN: we select the messages in the conversation context and classify them with a context-unaware classifier to produce the pre-training dataset.

3 The Unitor Classification Workflow

The SENTIPOLC challenge is made of three sub-tasks aiming at investigating different aspects of the subjectivity of short messages. The first sub-task is Subjectivity Classification, which consists in deciding whether a message expresses subjectivity or is objective. The second task is Polarity Classification: given a subjective tweet, a system should decide whether the tweet expresses a neutral, positive, negative or conflict position. Finally, the Irony Detection sub-task aims at finding whether a message expresses ironic content or not. The Unitor system tackles each sub-task with a different CNN classifier, resulting in the classification workflow summarized in Algorithm 1: a message is first classified with the Subjectivity CNN-based classifier S; if the message is classified as subjective (subjective=True), it is also processed with the other two classifiers, the Polarity classifier P and the Irony classifier I. If the message is instead classified as objective (subjective=False), the remaining classifiers are not invoked.

Algorithm 1 Unitor classification workflow.
1: function TAG(tweet T, cnn S, cnn P, cnn I)
2:   subjective = S(T)
3:   if subjective == True then
4:     polarity = P(T), irony = I(T)
5:   else
6:     polarity = none, irony = none
7:   end if
8:   return subjective, polarity, irony
9: end function
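Algorithm 1 translates directly into a small cascade function. The sketch below assumes the three trained CNN classifiers are available as callables returning the labels described above; the function and variable names are illustrative.

<pre>
def tag(tweet, subjectivity_cnn, polarity_cnn, irony_cnn):
    """Cascade of Algorithm 1: the Polarity and Irony CNNs run only on subjective tweets."""
    subjective = subjectivity_cnn(tweet)        # True / False
    if subjective:
        polarity = polarity_cnn(tweet)          # neutral / positive / negative / conflict
        irony = irony_cnn(tweet)                # ironic / not-ironic
    else:
        polarity, irony = None, None
    return subjective, polarity, irony
</pre>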
The same CNN architecture is adopted to implement all three classifiers, and tweets are modeled in the same way for the three sub-tasks. Each classifier has been specialized to the corresponding sub-task by adopting different selection policies for the training material and by adapting the output layer of the CNN to the sub-task specific classes. In detail, the Subjectivity CNN is trained over the whole training dataset with respect to the classes subjective and objective. The Polarity CNN is trained over the subset of subjective tweets, with respect to the classes neutral, positive, negative and conflict. The Irony CNN is trained over the subset of subjective tweets, with respect to the classes ironic and not-ironic.

Each CNN classifier has been trained in the two settings specified in the SENTIPOLC guidelines: constrained and unconstrained. The constrained setting refers to a system that adopts only the provided training data: for example, in the constrained setting the use of a word embedding generated from other tweets is forbidden. The unconstrained systems, instead, can also adopt other tweets in the training stage. In our work, the constrained CNNs are trained without using a pre-computed word embedding in the input layer. In order to provide input data to the neural network, we randomly initialize the word embeddings, adding them to the parameters to be estimated in the training process: in the following, we will refer to the constrained classification workflow as Unitor. The unconstrained CNNs are instead initialized with the pre-computed word embedding and DPL; notice that in this setting we do not back-propagate over the input layer. The word embedding is obtained from a corpus of about 10 million tweets downloaded in July 2016. A 250-dimensional embedding is generated according to a Skip-gram model (Mikolov et al., 2013), with the following settings: window 5 and min-count 10 with hierarchical softmax. Starting from this corpus and the generated embedding, we acquired the DPL according to the methodology described in Section 2.1. The final embedding is obtained by juxtaposing the Skip-gram vectors and the DPL, resulting in a 253-dimensional representation for about 290,000 words, as shown in Figure 1 (measures adopting only the Skip-gram vectors were pursued in the classifier tuning stage; these highlighted the positive contribution of the DPL). The resulting classification workflow made of unconstrained classifiers is called Unitor-U1. Notice that these word representations constitute a richer feature set for the CNN, yet the cost of obtaining them is negligible, as no manual activity is needed.

As suggested in (Croce et al., 2016), the contextual pre-training (see Section 2.2) is obtained by considering the conversational contexts of the provided training data. This dataset is made of about 2,200 new messages, which have been classified with the Unitor-U1 system. This set of messages is adopted to initialize the network parameters. In the following, the system adopting the pre-trained CNNs is called Unitor-U2.
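The unconstrained word embedding described above (250-dimensional Skip-gram, window 5, min-count 10, hierarchical softmax) can be reproduced, for instance, with gensim. The paper does not state which toolkit was used, so the library, the tiny dummy corpus and the variable names below are illustrative assumptions.

<pre>
from gensim.models import Word2Vec  # gensim >= 4.0 assumed ("vector_size" was "size" in older versions)

# tokenized_tweets should be the ~10M tweet corpus, one token list per tweet;
# a tiny dummy corpus is used here only to make the sketch runnable
tokenized_tweets = [
    ["buona", "fortuna", "a", "tutti", "i", "ragazzi", "domani"],
    ["che", "giornata", "pessima", "oggi"],
] * 100

model = Word2Vec(
    sentences=tokenized_tweets,
    vector_size=250,   # 250-dimensional embedding
    window=5,          # window 5
    min_count=10,      # min-count 10
    sg=1,              # Skip-gram model
    hs=1, negative=0,  # hierarchical softmax
)
embedding = {w: model.wv[w] for w in model.wv.index_to_key}
</pre>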
The CNNs have a number of hyper-parameters that should be fine-tuned. The parameters we investigated are: the size of the filters, i.e., capturing 2/3/4/5-grams, combining multiple filter sizes in the same run; the number of filters for each size, selected among 50, 100 and 200; and the dropout keep probability, selected among 0.5, 0.8 and 1.0. The final parameters have been determined over a development dataset made of 20% of the training material. Other parameters have been kept fixed: batch size (100), learning rate (0.001), number of epochs (15) and L2 regularization (0.0). The CNNs are implemented in Tensorflow (https://www.tensorflow.org/) and have been optimized with the Adam optimizer.

4 Experimental Results

In Tables 2, 3 and 4 the performances of the Unitor systems are reported, respectively for the tasks of Subjectivity Classification, Polarity Classification and Irony Detection. In Table 2 the F-0 measure refers to the F1 measure of the objective class, while F-1 refers to the F1 measure of the subjective class. In Table 3 the F-0 measure refers to the F1 measure of the negative class, while F-1 refers to the F1 measure of the positive class; notice that in this case the neutral class is mapped to a "not negative" and "not positive" classification, the conflict class is mapped to a "negative" and "positive" classification, and the F-0 and F-1 measures capture these configurations as well. In Table 4 the F-0 measure refers to the F1 measure of the not-ironic class, while F-1 refers to the F1 measure of the ironic class. Finally, F-Mean is the mean between the F-0 and F-1 values, and is the score used by the organizers for producing the final ranks.
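A short sketch of the F-Mean score as defined above, i.e., the mean of the two per-class F1 measures. The use of scikit-learn and the simple label interface are assumptions; the official SENTIPOLC scorer derives the per-class F1 values from the task-specific annotation scheme, so this is only an approximation of the ranking measure.

<pre>
import numpy as np
from sklearn.metrics import f1_score

def sentipolc_f_mean(y_true, y_pred, class0, class1):
    """F-Mean as reported in the tables below: the mean of the two per-class F1 scores.

    class0 / class1 are the task-specific labels, e.g. "objective" / "subjective"
    for Subjectivity Classification.
    """
    per_class = f1_score(y_true, y_pred, labels=[class0, class1], average=None)
    return float(np.mean(per_class))
</pre>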
System    | F-0   | F-1   | F-Mean | Rank
Unitor-C  | .6733 | .7535 | .7134  | 4
Unitor-U1 | .6784 | .8105 | .7444  | 1
Unitor-U2 | .6723 | .7979 | .7351  | 2

Table 2: Subjectivity Classification results

Notice that our unconstrained system (Unitor-U1) is the best performing system in recognizing whether a message expresses a subjective position or not, with a final F-Mean of .7444 (Table 2). Moreover, the Unitor-U2 system is also capable of adequately classifying whether a message is subjective or not. The fact that the pre-trained system does not perform as well as Unitor-U1 can be ascribed to the pre-training material actually being small. During the classifier tuning phases we also adopted the hashtag contexts (about 20,000 messages) (Vanzo et al., 2014) to pre-train our networks: the measures over the development set indicated that the hashtag contexts were probably introducing too many unrelated messages. Moreover, the pre-training material has been classified with the Unitor-U1 system itself. It could be the case that the adoption of such added material was not very effective, contrary to what is demonstrated in (Croce et al., 2016). In fact, in that work the pre-training material was classified with a totally different algorithm (Support Vector Machine) and a totally different representation (kernel-based); the different algorithm and representation produced a better and substantially different dataset, in terms of covered linguistic phenomena and their relationships with the target classes. Finally, the constrained version of our system obtained a remarkable score of .7134, demonstrating that the random initialization of the input vectors can also be adopted for the classification of the subjectivity of a message.

System    | F-0   | F-1   | F-Mean | Rank
Unitor-C  | .6486 | .6279 | .6382  | 11
Unitor-U1 | .6885 | .6354 | .6620  | 2
Unitor-U2 | .6838 | .6312 | .6575  | 3

Table 3: Polarity Classification results

In Table 3 the Polarity Classification results are reported. Also in this task, the performances of the unconstrained systems are higher with respect to the constrained one (.6620 against .6382). This demonstrates the usefulness of acquiring lexical representations and using them as inputs for the CNNs. Notice that the performances of the Unitor classifiers are remarkable, as the two unconstrained systems rank in 2nd and 3rd position. The contribution of the pre-training is not positive, contrary to what is measured in (Croce et al., 2016). Again, we believe that the problem resides in the size and quality of the pre-training dataset.

System    | F-0   | F-1  | F-Mean | Rank
Unitor-C  | .9358 | .016 | .4761  | 10
Unitor-U1 | .9373 | .008 | .4728  | 11
Unitor-U2 | .9372 | .025 | .4810  | 9

Table 4: Irony Detection results

In Table 4 the Irony Detection results are reported. Our systems do not perform well, as all the submitted systems reported a very low recall for the ironic class: for example, the Unitor-U2 recall is only .0013, while its precision is .4286. This can be due mainly to two factors. First, the CNN devoted to the classification of the irony of a message has been trained with a dataset very skewed towards the not-ironic class: in the original dataset only 868 out of 7409 messages are ironic. Second, a CNN observes local features (bi-grams, tri-grams, ...) without ever considering global constraints. Irony is not a word-level phenomenon; instead, it is related to sentence-level or even social aspects. For example, the best performing system in Irony Detection at SENTIPOLC 2014 (Castellucci et al., 2014) adopted a specific feature that estimates the violation of the paradigmatic coherence of a word with respect to the entire sentence, i.e., a global piece of information about a tweet. This is not accounted for in the CNN discussed here, and ironic sub-phrases are likely to be neglected.
5 Conclusions

The results obtained by the Unitor system at SENTIPOLC 2016 are promising, as the system won the Subjectivity Classification sub-task and placed 2nd in the Polarity Classification. While the results in Irony Detection are not satisfactory, the proposed architecture is straightforward, as its setup cost is very low: the human effort in producing data for the CNNs, i.e., the pre-training material and the acquisition of the Distributional Polarity Lexicon, is very limited. In fact, the former can be easily acquired with the Twitter Developer API, while the latter is realized through an unsupervised process (Castellucci et al., 2015a). In the future, we need to better model the irony detection problem, as the CNN adopted here is probably not best suited for such a task: irony is a more global linguistic phenomenon than the ones captured by the (local) convolutions operated by a CNN.

References

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the EVALITA 2016 SENTiment POLarity Classification Task. In Pierpaolo Basile, Anna Corazza, Franco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016). Associazione Italiana di Linguistica Computazionale (AILC).

Giuseppe Castellucci, Danilo Croce, Diego De Cao, and Roberto Basili. 2014. A multiple kernel approach for twitter sentiment analysis in Italian. In Fourth International Workshop EVALITA 2014.

Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2015a. Acquiring a large scale polarity lexicon through unsupervised distributional methods. In Proc. of 20th NLDB, volume 9103. Springer.

Giuseppe Castellucci, Andrea Vanzo, Danilo Croce, and Roberto Basili. 2015b. Context-aware models for twitter sentiment analysis. IJCoL vol. 1, n. 1: Emerging Topics at the 1st CLiC-it Conf., page 69.

Danilo Croce, Giuseppe Castellucci, and Roberto Basili. 2016. Injecting sentiment information in context-aware convolutional neural networks. Proceedings of SocialNLP@IJCAI, 2016.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford.

Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP 2014, pages 1746–1751, Doha, Qatar, October. Association for Computational Linguistics.

Tom Landauer and Sue Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. of the IEEE, 86(11), Nov.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In Proc. of SIGIR 2015, pages 959–962, New York, NY, USA. ACM.

Andrea Vanzo, Danilo Croce, and Roberto Basili. 2014. A context-based model for sentiment analysis in twitter. In Proc. of 25th COLING, pages 2345–2354.