<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>To Parse or Not to Parse: An Experimental Comparison of RNTNs and CNNs for Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zahra Ahmadi</string-name>
          <email>zaahmadi@uni-mainz.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksandrs Stier</string-name>
          <email>stier@students.uni-mainz.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Skowron</string-name>
          <email>marcin.skowron@ofai.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Kramer</string-name>
          <email>kramer@informatik.uni-mainz.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Austrian Research Institute for Artificial Intelligence</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institut für Informatik, Johannes Gutenberg-Universität</institution>
          ,
          <addr-line>Mainz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent years have seen a variety of deep learning architectures for sentiment analysis. However, little is known about their comparative performance and merits on common ground, across a variety of datasets, and at the same level of optimization. In this paper, we provide such a comparison for two popular architectures, Recursive Neural Tensor Networks (RNTNs) and Convolutional Neural Networks (CNNs). Although RNTNs have been shown to work well in many cases, they require intensive manual labeling due to the so-called vanishing gradient problem. To enable an extensive comparison of the two architectures, this paper employs two methods to automatically label the internal nodes: a rule-based method and (this time as part of the RNTN method) a convolutional neural network. This enables us to compare these RNTN models to a relatively simple CNN architecture. On almost all benchmark datasets, the CNN architecture outperforms the variants of RNTNs tested in the paper. These results suggest that CNNs already offer good predictive performance and that more research on RNTNs would be needed to further exploit sentence structure.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The advent of social media such as Twitter, blogs, ratings and reviews has created a
surge of research on sentiment analysis, especially for short texts such as
sentences [
        <xref ref-type="bibr" rid="ref16 ref17">17, 16</xref>
        ]. However, a single sentence contains only a limited amount of contextual information,
which makes predicting its sentiment challenging. To address this problem,
one may model sentences to analyze and represent their semantic content. Neural-network-based
sentence modeling approaches have been increasingly considered [
        <xref ref-type="bibr" rid="ref12 ref19 ref6">12, 19,
6</xref>
        ] because they remove the need for feature engineering and
preserve word order and syntactic structure, in contrast to the traditional
bag-of-words model, where sentences are encoded as unordered collections of words.
      </p>
      <p>
        Most existing neural network models in the context of sentence classification fall
into one of two groups: Recursive Neural Networks (RecNNs) and Convolutional
Neural Networks (CNNs). RecNNs have shown excellent abilities to model word
combinations in a sentence. However, they depend on well-performing parsers to provide the
topological structure, and such parsers are not available for many languages and do not perform
well in noisy domains. Further, they often require labeling of all phrases in sentences to
mitigate the so-called vanishing gradient problem [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. On the other hand, CNN models
apply a convolution operator sequentially on word vectors using sliding windows. Each
sentence is treated individually as a bag of n-grams, and long-range dependency
information spanning multiple sliding windows is therefore lost [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Another limitation of
CNN models is their requirement for the exact specification of their architecture and
hyperparameters [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
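      <p>The loss of long-range information can be made concrete with a small illustration of our own (a minimal Python sketch, not taken from the cited work; the function names and the example sentence are hypothetical). A width-k convolution only sees contiguous k-token windows, so with bigram filters the negation "not" and the sentiment word "good" in "not really all that good" never interact within a single layer:</p>
      <preformat>
```python
def sliding_windows(tokens, k):
    """Return the contiguous k-token windows that a width-k filter slides over."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def share_a_window(tokens, k, a, b):
    """True if tokens a and b co-occur in at least one width-k window."""
    return any(a in window and b in window for window in sliding_windows(tokens, k))


if __name__ == "__main__":
    sentence = "not really all that good".split()
    print(sliding_windows(sentence, 2))
    # Bigram filters never see "not" and "good" in the same window ...
    print(share_a_window(sentence, 2, "not", "good"))  # False
    # ... only a window spanning all five tokens contains both.
    print(share_a_window(sentence, 5, "not", "good"))  # True
```
      </preformat>
      <p>Stacking convolution and pooling layers enlarges the span of text that each feature can see, but a single layer of narrow filters remains strictly local.</p>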
      <p>We conducted extensive experiments over a range of benchmark datasets to
compare the two network architectures: RNTNs and CNNs. Our goal is to provide an
in-depth analysis of how these models perform across different settings. Such a
comparison is missing in the literature, likely because recursive networks often require
labor-intensive manual labeling of phrases, and such annotations are unavailable for many
benchmark datasets. We propose two methods to label the internal phrases automatically and
also investigate whether using constituency parsing instead of dependency parsing in the
RNTN model has an effect. In this way, we aim to contribute to a better
understanding of the limitations of the two network models and of how to improve them.</p>
      <p>The remainder of this paper is organized as follows: a brief review of the related
literature is presented in Section 2. Section 3 explains the details of the network
architectures. In Section 4, the results of the experiments on common benchmarks are discussed,
and finally, Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Neural network approaches used in sentiment analysis range from basic
Neural Bag-of-Words (NBoW) models to more expressive compositional approaches such
as RecNNs [
        <xref ref-type="bibr" rid="ref18 ref4">18, 4</xref>
        ], CNNs [
        <xref ref-type="bibr" rid="ref6 ref7">7, 6</xref>
        ], and LSTM models [
        <xref ref-type="bibr" rid="ref10 ref22">10, 22</xref>
        ].
      </p>
      <p>
        Recursive neural networks [
        <xref ref-type="bibr" rid="ref15 ref3">15, 3</xref>
        ] work by feeding an external parse tree to the
network. At every node of the tree, composition is performed bottom-up by a
weight matrix shared across all nodes of the tree. Recurrent Neural Networks (RNNs) are
a special case of recursive networks whose structure is linear rather than tree-shaped [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
An in-depth comparison of RecNNs and RNNs showed that recursive models offer useful
power when long-distance semantic dependencies play a role [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, RecNNs
model the interaction among input vectors only implicitly, whereas Recursive Neural Tensor
Networks (RNTNs) have been proposed to allow more explicit interactions [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
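      <p>The bottom-up composition with a shared weight matrix can be sketched as follows. This is a minimal pure-Python illustration assuming the common formulation p = tanh(W [c1; c2] + b), binary trees given as nested tuples, and random toy weights; the names matvec, compose and encode and the example tree are hypothetical. Because the same W is applied at every node, a left-branching chain reduces the tree to the linear, recurrent special case mentioned above:</p>
      <preformat>
```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def compose(W, b, left, right):
    """RecNN composition p = tanh(W [left; right] + b); W is shared by all nodes."""
    z = matvec(W, left + right)  # list concatenation forms [left; right]
    return [math.tanh(zi + bi) for zi, bi in zip(z, b)]


def encode(tree, embed, W, b):
    """Encode a binary tree bottom-up; leaves are words, inner nodes are pairs."""
    if isinstance(tree, str):
        return embed[tree]
    left, right = tree
    return compose(W, b, encode(left, embed, W, b), encode(right, embed, W, b))


if __name__ == "__main__":
    d = 4  # toy word vector dimension
    rng = random.Random(0)
    words = ["some", "unbelievably", "hilarious", "moments"]
    embed = {w: [rng.uniform(-1, 1) for _ in range(d)] for w in words}
    W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
    b = [0.0] * d
    # Illustrative binarized parse: (some ((unbelievably hilarious) moments))
    tree = ("some", (("unbelievably", "hilarious"), "moments"))
    print(encode(tree, embed, W, b))
    # A left-branching chain is the linear (recurrent) special case
    chain = ((("some", "unbelievably"), "hilarious"), "moments")
    print(encode(chain, embed, W, b))
```
      </preformat>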
      <p>
        CNNs, the main alternative for predicting sentiment, apply one-dimensional
convolution kernels sequentially to extract local features. Recently, new
architectures have been proposed to address the loss of long-range
dependency information in CNNs [
        <xref ref-type="bibr" rid="ref11 ref20">11, 20</xref>
        ], or to overcome the dependence of the CNN structure on a fixed
input length [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>In this section, we first present our approach to the automatic labeling of RNTNs and
then explain our proposed CNN architecture.</p>
      <p>Fig. 1: An example of an RNTN architecture with a word vector dimension of 4 for the sentiment
classification of the input sequence "some unbelievably hilarious moments" (word vectors a1 to a4,
SoftMax classifier on top), which is parsed by a constituency parser. V and W are
the tensor and the recursive weight matrix, respectively.</p>
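      <p>Figure 1 follows the RNTN formulation of Socher et al. [19], in which two child vectors a, b ∈ ℝ^d are composed into p = f([a; b]ᵀ V^[1:d] [a; b] + W [a; b]), where each of the d slices of the tensor V is a 2d × 2d matrix and W is the d × 2d recursive weight matrix. The minimal pure-Python sketch below computes one such composition with toy random weights; the names bilinear and rntn_compose are hypothetical, any bias term is omitted for brevity, and the SoftMax classifier is not shown:</p>
      <preformat>
```python
import math
import random


def bilinear(x, M, y):
    """Compute x^T M y for vectors x, y and a matrix M given as a list of rows."""
    return sum(xi * sum(m * yj for m, yj in zip(row, y)) for xi, row in zip(x, M))


def rntn_compose(a, b, V, W, f=math.tanh):
    """Return p with p_i = f([a;b]^T V[i] [a;b] + (W [a;b])_i), i = 1..d.

    V is a list of d slices (each 2d x 2d), W is d x 2d; no bias term.
    """
    ab = a + b  # concatenation [a; b], length 2d
    p = []
    for V_i, W_i in zip(V, W):  # one tensor slice and one row of W per output unit
        tensor_term = bilinear(ab, V_i, ab)
        linear_term = sum(w * x for w, x in zip(W_i, ab))
        p.append(f(tensor_term + linear_term))
    return p


if __name__ == "__main__":
    d = 4  # word vector dimension used in Figure 1
    rng = random.Random(1)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    V = [rand_matrix(2 * d, 2 * d) for _ in range(d)]
    W = rand_matrix(d, 2 * d)
    a1, a2 = rand_matrix(1, d)[0], rand_matrix(1, d)[0]  # toy word vectors
    print(rntn_compose(a1, a2, V, W))
```
      </preformat>
      <p>Setting all tensor slices to zero recovers the plain RecNN composition, which is why RNTNs are described as a generalization of RecNNs; the d · (2d)² tensor entries are also the source of the large number of parameters discussed in the results.</p>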
      <p>
        Recursive Neural Tensor Network Architecture. RNTNs [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] are a generalization of
RecNNs where the interactions among input vectors are encoded in a single
composition function (Figure 1). Here, we propose two methods to make the labeling process
automatic:
- Rule-based method: The RNTN model was originally proposed for sentiment analysis.
Hence, our first approach uses a rule-based method to determine the
opinion of a phrase. We use four types of dictionaries: a dictionary of sentiments
consisting of 6,360 entries with a sentiment range of [−3, +3], a negation dictionary
consisting of 28 entries, a dictionary of intensifier terms consisting of 47 words with
a weight range of [1, 3], and a dictionary of diminishers consisting of 26 entries with a
weight range of [−3, −1]. Each phrase is analyzed from the end backwards
to the beginning: if a sentiment term is found, we search for an intensifier/diminisher
term to increase/decrease the absolute value of the sentiment. Then we search for a
negation term. If one is found and there is no intensifier/diminisher before the
sentiment term, the sentiment is reversed; otherwise, if the phrase includes both the
negation term and an intensifier/diminisher, the sentiment is set to weak negative.
- CNN-based method: A more general approach for labeling the phrases is to use a
pre-trained CNN model. We use the architecture proposed here (described below) to train
a model at the sentence level and use the resulting model to
label the internal phrases for the RNTN. In this way, the RNTN can also be applied to
domains other than sentiment classification.
      </p>
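      <p>The rule-based labeling procedure above can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: the dictionaries contain a handful of hypothetical entries, a modifier changes the absolute sentiment additively and the result is clipped to [0, 3], modifiers and negations are only searched to the left of the sentiment term, a phrase without a sentiment term scores 0, and "weak negative" is mapped to −1:</p>
      <preformat>
```python
# Toy dictionaries with hypothetical entries; the paper's dictionaries contain
# 6,360 sentiment terms, 28 negations, 47 intensifiers and 26 diminishers.
SENTIMENT = {"good": 2.0, "hilarious": 2.0, "fine": 1.0, "bad": -2.0, "awful": -3.0}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.0, "unbelievably": 2.0}  # weights in [1, 3]
DIMINISHERS = {"slightly": -1.0, "somewhat": -1.0}  # weights in [-3, -1]
WEAK_NEGATIVE = -1.0  # assumed score for "weak negative"


def score_phrase(tokens):
    """Rule-based sentiment score in [-3, 3] for a phrase given as a token list."""
    # 1. Scan from the end backwards; the first sentiment term found is used.
    pos = next((i for i in range(len(tokens) - 1, -1, -1) if tokens[i] in SENTIMENT), None)
    if pos is None:
        return 0.0  # assumption: no sentiment term means neutral
    score = SENTIMENT[tokens[pos]]
    before = tokens[:pos]

    # 2. The nearest intensifier/diminisher left of the sentiment term
    #    increases/decreases |score| (assumed additive rule, clipped to [0, 3]).
    modifier = next((INTENSIFIERS.get(t, DIMINISHERS.get(t)) for t in reversed(before)
                     if t in INTENSIFIERS or t in DIMINISHERS), None)
    if modifier is not None:
        magnitude = min(3.0, max(0.0, abs(score) + modifier))
        score = magnitude if score > 0 else -magnitude

    # 3. A negation left of the sentiment term reverses the sentiment, unless a
    #    modifier is also present, in which case the phrase becomes weak negative.
    if any(t in NEGATIONS for t in before):
        score = WEAK_NEGATIVE if modifier is not None else -score
    return score


if __name__ == "__main__":
    for phrase in ["unbelievably hilarious moments", "not good",
                   "not very good", "slightly bad", "some moments"]:
        print(phrase, "->", score_phrase(phrase.split()))
```
      </preformat>
      <p>Applied to every node of a parse tree, such a scorer yields phrase-level labels for RNTN training; mapping the scores to discrete classes requires further thresholds, which are not specified here.</p>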
      <p>
        Convolutional Neural Network Architecture. Deep convolutional neural networks have
led to a series of breakthrough results in image classification. Although recent evidence
shows that network depth is crucial for obtaining better results [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], most
of the models in the sentiment analysis and sentence modeling literature use a simple
architecture (e.g., [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses a one-layer CNN). Inspired by the success of CNNs in image
classification, our goal is to expand the convolution and Max-Pooling layers in order to
achieve better performance by deepening the models and adding more non-linearity
to the structure.
      </p>
      <p>Fig. 2: The proposed CNN architecture, listed from the input ("some unbelievably hilarious moments") upwards: convolution layer with filters of size 1 × d; convolution layer with filters of size 2 × d; Max-Pooling layer (size = 2, stride = 2); convolution layer with filters of size 2 × d (padding = 1); Max-Pooling layer; fully connected layer; SoftMax.</p>
      <p>
        However, deeper models are also more difficult to train [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To reduce
the computational complexity, we choose small filter sizes. Through experimentation, we
arrived at a simple CNN model that consists of six layers (Figure 2). The first layer
applies 1 × d filters to the word vectors, where d is the word vector dimension. The
purpose of this layer is to derive more meaningful features from the vector of every
single word before feeding them to the rest of the network. This helps us achieve
better performance, since the original word vectors capture only sparse information
about the words’ labels. In contrast to our proposed layer, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses a
so-called non-static approach, which fine-tunes the word vectors during training.
      </p>
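      <p>Because a 1 × d filter covers exactly one word vector, the first layer amounts to applying the same learned linear map, followed by a non-linearity, to every word independently. The minimal pure-Python sketch below checks this equivalence on a toy sentence matrix; the function names, the toy weights and the ReLU activation are our own assumptions, since the activation function of the CNN is not specified here:</p>
      <preformat>
```python
def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2D cross-correlation, as computed by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[u][v] * image[r + u][c + v]
                 for u in range(kh) for v in range(kw))
             for c in range(cols)]
            for r in range(rows)]


def first_layer_as_conv(sentence, filters, biases):
    """Apply F filters of size 1 x d to an n x d sentence matrix, then ReLU."""
    maps = [conv2d_valid(sentence, k) for k in filters]  # each map is n x 1
    return [[max(0.0, maps[f][i][0] + biases[f]) for f in range(len(filters))]
            for i in range(len(sentence))]


def first_layer_as_projection(sentence, filters, biases):
    """The same layer written as one shared linear map applied to every word."""
    return [[max(0.0, sum(w * x for w, x in zip(k[0], word)) + b)
             for k, b in zip(filters, biases)]
            for word in sentence]


if __name__ == "__main__":
    sentence = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.5, 0.1]]  # n = 3, d = 3
    filters = [[[1.0, 0.0, -1.0]], [[0.5, 0.5, 0.5]]]  # F = 2 filters, each 1 x d
    biases = [0.0, 0.1]
    print(first_layer_as_conv(sentence, filters, biases))
    print(first_layer_as_projection(sentence, filters, biases))
```
      </preformat>
      <p>Both functions return an n × F matrix of new per-word features, which the subsequent 2 × d convolutions then combine across neighbouring words.</p>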
      <p>
        The second layer of our CNN model is again a convolution layer, with filters
of size 2 × d. Its output is fed into a Max-Pooling layer with a pooling size and a
stride of 2. The purpose of this intermediate Max-Pooling layer is to reduce the
dimensionality and to speed up training; it has no notable effect on the accuracy of
the resulting model. Next, the fourth layer again applies convolution filters of size
2 × d, with a padding size of 1, to the output of the previous layer. Padding preserves
the original input size. The next layer applies Max-Pooling to the whole input at once;
larger pooling sizes have been reported to lead to better results [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Finally, the last layer is a fully connected SoftMax layer, which outputs
the probability distribution over the labels.
      </p>
      <p>In this section, we first introduce the benchmark datasets and experimental settings,
then investigate the variants of RNTNs and compare their performance to that of the
proposed CNN architecture.</p>
      <p>
        We compare the models on a set of commonly used benchmark datasets (Table 1).
The Movie Review (MR) dataset<sup>3</sup> was extracted from Rotten Tomatoes reviews [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]; each review is labeled as either positive or negative. As the MR dataset does
not have a separate test set, we use 10-fold cross-validation in the experiments. An
extended version of the MR dataset, relabeled by Socher et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] in the Stanford Sentiment Treebank (SST-5)<sup>4</sup>, has five fine-grained
labels: negative, somewhat negative, neutral, somewhat positive and positive. A binary
version of the SST-5 dataset (SST-2) was created by removing the neutral sentences and
assigning the remaining ones to two classes: negative or positive. The
SemEval-2016<sup>5</sup> dataset is a set of tweets provided by the SemEval contest;
each tweet is labeled as negative, neutral or positive.
      </p>
      <p><sup>3</sup> https://www.cs.cornell.edu/people/pabo/movie-review-data/</p>
      <p><sup>4</sup> http://nlp.stanford.edu/sentiment/Data</p>
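      <p>Since MR has no separate test set, its results are obtained with 10-fold cross-validation. A minimal sketch of how such folds can be formed is shown below; the helper name k_fold_indices and the shuffling seed are hypothetical, and the paper does not state whether its folds were stratified by label (this sketch does not stratify):</p>
      <preformat>
```python
import random


def k_fold_indices(n_examples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every example appears in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```
      </preformat>
      <p>A model is trained on each training split and evaluated on the held-out fold, and the reported score is the average over the ten folds.</p>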
      <sec id="sec-3-1">
        <title>Experimental Settings</title>
        <p>
          In our experiments, we use the pre-trained GloVe [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] word vector models<sup>6</sup>: On the SemEval-2016 dataset, we use Twitter-specific
word vectors that were trained on 2 billion tweets. On the other datasets, we use the
model trained on web data from Common Crawl, which has a case-sensitive vocabulary of
2.2 million entries. In all experiments, the word vector size, the mini-batch size and
the number of epochs were set to 25, 20 and 100, respectively. We use f = tanh and a
learning rate of 0.01 in all RNTN models. In the CNN models, the numbers of filters in
the three convolutional layers are set to 100, 200 and 300, respectively, and the maximum
sentence length is 32; shorter sentences are padded with zero vectors. In RNTN models
that use constituency parsers, we use the Stanford parser [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. For the models that use dependency parsers, we use the Tweebo parser [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a dependency parser specifically developed for Twitter data, on the SemEval-2016
dataset, and the Stanford neural network dependency parser [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] on the remaining datasets.
        </p>
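        <p>With these settings (d = 25, sentences padded to 32 tokens, 100, 200 and 300 filters), the output shapes and parameter counts of the six-layer CNN can be traced layer by layer. The sketch below is our own reading of the architecture, not the authors' code: the function names are hypothetical, each convolution is assumed to span the full feature dimension of its input (so the 2 × d filters of the second and fourth layers cover 2 positions by 100 and 200 features, respectively), the padding of the fourth layer is assumed to add a single zero row so that the sequence length is preserved as stated in Section 3, and one bias per filter and per output unit is assumed:</p>
        <preformat>
```python
def out_len(n, k, pad=0, stride=1):
    """Number of output positions of a convolution/pooling window of size k."""
    return (n + pad - k) // stride + 1


def trace_cnn(max_len=32, d=25, filters=(100, 200, 300), n_classes=2):
    """Return (layer, output shape, #parameters) for each of the six layers."""
    f1, f2, f3 = filters
    layers = []
    n = out_len(max_len, 1)  # 1: conv, 1 x d filters over the word vectors
    layers.append(("conv 1 x d", (n, f1), d * f1 + f1))
    n = out_len(n, 2)  # 2: conv, filters span 2 positions x f1 features
    layers.append(("conv 2 x f1", (n, f2), 2 * f1 * f2 + f2))
    n = out_len(n, 2, stride=2)  # 3: max-pooling, size 2, stride 2
    layers.append(("max-pool 2/2", (n, f2), 0))
    n = out_len(n, 2, pad=1)  # 4: conv, 2 x f2, one zero row keeps the length
    layers.append(("conv 2 x f2 +pad", (n, f3), 2 * f2 * f3 + f3))
    layers.append(("global max-pool", (f3,), 0))  # 5: max over all positions
    layers.append(("softmax", (n_classes,), f3 * n_classes + n_classes))  # 6
    return layers


if __name__ == "__main__":
    total = 0
    for name, shape, params in trace_cnn():
        total += params
        print(f"{name:18s} {str(shape):12s} {params:8d}")
    print("total parameters:", total)  # 163702 under these assumptions
```
        </preformat>
        <p>Under these assumptions, most of the CNN's parameters sit in the fourth layer, and the model stays far smaller than an RNTN tensor over wide hidden layers; pass n_classes=3 or n_classes=5 for SemEval-2016 or SST-5.</p>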
      </sec>
      <sec id="sec-3-2">
        <title>Results</title>
        <p>In this section, we present the results of the automatic labeling of phrases and
the effect of the selected parser type, and describe the overall evaluation results for
the RNTN and CNN models. Finally, we discuss the effect of automatic labeling on the
performance of the RNTN.
- Comparison of automatic labeling methods: We first use the manually labeled
SST-5 dataset to test the effectiveness of our automatic labeling methods. We extract
all possible phrases of the whole dataset with respect to their parse trees and use
our rule-based method to label them. The accuracy of the rule-based method is 69%,
and its confusion matrix is reported in Table 3 (left). In the next step, we train a CNN
model on the training sentences and use the resulting model to label the phrases. The
accuracy of the CNN-based labeling is 40%, and the corresponding confusion matrix
is presented in Table 3 (right). Although the accuracy of the CNN model is far lower
than that of the rule-based method, the CNN classifies the positive and negative
classes correctly more often than the rule-based method does. In turn, the rule-based
method is superior in classifying the neutral class.
- Constituency parser vs. dependency parser: When a dependency parser is used instead
of a constituency parser in RNTNs (Table 2), a significant loss of performance is
visible for some datasets (e.g., MR). This is particularly noticeable when the phrases
are labeled by the CNN (e.g., from 70% to 49% on MR). A possible reason is that the
word order resulting from a dependency parse differs from that of the n-gram features
extracted by the CNN.
- RNTN vs. CNN: Table 2 shows a detailed comparison of the RNTN variants with the
CNN model and the rule-based method. With the same parameter settings, the CNN model
performs better on all datasets except SST-5. The largest improvements in F-measure
are observed on the SST-2 and SemEval-2016 datasets, where the best-performing RNTN
reaches 0.70 and 0.51, respectively, and the CNN reaches 0.77 and 0.56. Possible
reasons are the very large number of parameters that have to be optimized in the
tensor and the effect of the automatic phrase labeling applied to the RNTN. A future
research direction could therefore be to reduce this parameter space and to find a
better initialization.
- Effect of automatic labeling on RNTN performance: Table 4 compares the
performance of the RNTN trained on the manually labeled SST-5 dataset with that of
its rule-based and CNN-based automatic labeling variants. Automatic labeling results
in a significant degradation of performance on SST-5. Comparing these results with
those of the CNN model in Table 2 shows that the manually labeled RNTN outperforms the
CNN architecture in terms of both overall accuracy and F-measure. The confusion
matrices of both methods (Table 5) indicate that the RNTN is better at predicting
neutral and positive labels, while the CNN is better at classifying negative and more
positive labels. Unfortunately, no other dataset that is manually labeled at the phrase
level is currently available. A future direction could be to further evaluate the
impact of the phrase labeling accuracy on various datasets.</p>
        <p><sup>5</sup> http://alt.qcri.org/semeval2016/task4/</p>
        <p><sup>6</sup> http://nlp.stanford.edu/projects/glove/</p>
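        <p>The accuracy, per-class behaviour and F-measure compared above can all be read off a confusion matrix. The minimal sketch below computes them; the function name metrics is hypothetical, we assume the reported F-measure is macro-averaged over classes (the text does not say), and the 3-class matrix in the usage example is invented for illustration and is not taken from Tables 3 or 5:</p>
        <preformat>
```python
def metrics(cm):
    """Accuracy, per-class recall/precision and macro-averaged F1.

    cm[i][j] counts items of true class i that were predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(k))
    recall, precision, f1 = [], [], []
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])  # row sum: items whose true class is c
        predicted = sum(cm[i][c] for i in range(k))  # column sum: predicted as c
        rec = tp / actual if actual else 0.0
        prec = tp / predicted if predicted else 0.0
        recall.append(rec)
        precision.append(prec)
        f1.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return {"accuracy": correct / total, "recall": recall,
            "precision": precision, "macro_f1": sum(f1) / k}


if __name__ == "__main__":
    # Invented counts for (negative, neutral, positive); NOT the paper's results.
    cm = [[30, 15, 5],
          [10, 60, 10],
          [5, 15, 30]]
    m = metrics(cm)
    print(round(m["accuracy"], 3), [round(r, 2) for r in m["recall"]], round(m["macro_f1"], 3))
```
        </preformat>
        <p>Per-class recall is what distinguishes the two labeling methods in Table 3: a labeler can have lower overall accuracy yet higher recall on the positive and negative classes if the (large) neutral class dominates the accuracy.</p>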
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this paper, we proposed two methods to automatically label the internal nodes of
recursive networks, avoiding the labor-intensive manual labeling of phrases when
predicting the sentiment of sentences. We then conducted an in-depth study of
the RNTN model and compared it to a relatively simple CNN architecture.
Experimental results demonstrate that the proposed CNN model outperforms the RNTN
variants on almost all datasets. The findings also show that there is still room for
improving RNTNs in terms of determining tensor functions in a more informed manner.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>The authors thank PRIME Research for supporting the first author during her research
time. The third author is supported by the Austrian Science Fund (FWF): P27530-N15.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>A fast and accurate dependency parser using neural networks</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>740</fpage>
          -
          <lpage>750</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: IEEE Conference on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hermann</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blunsom</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The role of syntax in vector space models of compositional semantics</article-title>
          .
          <source>In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>894</fpage>
          -
          <lpage>904</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Irsoy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Deep recursive neural networks for compositionality in language</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>2096</fpage>
          -
          <lpage>2104</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjunatha</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyd-Graber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daumé III</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Deep unordered composition rivals syntactic methods for text classification</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>1681</fpage>
          -
          <lpage>1691</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kalchbrenner</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blunsom</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A convolutional neural network for modelling sentences</article-title>
          .
          <source>In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>655</fpage>
          -
          <lpage>665</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Accurate unlexicalized parsing</article-title>
          .
          <source>In: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics</source>
          . pp.
          <fpage>423</fpage>
          -
          <lpage>430</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swayamdipta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhatia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>A dependency parser for tweets</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1001</fpage>
          -
          <lpage>1012</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>When are tree structures necessary for deep learning of representations?</article-title>
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</source>
          . pp.
          <fpage>2304</fpage>
          -
          <lpage>2314</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Dependency-based convolutional neural networks for sentence embedding</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          . pp.
          <fpage>174</fpage>
          -
          <lpage>179</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karafiát</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burget</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Černocký</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khudanpur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Recurrent neural network based language model</article-title>
          .
          <source>In: 11th Annual Conference of the International Speech Communication Association</source>
          . pp.
          <fpage>1045</fpage>
          -
          <lpage>1048</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales</article-title>
          .
          <source>In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>115</fpage>
          -
          <lpage>124</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Pollack</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>Recursive distributed representations</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>46</volume>
          (
          <issue>1</issue>
          ),
          <fpage>77</fpage>
          -
          <lpage>105</lpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Reforgiato Recupero</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nuzzolese</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Sentilo: Frame-based sentiment analysis</article-title>
          .
          <source>Cognitive Computation</source>
          <volume>7</volume>
          (
          <issue>2</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>225</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Rexha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kröll</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dragoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Polarity classification for target phrases in tweets: A word2vec approach</article-title>
          . In:
          <string-name>
            <surname>Sack</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinmetz</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mladenić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web: ESWC 2016 Satellite Events</source>
          . pp.
          <fpage>217</fpage>
          -
          <lpage>223</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huval</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Semantic compositionality through recursive matrix-vector spaces</article-title>
          .
          <source>In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</source>
          . pp.
          <fpage>1201</fpage>
          -
          <lpage>1211</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perelygin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chuang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>C.P.</given-names>
          </string-name>
          :
          <article-title>Recursive deep models for semantic compositionality over a sentiment treebank</article-title>
          .
          <source>In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1631</fpage>
          -
          <lpage>1642</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Dependency sensitive convolutional neural networks for modeling sentences and documents</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>1512</fpage>
          -
          <lpage>1521</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification</article-title>
          .
          <source>CoRR abs/1510.03820</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sobhani</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory over recursive structures</article-title>
          .
          <source>In: Proceedings of the 32nd International Conference on Machine Learning</source>
          . pp.
          <fpage>1604</fpage>
          -
          <lpage>1612</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>