<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Short text classification using deep representation: A case study of Spanish tweets in Coset Shared Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erfaneh Gharavi</string-name>
          <email>e.gharavi@ut.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kayvan Bijari</string-name>
          <email>kayvan.bijari@ut.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of New Science and Technologies, University of Tehran</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>28</fpage>
      <lpage>35</lpage>
      <abstract>
        <p>Topic identification, as a specific case of text classification, is one of the primary steps toward knowledge extraction from raw textual data. In such tasks, words are dealt with as a set of features. Due to the high dimensionality and sparseness of the feature vectors resulting from traditional feature selection methods, most of the text classification methods proposed for this purpose lack performance and accuracy. In dealing with tweets, which are limited in the number of words, the aforementioned problems are reflected more than ever. In order to alleviate such issues, we propose a new topic identification method for Spanish tweets based on the deep representation of Spanish words. In the proposed method, words are represented as multi-dimensional vectors; in other words, words are replaced with their equivalent vectors, which are calculated based on a transformation of the raw text data. An average aggregation technique is used to transform the word vectors into a tweet representation. Our model is trained on the deep vectorized representation of the tweets, and an ensemble of different classifiers is used for Spanish tweet classification. The best result was obtained by a fully connected multi-layer neural network with three hidden layers. The experimental results demonstrate the feasibility and scalability of the proposed method.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep representation</kwd>
        <kwd>Word Vector Representation</kwd>
        <kwd>Spanish Tweet Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Topic identification is one of the primary steps toward text understanding. Due to the large amount of existing text, this process has to be done automatically. A number of topic identification applications are presented in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        People use social media to state their ideas about social events. People's tweets capture researchers' attention on different issues, and they have been used for a variety of purposes, such as marketing communication, education, and politics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This is especially true of political events such as elections, which make individuals active and lead to thousands of tweets. Topic identification of a given tweet is the first step toward its analysis.
      </p>
      <p>
        There are many challenges in classifying tweets, including colloquialism, spelling variation, use of special characters, violation of regular grammar rules, etc. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, such short texts do not provide sufficient word occurrences.
      </p>
      <p>
        In order to work with textual data, it must be represented numerically so that computers can process it. In traditional approaches, words are considered as distinct features for representing textual data. Such representations suffer from sparsity and an inability to detect synonyms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To avoid these issues, which require time-consuming feature engineering, deep learning techniques are used, and they have proven their competency in many applications such as NLP [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The essential goal of deep learning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is to improve all procedures of NLP in an efficient way. Deep representation of text data makes it easy to compare words and sentences, as well as minimizing the need for lexicons.
      </p>
      <p>
        In this paper, we use deep text representation to classify Spanish tweets. We apply the method developed in our previous work [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to this classification shared task in order to assess the feasibility of our approach across a variety of natural language processing tasks in different languages.
      </p>
      <p>The structure of this paper is as follows: first, related work is reviewed in Section 2; deep representation of text is described in Section 3. An introduction to tweet classification using deep representation is given in Section 4. Section 5 deals with an introduction to the evaluation metric and the results of the proposed approach over the provided data sets. Finally, Section 6 concludes the paper with some insights.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>In this section, some well-known methods for text classification are described, and in particular some recently presented tweet classification approaches are discussed and reviewed.</p>
      <p>
        In the work of Ghavidel et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], after selecting keywords using the standard term frequency/inverse document frequency (TF-IDF) weighting, a vector space method was applied to classify a Persian text corpus consisting of 10 categories. Li et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] classify text by combining Support Vector Machine (SVM) and K-nearest neighbor classifiers, achieving a precision of 94.11% over 12 topics. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], BINA et al. applied the K-nearest neighbor algorithm to the Hamshahri corpus, using statistical features including 3-grams and 4-grams together with three similarity criteria: Manhattan, Dice, and dot product. As a feature for text classification, emoticons are used in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] for tweet topic classification. Dilrukshi et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] classified Twitter feeds using an SVM text classifier. Bakliwal et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used the Stanford and Mejaj data sets to find sentiments in tweets, assigning a positive and a negative probability to each word. Wikipedia and WordNet are used in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to cluster short texts accurately. Malkani &amp; Gillie [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used supervised SVM, Neural Network (NN), Naive Bayes (NB), and Random Forest classifiers for Twitter topic classification, with SVM outperforming the other supervised algorithms. Zubiaga et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] extracted 15 language-independent features from trending topics, trained an SVM classifier on them, and used it to classify trending topics.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Deep Text Representation</title>
      <p>
        Deep learning tries to find more abstract features using a deep multi-layer graph, in which each layer applies a linear or non-linear function to transform the data into a more abstract representation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The hierarchical nature of concepts makes this kind of feature representation a suitable approach for natural language processing. Advantages of using deep methods for NLP tasks are listed below:
– No hand-crafted feature engineering is required
– Fewer features are needed in comparison to traditional methods
– No labeled data is required [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Word Vector Representation</title>
        <p>
          In applications of deep representation to natural language processing, each word is described by its surrounding context. The resulting vector, which is generated by a deep neural network, contains semantic and syntactic information about the word. In distributed word representation, generally known as word embedding, similar words have similar vectors [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Skip-gram and continuous bag of words, which are employed in this study, are two-layer neural networks that are trained on a language modeling task [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
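        <p>As a minimal illustrative sketch (not the authors' code), skip-gram word vectors of this kind can be trained with the gensim library; the toy corpus and the hyperparameters below are assumed values:</p>
        <preformat>
# Illustrative sketch (not the authors' code): training skip-gram word
# vectors with gensim; the toy corpus and hyperparameters are assumed.
from gensim.models import Word2Vec

corpus = [
    ["el", "candidato", "habla", "de", "la", "campana"],
    ["los", "resultados", "de", "la", "eleccion"],
]

# sg=1 selects skip-gram; sg=0 would select continuous bag of words.
model = Word2Vec(sentences=corpus, vector_size=300, window=5,
                 sg=1, min_count=1)

vec = model.wv["candidato"]  # a 300-dimensional word vector
        </preformat>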
      </sec>
      <sec id="sec-3-2">
        <title>Text Document Vector Representation</title>
        <p>
          A composition function should be provided to combine word vectors into a text representation. Paragraph Vector is an unsupervised algorithm that uses the idea of word vector training and assigns a matrix to each piece of text. This matrix is also updated during the language modeling task. Paragraph Vector outperforms other methods such as bag-of-words models in many applications [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Socher [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] introduces Recursive Deep Learning methods, which are variations and extensions of unsupervised and supervised recursive neural networks (RNNs). This method encodes two word vectors into one vector using auto-encoder networks. Socher also presents many variations of these deep combination functions, such as Matrix-Vector Recursive Neural Networks (MV-RNN) [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. There are also some simple mathematical methods that are applied as composition functions and generally used as benchmarks [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
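        <p>For concreteness, a minimal sketch of the Paragraph Vector idea using gensim's Doc2Vec is given below; the documents and parameters are assumed, not taken from this paper:</p>
        <preformat>
# Illustrative sketch of the Paragraph Vector idea via gensim's Doc2Vec;
# documents and hyperparameters are assumed, not taken from this paper.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["el", "debate", "electoral"], tags=["d0"]),
    TaggedDocument(words=["la", "vida", "del", "candidato"], tags=["d1"]),
]

# Each document gets its own trainable vector, updated alongside the
# word vectors during the language-modeling task.
model = Doc2Vec(documents=docs, vector_size=50, min_count=1, epochs=20)

new_vec = model.infer_vector(["el", "debate"])  # vector for unseen text
        </preformat>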
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Tweet Classification using Deep Text Representation</title>
      <p>In this section, we describe our approach to Spanish tweet classification. The steps include pre-processing, which covers the text refinement process; a composition function that combines word embeddings to construct a representation of each tweet; and the classification algorithms applied to classify the tweets into the aforementioned categories. Figure 1 shows the steps of the tweet topic identification procedure.</p>
      <fig id="fig1">
        <caption><p>Fig. 1. Steps of the tweet topic identification procedure: Spanish tweets are pre-processed (stop-word removal, ...), sentence vectors are calculated, the deep representation of the training tweets is used to train a classifier, and the deep representation of the test tweets is classified by the trained model.</p></caption>
      </fig>
      <sec id="sec-4-1">
        <title>Pre-processing</title>
        <p>As the first step of processing, the following pre-processing operations are applied to every block of text, i.e., each tweet: elimination of special characters such as "&amp;", "(", ")", "#", and removal of all numbers.</p>
        <p>Another common pre-processing step is stop-word removal. In this regard, we simply omit the words in a list of Spanish stop words available on-line (http://www.ranks.nl/stopwords/spanish). This list includes 178 stop words such as "una", "es", "soy", and "vamos".</p>
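        <p>A minimal sketch of this pre-processing step is shown below, assuming a tiny sample of the stop-word list; the helper name is illustrative only:</p>
        <preformat>
# Illustrative sketch of the pre-processing step; the stop-word set is
# a tiny assumed sample of the 178-word on-line list, and `preprocess`
# is a hypothetical helper name.
import re

STOP_WORDS = {"una", "es", "soy", "vamos"}

def preprocess(tweet):
    # Eliminate special characters and all numbers.
    tweet = re.sub(r"[&amp;()#]", " ", tweet)
    tweet = re.sub(r"\d+", " ", tweet)
    # Tokenize and drop stop words.
    tokens = tweet.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Vamos a las urnas #eleccion2015 (Madrid)"))
# ['a', 'las', 'urnas', 'eleccion', 'madrid']
        </preformat>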
      </sec>
      <sec id="sec-4-2">
        <title>Tweet Representation</title>
        <p>We first retrieve 300-dimensional Spanish word vectors available on-line (http://crscardellino.me/SBWCE/). Spanish stop words are then eliminated during text pre-processing. After that, for each sentence, the average of all word vectors is calculated as in equation (1).</p>
        <p>S = (1/n) ∑<sub>i=1</sub><sup>n</sup> w<sub>i</sub>  (1)</p>
        <p>where S is the vector representation of the tweet, w<sub>i</sub> is the word vector of the i-th word of the sentence, and n is the number of words in that sentence.</p>
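        <p>A minimal sketch of this averaging composition, assuming a gensim-style embedding model as in Section 3.1, could look as follows:</p>
        <preformat>
# Illustrative sketch of equation (1): a tweet vector as the average of
# its word vectors; `model` is a gensim-style embedding model (assumed).
import numpy as np

def tweet_vector(tokens, model, dim=300):
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    if not vecs:
        return np.zeros(dim)      # tweet contains no known words
    return np.mean(vecs, axis=0)  # S = (1/n) * sum_{i=1..n} w_i
        </preformat>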
        <p>We represent each tweet in the train and test corpora by this approach. For each tweet in the training, validation, and test corpora, we obtain a 300-dimensional text representation. These vectors are the feature vectors required to classify the tweets.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Evaluation</title>
      <p>In this section, the dataset used in the Coset Shared Task is first described. Then the evaluation metric for tweet classification is defined.</p>
      <sec id="sec-5-1">
        <title>Dataset</title>
        <p>The dataset provided in the COSET Shared Task consists of 5 issue categories: political issues, related to the most abstract electoral confrontation; policy issues, about sectoral policies; personal issues, on the life and activities of the candidates; campaign issues, related to the evolution of the campaign; and other issues. The tweets are written in Spanish and discuss the 2015 Spanish General Election. The training set consists of 2242 tweets. The development set contains 250 tweets to help participants train their models before testing them on 624 test tweets (http://mediaflows.es/coset/).</p>
      </sec>
      <sec id="sec-5-2">
        <title>Evaluation Criteria</title>
        <p>The metric used for evaluating the participating systems was the macro-averaged F1 measure, shown in equation (2). This metric considers the precision and the recall of the systems' predictions, combining them using the harmonic mean. Given that the classes are not balanced, the Coset committee proposed the macro-averaging method to prevent systems from being biased towards the most populated classes.</p>
        <p>F1<sub>macro</sub> = (1/|L|) ∑<sub>l∈L</sub> F1(y<sub>l</sub>, ŷ<sub>l</sub>)  (2)</p>
        <p>where L is the set of class labels, and y<sub>l</sub> and ŷ<sub>l</sub> are the true and predicted assignments for label l.</p>
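        <p>For illustration, the macro-averaged F1 measure can be computed with scikit-learn; the example labels below are made up:</p>
        <preformat>
# Illustrative computation of the macro-averaged F1 measure of
# equation (2) with scikit-learn; the labels are made-up examples.
from sklearn.metrics import f1_score

y_true = ["political", "policy", "personal", "campaign", "policy"]
y_pred = ["political", "policy", "campaign", "campaign", "personal"]

# 'macro' averages per-class F1 scores with equal weight per class,
# so the most populated classes do not dominate the score.
print(f1_score(y_true, y_pred, average="macro"))
        </preformat>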
      </sec>
      <sec id="sec-5-3">
        <title>Results</title>
        <p>
          The results of applying the proposed method to the Spanish tweet training corpus are presented in Table 1. The final results as well as the rankings are reported in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>Vectorized tweets are trained and tested with different learning algorithms, such as random forest, support vector machine with a linear kernel, naive Bayes, and logistic regression. Table 1 shows the results achieved by the different learning algorithms on the Spanish corpus.</p>
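        <p>An illustrative sketch of this comparison using scikit-learn is shown below; the data are random placeholders standing in for the tweet vectors and topic labels:</p>
        <preformat>
# Illustrative comparison of the classifiers named above on
# 300-dimensional tweet vectors; the data are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

X_train = np.random.rand(2242, 300)      # stand-in tweet vectors
y_train = np.random.randint(0, 5, 2242)  # stand-in topic labels
X_dev = np.random.rand(250, 300)
y_dev = np.random.randint(0, 5, 250)

for clf in [RandomForestClassifier(), LinearSVC(),
            GaussianNB(), LogisticRegression(max_iter=1000)]:
    clf.fit(X_train, y_train)
    pred = clf.predict(X_dev)
    print(type(clf).__name__, f1_score(y_dev, pred, average="macro"))
        </preformat>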
        <p>Furthermore, since neural network structures usually grasp a better intuition of deeply extracted features from datasets, the vectorized sentences are fed into a multi-layer perceptron neural network with three hidden layers. Table 2 shows the results of the neural network and its characteristics over the given dataset.</p>
        <p>Based on the experimental results of the studied algorithms, it is clear that the neural network structure is a strong choice for dealing with vectorized sentences in the Twitter topic classification task. In this regard, after a number of trials, a multi-layer perceptron neural network with 3 hidden layers containing 500, 240, and 100 neurons, respectively, was selected for the Coset shared task.</p>
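        <p>A sketch of the selected architecture using scikit-learn's MLPClassifier is given below; the layer sizes follow the paper, while the activation, solver, and iteration count are assumed:</p>
        <preformat>
# Sketch of the selected network: a multi-layer perceptron with three
# hidden layers of 500, 240, and 100 neurons (sizes from the paper);
# activation, solver, and iteration count are assumed values.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(500, 240, 100),
                    activation="relu", solver="adam", max_iter=300)
# mlp.fit(X_train, y_train)  # X_train: 300-dimensional tweet vectors
# pred = mlp.predict(X_dev)
        </preformat>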
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        In this paper, we proposed a method for topic identification, as a typical case of text categorization, on Spanish tweets using the deep representation of words. Results from experimental evaluations showed the feasibility of our approach. With no hand-engineered features and a simple composition function, we achieved 55% in terms of the F1<sub>macro</sub> score. The best method reported an F1<sub>macro</sub> of 64% [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>In order to improve the proposed method, we can consider the following ideas. In this work, we used only the words included in the Spanish word-vector list; incorporating stemming would help us find more of a tweet's words on that list. We can also provide methods to deal with out-of-vocabulary words. In addition, handling unknown topics in the proposed method is another open issue in this area.</p>
      <p>Furthermore, given its favorable runtime, our proposed method is scalable, applicable to a large number of documents, and usable for practical purposes. As future work, we are going to apply other composition functions and also try word-by-word vector comparison in order to eliminate the drawbacks of the current method.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the reviewers for providing helpful comments and recommendations, which improved the paper significantly.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A Survey of Text Classification Algorithms</article-title>
          .
          <source>Mining Text Data</source>
          pp.
          <volume>163</volume>
          –
          <issue>222</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bakliwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madhappan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapre</surname>
          </string-name>
          , N.:
          <article-title>Mining sentiments from tweets</article-title>
          .
          <source>Proceedings of the WASSA</source>
          <volume>12</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Batool</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khattak</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maqbool</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Precise tweet classification and sentiment analysis</article-title>
          .
          <source>In: 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS)</source>
          . pp.
          <volume>461</volume>
          –
          <fpage>466</fpage>
          .
          IEEE
          (jun
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning Deep Architectures for AI</article-title>
          .
          <source>Foundations and Trends® in Machine Learning</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          –
          <fpage>127</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducharme</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janvin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A Neural Probabilistic Language Model</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>3</volume>
          ,
          <issue>1137</issue>
          –
          <fpage>1155</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>BINA</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AHMADI</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>RAHGOZAR</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Farsi Text Classification Using N-Grams and KNN Algorithm: A Comparative Study</article-title>
          .
          <source>In: Data mining</source>
          . pp.
          <volume>385</volume>
          –
          <issue>39</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A unified architecture for natural language processing: Deep neural networks with multitask learning</article-title>
          .
          <source>Proceedings of the 25th international conference on Machine learning</source>
          pp.
          <volume>160</volume>
          –
          <issue>167</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuksa</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Natural Language Processing (Almost) from Scratch</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <volume>2493</volume>
          –
          <fpage>2537</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dilrukshi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Zoysa</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caldera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Twitter news classification using SVM</article-title>
          .
          <source>In: Proceedings of the 8th International Conference on Computer Science and Education</source>
          ,
          ICCSE
          <year>2013</year>
          . pp.
          <volume>287</volume>
          –
          <issue>291</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gharavi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bijari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veisi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zahirnia</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A Deep Learning Approach to Persian Plagiarism Detection</article-title>
          .
          <source>Working notes of FIRE 2016 - Forum for Information Retrieval Evaluation</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ghavidel Abdi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazirnezhad</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahrani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Persian text classification</article-title>
          .
          <source>In: 4th conference on Information and Knowledge Technology</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gimenez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baviera</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Llorca</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamir</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Overview of the 1st Classification of Spanish Election Tweets Task at IberEval 2017</article-title>
          .
          <article-title>In: Notebook Papers of 2nd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL), Murcia</article-title>
          , Spain,
          <source>September 19, CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osindero</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teh</surname>
            ,
            <given-names>Y.W.:</given-names>
          </string-name>
          <article-title>A Fast Learning Algorithm for Deep Belief Nets</article-title>
          .
          <source>Neural Computation</source>
          <volume>18</volume>
          (
          <issue>7</issue>
          ),
          <volume>1527</volume>
          –
          <fpage>1554</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chua</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          :
          <article-title>Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge</article-title>
          .
          <source>Proceedings of the 18th ACM conference on Information and knowledge management</source>
          pp.
          <volume>919</volume>
          –
          <issue>928</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <source>Distributed Representations of Sentences and Documents</source>
          <volume>32</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Two-level hierarchical combination method for text classification</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>38</volume>
          (
          <issue>3</issue>
          ),
          <year>2030</year>
          –
          <year>2039</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Malkani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gillie</surname>
          </string-name>
          , E.:
          <article-title>Supervised Multi-Class Classification of Tweets (</article-title>
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Cachopo</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>Improving Methods for Single-label Text Categorization (</article-title>
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. Mitchell, J.,
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Composition in Distributional Models of Semantics</article-title>
          .
          <source>Cognitive Science</source>
          <volume>34</volume>
          (
          <issue>8</issue>
          ),
          <volume>1388</volume>
          –
          <fpage>1429</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Read</surname>
          </string-name>
          , J.:
          <article-title>Using Emoticons to reduce Dependency in Machine Learning Techniques for Sentiment Classification</article-title>
          . ACL Student Research workshop (June),
          <volume>43</volume>
          –
          <fpage>48</fpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Socher</surname>
          </string-name>
          , R.:
          <article-title>Recursive Deep Learning for Natural Language Processing and Computer Vision</article-title>
          .
          <source>PhD thesis (August)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huval</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          :
          <article-title>Semantic compositionality through recursive matrix-vector spaces</article-title>
          .
          <source>In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning</source>
          . pp.
          <volume>1201</volume>
          –
          <fpage>1211</fpage>
          .
          Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zubiaga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fresno</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Real-time classification of Twitter trends</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>66</volume>
          (
          <issue>3</issue>
          ),
          <volume>462</volume>
          –
          <fpage>473</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>