<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Isabel Segura-Bedmar</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>LABDA's early steps toward Multimodal Stance detection</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Carlos III de Madrid</institution>
          ,
          <addr-line>Leganes 28911, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0002</volume>
      <fpage>180</fpage>
      <lpage>186</lpage>
      <abstract>
        <p>In this paper, we describe our participation in the task on MultiModal Stance Detection in Tweets on Catalan 1Oct Referendum. Tweets are cleaned and represented using the simple Bag-of-Words approach with tf-idf vectors. Then, we explore the most widely used and efficient classifiers in text classification. Some algorithms are adapted to multi-class learning by using the one-versus-all strategy because they are naturally binary. For each algorithm, we perform a grid search over all combinations of its parameters in order to find the set of parameters that provides the most accurate model. Our system employing text and context obtains the top macro F1 (28.02%) for Spanish tweets.</p>
      </abstract>
      <kwd-group>
        <kwd>Multimodal stance detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The goal of stance detection is to determine the stance of the author of a
text with respect to a specific topic. The stance can take the following values:
favor (positive), against (negative) or neutral [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In recent years, several shared
tasks on stance detection have been organized such as SemEval-2016 Task 6:
Detecting Stance in Tweets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Stance and Gender detection in tweets on
Catalan Independence (StanceCat 2017) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In 2018, a new shared task,
MultiStanceCat, was organized with the goal of detecting the author's stance on
the Catalan independence referendum, which was held on 1 October 2017. The
MultiStanceCat 2018 task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] goes one step further than these previous shared
tasks: it provides not only the text of each tweet, but also its
previous and next tweets and the images from the author's timeline. Thus,
participating systems can develop approaches that exploit both text and images to
infer the stance expressed in the tweets.
Owing to a lack of time and of experience in computer vision, we decided to use an
approach that exploits only the text of the tweets. Stance detection can be
formulated as a multi-class problem with three classes (FAVOR, AGAINST and
NEUTRAL).
      </p>
      <p>
        We performed an exhaustive evaluation of the most widely used and efficient
classifiers in text classification. In particular, we used the following algorithms:
Multinomial Naive Bayes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Linear Support Vector Machine (SVM) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Logistic
Regression [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], k-Nearest Neighbours algorithm (k-NN) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Decision Trees [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and Random Forest [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As some of these classifiers were used in their binary form (Linear SVM,
Logistic Regression and Multinomial NB), they were adapted to the multi-class
classification problem by using the one-versus-all strategy. All the experiments
were conducted in Python using scikit-learn for classification.
      </p>
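      <p>A minimal sketch of the one-versus-all strategy, assuming scikit-learn's OneVsRestClassifier wrapper around LinearSVC. The toy tweets and this particular wiring are illustrative assumptions, not the exact configuration used in the experiments.</p>
      <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy tweets and labels, invented purely for illustration.
tweets = [
    "votarem si al referendum",
    "we support the referendum",
    "the referendum is illegal",
    "no to the referendum",
    "the referendum takes place on 1 October",
    "polling stations open at nine",
]
labels = ["FAVOR", "FAVOR", "AGAINST", "AGAINST", "NEUTRAL", "NEUTRAL"]

features = TfidfVectorizer().fit_transform(tweets)

# One binary LinearSVC is trained per class (that class versus the rest).
ovr = OneVsRestClassifier(LinearSVC())
ovr.fit(features, labels)

print(ovr.classes_)          # the three stance labels
print(len(ovr.estimators_))  # one binary classifier per class
```
      </preformat>
      <p>At prediction time, the wrapper returns the class whose binary classifier gives the highest decision score.</p>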
      <p>The Multinomial Naive Bayes classifier has proven very effective for text
classification. It is a probabilistic model based on Bayes' theorem. This
classifier calculates the probability of each text belonging to each class and then
selects the class with the maximum probability. The adjective naive comes from
the assumption that all features are independent given the class. Although such
an independence assumption is not usually true, the algorithm often performs
surprisingly well and is computationally fast. Moreover, it requires a small
amount of training data, is very easy to implement and is also very scalable.
Despite its simplicity, the Naive Bayes classifier often outperforms more sophisticated
classification algorithms.</p>
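      <p>The decision rule described above can be sketched in a few lines of plain Python. This is an illustrative toy, assuming whitespace tokenization and Laplace (add-one) smoothing; the function names train_nb and predict_nb are hypothetical and this is not the scikit-learn implementation used in the experiments.</p>
      <preformat>
```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(doc, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in doc.split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood of the word given the class.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

docs = ["yes vote yes", "vote yes", "no vote", "no no vote"]
labels = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
model = train_nb(docs, labels)
print(predict_nb("yes yes", *model))  # FAVOR
```
      </preformat>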
      <p>
        SVM, perhaps one of the most popular and successful classifiers, is a
non-probabilistic linear classifier that tries to find the hyperplane that best separates
the classes, maximizing the margin between them while, at the same time,
minimizing the number of misclassification errors. The main reason for its success
is that most text classification problems are linearly separable [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Moreover,
SVM is able to learn, irrespective of the dimensionality of the feature space,
because it is based on maximization of the margin, not the number of features [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
If the classes are separable by a wide margin, then the model will be able to
generalize even with a very large number of features. There are several kernel
functions, such as the linear, polynomial, sigmoid and radial basis
function (RBF) kernels. A kernel function maps the input space into a
high-dimensional space where the problem can be represented as a linear problem.
The linear kernel is much faster, while the RBF kernel generally provides better performance.
However, when the number of features is large, which is typical in text
classification, the RBF kernel does not provide better performance than the linear
kernel. In our experiments, we only tried the linear kernel.
      </p>
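      <p>The trade-off between a wide margin and few misclassification errors is usually written as the soft-margin objective below. This is the standard textbook formulation, given here only for reference; the symbols w, b, xi_i and C (the penalty on margin violations) are notation introduced for this sketch, with x_i the feature vector and y_i in {-1, +1} the label of training instance i.</p>
      <preformat>
```latex
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \geq 1 - \xi_i, \qquad \xi_i \geq 0, \qquad i = 1, \dots, n
```
      </preformat>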
      <p>Logistic Regression is a linear classifier, which can be used to predict the
probability of an event. Its main advantage is that its results are easier
to interpret than those obtained by other classification algorithms. Moreover,
this algorithm provides a regularization parameter to avoid overfitting. Among
its disadvantages, it requires much more data than other classifiers to obtain
stable and accurate results. Moreover, it is not able to capture complex
relationships in the data.</p>
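      <p>The effect of the regularization parameter can be illustrated with a small sketch, assuming scikit-learn's LogisticRegression, where the parameter C is the inverse of the regularization strength. The data are synthetic and the two values of C are arbitrary choices made for this example.</p>
      <preformat>
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic two-feature dataset, invented for illustration only.
X = np.array([[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A smaller C means a stronger penalty on large coefficients.
strong = LogisticRegression(C=0.01).fit(X, y)
weak = LogisticRegression(C=100.0).fit(X, y)

print(np.linalg.norm(strong.coef_))  # norm under strong regularization
print(np.linalg.norm(weak.coef_))    # norm under weak regularization

# predict_proba returns the estimated probability of each class.
print(weak.predict_proba([[0.1, 0.1]]))
```
      </preformat>
      <p>Stronger regularization shrinks the coefficients towards zero, which limits how closely the model can fit the training data.</p>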
      <p>k-NN is one of the simplest classification algorithms. It is based on the idea
that the closer two instances are, the more likely they are to belong to the same class.
One of its main characteristics is that it is a lazy classifier: it
does not build a model from the training dataset, but rather compares
the test instance with all training instances to determine its class. Moreover, the classifier
does not depend on the data distribution.</p>
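      <p>The lazy behaviour described above can be sketched in plain Python. This is an illustrative toy, assuming Euclidean distance and majority voting; the function name knn_predict and the two-dimensional points are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No model is built beforehand: all the work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["AGAINST", "AGAINST", "AGAINST", "FAVOR", "FAVOR", "FAVOR"]

print(knn_predict(points, labels, (0.15, 0.15)))  # AGAINST
print(knn_predict(points, labels, (1.0, 0.95)))   # FAVOR
```
      </preformat>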
      <p>Random forest is an ensemble classifier that builds a collection of decision trees, each
trained on examples randomly selected from the training data. The final prediction is
calculated by aggregating the predictions of each tree. Learning from different trees
helps to mitigate overfitting as well as errors due to bias and variance in
the decision trees. Random forests are more robust and generally exhibit better
results than decision trees.</p>
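      <p>A minimal sketch of this ensemble, assuming scikit-learn's RandomForestClassifier. The synthetic data, the number of trees and the random seed are arbitrary choices made for this example.</p>
      <preformat>
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, invented for illustration: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)), rng.normal(2.0, 0.3, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)

# Each tree is fitted on a bootstrap sample of the training data
# (bootstrap=True is the scikit-learn default).
forest = RandomForestClassifier(n_estimators=25, bootstrap=True, random_state=0)
forest.fit(X, y)

# The forest aggregates the individual trees: scikit-learn averages
# their predicted class probabilities and picks the most probable class.
print(len(forest.estimators_))
print(forest.predict([[0.1, -0.1], [2.1, 1.9]]))
```
      </preformat>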
      <p>Figure 1 shows the distribution of the tweets labelled with stance in the
training dataset. Most of the tweets written in Catalan are clearly in favor of
holding the referendum. However, for the tweets written in Spanish, the stance
is distributed more evenly across the three classes: AGAINST and
NEUTRAL have a very similar number of tweets (above 3,200), while FAVOR is
the class with the fewest tweets (around 2,000). This more balanced distribution may
help the algorithms learn from the Spanish tweets, while the task may be
more difficult for tweets written in Catalan because the classes are not balanced
(there are very few instances for the AGAINST class). The training dataset
contains a total of 8,764 tweets written in Spanish and 9,009 written in Catalan.</p>
      <p>We performed some experiments to determine whether the context tweets
could help in the task. The results were positive, so we decided to
include the previous and next tweets of each tweet as part of its text. We also tried adding
the StanceCat 2017 dataset; however, the experiments showed that it
did not improve the performance. Thus, the StanceCat 2017 dataset was finally
not used for training our system. The tweets were represented using the simple
Bag-of-Words (BoW) approach, but instead of using raw word frequencies, we
used term frequency-inverse document frequency (tf-idf) weights to measure the relevance of each word
in the whole collection of tweets. To do this, we used the TfidfVectorizer class
to convert the tweets into tf-idf values.</p>
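      <p>The representation step can be sketched as follows, assuming that the context is added by simple concatenation and that TfidfVectorizer is used with its default settings. The record layout, the helper with_context and the example tweets are hypothetical; the actual preprocessing and vectorizer parameters are not reproduced here.</p>
      <preformat>
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical records: each tweet with its previous and next tweets.
records = [
    {"previous": "tomorrow we vote", "text": "yes to the referendum", "next": "see you at the polls"},
    {"previous": "this is not legal", "text": "no to the referendum", "next": "the court has ruled"},
    {"previous": "", "text": "polling stations open at nine", "next": "more news soon"},
]

def with_context(record):
    """Join the previous tweet, the tweet itself and the next tweet into one text."""
    parts = (record["previous"], record["text"], record["next"])
    return " ".join(part for part in parts if part)

documents = [with_context(r) for r in records]

vectorizer = TfidfVectorizer()  # default settings, assumed for this sketch
tfidf = vectorizer.fit_transform(documents)

print(tfidf.shape)                              # (number of tweets, vocabulary size)
print(np.linalg.norm(tfidf.toarray(), axis=1))  # rows are L2-normalised by default
```
      </preformat>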
      <p>As the organizers did not provide a validation set, we randomly
generated our own test dataset (20% of the training dataset). To do this, we used the
StratifiedShuffleSplit class, which provides a random split that preserves the balance of
classes. Moreover, for each classifier we performed a grid search over all combinations of its
parameters in order to find the best setting (see Table 1). We used
the GridSearchCV class.
[...] dataset. Linear SVM obtained the top F1 for the classes FAVOR (F1=91%) and
NEUTRAL (F1=75%). Random Forest and Multinomial NB also achieved the
top F1 for FAVOR.</p>
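      <p>The stratified 20% hold-out and the grid search can be sketched as follows, assuming a scikit-learn Pipeline that chains TfidfVectorizer and LinearSVC. The synthetic corpus, the parameter grid, the macro-F1 scoring and the three-fold cross-validation are assumptions made for this example; the values actually searched are those listed in Table 1.</p>
      <preformat>
```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic corpus, invented for illustration: 20 short texts per class.
texts, labels = [], []
for i in range(20):
    texts += [f"yes vote referendum {i}", f"no illegal referendum {i}", f"news report referendum {i}"]
    labels += ["FAVOR", "AGAINST", "NEUTRAL"]

# Hold out 20% of the data while preserving the class proportions.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(texts, labels))
train_texts = [texts[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]
test_texts = [texts[i] for i in test_idx]
test_labels = [labels[i] for i in test_idx]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])

# Hypothetical grid, chosen only to show the mechanism.
param_grid = {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]}

# Every combination in the grid is evaluated by cross-validation on the training part.
search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=3)
search.fit(train_texts, train_labels)

print(Counter(test_labels))                    # class balance of the held-out part
print(search.best_params_)                     # best combination found
print(search.score(test_texts, test_labels))   # macro F1 on the held-out part
```
      </preformat>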
      <p>Based on the experimental results, we decided to use Linear SVM to process
the test dataset.</p>
      <p>We submitted two different runs: one using the context tweets and one without them.
The organizers published the final results, and our classifier using text and
context information achieved the top macro F1 (0.2802) for Spanish. However, this
setting ranked fourth for Catalan, with a macro F1 of 0.2876 (the top F1
was 0.3068).</p>
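      <p>For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores. The sketch below assumes that the average runs over all three classes; the organizers' exact definition may differ (for instance, it may average over a subset of the classes). The function names and the example labels are hypothetical.</p>
      <preformat>
```python
def f1_per_class(y_true, y_pred, label):
    """F1 of one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_per_class(y_true, y_pred, label) for label in labels) / len(labels)

y_true = ["FAVOR", "FAVOR", "AGAINST", "AGAINST", "NEUTRAL", "NEUTRAL"]
y_pred = ["FAVOR", "FAVOR", "AGAINST", "NEUTRAL", "NEUTRAL", "NEUTRAL"]
print(round(macro_f1(y_true, y_pred, ["FAVOR", "AGAINST", "NEUTRAL"]), 4))  # 0.8222
```
      </preformat>
      <p>Because every class contributes equally to the average, a class with few instances weighs as much as a frequent one.</p>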
    </sec>
    <sec id="sec-2">
      <title>Conclusion</title>
      <p>Our system follows a very simple approach, which exploits only the text of the tweets. The task
is very attractive and there is much room for improvement. We will try machine
learning classifiers trained using hand-engineered features as well as word
embeddings. We also plan to extend our research by using deep learning methods.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was supported by the Research Program of the Ministry of
Economy and Competitiveness - Government of Spain (DeepEMR project
TIN2017-87548-C2-1-R).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Altman</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          :
          <article-title>An introduction to kernel and nearest-neighbor nonparametric regression</article-title>
          .
          <source>The American Statistician</source>
          <volume>46</volume>
          (
          <issue>3</issue>
          ),
          <fpage>175</fpage>
          -
          <lpage>185</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <issue>1</issue>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Text categorization with support vector machines: Learning with many relevant features</article-title>
          .
          <source>Machine Learning</source>
          pp.
          <fpage>137</fpage>
          -
          <lpage>142</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Langley</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iba</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An analysis of Bayesian classifiers</article-title>
          .
          <source>In: AAAI</source>
          . vol.
          <volume>90</volume>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>228</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiritchenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sobhani</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherry</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>SemEval-2016 Task 6: Detecting stance in tweets</article-title>
          .
          <source>In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          . pp.
          <fpage>31</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Induction of decision trees</article-title>
          .
          <source>Machine Learning</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>106</lpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Taulé</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martí</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Overview of the task on stance and gender detection in tweets on Catalan independence at IberEval 2017</article-title>
          .
          <source>In: 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2017</source>
          . vol.
          <volume>1881</volume>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>177</lpage>
          .
          <publisher-name>CEUR-WS</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Taulé</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martí</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the task on multimodal stance detection in tweets on Catalan 1Oct referendum</article-title>
          .
          <source>In: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018)</source>
          , Seville, Spain (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The nature of statistical learning theory</article-title>
          .
          <publisher-name>Springer Science &amp; Business Media</publisher-name>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duncan</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Estimation of the probability of an event as a function of several independent variables</article-title>
          .
          <source>Biometrika</source>
          <volume>54</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>167</fpage>
          -
          <lpage>179</lpage>
          (
          <year>1967</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>