<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Idiap and UAM Participation at MEX-A3T Evaluation Campaign</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Esaú Villatoro-Tello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriela Ramírez-de-la-Rosa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sajit Kumar</string-name>
          <email>kumar.sajit.sk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shantipriya Parida</string-name>
          <email>shantipriya.parida@idiap.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petr Motlicek</string-name>
          <email>petr.motlicek@idiap.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre of Excellence in AI, Indian Institute of Technology Kharagpur</institution>
          ,
          <addr-line>West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Idiap Research Institute</institution>
          ,
          <addr-line>Rue Marconi 19, 1920, Martigny</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Autónoma Metropolitana, Unidad Cuajimalpa</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <fpage>252</fpage>
      <lpage>257</lpage>
      <abstract>
        <p>This paper describes our participation in the MEX-A3T 2020 shared evaluation campaign. Our main goal was to evaluate a Supervised Autoencoder (SAE) learning algorithm in text classification tasks. For our experiments, we used three different sets of features as inputs, namely classic word n-grams, char n-grams, and Spanish BERT encodings. Our results indicate that the SAE is adequate for longer and more formally written texts. Accordingly, our approach obtained the best performance (F1 = 85.66%) in the fake-news classification task.</p>
      </abstract>
      <kwd-group>
        <kwd>Supervised Autoencoders</kwd>
        <kwd>Text Representation</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In this era where social media and instant messaging are widely used for communication, the reach
and volume of text messages are enormous. Aggressive language and the dissemination of false news
are widespread across these communication channels, and it is impossible to verify the messages
manually. We need automated systems that help users of these channels determine whether they are
reading real or fake news, and that flag when someone has been targeted with aggressive messages.</p>
      <p>
        While most of the previous work on these two tasks, namely aggressiveness detection
and fake-news detection, targets English, little research has been done for Spanish using the
most recent NLP techniques, such as deep learning approaches. On the one hand, for aggressiveness
detection, in past editions of the MEX-A3T challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], only three out of nine approaches used
some deep learning classifier (CNN, LSTM, and GRU architectures), without good performance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
On the other hand, most of the current research on fake-news detection has been done for the English
language, using graph CNNs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and more recently attention mechanism-based transformer models
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Our participation at MEX-A3T 2020 aimed at exploring the use of a Supervised Autoencoder (SAE)
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] in two different text classification tasks: i) aggressiveness detection in Spanish tweets, where
documents are very short and informal; and ii) fake-news detection in Spanish newspapers, where
documents are longer and written in a more formal style. We found that the SAE generalizes well in
both tasks: for aggressiveness detection our approach obtains a macro F1 of 80.7%, while for
fake-news detection we reached the best score with a macro F1 of 85.6%.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        For both tasks, we aimed at evaluating the impact of recent generalization techniques, namely SAE
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with a varied set of features as input vectors. Although SAE has been extensively evaluated in
image classification tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], very few works have evaluated the impact of the SAE on text classification
tasks, e.g., language detection [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Next, we briefly describe the SAE theory, and we provide some
details on how the document representation was generated for all the explored features.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Supervised Autoencoder</title>
        <p>
          An autoencoder (AE) is a neural network that learns a representation (encoding) of input data and then
learns to reconstruct the original input from the learned representation. The autoencoder is mainly
used for dimensionality reduction or feature extraction [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Normally, it is used in an unsupervised
learning fashion, meaning that the neural network is leveraged for representation learning.
By learning to reconstruct the input, the AE extracts underlying abstract attributes that facilitate
an accurate reconstruction of the input.
        </p>
        <p>
          Thus, an SAE is an autoencoder with the addition of a supervised loss on the representation layer.
Adding the supervised loss to the autoencoder loss function acts as a regularizer and results in
learning a better representation for the desired task [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In the case of a single hidden layer,
the supervised loss is attached directly to that layer's output; in a deep supervised autoencoder,
the supervised loss is attached to the innermost (smallest) bottleneck layer, whose learned
representation is then passed to the supervised head.
        </p>
        <p>For all our experiments, the SAE model used a nonlinear activation function (ReLU) with
3 hidden layers; the number of nodes in the representation layer was set to 300, and we trained
for a maximum of 100 epochs.</p>
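        <p>As a rough sketch of this setup (toy dimensions, random weights, and no training loop; the equal weighting of the two loss terms is our assumption, not a detail from the paper), the SAE's joint objective combines a reconstruction loss with a supervised loss computed on the representation layer:</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Toy dimensions: 1000-d input features, 300-d representation layer (as in the paper).
n_samples, n_features, n_hidden, n_classes = 8, 1000, 300, 2
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)

# Encoder, decoder, and supervised-head weights (randomly initialized; no training shown).
W_enc = rng.normal(scale=0.01, size=(n_features, n_hidden))
W_dec = rng.normal(scale=0.01, size=(n_hidden, n_features))
W_cls = rng.normal(scale=0.01, size=(n_hidden, n_classes))

# Forward pass: ReLU encoding, reconstruction, and class logits from the bottleneck.
H = relu(X @ W_enc)            # representation layer (300 nodes)
X_hat = H @ W_dec              # reconstruction of the input
logits = H @ W_cls             # supervised head on the representation

# Joint SAE loss: reconstruction (MSE) plus supervised (cross-entropy) term.
recon_loss = np.mean((X - X_hat) ** 2)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
ce_loss = -np.mean(np.log(probs[np.arange(n_samples), y] + 1e-12))
sae_loss = recon_loss + ce_loss   # equal weighting is an assumption
```
        </preformat>
        <p>Training would then minimize sae_loss jointly over the encoder, decoder, and classifier weights.</p>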
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Input Features</title>
        <p>
          The SAE receives as input the document representation built using Spanish pre-trained BERT
encodings (BETO [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]), traditional text representation techniques such as word and char n-grams
(ranges 1-2 and 1-3), and combinations of BETO encodings plus traditional word/char n-gram
vectors.
        </p>
        <p>
          We chose to evaluate the impact of word and char n-grams since, as previous research has shown
[
          <xref ref-type="bibr" rid="ref9">9, 10, 11</xref>
          ], word n-grams capture the identity of a word and its contextual usage, while
character n-grams additionally provide an excellent trade-off between sparseness and word
identity, combining different types of information: punctuation, the morphological makeup of a
word, lexicon, and even context. For generating these features we used the CountVectorizer and
TfidfTransformer classes from the scikit-learn toolkit (https://scikit-learn.org). For the
fake-news detection task, we empirically chose the best values for the min-df and max-df
parameters, which are reported in Table 3. For the aggressiveness task, these values were fixed
(for all the experiments) to min-df = 0.001 and max-df = 0.3.
        </p>
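        <p>A minimal sketch of this feature extraction on a toy corpus: the paper's min-df and max-df cut-offs are noted in a comment but omitted from the calls, since they would prune a corpus this small to nothing:</p>
        <preformat>
```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy corpus standing in for the tweets / news documents (illustrative only).
docs = [
    "esta noticia es completamente falsa",
    "el reporte oficial confirma los hechos",
    "otra noticia falsa circula en redes sociales",
]

# Word n-grams (range 1-2). In the paper, min-df and max-df were fixed to
# 0.001 and 0.3 for the aggressiveness task and tuned for fake news; the
# defaults are kept here because the toy corpus is too small for those cut-offs.
word_counts = CountVectorizer(analyzer="word", ngram_range=(1, 2)).fit_transform(docs)

# Character n-grams (range 1-3) additionally capture punctuation and morphology.
char_counts = CountVectorizer(analyzer="char", ngram_range=(1, 3)).fit_transform(docs)

# TF-IDF weighting on top of the raw counts, as in the paper's pipeline.
word_tfidf = TfidfTransformer().fit_transform(word_counts)
char_tfidf = TfidfTransformer().fit_transform(char_counts)
```
        </preformat>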
        <p>
          Additionally, we evaluated the impact of transformer-based models [12] as a language representation
strategy. For our experiments we tested BETO (https://github.com/dccuchile/beto), a BERT model
trained on a large dataset of Spanish documents [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The [CLS] token acts as an “aggregate representation” of the input tokens
and can be used as a sentence representation for many classification tasks [13]. Accordingly,
we applied the following approaches for generating the document representation: i) for the
aggressiveness task, each tweet is passed directly to the BETO model and is represented using the
encoding of the [CLS] token from the last hidden layer; ii) for the fake-news detection task, we
split the news document into smaller chunks, obtain the [CLS] encoding of each chunk, and then
apply either min, max, or mean pooling to generate the final document representation. Table 1
depicts the types and variations of features tested during the training phase.
        </p>
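        <p>The chunk-pooling step can be sketched as follows; random vectors stand in for the actual [CLS] encodings produced by BETO's last hidden layer (768-dimensional for BERT-base models):</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the [CLS] encodings of each chunk of one news document.
# In the actual system these come from BETO's last hidden layer.
n_chunks, dim = 4, 768
cls_encodings = rng.normal(size=(n_chunks, dim))

# Element-wise pooling over the chunk axis yields one fixed-size document vector,
# regardless of how many chunks the news document was split into.
doc_min = cls_encodings.min(axis=0)
doc_max = cls_encodings.max(axis=0)
doc_mean = cls_encodings.mean(axis=0)
```
        </preformat>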
        <p>Finally, it is worth mentioning that we did not apply any preprocessing steps in either task.
To validate our experiments, we followed a stratified 10-fold cross-validation strategy.</p>
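        <p>A minimal sketch of that validation strategy with scikit-learn's StratifiedKFold, using dummy labels that stand in for the real annotations:</p>
        <preformat>
```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Dummy binary labels standing in for aggressive / non-aggressive annotations.
y = np.array([0] * 60 + [1] * 40)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder feature matrix

# Stratified 10-fold split: each fold preserves the overall class distribution.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(skf.split(X, y))

# With a 60/40 corpus, every test fold holds 6 negatives and 4 positives.
for _, test_idx in folds:
    assert np.bincount(y[test_idx]).tolist() == [6, 4]
```
        </preformat>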
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Aggressiveness Identification</title>
      <p>The offensive language in Mexican Spanish corpus used for this task contains 10,475 Spanish tweets.
The training partition contains 7,332 tweets with two possible classes (aggressive or non-aggressive).
More details on this corpus can be found in [14]. Table 2 shows the results obtained in the validation
phase as well as our two runs submitted for the final evaluation of this task over 3,143 unseen tweets.
The difference between the two submitted runs, i.e., run ids 1 and 2 (†), is the classifier:
submission 2 was trained using a Multi-Layer Perceptron (MLP).</p>
    </sec>
    <sec id="sec-4">
      <title>4. Fake-News Identification</title>
      <p>The fake-news Spanish corpus used in this task contains 971 news items on 9 different topics. The
training partition provided for the development stage has 676 news items with a binary class (fake or
true). Each news item is composed of the headline, the body, and the URL from which the news was
published (the complete description of this corpus can be found in [15]). For our experiments, we used
only the headline and the body of each news item as a single document. Table 3 shows the results
obtained in the development stage of the challenge, and the two runs submitted for the final
evaluation of the task over 295 unseen news items.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This paper described the Idiap &amp; UAM participation at the MEX-A3T 2020 shared task on fake-news
classification and aggressiveness analysis. Our participation aimed at analyzing the performance of
recent generalization techniques, namely deep supervised autoencoders. To this end, we performed
a comparative analysis between transformer-based language representation strategies and
traditional text representations such as word and character n-grams. Notably, the SAE method benefits
the most when it is fed with input features generated from the combination of BERT encodings and
word/char n-grams. For the aggressiveness detection task, our proposed approach obtains a relative
improvement of 1.1% over the strongest baseline, while for the fake-news detection task the
improvement over the baseline is 8.1%.</p>
      <p>As future work, we plan to analyze which dataset characteristics allow the SAE approach to
perform well. We also want to evaluate the impact of tuning the SAE's hyperparameters through
optimization methods such as Bayesian optimization [16], and to evaluate our proposed approach on
other similar classification tasks.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work was supported by an innovation project (under an InnoSuisse grant) oriented to improving
automatic speech recognition and natural language understanding technologies for German (“SM2:
Extracting Semantic Meaning from Spoken Material”, funding application no. 29814.1 IP-ICT), and by
the EU H2020 project “Real-time network, text, and speaker analytics for combating organized crime”
(ROXANNE), grant agreement no. 833635. The first author, Esaú Villatoro-Tello, was partially
supported by Idiap, SNI-CONACyT, CONACyT project grant CB-2015-01-258588, and UAM-C Mexico during
the elaboration of this work.</p>
    </sec>
    <sec id="sec-7">
      <title>References (continued)</title>
      <p>[10] A. Kulmizev, B. Blankers, J. Bjerva, M. Nissim, G. van Noord, B. Plank, M. Wieling, The
power of character n-grams in native language identification, in: Proceedings of the 12th Workshop
on Innovative Use of NLP for Building Educational Applications, 2017, pp. 382-389.</p>
      <p>[11] F. Sánchez-Vega, E. Villatoro-Tello, M. Montes-y Gómez, P. Rosso, E. Stamatatos,
L. Villaseñor-Pineda, Paraphrase plagiarism identification with character-level features, Pattern
Analysis and Applications 22 (2019) 669-681.</p>
      <p>[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems,
2017, pp. 5998-6008.</p>
      <p>[13] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, in: Proceedings of the 2019 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long and Short Papers), 2019, pp. 4171-4186. URL: https://www.aclweb.org/anthology/N19-1423.
doi:10.18653/v1/N19-1423.</p>
      <p>[14] M. J. Díaz-Torres, P. A. Morán-Méndez, L. Villasenor-Pineda, M. Montes-y Gómez,
J. Aguilera, L. Meneses-Lerín, Automatic detection of offensive language in social media: Defining
linguistic criteria to build a Mexican Spanish dataset, in: Proceedings of the Second Workshop on
Trolling, Aggression and Cyberbullying, European Language Resources Association (ELRA), Marseille,
France, 2020, pp. 132-136. URL: https://www.aclweb.org/anthology/2020.trac-1.21.</p>
      <p>[15] J.-P. Posadas-Durán, H. Gómez-Adorno, G. Sidorov, J. Moreno, Detection of fake news in a
new corpus for the Spanish language, Journal of Intelligent &amp; Fuzzy Systems 36 (2019) 4869-4876.
doi:10.3233/JIFS-179034.</p>
      <p>[16] J. Snoek, H. Larochelle, R. P. Adams, Practical Bayesian optimization of machine learning
algorithms, in: Advances in Neural Information Processing Systems, 2012, pp. 2951-2959.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Aragón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jarquín</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Posadas-Durán</surname>
          </string-name>
          ,
          <article-title>Overview of mex-a3t at iberlef 2020: Fake news and aggressiveness analysis in mexican spanish</article-title>
          ,
          <source>in: Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF)</source>
          , Malaga, Spain, September,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Aragón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villasenor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moctezuma</surname>
          </string-name>
          ,
          <article-title>Overview of mex-a3t at iberlef 2019: Authorship and aggressiveness analysis in mexican spanish tweets</article-title>
          ,
          <source>in: Notebook Papers of 1st SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF)</source>
          , Bilbao, Spain,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Monti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Frasca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mannion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Bronstein</surname>
          </string-name>
          ,
          <article-title>Fake news detection on social media using geometric deep learning</article-title>
          , arXiv preprint arXiv:1902.06673 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Qazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. U. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>Detection of fake news using transformer model</article-title>
          ,
          <source>in: 2020 3rd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>A classification supervised auto-encoder based on predefined evenly-distributed class centroids</article-title>
          , arXiv preprint arXiv:1902.00220 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <article-title>Supervised autoencoders: Improving generalization performance with unsupervised regularizers</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Parida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Villatoro-Tello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Motlicek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <article-title>Idiap submission to swiss-german language detection shared task</article-title>
          ,
          <source>in: Proceedings of the 5th Swiss Text Analytics Conference (SwissText) &amp; 16th Conference on Natural Language Processing (KONVENS)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cañete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chaperon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <article-title>Spanish pre-trained bert model and evaluation data</article-title>
          ,
          <source>in: PML4DC at ICLR 2020</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Chauchat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>N-grams based feature selection and text representation for chinese text classification</article-title>
          ,
          <source>International Journal of Computational Intelligence Systems</source>
          <volume>2</volume>
          (
          <year>2009</year>
          )
          <fpage>365</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>