<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Valentino Giudice</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Aspie96 at HAHA (IberLEF 2019): Humor Detection in Spanish Tweets with Character-Level Convolutional RNN</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>0002</volume>
      <fpage>165</fpage>
      <lpage>171</lpage>
      <abstract>
        <p>A characterization of humor based upon machine learning that allows its automatic detection has not yet been specified. This report describes the system used by the Aspie96 team in the HAHA shared task (part of IberLEF 2019) for humor recognition in tweets in Spanish: a neural network using exclusively character-level features.</p>
      </abstract>
      <kwd-group>
        <kwd>humor</kwd>
        <kwd>neural network</kwd>
        <kwd>natural language processing</kwd>
        <kwd>Twitter</kwd>
        <kwd>Spanish</kwd>
        <kwd>HAHA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        HAHA (Humor Analysis based on Human Annotation), a shared task organized
within IberLEF 2019 (Iberian Languages Evaluation Forum) and described in
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], proposed two different subtasks related to automatic humor detection in
tweets in Spanish:
      </p>
      <p>Humor Detection Telling whether a tweet was intended by its author to be
humorous. The results of this subtask were measured using the F1-score for the
positive (humorous) class.</p>
      <p>Funniness Score Prediction Predicting the level of funniness of humorous
tweets. The results of this subtask were measured using RMSE (root mean
squared error).</p>
      <p>
        The tweets had been crowd-annotated in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] according to a voting scheme: for each tweet, annotators could mark it as
not humorous, or mark it as humorous and give it a number of stars (1 to 5)
according to its funniness (5 being the funniest), for a total of six options.
Combining the votes of the annotators, each tweet was labelled as humorous or
not humorous, and each humorous tweet was given a funniness score, from 1 to 5,
equal to its average number of stars.
      </p>
      <p>In the training dataset, the following information was provided for each
tweet:</p>
      <p>Tweet ID The ID of the tweet, on Twitter, not intended to be used to extract
metadata.</p>
      <p>Text The text of the tweet.</p>
      <p>Is humorous Whether the tweet is humorous.</p>
      <p>Votes (Not humor, 1 star, 2 stars, 3 stars, 4 stars, 5 stars) The
number of votes given to the tweet, for each possibility.</p>
      <p>Funniness score The average funniness score of the tweet, only provided for
humorous tweets.</p>
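      <p>The vote aggregation can be sketched in Python. The exact combination rule for the binary label is defined in [1], so the simple majority rule below is only an illustrative assumption; the funniness score follows the stated average-of-stars definition.</p>

```python
def aggregate(votes):
    """Illustrative aggregation of crowd votes for one tweet.
    `votes` maps each option to its count: key 0 = "not humor",
    keys 1..5 = star ratings. The majority rule here is an assumption;
    the actual scheme is the one defined in the corpus paper [1]."""
    humor_votes = sum(votes.get(s, 0) for s in range(1, 6))
    is_humorous = humor_votes > votes.get(0, 0)
    funniness = None
    if is_humorous:
        # funniness score = average number of stars among humor votes
        funniness = sum(s * votes.get(s, 0) for s in range(1, 6)) / humor_votes
    return is_humorous, funniness

print(aggregate({0: 1, 2: 2, 3: 2}))  # → (True, 2.5)
```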
      <p>In the testing dataset, just the text of each tweet was provided.</p>
      <p>The competition was run using the CodaLab platform
(https://competitions.codalab.org/). Each team was
allowed up to 10 submissions per day and a total maximum of 20 submissions.
Each team could decide, at any moment, which one submission to include in the
leaderboards, which were always visible to all participants and updated in real
time. A separate leaderboard was used for each subtask.</p>
      <p>Each submission could be meant for the humor detection subtask only or for
both subtasks. It annotated each tweet in the testing dataset with a binary
label, indicating whether it had been detected as humorous, and, if the
submission was also meant for the funniness score prediction subtask, with the
predicted funniness score (even for tweets not detected as humorous). The
predicted funniness score was only considered for tweets classified as humorous
in the gold testing dataset.</p>
      <p>The Aspie96 team took part in both subtasks, using a neural network with
character-level features.</p>
      <p>The structure of the model and its results are described in the following
sections.</p>
    </sec>
    <sec id="sec-2">
      <title>Description of the System</title>
      <p>The system used by the Aspie96 team works strictly at character level, without
using any word-level features (such as word embeddings) or any data other than
what is provided for the specific task at hand.</p>
      <p>
        It is a neural network adapted from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where it was used for the IronITA
2018 task of irony detection in tweets in Italian described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The input of the system is a fixed-size list of arrays. The neural network
begins with a series of unidimensional convolutional layers: each filter has a
small width (of 3) and convolves through the input. The output of each
convolutional layer is again a list of arrays: the length of each array is
constant for each layer and is set through a hyperparameter (it is 8 for all
convolutional layers), while the length of the list is slightly smaller than
that of the input one, because the convolutional layers don't use padding. The
series of convolutional layers is followed by a bidirectional recurrent layer.
The output of the bidirectional layer, which is an individual dense vector
representing information about the whole tweet, is the input of a simple fully
connected layer, with one output, whose activation function is the logistic
function (which has an output between 0 and 1).</p>
      <p>The purpose of the unidimensional convolutional layers is to create a
higher-level dense representation of each input vector, together with its
surrounding ones, providing context. The purpose of the bidirectional recurrent
layer is to produce an individual vector which can be considered a
representation of the whole tweet. Additional layers meant for regularization
are used (a Gaussian noise layer applied to the input of the neural network and
several dropout layers).</p>
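      <p>A minimal NumPy sketch of the unpadded (valid) width-3 convolutions just described: the filter width (3) and per-layer array length (8) are taken from the text, while the random weights and the tanh activation are placeholder assumptions, not the trained model.</p>

```python
import numpy as np

def conv1d_valid(seq, filters):
    """Unidimensional convolution without padding: seq has shape
    (length, in_dim), filters has shape (width, in_dim, out_dim).
    The output list is shorter than the input by width - 1."""
    width, _, out_dim = filters.shape
    steps = seq.shape[0] - width + 1
    out = np.empty((steps, out_dim))
    for i in range(steps):
        # each output vector summarizes a window of `width` input vectors
        out[i] = np.tensordot(seq[i:i + width], filters, axes=([0, 1], [0, 1]))
    return np.tanh(out)

rng = np.random.default_rng(0)
x = rng.normal(size=(140, 60))                    # e.g. 140 characters, 60 flags each
h = conv1d_valid(x, rng.normal(size=(3, 60, 8)))  # first layer: arrays of length 8
h = conv1d_valid(h, rng.normal(size=(3, 8, 8)))   # second layer
print(h.shape)  # → (136, 8): each layer shortens the list by 2
```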
      <p>The input tweet is represented as a list with a fixed length (leading to
padding on the left or truncation on the right, where needed) of sparse vectors.
Each vector of the list represents an individual character of the tweet and
contains flags whose values are either 0 or 1.</p>
      <p>Most of the flags are mutually exclusive and are used to identify a character
among a list of known ones. Additional flags are used to represent properties
of the character.</p>
      <p>The full list of known characters is the following:</p>
      <p>Space ! " # $ % &amp; ' ( ) * + , - . / 0 1 2 3 4 5 6 7
8 9 : ; = ? @ [ ] _ a b c d e f g h i j k l m n o p q
r s t u v w x y z | ~</p>
      <p>Emojis are represented similarly to their Unicode name (in English), with
additional flags.</p>
      <p>The full list of additional flags is:</p>
      <p>Uppercase letter Indicates whether the character is an uppercase letter.</p>
      <p>Accent Indicates whether the character is an accented vowel, regardless of
whether the accent is acute or grave.</p>
      <p>Emoji Indicates whether the character is part of the Unicode name of an
emoji.</p>
      <p>Emoji start Indicates whether the character is the first in the Unicode
name of an emoji.</p>
      <p>Letter Indicates whether the character is a letter.</p>
      <p>Number Indicates whether the character is a numerical digit.</p>
      <p>Inverted Indicates whether the character is an inverted question mark or
inverted exclamation mark (¿ or ¡).</p>
      <p>Tilde Indicates whether the character is an N with a tilde (virgulilla): ñ or Ñ.</p>
      <p>Multiple spaces are represented as one and unknown characters are ignored.</p>
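      <p>The per-character encoding above can be sketched as follows. The flag layout, the accent handling, and the subset of property flags shown here are assumptions for clarity, not the team's exact implementation.</p>

```python
# chr(38) is the ampersand character (written this way to keep the markup valid)
KNOWN = " !\"#$%" + chr(38) + "'()*+,-./0123456789:;=?@[]_abcdefghijklmnopqrstuvwxyz|~"
ACCENTS = {"á": "a", "à": "a", "é": "e", "è": "e", "í": "i", "ì": "i",
           "ó": "o", "ò": "o", "ú": "u", "ù": "u"}

def encode_char(ch):
    """Encode one character as a sparse 0/1 vector: one-hot flags over the
    known characters, followed by four property flags (a subset of those
    listed above, for illustration)."""
    lower = ch.lower()
    plain = ACCENTS.get(lower, lower)     # strip acute/grave accent for lookup
    vec = [0] * (len(KNOWN) + 4)
    if plain in KNOWN:
        vec[KNOWN.index(plain)] = 1       # mutually exclusive identity flags
    off = len(KNOWN)
    vec[off + 0] = int(ch.isupper())      # uppercase-letter flag
    vec[off + 1] = int(lower in ACCENTS)  # accent flag (acute or grave)
    vec[off + 2] = int(plain.isalpha())   # letter flag
    vec[off + 3] = int(plain.isdigit())   # number flag
    return vec
```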
      <p>
        A more in-depth description of the role of each layer is given in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A
visualization of the network is given in Figure 1.
      </p>
      <p>
        The main difference between the model used for the HAHA task and the one
presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for IronITA is the language: Spanish contains some characters
which are not part of the Italian language.
      </p>
      <p>The same structure was used for both HAHA subtasks: for the humor detection
subtask, the output of the network was rounded to 0 or 1 to get a binary value,
and for the funniness score prediction subtask it was multiplied by five to get
a value between 0 and 5 (in principle, this could have allowed the model to
output funniness scores below 1, but that was not the case for any of the
labels in the submission).</p>
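      <p>The two output mappings just described amount to a small helper, where p stands for the network's single logistic output:</p>

```python
def to_labels(p):
    """Map the network's single logistic output p (between 0 and 1) to the
    two subtask outputs, as described above."""
    is_humorous = int(round(p))  # humor detection: round to 0 or 1
    funniness = 5 * p            # funniness score: scale to the 0..5 range
    return is_humorous, funniness
```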
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>A total of 18 teams took part in the humor detection subtask.</p>
      <p>In the humor detection subtask, the Aspie96 team got an F1-score, for the
positive class, of 0.711, ranking in the 13th position, below LadyHeidy, with an
F1-score of 0.725 and above dodinh, with an F1-score of 0.660.</p>
      <p>As a comparison, the best ranking team (adilism) got an F1-score of 0.821.</p>
      <p>The values for precision, recall and accuracy for the Aspie96 team were,
respectively, 0.678 (14th position), 0.749 (11th position) and 0.763 (13th position).</p>
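      <p>As a sanity check, the reported F1-score is consistent with the reported precision and recall, F1 being their harmonic mean:</p>

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.678, 0.749))  # ≈ 0.711, matching the reported F1 up to rounding
```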
      <p>The baseline system was one marking tweets as humorous with a probability
of 0.5. It got an F1-score, precision, recall and accuracy of, respectively, 0.440,
0.394, 0.497 and 0.505.</p>
      <p>The results of the Aspie96 team are summarized in Table 1 and compared
with the best system and the baseline system.</p>
      <p>A total of 13 out of 18 teams took part in the funniness score prediction
subtask.</p>
      <p>In the funniness score prediction subtask, the Aspie96 team got an RMSE of
1.673, ranking in the 12th position, below garain, with an RMSE of 1.653 (a
lower RMSE is better), and above dodinh, with an RMSE of 1.810.</p>
      <p>Similar models had been presented before.</p>
      <p>
        A roughly similar model based on convolutions for text classification was
presented in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in the context of EMNLP 2014, described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It used word-level features (through a pretrained and then fine-tuned
embedding layer) as the input of an individual convolutional layer, which used
multiple kernels of different sizes to detect different high-level features. A
timewise max-pooling layer was then used to produce a vector whose length was
the same as the number of kernels in the convolutional layer. The resulting
vector was the input of a fully connected layer producing the output of the
neural network. The model produced results better than those of the state of
the art at the time on 4 out of 7 tasks.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a model more similar to the one proposed here was presented. The model
used a character-level convolutional neural network for text classification,
achieving competitive results. However, it did not use a recurrent layer and
represented each input character as a one-hot encoded vector. The model was
trained using very large datasets, as this can work better for character-level
neural networks, which don't rely on pretrained word embeddings and thus cannot
make use of information outside the training set. Because of its structure and
attributes, the model was not very flexible or easily adaptable to different
kinds of usage.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>The results, for both HAHA subtasks, were significantly better than the
baseline; however, there is clearly much room for improvement.</p>
      <p>
        As the system is a mere adaptation of the one presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], its performance can be compared across the two different tasks.
      </p>
      <p>
        In the binary classification subtask presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the model described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] had a performance not much worse than that of the teams ranking above it,
the best of which is described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        However, in the HAHA binary classification subtask (humor detection) the
results in the leaderboard were much more spread out, and the results obtained
by the Aspie96 team were quite different from those of the best systems.
Despite the scores of the best-ranking teams being better than those shown in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the results obtained by the Aspie96 team were not.
      </p>
      <p>
        The structure of the neural network used in the HAHA task by the Aspie96
team, although originally presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for the classification of Italian tweets, was not built specifically for
the Italian language, but for both Italian and English, and its structure
suggests its applicability to different tweet classification tasks (as long as
the language is an alphabetical one and uses the Latin alphabet). The results
obtained in HAHA, however, show the neural network to be less flexible than
anticipated, and the nature of the specific task at hand to influence how close
its results can be to the best ones achieved.
      </p>
      <p>This does not mean a character-level approach cannot be used effectively
for humor detection, but the structure of the neural network must be improved
in order to be more general, yielding better and more consistent results across
different tasks.</p>
      <p>For the simple case of binary tweet classification, the ideal result would be
a structure capable of working across different languages (at least Spanish,
English and Italian), regardless of the high-level task (whether humor
detection, irony detection or any other).</p>
      <p>Results outside the scope of this paper, regarding the performance of the
model in different tasks and in different languages, also confirm that the
results obtained by this structure are currently inconsistent, but show its
applicability and its ability to get good results on at least some tasks,
suggesting that making it more general is indeed worthwhile.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garat</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moncecchi</surname>
          </string-name>
          , G.:
          <article-title>A Crowd-Annotated Spanish Corpus for Humor Analysis</article-title>
          .
          <source>In: Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media</source>
          . pp.
          <volume>7</volume>
          -
          <issue>11</issue>
          (
          <year>2018</year>
          ), https://aclweb.org/anthology/papers/W/W18/W18-3502/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etcheverry</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garat</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prada</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          : Overview of HAHA at IberLEF 2019:
          <article-title>Humor Analysis based on Human Annotation</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS, Bilbao, Spain (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cignarella</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frenda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of the EVALITA 2018 Task on Irony Detection in Italian Tweets (IronITA)</article-title>
          .
          <source>In: Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          . pp.
          <volume>26</volume>
          -
          <fpage>34</fpage>
          . CEUR Workshop Proceedings, CEUR-WS (
          <year>2018</year>
          ), http://ceur-ws.org/Vol-2263/paper005.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cimino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Mattei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>: Multi-task learning in deep neural networks at EVALITA 2018</article-title>
          .
          <article-title>In: Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</article-title>
          . pp.
          <volume>86</volume>
          -
          <fpage>95</fpage>
          . CEUR Workshop Proceedings, CEUR-WS (
          <year>2018</year>
          ), http://ceur-ws.org/Vol-2263/paper013.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Giudice</surname>
          </string-name>
          , V.:
          <article-title>Aspie96 at IronITA (EVALITA</article-title>
          <year>2018</year>
          )
          <article-title>: Irony Detection in Italian Tweets with Character-Level Convolutional RNN</article-title>
          .
          <source>In: Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          . pp.
          <volume>160</volume>
          -
          <issue>165</issue>
          (
          <year>2018</year>
          ), http://ceur-ws.org/Vol-2263/paper026.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional Neural Networks for Sentence Classification</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1746</volume>
          -
          <fpage>1751</fpage>
          . Association for Computational Linguistics, Doha, Qatar (
          <year>2014</year>
          ), https://doi.org/10.3115/v1/D14-1181
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daelemans</surname>
          </string-name>
          , W. (eds.):
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <article-title>Association for Computational Linguistics</article-title>
          , Doha, Qatar (
          <year>2014</year>
          ), https://doi.org/10.3115/v1/D14-1
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Character-level Convolutional Networks for Text Classification</article-title>
          . In:
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          . pp.
          <volume>649</volume>
          -
          <fpage>657</fpage>
          . Curran Associates, Inc. (
          <year>2015</year>
          ), http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>