<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HAHA@IberLEF2021: Humor Analysis using Ensembles of Simple Transformers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karish Grover</string-name>
          <email>karish19471@iiitd.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tanishq Goel</string-name>
          <email>tanishq.goel@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indraprastha Institute of Information Technology</institution>
          ,
          <addr-line>Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>International Institute of Information Technology</institution>
          ,
          <addr-line>Hyderabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2021</year>
      </pub-date>
      <abstract>
<p>This paper describes the system submitted to the Humor Analysis based on Human Annotation (HAHA) task at IberLEF 2021. The system achieves the winning F1 score of 0.8850 in the main task of binary classification (Task 1), utilizing an ensemble of a pre-trained multilingual BERT, a pre-trained Spanish BERT (BETO), RoBERTa, and a Naive Bayes classifier. We also achieve second place with macro F1 scores of 0.2916 and 0.3578 in the multi-class classification and multi-label classification tasks, respectively, and third place with an RMSE score of 0.6295 in the regression task.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Ensemble Learning</kwd>
<kwd>Humor Classification</kwd>
        <kwd>Pre-trained Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Humor Analysis based on Human Annotation (HAHA) 2021 [
        <xref ref-type="bibr" rid="ref1">1</xref>
          ] is a challenge
that aims to classify Spanish tweets as humorous or not, and to further analyze
the humor by determining the characteristics of the tweets that contribute
to it. The challenge proposes four tasks: classifying tweets as humorous or not,
rating the humor present in the tweets, multi-class classification to find the humor
mechanism, and multi-label classification to find the humor target.
      </p>
      <sec id="sec-1-1">
        <title>Voting and Ensemble Learning</title>
        <p>
          A voting ensemble combines the predictions of multiple machine learning models.
Such ensembles have been utilized in various domains, ranging from early diabetes
prediction and heart disease prediction [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to NLP tasks such as Named Entity Recognition.
        </p>
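        <p>As a minimal illustration of hard voting (a scikit-learn sketch with toy models,
not the transformer ensemble used in our system), each member casts one vote and
the majority label wins:</p>
        <preformat># Sketch: a hard-voting classification ensemble with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # each model casts one vote; the majority label wins
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))</preformat>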
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>We were provided with a corpus of crowd-annotated tweets separated into three
subsets: training (24,000 tweets), development (6,000 tweets), and testing (6,000
tweets).</p>
      <p>The columns present in the corpus utilized for training and testing are as follows:</p>
      <p>- text: the text of the tweet.</p>
      <p>- is-humor: a binary value (0 or 1) indicating whether the tweet is humorous.</p>
      <p>- humor-rating: a real value (between 1 and 5) representing the average score
the annotators gave to the tweet.</p>
      <p>- humor-mechanism: a label for the humor mechanism. Only a subset of the
tweets have the humor mechanism annotated.</p>
      <p>- humor-target: zero or more labels for the humor target, separated by ";".</p>
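      <p>As a minimal sketch, such a corpus can be loaded and inspected with pandas (the
file name below is a placeholder assumption; the column names follow the task
description above):</p>
      <preformat># Sketch: loading and inspecting the crowd-annotated corpus.
# The file name "haha_train.csv" is a placeholder, not the official one.
import pandas as pd

train = pd.read_csv("haha_train.csv")
print(train[["text", "is-humor", "humor-rating"]].head())
print(train["is-humor"].value_counts())  # class balance of humorous vs. not</preformat>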
    </sec>
    <sec id="sec-3">
      <title>Task Description</title>
      <p>This challenge (https://www.fing.edu.uy/inco/grupos/pln/haha/) proposes four sub-tasks, which are as follows:</p>
      <p>Humor Detection: The main aim is to classify whether a tweet is humorous.</p>
      <p>Funniness Score Prediction: A regression task which aims to rate a tweet
in terms of humor.</p>
      <p>Humor Mechanism Classification: A multi-class classification task with
the primary goal of predicting the mechanism by which the tweet conveys humor.</p>
      <p>Humor Target Classification: A multi-label classification task which aims
at exploring the content of the joke based on its target.</p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>We have released our code (https://github.com/TanishqGoel/HAHA-IberLEF2021_Jocoso)
and experiments for easy replication. All the
following models were fine-tuned using the AdamW optimizer, with a learning rate
of 4e-5 and a batch size of 8. The models were trained on an NVIDIA Tesla
T4 GPU.
The results for this task are summarized in Table 2. The baseline provided by
the organizers for this task uses Naive Bayes with TFIDF features for binary
classification of tweets, achieving an F1 score of 0.6619 over the testing corpus.</p>
      <p>In the final solution, we tried a series of ensembles of pre-trained models. We
use the Simple Transformers classification model, ClassificationModel, which
wraps a pre-trained model for this task of binary classification.</p>
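      <p>As a minimal sketch (the checkpoint name, toy data, and CPU usage are
illustrative assumptions), a Simple Transformers ClassificationModel can be
configured with the hyperparameters above as follows:</p>
      <preformat># Sketch: fine-tuning a binary classifier with Simple Transformers.
# Hyperparameters follow the paper: AdamW (the library's default optimizer),
# learning rate 4e-5, batch size 8, 3 epochs.
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Tiny stand-in for the 24,000-tweet training corpus.
train_df = pd.DataFrame(
    {"text": ["un tweet humoristico", "un tweet serio"], "labels": [1, 0]}
)

model_args = {
    "learning_rate": 4e-5,
    "train_batch_size": 8,
    "num_train_epochs": 3,
}
model = ClassificationModel(
    "bert",
    "bert-base-multilingual-cased",  # e.g., mBERT; other members are swapped in here
    num_labels=2,
    args=model_args,
    use_cuda=False,  # set True on the Tesla T4
)
model.train_model(train_df)
predictions, raw_outputs = model.predict(["otro tweet"])</preformat>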
      <sec id="sec-4-1">
        <title>Task 1 : Binary Classification</title>
        <p>
          The final model is based on hard voting in an ensemble of 5 models:
multilingual cased BERT (mBERT) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which was pre-trained on 104 languages
including Spanish; BETO [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which is a BERT model pre-trained on a large
Spanish corpus [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]; ALBERT, which was pre-trained on English
with a masked language modeling (MLM) objective; a variant of the BETO model
fine-tuned for sentiment analysis (sBETO), trained on the TASS 2020 corpus
(around 5,000 tweets) covering several dialects of Spanish; and RoBERTa base,
which is a model pre-trained on a large corpus of English data in a self-supervised
fashion. Finally, we use a Multinomial Naive Bayes classifier with TFIDF features.
We use the TensorFlow implementations available on the Hugging Face hub
(https://huggingface.co/models). All the models were fine-tuned for 3 epochs and took
approximately 18-20 minutes for the complete training process per model.
        </p>
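        <p>The Multinomial Naive Bayes member can be sketched as a standard scikit-learn
pipeline (vectorizer settings are default assumptions, not the tuned configuration):</p>
        <preformat># Sketch: the Multinomial Naive Bayes ensemble member with TFIDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["un tweet humoristico", "un tweet serio"]
labels = [1, 0]

nb_member = make_pipeline(TfidfVectorizer(), MultinomialNB())
nb_member.fit(texts, labels)
print(nb_member.predict(["otro tweet"]))  # 0/1 vote fed into the ensemble</preformat>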
        <p>While training our models on the given 24,000 tweets, we observed that
BETO outperforms all the other pre-trained models. We experimented with
various hard-voting ensembles built from these pre-trained models. We used a
90:10 split of the training corpus without any preprocessing, as we observed that
preprocessing reduces the F1 score. We solved this problem with a classification
voting ensemble, predicting the results based on the majority vote of the
contributing models (preference is given to BETO and multilingual BERT, which
have high individual F1 scores).</p>
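        <p>One plausible reading of this scheme is plain majority voting with ties deferred
to the strongest individual models; the tie-breaking rule below is an assumption,
not necessarily the exact preference mechanism used:</p>
        <preformat># Sketch: hard voting over per-model binary predictions, breaking ties
# in favor of BETO and mBERT (the models with the highest individual F1).
from collections import Counter

def hard_vote(preds_by_model, preferred=("beto", "mbert")):
    # preds_by_model: dict mapping model name to a list of 0/1 predictions.
    n_examples = len(next(iter(preds_by_model.values())))
    final = []
    for i in range(n_examples):
        votes = Counter(preds[i] for preds in preds_by_model.values())
        ranked = votes.most_common()
        if len(ranked) == 1 or ranked[0][1] != ranked[1][1]:
            final.append(ranked[0][0])  # clear majority
        else:
            # Tie: defer to the preferred (highest-F1) models.
            tie = Counter(preds_by_model[m][i] for m in preferred)
            final.append(tie.most_common(1)[0][0])
    return final

preds = {"beto": [1, 0], "mbert": [1, 1], "roberta": [0, 1], "nb": [0, 0]}
print(hard_vote(preds))</preformat>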
        <sec id="sec-4-1-1">
          <title>Task 2 : Regression</title>
          <p>The results for this task are summarized in Table 3. Here the baseline is SVM
with TFIDF features which achieves an RMSE of 0.6704 over the test corpus.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>5 https://huggingface.co/models</title>
        <p>2 We observe that preprocessing reduces the F1 score.</p>
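        <p>The organizers' baseline corresponds to a standard support-vector regression
over TFIDF features; a generic sketch of that idea (not the official baseline code)
is:</p>
        <preformat># Sketch: SVM-with-TFIDF regression baseline (general idea only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

texts = ["muy gracioso", "nada gracioso"]
scores = [4.5, 1.0]  # funniness ratings between 1 and 5

baseline = make_pipeline(TfidfVectorizer(), SVR())
baseline.fit(texts, scores)
print(baseline.predict(["otro tweet"]))</preformat>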
        <p>
          In this task, we tried a series of ensembles of pre-trained models, and the
results are predicted using regression voting. We combine each model with a
regression head. Our ensemble comprises six pre-trained models: multilingual
base cased BERT (mBERT), ALBERT base v2, RoBERTa base, DistilBERT base cased [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], BETO [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and XLNet
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] base cased, followed by regression voting. All the models were
fine-tuned for 3 epochs and took approximately 10 minutes for the complete training
process per model.
        </p>
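        <p>A minimal sketch of this setup (checkpoints, toy data, and the use of a plain
mean as the regression vote are assumptions): Simple Transformers exposes a
regression head by setting num_labels=1 with the regression flag, and the members'
predicted scores can then be averaged:</p>
        <preformat># Sketch: regression voting -- average the funniness scores predicted
# by each fine-tuned member of the ensemble.
import numpy as np
import pandas as pd
from simpletransformers.classification import ClassificationModel

train_df = pd.DataFrame(
    {"text": ["muy gracioso", "nada gracioso"], "labels": [4.5, 1.0]}
)
args = {
    "regression": True,
    "learning_rate": 4e-5,
    "train_batch_size": 8,
    "num_train_epochs": 3,
}

members = [("bert", "bert-base-multilingual-cased"),
           ("distilbert", "distilbert-base-cased")]
member_preds = []
for model_type, model_name in members:
    model = ClassificationModel(
        model_type, model_name, num_labels=1, args=args, use_cuda=False
    )
    model.train_model(train_df)
    preds, _ = model.predict(["un tweet de prueba"])
    member_preds.append(preds)

final_score = np.mean(member_preds, axis=0)  # the regression vote
print(final_score)</preformat>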
        <sec id="sec-4-2-1">
          <title>Task 3 : Multi-Class Classi cation</title>
        <p>The results of Task 3 are summarized in Table 4. The baseline provided by the
organizers for Task 3 achieves a macro F1 score of 0.1001 over the training
corpus; it is based on Naive Bayes with TFIDF features.</p>
          <p>
            Our model, with a macro F1 score of 0.2916, utilizes BETO [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] to solve this
multi-class classification problem. We fine-tuned the model over the training
corpus, which comprises approximately 4,800 tweets for this task. All the models were
fine-tuned for 3 epochs and took approximately 4-5 minutes for the complete
training process per model.
          </p>
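          <p>A minimal sketch of the multi-class setup (the BETO checkpoint name and the
number of mechanism classes are assumptions for illustration):</p>
          <preformat># Sketch: multi-class humor-mechanism classification with BETO.
import pandas as pd
from simpletransformers.classification import ClassificationModel

train_df = pd.DataFrame(
    {"text": ["juego de palabras", "situacion absurda"], "labels": [0, 1]}
)
model = ClassificationModel(
    "bert",
    "dccuchile/bert-base-spanish-wwm-cased",  # a BETO checkpoint on the Hugging Face hub
    num_labels=12,  # assumed number of mechanism classes, for illustration only
    args={"learning_rate": 4e-5, "train_batch_size": 8, "num_train_epochs": 3},
    use_cuda=False,
)
model.train_model(train_df)</preformat>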
        </sec>
        <sec id="sec-4-2-2">
          <title>Task 4 : Multi-Label Classi cation</title>
        <p>We use the MultiLabelClassificationModel from Simple Transformers for
this task. Our final system comprises a pre-trained Spanish BERT cased
model, which is fine-tuned for 4 epochs on approximately 2,000 tweets. It took
approximately 5 minutes for the complete training process per model. Various
ensembles and their results are listed in the above table.</p>
        <p>Combining BETO Cased and Uncased: the BETO model classifier outputs
softmax probabilities for all the classes. We choose the top 3 classes, i.e., the
classes with the highest probabilities, for both models. Next, from these 6 classes,
we choose the class which appears the maximum number of times as the final
prediction.</p>
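        <p>A minimal sketch of the multi-label setup and of the cased/uncased combination
described above (checkpoint name and label count are assumptions; the helper pools
the top-3 classes from each model and returns the most frequent one):</p>
        <preformat># Sketch: multi-label humor-target classification, plus the top-3
# pooling used to combine BETO cased and uncased predictions.
import numpy as np
from collections import Counter
from simpletransformers.classification import MultiLabelClassificationModel

model = MultiLabelClassificationModel(
    "bert",
    "dccuchile/bert-base-spanish-wwm-cased",  # cased BETO checkpoint (assumed)
    num_labels=15,  # assumed number of target labels, for illustration only
    args={"learning_rate": 4e-5, "train_batch_size": 8, "num_train_epochs": 4},
    use_cuda=False,
)

def combine_top3(probs_cased, probs_uncased):
    # probs_*: per-class probabilities from the cased and uncased models.
    top_cased = np.argsort(probs_cased)[-3:]
    top_uncased = np.argsort(probs_uncased)[-3:]
    pooled = Counter(list(top_cased) + list(top_uncased))
    return pooled.most_common(1)[0][0]  # class appearing the most times wins</preformat>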
        </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper describes the winning solution for Task 1, the second-place solutions
for Task 3 and Task 4, and the third-place solution for Task 2 in the evaluation
phase of the Humor Analysis based on Human Annotation (HAHA) challenge
at the Iberian Languages Evaluation Forum (IberLEF) 2021. During the
development phase, our models achieved first place in all four tasks. The combined
results for both phases are mentioned in Table 4.</p>
      <p>In all the tasks, we tried to exploit the power of voting ensembles to obtain
strong results. For Task 1, six of our ensemble models outperform the second-
and third-place solutions. Similarly, in the other tasks, our models outperform the
next-best solutions by a large margin.</p>
      <p>Further work can be done on preprocessing the Spanish tweets to analyze
the effects of various preprocessing methods on humor prediction; such preprocessing
includes cleaning, tokenizing, and parsing URLs, hashtags, mentions, reserved
words (RT, FAV), emojis, and smileys (a sample preprocessor can be found at
https://pypi.org/project/tweet-preprocessor/). An interesting approach is the
translation of Spanish tweets to English and back to Spanish (i.e., back-translation)
as a preprocessing method, which is a domain open for further experimentation
and research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Luis Chiruzzo, Santiago Castro, Santiago Góngora, Aiala Rosá, J. A. Meaney, and Rada Mihalcea. Overview of HAHA at IberLEF 2021: Detecting, Rating and Analyzing Humor in Spanish. Procesamiento del Lenguaje Natural, 67(0), 2021.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Peng-Yu Chen and Von-Wun Soo. Humor recognition using deep learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 113-117, New Orleans, Louisiana, June 2018. Association for Computational Linguistics.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Minghan Wang, Hao Yang, Ying Qin, Shiliang Sun, and Yao Deng. Unified humor detection based on sentence-pair augmentation and transfer learning. In EAMT, 2020.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Orion Weller and Kevin Seppi. Humor detection: A transformer gets the last laugh. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3621-3625, Hong Kong, China, November 2019. Association for Computational Linguistics.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Adilzhan Ismailov. Humor analysis based on human annotation challenge at IberLEF 2019: First-place solution. In IberLEF@SEPLN, 2019.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Issa Annamoradnejad and Gohar Zoghi. ColBERT: Using BERT sentence embedding for humor detection, 2021.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Khalid Raza. Chapter 8 - Improving the prediction accuracy of heart disease with ensemble learning and majority voting rule. In Nilanjan Dey, Amira S. Ashour, Simon James Fong, and Surekha Borra, editors, U-Healthcare Monitoring Systems, Advances in Ubiquitous Sensing Applications for Healthcare, pages 179-196. Academic Press, 2019.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations, 2021.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] Jose Can~ete, Gabriel Chaperon, Rodrigo Fuentes,
          <string-name>
            <surname>Jou-Hui</surname>
            <given-names>Ho</given-names>
          </string-name>
          , Hojin Kang, and
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Perez</surname>
          </string-name>
          .
          <article-title>Spanish pre-trained bert model and evaluation data</article-title>
          .
          <source>In PML4DC at ICLR</source>
          <year>2020</year>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108, 2019.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. XLNet: Generalized autoregressive pretraining for language understanding, 2020.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>