   ColBERT at HAHA 2021: Parallel Neural
 Networks for Rating Humor in Spanish Tweets

                       Issa Annamoradnejad1[0000−0003−3147−6389]
                          and Gohar Zoghi2[0000−0003−0298−4069]
     1
         Department of Computer Engineering, Sharif University of Technology, Iran
                               i.moradnejad@gmail.com
                    2
                       Golestan University of Medical Sciences, Iran
                                zoughi.g@goums.ac.ir



          Abstract. Previously, we proposed ColBERT, a humor detection model
          based on the general linguistic structure of humor for formal English
          texts. ColBERT uses BERT model to produce embeddings for the text
          sentences, which will be put as inputs into a parallel neural network. In
          this paper, we utilized the proposed model on informal Spanish texts to
          detect humor and rate its level. The current task has three differences
          compared to the original humor detection task on the ColBERT dataset:
          (1) rating humor is a regression task rather than binary classification, (2)
          texts are informal, and (3) texts are in a different language. Using our
          general model and without any knowledge of the Spanish language, we
          participated in HAHA shared task at IberLEF 2021 Forum and achieved
          2nd place for humor rating and 3rd place for binary humor detection. The
          results confirm robustness of our proposed model.

          Keywords: humor rating · parallel neural networks · computational
          humor · informal texts · Spanish tweets




1        Introduction
Computational humor detection is an ongoing research area with several subtleties
arising from the linguistic features of humor and the various mechanisms that can
be employed to induce laughter in humans.
    In previous works, researchers mostly focused on the binary task of humor
detection, where the goal is to separate humorous from non-humorous texts. However,
it would also be beneficial to rate the level of humor present in a humorous text.
This new task could help chatbots adjust their level of humor to the mood of the
user or query.
    Computational humor has a long history, from statistical and N-gram analysis
[10], regression trees [7], Word2Vec combined with K-NN human-centric features [12],
and convolutional neural networks [3, 11], to pre-trained models based on transfer
learning [1].

    IberLEF 2021, September 2021, Málaga, Spain.
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).
    Previously, we proposed the ColBERT model [1], based on the linguistic features
of humor, for the binary task of humor detection in formal English texts. Our
approach separates sentences and uses the English BERT-base-uncased pre-trained
model to encode them into sentence embeddings. These embeddings are separately
fed into parallel hidden layers of a neural network to extract mid-level features
for each sentence (related to context, type of sentence, etc.). The final layers
combine the outputs of all parallel paths of hidden layers in order to predict the
final output. In theory, these final layers should determine the congruity of the
sentences and detect the transformation of the reader's viewpoint after reading
the punchline.
    In this paper, we aim to test the robustness of the model by applying it to a
new context, which poses three new challenges compared to the previous one:

 1. The new task is in Spanish, a language of which we have no previous knowledge.
 2. The input texts are informal tweets by Twitter users.
 3. The new task is about rating humor; thus, it is a regression task.

   We did not change the model structure, its hyper-parameters, or any of the
pre-processing functions. However, in order to extract sentence embeddings for
Spanish texts, we changed the pre-trained model from the English BERT-base-uncased
to a recent Spanish equivalent (BETO-uncased [2]).
   To evaluate our performance, we participated in the HAHA shared task at the
IberLEF 2021 Forum [4]; thus, this paper also serves as a description of the
methods we used for the competition track. The competition is divided into four
sub-tasks (Table 1), all of which target informal Spanish tweets. We particularly
focused on the first two sub-tasks, which required the fewest modifications to
our original model.


              Table 1. The four sub-tasks at the HAHA task 2021 [4]

Sub-task Title                     Description
Humor Detection                    A binary classification of humor.
Humor Rating                       Predicting a score between 1 and 5 for a tweet,
                                   assuming it is a joke.
Humor Mechanism Classification     A multi-class classification task that predicts
                                   the mechanism by which the tweet conveys humor,
                                   such as irony, wordplay or exaggeration.
Humor Target Classification        A multi-label classification task that predicts
                                   the target or context of the tweet, such as
                                   weight, racism, sexism, etc.
2     Proposed Method
In this section, we present the three conceptual steps that lead to the proposed
method. First, we explore the linguistic structure of humor on which the method
is based. Then, we explain the overall structure of the ColBERT model, and
finally, we describe the changes made in this paper.

2.1   Humor Structure
Many linguists have theorized that humor arises from the sudden transformation of
an expectation into nothing [6]. Accordingly, the structure of a joke generally
includes two or three stages of storytelling that end with a punchline [5, 9]. The
punchline, as the last part of a joke, brings laughter through its incongruity
with the perceiver's prior expectations.
   Based on Raskin's Semantic Script Theory of Humor (SSTH) [8], a necessary
condition for humor is the presence of two distinct, related scripts that are
opposite in nature, such as real/unreal or possible/impossible. This is compatible
with the two-stage theory in which a joke ends with a punchline: while the
punchline is related to the previous sentences, it stands in opposition to them
in order to transform the reader's expectation of the context.

2.2   ColBERT Model
Based on the linguistic theories on the structure of humor, if one reads each
sentence of a joke separately, it will be perceived as a normal, non-humorous
text. However, if we try to comprehend all sentences together in one context or
one line of story, the text becomes humorous. The ColBERT model exploits this
linguistic characteristic of humor in its structure.
    In short, it uses separate paths of hidden layers specifically designed to
extract latent features from each sentence. In addition, it has an extra path to
extract latent features of the whole text. Hence, the neural network structure
includes one parallel path that views the text as a whole and several other paths
that view each sentence separately. It is composed of a few steps (Figure 1) [1]:

 1. Separate the sentences and tokenize them individually.
 2. Convert the textual parts into proper numerical inputs for the neural network,
    using a pre-trained BERT-based model. This step is performed individually on
    each sentence (left side in Figure 1) and also on the whole text (right side
    in Figure 1).
 3. Feed the embeddings into parallel hidden layers of the neural network to
    extract mid-level latent features for each path (related to context, type of
    sentence, etc.). The output size is 20 for each path.
 4. Feed the embeddings for the whole text into the last path, similar to the
    previous step. The output size is 60. We do this because there may exist
    meaningful relationships between sentences, such as synonyms and antonyms.
                 Fig. 1. Architecture of the ColBERT method. [1]



5. Three sequential layers combine the outputs of the previous paths of hidden
   layers in order to predict the final output. In theory, these final layers
   should determine the congruity of the sentences and detect the transformation
   of the reader's viewpoint after reading the punchline.
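The parallel structure described in the steps above can be sketched as a minimal forward pass. The sketch below uses NumPy with random weights; the 768-dimensional embeddings and the 20- and 60-dimensional path outputs come from the text, while the intermediate layer sizes, activations, and final head are illustrative assumptions, not the exact configuration of the original model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = 768  # dimensionality of a BERT sentence embedding

def dense(x, w, b, act=lambda z: np.maximum(z, 0)):
    """One fully connected layer, ReLU activation by default."""
    return act(x @ w + b)

def path(x, out_dim):
    """One parallel path of hidden layers reducing an embedding to mid-level features.
    Hidden width 128 is an illustrative assumption."""
    w1, b1 = rng.normal(size=(x.shape[-1], 128)) * 0.01, np.zeros(128)
    w2, b2 = rng.normal(size=(128, out_dim)) * 0.01, np.zeros(out_dim)
    return dense(dense(x, w1, b1), w2, b2)

def colbert_forward(sentence_embs, whole_text_emb):
    # One 20-dim path per sentence, plus one 60-dim path for the whole text.
    feats = [path(e, 20) for e in sentence_embs]
    feats.append(path(whole_text_emb, 60))
    combined = np.concatenate(feats)
    # Final sequential layers combine all paths into a single score.
    h = path(combined, 32)
    w, b = rng.normal(size=(32, 1)) * 0.01, np.zeros(1)
    return dense(h, w, b, act=lambda z: z)[0]  # linear output for regression

sents = [rng.normal(size=EMB) for _ in range(3)]  # three sentence embeddings
score = colbert_forward(sents, rng.normal(size=EMB))
print(float(score))
```

With random weights the score is meaningless; the point is the shape of the computation: per-sentence paths and a whole-text path are reduced independently, then concatenated before the final layers.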

2.3   Changes for the Spanish Language
In order to change the target context from formal English texts to informal
Spanish tweets, we kept the model structure and hyper-parameters as before.
However, to extract sentence embeddings for Spanish texts, we changed the
pre-trained model from the English BERT-base-uncased to a recent Spanish
equivalent. This was a required and logical step to obtain meaningful embeddings.
For this purpose, we selected the BETO model [2] from the many pre-trained models
available on HuggingFace.
    BETO [2] is a BERT model trained on a large Spanish corpus; it is similar in
size to BERT-base and was trained with the Whole Word Masking technique. Compared
to other BERT-based models proposed for Spanish (including multilingual BERT
models), it achieved higher accuracy on several selected tasks.
    We tried a few pre-processing methods for separating sentences and cleaning
the text that were previously proposed for the Spanish language, but they did not
improve accuracy in our evaluations. Therefore, we split the sentences as before
and did not perform any data cleaning.
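Since the Spanish-specific pre-processing did not help, sentence splitting stayed as in the original model. A minimal sketch of such a splitter is shown below; the punctuation-based regex is an illustrative assumption, not the exact function used in the model:

```python
import re

def split_sentences(text: str):
    """Naively split a text into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

# Example tweet-like Spanish text; each resulting sentence would be embedded
# separately (e.g. with the BETO-uncased model [2]) before entering its own
# parallel path of the network.
print(split_sentences("Hola. ¿Cómo estás? ¡Bien!"))
# → ['Hola.', '¿Cómo estás?', '¡Bien!']
```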
2.4   The multi-class and multi-label sub-tasks
For the last two sub-tasks, we reduced multi-class and multi-label classification
to multiple regression tasks using the classical one-against-all (OAA) approach.
In this approach, we kept the single-target model structure as before and added a
post-processing step that accumulates all results and predicts the final value.
    For the multi-class classification sub-task, we predicted the probability of
each class separately and used the class with the maximum predicted value as our
final prediction. For the multi-label classification sub-task, we used a threshold
(0.5) to select all labels applicable to the given text.
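The OAA post-processing step can be sketched as follows. Only the argmax rule and the 0.5 threshold come from the text; the function names and the example scores are hypothetical:

```python
import numpy as np

def multiclass_predict(class_scores):
    """One regressor per class (OAA): pick the class with the maximum score."""
    return int(np.argmax(class_scores))

def multilabel_predict(label_scores, threshold=0.5):
    """One regressor per label (OAA): keep every label scoring above the threshold."""
    return [i for i, s in enumerate(label_scores) if s > threshold]

scores = np.array([0.1, 0.7, 0.3])          # hypothetical per-class outputs
print(multiclass_predict(scores))           # → 1
print(multilabel_predict([0.6, 0.2, 0.9]))  # → [0, 2]
```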


3     Results and Discussions
17 teams participated in the HAHA 2021 shared task. Based on the official results
reported by the organizers, we achieved 2nd place in the humor rating sub-task,
with a score very close to that of the first team (a difference of 0.002). As
reported in Table 2, our model achieved a score of 0.6246 on the official test
data under the Root-Mean-Squared-Error (RMSE) metric. The first three teams are
separated by a large gap from the rest of the participants, even from the fourth
team, which also used a BERT language model for its predictions.


Table 2. Performance of our model in rating humor compared to other teams (Eval-
uation by RMSE)

                    Rank   Model                Score (RMSE)
                    1      UMUTeam              0.6226
                    2      ColBERT              0.6246
                    3      Jocoso               0.6296
                    4      BERT4EVER            0.6587
                    ...    ...                  ...
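For reference, the RMSE metric used in Table 2 is the square root of the mean squared difference between gold and predicted humor ratings. A minimal sketch with illustrative ratings (not the actual task data):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-Mean-Squared-Error between gold and predicted ratings."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative ratings on the 1-to-5 scale used by the humor rating sub-task.
print(rmse([3.0, 4.5, 2.0], [2.5, 4.0, 2.5]))  # → 0.5
```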




   As mentioned earlier, HAHA 2021 also organized three more sub-tasks on the
computational analysis of humor, all of which are evaluated using the F1-score.
While our focus was on the humor rating task, we also used our model to
participate in those sub-tasks. Table 3 compares our performance with that of
other top-ranking teams.
   For the binary humor detection sub-task, we did not have to perform any extra
steps or modifications to our method. Our proposed method achieved 3rd place
among 17 teams with an F1-score of 0.8696. Compared to our previous evaluations
of humor detection on formal English texts, this is a 10 percent drop in F1-score,
which can be attributed to the informality of the texts, our lack of knowledge of
the Spanish language, and weaker cleaning and sentence-embedding methods.
   For the last two sub-tasks, we achieved 7th and 5th places, respectively
(Table 3). This lower performance compared to the first two sub-tasks can largely
be attributed to the simple approach taken (OAA) and the naive threshold (0.5)
used for all classes.


Table 3. Comparison of our model with other high-scoring models for the rest of
sub-tasks (all evaluated by F1-score)

     Model                Humor Detection Humor Mechanism Humor Target
                          (binary)        (multi-class)   (multi-label)
     Jocoso               0.8850          0.2916          0.3578
     icc                  0.8716          0.2522          0.3110
     kuiyongyi            0.8681          0.2187          0.2836
     BERT4EVER            0.8645          0.3396          0.4228
     UMUTeam              0.8544          0.2087          0.3225
     Baseline             0.6619          0.1001          0.0527
     ColBERT              0.8696 (3rd)    0.2060 (7th)    0.3099 (5th)




4   Conclusions
In this paper, we showed the robustness of our proposed method for computational
humor through its performance in rating and detecting humor in informal Spanish
tweets. The new context poses three important challenges compared to the previous
task of detecting humor in formal English texts.
    For the new context, we did not change any part of our proposed method and
only replaced the pre-trained model used for generating sentence embeddings,
switching from an English model to a state-of-the-art Spanish one. For the
multi-class and multi-label classification sub-tasks, we reduced them to multiple
regression tasks and obtained the final output using the one-against-all (OAA)
approach.
    We participated in the HAHA 2021 competition at the IberLEF 2021 Forum,
competing alongside several teams from around the world. Based on the official
results, our general method achieved second place for rating humor and third
place for detecting humor in Spanish tweets.

5   Acknowledgements
We would like to thank the organizers of the HAHA shared task for supporting
research interest in the field of computational humor over the past few years.

References
 1. Annamoradnejad, I., Zoghi, G.: ColBERT: Using BERT sentence embedding for
    humor detection. arXiv preprint arXiv:2004.12765 (2020)
 2. Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish
    pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020 (2020)
 3. Chen, P.Y., Soo, V.W.: Humor recognition using deep learning. In: Proceedings of
    the 2018 Conference of the North American Chapter of the Association for Com-
    putational Linguistics: Human Language Technologies, Volume 2 (Short Papers).
    pp. 113–117 (2018)
 4. Chiruzzo, L., Castro, S., Góngora, S., Rosá, A., Meaney, J.A., Mihalcea, R.:
    Overview of HAHA at IberLEF 2021: Detecting, Rating and Analyzing Humor
    in Spanish. Procesamiento del Lenguaje Natural 67(0) (2021)
 5. Eysenck, H.J.: The appreciation of humour: an experimental and theoretical study
    1. British Journal of Psychology. General Section 32(4), 295–309 (1942)
 6. Kant, I.: Kritik der urteilskraft, vol. 39. Meiner (1913)
 7. Purandare, A., Litman, D.: Humor: Prosody analysis and automatic recognition
    for F*R*I*E*N*D*S*. In: Proceedings of the 2006 Conference on Empirical
    Methods in Natural Language Processing. pp. 208–215 (2006)
 8. Raskin, V.: Semantic mechanisms of humor, vol. 24. Springer Science & Business
    Media (2012)
 9. Suls, J.M.: A two-stage model for the appreciation of jokes and cartoons: An
    information-processing analysis. The psychology of humor: Theoretical perspec-
    tives and empirical issues 1, 81–100 (1972)
10. Taylor, J.M., Mazlack, L.J.: Computationally recognizing wordplay in jokes. In:
    Proceedings of the Annual Meeting of the Cognitive Science Society. vol. 26 (2004)
11. Weller, O., Seppi, K.: Humor detection: A transformer gets the last laugh. arXiv
    preprint arXiv:1909.00252 (2019)
12. Yang, D., Lavie, A., Dyer, C., Hovy, E.: Humor recognition and humor anchor ex-
    traction. In: Proceedings of the 2015 Conference on Empirical Methods in Natural
    Language Processing. pp. 2367–2376 (2015)