<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying Pre-trained Model and Fine-tune to Conduct Humor Analysis on Spanish Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yongyi Kui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Institute of Yunnan University</institution>
          ,
          <addr-line>Yunnan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes in detail the four subtasks of HAHA@IberLEF 2021 [7]: Humor Analysis based on Human Annotation. Subtask 2 is a regression problem, while the other three subtasks are text classification problems. The data come from the Twitter social platform, and the language is Spanish. The classification problems are solved mainly by combining the Multilingual Bert model with an LSTM model, and the regression problem is solved with the GPT-2 model. According to the official evaluation results, the method proposed in this paper ranks fourth, eighth, fifth, and sixth on the four subtasks, respectively. I have uploaded the code for this task to GitHub (kuiyongyi) for reference by others.</p>
      </abstract>
      <kwd-group>
        <kwd>Spanish Text Classification</kwd>
        <kwd>Humor Analysis</kwd>
        <kwd>Pre-trained Model</kwd>
        <kwd>Fine-tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Humor is a very common phenomenon in human communication. It is relatively
easy for humans to understand whether the content of a text is humorous, but
a computer can detect whether a text is humorous only after learning the
characteristic information of text from a large corpus
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Humor detection has been a relatively hot field for many years.
SemEval-2015 Task 11 addressed the influence of figurative language such as metaphor
and irony on sentiment analysis. SemEval-2017 Task 6 required predicting the
ranking of comedy shows based on their humorous tweets. Both
IberEVAL 2018 and IberLEF 2019 include two subtasks: humor detection and
funniness score prediction. Castro [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] established a corpus of annotated tweets,
allowing annotators to judge subjectively which tweets are humorous, and then
built a humor classifier for Spanish tweets based on supervised
learning. Barbieri and Saggion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed a machine learning method based
on a set of language-driven features. Radev [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] described a method for humor
detection in cartoon captions. Yang [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] constructed different classifiers over
feature sets to detect humor. More recently, many systems have combined
Pre-trained Models with fine-tuning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to achieve humor detection.
      </p>
      <p>The rest of the paper is organized as follows. The second part gives
an overview of the task and the data. The third part describes the models
used in this task. The fourth part describes the experiments on the four
subtasks. The fifth part gives the test results of the models. Finally, the sixth
part of the article is the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>Task and Data Description</title>
      <sec id="sec-2-1">
        <title>Subtasks</title>
        <p>HAHA@IberLEF 2021 includes four subtasks, which aim to predict the humor
classification or degree of humor of a text. The four subtasks are defined as
follows:</p>
        <p>Subtask 1: Given Twitter text in Spanish, the goal is to predict whether
each tweet is humorous. This is a binary classification problem,
measured by the F1 score.</p>
        <p>Subtask 2: The goal is to predict the degree of humor of the texts predicted to be
humorous; this is a regression task. The evaluation metric for this subtask is
the root mean square error.</p>
        <p>Subtask 3: The goal is to predict, for each tweet, one humor mechanism category
among the twelve given humor mechanisms. This is a multi-class classification
problem. The performance on this task is measured by the Macro-F1 score.</p>
        <p>Subtask 4: Given tweets and fifteen humor target tags, the goal is to
predict the corresponding humor target tags for each tweet (at least zero and
at most fifteen tags); this is a multi-label text classification task. The evaluation
metric for this subtask is also Macro-F1.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Dataset</title>
        <p>
          The Dataset [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] provided by HAHA@IberLEF 2021 includes three Subdatasets:
haha 2021 train (24,000 tweets), haha 2021 dev (6,000 tweets), and haha 2021 test
(6,000 tweets).
        </p>
        <p>Each record in haha 2021 train consists of twelve columns, which are
the id, the tweet corresponding to each id (the text includes punctuation
and emoticons), the non-humorous votes, the number of votes for each of the five
humor levels, is humor, humor rating, humor mechanism, and humor target.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Data cleaning</title>
        <p>Since the text in the HAHA Dataset comes from the Twitter social platform, the length
of the text varies, most texts are short, and the content and format
of the tweets are informal (including spelling errors and character and word
repetition). It is necessary to clean the tweets in the Dataset so that
the Language Model can predict the content to be covered based on
context and semantics. The data cleaning methods used in this paper are as
follows:
- Replace characters or words repeated three or more times in a tweet
with a single character or word;
- Delete the emoticons appearing in the text;
- Delete the HTML tags appearing in the text;
- Replace the newline characters in the text with spaces.</p>
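        <p>The cleaning steps above can be sketched with regular expressions as follows (a minimal illustration, not the exact code of this system; the patterns and the exact repetition threshold are assumptions):</p>

```python
import re

def clean_tweet(text: str) -> str:
    """Apply the cleaning steps described above to a single tweet."""
    # Collapse a character repeated three or more times into one character
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    # Collapse a word repeated three or more times into one word
    text = re.sub(r"\b(\w+)(?:\s+\1){2,}\b", r"\1", text)
    # Delete HTML tags, leaving a space so adjacent words do not fuse
    text = re.sub(r"<[^>]+>", " ", text)
    # Delete emoticons / emoji (approximate Unicode ranges)
    text = re.sub(r"[\U0001F000-\U0001FAFF\u2600-\u27BF]", "", text)
    # Replace newlines with spaces and squeeze repeated whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(clean_tweet("holaaaa<br>que   risa\nrisa risa risa"))  # hola que risa
```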
      </sec>
    </sec>
    <sec id="sec-3">
      <title>System Description</title>
      <p>In this section, I give an overview of the system that implements these four
subtasks. The task of Humor Analysis based on Human
Annotation at IberLEF 2021 can simply be regarded as text classification and regression problems:
given input text, predict one or more labels, and predict the humor score of the
text.</p>
      <p>One disadvantage of the LSTM model is that the input information is
compressed into a vector whose length equals the number of LSTM memory
units, so the LSTM model cannot remember long texts. However, the Twitter
texts in this task are very short, so I consider using an LSTM to further extract
text feature information from the output of the Pre-trained Model.</p>
      <sec id="sec-4-1">
        <title>Model</title>
        <p>
          Since the text of the Dataset is Spanish, this paper uses three Cross-language
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] Pre-trained Models, Multilingual Bert [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], XLM, and XLM-RoBerta,
as well as models such as XLNet, Albert, and GPT-2, to extract text feature
information. Here is a brief introduction to each Pre-trained Model.
        </p>
        <p>
          The Bert model uses Masked Language Modeling [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to train
bidirectional Transformers that generate deep bidirectional language representations.
After the Pre-training phase, only an output layer needs to be added for
fine-tuning. Here, Bert-base-multilingual-uncased is used. This model was
Pre-trained on a corpus of one hundred and two languages, including Spanish.
        </p>
        <p>Albert uses the Transformer and the GELU activation function. Albert uses
techniques such as parameter sharing and matrix decomposition to reduce the number of model
parameters, and it replaces the Next Sentence
Prediction loss with a Sentence Order Prediction loss.</p>
        <p>XLM is a Cross-language model. Similar to Bert, it is also a Masked
Language Model, and one of its Pre-training objectives is next-token prediction. In this
task, XLM-mlm-tlm-xnli15-1024 is used. The model was trained on Cross-language
sentences from fifteen corpora, including Spanish.</p>
        <p>XLM-RoBerta is a model trained on a corpus of one hundred different
languages. Unlike XLM, it does not require lang tensors to understand which
language is used and can determine the correct language from the
input ids. In this task, XLM-RoBerta-base is used.</p>
        <p>
          XLNet mainly optimizes the Bert model in three aspects. First, the
Autoencoding Model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is replaced with an Autoregressive [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] Model; second,
the Transformer of the Bert model is improved with Transformer-XL; third, XLNet
uses a dual-stream [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] attention mechanism. In the Pre-training phase, the Next
Sentence Prediction method of the Bert model is discarded.
        </p>
        <p>The GPT-2 model usually pads inputs on the right and was trained
with a causal language modeling objective. The GPT-2 model and the Bert model
are constructed from the decoder and encoder modules of the Transformer,
respectively.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Parameter setting</title>
        <p>In subtask 1, the optimizer is AdamW, and the model is trained for 50 epochs with a 3e-5
learning rate and a batch size of 32. The weight decay parameter is set to 1e-2,
the maxlen parameter is set to 64, and the loss function is
cross-entropy.</p>
        <p>In subtask 2, the loss function is MSE loss, the learning rate is set to
1e-5, the maxlen parameter is set to 64, the batch size is set to 32, and
validation is carried out after every sixty-four training steps.</p>
        <p>In subtask 3 and subtask 4, the learning rate and weight decay parameters
are both 5e-6 and 1e-2, the numbers of epochs are 50 and 100 respectively, and the loss
functions are CategoricalCrossentropy and BinaryCrossentropy respectively;
the dropout parameter is set to 0.5 and the optimizer is Adam.</p>
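        <p>For reference, the hyperparameters above can be collected into a single configuration table (a sketch for readability; the key names are illustrative, not taken from the original training scripts):</p>

```python
# Hyperparameters per subtask, as reported in this section.
CONFIGS = {
    "subtask1": {"optimizer": "AdamW", "epochs": 50, "lr": 3e-5,
                 "batch_size": 32, "weight_decay": 1e-2, "maxlen": 64,
                 "loss": "CrossEntropy"},
    "subtask2": {"loss": "MSE", "lr": 1e-5, "maxlen": 64,
                 "batch_size": 32, "eval_every_steps": 64},
    "subtask3": {"optimizer": "Adam", "lr": 5e-6, "weight_decay": 1e-2,
                 "epochs": 50, "dropout": 0.5, "loss": "CategoricalCrossentropy"},
    "subtask4": {"optimizer": "Adam", "lr": 5e-6, "weight_decay": 1e-2,
                 "epochs": 100, "dropout": 0.5, "loss": "BinaryCrossentropy"},
}

for name, cfg in sorted(CONFIGS.items()):
    print(name, cfg["loss"])
```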
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Subtask 1</title>
        <p>haha 2021 train is divided at a ratio of 4:1 into the Training Dataset
and Validation Dataset of the models in subtask 1. The resulting Training Dataset,
Validation Dataset, and Testing Dataset lengths of subtask 1 are 19200, 4800,
and 6000, respectively. Next, the data are cleaned, word-segmented, and encoded
before being input into the model.</p>
        <p>In subtask 1, I first used three Pre-trained Models (Albert-base-v2,
XLNet-base-cased, and Bert-base-multilingual-uncased) for binary text classification.
The results show that the Multilingual Bert model obtains the highest
score on this Spanish text classification problem. The F1 score of
this model on the Validation Dataset is 0.8712. I then added a 4-layer
unidirectional LSTM network after the model to further extract text features, and
finally fed the LSTM output into the fully connected layer for classification.
This combination has an F1 score of 0.8785 on the Validation Dataset; compared
to the single Pre-trained Model, the score increased by 0.0073. Thus, the prediction
result of this combined model is the answer I finally submitted on subtask 1.
The performance of these models on the Validation Dataset is shown in Table 1.</p>
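        <p>The 4:1 split described above can be sketched as follows (an illustration assuming a simple contiguous split; the actual partition may be shuffled):</p>

```python
def split_train_val(records, train_ratio=0.8):
    """Split a dataset 4:1 into training and validation portions."""
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]

# haha 2021 train has 24,000 tweets, giving 19,200 / 4,800 for subtask 1.
train, val = split_train_val(list(range(24000)))
print(len(train), len(val))  # 19200 4800
```

<p>Applying the same split to the 9253 records available for subtask 2 yields 7402 and 1851, matching the lengths reported for that subtask.</p>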
      </sec>
      <sec id="sec-5-2">
        <title>Subtask 2</title>
        <p>There are 9253 records available for subtask 2 in haha 2021 train.
The data are divided as in subtask 1. The Training Dataset
length of subtask 2 is 7402, the Validation Dataset length is 1851, and the
Testing Dataset length is 6000.</p>
        <p>In subtask 2, I compared the performance of three Cross-language
Pre-trained Models and the GPT-2 model on the regression problem. These four
Pre-trained Models are Bert-base-multilingual-uncased, XLM-mlm-xnli15-1024,
XLM-RoBerta-base, and GPT-2. The experimental results show that the
performance of the GPT-2 model on subtask 2 is slightly better than the other three
models; therefore, I used the prediction result of the GPT-2 model as the answer
I finally submitted in subtask 2. The specific performance of these four models
on the Validation Dataset is shown in Table 2.</p>
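        <p>For completeness, the root mean square error used to evaluate this subtask can be computed as follows (a minimal stand-alone sketch, not the official evaluation script):</p>

```python
import math

def rmse(predictions, targets):
    """Root mean square error between predicted and gold humor ratings."""
    assert len(predictions) == len(targets) and predictions
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
    )

print(round(rmse([2.0, 3.0, 4.5], [1.0, 3.0, 4.5]), 4))  # 0.5774
```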
      </sec>
      <sec id="sec-5-3">
        <title>Subtask 3 &amp; 4</title>
        <p>In haha 2021 train, the humor mechanism column and the humor target column have
4800 and 1629 non-empty entries, respectively. The data are divided as in
subtask 1. The Training Dataset lengths of subtask 3 and subtask 4
are 3840 and 1303, respectively, the Validation Dataset lengths are 960 and 326,
respectively, and the Testing Dataset length for both is 6000.</p>
        <p>
          After dividing the data, the labels of the Training Dataset and
Validation Dataset in subtask 3 and subtask 4 are first encoded into one-hot vectors. Next, the
Dataset is processed in the same way as in subtask 1. Then, the Sequential
function is used to build a double-layer BiLSTM [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In the double-layer BiLSTM model,
the LSTM cells are set to 192 and 64, and the return sequences
parameter is set to True and False, respectively. Subtask 3 is a multi-class classification
problem, while subtask 4 is a multi-label classification problem. The main difference
between the two is the activation function used in the last layer of the
network: subtask 3 uses Softmax, while subtask 4
uses Sigmoid.
        </p>
        <p>The results of subtask 1 show that the Multilingual Bert model performs
slightly better than Albert and XLNet on the Spanish Twitter text classification
task. So in subtask 3 and subtask 4, I use the Multilingual Bert model as the
basis. First, the processed data are input into the Multilingual Bert model; next,
the output of the Bert model is fed into the two-layer BiLSTM network to further
extract features; finally, the output of the BiLSTM network is fed into the fully
connected layer (the numbers of neurons in the fully connected layers of subtasks
3 and 4 are twelve and fifteen, respectively) for classification. Subtask 3 finally
outputs a twelve-dimensional vector, in which the element with the largest value
is set to 1 and the remaining elements are set to 0 (in subtasks
3 and 4, 1 and 0 indicate that a
category is or is not predicted). The output of subtask 4 is a fifteen-dimensional vector, and the
threshold is set to 0.5: each element of the output vector is set to 1 if it is greater than
the threshold and to 0 otherwise. I used
the prediction results of the combined Multilingual Bert and BiLSTM model
as the answers I finally submitted on the last two subtasks. The performance of
my solution on the validation sets of subtask 3 and subtask 4 is shown in Table 3.</p>
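        <p>The two output-decoding rules above can be sketched as follows (a minimal illustration on plain Python lists, independent of any deep learning framework):</p>

```python
def decode_multiclass(scores):
    """Subtask 3: set the element with the largest value to 1, the rest to 0."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1 if i == best else 0 for i in range(len(scores))]

def decode_multilabel(scores, threshold=0.5):
    """Subtask 4: threshold each output independently at 0.5."""
    return [1 if s > threshold else 0 for s in scores]

print(decode_multiclass([0.1, 0.7, 0.2]))  # [0, 1, 0]
print(decode_multilabel([0.9, 0.4, 0.6]))  # [1, 0, 1]
```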
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>Among all the teams participating in this task, my solution ranked
fourth, eighth, fifth, and sixth on the four subtasks, respectively. Table 4 lists the scores
of the best-performing teams in the four subtasks of the HAHA@IberLEF 2021
competition, and the scores of my method on the official Testing Dataset.
This paper describes the data processing, the comparison of Pre-trained
Models, and the final model construction in the HAHA@IberLEF 2021 challenge.
Although the solution I proposed achieved good results, it is undeniable
that there is still a lot of room for improvement. Due to time constraints, I could
not do an error analysis. In my future work, I will conduct a detailed
error analysis to understand the limitations of the system.</p>
      <p>
        In future work, first of all, we can try to extract the emoticon information
in tweets instead of deleting it directly. If both the emoticon and the text
information can be extracted, humor classification should improve. Secondly,
during the experiments, I found that the models had overfitting problems. For
this reason, I will try to use transfer learning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to make improvements in the
future. Finally, the distribution of categories in subtask 3 and subtask
4 is not balanced, especially in subtask 4. For this reason, I will consider dealing
with the unbalanced data by weighting the loss function.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khodra</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          :
          <article-title>Fine-tuning pretrained multilingual bert model for indonesian aspect-based sentiment analysis (</article-title>
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saggion</surname>
          </string-name>
          , H.:
          <article-title>Automatic detection of irony and humour in twitter</article-title>
          . In: ICCC. pp.
          <volume>155</volume>
          –
          <issue>162</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deep learning of representations for unsupervised and transfer learning</article-title>
          .
          <source>In: Proceedings of ICML workshop on unsupervised and transfer learning</source>
          . pp.
          <volume>17</volume>
          –
          <fpage>36</fpage>
          . JMLR Workshop and Conference Proceedings (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Si</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Palm: Pretraining an autoencoding&amp;autoregressive language model for context-conditioned generation (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bohnet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mcdonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynez</surname>
          </string-name>
          , J.:
          <article-title>Morphosyntactic tagging with a meta-bilstm model over context sensitive token encodings</article-title>
          .
          <source>In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cubero</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garat</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moncecchi</surname>
          </string-name>
          , G.:
          <article-title>Is this a joke? detecting humor in spanish tweets</article-title>
          . In: Springer International Publishing (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gongora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meaney</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
          </string-name>
          , R.: Overview of HAHA at IberLEF 2021:
          <article-title>Detecting, Rating and Analyzing Humor in Spanish</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Haha 2019 dataset: A corpus for humor analysis in spanish</article-title>
          .
          <source>In: Proceedings of The 12th Language Resources and Evaluation Conference</source>
          . pp.
          <volume>5106</volume>
          –
          <issue>5112</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Codebert: A pre-trained model for programming and natural languages (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>X.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          , Zhang, T.:
          <article-title>Sentiment analysis using autoregressive language modeling and broad learning system</article-title>
          .
          <source>In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anantharaman</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Larger-scale transformers for multilingual masked language modeling (</article-title>
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bing</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
          </string-name>
          , R.:
          <article-title>Unsupervised domain adaptation of a pretrained cross-lingual language model</article-title>
          . arXiv preprint arXiv:2011.11499 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Human behavior recognition based on attention mechanism</article-title>
          .
          <source>In: 2020 International Conference on Arti cial Intelligence and Education (ICAIE)</source>
          . pp.
          <volume>103</volume>
          –
          <fpage>107</fpage>
          . IEEE
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Making computers laugh: investigations in automatic humor recognition</article-title>
          .
          <source>In: Conference on Human Language Technology Empirical Methods in Natural Language Processing</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stent</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tetreault</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pappu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iliakopoulou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chanfreau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Juan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vallmitjana</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaimes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>Humor in collective discourse: Unsupervised funniness detection in the new yorker cartoon caption contest</article-title>
          .
          <source>arXiv preprint arXiv:1506.08126</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavie</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
          </string-name>
          , E.:
          <article-title>Humor recognition and humor anchor extraction</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>