<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>RoMa at HAHA-2021: Deep Reinforcement Learning to Improve a Transformer-based Model for Humor Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariano Rodriguez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reynier Ortega-Bueno</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <email>prossog@prhlt.upv.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>PRHLT Research Center, Universitat Politecnica de Valencia</institution>
          ,
          <addr-line>Valencia</addr-line>
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Oriente</institution>
          ,
          <country country="CU">Cuba</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we describe the system with which we participated in the shared task Humor Analysis based on Human Annotation (HAHA) at IberLEF-2021. Our system relies on data representations learned through fine-tuned neural language models. The representations are used to train a Siamese Neural Network (SNN) that learns to verify whether a pair of tweets belongs to the same class or to distinct classes. A key point in our model is the heuristic used to create the pairs of messages in the training and test phases. For that, we use a Deep Reinforcement Learning (DRL) strategy that aims at identifying a set of optimal prototypes in each class. In general, the results achieved are encouraging and give us a starting point for further improvements.</p>
      </abstract>
      <kwd-group>
        <kwd>Humor recognition</kwd>
        <kwd>Deep Reinforcement Learning</kwd>
        <kwd>Siamese Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Humor is an important part of human communication. Long ago, a philosopher
held that humor, deep down, is a type of catharsis that makes existence more
bearable, like art. He said:</p>
      <p>
        Perhaps I know best why it is man alone who laughs; he alone suffers so
deeply that he had to invent laughter. (Nietzsche, 1888)
Humor comes from a variety of sources, making it a real challenge to design a
computational model for its automatic recognition in texts. Sometimes humorous
texts use wordplay as an engine to provoke laughter; in other cases they appeal
to social, cultural, and commonsense backgrounds to produce a funny response;
in yet other cases they make use of irony, satire, hyperbole, and other figurative
devices to achieve their goals. The difficulty of the task increases when the
language is short and informal, like the one used on Twitter. All this raises
interest in humor recognition tasks within Natural Language Processing (NLP)
and Human-Computer Interaction (HCI). Along this line, the HAHA Task: Humor
Analysis based on Human Annotation at IberLEF-2021 aims at computationally
recognizing humor in Spanish tweets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In this paper, we adapt and re-evaluate the RoMa system [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] that we
employed in Task 7 at SemEval-2021 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for addressing the task of humor recognition
in Spanish tweets. This architecture combines learned representations with an
SNN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to learn a metric for discriminating whether or not a pair of tweets
belong to the same class (i.e., humorous tweets and not humorous tweets). Also,
we considered applying a variation by introducing Deep Reinforcement Learning
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] within its structure. We bring empirical evidence, through experiments on the
human-annotated datasets, that the DRL-based strategy outperforms the
original version of the RoMa system. The source code of this work is publicly available
on GitHub: https://github.com/mjason98/haha21
      </p>
      <p>The paper is organized as follows: in Section 2 we briefly introduce the
proposed task and the main ideas that motivated our proposal. Section 3 presents
our proposed architecture and gives details about its modules. The
experiments and results are described in Section 4. Finally, in Section 5 we present our
conclusions and interesting directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>The 2021 edition of the HAHA shared task, as part of IberLEF-2021, aims at
classifying humorous tweets written in Spanish. The first subtask is about Humor
Detection and poses the problem of determining whether a tweet is funny or
not. An annotated corpus of tweets in Spanish was provided to carry out this
task. The dataset is composed of 24000 tweets, 9253 labeled as humor and 14747
as not humor. In this work, we focused only on this subtask.</p>
      <p>
        In previous work, specifically in Hahackathon at SemEval-2021, with the team
named RoMa, we presented a system to address the problem of humor detection
based on an SNN. The Siamese model (SiaNet) requires a pair of messages as input
in both the training and test phases. In that work, we transformed the classification
problem into a verification one, in which data is classified by comparison with
two reference sets employing the SiaNet model. This algorithm can be split into
three main steps:
1) A Pretrained Language Model (PLM) was used to represent the tweets as
vectors. We used a transformer model [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] as the PLM and fine-tuned it on the
humor dataset to achieve a target-dependent representation of the data.
2) The vectors are separated into two classes: the first one contains the vectors
corresponding to tweets annotated as humorous, and the second contains
the opposite set. Then, a clustering algorithm is applied and a prototype set
is extracted for each class.
3) Finally, each element from the dataset is paired with a prototype, generating
positive and negative examples (i.e., pairs from the same and opposite classes,
respectively). These pairs are used for training the SiaNet model. When this
training process ends, the system is used to classify unlabeled tweets by
giving them the label of the closest prototype. The closeness of two messages
is measured by a distance function, in this case, the SiaNet.
      </p>
      <p>The interest in Siamese architectures remains strong, so we decided to test
the performance of this algorithm (RoMa) on the HAHA humor task, introducing
a variation in the second step.</p>
      <p>
        The clustering algorithm used in RoMa was a graph-based method, and one
of the challenges with this approach was tuning the threshold used in the graph
construction. In the RoMa system, we build a thresholded distance graph, analogous to
the similarity graphs proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], for the humor and non-humor classes. The
nodes in the graphs represent the tweets from the training set, and the edges
joining two nodes are weighted with the distance between them. In the distance
graph, the edges with weights greater than the threshold are removed,
allowing only the closest representations to remain in the same connected subgraph.
Afterwards, communities are detected on the distance graphs employing the
InfoMap algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] which is based on the map equation [
        <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This algorithm
reports a set of subgraphs whose nodes are paired with a flow value. For each
subgraph, the k nodes with the highest flow value are selected as prototypes.
      </p>
      <p>The threshold is adjusted until the number of extracted prototypes lies within
a range. This interval is defined by two integers; in this work, as in RoMa, 200
and 300 were used. The performance of SiaNet in the task is
linked to the quality and quantity of the extracted prototypes. A variation in the
threshold yields a different graph structure and, therefore, different prototypes.
We found that the SiaNet model is very sensitive to these variations.</p>
      <p>In this work we propose to replace the clustering algorithm with a Deep
Q-Learning approach, in which the number of prototypes is controlled by an upper
limit set in advance. One advantage of this approach is that the
relationships among tweets are learned, in contrast to the graph method, where the
Euclidean distance was used to weight the edges.</p>
    </sec>
    <sec id="sec-3">
      <title>System Overview</title>
      <p>We keep the first and third steps, as well as the modular schema, of the RoMa
architecture. This is the composition of an encoder module (Encoder) and a prediction
module (Classifier), which are trained independently. We replace the
clustering-based approach of the prototype selection phase with a Deep Q-Network (DQN)
method. In the following subsections we provide the most relevant details of our
system.</p>
      <sec id="sec-3-1">
        <title>Encoder Module</title>
        <p>
          The Encoder plays an important role because it is concerned with learning an
abstract representation that vanishes the colinearity between its features and
compresses the textual information on a single dense vector. The core of our
Encoder's design is based on a Transformer model (TM). Our architecture di ers
from RoMa system in the pretrained TM. Particularly, in this work we used
BETO [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a BERT model pretrained on Spanish, since Hahackathon was a challenge based on English tweets whereas HAHA targets Spanish ones.
        </p>
        <p>For fine-tuning the TM-based encoder, we add an intermediate layer that
receives the vectors from the output sequence of the TM. On this sequence of
vectors, a normalized sum operation is applied. Then, an output layer makes
the final prediction for the targeted task. For each layer of the TM, a different
learning rate is set, increased by a multiplier as the neural network
gets deeper; this multiplier grows by 0.1 from a layer Li to the next layer Li+1.
We use this dynamic learning rate to keep most information from pre-training
at the shallow layers while biasing the deeper ones to learn about the specific task.</p>
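<p>A minimal sketch of this per-layer learning-rate scheme follows; the base learning rate and number of layers are illustrative placeholders, not the values used in our experiments:</p>
<preformat>
```python
def layer_learning_rates(base_lr, num_layers, step=0.1):
    """Per-layer learning rates that grow with depth.

    The multiplier grows by `step` from layer L_i to L_{i+1}, so
    shallow layers keep more pre-trained knowledge while deeper
    layers adapt more strongly to the target task.
    """
    return [base_lr * (1.0 + step * i) for i in range(num_layers)]

# 12 transformer layers, illustrative base rate of 1e-5
rates = layer_learning_rates(1e-5, 12)
```
</preformat>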
      </sec>
      <sec id="sec-3-2">
        <title>Prototype Selection through Deep Q-Learning</title>
        <p>
          The task of prototype selection is addressed using Deep Q-Learning [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which is
a model-free reinforcement learning technique [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The reinforcement learning
algorithm, called the agent, learns by maximizing rewards in some
environment. At each time step t = 0, 1, 2, ..., n, the agent receives as input
the state st, which is a snapshot of the environment. The agent then
evaluates that data and takes an action at from a set of possible actions given
its current state. At the next time step, the environment gives a reward, rt+1,
to the agent and changes itself to a new state st+1. The rewards are the only
learning signals the agent is given; its goal is to maximize the total reward it
receives.
        </p>
        <p>Environment and States Our environment works with the vector
representations of the tweets produced by the Transformer model. At every time step, it
gives the agent the k-th vector vk and the current candidate prototypes pti
in a list as the state st:
st = (pt1, pt2, ..., ptM, vk),   p0i = 0   (1)
where M is the total number of prototypes allowed, and k = 1 + (t mod T ), with
T the length of the training set.</p>
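<p>The state construction of Equation 1 and the cyclic index k can be sketched as follows (dimensions and values are illustrative placeholders):</p>
<preformat>
```python
def make_state(prototypes, vk):
    # s_t = (p_t1, ..., p_tM, v_k): the M candidate prototypes
    # followed by the current tweet vector (Equation 1)
    return prototypes + [vk]

def current_index(t, T):
    # k = 1 + (t mod T): cycles through the T training vectors
    return 1 + (t % T)

M, dim = 3, 4
prototypes = [[0.0] * dim for _ in range(M)]          # p_0i = 0
state = make_state(prototypes, [0.1, 0.2, 0.3, 0.4])  # M + 1 vectors
```
</preformat>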
        <p>
          Agent and Actions The agent was designed using two layers of Multi-head
Attention mechanism [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and a feed-forward layer on top whose output size is
the total number of actions. We find the use of Multi-head Attention
convenient in this case, since the state st is given by a list of vectors and the agent must
relate the current vector to all the current prototypes in order to learn the relations that allow it to
perform the best-rewarding action. We provide the agent with a total of M + 1
actions: in the first M actions ai we replace pti with the vector vt, and in the
last action aM+1 we ignore that vector.
        </p>
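<p>A minimal sketch of how the M + 1 actions modify the prototype list (the attention-based agent that chooses the action is omitted):</p>
<preformat>
```python
def apply_action(prototypes, v, action):
    """M + 1 actions: for action i < M, prototype p_ti is replaced
    with the current vector v; the last action (index M) ignores v."""
    M = len(prototypes)
    if action < M:
        return prototypes[:action] + [v] + prototypes[action + 1:]
    return prototypes
```
</preformat>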
        <p>Reward The reward was designed in terms of two quantities:
W, a hyperparameter, and ACC, a metric that measures the accuracy of
predicting the humor class on the validation set using the prototypes {pt+1,i}.
The prediction operation can be described by Equation 3, using the Euclidean
distance as Df instead of the SiaNet model.
It is well known that the training phase of deep reinforcement learning
algorithms is often strongly time-consuming. For that reason, we introduce the
parameter W in the reward design. The algorithm starts with W equal to 10%
of the training data size. We model a reward schedule such that, at a set of
iteration indices, W is increased by its own value unless it would become greater than T. When the
environment produces a reward different from 0, an environment
reset is conducted; that is, all the candidate prototypes are erased.</p>
        <p>The increasing behavior of W represents an increment in the learning difficulty.
In this way, we induce the agent to achieve good results with few examples
instead of waiting until all vectors are processed to get a reward. Therefore, fewer
training iterations are needed.</p>
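<p>The schedule over W can be sketched as follows, under the assumption that each scheduled update doubles W (increases it by its own value) while the result still fits within T; the exact update iterations are hyperparameters not detailed here:</p>
<preformat>
```python
def schedule_W(T, num_updates):
    """Sketch of the reward-difficulty schedule: W starts at 10% of
    the training-set size T and, at each scheduled update, is
    increased by its own value unless it would exceed T."""
    W = max(1, T // 10)
    history = [W]
    for _ in range(num_updates):
        if W * 2 <= T:  # "increased by itself unless greater than T"
            W *= 2
        history.append(W)
    return history
```
</preformat>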
        <p>
          Nevertheless, the number of zero rewards is still huge, which motivates us to
introduce an Intrinsic Curiosity Module (ICM) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] in our system at training time.
This imbues the agent with a sense of curiosity, addressing the sparse-reward problem,
since the rewards in the environment are sparsely distributed. The process for
training the agent uses the strategies proposed in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], starting with an ε-greedy
policy and then a softmax policy.
        </p>
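<p>The two action-selection policies can be sketched as follows (a generic sketch of ε-greedy and softmax selection over Q-values, not the exact implementation used in our training):</p>
<preformat>
```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # with probability epsilon explore a random action,
    # otherwise exploit the action with the highest Q-value
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax_policy(q_values, temperature=1.0, rng=random):
    # sample an action with probability proportional to exp(Q / temperature)
    exps = [math.exp(q / temperature) for q in q_values]
    total = sum(exps)
    r, acc = rng.random() * total, 0.0
    for action, e in enumerate(exps):
        acc += e
        if r <= acc:
            return action
    return len(q_values) - 1
```
</preformat>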
        <p>
          After the agent finishes its training and the environment resets, the schedule
of W is ignored and k is set to zero. Then, interactions between the agent and
the environment start over until all vectors from the training set are processed.
Finally, the prototypes within the last environment state are used by SiaNet in
the third step. Note that, in contrast to RoMa, we do not run our prototype
selection method separately on the humor and not-humor classes. In this case, it
accepts both classes as input, letting the agent decide the number of prototypes for
each group out of the total number.
The classification module architecture relies on SiaNet. This network takes
two input messages and produces one output that indicates how distant they are according
to their representations [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Both messages are encoded using the fine-tuned
Transformer model (see Section 3.1). Then, each input is passed through two
dense layers with 32 and 16 hidden neurons, respectively. Next, the representations
of the messages are compared to each other through a distance metric. The
specific features the model learns to extract make message representations
corresponding to opposite classes lie at a distance greater than the threshold
defined in the loss function. In particular, we used the Contrastive Loss [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
with an empirically chosen threshold of 0.85.
        </p>
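<p>A minimal sketch of the contrastive loss with the 0.85 margin:</p>
<preformat>
```python
def contrastive_loss(distance, same_class, margin=0.85):
    """Contrastive loss: pairs from the same class are pulled
    together (loss grows with distance), while pairs from opposite
    classes incur loss only when their distance falls below the
    margin, pushing them at least `margin` apart."""
    if same_class:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2
```
</preformat>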
        <p>
          For training SiaNet, the dataset needs to be processed to construct pairs
of messages from the same class and pairs of messages from distinct classes. The
process used to compose the pairs remains equal to the one in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. During the
test phase, given an unlabeled tweet, we obtain its encoding z by using the
Encoder. After that, we predict its label using the following equation:
ŷ = arg min_i { Df (z, xi,j) }   (3)
where xi,j is the j-th prototype in class i, i ∈ {no humor, humor}, and Df is
our SiaNet.
        </p>
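<p>The prediction rule of Equation 3 can be sketched as follows, using a plain Euclidean distance in place of the SiaNet distance for illustration:</p>
<preformat>
```python
def euclid(u, v):
    # Euclidean distance between two embedding vectors
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def predict(z, prototypes, D):
    """Equation 3: assign z the class i of the prototype x_ij that
    minimizes D(z, x_ij); D stands for SiaNet in the paper."""
    return min(
        (D(z, x), label)
        for label, xs in prototypes.items()
        for x in xs
    )[1]

protos = {"humor": [[1.0, 0.0]], "no humor": [[-1.0, 0.0]]}
```
</preformat>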
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>In this edition of HAHA, the contest had development and evaluation phases.
Submissions of system predictions were allowed in both phases, but the official
results of the competition were only those from the second one. We used a 10-fold
validation strategy for hyperparameter tuning throughout the contest.</p>
      <p>
        An epoch in the training process of the agent finishes once an environment
reset occurs. In this work, we used 2000 epochs. All learnable parameters of the
PLM, the agent, the ICM, and SiaNet were trained using Adam [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. As part of
the experiments, our system was tested on the classification tasks of previous
editions of HAHA. Table 1 shows the performance of our system
in the 2018 and 2019 competitions, where F1 was used as the evaluation metric.
The columns prototypes and icm represent the maximum number of prototypes and
the use of the ICM strategy, respectively. The last row shows the results using the graph-based
clustering algorithm (GBC) used by the RoMa system.
      </p>
      <p>Looking at Table 1, the results using ICM are similar to those obtained when it is omitted.
We hypothesize that W, and the schedule over this parameter, mitigates the
harmful consequences of the sparse reward in this particular environment-reward
design. Another hypothesis lies in the similarity of the states produced by
the vector representation of the fine-tuned PLM. Both ideas need a
deeper analysis, which we plan to explore in future work.</p>
      <p>During the experiments, increased stability in SiaNet's training was observed.
That is, small changes in the hyperparameters did not impact the model's
learning curve. This stability effect only occurred when prototypes produced by the
DQN agent were used instead of those generated by the RoMa method. Also,
better results were found using fewer prototypes. For the official submission, 52
prototypes with ICM was the best setting for our system.</p>
      <sec id="sec-4-1">
        <title>Official Results</title>
        <p>For the evaluation phase, we submitted predictions made by our system as well as
predictions from the RoMa system. The one based on the Deep Q-Network achieved
the best results among our submissions, ranking us 7th out of 17 teams
with an F1-score of 0.8583, whereas the best system reached 0.8850.</p>
        <p>An interesting fact about the performance of our system was the negative
effect of increasing the number of prototypes. This effect should be the opposite,
since having more references against which to compare an unseen message should yield
steadier predictions. We hypothesize that one cause is the agent-action design.
In our model, the number of actions equals the number of prototypes plus
one, which implies a considerably large action space and induces the
agent to learn a more complex strategy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this work, we presented a model for addressing humor recognition in
Spanish tweets. The model employs the deep representations learned by
Transformer models for encoding the tweets. These representations are used by a
Siamese Neural Network combined with a Deep Q-Network prototype selection
method. A key point of our system was the schedule built into the
reward design to reduce training time. The achieved results show that our system
outperformed the original version of RoMa, based on graph clustering.</p>
      <p>As future work, we will analyze why ICM did not provide the expected
improvement during our training process, since many zero rewards were present.
We also propose to experiment with a non-fixed-size action space, to
test the hypothesis about the agent-action design problem presented in Section
4.1. Another direction we plan to address is exploring the use of more robust
Deep Reinforcement Learning algorithms and reward policies.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work of the second and third authors was carried out in the framework of the research
project MISMIS-FAKEnHATE on MISinformation and MIScommunication in
social media: FAKE news and HATE speech (PGC2018-096212-B-C31), funded
by the Spanish Ministry of Science and Innovation, and of DeepPattern
(PROMETEO/2019/121), funded by the Generalitat Valenciana.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bromley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentz</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , LeCun, Y.,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , Sackinger, E.,
          <string-name>
            <surname>Shah</surname>
          </string-name>
          , R.:
          <article-title>Signature Verification Using a "Siamese" Time Delay Neural Network</article-title>
          .
          <source>International Journal of Pattern Recognition and Artificial Intelligence</source>
          <volume>7</volume>
          (
          <issue>04</issue>
          ),
          <fpage>669</fpage>
          –
          <lpage>688</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Burda</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Edwards</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pathak</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storkey</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Large-scale Study of Curiosity-Driven Learning</article-title>
          . arXiv preprint arXiv:1808.04355
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Cañete, J.,
          <string-name>
            <surname>Chaperon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuentes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
          </string-name>
          , J.:
          <article-title>Spanish PreTrained BERT Model and Evaluation Data</article-title>
          .
          <source>In: PML4DC at ICLR</source>
          <year>2020</year>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gongora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meaney</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
          </string-name>
          , R.: Overview of HAHA at IberLEF 2021:
          <article-title>Detecting, Rating and Analyzing Humor in Spanish</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <issue>0</issue>
          ) (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Edler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anton</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosvall</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The MapEquation Software Package</article-title>
          . URL: https://mapequation.org (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>R.J.G.</given-names>
          </string-name>
          :
          <article-title>Algoritmos de Agrupamiento sobre Grafos y su Paralelizacion</article-title>
          .
          <source>Ph.D. thesis</source>
          , Universidad Jaume I (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , LeCun, Y.:
          <article-title>Dimensionality Reduction by Learning an Invariant Mapping</article-title>
          .
          <source>In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)</source>
          . vol.
          <volume>2</volume>
          , pp.
          <fpage>1735</fpage>
          –
          <lpage>1742</lpage>
          .
          IEEE
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
          </string-name>
          , J.:
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          . In: Bengio,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>LeCun</surname>
          </string-name>
          , Y. (eds.) 3rd
          <source>International Conference on Learning Representations, ICLR</source>
          <year>2015</year>
          , San Diego, CA, USA, May 7-
          <issue>9</issue>
          ,
          <year>2015</year>
          , Conference Track Proceedings (
          <year>2015</year>
          ), http://arxiv.org/abs/1412.6980
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deep Reinforcement Learning: An Overview</article-title>
          .
          <source>arXiv preprint arXiv:1701.07274</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Meaney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Magdy</surname>
          </string-name>
          , W.:
          <article-title>SemEval 2021 Task 7: Hahackathon, Detecting and Rating Humor and Offense</article-title>
          .
          <source>In: 15th International Workshop on Semantic Evaluation</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mnih</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonoglou</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedmiller</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Playing Atari with Deep Reinforcement Learning</article-title>
          .
          <source>arXiv preprint arXiv:1312.5602</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rosvall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Axelsson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergstrom</surname>
            ,
            <given-names>C.T.</given-names>
          </string-name>
          :
          <article-title>The Map Equation</article-title>
          .
          <source>The European Physical Journal Special Topics</source>
          <volume>178</volume>
          (
          <issue>1</issue>
          ),
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barto</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Reinforcement Learning: An Introduction</article-title>
          .
          <source>A Bradford Book</source>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Tamayo</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bueno</surname>
            ,
            <given-names>R.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>RoMa at SemEval-2021 Task 7: A Transformer-based Approach for Detecting and Rating Humor and Offense</article-title>
          .
          <source>In: Proceedings of the 15th International Workshop on Semantic Evaluation</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is All you Need</article-title>
          .
          <source>arXiv preprint arXiv:1706.03762</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          : In:
          <source>Deep Reinforcement Learning in Action</source>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>234</lpage>
          .
          <publisher-name>Manning Publications</publisher-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>