    UO_UPV2 at HAHA 2019: BiGRU Neural Network Informed with
           Linguistic Features for Humor Recognition

          Reynier Ortega-Bueno1, Paolo Rosso2, and José E. Medina Pagola3

      1 Center for Pattern Recognition and Data Mining, Universidad de Oriente,
        Santiago de Cuba, Cuba
        reynier.ortega@cerpamid.co.cu
      2 PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain
        prosso@dsic.upv.es
      3 University of Informatics Sciences, Havana, Cuba
        jmedinap@uci.cu


          Abstract. Verbal humor is an illustrative example of how humans use
          creative language to produce funny content. We, as human beings,
          resort to humor or comicality to convey more complex meanings which
          usually pose a real challenge, not only for computers but for humans
          as well. For that reason, understanding and recognizing humorous
          content automatically has been, and continues to be, an important
          issue in Natural Language Processing (NLP) and even more so in
          Cognitive Computing. To address this challenge, in this paper we
          describe our UO_UPV2 system, developed for participating in the
          second edition of the HAHA (Humor Analysis based on Human Annotation)
          task proposed at the IberLEF 2019 Forum. Our starting point was the
          UO_UPV system with which we participated in HAHA 2018, with some
          modifications to its architecture. This year we explored another way
          to inform our attention-based Recurrent Neural Network model with
          linguistic knowledge. Experimental results show that our system
          achieves positive results, ranking 7th out of 18 teams.

          Keywords: Spanish Humor Classification, BiGRU Neural Network,
          Social Media, Linguistic Features


1     Introduction
Natural language systems have to deal with many problems related to text
comprehension, but these problems become very hard when creativity and
figurative devices are used in verbal and written communication. Humans can
easily understand the underlying meaning of such texts but, for a computer,
disentangling the meaning of creative expressions such as irony and humor
requires much additional knowledge and complex methods of reasoning.
    Copyright © 2019 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019,
    24 September 2019, Bilbao, Spain.
        Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)


    Humor is an illustrative example of how humans use creative language devices
in social communication. Humor not only serves to exchange information or
share implicit meaning, but also builds a relationship between those exposed
to the funny message. It can help people see the amusing side of problems and
distance themselves from stressors. In the same way, it helps to regulate our
emotions. Moreover, the manner in which people produce funny content also
reveals insights about their gender and personality traits.
    From a computational linguistics point of view, many methods have been
proposed to tackle the task of recognizing humor in texts [16,15,22,20,1]. These
focus on investigating linguistic features which can be considered markers and
indicators of verbal humor. Also, due to the close relation between irony and
humor, other works have studied these phenomena with the goal of shedding some
light on what is common and what is distinct from a linguistic point of view
[9,1,20].
    Other methods focused on recognizing humor in Twitter messages based on
supervised learning [1,23,7,10,26]. Methods based on Deep Neural Networks have
obtained competitive results in humor recognition on tweets. Among them,
Recurrent Neural Network (RNN) models and their bidirectional variants capture
relevant information such as long-term dependencies. Also, attention mechanisms
have become a strong tool, endowing RNN models with the capability of paying
more attention to those elements that increase the effectiveness of these
networks in several NLP tasks [13,24,28,27].
    Previous research has focused on the English language; for Spanish, however,
corpora are scarce, which limits the amount of research done for this language.
HAHA 2018 [4] became the first shared task addressing the problem of humor
recognition in Spanish social media content. Three systems were proposed to
solve the task. The best-ranked approach used a model based on the EvoMSA tool,
which uses the EvoDAG method (an evolutionary algorithm) [21]. This is a
steady-state Genetic Programming system with tournament selection. The main
characteristic of EvoDAG is that the genetic operation is performed at the
root. EvoDAG was inspired by the geometric semantic crossover. The second
system [17] proposed a model based on Bidirectional Long Short-Term Memory
(BiLSTM) neural networks with an attention mechanism. The authors used word2vec
embeddings as input for the network, together with a set of linguistically
motivated features (stylistic, structural and content, and affective ones). The
linguistic information was combined with the deep representation learned in the
next-to-last layer. The results showed that incorporating linguistic knowledge
improves the overall performance. The third system presented to the shared task
trained an SVM-based method using a bag of character n-grams of sizes 1 to 8.
    Considering the advantages of linguistic features for capturing deep
linguistic aspects of the language, and the capability of RNNs for learning
deep representations and long-term dependencies from sequential data, in this
paper we present a method that combines the linguistic features used for humor
recognition with an attention-based Bidirectional Gated Recurrent Unit (BiGRU)
model. The system works with an attention layer which is applied on top of a
BiGRU to generate a context vector for each word embedding, which is then fed
to another BiGRU network. Finally, the learned representation is fed to a
Feed-Forward Network (FFN) to classify whether the tweet is humorous or not.
Motivated by the results shown in [17], we explore incorporating the linguistic
information into the model through the initial hidden state of the first BiGRU
layer.
    The paper is organized as follows. Section 2 presents a brief description of the
HAHA task. Section 3 introduces our system for humor detection. Experimental
results are subsequently discussed in Section 4. Finally, in Section 5 we present
our conclusions and attractive directions for future work.


2   HAHA Task and Dataset
HAHA 2019 is the second edition of the first shared task that addresses the
problem of recognizing humor in Spanish tweets. As in the first edition, two
subtasks were proposed. The first one, “Humor Detection”, aims at predicting
whether a tweet is a joke or not (i.e., whether the author intended it to be
humorous), and the second one, “Funniness Score Prediction”, consists of
predicting a funniness score on a 5-star scale, assuming the tweet is a joke.
    Participants were provided with a human-annotated corpus of 30000 Spanish
tweets [6], divided into 24000 for training and 6000 for testing. The training
subset contains 9253 tweets with funny content and 14747 tweets considered
non-humorous. As can be observed, the class distribution is slightly unbalanced
(about 38.6% of the training tweets are humorous), which adds difficulty to
learning the models automatically.
    The evaluation metrics were computed and reported by the organizers. For the
“Humor Detection” subtask, the main metric is the F1 measure on the humor
class; precision, recall and accuracy were also reported.
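
    As an illustration of these metrics, the following minimal Python sketch
computes precision, recall and F1 on the humor class, plus accuracy, from gold
and predicted labels. The function names and the encoding of the humor class as
label 1 are assumptions made for this example; this is not the organizers'
evaluation script.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def humor_class_metrics(gold, predicted, positive=1):
    """Return (precision, recall, F1) on the positive class and accuracy.

    `positive` is the label assumed to encode the humor class.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, f1_score(precision, recall), accuracy
```

Because F1 is computed only on the humor class, a system that labels every
tweet as non-humorous obtains an F1 of 0 even though its accuracy on this
corpus would be above 60%.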


3   Our UO_UPV2 System
The motivation behind our approach is twofold. Firstly, we investigate the
capability of Recurrent Neural Networks, specifically the Gated Recurrent Unit
(GRU) [8], to capture long-term dependencies. GRUs have been shown to be able
to learn dependencies across considerably long sequences. GRU networks simplify
the architecture of LSTM networks [11], being computationally more efficient.
Moreover, attention mechanisms have endowed these networks with a powerful
strategy to increase their effectiveness, achieving better results
[27,29,24,12]. Recently, the initial hidden state of the recurrent neural
network has been successfully explored as a way to inform the network with
contextual information [25]. Secondly, humor recognition based on feature
engineering and supervised learning has been well studied in previous research.
These features have proved to be good indicators and markers of humor in text.
For these reasons, we propose a method that enriches the attention-based GRU
model with linguistic knowledge, which is passed to the network through the
initial hidden state. In Section 3.1 we describe the tweet preprocessing phase.
Then, in Section 3.2 we present the linguistic features used for encoding
humorous content. Finally, in Section 3.3 we introduce the neural network model
and the way in which linguistic features are introduced. Figure 1 shows the
overall architecture of our system.


                Fig. 1. Overall architecture of the UO_UPV2 system


3.1   Preprocessing

In the preprocessing step, the tweets are cleaned. Firstly, emoticons, URLs,
hashtags, mentions and Twitter-reserved words such as RT (for retweet) and FAV
(for favorite) are recognized and replaced by a corresponding wildcard which
encodes the meaning of these special tokens. Afterwards, tweets are
morphologically analyzed with FreeLing [18], so that each resulting token is
assigned its lemma. Then, the tweets are represented as vectors using a word
embedding model. The embeddings were generated with the FastText algorithm [2]
from the Spanish Billion Words Corpus [3] and an in-house background corpus of
9 million Spanish tweets.
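
    The wildcard replacement step can be sketched in Python as follows. The
regular expressions and the wildcard strings (_URL_, _MENTION_, and so on) are
hypothetical choices made for this illustration, since the exact patterns and
wildcards are not listed here; lemmatization with FreeLing and the embedding
lookup are not shown.

```python
import re

# Hypothetical patterns and wildcard names, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
RESERVED_RE = re.compile(r"\b(RT|FAV)\b")
# A small set of western-style emoticons such as :) ;-) :D =P
EMOTICON_RE = re.compile(r"[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|]")


def replace_special_tokens(tweet: str) -> str:
    """Replace Twitter-specific tokens with wildcards and normalize spaces."""
    # URLs go first so that "://" is never mistaken for an emoticon.
    text = URL_RE.sub(" _URL_ ", tweet)
    text = MENTION_RE.sub(" _MENTION_ ", text)
    text = HASHTAG_RE.sub(" _HASHTAG_ ", text)
    text = RESERVED_RE.sub(lambda m: f" _{m.group(1)}_ ", text)
    text = EMOTICON_RE.sub(" _EMOTICON_ ", text)
    return " ".join(text.split())
```

The order of the substitutions matters: replacing URLs before emoticons avoids
matching the ":/" inside "http://" as an emoticon.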




3.2     Linguistic Features

In our work, we explored several linguistic features useful for humor
recognition in texts [16,15,22,20,1,5], which can be grouped into three main
categories: Stylistic, Structural and Content, and Affective. In particular, we
considered stylistic features such as length, dialog markers, quotations,
punctuation marks, emphasized words, URLs, emoticons and hashtags, among
others. Features capturing lexical and semantic ambiguity, as well as sexual,
obscene, animal and human-related terms, among others, were considered
Structural and Content features. Finally, due to the relation of humor with
expressions of sentiment and emotion, we used features capturing affect,
attitudes, sentiments and emotions. For more details about the features
see [17].
    Note that our proposal does not consider the positional features used in
[17]. Moreover, taking into account the close relation between irony and humor,
and motivated by the results presented in [14], we include psycholinguistic
features extracted from LIWC [19]. This resource contains about 4,500 entries
distributed in 65 categories. For this work we decided to use all categories as
independent features.
    Taking into account the previous features, we represent each message $T_i$
by one vector $V_{T_i}$ of dimensionality 165. In order to reduce and improve
this representation, we applied a feature selection method. Specifically, we
used the Wilcoxon rank-sum test for paired samples, and all features were
ranked according to their p-value.
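
    A minimal sketch of such a ranking is given below. It assumes that, for
each feature, the two samples being compared are the feature values observed in
humorous and in non-humorous training tweets; this grouping, the function
names, and the use of the normal approximation without tie correction are
assumptions of the example and not details confirmed by the description above.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_sum_p_value(sample_a, sample_b):
    """Two-sided p-value of the rank-sum statistic (normal approximation)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / std
    return math.erfc(abs(z) / math.sqrt(2.0))


def rank_features(humor_rows, non_humor_rows):
    """Return (feature indices sorted by ascending p-value, p-values)."""
    n_features = len(humor_rows[0])
    p_values = [
        rank_sum_p_value([row[f] for row in humor_rows],
                         [row[f] for row in non_humor_rows])
        for f in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda f: p_values[f])
    return order, p_values
```

Under these assumptions, keeping the k best-ranked features amounts to taking
`order[:k]` and projecting every feature vector onto those indices.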


3.3     Recurrent Network Architecture

We propose a model that consists of a BiGRU neural network at the word level.
At each time step $t$, the BiGRU receives a word vector $w_t$ as input.
Afterwards, an attention layer is applied over each hidden state $h_t$. The
attention weights are learned using the concatenation of the current hidden
state $h_t$ of the first BiGRU and the past hidden state $s_{t-1}$ of the
second BiGRU layer. Finally, the humor label of the tweet is predicted by an
FFN with one hidden layer and an output layer with two neurons. Our overall
architecture is described in the following sections.


3.4     First BiGRU Layer

In NLP problems, a standard GRU sequentially receives (in left-to-right order)
a word embedding $w_t$ at each time step and produces a hidden state $h_t$.
Each hidden state $h_t$ is calculated as follows:

\[
\begin{aligned}
z_t &= \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1} + b^{(z)}\right) && \text{(update gate)}\\
r_t &= \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1} + b^{(r)}\right) && \text{(reset gate)}\\
\hat{h}_t &= \tanh\left(W^{(\hat{h})} x_t + U^{(\hat{h})} (r_t \odot h_{t-1}) + b^{(\hat{h})}\right) && \text{(memory cell)}\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \hat{h}_t
\end{aligned}
\]

where all $W^{(*)}$, $U^{(*)}$ and $b^{(*)}$ are parameters learned during
training, $x_t$ is the input at time step $t$ (here, the word embedding $w_t$),
$\sigma$ is the sigmoid function and $\odot$ stands for element-wise
multiplication.
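
    For illustration, a single GRU step can be written in plain Python as
below. This is a minimal sketch rather than the implementation used in our
system: matrices are lists of rows, the parameter dictionary keys ("Wz", "Uz",
"bz", ...) are hypothetical names, and the reset gate is assumed to multiply
the previous hidden state inside the memory cell, as in the standard GRU
formulation [8].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Compute W x + U h + b, with matrices given as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def gru_step(x_t, h_prev, params):
    """One GRU update; `params` holds the (hypothetically named) weights."""
    p = params
    # Update and reset gates.
    z = [sigmoid(v) for v in affine(p["Wz"], x_t, p["Uz"], h_prev, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x_t, p["Ur"], h_prev, p["br"])]
    # Memory cell: the reset gate scales the previous state (assumption).
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_hat = [math.tanh(v)
             for v in affine(p["Wh"], x_t, p["Uh"], gated, p["bh"])]
    # Interpolate between the previous state and the memory cell.
    return [zi * hi + (1.0 - zi) * ci
            for zi, hi, ci in zip(z, h_prev, h_hat)]
```

With all parameters set to zero, both gates equal 0.5 and the memory cell is
zero, so the step simply halves the previous hidden state; this gives a quick
sanity check of the interpolation in the last equation.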

     A bidirectional GRU, on the other hand, performs the same operations as a
standard GRU, but processes the incoming text in left-to-right and
right-to-left order in parallel. Thus, it outputs two hidden states at each
time step, $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$. The proposed
method uses a BiGRU network which takes each new hidden state to be the
concatenation of these two, $\hat{h}_t = [\overrightarrow{h_t},
\overleftarrow{h_t}]$. The idea behind this BiGRU layer is to capture
long-range and backward dependencies simultaneously. This layer is where the
linguistic information is passed into the model. We set both initial hidden
states, $\overrightarrow{h_0} = g(T_i)$ and $\overleftarrow{h_0} = g(T_i)$,
where $g(\cdot)$ receives a tweet and returns a vector which encodes contextual
and linguistic knowledge, $g(T_i) = V_{T_i}$.
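
    The way the feature vector enters the network can be sketched as follows.
The function takes the two recurrent step functions as arguments, so it is
independent of how the GRU itself is implemented; `toy_step` is a hypothetical
stand-in for a trained step, and the sketch assumes that the feature vector
already has the same length as the hidden state.

```python
def run_bigru(embeddings, feature_vector, step_forward, step_backward):
    """Return the concatenated states [h_fwd_t, h_bwd_t] for each position.

    Both directions start from the same initial state, the linguistic
    feature vector of the tweet (h_0 = g(T_i) = V_Ti).
    """
    h = list(feature_vector)
    forward_states = []
    for x_t in embeddings:                 # left-to-right pass
        h = step_forward(x_t, h)
        forward_states.append(h)

    h = list(feature_vector)
    backward_states = []
    for x_t in reversed(embeddings):       # right-to-left pass
        h = step_backward(x_t, h)
        backward_states.append(h)
    backward_states.reverse()              # realign with word positions

    return [f + b for f, b in zip(forward_states, backward_states)]


def toy_step(x_t, h_prev):
    """Hypothetical stand-in for a GRU step: average of input and state."""
    return [0.5 * xi + 0.5 * hi for xi, hi in zip(x_t, h_prev)]
```

For example, `run_bigru([[1.0], [0.0]], [1.0], toy_step, toy_step)` and the
same call with `[0.0]` as the feature vector return different states for the
same words, which is exactly the effect the initial hidden state is meant to
have.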

3.5     Attention Layer
With an attention mechanism we allow the BiGRU to decide which segment of the
sentence it should “attend” to. Importantly, we let the model learn what to
attend to on the basis of the input sentence and what it has produced so far.
Let $H \in \mathbb{R}^{(2 \times N_h) \times T}$ be the matrix of hidden states
$[\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_T]$ produced by the first BiGRU layer,
where $N_h$ is the size of the hidden state and $T$ is the length of the given
sequence. The goal is then to derive a context vector $c_t$ that captures
relevant information and to feed it as input to the next BiGRU layer. Each
$c_t$ is calculated as follows:
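\[
c_t = \sum_{t'=1}^{T} \alpha_{t,t'} \, \hat{h}_{t'}, \qquad
\alpha_{t,i} = \frac{\exp\left(\beta(s_{t-1}, \hat{h}_i)\right)}
                    {\sum_{j=1}^{T} \exp\left(\beta(s_{t-1}, \hat{h}_j)\right)}
\]
\[
\beta(s_i, \hat{h}_j) = V_a^{\top} \tanh\left(W_a \, [s_i, \hat{h}_j]\right)
\]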

   where $W_a$ and $V_a$ are the trainable attention parameters, $s_{t-1}$ is
the past hidden state of the second BiGRU layer and $\hat{h}_t$ is the current
hidden state of the first BiGRU layer. The idea of the concatenation is to take
into account not only the input sentence but also the past hidden state when
producing the attention weights.
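
    The computation of one context vector can be sketched in Python as below.
This is an illustrative implementation under simple assumptions: matrices are
lists of rows, `W_a` has one row per attention unit and as many columns as the
concatenated vector, and the function names are hypothetical.

```python
import math


def attention_score(s_prev, h_j, W_a, v_a):
    """beta(s, h_j) = v_a^T tanh(W_a [s, h_j])."""
    concat = list(s_prev) + list(h_j)
    hidden = [math.tanh(sum(w * c for w, c in zip(row, concat)))
              for row in W_a]
    return sum(v * u for v, u in zip(v_a, hidden))


def context_vector(s_prev, H, W_a, v_a):
    """Return (c_t, alphas) for the hidden states H = [h_1, ..., h_T]."""
    scores = [attention_score(s_prev, h_j, W_a, v_a) for h_j in H]
    top = max(scores)                       # subtracted for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]      # softmax over the T positions
    dim = len(H[0])
    c_t = [sum(a * h[k] for a, h in zip(alphas, H)) for k in range(dim)]
    return c_t, alphas
```

Since the weights are a softmax over the $T$ positions, they are positive and
sum to one, so $c_t$ is a convex combination of the hidden states of the first
BiGRU layer.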

3.6     Second BiGRU Layer
The goal of this layer is to obtain a deep dense representation of the message
in order to determine whether the tweet is humorous or not. At each time step
this network receives the context vector $c_t$, which is propagated until the
final hidden state $s_T$. This vector is a high-level representation of the
tweet. Afterwards, it is passed to a feed-forward network (FFN) with 3 hidden
layers, and we use a softmax layer at the end, as follows:


\[
\begin{aligned}
\hat{y} &= \mathrm{softmax}(W_0 \, \mathit{dense}_1 + b_0)\\
\mathit{dense}_1 &= \mathrm{relu}(W_1 \, \mathit{dense}_2 + b_1)\\
\mathit{dense}_2 &= \mathrm{relu}(W_2 \, \mathit{dense}_3 + b_2)\\
\mathit{dense}_3 &= \mathrm{relu}(W_3 \, s_T + b_3)
\end{aligned}
\]
    where $W_j$ and $b_j$ ($j = 0, \ldots, 3$) denote the weight matrices and
bias vectors of the three hidden layers and of the final softmax layer.
Finally, cross entropy is used as the loss function, which is defined as:

\[
L = -\sum_{i} y_i \log(\hat{y}_i)
\]

   where $y_i$ is the ground-truth classification of the tweet (humor vs. not
humor) and $\hat{y}_i$ is the value predicted by the model.
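
    The classification head and the loss can be sketched as follows. The
sketch assumes plain Python lists for vectors and matrices, a one-hot encoding
of the gold label, and a small constant `eps` inside the logarithm for
numerical safety; these choices and the function names are illustrative only.

```python
import math


def relu_layer(W, b, x):
    """relu(W x + b), with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def softmax(logits):
    top = max(logits)                        # subtracted for stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(s_T, hidden_layers, output_layer):
    """Apply the hidden layers [(W3, b3), (W2, b2), (W1, b1)] in order,
    then the softmax output layer (W0, b0)."""
    x = s_T
    for W, b in hidden_layers:
        x = relu_layer(W, b, x)
    W0, b0 = output_layer
    logits = [sum(w * xi for w, xi in zip(row, x)) + bias
              for row, bias in zip(W0, b0)]
    return softmax(logits)


def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i log(y_hat_i) for a one-hot gold label."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))
```

With a one-hot gold label the sum reduces to the negative log-probability that
the model assigns to the correct class.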


4   Experiments and Results
To tune some parameters of the proposed model we used stratified k-fold
cross-validation with 5 partitions on the training dataset. At this stage, no
feature selection was performed, so all linguistic features were considered.
During the training phase we fixed some hyper-parameters, namely: batch
size = 256, epochs = 10, GRU cell units = 256, optimizer = “Adam” and dropout
in the GRU cells = 0.3. After that, we evaluated different subsets of
linguistic features; in particular, five feature settings were explored:
No Fea (no linguistic information is considered), Fea 64 (the 64 best-ranked
features according to p-value), Fea 128 (the 128 best-ranked features according
to p-value), Fea 141 (all features with p-value ≥ 0.05) and All Fea (all
linguistic features).
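
    As a reminder of what stratification means here, the following minimal
sketch assigns example indices to k folds so that every fold keeps
approximately the class proportions of the whole training set. It is an
illustration with hypothetical names and a fixed random seed, not the splitting
code used in our experiments.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each example index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the examples of this class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds
```

Each fold is used once for validation while the remaining k - 1 folds are used
for training, and the reported scores are averaged over the k runs.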


Table 1. Results of the UO_UPV2 system using the feature selection strategy on the
training dataset (h: humor class; noh: non-humor class).

        Features   Pr h    Rc h    F1 h    Pr noh   Rc noh   F1 noh   F1 AVG
        No Fea     0.818   0.695   0.750   0.826    0.902    0.862    0.806
        All Fea    0.795   0.751   0.767   0.851    0.871    0.859    0.813
        Fea 64     0.829   0.717   0.763   0.838    0.901    0.867    0.815
        Fea 128    0.756   0.817   0.785   0.880    0.834    0.856    0.820
        Fea 141    0.779   0.769   0.771   0.858    0.859    0.857    0.814


    As can be observed in Table 1, a slight improvement is obtained when
linguistic features are passed to the model. In particular, the Fea 128 subset
achieved the best F1 score in the humor class (F1 h = 0.785). When linguistic
information was missing (No Fea), the F1 score in the humor class dropped by
3.5 percentage points, to 0.750.
    Regarding the official evaluation, participants were allowed to submit more
than one model, up to a maximum of 10 runs. Taking into account the results
shown in Table 1, we submitted three runs. The difference among them is the
number of linguistic features used to inform the UO_UPV2 model. In Run1 we used
the Fea 64 subset, in Run2 the Fea 128 subset, and in Run3 the Fea 141 subset.
We achieved 0.765, 0.773 and 0.765 in terms of F1 score in the humor class for
Run1, Run2 and Run3, respectively. These values are consistent with the results
obtained in the training phase.

             Table 2. Official results for the Humor Detection subtask

       Rank              Team         F1           Pr        Rc         Acc
       1                 adilism      0.821        0.791     0.852      0.855
       2                 kevinb       0.816        0.802     0.831      0.854
       3                 bfarzin      0.810        0.782     0.839      0.846
       4                 jamestjw     0.798        0.793     0.804      0.842
       5                 job80        0.788        0.758     0.819      0.828
       6                 jimblair     0.784        0.745     0.827      0.822
       7                 UO_UPV2      0.773        0.780     0.765      0.824
       8                 vaduvabogdan 0.772        0.729     0.820      0.811
       ...               ...          ...          ...       ...        ...
       18                premjithb    0.495        0.478     0.514      0.591
       19                hahaPLN      0.440        0.394     0.497      0.505


   Regarding the official ranking, Table 2 shows that our best submission
(UO_UPV2) was ranked 7th out of 18 teams.

5   Conclusion
In this paper we presented our modification of the UO_UPV system (UO_UPV2) for
the humor recognition task (HAHA) at IberLEF 2019. We participated only in the
“Humor Detection” subtask and ranked 7th out of 18 teams. Our proposal combines
linguistic features with an attention-based BiGRU neural network. The model
consists of a bidirectional GRU neural network with an attention mechanism that
estimates the importance of each word; the resulting context vector is then
used by another BiGRU model to estimate whether the tweet is humorous or not.
Regarding feature selection, the best result was achieved when the 128
best-ranked features (according to p-value) were considered. The results also
show that adding linguistic information through the initial hidden state
improves effectiveness in terms of F1 measure.

Acknowledgments
The work of the second author was partially funded by the Spanish MICINN under
the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in
social media: FAKE news and HATE speech (PGC2018-096212-B-C31).


References

 1. Barbieri, F., Saggion, H.: Automatic Detection of Irony and Humour in Twitter.
    In: Fifth International Conference on Computational Creativity (2014)
 2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching Word Vectors with
    Subword Information. Transactions of the ACL. 5, 135–146 (2017)
 3. Cardellino, C.: Spanish Billion Words Corpus and Embeddings (2016),
    http://crscardellino.me/SBWCE/. Retrieved May 4, 2018
 4. Castro, S., Chiruzzo, L., Rosá, A.: Overview of the HAHA Task: Humor Analysis
    based on Human Annotation at IberEval 2018. In: Rosso, P., Gonzalo, J.,
    Martínez, R., Montalvo, S., Carrillo De Albornoz Cuadrado, J. (eds.) Proceedings
    of the Third Workshop on Evaluation of Human Language Technologies for Iberian
    Languages (IberEval 2018) co-located with 34th Conference of the Spanish
    Society for Natural Language Processing (SEPLN 2018). pp. 187–194. CEUR-WS.org,
    Sevilla, Spain (2018)
 5. Castro, S., Garat, D., Moncecchi, G.: Is This a Joke? Detecting Humor in Span-
    ish Tweets. In: Ibero-American Conference on Artificial Intelligence. pp. 139–150
    (2016)
 6. Castro, S., Chiruzzo, L., Rosá, A., Garat, D., Moncecchi, G.: A Crowd-Annotated
    Spanish Corpus for Humor Analysis. In: Proceedings of SocialNLP 2018, The 6th
    International Natural Language Processing for Social Media (2018)
 7. Cattle, A., Bay, C.W., Kong, H.: SRHR at SemEval-2017 Task 6: Word Asso-
    ciations for Humour Recognition. In: 11th International Workshop on Semantic
    Evaluations (SemEval-2017). pp. 401–406 (2017)
 8. Chung, J., Gülçehre, Ç., Cho, K., Bengio, Y.: Empirical Evaluation of Gated
    Recurrent Neural Networks on Sequence Modeling. CoRR abs/1412.3555 (2014),
    http://arxiv.org/abs/1412.3555
 9. Gibbs, R.W., Bryant, G.A., Colston, H.L.: Where is the humor in verbal irony?
    Humor 27(4), 575–595 (2014)
10. Han, X., Toner, G.: QUB at SemEval-2017 Task 6: Cascaded Imbalanced Classifica-
    tion for Humor Analysis in Twitter. In: 11th International Workshop on Semantic
    Evaluations (SemEval-2017). pp. 380–384 (2017)
11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation
    9(8), 1735–1780 (1997)
12. Lin, K., Lin, D., Cao, D.: Sentiment Analysis Model Based on Structure Attention
    Mechanism. In: UK Workshop on Computational Intelligence. pp. 17–27. Springer
    (2017)
13. Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based
    neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
14. María del Pilar Salas-Zárate, Mario Andrés Paredes-Valverde, Miguel Ángel
    Rodriguez-García, Rafael Valencia-García, G.A.H.: Automatic Detection of Satire
    in Twitter: A psycholinguistic-based approach. Knowledge-Based Systems (2017),
    http://dx.doi.org/10.1016/j.knosys.2017.04.009
15. Mihalcea, R., Pulman, S.: Characterizing Humour: An Exploration of Features in
    Humorous Texts. In: International Conference on Intelligent Text Processing and
    Computational Linguistics. pp. 337–347 (2007)
16. Mihalcea, R., Strapparava, C.: Learning to laugh (automatically): computational
    models for humor recognition. Computational Intelligence 22(2), 126–142 (2006)
17. Ortega-Bueno, R., Muñiz, C.E., Rosso, P., Medina-Pagola, J.E.: UO_UPV: Deep
    Linguistic Humor Detection in Spanish Social Media. In: Rosso, P., Gonzalo, J.,
    Martínez, R., Montalvo, S., Carrillo-de Albornoz, J. (eds.) Proceedings of the Third
    Workshop on Evaluation of Human Language Technologies for Iberian Languages
    (IberEval 2018) co-located with 34th Conference of the Spanish Society for Natural
    Language Processing (SEPLN 2018). pp. 203–213. CEUR-WS.org, Sevilla, Spain
    (2018)
18. Padró, L., Stanilovsky, E.: FreeLing 3.0: Towards Wider Multilinguality. In: Pro-
    ceedings of the (LREC 2012) (2012)
19. Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic inquiry and word count:
    LIWC 2001. Mahway: Lawrence Erlbaum Associates 71 (2001)
20. Reyes, A., Rosso, P., Buscaldi, D.: From humor recognition to irony detection:
    The figurative language of social media. Data and Knowledge Engineering 74, 1–
    12 (2012), http://dx.doi.org/10.1016/j.datak.2012.02.005
21. Salgado, V., Tellez, E.S.: INGEOTEC at IberEval 2018 Task HaHa: µTC and
    EvoMSA to Detect and Score Humor in Texts. In: Rosso, P., Gonzalo, J., Martínez,
    R., Montalvo, S., Carrillo De Albornoz Cuadrado, J. (eds.) Proceedings of the
    Third Workshop on Evaluation of Human Language Technologies for Iberian
    Languages (IberEval 2018) co-located with 34th Conference of the Spanish Society
    for Natural Language Processing (SEPLN 2018). pp. 195–202. CEUR-WS.org, Sevilla,
    Spain (2018)
22. Sjobergh, J., Araki, K.: Recognizing Humor Without Recognizing Meaning. In:
    International Workshop on Fuzzy Logic and Applications. pp. 469–476 (2007)
23. Turcu, R.A., Alexa, L., Amarandei, S.M., Herciu, N., Scutaru, C., Iftene, A.:
    #WarTeam at SemEval-2017 Task 6: Using Neural Networks for Discovering
    Humorous Tweets. In: 11th International Workshop on Semantic Evaluations
    (SemEval-2017). pp. 407–410 (2017)
24. Wang, Y., Huang, M., Zhao, L., Others: Attention-based lstm for aspect-level sen-
    timent classification. In: Proceedings of the 2016 Conference on Empirical Methods
    in Natural Language Processing. pp. 606–615 (2016)
25. Wenke, S., Fleming, J.: Contextual Recurrent Neural Networks. CoRR
    abs/1902.03455 (2019), http://arxiv.org/abs/1902.03455
26. Yan, X., Pedersen, T.: Duluth at SemEval-2017 Task 6: Language Models in Humor
    Detection. In: 11th International Workshop on Semantic Evaluations (SemEval-
    2017). pp. 385–389. No. 2 (2017)
27. Yang, M., Tu, W., Wang, J., Xu, F., Chen, X.: Attention Based LSTM for Target
    Dependent Sentiment Classification. In: AAAI. pp. 5013–5014 (2017)
28. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention
    networks for document classification. In: Proceedings of the 2016 Conference of
    the North American Chapter of the Association for Computational Linguistics:
    Human Language Technologies. pp. 1480–1489 (2016)
29. Zhang, Y., Zhang, P., Yan, Y.: Attention-based LSTM with Multi-task Learning
    for Distant Speech Recognition. Proc. Interspeech 2017 pp. 3857–3861 (2017)

