                        Supervised Pun Detection and Location
                   with Feature Engineering and Logistic Regression

    Jingyuan Feng⋆, Özge Sevgili†, Steffen Remus†, Eugen Ruppert†, and Chris Biemann†

                  ⋆ Technische Universität Hamburg, Hamburg, Germany
                        † Universität Hamburg, Hamburg, Germany
                            jingyuan.feng@tuhh.de
       {sevgili,remus,ruppert,biemann}@informatik.uni-hamburg.de


                        Abstract

     Puns, by exploiting ambiguities, are commonly used in literature to achieve a humorous or rhetorical effect. Previous approaches mainly focus on machine learning models or rule-based methods; however, they have not addressed how and why a pun is detected or located. Focusing on this, we propose a system for recognizing and locating English puns. Given the limited training data and the aim of measuring how relevant a predictor is and the direction of its association, we compile a dataset and explore different feature sets as input for logistic regression, and measure their influence in terms of the assigned weights. To the best of our knowledge, our system achieves better results than state-of-the-art systems on three subtasks for different types of puns.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

1    Introduction

Puns are a type of wordplay that deliberately exploits two or more different meanings of the same or similar words in a sentence. Puns utilizing the same word with ambiguous senses are known as homographic. I used to be a banker but I lost interest; in this sentence, the trigger word "interest" could mean "curiosity" or "a fixed charge for borrowing money"1. In contrast, puns using different words with similar sounds are called heterographic. Are evil wildebeests bad gnus?; here, "gnus" and "news" (/nu:z/)2 have the same pronunciation.

   1 http://wordnetweb.princeton.edu/perl/webwn
   2 https://dictionary.cambridge.org/dictionary/english/

   With such ambiguities, puns can usually achieve a humorous or rhetorical effect. They are not only jokes but are also widely used in literature, and can be traced back as early as the Roman playwright Plautus (Pollack, 2011).
   Puns can be a challenge to appreciate, even for humans: understanding them requires not only sufficient associations but also rich English and background knowledge. A well-functioning system in machine translation may help non-English users better understand literary criticism and analysis. Besides, it may also enhance the experience of human-computer interaction (Hempelmann, 2008).
   This paper focuses on the detection and location of homographic and heterographic puns. The state-of-the-art systems mostly deploy rule-based strategies, and a few propose complex machine learning models. Their experimental results reach 83 % to 90 % F1 in pun detection and 80 % in the pun location identification task on the SemEval-2017 Task 7 dataset (Miller et al., 2017). Our contributions are: (1) accumulating a dataset for pun detection; (2) utilizing a logistic regression model to show the relations with straightforward features on both the sentence level and the word level; (3) unveiling how puns may work according to their type.

2    Related Work

Most previous work focuses on pun generation and modelling. Starting in 2004, Taylor and Mazlack (2004) used N-grams to recognize and locate wordplay. In 2015, Miller and Gurevych (2015) adapted knowledge-based Word Sense Disambiguation (WSD) to "disambiguate" different meanings of puns.
By processing sentences, Kao et al. (2016) built an information-theory-based computational model that interprets puns in terms of "ambiguity" and "distinctiveness".
   In the pun detection part, Sevgili et al. (2017) computed PMI scores for every pair of words and looked for strong associations; Pedersen (2017) applied different settings of WSD approaches to vote for puns; Doogan et al. (2017) calculated phonetic distances for heterographic puns. Several papers also proposed supervised methods: Indurthi and Oota (2017) differentiated puns from non-puns using a bi-directional Recurrent Neural Network (RNN) with word embeddings as features. Xiu et al. (2017) also trained a classifier, but on a self-collected training set, with features based on WordNet (Miller, 1995) and word2vec (Mikolov et al., 2013) embeddings. Diao et al. (2019) created the PSUGA model for heterographic puns, which applies a hierarchical attention mechanism to learn phoneme and spelling relations. For pun location identification, Doogan et al. (2017) selected words whose two senses have higher similarity scores with two different content words. Vechtomova (2017) developed eleven features as rules, including position information, PMI, TF-IDF, etc., to score candidate words. Zou and Lu (2019) jointly detected and located puns with tags from an LSTM (Long Short-Term Memory) network and CRFs (Conditional Random Fields). Cai et al. (2018) also applied a BiLSTM, but based on sense-aware models.
   Two works, Mao et al. (2020) and Zhou et al. (2020), were reviewed and published concurrently. Mao et al. (2020) captured long-distance and short-distance semantic relations between words; Zhou et al. (2020) combined contextualized word embeddings and pronunciation embeddings with a self-attentive encoder, reaching 2 %, 2 %, 13 % and 7 % increases in F-score on the four tasks, respectively.
   Previous studies focused on machine learning models or rule-based methods (Diao et al., 2019); however, they are not able to measure how strongly a predictor is associated with the prediction target, or the direction of that association. Instead of using rules that are mainly based on belief, or deploying neural networks which, due to their intrinsic complexity, give no clear clue about the relationship between input and output, we use logistic regression with combinations of widely-used terms. This gives us the best result so far and also provides a valuable by-product, i.e. helping us to uncover hidden relationships in the puns.

3    Methods

The influence of each feature can be traced based on the results. These terms explore the statistical characteristics of a pun as well as its semantic properties. In general, they can be categorized into the following four types.
   Part-of-speech (POS) tag: An analysis of the dataset shows that nouns, verbs, adjectives and proper nouns account for about 98 % of all pun words. Besides, a verb-type pun word is almost certain to appear at the end.
   Representation of the entire sentence: Pre-trained doc2vec (Le and Mikolov, 2014) and BERT (Devlin et al., 2019) language models are used to get a representation of the sentence as the contextual background for disambiguation.
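   For illustration, the following is a minimal sketch of how such sentence representations could be obtained with the gensim and transformers libraries; the doc2vec model file and the choice of mean pooling over BERT's last hidden states are assumptions made for the example, not the exact configuration of our system.

    import torch
    from gensim.models.doc2vec import Doc2Vec
    from transformers import AutoModel, AutoTokenizer

    # Hypothetical pre-trained doc2vec model file; any trained gensim Doc2Vec works here.
    d2v = Doc2Vec.load("doc2vec_pretrained.model")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    bert.eval()

    def doc2vec_sentence_vector(sentence: str):
        # infer_vector expects a list of tokens; simple whitespace tokenisation for the sketch.
        return d2v.infer_vector(sentence.lower().split())

    def bert_sentence_vector(sentence: str) -> torch.Tensor:
        # Mean-pool the last hidden states into one fixed-size sentence vector.
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            output = bert(**inputs)
        return output.last_hidden_state.mean(dim=1).squeeze(0)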
   Sentence separation: Many researchers believe that the pun word is often located in the latter half of a sentence. However, Sevgili et al. (2017) and Oele and Evang (2017) lost this structure when using PMI and WSD, respectively, and Vechtomova (2017) failed on most complex sentences by splitting on certain keywords. Instead, we use dependency parsing to extract the largest strict sub-tree of the sentence structure as the second part, leaving the rest as the first part (see Figure 1). This separates sentences and preserves their structure regardless of sentence type.

[Figure 1 shows the dependency tree of the example sentence, rooted at "hid"; the drawing is omitted here.]

Figure 1: Example sentence for dependency parsing, e.g., They hid from the gunman in a sauna where they could sweat it out. After sentence separation: where they could sweat it out (important part) and They hid from the gunman in a sauna (the rest part).
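   As an illustration of the separation step, here is a short sketch with spaCy under one plausible reading of "largest strict sub-tree" (the proper subtree with the most tokens); the parser model, the tie-breaking, and this reading of the rule are assumptions, and the resulting split depends on the parse.

    import spacy

    nlp = spacy.load("en_core_web_sm")   # assumed parser model

    def separate(sentence: str):
        """Split a sentence into (first part, largest proper dependency subtree)."""
        doc = nlp(sentence)
        root = next(tok for tok in doc if tok.head is tok)           # sentence root
        candidates = [tok for tok in doc if tok is not root]         # roots of strict sub-trees
        best = max(candidates, key=lambda tok: sum(1 for _ in tok.subtree))
        second = {tok.i for tok in best.subtree}
        first_part = " ".join(tok.text for tok in doc if tok.i not in second)
        second_part = " ".join(tok.text for tok in doc if tok.i in second)
        return first_part, second_part

    # The exact split depends on the parse and on how the rule is operationalised;
    # Figure 1 shows the intended split for this sentence.
    print(separate("They hid from the gunman in a sauna where they could sweat it out."))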
   Word embedding or meaning: We use GloVe (Pennington et al., 2014) to derive word embeddings, and other approaches, such as path distances between word senses in WordNet, to obtain meanings for word pairs.
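   As a sketch of the WordNet-based part (used later as feature f9), word-pair similarity via NLTK's path_similarity might look as follows; taking the maximum over all sense pairs of the two words is an assumption made for the example.

    from itertools import product
    from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet") beforehand

    def max_path_similarity(w1: str, w2: str) -> float:
        """Highest path_similarity over all sense pairs of two words (cf. f9)."""
        best = 0.0
        for s1, s2 in product(wn.synsets(w1), wn.synsets(w2)):
            sim = s1.path_similarity(s2)
            if sim is not None and sim > best:
                best = sim
        return best

    print(max_path_similarity("interest", "money"))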
  System                       P      R      A      F1
  Zhou et al. (2020)4         .942   .957          .949
  Zou and Lu (2019)4          .912   .933          .922
  Indurthi and Oota (2017)5   .902   .897   .853   .900
  Sevgili et al. (2017)       .755   .933   .736   .835
  Pedersen (2017)             .783   .872   .736   .825
  First setting               .924   .937   .900   .930
  Second setting              .828   .928   .811   .875

Table 1: Homographic pun detection results with the top three teams from the competition and the two best recent studies (the upper part). f4, f5, f6, f8, f9, f10 and f12 are used in both settings: 5-fold CV and the one using our own training data.

  System                       P      R      A      F1
  Zhou et al. (2020)          .948   .956          .952
  Zou and Lu (2019)           .867   .931          .898
  Diao et al. (2019)6         .879   .851   .829   .865
  Sevgili et al. (2017)       .773   .930   .755   .844
  Doogan et al. (2017)        .871   .819   .784   .844
  First setting               .921   .939   .899   .930
  Second setting              .831   .938   .820   .881

Table 2: Heterographic pun detection results with the top two teams from the competition and the three best recent studies; the same features as for the homographic type.

4    Experiments

4.1    Subtask 1: Pun Detection

Pun detection is a binary classification problem with a sentence as the input and the decision of whether it is punning as the output.
   Data: The published dataset (Miller et al., 2017) contains 2250 contexts for the homographic and 1780 for the heterographic type. We also generated a corpus from "Pun of the Day"3, which contains mixed types of puns. After removing duplicates, we found 843 puns (disregarding pun types) with a significantly larger standard deviation of sentence length compared to the given corpus. We fitted the dataset by limiting the range of word counts and ended up with 707 puns and a variety of negative samples made of non-punning jokes, famous sayings and other short collections. The compiled dataset can be made available upon request.
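   A minimal sketch of this preprocessing (deduplication plus a word-count filter) is shown below; the concrete length thresholds that lead to the 707 retained puns are not given here and are therefore left as parameters.

    def filter_corpus(sentences, min_len=None, max_len=None):
        """Deduplicate scraped puns and keep only those within a word-count range."""
        seen, kept = set(), []
        for s in sentences:
            key = " ".join(s.lower().split())   # normalise case/whitespace for dedup
            n = len(key.split())
            if key in seen:
                continue
            if min_len is not None and n < min_len:
                continue
            if max_len is not None and n > max_len:
                continue
            seen.add(key)
            kept.append(s)
        return kept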
   Setting: In this subtask, we ran experiments in two different settings: 5-fold cross-validation and training purely on the collected data. For the first setting, we cross-validated on the official dataset, since it is not split by the provider. To make the results comparable to previous research, we tested all the folds independently and calculated the macro score in the end, so the final result covers all benchmark data over 5 sub-experiments; for the second setting, we trained on the self-collected data and evaluated on the official dataset. Both experiments are scored with the standard precision, recall, accuracy and F-score metrics.
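   The first setting can be sketched as follows with scikit-learn; the stratification, shuffling and regularisation choices are assumptions for the example rather than our exact configuration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support
    from sklearn.model_selection import StratifiedKFold

    def cross_validate(X, y, n_splits=5, seed=0):
        """Train on 4 folds, test on the held-out fold, then average the per-fold scores."""
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = []
        for train_idx, test_idx in skf.split(X, y):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[train_idx], y[train_idx])
            pred = clf.predict(X[test_idx])
            p, r, f, _ = precision_recall_fscore_support(y[test_idx], pred, average="binary")
            scores.append((p, r, accuracy_score(y[test_idx], pred), f))
        return np.mean(scores, axis=0)   # macro score over the five sub-experiments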
   Features: Table 3 lists the features; among them, f4, f5, f6, f8, f9, f10 and f12 have a positive influence on the final result, while the remaining ones have a negative influence. To unify, we choose the same feature sets for both the homographic and heterographic sets.

         Description
  f1     The number of words per POS tag.
  f2     The distances of the last appearances of the POS tags, each normalized by sentence length.
  f3     Individual sums of all found PMI values according to POS tags.
  f4     doc2vec sentence representation.
  f5     doc2vec dot product of the separated parts.
  f6     doc2vec for both parts of the sentence.
  f7     doc2vec cosine similarities of word pairs from the separated parts of the sentence in descending order. The first 10 values are taken.
  f8     Three elements: whether the sentence contains a similar idiomatic expression; whether it differs in exactly one word; how much they have in common.
  f9     Word similarity based on the shortest path in WordNet, evaluated with path similarity7. For each sentence, all word pairs from the two sub-sentences are evaluated, and the 4 largest results are chosen.
  f10    The number of associated words in the first part of the sentence. For each word ω from the second part that exists in the Free Association corpus8, we count how many content words from the first part are listed as associative words of ω according to the Free Association corpus.
  f11    The number of words that are predicted differently if one word ω is masked, using BERT.
  f12    Sentence representation using BERT.

Table 3: Feature list for pun detection.

   3 Pun of the Day: http://www.punoftheday.com/
   4 Both teams used 10-fold cross-validation.
   5 Trained on part of the dataset, evaluated on 675 of the 2250 homographic contexts according to the task organizer.
   6 5-fold cross-validation on the original dataset and our compiled corpus from Pun of the Day.
   7 path_similarity from NLTK returns a score denoting how similar two senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy: http://www.nltk.org/howto/wordnet.html
   8 Free Association is a collection of word pairs that people tend to think of first when given the other word: http://w3.usf.edu/FreeAssociation/AppendixC/
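   To make the feature pipeline concrete, here is a sketch of how a few of the sentence-level features from Table 3 (f4, f5, f6) could be concatenated into the input vector for logistic regression; it reuses the helper sketches from Section 3 (doc2vec_sentence_vector, separate), which are illustrative assumptions rather than our exact implementation, and omits the remaining features.

    import numpy as np

    def detection_features(sentence: str) -> np.ndarray:
        """Concatenate a subset of the Table 3 features (f4, f5, f6) into one vector."""
        first, second = separate(sentence)
        v_full = np.asarray(doc2vec_sentence_vector(sentence))      # f4: whole sentence
        v_first = np.asarray(doc2vec_sentence_vector(first))
        v_second = np.asarray(doc2vec_sentence_vector(second))
        f5 = np.array([np.dot(v_first, v_second)])                  # f5: dot product of the parts
        return np.concatenate([v_full, f5, v_first, v_second])      # f4 | f5 | f6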
   Results: Tables 1 and 2 provide the experimental results for homographic and heterographic pun detection, respectively. The 5-fold CV utilizes all data provided in the task; the latter setting does not use any data from the task for training.
   From both results, our system with the first setting (5-fold cross-validation) leads to the best scores compared with all the teams that used part of the official data for training (all teams listed above except Sevgili et al. (2017) and Doogan et al. (2017)). Compared with the other teams participating in this subtask, our second setting (separate training data) also outperforms them by about 4 %.
   Besides, there is a 5 % performance drop from training in the 5-fold CV to training on the self-collected corpus. This may result from multiple reasons. For instance, the self-collected corpus does not categorize the punning type; the non-punning samples may contain some hidden puns; and the two corpora vary in terms of their properties.
   Ablation test: In the ablation test, we found that the major factors are the sentence representation (f4, f12), the relation between both parts (f5, f6, f7 and f10) and word meaning (f8, f9). While the sentence representation offers the basis, the relation between both parts also helps, either in general or for individual word pairs.
   Furthermore, since the features have different dimensions, f12 occupies more than 2/3 of the feature space, and most of the top 15 % most important components come from it. The third element of f8, which calculates the maximum ratio of overlapping words to the word length of the found idiomatic expression, always has the largest influence, followed by f4 and sometimes f6.

4.2    Subtask 2: Pun Location

Pun location is to find out which word in the content is punning, given a pun-containing sentence.
   Data: The dataset provided by the organizer includes 1271 homographic puns and 1780 heterographic ones.
   Setting: In this subtask, we used 5-fold cross-validation for testing. As in Subtask 1, the official dataset was randomly split into 5 folds. The predictions from each fold were then accumulated and scored. Scores were computed using the standard coverage, precision, recall and F-score measures.
   Features: Table 6 lists all the features used in this subtask. All of them are word-based, and each assigns a vector or value to a word. f2 and f4 are only used for the heterographic type since they contain homonymic information. We then concatenate all vectors from the chosen features. After logistic regression, the word with the highest score is presumed to be the pun location.

         Description
  f1     Assign value 1 to the last content word in the sentence. Namely, if the feature is used, the last content word in the sentence is concatenated with the vector [1], while a vector [0] is used for the other words.
  f2     Whether there is a word with the same pronunciation in the CMU Pronouncing Dictionary9.
  f3     Maximum doc2vec cosine similarity of word pairs from the separated parts of the sentence.
  f4     The number of context words that have lower doc2vec cosine similarity with word ω than with any of ω's homonyms.
  f5     Assign value 0, 1 or 2 to word ω according to its frequency in the Brown corpus10. The rarer the word, the higher the value assigned.
  f6     The position of word ω in the sentence: assign 1 if it is in the second half, plus an additional 1 if it also lies in the last quarter.
  f7     Mark the last N, V, Adj, Propn in the form of a vector at their position (from Subtask 1). For example, if word ω is the last verb in the sentence, its vector for this feature is [0,1,0,0].
  f8     doc2vec of the whole sentence (from Subtask 1).
  f9     The number of context words that are predicted differently using BERT if word ω is masked.
  f10    GloVe vector of the word.

Table 6: Pun location features and descriptions.

   9 The Carnegie Mellon University (CMU) Pronouncing Dictionary is an open-source pronunciation dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
   10 A general English-language corpus with a total of roughly one million words: https://archive.org/details/BrownCorpus
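   The selection step can be sketched as follows: each word of a sentence is mapped to its concatenated feature vector, scored by the trained logistic regression model, and the highest-scoring word is returned. Training on gold pun/non-pun word labels is assumed to have happened elsewhere; this sketch only shows the decision rule.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def locate_pun(clf: LogisticRegression, word_vectors: np.ndarray) -> int:
        """Return the index of the word with the highest pun-class probability.

        word_vectors has one row per word of the sentence (the concatenated
        Table 6 features for that word)."""
        probs = clf.predict_proba(word_vectors)[:, 1]
        return int(np.argmax(probs))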
  System                       C      P      R      F1
  Zhou et al. (2020)                 .904   .875   .889
  Mao et al. (2020)                  .850   .813   .831
  Zou and Lu (2019)                  .835   .771   .802
  Cai et al. (2018)                  .815   .747   .780
  Doogan et al. (2017)        .999   .664   .662   .663
  Vechtomova (2017)           .999   .653   .652   .652
  Indurthi and Oota (2017)    1.00   .522   .522   .522
  Our system                  1.00   .762   .762   .762

Table 4: Homographic pun location results with the top three teams from the competition and the three best recent studies; all features except f2 and f4 are used.

  System                       C      P      R      F1
  Zhou et al. (2020)                 .942   .904   .923
  Mao et al. (2020)                  .888   .858   .873
  Zou and Lu (2019)                  .814   .775   .794
  Vechtomova (2017)           .998   .797   .795   .796
  Doogan et al. (2017)        1.00   .685   .685   .685
  Sevgili et al. (2017)       .988   .659   .652   .655
  Our system                  1.00   .849   .849   .849

Table 5: Heterographic pun location results with the best three teams from the competition and the three best recent studies; all features except f7 are used.

   Results: Our system achieves competitive results compared to the teams from the competition. For the location of homographic puns, our model's performance is lower than that of Zou and Lu (2019) and Cai et al. (2018), which use LSTMs (see Table 4). With the added pronunciation features (f2 and f4), our system (using all features except f7) exceeds the state-of-the-art results by around 5 % for heterographic puns (see Table 5).
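   A sketch of the pronunciation lookup behind f2, based on the CMU Pronouncing Dictionary as shipped with NLTK, is shown below; the handling of stress markers and spelling variants is an assumption of the example.

    from collections import defaultdict
    from nltk.corpus import cmudict   # requires nltk.download("cmudict") beforehand

    _pron = cmudict.dict()
    _by_pron = defaultdict(set)
    for _w, _prons in _pron.items():
        for _p in _prons:
            _by_pron[tuple(_p)].add(_w)

    def has_homophone(word: str) -> bool:
        """Rough f2 check: does another dictionary entry share a pronunciation with this word?"""
        w = word.lower()
        return any(len(_by_pron[tuple(p)] - {w}) > 0 for p in _pron.get(w, []))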
   Ablation test: Both f6 and f7 lead to a significant increase in the result. They give words, especially content words, in later positions more weight. f3 uses doc2vec to extract relations between the separated parts, while f9 finds the "surprise" within a pun using masked language modelling. Together, they contribute around a 2 % improvement. These two features support our hypothesis, and they also tend to focus on the differences between the double meanings of the trigger words and the two parts of the sentence. f10 concatenates the GloVe vector to represent the word itself and results in an approximately 7 % boost. Homonymic information (f2) helps, but still leaves much to explore. First, the data is heterographic rather than homophonic (e.g., "orifice" and "office"). Second, variants of a word are not considered (e.g., "knowingly" and "no"). Third, puns may exploit names or compounds (e.g., "Clarence" and "clearance").
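   The "surprise" feature (f9) mentioned above can be sketched as follows: mask the candidate word and count how many other positions the masked language model now predicts differently. The model name and the token-level (rather than word-level) counting are assumptions; position is a WordPiece index into the tokenised sequence, including [CLS].

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
    mlm.eval()

    def _argmax_predictions(ids: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            logits = mlm(input_ids=ids.unsqueeze(0)).logits[0]
        return logits.argmax(dim=-1)

    def surprise(sentence: str, position: int) -> int:
        """Count how many *other* token positions change their top prediction
        once the token at `position` is replaced by [MASK]."""
        ids = tok(sentence, return_tensors="pt")["input_ids"][0]
        masked = ids.clone()
        masked[position] = tok.mask_token_id
        changed = _argmax_predictions(ids) != _argmax_predictions(masked)
        changed[position] = False   # ignore the masked position itself
        return int(changed.sum())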
   These limitations of the pronunciation feature are remedied by adding the word frequency feature (f5) instead of giving special attention to particular names or structural patterns. Although this feature contributes significantly for the heterographic type, it barely influences the homographic one. Unlike heterographic puns, homographic puns need a word with two senses that are widely known to people, so a rare word can hardly be used in that situation.
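   For completeness, a sketch of the frequency feature (f5) over the Brown corpus follows; the concrete count thresholds are assumptions, since the text only specifies that rarer words receive higher values.

    import nltk
    from nltk.corpus import brown   # requires nltk.download("brown") beforehand

    _freq = nltk.FreqDist(w.lower() for w in brown.words())

    def rarity(word: str, common: int = 100, seen: int = 1) -> int:
        """f5 sketch: 0 for frequent words, 1 for rare words, 2 for words absent from Brown."""
        c = _freq[word.lower()]
        if c >= common:
            return 0
        if c >= seen:
            return 1
        return 2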
5    Conclusion and Future Work

We provide a dataset for pun detection and built a model that achieves state-of-the-art results on three subtasks for different types of puns. We found that three things affect a pun: the general interpretation of the content, the relation between both parts, and word meaning. If we know a priori that a sentence is punning, we can utilize, e.g., word position or "surprise" according to the pun type to locate the punning word.
   We deployed homophones for the heterographic tasks; this could be an interesting topic for future work, as could a test of higher-order associations between word pairs. Nevertheless, with its improved results, our system provides a step towards further pun understanding and interpretation, and may be incorporated into machine translation in the future.

Acknowledgments

We thank the anonymous reviewers for their suggestions on the submission; the paper was partially supported by the German Academic Exchange Service and partially supported by base.camp at Universität Hamburg.

References

Yitao Cai, Yin Li, and Xiaojun Wan. 2018. Sense-aware neural models for pun location in texts. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 546–551, Melbourne, Australia.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, MN, USA.

Yufeng Diao, Hongfei Lin, Liang Yang, Xiaochao Fan, Di Wu, Dongyu Zhang, and Kan Xu. 2019. Heterographic pun recognition via pronunciation and spelling understanding gated attention network. In The World Wide Web Conference, pages 363–371, San Francisco, CA, USA.

Samuel Doogan, Aniruddha Ghosh, Hanyang Chen, and Tony Veale. 2017. Idiom Savant at SemEval-2017 Task 7: Detection and interpretation of English puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 103–108, Vancouver, Canada.

Christian F. Hempelmann. 2008. Computational humor: Beyond the pun. The Primer of Humor Research. Humor Research, 8:333–360.

Vijayasaradhi Indurthi and Subba Reddy Oota. 2017. Fermi at SemEval-2017 Task 7: Detection and interpretation of homographic puns in English language. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 457–460, Vancouver, Canada.

Justine T. Kao, Roger Levy, and Noah D. Goodman. 2016. A computational model of linguistic humor in puns. Cognitive Science, 40(5):1270–1285.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning, pages 1188–1196, Beijing, China.
Junyu Mao, Rongbo Wang, Xiaoxi Huang, and Zhiqun Chen. 2020. Compositional semantics network with multi-task learning for pun location. IEEE Access, 8:44976–44982.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, Lake Tahoe, NV, USA.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.

Tristan Miller and Iryna Gurevych. 2015. Automatic disambiguation of English puns. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 719–729, Beijing, China.

Tristan Miller, Christian Hempelmann, and Iryna Gurevych. 2017. SemEval-2017 Task 7: Detection and interpretation of English puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 58–68, Vancouver, Canada.

Dieke Oele and Kilian Evang. 2017. Buzzsaw at SemEval-2017 Task 7: Global vs. local context for interpreting and locating homographic English puns with sense embeddings. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 444–448, Vancouver, Canada.

Ted Pedersen. 2017. Duluth at SemEval-2017 Task 7: Puns upon a midnight dreary, lexical semantics for the weak and weary. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 416–420, Vancouver, Canada.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar.

John Pollack. 2011. The Pun Also Rises: How the Humble Pun Revolutionized Language, Changed History, and Made Wordplay More Than Some Antics. Penguin, New York, NY, USA.

Özge Sevgili, Nima Ghotbi, and Selma Tekir. 2017. N-Hance at SemEval-2017 Task 7: A computational approach using word association for puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 436–439, Vancouver, Canada.

Julia M. Taylor and Lawrence J. Mazlack. 2004. Computationally recognizing wordplay in jokes. In Proceedings of the Annual Meeting of the Cognitive Science Society, 26, Chicago, IL, USA.

Olga Vechtomova. 2017. UWaterloo at SemEval-2017 Task 7: Locating the pun using syntactic characteristics and corpus-based metrics. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 421–425, Vancouver, Canada.

Yuhuan Xiu, Man Lan, and Yuanbin Wu. 2017. ECNU at SemEval-2017 Task 7: Using supervised and unsupervised methods to detect and locate English puns. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 453–456, Vancouver, Canada.

Yichao Zhou, Jyun-Yu Jiang, Jieyu Zhao, Kai-Wei Chang, and Wei Wang. 2020. "The Boating Store Had Its Best Sail Ever": Pronunciation-attentive contextualized pun recognition. arXiv preprint arXiv:2004.14457.

Yanyan Zou and Wei Lu. 2019. Joint detection and location of English puns. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2117–2123, Stroudsburg, PA, USA.