N&&N at HAHA@IberLEF2021: Determining the Mechanism of Spanish Tweets using ColBERT

Nedaa Alsalman and Noor Ennab
Jordan University of Science and Technology, Irbid, Jordan
{nmalsalman19, nsabuennab20}@cit.just.edu.jo

Abstract. Humor detection, deciding whether a tweet, a sentence, or even a single word is humorous, is valuable in modern technologies, for example in the evaluation of chatbots and personal assistants. We participated in the HAHA@IberLEF2021 competition on detecting humor in Spanish tweets: Task 1, which targets the general linguistic structure of humor, and Task 3, which targets the mechanism of these tweets. We used the dataset provided by the competition, about 24,000 tweets. For Task 1, our approach used ColBERT to generate an embedding for each tweet and fed these embeddings into the hidden layers of a neural network that helps decide whether the tweet is humorous. For Task 3, we used a sequential BERT model to detect the mechanism of humor. In the evaluation phase of the competition our results were 0.7693 for Task 1 (position 12) and 0.0404 for Task 3 (position 9).

Keywords: ColBERT · Humor · Mechanism · Spanish · Tweets.

1 Introduction

Humor is the experience of provoking laughter and providing entertainment. It is an important aspect of human behavior that can be expressed in several ways, including text, gestures, and voice signals, and it has received a lot of research attention in areas such as philosophy, psychology, sociology, and linguistics. For computational text, several methodologies have been developed and studied for classification, including social media text, to identify the meanings of words and to learn about cultures and the ways of expressing them, based on machine learning or deep learning, a recent research topic in natural language processing.
Previous studies on computational text include analyzing feelings such as irony expressed through figurative language [7], classifying tweets according to which are the most humorous [9], and automatically detecting humorous tweets and rating them by funniness [18][6]. The subject of humor has an ancient history in the human archives: to say that humor is the remarkable characteristic of mankind is no exaggeration, and it has been a subject of debate for at least two or three thousand years. Humor is an umbrella term that captures the irony of laughter together with sarcastic humor and comedy [17].

IberLEF 2021, September 2021, Málaga, Spain. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this paper we target Task 1 and Task 3 of HAHA@IberLEF2021. Task 1, humor detection, decides whether a tweet is humorous or not; a second prediction, the funniness score, assigns a funniness class to each tweet. These tasks are similar to the tasks of HAHA@IberLEF 2018 and 2019, while the other tasks are new. The main goal of this paper is classifying the mechanism of humor: after a tweet is classified as funny, its mechanism is predicted from a set of classes such as irony, pun, or exaggeration; only one class per tweet is allowed in this task [5]. Figure 1 shows the task objectives of our work.

Figure 1: Tasks Workflow

Our approach uses a sequential BERT model together with some preprocessing techniques to encode text into word embeddings, represented as an input layer and hidden layers that help capture the content of the text and its features; the layers are then combined to predict the correct output. Beyond obtaining an accurate model, we evaluate model performance on humor mechanism detection through the F1 score and macro-F1 score. The structure of the paper is as follows: Section 2 reviews related work, and Section 3 gives an overview of our participation.
Section 4 describes the data, Section 5 presents the methodology and our experimental results, and Section 6 concludes.

2 Related work

Ghosh et al. (2015) presented SemEval-2015 Task 11 on sentiment analysis of figurative language in tweets, covering the detection of sarcasm and metaphor; models were evaluated using cosine similarity and a mean-squared-error measure [7]. Han et al. (2017) participated in SemEval-2017 Task 6, which aims to capture a sense of humor in text and classify it as humorous or not, using a multinomial Naïve Bayes model and measuring performance with macro-averaged recall [9], while Yan et al. (2017) participated in the same task using n-gram language models, with bigram models outperforming trigram models on the evaluation data [18]. Fleşcan-Lovin-Arseni et al. (2017) participated in the task of finding humorous tweets by comparing and ordering them according to which is most humorous, using Naïve Bayes and neural network algorithms [6], while Potash et al. (2017) addressed the same task by extracting features of the tweets and determining their humorousness with a neural network model [14]. Ahuja et al. (2018) proposed classical approaches to classify whether text is humorous, including dark humor, applying machine learning techniques to a large dataset of jokes and comparing the accuracies achieved by the models, with SVM performing best [1]. Liu et al. (2018) proposed exploiting syntactic structure to support humor recognition and achieved significant improvements; some features of the syntactic structure are consistently linked to humor, which indicates the importance of these linguistic phenomena [12]. Khandelwal et al.
(2018) suggested detecting humorous tweets in code-mixed languages such as Hindi and English, classifying them as humorous (H) or non-humorous (N) using n-grams and bag-of-words features with a support vector machine model [11]. Castro et al. (2018), the organizers of HAHA at IberEval 2018, provide an overview of that competition's goals and results: detecting humor in Spanish tweets through automatic detection and automatic rating [3]. Sushmitha Reddy Sane et al. (2019) addressed the same task as Khandelwal et al., but using the deep learning models CNN and BiLSTM with word2vec, taking bilingual embeddings as the input representation [15], while Chiruzzo et al. (2019), the organizers of HAHA at IberLEF 2019, provide an overview of that edition's goals and results: recognizing whether a tweet is humorous and rating its funniness [4]. Kamrul Hasan et al. (2019) proposed a new multimodal database for detecting humor across text, vision, and audio, based on natural language processing approaches [10]. Miller et al. (2020) predicted and ranked the humorousness of tweets by exploiting human preference judgments and linguistic annotations using Gaussian process preference learning, which performed well for English [13]. Winters et al. (2020) built a dataset larger and more complex than prior work, especially for Dutch: they generated texts following the patterns of jokes in three sets using several algorithms, then detected the humorous texts with several models and compared which models could differentiate the jokes, with RobBERT performing best [16]. Gupta et al.
(2021) participated in SemEval-2021 Task 7, showing that reasonable humor detection can be developed using large pre-trained language models such as BERT, RoBERTa, ERNIE 2.0, DeBERTa, and XLNet; because no widespread disaggregated dataset existed, most previous work in the field had not explored large neural models for personalized understanding of humor [8].

3 Overview

In this section, we summarize our participation as follows:
• We work on the dataset provided by HAHA@IberLEF2021, applying some preprocessing techniques to make it more suitable for training the model.
• We propose an automated approach for humor mechanism detection, then introduce the model architecture and its components in detail.
• We evaluate our model and compare its performance with previous work.

Figure 2 shows the overall workflow of this paper.

Figure 2: Paper workflow.

4 Data

The dataset CSV files contain tweets in Spanish: 24,000 tweets for training, 6,000 for development, and another 6,000 for testing. The features are the tweet ID, whether the tweet is humorous (0 or 1), a funniness score from one (not funny) to five (excellent), the mechanism the humor uses (one of a set of classes), and the target of the joke.

• Table 1 gives a numerical analysis of the data.
• Table 2 shows the number of tweets for each mechanism category.

        isHumor  votesNo  votes1  votes2  votes3  votes4  votes5
mean       0.38     2.47    0.79    0.61    0.42    0.18    0.05
std        0.48    11.98    1.80    1.61    1.70    1.00    0.39
min        0.00     0.00    0.00    0.00    0.00    0.00    0.00
25%        0.00     1.00    0.00    0.00    0.00    0.00    0.00
50%        0.00     3.00    0.00    0.00    0.00    0.00    0.00
75%        1.00     3.00    1.00    1.00    0.00    0.00    0.00
max        1.00   1290.0   183.0   161.0   208.0   129.0   46.0
Table 1. Numerical analysis of the data.

mechanism type      number in dataset
wordplay            28
unmasking           25
misunderstanding    25
reference           19
absurd              17
exaggeration        16
analogy             14
irony               13
stereotype           9
parody               9
embarrassment        9
insults              7
Table 2. Number of tweets for each mechanism category.
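Table 1 has the shape of a standard descriptive-statistics summary. As a minimal sketch of how such a summary can be computed (the values below are toy numbers, not the real HAHA dataset):

```python
import statistics

def describe(values):
    """Compute a Table-1-style summary for one numeric column."""
    qs = statistics.quantiles(values, n=4, method="inclusive")  # 25%, 50%, 75%
    return {
        "mean": round(statistics.mean(values), 2),
        "std": round(statistics.stdev(values), 2),
        "min": min(values), "25%": qs[0], "50%": qs[1], "75%": qs[2],
        "max": max(values),
    }

is_humor = [0, 0, 1, 0, 1]  # hypothetical isHumor column, NOT the real data
print(describe(is_humor))
```

The same function applied column by column (isHumor, votesNo, votes1, ...) yields the rows of Table 1.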
4.1 Data preprocessing

Preprocessing techniques are applied according to the type of the text data. We start by dividing the given content into smaller parts called tokens; each word, number, and punctuation mark can be considered a token. Tokenization is very useful for finding patterns and is a base step for stemming and lemmatization; the tokenization process also helps change sensitive information into non-sensitive components. We then remove stop words, the most common words of a language that carry little meaning, and reduce derived words to their base or root form. For word embeddings we use pre-trained GloVe (Global Vectors for Word Representation), an unsupervised learning method for obtaining vector representations of words: it lets us take the words of the text and map each word of the corpus to a point in a high-dimensional space, so that similar words are grouped together. More precisely, words with the same meaning end up close to each other.

5 Methodology

This section presents our approach for determining whether tweets are humorous and then finding their mechanism. From a technical viewpoint, we use the ColBERT modeling algorithm, which takes the input and predicts the target value.

5.1 Analysis of humor structure

We start by examining the structure of the tweets in order to understand them, finding words that help determine whether the text is humorous and that help identify its mechanism. For example, "Necesito con suma urgencia un calentamiento global para mi casa" is a Spanish tweet meaning "I urgently need global warming for my home". Several approaches have been taken to understand the structure of humor and to classify whether a text is humorous and what its mechanism is; our approach is described in the next subsection.
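As a concrete illustration, the preprocessing of Section 4.1 applied to the example tweet above can be sketched as follows; the stop-word list and symbol rules here are illustrative assumptions, not the exact ones used in our system:

```python
import re

# Toy Spanish stop-word list (assumption, for illustration only).
STOP_WORDS = {"un", "una", "el", "la", "de", "con", "para", "mi"}

def preprocess(tweet: str) -> list:
    """Lowercase, replace symbols with spaces, tokenize, drop stop words."""
    text = tweet.lower()
    text = re.sub(r"[/()\[\]|@,;]", " ", text)        # symbols -> spaces
    tokens = re.findall(r"[a-záéíóúüñ0-9]+", text)    # keep word characters
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Necesito con suma urgencia un calentamiento global para mi casa"))
# -> ['necesito', 'suma', 'urgencia', 'calentamiento', 'global', 'casa']
```

The surviving content words are what the embedding step then maps into the high-dimensional GloVe space.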
5.2 Models

Following the structure analysis, we try to understand the structure of the tweets to find the meaning that can help determine which tweets are humorous. We extract features from each tweet and feed them as inputs to the model, which processes them sequentially and represents them as layers; activation functions in the neural network control the layers so that a correct prediction is produced at the target output layer. The classification model uses several paths to extract features for the hidden layers and produces as final output the status of the tweet, as required. Figure 3 displays the architecture of the ColBERT model, which consists of the following steps:
• First, split the sentence into left and right parts and process them sequentially, word by word, to find contextual relations between them, then tokenize and encode them with BERT. BERT's language-model pre-training randomly selects a small fraction of the input tokens (15% in standard BERT) to predict; of these, about 80% are replaced with the [MASK] token, 10% with a random token, and 10% are left as the original input word.
• Take the tokens as input and combine them with word embeddings; we use further layer types, flatten and dense, as hidden layers of the neural network that sequentially extract features, with activation functions driving the work of each layer.
• Predict the output of each sentence after combining all layers, represented as a vector of numbers. The model then determines whether the sentence is humorous and finds its mechanism.
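The embedding, flatten, and dense stages described above can be illustrated with a toy forward pass; all dimensions and weights here are random placeholders, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, much smaller than the real model).
vocab_size, embed_dim, seq_len, hidden, n_classes = 100, 8, 5, 16, 2
E = rng.normal(size=(vocab_size, embed_dim))         # embedding matrix
W1 = rng.normal(size=(seq_len * embed_dim, hidden))  # dense hidden layer
W2 = rng.normal(size=(hidden, n_classes))            # output layer

def forward(token_ids):
    """Embed the tokens, flatten, apply a ReLU dense layer, then softmax."""
    x = E[token_ids].reshape(-1)      # embedding lookup + flatten
    h = np.maximum(0.0, x @ W1)       # dense hidden layer with ReLU
    logits = h @ W2
    p = np.exp(logits - logits.max())
    return p / p.sum()                # softmax: P(humorous), P(not humorous)

probs = forward(np.array([3, 17, 42, 7, 0]))  # a toy tokenized tweet
print(probs)  # two class probabilities summing to 1
```

In the real system the embedding matrix is initialized from pre-trained vectors and the dense layers are trained end to end, but the data flow is the same.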
Figure 3: ColBERT architecture [2]

5.3 Implementation Notes

To obtain clean, impurity-free text, we apply some preprocessing techniques from Keras for textual data:
• Convert the text to lowercase, then remove stop words such as "is", "a", "the"; replace symbols such as /()[]|@,; with spaces and remove characters outside 0-9, a-z, +, - from the text.
• Tokenize the data keeping the top 10,000 most common words, cut off sentences after 50 words, then pad the sequences; convert the categorical humor-mechanism labels to dummy values (e.g. irony encoded as 01011); finally, shuffle the data.
• Use the unsupervised GloVe algorithm for word embeddings, which groups similar words together.
• Use the model type 'bert-base-uncased' to determine whether the text is humorous: after tokenizing the data, we fit the model with epochs=50, batch_size=32, validation_split=0.1, obtaining an accuracy of about 65%; we therefore tried different settings to get the best result.
• Sequential BERT: determine the number of dense layers, select the activation function type, and add dropout (regularization) to avoid overfitting during training; load the pre-trained word embeddings into the embedding layer and freeze the other layers.
• Fit the model with learning rate 1e-4, loss='categorical_crossentropy', epochs=200, batch_size=1000, validation_split=0.1.
• Evaluate the model in our Colab using TP, TN, FP, FN, the F1 score, and MSE from sklearn to find accuracy, then evaluate accuracy and the loss function with batch_size=32 and verbose=1.

5.4 Experimental Results

Following the implementation notes above, our experiments in the development stage for Task 1 used the ColBERT model with the model type 'bert-base-uncased'.
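The F1 and macro-F1 metrics used to score the tasks can be sketched in pure Python; this is a minimal stand-in for the sklearn calls mentioned in the implementation notes, run here on toy labels rather than the real task data:

```python
def f1(y_true, y_pred, positive):
    """Per-class F1 from true positives, false positives, false negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, as used for Task 3."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1(y_true, y_pred, c) for c in classes) / len(classes)

# Toy mechanism labels (hypothetical, for illustration only).
y_true = ["irony", "pun", "irony", "absurd"]
y_pred = ["irony", "irony", "irony", "absurd"]
print(round(macro_f1(y_true, y_pred), 4))  # -> 0.6
```

Macro averaging gives every mechanism class equal weight, which is why rare classes in Table 2 weigh heavily on the Task 3 score.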
Our F1 metric for Task 1 in the evaluation phase was 0.7693, placing 12th; for comparison, the corresponding F1 metrics were 0.7972 in HAHA 2018 and 0.821 in HAHA 2019. For Task 3, the new task and the main goal of this paper, our F1 metric was 0.0404, placing 9th.

6 Conclusion

To say that humor is the remarkable characteristic of mankind is no exaggeration; it has been a subject of debate for at least two or three thousand years. We used sequential BERT to detect the mechanism of humor, based on the dataset given in the HAHA@IberLEF2021 competition on detecting humor in Spanish tweets through the general linguistic structure of humor, and we used ColBERT to generate word embeddings from these tweets, feeding these embeddings into the hidden layers of a neural network to find our target output.

References

1. Vikram Ahuja, Taradheesh Bali, and Navjyoti Singh. What makes us laugh? Investigations into automatic humor classification. In Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, pages 1–9, 2018.
2. Issa Annamoradnejad and Gohar Zoghi. ColBERT: Using BERT sentence embedding for humor detection. arXiv preprint arXiv:2004.12765, 2020.
3. Santiago Castro, Luis Chiruzzo, and Aiala Rosá. Overview of the HAHA task: Humor analysis based on human annotation at IberEval 2018. In IberEval@SEPLN, pages 187–194, 2018.
4. Luis Chiruzzo, Santiago Castro, Mathias Etcheverry, Diego Garat, Juan José Prada, and Aiala Rosá. Overview of HAHA at IberLEF 2019: Humor analysis based on human annotation. In IberLEF@SEPLN, pages 132–144, 2019.
5. Luis Chiruzzo, Santiago Castro, Santiago Góngora, Aiala Rosá, J. A. Meaney, and Rada Mihalcea. Overview of HAHA at IberLEF 2021: Detecting, rating and analyzing humor in Spanish. Procesamiento del Lenguaje Natural, 67(0), 2021.
6.
Iuliana Alexandra Fleşcan-Lovin-Arseni, Ramona Andreea Turcu, Cristina Sirbu, Larisa Alexa, Sandra Maria Amarandei, Nichita Herciu, Constantin Scutaru, Diana Trandabat, and Adrian Iftene. #WarTeam at SemEval-2017 Task 6: Using neural networks for discovering humorous tweets. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 407–410, 2017.
7. Aniruddha Ghosh, Guofu Li, Tony Veale, Paolo Rosso, Ekaterina Shutova, John Barnden, and Antonio Reyes. SemEval-2015 Task 11: Sentiment analysis of figurative language in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 470–478, 2015.
8. Aishwarya Gupta, Avik Pal, Bholeshwar Khurana, Lakshay Tyasgi, and Ashutosh Modi. Humor@IITK at SemEval-2021 Task 7: Large language models for quantifying humor and offensiveness. arXiv preprint arXiv:2104.00933, 2021.
9. Xiwu Han and Gregory Toner. QUB at SemEval-2017 Task 6: Cascaded imbalanced classification for humor analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 380–384, 2017.
10. Md Kamrul Hasan, Wasifur Rahman, Amir Zadeh, Jianyuan Zhong, Md Iftekhar Tanveer, Louis-Philippe Morency, et al. UR-FUNNY: A multimodal language dataset for understanding humor. arXiv e-prints, pages arXiv–1904, 2019.
11. Ankush Khandelwal, Sahil Swami, Syed S. Akhtar, and Manish Shrivastava. Humor detection in English-Hindi code-mixed social media content: Corpus and baseline system. arXiv preprint arXiv:1806.05513, 2018.
12. Lizhen Liu, Donghai Zhang, and Wei Song. Exploiting syntactic structures for humor recognition. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1875–1883, 2018.
13. Tristan Miller, Erik-Lân Do Dinh, Edwin Simpson, and Iryna Gurevych. Predicting the humorousness of tweets using Gaussian process preference learning. Procesamiento del Lenguaje Natural, 64:37–44, 2020.
14.
Peter Potash, Alexey Romanov, and Anna Rumshisky. SemEval-2017 Task 6: #HashtagWars: Learning a sense of humor. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 49–57, 2017.
15. Sushmitha Reddy Sane, Suraj Tripathi, Koushik Reddy Sane, and Radhika Mamidi. Deep learning techniques for humor detection in Hindi-English code-mixed tweets. In Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 57–61, 2019.
16. Thomas Winters and Pieter Delobelle. Dutch humor detection by generating negative examples. arXiv preprint arXiv:2010.13652, 2020.
17. Zhihui Wu. The laughter-eliciting mechanism of humor. English Linguistics Research, 2(1):52–63, 2013.
18. Xinru Yan and Ted Pedersen. Duluth at SemEval-2017 Task 6: Language models in humor detection. arXiv preprint arXiv:1704.08390, 2017.