<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">NLPalma Joker 2024: Yet, no Humor with Humorousness - Task 2 Humour Classification According to Genre and Technique</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Victor</forename><forename type="middle">Manuel</forename><surname>Palma-Preciado</surname></persName>
							<email>victorpapre@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Centro de Investigacion en Computacion (CIC)</orgName>
								<orgName type="institution">Instituto Politécnico Nacional (IPN)</orgName>
								<address>
									<settlement>Mexico City</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Université de Bretagne Occidentale</orgName>
								<address>
									<settlement>HCTI</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carolina</forename><surname>Palma-Preciado</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Centro de Investigacion en Computacion (CIC)</orgName>
								<orgName type="institution">Instituto Politécnico Nacional (IPN)</orgName>
								<address>
									<settlement>Mexico City</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Grigori</forename><surname>Sidorov</surname></persName>
							<email>sidorov@cic.ipn.mx</email>
							<affiliation key="aff0">
								<orgName type="department">Centro de Investigacion en Computacion (CIC)</orgName>
								<orgName type="institution">Instituto Politécnico Nacional (IPN)</orgName>
								<address>
									<settlement>Mexico City</settlement>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">NLPalma Joker 2024: Yet, no Humor with Humorousness - Task 2 Humour Classification According to Genre and Technique</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">128DEC90B9F433462073B44D6869A221</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>BERT</term>
					<term>Natural language processing</term>
					<term>Humour classification</term>
					<term>Humour</term>
					<term>Wordplays</term>
					<term>Jokes</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following work describes our team's participation in JOKER 2024, which focuses on developing various methods for classifying texts that exhibit different techniques and humorous intentions. Understanding such aspects of humor can often be challenging even for human beings. By classifying humor into these categories, we aim to establish more robust classification methods, which can be applied across various fields of study. Current models offer high potential for training and fine-tuning on complex tasks like humor classification, ranging from the traditional use of Convolutional Neural Networks (CNNs) to the widely used modern Transformer paradigm of BERT-like models. The results were mixed, as different approaches were chosen. We believe that, given their performance, the models can still be optimized and their accuracy improved. Overall, the results are satisfactory for a first approach using the usual BERT-like models and embeddings such as USE with a CNN.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The main objective of this work is to find robust methods to achieve good results in Task 2, "Classification According to Genre and Technique", of JOKER CLEF 2024 <ref type="bibr" target="#b4">[5]</ref> for different types of humor. In Task 2, the model must classify sentences containing a wide range of humorous constructs into different classes. This task is based on the English dataset of JOKER 2024 <ref type="bibr" target="#b1">[2]</ref>. The primary goal is to accurately perform multiclass classification, automatically categorizing text into the following classes: irony, sarcasm, exaggeration, incongruity-absurdity, self-deprecating, and wit-surprise. The aim is to develop a model capable of clearly identifying these classes.</p><p>The study of humor is an underexplored topic, making resources such as corpora and models trained on different kinds of humor difficult to obtain or non-existent. Humor is believed to be imbued with cultural characteristics, which increases the complexity of understanding humorous expressions and makes humor subjective and challenging to tackle.</p><p>Since humans find it difficult to generalize humor, certain features can help machines understand it in a specific way, as their ability to compute similarities is stronger than that of humans. Consequently, humor detection methods can infer some aspects of humor better than humans, who interpret it subjectively. Access to a sufficiently robust dataset and baseline provides a measure by which to scale studies in the field of humor <ref type="bibr" target="#b9">[10]</ref>.</p><p>Adapting various methods and models to achieve the desired results for each class should be the main approach. It is important to note that each class contains a different number of examples. Taking this into account can ease the training process by maintaining balance among the classes. Additionally, we need to consider whether there is sufficient data to train the model. If not, we should augment the data using other methods to ensure better results.</p><p>Classifying humorous sentences presents a unique challenge because some sentences unintentionally resemble others, causing confusion. For example, irony and sarcasm can often blur together, as can incongruity-absurdity and exaggeration, making them tricky to interpret even for humans. Therefore, we need a method that excels at differentiating these nuances. Understanding the types of sentences our model needs to discern between each class is crucial, as the similarity between them can hinder and confuse the model, introducing noise that must be addressed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Materials and Methods</head><p>This section details the three key stages of the experiments. First, it describes how the dataset for the classification task was composed and processed. Next, it outlines the selection process, workflow, and configuration of the models. Finally, it details the resources used.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Dataset</head><p>The dataset provided for Task 2, as previously described, is a multiclass dataset, as it contains text labeled with six different class labels: irony (IR), sarcasm (SC), exaggeration (EX), incongruity-absurdity (AID), self-deprecating (SD), and wit-surprise (WS).</p><p>Since the aim is to compare the different approaches taken by the participants, the organizers prepared training and test subsets for this purpose. The training dataset comprises 1,742 samples, whose distribution among the classes can be seen in Figure <ref type="figure" target="#fig_0">1</ref>, where class WS has the most samples and EX has the fewest.</p><p>The models were trained on the English corpus of wordplays, which was further divided into an internal training and validation set at a 70:30 ratio (1,219:523) for stratified hold-out validation. Each model was then used to evaluate the test set provided by JOKER, comprising 6,642 sentences. </p></div>
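The stratified 70:30 hold-out split described above can be sketched in pure Python as follows. This is a minimal sketch, not the authors' actual code: the class tags match the paper, but the sample texts and split function are illustrative.

```python
import math
import random
from collections import defaultdict

def stratified_holdout(samples, labels, train_frac=0.7, seed=42):
    """Split (sample, label) pairs into train/validation sets while
    preserving per-class proportions (stratified hold-out)."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)
    train, valid = [], []
    for label, items in by_class.items():
        rng.shuffle(items)                      # shuffle within each class
        cut = math.floor(len(items) * train_frac)
        train += [(s, label) for s in items[:cut]]
        valid += [(s, label) for s in items[cut:]]
    return train, valid

# Toy data using the six JOKER class tags (IR, SC, EX, AID, SD, WS).
texts = [f"joke {i}" for i in range(100)]
tags = [["IR", "SC", "EX", "AID", "SD", "WS"][i % 6] for i in range(100)]
train, valid = stratified_holdout(texts, tags)
print(len(train), len(valid))   # 66 34
```

Because the split is done per class, each humour category keeps roughly the same share of samples in both subsets, which is what the 1,219:523 partition aims for.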
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Model selection</head><p>As part of the model selection, different approaches were taken; the treatments are structured as follows:</p><p>• Transformers Paradigm Models (BERT, multilingual BERT, DistilBERT, among others). • CNN (Convolutional Neural Network) + Universal Sentence Encoder.</p><p>Since experiments can be time-consuming, it is often necessary to repeat them to improve results. In particular, generating embeddings can take a long time. Therefore, using increasingly powerful computers becomes a necessity to reduce computation times and push the models further, allowing multiple experiments to run simultaneously. In this case, the Google Colab Pro environment allows the utilization of GPU resources without runtime limits, facilitating the experimentation stage.</p><p>The best-performing methods were taken for submission: the first treatments were based on one of the best-known constructs, BERT-like transformers, and the second on a fairly simple CNN structure paired with a powerful embedding representation (USE).</p><p>A multilingual model such as multilingual BERT, pre-trained on 104 languages, was initially considered for training <ref type="bibr" target="#b7">[8]</ref>. However, with the aim of finding the best model during evaluation, it was decided to keep the model but apply a separate approach. 
As a result, two models were trained: one specifically for English and one with multilingual capabilities, both utilizing the same BERT architecture <ref type="bibr" target="#b5">[6]</ref>.</p><p>The models were trained with BERT <ref type="bibr" target="#b3">[4]</ref> and multilingual BERT <ref type="bibr" target="#b2">[3]</ref>, loaded from the Hugging Face transformers module with the help of the Ktrain wrapper <ref type="bibr" target="#b0">[1]</ref>, which made loading and fine-tuning the models quick and simple.</p><p>Since transformers do not require extensive preprocessing <ref type="bibr" target="#b8">[9]</ref>, the training process was relatively straightforward. However, during the fine-tuning phase, it was necessary to experiment with different parameters to achieve the best validation accuracy and loss. The best models were saved in Keras H5 format for future reference. For training the BERT-like models, the following parameters were used: a batch size of 32, 8 training epochs, and a learning rate of 5e-5.</p><p>In the case of the CNN, Keras was used, as a simple and easy way to deploy this structure. For the embeddings (USE), due to changes on the TensorFlow Hub platform, Kaggle was used to obtain them, with a different checkpoint from the one on TensorFlow Hub; in this case, 512 features were taken from the USE model, and a wide variety of optimizers were tried for the multiclass classification. The design of the network was influenced by the work of <ref type="bibr" target="#b6">[7]</ref>, with our own twist: fewer CNN blocks. After attempts with two and three stacked convolutional blocks, two blocks of a simply connected convolutional network yielded the best results, using learning rates of 3e-1 and 5e-6. We obtained mixed results, but in comparison with the BERT-like model they were not as good.</p></div>
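The BERT fine-tuning workflow described in this section can be sketched with the ktrain wrapper. This is a hedged sketch, not the authors' script: the hyperparameter constants are the ones reported above (batch size 32, 8 epochs, learning rate 5e-5), while `MAX_LEN`, the function name, and the output filename are assumptions for illustration. The heavy ktrain import happens inside the function, so the block itself only defines the configuration.

```python
# Hyperparameters reported in the paper for the BERT-like models.
BATCH_SIZE = 32
EPOCHS = 8
LEARNING_RATE = 5e-5
MAX_LEN = 128          # assumed tokenization width; not stated in the paper
CLASSES = ["IR", "SC", "EX", "AID", "SD", "WS"]

def fine_tune_bert(x_train, y_train, x_val, y_val,
                   model_name="bert-base-uncased"):
    """Fine-tune a Hugging Face BERT checkpoint via ktrain.

    Sketch only: assumes `pip install ktrain` and lists of texts/labels.
    Swap model_name for "bert-base-multilingual-uncased" to get the
    multilingual run.
    """
    import ktrain                     # lazy import: heavy dependency
    from ktrain import text

    t = text.Transformer(model_name, maxlen=MAX_LEN, class_names=CLASSES)
    trn = t.preprocess_train(x_train, y_train)
    val = t.preprocess_test(x_val, y_val)

    model = t.get_classifier()
    learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                                 batch_size=BATCH_SIZE)
    learner.fit_onecycle(LEARNING_RATE, EPOCHS)

    predictor = ktrain.get_predictor(learner.model, preproc=t)
    predictor.model.save("bert_joker.h5")   # Keras H5 format, as in the paper
    return predictor
```

The same function covers both runs (English and multilingual), matching the paper's decision to keep one architecture and vary only the checkpoint.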
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Resources</head><p>Among the resources used to train and evaluate the models, Google's Colab environment was employed. This platform enables Python programming and execution, while also providing easier access to GPUs. The use of GPU resources allowed for faster execution of the described tasks by enabling multiple simultaneous computations. The server used has the following specifications: GPU with NVIDIA-SMI 525.85.12, CUDA v12.0, and 25 GB of RAM. Even with BERT-like models that are less demanding in their use of resources, it is necessary to consider the data volume and the desired tokenization width.</p></div>
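The CNN+USE treatment from Section 2.2 can be sketched in Keras. This is a sketch under stated assumptions, not the authors' network: the 512-dimensional USE input, six output classes, two stacked convolutional blocks, and the 5e-6 learning rate come from the paper, while the filter counts, kernel size, and pooling choices are illustrative. TensorFlow is imported lazily inside the builder.

```python
# Dimensions from the paper: 512-d USE sentence embeddings, six classes.
EMBED_DIM = 512
NUM_CLASSES = 6

def build_cnn_classifier(filters=(64, 32), kernel_size=3, lr=5e-6):
    """Two stacked Conv1D blocks over USE embeddings (the configuration
    the paper found best). Assumes TensorFlow/Keras is installed;
    filter counts and kernel size are hypothetical."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        # USE yields one 512-d vector per sentence; reshape so Conv1D
        # can slide over the feature axis.
        layers.Reshape((EMBED_DIM, 1), input_shape=(EMBED_DIM,)),
        layers.Conv1D(filters[0], kernel_size, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(filters[1], kernel_size, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Embeddings would be produced once by the USE checkpoint (obtained from Kaggle, as described above) and fed to this classifier, keeping the expensive embedding step out of the training loop.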
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>This section presents the results of the internal evaluation, which used a 30% split of the training dataset for validation with both the CNN+USE and BERT models. The evaluation of the test dataset is described in the task overview document provided by the organizers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">CNN+USE</head><p>The results of the CNN together with USE for text representations are presented in Figure <ref type="figure" target="#fig_1">2</ref>. Compared to the BERT-like model paradigm, this classifier has lower performance but still achieves some acceptable results. The network's overall performance had a weighted average accuracy of 47%, which is quite low in general.</p><p>The model struggled to distinguish incongruity-absurdity from exaggeration and irony. This suggests that the embeddings did not capture the subtle characteristics that differentiate these types of humor. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">BERT model</head><p>In the case of the BERT model, two runs were conducted: one with the base model and another with the multilingual version. Both yielded similar results, with the multilingual model showing only a slight improvement.</p><p>Figure <ref type="figure" target="#fig_2">3</ref> shows the results obtained, exhibiting behavior very similar to the CNN+USE model. Exaggeration, irony, and sarcasm are the most conflicting classes, so it is expected that these classes have slightly lower performance. The wit-surprise class shows similar results in both models and has the best overall performance. The weighted average accuracy of the model was 70%. </p></div>
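The weighted averages quoted in these subsections can be recovered from confusion matrices like those in Figures 2 and 3. A small pure-Python sketch with made-up counts (not the paper's actual numbers) shows the computation:

```python
def per_class_recall(conf):
    """Diagonal / row-sum for each class of a confusion matrix
    (rows = true class, columns = predicted class)."""
    return [row[i] / sum(row) for i, row in enumerate(conf)]

def weighted_accuracy(conf):
    """Support-weighted average of per-class recall; for this
    row-oriented layout it equals overall accuracy (trace / total)."""
    total = sum(sum(row) for row in conf)
    return sum(conf[i][i] for i in range(len(conf))) / total

# Made-up 3-class matrix, purely for illustration.
conf = [
    [8, 1, 1],   # true class 0
    [2, 6, 2],   # true class 1
    [1, 1, 8],   # true class 2
]
print(per_class_recall(conf))   # [0.8, 0.6, 0.8]
print(weighted_accuracy(conf))  # 0.7333...
```

Weighting by class support matters here because the JOKER training data is imbalanced: a strong majority class (wit-surprise) can mask weak performance on the smaller ones.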
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Best Results</head><p>Table <ref type="table">1</ref> showcases the best results achieved with the BERT model. It is noteworthy that the CNN+USE network's performance was inferior to that of the BERT-like models, so its analysis is not provided in this study. Each class is represented by a sentence that embodies its distinct characteristics; to illustrate our findings, we use the top result for each one.</p><p>As depicted below, the table presents probabilities for each class alongside the corresponding text (joke). It seems that the most effective methods yield results favoring longer, more anecdotal humorous sentences, as all of them have a confidence probability above 90%. This observation is somewhat corroborated by comparison with Table <ref type="table" target="#tab_0">2</ref>, where the worst cases are presented; even the majority of these poorer results maintain a relatively high probability, typically around 40%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Best classified wordplay by class with BERT. While some sentences vividly exemplify their class, others pose ambiguity. This variability in humor structures poses a challenge for accurate sentence-joke identification. The model may occasionally struggle to discern highly similar sentences, an issue that appears somewhat neglected. For instance, it is widely recognized that sentences employing exaggerated humor typically feature magnifying adjectives. In the case of irony and sarcasm, the fine line between them is determined by the underlying context. Thus, we posit that considering these nuances could mitigate misclassifications and errors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>The results obtained through the proposed classification approaches show promise, yet further refinement could strengthen and improve them. Specifically, in the classification of humorous classes, employing BERT-like models yielded favorable outcomes; however, there remains room for improvement through more effective fine-tuning and exploration of diverse variations in BERT-like architectures. In essence, this study has yielded encouraging findings, demonstrating the potential of transformer-based models in multiclass classification tasks.</p><p>Although a detailed performance analysis with evaluation metrics is yet to be provided, the analysis has demonstrated that the model exhibits strong confidence in classifying each category, even when confronted with a heavily imbalanced original dataset. Addressing this imbalance by augmenting the data or increasing the sample size for each class could potentially enhance the model's performance.</p><p>In conclusion, while this study primarily utilized deep learning models such as CNNs and transformer architectures like BERT, it is worth noting that other architectures remain unexplored and warrant investigation. The field of machine learning continually evolves, and exploring diverse models could lead to further insights and advancements.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Train dataset frequency by classes.</figDesc><graphic coords="3,148.10,155.98,312.52,249.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Confusion matrix for the CNN+USE model.</figDesc><graphic coords="5,126.60,206.46,355.69,283.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Confusion matrix for the BERT model.</figDesc><graphic coords="6,119.50,85.05,355.69,283.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2</head><label>2</label><figDesc>Worst classified wordplay by class with BERT. It appears that sentence length influences classification. Notably, poorly classified instances tend to be concise one-liners. Hence, detailed descriptions and contextual information could prove beneficial for each class. Additionally, BERT seems to utilize question marks and exclamation points for differentiation.</figDesc><table><row><cell>Text</cell><cell>Class</cell><cell>Probability</cell></row><row><cell>Yogi had a whiskey, water, and tea drink every</cell><cell>IR</cell><cell>0.9944</cell></row><row><cell>night. He was a toddy bear.</cell><cell></cell><cell></cell></row><row><cell>How did the hipster burn his mouth? He ate his</cell><cell>SC</cell><cell>0.9864</cell></row><row><cell>pizza before it was cool.</cell><cell></cell><cell></cell></row><row><cell>Covid 19 coronavirus: Women are claiming 'boobs</cell><cell>EX</cell><cell>0.9577</cell></row><row><cell>get bigger' after having Pfizer jab</cell><cell></cell><cell></cell></row><row><cell>Someone, please help me! I'm way too young to be</cell><cell>AID</cell><cell>0.9818</cell></row><row><cell>this old already.</cell><cell></cell><cell></cell></row><row><cell>I've always known #Bez is my spirit animal but</cell><cell>SD</cell><cell>0.9871</cell></row><row><cell>seeing the mess he makes on confirms it 100%</cell><cell></cell><cell></cell></row><row><cell>what a legend that man is.</cell><cell></cell><cell></cell></row><row><cell>it's all about women in stem struggles. 
what about</cell><cell>WS</cell><cell>0.9732</cell></row><row><cell>women in interactive media struggles: i no longer</cell><cell></cell><cell></cell></row><row><cell>win against my friends in smash because they</cell><cell></cell><cell></cell></row><row><cell>major in goddamn video game.</cell><cell></cell><cell></cell></row><row><cell>Text</cell><cell>Class</cell><cell>Probability</cell></row><row><cell>I can't believe today is the last day we can be gay.</cell><cell>IR</cell><cell>0.4298</cell></row><row><cell>I started a band called 999 megabytes. We haven't gotten a gig yet.</cell><cell>SC</cell><cell>0.5050</cell></row><row><cell>Children of Karen's don't get autism because they</cell><cell></cell><cell></cell></row><row><cell>weren't vaccinated. They do however have hearing problems from listening to their moms</cell><cell>EX</cell><cell>0.4932</cell></row><row><cell>scream at managers.</cell><cell></cell><cell></cell></row><row><cell>I put my phone on vibrate. An hour later, I finally received a text message.</cell><cell>AID</cell><cell>0.4845</cell></row><row><cell>I don't know anything about Coronavirus other</cell><cell></cell><cell></cell></row><row><cell>than if you have it; you get an undeniable urge to</cell><cell>SD</cell><cell>0.3680</cell></row><row><cell>go the airport.</cell><cell></cell><cell></cell></row><row><cell>The satellite went into orbit on January 1st causing a new year's revolution.</cell><cell>WS</cell><cell>0.3714</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">ktrain: A Low-Code Library for Augmented Machine Learning</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Maiya</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.10703</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">CLEF 2024 JOKER Lab: Automatic Humour Analysis</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A-G</forename><surname>Bosser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Palma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jatowt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-56072-9_5</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-56072-9_5" />
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval. ECIR 2024</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">N</forename><surname>Goharian</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">14613</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="https://huggingface.co/google-bert/bert-base-multilingual-uncased" />
		<title level="m">google-bert/bert-base-multilingual-uncased • Hugging Face</title>
			<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="https://huggingface.co/google-bert/bert-base-uncased" />
		<title level="m">google-bert/bert-base-uncased • Hugging Face</title>
			<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of JOKER @ CLEF-2024: Automatic humour analysis</title>
		<author>
			<persName><forename type="first">Liana</forename><surname>Ermakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anne-Gwenn</forename><surname>Bosser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tristan</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><surname>Manuel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Palma</forename><surname>Preciado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Grigori</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Adam</forename><surname>Jatowt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Philippe</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Georges</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Didier</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Laure</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Giorgio</forename><surname>Maria</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Di</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Petra</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Alba</forename><surname>García Seco De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Guglielmo</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicola</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno>CoRR, abs/1810.04805</idno>
		<ptr target="http://arxiv.org/abs/1810.04805" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Humor Recognition Using Deep Learning</title>
		<author>
			<persName><forename type="first">Peng-Yu</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Von-Wun</forename><surname>Soo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>New Orleans, Louisiana</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="113" to="117" />
		</imprint>
	</monogr>
	<note>Short Papers</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">How Multilingual is Multilingual BERT</title>
		<author>
			<persName><forename type="first">T</forename><surname>Pires</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schlinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garrette</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/p19-1493</idno>
		<ptr target="https://doi.org/10.18653/v1/p19-1493" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Attention is All you Need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1706.03762v5" />
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
		<respStmt>
			<orgName>Cornell University ; Cornell University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">En arXiv</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">M</forename><surname>Palma Preciado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Palma Preciado</surname></persName>
		</author>
		<title level="m">Assessing Wordplay-Pun classification from JOKER dataset with pretrained BERT humorous models</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1828" to="1833" />
		</imprint>
	</monogr>
	<note>CLEF</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
