<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">NLP&amp;IR@UNED at CheckThat! 2021: Check-worthiness estimation and fake news detection using transformer models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Juan</forename><forename type="middle">R</forename><surname>Martinez-Rico</surname></persName>
							<email>jrmartinezrico@invi.uned.es</email>
							<affiliation key="aff0">
								<orgName type="department">Dpto. Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="laboratory">NLP &amp; IR Group</orgName>
								<orgName type="institution">Universidad Nacional de Educación a Distancia (UNED)</orgName>
								<address>
									<postCode>28040</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Juan</forename><surname>Martinez-Romo</surname></persName>
							<email>juaner@lsi.uned.es</email>
							<affiliation key="aff0">
								<orgName type="department">Dpto. Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="laboratory">NLP &amp; IR Group</orgName>
								<orgName type="institution">Universidad Nacional de Educación a Distancia (UNED)</orgName>
								<address>
									<postCode>28040</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Instituto Mixto de</orgName>
								<orgName type="institution">Investigación -Escuela Nacional de Sanidad (IMIENS)</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lourdes</forename><surname>Araujo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dpto. Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="laboratory">NLP &amp; IR Group</orgName>
								<orgName type="institution">Universidad Nacional de Educación a Distancia (UNED)</orgName>
								<address>
									<postCode>28040</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Instituto Mixto de</orgName>
								<orgName type="institution">Investigación -Escuela Nacional de Sanidad (IMIENS)</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">NLP&amp;IR@UNED at CheckThat! 2021: Check-worthiness estimation and fake news detection using transformer models</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C195CC39FB90E342FBF80E7152AA646B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>check-worthiness</term>
					<term>fake news detection</term>
					<term>transformer models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This article describes the different approaches used by the NLP&amp;IR@UNED team in the CLEF 2021 CheckThat! Lab to tackle the tasks 1A-English, 1A-Spanish and 3A-English. The goal of Task 1A in English is to determine which tweets within a set of COVID-19 related tweets are worth checking. Task 1A in Spanish is similar, but in this case the tweets are related to political issues in Spain. In both tasks, transformer models have been used to identify check-worthy tweets, obtaining first place in the English task and fourth place in the Spanish task. Task 3A is focused on determining the veracity of a news article. It is a multi-class classification problem with four possible values: true, partially false, false, and other. For this task we have used two different approaches: a gradient-boosting classifier with TF-IDF and LIWC features, and a transformer model fed with the first tokens of each news article. We obtained fourth place out of 25 participants in this task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Despite the efforts made in recent times to combat the proliferation of fake news, it has not stopped growing, taking advantage of events conducive to its dissemination, such as the current pandemic or the last presidential elections in the United States. Initiatives such as this CheckThat! Lab <ref type="bibr" target="#b0">[1]</ref> <ref type="bibr" target="#b1">[2]</ref> are therefore valuable, since they give researchers in this area of natural language processing the opportunity to propose and share ideas that can help mitigate the problem.</p><p>In this article, we present the approaches used by our team in the check-worthiness and fake news detection tasks. Since transformer models have become a fundamental tool that adapts to many natural language processing tasks with state-of-the-art results, we chose them as our first option in each of the tasks. However, in Task 3A we decided to also use more classical approaches, since the length of the news articles to be checked exceeds the input sequence size that it is reasonable to define in a transformer model.</p><p>The rest of the article is organized as follows: Section 2 briefly describes transformer models and the approach we used in Tasks 1A-English and 1A-Spanish, and comments on the results obtained; Section 3 explains our approach to the fake news detection task and discusses its results; and Section 4 contains our conclusions and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Transformers for Check-Worthiness</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Previous Approaches in the Check-Worthiness Task</head><p>Among the approaches used to tackle this task we can highlight the initial work of <ref type="bibr" target="#b2">[3]</ref>, which makes use of classifiers such as Random Forest, SVM or Multinomial Naive Bayes, with features based on TF-IDF representations, part-of-speech tags, sentiment scores, and entity types. To these methods <ref type="bibr" target="#b3">[4]</ref> adds features such as the average embedding vector of the sentence, linguistic features that count the number of words in the sentence belonging to a certain lexicon, contextual features such as the position of a sentence with respect to others in a segment of text, and discourse features such as the detection of contradictions, using a deep feed-forward neural network as classifier. In past editions of this CheckThat! Lab we have seen the use of recurrent neural networks by <ref type="bibr" target="#b4">[5]</ref>, where each token is represented in three ways: through embeddings, and with part-of-speech tags and syntactic dependencies encoded as one-hot vectors. In the same edition, <ref type="bibr" target="#b5">[6]</ref> makes use of character n-gram features with a k-nearest neighbors classifier. More recently in this same Lab, transformer models began to be used for the check-worthiness task by many of the participants <ref type="bibr" target="#b6">[7]</ref>[8] <ref type="bibr" target="#b8">[9]</ref>. The next section gives a short description of this architecture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">The Transformer Model</head><p>Since their appearance as an alternative to neural machine translation models, transformer models <ref type="bibr" target="#b9">[10]</ref> have become a preferred choice over other natural language processing techniques, not only in machine translation but also in other tasks such as sequence classification, summarization, named entity recognition, text generation, extractive question answering and language modeling.</p><p>A transformer is a deep learning model that "translates" input sequences into output sequences using an encoder-decoder architecture. It uses an attention mechanism to identify the most relevant parts of the input and output sequences. Previous models such as RNNs also use an attention mechanism, but they are limited by their sequential processing of the input data. Transformers, by relying solely on the attention mechanism, do not need to process input sequences in a specific order, allowing them to process these sequences in parallel and thus reducing training times.</p><p>The model is fed with training data in the form of sequence pairs (input, target). The first is applied to the encoder block and the second to the decoder block.</p><p>In recurrent models, sequences are introduced token by token, which implicitly provides the relative position of each token in the sequence. Since transformers do not process sequences in this way, this positional information is provided to the model as an encoding added to the input and target sequences.</p><p>The encoder block is made up of a stack of n identical encoders, each of them with a self-attention layer and a feed-forward neural network. 
The decoder block is made up of the same number n of decoders, each composed of a self-attention layer, an encoder-decoder attention layer and a feed-forward neural network.</p><p>The self-attention layers allow the model to identify, within the same sequence, which tokens are most relevant to the token being considered at a given moment. In contrast, the encoder-decoder attention layer relates tokens of the input and target sequences. The attention layers are not monolithic but are composed of several attention heads that focus on different portions of the sequence.</p><p>The output of the encoder block feeds all the encoder-decoder attention layers of the decoder block, while the output of the decoder block is connected to a linear layer and this to a softmax layer that maps each position of the target sequence to the output vocabulary.</p><p>The above describes the original model; however, since its presentation, a large number of models derived from the transformer architecture have appeared. For example, one of the most successful is BERT <ref type="bibr" target="#b10">[11]</ref>, which essentially eliminates the decoder block present in transformers and, during training, masks the input sequences in such a way that it processes them bidirectionally.</p><p>Another point to highlight is that, as part of these architectures, a series of models pre-trained in an unsupervised manner on large datasets have been released. This allows us to easily apply transfer learning to different tasks such as those mentioned at the beginning of this section.</p><p>Next, we describe how we have used some of these models in the check-worthiness and fake news detection tasks.</p></div>
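The self-attention computation described above can be sketched in a few lines of NumPy. This is an illustrative single-head example with arbitrary dimensions and random weights, not the implementation used in our experiments:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model). Every token attends to every other token.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Scaled dot-product attention weights; each row sums to 1.
    A = softmax(Q @ K.T / np.sqrt(d_k))
    return A @ V, A

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)          # out: (4, 8), A: (4, 4)
```

In a real encoder, several such heads run in parallel and their concatenated outputs are passed to the feed-forward sublayer.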
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Task 1A English</head><p>The objective of Task 1A-English <ref type="bibr" target="#b11">[12]</ref> is, given a set of tweets in English related to the COVID-19 topic, to identify which tweets are worth checking by assigning a score to each of them.</p><p>To tackle this task we eliminated any metadata present in the tweets and focused only on the textual information provided.</p><p>Taking into account that all the tweets to be evaluated are about COVID-19, we searched a well-known repository of pre-trained models<ref type="foot" target="#foot_0">1</ref>, where we found one trained on tweets related to this topic.</p><p>Specifically, we used the BERTweet model <ref type="bibr" target="#b12">[13]</ref>, a BERT-architecture model initially pre-trained on 850 million English tweets using the RoBERTa <ref type="bibr" target="#b13">[14]</ref> pre-training procedure, to which the same authors applied a second 40-epoch pre-training with 23 million English tweets related to the COVID-19 topic.</p><p>To check whether a model pre-trained on the same topic and document type actually outperforms models and architectures pre-trained on more neutral datasets, we implemented a grid search procedure in which we varied the number of epochs, the batch size, and the model/architecture used. The rest of the hyperparameters were kept at the default values of each model.</p><p>Among the transformer models we tested are BERT, ALBERT <ref type="bibr" target="#b14">[15]</ref>, RoBERTa, DistilBERT <ref type="bibr" target="#b15">[16]</ref>, and Funnel-Transformer <ref type="bibr" target="#b16">[17]</ref>. 
Table <ref type="table" target="#tab_0">1</ref> shows the best results obtained for each model in terms of the mean average precision (MAP), F1, precision-recall curve (P-R) and ROC curve measures, sorted by mean average precision.</p><p>As we can see, the best behavior is obtained with the model pre-trained on tweets related to the COVID-19 topic.</p><p>We therefore selected the first two models, bertweet-covid19-base-uncased and bertweet-covid19-base-cased, and tested various values of the epsilon parameter, obtaining the best results with the value 2.5 × 10⁻⁹. These results are shown in Table <ref type="table">2</ref>.</p><p>We also found that, although we always initialized the Python, NumPy, and PyTorch random number generators with the same seeds, the same results did not always appear for a given set of parameters. Therefore, to prepare the final submissions, we did not merge the training and dev datasets into a larger one with which to train the models. Instead, we trained the models on the training dataset and evaluated them on the dev dataset, repeatedly executing the same parameter configurations and selecting the test files to submit from the best results obtained on the dev dataset, assuming that an initial random configuration that behaved well on the dev dataset would also do so on the test dataset.</p></div>
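For reference, the mean average precision used throughout these tables can be computed from a short pure-Python sketch of average precision over one ranked list of tweets; MAP is then the mean of this value over all ranked lists. The toy labels and scores below are invented:

```python
def average_precision(labels, scores):
    """Average precision of one ranked list: labels are 1 for check-worthy
    tweets, scores are the model's check-worthiness estimates."""
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    hits, total = 0, 0.0
    for rank, lab in enumerate(ranked, start=1):
        if lab:
            hits += 1
            total += hits / rank   # precision at each relevant position
    return total / hits if hits else 0.0

# Two check-worthy tweets ranked 1st and 3rd: AP = (1/1 + 2/3) / 2 ≈ 0.833
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```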
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Task 1A Spanish</head><p>In this version of Task 1A, the set of tweets is in Spanish and the tweets are related to issues of Spanish politics.</p><p>As in Task 1A English, we used several transformer models to evaluate which one best suits these types of tweets. The tested models were BERT, Electra <ref type="bibr" target="#b17">[18]</ref> and RoBERTa.</p><p>After a preliminary grid search with different pre-trained models in Spanish and different values of batch size and epochs, keeping the rest of the hyperparameters at their default values, we obtained the results shown in Table <ref type="table" target="#tab_1">3</ref>. The best results are shown for each pre-trained model.</p><p>Since the Electra model mrm8488-electricidad-base-discriminator<ref type="foot" target="#foot_1">2</ref> obtained a slightly higher result, we selected it for a more exhaustive parameter search. This Electra model is pre-trained on 20 GB of the Spanish portion of the OSCAR corpus <ref type="bibr" target="#b18">[19]</ref>.</p><p>By extracting the vocabulary from this pre-trained model, we also realized that among the first 1000 tokens there were 971 unused tokens of the form [unusedNNN].</p><p>To see if these tokens could be useful, we extracted all the out-of-vocabulary tokens of the training dataset. From this set of words, we manually selected those that seemed most relevant and had three or more occurrences, mainly names of politicians, political parties and media outlets, and hashtags used in electoral campaigns. In total, the list consisted of 197 tokens.</p><p>With this list, we created a dictionary to group tokens that correspond to the same concept. 
For example, #PINParental, pin and parental were mapped to the same PINParental token.</p><p>In this dictionary, we substituted the tokens on the right-hand side with [unusedNNN] tokens, so as to match the out-of-vocabulary tokens with the unused tokens of the model, and we replaced the out-of-vocabulary tokens using this dictionary both in the training loop and in the evaluation loop.</p><p>Unfortunately, the results obtained with this strategy were not as expected, with better results obtained without substituting out-of-vocabulary tokens. The best results obtained after repeated runs with different batch sizes and epochs are shown in Table <ref type="table" target="#tab_2">4</ref>, along with the best results obtained by substituting tokens.</p><p>[Table 5. Task 1A submission official results. 1A Spanish: MAP 0.492, MRR 1.000, RP 0.475, P@1 1.000, P@3 1.000, P@5 1.000, P@10 0.800, P@20 0.800, P@30 0.620. 1A English: MAP 0.224, MRR 1.000, RP 0.211, P@1 1.000, P@3 0.667, P@5 0.400, P@10 0.300, P@20 0.200, P@30 0.160.]</p><p>For the submissions to this Spanish version of subtask 1A, the same strategy was used as in the English version: training the model repeatedly with the same parameters and sending the configurations with the best values on the dev dataset.</p></div>
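The replacement strategy can be illustrated with the following sketch. The concept group comes from the example in the text; the exact [unusedNNN] assignment, the helper name and the sample tweet are our own, for illustration only:

```python
# One concept group from the manually built dictionary (197 tokens in total).
concept_groups = {
    "PINParental": ["#PINParental", "pin", "parental"],
    # remaining concepts omitted in this sketch
}

# Map every variant of a concept to one [unusedNNN] vocabulary slot.
replacement = {}
for i, concept in enumerate(sorted(concept_groups)):
    for variant in concept_groups[concept]:
        replacement[variant.lower()] = f"[unused{i}]"

def replace_oov(tweet):
    # Applied before tokenization in both the training and evaluation loops.
    return " ".join(replacement.get(tok.lower(), tok) for tok in tweet.split())

replace_oov("El pin parental llega al Congreso")
# → "El [unused0] [unused0] llega al Congreso"
```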
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Task 1A Results</head><p>Finally, two submissions were made for the Spanish version of Task 1A and three submissions for the English version. The official evaluation measure was mean average precision (MAP). In Spanish we obtained the fourth position among six participants while in English we obtained the first position among ten participants. The results are shown in Table <ref type="table">5</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Fake News Detection Task</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Previous Approaches in the Fake News Detection Task</head><p>The approaches to fake news detection proposed so far can be divided into three groups: knowledge-based methods, content-based methods and context-based methods.</p><p>In the first, each claim is compared with a source of evidence that supports it. The source of evidence can be a knowledge graph <ref type="bibr" target="#b19">[20]</ref>, in which case we must extract subject-predicate-object triples from the claim and verify their existence in the graph, or the information retrieved from a query to a search engine <ref type="bibr" target="#b20">[21]</ref>, in which case we must compare the retrieved information with the claim using techniques such as similarity, stance detection, contradiction detection, etc.</p><p>Content-based methods use only the textual information in the document. The features obtained can be latent, such as word or sentence embeddings, or explicit, such as TF-IDF vectors, bag-of-words vectors, word counts <ref type="bibr" target="#b21">[22]</ref>, psycho-linguistic features <ref type="bibr" target="#b22">[23]</ref>, etc. Transformers and RNNs can also be considered content-based methods that use latent features.</p><p>Context-based methods use the information surrounding the claim to verify its degree of truthfulness. Examples of such features are those based on propagation <ref type="bibr" target="#b23">[24]</ref>, on the user's reputation <ref type="bibr" target="#b24">[25]</ref>, on their profile <ref type="bibr" target="#b25">[26]</ref>, etc.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Task 3A -English</head><p>For the fake news detection task in English <ref type="bibr" target="#b26">[27]</ref>, given a set of news articles we have to classify each article into one of the following categories, taking into account the main claim of the article: true, partially false, false, or other <ref type="bibr" target="#b27">[28]</ref>[29] <ref type="bibr" target="#b29">[30]</ref>.</p><p>The organizers provided three different training datasets <ref type="bibr" target="#b30">[31]</ref>, so we joined them and held out 20% as a dev dataset, for a total of 760 training instances and 190 validation instances.</p><p>To tackle this task we used two different approaches. The first, as in the check-worthiness tasks, is to use transformer models to check whether the latent features these models extract from the documents can be related to their veracity.</p><p>The second approach is to use more classical ensemble methods together with various types of features such as TF-IDF and LIWC.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Transformer approach</head><p>A grid search was carried out with four different transformer models: ALBERT, BERT, DistilBERT and Funnel-Transformer, and with different batch sizes and numbers of epochs.</p><p>Given that one of the limitations of transformer models is the length of the sequence they accept as input, we assumed that the relevant information for each news article is likely to be found at its beginning. We therefore extracted the first 150 and 200 tokens of each article as input for the models. We also tried using the first 150 tokens of the article title as input; as some instances had no title, in those cases we used the first 150 tokens of the article text. The four possible class values were converted to integer values so that they could be processed correctly.</p><p>Table <ref type="table" target="#tab_3">6</ref> shows the best results obtained for each transformer model. Given that this is a multi-class classification problem, we used precision, recall and F1 as evaluation measures, taking the last as the main one. As can be seen, the title of the article does not seem to contain enough information about its veracity, and a longer sequence length provides better results, as expected.</p></div>
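The input preparation just described can be sketched as follows. Whitespace splitting stands in for the model tokenizer, and the field names and label-to-integer mapping are assumptions for illustration:

```python
LABELS = {"true": 0, "partially false": 1, "false": 2, "other": 3}

def make_example(article, max_tokens=200, use_title=False):
    # Use the title if requested, falling back to the text when it is missing.
    source = article.get("title") if use_title else article.get("text", "")
    if not source:
        source = article.get("text", "")
    tokens = source.split()[:max_tokens]      # keep only the first tokens
    return " ".join(tokens), LABELS[article["label"]]

# An instance with no title falls back to its text.
x, y = make_example({"title": None, "text": "A long news article ...",
                     "label": "partially false"}, max_tokens=150, use_title=True)
```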
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Ensemble approach</head><p>In this second approach we use the random forest <ref type="bibr" target="#b31">[32]</ref> and gradient boosting <ref type="bibr" target="#b32">[33]</ref> classifiers. We extracted the text of each article and processed it with the LIWC2015 <ref type="bibr" target="#b33">[34]</ref> text analysis tool, obtaining a total of 93 discrete features<ref type="foot" target="#foot_2">3</ref> such as Analytic, Clout, Authentic, Tone, etc. The use of LIWC in this task is motivated by the premise that false articles may have certain linguistic features that are not present in legitimate articles, which can be reflected in the results offered by this tool. We also extracted TF-IDF vectors as features from the text of the articles.</p><p>To build the last feature set, for each article we performed a Google search using the article title as query terms.</p><p>From the first 20 results obtained, we extracted the domain names from each URL and concatenated them, separated by spaces, constructing text strings of the form "www.politifact.com www.reuters.com www.nytimes.com apnews.com ... ". With these strings we also built a TF-IDF representation. The assumption is that if domain names of sites dedicated to fact-checking appear among the first 20 results, the article is at least suspected of containing some controversy. On the other hand, if the domain names belong to prestigious media, the original article, true or false, may be important.</p><p>To select the proper configuration, we kept the LIWC features fixed and optionally concatenated the text TF-IDF features and the domain name TF-IDF features.</p><p>For Random Forest, the number of estimators was set to 100, the maximum tree depth to 1000, and Gini impurity was used as the criterion to evaluate split quality. 
For Gradient Boosting, the number of estimators was also set to 100, and deviance was used as the loss function. The results of these tests are shown in Table <ref type="table" target="#tab_4">7</ref>.</p><p>As can be seen, the Gradient Boosting classifier is superior to Random Forest in all feature configurations. It is also able to take advantage of the information provided by all the concatenated features, while the Random Forest classifier obtains its best result when only the LIWC features are used. </p></div>
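The construction of the domain-name strings can be sketched with the standard library; the URLs below are invented stand-ins for the first results of a Google search on an article title:

```python
from urllib.parse import urlparse

# Hypothetical first search results for one article.
search_results = [
    "https://www.politifact.com/factchecks/2021/example-claim/",
    "https://www.reuters.com/world/example-story/",
    "https://apnews.com/article/example",
]

# Concatenate the domain of each of the first 20 results into one string;
# these strings are then fed to a TF-IDF vectorizer as a separate feature set.
domain_string = " ".join(urlparse(url).netloc for url in search_results[:20])
# → "www.politifact.com www.reuters.com apnews.com"
```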
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Task 3A Results</head><p>In this task we made three submissions. The first was generated by Gradient Boosting with the three types of features: LIWC, domain name TF-IDF and text TF-IDF. The second submission used an ALBERT transformer with the albert-base language model, taking the article text as input with a sequence length of 150. The primary submission used the same type of transformer but with a sequence length of 200.</p><p>With the best of these submissions we achieved an F1-macro measure of 0.468, which places us in fourth position among 25 participants.</p><p>Table <ref type="table">8</ref> shows our reproduction of the results obtained by the three submissions. Unlike what happened on the dev dataset, on the test dataset the best model was the Gradient Boosting classifier that uses the features based on LIWC, domain name TF-IDF and text TF-IDF. This tells us that although transformer models can perform well in the fake news detection task with little or no feature engineering, the use of text analysis tools like LIWC along with other handcrafted features can still be useful for profiling fake news.</p></div>
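The F1-macro measure used for the official ranking averages per-class F1 with equal weight, so minority classes count as much as frequent ones. A minimal sketch over the four task classes, with toy labels rather than task data:

```python
def f1_macro(y_true, y_pred, classes):
    # Per-class F1 from true/false positives and negatives, averaged equally.
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

CLASSES = ("true", "partially false", "false", "other")
score = f1_macro(["true", "false", "false", "other"],
                 ["true", "false", "true", "other"], CLASSES)
```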
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions and Future Work</head><p>In this edition of the CheckThat! Lab, our team has explored the two main tasks in fake news detection: the selection of sentences or tweets to verify, and the verification of these elements themselves.</p><p>Regarding the check-worthiness task, we have verified that transformer models can extract the latent features present in the tweets more efficiently than other methods, although it is necessary to carefully choose the appropriate pre-trained model for the task, with large performance differences between models.</p><p>Our participation in the English version of this task has been very positive, obtaining the first position, while in the Spanish version we finished in fourth place. We also observed that in Spanish the mean average precision on the dev dataset (0.495) was much lower than that obtained in English (0.849). This may be due to the fact that the model used is not specifically pre-trained on tweets or on Spanish politics.</p><p>In the fake news detection task we participated with two different approaches. On the one hand, we used transformer models, trying to extract linguistic features that identify fraudulent articles and expecting good behavior from them. On the other hand, we used a fairly simple Gradient Boosting classifier with linguistic features extracted through the LIWC tool, TF-IDF text features, and a TF-IDF representation of domain names retrieved from a Google search. We used this second system as a contrastive submission since its results on the dev dataset were inferior to those of the transformer models. 
However, on the test dataset the best performance was obtained with this last model.</p><p>As this was our first participation in a fake news detection task, the result was positive: fourth place among 25 participants.</p><p>Although there is always room for improvement, we think the check-worthiness task can be approached reasonably well by means of transformer models, so our future work will be mainly devoted to investigating alternative methods to those used in this Lab to tackle fact-checking and fake news detection, for example using knowledge-based methods to verify claims.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Task 1A English -Transformer models analysis: results on dev dataset</figDesc><table><row><cell>Model</cell><cell cols="4">Epochs Batch Size MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>bertweet-covid19-base-uncased</cell><cell></cell><cell>5</cell><cell>16</cell><cell cols="2">0.849 0.767 0.848 0.874</cell></row><row><cell>bertweet-covid19-base-cased</cell><cell></cell><cell>5</cell><cell>16</cell><cell cols="2">0.845 0.790 0.843 0.879</cell></row><row><cell>bertweet-base</cell><cell></cell><cell>5</cell><cell>10</cell><cell cols="2">0.842 0.774 0.841 0.873</cell></row><row><cell>roberta-base</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.793 0.709 0.791 0.836</cell></row><row><cell>funnel-transformer/small</cell><cell></cell><cell>3</cell><cell>8</cell><cell cols="2">0.785 0.654 0.784 0.783</cell></row><row><cell>funnel-transformer/small-base</cell><cell></cell><cell>3</cell><cell>8</cell><cell cols="2">0.785 0.654 0.784 0.783</cell></row><row><cell>funnel-transformer/intermediate</cell><cell></cell><cell>3</cell><cell>8</cell><cell cols="2">0.761 0.637 0.759 0.768</cell></row><row><cell cols="2">funnel-transformer/intermediate-base</cell><cell>3</cell><cell>8</cell><cell cols="2">0.761 0.637 0.759 0.768</cell></row><row><cell>distilbert-base-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.752 0.688 0.749 0.790</cell></row><row><cell>funnel-transformer/medium</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.737 0.707 0.731 0.820</cell></row><row><cell>funnel-transformer/medium-base</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.737 0.707 0.731 0.820</cell></row><row><cell>bert-base-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.733 0.672 0.729 0.774</cell></row><row><cell>bert-base-multilingual-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.726 0.636 0.722 0.786</cell></row><row><cell>albert-base-v2</cell><cell></cell><cell>5</cell><cell>16</cell><cell cols="2">0.694 0.677 0.691 0.756</cell></row><row><cell>distilbert-base-multilingual-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.680 0.697 0.673 0.764</cell></row><row><cell>Table 2</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="5">Task 1A English -Selected transformer models: results on dev dataset</cell></row><row><cell>Model</cell><cell cols="4">Epochs Batch Size MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>bertweet-covid19-base-uncased</cell><cell>6</cell><cell>14</cell><cell></cell><cell cols="2">0.862 0.800 0.861 0.874</cell></row><row><cell>bertweet-covid19-base-cased</cell><cell>5</cell><cell>14</cell><cell></cell><cell cols="2">0.860 0.797 0.859 0.883</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Task 1A Spanish -Transformer models analysis: results on dev dataset</figDesc><table><row><cell>Model</cell><cell cols="3">Epochs Batch Size MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>Electra mrm8488-electricidad-base-discriminator</cell><cell>3</cell><cell>16</cell><cell cols="3">0.495 0.384 0.492 0.885</cell></row><row><cell>BERT Geotrend-bert-base-es-cased</cell><cell>3</cell><cell>8</cell><cell cols="3">0.474 0.439 0.472 0.874</cell></row><row><cell>BERT dccuchile-bert-base-spanish-wwm-cased</cell><cell>3</cell><cell>16</cell><cell cols="3">0.467 0.458 0.465 0.879</cell></row><row><cell>RoBERTa mrm8488-RuPERTa-base</cell><cell>3</cell><cell>8</cell><cell cols="3">0.376 0.341 0.372 0.836</cell></row><row><cell>Electra mrm8488-electricidad-base-generator</cell><cell>5</cell><cell>8</cell><cell cols="3">0.325 0.130 0.318 0.830</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Task 1A Spanish - Selected models: results on dev dataset</figDesc><table><row><cell>Model</cell><cell>Epochs</cell><cell>Batch Size</cell><cell>MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>mrm8488-elect-base-discr. without replacement</cell><cell>3</cell><cell>12</cell><cell>0.514</cell><cell>0.480</cell><cell>0.512</cell><cell>0.878</cell></row><row><cell>mrm8488-elect-base-discr. without replacement</cell><cell>3</cell><cell>14</cell><cell>0.510</cell><cell>0.472</cell><cell>0.506</cell><cell>0.892</cell></row><row><cell>mrm8488-elect-base-discr. without replacement</cell><cell>3</cell><cell>16</cell><cell>0.509</cell><cell>0.390</cell><cell>0.506</cell><cell>0.892</cell></row><row><cell>mrm8488-elect-base-discr. with replacement</cell><cell>3</cell><cell>18</cell><cell>0.466</cell><cell>0.277</cell><cell>0.463</cell><cell>0.870</cell></row><row><cell>mrm8488-elect-base-discr. with replacement</cell><cell>6</cell><cell>18</cell><cell>0.458</cell><cell>0.417</cell><cell>0.456</cell><cell>0.839</cell></row><row><cell>mrm8488-elect-base-discr. with replacement</cell><cell>4</cell><cell>10</cell><cell>0.452</cell><cell>0.419</cell><cell>0.449</cell><cell>0.872</cell></row><row><cell cols="7">Table 5</cell></row><row><cell cols="7">Task 1A - Submission official results</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 6</head><label>6</label><figDesc>Task 3A - Transformer models results on dev dataset</figDesc><table><row><cell>Model</cell><cell>Epochs</cell><cell>Batch Size</cell><cell>Input</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell></row><row><cell>albert-base-v2</cell><cell>9</cell><cell>8</cell><cell>Text 200</cell><cell>0.445</cell><cell>0.424</cell><cell>0.427</cell></row><row><cell>funnel-transformer-intermediate</cell><cell>7</cell><cell>8</cell><cell>Text 200</cell><cell>0.436</cell><cell>0.409</cell><cell>0.402</cell></row><row><cell>albert-base-v2</cell><cell>8</cell><cell>8</cell><cell>Text 150</cell><cell>0.418</cell><cell>0.398</cell><cell>0.397</cell></row><row><cell>funnel-transformer-intermediate</cell><cell>9</cell><cell>8</cell><cell>Text 150</cell><cell>0.405</cell><cell>0.394</cell><cell>0.387</cell></row><row><cell>bert-base-cased</cell><cell>9</cell><cell>8</cell><cell>Text 200</cell><cell>0.383</cell><cell>0.386</cell><cell>0.382</cell></row><row><cell>distilbert-base-cased</cell><cell>6</cell><cell>8</cell><cell>Text 200</cell><cell>0.397</cell><cell>0.371</cell><cell>0.374</cell></row><row><cell>bert-base-cased</cell><cell>10</cell><cell>8</cell><cell>Text 150</cell><cell>0.370</cell><cell>0.368</cell><cell>0.362</cell></row><row><cell>distilbert-base-cased</cell><cell>9</cell><cell>8</cell><cell>Text 150</cell><cell>0.351</cell><cell>0.345</cell><cell>0.345</cell></row><row><cell>distilbert-base-cased</cell><cell>6</cell><cell>8</cell><cell>Title 150</cell><cell>0.354</cell><cell>0.367</cell><cell>0.344</cell></row><row><cell>bert-base-cased</cell><cell>6</cell><cell>8</cell><cell>Title 150</cell><cell>0.375</cell><cell>0.375</cell><cell>0.340</cell></row><row><cell>funnel-transformer-intermediate</cell><cell>8</cell><cell>8</cell><cell>Title 150</cell><cell>0.423</cell><cell>0.329</cell><cell>0.322</cell></row><row><cell>albert-base-v2</cell><cell>6</cell><cell>8</cell><cell>Title 150</cell><cell>0.335</cell><cell>0.341</cell><cell>0.316</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 7</head><label>7</label><figDesc>Task 3A - Ensemble models features analysis: results on dev dataset</figDesc><table><row><cell>Model</cell><cell>Domain</cell><cell>Text</cell><cell>LIWC</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell></row><row><cell>Gradient Boosting</cell><cell>true</cell><cell>true</cell><cell>true</cell><cell>0.428</cell><cell>0.369</cell><cell>0.366</cell></row><row><cell>Gradient Boosting</cell><cell>false</cell><cell>true</cell><cell>true</cell><cell>0.419</cell><cell>0.366</cell><cell>0.364</cell></row><row><cell>Gradient Boosting</cell><cell>false</cell><cell>false</cell><cell>true</cell><cell>0.420</cell><cell>0.346</cell><cell>0.338</cell></row><row><cell>Gradient Boosting</cell><cell>true</cell><cell>false</cell><cell>true</cell><cell>0.393</cell><cell>0.343</cell><cell>0.334</cell></row><row><cell>Random Forest</cell><cell>false</cell><cell>false</cell><cell>true</cell><cell>0.386</cell><cell>0.335</cell><cell>0.319</cell></row><row><cell>Random Forest</cell><cell>false</cell><cell>true</cell><cell>true</cell><cell>0.574</cell><cell>0.325</cell><cell>0.303</cell></row><row><cell>Random Forest</cell><cell>true</cell><cell>true</cell><cell>true</cell><cell>0.524</cell><cell>0.306</cell><cell>0.277</cell></row><row><cell>Random Forest</cell><cell>true</cell><cell>false</cell><cell>true</cell><cell>0.462</cell><cell>0.274</cell><cell>0.226</cell></row><row><cell cols="7">Table 8</cell></row><row><cell cols="7">Task 3A - Submissions official results</cell></row><row><cell cols="4">Model</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell></row><row><cell cols="4">Gradient Boosting + Domain + Text + LIWC</cell><cell>0.5055</cell><cell>0.4805</cell><cell>0.4680</cell></row><row><cell cols="4">Albert-base + sequence length 150</cell><cell>0.4653</cell><cell>0.4109</cell><cell>0.4237</cell></row><row><cell cols="4">Albert-base + sequence length 200</cell><cell>0.3779</cell><cell>0.3742</cell><cell>0.3691</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/transformers/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/mrm8488/electricidad-base-discriminator</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">These are all the features that this tool provides.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by the Spanish Ministry of Science and Innovation within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32, as well as project EXTRAE II (IMIENS 2019) and the research network AEI RED2018-102312-T (IA-Biomed).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babulkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-72240-1_75</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_75" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd European Conference on Information Retrieval, ECIR &apos;21</title>
				<meeting>the 43rd European Conference on Information Retrieval, ECIR &apos;21<address><addrLine>Lucca, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="639" to="649" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babulkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kutlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th International Conference of the CLEF Association: Access Evaluation Meets Multilinguality, Multimodality, and Visualization, CLEF &apos;2021</title>
				<meeting>the 12th International Conference of the CLEF Association: Access Evaluation Meets Multilinguality, Multimodality, and Visualization, CLEF &apos;2021<address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Detecting check-worthy factual claims in presidential debates</title>
		<author>
			<persName><forename type="first">N</forename><surname>Hassan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tremayne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1835" to="1838" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A context-aware approach for detecting worth-checking claims in political debates</title>
		<author>
			<persName><forename type="first">P</forename><surname>Gencheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference Recent Advances in Natural Language Processing</title>
				<meeting>the International Conference Recent Advances in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="267" to="276" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Copenhagen Team Participation in the Check-Worthiness Task of the Competition of Automatic Identification and Verification of Claims in Political Debates of the CLEF</title>
		<author>
			<persName><forename type="first">C</forename><surname>Hansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Simonsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lioma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CheckThat! Lab</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">UPV-INAOE-Autoritas - Check That: Preliminary Approach for Checking Worthiness of Claims</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ghanem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">6</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rodrigues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Novak</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.02431</idno>
		<title level="m">Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.02931</idno>
		<title level="m">Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Cheema</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hakimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ewerth</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2007.10534</idno>
		<title level="m">Check_square at CheckThat! 2020: Claim Detection in Social Media via Fusion of Transformer and Syntactic Features</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of the CLEF-2021 CheckThat! Lab Task 1 on Check-Worthiness Estimation in Tweets and Political Debates</title>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hamdan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">S</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Yavuz Selim Kartal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum, CLEF &apos;2021</title>
				<meeting><address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Q</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Vu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.10200</idno>
		<title level="m">BERTweet: A pre-trained language model for English Tweets</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">RoBERTa: A Robustly Optimized BERT Pretraining Approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goodman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.11942</idno>
		<title level="m">ALBERT: A Lite BERT for Self-supervised Learning of Language Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<title level="m">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.03236</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.10555</idno>
		<title level="m">ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J O</forename><surname>Suárez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.156</idno>
		<idno type="arXiv">arXiv:2006.06202</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1703" to="1714" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Computational Fact Checking from Knowledge Networks</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Ciampaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shiralkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Rocha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bollen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Menczer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Flammini</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0128193</idno>
		<ptr target="http://dx.plos.org/10.1371/journal.pone.0128193" />
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0128193</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Karadzhov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.00341</idno>
		<title level="m">Fully Automated Fact Checking Using External Sources</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">On lying and being lied to: A linguistic analysis of deception in computer-mediated communication</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Hancock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">E</forename><surname>Curry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goorha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Woodworth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Discourse Processes</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2007">2007</date>
			<publisher>Taylor &amp; Francis</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The lie detector: Explorations in the automatic recognition of deceptive language</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mihalcea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Strapparava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Association for Computational Linguistics</title>
				<meeting>the ACL-IJCNLP 2009 Conference Short Papers, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="309" to="312" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">B</forename><surname>Gouza</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.08751</idno>
		<title level="m">Fake news detection with deep diffusive network model</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Do not trust the trolls: Predicting credibility in community question answering forums</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mihaylova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shiroya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference Recent Advances in Natural Language Processing</title>
				<meeting>the International Conference Recent Advances in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="551" to="560" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zafarani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.13355</idno>
		<title level="m">The Role of User Profile for Fake News Detection</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Overview of the CLEF-2021 CheckThat! Lab Task 3 on Fake News Detection</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum, CLEF &apos;2021</title>
				<meeting><address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">An exploratory study of covid-19 misinformation on twitter</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dirkson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Majchrzak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Online Social Networks and Media</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page">100104</biblScope>
			<date type="published" when="2021">2021</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">FakeCovid -A Multilingual Cross-domain Fact Check News Dataset for COVID-19</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nandini</surname></persName>
		</author>
		<ptr target="http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf" />
	</analytic>
	<monogr>
		<title level="m">Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.00502</idno>
		<title level="m">AMUSED: An Annotation Framework of Multi-modal Social Media Data</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Task 3: Fake news detection at CLEF-2021 CheckThat!</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.4714517</idno>
		<ptr target="https://doi.org/10.5281/zenodo.4714517" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Random forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="5" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Stochastic gradient boosting</title>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0167-9473(01)00065-2</idno>
		<ptr target="https://linkinghub.elsevier.com/retrieve/pii/S0167947301000652" />
	</analytic>
	<monogr>
		<title level="j">Computational Statistics &amp; Data Analysis</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="367" to="378" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Boyd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jordan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Blackburn</surname></persName>
		</author>
		<title level="m">The development and psychometric properties of LIWC2015</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
