<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">NLP&amp;IR@UNED at CheckThat! 2021: Check-worthiness estimation and fake news detection using transformer models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Juan</forename><forename type="middle">R</forename><surname>Martinez-Rico</surname></persName>
							<email>jrmartinezrico@invi.uned.es</email>
							<affiliation key="aff0">
								<orgName type="department">Dpto. Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="laboratory">NLP &amp; IR Group</orgName>
								<orgName type="institution">Universidad Nacional de Educación a Distancia (UNED)</orgName>
								<address>
									<postCode>28040</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Juan</forename><surname>Martinez-Romo</surname></persName>
							<email>juaner@lsi.uned.es</email>
							<affiliation key="aff0">
								<orgName type="department">Dpto. Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="laboratory">NLP &amp; IR Group</orgName>
								<orgName type="institution">Universidad Nacional de Educación a Distancia (UNED)</orgName>
								<address>
									<postCode>28040</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Instituto Mixto de</orgName>
								<orgName type="institution">Investigación -Escuela Nacional de Sanidad (IMIENS)</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lourdes</forename><surname>Araujo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dpto. Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="laboratory">NLP &amp; IR Group</orgName>
								<orgName type="institution">Universidad Nacional de Educación a Distancia (UNED)</orgName>
								<address>
									<postCode>28040</postCode>
									<settlement>Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Instituto Mixto de</orgName>
								<orgName type="institution">Investigación -Escuela Nacional de Sanidad (IMIENS)</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">NLP&amp;IR@UNED at CheckThat! 2021: Check-worthiness estimation and fake news detection using transformer models</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C195CC39FB90E342FBF80E7152AA646B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>check-worthiness</term>
					<term>fake news detection</term>
					<term>transformer models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This article describes the different approaches used by the NLP&amp;IR@UNED team in the CLEF 2021 CheckThat! Lab to tackle the tasks 1A-English, 1A-Spanish and 3A-English. The goal of Task 1A in English is to determine which tweets within a set of COVID-19 related tweets are worth checking. Task 1A in Spanish is similar, but in this case the tweets are related to political issues in Spain. In both tasks, transformer models have been used to identify check-worthy tweets, obtaining first place in the English task and fourth place in the Spanish task. Task 3A is focused on determining the veracity of a news article. It is a multi-class classification problem with four possible values: true, partially false, false, and other. For this task we have used two different approaches: a gradient-boosting classifier with TF-IDF and LIWC features, and a transformer model fed with the first tokens of each news article. We obtained fourth place out of 25 participants in this task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Despite the efforts made in recent times to combat the proliferation of fake news, it has not stopped growing, taking advantage of events conducive to its dissemination, such as the current pandemic or the last presidential elections in the United States. Initiatives such as this CheckThat! Lab <ref type="bibr" target="#b0">[1]</ref> <ref type="bibr" target="#b1">[2]</ref> are therefore valuable, since they give researchers in this area of natural language processing the opportunity to propose and share ideas that can help mitigate the problem.</p><p>In this article, we present the approaches used by our team in the check-worthiness and fake news detection tasks. Since transformer models have become a fundamental tool that adapts to many natural language processing tasks with state-of-the-art results, we chose them as our first option in each of the tasks. However, in Task 3A we decided to also use more classical approaches, since the length of the news articles to be checked exceeds the input sequence size that it is reasonable to define in a transformer model.</p><p>The rest of the article is organized as follows: Section 2 briefly describes transformer models and the approach we used in Tasks 1A-English and 1A-Spanish, and comments on the results obtained; Section 3 explains our approach to the fake news detection task and discusses its results; and Section 4 contains our conclusions and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Transformers for Check-Worthiness</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Previous Approaches in the Check-Worthiness Task</head><p>Among the approaches used to tackle this task we can highlight the initial work of <ref type="bibr" target="#b2">[3]</ref>, which makes use of classifiers such as Random Forest, SVM or Multinomial Naive Bayes, with features based on TF-IDF representations, part-of-speech tags, sentiment scores, and entity types. To these methods <ref type="bibr" target="#b3">[4]</ref> adds features such as the average embedding vector of the sentence, linguistic features that count the number of words in the sentence belonging to a certain lexicon, contextual features such as the position of a sentence with respect to others in a segment of text, and discourse features such as the detection of contradictions, using a deep feed-forward neural network as classifier. In past editions of this CheckThat! Lab we have seen the use of recurrent neural networks by <ref type="bibr" target="#b4">[5]</ref>, where each token is represented in three ways: through embeddings, and with part-of-speech tags and syntactic dependencies encoded as one-hot vectors. In the same edition, <ref type="bibr" target="#b5">[6]</ref> makes use of character n-gram features with a k-nearest neighbors classifier. More recently in this same Lab, transformer models began to be used for the check-worthiness task by many of the participants <ref type="bibr" target="#b6">[7]</ref>[8] <ref type="bibr" target="#b8">[9]</ref>. The next section gives a short description of this architecture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">The Transformer Model</head><p>Since their appearance as an alternative to neural machine translation models, transformer models <ref type="bibr" target="#b9">[10]</ref> have become a preferred choice over other natural language processing techniques, not only in machine translation but also in other tasks such as sequence classification, summarization, named entity recognition, text generation, extractive question answering and language modeling.</p><p>A transformer is a deep learning model that "translates" input sequences into output sequences using an encoder-decoder architecture. It uses an attention mechanism to identify the most relevant parts of the input and output sequences. Previous models such as RNNs also use an attention mechanism, but they are limited by their sequential processing of the input data. Transformers, by relying solely on the attention mechanism, do not need to process input sequences in a specific order, allowing them to process these sequences in parallel and thus reducing training times.</p><p>The model is fed with training data in the form of sequence pairs (input, target). The first is applied to the encoder block and the second to the decoder block.</p><p>In recurrent models, sequences are introduced token by token, which implicitly provides the relative position of each token in the sequence. Since transformers do not process sequences in this way, this positional information is provided to the model as an encoding added to the input and target sequences.</p><p>The encoder block is made up of a stack of n identical encoders, each of them with a self-attention layer and a feed-forward neural network. 
The decoder block is made up of the same number n of decoders, each composed of a self-attention layer, an encoder-decoder attention layer and a feed-forward neural network.</p><p>The self-attention layers allow the model to identify, within the same sequence, which tokens are most relevant to the token being considered at a given moment. In contrast, the encoder-decoder attention layer relates tokens of the input and target sequences. The attention layers are not monolithic but are composed of several attention heads that focus on different portions of the sequence.</p><p>The output of the encoder block feeds all the encoder-decoder attention layers of the decoder block, while the output of the decoder block is connected to a linear layer and this to a softmax layer that maps each position of the target sequence to the output vocabulary.</p><p>The above describes the original model; however, since its presentation, a large number of models derived from the transformer architecture have appeared. For example, one of the most successful is BERT <ref type="bibr" target="#b10">[11]</ref>, which essentially eliminates the decoder block present in transformers and, during training, masks the input sequences in such a way that it processes them bidirectionally.</p><p>Another point to highlight is that, as part of these architectures, a series of models pre-trained in an unsupervised manner on large datasets have been released. This allows us to easily apply transfer learning to different tasks such as those mentioned at the beginning of this section.</p><p>Next, we describe how we have used some of these models in the check-worthiness and fake news detection tasks.</p></div>
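The self-attention computation described above can be sketched in a few lines of NumPy. This is an illustrative single-head example with arbitrary dimensions and random weights, not the implementation used in our experiments:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model). Every token attends to every other token.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # Scaled dot-product attention weights; each row sums to 1.
    A = softmax(Q @ K.T / np.sqrt(d_k))
    return A @ V, A

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)          # out: (4, 8), A: (4, 4)
```

In a real encoder, several such heads run in parallel and their concatenated outputs are passed to the feed-forward sublayer.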
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Task 1A English</head><p>The objective of Task 1A-English <ref type="bibr" target="#b11">[12]</ref> is, given a set of tweets in English related to the COVID-19 topic, to identify which tweets are worth checking by assigning a score to each of them.</p><p>To tackle this task we eliminated any metadata present in the tweets and focused only on the textual information provided.</p><p>Taking into account that all the tweets to be evaluated are about COVID-19, we searched a well-known repository of pre-trained models<ref type="foot" target="#foot_0">1</ref>, where we found one trained on tweets related to this topic.</p><p>Specifically, we used the BERTweet model <ref type="bibr" target="#b12">[13]</ref>, a BERT-architecture model initially pre-trained on 850 million English tweets using the RoBERTa <ref type="bibr" target="#b13">[14]</ref> pre-training procedure, to which the same authors applied a second 40-epoch pre-training with 23 million English tweets related to the COVID-19 topic.</p><p>To check whether a model pre-trained on the same topic and document type actually outperforms models and architectures pre-trained on more neutral datasets, we implemented a grid search procedure in which we varied the number of epochs, the batch size, and the model/architecture used. The rest of the hyperparameters were kept at the default values of each model.</p><p>Among the transformer models we tested are BERT, ALBERT <ref type="bibr" target="#b14">[15]</ref>, RoBERTa, DistilBERT <ref type="bibr" target="#b15">[16]</ref>, and Funnel-Transformer <ref type="bibr" target="#b16">[17]</ref>. 
Table <ref type="table" target="#tab_0">1</ref> shows the best results obtained for each model in terms of the mean average precision (MAP), F1, precision-recall curve (P-R) and ROC curve measures, sorted by mean average precision.</p><p>As we can see, the best behavior is obtained with the model pre-trained on tweets related to the COVID-19 topic.</p><p>We therefore selected the first two models, bertweet-covid19-base-uncased and bertweet-covid19-base-cased, and tested various values of the epsilon parameter, obtaining the best results with the value 2.5 × 10⁻⁹. These results are shown in Table <ref type="table">2</ref>.</p><p>We also found that, although we always initialized the Python, NumPy, and PyTorch random number generators with the same seeds, the same results did not always appear for a given set of parameters. Therefore, to prepare the final submissions, we did not merge the training and dev datasets into a larger one with which to train the models. Instead, we trained the models on the training dataset and evaluated them on the dev dataset, repeatedly executing the same parameter configurations and selecting the test files to submit from the best results obtained on the dev dataset, assuming that an initial random configuration that behaved well on the dev dataset would also do so on the test dataset.</p></div>
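For reference, the mean average precision used throughout these tables can be computed from a short pure-Python sketch of average precision over one ranked list of tweets; MAP is then the mean of this value over all ranked lists. The toy labels and scores below are invented:

```python
def average_precision(labels, scores):
    """Average precision of one ranked list: labels are 1 for check-worthy
    tweets, scores are the model's check-worthiness estimates."""
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: -p[0])]
    hits, total = 0, 0.0
    for rank, lab in enumerate(ranked, start=1):
        if lab:
            hits += 1
            total += hits / rank   # precision at each relevant position
    return total / hits if hits else 0.0

# Two check-worthy tweets ranked 1st and 3rd: AP = (1/1 + 2/3) / 2 ≈ 0.833
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```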
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Task 1A Spanish</head><p>In this version of Task 1A, the set of tweets is in Spanish and the tweets are related to issues of Spanish politics.</p><p>As in Task 1A English, we used several transformer models to evaluate which one best suits these types of tweets. The tested models were BERT, Electra <ref type="bibr" target="#b17">[18]</ref> and RoBERTa.</p><p>After a preliminary grid search with different pre-trained models in Spanish and different values of batch size and epochs, keeping the rest of the hyperparameters at their default values, we obtained the results shown in Table <ref type="table" target="#tab_1">3</ref>. The best results are shown for each pre-trained model.</p><p>Since the Electra model mrm8488-electricidad-base-discriminator<ref type="foot" target="#foot_1">2</ref> obtained a slightly higher result, we selected it for a more exhaustive parameter search. This Electra model is pre-trained on 20 GB of the Spanish portion of the OSCAR corpus <ref type="bibr" target="#b18">[19]</ref>.</p><p>By extracting the vocabulary from this pre-trained model, we also realized that among the first 1000 tokens there were 971 unused tokens of the form [unusedNNN].</p><p>To see if these tokens could be useful, we extracted all the out-of-vocabulary tokens of the training dataset. From this set of words, we manually selected those that seemed most relevant and had three or more occurrences, mainly names of politicians, political parties and media outlets, and hashtags used in electoral campaigns. In total, the list consisted of 197 tokens.</p><p>With this list, we created a dictionary to group tokens that correspond to the same concept. 
For example, #PINParental, pin and parental were mapped to the same PINParental token.</p><p>In this dictionary, we substituted the tokens on the right-hand side with [unusedNNN] tokens, so as to match the out-of-vocabulary tokens with the unused tokens of the model, and we replaced the out-of-vocabulary tokens using this dictionary both in the training loop and in the evaluation loop.</p><p>Unfortunately, the results obtained with this strategy were not as expected, with better results obtained without substituting out-of-vocabulary tokens. The best results obtained after repeated runs with different batch sizes and epochs are shown in Table <ref type="table" target="#tab_2">4</ref>, along with the best results obtained by substituting tokens.</p><p>[Table 5. Task 1A submission official results. 1A Spanish: MAP 0.492, MRR 1.000, RP 0.475, P@1 1.000, P@3 1.000, P@5 1.000, P@10 0.800, P@20 0.800, P@30 0.620. 1A English: MAP 0.224, MRR 1.000, RP 0.211, P@1 1.000, P@3 0.667, P@5 0.400, P@10 0.300, P@20 0.200, P@30 0.160.]</p><p>For the submissions to this Spanish version of subtask 1A, the same strategy was used as in the English version: training the model repeatedly with the same parameters and sending the configurations with the best values on the dev dataset.</p></div>
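The replacement strategy can be illustrated with the following sketch. The concept group comes from the example in the text; the exact [unusedNNN] assignment, the helper name and the sample tweet are our own, for illustration only:

```python
# One concept group from the manually built dictionary (197 tokens in total).
concept_groups = {
    "PINParental": ["#PINParental", "pin", "parental"],
    # remaining concepts omitted in this sketch
}

# Map every variant of a concept to one [unusedNNN] vocabulary slot.
replacement = {}
for i, concept in enumerate(sorted(concept_groups)):
    for variant in concept_groups[concept]:
        replacement[variant.lower()] = f"[unused{i}]"

def replace_oov(tweet):
    # Applied before tokenization in both the training and evaluation loops.
    return " ".join(replacement.get(tok.lower(), tok) for tok in tweet.split())

replace_oov("El pin parental llega al Congreso")
# → "El [unused0] [unused0] llega al Congreso"
```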
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Task 1A Results</head><p>Finally, two submissions were made for the Spanish version of Task 1A and three submissions for the English version. The official evaluation measure was mean average precision (MAP). In Spanish we obtained the fourth position among six participants while in English we obtained the first position among ten participants. The results are shown in Table <ref type="table">5</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Fake News Detection Task</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Previous Approaches in the Fake News Detection Task</head><p>The approaches to fake news detection proposed so far can be divided into three groups: knowledge-based methods, content-based methods and context-based methods.</p><p>In the first, each claim is compared with a source of evidence that supports it. The source of evidence can be a knowledge graph <ref type="bibr" target="#b19">[20]</ref>, in which case we must extract subject-predicate-object triples from the claim and verify their existence in the graph, or the information retrieved from a query to a search engine <ref type="bibr" target="#b20">[21]</ref>, in which case we must compare the retrieved information with the claim using techniques such as similarity, stance detection, contradiction detection, etc.</p><p>Content-based methods use only the textual information in the document. The features obtained can be latent, such as word or sentence embeddings, or explicit, such as TF-IDF vectors, bag-of-words vectors, word counts <ref type="bibr" target="#b21">[22]</ref>, psycho-linguistic features <ref type="bibr" target="#b22">[23]</ref>, etc. Transformers and RNNs can also be considered content-based methods that use latent features.</p><p>Context-based methods use the information surrounding the claim to verify its degree of truthfulness. Examples of such features are those based on propagation <ref type="bibr" target="#b23">[24]</ref>, on the user's reputation <ref type="bibr" target="#b24">[25]</ref>, on their profile <ref type="bibr" target="#b25">[26]</ref>, etc.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Task 3A -English</head><p>For the fake news detection task in English <ref type="bibr" target="#b26">[27]</ref>, given a set of news articles we have to classify each article into one of the following categories, taking into account the main claim of the article: true, partially false, false, or other <ref type="bibr" target="#b27">[28]</ref>[29] <ref type="bibr" target="#b29">[30]</ref>.</p><p>The organizers provided three different training datasets <ref type="bibr" target="#b30">[31]</ref>, so we joined them and held out 20% as a dev dataset, for a total of 760 training instances and 190 validation instances.</p><p>To tackle this task we used two different approaches. The first, as in the check-worthiness tasks, is to use transformer models to check whether the latent features these models extract from the documents can be related to their veracity.</p><p>The second approach is to use more classical ensemble methods together with various types of features such as TF-IDF and LIWC.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Transformer approach</head><p>A grid search was carried out with four different transformer models: ALBERT, BERT, DistilBERT and Funnel-Transformer, and with different batch sizes and numbers of epochs.</p><p>Given that one of the limitations of transformer models is the length of the sequence they accept as input, we assumed that the relevant information for each news article is likely to be found at its beginning. We therefore extracted the first 150 and 200 tokens of each article as input for the models. We also tried using the first 150 tokens of the article title as input; as some instances had no title, in those cases we used the first 150 tokens of the article text. The four possible class values were converted to integer values so that they could be processed correctly.</p><p>Table <ref type="table" target="#tab_3">6</ref> shows the best results obtained for each transformer model. Given that this is a multi-class classification problem, we used precision, recall and F1 as evaluation measures, taking the last as the main one. As can be seen, the title of the article does not seem to contain enough information about its veracity, and a longer sequence length provides better results, as expected.</p></div>
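The input preparation just described can be sketched as follows. Whitespace splitting stands in for the model tokenizer, and the field names and label-to-integer mapping are assumptions for illustration:

```python
LABELS = {"true": 0, "partially false": 1, "false": 2, "other": 3}

def make_example(article, max_tokens=200, use_title=False):
    # Use the title if requested, falling back to the text when it is missing.
    source = article.get("title") if use_title else article.get("text", "")
    if not source:
        source = article.get("text", "")
    tokens = source.split()[:max_tokens]      # keep only the first tokens
    return " ".join(tokens), LABELS[article["label"]]

# An instance with no title falls back to its text.
x, y = make_example({"title": None, "text": "A long news article ...",
                     "label": "partially false"}, max_tokens=150, use_title=True)
```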
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Ensemble approach</head><p>In this second approach we use the random forest <ref type="bibr" target="#b31">[32]</ref> and gradient boosting <ref type="bibr" target="#b32">[33]</ref> classifiers. We extracted the text of each article and processed it with the LIWC2015 <ref type="bibr" target="#b33">[34]</ref> text analysis tool, obtaining a total of 93 discrete features<ref type="foot" target="#foot_2">3</ref> such as Analytic, Clout, Authentic, Tone, etc. The use of LIWC in this task is motivated by the premise that false articles may have certain linguistic features that are not present in legitimate articles, which can be reflected in the results offered by this tool. We also extracted TF-IDF vectors as features from the text of the articles.</p><p>To build the last feature set, for each article we performed a Google search using the article title as query terms.</p><p>From the first 20 results obtained, we extracted the domain names from each URL and concatenated them, separated by spaces, constructing text strings of the form "www.politifact.com www.reuters.com www.nytimes.com apnews.com ... ". With these strings we also built a TF-IDF representation. The assumption is that if domain names of sites dedicated to fact-checking appear among the first 20 results, the article is at least suspected of containing some controversy. On the other hand, if the domain names belong to prestigious media, the original article, true or false, may be important.</p><p>To select the proper configuration, we kept the LIWC features fixed and optionally concatenated the text TF-IDF features and the domain name TF-IDF features.</p><p>For Random Forest, the number of estimators was set to 100, the maximum tree depth to 1000, and Gini impurity was used as the criterion to evaluate split quality. 
For Gradient Boosting, the number of estimators was also set to 100, and deviance was used as the loss function. The results of these tests are shown in Table <ref type="table" target="#tab_4">7</ref>.</p><p>As can be seen, the Gradient Boosting classifier is superior to Random Forest in all feature configurations. It is also able to take advantage of the information provided by all the concatenated features, while the Random Forest classifier obtains its best result when only the LIWC features are used. </p></div>
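The construction of the domain-name strings can be sketched with the standard library; the URLs below are invented stand-ins for the first results of a Google search on an article title:

```python
from urllib.parse import urlparse

# Hypothetical first search results for one article.
search_results = [
    "https://www.politifact.com/factchecks/2021/example-claim/",
    "https://www.reuters.com/world/example-story/",
    "https://apnews.com/article/example",
]

# Concatenate the domain of each of the first 20 results into one string;
# these strings are then fed to a TF-IDF vectorizer as a separate feature set.
domain_string = " ".join(urlparse(url).netloc for url in search_results[:20])
# → "www.politifact.com www.reuters.com apnews.com"
```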
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Task 3A Results</head><p>In this task we made three submissions. The first was generated by Gradient Boosting with the three types of features: LIWC, domain name TF-IDF and text TF-IDF. The second submission used an ALBERT transformer with the albert-base language model, taking the article text as input with a sequence length of 150. The primary submission used the same type of transformer but with a sequence length of 200.</p><p>With the best of these submissions we achieved an F1-macro measure of 0.468, which places us in fourth position among 25 participants.</p><p>Table <ref type="table">8</ref> shows our reproduction of the results obtained by the three submissions. Unlike what happened on the dev dataset, on the test dataset the best model was the Gradient Boosting classifier that uses the features based on LIWC, domain name TF-IDF and text TF-IDF. This tells us that although transformer models can perform well in the fake news detection task with little or no feature engineering, the use of text analysis tools like LIWC along with other handcrafted features can still be useful for profiling fake news.</p></div>
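The F1-macro measure used for the official ranking averages per-class F1 with equal weight, so minority classes count as much as frequent ones. A minimal sketch over the four task classes, with toy labels rather than task data:

```python
def f1_macro(y_true, y_pred, classes):
    # Per-class F1 from true/false positives and negatives, averaged equally.
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

CLASSES = ("true", "partially false", "false", "other")
score = f1_macro(["true", "false", "false", "other"],
                 ["true", "false", "true", "other"], CLASSES)
```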
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions and Future Work</head><p>In this edition of the CheckThat! Lab, our team has explored the two main tasks in fake news detection: the selection of sentences or tweets to verify, and the verification of these elements themselves.</p><p>Regarding the check-worthiness task, we have verified that transformer models can extract the latent features present in the tweets more efficiently than other methods, although it is necessary to carefully choose the appropriate pre-trained model for the task, with large performance differences between models.</p><p>Our participation in the English version of this task has been very positive, obtaining the first position, while in the Spanish version we finished in fourth place. We also observed that in Spanish the mean average precision on the dev dataset (0.495) was much lower than that obtained in English (0.849). This may be due to the fact that the model used is not specifically pre-trained on tweets or on Spanish politics.</p><p>In the fake news detection task we participated with two different approaches. On the one hand, we used transformer models, trying to extract linguistic features that identify fraudulent articles and expecting good behavior from them. On the other hand, we used a fairly simple Gradient Boosting classifier with linguistic features extracted through the LIWC tool, TF-IDF text features, and a TF-IDF representation of domain names retrieved from a Google search. We used this second system as a contrastive submission since its results on the dev dataset were inferior to those of the transformer models. 
However, on the test dataset the best performance was obtained with this last model.</p><p>As this was our first participation in a fake news detection task, the result was positive: fourth place among 25 participants.</p><p>Although there is always room for improvement, we think the check-worthiness task can be approached reasonably well by means of transformer models, so our future work will be mainly devoted to investigating alternative methods to those used in this Lab to tackle fact-checking and fake news detection, for example using knowledge-based methods to verify claims.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Task 1A English -Transformer models analysis: results on dev dataset</figDesc><table><row><cell>Model</cell><cell cols="4">Epochs Batch Size MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>bertweet-covid19-base-uncased</cell><cell></cell><cell>5</cell><cell>16</cell><cell cols="2">0.849 0.767 0.848 0.874</cell></row><row><cell>bertweet-covid19-base-cased</cell><cell></cell><cell>5</cell><cell>16</cell><cell cols="2">0.845 0.790 0.843 0.879</cell></row><row><cell>bertweet-base</cell><cell></cell><cell>5</cell><cell>10</cell><cell cols="2">0.842 0.774 0.841 0.873</cell></row><row><cell>roberta-base</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.793 0.709 0.791 0.836</cell></row><row><cell>funnel-transformer/small</cell><cell></cell><cell>3</cell><cell>8</cell><cell cols="2">0.785 0.654 0.784 0.783</cell></row><row><cell>funnel-transformer/small-base</cell><cell></cell><cell>3</cell><cell>8</cell><cell cols="2">0.785 0.654 0.784 0.783</cell></row><row><cell>funnel-transformer/intermediate</cell><cell></cell><cell>3</cell><cell>8</cell><cell cols="2">0.761 0.637 0.759 0.768</cell></row><row><cell cols="2">funnel-transformer/intermediate-base</cell><cell>3</cell><cell>8</cell><cell cols="2">0.761 0.637 0.759 0.768</cell></row><row><cell>distilbert-base-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.752 0.688 0.749 0.790</cell></row><row><cell>funnel-transformer/medium</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.737 0.707 0.731 0.820</cell></row><row><cell>funnel-transformer/medium-base</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.737 0.707 0.731 0.820</cell></row><row><cell>bert-base-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.733 0.672 0.729 0.774</cell></row><row><cell>bert-base-multilingual-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.726 0.636 0.722 0.786</cell></row><row><cell>albert-base-v2</cell><cell></cell><cell>5</cell><cell>16</cell><cell cols="2">0.694 0.677 0.691 0.756</cell></row><row><cell>distilbert-base-multilingual-cased</cell><cell></cell><cell>5</cell><cell>8</cell><cell cols="2">0.680 0.697 0.673 0.764</cell></row><row><cell>Table 2</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="5">Task 1A English -Selected transformer models: results on dev dataset</cell></row><row><cell>Model</cell><cell cols="4">Epochs Batch Size MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>bertweet-covid19-base-uncased</cell><cell>6</cell><cell>14</cell><cell></cell><cell cols="2">0.862 0.800 0.861 0.874</cell></row><row><cell>bertweet-covid19-base-cased</cell><cell>5</cell><cell>14</cell><cell></cell><cell cols="2">0.860 0.797 0.859 0.883</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Task 1A Spanish -Transformer models analysis: results on dev dataset</figDesc><table><row><cell>Model</cell><cell cols="3">Epochs Batch Size MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>Electra mrm8488-electricidad-base-discriminator</cell><cell>3</cell><cell>16</cell><cell cols="3">0.495 0.384 0.492 0.885</cell></row><row><cell>BERT Geotrend-bert-base-es-cased</cell><cell>3</cell><cell>8</cell><cell cols="3">0.474 0.439 0.472 0.874</cell></row><row><cell>BERT dccuchile-bert-base-spanish-wwm-cased</cell><cell>3</cell><cell>16</cell><cell cols="3">0.467 0.458 0.465 0.879</cell></row><row><cell>RoBERTa mrm8488-RuPERTa-base</cell><cell>3</cell><cell>8</cell><cell cols="3">0.376 0.341 0.372 0.836</cell></row><row><cell>Electra mrm8488-electricidad-base-generator</cell><cell>5</cell><cell>8</cell><cell cols="3">0.325 0.130 0.318 0.830</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Task 1A Spanish - Selected models: results on dev dataset</figDesc><table><row><cell>Model</cell><cell>Epochs</cell><cell>Batch Size</cell><cell>MAP</cell><cell>F1</cell><cell>P-R</cell><cell>ROC</cell></row><row><cell>mrm8488-elect-base-discr. without replacement</cell><cell>3</cell><cell>12</cell><cell>0.514</cell><cell>0.480</cell><cell>0.512</cell><cell>0.878</cell></row><row><cell>mrm8488-elect-base-discr. without replacement</cell><cell>3</cell><cell>14</cell><cell>0.510</cell><cell>0.472</cell><cell>0.506</cell><cell>0.892</cell></row><row><cell>mrm8488-elect-base-discr. without replacement</cell><cell>3</cell><cell>16</cell><cell>0.509</cell><cell>0.390</cell><cell>0.506</cell><cell>0.892</cell></row><row><cell>mrm8488-elect-base-discr. with replacement</cell><cell>3</cell><cell>18</cell><cell>0.466</cell><cell>0.277</cell><cell>0.463</cell><cell>0.870</cell></row><row><cell>mrm8488-elect-base-discr. with replacement</cell><cell>6</cell><cell>18</cell><cell>0.458</cell><cell>0.417</cell><cell>0.456</cell><cell>0.839</cell></row><row><cell>mrm8488-elect-base-discr. with replacement</cell><cell>4</cell><cell>10</cell><cell>0.452</cell><cell>0.419</cell><cell>0.449</cell><cell>0.872</cell></row><row><cell cols="7">Table 5</cell></row><row><cell cols="7">Task 1A - Submission official results</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 6</head><label>6</label><figDesc>Task 3A - Transformer models results on dev dataset</figDesc><table><row><cell>Model</cell><cell>Epochs</cell><cell>Batch Size</cell><cell>Input</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell></row><row><cell>albert-base-v2</cell><cell>9</cell><cell>8</cell><cell>Text 200</cell><cell>0.445</cell><cell>0.424</cell><cell>0.427</cell></row><row><cell>funnel-transformer-intermediate</cell><cell>7</cell><cell>8</cell><cell>Text 200</cell><cell>0.436</cell><cell>0.409</cell><cell>0.402</cell></row><row><cell>albert-base-v2</cell><cell>8</cell><cell>8</cell><cell>Text 150</cell><cell>0.418</cell><cell>0.398</cell><cell>0.397</cell></row><row><cell>funnel-transformer-intermediate</cell><cell>9</cell><cell>8</cell><cell>Text 150</cell><cell>0.405</cell><cell>0.394</cell><cell>0.387</cell></row><row><cell>bert-base-cased</cell><cell>9</cell><cell>8</cell><cell>Text 200</cell><cell>0.383</cell><cell>0.386</cell><cell>0.382</cell></row><row><cell>distilbert-base-cased</cell><cell>6</cell><cell>8</cell><cell>Text 200</cell><cell>0.397</cell><cell>0.371</cell><cell>0.374</cell></row><row><cell>bert-base-cased</cell><cell>10</cell><cell>8</cell><cell>Text 150</cell><cell>0.370</cell><cell>0.368</cell><cell>0.362</cell></row><row><cell>distilbert-base-cased</cell><cell>9</cell><cell>8</cell><cell>Text 150</cell><cell>0.351</cell><cell>0.345</cell><cell>0.345</cell></row><row><cell>distilbert-base-cased</cell><cell>6</cell><cell>8</cell><cell>Title 150</cell><cell>0.354</cell><cell>0.367</cell><cell>0.344</cell></row><row><cell>bert-base-cased</cell><cell>6</cell><cell>8</cell><cell>Title 150</cell><cell>0.375</cell><cell>0.375</cell><cell>0.340</cell></row><row><cell>funnel-transformer-intermediate</cell><cell>8</cell><cell>8</cell><cell>Title 150</cell><cell>0.423</cell><cell>0.329</cell><cell>0.322</cell></row><row><cell>albert-base-v2</cell><cell>6</cell><cell>8</cell><cell>Title 150</cell><cell>0.335</cell><cell>0.341</cell><cell>0.316</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 7</head><label>7</label><figDesc>Task 3A - Ensemble models features analysis: results on dev dataset</figDesc><table><row><cell>Model</cell><cell>Domain</cell><cell>Text</cell><cell>LIWC</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell></row><row><cell>Gradient Boosting</cell><cell>true</cell><cell>true</cell><cell>true</cell><cell>0.428</cell><cell>0.369</cell><cell>0.366</cell></row><row><cell>Gradient Boosting</cell><cell>false</cell><cell>true</cell><cell>true</cell><cell>0.419</cell><cell>0.366</cell><cell>0.364</cell></row><row><cell>Gradient Boosting</cell><cell>false</cell><cell>false</cell><cell>true</cell><cell>0.420</cell><cell>0.346</cell><cell>0.338</cell></row><row><cell>Gradient Boosting</cell><cell>true</cell><cell>false</cell><cell>true</cell><cell>0.393</cell><cell>0.343</cell><cell>0.334</cell></row><row><cell>Random Forest</cell><cell>false</cell><cell>false</cell><cell>true</cell><cell>0.386</cell><cell>0.335</cell><cell>0.319</cell></row><row><cell>Random Forest</cell><cell>false</cell><cell>true</cell><cell>true</cell><cell>0.574</cell><cell>0.325</cell><cell>0.303</cell></row><row><cell>Random Forest</cell><cell>true</cell><cell>true</cell><cell>true</cell><cell>0.524</cell><cell>0.306</cell><cell>0.277</cell></row><row><cell>Random Forest</cell><cell>true</cell><cell>false</cell><cell>true</cell><cell>0.462</cell><cell>0.274</cell><cell>0.226</cell></row><row><cell cols="7">Table 8</cell></row><row><cell cols="7">Task 3A - Submissions official results</cell></row><row><cell cols="4">Model</cell><cell>Prec.</cell><cell>Rec.</cell><cell>F1</cell></row><row><cell cols="4">Gradient Boosting + Domain + Text + LIWC</cell><cell>0.5055</cell><cell>0.4805</cell><cell>0.4680</cell></row><row><cell cols="4">Albert-base + sequence length 150</cell><cell>0.4653</cell><cell>0.4109</cell><cell>0.4237</cell></row><row><cell cols="4">Albert-base + sequence length 200</cell><cell>0.3779</cell><cell>0.3742</cell><cell>0.3691</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/transformers/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/mrm8488/electricidad-base-discriminator</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">These are all the features that this tool provides.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by the Spanish Ministry of Science and Innovation within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32, as well as project EXTRAE II (IMIENS 2019) and the research network AEI RED2018-102312-T (IA-Biomed).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babulkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-72240-1_75</idno>
		<ptr target="https://link.springer.com/chapter/10.1007/978-3-030-72240-1_75" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd European Conference on Information Retrieval, ECIR &apos;21</title>
				<meeting>the 43rd European Conference on Information Retrieval, ECIR &apos;21<address><addrLine>Lucca, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="639" to="649" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Babulkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Modha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kutlu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">S</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th International Conference of the CLEF Association: Access Evaluation Meets Multilinguality, Multimodality, and Visualization, CLEF &apos;2021</title>
				<meeting>the 12th International Conference of the CLEF Association: Access Evaluation Meets Multilinguality, Multimodality, and Visualization, CLEF &apos;2021<address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Detecting check-worthy factual claims in presidential debates</title>
		<author>
			<persName><forename type="first">N</forename><surname>Hassan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tremayne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1835" to="1838" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A context-aware approach for detecting worth-checking claims in political debates</title>
		<author>
			<persName><forename type="first">P</forename><surname>Gencheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference Recent Advances in Natural Language Processing</title>
				<meeting>the International Conference Recent Advances in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="267" to="276" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Copenhagen Team Participation in the Check-Worthiness Task of the Competition of Automatic Identification and Verification of Claims in Political Debates of the CLEF</title>
		<author>
			<persName><forename type="first">C</forename><surname>Hansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Simonsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lioma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CheckThat! Lab</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">UPV-INAOE-Autoritas - Check That: Preliminary Approach for Checking Worthiness of Claims</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ghanem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">6</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rodrigues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Novak</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.02431</idno>
		<title level="m">Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.02931</idno>
		<title level="m">Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Cheema</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hakimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ewerth</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2007.10534</idno>
		<title level="m">Check_square at CheckThat! 2020: Claim Detection in Social Media via Fusion of Transformer and Syntactic Features</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5998" to="6008" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of the CLEF-2021 CheckThat! Lab Task 1 on Check-Worthiness Estimation in Tweets and Political Debates</title>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasanain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hamdan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">S</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Haouari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Yavuz Selim Kartal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Da San Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Míguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elsayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum, CLEF &apos;2021</title>
				<meeting><address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Q</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Vu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.10200</idno>
		<title level="m">BERTweet: A pre-trained language model for English Tweets</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">RoBERTa: A Robustly Optimized BERT Pretraining Approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goodman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.11942</idno>
		<title level="m">ALBERT: A Lite BERT for Self-supervised Learning of Language Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<title level="m">DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.03236</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.10555</idno>
		<title level="m">ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J O</forename><surname>Suárez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Romary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sagot</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.156</idno>
		<idno type="arXiv">arXiv:2006.06202</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1703" to="1714" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Computational Fact Checking from Knowledge Networks</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Ciampaglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shiralkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Rocha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bollen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Menczer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Flammini</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0128193</idno>
		<ptr target="http://dx.plos.org/10.1371/journal.pone.0128193" />
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0128193</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Karadzhov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barrón-Cedeño</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.00341</idno>
		<title level="m">Fully Automated Fact Checking Using External Sources</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">On lying and being lied to: A linguistic analysis of deception in computer-mediated communication</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Hancock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">E</forename><surname>Curry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goorha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Woodworth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Discourse Processes</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="2007">2007</date>
			<publisher>Taylor &amp; Francis</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">The lie detector: Explorations in the automatic recognition of deceptive language</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mihalcea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Strapparava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Association for Computational Linguistics</title>
				<meeting>the ACL-IJCNLP 2009 Conference Short Papers, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="309" to="312" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">B</forename><surname>Gouza</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.08751</idno>
		<title level="m">Fake news detection with deep diffusive network model</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Do not trust the trolls: Predicting credibility in community question answering forums</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mihaylova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Màrquez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shiroya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference Recent Advances in Natural Language Processing</title>
				<meeting>the International Conference Recent Advances in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="551" to="560" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Shu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zafarani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1904.13355</idno>
		<title level="m">The Role of User Profile for Fake News Detection</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Overview of the CLEF-2021 CheckThat! Lab Task 3 on Fake News Detection</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum, CLEF &apos;2021</title>
				<meeting><address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">An exploratory study of covid-19 misinformation on twitter</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dirkson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Majchrzak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Online Social Networks and Media</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page">100104</biblScope>
			<date type="published" when="2021">2021</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">FakeCovid -A Multilingual Cross-domain Fact Check News Dataset for COVID-19</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nandini</surname></persName>
		</author>
		<ptr target="http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf" />
	</analytic>
	<monogr>
		<title level="m">Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.00502</idno>
		<title level="m">AMUSED: An Annotation Framework of Multi-modal Social Media Data</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Task 3: Fake news detection at CLEF-2021 CheckThat!</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Shahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Struß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.4714517</idno>
		<ptr target="https://doi.org/10.5281/zenodo.4714517" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Random forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="5" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Stochastic gradient boosting</title>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0167-9473(01)00065-2</idno>
		<ptr target="https://linkinghub.elsevier.com/retrieve/pii/S0167947301000652" />
	</analytic>
	<monogr>
		<title level="j">Computational Statistics &amp; Data Analysis</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="367" to="378" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Boyd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jordan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Blackburn</surname></persName>
		</author>
		<title level="m">The development and psychometric properties of LIWC2015</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
