TIET at CLEF CheckThat! 2020: Verified Claim Retrieval

Utsav Shukla and Aayushmaan Sharma
Thapar Institute of Engineering and Technology, India
{ushukla_be17,aaayushmaan_be17}@thapar.edu

Abstract. The internet is the all-singing, all-dancing, multi-pronged tool of modern man, but for all its many benefits it also presents us with daunting challenges. Users are bombarded with information, and it is becoming increasingly difficult to tell fact from fiction. Consequently, research aimed at better equipping users against false information and indoctrination has intensified over the past few years. In this paper, we propose a system for the automatic retrieval of supporting claims for a given input claim. Our system uses Elasticsearch together with a transformer model to generate a similarity score in two steps. We submitted our system to the CLEF-2020 CheckThat! Lab's Task 2: Verified Claim Retrieval, and the primary evaluation gave promising results, which we present, analyse and improve upon in this paper.

Keywords: Fact checking · Information Retrieval · Claim Retrieval · Natural Language Processing

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

1 Introduction

The internet has come a long way in the past couple of decades in terms of acceptability and reach. With easy access to low-cost internet and the wide appeal of social media services like Twitter, there is an overabundance of unverified information online. This glaring lack of factual validation, combined with the lucrative carte blanche that the internet allows, has given rise to the taxing problems of gross misinformation and disregard for factual correctness. Needless to say, this has been a green pasture for feckless rumour-mongers and ill-intentioned propagandists. So, now more than ever, there is an exigent demand for automatic fact checking to safeguard internet users from misinformation and indoctrination. To that end, multiple investigative-journalistic fact-checking organisations such as Snopes, IFCN and Full Fact have emerged. However, manual fact checking is a demanding and slow process: researching and writing about a single claim can take a full day [1].

Given these problems with manual fact checking, automated fact checking can be very advantageous, either fully automating or assisting the existing pipelines. There has been a significant amount of work in the field of automated fact checking in recent years, and various tasks related to it have been formulated [2]. Some works also use external knowledge sources (the Web, in their case) for a fully automated fact-checking pipeline [3]. A very similar methodology that aims at retrieving already fact-checked claims was proposed in a recent work [4]. To contribute to the ongoing research in this area, we, team Trueman from TIET, participated in the CLEF CheckThat! Lab's Task 2: Verified Claim Retrieval [5] [6].

In this paper, we focus on one of the key components of a fact-checking pipeline: claim retrieval. The principles governing our approach are simplicity, directness and scalability. Our system uses Elasticsearch together with transformer-based similarity between the tweet and the claim to generate a score in two steps: exact matching of words using BM25 [7], followed by an NLP-based model that checks for semantic similarity.
The initial evaluation results were good for the primary task: fact checking tweets by finding related, corroborating claims in the corpus provided for the task. To improve accuracy further, we undertook a manual analysis of faulty results and subsequently designed a suitable experiment to troubleshoot the problems we found. Code for our experiments and our results file is available at https://github.com/us241098/checkthat2020_submission.

2 Methodology

We propose an unsupervised system for the task. Our submitted system combines the Elasticsearch BM25 score with the cosine similarity between the BERT encodings of the two text pieces. Figure 1 shows the working of our submitted system.

Fig. 1: Our submitted system

2.1 Indexing the verified claims:

First, all verified claims are indexed in Elasticsearch. Ideally, some preprocessing such as removing URLs, hashtags and @ signs should be performed before indexing, but we skip this step as the verified claims in the corpus are mostly free of these. Creating this index ensures quick retrieval of related claims at query time.

2.2 Ranking using BM25:

At query time, we retrieve the top 1000 matching claims for the query tweet along with their BM25 scores. BM25 scores depend on the exact match between the words of the claim and the tweet and are assigned to every claim-tweet pair. However, our initial experiments showed that using these scores alone to rank claims gives underwhelming results; hence, we apply transformer-based similarity in the next step and update the ranks after adding the scores it assigns.

2.3 Transformer-based model:

Since BM25 similarity relies on exact word matches between the text pieces, it fails to capture the semantic similarity between claims and tweets. To compensate for this, we use a BERT model fine-tuned on NLI data to generate encodings for each text piece, and we measure the similarity between two encodings by their cosine similarity. Based on this similarity we update the scores from BM25 and subsequently update the ranks. In our implementation we use the sentence-transformers Python package with the bert-base-nli-mean-tokens model.
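To make the pipeline concrete, the sketch below shows the two retrieval steps under stated assumptions: it relies on the elasticsearch and sentence-transformers Python packages, and the index name, field name and the unweighted sum used to combine the BM25 and cosine scores are illustrative choices rather than our exact submitted configuration.

```python
import numpy as np
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch()
encoder = SentenceTransformer("bert-base-nli-mean-tokens")

def index_verified_claims(claims):
    """Index (claim_id, claim_text) pairs so BM25 candidates can be fetched quickly."""
    for claim_id, text in claims:
        es.index(index="vclaims", id=claim_id, body={"text": text})  # index/field names assumed

def rank_claims(tweet, k=1000):
    # Step 1: BM25 retrieval of the top-k candidate claims for the tweet.
    res = es.search(index="vclaims",
                    body={"query": {"match": {"text": tweet}}, "size": k})
    hits = res["hits"]["hits"]
    # Step 2: cosine similarity between BERT encodings of the tweet and each claim.
    claim_texts = [h["_source"]["text"] for h in hits]
    tweet_vec = encoder.encode([tweet])[0]
    claim_vecs = encoder.encode(claim_texts)
    cosine = claim_vecs @ tweet_vec / (
        np.linalg.norm(claim_vecs, axis=1) * np.linalg.norm(tweet_vec))
    # Combine the two scores (a plain sum here) and re-rank the candidates.
    scored = [(h["_id"], h["_score"] + c) for h, c in zip(hits, cosine)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In practice the BM25 and cosine scores live on different scales, so normalising or weighting them before summation is a natural refinement of this sketch.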
3 Results and Improvements

In this section, we discuss the results of our submission to Task 2 and compare them to the best performing system. In the subsequent subsection, we discuss the cases where our system performs poorly and attempt to improve our submitted system. Table 1 shows our system's results; the metrics on which systems were evaluated were MAP@1, @3, @5 and @all, Precision@1, @3 and @5, and Reciprocal Rank@1, @3 and @5. Systems were ranked on MAP@5, and our submission was ranked 6th among 9 participants.

Table 1: Our results and comparison with the first-placed system.

System                 MAP@1  MAP@3  MAP@5  MAP@all  P@1    P@3    P@5    RR@1   RR@3   RR@5
Buster.AI (1st)        0.897  0.926  0.929  0.929    0.895  0.32   0.195  0.895  0.923  0.927
trueman (6th)          0.743  0.768  0.773  0.782    0.74   0.267  0.164  0.74   0.766  0.771
trueman (unsubmitted)  0.757  0.797  0.80   0.808    0.759  0.283  0.173  0.759  0.798  0.802

3.1 Analysis

We manually analysed our results and found that our system made errors in the following cases:

Proper noun overlap: We observed that our system sometimes fails by returning claims that do not contain the same proper nouns as the query, and sometimes lets proper nouns slip by. Table 2 shows some examples of the faulty results vis-à-vis proper noun overlap.

Table 2: Errors related to proper noun overlap (in the original layout, overlapped proper nouns were set in bold and missed proper nouns in italics).

Tweet query (1055): @HydroxCookie where are you're cookies made? Oreo moved to Mexico. Made in USA wins my business.
VClaim (6592): Nabisco closed their Chicago plant and moved all production of Oreos to Mexico.
Model's pick (5891): Japan renamed a town 'Usa' so that it could legitimately stamp its exports 'Made in USA.'

Tweet query (1051): Good to see the Nishioka shot being replayed again and again... that's what we should be talking about... @MikeCTennis
VClaim (10082): Video shows an amazing behind-the-back return by Japanese tennis player Yoshihito Nishioka.
Model's pick (9254): The comedian and actor Tim Allen wrote a lengthy Facebook post that attacked liberals and Democratic politicians and was shared widely in August 2019.

Hyperlinks: We found that in many cases the textual information in a tweet is very vague and crucial information about the tweet is contained in a hyperlink. To address this, we sought to extract article titles from the hyperlinks. Some erroneous results involving hyperlinks are shown in Table 3.

Table 3: In many cases, the information needed to verify a claim is in the hyperlink.

Tweet query (1013): Is @jacindaardern willing to denounce this legislation of child sexual abuse? https://t.co/6YMlJiO8zr
VClaim (5691): In August 2018, French politicians passed a law which stated that a child is capable of consenting to having sex with an adult.
Model's pick (676): The U.S.'s leading group of pediatricians issued a strong statement condemning tolerance of gender dysphoria in children.

Tweet query (1048): Three. million. gallons. Our story from Colorado on the Animas River disaster: http://t.co/s40r7orNj4 pic.twitter.com/Ey26EaBEhK
VClaim (2558): A spill by the Environmental Protection Agency rendered the normally pristine blue Animas River a terrifying mustard yellow.
Model's pick (1838): A new study reporting on the 2015 death of a Colorado infant claims the event was the world's first documented pot overdose.

Tweet query (1018): Holy crap. I have never, in my entire career as an ant researcher, seen *anything* like this. https://t.co/jIjTOo3fZc
VClaim (716): There are small islands of fire ants floating in the floodwaters from Tropical Storm Harvey.
Model's pick (3125): Actor Kurt Russell said that he has never seen a man as dedicated and determined as President Trump.

Hashtags (#) and at (@) signs: Upon analysis, we found that our system had major trouble when crucial proper nouns were contained inside a hashtag or an @ mention. Some sample tweets showcasing this problem are listed in Table 4.

Table 4: Errors due to at (@) signs and hashtags (#).

Tweet query (1016): @italiaricci You should get one for your house! #PizzaVendingMachine #NowIWantOne pic.twitter.com/3SV5Z9bAuX
VClaim (10315): You can now order pizza from pizza vending machines.
Model's pick (6761): On 30 July 2008, The Cheesecake Factory restaurants will be selling cheesecake for $1.50 per slice. Cheesecake Factory Serves Up a Delicious 30th Anniversary Celebration.

Tweet query (1045): We booked the one airline that doesn't give military free bags ???? @SpiritAirlines
VClaim (1177): A Spirit Airlines employee was rude to a soldier, charged him for a carry-on bag, and told his father that the airline 'doesn't cater to the military.'
Model's pick (3391): An airline promotion allows husbands to take their wives along on business trips for free, but a survey later conducted by the airline finds that 95% of the wives were unaware of the promotion.

Tweet query (1084): #Best news you'll hear all day! #ScarfaceRemake, starring non other than @LeoDiCaprio announced for 2016!! #CantWait
VClaim (815): 'Scarface' is being remade with Leonardo DiCaprio cast as Tony Montana.
Model's pick (1638): A live poll conducted by ABC News in August 2016 shows Donald Trump, Jill Stein, and Gary Johnson all well ahead of Hillary Clinton.

3.2 Improvements to our submitted system:

Learning from the above analysis, we added a new module to our submitted system that takes the removal of special symbols and proper noun overlap into account and updates the similarity scores accordingly. Figure 2 shows the improved system.

Fig. 2: Improved (unsubmitted) system

Since people more often than not put information that is crucial for retrieval inside hashtags and @ mentions, our improved system separates out the words contained in these tokens using regular expressions and prioritises them when checking for overlaps. Ideally, the overlap of all proper nouns should be prioritised, and to that end we added a module that rewards proper noun overlap, which we measure using Levenshtein distance [8]: the more proper nouns overlap between the tweet and the claim, the higher the reward. Table 1 shows the performance of our improved system, trueman (unsubmitted); it performs slightly better than our submitted system on all evaluation metrics.
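To make the new module concrete, the following is a minimal sketch of the overlap reward, assuming the python-Levenshtein package. Splitting hashtags and mentions on capitalisation, approximating proper nouns by capitalised tokens, and the 0.8 matching threshold are illustrative assumptions rather than our exact implementation.

```python
import re
import Levenshtein  # python-Levenshtein package

def split_tags(text):
    """Pull words out of #hashtags and @mentions, e.g. '#PizzaVendingMachine' -> ['Pizza', 'Vending', 'Machine']."""
    words = []
    for tag in re.findall(r"[#@](\w+)", text):
        words.extend(re.findall(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|[a-z]+|\d+", tag))
    return words

def proper_nouns(text):
    """Crude proper-noun proxy: capitalised tokens of length two or more."""
    return re.findall(r"\b[A-Z][\w'-]+", text)

def overlap_reward(tweet, claim, threshold=0.8):
    """Reward near-matching proper nouns using a normalised Levenshtein similarity."""
    tweet_nouns = set(proper_nouns(tweet) + split_tags(tweet))
    claim_nouns = set(proper_nouns(claim))
    reward = 0.0
    for t in tweet_nouns:
        for c in claim_nouns:
            sim = 1 - Levenshtein.distance(t.lower(), c.lower()) / max(len(t), len(c), 1)
            if sim >= threshold:
                reward += sim
                break  # count each tweet-side proper noun at most once
    return reward
```

The resulting reward can then be added to the combined BM25 and cosine score of each candidate claim before the final re-ranking.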
4 Conclusion and future work

In this paper we proposed a system for the retrieval of verified claims given a query and presented our results at the CLEF-2020 CheckThat! Lab's Verified Claim Retrieval task. We strove to resolve some of the issues that our system faced initially by annotating and analysing faulty results, and managed to improve our performance. We nevertheless believe there is room for future work to resolve the remaining problems and to broaden the model's knowledge source towards practical application. The problem the model faces with hyperlinks still needs work to be fully resolved. Since most of the shared links point to news or media websites, we believe that retrieving titles and bylines from the hyperlinks will improve the system's performance considerably; a sketch of this idea is given below. Incorporating the image modality to analyse tweets with pictures also merits future work. For this research, we treated fact-checking websites as the knowledge source for validating claims. We aspire to do the same, regarding fact-checking and investigative-journalism websites as a knowledge source, but in an Indian context. This would improve our model's contextual sensibility and significantly improve its chances of identifying factual incorrectness and misinformation. We believe these directions merit significant future work and further research.
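As a pointer for the hyperlink-related future work described above, the following is a minimal sketch of title retrieval from links embedded in a tweet, assuming the requests and beautifulsoup4 packages; it illustrates the idea rather than a component of our submitted system.

```python
import re
import requests
from bs4 import BeautifulSoup

def expand_tweet_with_titles(tweet_text, timeout=5):
    """Append the <title> of each linked page to the tweet text before retrieval."""
    titles = []
    for url in re.findall(r"https?://\S+", tweet_text):
        try:
            resp = requests.get(url, timeout=timeout, allow_redirects=True)  # follows t.co redirects
            soup = BeautifulSoup(resp.text, "html.parser")
            if soup.title and soup.title.string:
                titles.append(soup.title.string.strip())
        except requests.RequestException:
            continue  # unreachable or non-HTML link: leave the tweet text unchanged
    return " ".join([tweet_text] + titles)
```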
References

[1] Naeemul Hassan et al. "The quest to automate fact-checking". In: Proceedings of the 2015 Computation + Journalism Symposium. 2015.
[2] James Thorne and Andreas Vlachos. "Automated Fact Checking: Task Formulations, Methods and Future Directions". In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: Association for Computational Linguistics, Aug. 2018, pp. 3346-3359. URL: https://www.aclweb.org/anthology/C18-1283.
[3] Georgi Karadzhov et al. "Fully Automated Fact Checking Using External Sources". In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017. Varna, Bulgaria: INCOMA Ltd., Sept. 2017, pp. 344-353. DOI: 10.26615/978-954-452-049-6_046. URL: https://doi.org/10.26615/978-954-452-049-6_046.
[4] Shaden Shaar et al. "That is a Known Lie: Detecting Previously Fact-Checked Claims". In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, July 2020, pp. 3607-3618. DOI: 10.18653/v1/2020.acl-main.332. URL: https://www.aclweb.org/anthology/2020.acl-main.332.
[5] Alberto Barrón-Cedeño et al. Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. 2020. arXiv: 2007.07997 [cs.CL].
[6] Shaden Shaar et al. "Overview of CheckThat! 2020 English: Automatic Identification and Verification of Claims in Social Media". In:
[7] Stephen Robertson and Hugo Zaragoza. "The Probabilistic Relevance Framework: BM25 and Beyond". In: Foundations and Trends in Information Retrieval 3 (Jan. 2009), pp. 333-389. DOI: 10.1561/1500000019.
[8] L. Yujian and L. Bo. "A Normalized Levenshtein Distance Metric". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 29.6 (2007), pp. 1091-1095.