UB_ET at CheckThat! 2020: Exploring Ad hoc Retrieval Approaches in Verified Claims Retrieval

Edwin Thuma, Motlogelwa Nkwebi Peace, Leburu-Dingalo Tebo, and Mudongo Monkgogi
Department of Computer Science, University of Botswana
{thumae,motlogel,leburut,mudongom}@ub.ac.bw

Abstract. In this paper, we explore three different ad hoc retrieval approaches to rank verified claims, so that those that verify the input claim are ranked on top. In particular, we deploy the DPH Divergence from Randomness (DFR) term weighting model to rank the verified claims. In addition, we deploy the Sequential Dependence (SD) variant of the Markov Random Fields (MRF) model for term dependence to re-rank documents (verified claims) that have query terms (input claim) in close proximity. Moreover, we deploy LambdaMART, a learning to rank algorithm that uses machine learning techniques to learn an appropriate combination of features into an effective ranking model.

Keywords: Check-Worthiness, Claim Retrieval, Proximity Search, Learning to Rank, Ad hoc Retrieval

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

1 Introduction

Information posted on social media platforms such as Twitter is often not fact-checked by an authoritative entity before being published [2, 11]. In some instances, these posts come from unreliable sources whose main objective is to disinform the general public. Such actions often yield undesirable results. For example, disinformation is often used in political campaigns in order to influence the outcome of elections. It is for this reason that the Information Retrieval (IR) and natural language processing communities have invested significant effort in developing techniques to address disinformation, misinformation, factuality and credibility [2, 11]. This is evidenced by the CheckThat! lab (https://sites.google.com/view/clef2020-checkthat), which runs under the Conference and Labs of the Evaluation Forum (CLEF, https://clef2020.clef-initiative.eu/). The CheckThat! lab at CLEF 2020 is the third edition of the lab, following CheckThat! 2018 and CheckThat! 2019. The main purpose of these labs is to foster research into techniques that enable the identification and verification of claims. In this paper, we present the results of our participation in the CheckThat! 2020 Task 2: Claim Retrieval, where we explore three different ad hoc retrieval approaches to rank verified claims, so that those that verify the input check-worthy tweet are ranked on top.

2 Background

In this section, we present a brief but essential background on the different ad hoc retrieval approaches used in our investigation. In particular, we start by providing a description of the DPH term weighting model in Section 2.1. This is followed by a description of the learning to rank techniques in Section 2.2.

2.1 DPH Term Weighting Model

For all our experimental investigation, we used the parameter-free DPH term weighting model from the Divergence from Randomness (DFR) framework [1].
The DPH term weighting model calculates the score of a document d for a given query Q as follows:

score_{DPH}(d, Q) = \sum_{t \in Q} qtf \cdot norm \cdot \Big( tf \cdot \log\big( ( tf \cdot \tfrac{avg_l}{l} ) \cdot \tfrac{N}{tf_c} \big) + 0.5 \cdot \log\big( 2 \cdot \pi \cdot tf \cdot (1 - t_{MLE}) \big) \Big)    (1)

where qtf, tf and tf_c are the frequencies of the term t in the query Q, in the document d and in the collection C respectively. N is the number of documents in the collection C, avg_l is the average length of documents in the collection C, and l is the length of the document d. Moreover, t_{MLE} = \tfrac{tf}{l} and norm = \tfrac{(1 - t_{MLE})^2}{tf + 1}.

2.2 Learning to Rank Approach

Learning to rank techniques are algorithms that use machine learning techniques to learn an appropriate combination of features into an effective ranking model [4]. This effective ranking model can be learnt through the following steps [5, 6]:

1. Top K retrieval: Using a set of training queries that have relevance assessments, retrieve a sample of k documents using an initial weighting model such as DPH.
2. Feature extraction: For each document in the retrieved sample, extract a set of features. These features can either be query-dependent (term weighting models, term dependence models) or query-independent (click count, fraction of stopwords). The feature vector for each document is labelled according to the already existing relevance judgements.
3. Learning: Learn an effective ranking model by deploying an effective learning to rank technique on the feature vectors of the top k documents.

The learned model can be deployed in a retrieval setting as follows:

4. Top K retrieval: For each unseen query, the top k documents are retrieved using the same retrieval strategy as in step (1).
5. Feature extraction: A set of features is extracted for each document in the sample of k documents. These features should be the same as those extracted in step (2).
6. Re-rank the documents: Re-rank the documents for the query by applying the learned model on the feature vector of every document in the sample. The final ranking of the documents is obtained by sorting the predicted scores in descending order.

In this work, we deploy LambdaMART [3], which is a tree-based learner. A tree-based learner builds a set of regression trees T. For each tree t, the nodes are traversed according to decisions based on the vector of feature values f_d of the document, and the leaf node reached gives that tree's contribution to the score; the final score of a document d is the sum of these contributions over all trees [3, 6]. This can be expressed as:

score(d, Q) = \sum_{t \in T} t(f_d)    (2)

3 Experimental Setting

Retrieval Platform: For all our experiments, we used Terrier-4.2 (http://terrier.org/) [8], an open source Information Retrieval (IR) platform. All the documents used in this study were pre-processed before indexing: this involved tokenising the text and stemming each token using the full Porter stemming algorithm [10]. We indexed the collection using blocks in order to save positional information with each term.

Training Learning to Rank Techniques: For our learning to rank approach, we used the Terrier-4.2 Fat framework [6] (http://terrier.org/docs/v4.0/learning.html). Fat is a method that allows many features to be computed within one run of Terrier. To train and test LambdaMART, we used the default parameter values of the algorithm.

4 Description of the Different Runs

T2-EN-UB_ET-DPH: For all our runs, we used the parameter-free DPH Divergence from Randomness term weighting model in the Terrier-4.2 IR platform to score and rank the documents (verified claims).
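To make this first-pass ranking concrete, the sketch below shows how such a DPH run over the verified claims could be reproduced with PyTerrier, the Python interface to Terrier, rather than the Terrier-4.2 setup used in our experiments; the claim texts, index path and tweet text are illustrative placeholders only.

```python
# Minimal sketch of a DPH first-pass ranking over verified claims, assuming PyTerrier
# (pip install python-terrier); claim texts, paths and the query are placeholders.
import pyterrier as pt

if not pt.started():
    pt.init()

# Verified claims to be indexed; each record carries a 'docno' identifier and its 'text'.
verified_claims = [
    {"docno": "vclaim-001", "text": "Example verified claim about topic X."},
    {"docno": "vclaim-002", "text": "Another verified claim about topic Y."},
]

# Index with block (positional) information so that proximity models can be applied later;
# Terrier's default term pipeline (stopword removal and Porter stemming) is used.
indexer = pt.IterDictIndexer("./vclaim_index", blocks=True)
index_ref = indexer.index(verified_claims)

# Rank the verified claims for an input check-worthy tweet with the parameter-free
# DPH weighting model (the analogue of run T2-EN-UB_ET-DPH).
dph = pt.BatchRetrieve(index_ref, wmodel="DPH")
results = dph.search("check worthy tweet about topic X")
print(results[["docno", "score", "rank"]])
```

Because DPH is parameter-free, this first stage requires no tuning.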
T2-EN-UB_ET-DPH_LTR: We used T2-EN-UB_ET-DPH as the baseline system. As an improvement, we deployed a learning to rank technique. For our learning to rank technique, we used the training and development tweets with their qrels for training and validation. We used the Terrier-4.2 Fat framework to retrieve 1000 documents for each topic (tweet) using the DPH term weighting model, and then calculated the additional query-dependent features listed in Table 1. Using these features, we used Jforests to learn a LambdaMART model. We then applied this learned model on the test tweets to generate a final ranking.

Table 1. All query-dependent (QD) features used in this work.

Features                                                        Type  Total
Weighting models (BM25, PL2 and TF-IDF)                         QD    3
Proximity (dependence) models (DFRDependenceScoreModifier [9]
  and MRFDependenceScoreModifier [7])                           QD    2
Total                                                                 5

T2-EN-UB_ET-DPH_MRF: We used T2-EN-UB_ET-DPH as the baseline system. As an improvement, we deployed the Sequential Dependence (SD) variant of the Markov Random Field model for term dependence [7] to re-rank documents (verified claims) that have query terms (input claim) in close proximity. Sequential Dependence only assumes a dependence between neighbouring query terms. A sketch of how both re-ranking runs could be assembled is given below.
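The following sketch illustrates how these two re-ranking runs could be put together, again using PyTerrier in place of the Terrier-4.2 Fat setup and LightGBM's lambdarank objective as a stand-in for the Jforests LambdaMART learner used in our experiments; the index path is assumed to come from the previous sketch, and the topics and qrels mentioned in the comments are placeholders for the lab data.

```python
# Sketch of the two re-ranking runs, assuming the block index built in the previous
# snippet; LightGBM's LambdaMART (lambdarank) stands in for Jforests, and the feature
# strings follow PyTerrier/Terrier naming conventions.
import pyterrier as pt
import lightgbm as lgb

if not pt.started():
    pt.init()

index_ref = "./vclaim_index/data.properties"  # block index from the previous sketch

# T2-EN-UB_ET-DPH_MRF: rewrite each query with the sequential-dependence variant of the
# MRF model (proximity operators over neighbouring query terms), then score with DPH.
mrf_run = pt.rewrite.SequentialDependence() >> pt.BatchRetrieve(index_ref, wmodel="DPH")

# T2-EN-UB_ET-DPH_LTR: DPH first-pass sample with additional query-dependent features
# (Table 1); the DFR/MRF dependence score modifiers could be added as further features.
features = pt.FeaturesBatchRetrieve(
    index_ref,
    wmodel="DPH",
    features=["WMODEL:BM25", "WMODEL:PL2", "WMODEL:TF_IDF"],
)
lmart = lgb.LGBMRanker(objective="lambdarank")  # stand-in for the Jforests learner
ltr_run = features >> pt.ltr.apply_learned_model(lmart, form="ltr")

# With the training and development tweets and qrels loaded as PyTerrier topics/qrels
# DataFrames, the learned model would be trained and applied along these lines:
#   ltr_run.fit(train_topics, train_qrels, dev_topics, dev_qrels)
#   final_ranking = ltr_run.transform(test_topics)
```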
5 Results and Discussion

Table 2. Task 2, English: Performance for all 3 runs.

Run ID                   MAP@1   MAP@3   MAP@5   P@1     P@3     P@5
T2-EN-UB_ET-DPH          0.843   0.868   0.873   0.840   0.300   0.185
T2-EN-UB_ET-DPH_LTR      0.818   0.862   0.864   0.815   0.307   0.186
T2-EN-UB_ET-DPH_MRF      0.838   0.865   0.869   0.835   0.300   0.184

Table 2 presents our evaluation results. The official evaluation measure for Task 2: Claim Retrieval is MAP@k, where k = 5. We also report Precision@k. The results of this study suggest that ad hoc retrieval approaches such as term weighting models, proximity (dependence) models and learning to rank techniques can be used to rank verified claims for a given check-worthy tweet. Overall, our primary submission T2-EN-UB_ET-DPH_LTR ranked third out of 10 submissions. It is worth noting that the attempt to improve the retrieval performance using a learning to rank technique resulted in a degradation in performance. An examination of the data revealed that, for the majority of check-worthy tweets, there was a single verified claim. This lack of sufficient training data could have resulted in the degradation in retrieval performance. For example, after performing a first-pass retrieval with DPH and attempting to improve the ranking with our learned ranking model, some verified claims that verify the input claim ranked lower than in the previous ranking.

6 Conclusion

In this paper, three different ad hoc retrieval approaches were explored to determine their effectiveness in ranking verified claims so that those that verify the input claim are ranked on top. The results of this study suggest that term weighting models such as DPH can be used to rank verified claims for a given check-worthy tweet. In our attempt to improve the retrieval effectiveness using a learning to rank technique, we noticed a degradation in retrieval performance. In future work, we will explore using more training data in our learning to rank technique, coupled with additional query-dependent and query-independent features. Similarly, re-ranking the verified claims using the Markov Random Field model for term dependence resulted in a degradation in performance. In our experiments, default parameter settings were used. Further research could usefully explore different parameter settings, such as varying the window size, in order to improve the retrieval performance.

References

1. G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In Proceedings of the 16th Text REtrieval Conference (TREC-2007), pages 1–10, Gaithersburg, Md., USA, 2007. Text REtrieval Conference (TREC).
2. A. Barrón-Cedeño, T. Elsayed, P. Nakov, G. Da San Martino, M. Hasanain, R. Suwaileh, F. Haouari, N. Babulkov, B. Hamdan, A. Nikolov, S. Shaar, and Z. Sheikh Ali. Overview of CheckThat! 2020: Automatic identification and verification of claims in social media.
3. Y. Ganjisaffar, R. Caruana, and C. Lopes. Bagging gradient-boosted trees for high precision, low variance ranking models. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11, pages 85–94, New York, NY, USA, 2011. ACM.
4. T.-Y. Liu. Learning to Rank for Information Retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, June 2009.
5. C. Macdonald, R. L. Santos, and I. Ounis. The whens and hows of learning to rank for web search. Information Retrieval, 16(5):584–628, October 2013.
6. C. Macdonald, R. L. T. Santos, I. Ounis, and B. He. About learning models with multiple query-dependent features. ACM Transactions on Information Systems (TOIS), 31(3):11:1–11:39, August 2013.
7. D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In R. A. Baeza-Yates, N. Ziviani, G. Marchionini, A. Moffat, and J. Tait, editors, SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, August 15-19, 2005, pages 472–479. ACM, 2005.
8. I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and D. Johnson. Terrier Information Retrieval Platform. In Proceedings of the 27th European Conference on IR Research, volume 3408 of Lecture Notes in Computer Science, pages 517–519, Berlin, Heidelberg, 2005. Springer-Verlag.
9. V. Plachouras and I. Ounis. Multinomial Randomness Models for Retrieval with Document Fields. In Proceedings of the 29th European Conference on IR Research, pages 28–39, Berlin, Heidelberg, 2007. Springer-Verlag.
10. M. F. Porter. An Algorithm for Suffix Stripping. Readings in Information Retrieval, 14(3):313–316, 1997.
11. S. Shaar, A. Nikolov, N. Babulkov, F. Alam, A. Barrón-Cedeño, T. Elsayed, M. Hasanain, R. Suwaileh, F. Haouari, G. Da San Martino, and P. Nakov. Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media.