Sharing retrieved information using Linked Credibility Reviews

Ronald Denaux, Jose Manuel Gomez-Perez
Expert System, Madrid, Spain
rdenaux@expert.ai (R. Denaux); jmgomez@expert.ai (J. M. Gomez-Perez)
ORCID: 0000-0001-5672-9915 (R. Denaux); 0000-0002-5491-6431 (J. M. Gomez-Perez)

ROMCIR 2021: Workshop on Reducing Online Misinformation through Credible Information Retrieval, held as part of ECIR 2021: the 43rd European Conference on Information Retrieval, March 28 - April 1, 2021, Lucca, Italy (Online Event)

Abstract

In recent years, the advent of deep learning for NLP has enabled the accurate retrieval of semantically similar content. Such retrieval, along with check-worthiness and stance detection, is crucial for identifying misinformation and linking it to relevant verified information. While sources of credibility signals (and methods to retrieve them) are abundant on the web, they vary greatly in quality and relevance and can be quite scarce for specific claims. These special requirements for such IR systems suggest the need for good abstractions to represent relevant aspects like the credibility of retrieved information and the confidence of the automated systems that retrieved it. In this paper, we (i) summarise Linked Credibility Reviews, existing work that provides a conceptualisation and exchange format for representing the credibility of retrieved verified information, and (ii) discuss the role this conceptualisation can play in information retrieval systems for reducing online misinformation.

Keywords
misinformation detection, shared conceptualisation, information exchange, distributed information retrieval

1. Introduction

The Web and social media have ushered in an era of ultra-fast spreading of messages without the need for centralized media like publishing houses, newspapers, radio or TV channels. However, this lack of centralization also entails a lack of editorial and quality control over the messages that spread online. This misinforming capacity of the web and social media is increasingly being exploited and is having a detrimental effect on society, as evidenced by problems like political polarization and interference in democratic and policymaking processes.

Information retrieval is bound to play a crucial role in reducing online misinformation, as it has the potential to bring back some of the quality control that was lost in the transition from traditional media to the web. However, traditional information retrieval metrics and approaches cannot be applied directly if they are to be used to spot and limit the spread of misinformation. Additional aspects like credibility and harmfulness have to be considered when determining the relevancy of results, and further requirements like explainability, reproducibility, trust and decentralisation also come into play. All of this means that datasets and information retrieval systems need to be aware of the wider online context where they are deployed, which requires a well-designed conceptual model and vocabulary to publish and share such artifacts online. In this paper, we summarize existing work on such conceptual models and vocabularies produced recently by the semantic web research community.
Rather than presenting new research, our intention is to build a bridge between the information retrieval and semantic web communities, as we believe such collaboration is necessary to successfully tackle misinformation online.

2. IR Requirements for Misinformation Reduction

As mentioned above, information retrieval systems for online misinformation reduction have several new requirements in addition to those of standard IR systems. In this section, we enumerate and describe these additional requirements.

1. New relevancy aspects: In standard information retrieval, content (and metadata) similarity are the main aspects that determine the relevancy of retrieved results. In the context of misinformation, aspects like credibility, stance and harmfulness also need to be considered.

2. Subjectivity: Aspects like credibility and harmfulness are subjective, as they depend on the content being evaluated, the evaluation method and the background information used to evaluate both aspects.

3. Explainability: IR systems should be able to explain why retrieved content was deemed relevant, credible or harmful.

4. Reproducibility: When possible, the methods used should be reproducible by third parties, especially end users. This allows end users to verify the methods and results even if they are not aligned with their subjective views on, for example, credibility.

5. Trust: Since full reproducibility or explainability are not always possible (or desirable), the final credibility sources should be inspectable. This allows end users to build (or withdraw) trust in those sources.

6. Decentralizable: Systems that aim to aid in online misinformation reduction should be designed with web scale in mind. Similarly, the subjectivity and trust requirements mean that a single source of credible information is neither possible nor desirable. All of this is best achieved if such systems are designed to be decentralizable.

All these requirements mean that IR systems for misinformation reduction have to be designed to address these needs. In particular, providing a list of document identifiers (and their credibility) as the output of an IR system is clearly insufficient. Recent work by Denaux and Gomez-Perez [1] proposed a conceptual model and vocabulary that can be used, which we summarise next.

3. Credibility Reviews

Linked Credibility Reviews (LCR) [1] is a linked data model for composable and explainable misinformation detection. Its key insight is that calculations of credibility are ultimately subjective and have to be modeled accordingly. This subjectivity is achieved by modeling steps in the information retrieval and credibility assessment process as "Reviews". The approach can be described at a conceptual level and can be implemented by extending the Schema.org vocabulary [2] (see also https://schema.org/).

3.1. Conceptual Model

A Credibility Review (CR) is an extension of the generic Review data model defined in Schema.org. A Review R can be conceptualised as a tuple (𝑑, 𝑟, 𝑝) where R:

• reviews a data item 𝑑, via the property itemReviewed; this can be any data node (e.g. an article, claim or social media post).
• assigns a numeric or textual rating 𝑟 to (some, often implicit, reviewAspect of) 𝑑, via the property reviewRating.
• optionally provides provenance information 𝑝, e.g. via the properties author and isBasedOn.

A Credibility Review (CR) is a subtype of Review, defined as a tuple ⟨𝑑, 𝑟, 𝑐, 𝑝⟩, where the CR:

• must have a rating 𝑟 with reviewAspect credibility; the rating is recommended to be expressed as a numeric value in the range [−1, 1] and is qualified with a rating confidence 𝑐 (in the range [0, 1]).
• must provide provenance information 𝑝, which includes:
  – the credibility signals (CS) used to derive the credibility rating, which can be either (i) Reviews of data items relevant to 𝑑, or (ii) ground credibility signals (GCS): resources (which are not CRs) in databases curated by a trusted person or organization.
  – the author of the review, which can be a person, an organization or a bot. Bots are automated agents that produce CRs.

A minimal sketch of what a CR can look like as linked data is shown below.
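To make the model concrete, the following is a hypothetical Credibility Review serialised in the JSON-LD style used by schema.org, written here as a Python dict. All concrete values are invented for illustration, and the property name ratingConfidence is our assumption; the exact extension vocabulary is defined in [1] (see also Figure 1 below).

    # A hypothetical Credibility Review as a JSON-LD-style Python dict.
    # Values are invented; "ratingConfidence" is an assumed property name,
    # see [1] for the actual schema.org extension.
    credibility_review = {
        "@context": "http://schema.org",
        "@id": "https://example.org/reviews/cr-1",
        "@type": "CredibilityReview",           # subtype of schema.org Review
        "itemReviewed": {                       # the reviewed data item d
            "@type": "SocialMediaPosting",
            "url": "https://example.org/some-tweet",
        },
        "reviewRating": {                       # the rating r, qualified by c
            "@type": "Rating",
            "reviewAspect": "credibility",
            "ratingValue": -0.6,                # in [-1, 1]: not very credible
            "ratingConfidence": 0.8,            # in [0, 1]: assumed name
        },
        "author": {                             # provenance p: a bot author
            "@type": "SoftwareApplication",
            "name": "example-credibility-bot",
        },
        "isBasedOn": [                          # provenance p: signals used
            "https://example.org/claim-reviews/42",
        ],
    }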
3.2. Vocabulary to share Credibility Reviews

While Schema.org provides most of the vocabulary needed to describe CRs, Denaux and Gomez-Perez had to extend it slightly to be able to express the proposed model. The relevant fragment of Schema.org and the proposed extensions are depicted in Figure 1, which focuses on CRs for textual web content (although the model is also applicable to other modalities).

Figure 1: Credibility Review data model, extending schema.org.

Besides the conceptual model and vocabulary, the authors of [1] also propose generic strategies for retrieving relevant credibility signals and for propagating both the credibility and confidence values.

4. Acred: example implementation

Acred, also presented in [1], is an example implementation of the Linked Credibility Review architecture. It uses a database of 85K sentences with estimated credibility. For 45K of these sentences, a known ClaimReview provides a credibility score with fairly high confidence; these were mainly provided by ClaimsKG [3], with trusted credibility labels coming from fact-checkers. The remaining 40K sentences were extracted from a variety of news sites, and their credibility was estimated from the trustworthiness of the publisher as determined by MisinfoMe [4, 5] and, ultimately, by site reputation organizations like NewsGuard (https://www.newsguardtech.com/) and Web of Trust (https://www.mywot.com/); see https://misinfo.me/misinfo/credibility/origins for a full list of such sources.

For information retrieval, acred uses a sentence encoder trained on STS-B [6] (similar to Sentence-BERT [7]) and FAISS [8] to index the 85K sentences, as well as a Solr instance to store further metadata about the associated ClaimReviews or the sites where the sentences were published. A second model performs stance detection, which provides the polarity that is necessary for correctly propagating credibility values. Heuristic rules are used to combine the credibility confidence and the similarity between the query and database sentences. Given an input document, a third model selects query sentences which are factual and check-worthy [9], and further heuristics select the best combination of highest confidence and lowest credibility. See the original paper [1] and the corresponding GitHub repository (https://github.com/rdenaux/acred) for the full details.

Acred was evaluated on the CLEF'18 CheckThat! Factuality Task [10] and achieved state-of-the-art results with an MAE of 0.6475 (compared to the previous best of 0.7050 [11]). Similarly, acred obtained an accuracy of 0.716 on the PolitiFact fragment of FakeNewsNet, which is also state of the art among approaches that do not use social signals (shares, replies and comments on Twitter about the original article). Both results are remarkable considering that acred does not require a training step, since all of its components rely on pre-trained models.

Note as well that the representation of credibility score and confidence can be mapped to different labeling schemes: FakeNewsNet only has the labels "fake" and "real", while CLEF'18 has the labels "true", "half-true" and "false" (a third dataset presented in [1] even has six different labels, including "not verifiable" and "uncertain"). Hedged sketches of both the propagation heuristics and such a label mapping are given below.
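The heuristic propagation rules mentioned above are specified precisely in [1]; purely to illustrate their general shape, a minimal sketch might look as follows (the function and its constants are our invention, not acred's actual rules):

    def propagate_credibility(db_credibility: float, db_confidence: float,
                              similarity: float, stance: str):
        """Sketch of propagating credibility from a matched database sentence
        to a query sentence; not acred's actual heuristics (see [1])."""
        # Disagreement flips the sign of the inherited credibility: a sentence
        # disputing a known false claim should gain, not lose, credibility.
        polarity = -1.0 if stance == "disagree" else 1.0
        credibility = polarity * db_credibility
        # Confidence decays with the quality of the semantic match, so a loose
        # match yields a less confident Credibility Review.
        confidence = db_confidence * max(0.0, min(1.0, similarity))
        return credibility, confidence

In the LCR model, the result of each such step would itself be published as a Credibility Review whose isBasedOn links point to the reviews it combined.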
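Similarly, a mapping from a (credibility, confidence) pair to such discrete labels could be sketched as below; the thresholds are invented for illustration and are not acred's actual values:

    def to_label(credibility: float, confidence: float, scheme: str) -> str:
        """Illustrative mapping from a credibility score in [-1, 1] and a
        confidence in [0, 1] to dataset-specific labels; invented thresholds."""
        if scheme == "fakenewsnet":            # binary labeling scheme
            return "fake" if credibility < 0 else "real"
        if scheme == "clef18":                 # ternary labeling scheme
            if confidence < 0.5 or abs(credibility) < 0.25:
                return "half-true"             # weak signal: middle label
            return "true" if credibility > 0 else "false"
        raise ValueError(f"unknown labeling scheme: {scheme}")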
Figure 2: Example UIs for a (dis)agreement task for a tweet: (a) tweet with label and feedback buttons; (b) Credibility Review with explanation. The user can provide feedback about correct or incorrect labels predicted by acred.

Figure 3: Evidence graph for the credibility review and tweet shown in Figure 2: (a) full evidence graph; (b) kept evidence graph. The big "meter" icon represents the main credibility review, next to the icon for the tweet. All the other nodes form the evidence gathered by acred and used to determine the credibility of the tweet.

The output of acred is a Credibility Review, which can be simplified to a single label (see Figure 2a). However, the provenance and authorship relations can be exploited to render a full (or partial) "evidence graph" (see Figure 3) that includes intermediate steps, facilitating verifiability and reproducibility, or to generate explanations (see Figure 2b).
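Because every Credibility Review links to its evidence via isBasedOn, rendering such an evidence graph essentially amounts to traversing those links. A minimal sketch over JSON-LD-style dicts (as in the earlier example; this is not acred's actual rendering code):

    from typing import Iterator

    def evidence_edges(review: dict) -> Iterator[tuple]:
        """Recursively yield (review_id, evidence_id) edges by following
        isBasedOn links; a sketch, not acred's rendering code."""
        rid = review.get("@id", "?")
        for evidence in review.get("isBasedOn", []):
            if isinstance(evidence, dict):            # nested review or signal
                yield rid, evidence.get("@id", "?")
                yield from evidence_edges(evidence)   # recurse into sub-reviews
            else:                                     # a bare identifier or URL
                yield rid, evidence

Pruning these edges, for example by dropping low-confidence branches, would yield something like the "kept" evidence graph of Figure 3b.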
5. Conclusion

In this paper, we summarized recent work on a conceptual model and vocabulary for describing and sharing retrieved documents, their credibility, and the implications for the credibility of potentially misinforming documents online. We believe that the IR community working on retrieval of content for reducing online misinformation should be aware of this work and can benefit from adopting this conceptual model. As stated in Section 2, these IR systems need to deal with a variety of requirements, and such conceptual models and vocabularies are a way to ensure these requirements are met. The proposed conceptual model makes it easier to propagate and document the results and steps in IR systems, thus facilitating explainability and reproducibility. It also makes explicit where the boundaries lie between the subjective credibility and similarity analyses and the trust in ground credibility sources like fact-checkers and organizations like NewsGuard. The fact that the sample implementation relies on third-party services like ClaimsKG, MisinfoMe, various fact-checkers and site reviewing organizations further shows that the conceptual model facilitates decentralization and the handling of subjective reviews. Note finally that, although acred only uses a small set of ground credibility signals (ClaimReviews and website reviews), many more sources of such signals exist, as identified by the W3C Credibility Signals working group (https://credweb.org/signals-beta/).

Some of the open issues with acred are discussed in [1, 12]. Although the conceptual model in [1] only provides a model for expressing credibility, a similar approach could be taken for Harmfulness Reviews, harmfulness being an aspect orthogonal to credibility (and arguably even more subjective). Finally, it should be mentioned that the semantic web community is also proposing similar vocabularies [13] (and, more recently, the Open Claims model, under review); however, these impose certain distinctions which may not be relevant for the IR community.

Acknowledgments

Work supported by the European Commission under grant 770302 (Co-Inform) as part of the Horizon 2020 research and innovation programme.

References

[1] R. Denaux, J. M. Gomez-Perez, Linked credibility reviews for explainable misinformation detection, in: J. Z. Pan, V. Tamma, C. d'Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne, L. Kagal (Eds.), The Semantic Web – ISWC 2020, Springer International Publishing, Cham, 2020, pp. 147–163.

[2] R. V. Guha, D. Brickley, S. Macbeth, Schema.org: evolution of structured data on the web, Communications of the ACM 59 (2016) 44–51.

[3] A. Tchechmedjiev, P. Fafalios, K. Boland, M. Gasquet, M. Zloch, B. Zapilko, S. Dietze, K. Todorov, ClaimsKG: A knowledge graph of fact-checked claims, in: International Semantic Web Conference, 2019, pp. 309–324.

[4] M. Mensio, H. Alani, MisinfoMe: Who's interacting with misinformation?, in: 18th International Semantic Web Conference: Posters & Demonstrations, 2019.

[5] M. Mensio, H. Alani, News source credibility in the eyes of different assessors, in: Conference for Truth and Trust Online, 2019.

[6] D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, L. Specia, SemEval-2017 Task 1: Semantic textual similarity multilingual and crosslingual focused evaluation, in: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 1–14. arXiv:1708.00055.

[7] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3973–3983.

[8] J. Johnson, M. Douze, H. Jégou, Billion-scale similarity search with GPUs, arXiv preprint arXiv:1702.08734 (2017).

[9] K. Meng, D. Jimenez, F. Arslan, J. D. Devasier, D. Obembe, C. Li, Gradient-based adversarial training on transformer networks for detecting check-worthy factual claims, 2020. arXiv:2002.07725.

[10] P. Nakov, A. Barrón-Cedeño, R. Suwaileh, L. Màrquez, W. Zaghouani, P. Atanasova, S. Kyuchukov, G. Da San Martino, Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims, in: International Conference of the Cross-Language Evaluation Forum for European Languages, 2018, pp. 372–387.

[11] D. Wang, J. G. Simonsen, B. Larsen, C. Lioma, The Copenhagen team participation in the factuality task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 fact checking lab, CLEF (Working Notes) 2125 (2018).

[12] R. Denaux, F. Merenda, J. M. Gomez-Perez, Towards crowdsourcing tasks for accurate misinformation detection, in: Advances in Semantics and Linked Data: Joint Workshop Proceedings from ISWC 2020, SEMIFORM: Semantics for Online Misinformation Detection, Monitoring, and Prediction, volume 2722, CEUR-WS, 2020. URL: http://ceur-ws.org/Vol-2722/semiform2020-paper-2.pdf.

[13] K. Boland, P. Fafalios, A. Tchechmedjiev, Modeling and contextualizing claims, in: 2nd International Workshop on Contextualised Knowledge Graphs, 2019.