CHIST-ERA Triple: improving data interoperability and federation across RDF knowledge graphs and Solid Pods

Jerven Bolleman¹, Elias Crum², Iulian Dragan¹, Jakub Galgonek³, Mark Ibberson¹, Tarcisio Mendes de Farias¹, Marek Moos³, Marco Pagni¹, Ruben Taelman², Jiří Vondrášek³ and Ana Claudia Sima¹,∗

¹ SIB Swiss Institute of Bioinformatics
² Ghent University
³ IOCB Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences

Abstract
The TRIPLE project, a collaborative effort between the SIB Swiss Institute of Bioinformatics, Ghent University and the IOCB Prague, aims to boost the (re)usability of existing knowledge graph resources and improve software tools for RDF data access, documentation and data model visualisation. In addition, TRIPLE will increase interoperability between existing public SPARQL endpoints and private data stored in Solid Pods, thus creating an ecosystem of research data that can be seamlessly integrated through efficient and expressive federated SPARQL queries.

Keywords
RDF, federated SPARQL, Solid Pods, Open Research Data

SWAT4HCLS 2024: Bridging Life Sciences and Technology, February 26-29, Leiden, The Netherlands
∗ Corresponding author.
Email: jerven.bolleman@sib.swiss (J. Bolleman); elias.crum@ugent.be (E. Crum); iulian.dragan@sib.swiss (I. Dragan); jakub.galgonek@uochb.cas.cz (J. Galgonek); mark.ibberson@sin.swiss (M. Ibberson); tarcisio.medes@sib.swiss (T. M. d. Farias); marek.moos@uochb.cas.cz (M. Moos); marco.pagni@sib.swiss (M. Pagni); ruben.taelman@UGent.be (R. Taelman); jiri.vondrasek@uochb.cas.cz (J. Vondrášek); ana-claudia.sima@sib.swiss (A. C. Sima)
ORCID: 0000-0002-7449-1266 (J. Bolleman); 0009-0005-3991-754X (E. Crum); 0000-0002-7038-544X (J. Galgonek); 0000-0003-3152-5670 (M. Ibberson); 0000-0002-3175-5372 (T. M. d. Farias); 0009-0008-9770-3971 (M. Moos); 0000-0001-9292-9463 (M. Pagni); 0000-0001-5118-256X (R. Taelman); 0000-0002-6066-973X (J. Vondrášek); 0000-0003-3213-4495 (A. C. Sima)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

The Resource Description Framework (RDF) provides a powerful way to structure data resources. It is coupled with the SPARQL query language, which can interrogate these data through federated queries even when they are physically distributed. Although powerful, these technologies remain underexploited because significant technical and usability barriers limit their use to a select few researchers. Knowledge graphs in RDF are generally insufficiently described and documented, making it difficult to ascertain what to query and how to do so. In addition, execution times for complex queries are often very long and difficult to optimise. Finally, the query results are often difficult to integrate with private data, such as preliminary, unpublished results.

The TRIPLE project, a collaborative effort between the SIB Swiss Institute of Bioinformatics, Ghent University and the IOCB Prague, will address these challenges by developing innovative solutions on four fronts:

1. Storing private (unpublished) data in Solid Pods [1], an emerging technology enabling decentralised data vaults to host private RDF endpoints, execute federated SPARQL queries and cache results.
2. Optimising federated queries spanning public and private SPARQL endpoints, allowing users to query multiple resources from within their Solid Pod.
3. Adapting state-of-the-art RDF documentation tools and making them available for all SPARQL endpoints, including Solid Pods.
4. Developing data model visualisations, sets of standardised federated queries, and query analysis tools to help users understand the data and run efficient federated queries.

Finally, a demonstrator will show the impact of these advances when applied to a technically challenging use case of scientific relevance: the search for suitable organisms for bioremediation (https://en.wikipedia.org/wiki/Bioremediation). The components and data flow in the TRIPLE system architecture are illustrated in Figure 1.

Figure 1: TRIPLE system architecture and data flow, interoperating Solid Pods and SPARQL endpoints

The TRIPLE project will improve the (re)usability of existing knowledge graph resources and of the software tools used to access them. In doing so, TRIPLE will create conditions for reproducible research in any domain based on open or shared data and software. Furthermore, the interoperability with Solid Pods will enable researchers to integrate their own data with knowledge graphs.

Acknowledgments
We acknowledge support from the CHIST-ERA Open Research Data (ORD) grant. The SIB received funding from the Swiss National Science Foundation (SNSF). Ghent University acknowledges funding from the Research Foundation – Flanders (FWO). The IOCB Prague is grateful for funding from the Technology Agency of the Czech Republic (TAČR) within the National Recovery Plan, project No. TH86010003.

References
[1] A. V. Sambra, E. Mansour, S. Hawke, M. Zereba, N. Greco, A. Ghanem, D. Zagidulin, A. Aboulnaga, T. Berners-Lee, Solid: a platform for decentralized social applications based on linked data, Tech. Rep., MIT CSAIL & Qatar Computing Research Institute, 2016.