Data and System Traceability for Transparent AI in Medical Imaging

Sara Colantonio1,∗, Andrea Berti1,2, Gianluca Carloni1,2, Claudia Caudai1, Giulio Del Corso1, Danila Germanese1, Eva Pachetti1,2, Maria Antonietta Pascali1, Varvara Kalokyri3, Haridimos Kondylakis3, Charalampos Kalantzopoulos4, Nikolaos Tachos4, Dimitris Fotiadis4, Valentina Giannini5,6, Simone Mazzetti5,6, Daniele Regge5,6, Nickolas Papanikolaou7, Konstantinos Marias3 and Manolis Tsiknakis3

1 Institute of Information Science and Technologies, National Research Council of Italy (ISTI-CNR), Pisa, Italy
2 University of Pisa, Pisa, Italy
3 Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, Heraklion, Greece
4 Foundation for Research and Technology Hellas (FORTH), Ioannina, Greece
5 Department of Surgical Sciences, University of Turin, Turin, Italy
6 Department of Radiology, Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy
7 Champalimaud Foundation, Computational Clinical Imaging Group, Lisboa, Portugal

Abstract
Artificial intelligence holds the promise of revolutionizing medical practices, particularly in the realm of image-based diagnostics. Nonetheless, the integration of artificial intelligence technologies brings forth a range of immediate and future challenges that are the focus of almost all related discussions. A responsible approach to the development and use of artificial intelligence is essential to effectively address and mitigate these challenges, via strong scientific foundations, technical reliability, thorough testing and validation procedures, risk assessment, and alignment with ethical principles. Central to this is the principle of transparency, a key ingredient to foster trust and reliability. Transparency can be upheld through measures such as disclosing data sources and their use, as well as demonstrating transparent system development, operation, and use. In this respect, it is strictly interconnected with the traceability of data and AI systems. This discussion paper briefly outlines the most relevant issues related to transparency and the methods adopted in the EU H2020 ProCAncer-I project to fulfill its mandates in terms of data and system traceability, also in connection with other projects, such as the Tuscany Region’s NAVIGATOR project, and in compliance with the requirements of the FUTURE-AI guidelines.

Keywords
Transparent Artificial Intelligence, traceability, oncologic imaging, AI Model Passport

SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
∗ Corresponding author.
sara.colantonio@isti.cnr.it (S. Colantonio); andrea.berti@isti.cnr.it (A. Berti); gianluca.carloni@isti.cnr.it (G. Carloni); claudia.caudai@isti.cnr.it (C. Caudai); giulio.delcorso@isti.cnr.it (G. Del Corso); danila.germanese@isti.cnr.it (D. Germanese); eva.pachetti@isti.cnr.it (E. Pachetti); maria.antonietta.pascali@isti.cnr.it (M. A. Pascali); vkalokyri@ics.forth.gr (V. Kalokyri); vkalokyri@ics.forth.gr (H. Kondylakis); xkalantzopoulos@gmail.com (C. Kalantzopoulos); ntachos@gmail.com (N. Tachos); fotiadis@cs.uoi.gr (D. Fotiadis); valentina.giannini@unito.it (V. Giannini); simone.mazzetti@ircc.it (S. Mazzetti); daniele.regge@ircc.it (D. Regge); nickolas.papanikolaou@research.fchampalimaud.org (N. Papanikolaou); kmarias@ics.forth.gr (K. Marias); tsiknaki@ics.forth.gr (M. Tsiknakis)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1. Introduction

In the realm of clinical applications, the impact and adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies hinge on their ability to demonstrate reliability and clinical usefulness, ensure absolute patient safety, and earn the trust and approval of clinical end users and stakeholders [1]. However, trust is a dynamic and multi-layered concept that extends beyond technical performance to encompass psychological, sociological, philosophical, and ethical dimensions1. It is shaped by both objective factors and subjective perceptions and beliefs. Consequently, the extensive efforts by the scientific, regulatory, and standardization communities have so far aimed to identify the crucial elements AI systems must possess to address concerns and foster trust among users.

A critical element in building trust is a commitment to transparency. The High-Level Expert Group (HLEG) on Artificial Intelligence’s guidelines [2] outlined transparency through three aspects: traceability, explainability, and frank disclosure of an AI system’s limitations. The newly ratified AI Act [3] also emphasizes transparency as a core objective. It is designed to ensure individuals understand the design and usage of AI systems, along with the responsibilities companies and public authorities have concerning decisions made by AI. The AI Act supports the HLEG’s guidelines by stating “AI systems shall be developed and used in a way that allows appropriate traceability and explainability while making humans aware that they communicate or interact with an AI system as well as duly informing users of the capabilities and limitations of that AI system and affected persons about their rights”.

Transparency involves detailed documentation of an AI system’s entire lifecycle along with the underlying operations that dictate its functioning. Ensuring transparency from the very design of an AI system is crucial to eliminate any uncertainty about its functionality and its application by those using it for clinical decisions. Not by chance, transparency is a core element of the FUTURE-AI guidelines [4, 5], where it is articulated through the three guiding principles of Traceability, Explainability, and Usability. Transparency also ensures that an AI system is designed to be reproducible and auditable, laying the groundwork for accountability and responsibility. In the realm of academia, the push for transparency in AI aligns with well-known principles of open data and open science [6]. Yet, in the private sector, achieving transparency remains a complex issue, often due to competitive dynamics within the industry.

In this brief discussion paper, we outline the key facets of transparency in medical imaging and provide a summary of the approaches being implemented in the EU H2020 ProCAncer-I project2.

1 https://plato.stanford.edu/entries/trust/
2 https://www.procancer-i.eu/

2. The implications of AI transparency in oncologic imaging

In the field of oncologic imaging, AI-based methodologies are increasingly dependent on data-driven approaches that manage large-scale, multimodal datasets efficiently. Specifically, in prostate cancer diagnostics, multiparametric magnetic resonance imaging (MRI) plays a critical role in detecting the presence of tumors and providing insights into tumor phenotypes [7, 8, 9].
However, for a comprehensive assessment of patient risk and condition, it is imperative to integrate these imaging data with clinical information, such as hormone levels, demographic details, and medical history. Additionally, the source of imaging data often varies, emanating from different clinical institutions with disparate clinical standards (e.g., PIRADS versions), acquisition protocols, and equipment from various manufacturers. Such variability has been shown to affect the efficacy of AI-driven tools, echoing the influence of heterogeneous population characteristics [10].

Given this intricate landscape, addressing transparency in AI for medical imaging demands a holistic approach. It requires diligent practices and robust technical measures to trace and keep track of all the relevant choices across the entire AI life-cycle, from the initial data gathering phase through to the development, deployment, and operational stages of the system. Such an all-encompassing approach is vital for effectively managing the challenges posed by varied data origins, changing clinical guidelines, and the integration of imaging with clinical data for holistic analyses. In this context, the pursuit of transparency entails the establishment of a traceability system that serves as a definitive indicator of integrity and responsibility for both the AI technologies and the data on which they operate, thereby ensuring every phase is marked by clarity and accountability. Transparency, in this scenario as in any other critical one, is articulated through the following key dimensions:

• Data Traceability: this involves meticulous documentation of the data origins, including details about who owns the data, how it was gathered, the clinical standards adhered to during collection, steps taken during data curation, locations and methods of data storage, and any pre-processing activities carried out. Such detailed record-keeping ensures that every piece of data can be tracked throughout its life-cycle.
• AI System Traceability: this aspect covers the comprehensive and methodical documentation concerning the development, validation, and testing processes of the AI system. A standardized reporting format ensures that every step in the creation and application of the AI system can be audited and reviewed.
• Decision Transparency: this focuses on elucidating the AI system’s decision-making processes. By providing clear explanations of the logic and rationale underpinning the AI’s classifications or predictions, healthcare professionals are empowered to make informed decisions based on AI insights.

The traceability of data relies on well-established guidelines concerning data provenance and reuse. In contrast, traceability for AI models is still in the process of comprehensive development, despite some functionalities being partially integrated into MLOps frameworks. Decision transparency, a concept gaining momentum within the domain of eXplainable AI (XAI), represents a renewed emphasis on the decision-making process, drawing from a long-standing pursuit [11, 12]. XAI is instrumental in facilitating productive collaboration between humans and AI systems. However, successful implementation necessitates the application of techniques from a broad array of fields, including human-computer interaction.
Establishing standardized explanations from AI systems is paramount, ensuring they are robust and aligned with user needs so as to empower individuals and grant them full control over the system.

3. Transparency measures in ProCAncer-I

The ProCAncer-I project aims to build the largest database of anonymized multiparametric MRI images related to prostate cancer, while adhering to the regulations outlined in the European Union General Data Protection Regulation (GDPR). The project’s scope encompasses a spectrum of clinical scenarios, ranging from the diagnosis and characterization of prostate cancer to predicting responses to treatment and the likelihood of side effects post-treatment.

To this end, the clinical partners participating in the ProCAncer-I consortium have meticulously outlined the collection requirements for all clinical, imaging, pathology, and follow-up data. These requirements encompass essential clinical details that must accompany the images, such as prostate-specific antigen levels, biopsy outcomes, and confirmations of prostate cancer through prostatectomy reports. Additionally, specific guidelines have been established to capture vital information related to the medical images, aligned with the unique needs of each clinical scenario. This alignment ensures the development of AI models that can effectively address the objectives set forth for the project.

Utilizing these multimodal data, the technical partners are currently focused on creating AI models capable of addressing the primary clinical tasks outlined in the project. As part of this effort, a dedicated project activity is devoted to ensuring trustworthiness and transparency by addressing all three dimensions mentioned above.

3.1. Data traceability

Within the project, medical and clinical data have undergone FAIRification, conforming to the principles of being Findable, Accessible, Interoperable, and Reusable. This process has been facilitated through a GDPR-compliant project infrastructure, utilizing the MOLGENIS metadata platform. This platform serves as the central metadata repository, enabling users to search for clinical and imaging metadata and assemble cohorts based on various variables. To represent the multimodal dataset, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) framework was expanded to include standardized imaging attributes [13]. This enhancement enabled streamlined cohort identification by leveraging DICOM metadata for the training and quality assurance of AI models. Furthermore, the extension encompassed refinements in the curation processes, establishing connections between the original and curated images through the utilization of standardized vocabularies.
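As an illustration of how imaging-aware cohort selection on top of such a metadata repository might look, the following is a minimal sketch that joins DICOM-derived series metadata with OMOP-style patient records; all file, table, and column names (image_occurrence, person, pirads_version, and so on) are hypothetical placeholders and do not reproduce the actual MI-CDM or ProCAncer-I schema.

```python
# Hedged sketch: assembling a training cohort from imaging and clinical metadata.
# Table/column names are illustrative placeholders, not the actual MI-CDM schema.
import pandas as pd

# DICOM-derived imaging metadata, one row per MRI series (hypothetical export).
images = pd.read_csv("image_occurrence.csv")
# Clinical attributes in OMOP-CDM style, one row per patient (hypothetical export).
persons = pd.read_csv("person.csv")

cohort = (
    images.merge(persons, on="person_id", how="inner")
          # keep only series acquired under the protocol of interest
          .query("modality == 'MR' and magnetic_field_strength >= 1.5")
          # restrict to a single reporting standard to limit label heterogeneity
          .query("pirads_version == 'v2.1'")
          # require the clinical variables the models are trained on
          .dropna(subset=["psa_level", "biopsy_gleason"])
)

# Persist the selection so the exact cohort can be referenced during model development.
cohort[["person_id", "series_instance_uid"]].to_csv("cohort_v1.csv", index=False)
print(f"Selected {cohort['person_id'].nunique()} patients, {len(cohort)} series")
```

Recording the resulting cohort file, rather than an informal description of the inclusion criteria, is what makes the data selection step auditable later on.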
3.2. AI System traceability

AI system traceability requires comprehensive documentation of the entire development process of an AI model or system, justifying and tracing it back to the data used for training and validation, the involved contributors, as well as the processing and refinement steps undertaken. This information is encapsulated in what we refer to as the AI Model Passport, which is housed within a designated model registry. In drafting the content of the AI Model Passport, an exhaustive review of existing literature on AI traceability was conducted. The aim was to identify any available solutions, potential gaps, or shortcomings.

The literature review categorized the identified works and approaches into three primary groups:

• Data and model provenance schemas
• Existing traceability tools
• Guidelines and recommendations

The initial focus was on provenance models designed to track the development of an AI model by archiving its history in connection with the data elements involved in its creation and the processes contributing to its evolution. Several models, particularly those centered on data, have emerged within the context of Open Science and data FAIRification. Notably, models such as Dublin Core and CRMdig have already established themselves as standards. While various models addressing AI model lineage are available, widespread adoption and standardization remain limited. Notably, leading tech companies offer advanced models, though they have yet to be universally recognized as standards.

The investigation continued with an exploration of existing tools devised or conceptualized to facilitate AI traceability. Progress in this domain is rapid, with advancements stemming from areas beyond AI, such as software development and Continuous Integration and Continuous Deployment (CI/CD) life-cycles. Nonetheless, current tools fall short of providing all-encompassing end-to-end traceability support. Achieving this level of functionality would demand significant generalization capabilities and a comprehensive analysis of all pertinent factors.

The inquiry also extended to the latest traceability recommendations and guidelines. These guidelines delineate the essential information required for documenting the development, deployment, and usage of AI solutions. A cohesive proposal outlining the entities, components, and tags to be recorded is necessary to ensure comprehensive and meticulous record-keeping.

3.2.1. ProCAncer-I AI Model Passport

The structure, content, and organization of the ProCAncer-I AI Model Passport were defined through a comprehensive analysis of the literature summarized above, along with an in-depth examination of data and AI/ML model provenance schemas and of the existing traceability tools. A divide-and-conquer strategy was adopted, scrutinizing each phase of the AI development and deployment chain individually, from data selection to model in-production monitoring. Subsequently, we worked to delineate the content required for the Passport at each of these stages.

The ProCAncer-I AI Model Passport has been formulated as a minimal provenance model schema comprising information aligned with the various stages of the AI chain:

1. The data collection process, tracking dataset characteristics and data localization.
2. The data processing pipeline, detailing data transformations from raw data to harmonized datasets.
3. The model training and validation process, including features extracted from data, training parameters, evaluation metrics, test conditions, and general AI/ML model characteristics for user monitoring and reproducibility.
4. The model operation and monitoring, storing performance metrics, uncertainty estimation, and metrics for detecting performance changes.

For each phase, we analyzed existing schemas, their content, and related ontologies to identify the information already available. This guided the creation of an initial metadata schema with the essential items required for each stage. The phased organization and an example of the initial schema are depicted in Figure 1.

Figure 1: Phases of the AI development and deployment chain, with a sample schema initially modeled for the AI Training phase.
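To make the four-stage organization more concrete, here is a minimal, hypothetical sketch of such a Passport schema expressed as Python dataclasses; the field names are illustrative only and do not reproduce the actual ProCAncer-I Passport items shown in Figures 1 and 2.

```python
# Hedged sketch of a four-stage AI Model Passport schema.
# Field names are illustrative placeholders, not the actual ProCAncer-I Passport.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class DataCollection:                 # stage 1: dataset characteristics and localization
    dataset_id: str
    providing_centers: List[str]
    modality: str                     # e.g. "multiparametric MRI"
    num_patients: int
    storage_location: str

@dataclass
class DataProcessing:                 # stage 2: raw-to-harmonized transformations
    curation_steps: List[str]         # e.g. ["de-identification", "resampling"]
    harmonization_method: Optional[str] = None

@dataclass
class TrainingValidation:             # stage 3: training setup, metrics, test conditions
    model_family: str                 # e.g. "Vision Transformer"
    training_parameters: Dict[str, str] = field(default_factory=dict)
    evaluation_metrics: Dict[str, float] = field(default_factory=dict)
    test_conditions: str = ""

@dataclass
class OperationMonitoring:            # stage 4: in-production performance and drift signals
    performance_metrics: Dict[str, float] = field(default_factory=dict)
    uncertainty_estimate: Optional[float] = None
    drift_indicators: Dict[str, float] = field(default_factory=dict)

@dataclass
class ModelPassport:                  # one record per released model version
    model_id: str
    data_collection: DataCollection
    data_processing: DataProcessing
    training_validation: TrainingValidation
    operation_monitoring: OperationMonitoring
```

An instance of such a record can then be serialized (e.g., to JSON) and stored in the model registry alongside the trained model it documents.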
Real examples of AI models and pipelines developed within the project were used to ensure that the drafted set included all the necessary items and considered all the options in the ontologies and vocabularies. This comprehensive consideration involved iterative rounds of consultation between the Passport designers and the AI developers, documented through written specification documents. Figure 2 illustrates the initial version of the Passport instantiated for a vision transformer developed as part of the project.

Figure 2: An example instantiating the ProCAncer-I AI Model Passport for a Vision Transformer developed in the project, with possible options for each item in brackets.

The current version of the Passport has already been integrated with industry-leading tools for data governance and management, such as DVC, and with MLOps frameworks, like MLflow. This integration automates the population of metadata fields from these tools, significantly reducing the need for manual input. While the integration was an intensive effort, the resulting automation makes the Passport far less dependent on error-prone manual documentation.
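As a hedged illustration of what such automation might look like, the sketch below reads parameters, metrics, and tags from a finished MLflow run and resolves a DVC-tracked dataset version, mapping them onto Passport-style fields; the tracking URI, run id, dataset path, and field mapping are all hypothetical, and the actual ProCAncer-I integration may differ.

```python
# Hedged sketch: auto-populating Passport fields from MLflow and DVC.
# Tracking URI, run id, dataset path, and field names are hypothetical placeholders.
import json

import dvc.api
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow.example.org")  # assumed server
run = client.get_run("0123456789abcdef0123456789abcdef")         # assumed run id

# MLflow exposes logged hyper-parameters, metrics, and tags on run.data / run.info.
training_section = {
    "model_family": run.data.params.get("model_type", "unknown"),
    "training_parameters": dict(run.data.params),
    "evaluation_metrics": dict(run.data.metrics),
    "code_version": run.data.tags.get("mlflow.source.git.commit", "n/a"),
    "run_start_time_ms": run.info.start_time,
}

# DVC resolves the exact storage location of the dataset version used for training.
training_section["dataset_uri"] = dvc.api.get_url(
    "data/cohort_v1.csv", repo=".", rev="v1.0"  # assumed path, repo, and data version
)

# The populated section can then be merged into the full AI Model Passport record.
print(json.dumps(training_section, indent=2))
```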
3.3. Decision or algorithmic transparency

Research in Explainable Artificial Intelligence (XAI) aims to equip human decision-makers with insights into the operational logic of AI systems, particularly regarding their decision-making processes. However, the inherently data-driven nature of these algorithms means they operate based on implicit problem specifications learned during training, which might not be immediately transparent or interpretable to humans.

From the inception of the ProCAncer-I project, a concerted effort was made by both technical and clinical partners to delineate an optimal strategy for achieving explainability. This strategy required careful consideration of the various model types being developed (such as deep learning or radiomics) and the specific clinical tasks being addressed (for example, segmentation, detection, characterization). The process involved a comprehensive evaluation of the different use cases and AI methodologies to strike a balance that met the needs and expectations of both AI developers and clinical end-users.

Upon thorough analysis, a consensus was reached to integrate both local and global explainability techniques. This dual approach aims to assess and validate the AI model’s performance from a broad perspective while providing targeted explanations for individual predictions or decisions. The exploration of XAI methods for both radiomics and DL models revealed specific challenges and necessitated tailored approaches for each.

In this respect, a significant initiative undertaken by the clinical and technical partners within the consortium was the organization of a thematic session on AI explainability, conducted during a Consortium Plenary meeting. The purpose of this session was threefold: to gauge the clinical partners’ understanding of explainability and the various XAI methodologies, to discern their expectations concerning the explainability and interpretability of AI models, and to solicit their preferences regarding explanation modalities. Concerning the last objective, when exploring preferences for explanation modalities in the context of deep learning models applied to general image-based prediction tasks (as shown in Figure 3, left), the participants demonstrated a clear preference for explanations that employed visual saliency maps to highlight areas of the image significantly impacting the model’s prediction. This approach was the most favored, with prototype-based explanations ranking as the subsequent preferred option. These findings are going to be reflected in the implementation choices for the visualization and provisioning of explanations.

Figure 3: Left: types of explanations and visualization modalities for deep learning models. Right: ranking of participants’ preferences for each option.
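To give a concrete flavor of the saliency-based explanations the participants favored, the following is a minimal, generic sketch of a vanilla gradient saliency map for an image classifier in PyTorch; it uses a stand-in torchvision model and a random input, and is not the XAI pipeline actually implemented in the project.

```python
# Hedged sketch: vanilla gradient saliency for an image classifier
# (generic illustration, not the project's actual XAI pipeline).
import torch
import torchvision

# Any differentiable image classifier works; a torchvision ResNet is used as a stand-in.
model = torchvision.models.resnet18(weights=None)
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder MRI-like input

# Forward pass and gradient of the predicted-class score w.r.t. the input pixels.
scores = model(image)
top_class = scores.argmax(dim=1)
scores[0, top_class.item()].backward()

# Saliency = max absolute gradient across channels, highlighting influential regions.
saliency = image.grad.detach().abs().max(dim=1).values.squeeze(0)  # shape: (224, 224)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
print(saliency.shape)  # the normalized map can be overlaid on the input image
```

In practice, such a map would be overlaid on the multiparametric MRI slice so that radiologists can check whether the highlighted regions coincide with the suspected lesion.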
4. Conclusions

The ProCAncer-I project is dedicating significant resources to ensuring that the AI models crafted within the project garner trust and acceptance not only among the involved clinical partners but also across the broader clinical community and among patients. In this respect, the introduction of the AI Model Passport emerges as an innovative strategy designed to streamline compliance with transparency directives, which will become increasingly stringent in the near future.

Acknowledgments

The work was partially supported by the ProCAncer-I project, funded by the European Union’s H2020 programme under Grant Agreement No. 952159, and by the Tuscany Region project NAVIGATOR, funded and supported by Bando Ricerca Salute Regione Toscana 2018 (DD 15397/2018).

References

[1] L. Marti-Bonmati, D.-M. Koh, K. Riklund, M. Bobowicz, Y. Roussakis, J. C. Vilanova, J. J. Fütterer, J. Rimola, P. Mallol, G. Ribas, et al., Considerations for artificial intelligence clinical impact in oncologic imaging: an AI4HI position paper, Insights into Imaging 13 (2022) 89.
[2] HLEG, High-Level Expert Group on Artificial Intelligence, Ethics guidelines for trustworthy AI, https://tinyurl.com/4tej3t38, 2019. Accessed: 2024-03-15.
[3] EC, Artificial Intelligence Act, legislative resolution of 13 March 2024 on the proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence, P9 TA(2024)0138, https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf, 2024. Accessed: 2024-03-15.
[4] K. Lekadir, R. Osuala, C. Gallin, N. Lazrak, K. Kushibar, G. Tsakou, S. Aussó, L. C. Alberich, K. Marias, M. Tsiknakis, S. Colantonio, N. Papanikolaou, Z. Salahuddin, H. C. Woodruff, P. Lambin, L. Martí-Bonmatí, FUTURE-AI: Guiding principles and consensus recommendations for trustworthy artificial intelligence in medical imaging, 2023. arXiv:2109.09658.
[5] K. Lekadir, A. Feragen, A. J. Fofanah, A. F. Frangi, A. Buyx, A. Emelie, A. Lara, A. R. Porras, A.-W. Chan, A. Navarro, et al., FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare, arXiv preprint arXiv:2309.12325 (2023).
[6] R. Borgheresi, A. Barucci, S. Colantonio, G. Aghakhanyan, M. Assante, E. Bertelli, E. Carlini, R. Carpi, C. Caudai, D. Cavallero, et al., NAVIGATOR: an Italian regional imaging biobank to promote precision medicine for oncologic patients, European Radiology Experimental 6 (2022) 53.
[7] E. Bertelli, L. Mercatelli, C. Marzi, E. Pachetti, M. Baccini, A. Barucci, S. Colantonio, L. Gherardini, L. Lattavo, M. A. Pascali, et al., Machine and deep learning prediction of prostate cancer aggressiveness using multiparametric MRI, Frontiers in Oncology 11 (2022) 802964.
[8] E. Pachetti, S. Colantonio, M. A. Pascali, On the effectiveness of 3D vision transformers for the prediction of prostate cancer aggressiveness, in: International Conference on Image Analysis and Processing, Springer, 2022, pp. 317–328.
[9] E. Pachetti, S. Colantonio, 3D-vision-transformer stacking ensemble for assessing prostate cancer aggressiveness from T2w images, Bioengineering 10 (2023) 1015.
[10] N. M. Rodrigues, J. G. de Almeida, A. S. C. Verde, A. M. Gaivão, C. Bilreiro, I. Santiago, J. Ip, S. Belião, R. Moreno, C. Matos, et al., Analysis of domain shift in whole prostate gland, zonal and lesions segmentation and detection, using multicentric retrospective data, Computers in Biology and Medicine (2024) 108216.
[11] S. Colantonio, M. Martinelli, D. Moroni, O. Salvetti, D. Perticone, A. Sciacqua, F. Chiarugi, D. Conforti, A. Gualtieri, V. Lagani, Decision support and image & signal analysis in heart failure, Proc. of HEALTHINF, Madeira (2008) 288–295.
[12] F. Chiarugi, S. Colantonio, D. Emmanouilidou, M. Martinelli, D. Moroni, O. Salvetti, Decision support in heart failure through processing of electro- and echocardiograms, Artificial Intelligence in Medicine 50 (2010) 95–104.
[13] V. Kalokyri, H. Kondylakis, S. Sfakianakis, K. Nikiforaki, I. Karatzanis, S. Mazzetti, N. Tachos, D. Regge, D. I. Fotiadis, K. Marias, et al., MI-Common Data Model: Extending Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) for registering medical imaging metadata and subsequent curation processes, JCO Clinical Cancer Informatics 7 (2023) e2300101.