Explaining Artificial Intelligence Predictions of Disease Progression with Semantic Similarity

Susana Nunes†, Rita T. Sousa†, Filipa Serrano†, Ruben Branco, Diogo F. Soares, Andreia S. Martins, Eleonora Auletta, Eduardo N. Castanho, Sara C. Madeira, Helena Aidos and Catia Pesquita*

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
* Corresponding author.
† These authors contributed equally.
scnunes@ciencias.ulisboa.pt (S. Nunes); risousa@ciencias.ulisboa.pt (R. T. Sousa); fserrano@lasige.di.fc.ul.pt (F. Serrano); rmbranco@ciencias.ulisboa.pt (R. Branco); dfsoares@ciencias.ulisboa.pt (D. F. Soares); amartins@lasige.di.fc.ul.pt (A. S. Martins); eauletta@ciencias.ulisboa.pt (E. Auletta); ejcastanho@ciencias.ulisboa.pt (E. N. Castanho); sacmadeira@ciencias.ulisboa.pt (S. C. Madeira); haidos@ciencias.ulisboa.pt (H. Aidos); clpesquita@ciencias.ulisboa.pt (C. Pesquita)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
The complexity of neurodegenerative diseases has motivated the development of artificial intelligence approaches to predicting risk of impairment and disease progression. However, despite the success of these approaches, their mostly black-box nature hinders their adoption for disease management. Explainable artificial intelligence holds the promise to bridge this gap by producing explanations of models and their predictions that promote understanding and trust by users. In the biomedical domain, given its complexity, explainable artificial intelligence approaches have much to benefit from being able to link models to representations of domain knowledge – ontologies. Ontologies afford more explainable features because they are semantically enriched and contextualized, and as such can be better understood by end users; they also model existing knowledge, and thus support inquiry into how a given artificial intelligence model outcome fits with existing scientific knowledge. We propose an explainability approach that leverages the rich panorama of biomedical ontologies to build semantic similarity-based explanations that contextualize patient data and artificial intelligence predictions. These explanations mirror a fundamental human explanatory mechanism - similarity - while tackling the challenges of data complexity, heterogeneity and size.

Keywords
Semantic Similarity, Ontology, Amyotrophic Lateral Sclerosis, Explainable AI

1. Introduction

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease targeting the upper and lower motor neurons, leading to progressive and diffuse paralysis and eventual death from respiratory failure. The underlying causes of ALS are not fully understood, clinical symptoms are heterogeneous, and life expectancy is highly variable. Therefore, multiple approaches have been proposed to predict disease progression in order to improve patient-personalized treatment [1, 2]. The iDPP@CLEF 2022 challenge [3, 4] is focused on the evaluation of Artificial Intelligence (AI) algorithms to predict ALS progression. Task 2, in particular, is dedicated to predicting time of impairment targeting different events: non-invasive ventilation (NIV), percutaneous endoscopic gastrostomy (PEG), and death.
It includes highly curated datasets based on real ALS patients, followed at clinical institutions in Lisbon, Portugal, and Turin, Italy. Machine learning (ML) approaches are increasingly successful in predicting the progression of ALS, including the prognostic prediction of the need for non-invasive ventilation [5, 6] and patient profiling [7]. The complexity of the underlying domain and data, coupled with the black-box nature of many of the algorithms employed, hinders their widespread adoption in the management of these diseases. To tackle this, one of the challenge's goals is also to propose new approaches to make those prediction algorithms explainable.

Explainable artificial intelligence (XAI) is focused on developing approaches that ensure algorithmic fairness, identify potential bias or problems in the training data, ensure that the algorithms perform as expected, and bridge the gap between the ML community and other scientific disciplines [8]. The lack of trust is the main barrier to the adoption of AI in clinical practice [9]. XAI approaches can be broadly classified into two categories: transparent models, which are interpretable by design and include decision trees and linear models [10], and post-hoc explainability [11, 12], where models are explained through external techniques. This last category can also be divided into model-agnostic techniques, designed to be applied to any ML model, and model-specific techniques, tailored to explain a particular ML model. Recently, post-hoc methods have been applied to explain recurrent neural networks that predict the need for non-invasive ventilation [13].

In the biomedical domain, given its complexity, XAI approaches have much to benefit from being able to link ML models to representations of domain knowledge [14, 15]. These Explainable Knowledge-enabled Systems include a representation of the domain knowledge and support explanation approaches that are context-aware and provenance-enabled [16]. This knowledge integration is crucial, since well-grounded explanations should be domain-dependent, which means that they should be set within a context that depends on the task, background knowledge and expectations of the user [17]. Semantic web technologies such as ontologies and knowledge graphs represent an unparalleled solution to the problem of knowledge-enabled explanations, since they provide the needed semantic context [18]. Ontologies are semantic models that encode the knowledge of a domain, in which each element is precisely defined and the relationships between elements are parameterized or constrained [19]. Ontologies add two very relevant dimensions to explainability: on the one hand, they afford explainable features, i.e., features that are semantically enriched and contextualized and as such can be better understood by end users; on the other, they model existing knowledge, and thus support inquiry into how a given AI model outcome fits with extant knowledge. These two aspects are key for explainability and trust.

This work builds on the recently developed Brainteaser Ontology [20] and explores how this ontology, coupled with the rich panorama of more general biomedical ontologies, can support similarity-based explanations for patient end-stage event predictions that build upon the contextualization of patient data and AI predictions.

2. Proposal

Similarity assessment is a natural explanatory mechanism [21], since the identification of similar features to group similar objects is a basic cognitive ability.
This is a frequent strategy in clinical settings, where similarity between patients can be used by clinicians to help decide on the best course of action through analogical reasoning [22]. Measuring similarity is also fundamental to many ML algorithms, making similarity a representation that can be exploited both by ML methods and by explanation approaches. The basis of a similarity-based explanation is that a prediction for a given entity is formulated based on its similarity to other relevant entities with known outcomes. While comparing the numerical values assigned to patient features such as weight or blood pressure can be achieved directly, the comparison of more complex concepts, such as symptoms, is not straightforward. However, when ontologies are used to describe data, the semantic annotations of data objects can be used to compute their semantic similarity [23]. A semantic annotation assigns real-world entities in a domain to their semantic description (i.e., a class of an ontology) [24], and the ontology data model can be applied to a set of individual entities to create a knowledge graph [25]. Ontologies thus provide the scaffolding for comparing entities at a higher level of complexity, by comparing their meaning. A semantic similarity measure is then a function that, given two entities described in the knowledge graph, returns a numerical score reflecting the closeness in meaning between them.

Figure 1 presents a subgraph of the Brainteaser Ontology (BO) enriched with the National Cancer Institute Thesaurus (NCIt) ontology [26], where patients are described according to the symptoms that characterize them. Symptoms are organized into a hierarchical structure given by the is_a links. Patients B and C can be considered more similar because they share the annotations "Spasticity" and "Feeding difficulties and mismanagement". Patient A is also somewhat similar to Patient B since they are both annotated with "Muscle symptom". Most semantic similarity measures are taxonomic in nature: they explore the hierarchical relations between classes in an ontology. For instance, Patient B is directly annotated with "Muscle twitching" whereas Patient A exhibits "Muscle symptom". By exploring the hierarchical relations modeled in the ontology, both patients share an annotation to "Muscle symptom". Consequently, ontologies with a rich taxonomic backbone typically allow more precise measures of semantic similarity.
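As a minimal illustration of this taxonomic reasoning, the sketch below propagates each patient's annotations up a toy is_a hierarchy loosely adapted from Figure 1 (the class placement is illustrative, not taken from the actual ontologies) and checks which classes the two patients share once ancestors are taken into account.

```python
# Toy is_a hierarchy loosely adapted from Figure 1; the class placement is
# illustrative only and not taken from the BO or the NCIt.
is_a = {
    "Muscle twitching": "Muscle symptom",
    "Spasticity": "Muscle symptom",
    "Muscle symptom": "Sign or Symptom",
    "Feeding difficulties and mismanagement": "Sign or Symptom",
}

def ancestors(cls: str) -> set:
    """Return a class together with all of its is_a ancestors."""
    result = {cls}
    while cls in is_a:
        cls = is_a[cls]
        result.add(cls)
    return result

def extended(annotations: set) -> set:
    """All classes a patient is annotated with, directly or via the hierarchy."""
    return set().union(*(ancestors(c) for c in annotations))

patient_a = {"Muscle symptom"}
patient_b = {"Muscle twitching", "Spasticity", "Feeding difficulties and mismanagement"}

shared = extended(patient_a) & extended(patient_b)
print(shared)  # {'Muscle symptom', 'Sign or Symptom'}: shared meaning via the hierarchy
```

Taxonomic measures such as the Resnik similarity used in Section 3.3 refine this idea by weighting shared ancestors by how informative (specific) they are.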
Semantic similarity-based explanations are based on understanding why particular objects, in this case, patients, are considered similar or different and how that relates to the AI predictions. Classical semantic similarity returns a single score, which on the one hand is easy to understand and condenses a potentially very large feature set into a single value, but on the other affords little insight. To improve the level of detail supported by similarity-based explanations, we propose to compute several semantic similarity scores that describe similarity in meaning under different semantic aspects encoded by the ontologies. A semantic aspect (SA) corresponds to a subgraph of the ontology rooted at a class of interest. For example, an explanation may be grounded on three different semantic similarity scores: co-morbidities similarity, lifestyle similarity and symptom similarity. These semantic similarity-based explanations are considered post-hoc, since they provide explanations for the models' predictions, as opposed to approaches that explain the models' workings. They are also considered local, since they focus on explaining specific predictions made by the model, instead of providing global explanations that apply to all predictions [27]. In particular, semantic similarity explanations focus on how a certain prediction fits with the patient features and their interpretation according to existing scientific knowledge encoded in the ontologies.

Figure 1: Semantic annotation of patients using a subgraph of the Brainteaser Ontology enriched with the NCIt ontology.

3. Methodology

In the context of iDPP@CLEF 2022, we propose a novel approach that generates semantic similarity-based explanations for patient-level predictions. The underlying idea is that we can explain the prediction for one patient by considering aspect-oriented semantic similarity with other relevant patients, based on the most important features used by the ML approaches or selected by users. To build rich and easy to understand semantic similarity-based explanations, our approach requires five steps (see Figure 2): (1) the enrichment of the Brainteaser Ontology through the integration of other biomedical ontologies; (2) the semantic annotation of patients (if not already available); (3) the similarity calculation between patients; (4) the selection of the set of patients to explain a specific prediction; and (5) the visualization of the generated similarity-based explanations.

3.1. Ontology Integration

To ensure a rich and comprehensive semantic annotation of the data, we enrich the Brainteaser Ontology with links to other ontologies. The BO models the data collected to describe disease progression in ALS and multiple sclerosis (MS). It reuses individual classes from the biomedical ontologies listed in Table 1. However, it does not include import statements, which hinders its ability to support a more complete semantic similarity calculation, since the context of the reused classes is missing. By explicitly linking the BO, through import statements, to the nine biomedical ontologies and controlled vocabularies it reuses, we can establish a rich semantic landscape with approximately 770,680 classes in total, covering a variety of domains.

Table 1
Main ontologies and number of classes used to create the Brainteaser Ontology.

National Cancer Institute Thesaurus (NCIt)                      173,001
Ontology of Genes and Genomes (OGG)                              69,689
Uberon                                                           20,849
Medical Action Ontology (MAxO)                                   15,086
Ontology for MIRNA Target (OMIT)                                 90,916
SNOMED Clinical Terms (SNOMED CT)                               358,483
Anatomical Therapeutic Chemical Classification (ATCC)             6,567
International Standard Classification of Occupations (ISCO)         619
Experimental Factor Ontology (EFO)                               35,470

Figure 3 represents the impact of this enrichment via integration. The left side shows the subgraph of the neighborhood of the classes "Rheumatoid Arthritis", "Pulmonary Sarcoidosis" and "Heart Disorder" as modeled in the BO. Using the unenriched BO would result in patients exhibiting these diseases or disorders all being considered equally similar: the shared meaning of these ontology classes is simply that they are all subclasses of "Disease or Disorder". The right side illustrates the enriched subgraph including the full NCIt hierarchy, which encodes considerably more contextual information and supports semantic similarity measures in correctly identifying that "Rheumatoid Arthritis" and "Pulmonary Sarcoidosis" are more similar, since they are both Autoimmune Diseases.
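A minimal sketch of this enrichment step is shown below, using rdflib to declare explicit owl:imports statements on a local copy of the BO; the file names and ontology IRIs are illustrative placeholders rather than the actual published values.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Load a local RDF/XML copy of the Brainteaser Ontology (file name is a placeholder).
bo = Graph()
bo.parse("brainteaser.owl", format="xml")

# Ontology IRI of the BO and IRIs of the reused ontologies; the values below are
# illustrative and should be replaced by the IRIs of the published resources.
bo_iri = URIRef("https://example.org/brainteaser-ontology")
reused = [
    URIRef("http://purl.obolibrary.org/obo/ncit.owl"),
    URIRef("http://purl.obolibrary.org/obo/uberon.owl"),
    # ... the remaining reused ontologies and controlled vocabularies
]

# Declare each reused ontology as an explicit import of the BO.
for onto_iri in reused:
    bo.add((bo_iri, OWL.imports, onto_iri))

bo.serialize(destination="brainteaser_enriched.owl", format="xml")
```

When the enriched file is loaded with an OWL-aware tool that resolves imports, the similarity computation gains access to the full class hierarchies of the reused ontologies.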
3.2. Patient Semantic Annotation

Semantic annotation needs to be performed only when patient data is not already annotated with the ontology. Semantic annotation can be performed manually, but there are a number of existing tools that facilitate this step, such as the well-known NCBO Annotator [28, 29]. An annotation assigns an ontology class to describe an entity. In our case, an annotation is a tuple a = ⟨c, t⟩ that assigns an ontology class c to describe a patient at a given point in time t (clinical visit). A patient can then be represented by the vector of all their annotations, i.e., the ontology classes that describe them and the time points at which the assignment was registered:

P = \{a_1, \ldots, a_n\}    (1)

Figure 2: Overview of the semantic similarity explanations methodology.

3.3. Semantic Similarities between Patients

We take as input the most important features employed by the ML methods to predict disease progression and/or other features considered relevant by users, and use them to define the different semantic aspects for which semantic similarity is calculated. More formally, the similarity between two patients corresponds to a vector of semantic similarity scores, one for each relevant aspect. This similarity may be computed irrespective of time of annotation (considering all annotations for all time points) or by time point.

sim(P_1, P_2) = \{sim_{SA_1}(P_1, P_2), \ldots, sim_{SA_n}(P_1, P_2)\}    (2)

Several well-established works explore the taxonomical (hierarchical) component of the ontologies to measure the shared meaning between two entities described in an ontology [30]. A popular semantic similarity measure is ResnikBMA, based on the class-based measure proposed by Resnik [31], in which the similarity between two classes corresponds to the Information Content (IC) of their most informative common ancestor. The IC is a measure that reflects the specificity of a class in the ontology.

sim(c_1, c_2) = \max\{IC(c) : c \in A(c_1) \cap A(c_2)\}    (3)

where A(c_i) is the set of ancestors of c_i. ResnikBMA then uses the Best-Match Average to consider the best-scoring pairs of classes from each entity:

BMA(e_1, e_2) = \frac{\sum_{c_1 \in A(e_1)} sim(c_1, c_2)}{2\,|A(e_1)|} + \frac{\sum_{c_2 \in A(e_2)} sim(c_1, c_2)}{2\,|A(e_2)|}    (4)

where A(e_i) is the set of annotation classes of entity e_i and sim(c_1, c_2) is the semantic similarity between classes c_1 and c_2.

Figure 3: Impact of the Brainteaser Ontology enrichment through the integration of several relations between domains.

Recently, more sophisticated approaches based on knowledge graph embeddings represent each entity with a vector that approximates the similarity properties of the graph; similarity between entities can then be computed with operations such as cosine similarity over these vectors [32].
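A minimal sketch of this taxonomic computation is given below, under simplifying assumptions: patient annotations are reduced to plain sets of ontology classes (time points are ignored), ancestor sets and IC values are assumed to be precomputed from the enriched ontology, and the best-match reading of Equation (4) described in the text is used. All names are illustrative.

```python
from typing import Dict, Set

def resnik(c1: str, c2: str,
           ancestors: Dict[str, Set[str]], ic: Dict[str, float]) -> float:
    """Equation (3): IC of the most informative common ancestor of c1 and c2.
    Ancestor sets are assumed to include the class itself."""
    common = ancestors[c1] & ancestors[c2]
    return max((ic[c] for c in common), default=0.0)

def resnik_bma(p1: Set[str], p2: Set[str],
               ancestors: Dict[str, Set[str]], ic: Dict[str, float]) -> float:
    """Equation (4): Best-Match Average over the two patients' annotation classes."""
    if not p1 or not p2:
        return 0.0
    best1 = sum(max(resnik(c1, c2, ancestors, ic) for c2 in p2) for c1 in p1)
    best2 = sum(max(resnik(c1, c2, ancestors, ic) for c1 in p1) for c2 in p2)
    return best1 / (2 * len(p1)) + best2 / (2 * len(p2))

def aspect_similarities(p1: Set[str], p2: Set[str],
                        aspects: Dict[str, Set[str]],
                        ancestors: Dict[str, Set[str]],
                        ic: Dict[str, float]) -> Dict[str, float]:
    """Equation (2): one similarity score per semantic aspect, where each aspect is
    the set of ontology classes in the subgraph rooted at its class of interest."""
    return {name: resnik_bma(p1 & classes, p2 & classes, ancestors, ic)
            for name, classes in aspects.items()}
```

The aggregated similarity used in the next section to rank candidate explanatory patients can then be obtained, for example, as a weighted average of these aspect scores.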
3.4. Building Semantic Similarity Explanations

The models proposed by our team for the iDPP@CLEF 2022 task of predicting time of impairment for ALS addressed the prediction of the event separately from the prediction of the time window to the event. For event prediction, all time points were considered, but the first three proved to be the most relevant for the models. These were then also used in predicting time to event. Likewise, we developed two types of explanations better suited to this approach: event explanations and time window to event explanations.

A key aspect is selecting the most relevant patients to explain an event prediction. We define three types of explanatory patients: (1) the N most similar patients exhibiting the same outcome; (2) the N least similar patients with the same outcome; and (3) the N most similar patients exhibiting a different outcome. Since we compute several similarities in the previous step, to select these explanatory patients we propose an aggregated similarity in which different features can be assigned weights. Candidate explanatory patients are then ranked by their aggregated semantic similarity to the target patient, and the final set of explanatory patients is selected. A semantic similarity explanation of a given target patient can then be defined as a two-dimensional tensor, where the first dimension represents patients and the second dimension the semantic similarities according to different aspects:

SSE(P_{target}) = \{sim(P_{target}, P_1), \ldots, sim(P_{target}, P_n)\}    (5)

where P_1, ..., P_n correspond to patients sampled from the three types. Regarding the prediction of a time window to a specific event, we follow a similar approach: (1) the N most similar patients exhibiting the same event in the same time window; (2) the N least similar patients exhibiting the same event in the same time window; and (3) the N most similar patients exhibiting the same event in a different time window. N can be defined by the end user and may take a different value for each patient type. The explanations for the time window to event predictions take the form of a three-dimensional tensor, where the first dimension is the explanatory patients, the second dimension corresponds to the semantic similarities according to different aspects, and the third dimension represents the time points.

3.5. Explanation Visualization

Finally, one fundamental aspect of explainability is communication to the end user. Our proposal orchestrates semantic similarity explanations into a visualization that combines global and aspect-oriented similarity for different sets of relevant patients. We chose heatmaps to visualize the similarity explanations. For event prediction explanations, the representation of the two-dimensional tensor is straightforward: the y-axis contains the patients and the x-axis the similarity scores according to the different aspects. Each cell in the map is a colour-coded representation of the similarity of each patient to the patient we want to explain for each feature, where a darker colour corresponds to higher similarity. The visualization also includes the global similarity employed to select the different types of patients. For time to event explanations, the third dimension is encapsulated within the x-axis, with three cells (one per time point).
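A minimal sketch of such a heatmap for an event prediction explanation is given below, using seaborn; the similarity values and patient labels are placeholders for illustration only, not real patient data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_event_explanation(similarities: pd.DataFrame, target: str) -> None:
    """Heatmap of a two-dimensional explanation: rows are explanatory patients,
    columns are semantic aspects, and darker cells indicate higher similarity."""
    ax = sns.heatmap(similarities, cmap="Blues", vmin=0.0, vmax=1.0,
                     annot=True, fmt=".2f",
                     cbar_kws={"label": "semantic similarity"})
    ax.set_title(f"Similarity of explanatory patients to {target}")
    ax.set_xlabel("semantic aspect")
    ax.set_ylabel("explanatory patient")
    plt.tight_layout()
    plt.show()

# Placeholder scores for illustration only (not real patient data).
scores = pd.DataFrame(
    {"lifestyle": [0.91, 0.83, 0.35],
     "symptoms": [0.88, 0.46, 0.22],
     "co-morbidities": [0.74, 0.52, 0.18]},
    index=["P2 (same outcome, most similar)",
           "P3 (different outcome, most similar)",
           "P4 (same outcome, least similar)"])
plot_event_explanation(scores, target="P1")
```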
4. Semantic Similarity Explanations

To showcase our proposal for semantic similarity explanations, we simulated patients with annotations to the BO. One of them is the target patient for whom an event is predicted and an explanation is built. The remaining patients are patients for whom the outcome is known, and they are used to explain the target patient. To explain an event prediction, we set N=2 for each of the three types of explanatory patients. To explain the time window prediction, we employed N=1, N=5 and N=1 for the three explanatory patient types, respectively. The heatmaps were built using the following selection of features: lifestyle, onset, co-morbidities, symptoms and pharmacological substances. Figure 4 depicts the event prediction explanation, while Figure 5 illustrates the time window to event prediction explanation. Patients in green correspond to the patients most similar to the target patient (P1) whose outcome event or time to event is the same. Patients in white correspond to the patients most similar to P1, but with a different prediction. Finally, patients in pink correspond to the patients least similar to the target patient, but who share the same prediction.

Figure 4: Semantic similarity explanation for the event prediction.

Figure 5: Semantic similarity explanation for the time window to the event prediction.

5. Conclusions

Similarity is a natural explanatory approach, but computing similarity for complex data is challenging. Semantic similarity provides an opportunity for explainability in the context of iDPP@CLEF 2022, since it allows more complex patient comparisons supported by the scientific context encoded in ontologies. Our proposal is based on first enriching the Brainteaser Ontology with explicit imports of the reused ontologies to support a more granular computation of patient semantic similarity. The semantic similarity explanations are then based on calculating semantic similarity values according to aspects either identified as important for the ML predictions or selected by the end users. The end result is heatmap-based visualizations that allow the comparison of the target patient, whose predictions we want to understand, with other relevant patients with known outcomes. This method can be integrated with other types of similarities, for instance, comparisons of Revised Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS-R) scores.

A challenge for this explainability method is that it requires patient data to be sufficiently detailed to ensure high-quality annotations. The data currently available for the challenge can be used to support these methods, but it does not yet afford a complete patient representation under the BO. As data becomes richer and more detailed, the potential value of semantic similarity explanations increases. However, their true value must be measured in user studies, where predictions and explanations are shown to experts and their ability to help users understand a prediction is assessed.

Acknowledgements

The authors are funded by the FCT through the LASIGE Research Unit (ref. UIDB/00408/2020 and ref. UIDP/00408/2020), AIpALS (PTDC/CCI-CIF/4613/2020), and PhD research scholarships to RTS (SFRH/BD/145377/2019), DFS (2020.05100.BD) and ENC (2021.07810.BD); and by the BRAINTEASER and KATY projects, which have received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 101017598 and No 101017453, respectively.

References

[1] V. Grollemund, P.-F. Pradat, G. Querin, F. Delbot, G. Le Chat, J.-F. Pradat-Peyre, P. Bede, Machine learning in amyotrophic lateral sclerosis: achievements, pitfalls, and future directions, Frontiers in Neuroscience 13 (2019) 135.
[2] S. Pires, M. Gromicho, S. Pinto, M. d. Carvalho, S. C. Madeira, Patient stratification using clinical and patient profiles: Targeting personalized prognostic prediction in ALS, in: International Work-Conference on Bioinformatics and Biomedical Engineering, Springer, 2020, pp. 529–541.
[3] A. Guazzo, I. Trescato, E. Longato, E. Hazizaj, D. Dosso, G. Faggioli, G. M. Di Nunzio, G. Silvello, M. Vettoretti, E. Tavazzi, C. Roversi, P. Fariselli, S. C. Madeira, M. de Carvalho, M. Gromicho, A. Chiò, U. Manera, A. Dagliati, G. Birolo, H. Aidos, B. Di Camillo, N.
Ferro, Intelligent Disease Progression Prediction: Overview of iDPP@CLEF 2022, in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), Lecture Notes in Computer Science (LNCS) 13390, Springer, Heidelberg, Germany, 2022.
[4] A. Guazzo, I. Trescato, E. Longato, E. Hazizaj, D. Dosso, G. Faggioli, G. M. Di Nunzio, G. Silvello, M. Vettoretti, E. Tavazzi, C. Roversi, P. Fariselli, S. C. Madeira, M. de Carvalho, M. Gromicho, A. Chiò, U. Manera, A. Dagliati, G. Birolo, H. Aidos, B. Di Camillo, N. Ferro, Overview of iDPP@CLEF 2022: The Intelligent Disease Progression Prediction Challenge, in: G. Faggioli, N. Ferro, A. Hanbury, M. Potthast (Eds.), CLEF 2022 Working Notes, CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, 2022.
[5] D. Soares, R. Henriques, M. Gromicho, S. Pinto, M. d. Carvalho, S. C. Madeira, Towards triclustering-based classification of three-way clinical data: A case study on predicting non-invasive ventilation in ALS, in: International Conference on Practical Applications of Computational Biology & Bioinformatics, Springer, 2020, pp. 112–122.
[6] A. S. Martins, M. Gromicho, S. Pinto, M. de Carvalho, S. C. Madeira, Learning prognostic models using disease progression patterns: Predicting the need for non-invasive ventilation in amyotrophic lateral sclerosis, IEEE/ACM Transactions on Computational Biology and Bioinformatics (2021).
[7] T. Leão, S. C. Madeira, M. Gromicho, M. de Carvalho, A. M. Carvalho, Learning dynamic Bayesian networks from time-dependent and time-independent data: Unraveling disease progression in amyotrophic lateral sclerosis, Journal of Biomedical Informatics 117 (2021) 103730.
[8] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, L. Kagal, Explaining explanations: An overview of interpretability of machine learning, in: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2018, pp. 80–89.
[9] E. Glikson, A. W. Woolley, Human trust in artificial intelligence: Review of empirical research, Academy of Management Annals 14 (2020) 627–660.
[10] C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence 1 (2019) 206–215.
[11] M. T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?": explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 1135–1144. doi:10.1145/2939672.2939778.
[12] R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, F. Giannotti, Local rule-based explanations of black box decision systems, arXiv preprint arXiv:1805.10820 (2018).
[13] M. Müller, M. Gromicho, M. de Carvalho, S. C. Madeira, Explainable models of disease progression in ALS: Learning from longitudinal clinical data with recurrent neural networks and deep model explanation, Computer Methods and Programs in Biomedicine Update 1 (2021) 100018.
[14] A. Holzinger, C. Biemann, C. S. Pattichis, D. B. Kell, What do we need to build explainable AI systems for the medical domain?, arXiv preprint arXiv:1712.09923 (2017).
[15] B. Wollschlaeger, E. Eichenberg, K.
Kabitzsch, Explain yourself: A semantic annotation framework to facilitate tagging of semantic information in health smart homes, in: HEALTHINF, 2020, pp. 133–144.
[16] I. Tiddi, et al., Directions for explainable knowledge-enabled systems, Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges 47 (2020) 245.
[17] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, G.-Z. Yang, XAI - explainable artificial intelligence, Science Robotics 4 (2019) eaay7120. doi:10.1126/scirobotics.aay7120.
[18] F. Lécué, On the role of knowledge graphs in explainable AI, Semantic Web 11 (2019) 41–51.
[19] S. Staab, R. Studer, Handbook on Ontologies, Springer-Verlag, Berlin Heidelberg, 2010. doi:10.1007/978-3-540-92673-3.
[20] M. Bettin, G. M. Di Nunzio, D. Dosso, G. Faggioli, N. Ferro, N. Marchetti, G. Silvello, Deliverable 9.1 – Project ontology and terminology, including data mapper and RDF graph builder, BRAINTEASER, EU Horizon 2020, Contract N. GA101017598. https://brainteaser.health/, 2021.
[21] D. Wang, Q. Yang, A. Abdul, B. Y. Lim, Designing theory-driven user-centric explainable AI, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–15.
[22] D. Dumas, P. A. Alexander, L. M. Baker, S. Jablansky, K. N. Dunbar, Relational reasoning in medical education: Patterns in discourse and diagnosis, Journal of Educational Psychology 106 (2014) 1021.
[23] C. Pesquita, Towards semantic integration for explainable artificial intelligence in the biomedical domain, in: HEALTHINF, 2021, pp. 747–753.
[24] J. Jovanović, E. Bagheri, Semantic annotation in biomedicine: the current landscape, Journal of Biomedical Semantics 8 (2017) 1–18.
[25] L. Ehrlinger, W. Wöß, Towards a definition of knowledge graphs, SEMANTiCS (Posters, Demos, SuCCESS) 48 (2016) 1–4.
[26] S. de Coronado, L. W. Wright, G. Fragoso, M. W. Haber, E. A. Hahn-Dantona, F. W. Hartel, S. L. Quan, T. Safran, N. Thomas, L. Whiteman, The NCI Thesaurus quality assurance life cycle, Journal of Biomedical Informatics 42 (2009) 530–539.
[27] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Computing Surveys (CSUR) 51 (2018) 1–42.
[28] C. Jonquet, N. Shah, C. Youn, M. Musen, C. Callendar, M.-A. Storey, NCBO Annotator: Semantic annotation of biomedical data, ISWC (2009).
[29] A. Tchechmedjiev, A. Abdaoui, V. Emonet, S. Melzi, J. Jonnagaddala, C. Jonquet, Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator+, Bioinformatics 34 (2018) 1962–1965. URL: https://doi.org/10.1093/bioinformatics/bty009. doi:10.1093/bioinformatics/bty009.
[30] C. Pesquita, D. Faria, A. O. Falcão, P. Lord, F. M. Couto, Semantic similarity in biomedical ontologies, PLOS Computational Biology 5 (2009) 1–12. doi:10.1371/journal.pcbi.1000443.
[31] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, in: Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI'95, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 448–453.
[32] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: A survey of approaches and applications, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 2724–2743. doi:10.1109/TKDE.2017.2754499.