Integrating Knowledge Graphs for Explainable
         Artificial Intelligence in Biomedicine

             Marta Contreiras Silva, Daniel Faria, and Catia Pesquita

    LASIGE, Dep. de Informática, Faculdade de Ciências da Universidade de Lisboa,
                                      Portugal
    The rich panorama of publicly available data and ontologies in the biomedical
domain represents a unique opportunity for developing explainable knowledge-
enabled systems in biomedical Artificial Intelligence (AI) [1, 3, 4]. Building on
decades of work by the semantic web and biomedical ontologies communities,
a semi-automated approach for building and maintaining a Knowledge Graph
(KG) to support AI-based personalized medicine is within our grasp. However,
personalized medicine also poses significant challenges that require advances to
the state of the art, such as the diversity and complexity of the domain and
underlying data, coupled with the requirements for explainability.
    We propose an approach (see Figure 1) to build a KG for personalized
medicine to serve as a rich input for the AI system (ante-hoc) and incorporate its
outcomes to support explanations, by connecting input and output (post-hoc).
    A preparatory step is Data and ontology collection and curation. This
includes the selection and curation of relevant public datasets for the domain
in question, the identification of ontologies referenced by the datasets, and the
selection of other relevant ontologies to ensure adequate coverage of the domain
and sufficient semantic richness to support explanations. Additionally, the
privacy requirements inherent to patient data should inform two design decisions:
keeping part of the KG private to its data providers, and making the data
integration process mostly automatic to reduce the need for human involvement [2].
    The first step in our approach is Ontology Matching. Key challenges are
scalability and complex matching, since building a comprehensive KG requires
matching multiple ontologies with hundreds of thousands of concepts, covering
different domains, and with different modeling perspectives. Regarding
scalability, our solution is to match ontologies iteratively: the two largest
ontologies are matched and merged into a single one, the result is then matched
and merged with the third largest ontology, and so on, using complex matching
algorithms to uncover rich relations across domains [6, 7]. Before the final
integration of ontologies,
alignments are partially validated by experts to ensure an accurate KG that can
support explanations [5].
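
    The iterative match-and-merge order described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
authors' implementation: ontologies are reduced to dictionaries from concept
identifier to label, and the `match` function is a hypothetical stand-in that
only finds equal labels, whereas the approach relies on complex matching
algorithms and expert validation. All identifiers are invented for the example.

```python
from typing import Dict, List, Tuple

# Toy ontology: maps a concept identifier to its label (an assumption of this sketch).
Ontology = Dict[str, str]
# (concept id in the merged ontology, concept id in the ontology being added)
Mapping = Tuple[str, str]


def match(source: Ontology, target: Ontology) -> List[Mapping]:
    """Hypothetical stand-in matcher: pairs concepts whose labels are equal ignoring case."""
    by_label = {label.lower(): cid for cid, label in source.items()}
    return [(by_label[label.lower()], cid)
            for cid, label in target.items()
            if label.lower() in by_label]


def merge(source: Ontology, target: Ontology, mappings: List[Mapping]) -> Ontology:
    """Keep all source concepts and add the target concepts that were not mapped."""
    mapped = {target_id for _, target_id in mappings}
    merged = dict(source)
    merged.update({cid: label for cid, label in target.items() if cid not in mapped})
    return merged


def integrate(ontologies: List[Ontology]) -> Tuple[Ontology, List[Mapping]]:
    """Match and merge ontologies one at a time, from largest to smallest."""
    if not ontologies:
        return {}, []
    ordered = sorted(ontologies, key=len, reverse=True)
    merged: Ontology = dict(ordered[0])
    alignment: List[Mapping] = []
    for nxt in ordered[1:]:
        mappings = match(merged, nxt)
        alignment.extend(mappings)
        merged = merge(merged, nxt, mappings)
    return merged, alignment


# Usage with three invented toy ontologies of different sizes.
disease = {"D:1": "Melanoma", "D:2": "Carcinoma", "D:3": "Skin"}
anatomy = {"A:1": "skin", "A:2": "Liver"}
phenotype = {"P:1": "liver"}
merged, alignment = integrate([phenotype, disease, anatomy])
```

    Because each merged result becomes the source for the next round, a concept
from a later ontology can be mapped to a concept contributed by any earlier one,
which a set of independent pairwise alignments would not provide directly.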
    The Semantic Data Annotation process relies on the development of
parsers to interpret each type of dataset, and annotation algorithms to produce
an RDF version of the dataset that is semantically integrated into the KG.
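
    A parser plus annotation step of this kind might look like the sketch
below, which reads a small CSV dataset and emits N-Triples lines. Everything
specific here is an assumption made for illustration: the column names, the
`http://example.org/kg/` namespace, the `hasDiagnosis` property, and the
`TERM_TO_CLASS` lookup table are hypothetical and do not come from the paper.

```python
import csv
import io
from typing import Dict, Iterator

# All IRIs below are hypothetical placeholders, not real ontology identifiers.
BASE = "http://example.org/kg/"
HAS_DIAGNOSIS = BASE + "hasDiagnosis"

# Hypothetical lookup from dataset values to ontology classes in the KG.
TERM_TO_CLASS: Dict[str, str] = {
    "melanoma": BASE + "onto/Melanoma",
    "carcinoma": BASE + "onto/Carcinoma",
}


def annotate(csv_text: str) -> Iterator[str]:
    """Yield one N-Triples line per row whose diagnosis maps to a known class."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        cls = TERM_TO_CLASS.get(row["diagnosis"].strip().lower())
        if cls is None:
            continue  # unmapped values are skipped here; a real pipeline would flag them
        subject = BASE + "patient/" + row["patient_id"].strip()
        yield f"<{subject}> <{HAS_DIAGNOSIS}> <{cls}> ."


# Usage on an invented three-row dataset; the second row has no matching class.
DATA = "patient_id,diagnosis\np1,Melanoma\np2,unknown\np3,carcinoma\n"
triples = list(annotate(DATA))
```

    Each dataset type would need its own parser, while the lookup from values
to ontology classes is the part that ties the resulting triples to the KG.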
    Finally, the Integration with the AI system ensures that the KG both
serves as input to AI methods (directly or through feature generation [8]) and
encodes the AI outcomes. This yields a shared semantic space for data, scientific
context, and predictions that can support KG-based explanation methods,
including querying, reasoning, and similarity searches [9].

    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).


Fig. 1. Overview of the proposed approach to build a KG to support Explainable AI
(XAI) in personalized medicine. Ontology Matching and Semantic Data Annotation
are used to construct the KG (1), which serves as input for the AI system (2), and
incorporates its outcomes (3); explanations will be derived from the KG (4).
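
    Steps (3) and (4) can be illustrated with a toy example in which an AI
outcome is stored as a triple and an explanation is retrieved as a path that
connects the input to that outcome through the rest of the KG. This is only a
sketch of one possible query-style explanation, assuming a KG small enough to
hold in memory; the triples, predicate names, and the `explain` function are
hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy KG; every identifier is a hypothetical placeholder.
KG: List[Triple] = [
    ("patient:p1", "hasVariant", "gene:G1"),       # annotated input data
    ("gene:G1", "associatedWith", "disease:D1"),   # background knowledge
    ("drug:X", "treats", "disease:D1"),            # background knowledge
    ("patient:p1", "predictedResponder", "drug:X"),  # AI outcome stored in the KG
]
PREDICTION: Triple = KG[3]


def explain(kg: List[Triple], start: str, goal: str,
            exclude: Triple) -> Optional[List[Triple]]:
    """Breadth-first search for a shortest path from start to goal.

    Edges are followed in both directions, and the prediction triple itself is
    excluded so that the path has to go through other knowledge in the KG.
    """
    neighbours: Dict[str, List[Triple]] = {}
    for triple in kg:
        if triple == exclude:
            continue
        subject, _, obj = triple
        neighbours.setdefault(subject, []).append(triple)
        neighbours.setdefault(obj, []).append(triple)

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for triple in neighbours.get(node, []):
            nxt = triple[2] if triple[0] == node else triple[0]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None  # no connecting path: the KG offers no explanation of this kind


# Usage: which knowledge connects patient p1 to the predicted drug?
path = explain(KG, "patient:p1", "drug:X", PREDICTION)
```

    Here the returned path links the patient to the drug through the variant
and the disease, which is the kind of connection between input and output that
a shared semantic space makes queryable.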
Acknowledgments. This work was supported by FCT through the LASIGE Research
Unit (UIDB/00408/2020 and UIDP/00408/2020). It was also partially supported by
the KATY project, which has received funding from the European Union’s Horizon
2020 research and innovation program under grant agreement No. 101017453.

References
1. Chari, S., Gruen, D.M., Seneviratne, O., McGuinness, D.L.: Directions for explain-
   able knowledge-enabled systems. arXiv preprint arXiv:2003.07523 (2020)
2. Chen, C., Cui, J., Liu, G., Wu, J., Wang, L.: Survey and open problems in pri-
   vacy preserving knowledge graph: Merging, query, representation, completion and
   applications. arXiv preprint arXiv:2011.10180 (2020)
3. Holzinger, A., Langs, G., Denk, H., Zatloukal, K., Müller, H.: Causability and ex-
   plainability of artificial intelligence in medicine. WIREs Data Mining and Knowledge
   Discovery 9(4), e1312 (2019)
4. Lecue, F.: On the role of knowledge graphs in explainable AI. Semantic Web 11(1),
   41–51 (2020)
5. Li, H., Dragisic, Z., Faria, D., Ivanova, V., Jiménez-Ruiz, E., Lambrix, P., Pesquita,
   C.: User validation in ontology alignment: functional assessment and impact. The
   Knowledge Engineering Review 34 (2019)
6. Lima, B., Faria, D., Pesquita, C.: Pattern-guided association rule mining for complex
   ontology alignment. In: ISWC 2021 Poster & Demo Track (2021)
7. Oliveira, D., Pesquita, C.: Improving the interoperability of biomedical ontologies
   with compound alignments. Journal of Biomedical Semantics 9(1), 1–13 (2018)
8. Paulheim, H., Fürnkranz, J.: Unsupervised generation of data mining features from
   linked open data. In: Proceedings of the 2nd International Conference on Web In-
   telligence, Mining and Semantics - WIMS ’12. p. 1. ACM Press (2012)
9. Pesquita, C.: Towards semantic integration for explainable artificial intelligence in
   the biomedical domain. In: BIOSTEC 2021. vol. 5, pp. 747–753 (2020)