=Paper=
{{Paper
|id=Vol-3415/paper-3
|storemode=property
|title=Generating Knowledge Graph Based Explanations for Drug Repurposing Predictions
|pdfUrl=https://ceur-ws.org/Vol-3415/paper-3.pdf
|volume=Vol-3415
|dblpUrl=https://dblp.org/rec/conf/swat4ls/OzkanCYED23
}}
==Generating Knowledge Graph Based Explanations for Drug Repurposing Predictions==
<pdf width="1500px">https://ceur-ws.org/Vol-3415/paper-3.pdf</pdf>
<pre>
Generating Knowledge Graph Based Explanations for
Drug Repurposing Predictions
Elif Ozkan1 , Remzi Celebi1 , Arif Yilmaz1 , Vincent Emonet1 and Michel Dumontier1
1
    Institue of Data Science, Maastricht University, Maastricht, The Netherlands


                                         Abstract
                                         Over the past years, computer assisted drug repurposing methods have started to gain more attention
                                         as they offer a faster and a more effective way to treat many diseases. While these methods are quite
                                         promising in terms of power of prediction, the hesitation regarding the use of these methods in practice
                                         still remains due to their highly complex working mechanisms, which limits their interpretability.
                                              Explainable Artificial Intelligence (XAI), which takes transparency, interpretability, informativeness
                                         as its main foundations, could address the limitations of the black-box models. In this context, Knowledge
                                         Graphs (KGs) could leverage the explanations provided to the user in the biomedical domain, as they are
                                         capable of represent relations between the entities in a semantically consistent way. Knowledge Graphs
                                         have the potential to generate graph-based representations, while providing the context, which make it
                                         easily interpretable by humans.
                                              In this paper, we propose an approach, which is a KG based explainable AI framework in the field
                                         of drug repurposing as an extension of the PREDICT Method. The approach is centered on generating
                                         similarity-based explanations by extracting the relevant paths from the input, which consists of a disease
                                         and a predicted drug for the treatment of the disease. To demonstrate the utility of this approach, we
                                         demonstrate how the graphical operations used in the KG could be used to generate plausible explanations,
                                         by conducting a use case on Alzheimer Disease. Our findings suggest that the utilization of biomedical
                                         KGs and this approach has a great potential to provide transparent explanations as it is able to illustrate
                                         the relations between drug, disease entities which are quite relevant to the target input. Application of
                                         this approach to the drug repurposing and to other similar domains, could be helpful to overcome the
                                         limitations caused by the black-box nature of the computational drug repurposing models and could be a
                                         powerful tool to enhance the understanding of decision making process of models and simplify scientific
                                         communication among domain experts and computer scientists.

                                         Keywords
                                         Knowledge Graph, Explainable AI, XAI, drug repurposing


1. Introduction
The advancements in the field of Artificial Intelligence (AI) have been successfully utilized
in the computer-assisted biomedical tasks in the past few years. AI and Machine learning
methods applied in this field bears significant promise for drug discovery and repurposing as
they significantly accelerate and offer new alternatives for the process of treatment [1]. Drug

SWAT4HCLS 2023: The 14th International Conference on Semantic Web Applications and Tools for Health Care and Life
Sciences, February 13–16, 2023, Basel, Switzerland
$ e.yozkan@student.maastrichtuniversity.nl (E. Ozkan); remzi.celebi@maastrichtuniversity.nl (R. Celebi);
a.yilmaz@maastrichtuniversity.nl (A. Yilmaz); vincent.emonet@maastrichtuniversity.nl (V. Emonet);
michel.dumontier@maastrichtuniversity.nl (M. Dumontier)
                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
companies and researchers started to pay more attention to the computational drug repurposing
methods in the recent years in order to find a faster and effective path for the treatment of
COVID-19 [2]. Although the performance of many computational drug repurposing frameworks
are quite promising such as the PREDICT method which utilizes similarity search based on
the principle of "guilt by association" [3], their inner working mechanisms are still seen as a
black-box as the way these frameworks make decisions are not entirely evident [4]. Hence, this
limitation restricts the full adoption process of computational drug repurposing methods by
institutions.
   Explainable AI could play a critical role in addressing to the limitations which are caused
by the highly complex, non-transparent nature of the computational drug repurposing models
and help us understand and interpret the underlying models, in order to mitigate the lack
of interpretability of certain machine learning models and to augment human reasoning and
decision-making. In the biomedical context, alongside the natively interpretable models such
as Random Forests or Decision Trees, Knowledge Graphs (KG), which are semantically rich,
interlinking data structures that formally represents the relationship between different entities,
have started to be leveraged [5].Knowledge Graphs (KGs), graph-based representations of
knowledge, are capable of encoding the complex in the form of structured statement, in the
way that it is human interpretable [6].
   In this paper, our approach, which is a knowledge graph based explainable AI approach
in the specific context of drug repurposing, is proposed. The proposed approach involves
extracting relevant subgraph, given a drug and a disease pair, in order to provide similarity-
based explanations in the form of a knowledge graph as an explainable extension of the PREDICT
method [3] which is based upon the principle of "Guilt by Association" and easily adaptable
to the knowledge graph structure. This method is then evaluated by conducting a case study
on drug candidates, which could potentially treat Alzheimer disease. It was made accessible
to the user as a branch of the OpenPredict [7] model, which is the concrete implementation
of a drug-repositioning framework. The outline of this paper is as follows. A more in depth
information about the related work done by other researches is provided in Section 2, and it is
followed by the methodology which is adopted in this research in Section 3. The results and
discussion are presented in Sections 4 and 5 respectively.


2. Related Work
AI-based drug repurposing is defined as the identification, prediction and evaluation of new use
cases and indications for existing and approved drugs using computational methods, such as Ma-
chine Learning and Deep Learning. One of the most effective computational approaches utilized
in the context of drug repurposing is consideration similarities between entities, specifically by
analyzing the drug and disease based similarities, as well as the their combined similarities.
   In this sense, the PREDICT method presents a framework that provides predictions on novel
associations between desired drugs and diseases [3]. The framework is mainly based on the
‘Guilt by Association’ (GBA) approach which was first proposed by Chiang and Butte, which
involves the measurements of similarities among the known drug and disease to drug-disease
pairs, given a target query drug and disease. The known associations between entities and the
later formed associations are used as features and then are fed into a classification algorithm in
order to provide a final prediction. PREDICT is a quite effective framework as it enables the
incorporation of additional features related to similarity between drugs and diseases. However,
the usability of this effective framework itself holds some limitations as the underlying features
and reasoning behind the final predictions made by the framework, since both predictive and
interpretative features are isolated by the complex classification algorithms [8].
   Augmentation of Knowledge Graphs into AI systems in the biomedical field, specifically for
the drug discovery and repurposing tasks, allows for generating explanations of the system by
providing informed and labeled visualizations by converting knowledge formalization rules and
logic into a form, which is more suitable for human comprehension, to the user. For instance,
Edwards et. al [9]. in their study on Explainable Biomedical recommendations via reinforcement
learning, propose a neuro-symbolic approach which involves the application of multi-hop neural
driven recommendation to complex biomedical knowledge graphs . They conclude that such KG
based approaches has a great potential to generate explanations and improve the performance
of the black-box methods.
   Similarly, Liu et.al. [10] in their study regarding Neural Multi-Hop Reasoning with logical
rules on Biomedical Knowledge graphs propose a novel neuro-symbolic approach PoLo (Policy-
Guided Walks With Logical Rules) that leverages the interpretability and the structure of
Knowledge Graphs to conduct guided policy walks. The experimental findings that they have
found for this specific approach based on KGs, on the use case of drug repurposing of the
novel disease COVID-19, demonstrated that path-based reasoning methods outperform existing
black-box methods on the drug repurposing task as well as providing a natural transparency
mechanism which makes this approach more transparent to the existing black-box methods.
   Moreover, Wang et. al. [11], in their study on discovering the potential reactions of antitumor
drugs adopted a Tumor-Biolink knowledge graph (TBKG) based method which is comprised
of four main steps including (1) graph building, (2) reaction discovery, (3) graph verification,
(4)clinical validation, and in which they explored the relations among tumors, biomarkers and
drugs. It is concluded in the study that the generated knowledge graphs could have successfully
been interpreted and validated by the domain experts and therefore, their approach is capable
of providing explanations and transparency of their reaction discovery process.
   Inherently explainable predictive models such as Decision Trees and Classification Rules as
well as biomedical Knowledge Graphs are utilized for the drug interaction tasks to bring its
explainability to a higher level [12]. Bresso et.al. utilize these simple classification methods
to develop an explainable AI system for investigating drug interactions and they have used
Decision Trees to make predictions from the generated Knowledge Graphs. Along with the
quantitative performance metrics they also conduct qualitative experiments with the domain
experts for explainability, similarly to the clinical validation step in Wang et. al’s study. It
demonstrates that the synthesis of knowledge graphs with inherently explainable prediction
methods provide explainable and comprehensible models to explore activity reactions of drugs.
3. Method
In order to provide an interpretable drug repurposing framework, we developed a knowledge
graph based pipeline. The primary purpose of this pipeline is to generate a knowledge graph
which indicates two types of relationship; similar_to which is the similarity between the drug-
drug and disease-disease pairs, and the treats relationship between a drug and a disease.
   The base information regarding the similarity between drug-drug & disease-disease pairs and
the relations between drug-disease pairs are curated into a dataset which includes the vector
embeddings of 593 drugs obtained from DrugBank and 313 associated diseases from Online
Mendelian Inheritance in Man, (OMIM) databases.
   The overall strategy includes identifying a set of ranked paths through the generated Knowl-
edge Graph that provide plausible explanations for a predicted drug indication based on their
similarities to known drug-disease pairs. The generated explanation is based on a input which
is composed of a desired drug-disease pair. Our approach generates a KG of ranked paths in
three steps : Path Generation, Path Ranking and the Generation of Explanation Graph.
   Path Generation step involves creating a set of paths based on a given input. Each path has
n-length, consisting of two types of relations; similar_to between drug-drug and treats between
drug-disease entities. The level of similarity between two entities, 𝐸1 and 𝐸2 , is obtained by
taking the cosine similarity 𝑆𝐶 of their vector embeddings which is given by :
                                                      𝐸1 · 𝐸2
                                   𝑆𝐶 (𝐸1 , 𝐸2) =                                            (1)
                                                    ‖𝐸1 ‖ ‖𝐸2 ‖
   As the paths of length 𝑛 > 3 are less biochemically relevant and the increasing path lengths
become increasingly difficult to understand, 0, 1 and 2-hop relation paths are used to connect
the drug and the disease.
   Five cases are taken into account during the path formation. As Figure 1 demonstrates,
the treats relation among the known drug-disease pairs are retrieved immediately (drug2-
disease1). For the unknown drug-disease pairs, for instance drug1-disease1, the structural
similarity between the homotypic times should also be considered in order to form an edge
which represents the relation treats. In this specific case, drug1 is similar to drug2, which is
known to be indicated for the treatment of disease1, and similarly, it is known that disease1
is similar to disease2. Therefore, it is possible to form an edge between the entities drug1 and
disease1. However, the plausibility of this edge depends on the similarity scores that the other
existing edges have, and in practice, the number of paths between two predicted entities is quite
high due to the size of the data sets. Therefore, an additional graphical action is needed to rank
the weight of the formed paths and select the most relevant paths to form the treats relation,
based on an input drug-disease pair.
   The path ranking operation is achieved by the adoption of principle of parsimony, which
suggests that explanations with simpler and shorter paths are more relevant compared to the
paths that are longer, and might have relatively less indirect information. Each path formed in
the previous phase are ranked according to their assigned weight. The weight 𝑤𝜋𝑘 , assigned to
each path 𝜋𝑘 is computed by :
                                                  ∑︁
                                         𝑤𝜋 𝑘 =            𝑒𝑖                                  (2)
                                                  𝑒𝑖 ∈𝜋𝑘
Figure 1: Knowledge graph based explanation.


   where the edge weight 𝑒𝑖 is set to 1 in case of a relation of type treats, and to (1 − 𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦)
in the edge connects two entities of the same type. Once a weight is assigned to each path in
the KG, they are ranked by their weights in an ascending order. As a last step, top 𝑘 weighted
paths with highest weights are included in the explanation graph. Moreover, to ensure the
relevance and significance of the included entities in the final explanation, as well as simplifying
the graphs to provide more readable explanations, we introduce an additional binary variable
"min_similarity_threshold", which restricts the amount of included entities further, according to
a desired similarity threshold. The restriction process is achieved by taking the entities, which
are in the top 𝑛 percentile of all entities, in terms of their similarity scores to the target. If there
are no entities which satisfy a certain similarity threshold, i.e, the similarity between the found
entities are too weak, an empty explanation graph is returned.


4. Use case
In order to observe the effectiveness of the generated explanation through the Knowledge
Graphs, a case study for Alzheimer Disease (OMIM:104300), which is indicated to carry similar
characteristics with diseases such as dementia and Parkinson’s disease [13], was conducted
through the OpenPredict API, using the PREDICT dataset and model.
  The drug Amandatine is suggested by the PREDICT model as a potential treatment for
Alzheimer’s disease. In order to understand the relation between the drug Amandatine and
Alzheimer Disease, we use this pair as an input to the pipeline as shown in Figure 2.
  Considering the above input, the generated explanation graph, the pipeline first extracts the
individual edges, then augments them into paths, as illustrated in Figure 3.
Figure 2: Amandatine is predicted for the treatment of Alzheimer Disease.


   Amandatine is structurally similar to Donepezil, and Rivastigmine which are directly in-
dicated for the treatment of Alzheimer Disease. It is also shown to be similar to drugs Car-
bidopa, Zonisamide and Haloperidol which treat Parkinson’s Disease, Epilepsy and Dementia
& Schizophrenia , which are similar to Alzheimer, respectively. The generated paths are then
merged into a single knowledge graph, displaying the relationships between the drug-disease
entities as a complete semantic network as shown in Figure 4. The resulting explanation graph
provides plausible explanations as the entities included in the graph are closely related to the
target pair (Amandatine-Alzheimer). For instance, many studies have shown that there is strong
evidence that Parkinson’s Disease and Alzheimer Disease have overlapping similarities in terms
of clinical and neuropathologic features [14] and Carbidopa is indicated for treatment of early
symptoms of Alzheimer [15] [16].


Figure 3: The paths formed by the pipeline given Amantadine-Alzheimer pair as the input


   The min_similarity_threshold is taken as 10 in this case study, considering the availability of
the instances in the data. In order to observe whether the variable min_similarity_threshold
causes loss of information in this specific case study, an alternative explanation is generated
without taking min_similarity_threshold into consideration. The paths formed, that are not
restricted by a certain threshold, turned out to be indeed more populated with entities, as seen
in Figure 5. The result is quite interesting as in this example, min_similarity_threshold indeed
reduced the size of the explanation graph in a way that the entities included in the graph are
more relevant.In Figure 5, Clonidine is shown to be similar to Amandatine. Clonidine is known
to be indicated for the treatment of Gilles La Tourette Syndrome, which is a disease mostly
related to neuropsychiatric movement and typically starts developing from childhood [17]. For
the entity Carbidopa, in comparison with the explanation graph restricted with the similarity
threshold, Multiple Sclerosis, a neuroskeletal disorder [18], is also shown to be similar to the
Figure 4: The final explanation graph.


Alzheimer Disease. In this context, it is possible to say that these diseases are relatively less
related to Alzheimer, therefore the utilization of min_similarity_threshold enabled excluding
less relevant entities.


Figure 5: The paths formed without being restricted by min_similarity_threshold


5. Discussion
The results that have been obtained from the conducted case study demonstrated that the
building semantic connections using Knowledge Graphs could provide meaningful and effective
explanations in the biomedical, specifically drug repurposing, domain. Although this study is
mainly focused on drug repurposing domain, it is intended to show that the proposed pipeline,
which takes the Knowledge Graph structure as its main baseline, is a powerful pipeline to
generate plausible explanations.
  Although, the literary sources and previous studies were taken as a basis to qualitatively
evaluate how effective the proposed pipeline is, in generating explanations, evaluation of
conducted case studies by domain experts could be a further and a more reliable justification.
In this sense, this comes as the main limitation in the evaluation process. Furthermore, another
limitation in terms of applicability in other domains and cases could be that the path ranking
process might be problematic as an edge could have a dominantly large weight, especially, if
the weights are not normalized. Therefore, it would be sensible to consider the alternative path
ranking strategies, such as finding the shortest path in the graph, as well as the one used in this
pipeline. Exclusion of entities with lower similarities through determined thresholds simplify
the outputted knowledge graphs, allowing for easier interpretations by the domain experts. In
order to prevent the possible hindrances, knowledge graphs generated using different thresholds
could be observed.
   Computational drug repurposing methods are still not fully adopted by the institutions due
to the lack of explainability behind the sophisticated methods [1]. In this sense, utilization of
Knowledge Graphs could help domain experts to augment the explanations provided with their
expertise and reasoning to gain more insight on the studied subjects. It could also encourage
considering the drug-disease relations that have not been studied yet as the Knowledge Graph
explanation visualizes not only the entities related to the target but also the entities related to
the intermediate entities along the paths.
   This method has also drawn some challenges that are still yet to be tackled. For instance,
considering more complex relations such as the interaction between the target and the interme-
diate drugs may foster obtaining a deeper level of understanding of the treatment potential of
the target disease by the prospective drug. Another challenge might be the augmentation of
new drug and disease information to the pipeline. The vector embedding conversion is easily
performed as a reproducible strategy is adopted, however augmentation of large information
could cause redundancy and sometimes loss of information due to the larger filtering and
simplification which would be performed in parallel with the increasing search space.
   Overall, the case study conducted on the proposed pipeline is an indicator of the promising
potential of Knowledge Graphs, and semantic operations that come with it, in providing trans-
parent and understandable explanations in the biomedical domain, and the challenges that it
introduces are an incentive to enhance KG-Based Explainable AI methods in the domain.


6. Conclusion
In this work, a knowledge graph based explanation framework is proposed for drug repositioning
task. The proposed approach could be utilized to provide explanations and improve the main
principles of Explainable AI, by providing accountability, reliability and transparency regarding
the decisions that were made through computational methods.
   The proposed framework took the PREDICT method as a baseline in providing the explana-
tions. This way, by enhancing the Guilt by Association strategy that PREDICT method uses by
augmenting KGs, along with several graphical operations, the relations between the related
entities to the given drug-disease pairs are demonstrated as transparent explanations to the
users in the form of structured predicates.
References
 [1] J. Jiménez-Luna, F. Grisoni, G. Schneider, Drug discovery with explainable artificial
     intelligence, Nature Machine Intelligence 2 (2020) 573–584.
 [2] S. Ekins, M. Mottin, P. R. Ramos, B. K. Sousa, B. J. Neves, D. H. Foil, K. M. Zorn, R. C. Braga,
     M. Coffee, C. Southan, et al., Déjà vu: stimulating open drug discovery for sars-cov-2,
     Drug discovery today 25 (2020) 928–941.
 [3] A. Gottlieb, G. Y. Stein, E. Ruppin, R. Sharan, Predict: a method for inferring novel drug
     indications with application to personalized medicine, Molecular systems biology 7 (2011)
     496.
 [4] W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, B. Yu, Definitions, methods, and
     applications in interpretable machine learning, Proceedings of the National Academy of
     Sciences 116 (2019) 22071–22080. doi:https://doi.org/10.1073/pnas.1900654116.
 [5] A. Callahan, N. H. Shah, Machine learning in healthcare, in: Key Advances in Clinical
     Informatics, Elsevier, 2017, pp. 279–291.
 [6] I. Tiddi, S. Schlobach, Knowledge graphs as tools for explainable machine learning: A
     survey, Artificial Intelligence 302 (2022) 103627.
 [7] R. Celebi, J. R. Moreira, A. A. Hassan, S. Ayyar, L. Ridder, T. Kuhn, M. Dumontier, Towards
     fair protocols and workflows: the openpredict use case, PeerJ computer science 6 (2020)
     e281.
 [8] E. Bresso, P. Monnin, C. Bousquet, F.-E. Calvier, N.-C. Ndiaye, N. Petitpain, M. Smaïl-
     Tabbone, A. Coulet, Investigating adr mechanisms with explainable ai: a feasibility study
     with knowledge graph mining, BMC medical informatics and decision making 21 (2021)
     1–14. doi:https://doi.org/10.1186/s12911-021-01518-6.
 [9] G. Edwards, S. Nilsson, B. Rozemberczki, E. Papa, Explainable biomedical recommendations
     via reinforcement learning reasoning on knowledge graphs, 2021. doi:10.48550/ARXIV.
     2111.10625.
[10] Y. Liu, M. Hildebrandt, M. Joblin, M. Ringsquandl, R. Raissouni, V. Tresp, Neural multi-hop
     reasoning with logical rules on biomedical knowledge graphs, in: European Semantic
     Web Conference, Springer, 2021, pp. 375–391. doi:https://doi.org/10.48550/arXiv.
     2103.10367.
[11] M. Wang, X. Ma, J. Si, H. Tang, H. Wang, T. Li, W. Ouyang, L. Gong, Y. Tang, X. He, et al.,
     Adverse drug reaction discovery using a tumor-biomarker knowledge graph, Frontiers in
     genetics 11 (2021) 625659. doi:10.3389/fgene.2020.625659.
[12] E. Bresso, P. Monnin, C. Bousquet, F.-E. Calvier, N.-C. Ndiaye, N. Petitpain, M. Smaïl-
     Tabbone, A. Coulet, Investigating adr mechanisms with explainable ai: a feasibility study
     with knowledge graph mining, BMC medical informatics and decision making 21 (2021)
     1–14. doi:https://doi.org/10.1186/s12911-021-01518-6.
[13] C. Reitz, C. Brayne, R. Mayeux, Epidemiology of alzheimer disease, Nature Reviews
     Neurology 7 (2011) 137–152.
[14] D. P. Perl, C. O. Warren, D. Calne, Alzheimer’s disease and parkinson’s disease: distinct
     entities or extremes of a spectrum of neurodegeneration?, Annals of neurology 44 (1998)
     S19–S31.
[15] C. S. Okereke, L. Kirby, D. Kumar, E. I. Cullen, R. D. Pratt, W. A. Hahne, Concurrent
     administration of donepezil hcl and levodopa/carbidopa in patients with parkinson’s
     disease: assessment of pharmacokinetic changes and safety following multiple oral doses,
     British journal of clinical pharmacology 58 (2004) 41–49.
[16] P. Chopade, N. Chopade, Z. Zhao, S. Mitragotri, R. Liao, V. Chandran Suja, Alzheimer’s
     and parkinson’s disease therapies in the clinic, Bioengineering & Translational Medicine
     (2022) e10367.
[17] E. Jakubovski, K. R. Müller-Vahl, Gilles de la tourette syndrome: symptoms, causes and
     therapy, Psychotherapie, Psychosomatik, Medizinische Psychologie 67 (2017) 252–268.
     doi:https://doi.org/10.1186/s12911-021-01518-6.
[18] N. Ghasemi, S. Razavi, E. Nikzad, Multiple sclerosis: pathogenesis, symptoms, diagnoses
     and cell-based therapy, Cell Journal (Yakhteh) 19 (2017) 1.

</pre>