=Paper=
{{Paper
|id=Vol-1428/BDM2I_2015_paper_10
|storemode=property
|title=Formalizing Knowledge and Evidence about Potential Drug-drug Interactions
|pdfUrl=https://ceur-ws.org/Vol-1428/BDM2I_2015_paper_10.pdf
|volume=Vol-1428
|dblpUrl=https://dblp.org/rec/conf/semweb/SchneiderBRCHMN15
}}
==Formalizing Knowledge and Evidence about Potential Drug-drug Interactions==
Jodi Schneider (1), Mathias Brochhausen (2), Samuel Rosko (1), Paolo Ciccarese (3, 6), William R. Hogan (4), Daniel Malone (5), Yifan Ning (1), Tim Clark (6), and Richard D. Boyce (1)

(1) Department of Biomedical Informatics, University of Pittsburgh, {jos188, scr25, yin2, rdb20}@pitt.edu; (2) Department of Biomedical Informatics, University of Arkansas for Medical Sciences, mbrochhausen@uams.edu; (3) Innovation Lab at PerkinElmer, paolo.ciccarese@gmail.com; (4) University of Florida, hoganwr@ufl.edu; (5) College of Pharmacy, The University of Arizona, malone@pharmacy.arizona.edu; (6) Massachusetts General Hospital and Harvard Medical School, tim_clark@harvard.edu

Abstract. Potential drug-drug interactions (PDDI) are a significant source of preventable drug-related harm. One contributing factor is that there is no standard way to represent PDDI knowledge claims and associated evidence in a computable form. The research we present in this paper addresses this problem by creating a new version of the Drug Interaction Knowledge Base, with scalable, interlinkable repositories for PDDI evidence and PDDI knowledge claims.

Keywords: Linked Data, drug-drug interactions, evidence bases, Micropublications, Nanopublications, knowledge bases

===1 Introduction===

A challenging area of focus for patient safety is the management of potential drug-drug interactions (PDDIs). These are defined as co-prescription or co-administration of two drugs known to interact, which potentially exposes the patient to adverse drug events [9].
PDDIs are a significant source of preventable drug-related harm: according to a recent review, clinically important events attributable to PDDI exposure occur in 5.3% to 14.3% of inpatients, and are responsible for 0.02% to 0.17% of the 129 million emergency department visits that occur each year [12] (footnote 1). Unfortunately, most drug information sources disagree substantially in their guidance about specific PDDIs [1, 16, 13, 2]. Addressing this is urgent as United States healthcare organizations consider PDDI screening in their strategies to achieve the effective use of electronic health records.

Footnote 1: http://www.cdc.gov/nchs/fastats/ervisits.htm

There are both technical and social factors underlying the disagreement that exists across drug information sources [14]. As Figure 1 shows, evidence that might be relevant for establishing PDDI knowledge claims is distributed across several sources including product labeling, the scientific literature, and case reports. Each source provides complementary evidence that editors of drug information resources (public sources [2] or proprietary sources such as Micromedex, Epocrates, and Medscape) must synthesize. A major social factor underlying disagreement is that drug information editors have different criteria for assessing evidence. Fortunately, two different conference series have brought leading drug information editors together to discuss a standard set of methods for assessing evidence [14, 10].

[Figure 1 diagram: pre-market studies, post-market studies, and clinical experience are reported in, or rarely reported in, the scientific literature and product labeling, which are sources for drug compendia. Drug compendia synthesize PDDI evidence into knowledge claims but may fail to include important evidence and may disagree on whether specific evidence items can support or refute PDDI knowledge claims.]

Fig. 1. Editors of drug information resources might seek evidence for or against potential drug-drug interactions from numerous sources.
Different information is reported in each type of source, making synthesis necessary.

A major technical factor yet to be addressed is that there is currently no standard way to represent PDDI knowledge claims and associated evidence in a computable form. As a result, drug information editors resort to ad hoc information retrieval methods that can yield different sets of evidence to assess [10]. The research we present in this paper addresses this problem by creating scalable, interlinkable repositories for both PDDI evidence and PDDI knowledge claims.

This paper describes our new approach. In Section 2, we outline requirements. In Section 3, we discuss the technical details. In Section 4, we present a benchmark analysis that tests the ability of the new approach to scale. After discussion, we conclude the paper.

===2 Background and requirements===

In prior work, we created the original Drug Interaction Knowledge Base (DIKB-old) [3, 4]. The DIKB (footnote 2) is an evidence-focused knowledge base designed to support pharmacoepidemiology and clinical decision support. It contains quantitative and qualitative knowledge claims about drug mechanisms and pharmacokinetic drug-drug interactions for over 60 drugs.

Prior work on the DIKB-old focused on development of an evidential approach to representing the evidence associated with a scientific claim. The system considers the evidence board as a socio-technical reasoning system that manages both a knowledge base and an evidence base. The knowledge base holds PDDI knowledge claims while the evidence base stores information artifacts that can be used to support or challenge those claims. PDDI knowledge claims may be direct (e.g., “drug X interacts with drug Y”) or inferred from pharmacological properties (e.g., “drug X inhibits enzyme Q which is important for the clearance of drug Y from the body”).
In prior work [3, 4], all evidence was collected and entered by an evidence board consisting of an informaticist and a minimum of two drug experts. The board used the following process to manage the evidence and knowledge base components:

1. All members of the board select drugs of interest. This determines the set of PDDI knowledge claims to be investigated.
2. The informaticist conducts a systematic search for evidence that might support or refute the pre-determined PDDI knowledge claims.
3. Retrieved items are filtered by applying study inclusion criteria.
4. Evidence items that meet inclusion criteria are entered into the system where they are linked to specific PDDI knowledge claims and any evidence use assumptions (knowledge claims that must be true for the evidence to hold).
5. A truth value for each knowledge claim is determined based on belief criteria.

Experience with the DIKB-old revealed a great need for improvements to the system that would make this process more efficient. First, a substantial amount of time was spent on reconciling and integrating information from various sources (Figure 1). Decision rationales were not recorded in a computable form and the evidence board did not have a process in place to keep up with relevant new evidence. Furthermore, the DIKB-old was ontologically informal, failed to adopt common biomedical ontology terms (footnote 3), and did not distinguish drug and enzyme classes from individuals. This hindered automated reasoning that integrated external knowledge sources and resulted in treating PDDIs the same as observed drug-drug interactions. In the new system we wished to resolve these issues. We also wished to retain the ability to compute with a logical representation of drug mechanism knowledge claims, using a rule-based theory of how to infer PDDIs from metabolic mechanistic knowledge of how drugs interact [4].

Footnote 2: When we do not need to distinguish between the old (‘DIKB-old’) and new (‘new DIKB’) versions, we simply mention ‘DIKB’.

Footnote 3: For example, the DIKB-old used the predicate ‘substrate of’ to represent the metabolic process of xenobiotic catalysis. However, this predicate was defined without reference to the formally defined biological process (e.g., that provided by the Gene Ontology).

We summarize these as three requirements for the new DIKB:

R1 Create a maintainable structure that supports evidence entry of data, methods, and materials from multiple sources on an ongoing basis.
R2 Create computable, logical representations of drug mechanism knowledge claims.
R3 Link to biological processes while also carefully distinguishing between a drug-drug interaction (an actual occurrence in a patient) and a potential drug-drug interaction (an information content entity that may exist because of an observation or inference).

Our approach to addressing these requirements is as follows:

Addressing R1: We adopt the emerging Micropublications (MP) [6] model for literature integration using ‘argument graphs’ to represent published claims as formal assertions linked to primary data and resources.

Addressing R2: We extended the MP ontology to add two new properties, MP:formalizedAs/MP:formalizes, to enable natural language claims to be linked to useful logical formalizations.

Addressing R3: To stress that potential drug-drug interactions are information artifacts, we use a new ontology called DIDEO [5], which has several advantages. DIDEO:
(a) Reuses identifiers from existing ontologies (e.g., CHEBI, PRO) that represent biological entities and processes;
(b) Differentiates between the representation (statements about drugs and drug-drug interactions) and the represented (actual drug-drug interactions);
(c) Prevents unwanted existential import (further explained in Section 3.3 below); and
(d) Distinguishes between the type of a drug or enzyme and portions of a specific drug or enzyme, by using punning.
===3 Technical implementation===

====3.1 Create a maintainable structure that supports evidence entry of data, methods, and materials from multiple sources====

We used micropublications to create a structure that supports evidence entry of data, methods, and materials from multiple sources [15]. We now represent PDDI knowledge claims and supporting evidence as queryable RDF statements (footnote 4) constructed using the Micropublication ontology (MP) [6]. PDDI knowledge claims and evidence were transformed from the DIKB-old model into the new one using Python scripts. Drug identifiers were converted to ChEBI identifiers to enable the use of DIDEO. So far, the mapping has been completed for 70% of the drugs that had data from clinical studies or mechanistic experiments. We envision that additional DIKB micropublications could be created by multiple parties, including evidence boards, and potentially the original authors, as we describe in Section 5.

Footnote 4: Queryable at http://purl.org/net/nlprepository/swat-4-med-safety-sparql-endpoint

Figure 2 shows the generic form of a DIKB micropublication graph using the example erythromycin - simvastatin interaction. Notice that MP has rigorously defined ontology classes that support the DIKB evidence curation process discussed above. In MP, the primary object of interest is the claim. A claim is supported by methods, materials, and data:

– MP:Claim, a text string representing a scientific claim.
– MP:Method, representing a scientific method.
– MP:Materials, for materials, such as study participants and drugs.
– MP:Data, such as the area under the concentration curve (AUC).

These classes are used for entering evidence; later, the evidence is used to determine truth values for the claims that it supports.
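As a rough illustration of the structure just described, the following sketch assembles the statements of one micropublication as plain Python tuples. The edge names mp:argues, mp:supports, and mp:qualifiedBy and the three qualifier identifiers appear in Figure 2. Everything else is an assumption made for illustration: the ex: node identifiers, the ex:reportedIn predicate, the build_micropublication and supporters helpers, and the support chain in which materials support the method, the method supports the data, and the data support the claim. It is not the DIKB's actual serialization.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_micropublication(mp_id: str, claim_text: str,
                           qualifiers: List[str], source_url: str) -> List[Triple]:
    """Return the statements of one micropublication as simple triples."""
    # Hypothetical node identifiers derived from the micropublication id.
    claim = f"{mp_id}/claim"
    data = f"{mp_id}/data"
    method = f"{mp_id}/method"
    materials = f"{mp_id}/materials"

    triples: List[Triple] = [
        (mp_id, "mp:argues", claim),
        (claim, "rdf:type", "mp:Claim"),
        (claim, "rdfs:label", claim_text),
        (data, "rdf:type", "mp:Data"),
        (method, "rdf:type", "mp:Method"),
        (materials, "rdf:type", "mp:Materials"),
        # Assumed support chain: materials -> method -> data -> claim.
        (materials, "mp:supports", method),
        (method, "mp:supports", data),
        (data, "mp:supports", claim),
        # Hypothetical predicate pointing at the publication reporting the data.
        (data, "ex:reportedIn", source_url),
    ]
    # Semantic qualifiers tag the claim with ontology terms.
    triples += [(claim, "mp:qualifiedBy", q) for q in qualifiers]
    return triples


def supporters(triples: List[Triple], node: str) -> List[str]:
    """List the nodes that directly support the given node."""
    return [s for s, p, o in triples if p == "mp:supports" and o == node]


example = build_micropublication(
    "ex:mp1",
    "erythromycin increases the AUC of simvastatin",
    ["obo:CHEBI_48923", "obo:CHEBI_9150", "obo:DIDEO_00000000"],
    "http://dx.doi.org/10.1016/S0009-9236(98)90151-5",
)
```

Calling supporters(example, "ex:mp1/claim") returns the data node, and following the chain further back reaches the method and the materials. That traversal is what lets a curator trace a claim to the evidence entered for it.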
[Figure 2 diagram: a micropublication (MP) argues (mp:argues) the Claim “erythromycin increases the AUC of simvastatin”. The claim is qualified by (mp:qualifiedBy) obo:CHEBI_48923, obo:CHEBI_9150, and obo:DIDEO_00000000. Data, Method, and Materials nodes are connected by mp:supports edges, and the diagram cites http://dx.doi.org/10.1016/S0009-9236(98)90151-5.]

Fig. 2. DIKB micropublication graph for the erythromycin - simvastatin interaction.

The process for managing the evidence base and knowledge base described in Section 2 includes assessing the truth value of each PDDI knowledge claim using belief criteria. Operationally, the evidence board uses labels from a taxonomy of evidence types (footnote 5) to tag each evidence item as it is entered into the evidence base. The board then decides which evidence types are credible for specific types of PDDI knowledge claims: this specifies a belief criterion.

As an example, the evidence board might decide that, to support a claim that a drug is a substrate of an enzyme, only clinical drug-drug interaction studies are admissible. This would become a belief criterion for all ‘substrate of’ claims. To implement a belief criterion in the new DIKB, the evidence base is queried to find all PDDI knowledge claims that have at least one supporting evidence item meeting the criterion. The resulting claims are assigned the value of ‘True’.

Footnote 5: http://purl.org/net/drug-interaction-knowledge-base/evidence-types-and-inclusion-criteria

====3.2 Create computable, logical representations====

PDDI knowledge claims mention specific entities such as drugs, drug metabolites, enzymes, and biological pathways whose relationships with each other are more generally modeled in a rule-based theory that infers PDDIs [4]. Sources external to the DIKB provide additional formalized knowledge about these entities. For example, the Gene Ontology provides cellular location and molecular function for the enzyme CYP3A4; this is relevant when the evidence board seeks information about gene expression and about enzyme metabolization.
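Returning briefly to the belief criteria of Section 3.1, the query step can be sketched in a few lines: a claim is assigned ‘True’ when at least one supporting evidence item is of a type admitted for that kind of claim. The EvidenceItem record, the evidence-type labels, and the claims_meeting_criterion function are hypothetical. The sketch uses an in-memory list in place of the queryable evidence base.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class EvidenceItem:
    claim_id: str       # the knowledge claim this item is linked to
    claim_type: str     # e.g. "substrate_of" (hypothetical label)
    evidence_type: str  # tag from the evidence-type taxonomy (hypothetical label)
    supports: bool      # True if the item supports the claim, False if it refutes it


def claims_meeting_criterion(evidence: Iterable[EvidenceItem],
                             criteria: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of claims with at least one admissible supporting item."""
    accepted: Set[str] = set()
    for item in evidence:
        admissible = criteria.get(item.claim_type, set())
        if item.supports and item.evidence_type in admissible:
            accepted.add(item.claim_id)
    return accepted


# Belief criterion from the example in the text: only clinical drug-drug
# interaction studies are admissible for 'substrate of' claims.
criteria = {"substrate_of": {"clinical_ddi_study"}}

evidence = [
    EvidenceItem("claim-1", "substrate_of", "clinical_ddi_study", True),
    EvidenceItem("claim-2", "substrate_of", "in_vitro_experiment", True),
    EvidenceItem("claim-3", "substrate_of", "clinical_ddi_study", False),
]

true_claims = claims_meeting_criterion(evidence, criteria)
```

Because the criterion is data rather than code, the board can change which evidence types it admits and recompute truth values without re-entering any evidence.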
The spans of unstructured text in MP:Claim are not inherently computable entities, and the semantic qualifiers (MP:qualifiedBy) cannot specify the order of the drugs (i.e., distinguish the object drug from the precipitant drug). Therefore, we extended the MP ontology to add two new properties, MP:formalizedAs/MP:formalizes, that enable natural language claims represented as MP:Claim to be linked to their logical representation.

RDF is also the language chosen for the formalization of MP:Claim resources, so that a single query language (i.e., SPARQL) can be used to retrieve information from the whole evidence base. We chose to represent the logical form of knowledge claims using OWL for two reasons. First, OWL provides classes and properties that enable the representation of logical statements in RDF. Second, logical statements written in OWL can be checked for logical consistency and new inferences by a reasoner such as HermiT [7].

We chose to formalize claims using the Nanopublication (NP) [8] ontology because:

1. NP provides a class called NP:Assertion that can hold any RDF graph, including logical statements written in OWL.
2. OWL logical statements stored as an NP:Assertion can be integrated into full nanopublications that combine the NP:Assertion, the provenance of the assertion, and the provenance of the nanopublication into a single publishable and citable entity.

A nanopublication represents the logical structure of a claim as an RDF graph. Like micropublications, nanopublications are publishable and citable entities. Their citability and use of provenance enable us to make the evidence review process transparent and auditable.
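To make the role of the two new properties concrete, the sketch below links a natural language claim to a formalization that records which drug is the precipitant and which is the object. Only the formalizedAs/formalizes pair comes from the text, written here with the lowercase mp: prefix used in Figure 2. The ex: identifiers, the ex:precipitant and ex:object role predicates, and the helper functions are hypothetical. They are a simplified stand-in for the OWL statements that an NP:Assertion would hold.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def link_formalization(claim_id: str, assertion_id: str) -> List[Triple]:
    """Connect a text claim and its logical form in both directions."""
    return [
        (claim_id, "mp:formalizedAs", assertion_id),
        (assertion_id, "mp:formalizes", claim_id),
    ]


def drug_roles(triples: List[Triple], assertion_id: str) -> Dict[str, str]:
    """Read the ordered drug roles that qualifiers alone cannot express."""
    roles: Dict[str, str] = {}
    for subject, predicate, obj in triples:
        if subject == assertion_id and predicate in ("ex:precipitant", "ex:object"):
            roles[predicate.split(":", 1)[1]] = obj
    return roles


graph: List[Triple] = link_formalization("ex:claim1", "ex:assertion1")
graph += [
    # Hypothetical role predicates: erythromycin is the precipitant drug,
    # simvastatin is the object drug whose AUC increases.
    ("ex:assertion1", "ex:precipitant", "obo:CHEBI_48923"),
    ("ex:assertion1", "ex:object", "obo:CHEBI_9150"),
]

roles = drug_roles(graph, "ex:assertion1")
```

The two qualifiers on the claim name the same drugs but carry no order. The formalization is where the direction of the interaction becomes machine-readable.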
The uptake of nanopublications by the wider community suggests that nanopublication is a relevant publishing mechanism for reuse by others (footnote 6). Unlike micropublications, nanopublications have no explicit evidence structure and do not support claim conflict. They are therefore complementary to micropublications, which provide these missing features.

Footnote 6: One measure of uptake is the variety of authors of papers using nanopublications; see the bibliography at http://nanopub.org/wordpress/?page_id=638. Another is the size and geographic distribution of current nanopublication datasets: see [11] Table 1 and Figure 3, respectively.

====3.3 Handling reasonable extrapolation====

Reasonable extrapolation is an important way to infer a PDDI. In contrast to drug-drug interactions (DDIs) that are based on observing an actual drug-drug interaction in some patient, inferred PDDIs based on reasonable extrapolation might not actually occur in reality. Since we do not know whether a PDDI occurs, we cannot assume the existence of any instance of a drug interaction. To model this correctly, we differentiate between actual drug interactions and statements about PDDIs using the DIDEO ontology [5].

[Figure diagram: erythromycin (obo:CHEBI_48923), “molecularly decreases activity” (obo:RO_0002449), CYP3A4 (obo:PR_000006130), “catalyzes a Phase I or Phase II enzymatic reaction involving” (obo:DIDEO_00000096), simvastatin (obo:CHEBI_9150); the diagram also carries the label “inhibits”.]