Comparing Pharmacologic Classes in NDF-RT and SNOMED CT

Jonathan Mortensen1, Olivier Bodenreider∗2

1 Department of Biomedical Engineering, Case Western Reserve University, Cleveland, Ohio, USA
2 National Library of Medicine, Bethesda, Maryland, USA

Email: Jonathan Mortensen - Jonathan.Mortensen@case.edu; Olivier Bodenreider∗ - olivier@nlm.nih.gov
∗ Corresponding author

Abstract

Background: Clinical decision support systems and semantic mining require interoperable representations of pharmacologic classes across reference terminological systems. We explore two such systems: NDF-RT and SNOMED CT.

Methods: We evaluate the overlap of pharmacologic classes in NDF-RT (VA Classes) and SNOMED CT. We compare classes based on the set of their members (drugs) across systems, using the Jaccard coefficient as a measure of overlap between two classes.

Results: There is limited overlap between the two systems. The average Jaccard value is 0.293. Only 11.5% of the VA classes have a Jaccard value of 0.75 or above.

Conclusions: The analysis of discrepancies between pharmacologic classes across systems offers a strategy for identifying classes in need of critical review. Due to the heterogeneity of the representation of pharmacologic classes in various terminologies, we recommend that drugs, not classes, be annotated in text for semantic mining purposes.

Introduction

Pharmacologic classes are typically established in reference to some of the properties of the active moiety, with respect to chemistry, physiology, metabolism and therapeutic intent [1]. For example, the classes platelet aggregation inhibitors and anticoagulants refer to the physiologic effect of drugs decreasing platelet aggregation and coagulation, respectively. In contrast, the class cardiac glycoside refers to the chemical structure of drugs such as digoxin, while the class antianginal refers to the therapeutic properties of some drugs on angina pectoris. Some classes are also defined in reference to several properties, e.g., nitrate vasodilator, referring to both the chemical structure of nitrates and their relaxing action on the musculature of blood vessels (physiologic effect). In other words, pharmacologic classes provide an abstract representation of drug properties, useful in the context of clinical decision support and for the annotation of biomedical resources, including clinical text and the biomedical literature.

While interoperability among terminologies is a requirement for clinical decision support, in which decision support rules are defined in reference to concepts in various terminologies (e.g., concepts for drug classes), it is also important that annotations to biomedical entities such as drug classes be consistent within and across datasets when such datasets are exchanged and integrated, as these annotations form the basis for knowledge discovery through semantic mining.

The National Drug File - Reference Terminology (NDF-RT) is a drug terminology produced by the Department of Veterans Affairs in the United States and is recommended as the standard in e-prescribing systems [2]. Other clinical terminologies such as SNOMED CT also include pharmacologic information.

The objective of this work is to evaluate the degree to which annotations to drug classes in various terminological systems are interoperable, with a focus on pharmacologic classes from NDF-RT. More specifically, we evaluate the overlap of VA classes with those in SNOMED CT. The analysis of the classes reveals discrepancies between the two systems and offers a strategy for identifying classes in need of critical review.

Background

In this section, we give a brief presentation of NDF-RT and SNOMED CT and present some related work on NDF-RT.

The National Drug File - Reference Terminology (NDF-RT) is based upon the National Drug File, a listing of medications produced by the Department of Veterans Affairs [3]. It serves as a reference standard for a variety of medical situations related to drugs and medications. NDF-RT is a description logic-based model available in OWL and XML formats. It includes 9 "Kinds" of information: Cellular or Molecular Interactions, Clinical Kinetics, Diseases, Manifestations or Physiologic States, Pharmaceutical Preparations, Physiological Effects, RxNorm Dose Forms, Therapeutic Categories, and VA Drug Interactions. The Pharmaceutical Preparations hierarchy organizes drugs into three categories: Products by Generic Ingredient Combination, Products by VA Class and External Pharmacologic Classes (EPC).

There are 485 VA Drug Classes organized into a basic hierarchy. A drug generally belongs to only one class. Examples of VA classes include ANTIMALARIALS, of which the clinical drug QUININE SO4 162.5MG TAB is a member. Its parent class is ANTIPROTOZOALS. In addition, there are 425 EPCs (not used in this work). Unlike the VA Classes, the EPCs have a nearly flat hierarchy and are defined in reference to various properties, such as physiologic effect, therapeutic intent, ingredient and mechanism of action. The July 11, 2010 version of NDF-RT was used in the evaluation.

SNOMED CT is currently the largest clinical terminology. It is developed and maintained by the International Health Terminology Standards Development Organisation (IHTSDO) [4]. In SNOMED CT, drugs are simply related to pharmacologic classes through the isa relationship. For example, there is an isa relationship between the drug Quinine and the class Cinchona antimalarial. The January 31, 2010 version of SNOMED CT was used in the study.

Related Work. Others have examined many aspects of NDF-RT. [5] investigated the coverage of the Physiologic Effects hierarchy in NDF-RT and found that the physiologic effects category was sufficient for classifying medications. [6] investigated the addition of pharmacogenomics into the hierarchy. [7] applied NDF-RT to mapping text from medication lists at the Mayo Clinic using the SmartAccess Vocabulary Server. NDF-RT covered 97.8% of the concepts found in the medication lists, indicating that NDF-RT can be used in a clinical setting for medication purposes. [8] compared NDF-RT to the National Drug File, Medicare Part D and a proprietary knowledge base, and determined that 76% of the classes from the three original terminologies were contained in NDF-RT. In recent work, [9] evaluated the correspondence of NDF-RT drugs and classes to RxNorm drugs and classes. As of October 2009, approximately 50% of the drugs did not correspond between terminologies. [10] mapped medications to diseases, showing a clear example of how NDF-RT can be applied in clinical decision support situations. As another example of clinical applications, [11] integrated NDF-RT into the process of generating structured product labeling. Finally, [12] used the NDF-RT drug classes to determine the anticoagulation status of patients based on their medication list, demonstrating a first step in clinical decision support.

Our study focuses not on content coverage, but rather on interoperability among systems of drug classes in various terminologies, including NDF-RT and SNOMED CT. These terminologies were chosen as a reference because they contain drug hierarchies, are mature, and are widely used. More specifically, we want to assess whether similar sets of drugs are linked to the same classes in different systems.

As part of the evaluation, we use a concept alignment technique described by [13]. NDF-RT was loaded into a Virtuoso endpoint [14] for SPARQL querying, which allowed for evaluation of the drug classes.

Methods

To evaluate the drug classes in NDF-RT, we developed an extensional method of evaluation, comparing VA classes with SNOMED CT drug classes. Instead of comparing pharmacologic classes based on lexical resemblance of their names, we compare the extensions of these classes.

The extension of a pharmacologic class is the set of drugs a class has as members. The degree to which any two drug classes are similar was determined by the overlap of their extensions. This is measured by the Jaccard coefficient,

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|},$$

where the intersection is the number of drugs that are the same between the two classes and the union is the total number of distinct drugs across the two classes [15]. An example of extension is presented in Figure 1a. Here, the VA class THROMBOLYTICS and the SNOMED CT class Thrombolytic share 6 drugs, including streptokinase, while the drug drotrecogin is specific to the SNOMED CT class. The Jaccard value is computed as the cardinality of the intersection (6) over that of the union of the classes (7), i.e., 0.86. (In actuality, the classes are compared not on the basis of ingredients, but on the basis of the clinical drugs they have as members. The corresponding ingredients are shown in Figure 1a for brevity.)

The extension of each VA class is compared to that of every class in SNOMED CT. For a given VA class, the SNOMED CT class for which the highest Jaccard value is found is selected as the best match. The average Jaccard of the pairwise comparisons between NDF-RT and SNOMED CT is used to summarize the extensional comparison and determine the overall similarity between the two class systems.

To obtain an extension, the clinical drug members of a drug class were obtained. In contrast to the VA classes (where drug classes are linked directly to clinical drugs), in SNOMED CT the ingredients of a class were first obtained, then the clinical drugs for those ingredients were obtained using relations in NDF-RT, thus keeping the domain of clinical drugs limited to NDF-RT. In addition, the drug members of a class included its own drugs and all drugs that were members of any of its subclasses. For example, the clinical drug QUININE SO4 260MG TAB is linked directly to the VA class ANTIMALARIALS, but is also considered a member of its parent class ANTIPROTOZOALS. Using the drug members, the drug member intersection was found, comparing the extension of the VA classes to the SNOMED CT classes.

Results

There are 485 VA and 722 SNOMED CT classes. To reduce comparisons (and noise), classes that did not have any drug members were removed. There were 95 VA (20%) and 195 SNOMED CT (27%) classes without members. Examples of classes with no members include INVESTIGATIONAL ANTI-TUBERCULAR DRUGS (VA), ANTIFUNGALS,TOPICAL OTIC (VA), Antineoplastic alkaloid (SNOMED CT) and Corticosteroids used in the treatment of asthma (SNOMED CT).

Among the 15,027 clinical drugs in NDF-RT, 8,414 correspond to ingredients also present in SNOMED CT. Examples of clinical drugs specific to NDF-RT include medicinal products from classes such as HERBS/ALTERNATIVE THERAPIES (e.g., WILD CHERRY BARK PWDR).

The extensional comparison was obtained by calculating the overlap between the sets of drug members of class pairs and can be summarized by the average Jaccard coefficient for all class pairs between NDF-RT and SNOMED CT. Through their average Jaccard value, pairs of pharmacologic class systems can be compared for their overall similarity. The average Jaccard value is 0.293, indicating limited overlap overall between drug extensions across the two class systems.
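The extensional comparison described in Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study queried NDF-RT through a SPARQL endpoint); the function names `extension`, `jaccard` and `best_matches` are hypothetical, and the toy data use ingredient names from the examples in the text rather than clinical drugs. Averaging over the best match of each VA class is one reading of the summary statistic and is an assumption here.

```python
from statistics import mean


def extension(cls, direct_members, subclasses):
    """Collect the drugs of `cls` and of all its descendant classes."""
    drugs, stack, seen = set(), [cls], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles or repeated parents
            continue
        seen.add(current)
        drugs |= direct_members.get(current, set())
        stack.extend(subclasses.get(current, ()))
    return drugs


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(va_ext, sct_ext):
    """Map each non-empty VA class to (best SNOMED CT class, Jaccard)."""
    matches = {}
    for va_class, va_drugs in va_ext.items():
        if not va_drugs:  # classes without members are skipped
            continue
        scores = {c: jaccard(va_drugs, d) for c, d in sct_ext.items() if d}
        if scores:
            best = max(scores, key=scores.get)
            matches[va_class] = (best, scores[best])
    return matches


# Toy data (illustrative only), taken from the examples in the text.
shared = {"alteplase", "anistreplase", "reteplase",
          "streptokinase", "tenecteplase", "urokinase"}
va_ext = {"THROMBOLYTICS": set(shared),
          "DIRECT RENIN INHIBITOR": {"aliskiren"}}
sct_ext = {"Thrombolytic": shared | {"drotrecogin"},
           "Renin inhibitor": {"aliskiren"}}

matches = best_matches(va_ext, sct_ext)
# Assumption: summarize by averaging the best-match value of each VA class.
average_best = mean(score for _, score in matches.values())

# A drug linked to a subclass also belongs to the parent class extension.
va_direct = {"ANTIMALARIALS": {"QUININE SO4 260MG TAB"}}
va_children = {"ANTIPROTOZOALS": ["ANTIMALARIALS"]}
antiprotozoals = extension("ANTIPROTOZOALS", va_direct, va_children)
```

With these toy sets, THROMBOLYTICS is matched to Thrombolytic with a Jaccard value of 6/7 (about 0.86), as in the worked example, and the quinine product is included in the extension of ANTIPROTOZOALS through its subclass.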
The distribution of the highest Jaccard value per VA class is shown in Figure 1b. Very few classes exhibit complete overlap (Jaccard = 1.0). Examples include the VA class DIRECT RENIN INHIBITOR and the SNOMED CT class Renin Inhibitor. This particular class contains the clinical drugs corresponding to only one ingredient, aliskiren. The proportion of VA classes with a Jaccard value of 0.75 or above is 11.5%. For example, the Jaccard value for the overlap between the VA class THROMBOLYTICS and the SNOMED CT class thrombolytic is 0.86. As shown in Figure 1a, the clinical drugs corresponding to six ingredients are common to both the VA class and the SNOMED CT class. These ingredients are alteplase, anistreplase, reteplase, streptokinase, tenecteplase and urokinase. SNOMED CT additionally lists drotrecogin as a member of the class thrombolytic, although the indications for this drug seem to be limited to severe sepsis.

Figure 1: (a) Comparison of the extensions of the VA class THROMBOLYTICS and the SNOMED CT class Thrombolytic, (b) Distribution of Jaccard (highest per class)

Finally, 75.6% of the VA classes have a Jaccard value lower than 0.5. For example, the Jaccard value for the overlap between the VA class HYPEROSMOTIC LAXATIVES and the SNOMED CT class osmotic laxatives is only 0.16. While clinical drugs corresponding to the ingredients lactulose and magnesium sulfate are common to both classes, many clinical drugs found in the VA class are not in the SNOMED CT class (e.g., other magnesium salts such as magnesium biphosphate and magnesium hydroxide). Interestingly, clinical drugs corresponding to magnesium hydroxide are part of a different SNOMED CT class, saline hydroxide. Conversely, solutions of glycerol are classified as osmotic laxatives in SNOMED CT, but as LAXATIVES, RECTAL in NDF-RT.

Discussion

Overlap among Class Systems

The similarity between the two pharmacologic class systems under investigation (VA and SNOMED CT) is relatively limited. The average Jaccard value among classes based on shared drugs is 0.293. The following reasons can be proposed as an explanation, in addition to sheer differences in classification and the discrepancies illustrated in the section above. In some cases, there is no equivalent class in SNOMED CT for a given VA class, especially for high-level aggregation classes (e.g., BLOOD PRODUCTS/MODIFIERS/VOLUME EXPANDERS), residual classes (e.g., CARDIOVASCULAR AGENTS, OTHERS) and classes specific to topical forms (e.g., BETA-BLOCKERS, TOPICAL OPHTALMIC). Another reason is that partially overlapping classes are defined using different classificatory criteria. For example, ophthalmic forms of beta-blockers such as TIMOLOL MALEATE 0.5% GEL, OPH are classified as BETA-BLOCKERS, TOPICAL OPHTALMIC in NDF-RT and as anti glaucoma agent in SNOMED CT. While the former class only contains beta-blockers, the latter includes a wider range of products (e.g., apraclonidine). Another difference between the two class systems is that the pharmacologic class is a property of the clinical drug for the VA classes, whereas it is inherited through the ingredient for SNOMED CT classes. For example, injectable forms of acetylcysteine are classified as ANTIDOTES/DETERRENTS, OTHERS in NDF-RT, while topical solutions (e.g., for inhalation) are classified as MUCOLYTICS. In contrast, all forms of this drug are classified as both drugs used in the treatment of paracetamol poisoning and mucolytic agent in SNOMED CT. Finally, we also found a limited number of errors, such as the classification of the antibacterial drug NORFLOXACIN 0.3% SOLN, OPH as BETA-BLOCKERS, TOPICAL OPHTALMIC in NDF-RT.

Classes without Drug Members

One particular difference between the two pharmacologic class systems is the number of classes for which there is no drug in NDF-RT. (These classes were omitted from our statistics.) These differences have different causes in different systems. For VA classes, most classes with no drugs correspond to investigational drugs. In contrast, in SNOMED CT, such classes correspond essentially to classes for which the corresponding medicinal products are outside the scope of NDF-RT, including blood products (e.g., Red cells - irradiated), dietary products (e.g., Gluten free food product) and various prescribable entities (e.g., Sterile maggots).

Consequences for Semantic Mining

As the sets of drugs available in terminological systems vary considerably across systems, with minimal overlap among them, annotation of the literature directly with classes from a given system is likely to result in annotated datasets that will not be interoperable, and whose annotations will be difficult to reconcile. Even if some terminologies such as SNOMED CT and NDF-RT tend to provide good coverage of clinical drugs, their overlap with other terminologies in terms of pharmacologic classes remains limited.

In practice, a better option for semantic mining is to annotate drugs rather than pharmacologic classes. Drug names are relatively standard (at least at the ingredient level) and integration resources such as RxNorm are already available. Once resources have been annotated at the ingredient level, the corresponding classes can be added automatically in reference to the most useful pharmacologic class system in a particular context. Annotations to another pharmacologic class system can be recomputed from the ingredients if these resources are reused for a different purpose.

Limitations and Future Work

There are a few limitations to this work. The evaluation was only a quantitative evaluation, comparing the two terminologies. It was not an evaluation of the clinical quality or the use of NDF-RT in a clinical situation. In addition, the domain of drugs used in the comparison was only clinical drugs in NDF-RT, as we assumed the clinical drugs to be complete. No comparisons were done at the ingredient level. Because of this, we obtained all the NDF-RT clinical drugs of an ingredient from the terminology. Some classes that have no drug members may have had ingredients; however, these ingredients were either not present in NDF-RT or did not have clinical drugs associated with them, resulting in the class not having drug members.

For a complete evaluation of NDF-RT, the external pharmacologic classes (EPCs) will be included in future work. To leverage these classes, they first must be enriched with drugs. A technique for doing so has been piloted by [12], which uses a description logics-based classifier to classify drugs into EPCs.

In addition to the extensional approach used in this study, we would like to explore an intensional approach to comparing the classes, leveraging synonymy relations in the Unified Medical Language System (UMLS). In practice, classes across systems could be mapped through the UMLS and the extensions of equivalent classes could be compared.

Finally, this work may be considered a class-centric approach, focused on the drugs associated with classes. Future work will include a drug-centric approach, which focuses on the classes associated with drugs. More specifically, we will study the set of pharmacologic classes associated with a given drug in different pharmacologic class systems.

Conclusions

By using an automated method of comparing classes using drug class extensions, inconsistencies between terminologies were discovered. These inconsistencies serve as an indicator for possible review. The automated method of pairwise class member comparison complements standard lexical matching and can serve as an additional quality assurance tool for terminologies. This methodology sets a framework for pairwise comparison of drug classes between terminological systems using only their drug members. Finally, due to the heterogeneity of the representation of pharmacologic classes in various terminologies, we recommend that drugs, not classes, be annotated in text for semantic mining purposes.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Jonathan Mortensen and Olivier Bodenreider conceived and designed the study. Jonathan Mortensen acquired the data and performed the analysis and interpretation of the data. Both authors contributed to the drafting of the manuscript and approved its final version.

Acknowledgements

This research was supported in part by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM), in addition to the Choose Ohio First Scholarship Program.

References

1. Carter JS, Brown SH, Erlbaum MS, Gregg W, Elkin PL, Speroff T, Tuttle MS: Initializing the VA medication reference terminology using UMLS metathesaurus co-occurrences. In Proceedings of the AMIA Symposium 2002:116–20.
2. Miller RA: Clinical Decision Support and Electronic Prescribing Systems: A Time for Responsible Thought and Action. Journal of the American Medical Informatics Association 2005, 12(4):403–409.
3. Lincoln MJ, Brown SH, Nguyen V, Cromwell T, Carter J, Erlbaum M, Tuttle M: US Department of Veterans Affairs enterprise reference terminology strategic overview. Medinfo 2004, 11(Pt 1):391–395.
4. SNOMED CT (Systematized Nomenclature of Medicine-Clinical Terms) [http://www.ihtsdo.org/snomed-ct/].
5. Rosenbloom ST, Awad J, Speroff T, Elkin PL, Rothman R, III AS, Peterson J, Bauer BA, Wahner-Roedler DL, Lee M, et al.: Adequacy of representation of the National Drug File Reference Terminology Physiologic Effects reference hierarchy for commonly prescribed medications. In AMIA Annu Symp Proc 2003:569–78.
6. Chute CG, Carter JS, Tuttle MS, Haber M, Brown SH: Integrating pharmacokinetics knowledge into a drug ontology as an extension to support pharmacogenomics. In AMIA Annu Symp Proc 2003:170–4.
7. Brown SH, Elkin PL, Rosenbloom ST, Husser C, Bauer BA, Lincoln MJ, Carter J, Erlbaum M, Tuttle MS: VA National Drug File Reference Terminology: a cross-institutional content coverage study. Medinfo 2004, 11(Pt 1):477–81.
8. Carter JS, Brown SH, Bauer BA, Elkin PL, Erlbaum MS, Froehling DA, Lincoln MJ, Rosenbloom ST, Wahner-Roedler DL, Tuttle MS: Categorical information in pharmaceutical terminologies. In AMIA Annu Symp Proc 2006:116–20.
9. Pathak J, Chute CG: Analyzing categorical information in two publicly available drug terminologies: RxNorm and NDF-RT. Journal of the American Medical Informatics Association 2010, 17(4):432–439.
10. Burton MM, Simonaitis L, Schadow G: Medication and Indication Linkage: A Practical Therapy for the Problem List? In AMIA Annual Symposium Proceedings 2008:86–90.
11. Schadow G: Structured product labeling improves detection of drug-intolerance issues. Journal of the American Medical Informatics Association 2009, 16(2):211–219.
12. Bodenreider O, Mougin F, Burgun A: Automatic determination of anticoagulation status with NDF-RT. In Proceedings of the 13th ISMB'2010 SIG meeting "Bio-ontologies" 2010:140–143.
13. Bodenreider O, Burgun A: Aligning knowledge sources in the UMLS: methods, quantitative results, and applications. In Stud Health Technol Inform, Volume 107(Pt 1) 2004:327–31.
14. Virtuoso Universal Server [http://virtuoso.openlinksw.com/].
15. Jaccard P: Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 1901, (37):547–579.